<h1 id="toctitle">Working with the operating system</h1>
<ul id="toc"/>

##Working with files

###About paths

__File paths__ are the strings that we use to describe to Python where to find the files that we want. 

On Windows we are used to writing paths like this:

`c:\path\to\a\file`

but in Python strings, a \ followed by certain characters has a special meaning (for example, `\t` is a tab and `\n` is a newline). To avoid this, use / instead:

`c:/path/to/a/file`

On Mac/Linux/Unix, use the / character as normal:

`/home/martin/path/to/a/file`
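
If we really do need backslashes in a Windows path, there are a couple of ways to write them safely. The snippet below is a small sketch rather than a rule: it shows doubled backslashes, a raw string, and `os.path.join`, which builds a path from pieces using whichever separator the current system expects. The path pieces are just the illustrative ones from above.

```python
import os

# Two ways to keep the backslashes in a Windows path
escaped = "c:\\path\\to\\a\\file"   # double each backslash
raw = r"c:\path\to\a\file"          # raw string: backslashes are left alone
print(escaped == raw)

# os.path.join builds a path using the separator of the current system
joined = os.path.join("path", "to", "a", "file")
print(joined)
```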

###Basic file manipulation

Functions for file manipulation live in the `os` module. Renaming files is straightforward:

```python
import os
os.rename("old.txt", "new.txt")
os.rename("biology/old.txt", "biology/new.txt")
os.rename("old_folder", "new_folder")
```

Moving files is the same as renaming them:

```python
os.rename("biology/old.txt", "python/old.txt")
```
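
One caveat: `os.rename` can fail when the source and destination are on different drives or filesystems. `shutil.move` handles that case by copying and then deleting. Here is a minimal sketch, using a temporary scratch folder (an assumption, so that the example can run anywhere) in place of real `biology` and `python` folders:

```python
import os
import shutil
import tempfile

# Scratch area standing in for a real project folder
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "biology"))
os.mkdir(os.path.join(base, "python"))

# Create a small file to move
source = os.path.join(base, "biology", "old.txt")
with open(source, "w") as handle:
    handle.write("ATGC\n")

# Move it into the other folder
destination = os.path.join(base, "python", "old.txt")
shutil.move(source, destination)

print(os.path.exists(source))
print(os.path.exists(destination))
```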

We can create a folder:

```python
os.mkdir("c:/martin/python")
```
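
`os.mkdir` only creates the last folder in the path, so it fails if the parent folders are missing. When we need several levels at once, `os.makedirs` creates them all. A short sketch, again using a temporary scratch folder as a stand-in for a real location:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # scratch folder, stands in for a real location
nested = os.path.join(base, "martin", "python", "scripts")

# os.mkdir(nested) would fail here because "martin" and "python" don't exist yet
if not os.path.exists(nested):
    os.makedirs(nested)

print(os.path.isdir(nested))
```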

###Copying and trees

For more advanced operations, such as copying files and whole folder trees, use the `shutil` module.

Copying is different for a file:

```python
import shutil
shutil.copy("original.txt", "copy.txt")
```

vs a folder:

```python
shutil.copytree("original_folder", "copy_folder")
```

We can check if a file or folder exists:

```python
if os.path.exists("c:/martin/email.txt"):
    print("the file exists")  # do something with the file here
```
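
`os.path.exists` doesn't tell us whether a path is a file or a folder. For that we can use `os.path.isfile` and `os.path.isdir`. The sketch below uses a hypothetical helper called `describe` and a temporary folder so that it runs on any machine:

```python
import os
import tempfile

def describe(path):
    """Return 'folder', 'file' or 'missing' for the given path."""
    if os.path.isdir(path):
        return "folder"
    if os.path.isfile(path):
        return "file"
    return "missing"

# Set up one folder containing one file
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "email.txt")
with open(file_path, "w") as handle:
    handle.write("hello\n")

for path in [folder, file_path, os.path.join(folder, "nothing_here.txt")]:
    print(path + ": " + describe(path))
```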

###Deleting stuff

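Deleting files from Python is dangerous: they are removed permanently rather than sent to the recycle bin, so there are no take backs! There are different functions for different jobs, listed here in increasing order of danger.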

Deleting a file:
```python
os.remove("c:/martin/unwanted_file.txt")
```

Deleting an empty folder:
```python
os.rmdir("c:/martin/empty")
```

Deleting a folder and all its contents:
```python
shutil.rmtree("c:/martin/full")
```
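
Because deletion can't be undone, it can be worth checking what we're about to delete first. `os.rmdir` raises an error if the folder is not empty; the sketch below checks beforehand instead, using a hypothetical helper called `remove_if_empty` and temporary folders as stand-ins for real ones:

```python
import os
import tempfile

def remove_if_empty(folder):
    """Delete folder only when it contains nothing; return True if deleted."""
    if os.path.isdir(folder) and not os.listdir(folder):
        os.rmdir(folder)
        return True
    return False

# One empty folder and one folder holding a file
base = tempfile.mkdtemp()
empty = os.path.join(base, "empty")
full = os.path.join(base, "full")
os.mkdir(empty)
os.mkdir(full)
with open(os.path.join(full, "keep_me.txt"), "w") as handle:
    handle.write("important\n")

deleted_empty = remove_if_empty(empty)
deleted_full = remove_if_empty(full)
print(deleted_empty)
print(deleted_full)
```

The folder with a file in it is left alone, so nothing is lost by accident.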

###Listing folder contents

With the `os` module we can list files and folders in the current working directory:

```python
for file_name in os.listdir("."):
    print("one file name is " + file_name)
```

or in a different directory:

```python
for file_name in os.listdir("c:/martin"):
    print("one file name is " + file_name)
```
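
Note that `os.listdir` returns bare names, not full paths, and includes everything in the folder. A common pattern is to filter by file extension and join each name back onto the folder. Here is a sketch under the assumption that we only want files ending in `.dna`; the file names are made up for the example:

```python
import os
import tempfile

# Scratch folder with a mixture of files
folder = tempfile.mkdtemp()
for name in ["a.dna", "b.dna", "notes.txt"]:
    with open(os.path.join(folder, name), "w") as handle:
        handle.write("ATGC\n")

# Keep only the .dna files, and build the full path to each one
dna_paths = []
for file_name in sorted(os.listdir(folder)):
    if file_name.endswith(".dna"):
        dna_paths.append(os.path.join(folder, file_name))

for path in dna_paths:
    print("found " + path)
```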



##Running external programs

Sometimes it's helpful to be able to run an existing program (e.g. an analysis tool such as BLAST) from within a Python program.

To run a program and display the output on the screen:

```python
import subprocess

# run a program with some options
subprocess.call("/bin/date +%B", shell=True)
```

To run a program and capture its output, use `check_output` (in Python 3 the output comes back as bytes rather than a string):

```python
cmd = "/bin/date +%B"
month = subprocess.check_output(cmd, shell=True)
```
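
We can also pass the command as a list of arguments, which avoids the need for `shell=True`, and turn the captured output into an ordinary string with `.decode()`. In this sketch the "external program" is just the Python interpreter itself (an assumption, chosen so that the example runs on any system); in real use it would be something like the `date` command above.

```python
import subprocess
import sys

# Stand-in external program: the Python interpreter itself (an assumption,
# chosen so the example runs on any system)
cmd = [sys.executable, "-c", "print('hello from another program')"]

raw_output = subprocess.check_output(cmd)   # no shell=True needed with a list
text = raw_output.decode("utf-8").strip()   # bytes -> string, drop the newline
print(text)
```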

##Getting user input

We can get user input interactively:

```python 
accession = raw_input("Enter the accession name: ")
```

In Python 3, `raw_input()` has been renamed to simply `input()`.
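
Whatever the user types comes back as a string, so numbers have to be converted before we can use them. A minimal sketch, with a hypothetical helper called `to_positive_int`; the variable `typed` stands in for the result of an interactive prompt:

```python
def to_positive_int(text):
    """Convert typed text to a positive whole number, or return None."""
    try:
        number = int(text.strip())
    except ValueError:
        return None
    if number > 0:
        return number
    return None

typed = " 4 "  # stands in for what the user typed at the prompt
kmer_length = to_positive_int(typed)
if kmer_length is None:
    print("please enter a positive whole number")
else:
    print(kmer_length + 1)
```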

We can also get options from the command line (not in Canopy):

```python
# e.g. python myscript.py apple banana
import sys
first = sys.argv[1]   # apple (sys.argv[0] is the script name)
second = sys.argv[2]  # banana
```

This is only useful if you're working on the command line anyway. 
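
If the script is run with too few options, `sys.argv[1]` raises an `IndexError`, so it's worth checking the length of the list first. Here is a sketch using a hypothetical function called `get_two_options`; it takes the list as a parameter so that we can try it out with a made-up command line instead of the real `sys.argv`:

```python
import sys

def get_two_options(argv):
    """Return the first two options as a tuple, or None if any are missing."""
    # argv[0] is the script name, so two options means at least three items
    if len(argv) < 3:
        return None
    return (argv[1], argv[2])

# In a real script we would pass sys.argv; these lists are made up
print(get_two_options(["myscript.py", "apple", "banana"]))
print(get_two_options(["myscript.py", "apple"]))
```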

##Exercises

###Binning DNA sequences

Inside the `dna_files` folder is a collection of files that end in `.dna`. Each file holds a collection of DNA sequences, one per line.

Write a program which creates nine new folders – one for sequences between 100 and 199 bases long, one for sequences between 200 and 299 bases long, etc. Write out each DNA sequence in the input files to a separate file in the appropriate folder.

###Kmer counting 

Write a program that will count all kmers of a given length across all DNA sequences in the input files and display just the ones that occur more than a given number of times. Your program should take two interactive arguments – the kmer length, and the cutoff number.

