#### Bi 410/510 (Fall 2019)

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

**Group Members**

If this is a group submission, edit this cell to add the names and e-mail addresses of the other people who worked on the project.
* name (xxx@uoregon.edu)
* name (xxx@uoregon.edu)
* name (xxx@uoregon.edu)

# <span style="color:seagreen;">Project 8: &nbsp; The `os` Library</span>

###  <span style="color:seagreen">Instructions</span>

Put a copy of the zip file named `sim.zip` in the same folder as this notebook.
* you can use the file you downloaded earlier for Project 1, or download it again from the Bi 410 server

As you work on the exercises for this project your functions will modify the `sim` folder by moving files around, deleting files, _etc_.  

Each time you change your function, you'll want to restore the `sim` folder to its original condition before you run the function again.  Run this code cell to execute the shell commands that delete the current `sim` folder and replace it with a fresh copy.
* the `-rf` option on the `rm` command removes the `sim` folder along with all its files and subfolders
* the "r" stands for "recursive" and the "f" stands for "force" (meaning "don't ask any questions")
* the `-q` option on the `unzip` command means "quiet" (so it won't print any file names as it unpacks the file)

In [1]:
! rm -rf sim
! unzip -q sim.zip
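If you'd rather stay in Python, the same reset can be sketched with `shutil.rmtree` and the standard `zipfile` module. This is a minimal sketch, assuming `sim.zip` unpacks to a top-level `sim/` folder (as the `rm` and `unzip` commands above imply); the function name `reset_sim` and its parameters are hypothetical.

```python
import os
import shutil
import zipfile

def reset_sim(folder='sim', archive='sim.zip', parent='.'):
    """Hypothetical helper: delete `folder` and re-extract it from `archive`.

    Mirrors `rm -rf sim` followed by `unzip -q sim.zip`, assuming the
    archive stores the folder at its top level.
    """
    path = os.path.join(parent, folder)
    # like `rm -rf`: remove the folder and everything in it;
    # errors (e.g. the folder is already gone) are ignored
    shutil.rmtree(path, ignore_errors=True)
    # like `unzip -q`: extract into the parent directory without printing anything
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(parent)
    return path
```

In the notebook folder, `reset_sim()` would then play the role of the two shell commands.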

###  <span style="color:seagreen">Imports</span> 

Use this code cell to import `os` and any other libraries you might need.

In [2]:
import os
import shutil

###  <span style="color:seagreen">Exercise 8.1: Make a Backup File</span> 

Fill in the body of the function named `backup`.  It will be passed a string that is the path to an existing file.  Your function should make a copy of the file, inserting the string `.orig` into the name of the new file just before the extension.  For example, if we back up `foo.txt` the new file will be `foo.orig.txt`.

The return value should be the name of the new file.

The function should check for two errors.  If the path passed to the function does not exist, print "no such file" and return None.  If there is already a file with the name of the copy, print "copy exists" and return None.

Example:
```
>>> backup('sim/clusters/clusters.txt')
'sim/clusters/clusters.orig.txt'
```
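The renaming step can be sketched with `os.path.splitext`, which splits a path into everything before the last dot in the file name and the extension (including the dot). The helper name `orig_name` is hypothetical; it only builds the new name and doesn't touch the file system.

```python
import os

def orig_name(path):
    """Hypothetical helper: insert '.orig' just before the file extension."""
    root, ext = os.path.splitext(path)   # 'sim/map/otus.fasta' -> ('sim/map/otus', '.fasta')
    return root + '.orig' + ext

print(orig_name('foo.txt'))          # foo.orig.txt
print(orig_name('archive.tar.gz'))   # archive.tar.orig.gz (only the last extension is split off)
print(orig_name('README'))           # README.orig (no extension, so '.orig' goes at the end)
```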

#### Design 

TO DO
* Check that the file exists
* Build the new name by inserting `.orig` before the extension
* Check that the copy doesn't already exist, then copy the file
* add tests

#### Code 

In [3]:
def backup(filename):
    # error 1: the file to back up must exist
    if not os.path.isfile(filename):
        print("no such file")
        return None

    # insert '.orig' just before the extension: foo.txt -> foo.orig.txt
    pre, suf = os.path.splitext(filename)
    file_back = pre + '.orig' + suf

    # error 2: don't overwrite an existing backup
    if os.path.exists(file_back):
        print("copy exists")
        return None

    shutil.copyfile(filename, file_back)
    return file_back

#### Sandbox 

You can use this code cell while you are working on your code.  It should make a backup of `clusters/clusters.txt` in the `sim` folder and return the string `sim/clusters/clusters.orig.txt`.

In [4]:
backup('sim/clusters/clusters.txt')

'sim/clusters/clusters.orig.txt'

#### Tests 

We will use these code cells to test your code.  

**Note:** You can execute these code cells if you wish, but they may fail if the `sim` folder has been modified by a previous call to `backup` (_i.e._ erase and unzip `sim` before you run these tests).

The first test cell copies a file; the second makes sure the copy exists; the third tries to copy it again (which should print an error); and the fourth tries to copy a file that doesn't exist.

In [5]:
assert backup('sim/map/otus.fasta') == 'sim/map/otus.orig.fasta'

In [6]:
assert 'otus.orig.fasta' in os.listdir('sim/map')

In [7]:
assert backup('sim/map/otus.fasta') is None

copy exists


In [8]:
assert backup('sim/X_R1.fastq') is None

no such file


###  <span style="color:seagreen">Exercise 8.2: Move Log Files</span> 

Fill in the body of the function named `move_logs`.  It will be passed the name of a 16S rRNA project directory.  Inside that directory is a subdirectory named `merged` that contains zero or more log files with names that start with "log".

Your function should make a new top-level folder named `log` inside the project directory and move all of the log files from `merged` to the new `log` directory.

Note: before you make the new `log` directory, make sure there isn't already a directory with that name.
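Instead of testing each name with `startswith`, the log files could also be found with the standard `glob` module, which matches shell-style wildcards. This is a minimal sketch, assuming the `merged` layout described above; `find_logs` is a hypothetical name.

```python
import glob
import os

def find_logs(dirname):
    """Hypothetical helper: paths of files in <dirname>/merged whose names start with 'log'."""
    pattern = os.path.join(dirname, 'merged', 'log*')
    # glob.glob returns matches in arbitrary order, so sort for repeatable output;
    # keep only regular files in case a subdirectory also matches the pattern
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Each returned path could then be handed to `shutil.move` along with the `log` folder as the destination.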

#### Design 

TO DO
* Check for the `log` dir and make it if it doesn't exist
* Find the files in `merged` that have the 'log' prefix
* Move each of them to `log`

#### Code 

In [9]:
def move_logs(dirname):
    target = os.path.join(dirname, 'log')
    source = os.path.join(dirname, 'merged')

    # make the log folder only if it isn't already there
    if not os.path.isdir(target):
        os.mkdir(target)

    # move every file in merged whose name starts with 'log'
    for file in os.listdir(source):
        if file.startswith('log'):
            shutil.move(os.path.join(source, file), target)

#### Sandbox

You can use this code cell to test various functions in the `os` and `shutil` libraries.
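For example, one thing worth checking here is how `shutil.move` treats its destination: if the destination is an existing directory, the file is moved into it and keeps its name; otherwise the destination is used as the new path, so the file is renamed. The sketch below tries both in a throwaway temporary directory so nothing in `sim` is touched; the file names are made up.

```python
import os
import shutil
import tempfile

# work in a throwaway folder so nothing in sim is touched
with tempfile.TemporaryDirectory() as tmp:
    log_dir = os.path.join(tmp, 'log')
    os.mkdir(log_dir)
    src = os.path.join(tmp, 'log.1.txt')
    open(src, 'w').close()                      # create an empty file to move

    # destination is an existing directory: the file keeps its name and lands inside it
    moved = shutil.move(src, log_dir)
    print(moved == os.path.join(log_dir, 'log.1.txt'))   # True

    # destination is not an existing directory: the file is moved and renamed
    renamed = shutil.move(moved, os.path.join(log_dir, 'first.txt'))
    contents = sorted(os.listdir(log_dir))
    print(contents)                             # ['first.txt']
```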

#### Tests 

When your function is working, execute this code cell.  The tests will make sure
* the new `log` directory exists and has 3 files, and that there are 3 files left in `merged`
* the 3 files in the `log` folder have the expected names

In [10]:
move_logs('sim')

In [11]:
assert os.path.isdir('sim/log')
assert len(os.listdir('sim/log')) == 3
assert len(os.listdir('sim/merged')) == 3

In [12]:
for n in ['1','2','3']:
    assert os.path.isfile('sim/log/log.{}.txt'.format(n))

###  <span style="color:seagreen">Exercise 8.3: Rearrange FASTQ Files</span> 

Implement another function that rearranges the files inside a 16S project directory.  The function named `move_pairs` should create a new directory named `fastq`, and inside that directory make one new directory for each pair of FASTQ files.  For example, the `sim` folder has six FASTQ files:
```
A_R1.fastq
A_R2.fastq
B_R1.fastq
B_R2.fastq
C_R1.fastq
C_R2.fastq
```
The pair names are the parts of the file names before the underscore.  This folder has three pairs, and the new subdirectories will have names `A`, `B`, and `C`.  The function should then move each FASTQ file to the new directory for its pair.

This picture shows the contents of the `sim` folder before and after calling `move_pairs`:

<img width=1200 src="https://pages.uoregon.edu/conery/Bi410/before_and_after.png">

**Note:** Don't assume there are always three pairs.  Your function should figure out how many pairs there are and make a new folder for each pair.  You can assume that FASTQ file names always have one letter before the underscore (this is not true in general, but for this project the `sim` folder will always have 26 or fewer pairs).
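Because the number of pairs isn't fixed, one way to plan the moves is to first group the FASTQ file names by the part before the underscore, then make one folder per key. This is a minimal sketch using a hypothetical helper, `group_pairs`, that works on a list of names so it can be tried without touching `sim`. Since `split('_')[0]` takes everything before the first underscore, it would also handle pair names longer than one letter.

```python
from collections import defaultdict

def group_pairs(names):
    """Hypothetical helper: map each pair name to its sorted FASTQ file names."""
    pairs = defaultdict(list)
    for name in names:
        if name.endswith('.fastq'):
            pairs[name.split('_')[0]].append(name)   # 'A_R1.fastq' -> 'A'
    return {key: sorted(files) for key, files in sorted(pairs.items())}

names = ['B_R2.fastq', 'A_R1.fastq', 'map', 'A_R2.fastq', 'B_R1.fastq']
print(group_pairs(names))
# {'A': ['A_R1.fastq', 'A_R2.fastq'], 'B': ['B_R1.fastq', 'B_R2.fastq']}
```

In the notebook, `group_pairs(os.listdir('sim'))` would give one key for each folder to create.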

#### Design 

You can put your "to do list" here

#### Code 

In [13]:
def move_pairs(dirname):
    target = os.path.join(dirname, 'fastq')

    # make the fastq folder only if it isn't already there
    if not os.path.isdir(target):
        os.mkdir(target)

    for file in os.listdir(dirname):  # files in the project folder
        if file.endswith('.fastq'):
            pair = file.split('_')[0]  # pair name: the part before the underscore
            dest = os.path.join(target, pair)

            # make the folder for this pair if it doesn't exist yet
            if not os.path.isdir(dest):
                os.mkdir(dest)

            shutil.move(os.path.join(dirname, file), dest)

#### Sandbox 

#### Tests 

When your function is working, execute this code cell.

In [14]:
move_pairs('sim')

The test cells will
* check that the new `fastq` directory exists and contains 3 items
* make sure there are no `.fastq` files remaining in the top-level `sim` folder
* look for the `R1` and `R2` files in each folder in the new `fastq` directory

In [15]:
assert os.path.isdir('sim/fastq')
assert len(os.listdir('sim/fastq')) == 3
assert not any(x.endswith('.fastq') for x in os.listdir('sim'))

In [16]:
for ch in os.listdir('sim/fastq'):
    files = os.listdir(os.path.join('sim/fastq', ch))
    for pair in ['1','2']:
        assert '{}_R{}.fastq'.format(ch,pair) in files