# Files and Directories

## Creating files and directories
The following code creates a new directory **if it doesn't already exist**. Notice the function `os.path.isdir()`, which is not covered in the slides. The code then creates a new file containing only the positive values from an existing data file (the `numbers.txt` file we are familiar with).

Run the code twice to verify that the behaviour changes once the directory has been created.  

In [14]:
import os

new_dir = '../tmp/'
existing_file = '../data/numbers.txt'
new_file = new_dir + 'positive_numbers.txt'

if os.path.isdir(new_dir):
    print('Directory', new_dir, 'already exists.')
else:
    os.mkdir(new_dir)
    print('Directory', new_dir, 'created.')

with open(existing_file, 'r') as f_in:
    with open(new_file, 'w') as f_out:
        for line in f_in:
            if float(line) > 0:
                f_out.write(line)

Directory ../tmp/ already exists.
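
The check-then-create pattern above can also be written with `os.makedirs()` and its `exist_ok=True` argument, which does not raise an error when the directory is already present. The following is a minimal sketch rather than a replacement for the cell above: the helper name `ensure_dir` is made up for illustration, and the demonstration runs in a temporary directory so that `../tmp/` is left untouched.

```python
import os
import tempfile

def ensure_dir(path):
    """Create path if needed; return True if it was newly created.

    ensure_dir is a hypothetical helper name used only for this sketch.
    """
    already_there = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)  # no error if the directory exists
    return not already_there

# Demonstrate in a throwaway location rather than in ../tmp/
with tempfile.TemporaryDirectory() as base:
    demo_dir = os.path.join(base, 'tmp')
    first = ensure_dir(demo_dir)   # directory is created here
    second = ensure_dir(demo_dir)  # directory already exists here
    print('First call created it:', first)
    print('Second call created it:', second)
```

Calling the helper twice mirrors running the original cell twice: only the first call creates anything.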


Next you can examine the contents of the new file before "cleaning up" (deleting both the file and the directory):

In [15]:
import os

with open(new_file, 'r') as f:
    for n in f:
        print(n, end='')
        
os.remove(new_file)
os.rmdir(new_dir)

1.63
25.307
8.0
31.33333
780.4592
87.612
928.7
1153.04
4.2
0.932
5.65
5.912
2347.105
39.2
61.5


OSError: [Errno 39] Directory not empty: '../tmp/'
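
The `OSError` above arises because `os.rmdir()` only removes empty directories. The output does not show what was left in `../tmp/`; a hidden entry such as a Jupyter `.ipynb_checkpoints` folder is one possibility, but that is a guess. The sketch below reproduces the situation in a temporary directory (the file name `leftover.txt` is made up), uses `os.listdir()` to see what remains, and then removes the whole tree with `shutil.rmtree()`.

```python
import os
import shutil
import tempfile

# Build a throwaway directory containing one leftover file
base = tempfile.mkdtemp()
demo_dir = os.path.join(base, 'tmp')
os.mkdir(demo_dir)
with open(os.path.join(demo_dir, 'leftover.txt'), 'w') as f:
    f.write('1.0\n')

try:
    os.rmdir(demo_dir)  # only works on empty directories
    rmdir_failed = False
except OSError as err:
    rmdir_failed = True
    print('rmdir failed:', err.strerror)
    print('Remaining entries:', os.listdir(demo_dir))

# shutil.rmtree() deletes the directory and everything inside it
shutil.rmtree(demo_dir)
still_exists = os.path.isdir(demo_dir)
print('Directory still exists:', still_exists)

os.rmdir(base)  # base is empty now, so os.rmdir() succeeds
```

Because `shutil.rmtree()` deletes everything beneath the given path without asking, it is worth printing the path and its contents before using it on a real directory.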

The code below differs from the code above in having a different directory name and different output file. Modify this code so that it uses `os.path.isfile()` to check whether the new file already exists. In other words, adopt the same approach for the file as has already been adopted for the directory. 

In [26]:
import os

new_dir2 = '../tmp2/'
new_file2 = new_dir2 + 'negative_numbers.txt'

# Create the directory only if it is not already there
if os.path.isdir(new_dir2):
    print('Directory', new_dir2, 'already exists.')
else:
    os.mkdir(new_dir2)
    print('Directory', new_dir2, 'created.')


# Check for the output file first, so the input is only opened when needed
if os.path.isfile(new_file2):
    print('File', new_file2, 'already exists')
else:
    with open(existing_file, 'r') as f_in:
        with open(new_file2, 'w') as f_out:
            for line in f_in:
                if float(line) < 0:
                    f_out.write(line)

Directory ../tmp2/ already exists.
File ../tmp2/negative_numbers.txt already exists
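
An alternative to checking with `os.path.isfile()` is to open the file in exclusive-creation mode (`'x'`), which raises `FileExistsError` if the file is already there. This is only a sketch of that alternative, not the approach the exercise asks for: the helper name `write_if_new` and the sample values are made up, and the demonstration uses a temporary directory instead of `../tmp2/`.

```python
import os
import tempfile

def write_if_new(path, lines):
    """Write lines to path unless it already exists.

    Returns True if the file was written. write_if_new is a
    hypothetical helper name used only for this sketch.
    """
    try:
        # Mode 'x' creates the file and fails if it already exists
        with open(path, 'x') as f_out:
            f_out.writelines(lines)
        return True
    except FileExistsError:
        print('File', path, 'already exists')
        return False

with tempfile.TemporaryDirectory() as base:
    demo_file = os.path.join(base, 'negative_numbers.txt')
    first = write_if_new(demo_file, ['-4.1\n'])    # file is created
    second = write_if_new(demo_file, ['-99.0\n'])  # file is left alone
    with open(demo_file) as f:
        contents = f.read()

print(first, second, repr(contents))
```

The second call leaves the existing contents intact, which is the same protection the `os.path.isfile()` check provides.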


Again, you can examine the contents of the new file before "cleaning up" (deleting both the file and the directory):

In [27]:
import os

with open(new_file2, 'r') as f:
    for n in f:
        print(n, end='')
        
os.remove(new_file2)
os.rmdir(new_dir2)

-32.78
-4.1
-422.343
-187.0
-8205.9
-2749.655


## Walking through directories

The following code uses the `os.walk()` function to count the files of different types found within your `biocomp1` directory and its subdirectories. Note the use of `os.path.splitext()` to split each filename into its base name and its extension.
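
To see what `os.path.splitext()` returns, here is a short sketch. The first two filenames appear in the listing produced by this notebook; the last three are made-up names chosen to show the edge cases.

```python
import os

# The last three names are hypothetical, chosen to show edge cases
examples = ['py8_practical.ipynb', 'sample1.txt',
            'archive.tar.gz', 'README', '.bashrc']

results = {}
for filename in examples:
    # splitext() returns a (name, extension) tuple
    results[filename] = os.path.splitext(filename)
    print(filename, '->', results[filename])
```

Only the last extension is split off, the extension keeps its leading dot, and a name with no extension (or with only a leading dot) gives an empty string.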


Feel free to modify it to explore other directories and file types! 
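
As one possible modification, the sketch below tallies every extension it meets with `collections.Counter` instead of keeping a separate counter per file type. The function name `count_extensions` is made up, lower-casing the extension is a design choice of this sketch, and the demonstration builds a small temporary tree with invented filenames so that the result is predictable.

```python
import os
import tempfile
from collections import Counter

def count_extensions(top):
    """Count files under top, grouped by lower-cased extension.

    count_extensions is a hypothetical helper name for this sketch.
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            _, extension = os.path.splitext(filename)
            counts[extension.lower()] += 1
    return counts

# Build a small throwaway tree with made-up files
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'data'))
    for relative in ['a.ipynb', 'b.txt',
                     os.path.join('data', 'c.TXT'),
                     os.path.join('data', 'README')]:
        open(os.path.join(base, relative), 'w').close()
    demo_counts = count_extensions(base)

for extension, n in demo_counts.most_common():
    print(extension or '(no extension)', n)
```

Calling `count_extensions('..')` would apply the same tally to the directory tree explored in this notebook.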

In [45]:
import os

# os.walk() yields (directory path, subdirectory names, file names)
file_list = []
for dirpath, dirnames, filenames in os.walk('..'):
    file_list.extend(filenames)
print(file_list)

notebook_counter = 0
notebook_extension = '.ipynb'
textfile_counter = 0
textfile_extension = '.txt'
for file in file_list:
    name, extension = os.path.splitext(file)
    if extension == notebook_extension:
        notebook_counter += 1
    elif extension == textfile_extension:
        textfile_counter += 1

print('Notebooks:', notebook_counter) 
print('Text files:', textfile_counter) 

['py8_2_modules.ipynb', 'bin_class.py', 'py8_practical.ipynb', 'py8_1_functions.ipynb', 'sequtils-checkpoint.py', 'py8_2_modules-checkpoint.ipynb', 'py8_1_functions-checkpoint.ipynb', 'untitled-checkpoint.txt', 'bin_class-checkpoint.py', 'py8_practical-checkpoint.ipynb', 'py_exam1.py', 'py_exam1.ipynb', 'py_exam1_ori-checkpoint.ipynb', 'py_exam1-checkpoint.ipynb', 'py_exam1-checkpoint.py', 'P42858.fasta', 'sample1.txt', 'A0A0G2RZ64.fasta', 's1.txt', 'plot_data.txt', 'species2.txt', 'words1.txt', 'garden_birds.txt', 'taxonomy.txt', 'PDB_growth.csv', 'sub.txt', 'pdb_chains2.txt', 'names2.txt', 'common.txt', 'names4.txt', 'clever_birds.txt', 'atoms.txt', 'seq_ss_n2.txt', 'P03437.fasta', '12e8.h', 'chain_ids.txt', 'WormNet.v3.benchmark.txt', 'seq_ss_n.txt', 'integers.txt', 'P00451_1.gb', 'codons.txt', 'common_scientific.txt', 'HLA-B1550.txt', 'PDB_data.csv', 'bacteria.txt', 'seq1.txt', 'words3.txt', 'hAPP.clustal', 'add.txt', 'dna.txt', 'HLA-B1542.txt', 'seq_n.txt', 'names1.txt', 'emdb.db'