### Functions and Modules

Your tool will, in most cases, read, transform and write data. We have already learned how to transform data (e.g. loops with `if...else`), but not how to read and write it. Fortunately, this is fairly simple in Python.


#### Read File

In [1]:
# The working folder contains an alignment file named 14_0903_05_40cm.10.pe.fasta.m8

lines = []
with open('14_0903_05_40cm.10.pe.fasta.m8') as handle:  # Open the file for reading; the with block closes it automatically
    for line in handle:                                 # Iterate over the file line by line
        lines.append(line.strip())                      # strip() removes whitespace, including the trailing newline, from both ends
print(len(lines))  # number of alignments (~50k)
print(lines[0])
print('READ\tREFERENCE\tALIGNMENT_COVERAGE\tPERCENT_IDENTITY\t....')  # names of the first columns

53359
xleqcuysdv.28736646.1	silva_138_complink_cons_otu_38034	100.0	100.0	1	200	760	959	200	200	0	0	-
READ	REFERENCE	ALIGNMENT_COVERAGE	PERCENT_IDENTITY	....
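The cell above needs the alignment file in the working folder. The sketch below is self-contained instead. It writes two made-up lines in the same tab-separated layout to a temporary file and reads them back with the same `with open(...)` pattern. The read names, reference names and values are hypothetical. It uses a list comprehension in place of the explicit loop.

```python
import os
import tempfile

# Two made-up alignment lines in the same tab-separated layout as the .m8 file.
sample = (
    "read_a\tref_1\t100.0\t100.0\n"
    "read_b\tref_2\t100.0\t97.5\n"
)

# Write the sample to a temporary file so the example does not depend on real data.
with tempfile.NamedTemporaryFile('w', suffix='.m8', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Same reading pattern as above, condensed into a list comprehension.
with open(path) as handle:
    lines = [line.strip() for line in handle]

os.remove(path)  # clean up the temporary file

print(len(lines))             # 2
print(lines[0].split('\t'))   # ['read_a', 'ref_1', '100.0', '100.0']
```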


#### Transform

In [3]:
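# Let's remove all alignments with a percent identity below 99.0

# str.split() splits a string at a given separator and returns a list.
# Here we split at the tab character. First, inspect only the first line:
for line in lines:
    alignment = line.split('\t')
    print(alignment)
    percent_identity = float(alignment[3])  # column 3 (counting from 0) holds the percent identity; cast the string to a float
    print(percent_identity)
    break  # stop after the first line; this loop is only for inspection

# Now do the same for all lines, keeping only alignments with >= 99.0% identity
filtered_lines = []
for line in lines:
    alignment = line.split('\t')
    percent_identity = float(alignment[3])
    if percent_identity < 99.0:
        continue  # skip to the next line without keeping this one
    else:
        filtered_lines.append(line)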

print(f'{len(lines)} alignments before filtering\n{len(filtered_lines)} alignments after filtering')


['xleqcuysdv.28736646.1', 'silva_138_complink_cons_otu_38034', '100.0', '100.0', '1', '200', '760', '959', '200', '200', '0', '0', '-']
100.0
53359 alignments before filtering
13308 alignments after filtering
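Since this section is about functions, the filtering loop can be wrapped in a reusable function. This is a minimal sketch. The name `filter_by_identity`, its `min_identity` and `column` parameters, and the example lines are hypothetical and not part of the original notebook. The default column index 3 follows the column layout printed above.

```python
def filter_by_identity(lines, min_identity=99.0, column=3):
    """Return the tab-separated lines whose value in `column` is >= min_identity.

    Hypothetical helper: column 3 holds the percent identity in the layout
    used in this notebook.
    """
    kept = []
    for line in lines:
        fields = line.split('\t')
        if float(fields[column]) >= min_identity:
            kept.append(line)
    return kept


# Made-up example lines (read, reference, coverage, percent identity).
example = [
    'read_a\tref_1\t100.0\t100.0',
    'read_b\tref_2\t100.0\t97.5',
    'read_c\tref_3\t95.0\t99.0',
]

print(len(filter_by_identity(example)))        # 2
print(len(filter_by_identity(example, 97.0)))  # 3
```

Calling it with a different threshold, e.g. `filter_by_identity(lines, 97.0)`, reuses the same logic without copying the loop.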


#### Write File

In [4]:
with open('14_0903_05_40cm.10.pe.fasta.filtered.m8', 'w') as handle: # Mode 'w' opens the file for writing. Careful: an existing file with this name is overwritten
    for line in filtered_lines:
        handle.write(f'{line}\n')
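A quick way to check the write step is a round trip: write the lines, read them back, and compare. This sketch writes into a temporary directory so nothing in the working folder is overwritten. The file name `filtered.m8` and the sample lines are hypothetical. It uses `handle.writelines()`, which writes each string as-is, so the newline still has to be added explicitly.

```python
import os
import tempfile

filtered_lines = ['read_a\tref_1\t100.0\t100.0', 'read_c\tref_3\t95.0\t99.0']

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'filtered.m8')

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, 'w') as handle:
        # writelines() does not add newlines itself, so append '\n' to each line.
        handle.writelines(f'{line}\n' for line in filtered_lines)

    # Read the file back and remove only the trailing newline of each line.
    with open(path) as handle:
        read_back = [line.rstrip('\n') for line in handle]

print(read_back == filtered_lines)  # True
```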
    

#### Advanced

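There is no need to parse a tab- or comma-separated file by hand. Python's built-in `csv` module splits the fields for you (pass `delimiter='\t'` for tab-separated files) and also handles quoted fields that contain the separator. Note that `csv.reader` returns every field as a string, so numeric columns still need to be cast, e.g. with `float()`. If you want to declare column types in advance, pandas' `read_csv` accepts a `dtype` argument.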
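Here is a minimal sketch of the same read, filter and write steps using the `csv` module. It assumes the tab-separated layout from above, with percent identity at column index 3. The in-memory `io.StringIO` objects stand in for real files so the example runs on its own, and their contents are made up. With real files you would pass a handle from `open(path, newline='')` instead, as the `csv` documentation recommends.

```python
import csv
import io

# In-memory stand-ins for the input .m8 file and the filtered output file.
source = io.StringIO(
    'read_a\tref_1\t100.0\t100.0\n'
    'read_b\tref_2\t100.0\t97.5\n'
)
target = io.StringIO()

reader = csv.reader(source, delimiter='\t')
# The default line terminator is '\r\n'; use '\n' to match the original file.
writer = csv.writer(target, delimiter='\t', lineterminator='\n')

kept = 0
for row in reader:                 # each row is a list of strings
    if float(row[3]) >= 99.0:      # cast the percent identity before comparing
        writer.writerow(row)
        kept += 1

print(kept)                        # 1
print(repr(target.getvalue()))     # 'read_a\tref_1\t100.0\t100.0\n'
```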