From *Python for Biologists*, p. 65
#### Writing a FASTA file
Write the sequence data in the table below to a file named `out.fasta` in FASTA format. The finished file should look like this:

```
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
>DEF456
ACTGATCGACGATCGATCGATCACGACT
>HIJ789
ACTGACACTGTACTGTACATGTG
```

| header | sequence |
| ------ | -------- |
| ABC123 | ATCGTACGATCGATCGATCGCTAGACGTATCG |
| DEF456 | actgatcgacgatcgatcgatcacgact |
| HIJ789 | ACTGAC-ACTGT--ACTGTA----CATGTG |

*NOTE*: all sequences in the output must be uppercase and contain only 'A', 'C', 'T' and 'G', so lowercase letters have to be converted and gap characters (`-`) removed.
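
The NOTE amounts to two transformations (uppercase, drop gaps) plus a check that nothing else is left. A minimal sketch bundles them into one helper; the name `clean_sequence` and the choice to raise `ValueError` for any other character (for example `N`) are assumptions, not part of the exercise.

```python
# Allowed characters after cleaning (assumption: no ambiguity codes such as N)
VALID_BASES = set('ACGT')

def clean_sequence(raw_sequence):
    """Return raw_sequence uppercased with '-' gap characters removed.

    Raises ValueError if anything other than A, C, G or T remains.
    """
    cleaned = raw_sequence.upper().replace('-', '')
    invalid = set(cleaned) - VALID_BASES
    if invalid:
        raise ValueError('unexpected characters: ' + ''.join(sorted(invalid)))
    return cleaned

print(clean_sequence('actgatcgacgatcgatcgatcacgact'))
print(clean_sequence('ACTGAC-ACTGT--ACTGTA----CATGTG'))
```

Raising an error, rather than silently dropping unexpected characters, keeps bad input from ending up in the output file unnoticed.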

#### Writing multiple FASTA files
Write the same data, but this time each sequence must be in its own file.

The name of the file should be the header name followed by `.fasta`, for example `ABC123.fasta`.


```python
# 1.
data = [['ABC123', 'ATCGTACGATCGATCGATCGCTAGACGTATCG'],
        ['DEF456', 'actgatcgacgatcgatcgatcacgact'],
        ['HIJ789', 'ACTGAC-ACTGT--ACTGTA----CATGTG']]

output_filename = 'out.fasta'
# 'with' closes the file automatically when the block ends
with open(output_filename, 'w') as output_file:
    for sequence_info in data:
        header = '>' + sequence_info[0] + '\n'
        output_file.write(header)
        # uppercase the sequence and remove gap characters
        sequence = sequence_info[1].upper().replace('-', '') + '\n'
        output_file.write(sequence)

# read the file back to check what was written
with open(output_filename) as input_file:
    result = input_file.read()
print(result)
```

```
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
>DEF456
ACTGATCGACGATCGATCGATCACGACT
>HIJ789
ACTGACACTGTACTGTACATGTG
```


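The loop above indexes each inner list with `sequence_info[0]` and `sequence_info[1]`. A variant sketch unpacks each `[header, sequence]` pair directly in the `for` statement and assembles the whole file as one string with `str.join` before a single write. For this data it should produce the same `out.fasta`; it is an alternative style, not the book's solution.

```python
data = [['ABC123', 'ATCGTACGATCGATCGATCGCTAGACGTATCG'],
        ['DEF456', 'actgatcgacgatcgatcgatcacgact'],
        ['HIJ789', 'ACTGAC-ACTGT--ACTGTA----CATGTG']]

# build every record as a string, unpacking each pair in the loop header
records = []
for header, sequence in data:
    clean = sequence.upper().replace('-', '')
    records.append('>' + header + '\n' + clean + '\n')

# write the whole file with a single call
with open('out.fasta', 'w') as output_file:
    output_file.write(''.join(records))
```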

```python
# 1. (alternative: one variable per record, each fixed by hand)
header1 = 'ABC123'
seq1 = 'ATCGTACGATCGATCGATCGCTAGACGTATCG'
header2 = 'DEF456'
seq2 = 'actgatcgacgatcgatcgatcacgact'
header3 = 'HIJ789'
seq3 = 'ACTGAC-ACTGT--ACTGTA----CATGTG'

output_filename = 'out.fasta'
with open(output_filename, 'w') as output_file:
    output_file.write('>' + header1 + '\n')
    output_file.write(seq1 + '\n')                   # already clean
    output_file.write('>' + header2 + '\n')
    output_file.write(seq2.upper() + '\n')           # only needs uppercasing
    output_file.write('>' + header3 + '\n')
    output_file.write(seq3.replace('-', '') + '\n')  # only needs gaps removed

with open(output_filename) as input_file:
    result = input_file.read()
print(result)
```

```
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
>DEF456
ACTGATCGACGATCGATCGATCACGACT
>HIJ789
ACTGACACTGTACTGTACATGTG
```

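This version applies a different fix to each sequence by hand (`seq1` unchanged, `seq2` uppercased, `seq3` with gaps removed), so it only works because the data was inspected first. Whichever version is used, the result can be checked against the NOTE by reading the file back. A minimal sketch, assuming every record occupies exactly two lines (header, then sequence) as in the files written here; the function name `check_fasta` is hypothetical.

```python
VALID_BASES = set('ACGT')

def check_fasta(filename):
    """Return a list of problems in a FASTA file with two lines per record.

    Assumes each record is one header line followed by one sequence line.
    """
    with open(filename) as fasta_file:
        lines = fasta_file.read().splitlines()
    problems = []
    # headers sit at indexes 0, 2, 4, ... with their sequence right after
    for i in range(0, len(lines), 2):
        header = lines[i]
        if not header.startswith('>'):
            problems.append('line ' + str(i + 1) + ': header does not start with ">"')
        if i + 1 >= len(lines):
            problems.append(header + ': missing sequence line')
            continue
        sequence = lines[i + 1]
        # an empty sequence, or any character other than A, C, G, T, is a problem
        if not sequence or set(sequence) - VALID_BASES:
            problems.append(header + ': sequence is not uppercase A/C/G/T only')
    return problems
```

For the `out.fasta` written by either cell above, `check_fasta('out.fasta')` should return an empty list.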


```python
# 2.
data = [['ABC123', 'ATCGTACGATCGATCGATCGCTAGACGTATCG'],
        ['DEF456', 'actgatcgacgatcgatcgatcacgact'],
        ['HIJ789', 'ACTGAC-ACTGT--ACTGTA----CATGTG']]

for seq_info in data:
    # the file name is the header followed by '.fasta', e.g. ABC123.fasta
    output_filename = seq_info[0] + '.fasta'
    with open(output_filename, 'w') as output_file:
        header = '>' + seq_info[0] + '\n'
        output_file.write(header)
        sequence = seq_info[1].replace('-', '').upper() + '\n'
        output_file.write(sequence)
```

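The loop above repeats the same open/write/close steps for every record. A sketch that wraps them in a function and lets the caller choose an output directory could use `pathlib`; the name `write_record_files` and its `directory` parameter are hypothetical additions, not part of the exercise.

```python
from pathlib import Path

def write_record_files(records, directory='.'):
    """Write each [header, sequence] pair to <directory>/<header>.fasta.

    Returns the list of Path objects written, in input order.
    """
    out_dir = Path(directory)
    # create the directory if needed; do nothing if it already exists
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, sequence in records:
        path = out_dir / (header + '.fasta')
        clean = sequence.upper().replace('-', '')
        # write_text opens, writes and closes the file in one call
        path.write_text('>' + header + '\n' + clean + '\n')
        paths.append(path)
    return paths
```

Returning the paths makes it easy to open or check each file afterwards.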

### Discussion

```python
# >ABC123
# ATCGTACGATCGATCGATCGCTAGACGTATCG
# >DEF456
# ACTGATCGACGATCGATCGATCACGACT
# >HIJ789
# ACTGACACTGTACTGTACATGTG
header1 = 'ABC123'
sequence1 = 'ATCGTACGATCGATCGATCGCTAGACGTATCG'
fasta1 = '>' + header1 + '\n' + sequence1 + '\n'
print(fasta1)
print('>', header1, '\n', sequence1, sep='')
```

```
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG

>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
```

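In the output above, the first `print` leaves a blank line because `fasta1` already ends with `'\n'` and `print` then appends its own default `end='\n'`. The second call has no trailing newline among its arguments, so only one is printed. The sketch below shows an f-string (Python 3.6+) building the same text as the concatenation, and `end=''` suppressing the extra newline.

```python
header1 = 'ABC123'
sequence1 = 'ATCGTACGATCGATCGATCGCTAGACGTATCG'

# concatenation, with the trailing newline built into the string
fasta1 = '>' + header1 + '\n' + sequence1 + '\n'
# an f-string produces exactly the same text
fasta2 = f'>{header1}\n{sequence1}\n'
print(fasta1 == fasta2)  # prints True

# fasta1 already ends in '\n', so turn off print's own newline
print(fasta1, end='')
```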

```python
help(print)
```

```
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
```

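The `file` argument listed in the help text means `print` can write straight into an open file, adding the newline itself. A sketch of exercise 1 written that way, offered as an alternative to calling `write` with explicit `'\n'` characters:

```python
data = [['ABC123', 'ATCGTACGATCGATCGATCGCTAGACGTATCG'],
        ['DEF456', 'actgatcgacgatcgatcgatcacgact'],
        ['HIJ789', 'ACTGAC-ACTGT--ACTGTA----CATGTG']]

with open('out.fasta', 'w') as output_file:
    for header, sequence in data:
        # print appends end='\n' by default, so no explicit newline is needed
        print('>' + header, file=output_file)
        print(sequence.upper().replace('-', ''), file=output_file)
```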
