# This notebook shows the kinds of operations we can perform on FASTA files

In [1]:
import fasta
import utilities

In [2]:
seqs = utilities.load_sequences("files/example.fasta")

### The function `load_sequences` reads a FASTA file into a dictionary with sequence IDs as the keys and Biopython `SeqRecord` objects as the values
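For reference, here is a minimal standard-library sketch of the kind of mapping `load_sequences` appears to build. We have not seen the source of `utilities`, so this is an assumption about its behaviour, not its implementation. The helper name `parse_fasta` and the `(description, sequence)` tuple values are hypothetical stand-ins for full `SeqRecord` objects. With Biopython, `SeqIO.to_dict(SeqIO.parse(path, "fasta"))` produces a dictionary of the same shape.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Dict[str, Tuple[str, str]]:
    """Map each sequence ID to a (description, sequence) tuple.

    The ID is the first whitespace-separated token of the header line,
    which is how Biopython's FASTA parser derives SeqRecord.id.
    """
    records: Dict[str, Tuple[str, str]] = {}
    seq_id: Optional[str] = None
    description = ""
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records[seq_id] = (description, "".join(chunks))
            description = line[1:]
            seq_id = description.split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if seq_id is not None:
        records[seq_id] = (description, "".join(chunks))
    return records


# Toy input: the first header mimics a record above, the residues are a short fragment.
example = io.StringIO(
    ">ARO89866.1 cytochrome P450 Cyp2u1 [Andrias davidianus]\n"
    "MEAARLDAGL\n"
    "LLAMLP\n"
    ">TOY_2 hypothetical second record\n"
    "MSGT\n"
)
toy_seqs = parse_fasta(example)
for seq_id, (description, sequence) in toy_seqs.items():
    print(seq_id, len(sequence), description)
```

Unlike `SeqIO.to_dict`, which raises a `ValueError` on duplicate IDs, this sketch silently keeps the last record seen for a given ID.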

In [3]:
for seq_id, seq_record in seqs.items():
    print ("The sequence ID is ", seq_id)
    print ("And the full record is %s \n" % (seq_record))

The sequence ID is  ARO89866.1
And the full record is ID: ARO89866.1
Name: ARO89866.1
Description: ARO89866.1 cytochrome P450 Cyp2u1 [Andrias davidianus]
Number of features: 0
/Database=Unknown
Seq('MEAARLDAGLLLAMLPSPGAALLLGTLLLLGALLLQRRFGRVPAGCFPPGPRPW...TTR', SingleLetterAlphabet()) 

The sequence ID is  NP_001106471.1
And the full record is ID: NP_001106471.1
Name: NP_001106471.1
Description: NP_001106471.1 cytochrome P450 family 2 subfamily U member 1 [Xenopus tropicalis]
Number of features: 0
/Database=NCBI
Seq('MSGTLDWKQMGYASWSLLGDCASVSALLLYIALFLGLYLLMGSLWRYYQIIHSN...TKR', SingleLetterAlphabet()) 

The sequence ID is  XP_018106696.1
And the full record is ID: XP_018106696.1
Name: XP_018106696.1
Description: XP_018106696.1 PREDICTED: cytochrome P450 2U1-like [Xenopus laevis]
Number of features: 0
/Database=NCBI
Seq('MSGPGEDSMSGTLDWKQMYYASWSQMSNSASLSTMLLYIVLFLGLYLLMGCLWR...KER', SingleLetterAlphabet()) 

The sequence ID is  XP_018409984.1
And the full record is ID: XP_018409984.1
...

## Some basic functions give a quick overview of the FASTA file

In [4]:
fasta.print_record_overview(seqs)

There are 5 sequences and the average length of sequence is 532
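The reported average is a whole number, even though the five lengths printed in the next cell (531, 547, 555, 549 and 481) sum to 2663, an exact mean of 532.6. That is consistent with `print_record_overview` truncating rather than rounding, although we have not checked its source. The hypothetical `record_overview` below reproduces the message from a dictionary of lengths under that assumption.

```python
from typing import Dict


def record_overview(lengths: Dict[str, int]) -> str:
    """Summarise a mapping of sequence ID -> sequence length."""
    count = len(lengths)
    if count == 0:
        return "There are 0 sequences"
    mean = sum(lengths.values()) / count
    # int() truncates toward zero, matching the 532 reported above;
    # round() would give 533 for a mean of 532.6.
    return (f"There are {count} sequences and the average length of "
            f"sequence is {int(mean)} (exact mean {mean:.1f})")


# Lengths as printed by the per-sequence loop in this notebook.
lengths = {
    "ARO89866.1": 531,
    "NP_001106471.1": 547,
    "XP_018106696.1": 555,
    "XP_018409984.1": 549,
    "XP_006787161.1": 481,
}
print(record_overview(lengths))
```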


In [6]:
for seq_id, seq_record in seqs.items():
    print ("The sequence ID is ", seq_id)
    print ("And the sequence is %d amino acids long\n" % (len(seq_record.seq)))

The sequence ID is  ARO89866.1
And the sequence is 531 amino acids long

The sequence ID is  NP_001106471.1
And the sequence is 547 amino acids long

The sequence ID is  XP_018106696.1
And the sequence is 555 amino acids long

The sequence ID is  XP_018409984.1
And the sequence is 549 amino acids long

The sequence ID is  XP_006787161.1
And the sequence is 481 amino acids long
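The per-sequence loop can also be condensed into a dictionary comprehension, which makes it easy to sort or filter by length. This sketch uses hypothetical toy sequences stored as plain strings. With `SeqRecord` values you would use `len(seq_record.seq)` (or `len(seq_record)`, which gives the same number) inside the comprehension.

```python
from typing import Dict, List, Tuple


def lengths_by_id(sequences: Dict[str, str]) -> Dict[str, int]:
    """Map each sequence ID to its residue count."""
    return {seq_id: len(sequence) for seq_id, sequence in sequences.items()}


def rank_by_length(lengths: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (ID, length) pairs, longest first; ties keep insertion order."""
    return sorted(lengths.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy sequences; the real records are 481-555 residues long.
toy = {"seq_a": "MSGT", "seq_b": "MEAARLDAGL", "seq_c": "MKR"}
toy_lengths = lengths_by_id(toy)
for seq_id, length in rank_by_length(toy_lengths):
    print(f"{seq_id}: {length} amino acids")
```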



## We can also write our records out to a new FASTA file at any time

In [8]:
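# write_fasta appears to loop over `records` and read each item's `.id`.
# Iterating the dict itself yields its string keys (which have no `.id`),
# so pass the SeqRecord values instead.
fasta.write_fasta(records=list(seqs.values()), filename="files/new_file.fasta")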
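Iterating over a dictionary yields its keys, which here are plain strings, so any writer that loops over `records` and reads `.id` must be given the record objects (`seqs.values()`) rather than the dictionary itself. The standard-library sketch below illustrates this with a hypothetical `write_fasta` that takes `(header, sequence)` pairs and wraps sequence lines at 60 residues. It is not the notebook's `fasta.write_fasta`. With Biopython, `SeqIO.write(seqs.values(), "files/new_file.fasta", "fasta")` accepts the same kind of iterable of `SeqRecord` objects.

```python
import io
from typing import Dict, Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: int = 60) -> int:
    """Write (header, sequence) pairs as FASTA, wrapping at `width` residues.

    Returns the number of records written.
    """
    count = 0
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


# Hypothetical toy data: ID -> (header, sequence).
toy_seqs: Dict[str, Tuple[str, str]] = {
    "TOY_1": ("TOY_1 hypothetical record", "M" * 70),
}

# Iterating the dict itself yields the string keys, not the records:
print(list(toy_seqs))  # ['TOY_1']

out = io.StringIO()
n_written = write_fasta(toy_seqs.values(), out)  # pass the values, not the dict
print(out.getvalue())
```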