# Lab 09
# Dictionaries & File Input/Output

Today we will learn about a new object in Python called a dictionary. A dictionary is a data object much like a list or a string. However, whereas a list stores ordered information, a dictionary stores information as key‐value pairs, much like the words and definitions in a real dictionary. You will learn how to initialize, add to, and look through dictionaries. You will also learn how to open, read, and write to files within a Python script.

You will write the script yourself by following the code written in this document.

REMEMBER, THE INTERNET IS YOUR FRIEND: https://docs.python.org/2.7/tutorial/

Part A: Dictionaries

A dictionary is a mapping of key‐value pairs. Dictionaries are indexed by keys, which can be any immutable (i.e. unchangeable) type of data, e.g. strings and numbers. A list cannot be a key for a dictionary, because lists can be modified. In this way, dictionaries in Python are much like dictionaries in real life: there is one specific key (e.g. a word) which is associated with a value (e.g. a definition). Unlike a list, a dictionary is not accessed by position: you look up a value by its key. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, as the output below shows.) Additionally, the keys must be unique: if you assign a value to a key that already exists, the old value will be overwritten. The values of a dictionary, however, can be anything: a string, a number, a list, or even another dictionary! Let’s create and initialize a dictionary.

In [1]:
#!/usr/bin/python

#whereas lists are declared with square brackets: []
#dictionaries are declared with curly braces: {}
empty = {}        #empty dictionary (avoid the name dict: it would hide Python's built-in dict type)

#use (:) to denote key:value pairs, and (,) to separate pairs
#this dictionary lists the ages of characters by their names
ages = {'Frodo':50, 'Gandalf':2000, 'Aragorn':87}
print (ages)  #prints dictionary
print (len(ages)) #prints number of keys
print (ages.keys())  #prints keys, i.e. names
print (ages['Gandalf'])  #prints Gandalf's age

#adding and updating the dictionary
ages['Legolas'] = 800  #add a new item to the dictionary
ages['Frodo'] += 3  #change a value; Frodo ages 3 years
del ages['Gandalf']  #deletes a key-value pair
print (ages)

#NOTE: operators like += can only update the value of an existing key
#i.e. the following will not work: ages['Vader'] += 1
#because 'Vader' does not exist in the dictionary yet (this raises a KeyError)
#keys can only be immutable (i.e. a string, number, etc.)
#values can be anything, even other dictionaries!

#fancy dictionary with strings and lists as values
fancy = {'Joshua':'Loving', 'Sara':[1,2,3,4,5]}
fancy['Colors'] = {'Red':10, 'Blue':20, 'Orange':'Black'}
print (fancy)

#looping through dictionaries
#use brackets [] to access the value of a key
print ('Name ( Age )')
for name in ages:
    print (name, '(', ages[name], ')')

{'Frodo': 50, 'Gandalf': 2000, 'Aragorn': 87}
3
dict_keys(['Frodo', 'Gandalf', 'Aragorn'])
2000
{'Frodo': 53, 'Aragorn': 87, 'Legolas': 800}
{'Joshua': 'Loving', 'Sara': [1, 2, 3, 4, 5], 'Colors': {'Red': 10, 'Blue': 20, 'Orange': 'Black'}}
Name ( Age )
Frodo ( 53 )
Aragorn ( 87 )
Legolas ( 800 )
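
The NOTE in the code above says that updating a missing key fails. The short sketch below shows a few common ways of dealing with that: testing for a key with `in`, looking up a key with `get()`, and looping over keys and values together with `items()`. The dictionary contents repeat the example above; the starting age for 'Vader' and the name 'Sauron' are made up purely for illustration.

```python
# Illustrative data mirroring the ages dictionary above
ages = {'Frodo': 53, 'Aragorn': 87, 'Legolas': 800}

# 'in' tests whether a key exists before we try to update it
if 'Vader' not in ages:
    ages['Vader'] = 45   # hypothetical starting age; creates the key
ages['Vader'] += 1       # += works now because the key exists

# get() returns a default value instead of raising KeyError
unknown_age = ages.get('Sauron', 'unknown')
print('Sauron:', unknown_age)

# items() yields (key, value) pairs, so no second lookup is needed
lines = []
for name, age in ages.items():
    lines.append('{} ( {} )'.format(name, age))
print('\n'.join(lines))
```

Whether to check with `in` first or to use `get()` is mostly a matter of taste; `get()` is handy when a sensible default exists.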


Part B: opening and writing to files

We can use the function open to open existing files and to create new files. open returns a new type of object, a file object, which is a reference to physical information on the hard disk; this is also known as a “file handle”. The function open takes one required argument (the file name) and an optional second argument (the mode) dictating what will be done with the file. The mode ‘r’ is for reading information from a file, ‘w’ is for writing to a file (this will overwrite an existing file), and ‘a’ is for appending to a file (adding more lines to the end of an existing file).

In [None]:
#create a new file. A file handle is used to access the file object.
handle = open('new_file.txt', 'w')

#handle represents the file object, and has file methods
#the write() method will write to a file
#note: write() is a lot like print, but it will NOT add a newline
#use \n to add a newline (enter) after each line
handle.write('My FIRST line of text! \n')
handle.write('My SECOND line of text! \n')
handle.write('My THIRD line of text! \n')
handle.close()    #close the file so everything written is saved to disk

#If you look at the file now, you should have 3 lines
#trying to write again (handle.write('FOURTH LINE!')) will not work
#because the file has been closed
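
The cell above only uses the ‘w’ mode. As a minimal sketch of how ‘a’ (append) differs, the example below writes one line, reopens the same file in append mode, and adds a second line. It also uses the `with` statement, which closes the file automatically at the end of the block. The file name `demo_file.txt` and the temporary directory are assumptions made for this example so that none of your own files are overwritten.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is overwritten
# (the file name 'demo_file.txt' is made up for this example)
path = os.path.join(tempfile.mkdtemp(), 'demo_file.txt')

# 'w' creates the file (or empties it if it already exists)
with open(path, 'w') as handle:
    handle.write('My FIRST line of text!\n')

# 'a' keeps the existing contents and adds to the end
with open(path, 'a') as handle:
    handle.write('My SECOND line of text!\n')

# the with-block closed the file for us
print(handle.closed)

# read the file back to check that both lines are there
with open(path) as handle:   # default mode is 'r'
    contents = handle.read()
print(contents)
```

Had the second `open` used ‘w’ instead of ‘a’, the first line would have been lost.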

Part C: reading files

Now that we’ve looked at opening and writing files, let’s work on reading the information from existing files. In order to read information from a file, that file has to exist (i.e. we cannot read information from a file that has not been created). We also need to open the file in order to access the information.

In [11]:
#open the file we just created, to read
#we can also just use open('new_file.txt'); the default is to read
readIt = open('new_file.txt', 'r')

#we can use readline() to read each line individually
#when reading a file, we cannot see what is being read
#unless we print the line or save it to a variable
readIt.readline()  # first line, can't see it
print (readIt.readline())  # second line, can see it
saveIt = readIt.readline()  # third line, saved
readIt.close()

print (saveIt + 'is my favorite')    # note the newline
#we can use strip() to remove the terminal newline
print (saveIt.strip() + ' is my favorite again')

#we can also use a for loop to read in the contents of a file
#here we read in the file using the default read ('r')
readItAll = open('new_file.txt')
counter = 1
for line in readItAll:
    print ('Line #', counter, ': ', line.strip())
    counter += 1
readItAll.close()
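
Opening a file for reading fails if the file does not exist in the location where Python is looking. The sketch below shows one way to handle that case with `try`/`except` rather than letting the script stop. The file name `no_such_file.txt` is made up for this example and is placed in a fresh temporary directory so that it is certain to be missing.

```python
import os
import tempfile

# A path that is certain not to exist (the name is made up for this example)
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

try:
    handle = open(missing, 'r')
except FileNotFoundError as err:
    # open() raises FileNotFoundError in read mode when the file is absent
    message = 'Could not open file: ' + str(err.filename)
    found = False
    print(message)
else:
    # only runs if open() succeeded
    found = True
    handle.close()

# os.getcwd() shows the folder Python searches for relative file names
print('Current working directory:', os.getcwd())
```

If you see this error for a file you believe exists, comparing the printed working directory with the file's actual location is usually a good first check.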


BACKGROUND: The FASTA file format (pronounced “fast‐A”; stands for “fast all”) is a common format for storing DNA and protein sequence information. A FASTA file containing three sequences looks like this:

Each sequence is preceded by a line starting with “>” that contains header information: e.g., identification numbers for this sequence, the name of the sequence, and the organism from which it is derived, etc. All remaining lines store continuous sequence information. When it comes time to manipulate the sequence within a program, it is usually convenient to have it stored as a single string (rather than split across lines).
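
To make the structure described above concrete, here is a small sketch that walks through FASTA-style text line by line and tells headers from sequence lines. The text is made up for illustration only: the `db` tag, the IDs `ID0001` and `ID0002`, the descriptions, and the sequence letters are placeholders, not real records, and real headers may carry different fields.

```python
# Made-up FASTA text for illustration only: the tag, IDs, descriptions
# and sequence letters below are placeholders, not real records
fasta_text = (
    ">db|ID0001|first example protein\n"
    "ACDEFGHI\n"
    "KLMNPQRS\n"
    ">db|ID0002|second example protein\n"
    "TVWYACDE\n"
)

first_fields = None
for line in fasta_text.splitlines():
    if line.startswith('>'):
        # drop the leading '>' and split the header on '|'
        fields = line[1:].split('|')
        if first_fields is None:
            first_fields = fields
        print('header   ->', fields, '| ID =', fields[1])
    else:
        print('sequence ->', line)
```

Note that the first record's sequence is split across two lines, which is why it is convenient to join the pieces into a single string before working with it.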

LAB TASK: Implement a program in Python that will open a FASTA file, concatenate its multiline sequences into single strings, store them in a dictionary using the sequence ID from the sequence header (value between the “|” symbols) as a key, and then print the IDs and sequences as two columns in a new file.

OBJECTIVE(S):

1. Write your code in the block below. Download the file called ‘myoglobin.fasta’ from Blackboard, and make sure to save it in the same location as your lab task script.
2. Create an empty dictionary to store sequence information.
3. Using the open function, open the FASTA file (‘myoglobin.fasta’).
4. When you find a line beginning with the “>” character (a header) extract the ID code between the “|” symbols and start a new dictionary entry using the ID as a key.
5. If a line isn’t a header (i.e. it is a sequence), strip off the newline character at the end and append the line to the growing sequence string stored as the value of the most recent dictionary key.
6. Close the original file.
7. Open a new file for writing, e.g. “myoglobin_processed.txt”.
8. Loop through the dictionary and write the ID keys and their corresponding sequences to the new file, separating them with a tab (‘\t’) to generate two columns.
9. Close the new file.
10. Run your script. Upload the script and output (‘myoglobin_processed.txt’) to Blackboard for lab credit. Don’t forget comments!
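
The steps above can be sketched roughly as follows. This is one possible outline under stated assumptions, not the required solution: the function names `fasta_to_dict` and `write_two_columns`, the file names `example.fasta` and `example_processed.txt`, and the two tiny records are all made up so that the example runs on its own without ‘myoglobin.fasta’. It also assumes that every header has the ID as the second “|”-separated field.

```python
import os
import tempfile

def fasta_to_dict(path):
    """Read a FASTA file into a dictionary of {ID: sequence}."""
    sequences = {}
    current_id = None
    with open(path) as fasta:
        for line in fasta:
            line = line.strip()               # remove the trailing newline
            if not line:
                continue                      # skip blank lines
            if line.startswith('>'):
                # header: the ID is the text between the '|' symbols
                current_id = line.split('|')[1]
                sequences[current_id] = ''
            elif current_id is not None:
                # sequence line: grow the string for the current ID
                sequences[current_id] += line
    return sequences

def write_two_columns(sequences, path):
    """Write one 'ID<TAB>sequence' line per dictionary entry."""
    with open(path, 'w') as out:
        for seq_id, seq in sequences.items():
            out.write(seq_id + '\t' + seq + '\n')

# --- small demonstration on made-up data in a temporary directory ---
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'example.fasta')
out_path = os.path.join(workdir, 'example_processed.txt')

with open(in_path, 'w') as demo:
    demo.write('>db|ID0001|first example\nACDE\nFGHI\n'
               '>db|ID0002|second example\nKLMN\n')

seqs = fasta_to_dict(in_path)
write_two_columns(seqs, out_path)

with open(out_path) as result:
    output_text = result.read()
print(output_text)
```

Using `with` here means steps 6 and 9 (closing the files) happen automatically; calling `close()` explicitly, as in Parts B and C, works just as well.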

Your output for two sequences should look like this (note how the sequence now is a single string):

P02189	MGLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGNTVLTALGGILKKKGHHEAELTPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDMAAKYKELGFQG

P04247	MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSEDLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRHSGDFGADAQGAMSKALELFRNDIAAKYKELGFQG

In [5]:
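import os

#show the working directory, i.e. where Python looks for the file
s = os.getcwd()
print(s)

ct = 0    #running count of subject IDs
#the with-block closes the file automatically when the loop is done
with open("blast_results(1).csv", 'r') as f:
    for line in f:
        word = line.strip().split(',')    #split the row into columns
        print(word[1])    #second column; entries are separated by ';'
        print(ct)
        ct = ct + word[1].count(";") + 1    #n semicolons means n+1 entries
        num = word[12]    #thirteenth column
        print(num)
print("total subject id:", ct)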
