# Overview of Python commands covered in weeks 1-2 of HUMBIO51, Fall 2019

 <ol>
 <li><a href=#0>Miscellaneous essential concepts </a></li>
 <li><a href=#1>Moving around the file system </a></li>
 <li><a href=#2>Reading and writing files</a></li>
 <li><a href=#3>Strings</a></li>
 <li><a href=#4>Lists</a></li>
 <li><a href=#5>Tuples</a></li>
 <li><a href=#6>Dictionaries</a></li>
 <li><a href=#7>For loops</a></li>
 <li><a href=#8>If statements</a></li>
 <li><a href=#9>Functions</a></li>
</ol>

## Miscellaneous essential concepts <a name='0' />

In [94]:
# This is a comment. Comments are lines that begin with a hash(#) symbol. They are not executed and are used as notes for the programmer. 

In [95]:
#The print function prints text to the screen. 
#Surround the text you wish to print in quotes. 
print('Hello World')


Hello World


In [96]:
#We can also print variables by "casting" (converting) them to strings with str(). For example: 
a=1 
print(str(a))

1


In [97]:
# Use the help function when confused about a variable / function 
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [98]:
#We can use the "+" sign to join strings for printing 
print("a is:"+str(a))

a is:1
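The help output above lists the `sep` and `end` keyword arguments of `print`. As a small sketch (the variable names `a` and `line` are only illustrative), `print` can also take several values separated by commas, which avoids the explicit `str()` call:

```python
a = 1

# print accepts several values and converts each one to a string for us
print("a is:", a)           # values are separated by sep (a space by default)
print("a is:", a, sep="")   # sep="" removes the space between the values

# str() together with "+" builds the same text as a single string
line = "a is:" + str(a)
print(line)
```

Joining with "+" requires every piece to already be a string, which is why `str(a)` is needed in the last form.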


## Moving around the file system <a name='1' />

In [99]:
#the import statement loads a python library (module) and makes its functions available in your code 
#calling os.getcwd() raises a NameError if the os library has not been imported: 
os.getcwd() #a path is printed below, which means os had already been imported earlier in this session. 

'/home/jovyan/humbio51_instructor'

In [100]:
import os  #to fix the error, first import the os library. 
os.getcwd() #We execute the getcwd function to print the current working directory 

'/home/jovyan/humbio51_instructor'

In [143]:
os.chdir("/home/jovyan/humbio51_instructor") #changes your working directory. There are shortcuts we can use: 

In [144]:
os.chdir("/home/jovyan/humbio51_instructor/helpers") #starting the directory name with "/" indicates an absolute path. 
os.chdir("..") #moves one directory up (to the parent directory)
os.chdir("helpers") #omitting the parent directories indicates a relative path. 

In [145]:
#lists files in the current directory, a single period (.)  stands for the current directory 
os.listdir('.')

['__init__.py',
 'sequence_alignment_helpers.py',
 'startup.sh',
 'plotly_helpers.py',
 'alignment.py',
 '.DS_Store',
 'RNAseq_helpers.py',
 'central_dogma_helpers.py',
 '__pycache__',
 'viz_sequence.py',
 'kmeans_helpers.py']

In [104]:
#create a new directory 
os.mkdir('mydir')
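Note that `os.mkdir` raises an error if the directory already exists. The sketch below shows one possible way to handle that, under the assumption that it is acceptable to work inside a temporary directory (the names `base` and `new_dir` are illustrative and not part of the course material):

```python
import os
import tempfile

# Work inside a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as base:
    new_dir = os.path.join(base, "mydir")   # build a path from its parts
    os.mkdir(new_dir)                        # raises an error if mydir already exists
    os.makedirs(new_dir, exist_ok=True)      # does nothing if mydir already exists
    created = os.path.isdir(new_dir)         # True if the directory is there
    listing = os.listdir(base)               # names of the entries inside base

print(created)
print(listing)
```

`os.path.join` inserts the correct path separator for the operating system, so the same code works whether paths use "/" or "\".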

## Reading and writing files <a name='2' />

In [105]:
#open a file for writing 
f=open("myfile.txt",'w')

In [106]:
#write text to a file 
f.write("hello world! \n goodbye world\n") #use \n to indicate newline characters. 

29

In [107]:
#close the file we wrote so that its contents are saved, then open it for reading 
f.close()
f=open('myfile.txt','r')


In [108]:
#read an open file 
contents=f.read()
print(contents)

hello world! 
 goodbye world



In [109]:
#readlines() returns the remaining lines of a file as a list, split at newlines 
contents=f.readlines()
print(contents) #note! f.read() above already reached the end of the file, so nothing is left and the list is empty. Reopen the file or call f.seek(0) to read it again. 

[]
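The empty list above comes from reading a file that has already been read to its end. The following is a minimal sketch of the same steps using a `with` block, which closes the file automatically. It writes to a temporary directory (an assumption made so the example cleans up after itself), and the name `path` is illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "myfile.txt")

    # "with" closes the file automatically when the indented block ends
    with open(path, "w") as f:
        f.write("hello world!\ngoodbye world\n")

    with open(path, "r") as f:
        contents = f.read()      # reads the whole file and stops at its end
        empty = f.readlines()    # nothing is left to read, so this is []
        f.seek(0)                # jump back to the start of the file
        lines = f.readlines()    # now every line is returned

print(empty)
print(lines)
```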


## Working with strings <a name='3' />

In [110]:
#you can concatenate strings with the "+" sign 
a="hello"
b="world"
c=a+b
print(c)

helloworld


In [111]:
#you can find and replace string characters 
c=c.replace("hello","goodbye") #strings cannot be changed in place: replace returns a new string, so assign the result to a variable (here we reuse c). 
print(c)

goodbyeworld


In [112]:
#string indexing 
sequence="ACGTACGT"
print(sequence[1]) #we can select a single character at a numeric index (0-based)
print(sequence[1:3])#we can slice the string to select several contiguous characters
print(sequence[::-1]) #string reversal


C
CG
TGCATGCA
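Slices have the general form `[start:stop:step]`, and any of the three parts can be left out. A short sketch of a few more variations on the same sequence (the variable names are illustrative):

```python
sequence = "ACGTACGT"

first_three = sequence[:3]    # leaving out the start means "from the beginning"
last_two = sequence[-2:]      # negative indices count from the end of the string
every_other = sequence[::2]   # a step of 2 takes every second character

print(first_three)
print(last_two)
print(every_other)
```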


## Working with lists <a name='4' />

In [113]:
#both of the below are valid ways to create an empty list 
a=[] 
b=list()
print("a"+str(a))
print("b"+str(b))

a[]
b[]


In [114]:
#creating lists with some values in them 
a=[1,2,3,4]
b=['a','b','c','d']
print(a)
print(b)

[1, 2, 3, 4]
['a', 'b', 'c', 'd']


In [115]:
#use the "append" command to add values to a list 
a.append(5)
print(a)

[1, 2, 3, 4, 5]


In [116]:
#we can update list values with new values 
a[0]=7
print(a)

[7, 2, 3, 4, 5]


In [117]:
#use the "join" command to join the elements in a list 
c='.'.join(b)
print(c)


a.b.c.d


In [118]:
#use the "split" command to split a string into a list using a specified delimiter 
d=c.split('.')
print(d)

['a', 'b', 'c', 'd']


In [119]:
#use the "len" command to get the length of a list 
print(len(d))

4
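The `split`, `join`, and `len` commands are often combined when processing one line of a text file. The sketch below uses a made-up comma-separated line (the contents of `line` are hypothetical, not course data):

```python
# A hypothetical line of text, as it might be read from a file
line = "gene1,chr1,100,200\n"

fields = line.strip().split(",")   # strip() removes the trailing newline first
num_fields = len(fields)           # how many values the line contained
rejoined = "\t".join(fields)       # join the same values with tabs instead

print(fields)
print(num_fields)
```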


## Tuples <a name='5' />

In [120]:
#Unlike lists, tuples are immutable. 
#Tuples can be defined as follows: 
a=(1,2)
print(a)


(1, 2)


In [121]:
#you can index into a tuple, just like a list 
print(a[0])
print(a[1])

1
2


In [122]:
#However, you cannot reassign values to a tuple 
a[0]=5 #should give an error 

TypeError: 'tuple' object does not support item assignment
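Two things are commonly done with tuples in place of item assignment: unpacking the values into separate variables, and building a new tuple when a different value is needed. A minimal sketch (the names `point`, `x`, `y`, and `new_point` are illustrative):

```python
point = (1, 2)

# Unpacking assigns each element of the tuple to its own variable
x, y = point

# Item assignment fails; try/except lets us catch the error and keep going
try:
    point[0] = 5
except TypeError as err:
    message = str(err)

# To "change" a tuple, build a new one from the pieces you want
new_point = (5,) + point[1:]

print(message)
print(new_point)
```

The trailing comma in `(5,)` is what makes it a one-element tuple rather than just the number 5 in parentheses.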

## Dictionaries <a name='6' />

In [123]:
#Dictionaries store key-value pairs and make it fast to look up a value by its key 
#An empty dictionary can be created in two ways: 

my_dict={} 
my_dict=dict() 

In [124]:
#We can populate a dictionary by assigning keys and values 
my_dict['a']=1
my_dict['b']=2
my_dict['c']=3
print(my_dict)

{'a': 1, 'b': 2, 'c': 3}


In [125]:
#We can query a dictionary by looking up specific keys 
my_dict['a']

1

In [126]:
#we can also print all keys or all values in a dictionary 
my_dict.keys()

dict_keys(['a', 'b', 'c'])

In [127]:
my_dict.values()

dict_values([1, 2, 3])
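Looking up a key that is not in the dictionary raises a `KeyError`, so it is often useful to check for a key first or to supply a default. A small sketch using the same dictionary (the names `has_a`, `missing`, and `pairs` are illustrative):

```python
my_dict = {'a': 1, 'b': 2, 'c': 3}

has_a = 'a' in my_dict          # True if the key is present
missing = my_dict.get('z', 0)   # get returns the default (0) when the key is absent

# items() gives each key together with its value
pairs = []
for key, value in my_dict.items():
    pairs.append(key + "=" + str(value))

print(pairs)
```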

The alignments output of the pairwise2 algorithm is a list whose elements are of a data type called a tuple. 

* A **list** is denoted by square brackets. For example: 

       alignments=[alignment0, alignment1, alignment2] 

       Individual elements of a list can be referred to using an index. 

       alignments[0]= alignment0  
       alignments[1]= alignment1 
       alignments[2]= alignment2  
       
       Values of lists can be changed, for example:  
       
       alignments[2]= new_alignment2 
       
       alignments=[alignment0, alignment1,new_alignment2]
       
       Values in lists can be different data types. In the alignments example, the data types are tuples. 


* A **tuple** is denoted by parentheses. A tuple behaves similarly to a list, but it is "immutable". That means that once you define a tuple in your script or program, you cannot change, add or remove elements. Why would you ever want this constraint? There are two main reasons: using tuples can make some operations faster due to how they are stored internally in the computer's memory. Additionally, tuples can be used as dictionary keys (more on dictionary keys below), while lists cannot.

       alignments[0]=('sequence1','sequence2',alignment_score,start,stop) 
       
       Individual elements of a tuple can be referred to using an index. 
       
       alignments[0][0]='sequence1' 
       alignments[0][1]='sequence2' 
       alignments[0][2]= alignment_score
       alignments[0][3]=start
       alignments[0][4]=stop
       
       alignments[0][1]='new_sequence2' will give error message: 'tuple' object does not support item assignment
       
       

* As a review, a third data type that we have already seen is a **dictionary**. Dictionaries are denoted by curly braces and map each key to a value.  
       
       In the pre-class assignment, for example, we saw: 
       
       aminoacid_molecular_weight={'M':149,'F':165,'L':131,'A':89}
       
       Elements in dictionaries are referred to as keys and values. 

       a={key1:value1,key2:value2,key3:value3}

       a[key1]=value1

       a[key2]=value2

       a[key3]=value3
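The three data types described above can be tried out together. The sketch below uses made-up alignment tuples in the (sequence1, sequence2, score, start, stop) layout; the sequences, scores, and the `pair_scores` dictionary are hypothetical values for illustration, not real pairwise2 output:

```python
# Hypothetical alignments: a list of (sequence1, sequence2, score, start, stop) tuples
alignments = [
    ("ACGT", "ACGT", 4.0, 0, 4),
    ("ACGT", "AC-T", 3.0, 0, 4),
]

# Index into the list, then unpack the tuple into named variables
seq1, seq2, score, start, stop = alignments[0]

# List elements can be replaced, even though each tuple itself cannot be edited
alignments[1] = ("ACGT", "A-GT", 3.0, 0, 4)

# Dictionary lookup by key, using the example from the pre-class assignment
aminoacid_molecular_weight = {'M': 149, 'F': 165, 'L': 131, 'A': 89}
weight_of_M = aminoacid_molecular_weight['M']

# Tuples can be used as dictionary keys (lists cannot)
pair_scores = {("A", "A"): 1, ("A", "C"): -1}
mismatch = pair_scores[("A", "C")]

print(score, weight_of_M, mismatch)
```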

## For loops <a name='7' />

In [128]:
#iterate by values 
sequence='AGCCCTCCA'
for i in sequence:
    print(i)


A
G
C
C
C
T
C
C
A


In [129]:
#iterate by index 
for i in [1,2,3,4]: 
    print(sequence[i])

G
C
C
C


In [130]:
#use the range command to get all integers in a given range 
list(range(0,3)) #first number is included, second number is excluded

[0, 1, 2]

In [131]:
#if only one integer is provided to range, it is used as the (excluded) end and the start is assumed to be 0 
list(range(3))

[0, 1, 2]

In [132]:
#A common pattern is to combine range with len to iterate through all indices in a list 
for i in range(len(sequence)):
    print(sequence[i])

A
G
C
C
C
T
C
C
A
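When both the index and the value are needed, the built-in `enumerate` function is a common alternative to combining `range` with `len`. A minimal sketch (the list name `labels` is illustrative):

```python
sequence = 'AGCCCTCCA'

labels = []
# enumerate yields (index, value) pairs, so no separate lookup is needed
for i, base in enumerate(sequence):
    labels.append(str(i) + ":" + base)

print(labels[:3])   # the first three index:base labels
```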


## If statement logic <a name='8' />

Equals: a == b

Not Equals: a != b

Less than: a < b

Less than or equal to: a <= b

Greater than: a > b

Greater than or equal to: a >= b

In [133]:
#An "if statement" is written by using the if keyword.
a=5
b=10
if a>b: 
    print(str(a) + " is greater than " + str(b))
else: 
    print(str(b) + " is greater than "+ str(a))

10 is greater than 5


In [134]:
#an elif can be used to perform multiple comparisons 
if a>b: 
    print(str(a) + " is greater than " + str(b))
elif a==b: 
    print(str(a) + " is equal to " + str(b))
else: 
    print(str(b) + " is greater than "+ str(a))

10 is greater than 5
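If statements are often placed inside for loops to act on only some of the items. As a sketch, the code below counts the G and C characters in the sequence from the for loop section, using `or` to combine two comparisons (the counter name `gc_count` is illustrative):

```python
sequence = 'AGCCCTCCA'

gc_count = 0
for base in sequence:
    # "or" is True when at least one of the two comparisons is True
    if base == 'G' or base == 'C':
        gc_count += 1   # add one to the running count

print(gc_count)
```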


## Using functions <a name='9' />

In [135]:
#Functions are re-usable chunks of code that can be called with a single command 
#functions are defined as follows 

#def my_function(arg1,arg2):
#    #do something with arg1, arg2
#    return output  #return an output value 



In [136]:
#For example: 
def multiply(a,b): 
    return a*b


In [137]:
#Functions can be executed after they are defined by passing arguments: 
product=multiply(2,3)
print(product)

6
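Functions can combine the loops and if statements from the earlier sections. The function below is a hypothetical example (the name `gc_fraction` and its `bases` parameter are assumptions made for illustration); it also shows a default argument value and a docstring, which is the text that `help` displays for a function:

```python
def gc_fraction(sequence, bases="GC"):
    """Return the fraction of characters in sequence that appear in bases."""
    if len(sequence) == 0:
        return 0.0   # avoid dividing by zero for an empty sequence
    count = 0
    for base in sequence:
        if base in bases:
            count += 1
    return count / len(sequence)

gc = gc_fraction("AGCCCTCCA")               # uses the default bases="GC"
at = gc_fraction("AGCCCTCCA", bases="AT")   # overrides the default

print(gc)
print(at)
```

Because `bases` has a default value, the function can be called with one argument or two.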
