![python image](https://www.python.org/static/img/python-logo@2x.png)

# <center> Python Tutorial -Session 1b</center>


## What we covered in our previous session:
- Print/Input 
- Variables
- Function
- Loop

 - ### [Data Structures](#data)
     - [Tuple](#tuple)
     - [List](#list)
     - [Dictionary](#dictionary)
     - [Set](#set)
 - ### [String Operations](#string)
 - ### [Saving to a file](#file)


## <a id="data">Data Structures</a>

In Python (as in most other programming languages), data structures are containers that let you store a collection of data.<br>
Python has several built-in data structures.  The list below summarizes each one and the brackets used to create it.  

<b>Tuple</b>(Immutable): Use with round brackets, i.e. <font size="+2">()</font>

<b>List(Mutable)</b>: Use with square brackets, i.e. <font size="+2"> []</font>

<b>Sets(No duplicate elements)</b>:Use with curly brackets, i.e. <font size="+2"> {} </font>

<b>Dictionaries(Key-value pairs)</b>:Use with curly brackets <font size="+2"> {} </font>

### <a id="tuple">Tuple</a>

A tuple is one of the ways to store a collection of data.  It is "immutable" in the sense that you cannot change its values once it is created.  This is in contrast to a list, which allows you to change the data. You use round brackets to denote a tuple as follows:

In [1]:
a=(1,2,3)

Here are some simple things that you can do with a tuple.  You can check its type, print its length, find the sum, find the minimum, find the maximum, etc.

In [2]:
type(a)

tuple

The following command, which tries to change the value at index 0, will not work because a tuple is immutable. It raises a TypeError.

a=(1,2,3)

a[0]=10
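
Here is a small sketch of what happens when that assignment runs. The try/except wrapper and the name `error_message` are my own additions for illustration, so that the error can be shown without stopping the program.

```python
a = (1, 2, 3)

try:
    a[0] = 10  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

# The tuple is unchanged after the failed assignment
print(a)
```

If you really need different values, you have to build a new tuple (or use a list instead).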

By the way, you can use indexing and slicing with tuple. In python, the first index starts at 0, and the last index is equal to length of the collection minus 1.

In [5]:
a=(1,2,3)
print(a[0])
print(a[len(a)-1])
print(a[2])

1
3
3


To create a slice of your collection, use the : operator and supply a start and an end index.  Note that the end index is exclusive: a[0:3] returns the items at indices 0, 1 and 2.

In [6]:
a=(1,2,3,4,5,6,7)
print(a[0:3])

(1, 2, 3)


In [7]:
a[-2]

6

You can find the length of a tuple as follows:

In [8]:
len(a)

7

You can also sum the tuple

In [9]:
sum(a)

28

Find the minimum.

In [10]:
min(a)

1

Find the maximum

In [11]:
max(a)

7

Note that you can sort a tuple with sorted().  However, the output is a list, as the square brackets below show.  Use tuple() to convert the result back to a tuple.

In [14]:
a=(11, 9,4,6)
print(type(a))
print(sorted(a))
print(type(sorted(a)))
print(tuple(sorted(a)))

<class 'tuple'>
[4, 6, 9, 11]
<class 'list'>
(4, 6, 9, 11)


You can loop over a tuple (or a list, as I will show below).  The first example uses the index: we combine range() with len() to generate the indices.

In [15]:
a=("apple","banana","pineapple","mango")
for i in range(len(a)):
    print(a[i])

apple
banana
pineapple
mango


The "in" operator can also be used to test whether an item can be found in a tuple or not, as follows.

In [16]:
"apple" in a

True

In [17]:
"orange" in a

False

The "in" keyword is also used in a for loop to go through the items directly, without an index:

In [18]:
b=("apple","banana","pineapple","mango")
for fruit in b:
    print(fruit)

apple
banana
pineapple
mango


Tuples can be concatenated with the + operator, as follows:

In [19]:
a=(1,2,3)
b=("apple","banana","orange")
c=a+b
print(c)

(1, 2, 3, 'apple', 'banana', 'orange')


##### Exercise 10
<i>  Given a tuple of motifs and a sequence, print each motif that is found in the sequence. </i>

<i> If none of the motifs are found in the sequence, return "None of the motifs are found in the sequence". </i>
    
Time: <b> 2 minutes </b>

------------

In [20]:
def find_motifs(motifs, sequence):
    ## Code here
    pass

In [21]:
find_motifs(("ATGC","GGCA","ATGG"),"ATGGCA")

In [22]:
find_motifs(("AGCA","GGCA","GGGG"),"GGGGAAGCCCC")

In [23]:
find_motifs(("AAAAA","GGGGG"),"ATGCA")

### <a id="list">List </a>
A list is another way to store a collection of data.  A list starts and ends with square brackets (i.e. "[" and "]").
Use a list if you want to change your data and perform manipulations on it.
A tuple, on the other hand, is generally a little faster and uses less memory. If you do not need to change the original data, go with a tuple. Some examples of lists are shown below.

In [24]:
list1=[1,2,3,4,5]
list2=["a","b","c","d","e"]
list3=["a",1,2,3]
print(f"list 1:{list1}")
print(f"list2:{list2}")
print(f"list 3:{list3}")

list 1:[1, 2, 3, 4, 5]
list2:['a', 'b', 'c', 'd', 'e']
list 3:['a', 1, 2, 3]


A list is "mutable", meaning that you are able to change values.  Note that a list is 0-indexed.  

In [25]:
list1=[1,2,3,4,5]
print(f"List 1 before modification: {list1}")
list1[0]=10
print(f"List 1 after modification:{list1}")

List 1 before modification: [1, 2, 3, 4, 5]
List 1 after modification:[10, 2, 3, 4, 5]


Be careful when you assign a list to another variable.  Both names then refer to the same list, so if you change it through the second name, the first one changes too.

In [26]:
list1=[1,2,3,4,5]
list2=list1
print(f"List 1 before changing List2 {list1}")
list2[0]=10
print(f"List 1 after changing list2 {list1}")


List 1 before changing List2 [1, 2, 3, 4, 5]
List 1 after changing list2 [10, 2, 3, 4, 5]


Notice above that we never modified list1 directly; we only changed list2.  However, list1 also changed, because both names refer to the same list. To avoid this, use the .copy() method, which creates a new list.

In [27]:
list1=[1,2,3,4,5]
list2=list1.copy()
print(f"List 1 before changing List 2{list1}")
list2[0]=10
print(f"List 1 after changing List 2{list1}")

List 1 before changing List 2[1, 2, 3, 4, 5]
List 1 after changing List 2[1, 2, 3, 4, 5]
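
To see why the two cases behave differently, here is a small sketch using the `is` operator, which tests whether two names refer to the same object. The names `alias` and `clone` are my own, chosen only for this illustration.

```python
list1 = [1, 2, 3, 4, 5]
alias = list1          # a second name for the same list object
clone = list1.copy()   # a new list with the same values

print(alias is list1)  # True: both names point to one object
print(clone is list1)  # False: clone is a separate object
print(clone == list1)  # True: the values are equal at this point

alias[0] = 10          # change the list through the second name
print(list1)           # [10, 2, 3, 4, 5]
print(clone)           # [1, 2, 3, 4, 5]
```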


You can access a single item of a list by its index

In [28]:
list1=[1,2,3,4]
list2=list1[0]
print(list2)

1


The ":" operator can be used to specify a range, for example list1[0:4].  Note that index 4 is not included: the slice only covers indices 0 to 3. Notice also that a slice is a new list, so when you modify list2 below, list1 is not modified.

In [29]:
list1=[1,2,3,4]
list2=list1[0:4]
list2[0]=12
print(list2)
print(list1)

[12, 2, 3, 4]
[1, 2, 3, 4]


Notice though that when you assign list 2 to list 1 using the "=" operator and modify list 2, list 1 also changes.  

In [31]:
list1=[1,2,3,4]
list2=list1
list2[2:4]=[9,6]
print(list2)
print(list1)

[1, 2, 9, 6]
[1, 2, 9, 6]


If you want to assign to a new list, assign the slice directly

In [32]:
list1=[1,2,3,4]
list2=list1[2:4]
print(f"Original list 2{list2}")
list2[0]=10
print(f"List 2 after changing the first index. {list2}")
print(f"List 1 after changing the first index{list1}")

Original list 2[3, 4]
List 2 after changing the first index. [10, 4]
List 1 after changing the first index[1, 2, 3, 4]


Another way to copy one list to another, so that modifying the second list does not modify the first, is to use [:]

In [33]:
list1=[1,2,3,4]
list2=list1[:]
list2[0]="apple"
print(list1)
print(list2)

[1, 2, 3, 4]
['apple', 2, 3, 4]


You can also index from the reverse using negative value

In [34]:
list1=[1,2,3,4]
print(list1[-1])

4


Remember that starting index is 0, but if you want to index from the reverse, the starting index is -1

In [35]:
list1[0]

1

You can also slice from reverse as follows. As usual, the last index does not count.

In [36]:
list1=[1,2,3,4,5,6]
print(list1[-3:-1])

[4, 5]


You can also specify a step size as follows:

In [37]:
a=[1,2,3,4,5,6]
print(a[0:6:2])

[1, 3, 5]


Actually, you do not need to specify the start and end.  The following takes every second item, starting with the first one

In [38]:
print(a[::2])

[1, 3, 5]


We can also mix and match positive and negative indices as follows:

In [39]:
print(list1[2:-1])

[3, 4, 5]


Examples of different operations you can perform on lists.

We can take the sum, find the length, find the minimum and maximum, etc.

In [41]:
list1=[1,2,3]
sum(list1)

6

In [42]:
len(list1)

3

In [43]:
min(list1)

1

In [44]:
max(list1)

3

Converting to a string.  Is this what we want?

In [45]:
str(list1)

'[1, 2, 3]'

In [46]:
print(str(list1)[0])

[


The above output is probably not what you are looking for.  You probably want to convert each number to a character.  I will explain how to do that later.

Anyway, we can also sort a list.  There are two ways to do this.  First, using sorted, and the other using sort().  I will explain the difference below.

In [47]:
list1=[10,14,17,6,11,1,15]

In [48]:
sorted(list1)

[1, 6, 10, 11, 14, 15, 17]

In [49]:
list1

[10, 14, 17, 6, 11, 1, 15]

In [50]:
list1.sort()

In [51]:
list1

[1, 6, 10, 11, 14, 15, 17]

The difference between list1.sort() and sorted(list1) is that list1.sort() sorts the list in place, so the original order is lost, while sorted(list1) returns a new sorted list and leaves list1 unchanged.
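
A common mistake is to write `x = some_list.sort()`. The short sketch below, with variable names of my own choosing, shows that sort() returns None while sorted() returns the new list.

```python
numbers = [10, 14, 17, 6, 11, 1, 15]

new_list = sorted(numbers)   # returns a new sorted list
print(new_list)              # [1, 6, 10, 11, 14, 15, 17]
print(numbers)               # the original order is kept

result = numbers.sort()      # sorts in place and returns None
print(result)                # None
print(numbers)               # [1, 6, 10, 11, 14, 15, 17]
```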

In [52]:
sorted(list1,reverse=True)

[17, 15, 14, 11, 10, 6, 1]

There is also the reversed() function. Note that reversed() returns an iterator rather than a list. Use the list() function to convert it to a list.

In [53]:
reversed(list1)

<list_reverseiterator at 0x20c86f97460>

In [54]:
list(reversed(list1))

[17, 15, 14, 11, 10, 6, 1]

Reversing a list can also be done using slicing

In [55]:
a=list(range(1,11))
print(a)
print(f"reversed:{a[::-1]}")

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reversed:[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


Converting a range (remember that range is used in a for loop) to a list.  Here again, you use the list function

In [56]:
list(range(1,10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Converting a tuple

In [57]:
tuple_example=(1,2,3)
list(tuple_example)

[1, 2, 3]

In [58]:
list1

[1, 6, 10, 11, 14, 15, 17]

The enumerate function can be used to enumerate items in the list

In [59]:
list1=['apple','banana','cat','dog']
list(enumerate(list1))

[(0, 'apple'), (1, 'banana'), (2, 'cat'), (3, 'dog')]

This can be combined with a  for loop as follows:

In [60]:
for i,j in enumerate(list1):
    print(f"Item number {i+1}: {j}")

Item number 1: apple
Item number 2: banana
Item number 3: cat
Item number 4: dog
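
Instead of adding 1 to the index yourself, you can ask enumerate to start counting from 1 with its `start` argument. This is a small sketch of the same loop; the names `items` and `lines` are mine.

```python
items = ['apple', 'banana', 'cat', 'dog']

lines = []
for number, item in enumerate(items, start=1):
    # number begins at 1 instead of 0
    lines.append(f"Item number {number}: {item}")

print("\n".join(lines))
```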


The zip function allows us to use two lists in conjunction as follows:

In [62]:
list1=['apple','ball','cat','dog']
list2=['red','blue','orange','purple']
for k,j in zip(list1,list2):
    print(j,k)

red apple
blue ball
orange cat
purple dog


The zip() function accepts multiple lists.

In [63]:
list1=['apple','ball','cat','dog']
list2=['red','blue','orange','purple']
list3=["a","b","d","d"]
for k,j,l in zip(list1,list2,list3):
    print(j,k,l)

red apple a
blue ball b
orange cat d
purple dog d
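
One thing to keep in mind is that zip() stops as soon as the shortest input runs out. The sketch below uses two made-up lists of different lengths to show this.

```python
names = ['apple', 'ball', 'cat', 'dog']
colors = ['red', 'blue']          # shorter on purpose

pairs = list(zip(names, colors))  # extra items in names are ignored
print(pairs)                      # [('apple', 'red'), ('ball', 'blue')]
```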


Previously, I showed that you can convert a tuple to a list.  You can also convert list to a tuple.

In [64]:
a=(1,2,3)
b=[1,2,3]
print(list(a))
print(tuple(b))

[1, 2, 3]
(1, 2, 3)


<b>List manipulations.</b>
First is extend(). If you have two lists, extend() adds all items of the second list to the end of the first list

In [65]:
a=[1,2,3]
b=[4,5,6]
a.extend(b)
print(a)

[1, 2, 3, 4, 5, 6]


The append() method, on the other hand, adds a single value to the end of a list.

In [66]:
a=[1,2,3]
b=4
a.append(b)
print(a)

[1, 2, 3, 4]


Example of passing a list to a function:

In [67]:
def example_list(data):
    data.append(3)
    return(data)

In [68]:
j=[1,2,3,4,5]

In [69]:
example_list(j)
print(j)

[1, 2, 3, 4, 5, 3]


In [70]:
example_list(j)
print(j)

[1, 2, 3, 4, 5, 3, 3]


The original list that we passed to the function was also modified, as print(j) shows.  If we do not want the original list to be modified, what should we do? One option is to work on a copy inside the function:

In [71]:
def example_list2(data):
    data2=data.copy()
    data2.append(3)
    return(data2)

In [72]:
j=[1,2,3,4,5]

In [73]:
example_list2(j)

[1, 2, 3, 4, 5, 3]

In [74]:
print(j)

[1, 2, 3, 4, 5]


##### Exercise 11
<i>Write a function called reverse_complement(), where, given a DNA sequence in the form of a list, you return the reverse complement, also in the form of a list.
    <b> String manipulation will be taught later </b></i>

<i>A reverse complement is where the sequence is reversed and each base is replaced by its complement:</i>
    
<i>A changes to T</i>
    
<i> T changes to A</i>
    
<i> G changes to C</i>
    
<i> C changes to G</i>


Example:  
    reverse complement of ATAC is: GTAT


Time:<b> 2 minutes </b>


In [75]:
def reverse_complement(DNA):
    DNA_copy=DNA.copy()
    #code goes here
    return DNA_copy

In [76]:
reverse_complement(['A','T','G','C'])
reverse_complement(['A','A','T','T','G','G','C'])

['A', 'A', 'T', 'T', 'G', 'G', 'C']

-----------------


##### Exercise 12:
<i>You are given a list of tuples with a variable number of items per tuple. Extract the last number from each tuple.</i>
    
<i>Then, using these extracted numbers, return a sorted list (with the smallest number first). </i>


<i> For example:  If the list of tuples is </i><br>
<i> [(3,4,5),(1,2,3,4,8),(8,9,1)] </i>

<i> Then return </i>

<i> [1,5,8] </i>



In [223]:
def new_list(tuple_list):
    # Your code here
    pass

In [224]:
new_list([(1,2,3,4,5),(1,2,3),(4,5,6,7)])
new_list([(9,9,9,9),(1,12,45),(6,7,10)])

### <a id="dictionary"> Dictionary </a>

A dictionary is a data structure in which each value is associated with a key. We create an empty dictionary using "{}".

In [79]:
a={}
a['John']="(123)456-6780"
a['Jane']="(198)765-4321"
a['Doe']="(423)541-6234"

Another way to create a dictionary:

In [80]:
b={'John':'(123)456-7890','Jane':'(987)654-3210'}

To access the value for a key, use square brackets as follows

In [81]:
a['John']

'(123)456-6780'

To make it look a little bit better:

In [82]:
print(f"John's phone number is {a['John']}")

John's phone number is (123)456-6780


The update() method can be used to insert a key-value pair into a dictionary.  If the key already exists, its value is overwritten.

In [85]:
a.update({"Michael":"(245)678-4310"})

In [86]:
print(a)

{'John': '(123)456-6780', 'Jane': '(198)765-4321', 'Doe': '(423)541-6234', 'Michael': '(245)678-4310'}


In [87]:
print(len(a))

4


<b>Removing from a dictionary.</b>  
There are a few ways to do this.

First is pop, where you specify the key

In [88]:
a.pop("John")
print(a)

{'Jane': '(198)765-4321', 'Doe': '(423)541-6234', 'Michael': '(245)678-4310'}


Using popitem() removes the last inserted key:value pair (in Python 3.7 and later) and returns it as a tuple

In [89]:
a.popitem()

('Michael', '(245)678-4310')

In [90]:
print(a)

{'Jane': '(198)765-4321', 'Doe': '(423)541-6234'}


Finally you can use the del command.

In [91]:
del a["Jane"]

In [92]:
print(a)

{'Doe': '(423)541-6234'}


#### Looping Through a dictionary
You can loop through a dictionary many different ways.  Some examples below

In [93]:
a={}
a['John']="(123)456-6780"
a['Jane']="(198)765-4321"
a['Doe']="(423)541-6234"

In the example below, "x" represents the keys.  You can print the keys as follows

In [94]:
for x in a:
    print(x)

John
Jane
Doe


You can print values using the key as follows:

In [95]:
for x in a:
    print(a[x])

(123)456-6780
(198)765-4321
(423)541-6234


Another way is to use the keys() method.  This will loop through keys

In [96]:
for x in a.keys():
    print(f"{x},{a[x]}")

John,(123)456-6780
Jane,(198)765-4321
Doe,(423)541-6234


You can also loop through values as follows

In [97]:
for x in a.values():
    print(x)

(123)456-6780
(198)765-4321
(423)541-6234


There is also the items() method, which loops through the key-value pairs.

In [98]:
for key,val in a.items():
    print(f"{key}'s phone number is {val}")

John's phone number is (123)456-6780
Jane's phone number is (198)765-4321
Doe's phone number is (423)541-6234


You can also store lists in a dictionary

In [99]:
c={'fruits':['apple','orange','pineapple','mango'],'vegetables':['spinach','broccoli','zucchini','mushroom']}

In [100]:
print(c['fruits'])

['apple', 'orange', 'pineapple', 'mango']


In [101]:
d={'fruits':('apple','orange','pineapple','mango'),'vegetables':('spinach','broccoli','zucchini','mushroom')}

In [102]:
print(d['fruits'])

('apple', 'orange', 'pineapple', 'mango')


In [103]:
type(c["fruits"])

list

In [104]:
type(d["fruits"])

tuple

As the two examples above show, the values can be lists or tuples

In [105]:
fruity="fruits"
print(c[fruity])

['apple', 'orange', 'pineapple', 'mango']


In [106]:
{("a","b","c"):"def"}

{('a', 'b', 'c'): 'def'}

In the example above, the key is a tuple, while the value is a string.  Note that a dictionary key cannot be a list, because lists are mutable and therefore not hashable; a tuple is allowed.
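
Here is a small sketch of what happens if you try a list as a key anyway. The try/except wrapper and the names `tuple_key`, `list_key` and `error_message` are my own, used only so the error can be displayed.

```python
tuple_key = {("a", "b", "c"): "def"}     # works: tuples are hashable
print(tuple_key[("a", "b", "c")])        # def

try:
    list_key = {["a", "b", "c"]: "def"}  # fails: lists are not hashable
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")
```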

The get() method returns the value for a key.  If the key does not exist, it returns None, or a default value if you supply one as the second argument.

In [112]:
print(c.get("fruits"))
print(c.get("food"))
print(c.get("food","rice"))

['apple', 'orange', 'pineapple', 'mango']
None
rice
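
A typical use of get() with a default value is counting. The sketch below counts how often each base occurs in a short, made-up sequence; the names `sequence` and `counts` are mine.

```python
sequence = "ATGGCA"   # made-up example sequence
counts = {}

for base in sequence:
    # get() returns 0 the first time a base is seen
    counts[base] = counts.get(base, 0) + 1

print(counts)  # {'A': 2, 'T': 1, 'G': 2, 'C': 1}
```

Without get(), the first `counts[base]` lookup for a new base would fail because the key does not exist yet.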


The setdefault() method adds a key with a default value to a dictionary if the key does not exist yet.

In [113]:
c.setdefault("juice",['orange juice','lemon juice','pineapple juice'])

['orange juice', 'lemon juice', 'pineapple juice']

In [114]:
c

{'fruits': ['apple', 'orange', 'pineapple', 'mango'],
 'vegetables': ['spinach', 'broccoli', 'zucchini', 'mushroom'],
 'juice': ['orange juice', 'lemon juice', 'pineapple juice']}

In [115]:
c.setdefault("juice",['tomato juice','apple juice','mango juice'])
print(c['juice'])

['orange juice', 'lemon juice', 'pineapple juice']


In the example above, the key is already present, so setdefault() does not modify its value.

<b> Exercise 13 </b>

<i> You are given a dictionary, where the keys are the different codons, and the values are the amino acids.  </i>

<i> You are given a coding cDNA sequence. Translate it to protein sequence.  </i>

<i> Note that you can loop through a string object </i>

Time: <b> 5 minutes </b>


In [118]:
codons={"TTT":"F",
                      "TTC":"F",
                      "TTA":"L",
                      "TTG":"L",
                      "CTT":"L",
                      "CTC":"L",
                      "CTA":"L",
                      "CTG":"L",
                      "ATT":"I",
                      "ATC":"I",
                      "ATA":"I",
                      "ATG":"M",
                      "GTT":"V",
                      "GTC":"V",
                      "GTA":"V",
                      "GTG":"V",
                      "TCT":"S",
                      "TCC":"S",
                      "TCA":"S",
                      "TCG":"S",
                      "CCT":"P",
                      "CCC":"P",
                      "CCA":"P",
                      "CCG":"P",
                      "ACT":"T",
                      "ACC":"T",
                      "ACG":"T",
                      "ACA":"T",
                      "GCT":"A",
                      "GCC":"A",
                      "GCA":"A",
                      "GCG":"A",
                      "TAT":"Y",
                      "TAC":"Y",
                      "TAA":"*",
                      "TAG":"*",
                      "CAT":"H",
                      "CAC":"H",
                      "CAA":"Q",
                      "CAG":"Q",
                      "AAT":"N",
                      "AAC":"N",
                      "AAA":"K",
                      "AAG":"K",
                      "GAT":"D",
                      "GAC":"D",
                      "GAA":"E",
                      "GAG":"E",
                      "TGT":"C",
                      "TGC":"C",
                      "TGA":"*",
                      "TGG":"W",
                      "CGT":"R",
                      "CGC":"R",
                      "CGA":"R",
                      "CGG":"R",
                      "AGT":"S",
                      "AGC":"S",
                      "AGA":"R",
                      "AGG":"R",
                      "GGT":"G",
                      "GGC":"G",
                      "GGA":"G",
                      "GGG":"G"}

In [225]:
def translate_to_AA(sequence,codons):
    ##Your code goes here
    pass

In [120]:
sequence="ATGGCGGCTAACGCTACTACCAACCCGTCGCAGCTGCTGCCCTTAGAGCTTGTGGACAAATGTATAGGATCAAGAATTCACATCGTGATGAAGAGTGATAAGGAAATTGTTGGTACTCTTCTAGGATTTGATGACTTTGTCAATATGGTACTGGAAGATGTCACTGAGTTTGAAATCACACCAGAAGGAAGAAGGATTACTAAATTAGATCAGATTTTGCTAAATGGAAATAATATAACAATGCTGGTTCCTGGAGGAGAAGGACCTGAAGTGTGA"
translate_to_AA(sequence,codons)

Now I am going to introduce \*args and \*\*kwargs, which let you pass a variable number of arguments to a function.  Inside the function, args is a tuple, while kwargs is a dictionary.

In [121]:
def sum_variable(*args):
    sm=0
    for arg in args:
        sm+=arg
    return sm

In [122]:
sum_variable(1,2,3,4,5,6,7,8,9,10)

55

In [123]:
sum_variable(10,20,30)

60

In [124]:
def sum_variable2(**kwargs):
    sm=0
    for key,value in kwargs.items():
        sm+=value
    return sm

In [125]:
print(sum_variable2(a=1,b=2,c=3,d=4))

10


Essentially, \*args collects arguments passed without keywords, while \*\*kwargs collects arguments passed as keyword=value pairs.
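
The two can also be combined in one function. The sketch below uses a hypothetical function named `describe` (not part of the examples above) to show the types that Python hands to the function.

```python
def describe(*args, **kwargs):
    # args is a tuple of the positional arguments,
    # kwargs is a dictionary of the keyword arguments
    total = sum(args) + sum(kwargs.values())
    return type(args).__name__, type(kwargs).__name__, total

result = describe(1, 2, 3, a=4, b=5)
print(result)  # ('tuple', 'dict', 15)
```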

### <a id="set"> Set </a>

Sets are used for storing unique values.  If you add two items with the same value, the set stores only one of them.  Sets are written with curly brackets.  Note that, while dictionaries also use curly brackets, a set has no key-value pairs.
Sets are unordered.  Therefore you cannot access their items by index.

In [126]:
set1={"apple","banana","orange"}
print(f"Set1: {set1}")

set2={"apple","banana","orange","apple","apple"}
print(set2)



Set1: {'banana', 'orange', 'apple'}
{'banana', 'orange', 'apple'}


Sets can also be created with the set() constructor.  Note that you need the double brackets, because set() takes a single collection (here a tuple) as its argument.  Otherwise, you end up with an error message.

In [127]:
set3=set(("apple","apple","banana","banana"))
print(set3)

{'apple', 'banana'}
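
The set() constructor is also the only way to make an empty set, because empty curly brackets create a dictionary. Here is a small sketch with made-up data; I sort the set before printing only to get a predictable order.

```python
bases = ["A", "T", "A", "G", "T"]
unique_bases = set(bases)        # duplicates are dropped
print(sorted(unique_bases))      # ['A', 'G', 'T']

empty_set = set()                # an empty set
empty_dict = {}                  # empty curly brackets make a dictionary
print(type(empty_set).__name__, type(empty_dict).__name__)  # set dict
```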


The "in" keyword can be used to see if an item can be found in a set

In [128]:
"orange" in set1

True

Looping through a set can be done with a for loop as follows.  A set is iterable, but the order of the items is not guaranteed.

In [129]:
for val in set1:
    print(val)

banana
orange
apple


You can add to a set.  Use the add() method to add to a set.

In [130]:
set1.add("Mango")
print(set1)

{'Mango', 'banana', 'orange', 'apple'}


In [131]:
print(len(set1),len(set2))
print(type(set1))

4 3
<class 'set'>


You can combine multiple sets using the update() method as follows:

In [132]:
set5={"orange","banana","apple"}
print(f"Set 5:{set5}")
set6={"mango","pear","pomegranate"}
print(f"Set 6: {set6}")
set5.update(set6)
print(f"New Set 5: {set5}")

Set 5:{'banana', 'orange', 'apple'}
Set 6: {'pear', 'pomegranate', 'mango'}
New Set 5: {'pear', 'mango', 'banana', 'pomegranate', 'apple', 'orange'}


We can also use a list to update a set rather than another set as follows

In [133]:
set5={"orange","banana","apple"}
print(f"Set 5:{set5}")
set6=["mango","pear","pomegranate","mango","apple","apple"]
print(f"Set 6: {set6}")
set5.update(set6)
print(f"New Set 5: {set5}")

Set 5:{'banana', 'orange', 'apple'}
Set 6: ['mango', 'pear', 'pomegranate', 'mango', 'apple', 'apple']
New Set 5: {'banana', 'apple', 'orange', 'pear', 'pomegranate', 'mango'}


Items can be removed from a set using either the remove() or discard() method.  The key difference between them is that remove() raises a KeyError if the item does not exist, while discard() does nothing, as can be seen in the two examples below.

In [134]:
set7={"a","b","c","d"}
print(set7)
set7.remove("a")
print(set7)
set7.remove("a")

{'a', 'b', 'c', 'd'}
{'b', 'c', 'd'}


KeyError: 'a'

In [135]:
set7={"a","b","c","d"}
print(set7)
set7.discard("a")
print(set7)
set7.discard("a")

{'a', 'b', 'c', 'd'}
{'b', 'c', 'd'}


The pop() method removes and returns an arbitrary item from the set

In [136]:
print(set7)
set7.pop()
print(set7)

{'b', 'c', 'd'}
{'c', 'd'}


clear() empties a set, but the variable still exists.  The del keyword deletes the variable itself, so the name can no longer be used

In [137]:
set7.clear()
print(set7)

set()


In [138]:
del set7

If you try to print(set7) now, it will raise a NameError

Set operations can be performed on sets as follows.  First is the union, which returns the fruits found in either of the two sets.  The union() method creates a new set; the original sets are not changed.

In [139]:
set_fruit={"apple","banana","pear","orange","pomegranate"}
set_fruit2={"apple","pear","pineapple","grape","watermelon","orange"}

In [140]:
set_fruit_union=set_fruit.union(set_fruit2)
print(set_fruit_union)
print(set_fruit)
print(set_fruit2)

{'banana', 'apple', 'orange', 'pineapple', 'pear', 'grape', 'pomegranate', 'watermelon'}
{'pear', 'banana', 'pomegranate', 'apple', 'orange'}
{'pineapple', 'pear', 'watermelon', 'grape', 'apple', 'orange'}


Next is the intersection, which returns only the items found in both sets.  The intersection() method creates a new set

In [141]:
set_fruit_intersection=set_fruit.intersection(set_fruit2)
print(set_fruit_intersection)
print(set_fruit)
print(set_fruit2)

{'pear', 'apple', 'orange'}
{'pear', 'banana', 'pomegranate', 'apple', 'orange'}
{'pineapple', 'pear', 'watermelon', 'grape', 'apple', 'orange'}


On the other hand, the intersection_update() method updates the set itself

In [142]:
print(set_fruit)
print(set_fruit2)
set_fruit.intersection_update(set_fruit2)
print(set_fruit)
print(set_fruit2)

{'pear', 'banana', 'pomegranate', 'apple', 'orange'}
{'pineapple', 'pear', 'watermelon', 'grape', 'apple', 'orange'}
{'pear', 'apple', 'orange'}
{'pineapple', 'pear', 'watermelon', 'grape', 'apple', 'orange'}


The difference() method returns the items in <b> set 1 that are not found in set 2 </b>

In [143]:
set_fruit={"apple","banana","pear","orange","pomegranate"}
set_fruit2={"apple","pear","pineapple","grape","watermelon","orange"}
set_fruit_difference=set_fruit.difference(set_fruit2)
print(f"{set_fruit_difference}")
print(f"Set_fruit is not changed:{set_fruit}")

{'pomegranate', 'banana'}
Set_fruit is not changed:{'pear', 'banana', 'pomegranate', 'apple', 'orange'}


There is also the difference_update() method, which updates set 1 itself.

In [144]:
set_fruit={"apple","banana","pear","orange","pomegranate"}
set_fruit2={"apple","pear","pineapple","grape","watermelon","orange"}
set_fruit.difference_update(set_fruit2)
print(f"Set_fruit is now changed:{set_fruit}")

Set_fruit is now changed:{'banana', 'pomegranate'}


The symmetric_difference() method can be used to <b>find all items that are in exactly one of the two sets. </b>

In [145]:
set_fruit={"apple","banana","pear","orange","pomegranate"}
set_fruit2={"apple","pear","pineapple","grape","watermelon","orange"}
set_fruit_symmetric=set_fruit.symmetric_difference(set_fruit2)
print(f"{set_fruit_symmetric}")
print(f"Set_fruit is not changed:{set_fruit}")

{'pineapple', 'watermelon', 'banana', 'grape', 'pomegranate'}
Set_fruit is not changed:{'pear', 'banana', 'pomegranate', 'apple', 'orange'}


The symmetric_difference_update() method updates the set itself.

In [146]:
set_fruit.symmetric_difference_update(set_fruit2)
print(set_fruit)

{'pineapple', 'banana', 'watermelon', 'grape', 'pomegranate'}


There are also methods to test whether two sets are disjoint (they share no items), whether one is a subset of another, or whether one is a superset of another. set1 is a subset of set2 if every item of set1 is also found in set2.  In that case set2 is a superset of set1, so superset is the opposite of subset

In [147]:
set1={"a","b","c"}
set2={"d","e","f"}
set1.isdisjoint(set2)

True

In [148]:
set1={"a","b","c"}
set2={"a","b","c","d","e","f"}
set1.issubset(set2)

True

In [149]:
set2.issuperset(set1)

True
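
Python also provides operators that are shorthand for the set methods above. The sketch below uses two small made-up sets; the results are sorted only so that the printed order is predictable.

```python
set_a = {"apple", "banana", "pear"}
set_b = {"apple", "pear", "grape"}

union = sorted(set_a | set_b)          # same as set_a.union(set_b)
intersection = sorted(set_a & set_b)   # same as set_a.intersection(set_b)
difference = sorted(set_a - set_b)     # same as set_a.difference(set_b)
symmetric = sorted(set_a ^ set_b)      # same as set_a.symmetric_difference(set_b)

print(union)          # ['apple', 'banana', 'grape', 'pear']
print(intersection)   # ['apple', 'pear']
print(difference)     # ['banana']
print(symmetric)      # ['banana', 'grape']
print({"apple"} <= set_a)  # True: subset test
print(set_a >= {"apple"})  # True: superset test
```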

<b>Exercise 14</b>

<i>You are given a sequence.  Return "Unique" if all characters in a sequence are unique.  Otherwise, return "Not Unique"</i>

<b>Time: 2 Minutes</b>

In [151]:
def is_unique(sequence):
   pass

In [154]:
is_unique("AATT")
is_unique("ATGC")

-------------------------------------

## <a id="string"> String Operations </a>

There are many operations that can be performed on strings. 
This is especially important in bioinformatics, as we deal with sequences all the time (e.g. AAAAAAAAAAAATTTTTTTTTTTGGACA).

Just like lists and tuples, you can access the characters of a string using the index number.  You can also slice a string.  Note that a string is immutable, meaning that you cannot change it using indexing.  You can slice it and save the result to a different variable though.

Below are some basic examples.



In [155]:
a="APple"
print(a[0])
print(a[0:3])

A
APp


In the above example, notice that some letters are upper case and others are lower case.  The methods below change or test the case of a string.  Of course, you can always store the converted value in a variable.

- upper() and lower() convert everything to uppercase or lowercase.
- title() makes the first letter of each word uppercase.
- capitalize() makes only the first letter of the string uppercase and the rest lowercase.
- islower(), isupper() and istitle() test whether a string is all lowercase, all uppercase, or in title format.
- isspace() tests whether all characters are whitespace.
- swapcase() converts lowercase to uppercase, and uppercase to lowercase.

In [157]:
print(a.upper())
print(a.lower())
print("characterization of the p53 gene".title())
print("characterization of the p53 gene".capitalize())
print("Apple".islower())
print("Apple".isupper())
print("Apple".istitle())
print("    ".isspace())
print("Apple".swapcase())

APPLE
apple
Characterization Of The P53 Gene
Characterization of the p53 gene
False
False
True
True
aPPLE


In [159]:
print("apple".center(100))
print("apple".center(0))
print("apple".center(200))

                                               apple                                                
apple
                                                                                                 apple                                                                                                  


find() and index() return the position in the string where a character is first found.  The difference is that find() returns -1 when the character is not found, while index() raises a ValueError

In [160]:
print("apple".find("e"))
print("apple".find("p"))
print("apple".find("z"))

4
1
-1


Note that what you search for does not have to be a single character.  It can be several letters, in which case the position of the first letter of the match is returned

In [161]:
print("apple".find("pl"))

2


In [162]:
print("apple".index("e"))
print("apple".index("p"))
print("apple".index("z"))

4
1


ValueError: substring not found
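
Because find() returns -1 instead of raising an error, it is convenient in a loop. find() also accepts an optional start position as its second argument. The sketch below uses a hypothetical helper called `find_all` (my own name, not a built-in) to collect every position where a motif occurs in a made-up sequence.

```python
def find_all(text, pattern):
    """Return the start index of every match, including overlapping ones."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # continue searching one character after the last match
        start = text.find(pattern, start + 1)
    return positions

print(find_all("ATGATGCATG", "ATG"))  # [0, 3, 7]
print(find_all("apple", "z"))         # []
```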

You can combine index() with slicing to extract part of a string.

In [163]:
idx="apple".index("p")
"apple"[idx:]

'pple'

The rfind and rindex methods can be used to find the last match instead of the first match as in the above examples.

In [164]:
print("apple".rfind("p"))
print("apple".rindex("p"))

2
2


We can also test whether a string is alphabetic, numeric or alphanumeric

In [165]:
print("apple".isalpha())
print("apple".isalnum())
print("apple123".isalnum())
print("123".isnumeric())
print("123".isalnum())

True
True
True
True
True


Splitting a string can be achieved with the split() method, which returns a list. By default it splits on whitespace

In [166]:
"apple is yummy".split()

['apple', 'is', 'yummy']

Note that split() cannot be used to split a word into characters.  For that you can use either a for loop or a list comprehension.  The for loop version looks as follows:

In [167]:
sp=[]
for j in "apple":
    sp.append(j)
print(sp)

['a', 'p', 'p', 'l', 'e']
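
For comparison, here is a sketch of the list comprehension version, along with the list() function, which gives the same result. The variable names are my own.

```python
word = "apple"

chars_comprehension = [letter for letter in word]  # list comprehension
chars_list = list(word)                            # list() also splits into characters

print(chars_comprehension)  # ['a', 'p', 'p', 'l', 'e']
print(chars_list)           # ['a', 'p', 'p', 'l', 'e']
```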


The above code also demonstrates that you can loop through a string

The partition() method splits a string into 3 parts.  You specify the separator at which to split, and the separator itself becomes the middle value. The first example below splits the string at "is".  The next one splits it at "apple"; notice that the first value is empty.  The last one splits it at "yummy"; notice that the third value is empty.

In [168]:
print("apple is yummy".partition("is"))
print("apple is yummy".partition("apple"))
print("apple is yummy".partition("yummy"))

('apple ', 'is', ' yummy')
('', 'apple', ' is yummy')
('apple is ', 'yummy', '')


There is also a join() method, which allows you to join multiple strings.  In the examples below, the first line joins a list using a single space as a separator.  The second line joins the list without any separator.  The third example uses a tuple.  Any iterable of strings will work.

In [169]:
print(" ".join(["apple", "is","yummy"]))
print("".join(['a','p','p','l','e']))
print(" ".join(("apple","is","yummy")))

apple is yummy
apple
apple is yummy


You can also combine split and join as follows.  Here I replaced the space with underscore.

In [170]:
"_".join("Apple is yummy".split())

'Apple_is_yummy'

##### Exercise 15

<i>Write a function called reverse_complement_str(), where, given a DNA sequence, you return the reverse complement.  This exercise is similar to a previous exercise on lists, but here we are manipulating strings.</i>
    
<i>A reverse complement is where the sequence is reversed and each base is replaced by its complement:</i>
    
<i>A changes to T</i>
    
<i> T changes to A</i>
    
<i> G changes to C</i>
    
<i> C changes to G</i>


Example:  
    reverse complement of ATAC is: GTAT


Time:<b> 3 minutes </b>


In [171]:
def reverse_complement_str(sequence):
    pass
    #code goes here

In [172]:
reverse_complement_str("AAAATTTTCG")

-------------------

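Stripping characters can be done in multiple ways.  The strip method removes characters from both ends of a string, lstrip removes them from the left end only, and rstrip removes them from the right end only.  By default, they remove whitespace.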

In [173]:
(" apple ".strip())


'apple'

In [174]:
(" apple ".lstrip())

'apple '

In [175]:
(" apple ".rstrip())

' apple'

In [176]:
("#number ".strip("#"))

'number '

In [177]:
("#number".rstrip("#"))

'#number'

In [178]:
("#number".lstrip("#"))

'number'

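You can specify what you want to strip.  In the above examples, we specified "#".  Note that rstrip("#") left "#number" unchanged, because the "#" is on the left side of the string.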

In [179]:
a="NNNNNNNAAAATTTTGGGCCCNNNNN"
a.strip("N")

'AAAATTTTGGGCCC'

In a FASTA file, you sometimes have "N" (an unknown base) instead of A, T, G, or C.  Here I stripped all the leading and trailing "N" characters.
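Two details of strip are easy to miss: it never touches characters in the middle of the string, and its argument is treated as a set of characters rather than as a prefix or suffix. A minimal sketch with invented sequences:

```python
seq = "NNNAANNTTNNN"

# Only the ends are trimmed; the N's between AA and TT stay.
trimmed = seq.strip("N")
print(trimmed)

# The argument is a set of characters: any mix of "x" and "y"
# is removed from both ends.
multi = "xyxapplexyy".strip("xy")
print(multi)

# To drop every N, including interior ones, replace is one option.
no_n = seq.replace("N", "")
print(no_n)
```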

The handy maketrans function builds a translation table that maps each character to a replacement.  In the example below, we have a string and we want to find its reverse complement.  I first reverse the string with slicing, using a step of -1.  Next, using the table built by maketrans, I convert A to T, T to A, G to C, and C to G with the translate method.

In [181]:
a="AATTGGCC"
table=str.maketrans("ATGC","TACG")

In [182]:
print(a[::-1])
a[::-1].translate(table)

CCGGTTAA


'GGCCAATT'

Strings have no built-in reverse method.  Slicing, as shown above, is the usual way to reverse a string; "".join(reversed(a)) also works.
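The two steps can be wrapped in a small helper. This is only a sketch: the name revcomp is hypothetical, and passing lowercase bases and "N" through the table is an assumption added here, not something the example above requires.

```python
# Translation table: each base maps to its complement.
# Lowercase bases and N/n are an assumption added for this sketch.
COMPLEMENT = str.maketrans("ATGCatgcNn", "TACGtacgNn")

def revcomp(sequence):
    """Return the reverse complement of a DNA string."""
    return sequence[::-1].translate(COMPLEMENT)

print(revcomp("AATTGGCC"))
print(revcomp("ATAC"))

# reversed() plus join is an alternative to slicing for reversing.
backwards = "".join(reversed("apple"))
print(backwards)
```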

In [183]:
"apple".startswith("a")

True

In [184]:
"apple".endswith("a")

False

In [185]:
"apple".endswith("e")

True
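The startswith and endswith methods test the beginning and the end of a string. Both also accept a tuple of candidates, which is convenient for checking file extensions. A short sketch; the header and file name are made up:

```python
header = ">Chr1"

# FASTA header lines begin with ">".
is_header = header.startswith(">")
print(is_header)

# A tuple checks several candidates at once.
filename = "reads.fastq"  # hypothetical file name
is_sequence_file = filename.endswith((".fasta", ".fa", ".fastq"))
print(is_sequence_file)

# Both methods are case-sensitive.
case_check = "Apple".startswith("a")
print(case_check)
```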

The zfill method can be used for padding a string with leading zeros.  Note that the width is the total length of the resulting string.  If the specified width is less than or equal to the length of the string, nothing gets padded.

In [186]:
j="apple"
print(j.zfill(5))
print(j.zfill(6))
print(j.zfill(10))

apple
0apple
00000apple
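In practice, zfill is mostly used on numbers converted to strings, for example to build labels that sort correctly. A minimal sketch with invented sample numbers:

```python
sample_ids = [7, 42, 123]

# Pad each number to three digits so the labels sort alphabetically.
labels = ["sample_" + str(n).zfill(3) for n in sample_ids]
print(labels)

# A leading sign stays in front of the zeros.
negative = "-42".zfill(5)
print(negative)
```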


The ljust and rjust methods can be used to justify a string within a given total width.  The rjust method adds spaces to the left side, so that the string shifts to the right.  The ljust method does the opposite.

In [188]:
j="apple"
print("banana",j.rjust(30),"orange")


banana                          apple orange


In [189]:
j="apple"
print("banana",j.ljust(50),"apple")


banana apple                                              apple
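These methods are handy for lining up columns of text, and they take an optional fill character as a second argument. A small sketch; the fruit counts are invented:

```python
fruits = [("apple", 3), ("banana", 12), ("kiwi", 150)]

# Left-justify names in 10 columns, right-justify counts in 5.
rows = []
for name, count in fruits:
    rows.append(name.ljust(10) + str(count).rjust(5))

for row in rows:
    print(row)

# A second argument replaces the space as the fill character.
dotted = "apple".rjust(10, ".")
print(dotted)

# center pads both sides.
centered = "apple".center(11, "*")
print(centered)
```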


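Finally, we have the format method for formatting a string.  Inside the curly brackets, the part after the colon is a format specification; for example, .2f means a number with two digits after the decimal point.  The same format specifications also work inside f-strings.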

In [190]:
marbles="I have {marbles:.2f} marbles"
print(marbles.format(marbles=29))
print(marbles.format(marbles=50))
marbles="I have {marbles:.4f} marbles"
print(marbles.format(marbles=29))
print(marbles.format(marbles=50))


I have 29.00 marbles
I have 50.00 marbles
I have 29.0000 marbles
I have 50.0000 marbles


In [193]:
marbles=29
print(f"I have {marbles:.2f} marbles")

I have 29.00 marbles


In [206]:
print("I have {dollars:.1e} dollars".format(dollars=50000))
print("The value is {percent:.1%}".format(percent=0.5))
print("There are {houses:,}".format(houses=10000))

I have 5.0e+04 dollars
The value is 50.0%
There are 10,000


In [207]:
dollars=50000
print(f"I have {dollars:.1e} dollars")
percent=0.5
print(f"The value is {percent:.1%}")


I have 5.0e+04 dollars
The value is 50.0%


Many more formatting examples can be found below:

https://www.w3schools.com/python/ref_string_format.asp

##### Exercise 16

<i>Given an integer, determine whether the sum of its digits is odd or even.</i>

<i>For example, if the integer is 22, the sum of the digits is 4, which is even.</i>

<i>On the other hand, the sum of the digits of 25 is 7, which is odd.</i>
    
Time:<b> 2 minutes </b>

In [215]:
def odd_or_even(intval):
    pass
    #code goes here

In [216]:
odd_or_even(55)

-----------------------------

## <a id="file">Saving to a file </a>

Often, you will want to work with files.  
How can you open a file to read it or write to it?  This is where the open() function comes in.  It returns a file object, and the mode you pass as the second argument decides whether you can read from it or write to it.

- Use "w" to write to a file
- Use "r" to read a file
- There are other options as well, such as "a" for append.

The difference between f.read() and f.readline() is that f.read() returns the rest of the file as one string, while f.readline() reads one line at a time.

Different ways to open a file:

"w": For writing.  Creates the file if it does not exist.  Wipes out preexisting data.

"w+": For reading and writing.  Wipes out preexisting data.

"r":  For reading only.  The file must already exist.

"r+": For reading and writing.  Does not truncate.  Position at the beginning.

"a":  For writing/appending.  Creates the file if it does not exist.  Position at the end of the file.

"a+": For reading and appending.  Creates the file if it does not exist.  Position at the end of the file.


In [218]:
f=open("new_file.txt","w")
f.write(">Chr1\n")
f.write("AAATTGGCCATTTACCCCGATACTA")
f.close()
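To see how the "w" and "a" modes differ, the sketch below writes to a throwaway file in a temporary directory. The file name modes_demo.txt is hypothetical, and the temporary directory is only there so that no real file is touched.

```python
import os
import shutil
import tempfile

# Work in a temporary directory so no existing file is affected.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

f = open(path, "w")   # "w" creates the file
f.write("first\n")
f.close()

f = open(path, "a")   # "a" adds to the end
f.write("second\n")
f.close()

f = open(path, "r")
after_append = f.read()
f.close()

f = open(path, "w")   # "w" again wipes the earlier content
f.write("third\n")
f.close()

f = open(path, "r")
after_rewrite = f.read()
f.close()

shutil.rmtree(tmpdir)  # clean up the temporary directory

print(repr(after_append))
print(repr(after_rewrite))
```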

In [219]:
f=open("new_file.txt","r")
print(f.read())
f.close()

>Chr1
AAATTGGCCATTTACCCCGATACTA


In [220]:
f=open("new_file.txt","r")
print(f.readline())
print(f.readline())
f.close()

>Chr1

AAATTGGCCATTTACCCCGATACTA


A better way to read or write a file is to use open together with the with statement.  The file is then closed automatically when the with block ends, even if an error occurs, so you don't need to call f.close().

In [221]:
with open("new_file.txt","r") as f:
    print(f.readline())

>Chr1



In [222]:
with open("new_file.txt","r") as f:
    for line in f:
        print(line)

>Chr1

AAATTGGCCATTTACCCCGATACTA
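The blank lines in the output above appear because each line read from the file keeps its trailing newline, and print adds another one. The sketch below strips that newline and also shows writing inside a with block. The file name example.fasta and the temporary directory are assumptions made so the example does not touch any real file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.fasta")  # hypothetical file name

    # Writing inside a with block; the file is closed when the block ends.
    with open(path, "w") as f:
        f.write(">Chr1\n")
        f.write("AAATTGGCCATTTACCCCGATACTA\n")

    header = None
    sequence_parts = []
    with open(path, "r") as f:
        for line in f:
            line = line.rstrip("\n")  # drop the newline kept by iteration
            if line.startswith(">"):
                header = line[1:]     # header text without the ">"
            else:
                sequence_parts.append(line)

sequence = "".join(sequence_parts)
print(header)
print(sequence)
```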


## What we have covered today:
<b>
    
 - Data Collection
     - Tuple
     - List
     - Dictionary
     - Set
 - String Manipulation
 - Opening and Closing files </b>

--------------------------------------------------------------------------------------------------------------------

## What we will cover in Session 2
<b>
    
- List Comprehension
- Dictionary Comprehension
- Different packages (e.g. the collections module)
- Some quick Bioinformatics examples
</b>