# Day 2

## Lists

### Lists
- A list is a Python object that stores an ordered collection of objects
- Lists are enclosed in square brackets, with the elements separated by commas.
- Lists can contain different data types.
    - For example, a list could contain ints, floats, and strings
- Never use the word `list` as a variable name: `list` is a built-in Python type (not a keyword), and assigning to that name hides (shadows) the built-in `list()` function for the rest of your code
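A minimal sketch of both points above: one list holding mixed types, and what goes wrong when the name `list` is reassigned. The variable names `mixed` and `restored` are just for illustration.

```python
# A single list can hold values of different types
mixed = [42, 3.14, "text"]
print(type(mixed[0]), type(mixed[1]), type(mixed[2]))

# Reassigning the name `list` shadows the built-in type
list = [1, 2, 3]
try:
    list("abc")  # `list` is now a list object, which cannot be called
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'list' object is not callable

# Deleting our variable makes the built-in `list` visible again
del list
restored = list("abc")
print(restored)  # ['a', 'b', 'c']
```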

In [None]:
# Define a list
my_stuff = ['hello world', 0, 3.3, "another thing"]
print(my_stuff)

In [None]:
# Get the length of the list (AKA how many items are in the list?)
len(my_stuff)

In [None]:
# len also works on str data types
len('hello world')


### Strings can be converted to lists

In [1]:
# certain objects such as strings can be converted into lists by the list function
dna_sequence = 'AAAGCGTTTCGA'
print(dna_sequence)
# each character will become an element of a list
dna_sequence = list(dna_sequence)
print(dna_sequence)

AAAGCGTTTCGA
['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']


In [None]:
# str can also convert lists into strings
my_list = [1, 2, 3]
my_list_as_a_string = str(my_list)
print(type(my_list))
print(my_list_as_a_string)
print(type(my_list_as_a_string))
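Note that `str()` produces the printed representation of the list, with brackets, quotes, and commas included. If the goal is to turn a list of characters back into the original string, one common option is the string method `join`. A small sketch (the name `dna_list` is illustrative):

```python
dna_list = list('AAAGCG')

# str() gives the list's printed representation
as_repr = str(dna_list)
print(as_repr)   # ['A', 'A', 'A', 'G', 'C', 'G']

# ''.join() concatenates the elements with an empty string between them
joined = ''.join(dna_list)
print(joined)    # AAAGCG
```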

### Indexing
- Python has zero-based indexing
    - The first element of the list is accessed with 0
- Some other languages, like MATLAB, use one-based indexing, meaning the first element would be accessed with 1
- Zero-based indexing may not be intuitive for beginners, but it has certain advantages

#### Positive Indexing (start from the left side of list)
| 'hello world' | 0             | 3.3           | 'another thing'|
|---------------|---------------|---------------|----------------|
|0              |1              |2              |3               |

#### Negative indexing (start from the right side of list)
| 'hello world' | 0             | 3.3           | 'another thing'|
|---------------|---------------|---------------|----------------|
|-4             |-3             |-2             |-1              |


![](https://media.geeksforgeeks.org/wp-content/uploads/List-Slicing.jpg)

In [None]:
my_stuff = ['hello world', 0, 3.3, "another thing"]
# Get the first element of my_stuff
my_stuff[0]

In [None]:
# Get the second element of my_stuff
my_stuff[1]

In [None]:
# Get the last element of my_stuff
my_stuff[-1]

In [None]:
# you can index a list with a variable
index = 1
my_stuff[index]

In [None]:
# you can reassign the value at an index
print(my_stuff)
my_stuff[2] = 'reassignment'
print(my_stuff)
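The two index tables above are related by a simple rule: for a list of length `n`, the negative index `-k` points at the same position as `n - k`. A short sketch checking this, and showing what happens outside the valid range (the name `n` is just for illustration):

```python
my_stuff = ['hello world', 0, 3.3, 'another thing']
n = len(my_stuff)

# a negative index -k points at the same position as n - k
print(my_stuff[-1] == my_stuff[n - 1])  # True
print(my_stuff[-4] == my_stuff[n - 4])  # True

# valid indices run from -n to n - 1; anything else raises IndexError
try:
    my_stuff[n]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```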

### List methods
- Lists are Python objects; objects contain data and functions (called methods)
    - Basically everything in Python is an object
- List methods are accessed by putting a dot after the list name
- (You can use tab-completion in IPython and Jupyter to see the available methods)
- `my_stuff.append('yet another thing')`
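Outside of IPython, the built-in `dir()` can also list an object's attributes. A small sketch that keeps only the public method names (those not starting with an underscore); the list comprehension here is only used for filtering, and the name `public_methods` is illustrative:

```python
# dir() returns the attribute names of an object, sorted alphabetically
public_methods = [name for name in dir(list) if not name.startswith('_')]
print(public_methods)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
```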

In [None]:
# append an element to the end of the list
my_stuff.append('yet another thing')
print(my_stuff)

In [2]:
# count a specific element of a list
dna_sequence = 'AAAGCGTTTCGA'
# each character will become an element of a list
dna_sequence = list(dna_sequence)
print(dna_sequence)
# How many A nucleotides in the sequence list?
dna_sequence.count('A')

['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']


4

In [3]:
# get the index of the first occurrence of an element
dna_sequence.index('T')

6

In [4]:
# insert an element at a specific position in a list
# this will insert "Nucleotide" as the 4th element (index 3) and shift all later elements one position to the right
dna_sequence.insert(3,'Nucleotide')
print(dna_sequence)

['A', 'A', 'A', 'Nucleotide', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']


In [5]:
# pop removes an element from a list by its index and returns that element
word = dna_sequence.pop(3)
print(word)
print(dna_sequence)
# the value returned by pop does not need to be assigned to anything; dna_sequence.pop(3) on its own also removes the element

Nucleotide
['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']


In [6]:
# if you do not supply an argument to pop it will remove the last element of the list
dna_sequence.append('the last element')
print(dna_sequence)
dna_sequence.pop()
print(dna_sequence)

['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A', 'the last element']
['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']


In [7]:
# remove will remove the first occurrence of a specific element of the list
dna_sequence.insert(3,"Nucleotide")
print(dna_sequence)
dna_sequence.remove("Nucleotide")
print(dna_sequence)
# pop removes an element using its index; remove removes an element using its value

['A', 'A', 'A', 'Nucleotide', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']
['A', 'A', 'A', 'G', 'C', 'G', 'T', 'T', 'T', 'C', 'G', 'A']
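One practical difference between the two: `remove` raises a `ValueError` if the value is not in the list. A minimal sketch (the name `seq` is illustrative) showing the error and a membership check with `in` that avoids it:

```python
seq = list('ACGT')

# remove raises ValueError when the value is not in the list
try:
    seq.remove('U')
except ValueError as err:
    print("ValueError:", err)

# checking membership first with `in` avoids the error
if 'U' in seq:
    seq.remove('U')
print(seq)  # ['A', 'C', 'G', 'T']
```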


In [None]:
# reverse reverses the order of the list
num = list(range(1,11))
print(num)
# reverse works on the list in place, so it returns None rather than a new list
num.reverse()
print(num)

In [None]:
# sort will sort a list in numerical or alphabetical order
print(num)
num.sort()
print(num)
dna_sequence.sort()
print(dna_sequence)
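Like `reverse`, `sort` works in place and returns `None`. When you want a sorted copy and need to keep the original order, the built-in `sorted()` returns a new list instead. A short sketch (variable names are illustrative):

```python
num = [3, 1, 2]
result = num.sort()   # sorts num in place
print(result)         # None
print(num)            # [1, 2, 3]

letters = list('GATTACA')
ordered = sorted(letters)   # returns a new sorted list
print(ordered)              # ['A', 'A', 'A', 'C', 'G', 'T', 'T']
print(letters)              # ['G', 'A', 'T', 'T', 'A', 'C', 'A']

# both accept reverse=True for descending order
num.sort(reverse=True)
print(num)                  # [3, 2, 1]
```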

### extend vs append
- We have seen that we can insert a value into a list at a specific position
- We can also append a value to the end of a list
- What happens if we try to combine two lists?

In [None]:
# append will add an entire list as a single element to the end of the list
my_list = list('abcd')
print(my_list)
my_list.append(list('efgh'))
print(my_list)

We can see that append added the entire list as a single element. This is called a nested list because there is a list inside the list. What if we don't want to append it as a list, but instead want each element of the other list to be appended individually to the end of the list?

In [None]:
my_list = list('abcd')
print(my_list)
# extend will append each element to the list separately
my_list.extend(list('efgh'))
print(my_list)

In [None]:
# You can also use the + or += operator to combine lists
my_list += [1, 2, 3]
print(my_list)
my_list = my_list + ['x','y','z']
print(my_list)
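`+=` and `+` look interchangeable in the cell above, but they differ once two names refer to the same list (the copying section below covers why). A sketch, with `a` and `alias` as illustrative names:

```python
a = [1, 2]
alias = a            # alias refers to the same list as a

a += [3]             # modifies the list in place (like a.extend([3]))
print(alias)         # [1, 2, 3]

a = a + [4]          # builds a brand-new list and rebinds a to it
print(a)             # [1, 2, 3, 4]
print(alias)         # [1, 2, 3]
print(a is alias)    # False
```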

In [None]:
# You can also mimic append with the + operator
my_list = list('abcd')
# double brackets is a list inside a list
# the outer list has only one element which is the inner list
my_list += [['j','k','l']]
# the result is that the final element of my_list is a list
print(my_list)

In [None]:
# * operator repeats a list
my_list = list('abcd')

# repeat the elements of my_list 3 times
my_list * 3
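A common pitfall with `*`: repeating a list that contains a nested list repeats the *reference* to the same inner list, so changing one "row" changes them all. A sketch (the names `shared_grid` and `independent_grid` are illustrative):

```python
# all three rows are the same inner list object
shared_grid = [[0, 0]] * 3
shared_grid[0][0] = 1
print(shared_grid)       # [[1, 0], [1, 0], [1, 0]]

# writing each inner list out separately gives independent rows
independent_grid = [[0, 0], [0, 0], [0, 0]]
independent_grid[0][0] = 1
print(independent_grid)  # [[1, 0], [0, 0], [0, 0]]
```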

In [None]:
# if your list contains only numbers, you can perform some numeric operations
numeric_list = list(range(25))
print(numeric_list)
sum(numeric_list)

In [None]:
# get the average of the list
sum(numeric_list) / len(numeric_list)

In [None]:
# we can make an average function
def average(n):
    return sum(n) / len(n)

average(numeric_list)
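The standard library also provides `statistics.mean`, which computes the same average. A sketch comparing the two, and showing that the hand-written `average` fails on an empty list because it divides by `len(n)`, which is 0:

```python
import statistics

numeric_list = list(range(25))

def average(n):
    # sum of the values divided by how many values there are
    return sum(n) / len(n)

print(average(numeric_list))          # 12.0
print(statistics.mean(numeric_list))

# dividing by len() fails for an empty list
try:
    average([])
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```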

### Copying list objects
- Many list methods operate on the list in place
- These methods return `None` instead of a new list because they modify the original list
- `None` is Python's null object, used to represent the absence of a value (it is not the same as NaN, which is a special floating-point value)
- You can copy a list with `.copy()`
- In MATLAB and some other languages, if you set `x = y`, `x` will be a copy of `y`
    - Changing `x` will be independent of `y` and vice versa
- In Python, if `y` is a list, then `x = y` makes the name `x` refer to the same list that `y` refers to
    - Instead of `x` being a copy of `y`, `x` is `y`
    - This means that `x` and `y` are the same list, so modifying `x` will also modify `y` and vice versa

In [8]:
my_list = list('abcdefgh')
print(my_list)
# .extend() returns None, so new_list will NOT be the extended list
new_list = my_list.extend(list('xyz'))
# new_list is None because .extend() operates on the list "in place"
print(new_list)
# my_list has been extended
print(my_list)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
None
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'x', 'y', 'z']


In [10]:
# You can copy lists with .copy()
my_list = list('abcdefgh')
print(my_list)
# assign another variable to my_list
my_list_2 = my_list
# copy the list
my_list_copy = my_list.copy()
# reverse the original list
my_list.reverse()
print('original my_list:')
print(my_list)
# my_list_2 is also reversed because it is the same list as my_list
print('my_list_2:')
print(my_list_2)
# reverse acted on the original list but did not affect the copy
print('my_list_copy:')
print(my_list_copy)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
original my_list:
['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
my_list_2:
['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
my_list_copy:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


In [11]:
# is operator tests if left hand side and right hand side are the same object
print(my_list is my_list_2)
print(my_list is my_list_copy)

True
False
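`.copy()` makes a *shallow* copy: the outer list is new, but any nested lists inside it are still shared. The standard library's `copy.deepcopy` also copies the nested objects. A sketch (variable names are illustrative):

```python
import copy

outer = [['a', 'b'], ['c']]
shallow = outer.copy()        # new outer list, same inner lists
deep = copy.deepcopy(outer)   # new outer list AND new inner lists

outer[0].append('z')          # modify an inner list

print(shallow)                # [['a', 'b', 'z'], ['c']]
print(deep)                   # [['a', 'b'], ['c']]
print(shallow[0] is outer[0]) # True
print(deep[0] is outer[0])    # False
```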


<center><big><strong><code>my_list</code> vs <code>my_list_2</code></strong></big></center>
<center><img src="https://i.kym-cdn.com/entries/icons/mobile/000/023/397/C-658VsXoAo3ovC.jpg" alt="" width="400"/></center>

### Slicing
- You can access a range of values in a `list` or `str` with slicing
- `my_stuff[start:end]`
- This gives you the elements whose index `i` satisfies `start <= i < end`
- The last value is not included in the range

#### **Diagram of how indexing and slicing works**
![](https://media.geeksforgeeks.org/wp-content/uploads/List-Slicing.jpg)

In [12]:
my_stuff = ['hello world', 0, 3.3, "another thing", 1024, 'delta']
# Get the first 3 values of my_stuff
my_stuff[0:3]

['hello world', 0, 3.3]

This gives us `my_stuff[0]`, `my_stuff[1]`, and `my_stuff[2]`. However, it **DOES NOT INCLUDE** `my_stuff[3]`
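Because the end index is excluded, a slice with in-range bounds has exactly `end - start` elements. Unlike single-element indexing, slice bounds that run past the end are clipped rather than raising an error. A sketch (the names `start` and `end` are illustrative):

```python
my_stuff = ['hello world', 0, 3.3, 'another thing', 1024, 'delta']

# when start and end are in range, the slice has end - start elements
start, end = 1, 4
print(len(my_stuff[start:end]) == end - start)  # True

# out-of-range slice bounds are clipped instead of raising IndexError
print(my_stuff[4:100])  # [1024, 'delta']
print(my_stuff[10:20])  # []
```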

In [13]:
alphabet = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

# You can slice any range of indices, for example to get the 4th, 5th, and 6th elements:
alphabet[3:6]

# remember indexing starts at zero so the fourth element has index 3
# NOTE that unlike English, in python the comma goes after the quotation mark

['d', 'e', 'f']

In [14]:
# You can slice using negative indices
# another way to get the 4th, 5th, and 6th elements
alphabet[-7:-4]

['d', 'e', 'f']

- The fourth element (index 3) is 7 positions from the end, so its negative index is -7.
- Both methods are correct, but in general you should not slice this way, because negative indices are less clear and intuitive here than positive ones.
- Generally, negative values are only used when the slice includes the last element of the list.
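The case where negative indices do read naturally is taking elements from the end. A short sketch:

```python
alphabet = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

# the last three elements
print(alphabet[-3:])                     # ['h', 'i', 'j']

# the negative slice from the cell above matches the positive one
print(alphabet[-7:-4] == alphabet[3:6])  # True
```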

### Slicing from the first element
- If your slice starts at the first element, you can omit the zero
- For example `alphabet[:3]` will return the first 3 elements of `alphabet`

In [15]:
alphabet[:3]

['a', 'b', 'c']

### Slicing up to the final element
- You can omit the second index if you are slicing up until the end
- `alphabet[5:10]` and `alphabet[5:]` are the same because `alphabet` has 10 elements

In [16]:
# get the length of alphabet
len(alphabet)

10

In [17]:
alphabet[5:10]

['f', 'g', 'h', 'i', 'j']

In [18]:
alphabet[5:]

['f', 'g', 'h', 'i', 'j']

In [19]:
# alphabet[5:-1] is not the same because the second index is not included in the range
alphabet[5:-1]

['f', 'g', 'h', 'i']

In [20]:
# you can also express this with:
alphabet[5:len(alphabet)]

# but alphabet[5:] is much simpler and cleaner

['f', 'g', 'h', 'i', 'j']

### Slicing every `n` elements
- You can take every *n*th element by adding another colon and an increment to the slice
- This takes the form `list_name[start:end:increment]`
- For example `alphabet[0:4:2]`

In [21]:
# get every other element of the first 6 elements of alphabet
alphabet[0:6:2]

['a', 'c', 'e']

In [22]:
# get every other element of all of alphabet
alphabet[::2]

['a', 'c', 'e', 'g', 'i']

In [23]:
# get every other element of alphabet beginning with the second element
alphabet[1::2]

['b', 'd', 'f', 'h', 'j']

### Reverse slicing
- You can get the elements in reverse order using a negative increment

In [24]:
# get the reverse of alphabet
alphabet[::-1]

['j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
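Unlike the `.reverse()` method, a reverse slice builds a new list and leaves the original unchanged. With a negative increment, the start index should be to the right of the end index. A sketch (the name `backwards` is illustrative):

```python
alphabet = list('abcdefghij')

# a reverse slice builds a NEW list; the original is unchanged
backwards = alphabet[::-1]
print(backwards is alphabet)  # False
print(alphabet[0])            # a

# with a negative increment, start should be to the right of end
print(alphabet[8:2:-2])       # ['i', 'g', 'e']
```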

### All indexing and slicing rules also apply to strings

In [25]:
alphabet = 'abcdefghij'
# reverse
print(alphabet[::-1])
# first 4 characters
print(alphabet[:4])
# etc.

jihgfedcba
abcd
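Indexing and slicing *read* strings just like lists, but strings are immutable, so assigning to an index fails. A sketch of the error and of the usual workaround, building a new string from slices (the name `new_alphabet` is illustrative):

```python
alphabet = 'abcdefghij'

# strings can be indexed and sliced, but not changed in place
try:
    alphabet[0] = 'z'
except TypeError as err:
    print("TypeError:", err)

# instead, build a new string from slices
new_alphabet = 'z' + alphabet[1:]
print(new_alphabet)  # zbcdefghij
```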


### Why does python do zero-based indexing and slicing?
- One-based indexing is often more intuitive to beginners
    - Some other programming languages, such as MATLAB and R, use one-based indexing
- One advantage of Python's indexing is that it helps avoid off-by-one errors, which are a common computing problem
- In Python, `list_name[:n]` gets the first *n* elements of a list
    - This is also why the second value in a slice expression is not included in the range of values returned
- However, under **one-based indexing** with the same slicing rule, getting the first *n* elements would be `list_name[1:n+1]`, **NOT** `list_name[1:n]`
    - `list_name[1:n]` would actually be the first ***n-1*** elements of the list
- An everyday example of off-by-one errors is dates.
    - The new millennium technically began in 2001, not 2000, because there was no year 0.
    - Under the traditional Korean age system, a baby is considered 1 year old at birth instead of 0 years old, so traditional Korean ages are at least one year greater than international ages
        - Not really an error if you consider the day someone is born to be their "first" birthday
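The payoff of zero-based, end-excluded slicing can be checked directly: `x[:n]` has exactly `n` elements, and splitting at `n` with `x[:n]` and `x[n:]` covers the list with no overlap and no gap. A sketch (the names `first_n` and `rest` are illustrative):

```python
letters = list('abcdefghij')
n = 4

first_n = letters[:n]   # indices 0 .. n-1
rest = letters[n:]      # indices n .. end

print(len(first_n) == n)           # True
print(first_n + rest == letters)   # True
print(len(letters[2:7]) == 7 - 2)  # True
```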

# Tuples

## Tuples
- Tuples are like lists except that they are **immutable**
- You can reassign values in a list, which is why a list is a **mutable** type
- Tuples are sequences like lists, but their values cannot be changed after they are created
- They are less common than lists in everyday code, but a few situations require them, for example when a sequence is used as a dictionary key (dictionary keys must be immutable)
- Tuples are defined with parentheses

In [26]:
# define a tuple
my_tuple = ('a','b','c')
print(my_tuple)

('a', 'b', 'c')


In [None]:
# tuples are immutable, so trying to change one of their values raises a TypeError
my_tuple = ('a','b','c')
try:
    my_tuple[1] = 'z'
except TypeError as ex:
    print(ex)

In [None]:
# you can convert a list to a tuple
my_list = list('abcdef')
print(my_list)
my_tuple = tuple(my_list)
print(my_tuple)
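Two details that often trip people up: it is the comma, not the parentheses, that makes a one-element tuple, and a tuple can be "unpacked" into separate variables. A sketch (variable names are illustrative):

```python
not_a_tuple = ('a')   # parentheses alone just group an expression
one_tuple = ('a',)    # the trailing comma makes a one-element tuple
print(type(not_a_tuple))  # <class 'str'>
print(type(one_tuple))    # <class 'tuple'>

# tuple unpacking assigns each element to its own variable
first, second, third = ('x', 'y', 'z')
print(second)  # y
```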

# range

### range
- `range` is a special object that represents a sequence of integers
- It is written as `range(start, stop, step)`; `start` defaults to 0 and `step` defaults to 1
- As with slicing, `start` is included and `stop` is excluded

In [None]:
# The numbers 1 to 10
my_range = range(1,11,1)
print(my_range)

In [None]:
# To be able to see the values in the range we need to convert the range to a list
my_range_list = list(my_range)
print(my_range_list)

In [None]:
# get the first 10 numbers (0 to 9)
first10 = range(10)
print(list(first10))
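The `step` argument works like the slicing increment, including negative values for counting down. A range also supports `len`, indexing, and `in` without building a list. A sketch (the names `evens` and `countdown` are illustrative):

```python
evens = range(0, 11, 2)
print(list(evens))        # [0, 2, 4, 6, 8, 10]

# a negative step counts down; the stop value is still excluded
countdown = range(5, 0, -1)
print(list(countdown))    # [5, 4, 3, 2, 1]

# ranges support len, indexing, and membership tests without building a list
print(len(evens))         # 6
print(evens[-1])          # 10
print(7 in evens)         # False
```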

## Dictionaries

### Dictionaries
- Dictionaries (also called dict) are an incredibly useful object that matches pairs of keys and values
- Every key in a dictionary must be unique, but the same value can be associated with multiple keys
- Dictionary keys must be immutable (more precisely, hashable) objects, such as:
    - int
    - float
    - string
    - tuple
- Dictionaries are defined with curly brackets, and colons separating keys and values
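A sketch of the key rule above: a tuple works as a key, but a list, being mutable, is rejected with a `TypeError`. The name `locations` is illustrative:

```python
# a tuple works as a dictionary key
locations = {(3, 4): 'tuple key'}
print(locations[(3, 4)])  # tuple key

# a list is mutable, so Python refuses to use it as a key
try:
    locations[[3, 4]] = 'list key'
except TypeError as err:
    print("TypeError:", err)
```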

In [27]:
# define a dictionary
my_dict = {"first name": "Luke", "last name": "Skywalker", "age": 19, "home planet": "Tattooine"}
print(my_dict)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 19, 'home planet': 'Tattooine'}


### Indexing dicts
- Dictionary keys are analogous to list indices
- Just like with lists you use square brackets to access the value associated with a key

In [28]:
# get values from my_dict
first_name = my_dict["first name"]
print(first_name)

Luke


In [29]:
# you can also index a dict with a variable just like lists
field = 'last name'
print(my_dict[field])

Skywalker


In [30]:
# We can also access the value of a key with the get method
my_dict.get('last name')

'Skywalker'
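The two ways of reading a value differ when the key is missing: square brackets raise a `KeyError`, while `get` returns `None` or a default you pass as the second argument. A sketch using a shortened version of `my_dict`:

```python
my_dict = {"first name": "Luke", "last name": "Skywalker", "age": 19}

# square brackets raise KeyError for a missing key
try:
    my_dict["middle name"]
except KeyError as err:
    print("KeyError:", err)              # KeyError: 'middle name'

# get returns None (or a default you supply) instead
print(my_dict.get("middle name"))             # None
print(my_dict.get("middle name", "unknown"))  # unknown

# `in` tests whether a key exists
print("age" in my_dict)  # True
```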

In [31]:
# Dictionaries can be elements of a list
luke = my_dict
leia = {"first name": "Leia", "last name": "Organa", "age": 19, "home planet": "Alderaan"}
my_list_of_dicts = [leia, luke]
print(my_list_of_dicts)

[{'first name': 'Leia', 'last name': 'Organa', 'age': 19, 'home planet': 'Alderaan'}, {'first name': 'Luke', 'last name': 'Skywalker', 'age': 19, 'home planet': 'Tattooine'}]


In [32]:
# You can chain indexing in a list of lists or a list of dictionaries
# get the value from key "home planet" in the first dictionary in the list
planet = my_list_of_dicts[0]["home planet"]
print(planet)

Alderaan


In [33]:
# You can access all keys in a dict
leia.keys()

dict_keys(['first name', 'last name', 'age', 'home planet'])

In [34]:
# you can also access all the values in a dict
leia.values()

dict_values(['Leia', 'Organa', 19, 'Alderaan'])

In [35]:
# you can get the key value pairs with items
leia.items()

dict_items([('first name', 'Leia'), ('last name', 'Organa'), ('age', 19), ('home planet', 'Alderaan')])
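The objects returned by `keys()`, `values()`, and `items()` are *views*: they reflect later changes to the dict, and `list()` turns them into ordinary lists. `items()` also pairs naturally with a `for` loop. A sketch using a shortened `leia` dict (the `'title'` key is added only for illustration):

```python
leia = {"first name": "Leia", "last name": "Organa", "age": 19}
keys = leia.keys()

# views reflect later changes to the dict
leia["title"] = "princess"
print(list(keys))  # ['first name', 'last name', 'age', 'title']

# each item is a (key, value) tuple that can be unpacked in a loop
for key, value in leia.items():
    print(key, '->', value)
```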

In [36]:
# You can replace the value associated with a key in a dict
luke['age'] = 22
print(luke)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine'}


In [37]:
# You can add a new key and value to an existing dict
luke['space ship'] = 'X-wing'
print(luke)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'space ship': 'X-wing'}


The `luke` dict object has been modified, but `luke` is also contained in `my_list_of_dicts`. We can confirm that the dictionary at index 1 in `my_list_of_dicts` has been updated because it is the same object in your computer's memory as `luke`.

In [38]:
print(my_list_of_dicts)

[{'first name': 'Leia', 'last name': 'Organa', 'age': 19, 'home planet': 'Alderaan'}, {'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'space ship': 'X-wing'}]


We can see that Luke's age has been modified and the new key-value pair (`'space ship'`: `'X-wing'`) has been added.

In [39]:
# pop removes a key and its corresponding value
ship = luke.pop('space ship')
print(luke)
print(ship)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine'}
X-wing


In [40]:
# popitem removes and returns the most recently inserted key-value pair as a tuple
luke['space ship'] = 'X-wing'
print(luke)
last_item = luke.popitem()
print(last_item)
print(luke)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'space ship': 'X-wing'}
('space ship', 'X-wing')
{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine'}


In [41]:
# You can use update to add key value pairs from another dict to a dict
jedi = {'title':'jedi knight', 'mentor': 'Yoda'}
luke.update(jedi)
print(luke)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'title': 'jedi knight', 'mentor': 'Yoda'}
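`update` modifies `luke` in place. Since Python 3.9, the `|` operator offers a non-mutating alternative that returns a new merged dict. A sketch, assuming Python 3.9 or newer and using shortened versions of the dicts above:

```python
luke = {'first name': 'Luke', 'age': 22}
jedi = {'title': 'jedi knight', 'mentor': 'Yoda'}

# | builds a new dict; on duplicate keys, the right-hand value wins
merged = luke | jedi
print(merged)  # {'first name': 'Luke', 'age': 22, 'title': 'jedi knight', 'mentor': 'Yoda'}
print(luke)    # {'first name': 'Luke', 'age': 22}
```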


In [42]:
# setdefault will return the value of a key or insert a value if the key does not exist
master = jedi.setdefault('mentor','Obi-wan Kenobi')
print(master)
# "Obi-wan Kenobi" is not set as the value of "mentor" because "mentor" already exists as a key

Yoda


In [43]:
# however the key "training location" does not exist
place = jedi.setdefault('training location', 'Dagobah')
print(place)
print(jedi)

Dagobah
{'title': 'jedi knight', 'mentor': 'Yoda', 'training location': 'Dagobah'}


In [44]:
# If the key exists you do not need to specify a default value
jedi.setdefault('training location')

'Dagobah'

In [45]:
# If the key does not exist setdefault will use None as the value
color = jedi.setdefault("lightsaber color")
print(color)
print(jedi)

None
{'title': 'jedi knight', 'mentor': 'Yoda', 'training location': 'Dagobah', 'lightsaber color': None}


In [46]:
# Note that even though we updated the jedi dict, 
# it does not change the luke dict because those key value pairs were copied over to luke
print(luke)

{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'title': 'jedi knight', 'mentor': 'Yoda'}


In [47]:
# when it comes to copying, dicts behave similarly to lists
dude = luke
guy = luke.copy()
luke['first name'] = 'Anakin'
print(luke)
# the value of "first name" has also been updated in dude because it is the same dict as luke
print(dude)
# however guy was a copy so it is unchanged
print(guy)

{'first name': 'Anakin', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'title': 'jedi knight', 'mentor': 'Yoda'}
{'first name': 'Anakin', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'title': 'jedi knight', 'mentor': 'Yoda'}
{'first name': 'Luke', 'last name': 'Skywalker', 'age': 22, 'home planet': 'Tattooine', 'title': 'jedi knight', 'mentor': 'Yoda'}


# Sets

## Sets
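- Sets in Python are like sets in math
- They are collections of objects, like lists
- Every element in a set must be unique and immutable
- Sets are defined with curly brackets (but `{}` on its own creates an empty dict; use `set()` for an empty set)
- Unlike lists, sets are unordered, so the printed order of the elements may vary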
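A sketch of the uniqueness and immutability rules above: duplicates collapse to a single element, `in` tests membership, and a mutable list cannot be stored in a set. The codon strings are just example data:

```python
# duplicates collapse: each element is stored only once
codons = {'AUG', 'UAA', 'AUG'}
print(len(codons))      # 2
print('AUG' in codons)  # True

# elements must be immutable, so a list cannot be placed in a set
try:
    bad = {['A', 'U', 'G']}
except TypeError as err:
    print("TypeError:", err)
```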

In [48]:
# define a set
nucleotides = {"A", "T", "C", "G"}
print(nucleotides)

{'A', 'G', 'C', 'T'}


In [49]:
# you can convert an iterable, such as a string, to a set with set(); duplicates are dropped
nucleotides = set("ATCGCAGTCCCAAAGGCGCG")
print(nucleotides)

{'G', 'C', 'T', 'A'}


### Set operations
- You can add elements to a set or remove elements from it
- You can perform mathematical set operations such as union, intersection, and difference

In [50]:
# add an element to a set
nucleotides.add("U")
print(nucleotides)

{'G', 'A', 'U', 'T', 'C'}


In [51]:
# remove an element from a set with remove or discard
# remove will cause an error if the object to remove is not in the set
# discard will not cause an error and simply do nothing if the object is not in the set
nucleotides.remove('U')
print(nucleotides)

{'G', 'A', 'T', 'C'}


In [53]:
# U is not part of the set anymore
nucleotides.discard("U")
print(nucleotides)
# nothing happened

{'G', 'A', 'T', 'C'}


In [54]:
# pop removes and returns an arbitrary element from the set (not a truly random one)
n = nucleotides.pop()
print(n)
print(nucleotides)

G
{'A', 'T', 'C'}
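Because neither the printed order nor the element returned by `pop` is fixed, it can help to sort a set when you need predictable output. `sorted()` returns a new list and leaves the set unchanged. A sketch (the name `ordered` is illustrative):

```python
nucleotides = set("ATCGCAGTCCCAAAGGCGCG")

# sorted() returns a new list in a predictable order
ordered = sorted(nucleotides)
print(ordered)           # ['A', 'C', 'G', 'T']
print(len(nucleotides))  # 4
```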


### Mathematical set operations

In [55]:
# Union: join sets, 2 ways
# I prefer the methods because their names make it easier to remember what they do
s1 = {1, 2, 3}
s2 = {4, 5, 6}
# union operator
u = s1 | s2
print(u)
# union function
u2 = s1.union(s2)
print(u2)

{1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5, 6}


In [56]:
# You can join multiple sets
s3 = {100, 200, 300}
u = s1 | s2 | s3
print(u)
u2 = s1.union(s2,s3)
print(u2)

{1, 2, 3, 4, 5, 6, 100, 200, 300}
{1, 2, 3, 4, 5, 6, 100, 200, 300}


In [57]:
# intersection
s1 = set(range(0,11))
print(s1)
s2 = set(range(6,16))
print(s2)
# intersection operator
i1 = s1 & s2
print(i1)
# intersection method
i2 = s1.intersection(s2)
print(i2)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
{6, 7, 8, 9, 10}
{6, 7, 8, 9, 10}


In [58]:
# difference is the set of all elements in the first set that are not in the second
diff = s1 - s2
print(diff)
diff2 = s1.difference(s2)
print(diff2)

{0, 1, 2, 3, 4, 5}
{0, 1, 2, 3, 4, 5}


In [59]:
# symmetric difference: elements of s1 that are not in s2
# and elements of s2 that are not in s1
sd = s1 ^ s2
print(sd)
sd2 = s1.symmetric_difference(s2)
print(sd2)

{0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15}
{0, 1, 2, 3, 4, 5, 11, 12, 13, 14, 15}
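The comment above describes symmetric difference as two differences combined, which can be checked directly; it also equals the union minus the intersection. A sketch reusing the same `s1` and `s2`:

```python
s1 = set(range(0, 11))
s2 = set(range(6, 16))

# (in s1 but not s2) combined with (in s2 but not s1)
manual = (s1 - s2) | (s2 - s1)
print(manual == s1 ^ s2)                 # True

# equivalently: the union minus the intersection
print((s1 | s2) - (s1 & s2) == s1 ^ s2)  # True
```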


In [60]:
# test if 2 sets do not have any elements in common
s1.isdisjoint(s2)

False

In [61]:
# test if 2 sets do not have any elements in common
{1, 2, 3}.isdisjoint({4, 5, 6})

True

In [62]:
# subsets - test if all elements of one set are completely contained in another
dna = {'A','T','C','G'}
nucleic_acids = {'A','T','G','C','U'}
# using the <= operator
print(dna <= nucleic_acids)
# using the issubset function
print(dna.issubset(nucleic_acids))

True
True
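Related comparisons: `<` tests for a *proper* subset (contained in, but not equal to), and `>=` / `issuperset` look at the same relationship from the other side. A sketch reusing the sets above:

```python
dna = {'A', 'T', 'C', 'G'}
nucleic_acids = {'A', 'T', 'G', 'C', 'U'}

# < tests for a proper subset
print(dna < nucleic_acids)   # True
print(dna < dna)             # False
print(dna <= dna)            # True

# superset tests
print(nucleic_acids >= dna)           # True
print(nucleic_acids.issuperset(dna))  # True
```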
