# Data Analysis

# Python Intro, Part 3: Tuples, Sets, Dictionaries, Classes/Objects

In this notebook we will, after a quick recap, introduce more data structures: tuples, sets, dictionaries, and series. Tuples, sets, and dictionaries are built-in Python data structures, but we will mostly work with series and dataframes, which are part of the [pandas library](http://pandas.pydata.org/) tailored to data science applications. We will also briefly introduce objects and object-oriented programming. 

## If-Statements Recap

* By using if-elif-else statements we can implement conditional flow in a program. 
* `if` and `elif` take a condition that is tested for truthiness. The condition can be a boolean (`True` or `False`) or a value of any other data type. 
    * For numerical data types, 0 evaluates to false and everything else to true. `None` also evaluates to false.
    * For lists, strings, dictionaries, an empty container evaluates to false.

See the [documentation](https://docs.python.org/3/library/stdtypes.html#truth-value-testing) to understand what is considered true and false.
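
To make the truthiness rules above concrete, here is a small sketch that passes a few values to the built-in `bool()` function, which applies the same test that `if` and `elif` use. The sample values are chosen only for illustration.

```python
# bool() applies the same truth-value test that if/elif use
samples = [0, 0.0, None, "", [], {}, 1, -3.5, "0", [0]]
truthiness = [bool(value) for value in samples]

for value, result in zip(samples, truthiness):
    print(repr(value), "->", result)
```

Note that the string `"0"` and the list `[0]` are both truthy: they are non-empty containers, regardless of what they contain.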

In [None]:
def factors(x):
    # notice the use of the negation and the use of 0 as false
    if not x % 2:
        print("2 is a factor of " + str(x))  
    elif not x % 3:     # only evaluated when if was false
        print("3 is a factor of " + str(x))
    else: # only evaluated when both if and elif were false
        print("Neither 2 nor 3 are factors of " + str(x))

factors(4)
factors(9)
factors(13)


## Lists Recap

A list is a collection of items.

Lists are created with square brackets `[]` and can be accessed via an index:

In [None]:
beatles = ["Paul", "John", "George", "Ringo"]
# printing the whole list
print(beatles)
# printing the first element of that list, at index 0
print(beatles[0])
# fourth element, at index 3
print(beatles[3])
# access the second-to-last element
print(beatles[-2])

We can also create **slices of a list with the slice operator `:`**:

```python
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole array
```
There is also the step value, which can be used with any of the above:

```python
a[start:end:step] # start through not past end, by step
```
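
As a quick illustration of the step value, the following sketch slices a small list of letters. The list `letters` is made up for this example and is not used elsewhere in the notebook.

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]

every_second = letters[::2]      # whole list, taking every second item
middle_by_two = letters[1:6:2]   # indices 1, 3, and 5
reversed_copy = letters[::-1]    # a negative step walks backwards

print(every_second)
print(middle_by_two)
print(reversed_copy)
```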

In [None]:
# Get the slice from 0 (included) to 2 (excluded)
beatles[:2] # this can also be written as [0:2]

The slice operation returns a new list; the original list is untouched: 

In [None]:
beatles

**We can change the elements that are contained in a list**: 

In [None]:
beatles[1] = "JohnYoko"
beatles

Lists can also be **extended in-place with the `append()` method**:

In [None]:
beatles.append("George Martin")
beatles

Lists can be **concatenated**: 

In [None]:
zeppelin = ["Jimmy", "Robert", "John", "John"]
beatles += zeppelin
beatles

We can **check the length** of a list:

In [None]:
len(zeppelin)

Lists can also be **nested**: 

In [None]:
# let's reset the beatles first
beatles = ["Paul", "John", "George", "Ringo"]
bands = [beatles, zeppelin]
bands

### While Loop Recap

While loops use the `while` keyword, a condition, and the loop body:

In [None]:
a, b = 1, 1
while b < 1000:
    print(b, end=", ") 
    temp = b
    b += a
    a = temp
    # a better way of writing this is using simultaneous assignment: 
    # a, b = b, a + b

This continues until the condition (here, `b < 1000`) is no longer true. 

We can also **use the `break` statement to terminate a loop**: 

In [None]:
a, b = 1, 1
while True:
    print(b, end=", ") 
    a, b = b, a + b
    if b > 1000:
        break

### For Loop Recap


For loops are mainly used to iterate over items of a sequence. 

In [None]:
for member in zeppelin: 
    print(member)

When you want to iterate over a sequence of numbers, use the [`range()`](https://docs.python.org/3/library/stdtypes.html#range) function. Range generates a sequence of numbers:

In [None]:
for i in range(10): 
    print(i)

We can use the `start`, `stop`, and `step` parameters of `range(start, stop, step)` to create custom sequences, for example to count backwards:

In [None]:
for i in range(9, -1, -1):
    print(i)

### List Comprehension Recap

[List comprehension](https://docs.python.org/3.5/tutorial/datastructures.html#list-comprehensions) is a compact way to modify or create lists in Python. 

Here is the syntax: 

```python
[new_element for original_element in original_list] 
```
 * `new_element` can be any expression, including just a number, or an operation. You can also refer to the `original_element` in this expression. 
 * `original_element` is just a member of the original list. Sometimes we don't need this original element, then we just write `_` by convention. 
 * `original_list` is the list used as the basis for the list comprehension. This can be an existing list with data, or it can be a range expression. 
 
Here are some examples. Notice that you would just use a plain range function in practice for some of these examples.  

In [None]:
my_list = list(range(10))
my_list

Initialize a list with 0s. We don't use the `original_element`, hence `_`.

In [None]:
[0 for _ in my_list]

We can do the range in the list comprehension directly: 

In [None]:
[0 for _ in range(10)]

Here we use the `original_element`. This effectively copies the list. 

In [None]:
[original_element for original_element in my_list]

Here we use an operation on the `original_element` in the list. 

In [None]:
[original_element*2 for original_element in my_list]

Here I'm combining a slicing operation on `my_list` that inverts the order of `my_list` with a list comprehension. 

In [None]:
[original_element*2 for original_element in my_list[::-1]]

We've seen before that we can use functions in the list comprehension: 

In [None]:
[len(my_list) for _ in my_list]

Or, with a more interesting example:

In [None]:
import random
rands = [random.random() * 10 for _ in range(10)]
rands

## 1. Tuples

[Tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) are a list-like data structure that is, in contrast to lists, **immutable**. 

The purpose of tuples is to store objects of different types. Remember that lists should only contain **homogeneous data**, and NumPy arrays even enforce that; tuples are designed for the **heterogeneous case**. 

Tuples also have practical implications for performance and for hash tables: because a tuple is immutable, it can be hashed, as long as all of its elements are hashable. We will discuss this later. 
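
As a small preview of the connection to hash tables, the sketch below calls the built-in `hash()` function on a tuple and on a list. The values are arbitrary sample data, and the tuple syntax `(3, 4)` is explained just below.

```python
point = (3, 4)
# equal tuples always produce equal hash values
same_hash = hash(point) == hash((3, 4))
print(same_hash)

# a list is mutable and therefore cannot be hashed
try:
    hash([3, 4])
    list_is_hashable = True
except TypeError as error:
    list_is_hashable = False
    print("TypeError:", error)
```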

Here is how we can initialize a tuple: 

In [None]:
person = "Alex", 1981, "Computer Science"
person

Initialization with parentheses is preferred, since it's more explicit:

In [None]:
person = ("Alex", 1981, "Austria")
person

We can access the elements of a tuple just like those of a list: 

In [None]:
person[1]

We cannot, however, change values. Attempting to do so raises a **TypeError**.

In [None]:
# throws TypeError
person[1] = 1985

Arbitrary objects can be part of a tuple:

In [None]:
train_schedule = ("Train 1", [9,11])
# this works because we're modifying the mutable list within the immutable tuple.
train_schedule[1][0] = 15
train_schedule

Of course, that includes tuples:

In [None]:
train_schedule = ("Train 1", (9,11))
# this doesn't work
# train_schedule[1][0] = 15
train_schedule

Tuples also allow us to create functions with **multiple return values**.

Consider the following code:

In [None]:
def multiply(a, b, c):
    return (a*b), (a*c), (b*c), (a*b*c)

Here, it looks like we return multiple values - something that's not possible in most programming languages! But it's very convenient. In practice, we "only" return a tuple. 

Let's try it out:

In [None]:
multiply(3, 7, 11)

The round brackets in the returned values indicate what's going on: what is returned is, in fact, a tuple!

We can use this return value to assign multiple variables at the same time:

In [None]:
ab, ac, bc, abc = multiply(3, 7, 11)
print(ab, ac, bc, abc)

To do this, no function is necessary. We can just do the following:

In [None]:
what, i_s, going, on = "this", "is", "really", "nice" # use () to be more explicit
print(what, i_s, going, on)

## 2. Sets

A [set](https://docs.python.org/3/tutorial/datastructures.html#sets) is a mutable collection, similar to a list. However, it
 * is **not ordered**, and
 * **cannot contain the same element twice**.

Here is an example:

In [None]:
# Initialize a set with {}
beatles = {"John", "Paul", "Ringo", "George"}
beatles

Notice that on my machine, the output is in a different order from the input: 
`{'George', 'John', 'Paul', 'Ringo'}`

We can also initialize a set from a list or a tuple:

In [None]:
usernames = set(["Jimmy", "Robert", "John", "John"])
usernames

We've initialized the set `usernames` with a list of names. We have chosen a set because we don't want duplicate user names. 

However, **the list we passed in included a duplicate: John was specified twice**. We can see that **John is contained in the set only once**.

Sets are great for various tasks. For example, they can be used to remove duplicate entries from lists. Most importantly, they let you very efficiently check whether an element already exists. 
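
Removing duplicates from a list, as mentioned above, can be sketched as follows. The list `names` is sample data; `sorted()` is used here only to get back a list with a predictable order, since the set itself makes no promise about ordering.

```python
names = ["Jimmy", "Robert", "John", "John", "Jimmy"]

# converting to a set drops the duplicates
unique_names = set(names)
print(unique_names)

# sorted() turns the set back into a list, in alphabetical order
unique_sorted = sorted(unique_names)
print(unique_sorted)
```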

A set works based on a mathematical function that produces a "hash code". This hash code is then used as an index to an array. For example, "Jimmy" could hash to the value 13, and accordingly, Jimmy would be put at the 13th index of an array. When we want to test whether "Jimmy" is already in a set, we simply compute the hash, which will again produce 13, and then look up whether something is stored at index 13. 
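
The idea described above can be sketched with a toy implementation. This is a deliberately simplified illustration and not how Python's built-in `set` is actually implemented. The names `NUM_BUCKETS`, `bucket_index`, `toy_add`, and `toy_contains` are made up for this example.

```python
NUM_BUCKETS = 16  # arbitrary array size chosen for this sketch

def bucket_index(value):
    # hash() returns an integer; the modulo maps it onto a valid index
    return hash(value) % NUM_BUCKETS

# each bucket is a list, so values that land on the same index can coexist
buckets = [[] for _ in range(NUM_BUCKETS)]

def toy_add(value):
    bucket = buckets[bucket_index(value)]
    if value not in bucket:      # ignore duplicates, like a real set
        bucket.append(value)

def toy_contains(value):
    # only one bucket has to be searched, not the whole collection
    return value in buckets[bucket_index(value)]

for name in ["Jimmy", "Robert", "John", "John"]:
    toy_add(name)

print(toy_contains("Jimmy"))
print(toy_contains("Ringo"))
```

Because the hash value of a string can differ between Python sessions, the bucket a name ends up in is not fixed, but the lookup always recomputes the same index within one session.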

We can check whether a set contains a value using the `in` keyword:

In [None]:
"Jimmy" in usernames

In [None]:
"Ringo" in usernames

Note that `in` also works for lists, but for large collections a membership test on a list is considerably slower than on a set, because the list has to be searched element by element. 
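
The speed difference can be measured with the standard library's `timeit` module. The following is a rough sketch; the collection size and the number of repetitions are arbitrary choices, and the exact timings depend on your machine.

```python
import timeit

big_list = list(range(100_000))
big_set = set(big_list)
needle = 99_999  # worst case for the list: the very last element

# time 100 membership tests on each data structure
list_time = timeit.timeit(lambda: needle in big_list, number=100)
set_time = timeit.timeit(lambda: needle in big_set, number=100)

print("list:", list_time, "seconds")
print("set: ", set_time, "seconds")
```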

In [None]:
username_list = ["Jimmy", "Robert", "John", "John"]
"John" in username_list

We can add values using the `add()` method of a set:

In [None]:
usernames.add("JohnB")
usernames

And remove elements with the `remove()` method: 

In [None]:
usernames.remove("John")
usernames

If the set doesn't contain a key we want to remove, it will throw a `KeyError`.

In [None]:
usernames.remove("Joseph")

To prevent that, it is advisable to first check whether a set actually contains a value, if you're not 100% sure: 

In [None]:
if "Joseph" in usernames:
    usernames.remove("Joseph")
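
An alternative to checking first is the set method `discard()`, which removes an element if it is present and does nothing otherwise. A minimal sketch, using a separate set `demo_usernames` so that the `usernames` set from above is left alone:

```python
demo_usernames = {"Jimmy", "Robert", "JohnB"}

# discard() never raises a KeyError
demo_usernames.discard("Joseph")   # not in the set: nothing happens
demo_usernames.discard("Jimmy")    # in the set: removed

print(demo_usernames)
```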

We can iterate over the values of a set. Note, however, that no guarantee about the order of the set is made. 

In [None]:
for name in usernames:
    print(name)

Make sure to check out the [documentation](https://docs.python.org/3.5/library/stdtypes.html#set) to see what else a set can do. 

### Exercise 2: Sets

Write a function that finds the overlap of two sets and prints it. Initialize two sets, e.g., with values `{13, 25, 37, 45, 13}` and `{14, 25, 38, 8, 45}` and call this function with them.

In [None]:
# your code

## 3. Dictionaries

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are related to sets, but are more powerful: in addition to the key used to identify an element in a set, dictionaries also store a value associated with a key. Other terms commonly used for dictionaries are *associative arrays*, *(hash) maps*, and *hash tables*. 

Here is a simple example:

In [None]:
musicians = {"John":"Zeppelin", "Jimmy":"Zeppelin", "Paul":"Beatles", "Ringo":"Beatles"}
musicians

As we can see, a dictionary can be created with curly brackets and a comma-separated list of key-value pairs, where each key is separated from its value by a `:`. Here, the names are the keys and the bands are the values. 

There are other ways of creating a dictionary. Here, we pass a list of tuples to the dictionary, but we could also pass a list of lists.

In [None]:
more_musicians = dict([("Thom", "Radiohead"), ("Dave", "Foo Fighters")])
more_musicians

Of course, a dictionary is not limited to strings: values can be of any data type, and keys can be of any hashable type. Here is an example with ints as keys and floats as values:

In [None]:
numbers = {3:1.45, 4:1.32, 19:9.97, 6:9.99}
numbers

Note that it's generally not a good idea to use floats as keys, as they are stored only as approximations.
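
Here is a short sketch of how float keys can go wrong. The dictionary `prices` is a made-up example.

```python
prices = {0.3: "thirty cents"}

# mathematically 0.3, but the result is stored as a nearby approximation
computed_key = 0.1 + 0.2
print(computed_key)

# the lookup misses, because computed_key is not exactly equal to 0.3
key_found = computed_key in prices
print(key_found)
```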

Dictionary elements are accessed just as elements in a list, with square brackets, but instead of the index, we pass in the key: 

In [None]:
numbers[3]

In [None]:
musicians["John"]

We can add elements to a dict:

In [None]:
musicians["Thom"] = "Radiohead"
musicians

And remove them using the `del` keyword:

In [None]:
del musicians["Thom"]
musicians

Again, we have to worry about key errors. If we want to remove Thom again, we'd get a `KeyError`.

In [None]:
del musicians["Thom"]
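
There are a few ways to avoid the `KeyError`. The sketch below uses a separate sample dictionary `bands` and shows the dictionary methods `get()` and `pop()` with a default value, as well as a check with `in`.

```python
bands = {"John": "Zeppelin", "Paul": "Beatles"}

# get() returns a fallback value instead of raising a KeyError
fallback = bands.get("Thom", "unknown")
print(fallback)

# pop() with a default removes the key if present and never raises
removed = bands.pop("Thom", None)
print(removed)

# checking with `in` first also works, just as with sets
if "Paul" in bands:
    del bands["Paul"]
print(bands)
```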

We can access the keys and the values separately: 

In [None]:
musicians.keys()

Notice that the result is not a list or a set, but a [view object](https://docs.python.org/3/library/stdtypes.html#dict-views). A view object is always updated when the dictionary is changed, and we can use it to iterate over a dictionary. 

In [None]:
for musician in musicians.keys():
    print(musician)
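
To see that a view follows changes to its dictionary, consider this sketch with a separate sample dictionary `bands`. Wrapping the view in `list()` creates an independent snapshot instead.

```python
bands = {"John": "Zeppelin", "Paul": "Beatles"}
key_view = bands.keys()
print(key_view)

bands["Thom"] = "Radiohead"        # change the dictionary after creating the view
print(key_view)                    # the view now includes the new key

key_snapshot = list(bands.keys())  # a list is an independent copy
del bands["Thom"]
print(key_view)                    # the view follows the deletion
print(key_snapshot)                # the snapshot does not
```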

This also works with `values()` and `items()`:

In [None]:
musicians.values()

In [None]:
musicians.items()

The latter is especially handy for iterating over the key-value pairs in a dictionary:

In [None]:
# notice that we iterate over the tuples and have the elements of the tuple assigned to k and v, respectively.
for k, v in musicians.items():
    print(k + ", Band: " + v)

Another way to write the previous expression would be like this: 

In [None]:
for k in musicians.keys():
    print(k + ", Band: " +  musicians[k])

Make sure to check out [the dictionary documentation](https://docs.python.org/3/library/stdtypes.html#typesmapping) for more info. 

### Exercise 3: Dictionaries

 * Create a dictionary that maps the two-letter codes of two US states to their full names, e.g., UT: Utah, NY: New York.
 * After initially creating the dictionary, add two more states to the dictionary.
 * Create a second dictionary that maps the state codes to a list of cities in that state, e.g., UT: [Salt Lake City, Ogden, Provo, St. George]. 
 * Write a function that takes a state code and prints the full name of the state and lists the cities in that state.

In [None]:
# Your code

## 4. Classes and Objects
*Note that this is not a detailed introduction to OOP; we gloss over a lot of subtleties and use terminology loosely.*

We won't be actively doing much object-oriented programming in this class, but we will frequently use objects as they are returned by a library. 

**Objects** are data structures that you can customize completely. They also provide interfaces to manipulate their data. 

[**Object oriented programming**](https://en.wikipedia.org/wiki/Object-oriented_programming) is one of the most commonly used programming paradigms. It's based on bundling data together with functionality, i.e., it's a combination of a data structure and functions – called **methods** – that operate on the data of an object. 

**Classes** are templates or data types for **objects**. An object of a class is also called an **instance** of that class. 

Let's define a class:


In [None]:
class Person: 
    # a class variable, shared by all instances
    name = "blank"
    
    # a method setting the value of a member
    def set_name(self, name):
        # write the parameter name to the instance variable self.name
        # both "self" and "name" are arbitrary identifiers
        self.name = name
    
    # a method that takes no parameters besides self
    def print_name(self):
        print("Name:", self.name)

Notice the use of the `class` keyword to define the class. 

Methods are defined just like functions, but they have the `self` variable as their first parameter. The name of that variable is actually not relevant, but it's customary to call it `self`. It is a reference to the specific instance. You don't specify that variable when you call the method; it's provided for you automatically based on the object you're calling the method on. 

Here, we instantiate that class and set a parameter via a method; then use the `print_name()` method: 

In [None]:
ringo = Person()
# method without parameter
ringo.print_name()
# call a method with a parameter
ringo.set_name("Ringo")
ringo.print_name()
# accessing a class member
ringo.name

Here, we create a different person: 

In [None]:
paul = Person()
paul.set_name("Paul")
paul.print_name()
ringo.print_name()
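
A subtlety in the `Person` class is that `name` starts out as a class variable, while `set_name()` assigns to `self.name`, which creates an instance variable on that one object only. The following minimal sketch repeats a trimmed-down `Person` class to show the effect; the instance names are just examples.

```python
class Person:
    name = "blank"            # class variable, shared by all instances

    def set_name(self, name):
        self.name = name      # creates an instance variable on this object only

ringo = Person()
george = Person()
ringo.set_name("Ringo")

print(ringo.name)    # the instance variable set by set_name()
print(george.name)   # no instance variable, so the class variable is used
print(Person.name)   # the class variable itself is unchanged
```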

If we ask for the data type of our ringo variable, we'll see that it's an instance of our class:

In [None]:
type(ringo)

We can also use a shorthand to initialize objects with the required variables. We use the `__init__` method (the name matters here) to do that. 

In [None]:
class Musician: 
    # instantiation operation
    def __init__(self, name, instrument):
        # an instance variable, specific to that instance
        self.name = name
        self.instrument = instrument
    
    def print_musician(self):
        print(self.name, "plays", self.instrument)

With this definition, we create an object and at the same time specify its parameters. 

In [None]:
ringo = Musician("Ringo", "Drums")
ringo.print_musician()
ringo.instrument

Once `__init__` requires parameters, we have to provide them when creating an object. This will fail: 

In [None]:
# will throw a TypeError because we didn't pass the required arguments
paul = Musician()

A workaround is to use default values: 

In [None]:
class Musician: 
    def __init__(self, name="default", instrument="default"):
        # an instance variable, specific to that instance
        self.name = name
        self.instrument = instrument
    
    def print_musician(self):
        print(self.name, "plays", self.instrument)

In [None]:
paul = Musician()
paul.print_musician()

There is more to OO than what we covered here. For example, inheritance is a common paradigm that's also supported in Python. But what we've learned is enough to use objects that are provided by the libraries we'll be using. You can learn more from the [official documentation](https://docs.python.org/3/tutorial/classes.html). 
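
For the curious, here is a very small sketch of what inheritance looks like. The `Drummer` subclass and the `describe()` method are hypothetical names introduced only for this illustration; `Musician` is a trimmed-down version of the class above.

```python
class Musician:
    def __init__(self, name="default", instrument="default"):
        self.name = name
        self.instrument = instrument

    def describe(self):
        return self.name + " plays " + self.instrument

# Drummer inherits all methods and variables of Musician
class Drummer(Musician):
    def __init__(self, name="default"):
        # reuse the parent initializer and fix the instrument
        super().__init__(name, "Drums")

ringo = Drummer("Ringo")
print(ringo.describe())
print(isinstance(ringo, Musician))
```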

## 5. Working with Modules

While we briefly touched on modules (remember the `import math` statement), we haven't really talked about what a module is. Modules are used, well, to modularize code. You can write a module by simply creating a `.py` file. We won't be writing many modules ourselves, but we will use them extensively.

To import a module simply write

```python
import module_name
```

You can then use functions defined in the module with the `.` notation. Here's an example:

In [None]:
import math
math.sqrt(9)
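
To illustrate that a module really is just a `.py` file, the following sketch writes a tiny module to a temporary directory and then imports it. The module name `band_utils_demo` and its function `shout()` are made up for this example; normally you would simply create the file in an editor next to your notebook.

```python
import importlib
import os
import sys
import tempfile

# write a tiny module file into a temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "band_utils_demo.py")
with open(module_path, "w") as module_file:
    module_file.write("def shout(name):\n    return name.upper() + '!'\n")

# make the directory visible to the import system
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# the module name is the file name without the .py extension
import band_utils_demo

shouted = band_utils_demo.shout("Ringo")
print(shouted)
```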

We can also use the `from` notation to import specific functions from a module and add them directly to the namespace:

In [None]:
from math import log10
# notice that this is NOT accessed via math.log10()
log10(3)

You can also bulk-import all functions of a module into your local namespace. However, this is **strongly discouraged**, as it can lead to name clashes and eventually makes your code hard to read.

In [None]:
from math import * 
log2(3)
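
Here is a sketch of what such a name clash looks like. Both `math` and `cmath` (the standard library module for complex numbers) define a function called `sqrt`. The example imports the name explicitly to keep things visible; a bulk import would replace every shared name at once in the same silent way.

```python
from math import sqrt

try:
    sqrt(-1)                      # math.sqrt rejects negative numbers
    math_sqrt_failed = False
except ValueError:
    math_sqrt_failed = True

# a later import of the same name silently replaces the earlier one
from cmath import sqrt

result = sqrt(-1)                 # now the complex-number version is called
print(math_sqrt_failed, result)
```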

Finally, we can redefine the name of a module. This is useful to define a shorthand for long library names.

In [None]:
import math as m 
m.sqrt(13)