# Python Review I

This notebook goes over material covered in [Chapter 3 of Python for Data Analysis](https://wesmckinney.com/book/python-builtin). Use **Code** cells to write and run any code you need to answer the question and **Markdown** cells to write out answers in words. After you are finished with the assignment, remember to download it as an **HTML file** and submit it in **ELMS**.

## Objects
Recall that you create objects in Python by giving them **names**. You put the name you want to give an object, followed by an `=` sign, followed by the information you want in that object. For example, 
`x = 4` 
will create an object called `x` and assign it the value 4. Names must/should adhere to the following:
* Be brief and informative. 
* Start with a letter (or an underscore), not a number. 
* Be unique (unless you want to overwrite contents). 
* Cannot contain spaces (use underscores instead).

For example, `4numbers` is not a valid name, but `numbers4` and `four_numbers` are. 

In [None]:
number = 5

To retrieve the contents of an object you've created, you can use the `print()` function or type the object name. Be warned: **These do not do the same thing, even though they appear to.**

**Reminder:** The `#` symbol starts a comment in Python. Anything after `#` on the same line is ignored by the interpreter. 

In [None]:
x = 20 # create an object called x

In [None]:
print(x) # this displays printed output (print itself returns None)

In [None]:
x # this returns the actual object, which you can interact with
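One way to see the difference: a notebook displays a bare expression on the last line of a cell using its `repr()` (the form you could type back into Python), while `print()` writes the `str()` form and itself returns `None`. A minimal sketch, using a hypothetical string named `greeting`:

```python
greeting = "hello"

# print() writes the str() form: hello (no quotes)
result = print(greeting)

# print() only displays text; the value it gives back is None
print(result)            # None

# A bare `greeting` as the last line of a cell shows the repr() form instead
print(repr(greeting))    # 'hello' (with quotes)
```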

**Note**: A cell only displays the value of its last line, so if you want to view the contents of multiple objects in the same cell, you will need to **print** each one. The semicolon `;` allows us to stack multiple commands on the same line. 

In [None]:
print(number); print(x)

## Data Types 

The values you assign to objects have specific properties and a **data type**. There are a few different data types in Python, though how they are broken down and behave is not as "neat" as in some other languages (e.g., R). The core data types in Python are: 
* **Float**: Real numbers with a decimal point (e.g., `4.5`), stored with limited precision. 
* **Integer**: Whole numbers (e.g., `4`). Integers are exact, whereas floats can carry small rounding errors, but otherwise much of what you can do with integer and float data is the same.
* **String**: Character data, whether letters, sentences, or novels. Strings must be enclosed in quotation marks, either single `'` or double `"`, as long as the opening and closing marks match.
* **Boolean**: Logical data that take on the value `True` or `False`. (The casing of these letters is essential.)

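A quick illustration of why the float/integer distinction can matter: floats are stored in binary with limited precision, so some decimal values are only approximated, while Python integers stay exact at any size. The values below are arbitrary examples:

```python
import math

# Floats are approximations: 0.1 and 0.2 have no exact binary representation
total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare floats with a tolerance

# Integers are exact, even very large ones
big = 2 ** 100
print(big)                       # 1267650600228229401496703205376
print(type(big))                 # <class 'int'>
```
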
If you're not sure what type of data an object holds, you can check using the `type()` function.

**Note:** The semicolon `;` separates statements, so it allows us to put multiple Python commands on the same line. Without it, you cannot write two separate assignment statements (such as `a = 1` and `b = 2`) on one line. 

In [None]:
n_data = 4.5; print(type(n_data))
i_data = 4; print(type(i_data))
s_data = "4"; print(type(s_data))
b_data = True; print(type(b_data))

<font color = 'red'>**Question 1: Create each of the data types above (name them datatypename_data). Check the data type and try performing addition, multiplication, and division on the data and see what happens.**</font>

Understanding how Python classifies different observations and what you can do to those observations will prevent many errors (or cause them if you do not understand these differences well). Mastering this takes some patience, practice, and curiosity. It's okay to try and "break" Python. I try to do it all the time and learn something new! 

## Data Structures 

Python has many ways to store **data**. They each have different properties, and thus will be useful in their own ways. Some of these structures come from outside modules, so we will need to import a couple of modules first:

In [None]:
import numpy as np
import pandas as pd

For example, say we want to look at the heights of everyone in a group of 100 people. In that case, we might store the heights in an object called a *list* (or some other type of sequence) with 100 values. However, as we'll see, if we want to convert everyone's heights into different units, a list might not actually be the best means of storage. 
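To make the heights example concrete: with a list, `*` repeats the list rather than multiplying each value, so converting units needs a loop or a list comprehension (covered at the end of this notebook). A NumPy array applies arithmetic to every element at once. The heights below are made-up values in inches, and 2.54 is the inches-to-centimeters factor:

```python
import numpy as np

heights_in = [60, 65, 70]  # hypothetical heights in inches

# Multiplying a list by an integer repeats it; it does not convert the values
print(heights_in * 2)      # [60, 65, 70, 60, 65, 70]
# heights_in * 2.54 would raise a TypeError (a list can't be multiplied by a float)

# With a list, you convert element by element
heights_cm_list = [h * 2.54 for h in heights_in]

# With a NumPy array, the multiplication applies to every element
heights_cm_array = np.array(heights_in) * 2.54
print(heights_cm_array)
```
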

In this class, we will use many different ways to store collections of values:
* sets
* tuples
* dictionaries
* lists
* arrays (from *numpy*)
* Series (from *pandas*)

**Note**: They are sometimes interchangeable, so it can be easy to get mixed up. If you are getting an error, check what type of object you have using `type()`. 

In [None]:
nums = [1,2,3]
type(nums)

The properties of these data structures vary along a few key dimensions, including: 
* **Mutability**: Whether the structure can be altered or not. (i.e., Can you add, remove, or re-assign elements without creating a new object?)
* **Ordered**: Whether the position of the items is preserved. (i.e., Does the object "remember" the order you entered the items? Will the items always appear in the same sequence?)
* **Indexable**: Whether you can access a specific element using a key or an integer position. (e.g., Can you access a specific item using square brackets []?)
* **Allows duplicates**: Whether the object allows multiple instances of the same value. (e.g., If you add the number `3` to the object three times, does `3` appear once or three times?)

Let's see how the objects above compare on these dimensions. Understanding the ins and outs of these objects is paramount to improving your workflow, preventing errors, and ensuring good coding hygiene. 

## Sets

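**Sets** are mutable, but they are not ordered or indexable, and they do not allow duplicates. You create a set by putting values inside `{}`, e.g., `{1, 2, 3}`. (Empty braces `{}` create a dictionary instead; use `set()` for an empty set.)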

Sets are useful, for example, if you want to find all the unique values in a particular variable. You can convert the object to a set, which will drop all the duplicates. 

Sets are not very popular objects and we won't use them often this semester. 

In [None]:
# create sets and observe the difference

set1 = {1,1,1,1,1};print(set1); print(type(set1))
# notice only one value prints. Why?

set2 = {'a', 1, 2, True};print(set2); print(type(set2))
# notice that True is removed. Why? 
# knowing how data types are handled will help you greatly. 

# generates an error because sets are not indexable
# set2[2]
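Because sets are mutable, you can add and remove elements in place. A small sketch with made-up color names:

```python
# Start from an empty set with set(); empty braces {} would create a dictionary
colors = set()
colors.add('red')
colors.add('blue')
colors.add('red')                  # 'red' is already present, so nothing changes
colors.discard('blue')             # remove 'blue' in place
colors.update(['green', 'green'])  # add several values; duplicates are dropped
print(colors)                      # order may vary, e.g. {'red', 'green'}
print(len(colors))                 # 2
```
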

You can find how many elements are in a collection using the `len()` function. 
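`len()` works on each of the built-in collections covered here, and on strings too. A few quick checks with arbitrary values:

```python
print(len([1, 2, 3]))   # 3
print(len((1, 2)))      # 2
print(len({1, 1, 2}))   # 2: the duplicate 1 is dropped when the set is created
print(len("hello"))     # 5: for strings, len() counts characters
```
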

## Tuples

**Tuples** are fixed-length, immutable sequences. 
* Fixed-length: Cannot change length in place.
* Immutable: Cannot change values in place.

You initiate tuples using ().

We'll use tuples a lot with functions to return and store multiple values. Note that we would not generally use tuples to accumulate data within loops because they are fixed-length and immutable. 

In [None]:
tup = (1,2,3)
print(tup)
print(type(tup))

You can access individual values of a tuple using the **index** of that value. For example, `tup[2]` returns the THIRD element in the object, because TWO elements come before it. **Remember, Python starts indexing at 0!** (This is not intuitive for some people, especially those coming from 1-based indexing languages like R.)

In [None]:
tup[2]

Note that you can use negative numbers to count backwards from the end. So, the last value would be the "-1" index, the second to last would be the "-2" index, and so on.

In [None]:
tup[-1]

Generally, Python interprets values separated by commas as a tuple, even without the surrounding parentheses. 

In [None]:
tup2 = 1, 2, 3
print(tup2)
print(type(tup2))
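Since the comma is what makes a tuple, a tuple with a single element needs a trailing comma; parentheses alone only group an expression. A short sketch:

```python
not_a_tuple = (5)
print(type(not_a_tuple))   # <class 'int'>: the parentheses are just grouping

one_tuple = (5,)           # the trailing comma is what makes a tuple
print(type(one_tuple))     # <class 'tuple'>

also_tuple = 5,            # works without parentheses too
print(also_tuple)          # (5,)
```
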

Tuples are immutable, meaning they cannot be changed in place. To "replace" a value, you need to create a new tuple that contains the values you want. 

In [None]:
# This does not work: assigning to a tuple position raises a TypeError
tup2[2] = 5
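One way to build the "replacement" tuple is to take the part you want to keep and join it with a new tuple. This sketch uses slicing (`tup2[:2]` means "the first two elements"), which is one option among several:

```python
tup2 = (1, 2, 3)

# tup2[:2] takes the first two elements; + joins two tuples into a new one
new_tup = tup2[:2] + (5,)
print(new_tup)   # (1, 2, 5)
print(tup2)      # (1, 2, 3): the original tuple is unchanged
```
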

You can also "unpack" tuples. This gives us an easy way of accessing individual elements of a tuple without necessarily needing to assign everything individually. 

In [None]:
a, b, c = (1, 2, 3)
# print a, b, and c below


We'll see this happening with functions and having multiple outputs that we want to unpack.
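As a preview, here is a sketch with a hypothetical helper `min_and_max` that returns two values as a tuple, which we then unpack into two names. The last lines show that unpacking also gives a one-line swap:

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple (hypothetical helper)."""
    return min(values), max(values)   # the comma makes this a tuple

low, high = min_and_max([4, 1, 7, 3])
print(low, high)   # 1 7

# Unpacking also lets you swap two names in one line
low, high = high, low
print(low, high)   # 7 1
```
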

<font color = 'red'>**Question 2a: Consider the tuple below. What is the number of elements in the tuple? How would you access the value of `'example'`?**</font> 

Think about the answers to these questions. Try using the `len` function to get the answer to the first question. Is this what you expected? 

**Hint**: You can "stack" indices. 

In [None]:
some_tuple = (1,2,3), ('this','is','an','example')

You can convert an object to a different type of object using the function named after the type you want: `tuple()`, `set()`, `list()`, etc. Let's convert the set `set2` from the first section to a tuple. 

In [None]:
tup_ex = tuple(set2)
tup_ex

<font color = 'red'> **Question 2b: Consider the tuple of GPAs below. Say I want to obtain the unique observations. How could I do that only by converting objects? (Make sure the final output is a tuple of the unique values.) How many observations were removed?** </font> 

In [None]:
gpas = (4.0, 3.74, 3.74, 2.0, 3.8, 4.0, 3.0, 3.1, 3.14, 4.0)



## Lists

Lists are variable-length and mutable. This means you can add or change values within lists. This makes lists very useful for storing values when using a loop, especially since lists can contain a variety of objects. Lists are one of the most common ways of storing sequences that we'll use, so make sure you are familiar with the properties of lists!

**Note**: You initiate lists with [].

In [None]:
# create a list
example = [1, 2, 3]
print(example) # return the contents
print(type(example)) # check the object type

Just like tuples, you can access individual values of the list using the **index** of the value with the bracket notation. 

In [None]:
# returns the first element since 0 elements come before it
example[0]

However, unlike tuples, lists are mutable.

In [None]:
# we can change specific values in lists
example[0] = 3
example

You can also create lists by using the `list()` function, just as `tuple()` turns something into a tuple.

> This might be useful if you have a tuple of some sort and need it to have properties of a list. For example, if you want to change some values within a tuple and need it to be mutable, then converting it to a list might make sense.

In [None]:
print(list((1,2,3))) # turns the internal tuple into a list

# it also works for objects
x = (1,2,3)
print(list(x))

In [None]:
print(tuple([1,2,3])) # turns the internal list into a tuple (print is needed because this is not the last line)

# it also works for objects
y = [1,2,3]
print(tuple(y))

<font color = 'red'>**Question 3: Consider the list below. What is the number of elements in the list? What types of objects are in the list? How would you access the value of `'example'`? Can you change the value of `'an'` to `'another'`?**</font>

**Hint**: You can also stack indices for lists.

In [None]:
some_list = [(1,2,3), ('this','is','an','example'),[1, 2, 3]]

### Notes about lists 
* Lists are indexed positionally, and therefore have a notion of position/order.
* This allows you to do things like sort or find by position (e.g., "first" or "last").
* Lists are mutable, so you can change the values inside. 
* The `+` operator combines (concatenates) lists into a new list.

In [None]:
[1, 2, 3] + [4, 5, 6]
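Note that `+` builds a brand-new list; the lists you combined are left unchanged. A quick check with arbitrary values:

```python
first = [1, 2, 3]
combined = first + [4, 5, 6]   # + creates a new list
print(combined)                # [1, 2, 3, 4, 5, 6]
print(first)                   # [1, 2, 3]: the original list is unchanged
```
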

### Some List methods
Recall that methods are functions that belong to specific types of objects and are called with a dot after the object's name. Below, you can see some methods that belong to lists. 

|Method | Description|
|---|---|
|.append() | Append a value to the list|
|.count() | Returns the number of elements with the specified value|
|.pop() | Remove the element at the specified position (the last one by default) and return it|
|.sort()|Sort the list in place|

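The `.count()` and `.sort()` methods are not used elsewhere in this notebook, so here is a small sketch with made-up scores. One detail worth knowing: `.sort()` changes the list itself and returns `None`, while the built-in `sorted()` function returns a new sorted list:

```python
scores = [3, 1, 2, 3, 3]

print(scores.count(3))   # 3: the value 3 appears three times

result = scores.sort()   # sorts the list in place
print(scores)            # [1, 2, 3, 3, 3]
print(result)            # None: sort() changes the list but returns nothing

# sorted() returns a new list and leaves its input alone
print(sorted([3, 1, 2], reverse=True))  # [3, 2, 1]
```
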
The `append` method is commonly used with loops. We first initialize an empty list using `[]`, then append values as we go through the loop. **Note:** We will go over loops in more detail next lecture. 

In [None]:
output = []
for i in range(10):
    output.append(i * 2)
output

The `pop` method does roughly the opposite of `append`: it removes the element at the specified position and returns that element.

In [None]:
example = [1, 2, 3]
example.pop(1)

In [None]:
example
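If you call `pop()` without a position, it removes and returns the last element. A short sketch with arbitrary values:

```python
example = [10, 20, 30]

last = example.pop()     # no position given: removes the last element
print(last)              # 30
print(example)           # [10, 20]

first = example.pop(0)   # position 0 is the first element
print(first, example)    # 10 [20]
```
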

<font color = 'red'>**Question 4: Create a list that contains all the powers of 3 starting with 3^0, all the way up to 3^10. Call this list `powers_of_three`.**</font>

<font color = 'red'>**Question 5: Calculate the mean of `powers_of_three` from the previous question.**</font>

**Hint:** You can use the `np.mean` function from numpy for this.

## Dictionaries

A **dictionary** stores a collection of *key-value* pairs. Each key is associated with a value, and you can access a value stored in a dictionary by using its key. Values can be any type of object; keys must be immutable types such as strings, numbers, or tuples. 

Intuitively, a dictionary works much like a real-life dictionary of words: you look up a word (a key) to find its definition (the value). In the same way, we access the data inside a dictionary by looking up the key associated with that piece of data.

Dictionaries are useful because they allow us to store data under informative keys. Rather than remembering which list position we chose for a particular attribute, we can label each value with a meaningful name, i.e., a key!

In [None]:
example_dict = {'a': (1, 2, 3), 'b': (2, 3, 4)}

The key parts of a dictionary are:

- The `{ }` curly braces which indicate that it's a dictionary (similar to `""` for strings, or `[]` for lists)
- Each entry maps a key on the left of a `:` to a value on the right. For example, our first entry maps the key `'a'` to the value `(1, 2, 3)`.
- We include commas, similar to lists, to separate entries in the dictionary.
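A minimal sketch of everyday dictionary operations, using made-up student data: looking up a value by key, adding a new key, replacing an existing value, and checking whether a key exists. The last part shows that a mutable object such as a list cannot be used as a key:

```python
student = {'name': 'Ana', 'gpa': 3.74}   # made-up example data

print(student['name'])        # Ana: look up a value by its key
student['year'] = 2           # assigning to a new key adds an entry
student['gpa'] = 3.8          # assigning to an existing key replaces its value
print(student)                # {'name': 'Ana', 'gpa': 3.8, 'year': 2}
print('major' in student)     # False: `in` checks the keys

# Keys must be immutable (hashable) types such as strings, numbers, or tuples
try:
    bad = {[1, 2]: 'list key'}
except TypeError as err:
    print('TypeError:', err)  # lists are mutable, so they cannot be keys
```
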

Dictionaries are very flexible, and the objects stored inside a dictionary do not all need to be the same type.

In [None]:
example_dict2 = {'a': (1, 2, 3), 'b': 'some text', 3: 'c' }

<font color = 'red'>**Question 6: What type of object is `example_dict['a']`? What about `example_dict2['b']`?**</font>

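Note: Unlike lists, dictionaries are **not indexed by position**. They are organized according to their *keys*, rather than by integer *indices*. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, but you still access values by key.)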
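A quick check of what "not indexed by position" means in practice, with an arbitrary two-entry dictionary: the keys come back in insertion order, but `codes[0]` looks for a *key* named `0`, not the first entry:

```python
codes = {'b': 2, 'a': 1}

print(list(codes.keys()))   # ['b', 'a']: insertion order (Python 3.7+)
print(codes['a'])           # 1: access by key

# There is no positional index: codes[0] looks for the KEY 0
try:
    codes[0]
except KeyError:
    print('KeyError: 0 is not a key in this dictionary')
```
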

<font color = 'red'>**Question 7: Consider the tuple `tuple_of_numbers` below. Create a dictionary called `even_odd_dict` with two <u>*lists*</u>: a list of the even numbers from `tuple_of_numbers` and a list of the odd numbers from `tuple_of_numbers`. Use the keys `even` and `odd` for these two lists. Write your code so that you aren't hard-coding the values in `tuple_of_numbers`; if you were given another sequence, it should also separate the odd and even numbers.**</font>

*Hint:* One useful operator might be `%` which is the modulo operator—the remainder of division. 
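For reference, `%` gives the remainder after division, and a remainder of 0 means the first number divides evenly by the second. A few arbitrary examples:

```python
print(7 % 2)    # 1, because 7 = 3*2 + 1
print(10 % 3)   # 1, because 10 = 3*3 + 1
print(12 % 4)   # 0, because 4 divides 12 evenly
```
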

In [None]:
tuple_of_numbers = (1, 2, 2, 3, 4, 5, 17, 4, 12, 0, 32, 23, 15, 12, 121, 44, 21, 53) 

<font color = 'red'>**Question 8: Check your code above by creating a list of the integers from 1 to 100 and running the same code on that list.**</font>

## List Comprehension

**List comprehension** is a quick, concise way of constructing a list based on a specific structure. It looks similar to a *for loop*, but is written entirely inside square brackets.

In [None]:
[2*x for x in range(5)]

**Recall:** Loop structure looks like:

    for i in <range>:
        <some expression>
        
List comprehension would look something like this:

    [<some expression> for i in <range>]

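To see that the two structures produce the same result, here is a loop and a comprehension side by side. Comprehensions are not limited to `range()`; this sketch loops over a list of made-up names:

```python
names = ['ana', 'ben', 'cai']   # made-up example data

# Loop version: start with an empty list and append each result
upper_loop = []
for name in names:
    upper_loop.append(name.upper())

# Comprehension version: the same expression and loop, inside square brackets
upper_comp = [name.upper() for name in names]

print(upper_comp)                  # ['ANA', 'BEN', 'CAI']
print(upper_loop == upper_comp)    # True
```
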
<font color='red'>**Question 9: Consider the following code. Use list comprehension to get the same result in one line of code.**</font>

In [None]:
values = []
for i in range(10):
    new_value = 2*i + 5
    values.append(new_value)
values

<font color = 'red'>**Question 10: Using list comprehension, create a list that contains all of the powers of 2 from 2^0 to 2^10. In other words, the list should contain the values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024.**</font>

### List Comprehension with Conditionals

In the next lecture, we will also go into more detail about conditional statements. However, we will rely on your past knowledge for the application in this question. 

We might also have a loop structure that looks like this:

    for i in <range>:
        if <some conditional>:
            <some expression>
        
List comprehension can be used in this case:

    [<some expression> for i in <range> if <some conditional>]

In [None]:
[2*x for x in range(10) if x > 5] 
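The same equivalence holds with a conditional: the `if` clause moves to the end of the comprehension. A sketch that keeps only the words longer than three characters from a made-up list:

```python
words = ['data', 'is', 'fun', 'python', 'a']   # made-up example data

# Loop version
long_loop = []
for w in words:
    if len(w) > 3:
        long_loop.append(w)

# Comprehension version: the if clause goes at the end
long_comp = [w for w in words if len(w) > 3]

print(long_comp)                 # ['data', 'python']
print(long_loop == long_comp)    # True
```
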

<font color = 'red'>**Question 11: Repeat Question 7 using list comprehension to create the dictionary with two lists.**</font>