# Lab Session 2 - Practicing Python basics


## Objectives

#### Review and practice the Python basics covered in this week's lecture notebook


* **Working with Python**
    * *LISTS*
    * *DICTIONARIES*
    * *LOOPS*
    * *CONDITIONS*
    * *FUNCTIONS*


## 0. Setup

In [1]:
%matplotlib inline

import random
import matplotlib.pyplot as plt

## 1. Lists

* A __LIST__ is an ordered sequence of objects


* You can create a `list` object in a number of ways:
    1. use square brackets to mark the boundaries of the list and put the items inside, separated by commas
    2. start with an empty list (`[]`) and add items to it one at a time with the `.append()` method

#### 1. Create a list and define the items in it at the same time

In [2]:
my_list = [11,23,45,1,5]

In [3]:
print(my_list)

[11, 23, 45, 1, 5]


In [4]:
len(my_list)

5

#### 2. Create an empty list and add items to it one at a time with `append()`

In [5]:
my_list2 = []

In [6]:
print(my_list2)

[]


In [7]:
my_list2.append(11)

In [8]:
print(my_list2)

[11]


In [9]:
my_list2.append(23)
my_list2.append(45)
my_list2.append(1)
my_list2.append(5)

In [10]:
print(my_list2)

[11, 23, 45, 1, 5]


* You can see that we end up with two list objects with the same content

In [11]:
my_list == my_list2

True
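
The `==` comparison above checks that the two lists hold the same items in the same order; it does not mean they are the same object. A minimal sketch of the difference, using the hypothetical names `a` and `b` rather than the lists defined above:

```python
# Hypothetical lists, built in the same two ways as above
a = [1, 2, 3]

b = []
for value in (1, 2, 3):
    b.append(value)

same_content = a == b   # compares the items pairwise
same_object = a is b    # checks whether both names point to one object

print(same_content)
print(same_object)
```

Here `same_content` is `True` while `same_object` is `False`, so changing one list later would not change the other.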

### List indexing

* Items in a list can be accessed/referenced using __INDEXING__.

 
* `my_list = [11, 23, 45, 1, 5]`

    * `my_list[0]` will return the first item in the list, the integer object with value `11`


* **REMEMBER** the first item in the list has an index of `0` (zero). 


* Use indexing on `my_list` to retrieve the following items:
  1. The integer object with value `45`
  2. The fourth item in the list
  3. The integer object with value `23`
  4. The last item in the list
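
As a warm-up for the items above, here is a short sketch of indexing on a different, hypothetical list called `colours` (it is not part of the lab data). It also shows negative indices, which count back from the end of the list.

```python
# Hypothetical list, not the one used in the exercise
colours = ['red', 'green', 'blue', 'yellow']

first = colours[0]    # index 0 is the first item
third = colours[2]    # index 2 is the third item
last = colours[-1]    # negative indices count from the end

print(first, third, last)
```

Asking for an index that does not exist, such as `colours[10]`, raises an `IndexError`.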

In [12]:
#  1. The integer object with value `45`
  

In [13]:
# 2. The fourth item in the list


In [14]:
# 3. The integer object with value `23`


In [15]:
# 4. The last item in the list


### Adding comments to code cells

* In notebooks we can interleave text and code using different types of cells:
    * `Markdown` cells for text and documentation and explanation
    * `Code` cells for Python code we want to be executed


* It is often useful to mix short explanations in with the code itself to say what it does. We can use __COMMENTS__ to do this.


* In Python, a comment starts with `#` and runs to the end of the line

In [16]:
# This is a comment (text in a code cell) and will not be interpreted as Python code

* If we put the same text in a code cell without the `#` comment marker, we get an error.

In [18]:
This is a comment and will not be interpreted as Python code

SyntaxError: invalid syntax (<ipython-input-18-db3f4ab24d74>, line 1)

* Here is an example of an extended block of Python code with comments included:

In [19]:
# define an empty list
my_list3 = []

# add the numbers 16, 154 and 111 to the list
my_list3.append(16)
my_list3.append(154)
my_list3.append(111)

# print the length of the list
print('my_list3 has {} items'.format(len(my_list3)))

# print the values
print('The values in my_list3 are:', my_list3)

my_list3 has 3 items
The values in my_list3 are: [16, 154, 111]


* It is good practice to include comments in your code so that:
    1. you'll be able to remember what you were doing when you come back to your code later
    2. others can read, use and expand on your code because they understand what it does
    
    
* When you are trying to solve a problem in a series of coded steps, first write those steps out as comments. You then have a template with space to fill in the code step by step.
    * For example, the code block above would begin like this:

In [20]:
# define an empty list


# add the numbers 16, 154 and 111 to the list


# print the length of the list


# print the values


### Getting subsequences of items from a list using __SLICING__

* Subsequences of items in a list can be accessed using __SLICING__.
    * You specify a _start_ and _end index_ separated by a colon (`:`) to define a slice
 
 
* `my_list = [11, 23, 45, 1, 5]`

    * `my_list[1:3]` will return the second (index `1`) and third (index `2`) items in the list, the integer objects with values `23` and `45`



* **REMEMBER** one way to think about how slicing works in Python is that it is:
    1. start index _inclusive_
    2. end index _exclusive_
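
The inclusive-start, exclusive-end rule can be sketched on a hypothetical list called `letters` (not part of the lab data). The sketch also shows that either index can be left out, and that negative indices work in slices too.

```python
# Hypothetical list, not the one used in the exercise
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:3]     # starts at index 1, stops before index 3
first_two = letters[:2]   # a missing start index means "from the beginning"
last_two = letters[-2:]   # a missing end index means "to the end"

print(middle)
print(first_two)
print(last_two)
```

A handy check: the length of `letters[s:e]` is `e - s` whenever both indices fall inside the list, so `letters[1:3]` has two items.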

## TASK

* Here is a new list object containing numbers sampled from a list of integers from 0 to 99.

In [21]:
random_list = [18, 72, 27, 61, 83, 19, 21, 36, 40, 94, 75, 6, 29, 80, 96, 89, 68, 78]

In [22]:
# 1. how many items are in this list


In [23]:
# 2. define a slice that will return the subsequence [61, 83, 19, 21]
# using random_list[s:e] where s & e are indices


In [24]:
# 3. define a slice that will return the subsequence [96, 89]
# using random_list[s:e] where s & e are indices


In [25]:
# 4. define a slice that will return the last 5 items in the list


In [26]:
# 5. define a slice that will return the first 10 items in the list


## TASK

* Here is a table of data relating to the number of US Facebook users

<figure style="width:300px; padding: 10px; border: 1px solid gray">
<img src="img/fb2018_users.png"/>

<figcaption>Number of Facebook users in the United States as of January 2018, by age group (in millions)</figcaption>

</figure>


1. Define a list of string objects with a pointer named `age_group` for the first column **Age group**
2. Define a list of numeric objects with a pointer named `users` for the second column **Users**
3. Create a bar plot using these lists, that should look like this:

![](img/barplot1.png)

In [None]:
# 1. Define a list of string objects with a pointer named `age_group` for the first column **Age group** 

age_group =  # your code here

In [None]:
# 2. Define a list of numeric objects with a pointer named `users` for the second column **Users**

users = # your code here

In [None]:
# 3. create a bar plot

plt.bar(age_group, users)

* This plot is hard to interpret because it has
    1. no labels on either axis
    2. no title
    
    
* Add those three things (an x-axis label, a y-axis label and a title) to the plot so it looks like this:

![](img/barplot2.png)

In [None]:
# 4. create an improved barplot
plt.bar(age_group, users)


# use plt.xlabel('...') to add a label to the x-axis


# use plt.ylabel('...') to add a label to the y-axis


# use plt.title('...') to add a title to the barplot



* Try switching the plotting function from
    * `plt.bar()` to `plt.barh()`
    
    
* Fix the labels for this new plot

In [None]:
# 5. create a second plot with labels using plt.barh() and add appropriate axes labels

### ADD YOUR CODE HERE



## TASK

1. Create a new list called `age_group2` with string values `13-17`, `18-34`, `35-54` and `55-64+`.


2. Create a new list called `users2` derived from `users` that has the number of users corresponding to these four groups.
    * **HINT** use __SLICING__ on values in `users` followed by the `sum()` function to combine the 2nd & 3rd, 4th & 5th and 6th & 7th items in `users`


3. Using the two new lists:
    1. `age_group2`
    2. `users2`
    
   create a bar plot that looks like this:

![](img/barplot3.png)
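
The hint about combining slicing with `sum()` can be sketched on a hypothetical list called `scores` (the numbers are made up and are not the Facebook data). Each slice picks out a pair of neighbouring items, and `sum()` adds them up.

```python
# Hypothetical values, not the lab data
scores = [4, 10, 6, 3, 7]

# add up the 2nd and 3rd items (indices 1 and 2)
pair_total = sum(scores[1:3])

# keep the first item as it is and combine the remaining pairs
grouped = [scores[0], sum(scores[1:3]), sum(scores[3:5])]

print(pair_total)
print(grouped)
```

The grouped list is shorter than the original, so any list of labels plotted against it needs the same number of items.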

In [None]:
# 1. create a new list age_group2 with the values '13-17', '18-34', '35-54', '55-64+'



In [None]:
# 2. create a list of four numbers called users2 derived from users that correspond to the labels in age_group2



In [None]:
# 3. create a bar plot like the one above


In [None]:
# 4. What is the total number of users captured in these data?



## 2. Dictionaries

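* A `dictionary` is a collection of **KEY**-**VALUE** pairs. Items are looked up by key rather than by position (since Python 3.7 a dictionary also remembers the order in which items were added)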


* You can define one with curly braces `{` `}`


* With a pair in the format: `key : value`


* And pairs are separated by commas


In [None]:
story_items = {
         'bear': 3,
         'girl': 1,
         'table': 1,
         'chair': 3,
         'bowl': 3,
         'bed': 3,
         'house': 1
        }

* You can get the value of an item (key-value pair) in a dictionary by using the key as the index, e.g.
    * `story_items['chair']` will find the item in the dictionary object with a key `chair` and return the associated value `3`

In [None]:
story_items['chair']

* You will get a `KeyError` if you try to retrieve the value for a key that does not exist in the dictionary.

In [None]:
story_items['tree']
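
The lookup above fails because the key is missing. A minimal sketch of two common ways to handle a key that may not exist, using a hypothetical dictionary called `fruit_counts` (not part of the lab data): the `in` operator tests for a key, and the `.get()` method returns a default value instead of raising an error.

```python
# Hypothetical dictionary, not the one used in the exercise
fruit_counts = {'apple': 2, 'pear': 5}

has_plum = 'plum' in fruit_counts          # membership test on the keys
plum_count = fruit_counts.get('plum', 0)   # falls back to 0 if the key is missing

# plain indexing raises KeyError for a missing key
try:
    fruit_counts['plum']
except KeyError as err:
    message = 'missing key: {}'.format(err)

print(has_plum, plum_count)
print(message)
```

Which of the two to use is a matter of taste: `in` reads well in a condition, while `.get()` is shorter when a sensible default exists.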

* You can add a new key-value pair to a dictionary like this:

In [None]:
story_items['forest'] = 1

In [None]:
story_items

* The value in a dictionary can be any kind of object including a `list` or even another `dictionary`

In [None]:
goldilocks_story = {
    'bears': ['papa', 'mama', 'baby'],
    'girls': ['goldilocks'],
    'chairs': ["papa bear's", "mama bear's", "baby bear's"],
    'bowls': ["papa bear's", "mama bear's", "baby bear's"],
    'beds': ["papa bear's", "mama bear's", "baby bear's"]
}

## TASK

* Retrieve the values from `story_items` corresponding to the following keys:
    * `bear`
    * `girl`
    * `table`

In [None]:
# how many bears are there? - use indexing on the dictionary story_items to answer



In [None]:
# how many girls?


In [None]:
# how many tables?


* Now do the same with the second dictionary, `goldilocks_story`. Indexing it returns a list, and you can then find the length of that list


* For example, to get a list of the bowls in the story do:
  * `goldilocks_story['bowls']`
  * then use `len()`
  
  

In [None]:
# using goldilocks_story dictionary - how many chairs are there?


In [None]:
# using goldilocks_story dictionary - how many beds are there?


* You can also add a second level of indexing as the value returned is a list.


* For example:

In [None]:
goldilocks_story['beds']

In [None]:
# who does the second bed in the list belong to?

goldilocks_story['beds'][1]

In [None]:
# who does the first chair belong to?


In [None]:
# who does the last bowl belong to?


## TASK

* Using the table below create two lists
    1. `female` should be a list of the percentages in the second column
    2. `male` should be a list of the percentages in the third column

   _Note_ you should already have the first column in a list called `age_group`   


<figure style="width:300px; padding: 10px; border: 1px solid gray">
<img src="img/fb_sharing_may2019.png"/>

<figcaption>Distribution of Facebook users in the United States as of May 2019, by age group and gender
</figcaption>

</figure>

<small>Source: https://www.statista.com/statistics/187041/us-user-age-distribution-on-facebook/</small>


* Now create a dictionary with a pointer called `fb_data` that has:
    * three keys:
        1. `age_group`
        2. `female`
        3. `male`
    * and the matching values should be the corresponding lists

In [None]:
# create a list called female with the numbers from the second column of the table



# create a list called male with the numbers from the third column of the table


In [None]:
# create a dictionary with a pointer called fb_data with three items with keys age_group, female & male
# and the corresponding values for these keys will be the three lists

fb_data = {
            # add your items here
          }

* Run the following cell to test whether your dictionary contains the correct values

In [None]:
# expected contents of fb_data, taken from the table above
expected = {
    'age_group': ['13-17', '18-24', '25-34', '35-44', '45-54', '55-64', '65+'],
    'female': [1.0, 7.6, 12.5, 10.4, 9.3, 7.6, 7.1],
    'male': [0.7, 6.5, 12.0, 8.7, 7.1, 5.2, 4.1],
}
if fb_data == expected:
    print(':) Great, your dictionary is correct!')
else:
    print(':( Something is wrong with your dictionary - try again')


* Run the following cell to produce a grouped bar plot


* Add the x & y axis labels and a main title to the plot

In [None]:
bar_width = 0.4

x1 = range(len(fb_data['female']))
x2 = [p+bar_width for p in x1]
xpos = [p+bar_width/2 for p in x1]

plt.bar(x1, fb_data['female'], width=bar_width, label='Female')
plt.bar(x2, fb_data['male'], width=bar_width, label='Male')

plt.xticks(xpos, age_group)

# use plt.xlabel() to add a label "Age group" to the x-axis

# use plt.ylabel() to add a label "% of users by gender" to the y-axis

# use plt.title() to add an appropriate title to the plot


plt.legend()
plt.show()

## TASK

* Here are two lists of data on grizzly bears in Yellowstone: the years for which data are available, and the number of bears monitored in each of those years

In [None]:
years = ['1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', 
        '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', 
        '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', 
        '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017']


monitored = [34, 43, 46, 26, 35, 21, 29, 30, 46, 40, 35, 42, 41, 43, 60, 71, 76, 70, 58, 
             65, 84, 82, 81, 80, 78, 91, 92, 86, 87, 97, 85, 92, 112, 88, 94, 101, 106, 99]

* Use functions to calculate
    1. total number of bears monitored
    2. the average number
    
    
* Using slicing and functions make three new lists
    1. `decade` which will be `'1980-1989', '1990-1999', etc.`
    2. `decade_monitored` - the sum of all the bears monitored in the corresponding years
    3. `decade_mean` - the mean of the bears monitored in the corresponding years
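
The pattern behind these steps can be sketched on a small hypothetical dataset (the names `labels` and `counts` and all the numbers are made up, not the Yellowstone data). `sum()` gives a total, dividing by `len()` gives a mean, and a slice restricts both to one period.

```python
# Hypothetical data: six years and a count for each year
labels = ['2000', '2001', '2002', '2003', '2004', '2005']
counts = [3, 5, 4, 8, 6, 10]

# total and mean over the whole list
total = sum(counts)
mean = total / len(counts)

# the same calculations for the first three years only
first_period = counts[:3]
period_label = labels[0] + '-' + labels[2]
period_total = sum(first_period)
period_mean = period_total / len(first_period)

print(total, mean)
print(period_label, period_total, period_mean)
```

Repeating the last four lines with a different slice for each period and appending the results to empty lists is one way to build up the aggregated lists.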

In [None]:
# total number of bears monitored



In [None]:
# average number of bears monitored in a year


In [None]:
# create a new list called decade


In [None]:
# create decade_monitored



In [None]:
# create decade_mean



* Create some bar plots using these lists

* Look at the lecture notebook and create some line plots of
    1. the complete year by year dataset (using `years` and `monitored`)
    2. the decade aggregated data