## What We Looked At Last Time
* We wrapped up our discussion of Python list fundamentals.
* We looked at using functions as objects.
* We looked at lazy evaluation.
* We worked through a few exercises involving material from the past few sessions.

## What We'll Look At Today
* We'll look at a few more built-in Python functions.
* We'll look at 2D lists.
* We'll look at plotting data using Seaborn.
* We'll examine the Python dictionary object in depth (although we may not finish!). 

### Iterating Backwards Through a Sequence Using `reversed`
* `reversed` returns an iterator that yields the elements of a sequence in reverse order.
* It is more memory-efficient than the slice approach (`numbers[::-1]`) because it does not build an intermediate copy of the sequence.

In [None]:
numbers = [10, 3, 7, 1, 9, 4, 2, 8, 5, 6]
reversed_numbers = [item ** 2 for item in numbers[::-1]]
print(reversed_numbers)

In [None]:
print(reversed(numbers))
reversed_numbers = [item ** 2 for item in reversed(numbers)]
print(reversed_numbers)

In [None]:
print(reversed('ABCDE'))
for ele in reversed('ABCDE'):
    print(ele)
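
One consequence of `reversed` returning an iterator rather than a list is that the result can be consumed only once. The short sketch below illustrates this; the variable names (`short_list`, `rev`, `first_pass`, `second_pass`) are made up for the example.

```python
short_list = [10, 3, 7]
rev = reversed(short_list)   # a reverse iterator, not a new list

first_pass = list(rev)       # consumes every element the iterator can yield
second_pass = list(rev)      # the iterator is now exhausted

print(first_pass)    # [7, 3, 10]
print(second_pass)   # []
print(short_list)    # the original list is unchanged: [10, 3, 7]
```

If you need to walk the reversed elements more than once, call `reversed` again or store the result in a list.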

### Combining Iterables into Tuples of Corresponding Elements Using `zip`
* The built-in function **`zip`** enables you to iterate over _multiple_ iterable expressions at the _same_ time. 
* It receives any number of iterables and returns an iterator that produces tuples containing the elements at the same index in each. 
* If the iterable expressions return different numbers of elements, the one with the _fewest_ elements determines how many tuples are produced.

In [None]:
#zip and print 3 separate data fields related to students
names = ['Bob', 'Sue', 'Amanda']
ids = [123456, 789123, 555555]
grade_point_averages = [3.6, 4.0, 3.7] 
for name, sid, gpa in zip(names, ids, grade_point_averages):
    print(f'Name={name}; ID={sid}; GPA={gpa}')

In [None]:
#Because "playernums" has only 4 elements, zip produces only four tuples (the first four elements of each iterable)
playernums = [1, 65, 4, 27]
scores =  [80, 70, 62, 90, 77, 65] 
for pnum, sc in zip(playernums, scores):
    print(f'PNum={pnum}; Score={sc}')
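
The truncation is silent, which can hide a data problem. As a possible alternative, the standard library's `itertools.zip_longest` pads the shorter iterable instead of dropping the extra elements. This is a minimal sketch; the names `player_ids`, `points`, `pairs`, and `padded` are made up for the example.

```python
from itertools import zip_longest

player_ids = [1, 65, 4, 27]
points = [80, 70, 62, 90, 77, 65]

# zip stops as soon as the shortest iterable runs out
pairs = list(zip(player_ids, points))
print(pairs)        # [(1, 80), (65, 70), (4, 62), (27, 90)]

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(player_ids, points, fillvalue=None))
print(padded[-2:])  # [(None, 77), (None, 65)]
```

In Python 3.10 and later, `zip(..., strict=True)` raises a `ValueError` when the lengths differ, which may be preferable when a mismatch indicates a bug.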


# Two-Dimensional Lists
* Lists can contain other lists as elements. 
* Typical use is to represent **tables** of values consisting of information arranged in **rows** and **columns**. 
* To identify a particular table element, we specify _two_ indices—the first identifies the element’s row, the second the element’s column.

In [None]:
a = [[77, 68, 86, 73], [96, 87, 89, 81], [70, 90, 86, 81]]
print(a[2][1:3])

Writing the list as follows makes its row and column tabular structure clearer:

```python
a = [[77, 68, 86, 73],  # first student's grades
     [96, 87, 89, 81],  # second student's grades 
     [70, 90, 86, 81]]  # third student's grades
```

### Illustrating a Two-Dimensional List

![The two-dimensional list 'a' with its rows and columns of exam grade values](ch05images/AAHBDOV0_2.png "The two-dimensional list 'a' with its rows and columns of exam grade values")

### Identifying the Elements in a Two-Dimensional List

![The two-dimensional list 'a' labeled with the names of its elements](ch05images/AAHBDOV0.png "The two-dimensional list 'a' labeled with the names of its elements")

In [None]:
#output the rows and columns of the prior 2D List
for row in a:
    for colitem in row:
        print(colitem, end=' ')
    print()
        

In [None]:
#We can use enumerate to print rows and column IDs in addition to values
for i, row in enumerate(a):
    for j, item in enumerate(row):
        print(f'a[{i}][{j}]={item} ', end=' ')
    print()
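
Rows are easy to reach because each row is itself a list; columns take a little more work. The sketch below shows two common ways to pull out columns, assuming the same grade values as the list `a` above (stored here under the made-up name `grades` so the example is self-contained).

```python
grades = [[77, 68, 86, 73],
          [96, 87, 89, 81],
          [70, 90, 86, 81]]

# Take the element at column index 1 from every row
second_exam = [row[1] for row in grades]
print(second_exam)   # [68, 87, 90]

# zip(*grades) groups the elements that share an index across rows, i.e. the columns
columns = [list(col) for col in zip(*grades)]
print(columns[0])    # [77, 96, 70]

# Average of each column (each exam)
column_averages = [sum(col) / len(col) for col in columns]
print(column_averages)
```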

# Intro to Data Science: Simulation and Static Visualizations



* Visualizations help you “get to know” your data. 
* They give you a powerful way to understand data that goes beyond simply looking at raw data.
* The **Seaborn visualization library** is built over the **Matplotlib visualization library** and simplifies many Matplotlib operations. 

## Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls
* A vertical bar chart summarizing, for 600 die rolls, how often each of the six faces appears and each face's percentage of the total.
* Seaborn refers to this type of graph as a **bar plot**. 



<center><img src="ch05images/Seaborn_01.png" alt="Drawing" style="width: 900px;"/></center>

* We would expect about 100 occurrences of each die face, or 16.667%. 
* For a small number of rolls, none of the frequencies is exactly 100 and most of the percentages are not close to 16.667% (about 1/6th). 
* For 60,000 die rolls, the bars will become much closer in size, and at 6,000,000 die rolls, they’ll appear to be the same.
* The **law of large numbers** says that as the number of trials in an experiment grows, the observed frequencies tend toward the expected probabilities (here, 1/6 for each die face).
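
The effect can also be checked numerically without a plot. The following is a rough sketch, not part of the original notebook: the helper `max_deviation` and the fixed seed are assumptions made so that the example is repeatable.

```python
import random
from collections import Counter

def max_deviation(roll_count, seed=0):
    """Return the largest gap between an observed face frequency and 1/6."""
    rng = random.Random(seed)  # seeded only so the sketch gives repeatable numbers
    counts = Counter(rng.randrange(1, 7) for _ in range(roll_count))
    return max(abs(counts[face] / roll_count - 1 / 6) for face in range(1, 7))

for roll_count in (600, 60_000, 600_000):
    print(f'{roll_count:>9,} rolls: max deviation = {max_deviation(roll_count):.4%}')
```

The deviation typically shrinks as the number of rolls grows, although any single run can fluctuate.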

<center><img src="ch05images/Seaborn_02.png" alt="Drawing" style="width: 900px;"/></center>

<center><img src="ch05images/Seaborn_03.png" alt="Drawing" style="width: 900px;"/></center>

## Visualizing Our Own Die-Roll Frequencies and Percentages

### Crucial Libraries for visualization

1. **`matplotlib.pyplot`** contains the Matplotlib library’s graphing capabilities that we use. This module typically is imported with the name `plt`. 
2. The NumPy (Numerical Python) library includes the function `unique` that we’ll use to summarize the die rolls. The **`numpy` module** typically is imported as `np`.  Note that we'll see NumPy in far greater detail down the road. 
3. `random` contains Python’s random-number generation functions.
4. **`seaborn`** contains the Seaborn library’s graphing capabilities we use (they are less customizable, but easier to apply than pyplot's). This module typically is imported with the name `sns`. 


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import random
import seaborn as sns #NOTE: YOU MAY NEED TO install seaborn (e.g. "conda install seaborn") before using!

### Rolling the Die and Calculating Die Frequencies
* NumPy's **`unique` function** expects an `ndarray` argument and returns an `ndarray`. 
* If you pass a list, NumPy converts it to an `ndarray` for better performance. 
* Keyword argument **`return_counts`**`=True` tells `unique` to count each unique value’s number of occurrences.
* In this case, `unique` returns a **tuple of two one-dimensional `ndarray`s** containing the **sorted unique values** and their corresponding frequencies, respectively. 
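
A tiny example makes the return value concrete. This sketch assumes NumPy is installed; `sample_rolls`, `faces`, and `counts` are names chosen for the illustration.

```python
import numpy as np

sample_rolls = [3, 1, 3, 6, 1, 3]

# Unpack the tuple of two arrays: sorted unique values and their counts
faces, counts = np.unique(sample_rolls, return_counts=True)
print(faces)    # [1 3 6]
print(counts)   # [2 3 1]

# The two arrays line up index by index, so zip pairs each face with its count
for face, count in zip(faces, counts):
    print(f'{face} appeared {count} time(s)')
```

Note that faces that never occur (2, 4, and 5 here) simply do not appear in the result.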

In [None]:
def rollplot(rollquantity):
    rolls = [random.randrange(1, 7) for i in range(rollquantity)] #Create list of ALL die rolls.
    values, frequencies = np.unique(rolls, return_counts=True) #Get counts per die face.
    title = f'Rolling a Six-Sided Die {len(rolls):,} Times'
    sns.set_style('whitegrid')  # default is white with no grid
    axes = sns.barplot(x=values, y=frequencies, palette='bright') # create and display the bar plot
    axes.set_title(title)     # set the title of the plot
    axes.set(xlabel='Die Value', ylabel='Frequency') # label the axes
    axes.set_ylim(top=max(frequencies) * 1.10) # scale the y-axis to add room for text above bars
    # create and display the text for each bar
    for bar, frequency in zip(axes.patches, frequencies):
        text_x = bar.get_x() + bar.get_width() / 2.0  
        text_y = bar.get_height() 
        text = f'{frequency:,}\n{frequency / len(rolls):.3%}'
        axes.text(text_x, text_y, text, 
                  fontsize=11, ha='center', va='bottom')

In [None]:
rollplot(6000000)

### Some Valuable pyplot Functionality
* The figure in the above example is generally quite small on most displays.  
* We can use two steps to quickly resize a generated figure (in pyplot OR seaborn).
    * First, we can call the function `plt.gcf()` to get the current (last plotted or manipulated) figure object.
    * We can then call the figure's `set_size_inches(width, height)` method to set its display size in inches.
* While the notebook displays figures implicitly, it's good practice to always call `plt.show()` to show the current figure (outside the IPython environment, this is required).

In [None]:
rollplot(60000)
fig = plt.gcf() #fig now references the current figure
fig.set_size_inches(11,8) #set the current figure's size to be 11" by 8".
plt.show() #Best practice is to "display" the figure when ready.

In [None]:
rollplot(6000000)
fig = plt.gcf()
fig.set_size_inches(11,8)
plt.show()

# Dictionaries and Sets
* A **dictionary** is a collection that stores **key–value pairs** mapping immutable keys to values, just as a conventional dictionary maps words to definitions. Values are looked up by key rather than by position.
* A **set** is an unordered collection of unique immutable elements.
* Dictionaries and sets are distinct from, but closely related to, lists.

## Dictionaries
* A dictionary _associates_ keys with values. 
* Each key _maps_ to a specific value. 
* These values can be simple objects (integers, bools, etc.) or nested structures (e.g., lists or other dictionaries).

## Examples
| Keys | Key type | Values | Value type
| :-------- | :-------- | :-------- | :--------
| Country names | `str` | Internet country codes | `str` 
| Decimal numbers | `int` | Roman numerals | `str` 
| States | `str` | Agricultural products | list of `str` 
| Hospital patients | `str`  | Vital signs | tuple of `int`s and `float`s 
| Baseball players | `str`  | Batting averages | `float` 
| Metric measurements | `str`  | Abbreviations | `str` 
| Inventory codes | `str`  | Quantity in stock | `int` 

### Key Requirements
* Keys in any one dictionary must be _immutable_ and _unique_. 
* Multiple keys can have the same value (e.g., mapping a large group of people's SSNs to their first or last names would produce many duplicate values).
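
The sketch below illustrates both requirements using the curly-brace literal syntax. The data and names (`point_labels`, `codes`, `error_message`) are invented for the example.

```python
# Tuples are immutable, so they are acceptable keys
point_labels = {(0, 0): 'origin', (1, 0): 'unit x'}
print(point_labels[(0, 0)])   # origin

# Lists are mutable, so using one as a key raises a TypeError
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    error_message = str(err)
print(error_message)          # the message mentions an unhashable type

# Keys are unique: repeating a key keeps only the last value given
codes = {'fi': 'Finland', 'fi': 'Suomi'}
print(codes)                  # {'fi': 'Suomi'}
```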

## Dictionary Basics
* Create a dictionary by enclosing in curly braces, `{}`, a comma-separated list of key–value pairs, each of the form _key_: _value_. 
* Create an empty dictionary with `{}`. 
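* Dictionaries are not sequences: values are looked up by key, not by position. Since Python 3.7, a dictionary does preserve the order in which its keys were inserted.
* Even so, you should generally _not_ write code that depends on the order in which key–value pairs were added. 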

In [None]:
country_codes = {'Finland': 'fi', 'South Africa': 'za', 
                  'Nepal': 'np'}
print(country_codes)                 
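
Keys need not be strings, and values can be nested structures. The following sketch mirrors two rows of the earlier examples table; the specific entries are illustrative data, not taken from the original notes, and the bracket lookups it uses are covered in a later section.

```python
# Integer keys mapped to string values
roman_numerals = {1: 'I', 5: 'V', 10: 'X'}

# String keys mapped to lists of strings (illustrative entries only)
state_products = {
    'Idaho': ['potatoes', 'wheat'],
    'Florida': ['oranges', 'sugarcane'],
}

print(roman_numerals[5])              # V
print(state_products['Florida'][0])   # oranges
print(len(state_products))            # 2
```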

### Dictionary Length and Empty Dictionaries
* We can use the `len` function (which returns the number of keys) to determine whether a dictionary is empty.
* We can also use a dictionary directly as a condition: it evaluates to `False` if empty and `True` otherwise.
* The method `clear` deletes all of a dictionary's key–value pairs. 

In [None]:
empty_dict = {}
print(len(country_codes)==0) 
print(len(empty_dict)==0)


In [None]:
if country_codes:
    print('country_codes is not empty')
else:
    print('country_codes is empty')

if empty_dict:
    print('empty_dict is not empty')
else:
    print('empty_dict is empty')
    

In [None]:
country_codes.clear()
if country_codes:
    print('country_codes is not empty')
else:
    print('country_codes is empty')

## Dictionary: Element Access and Iteration 
* Use bracket notation with a key to access the corresponding value.
* The dictionary method `keys` returns an iterable view of all keys in a dictionary.
* Alternatively, the dictionary method `items` returns each key–value pair as a tuple.
* Either `keys` or `items` can be used to work with key–value pairs, but `items` is generally preferred when both the key and the value are needed.

In [None]:
days_per_month = {'January': 31, 'February': 28, 'March': 31}
print(days_per_month)

In [None]:
print(days_per_month['March'])

In [None]:
for month in days_per_month.keys():
    print(f'{month} has {days_per_month[month]} days.') #Not the most elegant way to do things

In [None]:
for month, days in days_per_month.items():
    print(f'{month} has {days} days.') #Better
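
When only the values matter, the related dictionary method `values` avoids touching the keys at all. This sketch uses a separate dictionary (`month_lengths`, a name chosen for the example) so it does not disturb `days_per_month`.

```python
month_lengths = {'January': 31, 'February': 28, 'March': 31}

# values() yields only the values, which is handy for aggregation
total_days = sum(month_lengths.values())
print(total_days)          # 90

# keys() returns a view, so it reflects later changes to the dictionary
month_names = month_lengths.keys()
month_lengths['April'] = 30
print(list(month_names))   # ['January', 'February', 'March', 'April']
```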
    

### Adding or Modifying Key–Value Pairs
* Bracket notation can _also_ be used to bind a value to a new key.
* If used with an existing key-value pair, the value for the given key will be **overwritten**.
* Note that string keys are _case-sensitive_.

In [None]:
days_per_month['April'] = 31 #Oops -- not the right number of days!
print(days_per_month)

In [None]:
days_per_month['APRIL'] = 30 #Trying to fix the error (but ignoring case-sensitivity!)
print(days_per_month)
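
One way to guard against this kind of mistake is to normalize the case of every key before storing it. The helper below is a hypothetical sketch, not something the notebook defines; `set_days` and `calendar` are names invented for the example.

```python
def set_days(calendar, month, days):
    """Store days under a title-cased key so 'APRIL' and 'April' share one entry."""
    calendar[month.title()] = days

calendar = {}
set_days(calendar, 'April', 31)   # wrong value stored first
set_days(calendar, 'APRIL', 30)   # overwrites the same key after normalization
print(calendar)                   # {'April': 30}
```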

### Removing Key–Value Pairs
* The `del` statement removes a key and its associated value.
* The method `pop` removes a key and _returns_ the value that was stored for it.

In [None]:
del days_per_month['April']  # del is a statement, so no parentheses are needed
print(days_per_month)

In [None]:
print(days_per_month.pop('APRIL'))
print(days_per_month)
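
Both operations raise a `KeyError` when the key is missing, but `pop` also accepts an optional default that is returned instead. The sketch below uses a made-up `inventory` dictionary to show the difference.

```python
inventory = {'bolts': 40, 'nuts': 12}

removed = inventory.pop('nuts')          # removes the key and returns its value
print(removed)                           # 12

missing = inventory.pop('washers', 0)    # key absent: the default is returned, no error
print(missing)                           # 0

try:
    del inventory['washers']             # key absent: del raises KeyError
except KeyError as err:
    print(f'KeyError: {err}')

print(inventory)                         # {'bolts': 40}
```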

### `get` and Nonexistent Keys
* Trying to directly index a nonexistent key with brackets produces a `KeyError`.
* Method **`get`** returns its argument’s corresponding value in a dictionary or `None` if the key is not found. 
* Note that if you don't explicitly use an output statement like `print`, IPython will not display anything for `None`. 
* `get` with a second argument returns the second argument if the key is not found.

In [None]:
days_per_month.get('May')

In [None]:
days_per_month.get('March')

In [None]:
if days_per_month.get('May') is not None:  # compare with None so a stored falsy value such as 0 still counts as present
    print('That month exists in the dictionary')
else: 
    print('That month does not exist in the dictionary')

In [None]:
print(days_per_month.get('May','May not in dictionary!'))
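
A common use of the two-argument form of `get` is tallying, where a missing key should be treated as a count of zero. The following sketch counts die rolls without NumPy; the seed and the names `rng` and `face_counts` are assumptions made for the example.

```python
import random

rng = random.Random(42)   # seeded only so repeated runs give the same rolls
face_counts = {}

for _ in range(600):
    face = rng.randrange(1, 7)
    # get returns 0 the first time a face is seen, so no KeyError occurs
    face_counts[face] = face_counts.get(face, 0) + 1

for face in sorted(face_counts):
    print(f'{face}: {face_counts[face]}')
```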