# 5.17 Intro to Data Science: Simulation and Static Visualizations

**Instructor Note: This notebook's code has been organized into cells differently than the snippets presented in the book. In a notebook, all the code that affects the visualization's appearance must appear in the same cell. Any code that modifies that appearance would have to re-display the visualization. For this reason, snippet numbers in this notebook do not match the snippet numbers in the book.**

* Visualizations help you “get to know” your data. 
* They give you a powerful way to understand data that goes beyond simply looking at raw data.
* The **Seaborn visualization library** is built over the **Matplotlib visualization library** and simplifies many Matplotlib operations. 

## 5.17.1 Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls
* The vertical bar chart below summarizes, for 600 die rolls, the frequency with which each of the six faces appears and its percentage of the total.
* Seaborn refers to this type of graph as a **bar plot**: 

![Screen capture of a vertical bar chart for 600 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total](ch05images/Seaborn_01.png "Screen capture of a vertical bar chart for 600 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total")

* Expect about 100 occurrences of each die face. 
* For a small number of rolls, none of the frequencies is exactly 100 and most of the percentages are not close to 16.667% (about 1/6th). 
* For 60,000 die rolls, the bars will become much closer in size. 
* At 6,000,000 die rolls, they’ll appear to be exactly the same size.
* This is the “law of large numbers” at work. 

* The first screen capture below shows the results for 60,000 die rolls, for which you'd expect about 10,000 of each face. 
* The second screen capture below shows the results for 6,000,000 rolls, for which you'd expect about 1,000,000 of each face.
* With more die rolls, the frequency percentages are much closer to the expected 16.667%.

![Screen capture of a vertical bar chart for 60,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total](ch05images/Seaborn_02.png "Screen capture of a vertical bar chart for 60,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total")

![Screen capture of a vertical bar chart for 6,000,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total](ch05images/Seaborn_03.png "Screen capture of a vertical bar chart for 6,000,000 die rolls summarizing the frequencies with which each of the six faces appear, and their percentages of the total")
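The convergence shown in these screen captures can also be checked numerically, without plotting. The following is a minimal sketch using only the standard library; the helper name `face_percentages` and the fixed seed are assumptions made for illustration and do not come from the book's code.

```python
import random
from collections import Counter


def face_percentages(number_of_rolls, seed=None):
    """Roll a six-sided die number_of_rolls times; return {face: fraction}."""
    generator = random.Random(seed)  # separate generator so seeding stays local
    counts = Counter(generator.randrange(1, 7) for _ in range(number_of_rolls))
    return {face: counts[face] / number_of_rolls for face in range(1, 7)}


# compare how far the worst face strays from 1/6 as the roll count grows
for number_of_rolls in (600, 60_000):
    percentages = face_percentages(number_of_rolls, seed=1)
    largest_gap = max(abs(p - 1 / 6) for p in percentages.values())
    print(f'{number_of_rolls:>7,} rolls: largest gap from 1/6 is {largest_gap:.3%}')
```

With more rolls, the largest gap should typically shrink, which is the same effect the bar plots show visually.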
 

## 5.17.2 Visualizing Die-Roll Frequencies and Percentages

### Launching IPython for Interactive Matplotlib Development
* To enable IPython's built-in support for interactively developing Matplotlib graphs:

```shell
ipython --matplotlib
```

### Importing the Libraries
**Note: `%matplotlib inline` is an IPython magic that enables Matplotlib-based graphics to be displayed directly in the notebook. Where snippets were combined into a single cell, we've separated them with two blank lines.**

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
import numpy as np

In [None]:
import random

In [None]:
import seaborn as sns

1. **`matplotlib.pyplot`** contains the Matplotlib library’s graphing capabilities that we use. This module typically is imported with the name `plt`. 
2. The NumPy (Numerical Python) library includes the function `unique` that we’ll use to summarize the die rolls. The **`numpy` module** typically is imported as `np`. 
3. `random` contains Python’s random-number generation functions.
4. **`seaborn`** contains the Seaborn library’s graphing capabilities we use. This module typically is imported with the name `sns`. 

### Rolling the Die and Calculating Die Frequencies

In [None]:
rolls = [random.randrange(1, 7) for i in range(600)]

* NumPy's **`unique` function** expects an `ndarray` argument and returns an `ndarray`. 
* If you pass a list, NumPy converts it to an `ndarray` for better performance. 
* Keyword argument **`return_counts`**`=True` tells `unique` to count each unique value’s number of occurrences.
* In this case, `unique` returns a **tuple of two one-dimensional `ndarray`s** containing the **sorted unique values** and their corresponding frequencies, respectively. 

In [None]:
values, frequencies = np.unique(rolls, return_counts=True)
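To see what `unique` returns, it can help to try it on a list small enough to count by hand. This sketch uses a made-up list of six rolls; the names `sample_rolls`, `sample_values` and `sample_frequencies` are illustrative only and are chosen so they do not overwrite the notebook's `values` and `frequencies`.

```python
import numpy as np

# six hand-picked rolls: 1 appears twice, 3 appears three times, 6 appears once
sample_rolls = [3, 1, 3, 6, 1, 3]

# the list is converted to an ndarray; both results come back as ndarrays
sample_values, sample_frequencies = np.unique(sample_rolls, return_counts=True)

# pair each sorted unique value with its count
for value, frequency in zip(sample_values, sample_frequencies):
    print(f'{value}: {frequency}')
```

The unique values come back sorted, and each count sits at the same index as its value, which is why the two arrays can be passed directly as the bar plot's x and y data.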

### Creating the Initial Bar Plot

### Setting the Window Title and Labeling the x- and y-Axes

### Finalizing the Bar Plot

In [None]:
title = f'Rolling a Six-Sided Die {len(rolls):,} Times'


sns.set_style('whitegrid')  # default is white with no grid


# create and display the bar plot
axes = sns.barplot(x=values, y=frequencies, palette='bright')


# set the title of the plot
axes.set_title(title)


# label the axes
axes.set(xlabel='Die Value', ylabel='Frequency')  


# scale the y-axis to add room for text above bars
axes.set_ylim(top=max(frequencies) * 1.10)


# create and display the text for each bar
for bar, frequency in zip(axes.patches, frequencies):
    text_x = bar.get_x() + bar.get_width() / 2.0  
    text_y = bar.get_height() 
    text = f'{frequency:,}\n{frequency / len(rolls):.3%}'
    axes.text(text_x, text_y, text, 
              fontsize=11, ha='center', va='bottom')
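The text above each bar is built with two format specifiers that are easy to try on their own: `,` inserts thousands separators, and `.3%` multiplies by 100 and shows three digits after the decimal point. The sketch below uses hypothetical numbers (101 occurrences out of 600 rolls) rather than values from an actual run.

```python
# hypothetical values for illustration: one face appeared 101 times in 600 rolls
frequency = 101
total_rolls = 600

# same format string as in the loop above, with len(rolls) replaced by total_rolls
label = f'{frequency:,}\n{frequency / total_rolls:.3%}'
print(label)

# the comma specifier matters only for larger counts
print(f'{1000000:,}')
```

The `\n` places the percentage on a second line under the count, so each bar's label is a two-line string.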

### Rolling Again and Updating the Bar Plot—Introducing IPython Magics

In [None]:
# plt.cla()
# We placed this code in a comment because it was meant for use 
# in an interactive IPython session in which we clear the window,
# then display a new graph in it. In a notebook, we can simply 
# display a new graph inline.

When you execute the next cell, the notebook will add another cell below it containing the code in Snippet 5. You should then change 600 to 60000.

In [None]:
%recall 5

When you execute the next cell, the notebook will add another cell below it containing the code in Snippets 6-7. Executing that cell will produce a new graph.

In [None]:
%recall 6-7

### Saving Snippets to a File with the %save Magic 

In [None]:
%save RollDie.py 1-7

### Command-Line Arguments; Displaying a Plot from a Script
* Provided with this chapter’s examples is an edited version of the `RollDie.py` file you saved above. 
* We added comments and two modifications so you can run the script with an argument that specifies the number of die rolls, as in:
```shell
ipython RollDieWithArg.py 600
```

* The **`sys` module** enables a script to receive _command-line arguments_ that are passed into the program. 
* These include the script’s name and any values that appear to the right of it when you execute the script. 
* The `sys` module’s **`argv`** list contains the arguments. 
* **_Matplotlib and Seaborn do not automatically display the plot for you when you create it in a script_**. So at the end of the script we added the following call to Matplotlib’s **`show`** function, which displays the window containing the graph:
```python
plt.show()
```
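The edited script is not reproduced in this notebook, so the following is only a sketch of how a script might read the roll count from `sys.argv`. The helper name `get_number_of_rolls` and the fallback default of 600 are assumptions for illustration, not necessarily what `RollDieWithArg.py` does.

```python
import sys


def get_number_of_rolls(argv=None, default=600):
    """Return the roll count from argv[1] as an int, or default if absent."""
    if argv is None:
        argv = sys.argv  # argv[0] is the script name; arguments follow it
    if len(argv) > 1:
        return int(argv[1])  # command-line arguments always arrive as strings
    return default


# simulate the argument list produced by: ipython RollDieWithArg.py 600
simulated_argv = ['RollDieWithArg.py', '600']
print(f'Script name: {simulated_argv[0]}')
print(f'Number of rolls: {get_number_of_rolls(simulated_argv):,}')
```

In a real script you would call `get_number_of_rolls()` with no arguments so that it reads `sys.argv`, build the plot from that many rolls, and finish with `plt.show()`.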

In [None]:
run RollDieWithArg.py 6000

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  