# Plotting practice - tips dataset

* Adapted from [guipsamora's pandas_exercises](https://github.com/guipsamora/pandas_exercises)

### Instructions

* The goal of this workbook is to get practice plotting data from pandas DataFrames using `matplotlib` and `seaborn`.
* Some of the code blocks below are missing parts of the code. The missing parts are indicated by `____`. You need to fill in the correct function or variable name to complete the prompt for that step.
    - For example, in Step 4, the prompt is to view the first 10 entries in the data. The code block is: `tips.____(10)`.
    - The solution to this code is `tips.head(10)`. You should type `head` in place of the `____` and execute the code block.
* You have completed this assignment when you have filled in all of the blanks and executed the entire notebook successfully.

### Introduction:

This exercise is based on the tutorial and documentation from [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/index.html).  
The dataset used is `tips` from Seaborn.

### Step 1. Download the dataset 
1. Open the [tips dataset](https://github.com/rlowd/python-bigdata/blob/master/pandas-exercises/data/tips-dataset.csv) in a new window.
2. Click the "Raw" button at the top right.
3. Right-click and choose "Save As..." to save the file to the same folder as your Jupyter notebook.
![Click the "Raw" button](../../images/click-raw-button.png)
![Right click and "Save As..."](../../images/click-save-as.png)

### Step 2. Import modules

In [None]:
import pandas as pd

# visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns


# print the graphs in the notebook
%matplotlib inline

# set seaborn style to white
sns.set_style("white")

### Step 3. Read in the tips dataset

* **Modify the path to read the dataset based on where it lives on your computer.**
* Assign it to the variable `tips`.
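
Because Step 1 saves the file next to the notebook, a path built from the current working directory is often enough. The sketch below assumes the notebook was started from that folder and that the file kept its original name, `tips-dataset.csv`; both are assumptions, so adjust them if your setup differs. It only checks that the file is where you expect before you call `pd.read_csv`.

```python
from pathlib import Path

# Assumption: the CSV was saved beside the notebook under its original name.
data_path = Path.cwd() / "tips-dataset.csv"

if data_path.exists():
    print(f"Found dataset at {data_path}")
else:
    print(f"No file at {data_path}; adjust the path before calling pd.read_csv")
```

If the check fails, print `Path.cwd()` to see which folder the notebook is actually running in.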

In [None]:
tips = pd.read_csv('/path/to/data/tips-dataset.csv')
tips.head()

### Step 4. Delete the Unnamed: 0 column

In [None]:
del tips['Unnamed: 0']
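
A column named `Unnamed: 0` typically appears when a CSV was written with its row index and the index column has no header. As a minimal sketch, here are two other common ways to get rid of it, using a made-up two-row CSV held in memory rather than the real tips file (the data and the names `csv_text`, `raw`, `dropped`, and `indexed` are illustrative only).

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for the real file; the empty first
# header is what pandas names "Unnamed: 0" when reading.
csv_text = ",total_bill,tip\n0,16.99,1.01\n1,10.34,1.66\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Option 1: drop the column and get a new DataFrame back.
dropped = raw.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv to use the first column as the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
```

Unlike `del`, `drop(columns=...)` leaves the original DataFrame unchanged, which can be handy while exploring.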

* View the first 10 rows of the cleaned dataset.

In [None]:
tips.____(10)

### Step 5. Plot the total_bill column histogram

* First preview the `total_bill` column.

In [None]:
tips_____.head()

* Now use `distplot` to create a histogram of these data.
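* _Note: you may get a `UserWarning` here. This is a [known issue](https://github.com/mwaskom/seaborn/issues/1392) and can be ignored for now. In seaborn 0.11 and later, `distplot` is deprecated in favor of `histplot` and `displot`, so you may see a deprecation warning instead._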

In [None]:
# create histogram
ttbill = sns._______(tips.total_bill);

# set labels and title
ttbill.set(xlabel = 'Value', ylabel = 'Frequency', title = "Total Bill")

# take out the right and upper borders
sns.despine()

### Step 6. Create a scatter plot presenting the relationship between total_bill and tip

* Preview the `total_bill` and `tip` values.
* Note there are two ways to preview a single column from a pandas DataFrame. 
    - `DataFrame['quoted_column_name_in_brackets'].head()`
    - `DataFrame.raw_column_name.head()`
* Try both!
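
To see how the two access styles compare, here is a small sketch on a made-up DataFrame (the frame `df` and its columns `price` and `unit cost` are hypothetical, not part of the tips data). Both styles return the same Series, but only bracket access works when the column name is not a valid Python identifier, for example when it contains a space.

```python
import pandas as pd

# Hypothetical data, used only to compare the two access styles.
df = pd.DataFrame({"price": [3.5, 4.0, 2.25], "unit cost": [1.0, 1.5, 0.75]})

by_brackets = df["price"].head()
by_attribute = df.price.head()
print(by_brackets.equals(by_attribute))

# A name with a space cannot be written as an attribute, so use brackets.
spaced = df["unit cost"].head()
print(spaced)
```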

In [None]:
# Preview total_bill values
tips_____.head()

In [None]:
# Preview tip values
_____.tip.head()

* Use seaborn `jointplot()` to create a scatter plot showing the relationship between the two columns.
* _Note: you may get a `UserWarning` here. This is a [known issue](https://github.com/mwaskom/seaborn/issues/1392) and can be ignored for now._

In [None]:
sns._____( x ="total_bill", y ="tip", data = _____ )

### Step 7.  Create one image with the relationship of total_bill, tip and size.

* Use `pairplot()` to create sets of scatter plots

In [None]:
___.pairplot(data = _____)

### Step 8. Present the relationship between days and total_bill value

* What types of values are the `day` and `total_bill` columns?

In [None]:
tips.day.head()

In [None]:
type(tips.day.values[0])

In [None]:
tips.total_bill.head()

In [None]:
type(_____.total_bill.values[0])
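
Checking `type(...)` on one element tells you about that single value. As a complementary sketch, the `dtypes` attribute summarizes every column at once. The DataFrame below is made up for illustration (the names `df`, `label`, and `amount` are hypothetical).

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({"label": ["a", "b"], "amount": [1.5, 2.5]})

# Type of a single element in each column.
print(type(df.label.values[0]))
print(type(df.amount.values[0]))

# Column-level summary of the stored types.
print(df.dtypes)
```

Text values come back as Python `str`, while numeric values come back as NumPy scalars. This matters for plotting: functions such as `stripplot` treat a text column as categories and a numeric column as a continuous axis.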

In [None]:
sns.stripplot(x = "____", y = "____", data = tips, jitter = True);

### Step 9. Create a scatter plot with day on the y-axis and tip on the x-axis, and differentiate the dots by sex

In [None]:
tips.sex.head()

In [None]:
# What type are the individual values in the sex column?
type(tips.sex.____[0])

In [None]:
# Map hue to the sex value to alter color of the dots according to gender.
sns.stripplot(x = "tip", y = "day", hue = "_____", data = ____, jitter = True);

### Step 10.  Create a box plot presenting the total_bill per day, differentiated by time (Dinner or Lunch)

* Map `time` values onto the `hue` parameter.

In [None]:
tips.time.head()

In [None]:
sns.boxplot(x = "____", y = "total_bill", hue = "____", data = tips);

### Step 11. Create two histograms of the tip value, one for Dinner and one for Lunch. They must be side by side.

In [None]:
tips.time.unique()

In [None]:
# better seaborn style
sns.set(style = "ticks")

# creates FacetGrid
g = sns.FacetGrid(tips, col = "_____")
g.map(plt.hist, "tip");

### Step 12. Create two scatter plots, one for Male and another for Female, presenting the relationship between total_bill and tip, differentiated by smoker status

* You want the color (`hue`) to map to smoking status, and the facets (each separate plot) to map to the sex (M/F). 
* Hint: the `col` argument in `FacetGrid()` stands for column.

In [None]:
g = sns.FacetGrid(tips, col = "____", hue = "_____")
g.map(plt.scatter, "total_bill", "tip", alpha =.7)

g.add_legend();

### BONUS: Create your own question and answer it using a graph.