<a href="https://colab.research.google.com/github/sensei-jirving/Online-DS-PT-01.24.22-cohort-notes/blob/main/Week_04/Lecture_02/PreClass_Advanced_Visualizations_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 Advanced Visualizations with Python 🎨

- 02/18/22
- 01.24.22 Cohort

### 🗂 Table of Contents 
<a name="contents"></a>

- ☁️ [Working with Google Drive with Python](#gdrive)
    - Saving google drive folder paths in variables
    - Using the `os` module for file management
    - Links to Dataset
- 🏠 [Task #1: Ames Housing Revisited](#task1):
    - Formatting Tick Labels 
    - Customizing fonts 
    - Matplotlib styles
    - Saving to Google Drive
- 🦸 [Task #2: Super Hero Powers](#task2)🦸‍♀️:
    - Precise/Selective Coloring of Bars
    - Loops through lists, using filters, etc.

- [Appendix](#Appendix)
    - Seaborn Palettes
    - Subplots of Different Sizes
    - Annotating bar values



## Learning Objectives

> Today we are going to focus on examples of advanced and more complex visualization construction with matplotlib/pandas/seaborn.

- Instead of a single-dataset-driven activity, we are going to use several datasets to demonstrate different visualizations tasks/tweaks.





### Previously Promised in Class



- [x] Formatting ticks - location and text formatting
- [x] Matplotlib styles/seaborn themes
- [x] Font customization (titles/axis labels, etc)
- [x] Figures with multiple subplots
    - [x] Multiple subplots with DIFFERENT figure sizes. (Appendix)
- [x] Saving/exporting figures (programmatically)




### Requested


- [x] Review loading and saving files to Google Drive.


- [x] Using lists to iterate through the custom creation of Subplots; data frames, filters, colors, labels etc



- [x] OOP Syntax and Using Axes



- [x] Selective removal of frame/spines



## Datasets/Files Used Today

### Google Drive Data Folder to Save

- 💾 All data files used today are stored in [this Google Drive folder](https://drive.google.com/drive/folders/10O96wCNedDmmuKaO_NBc5rsNCUoiLuXs?usp=sharing).
    - You will want to click the add to google drive button on the top right.
    - or you can download the files locally and then add to colab/drive.


### Original Sources
- Super Heroes Dataset: https://www.kaggle.com/claudiodavi/superhero-set
    - Files:
        - `heroes_information.csv`
        - `super_hero_powers.csv`

- Ames Housing Dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
    - Files:
        - Kaggle file name: `train.csv`
        - renamed to `ames-train.csv`


# ☁️ Working with Folders and Stored Files **with Python** <a name="gdrive"></a> 
- 🗂[Click to jump to ToC](#contents)


## Using Google Drive - with Python



### Mounting Google Drive



- When I click the Mount google drive button on the Files sidebar, I always hit `Cancel`, which automatically inserts the following cell of code.
```python
from google.colab import drive
drive.mount('/content/drive')
```


- If you uploaded new files on Google Drive's website, you may need to reload the contents of your drive in Colab. 
    - To do so, add `force_remount=True`:
```python
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
```



### Saving Filepaths as Variables


- I have all of my bootcamp materials saved on google drive in a "`DS-Bootcamp`" folder.

- My data files are stored on Google Drive in the following "`Data`" folder:
    - `My Drive` > `DS-Bootcamp` > `Data`

- **I may want to save files back to Google Drive, somewhere inside my `DS-Bootcamp` folder.**
    - So I am going to mount Google Drive and **save the filepath for it as `BASE_FOLDER`**
        - Tip: I  always make sure that I include a `/` at the END of my folder name, that way I can combine it with file names later.
        - `BASE_FOLDER = '/content/drive/MyDrive/DS-Bootcamp/' `

    - I also will make a `DATA_FOLDER` variable to make it easier to access the numerous files we will be loading today. 
        - `DATA_FOLDER = BASE_FOLDER+"Data/"`
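As an alternative sketch (not what this notebook uses), `os.path.join` inserts the separators for you, so a forgotten trailing `/` cannot silently break a path. The folder names repeat the ones above; `ames_path` is just an illustrative variable name.

```python
import os

BASE_FOLDER = '/content/drive/MyDrive/DS-Bootcamp/'

# os.path.join adds a "/" between pieces only where one is missing
DATA_FOLDER = os.path.join(BASE_FOLDER, 'Data')
ames_path = os.path.join(DATA_FOLDER, 'ames-train.csv')
print(ames_path)
```

Either approach works; string concatenation is shorter, while `os.path.join` is more forgiving.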


In [None]:
from google.colab import drive
drive.mount('/content/drive',force_remount=True)

In [None]:
## saving filepath variables
BASE_FOLDER = '/content/drive/MyDrive/DS-Bootcamp/'
DATA_FOLDER = BASE_FOLDER+"Data/"

## Using the `os` module for managing files and folders

#### Getting Folder Contents

- Python has a built-in module called `os` that has lots of helpful functions for working with files and folders. 

- To check the contents of the current folder:
```python
os.listdir() # list directory contents
```
    - The current folder is Colab's virtual hard drive, NOT google drive!
    

In [None]:
## import module
import os

##  check current folder
os.listdir()



- **To check the contents of a different folder**, just provide the folder path as an argument
```python
os.listdir(DATA_FOLDER) # list directory contents
```

- To sort the file names alphabetically, pass the above code into the `sorted` function.
```python
sorted(os.listdir(DATA_FOLDER)) # list directory contents
```
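Since `os.listdir` returns a plain list of names, it can be filtered like any other list. A minimal sketch, using a temporary folder with made-up file names as a stand-in for `DATA_FOLDER` so it runs anywhere:

```python
import os
import tempfile

# Temporary stand-in for DATA_FOLDER (file names are made up)
with tempfile.TemporaryDirectory() as demo_folder:
    for fname in ['b.csv', 'a.csv', 'notes.txt']:
        open(os.path.join(demo_folder, fname), 'w').close()

    # keep only the csv files, sorted alphabetically
    csv_files = sorted(f for f in os.listdir(demo_folder) if f.endswith('.csv'))

print(csv_files)
```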


In [None]:
## get list of folders/files in DATA_FOLDER
sorted(os.listdir(DATA_FOLDER))

In [None]:
## get list of folders/files in my BASE_FOLDER
sorted(os.listdir(BASE_FOLDER))

In [None]:
## get list of folders/files in Week_04
sorted(os.listdir(BASE_FOLDER+'Week_04/'))

#### Creating New Folders

- I want to create a new folder called "`Advanced_Visualizations`" inside of the "`Week_04/Lecture_02/`" folder we see above.


- The `os.makedirs` function will create the folder (and any missing parent folders) for whatever filepath you provide. It needs the path of the new folder as a string.



- To verify I made my folder path correctly, I can save it as a variable FIRST and then use that variable to make the folder and then check its contents.
```python 
new_folder = BASE_FOLDER+'Week_04/Lecture_02/Advanced_Visualizations/'
print(new_folder)
```

- If the file path looks correct (make sure no missing "/"), then create the folder with `os.makedirs`:
```python
os.makedirs(new_folder,exist_ok=True)
```

    - adding `exist_ok=True` will prevent errors if the folder already exists
    ```python
    os.makedirs(BASE_FOLDER+'Week_04/Lecture_02/Advanced_Visualizations/',exist_ok=True)
    ```
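To see what `exist_ok` changes, here is a small sketch that runs in a temporary folder instead of Google Drive. The nested folder names simply mirror the layout above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # nested path mirroring the Week_04/Lecture_02 layout (inside a temp folder)
    demo_folder = os.path.join(base, 'Week_04', 'Lecture_02', 'Advanced_Visualizations')

    os.makedirs(demo_folder, exist_ok=True)   # creates every missing parent folder
    os.makedirs(demo_folder, exist_ok=True)   # second call does nothing, no error

    try:
        os.makedirs(demo_folder)              # default is exist_ok=False
        raised = False
    except FileExistsError:
        raised = True                         # the folder already exists

    folder_exists = os.path.isdir(demo_folder)

print(raised, folder_exists)
```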


In [None]:
# checking new folder name
new_folder = BASE_FOLDER+'Week_04/Lecture_02/Advanced_Visualizations/'
new_folder

In [None]:
## Creating the new folder
os.makedirs(new_folder,exist_ok=True)

In [None]:
## Checking that I can get list of files in new folder
sorted(os.listdir(new_folder)) # --> this would error if folder didn't exist

- Okay great! An empty list is fine, because we just created this folder!
    - If the folder didn't exist, we would have received an error message.
- Now we are ready to start working with our numerous data files and to save our results back to Google Drive!

# 🏠 Task #1 - House Price Insights for Ames, Iowa <a name="task1"></a>
- [🗂Click to jump to ToC](#contents)



<img src="https://www.brickunderground.com/sites/default/files/styles/blog_primary_image/public/blog/images/080818_desmoinesmain.jpg" width=50%>

- A homeowners association from Ames, Iowa has hired us to provide some insights on the prices of homes in the area.  They have provided us with some data on house sales in the region, as well as a list of questions they'd like answered.

- We will therefore use the appropriate visualizations to answer their questions in visual-form.


### Imports and Checking Versions of Packages

In [None]:
## Our usual imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


## Notice the extra matplotlib import!
import matplotlib as mpl


## Printing the current version of these packages in Colab
print('- Package Versions:')
print(f'\tMatplotlib = {mpl.__version__}')
print(f'\tPandas = {pd.__version__}')
print(f'\tSeaborn = {sns.__version__}')

### The Questions to Answer

1. What is the distribution of house prices in Ames, Iowa?
    - What does the full distribution of homes look like?
    - Are there any homes that are outliers, in terms of their price? 
    
2. What is the relationship between square footage of the living area (`GrLivArea`)  and sale price (`SalePrice`)?
    
3. What is the average sale price for each of the different types of homes (`BldgType`)?

### Distribution of House Prices + Outliers 

In [None]:
## load the ames-train.csv file into a df
df = pd.read_csv(DATA_FOLDER+'ames-train.csv',index_col=0)
df

In [None]:
## Keeping a subset of features
cols_to_use = ['YrSold', 'MoSold', 'Fireplaces', 'TotRmsAbvGrd', 'GrLivArea',
          'FullBath', 'YearRemodAdd', 'YearBuilt', 'OverallCond', 
          'OverallQual', 'LotArea', 'SalePrice','BldgType']
df = df[cols_to_use].copy()
df

In [None]:
## super-quick null value and datatype check
print("- Null Values:",df.isna().sum(),'\n',sep='\n')
df.info()

##### Revisiting Our Visualization from Week 03, Lecture 01

In [None]:
## FROM WEEK 03, LECTURE 01

## Make a larger fig/ax before plotting
fig, ax = plt.subplots(figsize=(10,5))

## Plot histogram
sns.histplot(data=df, x='SalePrice', ax=ax)
ax.set(title='Distribution of Home Prices in Ames, Iowa');

## Annotating mean and median
mean_price = df['SalePrice'].mean()
ax.axvline(mean_price,color='k', ls=':', 
           label=f"Mean Price= ${round(mean_price,2)}");

med_price = df['SalePrice'].median()
ax.axvline(med_price,color='k', ls=':', 
           label=f"Median Price= ${round(med_price,2)}");

ax.legend();

#### 🕹 Python String Format Specifiers/Code


- To show our mean and median prices with 2 decimal places, we *can* use the `round` function to round to 2 digits 

```python
## save mean
mean_price = df["SalePrice"].mean()

## Add vertical line with mean in the label
ax.axvline(mean_price,label= f'Mean Price= ${round(mean_price,2)}')
```

- BUT the better way is to use Python string formatting to customize how the number appears.
    - Inside our f-string's curly brackets, after the variable we want to print, we will add a `:` followed by a special format code (see reference below).
        - Using ".2f" would make our float (decimal value) only display 2 decimal places.
            - `{mean_price:.2f}`
        - The line of code would look like:
```python
ax.axvline(mean_price,label= f'Mean Price= ${mean_price:.2f}')
```

- Just by adding a `,` to our format code, we can have Python add `,`'s as thousands separators (example: `100,000`)
    - Put together, our format code would be:
        - `{mean_price:,.2f}`

- Additional Resources:
    - [Tutorial: String Formatting](https://www.w3schools.com/python/ref_string_format.asp)
    - [Reference Table: String Format Codes](https://mkaz.blog/code/python-string-format-cookbook/)
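A quick side-by-side of the options above. The price is a made-up value for illustration, not the real mean from the dataset.

```python
mean_price = 180921.19589  # hypothetical value, for illustration only

rounded = f"${round(mean_price, 2)}"    # round() drops the trailing zero
two_dp = f"${mean_price:.2f}"           # always 2 decimal places
with_commas = f"${mean_price:,.2f}"     # 2 decimal places + thousands separator

print(rounded)       # $180921.2
print(two_dp)        # $180921.20
print(with_commas)   # $180,921.20
```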





In [None]:
## Paste our visualization code, but use string formatting instead of rounding




In [None]:
## functionize the above plot and call it plot_dist


In [None]:
## test out our function to verify result


>- What if we wanted our SalePrice ticks to look as good as our legend values?

### 📚 Customizing Tick Formatting

- [Tick Formatters Examples](https://matplotlib.org/stable/gallery/ticks_and_spines/tick-formatters.html)
    - [**Reference: String Format Codes**](https://mkaz.blog/code/python-string-format-cookbook/)
- [Tick Locator Examples](https://matplotlib.org/stable/gallery/ticks_and_spines/tick-locators.html)


- We will use the `StrMethodFormatter`, which uses Python string format codes to change the tick label text.



- Let's make our price ticks look more professional
    - Add $'s 
    - Add , separator for thousands
    - Show 2 decimal places
    
- [Tutorial Example](https://matplotlib.org/stable/gallery/pyplots/dollar_ticks.html?highlight=tick)
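A minimal sketch of the idea on a throwaway plot (the numbers are made up, and `price_fmt` is just the name suggested in the exercise below). Inside the format string, `x` stands for the tick value.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

# "x" is the tick value; everything after ":" is a normal string format code
price_fmt = StrMethodFormatter("${x:,.2f}")

fig, ax = plt.subplots()
ax.plot([0, 250_000, 500_000], [1, 3, 2])   # made-up prices on the x-axis
ax.xaxis.set_major_formatter(price_fmt)

# a formatter can also be called directly to preview one label
example_label = price_fmt(700000, 0)
print(example_label)   # $700,000.00
```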

#### 🕹Using StrMethodFormatter

In [None]:
## Make price_fmt using the StrMethodFormatter and the appropriate format code 

## Get the fig and ax from our function


In [None]:
## Now, use the ax.xaxis.set_major_formatter method 

## Display fig again


#### Using the `FuncFormatter` to Convert "\$700,000" to "\$700K"

- Example: How to use the FuncFormatter to convert \\$'s to thousands (or millions) of \\$'s.
- To use the Function Formatter:
    - Create a function that accepts 2 arguments: `x` (the tick value) and `pos` (the tick position)
    - Use f-strings and format codes to specify how to change x. 
    - e.g. `f"${x*1e-6:,.0f}M"` would convert 2,000,000 -> "$2M"


```python
from matplotlib.ticker import FuncFormatter

def hundred_k(x,pos):
    """Function for use with matplotlib FuncFormatter - formats money in thousands (K)"""
    return f"${x*1e-3:,.0f}K"

# Create the formatter
price_fmt_100k = FuncFormatter(hundred_k)

## Set the axis' major formatter
ax.xaxis.set_major_formatter(price_fmt_100k)
```

In [None]:
def hundred_k(x,pos):
    """Function for use with matplotlib FuncFormatter - formats money in thousands (K)"""
    return f"${x*1e-3:,.0f}K"

# example, just using 0 for pos to run function to test
hundred_k(700000,0)

In [None]:
from matplotlib.ticker import FuncFormatter
price_fmt_100k = FuncFormatter(hundred_k)

In [None]:
## Get a new fig from plot_dist and use the new FuncFormatter for price
fig, ax = plot_dist(df)
ax.xaxis.set_major_formatter(price_fmt_100k)

#### `𝑓` Updating our Function with (optional) Tick Formatting

In [None]:
## Creating a BETTER version of our function with fancy price ticks
from matplotlib.ticker import FuncFormatter
def hundred_k(x,pos):
    """Function for use with matplotlib FuncFormatter - formats money in thousands (K)"""
    return f"${x*1e-3:,.0f}K"


def plot_dist(data,x='SalePrice',figsize=(10,5),format_price=True):
    ## Make a larger fig/ax before plotting
    fig, ax = plt.subplots(figsize=figsize)

    ## Plot histogram (use the `data` argument, not the global df)
    sns.histplot(data=data,x=x,ax=ax)
    ax.set_title('Distribution of Home Prices in Ames, Iowa');


    ## Annotating mean and median
    mean_price = data[x].mean()
    ax.axvline(mean_price,color='slategray', ls='--', lw=3,
            label=f"Mean {x} = ${mean_price:,.2f}");

    med_price = data[x].median()
    ax.axvline(med_price,color='skyblue', ls=':', lw=3,
            label=f"Median {x} = ${med_price:,.2f}");
    ax.legend();


    ## if format_price is True use our FuncFormatter
    if format_price:
        
        price_fmt_100k = FuncFormatter(hundred_k)
        ax.xaxis.set_major_formatter(price_fmt_100k)

    return fig,ax

In [None]:
plot_dist(df);

## 📚 Customizing Fonts

- Multiple options, of varying ease-of-use and power:
    1. Specifying individual font properties when adding text.
    2. Changing the default values for all matplotlib fonts
    3. Using seaborn to scale fonts with sns.set_context
    4. Using a matplotlib style with larger text

- Additional Resources: 
    - [4-different methods for changing the font size](https://towardsdatascience.com/4-different-methods-for-changing-the-font-size-in-python-seaborn-fd5600592242)

### 1) Specifying individual font properties when adding text



- Anything that the Text class in Matplotlib accepts can be added to `plt.title/ax.set_title`, as well as for all labels,  ticks, and axis labels.
- [Text class - Matplotlib](https://matplotlib.org/stable/api/text_api.html#matplotlib.text.Text)
    - Examples Text properties:
        - fontweight: 
            - either as a numeric value in range 0-1000
            - or one of these:
                - `{'ultralight', 'light', 'normal', 'regular', 'book', 'medium', 'roman', 'semibold', 'demibold', 'demi', 'bold', 'heavy', 'extra bold', 'black'}`
        - fontsize: either the number of pt to use or one of  `{'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}`
        - fontfamily
        - color
- This approach is 100% valid, but can get tedious if you want to scale all of the fonts for every piece of text, including tick labels




- Tip: When you want to change the font of text elements (axis labels, tick labels)  without changing the actual labels, we can use the OOP syntax and `get` and `set` methods.
    - Example changing xticklabels with:
        - `ax.set_xticklabels()` + `ax.get_xticklabels()` 
```python
# example rotating xticklabels with OOP
ax.set_xticklabels(ax.get_xticklabels(), rotation=45,ha='right')
```
    - Example changing axis labels:
        - `ax.xaxis.get_label()` returns the Text object
        - Adding `get_text()` will retrieve JUST the text values that we want to use.
```python
## Increasing Axis Label Font Sizes
ax.set_xlabel(ax.xaxis.get_label().get_text(),
              fontsize='xx-large')
```



In [None]:
## Paste our original non-functionized viz code
# increase title font size, bold it, and make serif font

## Make a larger fig/ax before plotting
fig, ax = plt.subplots(figsize=(10,5))

## Plot histogram
sns.histplot(data=df,x='SalePrice',ax=ax)
ax.set_title('Distribution of Home Prices in Ames, Iowa', 
             fontfamily='serif',
             fontsize='xx-large',
             fontweight='semibold');


## Annotating mean and median
mean_price = df['SalePrice'].mean()
ax.axvline(mean_price,color='k', ls=':', 
           label=f"Mean Price= ${mean_price:,.2f}",);

med_price = df['SalePrice'].median()
ax.axvline(med_price,color='k', ls=':', 
           label=f"Median Price= ${med_price:,.2f}");


## Increasing Axis Label Font Sizes
ax.set_xlabel(ax.xaxis.get_label().get_text(),
              fontsize='x-large')
ax.set_ylabel(ax.yaxis.get_label().get_text(),
              fontsize='x-large')

# ax.xaxis.label()
ax.legend();

### 2) Updating plt.rcParams

- `plt.rcParams` is a dictionary of all of the default settings for all of matplotlib.
    - Text properties
    - Colors
    - Line widths
    - etc

- We can change individual params by replacing the value stored in the dictionary. 
    - All font params start with `font.`
```python
plt.rcParams['font.family'] = 'serif'
```

- If we have many params to update, we can use the dictionary `.update` method to change multiple params.
```python
plt.rcParams.update({'font.family': 'serif',
                     'font.size': 16,  # must be a number here, not 'xx-large'
                     'font.weight': 'semibold'})
```
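If the change should only apply to one figure, `plt.rc_context` is an alternative to saving and restoring the params by hand. This is a sketch of that alternative, not the approach the cells below take; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['font.family']

# the overrides apply only inside the with-block
with plt.rc_context({'font.family': 'serif', 'font.weight': 'bold'}):
    inside = plt.rcParams['font.family']
    fig, ax = plt.subplots()
    ax.set_title('Serif, bold title')

# previous settings are restored automatically
after = plt.rcParams['font.family']
print(before, inside, after)
```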

In [None]:
## Saving the current params as default_params,in case I want to un-do
default_params = plt.rcParams.copy()
# default_params.keys()

In [None]:
# plt.rcParams['axes.labelsize'] = 16#'x-large'

In [None]:
## Updating rcParams  with the same font params that we used 
plt.rcParams.update( {'font.family':'serif',
                    #   'figure.figsize':[15,20],
                    #   'font.size':'xx-large', #not all options for Text are options here
             'font.weight':'bold'})

In [None]:
## Use our function to see the change
plot_dist(df);

In [None]:
## resetting rcParams values to defaults that we saved
plt.rcParams.update(default_params)

### 3) Using seaborn to scale fonts with sns.set_context

- Seaborn has a `sns.set_context` function which is designed to change the default visualization sizes to be more appropriate for whatever context the figure will be displayed.
    - Contexts:
        - `'talk'`: powerpoint presentations
        - `'poster'`: large printed posters for conferences
        - `'paper'`: for printing standard letter-sized paper
        - `'notebook'`: for a jupyter/colab notebook
- sns.set_context examples:
    - https://datavizpyr.com/seaborn-set_context-to-adjust-size-of-plot-labels-and-lines/
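To compare the contexts without changing any settings, `sns.plotting_context` returns the parameters a context would apply. A small sketch that looks only at the base font size:

```python
import seaborn as sns

# plotting_context returns the settings for a context WITHOUT applying them
sizes = {name: sns.plotting_context(name)['font.size']
         for name in ['paper', 'notebook', 'talk', 'poster']}

# font sizes grow from 'paper' up to 'poster'
print(sizes)
```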

In [None]:
## test a seaborn context 
sns.set_context('talk')
plot_dist(df);

In [None]:
## comparing all 4 contexts
for context in ['poster','talk','notebook','paper']:

    # set context
    sns.set_context(context)

    # generate plot, change title to be style name
    fig,ax = plot_dist(df)
    ax.set_title(context)
    plt.show()

In [None]:
# choosing final context
sns.set_context('talk')

plot_dist(df)

### 4) Using a matplotlib style to overhaul the default text and theme

- Quick & Easy Visual Overhaul 
- List of Styles Available:
    - `plt.style.available`
    - Examples: https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html

- To use a style permanently:
    - let's say we wanted to use 'ggplot'
    ```python
    plt.style.use("ggplot")
    ```
    
- To use a style temporarily, we will use a `with` statement: the style applies only to the code indented under it, and the previous settings come back afterwards.
    ```python
    with plt.style.context('ggplot'):
        fig, ax = plt.subplots()
        ax.scatter(...etc...
    ```


#### 🕹 Matplotlib Styles


In [None]:
## Loop to test out all available styles

    ## Use plt.style.context to temporarily use style

        ## make our figure with our func
        

        ## set the title to be the style name and show
        

- We can even COMBINE styles just by passing a list of style names instead of just 1!
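One caveat: style names depend on the installed Matplotlib version. From Matplotlib 3.6 on, the seaborn styles are named with a `seaborn-v0_8-` prefix (for example `seaborn-v0_8-muted`). The sketch below picks whichever name is available before combining; `candidates`, `muted_style`, and `styles_to_use` are illustrative names.

```python
import matplotlib.pyplot as plt

# newer Matplotlib releases use the "seaborn-v0_8-" prefix for seaborn styles
candidates = ['seaborn-v0_8-muted', 'seaborn-muted']
muted_style = next((s for s in candidates if s in plt.style.available), None)

# combine styles by passing a list; skip the muted style if neither name exists
styles_to_use = ['dark_background'] + ([muted_style] if muted_style else [])

with plt.style.context(styles_to_use):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 1, 3])
    ax.set_title(' + '.join(styles_to_use))
```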

In [None]:
## Testing combining dark_background and seaborn-muted styles 
## (in Matplotlib 3.6+ the style is named 'seaborn-v0_8-muted')
with plt.style.context(['dark_background','seaborn-muted']):
    fig, ax = plot_dist(df)

In [None]:
## setting my final style choices (Matplotlib 3.6+ names the style 'seaborn-v0_8-muted')
sns.set_context('talk')
plt.style.use(['dark_background','seaborn-muted'])
plt.rcParams['font.family'] = 'serif'

In [None]:
## Create the (near) final version of our figure
final_fig,ax = plot_dist(df)

#### Removing some plot borders

- Let's remove the top and right borders of our visualization.
    - The 4 sides of our axis are called `Spines`
    - `ax.spines` is a special dictionary of left/right/top/bottom spine objects. 
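Seaborn also offers a shortcut, `sns.despine`, which hides the top and right spines by default. A sketch on a throwaway bar chart with made-up values:

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.bar(['a', 'b'], [1, 2])   # made-up data

# despine hides the top and right spines unless told otherwise
sns.despine(ax=ax)

# check which spines are still visible
visible = {side: spine.get_visible() for side, spine in ax.spines.items()}
print(visible)
```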

In [None]:
fig, ax = plot_dist(df)

## removing top and right border
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

## repositioning final legend
ax.legend(loc='center right')

### `𝑓` Final `plot_dist` Function

In [None]:
## Creating a BETTER version of our function with fancy price ticks
from matplotlib.ticker import FuncFormatter
def hundred_k(x,pos):
    """Function for use with matplotlib FuncFormatter - formats money in thousands (K)"""
    return f"${x*1e-3:,.0f}K"


def plot_dist(data,x='SalePrice',figsize=(10,5),format_price=True,
              despine=True):
    ## Make a larger fig/ax before plotting
    fig, ax = plt.subplots(figsize=figsize)

    ## Plot histogram (use the `data` argument, not the global df)
    sns.histplot(data=data,x=x,ax=ax)
    ax.set_title('Distribution of Home Prices in Ames, Iowa',
                 fontsize='x-large',y=1.1);


    ## Annotating mean and median
    mean_price = data[x].mean()
    ax.axvline(mean_price,color='slategray', ls='--', lw=3,
            label=f"Mean {x} = ${mean_price:,.2f}");

    med_price = data[x].median()
    ax.axvline(med_price,color='skyblue', ls=':', lw=3,
            label=f"Median {x} = ${med_price:,.2f}");


    ## Increasing Axis Label Font Sizes
    ax.set_xlabel(ax.xaxis.get_label().get_text(),
                fontsize='large')
    ax.set_ylabel(ax.yaxis.get_label().get_text(),
                fontsize='large')

    

    ## if format_price is True use our FuncFormatter
    if format_price:
        
        price_fmt_100k = FuncFormatter(hundred_k)
        ax.xaxis.set_major_formatter(price_fmt_100k)

    if despine:
        ## removing top and right border
        ax.spines['right'].set_visible(False)
        ax.spines['top'].set_visible(False)

    ## add a legend        
    ax.legend()

    return fig,ax

In [None]:
## Final Figure - Testing Final Function
best_fig,ax = plot_dist(df)


## 📚 Saving Our Final Visualization To Google Drive 

- Reminder: we have already created a `new_folder` variable that we wanted to store our images in.

- We can use `plt.savefig` or `fig.savefig` to save our figure to an image file (usually a .png or a .jpg).

- We will give it the exact filename (including folder) that we want to save it as. 

- There are a couple of settings we may want to change as we save it.
    - dpi: resolution of the image in dots per inch (higher = sharper, but a larger file)
    - bbox_inches: if set to "tight" auto calculate best outer edge
    - facecolor: background of the entire figure (normally transparent!)
    - pad_inches: if your title or axis label text gets cutoff
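A minimal sketch of the save step with these settings. It writes to a temporary folder as a stand-in for `new_folder`; the file name and plot are made up.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])   # made-up data
ax.set_title('Demo figure')

# temporary folder standing in for new_folder on Google Drive
with tempfile.TemporaryDirectory() as out_folder:
    fname = os.path.join(out_folder, 'demo-figure.png')

    # only save if the file is not there yet, to avoid overwriting
    if not os.path.exists(fname):
        fig.savefig(fname, dpi=150, bbox_inches='tight',
                    pad_inches=0.2, facecolor='white')

    # confirm the file was written and is not empty
    saved_ok = os.path.exists(fname) and os.path.getsize(fname) > 0

print(saved_ok)
```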

In [None]:
## checking if any images already exist (to avoid overwriting a file)
print(new_folder)
sorted(os.listdir(new_folder))

In [None]:
## Save the final figure
best_fig.savefig(new_folder+'home-prices-in-ames.png',
                  dpi=300,facecolor='black',bbox_inches='tight',pad_inches=0.2)

In [None]:
## testing loading image with matplotlib - BETTER TO CHECK IN GOOGLE DRIVE!!
loaded_img = plt.imread(new_folder+'home-prices-in-ames.png',)
plt.imshow(loaded_img)
plt.axis('off')

# 🕹 🦸 Task #2: The Tallest Super Heroes (By Super Power) <a name="task2"></a>🦸‍♀️
- [🗂Click to jump to ToC ](#contents)

<img src="https://storage.googleapis.com/kaggle-datasets-images/26532/33799/5651215d143dbcf8afe85f3f57c1b284/dataset-cover.jpg?t=2018-05-14-23-16-16">

- We will be working with 2 csv's with super heroes data:
    - `'heroes_information.csv'`: their stats/general info
    - `'super_hero_powers.csv'`: collections of powers.

- We are going to find the 10 most common powers in all comics/movies/shows.

- For each of these 10 most common powers, we are going to display a bar chart of the 10 tallest heroes with those powers. 


- We are going to color-code our bars based on the Publisher. 
    - We will have to manually construct our legend to do so.

- We are going to save the images to file with appropriate file names. 
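The filtering steps in this plan can be sketched on a tiny made-up table first. The hero names, powers, and heights below are invented; only the shape (True/False power columns indexed by hero name, plus a `name`/`Height` table) follows the real files.

```python
import pandas as pd

# tiny made-up tables shaped like the real files
powers = pd.DataFrame({'Flight':  [True, True, True],
                       'Agility': [True, True, False],
                       'Magic':   [False, False, True]},
                      index=['Hero A', 'Hero B', 'Hero C'])
heroes = pd.DataFrame({'name': ['Hero A', 'Hero B', 'Hero C'],
                       'Height': [180.0, 201.0, 175.0]})

# summing True/False columns counts how many heroes have each power
top_powers = powers.sum().sort_values(ascending=False).head(2).index.tolist()

# names of heroes with one chosen power, then the tallest of them
power_name = 'Agility'
heroes_with_power = powers[powers[power_name]].index
selected = heroes[heroes['name'].isin(heroes_with_power)]
tallest = selected.nlargest(1, 'Height')

print(top_powers)                # ['Flight', 'Agility']
print(tallest['name'].tolist())  # ['Hero B']
```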

In [None]:
## Load in the heroes and powers files
heroes =pd.read_csv(DATA_FOLDER+'heroes_information.csv',index_col=0)

powers = pd.read_csv(DATA_FOLDER+'super_hero_powers.csv')
display(heroes.head(),powers.head())

In [None]:
## set names as index for powers df
powers = powers.set_index('hero_names')
powers

In [None]:
## Summing powers True/False will tell us how many heroes have the power 


In [None]:
## Save top 10 most common powers
powers_list = None
powers_list

In [None]:
##  Save first power in powers_list as power_name


## Select/filter ONLY heroes with that power


In [None]:
## Saving selected heroes names (the index) as heroes_with_power
heroes_with_power = None
heroes_with_power

In [None]:
# Use the list of heroes combined with pd.Series.isin to
# make a filter for heroes with the power
filter_heroes = None
filter_heroes

In [None]:
## save selected heroes in a new df called selected_heroes
selected_heroes = None
selected_heroes

In [None]:
## Find the 10 largest Heights and save
tallest10 = None
tallest10

### 📚 Controlling Selective Coloring of Figure Elements

- We want to be able to color-code heroes according to which publisher produces their comic/media. 

>- We want the same colors to be used for the same publishers across any figures we make.

- We can use a dictionary of colors with `.map` to get a Series of color names to use (which we determine using Publisher).
    - `df['Publisher'].map(colors_dict)`
- Example:
```python
pub_colors = {'Marvel Comics':'blue', 
              'Dark Horse Comics':'darkgray', 
              'DC Comics':'brown'}
ax = sns.barplot(data=tallest10,x='name',y='Height',
palette = tallest10['Publisher'].map(pub_colors))
```
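One thing to watch: `.map` returns `NaN` for any publisher missing from the dictionary, which can cause plotting errors later. A possible guard, sketched with a made-up publisher name and an arbitrary fallback color:

```python
import pandas as pd

pub_colors = {'Marvel Comics': 'blue', 'DC Comics': 'brown'}

# 'Unknown Press' is a made-up publisher that is NOT in the dictionary
publishers = pd.Series(['Marvel Comics', 'DC Comics', 'Unknown Press'])

# .map gives NaN for missing keys; fillna supplies a default color
bar_colors = publishers.map(pub_colors).fillna('lightgray').tolist()

print(bar_colors)   # ['blue', 'brown', 'lightgray']
```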

In [None]:
## Dict of Colors to use for each publisher
pub_colors = {'Marvel Comics':'blue', 
              'Dark Horse Comics':'darkgray', 
              'DC Comics':'brown',
              'George Lucas':'green',
              'J. R. R. Tolkien':'red',
              'Image Comics':'purple'}


## make a barplot and use the pub_colors dictionary to create the palette 




## Use the power name to create a descriptive title


## Rotate the xlabels so they are readable



## Increasing Axis Label Font Sizes



> Using this approach, we cannot automatically add a color legend, since we determined the colors ourselves. 

- Below is a good example of finding helpful examples on the web, adapting them to suit our needs, and citing the source.

In [None]:
# Adapted from https://moonbooks.org/Articles/How-to-manually-add-a-legend-with-a-color-box-on-a-matplotlib-figure-/
import matplotlib.patches as mpatches

## Construct the list of handles for the legend
handles = []
for publisher, color in pub_colors.items():
    handles.append(mpatches.Patch(color=color, label=publisher))
    

## Use ax.legend with handles=handles
ax.legend(handles=handles,bbox_to_anchor=[1,1])


## display fig to see the result.
fig

### def `plot_heroes_color_publisher` Function

In [None]:
## functionize our plot as plot_heroes_color_publisher
def plot_heroes_color_publisher(tallest10,figsize=(12,6), 
                                pub_colors = {'Marvel Comics':'blue', 
                                              'Dark Horse Comics':'darkgray', 
                                              'DC Comics':'brown',
                                              'George Lucas':'green',
                                              'J. R. R. Tolkien':'red',
                                              'Image Comics':'purple'},
                                despine=True, 
                                spines_to_remove=["top","right"]):
    
    fig, ax = plt.subplots(figsize=figsize)

    ## make a barplot and use the pub_colors dictionary to create the palette 
    sns.barplot(data=tallest10, x='name', y='Height',
                edgecolor='white',
                     palette=tallest10['Publisher'].map(pub_colors),ax=ax)
    
    ## Set text (NOTE: `power_name` is read from the global scope, so it must be defined before calling)
    ax.set_title(f'10 Tallest Heroes with the Power of "{power_name}"',
                 fontsize='x-large', y=1.05)
    ax.set_xticklabels(ax.get_xticklabels(), 
                       rotation=45, ha='right')
    
    ## Increasing Axis Label Font Sizes
    ax.set_xlabel("Hero",
                fontsize='large')
    ax.set_ylabel(ax.yaxis.get_label().get_text(),
                fontsize='large')

    # remove spines
    if despine:
        [ax.spines[side].set_visible(False) for side in spines_to_remove]
        
    ### Below Code Adapted from https://moonbooks.org/Articles/How-to-manually-add-a-legend-with-a-color-box-on-a-matplotlib-figure-/
    ## Manually constructing legend
    import matplotlib.patches as mpatches

    handles = []
    for publisher, color in pub_colors.items():
        handles.append(mpatches.Patch(color=color, 
                                      label=publisher))
    
    ax.legend(handles=handles, 
              bbox_to_anchor=[1,1])
    

    return fig, ax



fig,ax = plot_heroes_color_publisher(tallest10)


### ✅ Future Function To-Dos

- [ ] sort the legend alphabetically
- [ ] only include colors in the legend that appear in the filtered dataset
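One possible way to tackle both to-dos, sketched as a standalone helper. `make_legend_handles` is a hypothetical name, not part of the function above, and the publisher list is made up.

```python
import matplotlib.patches as mpatches

def make_legend_handles(publishers, pub_colors):
    """Hypothetical helper: one Patch per publisher present in the data, sorted by name."""
    # keep only publishers that are in the data AND have a color
    present = sorted(set(publishers) & set(pub_colors))
    return [mpatches.Patch(color=pub_colors[p], label=p) for p in present]

pub_colors = {'Marvel Comics': 'blue',
              'Dark Horse Comics': 'darkgray',
              'DC Comics': 'brown'}

# made-up publisher column: Dark Horse does not appear, so it is left out
handles = make_legend_handles(['Marvel Comics', 'DC Comics', 'Marvel Comics'],
                              pub_colors)
labels = [h.get_label() for h in handles]
print(labels)   # ['DC Comics', 'Marvel Comics']
```

The resulting list could then be passed to `ax.legend(handles=handles)` as before.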

### ∞ Final Loop to Make Top Power Figures & Save

In [None]:
# Creating a dictionary for saving my figures
FIGS = None

## for each of the top 10 most common powers... 


    ## grab the heroes that have that power and 


    ## slice just filtered heroes


    ## find the Tallest/Shortest heroes and plot as...bars?


    ## since we only accounted for a few publishers, we may encounter some errors
    # using try and except to attempt to plot the data
   

        ## Saving to FIGS dict

    ## if it errors, print what power had the error and display the unique publishers


> Confirm we saved our figures to our new_folder on Google Drive

In [None]:
print(new_folder)
os.listdir(new_folder)

In [None]:
# raise Exception("Stop here for primary in-class activity!")

# APPENDIX <a name='Appendix'></a>
- [🗂Click to jump to ToC](#contents)

### Seaborn Themes/Contexts/Color Palettes

- https://seaborn.pydata.org/api.html#themeing
- https://seaborn.pydata.org/tutorial/color_palettes.html

In [None]:
sns.color_palette("rocket")

In [None]:
palette = sns.color_palette('rocket')
sns.barplot(data=df,x='BldgType',y='SalePrice',palette=palette)

## Annotations

- https://jessica-miles.medium.com/adding-annotations-to-visualizations-using-matplotlib-279e9c770baa
- https://www.geeksforgeeks.org/how-to-annotate-bars-in-barplot-with-matplotlib-in-python/
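Matplotlib 3.4 and later also ship a built-in shortcut, `ax.bar_label`, which handles the positioning for you. A sketch with made-up values on a plain matplotlib bar chart:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(['Developed', 'Developing'], [79.2, 67.1])  # made-up values

# bar_label adds one text label per bar; padding is the gap in points
texts = ax.bar_label(bars, fmt='%.2f', padding=3)
label_strings = [t.get_text() for t in texts]

print(label_strings)   # ['79.20', '67.10']
```

The manual `ax.annotate` loop below remains useful when you need finer control over each label.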

In [None]:
df2 = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vRQpH02vlgxAwATTUhJBC0MGiciSz-vUPenWNbrkVH4ijb12NXK-4ut0jLqbANnBgRUo36ZAXDfeQKa/pub?output=csv')
df2

In [None]:

df2.nunique()

In [None]:
ax = sns.barplot(data=df2, x='Status',y='Life expectancy')
## Adapted from: https://www.geeksforgeeks.org/how-to-annotate-bars-in-barplot-with-matplotlib-in-python/
for bar in ax.patches:
    # Annotate each bar with its height, placed just above the top of the bar
    ax.annotate(format(bar.get_height(), '.2f'),
                (bar.get_x() + bar.get_width() / 2,  # x-coordinate: center of the bar
                 bar.get_height()),                  # y-coordinate: top of the bar
                ha='center', va='center',  # horizontal and vertical text alignment
                size=15, fontfamily='serif',
                xytext=(0, 8),  # shift the text 8 points above the bar top
                textcoords='offset points')

## Raise the upper y-limit by 5 units to leave room for the annotations
ax.set_ylim(top=ax.get_ylim()[-1]+5)

## Multiple Subplots with Different Figure Sizes


- Plot two subplots of unequal size: a large histogram and a small boxplot (or bar plot).
    - Give the distribution plot much more height (or width) than the companion plot.
        - Using gridspec with matplotlib: 
            - https://matplotlib.org/stable/tutorials/intermediate/gridspec.html
            - https://stackoverflow.com/questions/34268742/how-to-use-gridspec-with-subplots/34269388
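
A minimal sketch of the idea, assuming synthetic data in place of the class dataset: `plt.subplots` forwards `gridspec_kw` to the underlying GridSpec, and `height_ratios` (or `width_ratios`) sets the relative size of each row (or column).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (assumption, not the class dataset)
rng = np.random.default_rng(0)
values = rng.normal(loc=70, scale=8, size=200)

# The top row gets three times the height of the bottom row
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(10, 6),
                         gridspec_kw={"height_ratios": [3, 1]})

axes[0].hist(values, bins=20)        # large distribution plot
axes[1].boxplot(values, vert=False)  # small horizontal boxplot

# Compare the axes heights (in figure coordinates) to confirm the proportions
heights = [ax.get_position().height for ax in axes]
print(heights[0] / heights[1])
```

The ratios are relative, so `[3, 1]` and `[0.75, 0.25]` produce the same layout.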

#### Example of Default Subplots

In [None]:
## Visualize Distributions and Means
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(10,6))

sns.histplot(data=df2,x='Life expectancy',kde=True,ax=axes[0])
sns.boxplot(data=df2, x='Life expectancy' ,ax=axes[1])

[a.grid(axis='x',ls=':',color='gray') for a in axes];
fig.tight_layout()

#### Using `gridspec_kw` to set different proportions

In [None]:
## Visualize Distributions and Means
fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(10,6),
                         gridspec_kw={'height_ratios':[0.75,0.25]},)

sns.histplot(data=df2,x='Life expectancy',kde=True,ax=axes[0])
sns.boxplot(data=df2, x='Life expectancy' ,ax=axes[1])

[a.grid(axis='x',ls=':',color='gray') for a in axes];
# [a.set(frame_on=False) for a in axes];

# fig.set_frameon(False)

In [None]:
## Visualize Distributions and Means
fig, axes = plt.subplots(ncols=2, figsize=(10,5),
                         gridspec_kw={'width_ratios':[0.8,0.2]})

sns.histplot(data=df2,x='Life expectancy',hue='Status' ,ax=axes[0])
sns.barplot(data=df2, y='Life expectancy',x='Status' ,ax=axes[1],ci=68)

axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=45,fontsize='small',
                        ha='right')
# axes[1].legend(handles=axes[0].get_legend().get_handles(),bbox_to_anchor=[1,1])
fig.tight_layout()


## Transforming DataFrames to Leverage Seaborn



### Example: Cats vs. Dogs Boxplot

In [None]:
df = pd.read_excel('https://docs.google.com/spreadsheets/d/e/2PACX-1vTXqVI5_p-kjmdG6Ww9mxHJfB_rM3VlLIbIk6HGCWgy1b0Fy3i9AscZm2JHU9re5Q/pub?output=xlsx')
df

In [None]:
## The problem: both boxplots are drawn at the same position on one Axes, so they overlap
sns.boxplot(data=df, y='Mean Number of Cats',color='orange')
sns.boxplot(data=df, y='Mean Number of Dogs per household',color='blue')

In [None]:
## Previously shared solution - modified for black background
## new dict for setting white lines
white_lines = dict(color='white')

## plt.subplots creates the figure, so no separate plt.figure call is needed
fig, axes = plt.subplots(nrows = 1, ncols = 1, figsize = (8,5))
boxplots = axes.boxplot([df['Mean Number of Cats'],df['Mean Number of Dogs per household']],
           notch = True,
           labels=['Cats', 'Dogs'],
           widths = .7,
           patch_artist=True, 
           medianprops = dict(linestyle='-', linewidth=2, color='Yellow'),
           boxprops = dict(linestyle='--', linewidth=2, color='white', facecolor = 'blue', alpha = .4),
           whiskerprops=white_lines,
           flierprops=white_lines,
           capprops=white_lines
          );
# The more you understand any library, the more you can do
boxplot1 = boxplots['boxes'][0]
boxplot1.set_facecolor('orange')
plt.ylabel('Mean Number of Animals per State', fontsize = 20);
plt.xticks(fontsize = 16);
plt.yticks(fontsize = 16);

- Using pd.melt to turn multiple columns into stacked rows.
    - https://pandas.pydata.org/docs/reference/api/pandas.melt.html
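
A small sketch of what `pd.melt` does, using a made-up two-row table (the names and values are hypothetical): each column listed in `value_vars` is stacked into rows, with the original column name stored in a new label column.

```python
import pandas as pd

# Made-up wide table: one row per location, one column per pet type
wide = pd.DataFrame({
    "Location": ["X", "Y"],
    "Cats": [2.0, 1.5],
    "Dogs": [1.2, 1.8],
})

# Stack the Cats and Dogs columns into rows, keeping Location as the identifier
long = pd.melt(wide, id_vars="Location", value_vars=["Cats", "Dogs"],
               var_name="Pet", value_name="Mean per household")
print(long)
```

The long form has one row per (location, pet) pair, which is the shape seaborn expects when a single column such as `Pet` is mapped to `x` or `hue`.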

In [None]:
## melt dataframe, keep only Location as id, and cats/dogs as values
melted = pd.melt(df, id_vars='Location',value_vars= ['Mean Number of Cats','Mean Number of Dogs per household'])
melted.sort_values("Location")

In [None]:
## give the new columns better names
plot_df = pd.melt(df, id_vars=['Location'],
                  value_vars= ['Mean Number of Cats','Mean Number of Dogs per household'],
                  var_name='Pet Type',
                  value_name='Mean Number per household')
plot_df

In [None]:
ax = sns.boxplot(data=plot_df, x='Pet Type', y='Mean Number per household')#,notch=True)

In [None]:
## Replacing long names with just Cats and Dogs
rename_map = {"Mean Number of Cats":"Cats","Mean Number of Dogs per household":"Dogs"}
plot_df['Pet'] = plot_df['Pet Type'].replace(rename_map)
plot_df

In [None]:
ax = sns.boxplot(data=plot_df, x='Pet', y='Mean Number per household' )#,notch=True)

In [None]:
sns.boxplot(data=plot_df, x='Pet', y='Mean Number per household',notch=True,
            medianprops= dict(linestyle='--', linewidth=2, color='Yellow'),
            boxprops = dict(linestyle='-', linewidth=2,  alpha = .4))