# 4.1 pandas *groupby()* for plotly express Bar Charts
This notebook shows how to use the pandas *groupby()* function to get data into a *Shape* that can easily be plotted as a Bar Chart using plotly express.  


1. [Data Cleanup](#1.-Data-Cleanup-if-required!)  
  1. Keep only Columns We Want  
  2. Change Data Types if Needed  
2. [Group on a Selected Column](#2.-Group-on-a-Selected-Column) 

3. [Plot!](#3.-Plot!)  
  1. Question: How Does Mean MPG City compare for the different Vehicle Types (SUV, Sedan, etc.)?  
  2. Question: How Does Mean MPG Highway compare for the different Vehicle Types (SUV, Sedan, etc.)?
  
**My References**  
- [**Keeping Columns**](../0_References/1_Pandas_Reference/Column_Operations.ipynb#Keeping-Columns)  
- [**Changing Data Type**](../0_References/1_Pandas_Reference/Field_Operations.ipynb#Changing-Data-Type)  
- [**Saving a pandas dataframe as a CSV file**](../0_References/1_Pandas_Reference/SavingFiles.ipynb#Saving-a-pandas-dataframe-as-a-CSV-file)



In [None]:
from IPython.display import display, HTML
import pandas as pd
import math

#import chart_studio.plotly as py
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
from scipy import special

#### Read Data file

In [None]:
#Read the csv file into a pandas dataframe
df_cars = pd.read_csv('')

print("Number of Rows: ", df_cars.shape[0])
print("Number of Columns: ", df_cars.shape[1])
df_cars.head(5)

# 1. Data Cleanup (if required!)

### Keep only Selected Columns  

- We should get rid of any columns we don't want. 
- You can use the pandas [drop() function](../0_References/1_Pandas_Reference/Column_Operations.ipynb#Dropping-Columns), but in this case it's probably easier just to keep the columns we want
 

**My Reference:** [Keeping Columns](../0_References/1_Pandas_Reference/Column_Operations.ipynb#Keeping-Columns)
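
As a rough illustration of the two approaches, here is a minimal sketch on a small made-up dataframe. The column names only mimic the cars data, and `df_demo`, `kept`, and `dropped` are hypothetical names used just for this example. Both approaches end up with the same two columns.

```python
import pandas as pd

# Hypothetical toy data (not the real cars file)
df_demo = pd.DataFrame({
    "Make": ["Acura", "Audi", "BMW"],
    "Type": ["SUV", "Sedan", "Sedan"],
    "MPG_City": [17, 22, 20],
    "Length": [189, 179, 176],
})

# Option 1: keep only the columns we want
kept = df_demo[["Type", "MPG_City"]].copy()

# Option 2: drop the columns we don't want
dropped = df_demo.drop(columns=["Make", "Length"])

# Both options leave the same columns and the same rows
print(list(kept.columns))
print(kept.equals(dropped))
```

Keeping columns tends to be shorter when the list of wanted columns is much smaller than the list of unwanted ones.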


In [None]:
# Display all column names
df_cars.columns

In [None]:
# Create a List of the names of the Columns we want to keep in our new dataframe
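# (assumed here: the columns that the rest of this notebook uses)
columns_to_keep = ['Type', 'MSRP', 'Invoice', 'MPG_City', 'MPG_Highway']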

# Create a new dataframe with only the selected columns
# (.copy() avoids a SettingWithCopyWarning when columns are modified later)
df_cars = df_cars[columns_to_keep].copy()

In [None]:
# Display new dataframe info
print("Number of Rows: ", df_cars.shape[0])
print("Number of Columns: ", df_cars.shape[1])
df_cars.head()

### Change Data Types as Needed  
- If we want to do numeric calculations on a column it is important that pandas recognizes it as numeric. 
- We also want to make sure a column is a float (rather than integer) if needed.
- Otherwise either errors or weird results are going to happen!  

**My Reference:** [Changing Data Type](../0_References/1_Pandas_Reference/Field_Operations.ipynb#Changing-Data-Type)
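
To see the kind of "weird results" meant above, here is a small sketch on made-up values stored as text. The series `s` and its contents are hypothetical, and `errors="coerce"` is an optional choice for this example rather than something this notebook requires.

```python
import pandas as pd

# Hypothetical values: numbers stored as text, plus one entry that is not a number
s = pd.Series(["17", "22", "n/a"])

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising
as_numbers = pd.to_numeric(s, errors="coerce")
print(as_numbers.dtype)  # float64, because NaN requires a float column

# Text values "add" by joining the strings together
print(s.iloc[0] + s.iloc[1])  # 1722

# Numeric values add arithmetically
print(as_numbers.iloc[0] + as_numbers.iloc[1])  # 39.0
```

Without `errors="coerce"`, `pd.to_numeric()` raises an error on the unparseable entry, which can be the safer behavior when bad values are not expected.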


In [None]:
# data types 'Before' 
df_cars.dtypes

In [None]:
# Convert MSRP, Invoice, MPG_City, MPG_Highway to floats
df_cars['MSRP'] = pd.to_numeric(df_cars['MSRP']).astype(float)
df_cars['Invoice'] = pd.to_numeric(df_cars['Invoice']).astype(float)

df_cars['MPG_City'] = pd.to_numeric(df_cars['MPG_City']).astype(float)
df_cars['MPG_Highway'] = pd.to_numeric(df_cars['MPG_Highway']).astype(float)

In [None]:
# data types 'After' 
df_cars.dtypes

# 2. Group on a Selected Column  
1. We are going to group on the **Type** Column!  
  1. This means we group the rows by Vehicle Type (SUV, Sedan, etc.) and then compute an Aggregate Function for each group.
2. The Aggregate Function we will use here is **mean()**  
  1. We could instead use sum(), count(), or others
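
The idea can be sketched on a tiny made-up dataframe. The values below are hypothetical, and `df_demo`, `by_type`, `means`, and `counts` are names chosen only for this illustration.

```python
import pandas as pd

# Hypothetical toy data: two SUVs and two Sedans
df_demo = pd.DataFrame({
    "Type": ["SUV", "Sedan", "SUV", "Sedan"],
    "MPG_City": [16.0, 24.0, 18.0, 26.0],
})

# Step 1: group the rows by the values in the Type column
by_type = df_demo.groupby("Type")

# Step 2: apply an aggregate function to each group
means = by_type.mean()                 # one row per Type, holding the mean of each column
counts = by_type["MPG_City"].count()   # a different aggregate: rows per Type

print(means)
print(counts)
```

Note that the grouping column (Type) becomes the index of the result rather than a regular column.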

In [None]:
# Display the top rows of the starting dataframe
df_cars.head(2)

In [None]:
# Display the unique values in the column we want to Group on
df_cars['Type'].unique()

In [None]:
# Group by:  Type
# Calculation:  mean
df_cars_by_type_mean = df_cars.groupby('Type').mean(numeric_only=True)
df_cars_by_type_mean

# 3. Plot!  
- Below are a number of questions that we can use plots to help answer  
- All of them are based on a dataframe that:  
  - Is grouped by Type  
  - Contains Mean values


### Question: How Does Mean MPG City compare for the different Vehicle Types (SUV, Sedan, etc.)?

In [None]:
df_cars_by_type_mean

#### Let's move *Type* out of the index so it is just a regular column  
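
Here is a minimal sketch of why this step matters, using made-up data (`df_demo`, `means`, `flat`, and `flat_alt` are hypothetical names). A column that lives in the index does not show up in the list of columns, so it is harder to refer to by name when plotting.

```python
import pandas as pd

# Hypothetical toy data
df_demo = pd.DataFrame({
    "Type": ["SUV", "Sedan", "SUV"],
    "MPG_City": [16.0, 24.0, 18.0],
})

means = df_demo.groupby("Type").mean()
print(list(means.columns))  # ['MPG_City'] : Type is the index, not a column

# reset_index() moves the index back into a regular column
flat = means.reset_index()
print(list(flat.columns))   # ['Type', 'MPG_City']

# Alternative: ask groupby not to use the grouping column as the index at all
flat_alt = df_demo.groupby("Type", as_index=False).mean()
print(list(flat_alt.columns))  # ['Type', 'MPG_City']
```

`reset_index()` returns a new dataframe, so the result needs to be assigned back to a variable to take effect.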


In [None]:
df_cars_by_type_mean = df_cars_by_type_mean.reset_index()
df_cars_by_type_mean.head()

In [None]:

fig = px.bar(df_cars_by_type_mean, 
             x='Type',            
             y='MPG_City',
             template='presentation',
             title='Mean MPG City by Vehicle Type')

fig.show()

### Question: How Does Mean MPG Highway compare for the different Vehicle Types (SUV, Sedan, etc.)?

In [None]:
df_cars_by_type_mean.head(2)

In [None]:
fig = px.bar(df_cars_by_type_mean, 
             x='Type',            
             y='MPG_Highway',
             template='presentation',
             title='Mean MPG Highway by Vehicle Type')

fig.show()

# Save the Grouped Data as a CSV file  
- Let's save this Grouped data as a CSV file so that other Notebooks can use it without having to repeat the steps above.  



**My Reference:** [Saving a pandas dataframe as a CSV file](../0_References/1_Pandas_Reference/SavingFiles.ipynb#Saving-a-pandas-dataframe-as-a-CSV-file)
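
The sketch below shows what `index=False` does, using a made-up dataframe and an in-memory string instead of a real file (so no file path is assumed). The names `df_demo`, `csv_text`, `with_index`, and `round_trip` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical grouped result where Type is already a regular column
df_demo = pd.DataFrame({"Type": ["SUV", "Sedan"], "MPG_City": [17.0, 25.0]})

# With no path given, to_csv() returns the CSV text instead of writing a file
csv_text = df_demo.to_csv(index=False)  # row labels 0, 1, ... are left out
with_index = df_demo.to_csv()           # row labels are written as an extra first column

print(csv_text.splitlines()[0])    # Type,MPG_City
print(with_index.splitlines()[0])  # ,Type,MPG_City

# Reading the text back gives the same columns we started with
round_trip = pd.read_csv(io.StringIO(csv_text))
print(list(round_trip.columns))
```

One thing to watch for: `index=False` discards whatever is in the index, so if *Type* were still the index (before `reset_index()`), it would not be saved.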


In [None]:
df_cars_by_type_mean.head(2)

In [None]:
# Save the df dataframe as a CSV file in the Data folder
df_cars_by_type_mean.to_csv("", index=False)

In [None]:
# Read the saved csv file 
df_test = pd.read_csv("")

In [None]:
df_test.head()