# Using pandas DataFrames With Excel

## lesson_4_2_1

### pandas and Excel

- pandas can read from and write to Excel workbooks.
- pandas relies on the openpyxl package for `.xlsx` files. Install it into the course environment with `conda install -n python_data_course openpyxl`.
- [pandas.DataFrame.to_excel() documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)
- [pandas.read_excel() documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)
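
Before running the cells below, it can help to confirm that openpyxl is visible to the active environment. The following is a minimal sketch that uses only the standard library; the helper name `has_openpyxl` is made up for illustration and is not part of pandas.

```python
import importlib.util


def has_openpyxl() -> bool:
    """Return True if the openpyxl package can be found in this environment."""
    return importlib.util.find_spec("openpyxl") is not None


if has_openpyxl():
    print("openpyxl is available; pandas can read and write .xlsx files.")
else:
    print("openpyxl is missing; install it before working with .xlsx files.")
```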

#### Read tips.xlsx into a DataFrame

In [14]:
import pandas as pd

tips_df = pd.read_excel('tips.xlsx', index_col=0)  

tips_df.head()

| weekday | meal_type | wait_staff | party_size | meal_total | tip |
|---|---|---|---|---|---|
| Saturday | Dinner | Marcia | 2 | 100.64 | 16.23 |
| Friday | Dinner | Marcia | 2 | 109.84 | 5.99 |
| Friday | Lunch | Jan | 4 | 90.5 | 22.04 |
| Monday | Dinner | Marcia | 1 | 60.01 | 8.77 |
| Monday | Breakfast | Jan | 1 | 10.88 | 1.68 |


#### Create a separate DataFrame for each meal type

In [15]:
breakfast_df = tips_df[tips_df.meal_type=='Breakfast']
lunch_df = tips_df[tips_df.meal_type=='Lunch']
dinner_df = tips_df[tips_df.meal_type=='Dinner']
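
The three boolean filters above work well for a known, short list of meal types. As an alternative sketch, `groupby` can build one DataFrame per distinct value without naming each value by hand. The `sample_df` data below is a small made-up stand-in for `tips.xlsx`, and the name `frames_by_meal` is chosen only for illustration.

```python
import pandas as pd

# Small made-up sample standing in for tips.xlsx (illustrative values only).
sample_df = pd.DataFrame(
    {
        "meal_type": ["Dinner", "Lunch", "Breakfast", "Dinner"],
        "wait_staff": ["Marcia", "Jan", "Jan", "Marcia"],
        "tip": [16.23, 22.04, 1.68, 8.77],
    },
    index=pd.Index(["Saturday", "Friday", "Monday", "Monday"], name="weekday"),
)

# groupby yields (key, sub-DataFrame) pairs, one per distinct meal_type.
frames_by_meal = {meal: group for meal, group in sample_df.groupby("meal_type")}

for meal, frame in frames_by_meal.items():
    print(meal, len(frame))
```

Each value in `frames_by_meal` holds the same rows that the matching boolean filter would select.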

#### Use DataFrame.to_excel() to create an Excel workbook with the breakfast data

In [16]:
breakfast_df.to_excel("breakfast_tips.xlsx")  

#### Verify that the file was created and contains data

In [17]:
breakfast_tips_df = pd.read_excel('breakfast_tips.xlsx', index_col=0)  

breakfast_tips_df.head()

| weekday | meal_type | wait_staff | party_size | meal_total | tip |
|---|---|---|---|---|---|
| Monday | Breakfast | Jan | 1 | 10.88 | 1.68 |
| Friday | Breakfast | Bobby | 4 | 49.33 | 8.07 |
| Tuesday | Breakfast | Marcia | 1 | 9.71 | 1.99 |
| Friday | Breakfast | Greg | 3 | 46.44 | 5.54 |
| Thursday | Breakfast | Jan | 4 | 46.96 | 8.07 |
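
The `index_col=0` argument matters on the way back in, because `to_excel()` writes the index as the first column of the sheet. The sketch below round-trips a tiny made-up DataFrame through a temporary file (the file name `round_trip.xlsx` is hypothetical) and reads it back both with and without `index_col=0`. It assumes openpyxl is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Tiny made-up sample with weekday as the index, like tips_df.
sample_df = pd.DataFrame(
    {"meal_type": ["Breakfast", "Breakfast"], "tip": [1.68, 8.07]},
    index=pd.Index(["Monday", "Friday"], name="weekday"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "round_trip.xlsx"  # hypothetical file name
    sample_df.to_excel(path)  # the weekday index becomes the first column

    with_index = pd.read_excel(path, index_col=0)  # first column is the index again
    without_index = pd.read_excel(path)  # weekday stays an ordinary column

print(with_index.index.name)
print(list(without_index.columns))
```

Without `index_col=0`, pandas adds a default integer index and keeps `weekday` as a regular column.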


#### Write Excel file meal_type_tips.xlsx with a worksheet for each meal type and one for the original data

**NOTE:** _Keep a copy of the original data. I suggest you always save to a new file rather than overwriting the source workbook._

To write several worksheets into one workbook, open a `pd.ExcelWriter` and pass it to each `.to_excel()` call together with a `sheet_name`.

In [25]:
with pd.ExcelWriter('meal_type_tips.xlsx') as writer:  
    breakfast_df.to_excel(writer, sheet_name='breakfast')
    lunch_df.to_excel(writer, sheet_name='lunch')
    dinner_df.to_excel(writer, sheet_name='dinner')
    tips_df.to_excel(writer, sheet_name='tips_orig')
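
When there are many categories, the same pattern can be driven by a loop. This is a sketch under assumptions: it uses a small made-up DataFrame, a hypothetical file name `by_meal.xlsx` in a temporary directory, and lower-cased meal types as sheet names. `pd.ExcelFile(...).sheet_names` is then used to confirm which sheets were written.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small made-up sample standing in for tips_df.
sample_df = pd.DataFrame(
    {
        "meal_type": ["Dinner", "Lunch", "Breakfast"],
        "tip": [16.23, 22.04, 1.68],
    },
    index=pd.Index(["Saturday", "Friday", "Monday"], name="weekday"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "by_meal.xlsx"  # hypothetical file name

    # One writer, one sheet per meal type, plus a sheet with the full data.
    with pd.ExcelWriter(path) as writer:
        for meal, group in sample_df.groupby("meal_type"):
            group.to_excel(writer, sheet_name=meal.lower())
        sample_df.to_excel(writer, sheet_name="tips_orig")

    # Reopen the workbook only to list its sheet names.
    with pd.ExcelFile(path) as workbook:
        sheet_names = workbook.sheet_names

print(sheet_names)
```

Because `groupby` sorts its keys by default, the meal-type sheets are written in alphabetical order.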


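To read all sheets at once, pass `sheet_name=None`. The result is a dictionary that maps each sheet name to a DataFrame (so, despite its `_df` suffix, the variable below holds a dictionary rather than a single DataFrame):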

In [26]:
meal_type_tips_df = pd.read_excel('meal_type_tips.xlsx', sheet_name=None)  

meal_type_tips_df.keys()

dict_keys(['breakfast', 'lunch', 'dinner', 'tips_orig'])

In [28]:
meal_type_tips_df['breakfast']

|  | weekday | meal_type | wait_staff | party_size | meal_total | tip |
|---|---|---|---|---|---|---|
| 0 | Monday | Breakfast | Jan | 1 | 10.88 | 1.68 |
| 1 | Friday | Breakfast | Bobby | 4 | 49.33 | 8.07 |
| 2 | Tuesday | Breakfast | Marcia | 1 | 9.71 | 1.99 |
| 3 | Friday | Breakfast | Greg | 3 | 46.44 | 5.54 |
| 4 | Thursday | Breakfast | Jan | 4 | 46.96 | 8.07 |
| ... | ... | ... | ... | ... | ... | ... |
| 112 | Saturday | Breakfast | Marcia | 2 | 26.63 | 6.74 |
| 113 | Saturday | Breakfast | Marcia | 4 | 43.63 | 8.89 |
| 114 | Friday | Breakfast | Marcia | 2 | 24.59 | 3.26 |
| 115 | Sunday | Breakfast | Jan | 4 | 53.79 | 9.72 |
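
Other `read_excel` arguments, such as `index_col`, are applied to every sheet when `sheet_name=None`. The sketch below writes two small made-up sheets to a temporary workbook (the file name `all_sheets.xlsx` is hypothetical), reads them all back with `weekday` restored as the index, and then loops over the resulting dictionary.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two tiny made-up DataFrames, each with weekday as the index.
breakfast = pd.DataFrame(
    {"meal_type": ["Breakfast", "Breakfast"], "tip": [1.68, 8.07]},
    index=pd.Index(["Monday", "Friday"], name="weekday"),
)
lunch = pd.DataFrame(
    {"meal_type": ["Lunch"], "tip": [22.04]},
    index=pd.Index(["Friday"], name="weekday"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "all_sheets.xlsx"  # hypothetical file name
    with pd.ExcelWriter(path) as writer:
        breakfast.to_excel(writer, sheet_name="breakfast")
        lunch.to_excel(writer, sheet_name="lunch")

    # sheet_name=None returns {sheet name: DataFrame}; index_col applies to each sheet.
    sheets = pd.read_excel(path, sheet_name=None, index_col=0)

# Summarize the dictionary: number of rows per sheet.
row_counts = {name: len(frame) for name, frame in sheets.items()}
print(row_counts)
```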


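To read a single sheet into its own DataFrame, pass its name as the `sheet_name` argument. Because `index_col` is not set here, `weekday` is read as an ordinary column and pandas adds a default integer index.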

In [31]:
breakfast_tips_df = pd.read_excel('meal_type_tips.xlsx', sheet_name='breakfast')  

breakfast_tips_df.head()

|  | weekday | meal_type | wait_staff | party_size | meal_total | tip |
|---|---|---|---|---|---|---|
| 0 | Monday | Breakfast | Jan | 1 | 10.88 | 1.68 |
| 1 | Friday | Breakfast | Bobby | 4 | 49.33 | 8.07 |
| 2 | Tuesday | Breakfast | Marcia | 1 | 9.71 | 1.99 |
| 3 | Friday | Breakfast | Greg | 3 | 46.44 | 5.54 |
| 4 | Thursday | Breakfast | Jan | 4 | 46.96 | 8.07 |
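
Besides a single name, `sheet_name` also accepts a zero-based sheet position or a list of names. The following sketch shows both forms on a small made-up workbook written to a temporary directory; the file name `pick_sheets.xlsx` and the sample values are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small made-up sample with one row per meal type.
sample_df = pd.DataFrame(
    {"meal_type": ["Breakfast", "Lunch", "Dinner"], "tip": [1.68, 22.04, 16.23]},
    index=pd.Index(["Monday", "Friday", "Saturday"], name="weekday"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "pick_sheets.xlsx"  # hypothetical file name
    with pd.ExcelWriter(path) as writer:
        for meal in ["Breakfast", "Lunch", "Dinner"]:
            rows = sample_df[sample_df.meal_type == meal]
            rows.to_excel(writer, sheet_name=meal.lower())

    # An integer selects a sheet by position (0 is the first sheet).
    first_sheet = pd.read_excel(path, sheet_name=0)

    # A list of names returns a dictionary with only those sheets.
    subset = pd.read_excel(path, sheet_name=["lunch", "dinner"])

print(first_sheet["meal_type"].tolist())
print(list(subset.keys()))
```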
