## Lesson 03 - The Supply Report

#### Overview: 

In this lesson we're going to talk about the following:
* Transposing Data
* Assembling multiple tables into one
* Python Functions

#### Handy References:
* [Official Python Documentation](https://docs.python.org/3/)
* [Jupyter Notebook Documentation](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)
* [Pandas](https://pandas.pydata.org/)
* [XlsxWriter](https://xlsxwriter.readthedocs.io/)

### Transposing Data

The file we're working with is /data/supply.xlsx.  Go ahead and take a quick look at it.  This one has more formatting than the previous files and uses some formulas.

You have three separate sheets, labeled 'March', 'April', and 'May'.
* We want to read all three sheets, transpose them, and combine them into a single DataFrame.
* We will then summarize the data and write it out to a new file.
* We'll also discuss functions as a way to shorten our workflow and reduce retyping the same code.

As always, let's import our tools and define the file we'll be working with.

In [1]:
# File Imports
import pandas as pd
import xlsxwriter
import os

In [2]:
# Define the path to the file
supply_file = os.path.join('..', 'data', 'supply.xlsx')

Now we can read the file in and take a look at the various tables:

In [3]:
frames = pd.read_excel(supply_file, sheet_name=None)

In [4]:
frames['March'].head()

Unnamed: 0,Supplier,Hammermill,Weyerhouser,Georgia-Pacific,Boise,HP Card Stock
0,On-Hand,25,50,35,15,25
1,Incoming,300,300,250,250,300
2,Outgoing,250,275,200,200,300
3,Balance,75,75,85,65,25
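To see what transposing will do to a sheet like this one, here is a minimal sketch on a small in-memory table. The `sheet` DataFrame is a hypothetical stand-in for one sheet of the workbook (only two suppliers), not the real file.

```python
import pandas as pd

# Hypothetical stand-in for one sheet of supply.xlsx (two suppliers only)
sheet = pd.DataFrame({
    'Supplier': ['On-Hand', 'Incoming', 'Outgoing', 'Balance'],
    'Hammermill': [25, 300, 250, 75],
    'Boise': [15, 250, 200, 65],
})

# .T is shorthand for .transpose(): rows become columns and columns become rows
transposed = sheet.T

print(sheet.shape)              # (4, 3)
print(transposed.shape)         # (3, 4)
print(list(transposed.index))   # ['Supplier', 'Hammermill', 'Boise']
```

Notice that the old column names end up in the index, and the row that used to be the 'Supplier' column now holds the labels we eventually want as column headers.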


### Assembling the data into a single DataFrame
* All three sheets are identical to 'March'.
* We want to assemble them all into a single DataFrame that also contains the sheet name, or month.
* To start, we need to define the process that needs to happen to a single DataFrame:
* * Transpose the data with [.transpose()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html)
* * Reset the index with [.reset_index()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html)
* * Set the first row as the columns with df.columns
* * Drop the first row
* * Add the sheet name as the 'Month' Column

That's a lot to do if we have to do it three separate times.  We'll build a reusable [function](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) that takes our dictionary of sheets and a sheet name as arguments, performs the actions listed above, and returns the formatted DataFrame.  You can read more about functions in the linked reference, but they are created like this:

```
def add_two_numbers(a, b):
    result = a + b
    return result
```

We use `def` to tell Python we're creating a function.  In parentheses we put placeholders for the arguments we want to give that function.  Then we add our commands to the function.  The last thing we do is use `return` to define what the function gives us back.  Let's build one now that does what we need for our data. 
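As a quick aside before we do, here is a small sketch of what calling the example function looks like. The variable names `total` and `same_total` are just for illustration.

```python
def add_two_numbers(a, b):
    """Return the sum of a and b."""
    result = a + b
    return result

# Arguments can be passed by position...
total = add_two_numbers(2, 3)

# ...or by name, which can make the call easier to read
same_total = add_two_numbers(a=2, b=3)

print(total)       # 5
print(same_total)  # 5
```

Whatever follows `return` is what the call evaluates to, so it can be stored in a variable or passed straight into another function.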

In [5]:
frames['March'].head()

Unnamed: 0,Supplier,Hammermill,Weyerhouser,Georgia-Pacific,Boise,HP Card Stock
0,On-Hand,25,50,35,15,25
1,Incoming,300,300,250,250,300
2,Outgoing,250,275,200,200,300
3,Balance,75,75,85,65,25


In [6]:
def format_dataframe(frames, sheet_name):
    '''formats data from supply.xlsx''' # This is a docstring that tells us what the function does
    # Define the dataframe
    df = pd.DataFrame(frames[sheet_name])
    
    # Transpose the data and reset the index
    df = df.T.reset_index()
    
    # Set the first row as the columns
    df.columns = df.loc[0]
    
    # Drop the first row
    df.drop([0], inplace=True)
    
    # Add the sheet name as a column
    df['Month'] = sheet_name
    
    # Return the dataframe
    return df

Ok, let's give it a try with 'March':

In [7]:
march_formatted = format_dataframe(frames, 'March')

In [8]:
march_formatted.head()

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
1,Hammermill,25,300,250,75,March
2,Weyerhouser,50,300,275,75,March
3,Georgia-Pacific,35,250,200,85,March
4,Boise,15,250,200,65,March
5,HP Card Stock,25,300,300,25,March
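One side effect worth knowing about: because the original sheet mixes text and numbers, the transposed columns come back with the generic `object` dtype. If later steps need true numeric columns (for sums or comparisons, say), a conversion along these lines may help. This is a sketch on a hypothetical two-supplier table, not a required part of the lesson's workflow.

```python
import pandas as pd

# Hypothetical stand-in for one sheet (two measures, two suppliers)
raw = pd.DataFrame({
    'Supplier': ['On-Hand', 'Incoming'],
    'Hammermill': [25, 300],
    'Boise': [15, 250],
})

# Same steps as format_dataframe: transpose, promote first row to header, drop it
df = raw.T.reset_index()
df.columns = df.loc[0]
df = df.drop([0])

print(df['On-Hand'].dtype)  # object

# Convert each measure column to a numeric dtype
for col in ['On-Hand', 'Incoming']:
    df[col] = pd.to_numeric(df[col])

print(df['On-Hand'].sum())  # 40
```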


#### Using flow control
Our function works and that's great!  What we need is a way to use that function on every sheet in `frames`, add the formatted DataFrame to a list, and finally assemble all of those frames into a single table.  

* We'll start by creating an empty list: `formatted_dfs`
* We will then iterate through `frames.keys()` using a [for loop](https://docs.python.org/3/tutorial/controlflow.html#for-statements) and [append](https://docs.python.org/3/tutorial/datastructures.html) the data to our empty list.

In [9]:
# Create the empty list:
formatted_dfs = []

# Iterate through our frames
for f in frames.keys():
    df = format_dataframe(frames, f)
    # Add the formatted frame to our list
    formatted_dfs.append(df)

In [10]:
# Check out our data:
formatted_dfs[0].head()

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
1,Hammermill,25,300,250,75,March
2,Weyerhouser,50,300,275,75,March
3,Georgia-Pacific,35,250,200,85,March
4,Boise,15,250,200,65,March
5,HP Card Stock,25,300,300,25,March


In [11]:
formatted_dfs[1].head()

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
1,Hammermill,75,250,250,75,April
2,Weyerhouser,75,250,275,50,April
3,Georgia-Pacific,85,200,200,85,April
4,Boise,65,200,200,65,April
5,HP Card Stock,25,300,225,100,April


In [12]:
formatted_dfs[2].head()

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
1,Hammermill,75,200,250,25,May
2,Weyerhouser,50,250,275,25,May
3,Georgia-Pacific,85,175,200,60,May
4,Boise,65,200,200,65,May
5,HP Card Stock,100,250,300,50,May


The next thing we need to do is [concatenate](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html), or combine all of those frames into a single DataFrame.

In [13]:
summary_df = pd.concat(formatted_dfs, ignore_index=True)

In [14]:
summary_df

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
0,Hammermill,25,300,250,75,March
1,Weyerhouser,50,300,275,75,March
2,Georgia-Pacific,35,250,200,85,March
3,Boise,15,250,200,65,March
4,HP Card Stock,25,300,300,25,March
5,Hammermill,75,250,250,75,April
6,Weyerhouser,75,250,275,50,April
7,Georgia-Pacific,85,200,200,85,April
8,Boise,65,200,200,65,April
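9,HP Card Stock,25,300,225,100,April
10,Hammermill,75,200,250,25,May
11,Weyerhouser,50,250,275,25,May
12,Georgia-Pacific,85,175,200,60,May
13,Boise,65,200,200,65,May
14,HP Card Stock,100,250,300,50,May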
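The `ignore_index` argument decides whether the combined table gets a fresh 0-based index or keeps each frame's original labels. A minimal sketch, using two small hypothetical frames in place of the formatted sheets:

```python
import pandas as pd

# Two hypothetical frames that share the same index labels, like our formatted sheets
march = pd.DataFrame({'Supplier': ['Hammermill', 'Boise'], 'Month': 'March'}, index=[1, 2])
april = pd.DataFrame({'Supplier': ['Hammermill', 'Boise'], 'Month': 'April'}, index=[1, 2])

# Keep the original labels: the index repeats for every frame
kept = pd.concat([march, april], ignore_index=False)

# Discard the original labels: the result is renumbered from 0
renumbered = pd.concat([march, april], ignore_index=True)

print(list(kept.index))        # [1, 2, 1, 2]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

A repeated index is harmless if you never look rows up by label, but renumbering usually makes the combined table easier to work with.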


### Using a List Comprehension
Before we move on, let's examine the method we used to get that DataFrame.  We created an empty list, added items to it one at a time, and then performed an action on that list.  That works, but it takes several lines and a separate `append` call for every item.  What we could do is use a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) instead.  It may look a little complex at first, but it builds the same list in a single expression and is usually somewhat faster.  When you move beyond this class and are working with real data, you'll see this pattern often.

* List comprehensions build the new list in a single expression, rather than looping and appending items one at a time.
* You can read more about them using the link above, but for now remember the syntax is:
* * `new_list = [item for item in original_object]`
* * As an example, let's say we want the square of every number in a given list:
```
my_numbers = [1, 2, 3, 4, 5]
squared_nums = [i**2 for i in my_numbers]
```
* Let's change our previous code into a list comprehension: 

In [15]:
# Previous Code:
'''
# Create the empty list:
formatted_dfs = []

# Iterate through our frames
for f in frames.keys():
    df = format_dataframe(frames, f)
    # Add the formatted frame to our list
    formatted_dfs.append(df)
'''

# List Comprehension
formatted_dfs = [format_dataframe(frames, f) for f in frames.keys()]
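To convince yourself the two forms are equivalent, here is a small sketch using plain numbers instead of DataFrames. The optional `if` clause at the end is part of the standard comprehension syntax and is shown only as an illustration; the lesson's code does not need it.

```python
my_numbers = [1, 2, 3, 4, 5]

# Loop-and-append version
squared_loop = []
for i in my_numbers:
    squared_loop.append(i**2)

# Equivalent list comprehension
squared_comp = [i**2 for i in my_numbers]

# A comprehension can also filter items with a trailing condition
even_squares = [i**2 for i in my_numbers if i % 2 == 0]

print(squared_loop == squared_comp)  # True
print(squared_comp)                  # [1, 4, 9, 16, 25]
print(even_squares)                  # [4, 16]
```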

We're going to take it one step further.  Instead of creating the list `formatted_dfs`, we're going to create our assembled DataFrame directly from the list comprehension: 

In [16]:
summary_df = pd.concat([format_dataframe(frames, f) for f in frames.keys()], ignore_index=False)

In [17]:
summary_df.head()

Unnamed: 0,Supplier,On-Hand,Incoming,Outgoing,Balance,Month
1,Hammermill,25,300,250,75,March
2,Weyerhouser,50,300,275,75,March
3,Georgia-Pacific,35,250,200,85,March
4,Boise,15,250,200,65,March
5,HP Card Stock,25,300,300,25,March


That's much shorter and easier to write.  As you learn more about Python, you'll find there are some cases where it's better to use a list comprehension and others where a `for` loop is better.  

### Exercise: Writing the data to Excel
* We're going to be doing more with our supply information in a later lesson.
* For now we want to write `summary_df` to Excel with a few minor changes:
* * The column order should be: Month, Supplier, Incoming, Outgoing, On-Hand
* * We want the Incoming column to be Excel's 'Good' cell format, only if the value is greater than zero
* * We want the Outgoing column to be Excel's 'Neutral' cell format, only if the value is greater than zero
* * We want the On-Hand column to be Excel's 'Bad' cell format if the value is 25 or less

To save time, the Hex color codes are as follows:
```
Good:
- Background: #C6EFCE
- Font: #006100

Neutral
- Background: #FFEB9C
- Font: #9C5700

Bad
- Background: #FFC7CE
- Font: #9C0006
```

For the exercise, complete the code below to write the data to Excel:
* The output file name is `supply_summary.xlsx`
* The sheet name is `Supply Summary`

#### Reorder the columns:

In [19]:
summary_df = summary_df[['Month', 'Supplier', 'Incoming', 'Outgoing', 'On-Hand']]

In [20]:
summary_df.head()

Unnamed: 0,Month,Supplier,Incoming,Outgoing,On-Hand
1,March,Hammermill,300,250,25
2,March,Weyerhouser,300,275,50
3,March,Georgia-Pacific,250,200,35
4,March,Boise,250,200,15
5,March,HP Card Stock,300,300,25


#### Define the workbook:

In [25]:
output_file = os.path.join('..', 'data', 'supply_summary.xlsx')
writer = pd.ExcelWriter(output_file, engine='xlsxwriter')
workbook = writer.book

#### Define the formats:
* The header format is defined for you.
* The number format is `#,##0`

In [26]:
# Define the format for our header:
header_format = workbook.add_format({
    'bold': True, #Bold Font: This value must be either True or False
    'align': 'center', #Center Alignment
    'valign': 'top', #Top Alignment
    'fg_color': '#4472C1', #Cell Color
    'font_color': 'white', #Font Color
    'font_size': 12, #Font Size
})

In [27]:
# Define the format for our numbers:
number_format = workbook.add_format({'num_format': '#,##0'})

#### Color Formats:
* Remember, we need three color formats:
* * Good 
* * Neutral 
* * Bad

In [28]:
good_color = workbook.add_format({'bg_color': '#C6EFCE',
                            'font_color': '#006100'})

neutral_color = workbook.add_format({'bg_color': '#FFEB9C',
                            'font_color': '#9C5700'})

bad_color = workbook.add_format({'bg_color': '#FFC7CE',
                            'font_color': '#9C0006'})

#### Write the Data to Excel:

In [30]:
# Define the sheet and write the data to Excel
sheet = 'Supply Summary'
summary_df.to_excel(writer, sheet_name=sheet, index=False)

In [31]:
# Define the worksheet
worksheet = writer.sheets[sheet]

In [32]:
# Write the headers to the worksheet
for col_num, value in enumerate(summary_df.columns.values):
    worksheet.write(0, col_num, value, header_format)

In [33]:
# Set the column widths and apply the number format to the numerical columns
worksheet.set_column('A:B', 14, None)
worksheet.set_column('C:E', 14, number_format)

0

#### Assign the color formats

In [34]:
# Define rows and columns
first_row = 1
last_row = len(summary_df.index)
incoming_column = 2
outgoing_column = 3
on_hand_column = 4


# Set the color conditions
incoming_col_condition = {
    'type': 'cell', # Because we want to apply the formatting to each individual cell
    'criteria': '>', # for greater than
    'value': 0,
    'format': good_color,
}

outgoing_col_condition = {
    'type': 'cell', 
    'criteria': '>', 
    'value': 0,
    'format': neutral_color,
}

on_hand_col_condition = {
    'type': 'cell', 
    'criteria': '<=', # for less than or equal to
    'value': 25,
    'format': bad_color,
}

In [36]:
# Apply the color formats

# Incoming Column
worksheet.conditional_format(first_row,
                            incoming_column,
                            last_row,
                            incoming_column,
                            incoming_col_condition)

# Outgoing Column
worksheet.conditional_format(first_row,
                            outgoing_column,
                            last_row,
                            outgoing_column,
                            outgoing_col_condition)
# On Hand Column
worksheet.conditional_format(first_row,
                            on_hand_column,
                            last_row,
                            on_hand_column,
                            on_hand_col_condition)

#### Save the File:

In [37]:
# Close the writer, which saves the file (writer.save() was removed in pandas 2.0)
writer.close()
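As an alternative, `pd.ExcelWriter` can be used in a `with` block, which saves and closes the file automatically when the block ends. The sketch below is a condensed, self-contained version of the steps above. It writes a tiny hypothetical table (`demo_df`) to a temporary folder, so the file name and data are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature of summary_df
demo_df = pd.DataFrame({
    'Month': ['March', 'March'],
    'Supplier': ['Hammermill', 'Boise'],
    'On-Hand': [25, 15],
})

# Write to a temporary folder so the sketch does not touch the lesson's data
demo_file = os.path.join(tempfile.mkdtemp(), 'supply_summary_demo.xlsx')

with pd.ExcelWriter(demo_file, engine='xlsxwriter') as writer:
    demo_df.to_excel(writer, sheet_name='Supply Summary', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Supply Summary']

    # Excel's 'Bad' style for On-Hand values of 25 or less (column index 2)
    bad_color = workbook.add_format({'bg_color': '#FFC7CE', 'font_color': '#9C0006'})
    worksheet.conditional_format(
        1, 2, len(demo_df.index), 2,
        {'type': 'cell', 'criteria': '<=', 'value': 25, 'format': bad_color},
    )
# Leaving the with block saves the workbook

print(os.path.exists(demo_file))  # True
```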