# Hello, Jupyter

Jupyter is a popular environment for working with Python.

At a high level, it consists of **cells** which can contain text (like this one)... 

In [3]:
# ... or code, like this one.
# Go ahead and print('Hello from Jupyter') below.
# By the way, the # symbols mark comments meant for human readers,
# not code for Python to run.

print('Hello from Jupyter')

Hello from Jupyter


For a more in-depth look at working with Jupyter notebooks, check out the course resources in the conclusion.

# Reading spreadsheet data into Python

You will usually start working with data in Python by importing it from an external source. 

The `read_excel()` function from `pandas` will be helpful for reading worksheet data into Python. 

For more about working with `pandas`, check out the recommended resources in the conclusion. 

## Demo: `superstore.xlsx`

This workbook contains three worksheets. Let's read each of them into Python and perform some data analysis.

In [3]:
# Import pandas
import pandas as pd

# Read in our worksheet with read_excel()
orders = pd.read_excel("superstore.xlsx")

# Sneak peek of the data with head()
orders.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,7981,CA-2011-103800,2013-01-03,2013-01-07,Standard Class,DP-13000,Darren Powers,Consumer,United States,Houston,...,77095,Central,OFF-PA-10000174,Office Supplies,Paper,"Message Book, Wirebound, Four 5 1/2"" X 4"" Form...",16.448,2,0.2,5.5512
1,740,CA-2011-112326,2013-01-04,2013-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-LA-10003223,Office Supplies,Labels,Avery 508,11.784,3,0.2,4.2717
2,741,CA-2011-112326,2013-01-04,2013-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-ST-10002743,Office Supplies,Storage,SAFCO Boltless Steel Shelving,272.736,3,0.2,-64.7748
3,742,CA-2011-112326,2013-01-04,2013-01-08,Standard Class,PO-19195,Phillina Ober,Home Office,United States,Naperville,...,60540,Central,OFF-BI-10004094,Office Supplies,Binders,GBC Standard Plastic Binding Systems Combs,3.54,2,0.8,-5.487
4,1760,CA-2011-141817,2013-01-05,2013-01-12,Standard Class,MB-18085,Mick Brown,Consumer,United States,Philadelphia,...,19143,East,OFF-AR-10003478,Office Supplies,Art,Avery Hi-Liter EverBold Pen Style Fluorescent ...,19.536,3,0.2,4.884


By default, `pandas.read_excel()` reads only the first worksheet in the workbook. To read a different one, we pass the `sheet_name` argument.

In [5]:
# Read in all three worksheets this time
orders = pd.read_excel("superstore.xlsx", sheet_name='orders')
returns = pd.read_excel("superstore.xlsx", sheet_name='returns')
people = pd.read_excel("superstore.xlsx", sheet_name='people')

# Preview all three `pandas` DataFrames
print(orders.head())
print(returns.head())
print(people.head())

Row ID        Order ID Order Date  Ship Date       Ship Mode Customer ID  \
0    7981  CA-2011-103800 2013-01-03 2013-01-07  Standard Class    DP-13000   
1     740  CA-2011-112326 2013-01-04 2013-01-08  Standard Class    PO-19195   
2     741  CA-2011-112326 2013-01-04 2013-01-08  Standard Class    PO-19195   
3     742  CA-2011-112326 2013-01-04 2013-01-08  Standard Class    PO-19195   
4    1760  CA-2011-141817 2013-01-05 2013-01-12  Standard Class    MB-18085   

   Customer Name      Segment        Country          City  ... Postal Code  \
0  Darren Powers     Consumer  United States       Houston  ...       77095   
1  Phillina Ober  Home Office  United States    Naperville  ...       60540   
2  Phillina Ober  Home Office  United States    Naperville  ...       60540   
3  Phillina Ober  Home Office  United States    Naperville  ...       60540   
4     Mick Brown     Consumer  United States  Philadelphia  ...       19143   

    Region       Product ID         Category Sub-Cate
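
If you would rather load every worksheet in one call, `read_excel()` also accepts `sheet_name=None`, which returns a dictionary of DataFrames keyed by worksheet name. The sketch below first builds a small throwaway workbook so that it runs without `superstore.xlsx`; the file name `demo-sheets.xlsx` and its contents are made up for illustration, and it assumes an Excel engine such as `openpyxl` is installed.

```python
import os
import tempfile

import pandas as pd

# Build a tiny two-sheet workbook in a temporary folder (illustrative data only)
demo_path = os.path.join(tempfile.mkdtemp(), "demo-sheets.xlsx")
with pd.ExcelWriter(demo_path) as writer:
    pd.DataFrame({"id": [1, 2]}).to_excel(writer, sheet_name="orders", index=False)
    pd.DataFrame({"person": ["Ann"]}).to_excel(writer, sheet_name="people", index=False)

# sheet_name=None reads every worksheet into a dict of DataFrames
all_sheets = pd.read_excel(demo_path, sheet_name=None)
print(list(all_sheets.keys()))
```

Each DataFrame can then be pulled out by name, for example `all_sheets["orders"]`.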

In [7]:
# Renaming columns to make them easier to
# operate on in `pandas`
orders.columns = orders.columns.str.lower()
people.columns = people.columns.str.lower()

print(orders.columns)
print(people.columns)

Index(['row id', 'order id', 'order date', 'ship date', 'ship mode',
       'customer id', 'customer name', 'segment', 'country', 'city', 'state',
       'postal code', 'region', 'product id', 'category', 'sub-category',
       'product name', 'sales', 'quantity', 'discount', 'profit'],
      dtype='object')
Index(['person', 'region'], dtype='object')


We will now "look up" the salesperson names into the orders data, find the total sales for each salesperson, and then write that to Excel.

Don't worry too much about the code to manipulate the data in `pandas`.

Instead, focus on the code to read and write data in and out of Excel, the focus of this course.

I will have resources at the conclusion of this book if you would like to learn more about analyzing and manipulating datasets in Python.

In [8]:
# "Look up" the salesperson into the orders data
report = orders.merge(people, on='region', how='left')

# Find total sales by rep
# (select multiple columns with a list, hence the double brackets)
report_agg = report.groupby(['person'])[['sales','profit']].sum()

# Preview our report
report_agg.head()

person,sales,profit
Anna Andreadi,725457.8245,108418.4489
Cassandra Brandow,391721.905,46749.4303
Chuck Magee,678781.24,91522.78
Kelly Williams,501239.8908,39706.3625
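
To see what the merge-then-group pattern does on a scale you can check by hand, here is a minimal sketch on made-up data. The names `orders_demo`, `people_demo` and the figures in them are assumptions for illustration only, not part of `superstore.xlsx`.

```python
import pandas as pd

# Made-up orders: each row belongs to a region
orders_demo = pd.DataFrame({
    "region": ["East", "West", "East"],
    "sales": [100.0, 250.0, 50.0],
    "profit": [10.0, 40.0, -5.0],
})

# Made-up lookup table: one salesperson per region
people_demo = pd.DataFrame({
    "person": ["Ann", "Bob"],
    "region": ["East", "West"],
})

# A left merge keeps every order and attaches the matching person
report_demo = orders_demo.merge(people_demo, on="region", how="left")

# Total sales and profit for each person
totals_demo = report_demo.groupby("person")[["sales", "profit"]].sum()
print(totals_demo)
```

The left merge plays the role of a lookup: every order row is kept, and the `person` column is filled in from the matching `region`.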


We can now write the results of `report_agg` to Excel using the `to_excel()` method. We will specify what to call this file. 

In [44]:
# Let's write this to Excel.

report_agg.to_excel("sales-report.xlsx")

By default, our workbook will be written to the same folder as this file. To customize or change that, check out file paths and directory paths in Python.
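
As one possible way to control the destination, the sketch below builds the output path with `pathlib` and passes it to `to_excel()`. The `demo_report` DataFrame, the `reports` folder and the `summary` sheet name are made up for illustration, and a temporary folder stands in for wherever you would actually save the file. It assumes an Excel writer engine (`openpyxl` or `xlsxwriter`) is installed.

```python
from pathlib import Path
import tempfile

import pandas as pd

# A stand-in for report_agg (made-up numbers)
demo_report = pd.DataFrame({"sales": [100.0, 250.0]}, index=["Ann", "Bob"])

# Choose an output folder; a temporary one is used so the sketch runs anywhere
out_dir = Path(tempfile.mkdtemp()) / "reports"
out_dir.mkdir(parents=True, exist_ok=True)

# Join the folder and file name, then write the workbook there
out_file = out_dir / "sales-report.xlsx"
demo_report.to_excel(out_file, sheet_name="summary")
print(out_file.exists())
```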

# DRILL: `baseball.xlsx`

Now it's your turn to read worksheets into `pandas` DataFrames, operate on them, and export the results back to Excel.

I have completed the code to conduct the data manipulation. You finish the code to read and write the data. 

In [None]:
# Import pandas. We will need it for the data manipulation
import pandas as pd


#  Read the `teams`, `salaries` and `people` worksheets 
#  into DataFrames of the same names.
teams = pd.___('baseball.xlsx', ___)
salaries = ___(___, ___=___)
people = ___(___, ___=___)

In [6]:
# "Look up" first names and 
# last names from the people table into the
# salaries table. 
salaries_report = salaries.merge(people[['playerID','nameFirst','nameLast']],on='playerID',how='left')

# Find total salaries by player.
# This line is completed for you to run. 
salaries_agg = salaries_report.groupby(['playerID','nameFirst','nameLast'])['salary'].sum()

# Preview our report
salaries_agg.head()

playerID   nameFirst  nameLast
aardsda01  David      Aardsma     9259750
aasedo01   Don        Aase        2300000
abadan01   Andy       Abad         327000
abadfe01   Fernando   Abad        3766400
abbotje01  Jeff       Abbott       985000
Name: salary, dtype: int64

In [71]:
# Write this result to an Excel file
# called `salaries-report.xlsx`
___.___(___)

Congrats on moving Excel data in and out of Python using `pandas`! Now, let's look at another, more versatile way for producing Excel reports from Python.

# `xlsxwriter` basics

`pandas` is great for performing automated data analysis and exporting the results of that analysis back to Excel, as you saw in the previous lesson. 

... but what about automating *Excel* itself?

- Freezing panes
- Changing fonts
- Adding charts
- Doing all that formatting stuff that your boss loves

![Spreadsheet design is my passion!](images/spreadsheet-design-is-my-passion.png)

## Enter `xlsxwriter`. 

- A module for creating Excel files
- [See documentation](https://xlsxwriter.readthedocs.io)
 - [Get the PDF guide](https://raw.githubusercontent.com/jmcnamara/XlsxWriter/master/docs/XlsxWriter.pdf) (617 pages long!) We will *just* scratch the surface.

## `xlsxwriter` basics

### Installation

If xlsxwriter is not on your machine, you can install it from Jupyter
with the below code. Only run it once to install!

In [1]:
# Run the below code once to install `xlsxwriter`.
# Do not remove the exclamation mark -- only the #hashtag!

#!pip install xlsxwriter
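
If you are not sure whether `xlsxwriter` is already installed, one way to check before running the install cell is with the standard library's `importlib`. This is only a convenience sketch; the variable name `is_installed` is made up.

```python
import importlib.util

# find_spec() returns None when the module cannot be found
is_installed = importlib.util.find_spec("xlsxwriter") is not None
print("xlsxwriter installed:", is_installed)
```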

## "Hello, world" from `xlsxwriter`

Creating workbooks from `xlsxwriter` is more versatile than `pandas`, but also more involved. 

Here's our basic workflow for `xlsxwriter`:

1. Initialize the workbook
2. Add a worksheet
3. Make your changes
4. Close the workbook

### Loading `xlsxwriter`

While we've installed `xlsxwriter`, we still need to *load* it if we want to use it. This is accomplished with the code `import xlsxwriter`. 

We will need to execute this code each time we start a new Python session.

In [2]:
# Do this with each new session
import xlsxwriter

We are now ready to begin the process. 

Let's write `Hello, world!` in cell `A1` of our workbook. 

We can write to individual cells in Excel with the `write()` method.

Do you remember the steps?

In [None]:
# Initialize a workbook.
# This workbook doesn't exist yet. 
# We are creating it from Python!
# Name the file something sensible
workbook = xlsxwriter.Workbook('hello-world.xlsx')

# Make your changes.
# Write to a given cell with worksheet.write()
worksheet.write('A1', 'Hello, world!')

# Close the workbook
# The file isn't written to disk until you
# close the workbook.
workbook.close()


Check out the resulting workbook, `hello-world.xlsx`.

*Why don't we see "Hello, world" on cell `A1`?*

It's because we forgot a step:

1. Initialize the workbook
2. **Add a worksheet**
3. Make your changes
4. Close the workbook

Let's try this again:

In [None]:
# 1. Initialize workbook
workbook = xlsxwriter.Workbook('hello-world.xlsx')

# 2. Add worksheet. Let's name it 'helloworld.'
worksheet = workbook.add_worksheet('helloworld')

# 3. Make changes
worksheet.write('A1', 'Hello, world!')

# 4. Close the workbook
workbook.close()
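
If you tend to forget the last step, `xlsxwriter` workbooks can also be used in a `with` block, which closes the workbook for you when the block ends. A minimal sketch, writing to a temporary folder under the made-up file name `hello-world-with.xlsx`:

```python
import os
import tempfile

import xlsxwriter

hello_path = os.path.join(tempfile.mkdtemp(), "hello-world-with.xlsx")

# Steps 1 and 4: the with block opens the workbook and closes it on exit
with xlsxwriter.Workbook(hello_path) as workbook:
    worksheet = workbook.add_worksheet("helloworld")   # step 2
    worksheet.write("A1", "Hello, world!")             # step 3

print(os.path.exists(hello_path))
```

The rest of this lesson keeps the explicit `workbook.close()` so that all four steps stay visible.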

# DRILL

1. Place the below steps in the proper order for `xlsxwriter`. Not all steps may be necessary.

- Add a worksheet    
- Make your changes  
- Initialize the workbook  
- Create a new workbook from Excel  
- Close the workbook  

2. Fill out the code below to do the following:  
a. Create a workbook named `my-favorite-things.xlsx` with a worksheet called `favorites`.    
b. In cell `A1`, add your favorite color.  
c. In cell `A2`, your favorite food.  
d. In cell `A3`, your favorite animal.   
e. Close the workbook and admire it.     


In [None]:
# Import xlsxwriter
___ ___

# 1. Initialize workbook
workbook = xlsxwriter.Workbook(___)

# 2. Add worksheet
worksheet = workbook.___(___)

# 3. Make changes
worksheet.___(___, ___)
worksheet.___(___, 'Your favorite food here')
worksheet.write('A3', ___)

# 4. Close the workbook
___

Now that you have the hang of writing individual cells to a worksheet, let's look at writing rows, columns and multiple worksheets to a workbook. 

We will do so with Python *lists*. To learn more about Python data structures, such as lists, check out the resources.

## Adding rows and columns

- We can add whole rows to a worksheet with `write_row()`.  
- We can add whole columns to a worksheet with `write_column()`.

In each case, we need to specify *where* in the workbook the data should be added. 

We have a couple of options for this:

- We can use alphanumeric cell references such as `A1`, `C3`, etc. 
- We can also use `R1C1`-like references, where the first number indicates the row position of the cell and the second the column position. Both numbers are counted from zero.
  - In this convention, `0, 0` is the equivalent of cell `A1`, so `1, 2` indicates `C2`, not `B1`!
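
If you ever lose track of which cell a row and column pair points to, the helper functions in `xlsxwriter.utility` can convert between the two conventions. A small sketch:

```python
from xlsxwriter.utility import xl_cell_to_rowcol, xl_rowcol_to_cell

# Zero-indexed (row, column) to an A1-style reference
print(xl_rowcol_to_cell(0, 0))   # A1
print(xl_rowcol_to_cell(1, 2))   # C2

# A1-style reference back to zero-indexed (row, column)
print(xl_cell_to_rowcol("E5"))   # (4, 4)
```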

In [None]:
# 1. Initialize workbook
workbook = xlsxwriter.Workbook('rows-and-columns.xlsx')

# 2. Add worksheet -- by default will
# use Excel's Sheet1 naming convention
worksheet = workbook.add_worksheet()

# Let's define our rows and columns 
my_row = ['Jack','Jill','Susan','Bobby']
my_col = [0,1,1,2,3,5]

# 3. Make changes
worksheet.write_row('A1', my_row)

# row/column sequence works as well
worksheet.write_column(4,4,my_col)

# 4. Close the workbook.
workbook.close()
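
When the data is a small table (a list of rows), one possible approach is to loop over it and call `write_row()` once per row. The sketch below uses made-up data and a temporary file named `small-table.xlsx`.

```python
import os
import tempfile

import xlsxwriter

table_path = os.path.join(tempfile.mkdtemp(), "small-table.xlsx")
workbook = xlsxwriter.Workbook(table_path)
worksheet = workbook.add_worksheet()

# Made-up data: a header row followed by two records
table = [
    ["name", "grade"],
    ["Jack", 85],
    ["Jill", 90],
]

# enumerate() supplies the zero-indexed row number for each inner list
for row_num, row_values in enumerate(table):
    worksheet.write_row(row_num, 0, row_values)

workbook.close()
```

Because each row starts at column `0`, the inner lists land in `A1:B1`, `A2:B2` and `A3:B3`.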

Be careful that you don't over-write any cells in your workbook!

In [None]:
# 1. Initialize workbook
workbook = xlsxwriter.Workbook('wheres-my-data.xlsx')

# 2. Add worksheet
worksheet = workbook.add_worksheet()

# Let's define our rows and columns 
my_row = [1,2,3,'Boo!']
my_col = [0,1,1,2,3,5]
my_col_2 = [5,2,3]

# 3. Make changes
worksheet.write_row('A1', my_row)
worksheet.write_column(0,0,my_col)
worksheet.write_column('B1', my_col_2)

# 4. Close the workbook.
workbook.close()

## Adding data to multiple worksheets

Thus far, we've only been adding one worksheet to a workbook.

To add multiple, we can call `add_worksheet()` multiple times, assigning the results to multiple *variables* representing each worksheet.

In [None]:
# 1. Initialize workbook
workbook = xlsxwriter.Workbook('multiple-sheets.xlsx')

worksheet1 = workbook.add_worksheet('This sheet')
worksheet2 = workbook.add_worksheet('That sheet')
worksheet3 = workbook.add_worksheet('The other sheet')

# Can you guess what these will do?
worksheet2.write(1,4,'Boo!')
worksheet3.write('A6','Boo who!')

# Always close the workbook when you are done!
workbook.close()
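
With more than a handful of worksheets, separate variables get unwieldy. As a sketch of one alternative, you could keep the worksheets in a list and loop over them. The sheet names and the file name `sheets-from-list.xlsx` are made up for illustration.

```python
import os
import tempfile

import xlsxwriter

multi_path = os.path.join(tempfile.mkdtemp(), "sheets-from-list.xlsx")
workbook = xlsxwriter.Workbook(multi_path)

# Made-up sheet names; add one worksheet per name
sheet_names = ["North", "South", "East"]
sheets = []
for name in sheet_names:
    sheets.append(workbook.add_worksheet(name))

# Write a label to the same cell on every sheet
for sheet in sheets:
    sheet.write("A1", "Region: " + sheet.get_name())

# worksheets() lists the worksheets in the order they were added
created = [ws.get_name() for ws in workbook.worksheets()]
print(created)

workbook.close()
```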

# DRILL

From Python:

- Create a workbook `hello-xlsxwriter.xlsx` with two worksheets:
  - Name the first worksheet `ws_1`. Add a row starting at cell `B3` with the values `23`,`26`,`27` each in a different cell.
  - Name the second worksheet `ws_2`. Add a column from `A1` with values `1`,`2`,`3`,`Hello!` each in a different cell.

I have provided some scaffolding for this exercise below, or try to build it on your own. 


In [None]:
# 1. Initialize workbook
workbook = xlsxwriter.Workbook('hello-xlsxwriter.xlsx')

worksheet1 = ___.___(___)
worksheet2 = ___.___('ws_2')
 
my_row = [23,26,27]
my_col = [1,2,3,'Hello!']


worksheet1.___(___, ___)
worksheet2.___(___,___)

# Always close the workbook when you are done!
___

## Questions?

# Beginning the workbook do-over

Writing data to a workbook from `xlsxwriter` is great, but we've not done much that wouldn't have been possible with `pandas`.

Let's start jazzing up our workbooks using `xlsxwriter`:

- Including cell formulas  
- Changing fonts and colors  
- Freezing panes  
- Adding borders

Let's get started!

## Writing formulas 

We can add Excel formulas to our worksheet using `write_formula()`. We will tell `xlsxwriter` where in the worksheet to write the formula, and what formula to write.

This will work on any standard Excel formula, text or number! 🎉

**You *must* include the `=` sign when writing the Excel formula.**


In [None]:
import xlsxwriter

# Write our workbook
workbook = xlsxwriter.Workbook('add-formula.xlsx')

# Add our worksheet
worksheet = workbook.add_worksheet()

# What do we want to write?
my_numbers = [1,2,3]
my_string = 'Hello, world!'

# Write them
worksheet.write_column('A1', my_numbers)
worksheet.write('D1', my_string)

# Now write formulas to analyze that data
worksheet.write_formula('A4','=SUM(A1:A3)')
worksheet.write_formula('D2','=LEN(D1)')

# 4. Close the workbook.
workbook.close()
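
One detail worth knowing: `xlsxwriter` stores the formula but does not calculate its result. Excel does the calculation when it opens the file. For viewers that do not recalculate, `write_formula()` accepts an optional precomputed value after the cell format. The sketch below assumes we already know the answer (`3`) and peeks inside the saved file to confirm the formula was stored; the file name is made up.

```python
import os
import tempfile
import zipfile

import xlsxwriter

formula_path = os.path.join(tempfile.mkdtemp(), "formula-with-value.xlsx")
workbook = xlsxwriter.Workbook(formula_path)
worksheet = workbook.add_worksheet()

worksheet.write_column("A1", [1, 2, 3])

# Third argument is the cell format (None here); fourth is the precomputed result
worksheet.write_formula("A4", "=MAX(A1:A3)", None, 3)

workbook.close()

# An .xlsx file is a zip archive; read the worksheet XML to see the stored formula
with zipfile.ZipFile(formula_path) as archive:
    sheet_xml = archive.read("xl/worksheets/sheet1.xml").decode("utf-8")

print("MAX(A1:A3)" in sheet_xml)
```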

## Formatting cells from `xlsxwriter`

So far when we've added data to a worksheet we've provided `xlsxwriter` two bits of information:

1. What to write  
2. Where to write it

Now let's add a third bit of information to the mix:

3. How to *format* what is written.

We will do this by assigning a variable (which we will name  `cell_format` by default) and adding this as an argument in our `write` functions.

In [None]:
import xlsxwriter

workbook = xlsxwriter.Workbook('add-formats.xlsx')

worksheet = workbook.add_worksheet()
 
my_numbers = [1,2,3]

# Define a cell format
cell_format = workbook.add_format()
# Toggle bold on for this format
cell_format.set_bold(True)

# Now we can set the format to bold:
worksheet.write_row('A1', my_numbers, cell_format)

# Let's check it out!
workbook.close()

Below is a table of some formats that we will be experimenting with for the remainder of this lesson.

| Method             | Argument taken                                                |
| ------------------ | ------------------------------------------------------------- |
| `set_bold()`       | `True`/`False` (`True` by default)                            |
| `set_font_color()` | A color (e.g. `red`, `blue`, `yellow`)                        |
| `set_font_size()`  | A font size (e.g. 12, 14, 16)                                 |
| `set_font_name()`  | A font name (e.g. `Times New Roman`, `Comic Sans MS`)         |
| `set_border()`     | A border style number (`1` by default; `True` is treated as `1`) |
| `set_top()`        | A border style number (`1` by default; `True` is treated as `1`) |

    
Let's get more creative with our formatting!

In [None]:
import xlsxwriter

workbook = xlsxwriter.Workbook('add-formats.xlsx')

worksheet = workbook.add_worksheet()
 
my_numbers = [1,2,3]
my_strings = ['Having','fun','yet?']
my_python_fun = ['Python', 'IS', 'fun!']

# Define a first cell format
cell_format_1 = workbook.add_format()
# Toggle bold on for this format
cell_format_1.set_bold(True)
# Add a border around this cell format
cell_format_1.set_border(True)

# Define a second cell format
cell_format_2 = workbook.add_format()
# Add color to format
cell_format_2.set_font_color('pink')
# Add custom font to format
cell_format_2.set_font_name('Comic Sans MS')

# Write our formatted data
worksheet.write_row('A1', my_numbers, cell_format_1)
worksheet.write_row('A2',my_strings, cell_format_2)
worksheet.write_column('E1', my_python_fun, cell_format_2)

# Let's check it out!
workbook.close()

## Setting cell formats via a dictionary

It is a pain to keep writing a new line each time we want to add a new cell format. 

Instead, we can add a bunch of properties at the same time using a *dictionary*. This, like a list, is a Python data type. 

Dictionaries are a series of `key: value` pairs enclosed in curly braces, `{}`. For more on how dictionaries work, check out the resources.
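
As a quick illustration in plain Python, with no `xlsxwriter` involved yet, here is a small dictionary. The name `font_settings` is made up for this sketch.

```python
# Keys sit to the left of each colon, values to the right
font_settings = {"font_name": "Segoe UI", "font_size": 12}

# Look up a value by its key
print(font_settings["font_name"])   # Segoe UI

# Add another key-value pair
font_settings["bold"] = True
print(len(font_settings))           # 3
```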

Rather than use `set_bold()` or `set_font_size()`, we will pass `bold` or `font_size` as keys in our dictionary, with the properties we want assigned to them in our values.

For example, we could set the font to Segoe UI and the font size to 12 using a dictionary like this:

In [None]:
workbook = xlsxwriter.Workbook('add-formats-with-dict.xlsx')

worksheet = workbook.add_worksheet()
 
# We will set our formats by passing a dictionary to `add_format()`
cell_format = workbook.add_format({'font_size':12,'font_name':'Segoe UI'})

# To turn on bold formatting, borders, etc, pass True to the dictionary
cell_format_2 = workbook.add_format({'font_color':'blue','bold':True,'border':True})


# Write with our formats
worksheet.write('A1','Hello, world!', cell_format)
worksheet.write('A2', 'Python is fun!', cell_format_2)

# Close and admire
workbook.close()



Check out Chapter 9 ("The Format Class") of the [`xlsxwriter` documentation](https://github.com/jmcnamara/XlsxWriter/blob/master/docs/XlsxWriter.pdf) for further possibilities on formatting cells. 

Let's look at one more helpful bit of cell formatting before moving on.

## Freezing panes

Freezing panes helps tremendously with legibility, especially with larger datasets... which we *will* get to once we eventually start working with `pandas`. 

To do that, we can use the `freeze_panes()` method, which takes two arguments:

- The number of rows that should be frozen, and  
- The number of columns that should be frozen. 

Let's try it out!

In [None]:
workbook = xlsxwriter.Workbook('freeze-panes.xlsx')

worksheet = workbook.add_worksheet()
 
name = ['Jack','Jill','Bobby','Susan']
grade = [85, 90, 99, 88]

worksheet.write_column('A1', name)
worksheet.write_column('B1', grade)

# Freeze panes --
# What do you expect this to do?
worksheet.freeze_panes(1,2)

# Take a look if the panes froze
# as expected!
workbook.close()
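
A very common case is freezing just a header row so that it stays visible while scrolling. A minimal sketch, assuming a made-up two-column table with headers in row 1 and a temporary file named `freeze-header.xlsx`:

```python
import os
import tempfile

import xlsxwriter

freeze_path = os.path.join(tempfile.mkdtemp(), "freeze-header.xlsx")
workbook = xlsxwriter.Workbook(freeze_path)
worksheet = workbook.add_worksheet()

# Header row followed by made-up records
worksheet.write_row("A1", ["name", "grade"])
worksheet.write_row("A2", ["Jack", 85])
worksheet.write_row("A3", ["Jill", 90])

# Freeze the first row only: 1 row, 0 columns
worksheet.freeze_panes(1, 0)

workbook.close()
```

Passing `0` for the second argument leaves every column free to scroll.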

# DRILL

Build on the example above to make a worksheet that looks like the below. Note that this worksheet contains *no* frozen panes.

![Workbook do-over drill solution](images/workbook-do-over-drill-solution.png)

You can use the scaffolding below, or try it out yourself. 

In [None]:
workbook = xlsxwriter.Workbook('freeze-panes.xlsx')

worksheet = workbook.add_worksheet()
 
name = ['Jack','Jill','Bobby','Susan']
grade = [85, 90, 99, 88]

# Write our data
worksheet.___
___

# Set the cell format for our average line
cell_format = workbook.add_format({___:'green','top':True,'bold':___})


# Write our average grade lines
worksheet.___('A5','Average',___)
worksheet.___(___,___,___)


# Close the workbook -- do we have a match?
___

# Questions?