# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Pandas for Exploratory Data Analysis I 
by [@josephofiowa](https://twitter.com/josephofiowa)

Pandas is the most prominent Python library for exploratory data analysis (EDA). The functions Pandas supports are integral to understanding, formatting, and preparing our data. Formally, we use Pandas to investigate, wrangle, munge, and clean our data. Pandas is the Swiss Army Knife of data manipulation!


We'll have two coding-heavy sessions on Pandas. In this one, we'll use Pandas to:
 - Read in a dataset
 - Investigate a dataset's integrity
 - Filter, sort, and manipulate a DataFrame's series

*An important message from our sponsor before going further:*

![](https://media.giphy.com/media/UtxwEhibdd5ss/giphy.gif)

## About the Dataset: Iowa Liquor

For today's Pandas exercises, we will be using a real dataset from the state of Iowa government on liquor sales. Due to state licensing laws, stores must report daily transactions of all alcohol they sell to the Iowa Department of Commerce's Alcoholic Beverages Division. The state of Iowa makes this data available for analysis -- and it is an excellent, structured dataset for our use!

Take a look at the data source [page](https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy).


Let's take a closer look at the data dictionary, or what is included:
- **Invoice/Item Number** - Concatenated invoice and line number associated with the liquor order. This provides a unique identifier for the individual liquor products included in the store order
- **Date** - Date of order 
- **Store Number** - Unique number assigned to the store that ordered the liquor.
- **Store Name** - Name of the store that ordered the liquor.
- **Address** - Address of the store that ordered the liquor
- **City** - City where the store that ordered the liquor is located
- **Zip Code** - Zip Code where the store that ordered the liquor is located
- **Store Location** - Location of the store that ordered the liquor. The Address, City, State and Zip Code are geocoded to provide geographic coordinates. Accuracy of geocoding is dependent on how well the address is interpreted and the completeness of the reference data used.
- **County Number** - Iowa county number for the county where the store that ordered the liquor is located
- **County** - County where the store that ordered the liquor is located
- **Category** - Category code associated with the liquor ordered
- **Category Name** - Category of the liquor ordered.
- **Vendor Number** - The vendor number of the company for the brand of liquor ordered
- **Vendor Name** - The vendor name of the company for the brand of liquor ordered
- **Item Number** - Item number for the individual liquor product ordered.
- **Item Description** - Description of the individual liquor product ordered.
- **Pack** - The number of bottles in a case for the liquor ordered
- **Bottle Volume (mL)** - Volume of each liquor bottle ordered in milliliters.
- **State Bottle Cost** - The amount that Alcoholic Beverages Division paid for each bottle of liquor ordered
- **State Bottle Retail** - The amount the store paid for each bottle of liquor ordered
- **Bottles Sold** - The number of bottles of liquor ordered by the store
- **Sale (Dollars)** - Total cost of liquor order (number of bottles multiplied by the state bottle retail)
- **Volume Sold (Liters)** - Total volume of liquor ordered in liters. (i.e. (Bottle Volume (ml) x Bottles Sold)/1,000)
- **Volume Sold (Gallons)** - Total volume of liquor ordered in gallons. (i.e. (Bottle Volume (ml) x Bottles Sold)/3785.411784)


### Our Modified Iowa Liquor Dataset

Because the full dataset (of all liquor sales from 2012 to present) is greater than 13 million rows (13,948,103+ at the time of writing), **we will work with a modified dataset.**

Our modified dataset has a few key changes:
- Only sales from May 2017 and May 2018 are present
- A number of values have been deliberately deleted (to practice working with missing data!)


## Importing Pandas

To import a library, we simply write `import` followed by the library name. For Pandas, it is common to alias the library as `pd` so that when we reference a function from the Pandas library, we only write `pd` -- not `pandas`.

In [1]:
import pandas as pd

## Reading in Data

Pandas dramatically simplifies the process of reading in data. When we say "reading in data," we mean loading a file into our machine's memory.

When you have a CSV, for example, and then you double-click to open it in Microsoft Excel, the open file is "read into memory." You can now manipulate the CSV.

When we read data into memory in Python, we are creating an object. We will soon explore this object. _(And, as an aside, when we have a file that is greater than the size of our computer's memory, this is approaching "Big Data.")_

Because we are working with a CSV, we will use the [read CSV](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) function. (Note: reading other file formats is also supported.)

In [2]:
# pd refers to "Pandas" (specific encoding for this file -- non-essential for others)
liq = pd.read_csv("../data/iowa_liquor_may_17_18.csv", encoding='cp1252')

*Documentation Pause*

How did we know how to use `pd.read_csv`? Let's take a look at the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). Note the first required argument (`filepath_or_buffer`).
> Take a moment to dissect other arguments and options when reading in data.
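
As a starting point for that exploration, here is a minimal sketch of two optional `read_csv` arguments, `usecols` and `nrows`. The CSV text below is made up for illustration and is read from an in-memory string (via `io.StringIO`) rather than from the course file.

```python
import io

import pandas as pd

# A tiny, made-up CSV standing in for the real file (values are illustrative only)
csv_text = """Date,Store Number,City,Bottles Sold
5/2/17,5286,Iowa City,1
5/1/17,4169,Des Moines,1
5/3/17,2565,Spencer,3
"""

# usecols keeps only the named columns; nrows limits how many data rows are read
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "City", "Bottles Sold"],
    nrows=2,
)

print(sample.shape)  # (2, 3)
print(list(sample.columns))
```

Reading only a few rows or columns like this can be a handy way to peek at a large file before loading all of it.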

We have just created a data structure called a `DataFrame`. See?

In [3]:
type(liq)

pandas.core.frame.DataFrame

## Inspecting our DataFrame: The basics

We'll now perform basic operations on the DataFrame, denoted with comments.

In [4]:
# print the first and last 30 rows
liq

Unnamed: 0,Date,Store Number,Store Name,City,Zip Code,Store Location,County,Category Name,Vendor Name,Item Number,...,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.00,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.20,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.60,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.60,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0
5,5/3/17,5105,Three Brothers Liquors,North Liberty,52317.0,585 HIGHWAY 965\rNorth Liberty 52317\r(41.7381...,JOHNSON,American Vodka,A V BRANDS INC,937040,...,6,750,$21.99,$32.99,2,$135.00,1.50,0.39,1,0
6,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Brandies,CONSTELLATION BRANDS INC,53214,...,24,375,$3.22,$4.83,1,$115.92,0.37,0.09,1,0
7,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Vodkas,Laird & Company,35914,...,24,375,$1.93,$2.90,1,$69.60,0.37,0.09,1,0
8,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34359,...,12,200,$5.00,$7.50,1,$90.00,0.20,0.05,1,0
9,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34423,...,12,375,$9.00,$13.50,1,$162.00,0.37,0.09,1,0


In [5]:
# print the first five rows
liq.head()

Unnamed: 0,Date,Store Number,Store Name,City,Zip Code,Store Location,County,Category Name,Vendor Name,Item Number,...,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.0,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.2,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0


Notice that `.head()` is a method (denoted by parentheses), so it can take arguments.

**Class Question:** What do you think changes if we pass a different number as the `head()` argument?

In [6]:
# try 10 as an argument
liq.head(10)

Unnamed: 0,Date,Store Number,Store Name,City,Zip Code,Store Location,County,Category Name,Vendor Name,Item Number,...,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.0,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.2,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0
5,5/3/17,5105,Three Brothers Liquors,North Liberty,52317.0,585 HIGHWAY 965\rNorth Liberty 52317\r(41.7381...,JOHNSON,American Vodka,A V BRANDS INC,937040,...,6,750,$21.99,$32.99,2,$135.00,1.5,0.39,1,0
6,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Brandies,CONSTELLATION BRANDS INC,53214,...,24,375,$3.22,$4.83,1,$115.92,0.37,0.09,1,0
7,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Vodkas,Laird & Company,35914,...,24,375,$1.93,$2.90,1,$69.60,0.37,0.09,1,0
8,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34359,...,12,200,$5.00,$7.50,1,$90.00,0.2,0.05,1,0
9,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34423,...,12,375,$9.00,$13.50,1,$162.00,0.37,0.09,1,0


In [7]:
# print the last five rows
liq.tail()

Unnamed: 0,Date,Store Number,Store Name,City,Zip Code,Store Location,County,Category Name,Vendor Name,Item Number,...,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),is_may_2017,is_may_2018
427918,5/31/18,2595,Hy-Vee Wine and Spirits / Denison,Denison,51442.0,"1620 4th Ave, South\rDenison 51442\r(42.012395...",CRAWFORD,American Vodkas,McCormick Distilling Co.,36908,...,6,1750,$7.47,$11.21,1,$67.26,1.75,0.46,0,1
427919,5/31/18,2595,Hy-Vee Wine and Spirits / Denison,Denison,51442.0,"1620 4th Ave, South\rDenison 51442\r(42.012395...",CRAWFORD,American Vodkas,SAZERAC COMPANY INC,36978,...,6,1750,$6.92,$10.38,1,$62.28,1.75,0.46,0,1
427920,5/31/18,2595,Hy-Vee Wine and Spirits / Denison,Denison,51442.0,"1620 4th Ave, South\rDenison 51442\r(42.012395...",CRAWFORD,Iowa Distilleries,Infinium Spirits,27125,...,6,750,$22.75,$34.13,1,$204.78,0.75,0.19,0,1
427921,5/31/18,2595,Hy-Vee Wine and Spirits / Denison,Denison,51442.0,"1620 4th Ave, South\rDenison 51442\r(42.012395...",CRAWFORD,Blended Whiskies,DIAGEO AMERICAS,25607,...,12,1000,$8.00,$12.00,1,$144.00,1.0,0.26,0,1
427922,5/31/18,2595,Hy-Vee Wine and Spirits / Denison,Denison,51442.0,"1620 4th Ave, South\rDenison 51442\r(42.012395...",CRAWFORD,Blended Whiskies,DIAGEO AMERICAS,25608,...,6,1750,$11.96,$17.94,1,$107.64,1.75,0.46,0,1


In [8]:
# identify the shape (rows by columns)
liq.shape

(427923, 21)

Wow! Hundreds of thousands of rows and tens of columns!

In [9]:
# display the index
liq.index

RangeIndex(start=0, stop=427923, step=1)

In [10]:
# print the columns
liq.columns

Index(['Date', 'Store Number', 'Store Name', 'City', 'Zip Code',
       'Store Location', 'County', 'Category Name', 'Vendor Name',
       'Item Number', 'Item Description', 'Pack', 'Bottle Volume (ml)',
       'State Bottle Cost', 'State Bottle Retail', 'Bottles Sold',
       'Sale (Dollars)', 'Volume Sold (Liters)', 'Volume Sold (Gallons)',
       'is_may_2017', 'is_may_2018'],
      dtype='object')

In [11]:
# examine the datatypes of the columns
liq.dtypes

Date                      object
Store Number               int64
Store Name                object
City                      object
Zip Code                 float64
Store Location            object
County                    object
Category Name             object
Vendor Name               object
Item Number                int64
Item Description          object
Pack                       int64
Bottle Volume (ml)        object
State Bottle Cost         object
State Bottle Retail       object
Bottles Sold               int64
Sale (Dollars)            object
Volume Sold (Liters)     float64
Volume Sold (Gallons)    float64
is_may_2017                int64
is_may_2018                int64
dtype: object

**Class Question:** Why do datatypes matter? What operations could we perform on some datatypes that we could not on others? Note the importance of this in checking dataset integrity.
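
One possible illustration, as a sketch on a hypothetical three-row frame that mimics two of the columns above: an integer column can be summed directly, while a dollar-amount column stored as text (`object`) has to be cleaned and converted first. The sample values are made up.

```python
import pandas as pd

# Hypothetical mini-frame mirroring two columns from the output above
prices = pd.DataFrame({
    "State Bottle Cost": ["$4.40", "$1.56", "$6.00"],  # text (object dtype)
    "Bottles Sold": [1, 1, 3],                          # integers
})

# Summing an integer column gives a number right away
total_bottles = prices["Bottles Sold"].sum()

# The dollar amounts are strings, so strip the "$" and convert to float first
cost = prices["State Bottle Cost"].str.replace("$", "", regex=False).astype(float)
total_cost = cost.sum()

print(total_bottles)
print(round(total_cost, 2))
```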

## Selecting a Column

We can select columns in two ways. Either we treat the column as an attribute of the DataFrame or we index the DataFrame for a specific element (in this case, the element is a column name).

In [12]:
# select the county column
liq['County']

0            JOHNSON
1               POLK
2         POTTAWATTA
3         POTTAWATTA
4               CLAY
5            JOHNSON
6               POLK
7               POLK
8               POLK
9               POLK
10              POLK
11              POLK
12              POLK
13              POLK
14              POLK
15              POLK
16              POLK
17              POLK
18              POLK
19              POLK
20              POLK
21              POLK
22              POLK
23              POLK
24              POLK
25              POLK
26              POLK
27              POLK
28              POLK
29              POLK
             ...    
427893      CRAWFORD
427894      CRAWFORD
427895      CRAWFORD
427896      CRAWFORD
427897      CRAWFORD
427898      CRAWFORD
427899      CRAWFORD
427900      CRAWFORD
427901      CRAWFORD
427902      CRAWFORD
427903      CRAWFORD
427904      CRAWFORD
427905      CRAWFORD
427906      CRAWFORD
427907      CRAWFORD
427908      CRAWFORD
427909      C

In [13]:
# this is equivalent
liq.County

0            JOHNSON
1               POLK
2         POTTAWATTA
3         POTTAWATTA
4               CLAY
5            JOHNSON
6               POLK
7               POLK
8               POLK
9               POLK
10              POLK
11              POLK
12              POLK
13              POLK
14              POLK
15              POLK
16              POLK
17              POLK
18              POLK
19              POLK
20              POLK
21              POLK
22              POLK
23              POLK
24              POLK
25              POLK
26              POLK
27              POLK
28              POLK
29              POLK
             ...    
427893      CRAWFORD
427894      CRAWFORD
427895      CRAWFORD
427896      CRAWFORD
427897      CRAWFORD
427898      CRAWFORD
427899      CRAWFORD
427900      CRAWFORD
427901      CRAWFORD
427902      CRAWFORD
427903      CRAWFORD
427904      CRAWFORD
427905      CRAWFORD
427906      CRAWFORD
427907      CRAWFORD
427908      CRAWFORD
427909      C

**Class Question:** What if we wanted to select a column that has a space in it? Which method from the above two would we use? Why?

## Renaming Columns

Perhaps we want to rename our columns, like replacing spaces with underscores. There are a few options for doing this.

In [14]:
# rename one or more columns with a dictionary. Note: inplace = True
liq.rename(columns={'Store Number': 'Store_Number', 'Store Name':'Store_Name'}, inplace=True)

In [15]:
# check the result
liq.head()

Unnamed: 0,Date,Store_Number,Store_Name,City,Zip Code,Store Location,County,Category Name,Vendor Name,Item Number,...,Pack,Bottle Volume (ml),State Bottle Cost,State Bottle Retail,Bottles Sold,Sale (Dollars),Volume Sold (Liters),Volume Sold (Gallons),is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.0,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.2,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0


Bulk rename columns with a list of new column names.

In [16]:
liq.columns

Index(['Date', 'Store_Number', 'Store_Name', 'City', 'Zip Code',
       'Store Location', 'County', 'Category Name', 'Vendor Name',
       'Item Number', 'Item Description', 'Pack', 'Bottle Volume (ml)',
       'State Bottle Cost', 'State Bottle Retail', 'Bottles Sold',
       'Sale (Dollars)', 'Volume Sold (Liters)', 'Volume Sold (Gallons)',
       'is_may_2017', 'is_may_2018'],
      dtype='object')

In [17]:
# declare a list of strings - these strings will become the new column names
cols = ['date', 'store_number', 'store_name', 'city', 
        'zip_code', 'location', 'county', 'category_name',
        'vendor_name', 'item_number', 'item_description', 'pack', 
       'bottle_vol_ml', 'state_bottle_cost', 'state_bottle_retail', 'bottles_sold',
       'sale', 'volumne_sold_l', 'volume_sold_gal', 'is_may_2017', 'is_may_2018']

In [18]:
# use that list to rename our columns - inplace by default
liq.columns = cols

In [19]:
# check out the result!
liq.head()

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.0,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.2,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0
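
Typing every new name by hand is one option. As an alternative sketch (not what this lesson's notebook does), column names can also be cleaned programmatically with the string methods available on `df.columns`. The frame `df` below is a hypothetical stand-in with just three of the original column names.

```python
import pandas as pd

# Hypothetical empty frame carrying a few of the original column names
df = pd.DataFrame(columns=["Store Number", "Zip Code", "Sale (Dollars)"])

# Lowercase every name, then replace spaces with underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

print(list(df.columns))
```

Note that this keeps characters such as parentheses, so the hand-written list above still gives more control over the final names.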


## Notable Column Operations

While this list is not comprehensive, these are a few key column-specific data checks.


**Five-number summary:**  the minimum, first quartile, median, third quartile, and maximum.

(And more! The mean too.)

Five-number summary (assumes numeric data):
- **Min:** The smallest value in the column
- **Max:** The largest value in the column
- **Quartile:** Quartiles split the sorted data into four equal-sized parts
    - **First quartile:** The value below which the lowest 25 percent of the data falls (the 25th percentile)
    - **Median:** The middle value. (Line up all values from smallest to largest - the median is the middle!) Also the 50th percentile
    - **Third quartile:** The value below which 75 percent of the data falls (the 75th percentile)
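
To make these definitions concrete, here is a small sketch that computes each piece of the five-number summary on a made-up Series. The values 1 through 9 were chosen so the quartiles are easy to verify by hand; `five_num` is just an illustrative name.

```python
import pandas as pd

# Made-up values, chosen so the quartiles are easy to check by hand
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

five_num = {
    "min": s.min(),
    "q1": s.quantile(0.25),   # first quartile (25th percentile)
    "median": s.median(),     # 50th percentile
    "q3": s.quantile(0.75),   # third quartile (75th percentile)
    "max": s.max(),
}

for name, value in five_num.items():
    print(name, value)
```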


![](https://www.mathsisfun.com/data/images/quartiles-a.svg)

In [20]:
# note - by *default*, describe only summarizes numeric datatypes
liq.describe()

Unnamed: 0,store_number,zip_code,item_number,pack,bottles_sold,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
count,427923.0,427909.0,427923.0,427923.0,427923.0,427923.0,427923.0,427923.0,427923.0
mean,3789.924475,51275.466083,46367.24879,12.423403,2.251461,1.984745,0.518811,0.489308,0.510692
std,1090.596072,986.858053,52774.045141,7.849579,3.976579,6.175569,1.631609,0.499886,0.499886
min,2106.0,50002.0,139.0,1.0,0.0,0.01,0.0,0.0,0.0
25%,2616.0,50317.0,27125.0,6.0,1.0,0.75,0.19,0.0,0.0
50%,3849.0,51104.0,38177.0,12.0,1.0,1.5,0.39,0.0,1.0
75%,4802.0,52314.0,64762.0,12.0,3.0,2.0,0.52,1.0,1.0
max,9937.0,56201.0,998546.0,48.0,315.0,551.25,145.62,1.0,1.0


> To yourself: consider the **skew** of your series. When does the mean differ significantly from the median? Why does this matter?

In [21]:
# enrichment
# liq.describe(include='all')

**Value Counts:** count the occurrence of each value within our series.

In [22]:
liq.columns

Index(['date', 'store_number', 'store_name', 'city', 'zip_code', 'location',
       'county', 'category_name', 'vendor_name', 'item_number',
       'item_description', 'pack', 'bottle_vol_ml', 'state_bottle_cost',
       'state_bottle_retail', 'bottles_sold', 'sale', 'volumne_sold_l',
       'volume_sold_gal', 'is_may_2017', 'is_may_2018'],
      dtype='object')

In [23]:
# show the most frequent store purchasers
liq.store_name.value_counts()

Hy-Vee #3 / BDI / Des Moines                 3946
Central City 2                               3244
Hy-Vee Food Store / Cedar Falls              3165
Hy-Vee Wine and Spirits / Bettendorf         2646
Hy-Vee Food Store / Coralville               2418
Hy-Vee Food Store / Spencer                  2339
Hy-Vee Food Store #1 / Ames                  2338
Hy-Vee Food Store / Muscatine                2284
Hy-Vee Wine and Spirits / Iowa City          2169
Hy-Vee Food Store #5 / Cedar Rapids          2152
Cyclone Liquors                              2108
Central City Liquor, Inc.                    2056
Hy-Vee Food Store #1 / Mason City            2051
Hy-Vee Food Store #3 / Sioux City            2007
Hy-Vee #7 / Cedar Rapids                     1987
Hy-Vee Wine and Spirits / WDM                1984
Iowa Street Market, Inc.                     1972
Hy-Vee / Waukee                              1942
Hy-Vee Food Store #3 / Cedar Rapids          1932
Hy-Vee Food and Drug / Clinton               1918


In [24]:
# same thing - but only the top 10, using the slicing we learned in days 1-3
liq.store_name.value_counts()[:10]

Hy-Vee #3 / BDI / Des Moines            3946
Central City 2                          3244
Hy-Vee Food Store / Cedar Falls         3165
Hy-Vee Wine and Spirits / Bettendorf    2646
Hy-Vee Food Store / Coralville          2418
Hy-Vee Food Store / Spencer             2339
Hy-Vee Food Store #1 / Ames             2338
Hy-Vee Food Store / Muscatine           2284
Hy-Vee Wine and Spirits / Iowa City     2169
Hy-Vee Food Store #5 / Cedar Rapids     2152
Name: store_name, dtype: int64
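
Two related variations, sketched on a made-up Series of store labels: `.head(n)` gives the same top-n result as slicing, and `normalize=True` returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical store labels
stores = pd.Series(["A", "B", "A", "C", "A", "B"])

counts = stores.value_counts()                 # sorted from most to least frequent
top_two = counts.head(2)                       # same idea as counts[:2]
shares = stores.value_counts(normalize=True)   # fraction of rows per value

print(top_two)
print(shares["A"])  # 0.5
```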

**Unique values:** Determine the number of distinct values within a given series.

In [25]:
# how many distinct stores are there in Iowa?
liq.store_name.nunique()

1583

In [26]:
# what are those stores called? (notice: too many to print them all!)
liq.store_name.unique()

array(['Sauce', 'Super Quick 2 / Hubbell',
       'Hy-Vee Drugstore / Council Bluffs', ...,
       'Liberty View Wine and Spirits',
       "Casey's General Store #1901 / Des Moines",
       'CVS Pharmacy #10162 / Des Moines'], dtype=object)

## Filtering

Filtering and sorting are key processes that allow us to drill into the 'nitty gritty' and cross sections of our dataset.

To filter, we use a process called **Boolean Filtering**, wherein we define a Boolean condition, and use that Boolean condition to filter our DataFrame.

Recall: our dataset has a column, `is_may_2017`, that flags whether a given sale occurred in May 2017. The column is equal to 1 when the sale did occur in May 2017, and zero otherwise (which means it occurred in May 2018, as those are the only two months in our dataset).

Let's calculate the **average pack size** for sales in May 2017.

> Think: What are the component parts of this problem?

In [27]:
# First, create a Boolean filter for is_may_2017
liq.is_may_2017 == 1

0          True
1          True
2          True
3          True
4          True
5          True
6          True
7          True
8          True
9          True
10         True
11         True
12         True
13         True
14         True
15         True
16         True
17         True
18         True
19         True
20         True
21         True
22         True
23         True
24         True
25         True
26         True
27         True
28         True
29         True
          ...  
427893    False
427894    False
427895    False
427896    False
427897    False
427898    False
427899    False
427900    False
427901    False
427902    False
427903    False
427904    False
427905    False
427906    False
427907    False
427908    False
427909    False
427910    False
427911    False
427912    False
427913    False
427914    False
427915    False
427916    False
427917    False
427918    False
427919    False
427920    False
427921    False
427922    False
Name: is_may_2017, Lengt

In [28]:
# use that Boolean filter to index the DataFrame, only printing rows where True
liq[liq.is_may_2017 == 1]

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.00,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.20,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.60,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.60,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0
5,5/3/17,5105,Three Brothers Liquors,North Liberty,52317.0,585 HIGHWAY 965\rNorth Liberty 52317\r(41.7381...,JOHNSON,American Vodka,A V BRANDS INC,937040,...,6,750,$21.99,$32.99,2,$135.00,1.50,0.39,1,0
6,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Brandies,CONSTELLATION BRANDS INC,53214,...,24,375,$3.22,$4.83,1,$115.92,0.37,0.09,1,0
7,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Vodkas,Laird & Company,35914,...,24,375,$1.93,$2.90,1,$69.60,0.37,0.09,1,0
8,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34359,...,12,200,$5.00,$7.50,1,$90.00,0.20,0.05,1,0
9,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34423,...,12,375,$9.00,$13.50,1,$162.00,0.37,0.09,1,0


Great! This DataFrame is only sales in May 2017!

Now, we need to calculate the average pack size of this DataFrame.

In [29]:
# get the pack column data...
liq[liq.is_may_2017 == 1].pack

0         12
1         48
2         10
3         10
4          6
5          6
6         24
7         24
8         12
9         12
10        24
11        12
12         6
13        24
14        12
15        12
16         6
17        24
18        12
19         6
20        12
21        12
22        12
23        12
24        12
25        12
26        12
27        12
28        48
29        24
          ..
209356     6
209357     6
209358    12
209359    12
209360    12
209361    12
209362     6
209363    12
209364    12
209365    12
209366    12
209367    12
209368    12
209369     6
209370     6
209371    12
209372    12
209373     6
209374    12
209375    12
209376    12
209377    12
209378    12
209379    12
209380    12
209381    12
209382    12
209383    12
209384    12
209385    12
Name: pack, Length: 209386, dtype: int64

In [30]:
# now calculate the average!
liq[liq.is_may_2017 == 1].pack.describe()

count    209386.000000
mean         12.414278
std           7.754952
min           1.000000
25%           6.000000
50%          12.000000
75%          12.000000
max          48.000000
Name: pack, dtype: float64

**The average pack size for sales in May 2017 is 12.41!**
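
As a sketch of two common variations on this pattern, the example below uses a hypothetical five-row frame with the renamed columns. It shows `.loc` with a Boolean mask plus a column label to get the mean in one step, and `&` to combine two conditions.

```python
import pandas as pd

# Hypothetical mini version of the renamed DataFrame (values are made up)
df = pd.DataFrame({
    "pack": [12, 48, 6, 6, 24],
    "bottles_sold": [1, 1, 3, 2, 1],
    "is_may_2017": [1, 1, 1, 0, 0],
})

# .loc takes a Boolean mask for the rows and a label for the column
avg_pack_2017 = df.loc[df["is_may_2017"] == 1, "pack"].mean()

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
multi = df[(df["is_may_2017"] == 1) & (df["bottles_sold"] > 1)]

print(avg_pack_2017)  # 22.0
print(len(multi))     # 1
```

Calling `.mean()` returns only the average, whereas `.describe()` returns the full summary shown above.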

## Sorting

We can also sort our DataFrame by the values in one of its columns.

In [31]:
# let's sort by the largest number of bottles sold in a single order
liq.sort_values(by='bottles_sold', ascending = False)

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
285425,5/10/18,3814,Costco Wholesale #788 / WDM,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,American Vodkas,Levecke Corporation,936600,...,6,1750,$8.93,$13.40,315,$80.40,551.25,145.62,0,1
256127,5/8/18,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,Cocktails /RTD,Levecke Corporation,962094,...,6,1750,$6.00,$9.00,270,$54.00,472.50,124.82,0,1
256128,5/8/18,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,American Vodkas,Levecke Corporation,936600,...,6,1750,$8.93,$13.40,270,$80.40,472.50,124.82,0,1
136200,5/22/17,3814,Costco Wholesale #788,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,Imported Vodka,MISA Imports Inc,987514,...,6,1750,$14.48,$21.72,240,$130.32,420.00,110.95,1,0
285424,5/10/18,3814,Costco Wholesale #788 / WDM,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,Cocktails /RTD,Levecke Corporation,962094,...,6,1750,$6.00,$9.00,225,$54.00,393.75,104.01,0,1
136197,5/22/17,3814,Costco Wholesale #788,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,100% Agave Tequila,MISA Imports Inc,989289,...,6,1750,$14.90,$22.35,224,$134.10,392.00,103.55,1,0
100130,5/16/17,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.00,92.46,1,0
38373,5/4/17,2593,Hy-Vee Food Store / Carroll,Carroll,51401.0,905 US Highway 30 West\rCarroll 51401\r(42.070...,CARROLL,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.00,92.46,1,0
223994,5/2/18,4319,Fareway Stores #703 / Humbolt,Humboldt,50548.0,1700 10th Avenue N\rHumboldt 50548\r(42.731879...,HUMBOLDT,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.00,92.46,0,1
76245,5/11/17,3420,Sam's Club 6344 / Windsor Heights,Windsor Heights,50311.0,1101 73RD STREET\rWindsor Heights 50311\r(41.5...,Polk,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.00,92.46,1,0


In [32]:
# same sort as above, but keep just the top 10 rows
liq.sort_values(by='bottles_sold', ascending=False).head(10)

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
285425,5/10/18,3814,Costco Wholesale #788 / WDM,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,American Vodkas,Levecke Corporation,936600,...,6,1750,$8.93,$13.40,315,$80.40,551.25,145.62,0,1
256127,5/8/18,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,Cocktails /RTD,Levecke Corporation,962094,...,6,1750,$6.00,$9.00,270,$54.00,472.5,124.82,0,1
256128,5/8/18,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,American Vodkas,Levecke Corporation,936600,...,6,1750,$8.93,$13.40,270,$80.40,472.5,124.82,0,1
136200,5/22/17,3814,Costco Wholesale #788,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,Imported Vodka,MISA Imports Inc,987514,...,6,1750,$14.48,$21.72,240,$130.32,420.0,110.95,1,0
285424,5/10/18,3814,Costco Wholesale #788 / WDM,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,Cocktails /RTD,Levecke Corporation,962094,...,6,1750,$6.00,$9.00,225,$54.00,393.75,104.01,0,1
136197,5/22/17,3814,Costco Wholesale #788,West Des Moines,50266.0,7205 Mills Civic Pkwy\rWest Des Moines 50266\r...,Dallas,100% Agave Tequila,MISA Imports Inc,989289,...,6,1750,$14.90,$22.35,224,$134.10,392.0,103.55,1,0
100130,5/16/17,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.0,92.46,1,0
38373,5/4/17,2593,Hy-Vee Food Store / Carroll,Carroll,51401.0,905 US Highway 30 West\rCarroll 51401\r(42.070...,CARROLL,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.0,92.46,1,0
223994,5/2/18,4319,Fareway Stores #703 / Humbolt,Humboldt,50548.0,1700 10th Avenue N\rHumboldt 50548\r(42.731879...,HUMBOLDT,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.0,92.46,0,1
76245,5/11/17,3420,Sam's Club 6344 / Windsor Heights,Windsor Heights,50311.0,1101 73RD STREET\rWindsor Heights 50311\r(41.5...,Polk,Canadian Whiskies,CONSTELLATION BRANDS INC,11788,...,6,1750,$10.45,$15.68,200,$94.08,350.0,92.46,1,0
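
Several rows above tie at 200 bottles, so it can help to sort by a second column to break ties. The sketch below uses a small made-up DataFrame (the `toy` frame, store names, and values are assumptions for illustration) to show sorting by two columns, plus `nlargest`, a shortcut for "sort descending, then take the top n".

```python
import pandas as pd

# hypothetical data; only the column names mirror our dataset
toy = pd.DataFrame({
    'store_name': ['A', 'B', 'C', 'D'],
    'bottles_sold': [200, 315, 200, 12],
    'volume_sold_gal': [92.46, 145.62, 80.00, 2.38],
})

# sort by bottles_sold first, then break ties with volume_sold_gal
ranked = toy.sort_values(by=['bottles_sold', 'volume_sold_gal'],
                         ascending=[False, False])
print(ranked.store_name.tolist())  # ['B', 'A', 'C', 'D']

# nlargest(n, column) returns the n rows with the largest values
top_2 = toy.nlargest(2, 'bottles_sold')
print(top_2.store_name.tolist())  # ['B', 'A']
```

Note that `sort_values` returns a new DataFrame; `toy` itself keeps its original row order unless we assign the result back.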


## Independent Exercises

Do your best to complete the following prompts. Don't hesitate to look at code we wrote together!

Print the first 15 rows of the whole DataFrame.

In [33]:
# your answer here
liq.head(15)

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
0,5/2/17,5286,Sauce,Iowa City,52240.0,"108, College\rIowa City 52240\r",JOHNSON,Blended Whiskies,Laird & Company,23827,...,12,1000,$4.40,$6.60,1,$79.20,1.0,0.26,1,0
1,5/1/17,4169,Super Quick 2 / Hubbell,Des Moines,50317.0,1824 Hubbell Ave\rDes Moines 50317\r,POLK,Canadian Whiskies,CONSTELLATION BRANDS INC,11773,...,48,200,$1.56,$2.34,1,$112.32,0.2,0.05,1,0
2,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84207,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
3,5/1/17,2641,Hy-Vee Drugstore / Council Bluffs,Council Bluffs,51501.0,757 W Broadway\rCouncil Bluffs 51501\r(41.2616...,POTTAWATTA,American Cordials & Liqueur,SAZERAC NORTH AMERICA,84197,...,10,600,$6.00,$9.00,1,$9.00,0.6,0.15,1,0
4,5/3/17,2565,Hy-Vee Food Store / Spencer,Spencer,51301.0,"819 N Grand Ave\rSpencer 51301\r(43.145897, -9...",CLAY,Mixto Tequila,LUXCO INC,89448,...,6,1750,$12.00,$18.00,3,$18.00,5.25,1.38,1,0
5,5/3/17,5105,Three Brothers Liquors,North Liberty,52317.0,585 HIGHWAY 965\rNorth Liberty 52317\r(41.7381...,JOHNSON,American Vodka,A V BRANDS INC,937040,...,6,750,$21.99,$32.99,2,$135.00,1.5,0.39,1,0
6,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Brandies,CONSTELLATION BRANDS INC,53214,...,24,375,$3.22,$4.83,1,$115.92,0.37,0.09,1,0
7,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,American Vodkas,Laird & Company,35914,...,24,375,$1.93,$2.90,1,$69.60,0.37,0.09,1,0
8,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34359,...,12,200,$5.00,$7.50,1,$90.00,0.2,0.05,1,0
9,5/1/17,4301,Sahota Food Mart,Des Moines,50320.0,"1805 SE 14th St\rDes Moines 50320\r(41.57222, ...",POLK,Imported Vodkas,BACARDI USA INC,34423,...,12,375,$9.00,$13.50,1,$162.00,0.37,0.09,1,0


Identify the average number of bottles sold.

In [34]:
# your answer here
liq.bottles_sold.describe()

count    427923.000000
mean          2.251461
std           3.976579
min           0.000000
25%           1.000000
50%           1.000000
75%           3.000000
max         315.000000
Name: bottles_sold, dtype: float64

In [35]:
# option 2
liq.bottles_sold.mean()

2.251461127352351

Identify the number of unique pack sizes.

In [36]:
# your answer here
liq.pack.nunique()

15
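
`nunique()` only tells us how many distinct values there are. To see which values they are, or how often each appears, `unique()` and `value_counts()` are the companions. Here is a small sketch on a made-up Series (the values are invented, not taken from `liq`):

```python
import pandas as pd

# hypothetical pack sizes, for illustration only
packs = pd.Series([6, 12, 12, 24, 6, 12], name='pack')

# how many distinct values?
n_sizes = packs.nunique()
print(n_sizes)  # 3

# which distinct values? (unique() returns an array, so convert to a list)
sizes = sorted(packs.unique().tolist())
print(sizes)  # [6, 12, 24]

# how many rows have each value? (most common first)
counts = packs.value_counts()
print(counts.loc[12])  # 3
```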

Determine the average volume sold (in gallons) in May 2018.

In [37]:
# your answer here
liq[liq.is_may_2018 == 1].volume_sold_gal.describe()

count    218537.000000
mean          0.521378
std           1.612147
min           0.000000
25%           0.190000
50%           0.390000
75%           0.520000
max         145.620000
Name: volume_sold_gal, dtype: float64

**Challenge:** Identify the top store (as measured by `bottles_sold`) in Johnson County.

In [38]:
# your answer here
liq[liq.county == 'JOHNSON'].sort_values(by='bottles_sold', ascending=False).head(1)

Unnamed: 0,date,store_number,store_name,city,zip_code,location,county,category_name,vendor_name,item_number,...,pack,bottle_vol_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale,volumne_sold_l,volume_sold_gal,is_may_2017,is_may_2018
256128,5/8/18,4677,Costco Wholesale #1111 / Coralville,Coralville,52241.0,2900 Heartland Dr\rCoralville 52241\r(41.69802...,JOHNSON,American Vodkas,Levecke Corporation,936600,...,6,1750,$8.93,$13.40,270,$80.40,472.5,124.82,0,1
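
The answer above returns the single largest row for Johnson County. Another reading of "top store" is the store with the most bottles in total across all of its rows. The sketch below shows one way to compute that. It relies on `groupby`, which we have not covered in this session, and on a made-up DataFrame (store names and values are invented). It also upper-cases `county` before filtering, because county values appear in mixed case in the output above (for example `Polk` and `POLK`), so the same may be true for Johnson.

```python
import pandas as pd

# hypothetical rows; only the column names mirror our dataset
toy = pd.DataFrame({
    'store_name': ['Store A', 'Store B', 'Store A', 'Store C', 'Store B'],
    'county': ['JOHNSON', 'Johnson', 'JOHNSON', 'POLK', 'JOHNSON'],
    'bottles_sold': [270, 150, 10, 500, 200],
})

# normalize casing so 'Johnson' and 'JOHNSON' pass the same filter
in_johnson = toy.county.str.upper() == 'JOHNSON'

# total bottles per store, largest total first
totals = (toy[in_johnson]
          .groupby('store_name')['bottles_sold']
          .sum()
          .sort_values(ascending=False))

top_store = totals.index[0]
print(top_store, totals.iloc[0])  # Store B 350
```

In this toy data, Store A has the largest single row (270) but Store B has the largest total (350), so the two readings can give different answers.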


## Recap

We covered a lot of ground! It's ok if this takes a while to gel.

```python

# basic DataFrame operations
df.head()
df.tail()
df.shape
df.columns
df.index

# selecting columns
df.column_name
df['column_name']

# renaming columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
df.columns = ['new_column_a', 'new_column_b']

# notable column operations
df.describe() # summary statistics: count, mean, std, min, quartiles, max
df.column_name.nunique() # number of unique values
df.column_name.value_counts() # number of occurrences of each value in column

# filtering
df[df.column_name < 50] # keep only rows where column_name is less than 50

# sorting
df.sort_values(by='column_name', ascending=False) # sort biggest to smallest

```
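
One pitfall worth trying out is renaming: `rename` only touches column labels when the mapping is passed as `columns=`. Here is a short sketch on a made-up DataFrame (the names `old_name`, `new_name`, and `other` are placeholders, not columns from our data):

```python
import pandas as pd

# hypothetical DataFrame for illustration
df = pd.DataFrame({'old_name': [10, 60, 35], 'other': ['a', 'b', 'c']})

# columns= applies the mapping to column labels; a new DataFrame is returned
renamed = df.rename(columns={'old_name': 'new_name'})
print(renamed.columns.tolist())  # ['new_name', 'other']

# without columns=, the mapping is applied to the row index instead,
# so the column labels stay the same
unchanged = df.rename({'old_name': 'new_name'})
print(unchanged.columns.tolist())  # ['old_name', 'other']

# filtering keeps only the rows where the condition is True
small = renamed[renamed.new_name < 50]
print(small.new_name.tolist())  # [10, 35]
```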


It's common to refer back to your own code *all the time.* Don't hesitate to reference this guide!

There's also a file, `intro-to-pandas-i-answers.py`, that makes referencing this in the future easy. (Or just export this notebook to a `.py` file via File > Download as > .py.) 🐼


