# <font color="blue"> LESSON 2: Intro to Pandas for Excel Users </font>


Pandas can do most of what Excel does and more, and it is often faster, more flexible, and fully reproducible. <br>

Pandas has a lot of power, but at a high level, the library is really good at two things:
<ol>
<li>Munging Data Sets: helping you clean up and put data together into a format that is easy to use and analyze.</li>
<li>Automating the cleanup of data sets (missing data, incongruent dates in series, etc.).</li>
</ol>
Excel is not well suited to these tasks. Even if you are a keyboard jockey, it can take hours to clean up even a small data set to the point where you can build pivot tables and the like (think lots of selecting, cutting, and pasting).


# Lesson Goals

Knowing pandas is a great introduction to more powerful and complex data analysis. However, learning to use Python and pandas can seem challenging because there is no point-and-click user interface like in Excel. 

So don't worry if the transition from Excel to pandas seems daunting...

<img src=images/panda.jpg width=300px>

In this lesson we explore how to do some common Excel tasks in pandas, helping you learn one of the most powerful Python libraries for data analysis.

# Definitions

<u>Excel</u>: Excel is spreadsheet software that stores data in tabular form. Each entry sits in a cell, addressed by a numbered row and a lettered column.

<u>Pandas</u>: The Python package pandas is a great alternative to Excel, providing much of the same functionality and more. Pandas is great for data manipulation, cleaning, analysis, and exploration. Additionally, these tasks can be easily automated and reapplied to different datasets.

<u>DataFrame</u>: The equivalent to an Excel spreadsheet in pandas is the DataFrame. It looks like a spreadsheet, with rows, columns, and indices. In other words, a “table” of data is stored in a DataFrame. We can create a DataFrame from scratch, or more commonly, import the data from an Excel or csv file.

<u>Pivot Table</u>: A pivot table is a table that summarizes data from another table. It is made by grouping the data in the first table and applying an operation such as sorting, counting, summing, or averaging. The results are then displayed in a second table, the pivot table, showing the summarized data.
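
To make the DataFrame definition above concrete, here is a minimal sketch of building one from scratch. The school names, types, and enrollment numbers are made up for illustration and do not come from the Chicago data set.

```python
import pandas as pd

# Hypothetical toy data: three invented schools, not taken from the real data set
schools = pd.DataFrame(
    {
        "Short_Name": ["ALPHA", "BETA", "GAMMA"],
        "School_Type": ["Charter", "Neighborhood", "Magnet"],
        "Enrollment": [350, 420, 510],
    },
    # The index plays the role of the row labels in a spreadsheet
    index=pd.Index([101, 102, 103], name="School_ID"),
)

print(schools)
print(schools.shape)  # (number of rows, number of columns)
```

Each dictionary key becomes a column, much like a header cell in Excel, and the index labels each row.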




# Dataset
Let's look at some common Excel functions and tools for analysis that can also be done in pandas. For the following examples, we will be exploring school progress report data from Chicago Public Schools for the 2016–2017 school year. The [data](https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Progress-Reports-SY1/cp7s-7gxg) contains detailed information at the school level, with one school per row.  Let's get started by loading the data...

In [1]:
# Load the package
import pandas as pd

In [3]:
# create a variable sy1617 for school year 2016-2017
# set sy1617 equal to the output of pandas' read_csv function
# you can use the Tab and Shift+Tab shortcuts to see more about pandas' functions

sy1617 = pd.read_csv('data/Chicago_Public_Schools_Progress_Reports_SY1617.csv', index_col='School_ID')

In [4]:
# view the contents using head()

sy1617.head()

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
610385,MULTICULTURAL HS,Multicultural Academy of Scholarship,Small,HS,3120 S KOSTNER AVE,Chicago,Illinois,60623,7735354242,7735354000.0,...,9.8,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.835282,-87.735283,"3120 S KOSTNER AVE\nChicago, Illinois 60623\n(..."
609848,ALDRIDGE,Ira F Aldridge Elementary School,Neighborhood,ES,630 E 131ST ST,Chicago,Illinois,60827,7735355614,7735356000.0,...,43.0,,,This school is “Not Yet Organized for Improvem...,Emerging,This school has been rated Emerging for initia...,2016.0,41.657405,-87.606474,"630 E 131ST ST\nChicago, Illinois 60827\n(41.6..."
609954,GREGORY,John Milton Gregory Elementary School,Neighborhood,ES,3715 W POLK ST,Chicago,Illinois,60624,7735346820,7735346000.0,...,47.8,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.870742,-87.718666,"3715 W POLK ST\nChicago, Illinois 60624\n(41.8..."
610171,MIRELES,Arnold Mireles Elementary Academy,Neighborhood,ES,9000 S EXCHANGE AVE,Chicago,Illinois,60617,7735356360,7735356000.0,...,30.5,,,This school is “Moderately Organized for Impro...,Exemplary,This school has been rated Exemplary for its s...,2016.0,41.731717,-87.552788,"9000 S EXCHANGE AVE\nChicago, Illinois 60617\n..."
610212,ALBANY PARK,Albany Park Multicultural Academy,Neighborhood,MS,4929 N SAWYER AVE,Chicago,Illinois,60625,7735345108,7735345000.0,...,9.9,,,This school is “Well-Organized for Improvement...,Established,This school has been rated Established for its...,2016.0,41.971504,-87.710609,"4929 N SAWYER AVE\nChicago, Illinois 60625\n(4..."
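
Beyond `head()`, a few other quick checks are useful right after loading a file. The sketch below is self-contained: it reads a tiny made-up CSV from a string (a stand-in for a file on disk, with invented values) so that it runs without the real data file.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """School_ID,Short_Name,School_Type
101,ALPHA,Charter
102,BETA,Neighborhood
103,GAMMA,Magnet
"""

# index_col turns the School_ID column into the row labels
toy = pd.read_csv(io.StringIO(csv_text), index_col="School_ID")

print(toy.shape)    # number of rows and columns
print(toy.dtypes)   # data type of each column
print(toy.head(2))  # first two rows only
```

The same calls should work on `sy1617` once it is loaded.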


# Accessing Data
<u>Excel</u>: One benefit of Excel is that the data is right in front of us at all times. We use basic point and click commands or keyboard shortcuts to select data.

<u>Pandas:</u> There are a few different ways to access specific rows, columns, and cells.

To access an individual column, use square brackets []. The output is a Series, pandas' one-dimensional labeled array:

In [5]:
sy1617["School_Type"]

School_ID
610385           Small
609848    Neighborhood
609954    Neighborhood
610171    Neighborhood
610212    Neighborhood
              ...     
609723    Neighborhood
400048         Charter
609821    Neighborhood
610305    Neighborhood
609874    Neighborhood
Name: School_Type, Length: 661, dtype: object

To access multiple columns, specify a list of column names. The output is now a DataFrame:

In [6]:
sy1617[["Long_Name", "School_Type", "Primary_Category"]]

School_ID,Long_Name,School_Type,Primary_Category
610385,Multicultural Academy of Scholarship,Small,HS
609848,Ira F Aldridge Elementary School,Neighborhood,ES
609954,John Milton Gregory Elementary School,Neighborhood,ES
610171,Arnold Mireles Elementary Academy,Neighborhood,ES
610212,Albany Park Multicultural Academy,Neighborhood,MS
...,...,...,...
609723,John Marshall Metropolitan High School,Neighborhood,HS
400048,L.E.A.R.N. - Excel Campus,Charter,ES
609821,Burnham Elementary Inclusive Academy,Neighborhood,ES
610305,George Leland Elementary School,Neighborhood,ES


We can access a row by its index label:

In [7]:
sy1617.loc[610385]

Short_Name                                                       MULTICULTURAL HS
Long_Name                                    Multicultural Academy of Scholarship
School_Type                                                                 Small
Primary_Category                                                               HS
Address                                                        3120 S KOSTNER AVE
                                                      ...                        
Supportive_School_Award_Desc    This school is in the process of being reviewe...
Parent_Survey_Results_Year                                                   2016
School_Latitude                                                           41.8353
School_Longitude                                                         -87.7353
Location                        3120 S KOSTNER AVE\nChicago, Illinois 60623\n(...
Name: 610385, Length: 160, dtype: object

We can also specify the row number instead:

In [8]:
# Note that row positions start at 0, not 1

sy1617.iloc[0]

Short_Name                                                       MULTICULTURAL HS
Long_Name                                    Multicultural Academy of Scholarship
School_Type                                                                 Small
Primary_Category                                                               HS
Address                                                        3120 S KOSTNER AVE
                                                      ...                        
Supportive_School_Award_Desc    This school is in the process of being reviewe...
Parent_Survey_Results_Year                                                   2016
School_Latitude                                                           41.8353
School_Longitude                                                         -87.7353
Location                        3120 S KOSTNER AVE\nChicago, Illinois 60623\n(...
Name: 610385, Length: 160, dtype: object

To access a single cell, simply subset by row and column:



In [9]:
sy1617.loc[610385]["Long_Name"]

'Multicultural Academy of Scholarship'
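
As a side-by-side sketch of the access patterns above, the toy example below (invented schools, not the real data) contrasts label-based `loc` with position-based `iloc`. It also shows two alternative ways to read a single cell: passing the row and column to `loc` in one step, and the `at`/`iat` accessors.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {
        "Short_Name": ["ALPHA", "BETA", "GAMMA"],
        "School_Type": ["Charter", "Neighborhood", "Magnet"],
    },
    index=pd.Index([101, 102, 103], name="School_ID"),
)

by_label = toy.loc[102]     # row whose index label is 102
by_position = toy.iloc[1]   # second row, since positions start at 0

cell_loc = toy.loc[102, "Short_Name"]  # row label and column name in one step
cell_at = toy.at[102, "Short_Name"]    # .at is intended for single values
cell_iat = toy.iat[1, 0]               # purely positional: row 1, column 0

print(cell_loc, cell_at, cell_iat)
```

All three cell lookups return the same value here. The single-step `loc[row, column]` form avoids creating an intermediate row Series.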

# Basic Summary Statistics
<u>Excel</u>: Apply count, average, median, percentile, etc. functions across desired columns or rows.

<u>Pandas</u>: The describe method can display summary statistics for selected columns. The types of statistics outputted will depend on the data types of the columns.

For text columns, describe reports the count, the number of unique values, the most common value (top), and the frequency of that value (freq):



In [10]:
sy1617[['School_Type', 'Primary_Category']].describe()

,School_Type,Primary_Category
count,661,661
unique,12,3
top,Neighborhood,ES
freq,400,470


For numerical columns, describe reports the count, mean, standard deviation (std), min, max, and quartiles (25%, 50%, 75%):



In [11]:
sy1617[['School_Survey_Student_Response_Rate_Pct', 'Suspensions_Per_100_Students_Year_1_Pct']].describe()

,School_Survey_Student_Response_Rate_Pct,Suspensions_Per_100_Students_Year_1_Pct
count,657.0,514.0
mean,80.110198,8.271595
std,24.769436,17.292914
min,0.0,0.0
25%,75.7,0.825
50%,88.5,2.55
75%,95.8,7.8
max,99.9,188.7
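
The individual statistics are also available as separate methods, which is closer to typing one Excel function at a time. The sketch below uses a small made-up Series with one missing value. It illustrates that `count` skips missing entries, which would explain why the counts above are lower than the 661 rows in the data set.

```python
import pandas as pd

# Hypothetical response rates; None becomes a missing value (NaN)
rates = pd.Series([60.0, 80.0, 90.0, 100.0, None], name="Response_Rate_Pct")

summary = rates.describe()   # all summary statistics at once

count = rates.count()        # non-missing values only
mean = rates.mean()          # like AVERAGE in Excel
median = rates.median()      # like MEDIAN in Excel
p25 = rates.quantile(0.25)   # 25th percentile

print(summary)
print(count, mean, median, p25)
```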


# Filtering
<u>Excel</u>: Apply filters to column(s) to subset data by a specific value or by some condition.

<u>Pandas</u>: Subset a DataFrame by some condition. First, we apply a conditional statement to a column and obtain a Series of True/False booleans. We then put those results into square brackets to subset the DataFrame for only rows that meet the condition (i.e. are True).

For example, filter the DataFrame for schools that are of type “Charter”:



In [12]:
is_charter = sy1617['School_Type'] == 'Charter'
is_charter

School_ID
610385    False
609848    False
609954    False
610171    False
610212    False
          ...  
609723    False
400048     True
609821    False
610305    False
609874    False
Name: School_Type, Length: 661, dtype: bool

In [13]:
sy1617[is_charter]

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
400116,MONTESSORI ENGLEWOOD,The Montessori School of Englewood Charter,Charter,ES,6936 S HERMITAGE AVE,Chicago,Illinois,60636,7735359255,7.735360e+09,...,33.8,,,This school is “Not Yet Organized for Improvem...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.774572,-87.676147,"6936 S HERMITAGE AVE\nChicago, Illinois 60636\..."
400117,NOBLE - HANSBERRY HS,Noble - Hansberry College Prep,Charter,HS,8748 S ABERDEEN ST,Chicago,Illinois,60620,7737293400,7.733042e+09,...,,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.734442,-87.650987,"8748 S ABERDEEN ST\nChicago, Illinois 60620\n(..."
400102,URBAN PREP - WEST HS,Urban Prep Charter Academy for Young Men - West,Charter,HS,1326 W 14TH PL,Chicago,Illinois,60608,7735348860,7.735341e+09,...,13.7,,,This school is “Partially Organized for Improv...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.862540,-87.660107,"1326 W 14TH PL\nChicago, Illinois 60608\n(41.8..."
400130,YCCS - YOUTH DEVELOPMENT,YCCS-Community Youth Development Institute HS,Charter,HS,7836 S UNION AVE,Chicago,Illinois,60620,7732242273,7.732242e+09,...,,,A School Progress Report customized for CPS Op...,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.751377,-87.641731,"7836 S UNION AVE\nChicago, Illinois 60620\n(41..."
400056,NOBLE - ROWE CLARK HS,Noble - Rowe-Clark Math and Science Academy,Charter,HS,3645 W CHICAGO AVE,Chicago,Illinois,60651,7732422212,7.738267e+09,...,,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.895362,-87.718047,"3645 W CHICAGO AVE\nChicago, Illinois 60651\n(..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
400170,NOBLE - ACADEMY HS,Noble - The Noble Academy,Charter,HS,1443 N OGDEN AVE,Chicago,Illinois,60610,3125741527,7.085754e+09,...,,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.906879,-87.645577,"1443 N OGDEN AVE\nChicago, Illinois 60610\n(41..."
400075,U OF C - DONOGHUE,University of Chicago - Donoghue,Charter,ES,707 E 37TH ST,Chicago,Illinois,60653,7732855301,7.732852e+09,...,4.9,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.827726,-87.608501,"707 E 37TH ST\nChicago, Illinois 60653\n(41.82..."
400163,KIPP - BLOOM,KIPP Chicago Charter School - KIPP Bloom,Charter,ES,5515 S LOWE AVE,Chicago,Illinois,60621,7739388565,7.737837e+09,...,15.1,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.793556,-87.639967,"5515 S LOWE AVE\nChicago, Illinois 60621\n(41...."
400021,CATALYST - CIRCLE ROCK,Catalyst Elementary Charter School - Circle Rock,Charter,ES,5608 W WASHINGTON BLVD,Chicago,Illinois,60644,7739455025,3.126262e+09,...,5.1,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.882322,-87.765322,"5608 W WASHINGTON BLVD\nChicago, Illinois 6064..."
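
The two-step pattern above (build a boolean Series, then use it to subset) can be sketched on a toy table where the result is easy to check by eye. The data below is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {
        "School_Type": ["Charter", "Neighborhood", "Magnet", "Charter"],
        "Response_Rate_Pct": [85.0, 72.5, 91.0, 64.0],
    },
    index=pd.Index([101, 102, 103, 104], name="School_ID"),
)

# Step 1: one True/False value per row
mask = toy["School_Type"] == "Charter"

# Step 2: keep only the rows where the mask is True
charters = toy[mask]

# True counts as 1, so summing the mask counts the matching rows
n_charters = int(mask.sum())

print(charters)
print(n_charters)
```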


We can look for multiple values in a column, such as “Charter” and “Magnet” schools, using .isin():


In [14]:
charter_magnet = sy1617['School_Type'].isin(['Charter','Magnet'])
charter_magnet

School_ID
610385    False
609848    False
609954    False
610171    False
610212    False
          ...  
609723    False
400048     True
609821    False
610305    False
609874    False
Name: School_Type, Length: 661, dtype: bool

In [15]:
sy1617[charter_magnet]

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
610405,SUDER,Suder Montessori Magnet ES,Magnet,ES,2022 W WASHINGTON BLVD,Chicago,Illinois,60612,7735347685,7.735348e+09,...,2.7,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.883215,-87.677636,"2022 W WASHINGTON BLVD\nChicago, Illinois 6061..."
400116,MONTESSORI ENGLEWOOD,The Montessori School of Englewood Charter,Charter,ES,6936 S HERMITAGE AVE,Chicago,Illinois,60636,7735359255,7.735360e+09,...,33.8,,,This school is “Not Yet Organized for Improvem...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.774572,-87.676147,"6936 S HERMITAGE AVE\nChicago, Illinois 60636\..."
400117,NOBLE - HANSBERRY HS,Noble - Hansberry College Prep,Charter,HS,8748 S ABERDEEN ST,Chicago,Illinois,60620,7737293400,7.733042e+09,...,,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.734442,-87.650987,"8748 S ABERDEEN ST\nChicago, Illinois 60620\n(..."
400102,URBAN PREP - WEST HS,Urban Prep Charter Academy for Young Men - West,Charter,HS,1326 W 14TH PL,Chicago,Illinois,60608,7735348860,7.735341e+09,...,13.7,,,This school is “Partially Organized for Improv...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.862540,-87.660107,"1326 W 14TH PL\nChicago, Illinois 60608\n(41.8..."
609941,RANDOLPH,Asa Philip Randolph Elementary School,Magnet,ES,7316 S HOYNE AVE,Chicago,Illinois,60636,7735359015,7.735359e+09,...,48.9,,,This school is “Not Yet Organized for Improvem...,Emerging,This school has been rated Emerging for initia...,2016.0,41.760614,-87.675966,"7316 S HOYNE AVE\nChicago, Illinois 60636\n(41..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
400170,NOBLE - ACADEMY HS,Noble - The Noble Academy,Charter,HS,1443 N OGDEN AVE,Chicago,Illinois,60610,3125741527,7.085754e+09,...,,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.906879,-87.645577,"1443 N OGDEN AVE\nChicago, Illinois 60610\n(41..."
400075,U OF C - DONOGHUE,University of Chicago - Donoghue,Charter,ES,707 E 37TH ST,Chicago,Illinois,60653,7732855301,7.732852e+09,...,4.9,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.827726,-87.608501,"707 E 37TH ST\nChicago, Illinois 60653\n(41.82..."
400163,KIPP - BLOOM,KIPP Chicago Charter School - KIPP Bloom,Charter,ES,5515 S LOWE AVE,Chicago,Illinois,60621,7739388565,7.737837e+09,...,15.1,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.793556,-87.639967,"5515 S LOWE AVE\nChicago, Illinois 60621\n(41...."
400021,CATALYST - CIRCLE ROCK,Catalyst Elementary Charter School - Circle Rock,Charter,ES,5608 W WASHINGTON BLVD,Chicago,Illinois,60644,7739455025,3.126262e+09,...,5.1,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.882322,-87.765322,"5608 W WASHINGTON BLVD\nChicago, Illinois 6064..."
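
A mask built with `.isin()` can also be inverted with `~` to get everything that was not selected, similar to unchecking values in an Excel filter. This is a minimal sketch on invented data.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {"School_Type": ["Charter", "Neighborhood", "Magnet", "Charter"]},
    index=pd.Index([101, 102, 103, 104], name="School_ID"),
)

mask = toy["School_Type"].isin(["Charter", "Magnet"])

selected = toy[mask]    # Charter or Magnet schools
others = toy[~mask]     # everything else (~ flips True and False)

print(selected)
print(others)
```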


Filter for schools with student survey response rates of at least 80%:


In [16]:
gt80 = sy1617['School_Survey_Student_Response_Rate_Pct'] >= 80
gt80

School_ID
610385    False
609848     True
609954     True
610171     True
610212     True
          ...  
609723    False
400048    False
609821     True
610305    False
609874     True
Name: School_Survey_Student_Response_Rate_Pct, Length: 661, dtype: bool

In [17]:
sy1617[gt80]

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
609848,ALDRIDGE,Ira F Aldridge Elementary School,Neighborhood,ES,630 E 131ST ST,Chicago,Illinois,60827,7735355614,7.735356e+09,...,43.0,,,This school is “Not Yet Organized for Improvem...,Emerging,This school has been rated Emerging for initia...,2016.0,41.657405,-87.606474,"630 E 131ST ST\nChicago, Illinois 60827\n(41.6..."
609954,GREGORY,John Milton Gregory Elementary School,Neighborhood,ES,3715 W POLK ST,Chicago,Illinois,60624,7735346820,7.735346e+09,...,47.8,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.870742,-87.718666,"3715 W POLK ST\nChicago, Illinois 60624\n(41.8..."
610171,MIRELES,Arnold Mireles Elementary Academy,Neighborhood,ES,9000 S EXCHANGE AVE,Chicago,Illinois,60617,7735356360,7.735356e+09,...,30.5,,,This school is “Moderately Organized for Impro...,Exemplary,This school has been rated Exemplary for its s...,2016.0,41.731717,-87.552788,"9000 S EXCHANGE AVE\nChicago, Illinois 60617\n..."
610212,ALBANY PARK,Albany Park Multicultural Academy,Neighborhood,MS,4929 N SAWYER AVE,Chicago,Illinois,60625,7735345108,7.735345e+09,...,9.9,,,This school is “Well-Organized for Improvement...,Established,This school has been rated Established for its...,2016.0,41.971504,-87.710609,"4929 N SAWYER AVE\nChicago, Illinois 60625\n(4..."
610405,SUDER,Suder Montessori Magnet ES,Magnet,ES,2022 W WASHINGTON BLVD,Chicago,Illinois,60612,7735347685,7.735348e+09,...,2.7,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.883215,-87.677636,"2022 W WASHINGTON BLVD\nChicago, Illinois 6061..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
400129,YCCS - PROGRESSIVE LEADERSHIP,YCCS-Progressive Leadership Academy,Charter,HS,6620 S DR MARTIN LUTHER KING JR DR,Chicago,Illinois,60637,7733633837,7.737239e+09,...,,,A School Progress Report customized for CPS Op...,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.773962,-87.615740,"6620 S DR MARTIN LUTHER KING JR DR\nChicago, I..."
400163,KIPP - BLOOM,KIPP Chicago Charter School - KIPP Bloom,Charter,ES,5515 S LOWE AVE,Chicago,Illinois,60621,7739388565,7.737837e+09,...,15.1,,,This school is “Well-Organized for Improvement...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.793556,-87.639967,"5515 S LOWE AVE\nChicago, Illinois 60621\n(41...."
400021,CATALYST - CIRCLE ROCK,Catalyst Elementary Charter School - Circle Rock,Charter,ES,5608 W WASHINGTON BLVD,Chicago,Illinois,60644,7739455025,3.126262e+09,...,5.1,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.882322,-87.765322,"5608 W WASHINGTON BLVD\nChicago, Illinois 6064..."
609821,BURNHAM,Burnham Elementary Inclusive Academy,Neighborhood,ES,9928 S CRANDON AVE,Chicago,Illinois,60617,7735356530,7.735357e+09,...,31.8,,,This school is “Organized for Improvement” whi...,Established,This school has been rated Established for its...,2016.0,41.714402,-87.567131,"9928 S CRANDON AVE\nChicago, Illinois 60617\n(..."


In [18]:
sy1617[gt80][["Short_Name", "School_Survey_Student_Response_Rate_Pct"]]

School_ID,Short_Name,School_Survey_Student_Response_Rate_Pct
609848,ALDRIDGE,90.6
609954,GREGORY,84.4
610171,MIRELES,95.4
610212,ALBANY PARK,96.5
610405,SUDER,80.2
...,...,...
400129,YCCS - PROGRESSIVE LEADERSHIP,88.7
400163,KIPP - BLOOM,80.7
400021,CATALYST - CIRCLE ROCK,86.8
609821,BURNHAM,90.4


We can combine multiple conditions with & (and) and | (or):


In [19]:
sy1617[is_charter & gt80][["Short_Name", "School_Type", "School_Survey_Student_Response_Rate_Pct"]]

School_ID,Short_Name,School_Type,School_Survey_Student_Response_Rate_Pct
400102,URBAN PREP - WEST HS,Charter,85.0
400024,CICS - BASIL,Charter,86.0
400098,NOBLE - MUCHIN HS,Charter,98.1
400046,LEARN - BUTLER,Charter,86.5
400034,CICS - NORTHTOWN HS,Charter,85.8
...,...,...,...
400079,ACERO - ZIZUMBO,Charter,94.1
400164,INSTITUTO - LOZANO MASTERY HS,Charter,85.7
400129,YCCS - PROGRESSIVE LEADERSHIP,Charter,88.7
400163,KIPP - BLOOM,Charter,80.7
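
When conditions are written inline rather than stored in variables first, each one needs its own parentheses, because `&` and `|` bind more tightly than comparisons like `==` and `>=`. The sketch below shows this on invented data.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {
        "School_Type": ["Charter", "Neighborhood", "Magnet", "Charter"],
        "Response_Rate_Pct": [85.0, 72.5, 91.0, 64.0],
    },
    index=pd.Index([101, 102, 103, 104], name="School_ID"),
)

# AND: both conditions must hold
high_charters = toy[(toy["School_Type"] == "Charter") & (toy["Response_Rate_Pct"] >= 80)]

# OR: at least one condition must hold
magnet_or_high = toy[(toy["School_Type"] == "Magnet") | (toy["Response_Rate_Pct"] >= 80)]

print(high_charters)
print(magnet_or_high)
```

Python's `and` and `or` keywords do not work element by element on a Series, which is why the `&` and `|` operators are used instead.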


# Sorting
<u>Excel</u>: Sort the data by a certain column or set of columns.

<u>Pandas</u>: Sort the data using the sort_values method. For example, sort alphabetically by school category (ES, HS, MS) and then by school name:


In [20]:
sy1617.sort_values(by=['Primary_Category', 'Short_Name'])

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
400153,ACERO - BRIGHTON PARK,Acero Charter Schools - Brighton Park,Charter,ES,4420 S FAIRFIELD AVE,Chicago,Illinois,60632,3124555434,3.124555e+09,...,,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.813109,-87.693238,"4420 S FAIRFIELD AVE\nChicago, Illinois 60632\..."
400101,ACERO - CISNEROS,Acero Charter Schools - Sandra Cisneros,Charter,ES,2744 W PERSHING RD,Chicago,Illinois,60632,7733768830,7.733769e+09,...,,,,This school is “Moderately Organized for Impro...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.822884,-87.693996,"2744 W PERSHING RD\nChicago, Illinois 60632\n(..."
400120,ACERO - CLEMENTE,Acero Charter Schools - Roberto Clemente,Charter,ES,2050 N NATCHEZ AVE,Chicago,Illinois,60707,3124555425,3.124555e+09,...,,,,This school is “Not Yet Organized for Improvem...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.917952,-87.787830,"2050 N NATCHEZ AVE\nChicago, Illinois 60707\n(..."
400121,ACERO - DE LA CRUZ,Acero Charter Schools - Sor Juana Inés de la Cruz,Charter,ES,7416 N RIDGE AVE,Chicago,Illinois,60645,3124555442,3.124555e+09,...,,,,This school is “Not Yet Organized for Improvem...,Coming Soon,This school is in the process of being reviewe...,2016.0,42.016476,-87.684406,"7416 N RIDGE AVE\nChicago, Illinois 60645\n(42..."
400081,ACERO - DE LAS CASAS,Acero Charter Schools - Bartolomé de las Casas,Charter,ES,1641 W 16TH ST,Chicago,Illinois,60608,3124323224,3.124321e+09,...,,,,This school is “Organized for Improvement” whi...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.859504,-87.667949,"1641 W 16TH ST\nChicago, Illinois 60608\n(41.8..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
610215,MADERO,Francisco I Madero Middle School,Neighborhood,MS,3202 W 28TH ST,Chicago,Illinois,60623,7735354466,7.735354e+09,...,6.8,,,This school is “Well-Organized for Improvement...,Emerging,This school has been rated Emerging for initia...,2016.0,41.840957,-87.705064,"3202 W 28TH ST\nChicago, Illinois 60623\n(41.8..."
610051,NORTHWEST,Northwest Middle School,Neighborhood,MS,5252 W PALMER ST,Chicago,Illinois,60639,7735343250,7.735343e+09,...,13.5,,,This school is “Partially Organized for Improv...,Coming Soon,This school is in the process of being reviewe...,2016.0,41.920555,-87.757854,"5252 W PALMER ST\nChicago, Illinois 60639\n(41..."
610588,RICHARDSON,Robert J. Richardson Middle School,Neighborhood,MS,6018 S KARLOV,Chicago,Illinois,60629,7735358640,7.735358e+09,...,,,This school does not have enough data to displ...,,,,,41.783826,-87.725422,"6018 S KARLOV\nChicago, Illinois 60629\n(41.78..."
610559,SHIELDS MIDDLE,James Shields Middle School,Neighborhood,MS,2611 W 48TH ST,Chicago,Illinois,60632,7735357115,7.735357e+09,...,8.2,,,This school is “Well-Organized for Improvement...,Exemplary,This school has been rated Exemplary for its s...,2016.0,41.806491,-87.689780,"2611 W 48TH ST\nChicago, Illinois 60632\n(41.8..."
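
Two details of `sort_values` are easy to miss when coming from Excel: it returns a new, sorted DataFrame rather than reordering the original, and the `ascending` argument controls the sort direction. This is a minimal sketch on invented data.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame(
    {
        "Primary_Category": ["HS", "ES", "ES", "MS"],
        "Short_Name": ["DELTA", "BETA", "ALPHA", "GAMMA"],
        "Response_Rate_Pct": [70.0, 95.0, 88.0, 91.0],
    },
    index=pd.Index([101, 102, 103, 104], name="School_ID"),
)

# Sort by category first, then by name within each category
by_cat_name = toy.sort_values(by=["Primary_Category", "Short_Name"])

# Sort from highest to lowest response rate
by_rate_desc = toy.sort_values(by="Response_Rate_Pct", ascending=False)

print(by_cat_name)
print(by_rate_desc)
print(toy)  # the original row order is unchanged
```

To keep a sorted version, assign the result to a variable, as done here.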


# Pivot Tables
<u>Excel</u>: The drag and drop functions make it easy to aggregate and filter the data in any way. Here is a sample pivot table that groups by School_Type in the rows and Primary_Category in the columns, and calculates average School_Survey_Student_Response_Rate_Pct within the table.


<img src=images/pivottable.png>

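<u>Pandas</u>: We can produce the same table in Pandas using the pivot_table function. The example below uses np.mean as the aggregation function, so we first import numpy as np. Passing the string 'mean' as aggfunc also works, needs no extra import, and avoids a warning that recent pandas versions may emit for np.mean.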

In [23]:
import numpy as np

In [22]:
pd.pivot_table(sy1617, values='School_Survey_Student_Response_Rate_Pct', index='School_Type', columns=['Primary_Category'], aggfunc=np.mean)

Primary_Category,ES,HS,MS
School_Type,,,
Career academy,,65.375,
Charter,66.033333,69.556061,99.4
Citywide-Option,0.0,65.235,
Classical,98.32,,
Contract,45.7,84.45,
Magnet,88.722222,84.342857,
Military academy,,87.883333,
Neighborhood,84.537681,80.424444,93.814286
Regional gifted center,93.38,,
Selective enrollment,,82.190909,


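We can save this table to a variable and write it to an Excel file for later reference or use. Writing .xlsx files requires an Excel writer package such as openpyxl to be installed.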

In [24]:
output = pd.pivot_table(sy1617, values='School_Survey_Student_Response_Rate_Pct', index='School_Type', columns=['Primary_Category'], aggfunc=np.mean)

In [None]:
output.to_excel("st_pivot_table.xlsx")

# VLOOKUPs
<u>Excel</u>: In a nutshell, the VLOOKUP function searches for a specific value in a range of cells, and then returns a value that lies in the same row as where the value is found.

You can use VLOOKUPs to join the relevant columns of one data set with another. For example, suppose we have the [school progress report](https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Progress-Reports-SY1/fvrx-esxp) for Chicago Public Schools from the prior year, in a sheet titled “SY1516.” We want to know the SY1516 Student_Attainment_Rating for every school in the SY1617 data set, so that we may analyze the change in rating between the two years.

We could bring in this data by creating a column of VLOOKUPs, using School_ID as the lookup value:




<img src=images/vlookup.png>


If we wanted to bring in any other columns from the SY1516 sheet, we would need to add an additional VLOOKUP column for each.

<u>Pandas</u>: Joining two data sets is much simpler in Pandas. 

Let’s use Pandas to replicate the above VLOOKUP example. The merge function allows us to combine the two data sets using their indices (School_ID) as a sort of “lookup value.”
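
As a rough sketch of the idea, the toy example below merges two small made-up tables on their shared School_ID index. The school names and rating values are invented, and the `_1617`/`_1516` suffixes are just one possible naming choice. A left join keeps every row of the current-year table, much like a VLOOKUP column that returns a blank when no match is found.

```python
import pandas as pd

# Hypothetical stand-ins for the two school years; values are invented
current = pd.DataFrame(
    {
        "Short_Name": ["ALPHA", "BETA", "GAMMA"],
        "Student_Attainment_Rating": ["ABOVE AVERAGE", "AVERAGE", "BELOW AVERAGE"],
    },
    index=pd.Index([101, 102, 103], name="School_ID"),
)
prior = pd.DataFrame(
    {"Student_Attainment_Rating": ["AVERAGE", "AVERAGE"]},
    index=pd.Index([101, 102], name="School_ID"),  # school 103 is missing
)

merged = current.merge(
    prior[["Student_Attainment_Rating"]],  # bring in only the column we need
    how="left",                            # keep all rows from `current`
    left_index=True,
    right_index=True,
    suffixes=("_1617", "_1516"),           # tell the two rating columns apart
)

print(merged)
```

School 103 has no prior-year row, so its `Student_Attainment_Rating_1516` value comes back as missing (NaN) rather than raising an error.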



In [26]:
# create a variable sy1516 for school year 2015-2016
# set sy1516 equal to the output of pandas' read_csv function
# you can use the Tab and Shift+Tab shortcuts to see more about pandas' functions

sy1516 = pd.read_csv('data/Chicago_Public_Schools_Progress_Reports_SY1516.csv', index_col='School_ID')

In [27]:
# view the contents using head()

sy1516.head()

School_ID,Short_Name,Long_Name,School_Type,Primary_Category,Address,City,State,Zip,Phone,Fax,...,Mobility_Rate_Pct,Chronic_Truancy_Pct,Empty_Progress_Report_Message,School_Survey_Rating_Description,Supportive_School_Award,Supportive_School_Award_Desc,Parent_Survey_Results_Year,School_Latitude,School_Longitude,Location
400009,GLOBAL CITIZENSHIP,Academy for Global Citizenship Charter School,Charter,ES,4647 W 47TH ST,Chicago,Illinois,60632,7735821000.0,7735821000.0,...,4.3,,This School Progress Report is currently under...,This school does not have enough data availabl...,NOT RATED,This school is in the process of being reviewe...,2015.0,41.807579,-87.740097,"4647 W 47TH ST\nChicago, Illinois 60632\n(41.8..."
400010,ACE TECH HS,ACE Technical Charter School,Charter,HS,5410 S STATE ST,Chicago,Illinois,60609,7735489000.0,7735489000.0,...,27.3,,This School Progress Report is currently under...,This school is “Partially Organized for Improv...,NOT RATED,This school is in the process of being reviewe...,2015.0,41.796122,-87.625849,"5410 S STATE ST\nChicago, Illinois 60609\n(41...."
400011,LOCKE A,Alain Locke Charter School,Charter,ES,3141 W JACKSON BLVD,Chicago,Illinois,60612,7732657000.0,7732657000.0,...,4.7,,This School Progress Report is currently under...,This school does not have enough data availabl...,NOT RATED,This school is in the process of being reviewe...,2015.0,41.877248,-87.705235,"3141 W JACKSON BLVD\nChicago, Illinois 60612\n..."
400013,ASPIRA - EARLY COLLEGE HS,ASPIRA Charter School - Early College High School,Charter,HS,3986 W BARRY AVE,Chicago,Illinois,60618,7732521000.0,7732674000.0,...,,,This School Progress Report is currently under...,This school is “Not Yet Organized for Improvem...,NOT RATED,This school is in the process of being reviewe...,2015.0,41.937298,-87.727096,"3986 W BARRY AVE\nChicago, Illinois 60618\n(41..."
400017,ASPIRA - HAUGAN,ASPIRA Charter School - Haugan Middle School,Charter,MS,3729 W LELAND AVE,Chicago,Illinois,60625,7732521000.0,7732674000.0,...,,,This School Progress Report is currently under...,This school is “Organized for Improvement” whi...,NOT RATED,This school is in the process of being reviewe...,2015.0,41.966406,-87.721825,"3729 W LELAND AVE\nChicago, Illinois 60625\n(4..."


In [28]:
sy1617_short = sy1617[['Student_Attainment_Rating', 'Long_Name']]
sy1516_short = sy1516[['Student_Attainment_Rating']]


In [29]:
sy1617_short.head()

School_ID,Student_Attainment_Rating,Long_Name
610385,FAR BELOW AVERAGE,Multicultural Academy of Scholarship
609848,FAR BELOW AVERAGE,Ira F Aldridge Elementary School
609954,BELOW AVERAGE,John Milton Gregory Elementary School
610171,BELOW AVERAGE,Arnold Mireles Elementary Academy
610212,ABOVE AVERAGE,Albany Park Multicultural Academy


In [30]:
sy1516_short.head()

School_ID,Student_Attainment_Rating
400009,AVERAGE
400010,FAR BELOW AVERAGE
400011,ABOVE AVERAGE
400013,BELOW AVERAGE
400017,AVERAGE


In [31]:
pd.merge(sy1516_short, sy1617_short, left_index=True, right_index=True, suffixes=('_1516','_1617'))

School_ID,Student_Attainment_Rating_1516,Student_Attainment_Rating_1617,Long_Name
400009,AVERAGE,AVERAGE,Academy for Global Citizenship Charter School
400010,FAR BELOW AVERAGE,BELOW AVERAGE,ACE Technical Charter School
400011,ABOVE AVERAGE,NO DATA AVAILABLE,Alain Locke Charter School
400013,BELOW AVERAGE,BELOW AVERAGE,ASPIRA Charter School - Early College High School
400017,AVERAGE,AVERAGE,ASPIRA Charter School - Haugan Middle School
...,...,...,...
610573,NO DATA AVAILABLE,NO DATA AVAILABLE,Camelot Safe HS
610580,FAR BELOW AVERAGE,FAR BELOW AVERAGE,Magic Johnson- Humboldt Park HS
610581,NO DATA AVAILABLE,FAR BELOW AVERAGE,Magic Johnson- Brainerd HS
610586,,NO DATA AVAILABLE,Southeast Area Elementary School
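
One difference from VLOOKUP is worth noting: by default `pd.merge` performs an inner join, so only schools present in both tables appear in the result. A VLOOKUP column behaves more like a left join, keeping every row of the current sheet and leaving a gap where no match is found. The sketch below shows both on toy stand-ins for the two tables; the school IDs and names are hypothetical, and `how='left'` with `indicator=True` is one possible way to mimic VLOOKUP, not the approach this lesson uses.

```python
import pandas as pd

# Toy stand-ins for sy1516_short and sy1617_short (hypothetical IDs and names)
prev_year = pd.DataFrame(
    {"Student_Attainment_Rating": ["AVERAGE", "BELOW AVERAGE"]},
    index=pd.Index([1, 2], name="School_ID"),
)
this_year = pd.DataFrame(
    {
        "Student_Attainment_Rating": ["AVERAGE", "AVERAGE", "ABOVE AVERAGE"],
        "Long_Name": ["School A", "School B", "School C"],
    },
    index=pd.Index([1, 2, 3], name="School_ID"),
)

# Default (inner) join: school 3 is dropped because it has no prior-year row
inner = pd.merge(
    prev_year, this_year,
    left_index=True, right_index=True,
    suffixes=("_1516", "_1617"),
)

# Left join: keep every current-year school, like a VLOOKUP column would.
# indicator=True adds a _merge column showing where each row was found.
left = pd.merge(
    this_year, prev_year,
    how="left",
    left_index=True, right_index=True,
    suffixes=("_1617", "_1516"),
    indicator=True,
)

print(len(inner), len(left))
print(left)
```

In the left join, the school without a prior-year match keeps its row, with a missing value in `Student_Attainment_Rating_1516` and `left_only` in the `_merge` column.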


We can save the results again, this time as a CSV file.

In [32]:
output = pd.merge(sy1516_short, sy1617_short, left_index=True, right_index=True, suffixes=('_1516','_1617'))

In [None]:
output.to_csv("vlookup.csv")
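
With both years side by side, the change in rating mentioned at the start of this section can be examined directly. The following is a minimal sketch, assuming a merged table shaped like the one above; it uses three rows copied from the displayed output, and the `Rating_Changed` column name is an invention for illustration.

```python
import pandas as pd

# Three rows copied from the merged output displayed above
merged = pd.DataFrame(
    {
        "Student_Attainment_Rating_1516": ["AVERAGE", "FAR BELOW AVERAGE", "BELOW AVERAGE"],
        "Student_Attainment_Rating_1617": ["AVERAGE", "BELOW AVERAGE", "BELOW AVERAGE"],
    },
    index=pd.Index([400009, 400010, 400013], name="School_ID"),
)

# Flag schools whose rating differs between the two years
merged["Rating_Changed"] = (
    merged["Student_Attainment_Rating_1516"] != merged["Student_Attainment_Rating_1617"]
)

# Keep only the schools with a changed rating
changed = merged[merged["Rating_Changed"]]
print(changed)
```

Note that a missing value never compares equal to anything, so on the full data a school with a missing rating in either year would be flagged as changed unless those rows are filtered out first.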

<img src=images/pandagiphy.gif width=300px>

<center> <b>Congratulations!  You finished Lesson 2.</b></center>

Link to [Lesson 3](03_Lesson_Cleaning_with_Pandas.ipynb)