# Working with Pandas Dataframes
We use Pandas dataframes to manipulate scouting data within Python. A dataframe is like a table in a database or spreadsheet. It has rows and columns and there are many functions for manipulating the data.

### Setting Up Our Environment
The Pandas package is designed to work closely with another Python package, NumPy, which adds efficient support for very large arrays of data. We'll generally import both Pandas and NumPy together.

Pay close attention to the comments in the code.

In [2]:
# Setting Up Our Environment
import pandas as pd
import numpy as np

# We also need to set our working directory to wherever we put the Excel spreadsheet with scouting data.
import os
os.chdir('C:/Users/stacy/OneDrive/Projects/Python_Notebooks')
os.getcwd()

'C:\\Users\\stacy\\OneDrive\\Projects\\Python_Notebooks'

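Changing the working directory affects every later file operation in the notebook. A minimal alternative sketch, assuming the same folder as above, builds the full path to the workbook and leaves the working directory alone. The names `data_dir` and `excel_path` are illustrative and not part of the original notebook; the resulting path could then be passed to `pd.read_excel` in place of the bare filename.

```python
import os

# Folder taken from the cell above; substitute wherever the spreadsheet lives.
data_dir = "C:/Users/stacy/OneDrive/Projects/Python_Notebooks"

# Build the full path to the workbook instead of calling os.chdir().
excel_path = os.path.join(data_dir, "test.xlsx")
print(excel_path)
```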
### Reading the Excel File
Pandas has the ability to create a dataframe directly from an Excel file.

In [3]:
# Reading the Excel File
#   NOTE: The Excel file must be located in the working directory (see above)
filename = "test.xlsx"
df = pd.read_excel(filename, "Rankings", index_col = None)
print("Initial Length: " + str(len(df)))
df

Initial Length: 615


,team,phase,task,sum_successes,sum_attempts
0,1258,auto,holdFuelCapacity,1.0,0.0
1,1258,auto,holdGear,7.0,0.0
2,1258,auto,moveBaseline,5.0,4.0
3,1258,auto,placeGear,2.0,0.0
4,1258,auto,startingLocation,0.0,0.0
5,1258,finish,maintainContact,3.0,0.0
6,1258,finish,pushTouchPad,6.0,0.0
7,1258,teleop,defendMovement,1.0,1.0
8,1258,teleop,pickupFuelFloor,1.0,0.0
9,1258,teleop,pickupFuelRetrieval,0.0,0.0


### Dropping Empty Rows
Our spreadsheet contains some rows with no data because there are teams in the database that did not participate in our most recent competition. Let's remove those rows.

In [4]:
# Let's see some of those rows with no data.
#   Note the technique for extracting just a few rows from the dataframe.
df[85:95]

,team,phase,task,sum_successes,sum_attempts
85,2046,teleop,pickupFuelRetrieval,0.0,0.0
86,2046,teleop,pickupGearFloor,11.0,11.0
87,2046,teleop,pickupGearRetrieval,7.0,7.0
88,2046,teleop,placeGear,22.0,22.0
89,2046,teleop,shootHighBoiler,5.0,27.0
90,2412,,,,
91,2522,,,,
92,2555,,,,
93,2557,auto,holdFuelCapacity,0.0,0.0
94,2557,auto,holdGear,5.0,0.0


In [5]:
# Now let's drop any row with NaN (stands for Not a Number) in the phase column.
df = df.dropna(axis = 0, subset = ['phase'])
print("Length after dropping empty rows: " + str(len(df)))
df[85:95]

Length after dropping empty rows: 580


,team,phase,task,sum_successes,sum_attempts
86,2046,teleop,pickupGearFloor,11.0,11.0
87,2046,teleop,pickupGearRetrieval,7.0,7.0
88,2046,teleop,placeGear,22.0,22.0
89,2046,teleop,shootHighBoiler,5.0,27.0
93,2557,auto,holdFuelCapacity,0.0,0.0
94,2557,auto,holdGear,5.0,0.0
95,2557,auto,moveBaseline,1.0,4.0
96,2557,auto,placeGear,1.0,0.0
97,2557,auto,startingLocation,0.0,0.0
98,2557,finish,disabled,1.0,0.0

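Notice that the row labels in the output above jump from 89 to 93: `dropna` removes rows but leaves the surviving labels as they were. A small sketch on toy data (the frame `toy` and its values are made up for illustration) shows the same effect, along with `reset_index(drop=True)` as one way to renumber the rows if that is preferred.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the scouting sheet; the values are illustrative.
toy = pd.DataFrame({
    "team": [2046, 2412, 2557],
    "phase": ["teleop", np.nan, "auto"],
    "task": ["placeGear", np.nan, "holdGear"],
    "sum_successes": [22.0, np.nan, 5.0],
})

# Keep only the rows that have a value in the phase column.
cleaned = toy.dropna(axis=0, subset=["phase"])

# The original integer labels survive, so the dropped label is simply missing.
print(list(cleaned.index))

# reset_index(drop=True) renumbers the rows starting from 0.
renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))
```

Whether to renumber is a judgment call: keeping the original labels makes it easy to trace a row back to the spreadsheet.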

### Querying the Dataframe
Use the `query` method to filter the dataframe.

In [6]:
# Let's look at only the rows where sum_attempts is greater than 0
df_att = df.query("sum_attempts > 0")
print("# Rows with attempts > 0: " + str(len(df_att)))
df_att

# Rows with attempts > 0: 206


,team,phase,task,sum_successes,sum_attempts
2,1258,auto,moveBaseline,5.0,4.0
7,1258,teleop,defendMovement,1.0,1.0
10,1258,teleop,pickupGearFloor,2.0,2.0
11,1258,teleop,pickupGearRetrieval,4.0,4.0
12,1258,teleop,placeGear,11.0,13.0
16,1294,auto,moveBaseline,7.0,7.0
17,1294,auto,placeGear,3.0,1.0
23,1294,teleop,pickupGearRetrieval,14.0,14.0
24,1294,teleop,placeGear,21.0,21.0
25,1294,teleop,shootHighBoiler,2.0,6.0


In [7]:
# Queries can be chained to apply multiple criteria
df.query("sum_attempts > 0").query("phase == 'auto'")

,team,phase,task,sum_successes,sum_attempts
2,1258,auto,moveBaseline,5.0,4.0
16,1294,auto,moveBaseline,7.0,7.0
17,1294,auto,placeGear,3.0,1.0
28,1318,auto,moveBaseline,6.0,4.0
30,1318,auto,shootHighBoiler,2.0,2.0
47,1899,auto,moveBaseline,2.0,5.0
61,1983,auto,moveBaseline,7.0,7.0
74,2046,auto,moveBaseline,7.0,7.0
76,2046,auto,shootHighBoiler,0.0,30.0
95,2557,auto,moveBaseline,1.0,4.0

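Chaining is one way to apply several criteria. As a sketch on toy data (the frame `toy` and its values are made up), the same filter can also be written as a single `query` string joined with `and`, or as a boolean mask; all three should select the same rows.

```python
import pandas as pd

# Toy data with the same column names as the scouting sheet.
toy = pd.DataFrame({
    "team": [1258, 1258, 1294, 1294],
    "phase": ["auto", "teleop", "auto", "teleop"],
    "sum_attempts": [4.0, 0.0, 7.0, 21.0],
})

# Three equivalent ways to keep auto-phase rows with at least one attempt.
chained = toy.query("sum_attempts > 0").query("phase == 'auto'")
combined = toy.query("sum_attempts > 0 and phase == 'auto'")
masked = toy[(toy["sum_attempts"] > 0) & (toy["phase"] == "auto")]

print(chained.equals(combined) and combined.equals(masked))
```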

In [8]:
# Accessing Individual Columns
#   The head() method returns the first five rows of the dataframe.
print(df.team.head())

# Accessing individual values
#   The number in brackets is the row's index label, not its position.
print("")
print(df.sum_successes[4])

0    1258
1    1258
2    1258
3    1258
4    1258
Name: team, dtype: int64

0.0

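The lookup `df.sum_successes[4]` goes by index label rather than row position, which matters once rows have been dropped and the labels have gaps. A small sketch on a toy series (the `sum_successes` values for labels 88 to 94 are copied from the output shown earlier; the name `s` is illustrative) uses the explicit forms: `.loc` for labels and `.iloc` for positions.

```python
import pandas as pd

# Toy series whose labels have a gap, as df does after dropna().
s = pd.Series([22.0, 5.0, 0.0, 5.0], index=[88, 89, 93, 94], name="sum_successes")

# .loc looks up by index label.
print(s.loc[93])

# .iloc looks up by position, counting from 0.
print(s.iloc[3])

# Labels that were dropped are no longer present.
print(90 in s.index)
```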

In [9]:
# Accessing rows based on content
print(df.query("team == 1318").query("task == 'placeGear'").query("phase == 'teleop'").sum_successes)

42    15.0
Name: sum_successes, dtype: float64


### Setting Indexes
Pandas assigns a label to every row in a dataframe. The list of row labels is the index of the dataframe. If you don't specify the index when you create the dataframe, Pandas will use integers, with the first row at index 0, the second row at index 1, and so on. This is what we've seen so far in the df dataframe that we read in from Excel.

Pandas can also use multi-level indexes. See below.

In [10]:
# Setting a multi-level index
df_indexed = df.set_index(['team', 'phase', 'task'])
df_indexed

team,phase,task,sum_successes,sum_attempts
1258,auto,holdFuelCapacity,1.0,0.0
1258,auto,holdGear,7.0,0.0
1258,auto,moveBaseline,5.0,4.0
1258,auto,placeGear,2.0,0.0
1258,auto,startingLocation,0.0,0.0
1258,finish,maintainContact,3.0,0.0
1258,finish,pushTouchPad,6.0,0.0
1258,teleop,defendMovement,1.0,1.0
1258,teleop,pickupFuelFloor,1.0,0.0
1258,teleop,pickupFuelRetrieval,0.0,0.0

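Once the index has several levels, rows can be looked up by label with `.loc` instead of chained queries. The sketch below rebuilds a tiny indexed frame (the `placeGear` values for teams 1258 and 1318 are taken from the outputs above; `toy` and `toy_indexed` are illustrative names) and shows a full key and a partial key.

```python
import pandas as pd

toy = pd.DataFrame({
    "team": [1258, 1258, 1318],
    "phase": ["auto", "teleop", "teleop"],
    "task": ["placeGear", "placeGear", "placeGear"],
    "sum_successes": [2.0, 11.0, 15.0],
})
toy_indexed = toy.set_index(["team", "phase", "task"])

# A full key (one label per level) picks out a single value.
value = toy_indexed.loc[(1318, "teleop", "placeGear"), "sum_successes"]
print(value)

# A partial key returns every row under that first-level label.
team_rows = toy_indexed.loc[1258]
print(len(team_rows))
```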

### Reshaping the Dataframe
The dataframe above has a three-level index, with team as the first level (0), phase as the second level (1), and task as the third level (2). With a multi-level index, reshaping the data is straightforward.

In [11]:
# Add additional columns for each task
df_indexed.unstack(2).head(15)

,,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,...,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts
,task,defendMovement,disabled,hasRobot,holdFuelCapacity,holdGear,maintainContact,moveBaseline,pickupFuelFloor,pickupFuelHopper,pickupFuelRetrieval,...,pickupFuelRetrieval,pickupGearFloor,pickupGearRetrieval,placeGear,pushTouchPad,shootHighBoiler,shootLowBoiler,shotPercentHigh,shotPercentLow,startingLocation
team,phase,,,,,,,,,,,,,,,,,,,,,
360,auto,,,,0.0,7.0,,5.0,,,,...,,,,1.0,,,,,,0.0
360,finish,,,,,,4.0,,,,,...,,,,,0.0,,,0.0,,
360,teleop,1.0,,,,,,,,,,...,,,10.0,11.0,,,,,,
568,auto,,,,2.0,6.0,,4.0,,,,...,,,,2.0,,,0.0,,,0.0
568,finish,,1.0,0.0,,,1.0,,,,,...,,,,,0.0,,,0.0,,
568,teleop,,,,,,,,0.0,0.0,1.0,...,0.0,,6.0,11.0,,,6.0,,,
949,auto,,,,0.0,8.0,,3.0,,,,...,,,,1.0,,,,,,0.0
949,finish,,1.0,2.0,,,,,,,,...,,,,,0.0,,,0.0,0.0,
949,teleop,,,,,,,,1.0,,,...,,,1.0,6.0,,,,,,
1258,auto,,,,1.0,7.0,,5.0,,,,...,,,,0.0,,,,,,0.0


In [12]:
# Or maybe we want a column for each team.
df_indexed.unstack(0).head(15)

,,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,...,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts
,team,360,568,949,1258,1294,1318,1899,1983,2046,2557,...,4681,4915,5495,5588,5683,5803,5827,5937,6350,6503
phase,task,,,,,,,,,,,,,,,,,,,,,
auto,holdFuelCapacity,0.0,2.0,0.0,1.0,3.0,8.0,0.0,0.0,3.0,0.0,...,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0
auto,holdGear,7.0,6.0,8.0,7.0,10.0,10.0,7.0,12.0,9.0,5.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
auto,moveBaseline,5.0,4.0,3.0,5.0,7.0,6.0,2.0,7.0,7.0,1.0,...,5.0,6.0,8.0,8.0,6.0,7.0,7.0,5.0,7.0,0.0
auto,placeGear,4.0,3.0,0.0,2.0,3.0,3.0,1.0,5.0,4.0,1.0,...,2.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0
auto,shootHighBoiler,,,,,,2.0,,,0.0,,...,,50.0,,0.0,,,,,10.0,
auto,shootLowBoiler,,0.0,,,0.0,,,,,,...,,,,,,,,,,
auto,startingLocation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
finish,disabled,,1.0,1.0,,,1.0,,2.0,,1.0,...,,,,,,0.0,0.0,,,0.0
finish,hasRobot,,0.0,2.0,,,,,,,,...,,0.0,,,,,,,,
finish,maintainContact,4.0,1.0,,3.0,3.0,7.0,3.0,4.0,7.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,

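The many empty cells in the outputs above are NaN values: `unstack` creates a cell for each row and column pairing, and pairings that never occurred in the data have nothing to show. A minimal sketch on toy data (the frame and its values are made up) makes this visible, and also shows that a level can be named instead of numbered.

```python
import pandas as pd

# Team 568 has no teleop row, so that cell has no data after unstacking.
toy = pd.DataFrame({
    "team": [360, 360, 568],
    "phase": ["auto", "teleop", "auto"],
    "sum_successes": [7.0, 1.0, 6.0],
}).set_index(["team", "phase"])

# Refer to the level by name rather than by number.
wide = toy.unstack("phase")

print(wide.shape)
print(wide.isnull().sum().sum())
```

If zeros are more useful than blanks, `unstack` also accepts a `fill_value` argument, though a zero then no longer distinguishes "never recorded" from "recorded as zero".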

## The Answer We Want -- A Multi-level Unstacked Dataframe That We Can Save to Excel

In [13]:
# Actually, what we really want is one row per team and a column for each task and phase
#   We'll do a multi-level unstack
df_unstacked = df_indexed.unstack([1,2])
df_unstacked.head(15)

,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,sum_successes,...,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts,sum_attempts
phase,auto,auto,auto,auto,auto,finish,finish,teleop,teleop,teleop,...,teleop,teleop,auto,teleop,auto,finish,finish,finish,teleop,finish
task,holdFuelCapacity,holdGear,moveBaseline,placeGear,startingLocation,maintainContact,pushTouchPad,defendMovement,pickupFuelFloor,pickupFuelRetrieval,...,placeGear,shootHighBoiler,shootLowBoiler,pickupFuelHopper,shootHighBoiler,disabled,shotPercentHigh,shotPercentLow,shootLowBoiler,hasRobot
team,,,,,,,,,,,,,,,,,,,,,
360,0.0,7.0,5.0,4.0,0.0,4.0,7.0,1.0,,,...,11.0,,,,,,0.0,,,
568,2.0,6.0,4.0,3.0,0.0,1.0,2.0,,0.0,1.0,...,11.0,,0.0,0.0,,0.0,0.0,,6.0,0.0
949,0.0,8.0,3.0,0.0,0.0,,0.0,,1.0,,...,6.0,,,,,0.0,0.0,0.0,,0.0
1258,1.0,7.0,5.0,2.0,0.0,3.0,6.0,1.0,1.0,0.0,...,13.0,0.0,,,,,,,,
1294,3.0,10.0,7.0,3.0,0.0,3.0,6.0,,,,...,21.0,6.0,0.0,0.0,,,,,,
1318,8.0,10.0,6.0,3.0,0.0,7.0,8.0,,4.0,0.0,...,22.0,10.0,,0.0,2.0,0.0,0.0,0.0,,
1899,0.0,7.0,2.0,1.0,0.0,3.0,3.0,1.0,0.0,0.0,...,19.0,0.0,,,,,,,,
1983,0.0,12.0,7.0,5.0,0.0,4.0,9.0,,,,...,22.0,0.0,,,,0.0,0.0,0.0,,
2046,3.0,9.0,7.0,4.0,0.0,7.0,7.0,1.0,1.0,0.0,...,22.0,27.0,,0.0,30.0,,0.0,0.0,,
2557,0.0,5.0,1.0,1.0,0.0,1.0,1.0,,2.0,2.0,...,9.0,6.0,,0.0,,0.0,,,,


In [14]:
# Now let's save that to Excel
df_unstacked.to_excel("rankings.xlsx", sheet_name = "Rankings")

Voila

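The saved workbook keeps the three header rows of the multi-level columns. If a single header row is easier to sort and filter in Excel, one option is to flatten the column labels before saving. This is a sketch on toy data, not a step from the original notebook; `toy`, `wide`, and `flat` are illustrative names, and the underscore-joined naming scheme is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "team": [360, 360, 568],
    "phase": ["auto", "teleop", "auto"],
    "task": ["holdGear", "placeGear", "holdGear"],
    "sum_successes": [7.0, 11.0, 6.0],
    "sum_attempts": [0.0, 11.0, 0.0],
}).set_index(["team", "phase", "task"])

# Same reshaping as above: one row per team, columns for each phase and task.
wide = toy.unstack(["phase", "task"])

# Each column label is a tuple such as ('sum_successes', 'auto', 'holdGear');
# join the parts into a single string.
flat = wide.copy()
flat.columns = ["_".join(col) for col in wide.columns]

print(list(flat.columns))
```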
In [20]:
# The top-level column is named sum_successes (not sum_success)
df_unstacked.sum_successes['auto']
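
The top level of the unstacked columns is named `sum_successes`, so that is the name to select on. A minimal sketch on toy data (the frame and its values are made up) selects one level at a time with brackets, which avoids relying on attribute-style access.

```python
import pandas as pd

toy = pd.DataFrame({
    "team": [360, 360, 568],
    "phase": ["auto", "teleop", "auto"],
    "task": ["holdGear", "placeGear", "holdGear"],
    "sum_successes": [7.0, 11.0, 6.0],
}).set_index(["team", "phase", "task"])
wide = toy.unstack(["phase", "task"])

# First pick the top-level column, then the phase; what remains is one
# column per task.
auto_successes = wide["sum_successes"]["auto"]

print(list(auto_successes.columns))
print(auto_successes.loc[568, "holdGear"])
```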