# Pandas Introduction
New to pandas? No need to worry! We're going to walk through some of the basic functions here. If you don't understand how a function works, Google is your friend! There's extensive documentation, plenty of Stack Overflow posts, YouTube videos, and more that can help you understand how to do something.

In [100]:
# Let's import the pandas library so that we can use its functions.
import pandas as pd


In [101]:
# Pandas is useful because we can create tables to organize our data. These tables are called dataframes.

# Let's create a dataframe.

sdc_members = {
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Class Year": [2025, 2024, 2023, 2023, 2025, 2025],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"]
}

df = pd.DataFrame(sdc_members)
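
A quick way to check what `pd.DataFrame` built is to look at a few of its basic attributes. This is a minimal sketch that rebuilds the same table under a hypothetical name, `example_df`, so it can run on its own:

```python
import pandas as pd

sdc_members = {
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Class Year": [2025, 2024, 2023, 2023, 2025, 2025],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"],
}

# example_df is a made-up name for this sketch; it holds the same data as df.
example_df = pd.DataFrame(sdc_members)

# Each dictionary key becomes a column, and each list becomes that column's values.
print(example_df.shape)          # (number of rows, number of columns)
print(list(example_df.columns))  # column names, in the order they were given
print(list(example_df.index))    # default row labels, which start at 0
```

Notice that the row labels start at 0, not 1. That matters as soon as we start indexing.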

In [122]:
# What if I want to look at just one row?
df.loc[1]

# Look at rows 2 through 4 (the two lines below are equivalent):
df.loc[2:4]
df.loc[[2, 3, 4]]

# Let's look at columns now!
df["Name"]
df[["Name", "Position"]]
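
One thing that can be surprising: a `.loc` slice includes its end label, unlike a regular Python list slice. The sketch below uses a small stand-in table (the name `example_df` is made up for illustration) to check that the two ways of asking for rows 2 through 4 really do match:

```python
import pandas as pd

# example_df is a hypothetical stand-in with the same default row labels as df.
example_df = pd.DataFrame({
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Class Year": [2025, 2024, 2023, 2023, 2025, 2025],
})

by_slice = example_df.loc[2:4]        # label slice: row 4 is included
by_list = example_df.loc[[2, 3, 4]]   # explicit list of row labels

# Compare the two results, then compare against a plain list slice.
print(by_slice.equals(by_list))
print(len(by_slice))
print(len(list(range(6))[2:4]))       # a list slice stops before position 4
```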

In [103]:
# Your turn! Return a table with only the odd-numbered rows.

# Now return a table with rows 1 through 4 of df (try doing this in two different ways)!

# Let's combine our indexing. Try returning a table that only has the "Name" and "Position" columns, 
# and also only has the first and second rows of df.


In [104]:
# Great job! You've learned some basic indexing and how we can alter a table to get the data we want.

# Now let's grab some actual sentencing data to look at.
mms_stats = pd.read_csv('../state/VA/VA_data/sentencing_data/sentencing_2021.csv')

# What does that data look like? Let's take a look. Try running each of the lines below, and see what it shows you.
# mms_stats
# mms_stats.head()
# mms_stats.tail()
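
If you don't have the sentencing file handy, here is a minimal sketch of what `.head()` and `.tail()` do, using a made-up 100-row table called `toy` as a stand-in for a long dataset:

```python
import pandas as pd

# toy is a hypothetical stand-in for a long table: 100 rows of made-up numbers.
toy = pd.DataFrame({"row_number": range(100)})

first_five = toy.head()    # the first 5 rows by default
last_three = toy.tail(3)   # pass a number to choose how many rows you get

print(first_five)
print(last_three)
```

Both functions return a smaller table and leave the original untouched, so they are a safe way to peek at large data.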



In [105]:
# Just to be safe, let's make a copy of this table to play around with.
mms_copy = mms_stats.copy()
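
Why bother with `.copy()`? Writing `new_name = table` does not make a second table; it only gives the same table a second name. A minimal sketch with made-up names (`original`, `alias`, `safe_copy`) shows the difference:

```python
import pandas as pd

original = pd.DataFrame({"Class Year": [2025, 2024, 2023]})

alias = original              # just a second name for the same table
safe_copy = original.copy()   # an independent table with the same data

safe_copy.loc[0, "Class Year"] = 1999   # changes only the copy
alias.loc[1, "Class Year"] = 2000       # changes original too, since they are the same object

print(original)
print(safe_copy)
```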

In [106]:
# We'll be using .groupby() a lot to combine data into manageable chunks.
# Compare mms_copy with mms_copy2. How are they different?

mms_copy2 = mms_copy.groupby(["Offender Birth Month"]).count()
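
To see what `.groupby(...).count()` does on a table small enough to check by hand, here is a sketch that groups the club members by class year. The names `members` and `counts` are made up for this example:

```python
import pandas as pd

members = pd.DataFrame({
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Class Year": [2025, 2024, 2023, 2023, 2025, 2025],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"],
})

# One row per distinct class year; every other column holds the number of
# non-missing values that fell into that group.
counts = members.groupby(["Class Year"]).count()

print(counts)
```

The grouping column becomes the row labels of the result, which is one of the differences to look for when comparing `mms_copy` with `mms_copy2`.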

In [120]:
# Some columns have names that are too long to code easily with. Let's rename a column and group by it.

mms_copy3 = mms_copy.rename(
    columns={
        "Total Effective Sentence (Imposed Less Suspended Time) incl. Alternative Programs (in Months)":
            "Total Effective Sentence"
    }
)

mms_copy3.groupby(["Total Effective Sentence", "Calendar Year of Sentencing"]).count()
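
The same rename-then-group pattern can be tried on the small club table, where the result is easy to check by eye. This is only a sketch; `members`, `renamed`, and `grouped` are made-up names, and the short column name "Year" is chosen purely for illustration:

```python
import pandas as pd

members = pd.DataFrame({
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Class Year": [2025, 2024, 2023, 2023, 2025, 2025],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"],
})

# rename returns a new table; members itself keeps the old column name.
renamed = members.rename(columns={"Class Year": "Year"})

# Grouping by two columns gives one row per (Year, Position) combination
# that actually appears in the data.
grouped = renamed.groupby(["Year", "Position"]).count()

print(grouped)
```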

In [108]:
# Try it yourself! Rename the "Sentencing Guidelines Recommended High End of Range (in Months)" column to 
# "Sentencing Guidelines", and then try grouping by it.


In [109]:
# Now try grouping by multiple columns: "Calendar Year of Sentencing" and "Fiscal Year of Sentencing".
# What does the table look like? Why would we want to group by two columns?


In [119]:
# Another common function we'll use is .apply(). Pass in a function to .apply() on a column and pandas will run that
# function on every value in the column! Very useful when we want to transform data according to some rule.

# Let's look at an example with our old sdc_members dataframe. 
# Run this cell to remind ourselves what the dataframe looks like.

df_copy = df.copy()
df_copy

In [118]:
# What if everyone in the table decides to take a gap year? Then each class year would increase by one. A tedious way
# to reflect that change in the dataframe would be to manually work out the new class year for each person:
df_copy.loc[0, "Class Year"] + 1  # computes the new value for row 0 only, and does not save it back into df_copy

# But we would have to do that for every row in the dataframe, and when you have a dataframe like mms_copy,
# with thousands of rows, that can get difficult. (Fun aside: try writing a for loop to accomplish this goal!)
# Instead, we can use .apply().

# To use .apply(), we'll need a function to apply to the dataframe. Let's create one:

def add_one(year): # def is the keyword for defining a function, add_one is the function name, and year is the input
    return year + 1 # year + 1 is what the function is doing, and the function is outputting (returning) that value

df_copy["Class Year"] = df_copy["Class Year"].apply(add_one)

# Ta da! We added a year to everyone's class.

df_copy
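
For simple arithmetic like this, pandas can also do the math on the whole column at once, without `.apply()`. The sketch below uses a made-up column called `years` to compare the two approaches; `.apply()` is still the more general tool when the rule is too complicated for plain arithmetic:

```python
import pandas as pd

years = pd.Series([2025, 2024, 2023], name="Class Year")

def add_one(year):
    # Same function as above: take one value, return that value plus one.
    return year + 1

with_apply = years.apply(add_one)   # calls add_one once per value
vectorized = years + 1              # adds 1 to every value in one step

print(with_apply.tolist())
print(vectorized.tolist())
```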

In [117]:
# Another quick way to do this would be to use a lambda function. A lambda function is a small function that 
# can be written in one line, so you don't have to define it separately. Here's what it looks like:
df_copy["Class Year"] = df_copy["Class Year"].apply(lambda year : year + 1)

df_copy
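
A lambda and a regular `def` function are interchangeable inside `.apply()`. Here is a small sketch on a text column to show that they give the same result; the names `names` and `shout` are made up for this example:

```python
import pandas as pd

names = pd.Series(["Grace", "Ayesha", "Paco"], name="Name")

def shout(name):
    # Return the name in capital letters.
    return name.upper()

with_def = names.apply(shout)
with_lambda = names.apply(lambda name: name.upper())   # same logic, written inline

print(with_def.tolist())
print(with_lambda.tolist())
```

A lambda is handy for one-off rules; a named function is easier to read and reuse once the logic grows past one line.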

# Try using .apply and a lambda function to reflect a world in which everyone in the table graduates one year early.



In [115]:
# Your turn! Make a copy of df and use .apply() and a lambda function to convert everyone's class year to the 
# number of years they have left at Stanford. For example, I (Ayesha) am graduating in 2024, so I have two years left. 
# Then rename the "Class Year" column so it says "Years Left".
# Once you're done, your output should match years_left.

sdc_members_years_left = {
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Years Left": [3, 2, 1, 1, 3, 3],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"]
}

years_left = pd.DataFrame(sdc_members_years_left)
years_left

# Your work here:



In [116]:
# Now try using .apply() and define a function (don't use lambda) to convert everyone's years left to 
# how many years they've spent at Stanford. For example, Grace has three years left, so she's spent one full year 
# at Stanford. The 3 in her row should be replaced with a 1. Paco has 1 year left, so he has spent three full years 
# at Stanford. The 1 in his row should be replaced with a 3. Once you're done with this, your output should match
# years_spent. Make a copy of df to work with, and rename any columns as needed.

sdc_members_years_spent = {
    "Name": ["Grace", "Ayesha", "Paco", "Ellie", "Isaac", "Danny"],
    "Years Spent": [1, 2, 3, 3, 1, 1],
    "Position": ["Policy Team", "Policy Team", "President", "President", "Spotlights", "Lunch and Learn"]
}

years_spent = pd.DataFrame(sdc_members_years_spent)
years_spent

# Your work here:



You did it! You've completed this pandas introduction and learned a few of the key functions that we tend to use when analyzing data. I'd recommend playing around with this notebook some more: give yourself more tasks! Try creating your own dataframe. Index into it, try new things, edit the cells, whatever you want. It can be a lot of fun once you get the hang of it! Google things you're not sure about, look up the pandas documentation, and don't be discouraged by error messages. It's all part of the learning process :)