

```
# Jadyn Dangerfield
# Assignment: 02 Pandas Series and DataFrames
```



# 🐼 Notebook 02: Pandas Series and DataFrames

Congratulations. You've graduated from lists and dictionaries — it’s time to wield a more powerful tool: **Pandas**.

This notebook introduces you to the two most essential data structures in data science:

- `Series`: a one-dimensional labeled array (as if a dictionary and a list had a well-organized child)
- `DataFrame`: a two-dimensional table with labeled axes (basically Excel’s smarter cousin)

---

In [None]:
import pandas as pd

## 🔢 Series from a List
Simple but powerful: the index labels let you look up values by name instead of by position.

In [None]:
# A Series from a list — just like a list, but fancier and labeled
prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])
print("Grocery Prices:")
print(prices)

# Accessing elements like a dictionary
print("\nPrice of Milk:", prices["Milk"])

Grocery Prices:
Apple    2.99
Milk     4.49
Bread    1.99
dtype: float64

Price of Milk: 4.49
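
To make the "labels" idea concrete, here is a small sketch contrasting label-based and position-based lookup on the same `prices` Series. The variable names `milk_by_label`, `milk_by_position`, and `subset` are illustrative and not part of the original notebook.

```python
import pandas as pd

prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])

# Label-based lookup: .loc uses the index labels
milk_by_label = prices.loc["Milk"]

# Position-based lookup: .iloc uses the 0-based integer position
milk_by_position = prices.iloc[1]

# Passing a list of labels returns a new, smaller Series
subset = prices.loc[["Apple", "Bread"]]

print(milk_by_label, milk_by_position)
print(subset)
```

Both lookups reach the same value here; `.loc` keeps working if the Series is reordered, while `.iloc` depends on the current order.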


## 🗺️ Series from a Dictionary
Keys become the index — pretty intuitive, right?

In [None]:
population = {
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000
}

state_pop = pd.Series(population)
print("State Populations:")
print(state_pop)

# Series support element-wise math: the division is applied to every value
print("\nPopulation in millions:")
print(state_pop / 1_000_000)

State Populations:
Texas         29000000
California    39000000
New York      19000000
dtype: int64

Population in millions:
Texas         29.0
California    39.0
New York      19.0
dtype: float64
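
The same element-wise behavior applies to comparisons, which is one common way to filter a Series. The sketch below rebuilds `state_pop` from the cell above; the 25 million threshold and the names `is_large` and `large_states` are assumptions chosen only for illustration.

```python
import pandas as pd

state_pop = pd.Series({
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000,
})

# Arithmetic is applied to every element and the index labels are kept
pop_millions = state_pop / 1_000_000

# A comparison returns a boolean Series with the same index
is_large = state_pop > 25_000_000

# Indexing with that boolean Series keeps only the True entries
large_states = state_pop[is_large]

print(is_large)
print(large_states)
```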


## 📋 Creating a DataFrame
Like a spreadsheet, but you control the universe.

In [None]:
# Creating a DataFrame — the bread and butter of Pandas
data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True]
}

students = pd.DataFrame(data)
print("Student DataFrame:")
print(students)

Student DataFrame:
      Name  GPA  Credits  Graduating
0    Alice  3.9       90       False
1      Bob  2.7       45       False
2  Charlie  3.4       60       False
3    Diana  3.8      120        True
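
One way to connect the two structures: each column of a DataFrame is itself a Series that shares the DataFrame's row index. A minimal sketch, reusing the `data` dictionary from the cell above (the variable `gpa` is illustrative):

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
}
students = pd.DataFrame(data)

# Selecting one column returns a Series named after that column
gpa = students["GPA"]
print(type(gpa).__name__)
print(gpa.name)

# The column shares its row labels with the DataFrame
print(gpa.index.equals(students.index))
```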


## 🎯 Column Access + Row Access

In [None]:
# Access a column
print("\nGPA column:")
print(students["GPA"])

# Access a row by its index label (the default labels are 0, 1, 2, ...)
print("\nCharlie's record:")
print(students.loc[2])


GPA column:
0    3.9
1    2.7
2    3.4
3    3.8
Name: GPA, dtype: float64

Charlie's record:
Name          Charlie
GPA               3.4
Credits            60
Graduating      False
Name: 2, dtype: object
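
A few related access patterns, sketched on the same `students` table: `.loc` with both a row label and a column name, `.iloc` by position, and a boolean filter. The 3.5 GPA cutoff and the names `charlie_gpa`, `last_row`, and `honors` are assumptions made for this example.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# .loc[row_label, column_name] returns a single cell
charlie_gpa = students.loc[2, "GPA"]

# .iloc selects by integer position; -1 is the last row
last_row = students.iloc[-1]

# A boolean condition on a column keeps only the matching rows
honors = students[students["GPA"] >= 3.5]

print(charlie_gpa)
print(last_row["Name"])
print(honors)
```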


## 🧪 Inspect the DataFrame

In [None]:
# Shape and structure
print("\nShape:", students.shape)
print("\nColumns:", students.columns.tolist())

# Info dump
print("\nDataFrame Info:")
students.info()  # info() prints its report directly and returns None, so no print() wrapper

# Stats summary
print("\nDataFrame Summary Stats:")
print(students.describe())


Shape: (4, 4)

Columns: ['Name', 'GPA', 'Credits', 'Graduating']

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        4 non-null      object 
 1   GPA         4 non-null      float64
 2   Credits     4 non-null      int64  
 3   Graduating  4 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 232.0+ bytes

DataFrame Summary Stats:
            GPA     Credits
count  4.000000    4.000000
mean   3.450000   78.750000
std    0.544671   33.260337
min    2.700000   45.000000
25%    3.225000   56.250000
50%    3.600000   75.000000
75%    3.825000   97.500000
max    3.900000  120.000000
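
Notice that the summary above covers only `GPA` and `Credits`: by default `describe()` reports on numeric columns only. As a hedged sketch, passing `include="all"` asks pandas to summarize every column (the names `numeric_summary` and `full_summary` are illustrative):

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# Default: numeric columns only
numeric_summary = students.describe()

# include="all" adds the text and boolean columns;
# statistics that do not apply to a column show up as NaN
full_summary = students.describe(include="all")

# Individual statistics are also available as methods
mean_credits = students["Credits"].mean()

print(students.dtypes)
print(full_summary)
print(mean_credits)
```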


## 🧼 Rename a Column

In [None]:
students.rename(columns={"Graduating": "Is_Graduating"}, inplace=True)
print("\nRenamed column:")
print(students.head())


Renamed column:
      Name  GPA  Credits  Is_Graduating
0    Alice  3.9       90          False
1      Bob  2.7       45          False
2  Charlie  3.4       60          False
3    Diana  3.8      120           True
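
The cell above modifies `students` in place. An alternative, sketched here on a fresh copy of the data, is to leave out `inplace=True` and keep the returned DataFrame; the original is then left untouched. The variable `renamed` is illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# Without inplace=True, rename() returns a new DataFrame
renamed = students.rename(columns={"Graduating": "Is_Graduating"})

print(students.columns.tolist())  # original column names are unchanged
print(renamed.columns.tolist())   # the copy has the new name
```

Which style to use is mostly a matter of preference; assigning the result makes it easier to see which variable holds which version of the table.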


---
## 🔍 Your Turn

1. Create a `Series` from a dictionary of state abbreviations to population (use fake or real data).
2. Create a `DataFrame` for 5 students with columns: `Name`, `GPA`, `Credits`, `Graduating` (True/False).
3. Try accessing a row using `.loc[]` and a column using bracket notation.
4. Print the `.shape`, `.columns`, and `.info()` of your DataFrame.

🎯 Bonus: Rename a column just to mess with the future grader.

In [None]:
# Create a series from a dictionary of state abbreviations to population
populations = {
    "TX": 31_290_831,
    "LA": 4_507_740,
    "NM": 2_130_256,
    "KS": 2_970_606,
    "PA": 13_078_751
}

state_population = pd.Series(populations)

print("State Population:")
print(state_population)

State Population:
TX    31290831
LA     4507740
NM     2130256
KS     2970606
PA    13078751
dtype: int64


In [None]:
# Create a dataframe for 5 students with columns: Name, GPA, Credits, Graduating (True/False)
student_info = {
    "Name": ["Jadyn", "Gracie", "Sophia", "Maggie", "Daiza"],
    "GPA": [3.25, 3.1, 4.0, 3.6, 3.8],
    "Credits": [143, 28, 32, 16, 150],
    "is_Graduating": [True, False, False, False, True]
}

student_df = pd.DataFrame(student_info)

print("Student DataFrame:")
print(student_df)

Student DataFrame:
     Name   GPA  Credits  is_Graduating
0   Jadyn  3.25      143           True
1  Gracie  3.10       28          False
2  Sophia  4.00       32          False
3  Maggie  3.60       16          False
4   Daiza  3.80      150           True


In [None]:
# Access a row using .loc[]
print("\nJadyn's Record:")
print(student_df.loc[0])
# Access a column using bracket notation
print("\nSophia's GPA:")
print(student_df["GPA"][2])


Jadyn's Record:
Name             Jadyn
GPA               3.25
Credits            143
is_Graduating     True
Name: 0, dtype: object

Sophia's GPA:
4.0


In [None]:
# shape
print("\nShape:", student_df.shape)
# columns
print("\nColumns:", student_df.columns.tolist())
# info
print("\nDataFrame Info:")
student_df.info()  # info() prints its report directly and returns None, so no print() wrapper


Shape: (5, 4)

Columns: ['Name', 'GPA', 'Credits', 'is_Graduating']

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Name           5 non-null      object 
 1   GPA            5 non-null      float64
 2   Credits        5 non-null      int64  
 3   is_Graduating  5 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 257.0+ bytes


In [None]:
# Bonus
student_df.rename(columns={"is_Graduating": "WTF"}, inplace=True)
print("\nRenamed column:")
print(student_df.head())


Renamed column:
     Name   GPA  Credits    WTF
0   Jadyn  3.25      143   True
1  Gracie  3.10       28  False
2  Sophia  4.00       32  False
3  Maggie  3.60       16  False
4   Daiza  3.80      150   True


---
## 📎 Side Notes
- A `Series` is like a single column of a spreadsheet.
- A `DataFrame` is like the full spreadsheet.
- Rows and columns can both have labels (called the **index** and **columns**).
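
To see row labels in action, here is a minimal sketch that promotes the `Name` column to the index using `set_index`, so rows can be looked up by name rather than by number. The variable `by_name` is illustrative and does not appear elsewhere in this notebook.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# Use the Name column as the row labels (returns a new DataFrame)
by_name = students.set_index("Name")

print(by_name.index.tolist())    # row labels
print(by_name.columns.tolist())  # column labels

# Rows can now be selected by name
print(by_name.loc["Charlie"])
print(by_name.loc["Charlie", "GPA"])
```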

Next stop: loading data from the real world. Brace yourself for CSVs.