# 🐼 Notebook 02: Pandas Series and DataFrames

Congratulations! You've graduated from lists and dictionaries, and it's time to wield a more powerful tool: **Pandas**.

This notebook introduces the two core data structures in Pandas:

- `Series`: a one-dimensional labeled array (as if a dictionary and a list had a well-organized child)
- `DataFrame`: a two-dimensional table with labeled axes (basically Excel's smarter cousin)

---

In [2]:
import pandas as pd

## 🔢 Series from a List
Simple but powerful: labels give it superpowers.

In [3]:
# A Series from a list: like a list, but fancier and labeled
prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])
print("Grocery Prices:")
print(prices)

# Accessing elements like a dictionary
print("\nPrice of Milk:", prices["Milk"])

Grocery Prices:
Apple    2.99
Milk     4.49
Bread    1.99
dtype: float64

Price of Milk: 4.49
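
Label lookup has a positional counterpart. The following is a minimal sketch, reusing the grocery data from the cell above, of `.loc` (by label) versus `.iloc` (by position), and of the default integer index pandas assigns when `index` is omitted. The variable names are illustrative only.

```python
import pandas as pd

prices = pd.Series([2.99, 4.49, 1.99], index=["Apple", "Milk", "Bread"])

# .loc looks up by label, .iloc by integer position
milk_by_label = prices.loc["Milk"]
milk_by_position = prices.iloc[1]
print(milk_by_label, milk_by_position)

# Without an explicit index, pandas falls back to 0, 1, 2, ...
unlabeled = pd.Series([2.99, 4.49, 1.99])
print(unlabeled.index.tolist())
```

Both lookups return the same value here because "Milk" happens to sit at position 1.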


## 🗺️ Series from a Dictionary
Keys become the index. Pretty intuitive, right?

In [4]:
population = {
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000
}

state_pop = pd.Series(population)
print("State Populations:")
print(state_pop)

# Series support math!
print("\nPopulation in millions:")
print(state_pop / 1_000_000)

State Populations:
Texas         29000000
California    39000000
New York      19000000
dtype: int64

Population in millions:
Texas         29.0
California    39.0
New York      19.0
dtype: float64
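
Element-wise behavior is not limited to arithmetic. As a rough sketch using the same population figures, comparisons produce a boolean Series that can act as a filter, and aggregations such as `sum()` collapse the Series to one number. The 20 million threshold is an arbitrary choice for illustration.

```python
import pandas as pd

state_pop = pd.Series({
    "Texas": 29_000_000,
    "California": 39_000_000,
    "New York": 19_000_000,
})

# Comparisons are element-wise and produce a boolean Series
is_large = state_pop > 20_000_000

# A boolean Series works as a mask that keeps only the matching entries
large_states = state_pop[is_large]
print(large_states)

# Aggregations collapse the Series to a single value
total = state_pop.sum()
print("Total:", total)
```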


## 📋 Creating a DataFrame
Like a spreadsheet, but you control the universe.

In [5]:
# Creating a DataFrame: the bread and butter of Pandas
data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True]
}

students = pd.DataFrame(data)
print("Student DataFrame:")
print(students)

Student DataFrame:
      Name  GPA  Credits  Graduating
0    Alice  3.9       90       False
1      Bob  2.7       45       False
2  Charlie  3.4       60       False
3    Diana  3.8      120        True
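
A dictionary of columns is one way in; a list of row dictionaries is another. The sketch below, which uses only the first two students and assumes you want names as row labels, builds the frame row by row and then promotes `Name` to the index with `set_index`.

```python
import pandas as pd

# Each dictionary is one row; keys become column names
records = [
    {"Name": "Alice", "GPA": 3.9, "Credits": 90, "Graduating": False},
    {"Name": "Bob", "GPA": 2.7, "Credits": 45, "Graduating": False},
]
by_rows = pd.DataFrame(records)

# set_index returns a new DataFrame whose row labels come from the Name column
by_name = by_rows.set_index("Name")
print(by_name)

# Rows can now be looked up by student name instead of 0, 1, ...
print(by_name.loc["Bob", "GPA"])
```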


## 🎯 Column Access + Row Access

In [6]:
# Access a column
print("\nGPA column:")
print(students["GPA"])

# Access a row
print("\nCharlie's record:")
print(students.loc[2])


GPA column:
0    3.9
1    2.7
2    3.4
3    3.8
Name: GPA, dtype: float64

Charlie's record:
Name          Charlie
GPA               3.4
Credits            60
Graduating      False
Name: 2, dtype: object
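
A few more selection patterns build on the same ideas. This sketch rebuilds the `students` frame so it runs on its own, then shows multi-column selection, single-cell lookup, positional access with `.iloc`, and a boolean filter. The 3.5 cutoff and the name `honor_roll` are made up for the example.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# A list of column names returns a smaller DataFrame
name_and_gpa = students[["Name", "GPA"]]

# .loc accepts a row label and a column label to reach one cell
charlie_gpa = students.loc[2, "GPA"]

# .iloc is purely positional; -1 means the last row
last_row = students.iloc[-1]

# A boolean condition on a column filters the rows
honor_roll = students[students["GPA"] >= 3.5]
print(honor_roll["Name"].tolist())
```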


## 🧪 Inspect the DataFrame

In [7]:
# Shape and structure
print("\nShape:", students.shape)
print("\nColumns:", students.columns.tolist())

# Info dump (info() prints its report itself and returns None, so no print() wrapper)
print("\nDataFrame Info:")
students.info()

# Stats summary
print("\nDataFrame Summary Stats:")
print(students.describe())


Shape: (4, 4)

Columns: ['Name', 'GPA', 'Credits', 'Graduating']

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        4 non-null      object 
 1   GPA         4 non-null      float64
 2   Credits     4 non-null      int64  
 3   Graduating  4 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 232.0+ bytes

DataFrame Summary Stats:
            GPA     Credits
count  4.000000    4.000000
mean   3.450000   78.750000
std    0.544671   33.260337
min    2.700000   45.000000
25%    3.225000   56.250000
50%    3.600000   75.000000
75%    3.825000   97.500000
max    3.900000  120.000000
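
The summary covers only `GPA` and `Credits` because `describe()` looks at numeric columns by default. As a small sketch on the same data, `select_dtypes` lists which columns qualify, and the individual statistics can be pulled directly from a column.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "GPA": [3.9, 2.7, 3.4, 3.8],
    "Credits": [90, 45, 60, 120],
    "Graduating": [False, False, False, True],
})

# Which columns count as numeric (text and boolean columns are left out)
numeric_cols = students.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# Individual statistics are available as methods on a column
mean_gpa = students["GPA"].mean()
max_credits = students["Credits"].max()

# Booleans behave like 0/1, so sum() counts the True values
n_graduating = int(students["Graduating"].sum())
print(mean_gpa, max_credits, n_graduating)
```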


## 🧼 Rename a Column

In [8]:
students.rename(columns={"Graduating": "Is_Graduating"}, inplace=True)
print("\nRenamed column:")
print(students.head())


Renamed column:
      Name  GPA  Credits  Is_Graduating
0    Alice  3.9       90          False
1      Bob  2.7       45          False
2  Charlie  3.4       60          False
3    Diana  3.8      120           True
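
Without `inplace=True`, `rename` leaves the original untouched and hands back a renamed copy. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Graduating": [False, True],
})

# rename returns a new DataFrame; the original keeps its column names
renamed = students.rename(columns={"Graduating": "Is_Graduating"})

print(students.columns.tolist())
print(renamed.columns.tolist())
```

Assigning the result to a name (the same one or a new one) is a common way to keep the change.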


---
## 🔍 Your Turn

1. Create a `Series` from a dictionary that maps state abbreviations to populations (use fake or real data).
2. Create a `DataFrame` for 5 students with columns: `Name`, `GPA`, `Credits`, `Graduating` (True/False).
3. Try accessing a row using `.loc[]` and a column using bracket notation.
4. Print the `.shape`, `.columns`, and `.info()` of your DataFrame.

🎯 Bonus: Rename a column just to mess with the future grader.

In [13]:
state_pop = {
    "TX": 50_000_000,
    "NY": 20_000_000,
    "CA": 35_000_000,
}

state_pop_pd = pd.Series(state_pop)
state_pop_pd

TX    50000000
NY    20000000
CA    35000000
dtype: int64


In [15]:
import pandas as pd

data = {
    "Name": ["Alice", "Brandon", "Carlos", "Diana", "Emily"],
    "GPA": [3.8, 3.2, 2.9, 3.5, 3.9],
    "Credits": [90, 75, 60, 105, 120],
    "Graduating": [False, False, False, True, True]
}

df = pd.DataFrame(data)
print(df)

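# Print both results; a notebook cell only displays its last bare expression
print("\nRow 3:")
print(df.loc[3])
print("\nCredits column:")
print(df["Credits"])


      Name  GPA  Credits  Graduating
0    Alice  3.8       90       False
1  Brandon  3.2       75       False
2   Carlos  2.9       60       False
3    Diana  3.5      105        True
4    Emily  3.9      120        True

Row 3:
Name          Diana
GPA             3.5
Credits         105
Graduating     True
Name: 3, dtype: object

Credits column:
0     90
1     75
2     60
3    105
4    120
Name: Credits, dtype: int64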


In [18]:
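# shape and columns are attributes, so print them; info() prints its own report
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
df.info()

Shape: (5, 4)
Columns: ['Name', 'GPA', 'Credits', 'Graduating']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        5 non-null      object 
 1   GPA         5 non-null      float64
 2   Credits     5 non-null      int64  
 3   Graduating  5 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 257.0+ bytes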


---
## 📎 Side Notes
- A `Series` is like a single column of a spreadsheet.
- A `DataFrame` is like the full spreadsheet.
- Rows and columns can both have labels (called the **index** and **columns**).
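
These notes can be checked directly. In the sketch below (a made-up two-student frame with names as row labels), pulling out one column yields a `Series` that shares the DataFrame's index, and the two sets of labels live in `.index` and `.columns`.

```python
import pandas as pd

df = pd.DataFrame(
    {"GPA": [3.9, 2.7], "Credits": [90, 45]},
    index=["Alice", "Bob"],
)

# A single column of a DataFrame is a Series
gpa = df["GPA"]
print(type(gpa).__name__)

# Row labels live in .index, column labels in .columns
row_labels = df.index.tolist()
column_labels = df.columns.tolist()
print(row_labels, column_labels)

# The column keeps the same row labels as its parent DataFrame
shares_index = gpa.index.equals(df.index)
print(shares_index)
```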

Next stop: loading data from the real world. Brace yourself for CSVs.