<img src= "https://static.wixstatic.com/media/9278e7_c8e6664df6e44185b1da6e60e9e8da6c~mv2.png/v1/fill/w_110,h_110,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/black_logo.png">

# Sp25 PDM Intro to Data
Built and presented by Shivani Sahni and Rahil Shaik

March 19th 2025

### Section 1: Introduction to Jupyter Notebooks

#### 1.1: Introduction to Jupyter Notebooks:
Jupyter Notebooks are interactive documents that combine code, text, and visualizations, making them ideal for data analysis and teaching. They are commonplace in research, machine learning, and quantitative finance for exploratory data work, and they let us run different experiments (for example, to see how we can improve a model's performance) in a streamlined, convenient way.

#### 1.2: Operating a Jupyter Notebook:

- Running Cells: Each notebook consists of cells that can contain code or text. To execute a code cell, click on it and press `Shift + Enter`. You can also hover over a cell and click the play-button icon to run it.

- Creating Cells: You can make two types of cells in Python notebooks: Markdown and code. Markdown cells are generally used to add explanatory text around your code cells. Code cells are used for... coding! The toolbar at the top lets you choose between Markdown and code. If you double-click into this cell, you can see the Markdown source behind it!

#### 1.3: Understanding How Kernels Work
A kernel is the computational engine that executes the code in the notebook. We will select a Python kernel to execute the cells in this notebook. If the kernel stops or "dies", you can restart it from the menu bar at the top via 'Kernel' > 'Restart'.


### Section 2: Python and Pandas Basics

#### 2.1: Setting up your Python environment
There are a few options here, including installing Python directly on your system or creating a virtual environment (venv, conda). Today we will create a Python venv, because venvs are generally lightweight and, as a major advantage, let you create isolated environments that use different versions of libraries or of Python itself.

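If you are using macOS, you need to install Homebrew, which makes managing packages easy (I think you guys all have macOS). Open your terminal and run the commands below (the `/opt/homebrew` path assumes an Apple Silicon Mac; on Intel Macs, Homebrew installs to `/usr/local` instead):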

`/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`

`echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile`

`eval "$(/opt/homebrew/bin/brew shellenv)" `

To confirm that Homebrew installed correctly, run:

`brew --version`

Then install Python and Git:

`brew install python`

`brew install git`

Then clone the PDM repository with Git:

`git clone https://github.com/rahilisashaik/pdm-intro-to-data.git`

Then open this directory in Visual Studio Code, Google Colab, or wherever you would like. `git clone` creates the folder inside whatever directory your terminal was in, which by default is your home directory:

`/Users/<your-username>/pdm-intro-to-data`, i.e. `~/pdm-intro-to-data`

For the rest of these instructions, use the built-in terminal of your coding environment (Colab, VS Code, Jupyter).

Check that Python installed correctly with:

`python3 --version`

`pip --version` or `pip3 --version`

Now create a Python virtual environment for this project and activate it:

`python -m venv pdmdata` or `python3 -m venv pdmdata`

`source pdmdata/bin/activate`

Pip is a package manager; if at any point you get a `ModuleNotFoundError`, you can use pip to install the missing package. I have listed this project's package requirements in the `requirements.txt` file, so we can use pip to install them all at once:

`pip3 install -r requirements.txt`
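Once the environment is active and selected as your notebook's kernel, you can double-check from inside Python that the notebook is really running in it. The sketch below uses a standard-library check that is an addition to the original setup steps, not part of them: inside a venv, `sys.prefix` points at the environment, while `sys.base_prefix` still points at the Python installation it was created from.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter is running inside a venv."""
    # In a venv, sys.prefix points to the environment (e.g. pdmdata/),
    # while sys.base_prefix points to the base Python installation.
    return sys.prefix != sys.base_prefix


print("Python executable:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False` in the notebook, the kernel is probably not using `pdmdata`; pick that environment's interpreter from your editor's kernel selector.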


Now you're ready to start coding!


#### 2.2: Basics of Python

First we'll talk about variables, variable types, and how python interprets and stores data.

In [188]:
# these are a bunch of package imports, the great thing about coding in 2025 is the grunt work is 
# almost always done for you so you can just import packages that do tasks for you

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.cluster import KMeans
import seaborn as sns
import util

In [189]:
# Integer
x = 10  
print(x)

10


In [190]:
# Float
y = 10.5  
print(y)  # type(y) is <class 'float'>

10.5


In [191]:
# String
name = "Leponda"
print(name)  # type(name) is <class 'str'>

Leponda


In [192]:
# Boolean
is_student = True
print(is_student)  # type(is_student) is <class 'bool'>

True


In [193]:
pledges = ["gurnoor", "arjun", "sarah", "katie", "sadie", "aathma", "jay"]
pledge_points = [4, 4, -1, 8, 16, 4, 4] # as of 03/17 at 7:21 PM

print(pledges)  # type(pledges) is <class 'list'>
print(pledge_points)  # type(pledge_points) is <class 'list'>

['gurnoor', 'arjun', 'sarah', 'katie', 'sadie', 'aathma', 'jay']
[4, 4, -1, 8, 16, 4, 4]
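The `<class ...>` comments in the cells above name each value's type, which `print()` itself does not show. To check a type directly, you can call Python's built-in `type()`. A minimal sketch on the same kinds of values (stored in a new list, `examples`, so none of the notebook's variables are overwritten):

```python
# The same kinds of values used in the cells above
examples = [10, 10.5, "Leponda", True, ["gurnoor", "arjun"]]

for value in examples:
    # type() returns the object's class; __name__ is its short name
    print(repr(value), "->", type(value).__name__)
```

This prints `10 -> int`, `10.5 -> float`, `'Leponda' -> str`, `True -> bool`, and `['gurnoor', 'arjun'] -> list`, one per line.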


There are a few pretty useful manipulations you can do with lists.

Guess what this will do:

In [194]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

Now we're using NumPy arrays, which are similar to lists. Guess what this will return:

In [195]:
np.array([1, 2, 3]) + np.array([4, 5, 6])

array([5, 7, 9])

In [196]:
a = np.array([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])

b = np.array([[2, 4, 6],
             [8, 10, 12],
             [14, 16, 18]])
a + b

array([[ 3,  6,  9],
       [12, 15, 18],
       [21, 24, 27]])

Now try this out yourself! Create a 3 x 3 matrix like this:

$$ \begin{bmatrix}
1 & 2 & 3\\
5 & 6 & 7\\
8 & 9 & 10
\end{bmatrix} $$

Then, output a matrix where each column has had its own mean subtracted from it.  
_Hint: use `np.mean(..., axis=...)`._
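If the `axis` argument is unfamiliar, here is a small sketch on a different 2 x 3 array (not the exercise matrix, and the name `demo` is just for illustration): `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

demo = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0]])

col_means = np.mean(demo, axis=0)  # one mean per column -> shape (3,)
row_means = np.mean(demo, axis=1)  # one mean per row    -> shape (2,)

print(col_means)  # [2. 3. 4.]
print(row_means)  # [2. 4.]
```

Because `col_means` has shape `(3,)`, NumPy can broadcast it against every row of a shape `(2, 3)` array in element-wise arithmetic, just like adding two arrays above.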

In [197]:
#TODO: define np.array with the above items

In [198]:
#TODO: get means of each column and subtract from each column in matrix

In [199]:
#TODO: print the output

In [200]:
# Dictionary (key-value pairs)
trash_pledge_leaderboard = {"top_pledge": "rahil", "bottom_pledge": "shivani"}
print(trash_pledge_leaderboard)  # type(trash_pledge_leaderboard) is <class 'dict'>

{'top_pledge': 'rahil', 'bottom_pledge': 'shivani'}


This doesn't look right. Let's use the lists we created to build a dictionary that maps each pledge to their correct points.

In [201]:
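pledge_to_points = zip(pledges, pledge_points)  # pairs the i-th pledge with the i-th points value
pledge_leaderboard = dict(pledge_to_points)  # turns the (pledge, points) pairs into key-value entries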

print(pledge_leaderboard)

{'gurnoor': 4, 'arjun': 4, 'sarah': -1, 'katie': 8, 'sadie': 16, 'aathma': 4, 'jay': 4}


Let's use some python syntax to return the top and bottom pledge. We'll start with a brief overview of for loops and if statements in python.

In [202]:
for pledge in pledges:
    print(pledge)

gurnoor
arjun
sarah
katie
sadie
aathma
jay


In [203]:
for i in range(len(pledges)):
    print(pledges[i])

gurnoor
arjun
sarah
katie
sadie
aathma
jay
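When you need both the position and the item, Python's built-in `enumerate()` is a common alternative to `range(len(...))`. A minimal sketch, using a shorter, hypothetical list called `names` so the notebook's `pledges` variable is left untouched:

```python
names = ["gurnoor", "arjun", "sarah"]

# enumerate yields (index, item) pairs, so there is no need to write names[i]
for i, p in enumerate(names):
    print(i, p)
```

This prints `0 gurnoor`, `1 arjun`, and `2 sarah`, one per line.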


In [204]:
avg_points = np.mean(pledge_points)  # compute the average once, outside the loop

for pledge, points in pledge_leaderboard.items():
    if points > avg_points:
        print(pledge, "is doing above average")

katie is doing above average
sadie is doing above average


In [205]:
least_points = float('inf')
most_points = float('-inf')

bottom_pledge = ""
top_pledge = ""

for pledge, points in pledge_leaderboard.items():
    if points < least_points:
        least_points = points
        bottom_pledge = pledge
        
    if points > most_points:
        most_points = points
        top_pledge = pledge

In [206]:
print("top pledge is", top_pledge, "with", pledge_leaderboard[top_pledge], "points")
print("bottom pledge is", bottom_pledge, "with", pledge_leaderboard[bottom_pledge], "points")

top pledge is sadie with 16 points
bottom pledge is sarah with -1 points
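The loop above is a good way to see the logic step by step. Python's built-in `max()` and `min()` can do the same search in one line when given a `key` function; passing a dictionary's `.get` method makes them compare keys by their values. A sketch on a copy of the same data, stored under the illustrative name `leaderboard`:

```python
leaderboard = {"gurnoor": 4, "arjun": 4, "sarah": -1, "katie": 8,
               "sadie": 16, "aathma": 4, "jay": 4}

# key=leaderboard.get compares the points, but max/min return the name
top = max(leaderboard, key=leaderboard.get)
bottom = min(leaderboard, key=leaderboard.get)

print("top pledge is", top, "with", leaderboard[top], "points")
print("bottom pledge is", bottom, "with", leaderboard[bottom], "points")
```

On ties, `max()` and `min()` return the first matching key, which matches the loop's strict `<` and `>` comparisons.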


The last piece of syntax we'll go over is indexing and slicing

In [207]:
print(pledges)

['gurnoor', 'arjun', 'sarah', 'katie', 'sadie', 'aathma', 'jay']


In [208]:
# direct indexing
pledges[2]

'sarah'

In [209]:
pledges[-1]

'jay'

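You can also use slicing:

`list[start:end:step]`

- If you leave `start` blank, it defaults to `0` (the beginning of the list).
- If you leave `end` blank, the slice runs through the end of the list, including the last item. Note that `end` itself is exclusive: `pledges[1:6]` stops at index 5.
- If you leave `step` blank, it defaults to `1`.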

In [210]:
pledges[1:6:2]

['arjun', 'katie', 'aathma']
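A few more slices on a copy of the same list (named `roster` here so `pledges` stays untouched), showing the defaults and negative values in action:

```python
roster = ["gurnoor", "arjun", "sarah", "katie", "sadie", "aathma", "jay"]

print(roster[:3])    # ['gurnoor', 'arjun', 'sarah']  (start defaults to 0, end is exclusive)
print(roster[-2:])   # ['aathma', 'jay']  (a negative start counts from the end)
print(roster[::2])   # ['gurnoor', 'sarah', 'sadie', 'jay']  (every other item)
print(roster[::-1])  # ['jay', 'aathma', 'sadie', 'katie', 'sarah', 'arjun', 'gurnoor']  (a negative step reverses)
```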

#### 2.3: Using Pandas for Exploratory Data Analysis
We will practice on the California housing dataset (originally from scikit-learn). pandas lets us read this data in as a DataFrame:

In [211]:
df = pd.read_csv("train.csv")

`.head()` returns the first 5 rows of your DataFrame by default; pass a number, like `.head(2)` below, to get that many rows instead:

In [212]:
df.head(2)

|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14196 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 1.03 |
| 1 | 8267 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 3.821 |


A few operations on the dataframe you can use to extract information

In [None]:
# df.head()  # Show first 5 rows
# df.tail(3)  # Show last 3 rows
# df.shape  # Get number of rows and columns
# df.columns  # List column names
# df["MedInc"].value_counts()  # Count how many times each unique value appears in a column
df.describe()  # Summary statistics for numerical columns


|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 |
| mean | 10311.052931 | 3.880754 | 28.608285 | 5.435235 | 1.096685 | 1426.453004 | 3.096961 | 35.643149 | -119.58229 | 2.071947 |
| std | 5969.605185 | 1.904294 | 12.602499 | 2.387375 | 0.433215 | 1137.05638 | 11.578744 | 2.136665 | 2.005654 | 1.156226 |
| min | 1.0 | 0.4999 | 1.0 | 0.888889 | 0.333333 | 3.0 | 0.692308 | 32.55 | -124.35 | 0.14999 |
| 25% | 5149.75 | 2.5667 | 18.0 | 4.452055 | 1.006508 | 789.0 | 2.428799 | 33.93 | -121.81 | 1.198 |
| 50% | 10338.5 | 3.5458 | 29.0 | 5.235874 | 1.049286 | 1167.0 | 2.81724 | 34.26 | -118.51 | 1.7985 |
| 75% | 15476.25 | 4.773175 | 37.0 | 6.061037 | 1.100348 | 1726.0 | 3.28 | 37.72 | -118.01 | 2.65125 |
| max | 20639.0 | 15.0001 | 52.0 | 141.909091 | 25.636364 | 35682.0 | 1243.333333 | 41.95 | -114.31 | 5.00001 |


You can select columns by name using brackets. A notebook cell only displays the value of its last line, so the output below is the two-column selection:

In [214]:
df["MedInc"]  # Select a single column (returns a Series)
df[["MedInc", "HouseAge"]]  # Select multiple columns

|  | MedInc | HouseAge |
|---|---|---|
| 0 | 3.2596 | 33.0 |
| 1 | 3.8125 | 49.0 |
| 2 | 4.1563 | 4.0 |
| 3 | 1.9425 | 36.0 |
| 4 | 3.5542 | 43.0 |
| ... | ... | ... |
| 16507 | 6.3700 | 35.0 |
| 16508 | 3.0500 | 33.0 |
| 16509 | 2.9344 | 36.0 |
| 16510 | 5.7192 | 15.0 |


Two common ways to filter the rows of a DataFrame are `.query()` and boolean masks in bracket notation. (The first two cells below use different `MedInc` thresholds, 5.6431 and 5, so their results differ.)

In [215]:
df.query("MedInc > 5.6431 and HouseAge > 20")

|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 9389 | 7.9892 | 37.0 | 6.052758 | 0.954436 | 999.0 | 2.395683 | 37.91 | -122.53 | 5.00001 |
| 21 | 18056 | 8.5425 | 35.0 | 7.508403 | 1.018908 | 1325.0 | 2.783613 | 37.24 | -121.98 | 5.00001 |
| 40 | 5252 | 11.1978 | 32.0 | 7.270344 | 1.031646 | 2768.0 | 2.502712 | 34.10 | -118.47 | 5.00001 |
| 43 | 5549 | 6.0190 | 42.0 | 5.702454 | 1.033742 | 813.0 | 2.493865 | 33.97 | -118.38 | 2.94500 |
| 92 | 16394 | 6.6605 | 29.0 | 7.825397 | 1.038095 | 859.0 | 2.726984 | 38.03 | -121.25 | 2.20700 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16469 | 11016 | 6.2944 | 33.0 | 6.815725 | 1.051597 | 1229.0 | 3.019656 | 33.76 | -117.82 | 2.65600 |
| 16474 | 8792 | 11.9993 | 31.0 | 8.324090 | 0.996534 | 1490.0 | 2.582322 | 33.80 | -118.45 | 5.00001 |
| 16479 | 3556 | 8.5650 | 31.0 | 8.107438 | 1.004132 | 697.0 | 2.880165 | 34.25 | -118.56 | 5.00001 |
| 16481 | 17912 | 7.8543 | 35.0 | 5.986111 | 1.032407 | 701.0 | 3.245370 | 37.36 | -121.98 | 2.81900 |


In [216]:
df[(df["MedInc"] > 5) & (df["HouseAge"] > 20)]

|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 9389 | 7.9892 | 37.0 | 6.052758 | 0.954436 | 999.0 | 2.395683 | 37.91 | -122.53 | 5.00001 |
| 21 | 18056 | 8.5425 | 35.0 | 7.508403 | 1.018908 | 1325.0 | 2.783613 | 37.24 | -121.98 | 5.00001 |
| 25 | 413 | 5.4171 | 52.0 | 6.764706 | 1.075163 | 722.0 | 2.359477 | 37.89 | -122.28 | 2.92000 |
| 40 | 5252 | 11.1978 | 32.0 | 7.270344 | 1.031646 | 2768.0 | 2.502712 | 34.10 | -118.47 | 5.00001 |
| 43 | 5549 | 6.0190 | 42.0 | 5.702454 | 1.033742 | 813.0 | 2.493865 | 33.97 | -118.38 | 2.94500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16479 | 3556 | 8.5650 | 31.0 | 8.107438 | 1.004132 | 697.0 | 2.880165 | 34.25 | -118.56 | 5.00001 |
| 16481 | 17912 | 7.8543 | 35.0 | 5.986111 | 1.032407 | 701.0 | 3.245370 | 37.36 | -121.98 | 2.81900 |
| 16496 | 5311 | 5.3777 | 41.0 | 5.479401 | 1.000000 | 601.0 | 2.250936 | 34.06 | -118.43 | 5.00001 |
| 16501 | 16023 | 5.1238 | 52.0 | 6.167076 | 1.076167 | 1153.0 | 2.832924 | 37.73 | -122.45 | 3.35100 |


In [217]:
df[(df["MedInc"] > 5) | (df["HouseAge"] > 20)]

|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14196 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 1.030 |
| 1 | 8267 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 3.821 |
| 3 | 14265 | 1.9425 | 36.0 | 4.002817 | 1.033803 | 1418.0 | 3.994366 | 32.69 | -117.11 | 0.934 |
| 4 | 2271 | 3.5542 | 43.0 | 6.268421 | 1.134211 | 874.0 | 2.300000 | 36.78 | -119.80 | 0.965 |
| 5 | 17848 | 6.6227 | 20.0 | 6.282147 | 1.008739 | 2695.0 | 3.364544 | 37.42 | -121.86 | 2.648 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16507 | 11284 | 6.3700 | 35.0 | 6.129032 | 0.926267 | 658.0 | 3.032258 | 33.78 | -117.96 | 2.292 |
| 16508 | 11964 | 3.0500 | 33.0 | 6.868597 | 1.269488 | 1753.0 | 3.904232 | 34.02 | -117.43 | 0.978 |
| 16509 | 5390 | 2.9344 | 36.0 | 3.986717 | 1.079696 | 1756.0 | 3.332068 | 34.03 | -118.38 | 2.221 |
| 16510 | 860 | 5.7192 | 15.0 | 6.395349 | 1.067979 | 1777.0 | 3.178891 | 37.58 | -121.96 | 2.835 |
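To see that the two filtering styles express the same thing, here is a sketch on a small made-up DataFrame (`toy`, with invented values). Boolean masks need `&`, `|`, and `~` with parentheses around each condition, while `.query()` accepts `and`, `or`, and `not` inside the string.

```python
import pandas as pd

toy = pd.DataFrame({
    "MedInc":   [2.0, 6.0, 7.5, 4.0],
    "HouseAge": [30.0, 10.0, 40.0, 25.0],
})

# Boolean-mask versions: each comparison returns a True/False Series
mask_and = toy[(toy["MedInc"] > 5) & (toy["HouseAge"] > 20)]
mask_or = toy[(toy["MedInc"] > 5) | (toy["HouseAge"] > 20)]
mask_not = toy[~(toy["MedInc"] > 5)]

# .query() versions of the same filters
query_and = toy.query("MedInc > 5 and HouseAge > 20")
query_or = toy.query("MedInc > 5 or HouseAge > 20")
query_not = toy.query("not (MedInc > 5)")

print(mask_and.equals(query_and), mask_or.equals(query_or), mask_not.equals(query_not))
```

Both styles keep the original index labels of the matching rows, which is why `.equals()` can compare them directly.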


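We can also get specific rows and columns using `.iloc[]`, which selects by integer position, and `.loc[]`, which selects by index label and column name. As before, only the last line's result is displayed: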

In [218]:
df.iloc[0]  # Select first row (by position)
df.iloc[:3]  # Select first three rows

df.loc[0, "MedInc"]  # Select a specific value (row label 0, column "MedInc")
df.loc[:, "HouseAge"]  # Select all rows of the "HouseAge" column

0        33.0
1        49.0
2         4.0
3        36.0
4        43.0
         ... 
16507    35.0
16508    33.0
16509    36.0
16510    15.0
16511    52.0
Name: HouseAge, Length: 16512, dtype: float64
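The difference between the two matters once the index labels no longer match the row positions, for example after sorting. A minimal sketch on a made-up DataFrame (`toy` and `toy_sorted` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({"HouseAge": [33.0, 49.0, 4.0]})  # index labels 0, 1, 2
toy_sorted = toy.sort_values(by="HouseAge")          # index order is now 2, 0, 1

print(toy_sorted.iloc[0]["HouseAge"])  # 4.0  -> first row by position
print(toy_sorted.loc[0, "HouseAge"])   # 33.0 -> row whose index label is 0
```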

The next pandas operation we will cover is sorting, which you can do in ascending or descending order and across multiple columns.

Arguments:

- `by` decides which column(s) to sort by
- `ascending` sets the sort order (`True` for increasing, `False` for decreasing); pass a list to set it per column

In [219]:
df.sort_values(by="HouseAge")  # Sort by HouseAge (ascending)
df.sort_values(by="HouseAge", ascending=False)  # Sort by HouseAge (descending)
df.sort_values(by=["HouseAge", "MedInc"], ascending=[True, False])  # Sort by multiple columns

|  | Id | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE |
|---|---|---|---|---|---|---|---|---|---|---|
| 3811 | 18972 | 5.2636 | 1.0 | 7.694030 | 1.279851 | 872.0 | 3.253731 | 38.23 | -122.00 | 1.91300 |
| 6230 | 3130 | 4.8750 | 1.0 | 5.533333 | 1.000000 | 32.0 | 2.133333 | 35.08 | -117.95 | 1.41700 |
| 8120 | 19536 | 4.2500 | 1.0 | 20.125000 | 2.928571 | 402.0 | 3.589286 | 37.65 | -120.93 | 1.89200 |
| 1596 | 1566 | 15.0001 | 2.0 | 22.222222 | 2.222222 | 25.0 | 2.777778 | 37.74 | -121.96 | 3.50000 |
| 2497 | 13177 | 8.4411 | 2.0 | 10.296296 | 1.166667 | 179.0 | 3.314815 | 33.97 | -117.78 | 5.00001 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1392 | 16204 | 0.6960 | 52.0 | 5.333333 | 1.592593 | 272.0 | 5.037037 | 37.95 | -121.29 | 0.42500 |
| 12218 | 14348 | 0.5360 | 52.0 | 5.000000 | 1.000000 | 13.0 | 2.600000 | 32.75 | -117.19 | 1.62500 |
| 8410 | 6343 | 0.4999 | 52.0 | 3.875000 | 0.562500 | 44.0 | 2.750000 | 34.06 | -117.75 | 1.12500 |
| 10977 | 19523 | 0.4999 | 52.0 | 2.870968 | 0.854839 | 152.0 | 2.451613 | 37.65 | -121.01 | 0.82500 |


Now that you know how to view different parts of the DataFrame, you can create your own columns using the command below:

In [220]:
df["MedIncNorm"] = (df["MedInc"] - df["MedInc"].min()) / (df["MedInc"].max() - df["MedInc"].min())  # Adding new column normalizing between 0-1

If you realize adding that column is a dumb ass idea, you can `.drop()` the column

`axis` tells `.drop()` whether the labels you pass refer to rows (`axis=0`, the default) or to columns (`axis=1`). You can also skip `axis` and write `columns=[...]` directly.

`inplace` dictates whether you mutate the original DataFrame (`inplace=True`, which returns `None`) or get back a modified copy while the original stays unchanged (the default, `inplace=False`).
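A sketch of the `.drop()` options on a made-up two-column DataFrame (`toy`, `dropped`, and `dropped_too` are illustrative names), showing that only `inplace=True` changes the original:

```python
import pandas as pd

toy = pd.DataFrame({"MedInc": [2.0, 6.0], "MedIncNorm": [0.0, 1.0]})

# axis=1 means "MedIncNorm" is a column label; a new DataFrame is returned
dropped = toy.drop("MedIncNorm", axis=1)
print(list(dropped.columns))  # ['MedInc']
print(list(toy.columns))      # ['MedInc', 'MedIncNorm']  (original unchanged)

# Equivalent, more explicit spelling
dropped_too = toy.drop(columns=["MedIncNorm"])
print(dropped.equals(dropped_too))  # True

# inplace=True modifies toy itself and returns None
result = toy.drop(columns=["MedIncNorm"], inplace=True)
print(result, list(toy.columns))  # None ['MedInc']
```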

In [221]:
#TODO: Drop MedIncNorm column from dataframe

Finally, we'll explore aggregation & grouping across columns

In [222]:
df["MedInc"].mean()  # Mean of median income
df["MedInc"].min()  # Minimum median income (only this last line's value is displayed)

np.float64(0.4999)

We can also group rows by `HouseAge` and take the mean of each group. For example, for a `HouseAge` of 1.0, the mean `Population` is 435.333333:

In [223]:
df.groupby("HouseAge")["Population"].mean().head(5)  # Mean Population per HouseAge (first 5 groups)

HouseAge
1.0     435.333333
2.0    1812.347826
3.0    3086.000000
4.0    2661.246667
5.0    2481.034314
Name: Population, dtype: float64
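`.groupby()` splits the rows into groups by a column's values, applies an aggregation to each group, and combines the results. A sketch on a made-up DataFrame (the values are invented), including `.agg()` to compute several statistics per group at once:

```python
import pandas as pd

toy = pd.DataFrame({
    "HouseAge":   [1.0, 1.0, 2.0, 2.0, 2.0],
    "Population": [300.0, 500.0, 100.0, 200.0, 600.0],
})

# Mean Population for each distinct HouseAge: 1.0 -> 400.0, 2.0 -> 300.0
mean_pop = toy.groupby("HouseAge")["Population"].mean()
print(mean_pop)

# Several aggregations per group, one column per statistic
summary = toy.groupby("HouseAge")["Population"].agg(["mean", "min", "max", "count"])
print(summary)
```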

Readability of your code and DataFrame is very important. Some of these column names are not super readable, so let's `.rename()` them.

`columns`: a dictionary of key-value pairs that maps each old column name to its new name

`inplace`: same as above

In [224]:
df.rename(columns={"MedInc": "MedianIncome"}, inplace=True)
df

|  | Id | MedianIncome | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE | MedIncNorm |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14196 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 1.030 | 0.190322 |
| 1 | 8267 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 3.821 | 0.228452 |
| 2 | 17445 | 4.1563 | 4.0 | 5.645833 | 0.985119 | 915.0 | 2.723214 | 34.66 | -120.48 | 1.726 | 0.252162 |
| 3 | 14265 | 1.9425 | 36.0 | 4.002817 | 1.033803 | 1418.0 | 3.994366 | 32.69 | -117.11 | 0.934 | 0.099488 |
| 4 | 2271 | 3.5542 | 43.0 | 6.268421 | 1.134211 | 874.0 | 2.300000 | 36.78 | -119.80 | 0.965 | 0.210638 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16507 | 11284 | 6.3700 | 35.0 | 6.129032 | 0.926267 | 658.0 | 3.032258 | 33.78 | -117.96 | 2.292 | 0.404829 |
| 16508 | 11964 | 3.0500 | 33.0 | 6.868597 | 1.269488 | 1753.0 | 3.904232 | 34.02 | -117.43 | 0.978 | 0.175867 |
| 16509 | 5390 | 2.9344 | 36.0 | 3.986717 | 1.079696 | 1756.0 | 3.332068 | 34.03 | -118.38 | 2.221 | 0.167894 |
| 16510 | 860 | 5.7192 | 15.0 | 6.395349 | 1.067979 | 1777.0 | 3.178891 | 37.58 | -121.96 | 2.835 | 0.359947 |


We can also improve readability by documenting code with comments. Highlight any block of code and press `Cmd + /` (`Ctrl + /` on Windows and Linux) to comment all of it out. Without highlighting anything, `Cmd + /` comments out the line you're on. In general, adding `#` at the start of a line turns it into a comment.

Use any of the above commenting methods to fix the error in the cell below:

In [225]:
# below I am printing voyager seniors

print("shivani")
print("christine")
print("eric")
print("nolan")
print("vaarun")

shivani
christine
eric
nolan
vaarun


#### 2.4: Visualization
Presenting your findings and visualizing trends is an essential data science skill. We will use `matplotlib.pyplot`, which is an industry standard for data visualization because it gives you a lot of control over your plots.

In [236]:
df

|  | Id | MedianIncome | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | PRICE | MedIncNorm |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14196 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 1.030 | 0.190322 |
| 1 | 8267 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 3.821 | 0.228452 |
| 2 | 17445 | 4.1563 | 4.0 | 5.645833 | 0.985119 | 915.0 | 2.723214 | 34.66 | -120.48 | 1.726 | 0.252162 |
| 3 | 14265 | 1.9425 | 36.0 | 4.002817 | 1.033803 | 1418.0 | 3.994366 | 32.69 | -117.11 | 0.934 | 0.099488 |
| 4 | 2271 | 3.5542 | 43.0 | 6.268421 | 1.134211 | 874.0 | 2.300000 | 36.78 | -119.80 | 0.965 | 0.210638 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16507 | 11284 | 6.3700 | 35.0 | 6.129032 | 0.926267 | 658.0 | 3.032258 | 33.78 | -117.96 | 2.292 | 0.404829 |
| 16508 | 11964 | 3.0500 | 33.0 | 6.868597 | 1.269488 | 1753.0 | 3.904232 | 34.02 | -117.43 | 0.978 | 0.175867 |
| 16509 | 5390 | 2.9344 | 36.0 | 3.986717 | 1.079696 | 1756.0 | 3.332068 | 34.03 | -118.38 | 2.221 | 0.167894 |
| 16510 | 860 | 5.7192 | 15.0 | 6.395349 | 1.067979 | 1777.0 | 3.178891 | 37.58 | -121.96 | 2.835 | 0.359947 |


In [241]:
# Honestly, I think I'm better suited to doing this for you guys so you can see how plotting works
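Since the plotting cell above is left empty, here is a minimal sketch of a typical matplotlib workflow: create a figure and axes, draw, label, and show. It uses randomly generated stand-in data (`income` and `price` are hypothetical names) rather than the housing DataFrame so it runs on its own; with the real data you would pass columns such as `df["MedianIncome"]` and `df["PRICE"]` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
income = rng.uniform(0.5, 15.0, size=200)            # stand-in for median income
price = 0.4 * income + rng.normal(0, 0.5, size=200)  # stand-in for house price

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(income, price, s=10, alpha=0.6)  # one dot per (income, price) pair
ax.set_xlabel("Median income")
ax.set_ylabel("Price")
ax.set_title("Price vs. median income (synthetic data)")
fig.tight_layout()
plt.show()
```

Working with the `fig, ax` objects returned by `plt.subplots()` (rather than only calling `plt.*` functions) makes it easier to add more panels later.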

### Section 3: Basics of Linear Modeling
#### 3.1: Building a Linear Model

Your objective is to predict a value from other features by fitting a linear relationship between those features and the value you want to predict. I've provided you with a training set (`train.csv`) and a testing set (`test.csv`): you fit the model on the training set, then use it to generate predictions for the test set, which are scored in the Kaggle competition below.

In [226]:
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

In [227]:
X_train = train_df.drop(columns=["PRICE", "Id"])  # Drop the target column and the Id (an identifier, not a feature)
y_train = train_df["PRICE"]  # Target variable

In [228]:
X_test = test_df  # Test features; still includes "Id", which the helper below drops before predicting

In [229]:
model = LinearRegression()
model.fit(X_train, y_train)

y_train_pred = model.predict(X_train)

#### 3.2 Model Evaluation
For our purposes, we'll evaluate models on R-squared ($R^2$), whose formula is given below. It measures the proportion of the variance in the target that your model explains. For a least-squares linear model evaluated on its own training data, it lies between 0 and 1 (and equals the squared correlation between predicted and actual values); on new data it can even be negative, which means the model does worse than always predicting the mean. Closer to 1 is generally better, although a perfect or near-perfect score on the training set usually means your model is overfitting.

$$ 
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$
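To connect the formula to code, here is a sketch that computes $R^2$ directly with NumPy on a few made-up values (`r_squared` is an illustrative helper, not a library function). The numerator is the sum of squared residuals and the denominator is the total sum of squares around the mean $\bar{y}$; `r2_score` from scikit-learn should give the same number for the same inputs.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, following the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # sum of (y_i - y_hat_i)^2
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # sum of (y_i - y_bar)^2
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(y_true, y_pred), 4))  # 0.9486
```

Predicting the mean $\bar{y}$ for every row makes the numerator equal the denominator, which gives exactly 0.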

In [230]:
print(f"R-squared: {r2_score(y_train, y_train_pred):.4f}")

R-squared: 0.6126


We can see that the linear regression model only achieves an R-squared of about 0.61 on the training data, so it's pretty mid and there's plenty of room to improve.

To score your model on the test set, use the helper function below. It writes your test-set predictions to a CSV file that you can submit to Kaggle:

In [231]:
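def generate_submission_file(model, test_features, name, submission_number):
    """Write test-set predictions to '<name>_<submission_number>.csv' for Kaggle.

    Relies on the global X_train to select the same feature columns, in the same order, used in training.
    """
    ids = test_features["Id"].astype(str)  # keep the Ids to label each prediction
    test_features = test_features.drop(columns=["Id"])  # drop 'Id' for prediction

    X_test = test_features[X_train.columns]  # same columns and order as training
    y_pred = model.predict(X_test)

    submission = pd.DataFrame({
        "Id": ids,
        "Predicted": y_pred,
    })

    filename = f"{name}_{submission_number}.csv"
    submission.to_csv(filename, index=False)  # index=False: don't write the row index as an extra column

    return filename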


In [232]:
test_features = pd.read_csv("test.csv")
generate_submission_file(model=model, test_features=test_features, name="Rahil", submission_number=1) # update your submission number


'Rahil_1.csv'

### Submissions & Deliverables

Once you run the cell above, your submission CSV should appear in your file directory. To see how your results compare with the rest of the PDM Kaggle competition:

1. Go to the below link:

https://www.kaggle.com/competitions/pdm-linear-modeling-comp/code

2. Click 'Submit Predictions'

3. Then click 'Upload Submission' and select your CSV file

4. Click 'Submit'

5. Check the 'Leaderboard' and see how your results compare!

6. Submit as many times as you want. You should be able to achieve an R-squared above 0.90 on this dataset (try random forest regression instead of linear regression, hyperparameter tuning, and feature engineering)
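As a starting point for step 6, here is a hedged sketch comparing `LinearRegression` with `RandomForestRegressor` (both already imported at the top of the notebook) on synthetic data with a nonlinear relationship. The data, the hyperparameter values, and the names `X_demo`, `lin`, and `rf` are made up for illustration; to use this on the competition, fit on `X_train` and `y_train` instead and pass the fitted model to `generate_submission_file`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends nonlinearly on two features (no noise)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(1000, 2))
y_demo = np.sin(2 * np.pi * X_demo[:, 0]) + X_demo[:, 1] ** 2

# Hold out 20% of the rows to score each model on data it hasn't seen
X_fit, X_val, y_fit, y_val = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

lin = LinearRegression().fit(X_fit, y_fit)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_fit, y_fit)

lin_r2 = r2_score(y_val, lin.predict(X_val))
rf_r2 = r2_score(y_val, rf.predict(X_val))
print(f"Linear regression validation R^2: {lin_r2:.3f}")
print(f"Random forest validation R^2:     {rf_r2:.3f}")
```

Scoring on a held-out split, rather than on the training data as in 3.2, gives a more honest estimate of how a model will do on the Kaggle test set.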

In [233]:
#TODO: Build a linear model and submit your csv on kaggle 