# Linear Regression for NBA Player Weight

This notebook adapts a linear regression model to predict the **Weight** of NBA players from their **Height** and **Age** using the `nba.csv` dataset. We also investigate whether including the player's **Position** improves the model's performance.

In [None]:
import pandas as pd
import numpy as np
import sklearn.linear_model as lm

We read the data from `nba.csv`.

In [None]:
nba = pd.read_csv('nba.csv')
nba.head()

### Model 1: Height and Age
We first build a model using only numerical attributes: `Height` and `Age`.
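Written out, Model 1 is an ordinary least-squares fit with an intercept (`LinearRegression` fits one by default). The symbols $\beta_0, \beta_1, \beta_2$ below are notation introduced here for the fitted intercept and coefficients:

$$
\widehat{\text{Weight}}_i = \beta_0 + \beta_1\,\text{Height}_i + \beta_2\,\text{Age}_i,
\qquad
(\hat\beta_0, \hat\beta_1, \hat\beta_2) = \arg\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} \bigl(\text{Weight}_i - \beta_0 - \beta_1\,\text{Height}_i - \beta_2\,\text{Age}_i\bigr)^2
$$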

In [None]:
X = np.array(nba[['Height', 'Age']])
Y = np.array(nba['Weight'])

M1 = lm.LinearRegression()
M1.fit(X, Y)

R2_simple = M1.score(X, Y)
print(f"R^2 score (Height, Age): {R2_simple:.4f}")
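`LinearRegression.score` returns the coefficient of determination, $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. The sketch below recomputes it by hand to show what the number means. It assumes made-up synthetic values in place of `nba.csv`, so the numbers it prints say nothing about real players.

```python
import numpy as np
import sklearn.linear_model as lm

# Hypothetical synthetic data standing in for the Height/Age/Weight columns;
# these values are invented for illustration, not taken from nba.csv.
rng = np.random.default_rng(0)
n = 50
height = rng.normal(79, 3, n)
age = rng.uniform(20, 35, n)
weight = 3.0 * height + 0.5 * age - 20 + rng.normal(0, 8, n)

X = np.column_stack([height, age])
model = lm.LinearRegression().fit(X, weight)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
pred = model.predict(X)
ss_res = np.sum((weight - pred) ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"manual R^2: {r2_manual:.4f}")
print(f"score():    {model.score(X, weight):.4f}")
```

The two printed values agree, which confirms that `score` measures the fraction of the variance in weight explained by the fitted model on the data it was trained on.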

### Model 2: Height, Age, and Position
Now we investigate whether the score improves if we take the player's **Position** into account. Since position is a categorical variable, we use one-hot encoding.
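To see what one-hot encoding produces, here is a sketch on a tiny hypothetical frame (the labels `C`, `F`, `G` are assumptions; the real `Pos` values in `nba.csv` may differ). With `drop_first=True`, `pd.get_dummies` drops the first category in sorted order, so the dropped position becomes the baseline absorbed by the intercept. Keeping every indicator would make the columns sum to 1 in every row, which is perfectly collinear with the intercept.

```python
import pandas as pd

# Hypothetical toy data; not rows from nba.csv.
toy = pd.DataFrame({"Height": [83, 78, 75, 82], "Pos": ["C", "F", "G", "F"]})

# 'C' sorts first and is dropped; a 'C' row is all zeros in the Pos_ columns.
encoded = pd.get_dummies(toy, columns=["Pos"], drop_first=True)
print(encoded)
```

The non-encoded columns come first, followed by one indicator column per remaining position (`Pos_F`, `Pos_G`).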

In [None]:
# One-hot encode the 'Pos' column
nba_encoded = pd.get_dummies(nba, columns=['Pos'], drop_first=True)
nba_encoded

In [None]:
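# Select only the model features: Height, Age, and the encoded position columns.
# Explicit selection avoids accidentally including any other columns in nba.csv.
pos_cols = [col for col in nba_encoded.columns if col.startswith('Pos_')]
X_plus_pos = nba_encoded[['Height', 'Age'] + pos_cols]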

M2 = lm.LinearRegression()
M2.fit(X_plus_pos, Y)

R2_pos = M2.score(X_plus_pos, Y)
print(f"R^2 score (Height, Age, Position): {R2_pos:.4f}")
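Each `Pos_` coefficient in Model 2 is the estimated weight difference between that position and the dropped baseline position, holding Height and Age fixed. The sketch below illustrates this on synthetic data where the position offsets are known in advance (baseline `C`, with `F` 10 units lighter and `G` 20 units lighter); all values and labels are assumptions made up for the illustration.

```python
import numpy as np
import pandas as pd
import sklearn.linear_model as lm

# Hypothetical synthetic players; not the nba.csv data.
rng = np.random.default_rng(1)
n = 300
pos = rng.choice(["C", "F", "G"], size=n)
height = rng.normal(79, 3, n)
age = rng.uniform(20, 35, n)

# Built-in position offsets relative to the baseline 'C'.
offset = np.array([{"C": 0.0, "F": -10.0, "G": -20.0}[p] for p in pos])
weight = 3.0 * height + 0.5 * age - 20 + offset + rng.normal(0, 5, n)

df = pd.DataFrame({"Height": height, "Age": age, "Pos": pos, "Weight": weight})
encoded = pd.get_dummies(df, columns=["Pos"], drop_first=True, dtype=float)

features = ["Height", "Age"] + [c for c in encoded.columns if c.startswith("Pos_")]
model = lm.LinearRegression().fit(encoded[features], encoded["Weight"])

# Label each coefficient with its feature name for easier reading.
coefs = pd.Series(model.coef_, index=features)
print(coefs.round(2))
```

The fitted `Pos_F` and `Pos_G` coefficients land close to the built-in offsets. On the real data, `pd.Series(M2.coef_, index=X_plus_pos.columns)` gives the same kind of labeled view.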

### Conclusion
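By including the player's position in the regression model, the $R^2$ score improved from approximately **0.685** to **0.711**: Height and Age alone explain about 68.5% of the variance in weight, and adding position raises this to about 71.1%. This suggests that the physical requirements of different positions (Center, Forward, Guard) account for some additional variation in player weight. Both scores are computed on the training data, however, and in-sample $R^2$ cannot decrease when features are added to a least-squares model, so the gain should be confirmed on held-out data before drawing firm conclusions.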
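One way to check whether a gain like this holds up is to compare cross-validated $R^2$ instead of training $R^2$. The sketch below uses synthetic data in which position deliberately has *no* effect on weight (an assumption chosen to make the point), and compares both feature sets with 5-fold cross-validation via scikit-learn's `cross_val_score`. It is one reasonable check, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical synthetic data; substitute the real nba / nba_encoded frames.
rng = np.random.default_rng(2)
n = 200
pos = rng.choice(["C", "F", "G"], size=n)
height = rng.normal(79, 3, n)
age = rng.uniform(20, 35, n)
weight = 3.0 * height + 0.5 * age - 20 + rng.normal(0, 8, n)  # position has no real effect

df = pd.DataFrame({"Height": height, "Age": age, "Pos": pos, "Weight": weight})
encoded = pd.get_dummies(df, columns=["Pos"], drop_first=True, dtype=float)
pos_cols = [c for c in encoded.columns if c.startswith("Pos_")]

feature_sets = {
    "Height+Age": ["Height", "Age"],
    "Height+Age+Pos": ["Height", "Age"] + pos_cols,
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, cols in feature_sets.items():
    X, y = encoded[cols], encoded["Weight"]
    # Training R^2: fit and score on the same rows.
    train_r2 = LinearRegression().fit(X, y).score(X, y)
    # Cross-validated R^2: mean score on the held-out fold of each split.
    cv_r2 = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2").mean()
    results[name] = (train_r2, cv_r2)
    print(f"{name:16s} train R^2 = {train_r2:.4f}   5-fold CV R^2 = {cv_r2:.4f}")
```

Training $R^2$ rises with the extra columns even though they carry no signal here; the cross-validated score is the fairer basis for deciding whether position genuinely helps.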