
# Predicting Win Ratings in the NBA

Your first regression task of this week is to **predict** the Win Rating of a player based on certain characteristics!

Why is this a regression task? Because the win rating is a continuous variable: it can take any value on a numeric scale rather than one of a fixed set of categories!

## Importing the data

The data is available at this link: [https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/NBA.csv](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/NBA.csv). Like in the previous challenge, you can either click on the link to download it or put the URL in the `pd.read_csv()` function.

In [None]:
import pandas as pd
df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/NBA.csv")

What does the data look like? Do you see the **features and target**?

In [None]:
df

| | season | poss | mp | do_ratio | pacing | win_rating |
|---|---|---|---|---|---|---|
| 0 | 1987 | 4847 | 2409 | -1.546275 | -0.599918 | 1.643091 |
| 1 | 2017 | 5582 | 2708 | 0.996587 | 0.531888 | 10.808427 |
| 2 | 2016 | 4976 | 2407 | 0.242598 | 0.127976 | 3.054773 |
| 3 | 2021 | 1178 | 585 | -1.343806 | -1.201034 | 0.230004 |
| 4 | 1988 | 4534 | 2056 | -1.813990 | -0.128997 | 0.905614 |
| ... | ... | ... | ... | ... | ... | ... |
| 3995 | 1986 | 1224 | 565 | -1.761425 | -0.011698 | 0.082828 |
| 3996 | 1994 | 3564 | 1785 | 0.907199 | -0.125856 | 3.542416 |
| 3997 | 1998 | 706 | 378 | 0.848178 | 0.694183 | 0.787185 |
| 3998 | 2003 | 289 | 156 | -6.298906 | 0.205844 | -0.312601 |


## Defining X and y

First step: defining the features (X) and the target (y).

Let's start off with a simple example - let X be **the number of minutes** a player has played.

You can figure out what y should be from the task we are trying to achieve!

In [None]:
X = df[["mp"]]
y = df["win_rating"]

## Scaling the feature

Just like in the previous exercise, we need to scale our numerical feature.

Import and instantiate a scaler of your choice from the **Sklearn** library. Then, transform X and save the result to a variable called `X_scaled`.

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
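
To make the scaling step less of a black box, here is a minimal sketch of what `StandardScaler` computes, checked against a manual calculation. The `minutes` array is a small illustrative sample, not the full dataset, and the variable names are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative sample of minutes-played values (not the full dataset)
minutes = np.array([[585.0], [2056.0], [2407.0], [2708.0]])

scaled_by_sklearn = StandardScaler().fit_transform(minutes)

# The same transformation by hand: subtract the column mean,
# then divide by the column (population) standard deviation
scaled_by_hand = (minutes - minutes.mean(axis=0)) / minutes.std(axis=0)

print(np.allclose(scaled_by_sklearn, scaled_by_hand))
print(scaled_by_sklearn.mean(axis=0), scaled_by_sklearn.std(axis=0))
```

After scaling, the column has a mean of (approximately) 0 and a standard deviation of 1.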

## Train test split

As always, we need to split the data into train and test!

Use Sklearn's `train_test_split` to accomplish this.

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)
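
Because `train_test_split` shuffles the rows randomly, the split (and therefore the score you get later) can change from run to run. A hedged sketch of how the optional `random_state` parameter makes the split reproducible, using tiny synthetic arrays (`X_demo`, `y_demo`) in place of the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X_scaled and y (10 rows, 1 feature)
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10)

# Fixing random_state makes the shuffle reproducible
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(X_train_a.shape, X_test_a.shape)  # 8 training rows, 2 test rows

# Both calls produce identical train and test sets
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```

The value 42 is arbitrary; any fixed integer gives a repeatable split.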

## Modelling!

Now the exciting part: the Linear Regression.

`LinearRegression` is the most fundamental regression model in Machine Learning! For now, you don't need to understand the specifics of how it works; just know that it fits a straight-line (linear) relationship between the features and the target.

Import, instantiate and fit a `LinearRegression` model on the training data!

In [None]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
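
If you are curious what fitting actually produces, a fitted `LinearRegression` stores one coefficient per feature in `coef_` and a constant term in `intercept_`. The sketch below uses made-up, noise-free data that follows `y = 3x + 2` exactly (an assumption for illustration, unrelated to the NBA data), so the model should recover those two numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that lies exactly on the line y = 3x + 2
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 3.0 * X_demo[:, 0] + 2.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# The learned slope and intercept
print(demo_model.coef_, demo_model.intercept_)

# A prediction is just coef_ * x + intercept_
prediction = demo_model.predict(np.array([[4.0]]))
print(prediction)
```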

## Scoring

Now we can score our model. Remember: we do this on the test data, which the model has not seen during training, so the score tells us how well it generalises!

In [None]:
model.score(X_test, y_test)

0.5741338169866856

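How can we interpret the score? For a `LinearRegression`, `score` returns the R² (coefficient of determination). For now, bigger is better: 1 means perfect predictions, and the score can even drop below 0 for a model that does worse than always predicting the mean. Your exact number may differ slightly from the one above, because the train/test split is random. We'll understand more in depth how to make sense of this when we study Linear Regressions in detail.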
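
A minimal sketch of how this score can be reproduced by hand, on synthetic data rather than the NBA dataset: R² compares the model's squared errors with the squared errors of always predicting the mean of the target. All names ending in `_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear signal plus some noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 1))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_pred = demo_model.predict(X_demo)

# Residual sum of squares: errors made by the model
ss_res = np.sum((y_demo - y_pred) ** 2)
# Total sum of squares: errors made by always predicting the mean
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)

r2_by_hand = 1 - ss_res / ss_tot
print(r2_by_hand, demo_model.score(X_demo, y_demo))
```

The two printed values should agree up to floating-point error.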

## A more complete feature set

It's time to try the Linear Regression on all of the relevant features.

Create new `X` and `y` variables, this time with all of the features.

In [None]:
X = df[["poss", "mp", "do_ratio", "pacing"]]
y = df["win_rating"]

## Scaling

Now it's time to scale the data, just like we did before!

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

## Train test split

You know the drill by now! Split `X_scaled` and `y`!

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)

## A new model

Because we have a new set of features, we need to train a new model!
Do what you did last time, but give the model a different name now!

In [None]:
from sklearn.linear_model import LinearRegression

model2 = LinearRegression()
model2.fit(X_train, y_train)

Score the model. Is it any better?

In [None]:
model2.score(X_test, y_test)

0.6517517042274967

## A new player arrives!

These are their stats:

- poss: 4902
- mp: 1845
- do_ratio: 1.011
- pacing: 0.381

Create a single row dataframe for this new player!

In [None]:
new_player = pd.DataFrame({"poss":[4902], "mp":[1845], "do_ratio":[1.011], "pacing":[0.381]})
new_player

| | poss | mp | do_ratio | pacing |
|---|---|---|---|---|
| 0 | 4902 | 1845 | 1.011 | 0.381 |


Predict the win rating for this player!

In [None]:
model2.predict(new_player)



array([1064.89497765])

## What is going on?

That's an astronomical win rating! Did we forget to do something?

We did... can you figure out what we forgot to do and fix the issue? Then, you can try the prediction again!

In [None]:
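# The model was trained on scaled features, so the new player must be scaled too.
# Use transform (not fit_transform) to reuse the mean and std the scaler already learned.
new_player_scaled = scaler.transform(new_player)
model2.predict(new_player_scaled)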

array([3.33144242])
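
One way to avoid forgetting the scaling step is to bundle the scaler and the model together with scikit-learn's `make_pipeline`, so that `predict` applies the scaling automatically. This is an optional alternative, not the approach used above. The sketch runs on synthetic stand-in data: `demo_df`, `demo_target` and the formula behind the target are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the values and the target formula are hypothetical
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "mp": rng.uniform(100, 3000, size=200),
    "pacing": rng.normal(size=200),
})
demo_target = 0.002 * demo_df["mp"] + 0.5 * demo_df["pacing"]

# The pipeline scales the features, then fits the regression
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(demo_df, demo_target)

# Raw (unscaled) stats go straight in; the pipeline scales them internally
new_row = pd.DataFrame({"mp": [1845], "pacing": [0.381]})
pipeline_prediction = pipeline.predict(new_row)
print(pipeline_prediction)
```

Because the scaler lives inside the pipeline, there is no separate `scaler.transform` call to remember at prediction time.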