# Livestream about Machine Learning

[A recording is available on the Devspresso channel](https://www.youtube.com/watch?v=0t4toLvwQUU).

## What is in the repository?

- a basic Jupyter notebook
- the required packages, already installed

## Challenge

Using [data about students](https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success), we predict their "success" in an academic career.

In [1]:
import pandas as pd
import plotly.graph_objects as go
from sklearn.linear_model import LogisticRegression

In [2]:
# Load the semicolon-separated CSV and keep only the columns used in this notebook
data = pd.read_csv('./data.csv', sep=";")[["Marital status", "Admission grade", "International", "Target"]]
data

,Marital status,Admission grade,International,Target
0,1,127.3,0,Dropout
1,1,142.5,0,Graduate
2,1,124.8,0,Dropout
3,1,119.6,0,Graduate
4,2,141.5,0,Graduate
...,...,...,...,...
4419,1,122.2,0,Graduate
4420,1,119.0,1,Dropout
4421,1,149.5,0,Dropout
4422,1,153.8,0,Graduate


In [3]:
data["Marital status"].value_counts()

Marital status
1    3919
2     379
4      91
5      25
6       6
3       4
Name: count, dtype: int64

In [4]:
go.Figure(data=[go.Histogram(x=data["Admission grade"])])

In [5]:
data.groupby("Target").value_counts(["International"])

Target    International
Dropout   0                1389
          1                  32
Enrolled  0                 770
          1                  24
Graduate  0                2155
          1                  54
Name: count, dtype: int64

In [6]:
# get_dummies inserts its own "_" separator, so prefix="marital_" yields names like "marital__1"
pd.get_dummies(data, prefix="marital_", columns=["Marital status"])

,Admission grade,International,Target,marital__1,marital__2,marital__3,marital__4,marital__5,marital__6
0,127.3,0,Dropout,True,False,False,False,False,False
1,142.5,0,Graduate,True,False,False,False,False,False
2,124.8,0,Dropout,True,False,False,False,False,False
3,119.6,0,Graduate,True,False,False,False,False,False
4,141.5,0,Graduate,False,True,False,False,False,False
...,...,...,...,...,...,...,...,...,...
4419,122.2,0,Graduate,True,False,False,False,False,False
4420,119.0,1,Dropout,True,False,False,False,False,False
4421,149.5,0,Dropout,True,False,False,False,False,False
4422,153.8,0,Graduate,True,False,False,False,False,False


In [7]:
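def preprocess_data(data: pd.DataFrame) -> pd.DataFrame:
    # Drop students who are still enrolled: only Dropout and Graduate rows remain
    no_enrolled = data[data["Target"] != "Enrolled"]
    # One-hot encode the marital status codes into Marital_<code> columns
    return pd.get_dummies(no_enrolled, prefix="Marital", columns=["Marital status"])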

used_data = preprocess_data(data)
used_data

,Admission grade,International,Target,Marital_1,Marital_2,Marital_3,Marital_4,Marital_5,Marital_6
0,127.3,0,Dropout,True,False,False,False,False,False
1,142.5,0,Graduate,True,False,False,False,False,False
2,124.8,0,Dropout,True,False,False,False,False,False
3,119.6,0,Graduate,True,False,False,False,False,False
4,141.5,0,Graduate,False,True,False,False,False,False
...,...,...,...,...,...,...,...,...,...
4419,122.2,0,Graduate,True,False,False,False,False,False
4420,119.0,1,Dropout,True,False,False,False,False,False
4421,149.5,0,Dropout,True,False,False,False,False,False
4422,153.8,0,Graduate,True,False,False,False,False,False
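
The two steps inside `preprocess_data` can be tried on a small frame first. This is only a sketch: the `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: illustrative values, not rows from data.csv.
toy = pd.DataFrame(
    {
        "Marital status": [1, 2, 1, 4],
        "Target": ["Dropout", "Graduate", "Enrolled", "Graduate"],
    }
)

# Step 1: drop the rows of students who are still enrolled.
filtered = toy[toy["Target"] != "Enrolled"]

# Step 2: expand the category codes into one indicator column per code.
encoded = pd.get_dummies(filtered, prefix="Marital", columns=["Marital status"])

print(encoded.columns.tolist())
print(len(encoded))
```

Note that `get_dummies` only creates columns for codes that actually occur in the frame it receives, so a code that appears only among the dropped rows would get no column.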


In [8]:
# Every column except the label is a model input
inputs = used_data.drop(columns="Target")
inputs

,Admission grade,International,Marital_1,Marital_2,Marital_3,Marital_4,Marital_5,Marital_6
0,127.3,0,True,False,False,False,False,False
1,142.5,0,True,False,False,False,False,False
2,124.8,0,True,False,False,False,False,False
3,119.6,0,True,False,False,False,False,False
4,141.5,0,False,True,False,False,False,False
...,...,...,...,...,...,...,...,...
4419,122.2,0,True,False,False,False,False,False
4420,119.0,1,True,False,False,False,False,False
4421,149.5,0,True,False,False,False,False,False
4422,153.8,0,True,False,False,False,False,False


In [9]:
# Binary label: True for Graduate, False for Dropout (Enrolled rows were removed earlier)
target_outputs = used_data["Target"] == "Graduate"
target_outputs

0       False
1        True
2       False
3        True
4        True
        ...  
4419     True
4420    False
4421    False
4422     True
4423     True
Name: Target, Length: 3630, dtype: bool
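
The reported length of 3630 can be cross-checked against the per-class counts printed by cell `In [5]`. The short sketch below only repeats that arithmetic; the numbers are copied from the output above and the variable names are illustrative.

```python
# Counts per Target class, summed over both International values from the In [5] output.
counts = {
    "Dropout": 1389 + 32,
    "Enrolled": 770 + 24,
    "Graduate": 2155 + 54,
}

total_rows = sum(counts.values())                    # all students in the file
kept_rows = counts["Dropout"] + counts["Graduate"]   # rows left after dropping Enrolled
graduate_share = counts["Graduate"] / kept_rows      # fraction of True labels

print(total_rows, kept_rows, round(graduate_share, 3))
```

The remaining rows are roughly 61% graduates, so a model that always predicts "Graduate" would already be right about that often. That figure is a useful baseline when judging the regression.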

In [10]:
# Fit on all remaining rows; coef_ holds one weight per input column, in column order
regression = LogisticRegression().fit(inputs, target_outputs)
regression.coef_

array([[ 0.0187001 , -0.0061077 ,  0.26695225, -0.48487445, -0.09769212,
        -0.44853605, -0.47416498, -0.46062551]])
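
One common way to read logistic regression coefficients is as odds ratios: exponentiating a coefficient gives the factor by which the odds of graduating change when that input increases by one unit, with the other inputs held fixed. The sketch below applies this to the values printed above, assuming they line up with the columns of `inputs` in order. The inputs were not scaled, so the magnitudes are not directly comparable between "Admission grade" (one point on the grade scale) and the 0/1 indicator columns.

```python
import math

# Column names from the inputs frame and coefficients copied from the output above.
feature_names = [
    "Admission grade", "International",
    "Marital_1", "Marital_2", "Marital_3",
    "Marital_4", "Marital_5", "Marital_6",
]
coefficients = [
    0.0187001, -0.0061077, 0.26695225, -0.48487445,
    -0.09769212, -0.44853605, -0.47416498, -0.46062551,
]

# exp(coefficient) is the multiplicative change in the odds per unit increase.
odds_ratios = {name: math.exp(c) for name, c in zip(feature_names, coefficients)}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.3f}")
```

A ratio above 1 means the input pushes the prediction towards "Graduate", a ratio below 1 towards "Dropout".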

In [11]:
go.Figure(
    data=[
        go.Bar(
            x=inputs.columns,
            y=regression.coef_[0],
        )
    ]
)