
# Logistic Regression
Build a machine learning pipeline using a logistic regression model. In particular, do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). 
- Train and test a logistic regression model using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

## Import libraries

In [None]:
import pandas as pd 
import sklearn.model_selection
import sklearn.linear_model
import sklearn.metrics

## Load the dataset

In [None]:
df = pd.read_csv("../../datasets/mnist.csv")
df = df.set_index("id")  # use the id column as the index so it is not treated as a feature
df.head()

id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
31953,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34452,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
60897,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
36953,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1981,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
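
Before splitting, it can help to confirm that the frame loaded as expected. The sketch below is an illustration only: `csv_text` and `sample_df` are hypothetical names, and the three-row CSV with two pixel columns is a made-up stand-in for `mnist.csv`. It also shows that `pd.read_csv` accepts `index_col="id"`, which sets the index at load time instead of calling `set_index` afterwards.

```python
import io

import pandas as pd

# Hypothetical stand-in for mnist.csv: same column layout, but only
# three rows and two pixel columns instead of 784.
csv_text = """id,class,pixel1,pixel2
31953,5,0,0
34452,8,0,12
60897,5,3,0
"""

# index_col="id" makes `id` the index while reading, so it never becomes a feature.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col="id")

print(sample_df.shape)                    # (rows, columns); the index is not counted
print(sample_df["class"].value_counts())  # number of samples per digit
print(sample_df.isna().sum().sum())       # total number of missing values
```

On the real dataset, the same three checks would show whether all 784 pixel columns are present, whether the digit classes are roughly balanced, and whether any values are missing.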


## Split the dataset into training and test sets

In [None]:
x = df.drop(["class"], axis=1)
y = df["class"]

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)

print("df:", df.shape) 
print("x_train:", x_train.shape) 
print("x_test:", x_test.shape)
print("y_train:", y_train.shape) 
print("y_test:", y_test.shape)

df: (4000, 785)
x_train: (3000, 784)
x_test: (1000, 784)
y_train: (3000,)
y_test: (1000,)
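
The split above relies on the defaults of `train_test_split`: 25% of the rows go to the test set and the shuffle is different on every run, so the accuracy reported later will vary slightly between runs. A minimal sketch of a reproducible, stratified split follows. It assumes scikit-learn's bundled `load_digits` data (8x8 images, 64 features) as a stand-in for `mnist.csv`, and the names `x_demo`, `y_demo`, `x_tr`, `x_te`, `y_tr`, `y_te` are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's small digits dataset, not mnist.csv.
x_demo, y_demo = load_digits(return_X_y=True)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo,
    y_demo,
    test_size=0.25,   # the same 75/25 ratio the default produces
    random_state=42,  # fixed seed, so the split is identical on every run
    stratify=y_demo,  # keep each digit's share the same in both subsets
)

print("train:", x_tr.shape, "test:", x_te.shape)
```

Stratifying is optional, but with only a few thousand samples it guards against a test set in which one digit happens to be under-represented.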


## Train the model

In [None]:
model = sklearn.linear_model.LogisticRegression(max_iter=50000)
model.fit(x_train, y_train)

LogisticRegression(max_iter=50000)
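
A very large `max_iter` is often a sign that the default `lbfgs` solver is struggling with unscaled features. One common alternative, sketched here as an option and not as the notebook's method, is to standardize the inputs inside a pipeline. The example assumes the bundled `load_digits` data as a stand-in for `mnist.csv`; `scaled_model` and `clf` are hypothetical names.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's small digits dataset, not mnist.csv.
x_demo, y_demo = load_digits(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, random_state=0, stratify=y_demo
)

# The scaler is fitted on the training data only and reused at prediction time.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(x_tr, y_tr)

# make_pipeline names each step after its lowercased class name.
clf = scaled_model.named_steps["logisticregression"]
print("iterations used:", clf.n_iter_)
print("test accuracy:", scaled_model.score(x_te, y_te))
```

The `n_iter_` attribute reports how many iterations the solver actually ran, which makes it easy to compare against the `max_iter` budget on the real data.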

In [None]:
y_predicted = model.predict(x_test)

print(sklearn.metrics.accuracy_score(y_test, y_predicted))

0.862
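
The last task in the list at the top asks for the model's main hyperparameters, attributes, and methods to be used in practice. The sketch below touches one or two of each. It assumes the bundled `load_digits` data as a stand-in for `mnist.csv`, the value `C=0.1` is an arbitrary illustration and not a tuned setting, and `demo_model`, `proba`, and `y_hat` are hypothetical names.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's small digits dataset, not mnist.csv.
x_demo, y_demo = load_digits(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, random_state=0, stratify=y_demo
)

# Hyperparameters: C is the inverse regularization strength
# (smaller C means a stronger penalty); max_iter caps the solver's iterations.
demo_model = LogisticRegression(C=0.1, max_iter=5000)
demo_model.fit(x_tr, y_tr)

# Attributes learned during fit.
print(demo_model.classes_)          # class labels seen in training
print(demo_model.coef_.shape)       # (n_classes, n_features)
print(demo_model.intercept_.shape)  # (n_classes,)

# Methods: class probabilities and hard predictions.
proba = demo_model.predict_proba(x_te[:3])
print(proba.sum(axis=1))            # each row of probabilities sums to 1
y_hat = demo_model.predict(x_te)

# Per-class view of the errors that a single accuracy number hides.
print(confusion_matrix(y_te, y_hat))
print(classification_report(y_te, y_hat))
```

The confusion matrix and the per-class report show which digits are mistaken for one another, which a single accuracy score cannot reveal.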
