In [20]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [22]:
titanic=pd.read_csv('titanic_cleaned.csv')
titanic.head()

Unnamed: 0.1,Unnamed: 0,PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked,Is_male,FamilySize,Title
0,0,1,0,3,male,Adult,7.25,S,1,1,Mr
1,1,2,1,1,female,Adult,71.2833,C,0,1,Mrs
2,2,3,1,3,female,Adult,7.925,S,0,0,Miss
3,3,4,1,1,female,Adult,53.1,S,0,1,Mrs
4,4,5,0,3,male,Adult,8.05,S,1,0,Mr


We imported the dataset into the dataframe and split it into training/testing. Now we can train our model.

In [33]:
X_eğitim, X_test, y_eğitim, y_test =  train_test_split(X, y, test_size=0.20, random_state=111)

In [49]:
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression(solver='liblinear', max_iter=1000)
log_model.fit(X_eğitim, y_eğitim)

LogisticRegression(max_iter=1000, solver='liblinear')

We can use the .score() function to measure the performance of our model. This function will return us the ratio of correct predictions. Let's look at this for both training and test data

In [50]:
egitim_dogruluk = log_reg.score(X_eğitim, y_eğitim)
test_dogruluk = log_reg.score(X_test, y_test)
print('One-vs-rest', '-'*20, 
      'Modelin eğitim verisindeki doğruluğu : {:.2f}'.format(egitim_dogruluk), 
      'Modelin test verisindeki doğruluğu   : {:.2f}'.format(test_dogruluk), sep='\n')

One-vs-rest
--------------------
Modelin eğitim verisindeki doğruluğu : 0.82
Modelin test verisindeki doğruluğu   : 0.83


We used the one-vs-rest method as the default value. Let's calculate again by changing the parameters to calculate with the multinomial method.

In [53]:
log_reg_mnm = LogisticRegression(multi_class='multinomial', solver='saga')
log_reg_mnm.fit(X_eğitim, y_eğitim)
egitim_dogruluk = log_reg_mnm.score(X_eğitim, y_eğitim)
test_dogruluk = log_reg_mnm.score(X_test, y_test)
print('Multinomial (Softmax)', '-'*20, 
      'Modelin eğitim verisindeki doğruluğu : {:.2f}'.format(egitim_dogruluk), 
      'Modelin test verisindeki doğruluğu   : {:.2f}'.format(test_dogruluk), sep='\n')

Multinomial (Softmax)
--------------------
Modelin eğitim verisindeki doğruluğu : 0.70
Modelin test verisindeki doğruluğu   : 0.73




When we look at the results, we see that 82% and 70% correct classification is made when training data is used, and 83% and 73% correct classification is made when test data is used. We can say that both methods gave similar results.

Finally, let's see the performance of our model at different c values.

In [55]:
C_değerleri = [0.001,0.01,0.1,1,10,100, 1000]
dogruluk_df = pd.DataFrame(columns = ['C_Değeri','Doğruluk'])

dogruluk_değerleri = pd.DataFrame(columns=['C Değeri', 'Eğitim Doğruluğu', 'Test Doğruluğu'])

for c in C_değerleri:
    
    # Apply logistic regression model to training data
    lr = LogisticRegression(penalty = 'l2', C = c, random_state = 0,solver='liblinear', max_iter=1000)
    lr.fit(X_eğitim,y_eğitim)
    dogruluk_değerleri = dogruluk_değerleri.append({'C Değeri': c,
                                                    'Eğitim Doğruluğu' : lr.score(X_eğitim, y_eğitim),
                                                    'Test Doğruluğu': lr.score(X_test, y_test)
                                                    }, ignore_index=True)
display(dogruluk_değerleri)    

Unnamed: 0,C Değeri,Eğitim Doğruluğu,Test Doğruluğu
0,0.001,0.69382,0.72067
1,0.01,0.766854,0.815642
2,0.1,0.800562,0.798883
3,1.0,0.81882,0.821229
4,10.0,0.824438,0.826816
5,100.0,0.825843,0.826816
6,1000.0,0.824438,0.832402


When we look at the accuracy values, we see that the most appropriate c value for our data set is the default value of 1.