<a href="https://colab.research.google.com/github/l-tting/titanic_x_regression/blob/main/Titanic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [12]:
import seaborn as sns
import pandas as pd

**1. Data Loading**

Load the Titanic dataset into a dataframe df, with one row per passenger and one column per attribute (e.g., age, sex, fare, survived).



In [13]:
# Load Titanic dataset
df = sns.load_dataset('titanic')

Explore the dataset

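df.info() shows each column's data type and non-null count, which reveals where values are missing.

df.describe() provides summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns.

df.head() displays the first 5 rows, giving a sense of what the data looks like.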

In [None]:
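# df.info() prints directly; a notebook cell only renders its last expression,
# so display() is used to show both tables.
df.info()
display(df.describe())
display(df.head())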
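
To make the missing-value picture explicit, the sketch below counts nulls per column. It runs on a small hand-made frame named toy (an assumption for illustration, not actual Titanic rows); the same two calls would apply to df.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; the values are illustrative only.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Number and share of missing values per column.
missing_count = toy.isna().sum()
missing_share = toy.isna().mean()
print(pd.DataFrame({"missing": missing_count, "share": missing_share}))
```

A column with a high share of missing values (deck in this toy frame) is a candidate for dropping rather than imputing.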


**2. Data Cleaning.**

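Drop redundant columns such as alive, which duplicates survived, along with embark_town, class, who, and adult_male, which repeat information already held in embarked, pclass, sex, and age.

Drop columns with too many missing values, such as deck.

Fill the remaining gaps: missing age values with the median age, and missing embarked values with the most frequent port.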




In [15]:
df.drop(columns=['deck', 'embark_town', 'alive', 'class', 'who', 'adult_male'], inplace=True)

df['age'] = df['age'].fillna(df['age'].median())
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])
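
The imputation step can be checked on a small example. This is a minimal sketch on made-up values (the toy frame and its numbers are assumptions, not Titanic data) showing what median and mode filling actually insert.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; one gap in each column.
toy = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 40.0, 80.0],
    "embarked": ["S", "C", "S", None, "Q"],
})

# The median is less sensitive to the outlier (80) than the mean would be.
age_median = toy["age"].median()
toy["age"] = toy["age"].fillna(age_median)

# mode() returns a Series because ties are possible, so take the first entry.
port_mode = toy["embarked"].mode()[0]
toy["embarked"] = toy["embarked"].fillna(port_mode)

print(toy)
```

Both median() and mode() skip missing values by default, so the fill value is computed only from the observed entries.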

**3. Feature selection and encoding.**

The chosen features are those likely to affect survival: pclass, sex, age, sibsp, parch, fare, and embarked.

Target Variable: y = df['survived'] is the column we're trying to predict.

LabelEncoder converts strings into integer codes assigned in sorted order, so 'female' becomes 0 and 'male' becomes 1.

In [16]:
from sklearn.preprocessing import LabelEncoder

# Select relevant features
features = ['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']
X = df[features]
y = df['survived']

# Encode categorical features (le is refit for each column, so afterwards
# it only retains the mapping for 'embarked')
X = X.copy()
le = LabelEncoder()
X['sex'] = le.fit_transform(X['sex'])
X['embarked'] = le.fit_transform(X['embarked'])
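
To see which integer each label receives, the sketch below fits one encoder per column on made-up values and reads the mapping back from classes_. The variable names (sex_encoder, embarked_encoder, and the two mapping dicts) are illustrative assumptions, not part of the notebook above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up columns for illustration.
sex = pd.Series(["male", "female", "female", "male"])
embarked = pd.Series(["S", "C", "Q", "S"])

# One encoder per column, so each mapping stays available afterwards.
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()
sex_codes = sex_encoder.fit_transform(sex)
embarked_codes = embarked_encoder.fit_transform(embarked)

# classes_ is sorted, so each label's position is its integer code.
sex_mapping = {str(label): code for code, label in enumerate(sex_encoder.classes_)}
embarked_mapping = {str(label): code for code, label in enumerate(embarked_encoder.classes_)}
print(sex_mapping)
print(embarked_mapping)
```

Integer codes impose an order on embarked (C < Q < S) that a linear model will treat as meaningful. One-hot encoding, for example with pd.get_dummies, is a common alternative when that order is not intended.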


**4. Data Splitting**

X is the feature set and y the target variable.

Training Set: 80% of the data is used to fit the model.

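Testing Set: 20% of the data is reserved to evaluate the model's performance on passengers it has not seen.

Setting random_state=42 fixes the shuffle, so the same split is produced on every run.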




In [17]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
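
The resulting sizes can be sanity-checked on synthetic data. This sketch uses random arrays with 891 rows (the row count of the seaborn Titanic frame) and, as an optional addition not used in the cell above, passes stratify so both subsets keep roughly the same class balance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 891 rows; features and labels are random.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(891, 3))
y_demo = rng.integers(0, 2, size=891)

# stratify=y_demo is an assumption added here, not part of the original split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With test_size=0.2 the test set size is rounded up, giving 179 test rows and 712 training rows out of 891.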


**5. Model Training with Logistic Regression**

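A logistic regression model is instantiated and fitted to the training data. max_iter=1000 raises the iteration limit above scikit-learn's default of 100, giving the solver more room to converge on unscaled features: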

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
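
As a rough illustration of what the fitted model computes, the sketch below trains on synthetic two-feature data (an assumption, not the Titanic features) and reproduces predict_proba by hand: a linear score passed through the sigmoid function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
# Synthetic labels: class 1 becomes more likely as the first feature grows.
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_demo, y_demo)

# Linear score z = w . x + b, then the sigmoid maps it to P(y = 1).
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba holds the probability of class 1.
sklearn_proba = clf.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

On the Titanic model, the same reading applies: each entry of model.coef_ is the change in log-odds of survival per unit increase of the corresponding feature.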


**6. Model Evaluation**

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
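
To show how the printed numbers relate to each other, this sketch derives accuracy, precision, and recall from a confusion matrix by hand. The labels y_true and y_hat are made up for illustration and are not the model's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only (1 = survived, 0 = did not).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are right
precision = tp / (tp + fp)                  # share of predicted survivors who survived
recall = tp / (tp + fn)                     # share of actual survivors who were found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
print(accuracy_score(y_true, y_hat))
```

In the confusion matrix printed by scikit-learn, rows are the true classes and columns are the predicted classes, so the off-diagonal cells are the errors.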
