<div style="background-color: #2f3e46; border-radius: 25px; padding: 30px; text-align: center;">
  <span style="color: #ff7f50; font-family: 'Georgia', serif; font-size: 3em; font-weight: bold;">
    Exploring Mental Health 🧠
  </span>
</div>

<div style="background-color: #D76C82; padding: 20px; border-radius: 15px; text-align: center; margin-top: 20px; box-shadow: 4px 4px 15px rgba(0,0,0,0.4);">
  <span style="color: #3D0301; font-family: 'Cardo', serif; font-size: 2em; font-weight: bold; text-shadow: 1.5px 1.5px 4px rgba(0,0,0,0.3);">
    1. Import Required Libraries
  </span>
</div>

In [44]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import accuracy_score

import gc
import optuna
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier, early_stopping, log_evaluation
import warnings
warnings.filterwarnings("ignore")

<div style="background-color: #D76C82; padding: 20px; border-radius: 15px; text-align: center; margin-top: 20px; box-shadow: 4px 4px 15px rgba(0,0,0,0.4);">
  <span style="color: #3D0301; font-family: 'Cardo', serif; font-size: 2em; font-weight: bold; text-shadow: 1.5px 1.5px 4px rgba(0,0,0,0.3);">
    2. Dive Into the Data – Let’s See What We’re Working With!
  </span>
</div>

In [45]:
train = pd.read_csv('/kaggle/input/playground-series-s4e11/train.csv', index_col='id')
test  = pd.read_csv('/kaggle/input/playground-series-s4e11/test.csv', index_col='id')
submission = pd.read_csv('/kaggle/input/playground-series-s4e11/sample_submission.csv')

In [46]:
train.head()

id,Name,Gender,Age,City,Working Professional or Student,Profession,Academic Pressure,Work Pressure,CGPA,Study Satisfaction,Job Satisfaction,Sleep Duration,Dietary Habits,Degree,Have you ever had suicidal thoughts ?,Work/Study Hours,Financial Stress,Family History of Mental Illness,Depression
0,Aaradhya,Female,49.0,Ludhiana,Working Professional,Chef,,5.0,,,2.0,More than 8 hours,Healthy,BHM,No,1.0,2.0,No,0
1,Vivan,Male,26.0,Varanasi,Working Professional,Teacher,,4.0,,,3.0,Less than 5 hours,Unhealthy,LLB,Yes,7.0,3.0,No,1
2,Yuvraj,Male,33.0,Visakhapatnam,Student,,5.0,,8.97,2.0,,5-6 hours,Healthy,B.Pharm,Yes,3.0,1.0,No,1
3,Yuvraj,Male,22.0,Mumbai,Working Professional,Teacher,,5.0,,,1.0,Less than 5 hours,Moderate,BBA,Yes,10.0,1.0,Yes,1
4,Rhea,Female,30.0,Kanpur,Working Professional,Business Analyst,,1.0,,,1.0,5-6 hours,Unhealthy,BBA,Yes,9.0,4.0,Yes,0


In [47]:
test.head()

id,Name,Gender,Age,City,Working Professional or Student,Profession,Academic Pressure,Work Pressure,CGPA,Study Satisfaction,Job Satisfaction,Sleep Duration,Dietary Habits,Degree,Have you ever had suicidal thoughts ?,Work/Study Hours,Financial Stress,Family History of Mental Illness
140700,Shivam,Male,53.0,Visakhapatnam,Working Professional,Judge,,2.0,,,5.0,Less than 5 hours,Moderate,LLB,No,9.0,3.0,Yes
140701,Sanya,Female,58.0,Kolkata,Working Professional,Educational Consultant,,2.0,,,4.0,Less than 5 hours,Moderate,B.Ed,No,6.0,4.0,No
140702,Yash,Male,53.0,Jaipur,Working Professional,Teacher,,4.0,,,1.0,7-8 hours,Moderate,B.Arch,Yes,12.0,4.0,No
140703,Nalini,Female,23.0,Rajkot,Student,,5.0,,6.84,1.0,,More than 8 hours,Moderate,BSc,Yes,10.0,4.0,No
140704,Shaurya,Male,47.0,Kalyan,Working Professional,Teacher,,5.0,,,5.0,7-8 hours,Moderate,BCA,Yes,3.0,4.0,No


In [48]:
original = pd.read_csv('/kaggle/input/depression-surveydataset-for-analysis/final_depression_dataset_1.csv')
original['Depression'] = original['Depression'].map({'No': 0, 'Yes':1})

In [49]:
train = pd.concat([train, original],axis=0)
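
The `train.info()` output further down reports `Index: 143256 entries, 0 to 2555`, which suggests the appended rows keep their own 0-based labels and so overlap with the competition ids. The sketch below reproduces that effect on two toy frames (made-up values) and shows `ignore_index=True` as one possible way to renumber the rows. This is an optional variation, not something the notebook does.

```python
import pandas as pd

# Toy stand-ins for the competition data and the original survey data (hypothetical values).
toy_train = pd.DataFrame(
    {"Age": [49.0, 26.0], "Depression": [0, 1]},
    index=pd.Index([0, 1], name="id"),
)
toy_original = pd.DataFrame({"Age": [33.0, 22.0], "Depression": [1, 0]})  # default 0..n-1 index

# Plain concatenation keeps both sets of labels, so 0 and 1 appear twice.
stacked = pd.concat([toy_train, toy_original], axis=0)
has_repeated_labels = stacked.index.duplicated().any()

# One option: discard the old labels and renumber the rows 0..n-1.
renumbered = pd.concat([toy_train, toy_original], axis=0, ignore_index=True)

print(has_repeated_labels, renumbered.index.is_unique)
```

The modeling loop later in the notebook indexes with `.iloc`, so repeated labels do not break it. They matter mainly for label-based lookups such as `.loc`.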

In [50]:
train.duplicated().sum()

0

In [51]:
test.duplicated().sum()

0

In [52]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 143256 entries, 0 to 2555
Data columns (total 19 columns):
 #   Column                                 Non-Null Count   Dtype  
---  ------                                 --------------   -----  
 0   Name                                   143256 non-null  object 
 1   Gender                                 143256 non-null  object 
 2   Age                                    143256 non-null  float64
 3   City                                   143256 non-null  object 
 4   Working Professional or Student        143256 non-null  object 
 5   Profession                             105953 non-null  object 
 6   Academic Pressure                      28399 non-null   float64
 7   Work Pressure                          114836 non-null  float64
 8   CGPA                                   28400 non-null   float64
 9   Study Satisfaction                     28399 non-null   float64
 10  Job Satisfaction                       114844 non-null  float64

In [53]:
test.info()

<class 'pandas.core.frame.DataFrame'>
Index: 93800 entries, 140700 to 234499
Data columns (total 18 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   Name                                   93800 non-null  object 
 1   Gender                                 93800 non-null  object 
 2   Age                                    93800 non-null  float64
 3   City                                   93800 non-null  object 
 4   Working Professional or Student        93800 non-null  object 
 5   Profession                             69168 non-null  object 
 6   Academic Pressure                      18767 non-null  float64
 7   Work Pressure                          75022 non-null  float64
 8   CGPA                                   18766 non-null  float64
 9   Study Satisfaction                     18767 non-null  float64
 10  Job Satisfaction                       75026 non-null  float64
 11  S

In [54]:
num_train_rows, num_train_columns = train.shape

num_test_rows, num_test_columns = test.shape

num_submission_rows, num_submission_columns = submission.shape

print("Training Data:")
print(f"Number of Rows: {num_train_rows}")
print(f"Number of Columns: {num_train_columns}\n")

print("Test Data:")
print(f"Number of Rows: {num_test_rows}")
print(f"Number of Columns: {num_test_columns}\n")

print("Submission Data:")
print(f"Number of Rows: {num_submission_rows}")
print(f"Number of Columns: {num_submission_columns}")

Training Data:
Number of Rows: 143256
Number of Columns: 19

Test Data:
Number of Rows: 93800
Number of Columns: 18

Submission Data:
Number of Rows: 93800
Number of Columns: 2


In [55]:
train_null = train.isnull().sum().sum()

test_null = test.isnull().sum().sum()

print(f'Null Count in Train: {train_null}')
print(f'Null Count in Test: {test_null}')

Null Count in Train: 438715
Null Count in Test: 287291


In [56]:
train_duplicates = train.duplicated().sum()

test_duplicates = test.duplicated().sum()

submission_duplicates = submission.duplicated().sum()

print(f"Number of duplicate rows in train data: {train_duplicates}")
print(f"Number of duplicate rows in test data: {test_duplicates}")
print(f"Number of duplicate rows in submission data: {submission_duplicates}")

Number of duplicate rows in train data: 0
Number of duplicate rows in test data: 0
Number of duplicate rows in submission data: 0


In [57]:
print("Number of unique values per column in train data:")
print(train.nunique())

Number of unique values per column in train data:
Name                                     422
Gender                                     2
Age                                       43
City                                      98
Working Professional or Student            2
Profession                                64
Academic Pressure                          5
Work Pressure                              5
CGPA                                     331
Study Satisfaction                         5
Job Satisfaction                           5
Sleep Duration                            36
Dietary Habits                            23
Degree                                   115
Have you ever had suicidal thoughts ?      2
Work/Study Hours                          13
Financial Stress                           5
Family History of Mental Illness           2
Depression                                 2
dtype: int64


In [58]:
train.describe(include="number")

,Age,Academic Pressure,Work Pressure,CGPA,Study Satisfaction,Job Satisfaction,Work/Study Hours,Financial Stress,Depression
count,143256.0,28399.0,114836.0,28400.0,28399.0,114844.0,143256.0,143252.0,143256.0
mean,40.364613,3.139829,2.999408,7.657031,2.947252,2.975131,6.248597,2.988621,0.181647
std,12.383146,1.380722,1.405975,1.464505,1.360518,1.416124,3.852275,1.413664,0.385555
min,18.0,1.0,1.0,5.03,1.0,1.0,0.0,1.0,0.0
25%,29.0,2.0,2.0,6.29,2.0,2.0,3.0,2.0,0.0
50%,42.0,3.0,3.0,7.77,3.0,3.0,6.0,3.0,0.0
75%,51.0,4.0,4.0,8.92,4.0,4.0,10.0,4.0,0.0
max,60.0,5.0,5.0,10.0,5.0,5.0,12.0,5.0,1.0


In [59]:
train.describe(include="O")

,Name,Gender,City,Working Professional or Student,Profession,Sleep Duration,Dietary Habits,Degree,Have you ever had suicidal thoughts ?,Family History of Mental Illness
count,143256,143256,143256,143256,105953,143256,143252,143254,143256,143256
unique,422,2,98,2,64,36,23,115,2,2
top,Rohan,Male,Kalyan,Working Professional,Teacher,Less than 5 hours,Moderate,Class 12,No,No
freq,3204,78797,6683,114853,25228,39432,50537,15004,72445,72069


In [60]:
print("Checking for missing values in each column:")
print(train.isnull().mean() * 100)

Checking for missing values in each column:
Name                                      0.000000
Gender                                    0.000000
Age                                       0.000000
City                                      0.000000
Working Professional or Student           0.000000
Profession                               26.039398
Academic Pressure                        80.176048
Work Pressure                            19.838611
CGPA                                     80.175350
Study Satisfaction                       80.176048
Job Satisfaction                         19.833026
Sleep Duration                            0.000000
Dietary Habits                            0.002792
Degree                                    0.001396
Have you ever had suicidal thoughts ?     0.000000
Work/Study Hours                          0.000000
Financial Stress                          0.002792
Family History of Mental Illness          0.000000
Depression                            
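
The roughly 80% missing rate in the academic columns and 20% in the work columns lines up with the `head()` output above, where students have `Academic Pressure` and `CGPA` filled while working professionals have `Work Pressure` and `Job Satisfaction` filled. A quick way to check whether the gaps are structural rather than random is to compute the missing share per group. The frame below uses toy values, so whether the full data follows the same pattern is an assumption until the check is run on `train`.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the pattern seen in train.head() (values are illustrative).
toy = pd.DataFrame({
    "Working Professional or Student": ["Working Professional", "Student",
                                        "Working Professional", "Student"],
    "Academic Pressure": [np.nan, 5.0, np.nan, 5.0],
    "Work Pressure": [5.0, np.nan, 4.0, np.nan],
})

# Fraction of missing values in each column, split by respondent type.
missing_share = (
    toy.drop(columns="Working Professional or Student")
       .isnull()
       .groupby(toy["Working Professional or Student"])
       .mean()
)
print(missing_share)
```

If the same split shows up on the real data, a median fill for these columns assigns students a "typical" work pressure they never reported, which may be worth keeping in mind when reading the engineered features.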

In [61]:
target_column = 'Depression'
categorical_features = train.select_dtypes(include=['object']).columns
numerical_features = train.select_dtypes(exclude=['object']).columns.drop(target_column)

print("Target Column:", target_column)
print("\nCategorical Columns:", categorical_features.tolist())
print("\nNumerical Columns:", numerical_features.tolist())

Target Column: Depression

Categorical Columns: ['Name', 'Gender', 'City', 'Working Professional or Student', 'Profession', 'Sleep Duration', 'Dietary Habits', 'Degree', 'Have you ever had suicidal thoughts ?', 'Family History of Mental Illness']

Numerical Columns: ['Age', 'Academic Pressure', 'Work Pressure', 'CGPA', 'Study Satisfaction', 'Job Satisfaction', 'Work/Study Hours', 'Financial Stress']


In [62]:
print("The skewness of columns:")
print(train[numerical_features].skew())

The skewness of columns:
Age                  -0.214131
Academic Pressure    -0.131819
Work Pressure         0.017914
CGPA                 -0.072872
Study Satisfaction    0.010101
Job Satisfaction      0.053554
Work/Study Hours     -0.126190
Financial Stress      0.035717
dtype: float64


**I've peeked at other notebooks for data visualization and exploration insights—basically, I'm spying for inspiration. Don’t worry, I’ll jazz this one up later! 😄**

<div style="background-color: #D76C82; padding: 20px; border-radius: 15px; text-align: center; margin-top: 20px; box-shadow: 4px 4px 15px rgba(0,0,0,0.4);">
  <span style="color: #3D0301; font-family: 'Cardo', serif; font-size: 2em; font-weight: bold; text-shadow: 1.5px 1.5px 4px rgba(0,0,0,0.3);">
    3. Feature Engineering 
  </span>
</div>

In [63]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

def enhanced_feature_engineering(data, profession_depression_rate, degree_depression_rate):
    binary_map = lambda x: 1 if str(x).strip().lower() == 'yes' else 0
    data['Suicidal_Thoughts'] = data['Have you ever had suicidal thoughts ?'].map(binary_map)
    data['Family_History'] = data['Family History of Mental Illness'].map(binary_map)
    data['Stress_Level'] = data[['Work Pressure', 'Financial Stress']].sum(axis=1)
    data['Academic_Stress_Index'] = data['Academic Pressure'] * (1 - data['Study Satisfaction'].fillna(0))
    data['Work_Job_Interaction'] = data['Work/Study Hours'] * data['Job Satisfaction']
    data['Profession_Depression_Rate'] = data['Profession'].map(profession_depression_rate).fillna(0)
    data['Degree_Depression_Rate'] = data['Degree'].map(degree_depression_rate).fillna(0)
    sleep_quality_mapping = {'Less than 5 hours': 1, '5-6 hours': 2, '6-7 hours': 3, '7-8 hours': 4, 'More than 8 hours': 5}
    data['Sleep_Quality_Index'] = data['Sleep Duration'].map(sleep_quality_mapping)
    data['Work_Stress_Ratio'] = data['Work Pressure'] / data['Work/Study Hours'].replace(0, np.nan)
    health_mapping = {'Healthy': 3, 'Moderate': 2, 'Unhealthy': 1}
    data['Dietary_Score'] = data['Dietary Habits'].map(health_mapping)
    data['Health_Index'] = data['Dietary_Score'] * data['Sleep_Quality_Index']
    data['Gender_Dietary'] = data['Gender'] + '_' + data['Dietary Habits']
    data['Sleep_Stress_Interaction'] = data['Sleep_Quality_Index'] * data['Stress_Level']
    data['Age_Category'] = pd.cut(data['Age'], bins=[0, 20, 30, 40, 50, 60], labels=['<20', '20-30', '30-40', '40-50', '50+'])
    data['Financial_Stress_Weighted'] = data['Financial Stress'] / data['Age']
    data['Stress_Dietary_Interaction'] = data['Stress_Level'] * data['Dietary_Score']
    data['Mental_Health_Burden'] = data['Stress_Level'] + data['Family_History'] + data['Suicidal_Thoughts']
    data['Study_Stress_Interaction'] = data['Academic Pressure'] * (1 - data['Job Satisfaction'].fillna(0))
    data['Adjusted_Sleep_Quality'] = data['Sleep_Quality_Index'] / (1 + data['Stress_Level'])
    data['Weighted_Profession_Degree_Rate'] = (
        data['Profession_Depression_Rate'] + data['Degree_Depression_Rate']
    ) / 2
    return data

# Compute Depression Rates
profession_depression_rate = train.groupby('Profession')['Depression'].mean().to_dict()
degree_depression_rate = train.groupby('Degree')['Depression'].mean().to_dict()

# Apply feature engineering
train_data = enhanced_feature_engineering(train, profession_depression_rate, degree_depression_rate)
test_data = enhanced_feature_engineering(test, profession_depression_rate, degree_depression_rate)

# Separate features and target for training
X_train = train_data.drop(columns=['Depression'])  # the pipeline code below fits and transforms X_train
y = train_data['Depression']

# Dynamically identify numerical and categorical columns
numerical_columns = X_train.select_dtypes(include=['float64', 'int64']).columns.tolist()
categorical_columns = X_train.select_dtypes(include=['object']).columns.tolist()

# Define transformers and pipeline
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numerical_transformer, numerical_columns),
    ('cat', categorical_transformer, categorical_columns)
])
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

# Fit pipeline on training data
pipeline.fit(X_train)

# Transform both training and testing datasets
X = pipeline.transform(X_train)
test = pipeline.transform(test_data)
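
One detail in the cell above is easy to miss: `pd.cut` returns a column of `category` dtype, so neither `select_dtypes(include=['float64', 'int64'])` nor `select_dtypes(include=['object'])` picks up `Age_Category`. Because `ColumnTransformer` drops unlisted columns by default (`remainder='drop'`), that feature would not reach the model. The sketch below shows the effect on a toy frame, along with one possible adjustment (adding `'category'` to the selection), which is a suggestion and not part of the original notebook.

```python
import pandas as pd

# Toy frame; Age_Category is built the same way as in the feature engineering function.
toy = pd.DataFrame({"Age": [18.0, 25.0, 47.0], "Gender": ["Male", "Female", "Male"]})
toy["Age_Category"] = pd.cut(
    toy["Age"],
    bins=[0, 20, 30, 40, 50, 60],
    labels=["<20", "20-30", "30-40", "40-50", "50+"],
)

numeric_cols = toy.select_dtypes(include=["float64", "int64"]).columns.tolist()
object_cols = toy.select_dtypes(include=["object"]).columns.tolist()

# Including 'category' alongside 'object' picks up the binned column as well.
object_or_category_cols = toy.select_dtypes(include=["object", "category"]).columns.tolist()

print("Age_Category" in numeric_cols + object_cols)   # selected by the original filters?
print("Age_Category" in object_or_category_cols)      # selected once 'category' is included?
```

Changing the selection alters the feature matrix, so the fold scores printed later in this notebook would not be expected to reproduce exactly.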


In [24]:
def feature_engineering(data, profession_depression_rate, degree_depression_rate):
    data['Suicidal_Thoughts'] = data['Have you ever had suicidal thoughts ?'].apply(lambda x: 1 if str(x).strip().lower() == 'yes' else 0)
    data['Family_History'] = data['Family History of Mental Illness'].apply(lambda x: 1 if str(x).strip().lower() == 'yes' else 0)

    data['Stress_Level'] = data[['Work Pressure', 'Financial Stress']].sum(axis=1)
    data['Academic_Stress_Index'] = data['Academic Pressure'] * (1 - data['Study Satisfaction'].fillna(0))
    data['Work_Job_Interaction'] = data['Work/Study Hours'] * data['Job Satisfaction']
    data['Profession_Depression_Rate'] = data['Profession'].map(profession_depression_rate).fillna(0)
    data['Degree_Depression_Rate'] = data['Degree'].map(degree_depression_rate).fillna(0)


    sleep_quality_mapping = {
        'Less than 5 hours': 1,
        '5-6 hours': 2,
        '6-7 hours': 3,
        '7-8 hours': 4,
        'More than 8 hours': 5
    }
    data['Sleep_Quality_Index'] = data['Sleep Duration'].map(sleep_quality_mapping)

    data['Work_Stress_Ratio'] = data['Work Pressure'] / data['Work/Study Hours'].replace(0, np.nan)

    health_mapping = {'Healthy': 3, 'Moderate': 2, 'Unhealthy': 1}
    data['Dietary_Score'] = data['Dietary Habits'].map(health_mapping)

    data['Health_Index'] = data['Dietary_Score'] * data['Sleep_Quality_Index']
    data['Gender_Dietary'] = data['Gender'] + '_' + data['Dietary Habits']

    return data

profession_depression_rate = train.groupby('Profession')['Depression'].mean().to_dict()
degree_depression_rate = train.groupby('Degree')['Depression'].mean().to_dict()

train_data = feature_engineering(train, profession_depression_rate, degree_depression_rate)
test_data = feature_engineering(test, profession_depression_rate, degree_depression_rate)
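
The two rate dictionaries above are computed from the full training set, so each training row's encoded value includes its own label, and the same rows later serve as validation folds. That can make cross-validation scores look better than they would on unseen data. A common alternative is out-of-fold encoding, sketched below on toy data. The helper name `oof_target_rate`, the fold count, and the global-mean fallback are all assumptions for illustration; the notebook itself uses the full-data rates with `fillna(0)`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def oof_target_rate(categories, target, n_splits=5, seed=42):
    """Hypothetical helper: per-row category mean of `target`, computed without that row's fold."""
    categories = categories.reset_index(drop=True)
    target = target.reset_index(drop=True)
    encoded = pd.Series(np.nan, index=categories.index, dtype="float64")
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(categories):
        # Rates come only from rows outside the fold being encoded.
        rates = target.iloc[fit_idx].groupby(categories.iloc[fit_idx]).mean()
        encoded.iloc[enc_idx] = categories.iloc[enc_idx].map(rates).to_numpy()
    # Categories unseen in the fitting rows fall back to the global mean (an assumption).
    return encoded.fillna(target.mean())

# Toy data (illustrative values only); "Judge" appears once, so it always hits the fallback.
toy = pd.DataFrame({
    "Profession": ["Teacher", "Chef", "Teacher", "Chef", "Teacher",
                   "Judge", "Chef", "Teacher", "Chef", "Teacher"],
    "Depression": [1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
})
toy["Profession_Depression_Rate"] = oof_target_rate(toy["Profession"], toy["Depression"])
print(toy)
```

For the test set, rates computed from all training rows remain appropriate, since test labels never enter the calculation.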

In [25]:
X_train = train_data.drop(columns=['Depression'])
y = train_data['Depression']
X_test = test_data

numerical_columns = X_train.select_dtypes(include=['float64', 'int64']).columns
categorical_columns = X_train.select_dtypes(include=['object']).columns

numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_columns),
        ('cat', categorical_transformer, categorical_columns)
    ]
)

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor)
])
pipeline.fit(X_train)

X = pipeline.transform(X_train)
test = pipeline.transform(X_test)
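
The `nunique()` output earlier reports 36 distinct `Sleep Duration` values and 23 distinct `Dietary Habits` values, while the mappings in the feature engineering function cover five and three labels respectively. `Series.map` with a dictionary returns `NaN` for any value that is not a key, and the median imputer then fills those gaps silently. The sketch below shows one way to list the values a mapping misses. The column is a toy example, and its last two entries are invented stand-ins for irregular labels.

```python
import pandas as pd

sleep_quality_mapping = {
    'Less than 5 hours': 1,
    '5-6 hours': 2,
    '6-7 hours': 3,
    '7-8 hours': 4,
    'More than 8 hours': 5,
}

# Toy column; the last two entries are made-up examples of labels outside the mapping.
sleep = pd.Series(['Less than 5 hours', '7-8 hours', 'More than 8 hours',
                   '4-5 hours', 'Moderate'])

mapped = sleep.map(sleep_quality_mapping)

# Values that produced NaN, with how often each occurs.
unmapped_counts = sleep[mapped.isna()].value_counts()
print(unmapped_counts)
```

Running the same check on `train['Sleep Duration']` and `train['Dietary Habits']` would show how many rows rely on the imputed median instead of a mapped score.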

<div style="background-color: #D76C82; padding: 20px; border-radius: 15px; text-align: center; margin-top: 20px; box-shadow: 4px 4px 15px rgba(0,0,0,0.4);">
  <span style="color: #3D0301; font-family: 'Cardo', serif; font-size: 2em; font-weight: bold; text-shadow: 1.5px 1.5px 4px rgba(0,0,0,0.3);">
    4. Model Building
  </span>
</div>

In [64]:
skfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

oof_preds = []
oof_accs = []

lgbm_params = {
    "boosting_type": "gbdt",
    "colsample_bytree": 0.18283018243382332,
    "learning_rate": 0.09945326391012832,
    "max_bins": 36644,
    "min_child_samples": 105,
    "min_child_weight": 0.2083765599710974,
    "n_estimators": 244,
    "n_jobs": -1,
    "num_leaves": 122,
    "random_state": 42,
    "reg_alpha": 8.662578235164972,
    "reg_lambda": 3.5696291074963926,
    "scale_pos_weight": 1.0733293968870794,
    "subsample": 0.5360642841695424,
    "verbose": -1
}


for fold, (train_idx, test_idx) in enumerate(skfold.split(np.zeros(X.shape[0]), y)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    
    lgb_clf = LGBMClassifier(**lgbm_params)
    lgb_clf = lgb_clf.fit(X_train, y_train,
                          eval_set=[(X_test, y_test)],
                          eval_metric='binary_logloss',
                          callbacks=[early_stopping(100)])
    y_pred = lgb_clf.predict(X_test, num_iteration=lgb_clf.best_iteration_)
    acc = accuracy_score(y_test, y_pred)
    oof_accs.append(acc)
    oof_preds.append(lgb_clf.predict_proba(test, num_iteration=lgb_clf.best_iteration_)[:,1])
    print(f"\nFold {fold+1}--> Accuracy Score: {acc:.6f}\n")
    
    del X_train, y_train, X_test, y_test, lgb_clf
    gc.collect()

acc_mean = np.mean(oof_accs)
acc_std = np.std(oof_accs)
print(f"\n\nAverage Fold Accuracy Score: {acc_mean:.6f} \xB1 {acc_std:.6f}\n\n")

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[206]	valid_0's binary_logloss: 0.151109

Fold 1--> Accuracy Score: 0.939899

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[241]	valid_0's binary_logloss: 0.148711

Fold 2--> Accuracy Score: 0.940947

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[203]	valid_0's binary_logloss: 0.151927

Fold 3--> Accuracy Score: 0.941226

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[236]	valid_0's binary_logloss: 0.153256

Fold 4--> Accuracy Score: 0.936409

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[226]	valid_0's binary_logloss: 0.139862

Fold 5--> Accuracy Score: 0.945344

Training until validation scores don't improve for 100 round

In [65]:
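# oof_preds holds one array of test-set probabilities per fold (not out-of-fold predictions);
# average them across folds, then apply a 0.53 cutoff
test_pred = (np.mean(oof_preds, axis=0)>0.53).astype(int)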
test_pred

array([0, 0, 0, ..., 0, 1, 0])
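
The 0.53 cutoff above is applied without showing how it was chosen. One way to pick such a cutoff is to scan candidate thresholds against out-of-fold probabilities, that is, probabilities each training row receives from a model that never saw it. The sketch below does this on synthetic data, with `LogisticRegression` standing in for `LGBMClassifier` to keep it light. The data, the model, and the threshold grid are all assumptions, so the threshold it prints says nothing about this competition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in data (roughly 80/20), not the competition data.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each row's probability comes from a model fitted on the other folds only.
oof_proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv, method="predict_proba"
)[:, 1]

# Scan a grid of cutoffs and keep the one with the highest out-of-fold accuracy.
thresholds = np.arange(0.30, 0.71, 0.01)
scores = [accuracy_score(y_demo, (oof_proba > t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(scores))]

print(f"best threshold: {best_threshold:.2f}, accuracy: {max(scores):.4f}")
```

In this notebook the equivalent step would collect `predict_proba` on each validation fold inside the training loop, then run the same scan before thresholding the averaged test probabilities.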

<div style="background-color: #D76C82; padding: 20px; border-radius: 15px; text-align: center; margin-top: 20px; box-shadow: 4px 4px 15px rgba(0,0,0,0.4);">
  <span style="color: #3D0301; font-family: 'Cardo', serif; font-size: 2em; font-weight: bold; text-shadow: 1.5px 1.5px 4px rgba(0,0,0,0.3);">
    5. Submission
  </span>
</div>

In [66]:
submission[target_column] = test_pred

submission.to_csv('submission.csv', index=False)