# What leads to divorce?

### Table of Contents

* **[Overview](#Overview)**

* **[Data Exploration](#Data-Exploration)**  
    * [Check for Data Quality](#Check-for-Data-Quality)
        * [Missing Values Check](#Missing-Values-Check)
        * [Zero Values Check](#Zero-Values-Check)
        * [Unique Values Check](#Unique-Values-Check)
        * [Duplicate Values Check](#Duplicate-Values-Check)
* **[Data Visualization & Analysis](#Data-Visualization-&-Analysis)**
    * [Finding Outliers](#Finding-Outliers)
    * [Find Correlations of Features](#Find-Correlations-of-Features)
* **[Data Preparation](#Data-Preparation)**
    * [Data Cleanup](#Data-Cleanup)
        * [Handling Missing Values](#Handling-Missing-Values)
        * [Handling Outliers](#Handling-Outliers)

# Overview


<p>This dataset was collected from <a href='https://www.kaggle.com'>Kaggle</a> at the URL below.</p>
<p><a href='https://www.kaggle.com/datasets/andrewmvd/divorce-prediction'>Divorce Prediction Dataset</a></p>
<p>The dataset contains actual survey responses that have been masked for privacy, and the accompanying reference file explains each feature. No personal information is revealed in the data.</p>
<p>This analysis aims to identify the factors that contribute to divorce.</p>

In [7]:
import math
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


# Data Exploration

In [8]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

In [9]:
datareference = pd.read_csv('data/divorce-dataset-1/reference.tsv', sep = '|')

This dataset contains data about 170 couples and their responses to the 54 questions of the Divorce Predictors Scale (DPS), which is based on Gottman couples therapy.
The couples come from various regions of Turkey; the records were acquired through face-to-face interviews with couples who were either already divorced or happily married.
All responses were collected on a 5-point scale (0=Never, 1=Seldom, 2=Averagely, 3=Frequently, 4=Always).

Source: <a href='https://www.kaggle.com/datasets/andrewmvd/divorce-prediction'>https://www.kaggle.com/datasets/andrewmvd/divorce-prediction</a>

In [10]:
# Display the reference table with left-aligned cells and headers
left_aligned_refdata = datareference.style.set_properties(**{'text-align': 'left'}).set_table_styles(
    [dict(selector='th', props=[('text-align', 'left')])]
)
left_aligned_refdata

| | atribute_id | description |
|---|---|---|
| 0 | 1 | If one of us apologizes when our discussion deteriorates, the discussion ends. |
| 1 | 2 | I know we can ignore our differences, even if things get hard sometimes. |
| 2 | 3 | When we need it, we can take our discussions with my spouse from the beginning and correct it. |
| 3 | 4 | When I discuss with my spouse, to contact him will eventually work. |
| 4 | 5 | The time I spent with my wife is special for us. |
| 5 | 6 | We don't have time at home as partners. |
| 6 | 7 | We are like two strangers who share the same environment at home rather than family. |
| 7 | 8 | I enjoy our holidays with my wife. |
| 8 | 9 | I enjoy traveling with my wife. |
| 9 | 10 | Most of our goals are common to my spouse. |


In [11]:
# read data
data = pd.read_csv('data/divorce-dataset-1/divorce_data.csv', sep = ';')

In [12]:
data

Unnamed: 0,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20,Q21,Q22,Q23,Q24,Q25,Q26,Q27,Q28,Q29,Q30,Q31,Q32,Q33,Q34,Q35,Q36,Q37,Q38,Q39,Q40,Q41,Q42,Q43,Q44,Q45,Q46,Q47,Q48,Q49,Q50,Q51,Q52,Q53,Q54,Divorce
0,2,2,4,1,0,0,0,0,0,0,1,0,1,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,2,1,2,0,1,2,1,3,3,2,1,1,2,3,2,1,3,3,3,2,3,2,1,1
1,4,4,4,4,4,0,0,4,4,4,4,3,4,0,4,4,4,4,3,2,1,1,0,2,2,1,2,0,1,1,0,4,2,3,0,2,3,4,2,4,2,2,3,4,2,2,2,3,4,4,4,4,2,2,1
2,2,2,2,2,1,3,2,1,1,2,3,4,2,3,3,3,3,3,3,2,1,0,1,2,2,2,2,2,3,2,3,3,1,1,1,1,2,1,3,3,3,3,2,3,2,3,2,3,1,1,1,2,2,2,1
3,3,2,3,2,3,3,3,3,3,3,4,3,3,4,3,3,3,3,3,4,1,1,1,1,2,1,1,1,1,3,2,3,2,2,1,1,3,3,4,4,2,2,3,2,3,2,2,3,3,3,3,2,2,2,1
4,2,2,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,2,1,1,0,0,0,0,2,1,2,1,1,1,1,1,1,0,0,0,0,2,1,0,2,3,0,2,2,1,2,3,2,2,2,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
165,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,4,3,4,0,0,4,0,1,0,1,0,0,0,0,1,0,4,1,1,4,2,2,2,0
166,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,3,1,3,4,1,2,2,2,2,3,2,2,0
167,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,0,0,1,1,1,2,1,3,3,0,2,3,0,2,0,1,1,3,0,0,0
168,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,4,1,2,1,1,0,4,3,3,2,2,3,2,4,3,1,0


## Check for Data Quality

This section checks the dataset for common data quality issues: missing values, zero values, unexpected values, and duplicate rows.

### Missing Values Check

Check whether any column in the dataset contains missing (NaN) values.
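
A minimal sketch of this check, assuming the responses are loaded into a pandas DataFrame such as `data` above. The helper `missing_value_report` and the small `sample` frame are hypothetical, used only so the snippet runs on its own; in the notebook the call would be `missing_value_report(data)`.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing (NaN) cells per column, keeping only columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Hypothetical toy frame standing in for `data`; Q2 has one missing answer.
sample = pd.DataFrame({
    "Q1": [2, 4, 0],
    "Q2": [2, np.nan, 1],
    "Divorce": [1, 1, 0],
})

report = missing_value_report(sample)
print(report)  # lists only Q2, with a count of 1
print("Total missing cells:", int(sample.isnull().sum().sum()))
```

An empty result means no column has missing answers, in which case no imputation step is needed.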

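### Zero Values Check

Count the zero values in each column. On this survey's scale a 0 means "Never", so zeros are valid answers rather than placeholders for missing data.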
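
One way to sketch this count, assuming the question columns are named `Q1` to `Q54` and the label column is `Divorce`, as in the data preview above. The helper `zero_value_counts` and the `sample` frame are hypothetical. Because 0 encodes "Never", the sketch only describes how often it occurs; it does not treat zeros as missing.

```python
import pandas as pd


def zero_value_counts(df: pd.DataFrame, exclude=("Divorce",)) -> pd.DataFrame:
    """Count zero answers per question column and their share of all rows.

    The label column(s) in `exclude` are dropped first, since 0 there is a class label.
    """
    questions = df.drop(columns=list(exclude), errors="ignore")
    zeros = (questions == 0).sum()
    return pd.DataFrame({"zeros": zeros, "share": zeros / len(df)})


# Hypothetical toy frame standing in for `data`
sample = pd.DataFrame({
    "Q1": [0, 4, 0, 2],
    "Q2": [1, 1, 0, 3],
    "Divorce": [0, 1, 0, 1],
})

result = zero_value_counts(sample)
print(result)
```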


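### Unique Values Check

Check the distinct values in each column: every question column should only contain values from the 0 to 4 answer scale, and the `Divorce` column only 0 or 1.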
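
A sketch of this range check, assuming answers must fall in 0 to 4 and `Divorce` in {0, 1}, which matches the scale described above and the values shown in the data preview. `unexpected_values` and `sample` are hypothetical names; in the notebook the call would be `unexpected_values(data)`.

```python
import pandas as pd


def unexpected_values(df: pd.DataFrame, target: str = "Divorce") -> dict:
    """Map each column to the set of values outside its expected range.

    Question columns are assumed to use the 0-4 answer scale and the target to be 0/1.
    An empty dict means every value is in range.
    """
    problems = {}
    for col in df.columns:
        allowed = {0, 1} if col == target else {0, 1, 2, 3, 4}
        # tolist() converts NumPy scalars to plain Python numbers for a readable result
        bad = set(df[col].dropna().unique().tolist()) - allowed
        if bad:
            problems[col] = bad
    return problems


# Hypothetical toy frame standing in for `data`
sample = pd.DataFrame({
    "Q1": [0, 4, 2],
    "Q2": [1, 5, 3],  # 5 is outside the 0-4 scale
    "Divorce": [1, 0, 1],
})

print(sample.nunique())           # number of distinct values per column
print(unexpected_values(sample))  # {'Q2': {5}}
```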
        

### Duplicate Values Check

Check for duplicate rows, i.e. couples whose answers match exactly across all columns.
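
A minimal sketch using pandas `DataFrame.duplicated`; the `sample` frame is hypothetical. `keep=False` flags every copy of a repeated row so the copies can be inspected together. Two different couples could legitimately give identical answers, so a duplicate row is a prompt to inspect rather than necessarily an error.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; rows 0 and 2 are identical.
sample = pd.DataFrame({
    "Q1": [2, 4, 2],
    "Q2": [2, 4, 2],
    "Divorce": [1, 1, 1],
})

# Default keep="first": only later copies are counted as duplicates
n_duplicates = int(sample.duplicated().sum())
print("Duplicate rows (excluding first occurrences):", n_duplicates)

# keep=False marks every copy of a repeated row, including the first one
dup_mask = sample.duplicated(keep=False)
print(sample[dup_mask])
```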

# Data Visualization & Analysis

## Finding Outliers
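
One common approach is Tukey's IQR rule, sketched below as the hypothetical helper `iqr_outlier_counts` on a hypothetical `sample` frame; the notebook may use a different method (for example z-scores via the imported `scipy.stats`). On a bounded 0 to 4 scale, flagged values are unusual answers rather than measurement errors.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values per column lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index on the columns
    outside = (df < q1 - k * iqr) | (df > q3 + k * iqr)
    return outside.sum()


# Hypothetical question columns (label column already excluded)
sample = pd.DataFrame({
    "Q1": [1, 1, 1, 1, 4],  # IQR is 0 here, so the single 4 is flagged
    "Q2": [0, 1, 2, 3, 4],  # spread out evenly, nothing flagged
})

counts = iqr_outlier_counts(sample)
print(counts)
```

In the notebook this could be applied as `iqr_outlier_counts(data.drop(columns="Divorce"))`, and `sns.boxplot(data=data.drop(columns="Divorce"))` gives a visual check. The toy `Q1` column shows a caveat: when most answers are identical the IQR collapses to 0, so any different answer is flagged.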
   

## Find Correlations of Features
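
A sketch ranking each question by its correlation with the `Divorce` label, using pandas `DataFrame.corr`. The helper `target_correlations` and the `sample` frame are hypothetical. Since the answers are ordinal, `method="spearman"` is a reasonable alternative to the default Pearson coefficient.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "Divorce",
                        method: str = "pearson") -> pd.Series:
    """Correlate every feature with the target and sort by absolute strength."""
    corr = df.corr(method=method)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy frame standing in for `data`
sample = pd.DataFrame({
    "Q1": [0, 1, 3, 4],  # rises with Divorce
    "Q2": [4, 3, 1, 0],  # falls with Divorce
    "Q3": [2, 1, 2, 1],  # unrelated to Divorce
    "Divorce": [0, 0, 1, 1],
})

result = target_correlations(sample)
print(result)
```

For the full feature-to-feature view, `sns.heatmap(data.corr(), cmap="coolwarm")` draws the whole correlation matrix.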

# Data Preparation

## Data Cleanup

### Handling Missing Values
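
If the missing values check finds any gaps, one option is sketched below: drop rows without a `Divorce` label, then fill missing answers with the rounded column median. This is an assumption about how the cleanup could be done, not necessarily the notebook's method; `impute_missing` and `sample` are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame, target: str = "Divorce") -> pd.DataFrame:
    """Drop rows without a target label, then fill missing answers with the column median."""
    cleaned = df.dropna(subset=[target]).copy()
    features = cleaned.columns.drop(target)
    # A median of 0-4 answers can be x.5; rounding keeps filled values on the integer scale
    cleaned[features] = cleaned[features].fillna(cleaned[features].median().round())
    return cleaned


# Hypothetical toy frame standing in for `data`
sample = pd.DataFrame({
    "Q1": [2, np.nan, 4, 1],
    "Q2": [1, 1, np.nan, 3],
    "Divorce": [1, 0, 1, np.nan],  # last row has no label and is dropped
})

out = impute_missing(sample)
print(out)
```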

### Handling Outliers
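
A sketch of capping values to Tukey's IQR fences with pandas `DataFrame.clip`; `cap_outliers_iqr` and `sample` are hypothetical. On this survey every value from 0 to 4 is a valid answer, so capping can erase real signal, as the toy `Q1` column shows (an "Always" answer is pulled down to 1). Leaving in-range values untouched may be the safer choice here.

```python
import pandas as pd


def cap_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Cap each column to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds (Series indexed by column name) with the columns
    return df.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)


# Hypothetical question columns (label column already excluded)
sample = pd.DataFrame({
    "Q1": [1, 1, 1, 1, 4],
    "Q2": [0, 1, 2, 3, 4],
})

capped = cap_outliers_iqr(sample)
print(capped)
```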