# Introduction <a id="26"></a> <br>
* In this notebook we will analyze biomechanical features of orthopedic patients and train machine learning models.
* You can think of this notebook as a mix of tutorial and real analysis.
* If any of the code is unclear, you can check these kernels (I learned from them too):
    * Plotly: https://www.kaggle.com/kanncaa1/plotly-tutorial-for-beginners
    * Seaborn: https://www.kaggle.com/kanncaa1/seaborn-tutorial-for-beginners
    
## Content:
1.  [Data Analysis](#1)
    1.    [Features](#2) 
    1.    [Countplot](#3)
    1.    [Pairplot](#4)
    1.    [Lineplot, Patients' Biomechanical Feature Values](#5)   
    1.    [Histogram, Feature Values' Frequencies](#6)
    1.    [Swarmplot](#7)
    1.    [Barplot, Feature Value Means](#8)
    1.    [Boxplot, Biomechanical Features' Quartiles and Outliers For Each Class](#9)
    1.    [Scatterplot](#10)
    1.    [Correlation Heatmap](#11)
    1.    [Correlated Features](#12)
    1.    [3D Scatterplots](#13)
1.  [Machine Learning (ML)](#14)
    1.    [Linear Regression](#15)
    1.    [Outlier Detection](#16)
    1.    [Logistic Regression](#17)
    1.    [KNN](#18)
    1.    [SVM](#19)
    1.    [Naive Bayes](#20)
    1.    [Decision Tree](#21)
    1.    [Random Forest](#22)
    1.    [Comparison](#23)
    1.    [K-Means](#24)
    1.    [Artificial Neural Network (ANN)](#25)

In [None]:
# Libraries used in this notebook (all preinstalled in the Kaggle Python Docker image:
# https://github.com/kaggle/docker-python)

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt #visualization
import seaborn as sns #visualization
import plotly.graph_objs as go #visualization
from plotly.offline import init_notebook_mode, iplot, plot
import warnings
init_notebook_mode(connected=True) 
# filter warnings
warnings.filterwarnings('ignore')

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
# Read the data: column_2C_weka.csv (two classes, Normal/Abnormal) is the main dataset `data`;
# column_3C_weka.csv uses three class labels instead
column_3c_weka = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_3C_weka.csv")
data = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")

<a id="1"></a> <br>
# Data Analysis 
[Back to introduction](#26)

In [None]:
data.head()

<a id="2"></a> <br>
## Features
[Back to introduction](#26)
* **Pelvic Incidence:**                
* **Pelvic Tilt Numeric:**
* **Sacral Slope:** 
<img src = "https://d3i71xaburhd42.cloudfront.net/5822f98e3c7456629931a805eea914d6b767d030/2-Figure1-1.png" height = "400" width = "400" >

* **Lumbar Lordosis Angle:**
<img src = "https://i1.wp.com/musculoskeletalkey.com/wp-content/uploads/2017/08/A319013_1_En_3_Fig2_HTML.gif?w=960" height = "400" width = "400">
  
  
* **Degree Spondylolisthesis:**
<img src = "https://cloud2.spineuniverse.com/sites/default/files/imagecache/large-content/wysiwyg_imageupload/3998/2015/04/02/wu_spondylolisthesis_grades600.jpg" height = "400" width = "400" >


* **Pelvic Radius:**
<img src = "https://www.researchgate.net/profile/Siwadol_Wongsak/publication/49636125/figure/fig2/AS:340883061919753@1458284263239/Line-drawing-showing-pelvic-radius-PR-line-and-the-pelvic-radius-measurement-technique.png" height = "400" width = "400">
  
  
* **Pelvis:**
<img src = "https://i2.wp.com/upload.orthobullets.com/topic/12768/images/human-body-parts-pelvic-bone-pelvis.jpg?w=760&ssl=1" height = "600" width = "600">
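The first three angles are geometrically linked: pelvic incidence equals pelvic tilt plus sacral slope (PI = PT + SS). Below is a small helper for checking how closely a frame follows this identity. It assumes the raw (un-normalized) column names used in this notebook, and `pelvic_angle_residual` is a hypothetical helper, not part of any library.

```python
import pandas as pd

def pelvic_angle_residual(df: pd.DataFrame) -> pd.Series:
    """Return PI - (PT + SS) for every row.

    Assumes the column names of this notebook's raw `data` frame:
    'pelvic_incidence', 'pelvic_tilt numeric' and 'sacral_slope'.
    """
    return df["pelvic_incidence"] - (df["pelvic_tilt numeric"] + df["sacral_slope"])

# Tiny made-up example (not patient data): the first row satisfies the
# identity exactly, the second one is off by 2 degrees.
demo = pd.DataFrame({
    "pelvic_incidence": [60.0, 50.0],
    "pelvic_tilt numeric": [20.0, 10.0],
    "sacral_slope": [40.0, 38.0],
})
print(pelvic_angle_residual(demo).tolist())
```

Run it on the raw `data` frame, not the normalized `data1`. Rescaling each column separately breaks the additive relation. `pelvic_angle_residual(data).abs().max()` would then show the largest deviation.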
 

In [None]:
data.info()

In [None]:
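# Plot the number of non-null values per column; equal bar heights mean no column has missing values
plt.style.use("ggplot")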
plt.figure(figsize = (13,8))
plt.bar("Pelvic Incidence",data.pelvic_incidence.count(),color = "b")
plt.bar("Pelvic Tilt Numeric",data["pelvic_tilt numeric"].count(),color = "b")
plt.bar("Lumbar Lordosis Angle",data.lumbar_lordosis_angle.count(),color = "b")
plt.bar("Sacral Slope",data.sacral_slope.count(),color = "b")
plt.bar("Pelvic Radius",data.pelvic_radius.count(),color = "b")
plt.bar("Degree Spondylolisthesis",data.degree_spondylolisthesis.count(),color = "b")
plt.bar("Class",data["class"].count(),color = "b")
plt.xticks(rotation = 20)
plt.xlabel("Features")
plt.ylabel("Count")
plt.title("Missing Value Detection ")
plt.show()
print(data.count())
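Comparing each column's `count()` with the others works, but counting the missing entries directly is more explicit. On the notebook's data this would be `data.isna().sum()`. Here is a sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column (not patient data)
demo = pd.DataFrame({
    "pelvic_incidence": [1.0, np.nan, 3.0],
    "class": ["Abnormal", "Normal", None],
})

missing = demo.isna().sum()  # number of missing entries per column
print(missing)
print("Any missing values:", bool(missing.any()))
```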

In [None]:
data.describe()

In [None]:
x_data = data.drop(["class"],axis = 1)
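# min-max normalization, column by column: every feature is rescaled to [0, 1]
# (x_data.min()/.max() return per-column Series; np.min on a DataFrame may reduce to a single scalar in newer pandas)
x = (x_data - x_data.min()) / (x_data.max() - x_data.min())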
y = pd.DataFrame(data["class"])
x.head()
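Min-max normalization rescales each column with x' = (x - min) / (max - min). It uses that column's own minimum and maximum, so every feature ends up in [0, 1]. Here is a minimal pandas sketch on a made-up frame; `min_max_normalize` is a hypothetical helper name:

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column of `df` to the [0, 1] range, column by column."""
    col_min = df.min()  # per-column minimum (a Series indexed by column name)
    col_max = df.max()  # per-column maximum
    # Subtraction and division align on column names, so each column uses its own range
    return (df - col_min) / (col_max - col_min)

demo = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [100.0, 150.0, 200.0]})
print(min_max_normalize(demo))
```

A constant column would have max - min = 0, so its normalized values would come out as NaN.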

In [None]:
y.head()

In [None]:
data1 = pd.concat([x,y],axis = 1) 
data1.head()

In [None]:
A = data1[data1["class"] == "Abnormal"]
N = data1[data1["class"] == "Normal"]

<a id="3"></a> <br>
## Countplot
[Back to introduction](#26)

In [None]:
plt.style.use("default")
data_cnt_plt = data1.iloc[::-1] #reverse data
sns.countplot(x = data_cnt_plt["class"],palette = "icefire")
plt.title("Number of Normal and Abnormal Patients")
plt.show()
print(data["class"].value_counts())

<a id="4"></a> <br>
## Pairplot
[Back to introduction](#26)

In [None]:
plt.style.use("ggplot")
sns.pairplot(data, hue ="class", markers = "+")
plt.show()
desc = data.describe()
print(desc[:3])

<a id="5"></a> <br>
## Lineplot, Patients' Biomechanical Feature Values
[Back to introduction](#26)
* This plot shows the biomechanical feature values of each patient.
* The dataframe index serves as the patient number.
* fig = The figure object; it holds the traces (data) and the layout
* x = X-axis values
* y = Y-axis values
* mode = Drawing mode ("lines" here, "markers" for a scatter)
* name = Trace name shown in the legend
* marker = Styling attributes (color, line, etc.)
* layout = A dictionary of figure-level attributes such as width, height, title, etc.

In [None]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=6,cols=1,subplot_titles = ("Pelvic Incidence","Pelvic Tilt Numeric","Lumbar Lordosis Angle","Sacral Slope","Pelvic Radius","Degree Spondylolisthesis"))

fig.append_trace(go.Scatter(
x = data.index,
y = data.pelvic_incidence,
mode = "lines",
name = "Pelvic Incidence",
marker = dict(color = 'rgba(16, 112, 2, 0.8)')),row = 1, col = 1)

fig.append_trace(go.Scatter(
x = data.index,
y = data["pelvic_tilt numeric"],
mode = "lines",
name = "Pelvic Tilt Numeric",
marker = dict(color = 'rgba(80, 26, 80, 0.8)')),row = 2, col = 1)

fig.append_trace(go.Scatter(
x = data.index,
y = data.lumbar_lordosis_angle,
mode = "lines",
name = "Lumbar Lordosis Angle",
marker = dict(color = 'rgba(160, 112, 20, 0.8)')),row = 3, col = 1)

fig.append_trace(go.Scatter(
x = data.index,
y = data.sacral_slope,
mode = "lines",
name = "Sacral Slope",
marker = dict(color = 'rgba(12, 12, 140, 0.8)')),row = 4, col = 1)

fig.append_trace(go.Scatter(
x = data.index,
y = data.pelvic_radius,
mode = "lines",
name = "Pelvic Radius",
marker = dict(color = 'rgba(245, 128, 2, 0.8)')),row = 5, col = 1)

fig.append_trace(go.Scatter(
x = data.index,
y = data.degree_spondylolisthesis,
mode = "lines",
name = "Degree Spondylolisthesis",
marker = dict(color = 'rgba(235, 144, 235, 0.8)')),row = 6, col = 1)

fig.update_xaxes(title_text="Patient Number", row=1, col=1)
fig.update_xaxes(title_text="Patient Number", row=2, col=1)
fig.update_xaxes(title_text="Patient Number", row=3, col=1)
fig.update_xaxes(title_text="Patient Number", row=4, col=1)
fig.update_xaxes(title_text="Patient Number", row=5, col=1)
fig.update_xaxes(title_text="Patient Number", row=6, col=1)

fig.update_yaxes(title_text="Pelvic Incidence", row=1, col=1)
fig.update_yaxes(title_text="Pelvic Tilt Numeric", row=2, col=1)
fig.update_yaxes(title_text="Lumbar Lordosis Angle", row=3, col=1)
fig.update_yaxes(title_text="Sacral Slope", row=4, col=1)
fig.update_yaxes(title_text="Pelvic Radius", row=5, col=1)
fig.update_yaxes(title_text="Degree Spondylolisthesis", row=6, col=1)

fig.update_layout(height = 1800, width = 1000, title = "Biomechanical Features of Patients",template = "plotly_white")

iplot(fig)

<a id="6"></a> <br>
## Histogram, Feature Values' Frequencies
[Back to introduction](#26)
* A histogram splits the values into bins and shows how many patients fall into each bin.
* Red bars are abnormal patients, green bars are normal patients.
* fig = The figure object; it holds the traces (data) and the layout
* x = Values to bin along the X-axis
* name = Trace name shown in the legend
* marker = Bar styling (color, outline, etc.)
* layout = A dictionary of figure-level attributes such as width, height, title, etc.
* showlegend = Whether the trace appears in the legend
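The binning behind a histogram can be sketched with NumPy's `np.histogram` on made-up values. Plotly picks its own bin edges, so its counts can differ from this example, but the principle is the same.

```python
import numpy as np

values = np.array([1.0, 2.0, 2.5, 3.0, 7.0, 9.5])  # made-up feature values

# Split [min, max] into 3 equal-width bins and count the values in each.
# Every bin is half-open except the last one, which also includes the maximum.
counts, edges = np.histogram(values, bins=3)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```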

In [None]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2,subplot_titles = ("Pelvic Incidence","Lumbar Lordosis Angle","Pelvic Tilt Numeric","Sacral Slope","Degree Spondylolisthesis","Pelvic Radius"))

fig.append_trace(go.Histogram(x = A.pelvic_incidence, name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 1, col = 1)
fig.append_trace(go.Histogram(x = N.pelvic_incidence, name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 1, col = 1)

fig.append_trace(go.Histogram(x = A.lumbar_lordosis_angle,name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 1, col = 2)
fig.append_trace(go.Histogram(x = N.lumbar_lordosis_angle,name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 1, col = 2)

fig.append_trace(go.Histogram(x = A["pelvic_tilt numeric"], name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 2, col = 1)
fig.append_trace(go.Histogram(x = N["pelvic_tilt numeric"], name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 2, col = 1)

fig.append_trace(go.Histogram(x = A.sacral_slope, name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 2, col = 2)
fig.append_trace(go.Histogram(x = N.sacral_slope, name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 2, col = 2)

fig.append_trace(go.Histogram(x = A.degree_spondylolisthesis, name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 3, col = 1)
fig.append_trace(go.Histogram(x = N.degree_spondylolisthesis, name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 3, col = 1)

fig.append_trace(go.Histogram(x = A.pelvic_radius, name = "Abnormal",showlegend = True,marker = dict(color = 'rgb(255, 100, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 3, col = 2)
fig.append_trace(go.Histogram(x = N.pelvic_radius, name = "Normal",showlegend = True,marker = dict(color = 'rgb(100, 240, 100)',line = dict(color = "rgb(0,0,0)",width = 1.2))),row = 3, col = 2)

fig.update_xaxes(title_text="Pelvic Incidence", row=1, col=1)
fig.update_xaxes(title_text="Lumbar Lordosis Angle", row=1, col=2)
fig.update_xaxes(title_text="Pelvic Tilt Numeric", row=2, col=1)
fig.update_xaxes(title_text="Sacral Slope", row=2, col=2)
fig.update_xaxes(title_text="Degree Spondylolisthesis", row=3, col=1)
fig.update_xaxes(title_text="Pelvic Radius", row=3, col=2)

fig.update_yaxes(title_text="Number of Patients", row=1, col=1)
fig.update_yaxes(title_text="Number of Patients", row=1, col=2)
fig.update_yaxes(title_text="Number of Patients", row=2, col=1)
fig.update_yaxes(title_text="Number of Patients", row=2, col=2)
fig.update_yaxes(title_text="Number of Patients", row=3, col=1)
fig.update_yaxes(title_text="Number of Patients", row=3, col=2)

fig.update_layout(height=1400, width=850, title_text="Biomechanical Features' Frequencies For Each Class",template = "plotly_white")

fig.show()


<a id="7"></a> <br>
## Swarmplot
[Back to introduction](#26)
* Swarmplot is a way to draw a categorical scatterplot with non-overlapping points.
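Before plotting, the swarmplot cell reshapes the table from wide format (one column per feature) to long format (one row per patient and feature) with `pd.melt`. Here is what that does on a tiny made-up frame:

```python
import pandas as pd

# Made-up, already-normalized values (not patient data)
wide = pd.DataFrame({
    "pelvic_incidence": [0.2, 0.7],
    "sacral_slope": [0.4, 0.9],
    "class": ["Abnormal", "Normal"],
})

# Keep "class" as an identifier; every other column becomes (Features, Values) rows
long = pd.melt(wide, id_vars="class", var_name="Features", value_name="Values")
print(long)
```

Seaborn can then put `Features` on the x-axis, `Values` on the y-axis and color by `class`.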

In [None]:
data_swrm_plt = data1.iloc[::-1] #reverse data
plt.style.use("default")
sns.set(style="whitegrid",palette = "muted")

data_swrm = pd.melt(data_swrm_plt,id_vars="class",
                    var_name="Features",
                    value_name='Values')
plt.figure(figsize = (13,8))
sns.swarmplot(x="Features", y="Values",hue="class", data=data_swrm)
plt.title("Swarmplot")
plt.show()

<a id="8"></a> <br>
## Barplot, Feature Value Means 
[Back to introduction](#26)
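* This plot shows the mean of each (normalized) feature for each class.
* The smaller chart in the top-right corner shows each feature's maximum value for each class.
* fig = The figure object; it holds the traces (data) and the layout
* x = X-axis values (the bar lengths)
* y = Y-axis values (the classes)
* orientation = 'h' draws horizontal bars
* name = Trace name shown in the legend
* marker = Bar styling (color, outline, etc.)
* text = Label drawn on each bar
* textposition = Where the label is placed ("outside" or "auto")
* textfont = Font of the label
* xaxis, yaxis = "x2" and "y2" place the maximum-value traces on the small inset axes
* layout = A dictionary of figure-level attributes such as width, height, title, etc.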
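The loop and the `listN_max` lines below compute a per-class mean and maximum for every feature. For a DataFrame with a `class` column like `data1`, `groupby` can produce the same statistics in one call. Here is a sketch on made-up values. Note that `groupby` sorts the class labels alphabetically, while `class_list` keeps their order of appearance.

```python
import pandas as pd

# Made-up, already-normalized values (not patient data)
demo = pd.DataFrame({
    "class": ["Abnormal", "Abnormal", "Normal", "Normal"],
    "pelvic_incidence": [0.5, 1.0, 0.0, 0.25],
    "pelvic_radius": [0.25, 0.75, 0.5, 1.0],
})

# One row per class, one (feature, statistic) column per pair
stats = demo.groupby("class").agg(["mean", "max"])
print(stats)
print(stats[("pelvic_incidence", "mean")].tolist())  # per-class means of one feature
```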

In [None]:
import plotly.graph_objs as go

class_list = list(data1['class'].unique())

list1_max = [np.max(np.array([A.pelvic_incidence])),np.max(np.array([N.pelvic_incidence]))]
list2_max = [np.max(np.array([A["pelvic_tilt numeric"]])),np.max(np.array([N["pelvic_tilt numeric"]]))]
list3_max = [np.max(np.array([A.lumbar_lordosis_angle])),np.max(np.array([N.lumbar_lordosis_angle]))]
list4_max = [np.max(np.array([A.sacral_slope])),np.max(np.array([N.sacral_slope]))]
list5_max = [np.max(np.array([A.pelvic_radius])),np.max(np.array([N.pelvic_radius]))]
list6_max = [np.max(np.array([A.degree_spondylolisthesis])),np.max(np.array([N.degree_spondylolisthesis]))]

pelvic_incidence = []
pelvic_tilt_numeric = []
lumbar_lordosis_angle = []
sacral_slope = []
pelvic_radius = []
degree_spondylolisthesis = []

for i in class_list:
    subset = data1[data1["class"] == i]  # rows of one class (a new name, so the normalized features in `x` are not overwritten)
    pelvic_incidence.append(sum(subset.pelvic_incidence)/len(subset)) 
    pelvic_tilt_numeric.append(sum(subset["pelvic_tilt numeric"])/len(subset))   
    lumbar_lordosis_angle.append(sum(subset.lumbar_lordosis_angle)/len(subset)) 
    sacral_slope.append(sum(subset.sacral_slope)/len(subset))
    pelvic_radius.append(sum(subset.pelvic_radius)/len(subset))
    degree_spondylolisthesis.append(sum(subset.degree_spondylolisthesis)/len(subset))  

#visualization

trace1 = go.Bar(
    x = pelvic_incidence,
    y = class_list,
    text = "Pelvic Incidence Mean",
    textposition = "outside",
    textfont = dict(size = 15),
    orientation='h',
    name = "Pelvic Incidence",
    marker = dict(color = "rgba(36,86,104,0.6)",
                 line = dict(color = "rgb(0,0,0)", width = 1.5))

)

trace1_m = go.Bar(
    x = list1_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Pelvic Incidence",
    textposition = "auto",
    orientation='h',
    name = "Max Pelvic Incidence",
    textfont = dict(color = "white"),
    marker = dict(color = "rgba(25,51,80,0.6)", 
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)

trace2 = go.Bar(
    x = pelvic_tilt_numeric,
    y = class_list,
    text = "Pelvic Tilt Numeric Mean",
    textposition = "outside",
    textfont = dict(size = 15),
    orientation='h',
    name = "Pelvic Tilt Numeric",
    marker = dict(color = "rgba(13,143,129,0.6)",
                 line = dict(color = "rgb(0,0,0)", width = 1.5))


)
trace2_m = go.Bar(
    x = list2_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Pelvic Tilt Numeric",
    textposition = "auto",
    orientation='h',
    name = "Max Pelvic Tilt Numeric",
    textfont = dict(color = "white"),
    marker = dict(color = "rgba(25,94,106,0.6)",
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)

trace3 = go.Bar(
    x = lumbar_lordosis_angle,
    y = class_list,
    text = "Lumbar Lordosis Angle Mean",
    textposition = "outside",
    orientation='h',
    textfont = dict(size = 15),
    name = "Lumbar Lordosis Angle",
    marker = dict(color = "rgba(57,171,126,0.6)", 
                 line = dict(color = "rgb(0,0,0)", width = 1.5))


)
trace3_m = go.Bar(
    x = list3_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Lumbar Lordosis Angle",
    textposition = "auto",
    textfont = dict(color = "white"),
    orientation='h',
    name = "Max Lumbar Lordosis Angle",
    marker = dict(color = "rgba(18,116,117,0.6)", 
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)

trace4 = go.Bar(
    x = sacral_slope,
    y = class_list,
    text = "Sacral Slope Mean",
    textposition = "outside",
    orientation='h',
    textfont = dict(size = 15),
    name = "Sacral Slope",
    marker = dict(color = "rgba(110, 196 ,116,0.6)", 
                 line = dict(color = "rgb(0,0,0)", width = 1.5))


)
trace4_m = go.Bar(
    x = list4_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Sacral Slope",
    textposition = "auto",
    orientation='h',
    textfont = dict(color = "white"),
    name = "Max Sacral Slope",
    marker = dict(color = "rgba(25,137,125,0.6)", 
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)

trace5 = go.Bar(
    x = pelvic_radius,
    y = class_list,
    text = "Pelvic Radius Mean",
    textposition = "outside",
    textfont = dict(size = 15),
    orientation='h',
    name = "Pelvic Radius",
    marker = dict(color = "rgba(15,114,121,0.6)", 
                 line = dict(color = "rgb(0,0,0)", width = 1.5))


)
trace5_m = go.Bar(
    x = list5_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Pelvic Radius",
    textposition = "auto",
    textfont = dict(color = "white"),
    orientation='h',
    name = "Max Pelvic Radius",
    marker = dict(color = "rgba(28,72,93,0.6)", 
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)

trace6 = go.Bar(
    x = degree_spondylolisthesis,
    y = class_list,
    text = "Deg. Spond. Mean",
    textposition = "outside",
    textfont = dict(size = 15),
    orientation='h',
    name = "Degree Spondylolisthesis",
    marker = dict(color = "rgba(169,220,103,0.6)", 
                 line = dict(color = "rgb(0,0,0)", width = 1.5))

)
trace6_m = go.Bar(
    x = list6_max,
    y = class_list,
    xaxis = "x2",
    yaxis = "y2",
    text = "Max Deg. Spond.",
    textposition = "auto",
    orientation='h',
    textfont = dict(color = "white"),
    name = "Max Degree Spondylolisthesis",
    marker = dict(color = "rgba(65,157,127,0.6)", 
                 line = dict(color = "rgba(0,0,0,1.0)", width = 1.5))

)
            
data_bar = [trace1_m,trace5_m,trace2_m,trace3_m,trace4_m,trace6_m,trace1,trace5,trace2,trace3,trace4,trace6]
layout = go.Layout(template = "plotly_white",height = 900, width = 1000, barmode = "group",xaxis2 = dict(domain=[0.64,0.99],anchor = "y2"),yaxis2 = dict(domain=[0.6,0.98],anchor="x2"),title = "Biomechanical Features For Each Classes") 

fig = go.Figure(data = data_bar, layout = layout)

fig.update_xaxes(title_text = "Mean Value")
fig.update_yaxes(title_text = "Class")
fig.add_annotation(
            x=1.19,
            y=1.8,
            showarrow = False,
            text="Max Values of Biomechanical Features")

iplot(fig)


<a id="9"></a> <br>
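## Boxplot, Biomechanical Features' Quartiles and Outliers For Each Class 
[Back to introduction](#26)
* A boxplot shows a distribution's quartiles (the box spans the first to the third quartile, with the median inside) and marks outliers as separate points.
* fig = The figure object; it holds the traces (data) and the layout
* y = Values whose distribution is summarized on the Y-axis
* name = Trace name, also used as the box's label on the X-axis
* layout = A dictionary of figure-level attributes such as width, height, title, etc.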
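A common convention for box plot outliers is Tukey's rule: with IQR = Q3 - Q1, points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR count as outliers. Below is a NumPy sketch of that rule on made-up numbers. `iqr_outliers` is a hypothetical helper, and libraries use slightly different quartile interpolation methods, so the exact fences Plotly draws may differ a little from `np.percentile`'s default.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence, outliers) using Tukey's k * IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles (linear interpolation)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = values[(values < lower) | (values > upper)]
    return q1, q3, lower, upper, outliers

q1, q3, lower, upper, outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(q1, q3, lower, upper, outliers)
```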

In [None]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=2,subplot_titles = ("Pelvic Incidence","Lumbar Lordosis Angle","Pelvic Tilt Numeric","Sacral Slope","Degree Spondylolisthesis","Pelvic Radius"))

fig.append_trace(go.Box(y = A.pelvic_incidence, name = "Abnormal",marker_color = 'rgb(255, 100, 100)'),row = 1, col = 1)
fig.append_trace(go.Box(y = N.pelvic_incidence,name = "Normal",marker_color = 'rgb(100, 240, 100)'),row = 1, col = 1)

fig.append_trace(go.Box(y = A.lumbar_lordosis_angle,name = "Abnormal",showlegend = True,marker_color = 'rgb(255, 100, 100)'),row = 1, col = 2)
fig.append_trace(go.Box(y = N.lumbar_lordosis_angle,name = "Normal",showlegend = True,marker_color = 'rgb(100, 240, 100)'),row = 1, col = 2)

fig.append_trace(go.Box(y = A["pelvic_tilt numeric"],name = "Abnormal",showlegend = True,marker_color = 'rgb(255, 100, 100)'),row = 2, col = 1)
fig.append_trace(go.Box(y = N["pelvic_tilt numeric"],name = "Normal",showlegend = True,marker_color = 'rgb(100, 240, 100)'),row = 2, col = 1)

fig.append_trace(go.Box(y = A.sacral_slope,name = "Abnormal",showlegend = True,marker_color = 'rgb(255, 100, 100)'),row = 2,col =2)
fig.append_trace(go.Box(y = N.sacral_slope,name = "Normal",showlegend = True,marker_color = 'rgb(100, 240, 100)'),row = 2,col =2)

fig.append_trace(go.Box(y = A.degree_spondylolisthesis,name = "Abnormal",showlegend = True,marker_color = 'rgb(255, 100, 100)'),row = 3,col = 1)
fig.append_trace(go.Box(y = N.degree_spondylolisthesis,name = "Normal",showlegend = True,marker_color = 'rgb(100, 240, 100)'),row = 3,col = 1)
 
fig.append_trace(go.Box(y = A.pelvic_radius,name = "Abnormal",showlegend = True,marker_color = 'rgb(255, 100, 100)'),row = 3 , col = 2)
fig.append_trace(go.Box(y = N.pelvic_radius,name = "Normal",showlegend = True,marker_color = 'rgb(100, 240, 100)'),row = 3, col = 2)

fig.update_xaxes(title_text="Class", row=1, col=1)
fig.update_xaxes(title_text="Class", row=1, col=2)
fig.update_xaxes(title_text="Class", row=2, col=1)
fig.update_xaxes(title_text="Class", row=2, col=2)
fig.update_xaxes(title_text="Class", row=3, col=1)
fig.update_xaxes(title_text="Class", row=3, col=2)

fig.update_yaxes(title_text="Pelvic Incidence", row=1, col=1)
fig.update_yaxes(title_text="Lumbar Lordosis Angle", row=1, col=2)
fig.update_yaxes(title_text="Pelvic Tilt Numeric", row=2, col=1)
fig.update_yaxes(title_text="Sacral Slope", row=2, col=2)
fig.update_yaxes(title_text="Degree Spondylolisthesis", row=3, col=1)
fig.update_yaxes(title_text="Pelvic Radius", row=3, col=2)

fig.update_layout(height=1000, width=800, title_text="Biomechanical Features' Quartiles and Outliers For Each Class",template = "plotly_white")

fig.show()

<a id="10"></a> <br>
## Scatterplot
[Back to introduction](#26)
* These plots show pairs of features against each other for every patient, colored by class.
* fig = The figure object; it holds the traces (data) and the layout
* x = X-axis values
* y = Y-axis values
* mode = Drawing mode ("markers" draws a scatterplot)
* name = Trace name shown in the legend
* marker = Marker styling (color, line, etc.)
* layout = A dictionary of figure-level attributes such as width, height, title, etc.

In [None]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=3, cols=1,subplot_titles = ("Pelvic Incidence-Lumbar Lordosis Angle","Pelvic Tilt Numeric-Sacral Slope","Degree Spondylolisthesis-Pelvic Radius"))

fig.append_trace(go.Scatter(
x = A.pelvic_incidence,
y = A.lumbar_lordosis_angle,
mode = "markers",
name = "Abnormal",
marker = dict(color = 'rgba(255, 100, 200, 0.9)') 
),row = 1, col = 1)

fig.append_trace(go.Scatter(
x = N.pelvic_incidence,
y = N.lumbar_lordosis_angle,
mode = "markers",
name = "Normal",
marker = dict(color = 'rgba(100, 100, 250, 0.9)')
),row = 1, col = 1 )
    
    
fig.append_trace(go.Scatter(
x = A["pelvic_tilt numeric"],
y = A.sacral_slope,
showlegend = True,
mode = "markers",
name = "Abnormal",
marker = dict(color = 'rgba(255, 100, 200, 0.9)')
),row = 2, col = 1)

fig.append_trace(go.Scatter(
x = N["pelvic_tilt numeric"],
y = N.sacral_slope,
showlegend = True,
mode = "markers",
name = "Normal",
marker = dict(color = 'rgba(100, 100, 250, 0.9)')
),row = 2, col = 1)  


fig.append_trace(go.Scatter(
x = A.degree_spondylolisthesis,
y = A.pelvic_radius,
showlegend = True,
mode = "markers",
name = "Abnormal",
marker = dict(color = 'rgba(255, 100, 200, 0.9)')
),row = 3, col = 1)

fig.append_trace(go.Scatter(
x = N.degree_spondylolisthesis,
y = N.pelvic_radius,
showlegend = True,
mode = "markers",
name = "Normal",
marker = dict(color = 'rgba(100, 100, 250, 0.9)')
),row = 3, col = 1)

fig.update_xaxes(title_text="Pelvic Incidence", row=1, col=1)
fig.update_xaxes(title_text="Pelvic Tilt Numeric", row=2, col=1)
fig.update_xaxes(title_text="Degree Spondylolisthesis", row=3, col=1)

fig.update_yaxes(title_text="Lumbar Lordosis Angle", row=1, col=1)
fig.update_yaxes(title_text="Sacral Slope", row=2, col=1)
fig.update_yaxes(title_text="Pelvic Radius", row=3, col=1)

fig.update_layout(height=1200, width=800, title_text="Patients' Classes According to Biomechanical Features",template = "plotly_white")

iplot(fig)

<a id="11"></a> <br>
## Correlation Heatmap
[Back to introduction](#26)
* A heatmap color-codes the values of a matrix; here the matrix holds the correlation between every pair of features.
* There are several ways to measure correlation; this chart uses **Pearson's r**.
* Here is how to calculate it: <img src = "https://wikimedia.org/api/rest_v1/media/math/render/svg/2b9c2079a3ffc1aacd36201ea0a3fb2460dc226f" height = "500" width = "500"> 
* Example scatterplots and their r-values: <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Correlation_examples2.svg/600px-Correlation_examples2.svg.png" height = "500" width = "500"> 
* annot = Write the correlation value in each cell
* fmt = Format of the annotations ('.3f' means three decimal places)
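Pearson's r is the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. Here is a direct NumPy translation of that formula on made-up numbers, compared with NumPy's built-in `np.corrcoef`. `pearson_r` is a hypothetical helper name.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: summed co-deviations over the root of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(pearson_r(x, y))          # manual formula
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in, for comparison
```

`DataFrame.corr()` uses the same Pearson definition by default, applied to every pair of columns.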

In [None]:
import seaborn as sns

# "class" still holds text labels at this point; drop it so .corr() only sees numeric columns
corr_matrix = data1.drop(["class"], axis = 1).corr()

f,ax = plt.subplots(figsize=(12, 12))
sns.heatmap(corr_matrix, annot=True,annot_kws = {"size": 12}, linewidths=0.5, fmt = '.3f', ax=ax)
plt.xticks(rotation = 25)
plt.title("Correlation Between Biomechanical Features", fontsize = 20)
plt.show()

<a id="12"></a> <br>
## Correlated Features
[Back to introduction](#26)
* This network graph connects the feature pairs whose Pearson correlation is at or above a threshold (0.59 here); features without such a partner do not appear.
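The cell below keeps both (A, B) and (B, A) rows and relies on networkx treating the graph as undirected. The sketch here takes a different approach from the notebook: it lists each pair only once by iterating over `itertools.combinations` of the columns. `correlated_pairs` is a hypothetical helper, and the data is made up.

```python
from itertools import combinations

import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return each pair of columns whose Pearson correlation is >= threshold, listed once."""
    corr = df.corr()
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each unordered pair once, no self-pairs
        if corr.loc[a, b] >= threshold
    ]
    return pd.DataFrame(rows, columns=["var1", "var2", "value"])

# Made-up data: "b" is exactly 2 * "a"; "c" is only weakly (negatively) related to both
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
})
result = correlated_pairs(demo, threshold=0.59)
print(result)
```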

In [None]:
import networkx as nx
corr = data1.iloc[:,0:6].corr()

links = corr.stack().reset_index()

links.columns = ['var1', 'var2','value']

# correlation
threshold = 0.59

# Keep only correlation over a threshold and remove self correlation (cor(A,A)=1)
links_filtered=links.loc[ (links['value'] >= threshold ) & (links['var1'] != links['var2']) ]
 
# Build your graph
G=nx.from_pandas_edgelist(links_filtered, 'var1', 'var2', edge_attr=True) 

plt.figure(figsize = (10,6))
nx.draw_circular(G, with_labels=True, node_color='orange', node_size=300, edge_color='red', linewidths=1, font_size=10)
plt.title("Correlated Features(Threshold:{})".format(threshold))
plt.text(-0.85,-0.88,"*Pelvic Radius is not correlated above the threshold with any other feature" ,fontsize = 12, color = "black")
plt.show()

<a id="13"></a> <br>
## 3D Scatterplots
[Back to introduction](#26)
* fig = The figure object; it holds the traces (data) and the layout
* x = X-axis values
* y = Y-axis values
* z = Z-axis values
* mode = Drawing mode ("markers" draws a scatterplot)
* name = Trace name shown in the legend
* marker = Marker styling (size, color, etc.)
* layout = A dictionary of figure-level attributes such as width, margins, axis titles, etc.

In [None]:
data1_pps = data1.drop(["lumbar_lordosis_angle", "pelvic_radius", "degree_spondylolisthesis","class"], axis = 1)
data1_ppl = data1.drop(["pelvic_radius","sacral_slope","degree_spondylolisthesis","class"], axis = 1)
data1_dlp =  data1.drop(["sacral_slope","pelvic_tilt numeric","pelvic_incidence","class"], axis = 1)

In [None]:
# 3D scatterplot 1: Pelvic Incidence, Pelvic Tilt Numeric, Sacral Slope
trace1 = go.Scatter3d(
                      x = A.pelvic_incidence,
                      y = A["pelvic_tilt numeric"],
                      z = A.sacral_slope,
                      mode = "markers",
                      name = "Abnormal",
                      marker = dict(
                           size = 4,
                           color = "rgb(255,100,100)"
                      )
)

trace2 = go.Scatter3d(
                      x = N.pelvic_incidence,
                      y = N["pelvic_tilt numeric"],
                      z = N.sacral_slope,
                      mode = "markers",
                      name = "Normal",
                      marker = dict(
                           size = 4,
                           color = "rgb(100,250,100)"
                      )
)
combine = [trace1,trace2]
layout = go.Layout(template = "plotly_white",
    scene = dict(
    xaxis =dict(
        title = "Pelvic Incidence"),
    yaxis =dict(
        title ="Pelvic Tilt Numeric"),
    zaxis =dict(
        title = "Sacral Slope"),),
    width = 800,
    margin = dict(l = 10,r = 10,b = 10,t = 10 )
    )
fig = go.Figure(data = combine, layout = layout)
iplot(fig)

#heatmap
f, ax = plt.subplots(figsize = (6,5))
sns.heatmap(data1_pps.corr(),annot = True,annot_kws = {"size": 15}, linewidths = 0.8,cmap ="Blues", linecolor = "black", fmt = ".2f",ax=ax).set(title = "Pelvic Incidence-Pelvic Tilt Numeric-Sacral Slope")
plt.xticks(rotation = 45)
plt.yticks(rotation = 0)
plt.show()

# 3D scatterplot 2: Pelvic Incidence, Pelvic Tilt Numeric, Lumbar Lordosis Angle
trace1 = go.Scatter3d(
                      x = A.pelvic_incidence,
                      y = A["pelvic_tilt numeric"],
                      z = A.lumbar_lordosis_angle,
                      mode = "markers",
                      name = "Abnormal",
                      marker = dict(
                           size = 4,
                           color = "rgb(255,100,100)"
                      )
)

trace2 = go.Scatter3d(
                      x = N.pelvic_incidence,
                      y = N["pelvic_tilt numeric"],
                      z = N.lumbar_lordosis_angle,
                      mode = "markers",
                      name = "Normal",
                      marker = dict(
                           size = 4,
                           color = "rgb(100,250,100)"
                      )
)
combine = [trace1,trace2]
layout = go.Layout(template = "plotly_white",
    scene = dict(
    xaxis =dict(
        title = "Pelvic Incidence"),
    yaxis =dict(
        title ="Pelvic Tilt Numeric"),
    zaxis =dict(
        title = "Lumbar Lordosis Angle"),),
    width = 800,
    margin = dict(l = 10,r = 10,b = 10,t = 10 )
    )
fig = go.Figure(data = combine, layout = layout)
iplot(fig)

#heatmap
f, ax = plt.subplots(figsize = (6,5))

sns.heatmap(data1_ppl.corr(),annot = True,annot_kws = {"size": 15}, linewidths = 0.8,cmap ="Blues", linecolor = "black", fmt = ".2f",ax=ax).set(title = "Pelvic Incidence-Pelvic Tilt Numeric-Lumbar Lordosis Angle")
plt.xticks(rotation = 45)
plt.yticks(rotation = 0)
plt.show()

# 3D scatterplot 3: Degree Spondylolisthesis, Lumbar Lordosis Angle, Pelvic Radius
trace1 = go.Scatter3d(
                      x = A.degree_spondylolisthesis,
                      y = A.lumbar_lordosis_angle,
                      z = A.pelvic_radius,
                      mode = "markers",
                      name = "Abnormal",
                      marker = dict(
                           size = 4,
                           color = "rgb(255,100,100)"
                      )
)

trace2 = go.Scatter3d(
                      x = N.degree_spondylolisthesis,
                      y = N.lumbar_lordosis_angle,
                      z = N.pelvic_radius,
                      mode = "markers",
                      name = "Normal",
                      marker = dict(
                           size = 4,
                           color = "rgb(100,250,100)"
                      )
)
combine = [trace1,trace2]
layout = go.Layout(template = "plotly_white",
    scene = dict(
    xaxis =dict(
        title = "Degree Spondylolisthesis"),
    yaxis =dict(
        title ="Lumbar Lordosis Angle"),
    zaxis =dict(
        title = "Pelvic Radius"),),
    width = 800,
    margin = dict(l = 10,r = 10,b = 10,t = 10 )
    )
fig = go.Figure(data = combine, layout = layout)
iplot(fig)

#heatmap
f, ax = plt.subplots(figsize = (6,5))

sns.heatmap(data1_dlp.corr(),annot = True,annot_kws = {"size": 15}, linewidths = 0.8,cmap ="Blues", linecolor = "black", fmt = ".2f",ax=ax).set(title = "Lumbar Lordosis Angle-Pelvic Radius-Degree Spondylolisthesis")
plt.xticks(rotation = 45)
plt.yticks(rotation = 0)
plt.show()
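
The heatmaps above show Pearson correlation coefficients (`DataFrame.corr()` uses `method="pearson"` by default). As a minimal sketch, the coefficient for two columns is the sum of the products of the centered values divided by the product of their norms. The helper name `pearson_r` is hypothetical and the numbers are made up, not taken from the dataset:

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: centered cross-products over the product of the centered norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    return float((a_centered * b_centered).sum()
                 / np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum()))

# made-up values: b_toy is an exact linear function of a_toy
a_toy = [1.0, 2.0, 3.0, 4.0]
b_toy = [3.0, 5.0, 7.0, 9.0]
print(pearson_r(a_toy, b_toy))            # → 1.0
print(np.corrcoef(a_toy, b_toy)[0, 1])    # NumPy's built-in agrees up to rounding
```

A value near +1 or -1 means one feature is close to a linear function of the other; values near 0 only rule out a linear relationship, not every kind of dependence.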


<a id="14"></a> <br>
# Machine Learning(ML)
[Back to introduction](#26)

Thanks to: https://www.kaggle.com/vbmokin/biomechanical-features-20-popular-models

In [None]:
data1["class"] = [1 if each == "Abnormal" else 0 for each in data["class"]]

y_data = data1["class"].values
x_data = data1.drop(["class"],axis = 1)

data1.head(214) # abnormal patients occupy rows 0-209; normal patients start at row 210

<a id="15"></a> <br>
## Linear Regression
[Back to introduction](#26)

Thanks to: https://www.kaggle.com/kanncaa1/machine-learning-tutorial-for-beginners


In [None]:
plt.style.use('default')

x = np.array(A.loc[:,'pelvic_incidence']).reshape(-1,1)
y = np.array(A.loc[:,'sacral_slope']).reshape(-1,1)

fig, axs = plt.subplots(3, 1, figsize=(8, 24)) # y axes not shared: panel 3 shows pelvic radius, not sacral slope
# Scatter1
axs[0].scatter(x=x,y=y)
axs[0].set_xlabel('Pelvic Incidence')
axs[0].set_ylabel('Sacral Slope')

x1 = np.array(A.loc[:,'lumbar_lordosis_angle']).reshape(-1,1)
y1 = np.array(A.loc[:,'sacral_slope']).reshape(-1,1)
# Scatter2
axs[1].scatter(x=x1,y=y1)
axs[1].set_xlabel('Lumbar Lordosis Angle')
axs[1].set_ylabel('Sacral Slope')

x2 = np.array(A.loc[:,'degree_spondylolisthesis']).reshape(-1,1)
y2 = np.array(A.loc[:,'pelvic_radius']).reshape(-1,1)
#scatter3
axs[2].scatter(x = x2,y=y2)
axs[2].set_xlabel("Degree Spondylolisthesis")
axs[2].set_ylabel("Pelvic Radius")
plt.show()


In [None]:
from sklearn.linear_model import LinearRegression

x = np.array(A.loc[:,'pelvic_incidence']).reshape(-1,1)
y = np.array(A.loc[:,'sacral_slope']).reshape(-1,1)

fig, axs = plt.subplots(3, 1, figsize=(10, 18)) # y axes not shared: panel 3 has a different target (pelvic radius)
#model1
reg = LinearRegression()
# Predict space
predict_space = np.linspace(min(x), max(x)).reshape(-1,1)
# Fit
reg.fit(x,y)
# Predict
predicted = reg.predict(predict_space)
# R^2 
print('R^2 score 1: ',reg.score(x, y))
# Plot regression line and scatter
axs[0].plot(predict_space, predicted, color='black', linewidth=3,label = "LR Prediction")
axs[0].scatter(x=x,y=y,label = "Data")
axs[0].legend()
axs[0].set_xlabel('Pelvic Incidence') 
axs[0].set_ylabel('Sacral Slope')
# text position is given as fractions of the axes (transAxes), not in data units
axs[0].text(0.79,0.7 ,"LR Prediction" ,fontsize = 13, rotation = 19, color = "red", transform = axs[0].transAxes)
axs[0].grid(True, alpha = 0.5)
axs[0].set_title("Somewhat Suitable")

#model2
reg1 = LinearRegression()
# Predict space
predict_space1 = np.linspace(min(x1), max(x1)).reshape(-1,1)
# Fit
reg1.fit(x1,y1)
# Predict over this model's own x range (predict_space1)
predicted1 = reg1.predict(predict_space1)
# R^2 
print('R^2 score 2: ',reg1.score(x1, y1))
# Plot regression line and scatter
axs[1].plot(predict_space1, predicted1, color='black', linewidth=3,label = "LR Prediction")
axs[1].scatter(x=x1,y=y1,label = "Data")
axs[1].legend()
axs[1].text(0.8,0.58 ,"LR Prediction" ,fontsize = 13, rotation = 12, color = "red", transform = axs[1].transAxes)
axs[1].set_xlabel('Lumbar Lordosis Angle') 
axs[1].set_ylabel('Sacral Slope')
axs[1].grid(True, alpha = 0.5)
axs[1].set_title("Not That Suitable")


#model3
reg2 = LinearRegression()
# Predict space
predict_space2 = np.linspace(min(x2), max(x2)).reshape(-1,1)
# Fit
reg2.fit(x2,y2)
# Predict over this model's own x range (predict_space2)
predicted2 = reg2.predict(predict_space2)
# R^2 
print('R^2 score 3: ',reg2.score(x2, y2))
# Plot regression line and scatter
axs[2].plot(predict_space2, predicted2, color='black', linewidth=3,label = "LR Prediction")
axs[2].scatter(x=x2,y=y2,label = "Data")
axs[2].legend()
axs[2].text(0.8,0.72 ,"LR Prediction" ,fontsize = 13, rotation = 7.3, color = "red", transform = axs[2].transAxes)
axs[2].set_xlabel('Degree Spondylolisthesis') 
axs[2].set_ylabel('Pelvic Radius')
axs[2].grid(True, alpha = 0.5)
axs[2].set_title("Not Suitable")
plt.show()
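
`reg.score` returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The minimal NumPy sketch below uses made-up numbers, not the dataset. It fits the same one-feature least-squares line with `np.polyfit` and computes $R^2$ by hand. With a single feature and an intercept, $R^2$ equals the squared Pearson correlation, so a weakly correlated pair necessarily gets a low score:

```python
import numpy as np

# made-up toy data: roughly linear with noise
rng = np.random.default_rng(0)
x_toy = np.linspace(30, 120, 50)
y_toy = 0.6 * x_toy + 5 + rng.normal(0, 8, size=x_toy.size)

# least-squares line y ≈ slope * x + intercept (the model LinearRegression fits for one feature)
slope, intercept = np.polyfit(x_toy, y_toy, deg=1)
y_hat = slope * x_toy + intercept

ss_res = np.sum((y_toy - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_toy - y_toy.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x_toy, y_toy)[0, 1]
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")   # the two agree for simple linear regression
```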

<a id="16"></a> <br>
## Outlier Detection
[Back to introduction](#26)

In [None]:
from sklearn.neighbors import LocalOutlierFactor
data_ml = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")

y = data_ml["class"] #data
x = data_ml.drop(["class"],axis = 1)

clf = LocalOutlierFactor()
y_pred = clf.fit_predict(x)
X_score = clf.negative_outlier_factor_ # this is -LOF: more negative means more outlying

outlier_score = pd.DataFrame()
outlier_score["score"] = X_score
outlier_score.head()

threshold = -2.0 # flag points whose LOF is above 2
filter1 = outlier_score["score"] < threshold
outlier_index = outlier_score[filter1].index.tolist()

x_all = x # keep every row for the plot: its positions line up with X_score and outlier_index
x = x.drop(outlier_index)
y = y.drop(outlier_index).values
x.info()
x_len = len(x)

plt.figure(figsize = (13,8))
plt.scatter(x_all.iloc[:,0],x_all.iloc[:,1],color = "k",s = 6,label = "Data Points")
plt.scatter(x_all.iloc[outlier_index,0],x_all.iloc[outlier_index,1],color = "red",s = 30, label = "Outlier")

radius = (X_score.max() - X_score)/(X_score.max() - X_score.min())
outlier_score["radius"] = radius
plt.scatter(x_all.iloc[:,0], x_all.iloc[:,1], s = 1000*radius, edgecolor = "b", facecolors = "none", label = "Outlier Score")
plt.legend()
plt.xlabel("Pelvic Incidence")
plt.ylabel("Pelvic Tilt Numeric")
plt.grid(True,alpha = 0.4)
plt.text(93,-6 ,"Number of Outliers:"+str(len(data_ml) - x_len) ,fontsize = 18,color = "black")
plt.title("Outlier Detection Plot")
plt.show() 
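
`LocalOutlierFactor` compares each point's local density with the densities of its neighbors. `negative_outlier_factor_` is the negated LOF, so the threshold of -2.0 above flags points whose LOF exceeds 2, while inliers sit near 1. The sketch below is a minimal brute-force NumPy version of the textbook LOF definition (k-distance, reachability distance, local reachability density). It assumes Euclidean distance and distinct points and runs on a made-up grid. It is not scikit-learn's implementation, which defaults to `n_neighbors=20` and may use tree-based neighbor search:

```python
import numpy as np

def lof_scores(X, k=5):
    """Brute-force Local Outlier Factor for a small (n, d) array X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # pairwise Euclidean distances; a point is not its own neighbor
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    neighbors = np.argsort(dist, axis=1)[:, :k]           # k nearest neighbors of each point
    k_distance = dist[np.arange(n), neighbors[:, -1]]      # distance to the k-th neighbor

    # reachability distance of p from neighbor o: max(k_distance(o), d(p, o))
    reach = np.maximum(k_distance[neighbors], dist[np.arange(n)[:, None], neighbors])
    lrd = 1.0 / reach.mean(axis=1)                         # local reachability density

    # LOF: mean ratio of the neighbors' density to the point's own density
    return lrd[neighbors].mean(axis=1) / lrd

# made-up data: a tight 5x5 grid of inliers plus one far-away point
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X_toy = np.vstack([grid, [[20.0, 20.0]]])
lof_toy = lof_scores(X_toy, k=5)
print(lof_toy[-1] > 2.0, lof_toy[:-1].max() < 2.0)  # → True True
```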



<a id="17"></a> <br>
## Logistic Regression
[Back to introduction](#26)

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

#data_ml = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")
#x,y = data_ml.loc[:,data_ml.columns != 'class'], data_ml.loc[:,'class']

test_sizes = [i/10 for i in range(1,10)] # 0.1, 0.2, ..., 0.9
score_list_lr = []
train_list = []
for test_size in test_sizes:

    x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = test_size, random_state = 40)

    lr = LogisticRegression()
    lr.fit(x_train,y_train) 
    print("test accuracy:{}/Test Size:{}".format(lr.score(x_test,y_test),test_size))
    score_list_lr.append(lr.score(x_test,y_test))
    train_list.append(lr.score(x_train,y_train))

plt.figure(figsize = (13,8))
plt.plot(test_sizes,score_list_lr,label = "Test Accuracy")
plt.plot(test_sizes,train_list, label = "Train Accuracy")
plt.legend()
plt.xlabel("Test Size (fraction of the data)")
plt.ylabel("Accuracy")
plt.title("Scores for Each Test Size")
plt.grid(True, alpha = 0.4)
plt.show()
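
Logistic regression models the probability of the positive class as the sigmoid of a linear score, $p = \sigma(w^\top x + b)$, and predicts class 1 when $p \ge 0.5$. scikit-learn's `LogisticRegression` fits this with the `lbfgs` solver and L2 regularization by default. That solver is not reproduced here. Instead, this minimal sketch uses plain, unregularized gradient descent on the mean log loss to show the mechanics. The data is made up and the function names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, step=0.1, n_iter=2000):
    """Plain gradient descent on the mean binary cross-entropy (no regularization)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)              # predicted probability of class 1
        grad_w = X.T @ (p - y) / len(y)     # gradient of the mean log loss w.r.t. w
        grad_b = np.mean(p - y)             # ... and w.r.t. the intercept
        w -= step * grad_w
        b -= step * grad_b
    return w, b

def predict_logistic(X, w, b):
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= 0.5).astype(int)

# made-up data: class 1 is shifted up and to the right of class 0
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
                   rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y_toy = np.array([0] * 100 + [1] * 100)

w_fit, b_fit = fit_logistic(X_toy, y_toy)
print("training accuracy:", np.mean(predict_logistic(X_toy, w_fit, b_fit) == y_toy))
```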

## Confusion Matrix Logistic Regression
[Back to introduction](#26)

In [None]:
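# re-split with the test size that gave the best test accuracy above
best_test_size = (1+score_list_lr.index(np.max(score_list_lr)))/10
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = best_test_size, random_state = 40)

# refit on this split: after the loop, lr was still the model trained on the last (test_size = 0.9) split
lr = LogisticRegression()
lr.fit(x_train,y_train)
y_pred = lr.predict(x_test)
y_true = y_test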

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix Logistic Regression")
plt.ylabel("Y True")
plt.show()
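
In a screening setting, a single accuracy number hides which kind of mistake a model makes. `confusion_matrix` puts true labels on the rows and predicted labels on the columns, both in sorted order. For the string labels used in the logistic regression cells, index 0 is therefore "Abnormal". From these cells you can read sensitivity (recall of the positive class), specificity and precision. This minimal sketch treats 1 = Abnormal as the positive class, as in the `data1` encoding. The helper name `binary_metrics` and the toy labels are hypothetical:

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts turned into the usual binary rates."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # share of abnormal patients that were caught
        "specificity": tn / (tn + fp),   # share of normal patients correctly cleared
        "precision": tp / (tp + fp),     # share of "abnormal" predictions that were right
    }

# toy labels (1 = Abnormal, 0 = Normal): one abnormal patient is missed
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true_toy, y_pred_toy))
```

Here accuracy is 0.875 but sensitivity is only 0.75. For a diagnostic task, that gap is usually the more informative number.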


<a id="18"></a> <br>
## KNN
[Back to introduction](#26)

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.3, random_state = 42)

#x_train, x_test, y_train, y_test = train_test_split(x_data,y_data,test_size = 0.3, random_state = 42)

In [None]:
#KNN model

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 6)
knn.fit(x_train,y_train)
prediction = knn.predict(x_test)

prediction

In [None]:
#score
print("{} nn score: {}".format(6,knn.score(x_test,y_test)))

In [None]:
score_list_knn = []
train_list = []
for each in range(1,25):
    knn2 = KNeighborsClassifier(n_neighbors = each)
    knn2.fit(x_train,y_train)
    score_list_knn.append(knn2.score(x_test,y_test))
    train_list.append(knn2.score(x_train,y_train))

plt.figure(figsize=[12,8])
plt.plot(range(1,25),score_list_knn, label = "Test Accuracy")
plt.plot(range(1,25),train_list,c = "orange", label = "Train Accuracy")
plt.legend()
plt.xlabel("K Values")
plt.ylabel("Accuracy")
plt.title("Scores for Each K Value")
plt.grid(True , alpha = 0.4)
plt.show()

print("Best Accuracy(test):{}/Neighbors:{}".format(np.max(score_list_knn),1+score_list_knn.index(np.max(score_list_knn))))
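
`KNeighborsClassifier` defaults to the Minkowski metric with `p=2`, which is Euclidean distance. A feature measured on a larger numeric scale therefore dominates the vote, and the features are used unscaled above. In scikit-learn, the usual remedy is `StandardScaler` inside a pipeline. This minimal NumPy sketch shows the effect with a hand-written KNN and z-scoring fitted on the training split only. The function names and the deterministic toy data are hypothetical:

```python
import numpy as np

def standardize(train, test):
    """Z-score both splits with the mean and standard deviation of the training split."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (test - mean) / std

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1))
    nearest = np.argsort(dists, axis=1)[:, :k]            # (n_query, k) row indices
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

# made-up data: feature 0 carries the label on a 0-1 scale,
# feature 1 is unrelated to the label but spans 0-5900
idx = np.arange(60)
labels_toy = (idx >= 30).astype(int)
X_toy = np.column_stack([labels_toy.astype(float), 100.0 * ((idx * 37) % 60)])
train_idx, test_idx = idx[::2], idx[1::2]

acc_raw = np.mean(knn_predict(X_toy[train_idx], labels_toy[train_idx], X_toy[test_idx])
                  == labels_toy[test_idx])
X_tr, X_te = standardize(X_toy[train_idx], X_toy[test_idx])
acc_scaled = np.mean(knn_predict(X_tr, labels_toy[train_idx], X_te) == labels_toy[test_idx])
print(f"accuracy without scaling: {acc_raw:.2f}, with scaling: {acc_scaled:.2f}")
```

The scaler's mean and standard deviation come from the training rows only, so no information from the test rows leaks into the model.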


## Confusion Matrix KNN
[Back to introduction](#26)

In [None]:
knn3 = KNeighborsClassifier(n_neighbors = 1+score_list_knn.index(np.max(score_list_knn)))
knn3.fit(x_train,y_train)
y_pred = knn3.predict(x_test)
y_true = y_test

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix KNN")
plt.ylabel("Y True")
plt.show()

<a id="19"></a> <br>
## SVM Model
[Back to introduction](#26)

In [None]:
from sklearn.svm import SVC
# reload the full data (outliers included); these x, y are not used by the SVM below, but the Naive Bayes cell uses them
data_ml = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")
x,y = data_ml.loc[:,data_ml.columns != 'class'], data_ml.loc[:,'class']

# the SVM is trained on x_data, y_data (class encoded as 1 = Abnormal, 0 = Normal)
x_train, x_test, y_train, y_test = train_test_split(x_data,y_data,test_size = 0.2 , random_state = 1)

svm = SVC(random_state = 42)
svm.fit(x_train,y_train)
svm_score = svm.score(x_test,y_test)
print("Accuracy of SVM Algorithm: ",svm_score)

## Confusion Matrix SVM
[Back to introduction](#26)

In [None]:
y_pred = svm.predict(x_test)
y_true = y_test

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix SVM")
plt.ylabel("Y True")
plt.show()
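
`SVC()` uses an RBF kernel by default, $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. With the default `gamma="scale"`, scikit-learn sets $\gamma = 1 / (n_{features} \cdot \operatorname{Var}(X))$, where the variance is taken over all entries of the training matrix. This minimal NumPy sketch computes that kernel on made-up points. The helper names are hypothetical; scikit-learn also ships `sklearn.metrics.pairwise.rbf_kernel`:

```python
import numpy as np

def scale_gamma(X):
    """gamma used by SVC(gamma="scale"): 1 / (n_features * variance of all entries of X)."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())

def rbf_kernel_matrix(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# made-up points with two features
X_toy = np.array([[40.0, 10.0], [60.0, 20.0], [80.0, 35.0], [50.0, 15.0]])
gamma = scale_gamma(X_toy)
K = rbf_kernel_matrix(X_toy, X_toy, gamma)
print(round(gamma, 6))
print(np.round(K, 3))
```

All features share one $\gamma$, so a feature with a much larger spread dominates the squared distance. This is one reason SVMs are usually trained on standardized features.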

<a id="20"></a> <br> 
## Naive Bayes
[Back to introduction](#26)

In [None]:
from sklearn.naive_bayes import GaussianNB

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.3 , random_state = 1) #x,y

nb = GaussianNB()
nb.fit(x_train,y_train)
nb_score = nb.score(x_test,y_test)
print("Accuracy of NB Algorithm: ",nb_score)
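
`GaussianNB` assumes that, within each class, every feature is normally distributed and independent of the other features. It predicts the class with the largest log prior plus summed per-feature log likelihoods. The minimal NumPy sketch below implements that rule on made-up data. Like scikit-learn's `var_smoothing`, it adds a small fraction of the largest feature variance to every variance. The function names are hypothetical:

```python
import numpy as np

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Per-class priors, feature means and feature variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    eps = var_smoothing * X.var(axis=0).max()     # keeps every variance strictly positive
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + eps for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    classes, priors, means, variances = model
    X = np.asarray(X, dtype=float)
    # log N(x | mean, var) summed over features, for every (sample, class) pair
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]).sum(axis=-1)
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]

# made-up data with string labels, like the "class" column
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
                   rng.normal([4.0, 4.0], 1.0, size=(50, 2))])
y_toy = np.array(["Normal"] * 50 + ["Abnormal"] * 50)
nb_toy = fit_gaussian_nb(X_toy, y_toy)
print("training accuracy:", np.mean(predict_gaussian_nb(nb_toy, X_toy) == y_toy))
```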


## Confusion Matrix Naive Bayes
[Back to introduction](#26)

In [None]:
y_pred = nb.predict(x_test)
y_true = y_test

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix GaussianNB")
plt.ylabel("Y True")
plt.show()

<a id="21"></a> <br>
## Decision Tree
[Back to introduction](#26)

In [None]:
from sklearn.tree import DecisionTreeClassifier

x_train, x_test, y_train, y_test = train_test_split(x_data,y_data,test_size = 0.3 , random_state = 1) #x,y

score_list_dt = []
train_list = []
for d in range(1,10):
    clf = DecisionTreeClassifier(max_depth = d,random_state = 42)
    clf = clf.fit(x_train, y_train)
    score_list_dt.append(clf.score(x_test,y_test))
    train_list.append(clf.score(x_train,y_train))
    
plt.figure(figsize = (13,8))
plt.plot(range(1,10),score_list_dt,label = "Test Accuracy")
plt.plot(range(1,10),train_list,label = "Train Accuracy")
plt.legend()
plt.xlabel("Max Depth")
plt.ylabel("Accuracy")
plt.grid(True, alpha = 0.5)
plt.title("Accuracies for Each Max Depth Value")
plt.show()
print("Best Accuracy:{}/Max Depth:{}".format(np.max(score_list_dt),1+score_list_dt.index(np.max(score_list_dt))))

## Confusion Matrix Decision Tree
[Back to introduction](#26)

In [None]:
clf2 = DecisionTreeClassifier(max_depth = 1+score_list_dt.index(np.max(score_list_dt)),random_state = 42)
clf2 = clf2.fit(x_train, y_train)
y_pred = clf2.predict(x_test)
y_true = y_test

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix Decision Tree")
plt.ylabel("Y True")
plt.show()
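
At each node, `DecisionTreeClassifier` picks the feature and threshold that most reduce impurity. The default criterion is Gini impurity, $G = 1 - \sum_k p_k^2$. This simplified sketch searches a single feature, using midpoints between neighboring sorted values as candidate thresholds, and keeps the split with the lowest size-weighted child impurity. The function names and numbers are made up:

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels):
    """Threshold on one feature that minimises the size-weighted Gini impurity of the two children."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    values = np.unique(feature)
    best_threshold, best_impurity = None, gini(labels)   # a split has to beat the parent node
    for threshold in (values[:-1] + values[1:]) / 2:     # midpoints between neighboring values
        left, right = labels[feature <= threshold], labels[feature > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# made-up feature values with labels 1 = Abnormal, 0 = Normal
feature_toy = [-2.0, 1.0, 3.0, 5.0, 30.0, 45.0, 60.0]
labels_toy = [0, 0, 0, 0, 1, 1, 1]
print(gini(labels_toy))                      # impurity of the parent node (24/49)
print(best_split(feature_toy, labels_toy))   # threshold 17.5 separates the classes perfectly
```

`max_depth` limits how many times this search is repeated along any path, which is why the train accuracy above keeps rising with depth while the test accuracy does not.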

<a id="22"></a> <br>
## Random Forest
[Back to introduction](#26)

In [None]:
from sklearn.ensemble import RandomForestClassifier

x_train, x_test, y_train, y_test = train_test_split(x_data,y_data,test_size = 0.3 , random_state = 1) #x,y

num_list = [20,40,60,80,100,120,140,160]
score_list_rf = []
train_list_rf = []
for i in num_list:
    rf = RandomForestClassifier(n_estimators = i, random_state = 42) #100
    rf.fit(x_train,y_train)

    print("=====Number of Trees:"+str(i)+"=====")
    print("Random Forest Algorithm Score: ",rf.score(x_test,y_test))
    print("Random Forest Algorithm Train Score: ",rf.score(x_train,y_train))
    score_list_rf.append(rf.score(x_test,y_test))
    train_list_rf.append(rf.score(x_train,y_train))

plt.figure(figsize = (13,8))
plt.plot(num_list,score_list_rf,label = "Test Accuracy")
plt.plot(num_list,train_list_rf,label = "Train Accuracy")
plt.legend()
plt.xlabel("N Estimators")
plt.ylabel("Accuracy")
plt.title("Effect of N Estimators on Accuracy")
plt.grid(True, alpha=0.4)
plt.show()    
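
Each tree in a random forest is trained on a bootstrap sample of the rows, drawn with replacement (`bootstrap=True` by default), and considers a random subset of features at each split. scikit-learn combines the trees by averaging their class probabilities rather than by a hard vote. A bootstrap sample of size $n$ leaves out any given row with probability $(1 - 1/n)^n \approx e^{-1} \approx 0.368$, and these left-out rows make out-of-bag estimates (`oob_score=True`) possible. A quick NumPy check of that fraction, with a made-up row count:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                              # number of training rows (made up)
bootstrap_idx = rng.integers(0, n, size=n)              # one bootstrap sample: n draws with replacement
oob_fraction = 1 - np.unique(bootstrap_idx).size / n    # rows never drawn ("out of bag")

print(f"simulated out-of-bag fraction: {oob_fraction:.3f}")
print(f"(1 - 1/n)^n = {(1 - 1 / n) ** n:.3f}, e^-1 = {np.exp(-1):.3f}")
```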

## Confusion Matrix Random Forest
[Back to introduction](#26)

In [None]:
rf2 = RandomForestClassifier(n_estimators = num_list[score_list_rf.index(np.max(score_list_rf))], random_state = 42) # tree count with the best test accuracy
rf2.fit(x_train,y_train)
y_pred = rf2.predict(x_test)
y_true = y_test

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true,y_pred)

f, ax = plt.subplots(figsize = (6,6))
sns.heatmap(cm,annot = True, linewidths = 0.5,cmap ="Greens",annot_kws = {"size": 12}, linecolor = "gray", fmt = ".0f", ax=ax )
plt.xlabel("Y Prediction")
plt.title("Confusion Matrix Random Forest")
plt.ylabel("Y True")
plt.show()

<a id="23"></a> <br>
## Comparison
[Back to introduction](#26)

In [None]:
# best test accuracy of each model; round() works for NumPy floats and plain Python floats alike
scores = {
    "Logistic Regression": round(np.max(score_list_lr), 3),
    "KNN": round(np.max(score_list_knn), 3),
    "SVM": round(svm_score, 3),
    "GaussianNB": round(nb_score, 3),
    "Decision Tree": round(np.max(score_list_dt), 3),
    "Random Forest": round(np.max(score_list_rf), 3),
}

# sort by score while keeping each name attached to its own score, so models with tied scores keep their names
ranked = sorted(scores.items(), key = lambda item: item[1])
list_names = [name for name, score in ranked]
list_scores = [score for name, score in ranked]

trace1 = go.Bar(
    x = list_names,
    y = list_scores,
    text = list_scores,
    textposition = "inside",
    marker=dict(color = list_scores,colorbar=dict(
            title="Colorbar"
        ),colorscale="Viridis",))

bar_data = [trace1] # a new name, so the `data` DataFrame is not overwritten
layout = go.Layout(title = "Comparison of Models",template = "plotly_white")

fig = go.Figure(data = bar_data, layout = layout)
fig.update_xaxes(title_text = "Names")
fig.update_yaxes(title_text = "Scores")
fig.show()
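
The bars above come from different splits: the cells use different `test_size` and `random_state` values, and several scores are the best test score over a hyperparameter sweep. The comparison is therefore indicative rather than strict. A fairer comparison fits and scores every model on the same k folds and averages the k scores per model. In scikit-learn this is `KFold` or `StratifiedKFold` passed as `cv` to `cross_val_score`. The sketch below shows only the fold bookkeeping in NumPy, for the dataset's 310 rows. The helper `kfold_indices` is hypothetical:

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle the row indices once, then yield (train_idx, test_idx) for each of k folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# every model would be fitted on train_idx and scored on test_idx for these SAME folds
splits = list(kfold_indices(310, k=5))
print([len(test_idx) for _, test_idx in splits])   # → [62, 62, 62, 62, 62]
```

With 210 abnormal and 100 normal patients, stratified folds (same class ratio in every fold) are usually the safer choice.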

<a id="24"></a> <br>
## K-Means
[Back to introduction](#26)

In [None]:
data_kmean = pd.read_csv("../input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")
plt.scatter(data_kmean['pelvic_radius'],data_kmean['degree_spondylolisthesis'])
plt.xlabel('Pelvic Radius')
plt.ylabel('Degree Spondylolisthesis')
plt.grid(True,alpha = 0.4)
plt.title("Data Before Clustering")
plt.show()

In [None]:
data_kmean2 = data_kmean.loc[:,['degree_spondylolisthesis','pelvic_radius']]
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 2)
kmeans.fit(data_kmean2)
labels = kmeans.predict(data_kmean2)
plt.scatter(data_kmean2['pelvic_radius'],data_kmean2['degree_spondylolisthesis'],c = labels)
plt.xlabel('Pelvic Radius')
plt.ylabel('Degree Spondylolisthesis')
plt.grid(True,alpha = 0.4)
plt.title("K-Means Prediction")
plt.show()

In [None]:
cluster_range = range(1,8) # 1 to 7 clusters
inertia_list = []
for k in cluster_range:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(data_kmean2)
    inertia_list.append(kmeans.inertia_) # within-cluster sum of squared distances
plt.figure(figsize = (6.5,4))
plt.plot(cluster_range,inertia_list,'-o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.grid(True,alpha = 0.4)
plt.title("Inertia for Each Number of Clusters")
plt.show()
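
`KMeans` minimizes inertia, the sum of squared distances from each point to its assigned centroid, using Lloyd's iterations. Each iteration assigns every point to its nearest centroid, then moves each centroid to the mean of its points. This minimal NumPy sketch keeps the example deterministic by seeding the centroids with a farthest-point rule, a simplification of the k-means++ initialization that scikit-learn uses by default. The function names and the two-blob data are hypothetical:

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)   # (n, k) squared distances
    return d2.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100):
    X = np.asarray(X, dtype=float)
    # farthest-point seeding: start from the first row, then keep adding the row
    # that is farthest from every centroid chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        labels = assign(X, centroids)                                    # assignment step
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])  # update step
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    labels = assign(X, centroids)
    inertia = float(((X - centroids[labels]) ** 2).sum())   # the quantity KMeans reports as inertia_
    return labels, centroids, inertia

# made-up data: two well-separated blobs
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(40, 2)),
                   rng.normal([6.0, 6.0], 0.5, size=(40, 2))])
for k in range(1, 5):
    print(k, round(kmeans_lloyd(X_toy, k)[2], 1))
```

The best achievable inertia never increases as k grows, so an elbow plot is read at the k after which the drop flattens out. For these two blobs that is k = 2.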

<a id="25"></a> <br>
## Artificial Neural Network(ANN)
[Back to introduction](#26)

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x_data,y_data,test_size = 0.2, random_state = 42) 

In [None]:
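# note: keras.wrappers.scikit_learn was removed in newer Keras/TensorFlow releases;
# there, the separate scikeras package (scikeras.wrappers.KerasClassifier) is the usual replacement
from keras.wrappers.scikit_learn import KerasClassifier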
from sklearn.model_selection import cross_val_score
from keras.models import Sequential # initialize neural network library
from keras.layers import Dense # build our layers library

def build_classifier():
    model = Sequential()
    model.add(Dense(units = 96, kernel_initializer = "uniform",activation = "relu", input_dim = x_train.shape[1]))
    model.add(Dense(units = 48, kernel_initializer = "uniform", activation = "linear"))
    model.add(Dense(units = 1, kernel_initializer = "uniform", activation = "sigmoid"))
    
    model.compile(optimizer = "adam", loss = "binary_crossentropy", metrics = ["accuracy"])
    
    return model

classifier = KerasClassifier(build_fn = build_classifier, epochs = 100)
accuracies = cross_val_score(estimator = classifier, X = x_train, y = y_train, cv = 3)
mean = accuracies.mean()
variance = accuracies.var() # variance of the 3 fold accuracies (.std() would give the standard deviation)

print("Accuracy Mean:"+ str(mean))
print("Accuracy Variance:"+ str(variance))
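
A Keras `Dense` layer computes `activation(dot(input, kernel) + bias)`. In `build_classifier`, the 48-unit layer uses `activation = "linear"`, so it is an affine map feeding straight into the output layer's affine map. Two affine maps compose into one, so that layer adds parameters but no extra non-linearity. This minimal NumPy sketch of the forward pass follows the model's layer shapes, with made-up random weights in place of trained ones. Merging the last two layers gives the same outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 6                                            # six biomechanical features
X_toy = rng.normal(size=(5, n_features))                  # five made-up patients

W1, b1 = rng.normal(scale=0.1, size=(n_features, 96)), rng.normal(scale=0.1, size=96)
W2, b2 = rng.normal(scale=0.1, size=(96, 48)), rng.normal(scale=0.1, size=48)
W3, b3 = rng.normal(scale=0.1, size=(48, 1)), rng.normal(scale=0.1, size=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h1 = np.maximum(0.0, X_toy @ W1 + b1)       # Dense(96, relu)
h2 = h1 @ W2 + b2                           # Dense(48, linear)
p_three_layers = sigmoid(h2 @ W3 + b3)      # Dense(1, sigmoid)

# merge the linear layer into the output layer: (h1 W2 + b2) W3 + b3 = h1 (W2 W3) + (b2 W3 + b3)
W23, b23 = W2 @ W3, b2 @ W3 + b3
p_two_layers = sigmoid(h1 @ W23 + b23)

print(np.allclose(p_three_layers, p_two_layers))   # → True
```

Replacing `"linear"` with a non-linear activation such as `"relu"` is what would make the second hidden layer add representational power. This sketch does not show whether that changes accuracy on this dataset.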


# Conclusion
[Back to introduction](#26)
* We walked through a statistical analysis of the biomechanical features and trained several ML models on them.
* We compared the ML models' accuracies.
* You can check my other kernels: https://www.kaggle.com/mrhippo/notebooks

* If there is something wrong with this notebook please let me know in the comments.