### Hello, everyone.
### Welcome to my notebook on classifying chronic kidney disease (CKD).
## The web application for this project is available at this link:
 https://ckdclassification.herokuapp.com/

## Intro
### Patients with chronic kidney disease (CKD) are at higher risk of severe illness. Because of immunosuppression, kidney disease patients appear to be among those most at risk from COVID-19. Complications of CKD include high blood pressure, anemia (low blood count), and poor nutritional health. Developing a model to classify CKD patients is therefore important for reducing the risks of this disease.


## Roadmap of Notebook
###	Problem Definition
###	CKD Features
###	Analysis & Visualization.
###	Data Preprocessing
####	Data Cleaning
####	Handling Missing Values
####	Outliers
####	Splitting Data
####	Feature Selection
####	Snapshots of some visualization
###	Used machine learning models
###	Results
### Save a Model
### Try a Model


### Used Machine Learning Models
#### 1.	Decision Tree
#### 2.	Random forest 
#### 3.	SVM
#### 4.	KNN
#### 5.	XGBoost


## 1. Data Preprocessing

In [None]:
#import needed libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, chi2, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import balanced_accuracy_score

In [None]:
# Load the CKD dataset from the CSV file
data = pd.read_csv('../input/ckdisease/kidney_disease.csv')


In [None]:
# Check the first 10 samples of the data
data.head(10)

In [None]:
# Check the last 5 samples of the data
data.tail(5)

### 1.1 Data Cleaning

In [None]:
#check simple information like  columns names ,  columns datatypes and null values
data.info()

In [None]:
NewCols={"bp":"blood_pressure","sg":"specific_gravity", "al":"albumin","su":"sugar","rbc":"red_blood_cells","pc":"pus_cell",
         "pcc":"pus_cell_clumps","ba":"bacteria","bgr":"blood_glucose_random","bu":"blood_urea","sc":"serum_creatinine",
         "sod":"sodium","pot":"potassium","hemo":"haemoglobin","pcv":"packed_cell_volume","wc":"white_blood_cell_count",
          "rc":"red_blood_cell_count","htn":"hypertension","dm":"diabetes_mellitus","cad":"coronary_artery_disease",
          "appet":"appetite","pe":"pedal_edema","ane":"anemia"}

In [None]:
# Change columns of CKD data to new columns
data.rename(columns=NewCols, inplace=True)

In [None]:
#check summary of numerical data  such as count , mean , max , min  and standard deviation.
data.describe()

In [None]:
#check numbers of rows(samples) and columns(features)
data.shape

In [None]:
#check count of values for each features
data.count()

In [None]:
#Check total missing values in each feature
data.isnull().sum()

In [None]:
#visualization of null values in features
plt.subplots(figsize=(10, 7))
((data.isnull().sum())).sort_values(ascending=False).plot(kind='bar')

In [None]:
# Drop id column 
data.drop(["id"],axis=1,inplace=True) 

In [None]:
# Encode the categorical columns as 0/1
data[['red_blood_cells','pus_cell']] = data[['red_blood_cells','pus_cell']].replace(to_replace={'abnormal':1,'normal':0})
data[['pus_cell_clumps','bacteria']] = data[['pus_cell_clumps','bacteria']].replace(to_replace={'present':1,'notpresent':0})
data[['hypertension','diabetes_mellitus','coronary_artery_disease','pedal_edema','anemia']] = data[['hypertension','diabetes_mellitus','coronary_artery_disease','pedal_edema','anemia']].replace(to_replace={'yes':1,'no':0})
data[['appetite']] = data[['appetite']].replace(to_replace={'good':1,'poor':0,'no':np.nan})
# Some labels carry stray tabs or spaces in the raw file; map those variants as well
data['coronary_artery_disease'] = data['coronary_artery_disease'].replace(to_replace='\tno',value=0)
data['diabetes_mellitus'] = data['diabetes_mellitus'].replace(to_replace={'\tno':0,'\tyes':1,' yes':1, '':np.nan})
# Target: 1.0 = CKD, 0.0 = not CKD
data['classification'] = data['classification'].replace(to_replace={'ckd':1.0,'ckd\t':1.0,'notckd':0.0,'no':0.0})

In [None]:
data['pedal_edema'] = data['pedal_edema'].replace(to_replace='good',value=0) 
data['appetite'] = data['appetite'].replace(to_replace='no',value=0)
data['coronary_artery_disease']=data['coronary_artery_disease'].replace('yes',1)
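The replacements above list each malformed label (`'\tno'`, `' yes'`, `'ckd\t'`) by hand. A minimal sketch of an alternative, assuming the stray characters are only leading or trailing whitespace: strip every string first, then apply a single mapping. The helper name `clean_binary` and the toy series are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

def clean_binary(series, mapping):
    """Strip surrounding whitespace from string labels, then map them to numbers.

    Labels that are missing from `mapping` (including NaN) become NaN.
    """
    stripped = series.str.strip()
    return stripped.map(mapping).astype(float)

# Toy column imitating the kind of malformed labels handled above (hypothetical sample)
raw = pd.Series(["yes", "\tno", " yes", "no", np.nan, "\tyes"])
cleaned = clean_binary(raw, {"yes": 1, "no": 0})
print(cleaned.tolist())
```

With this approach a new whitespace variant does not need its own `replace` entry, while a genuinely unknown label still surfaces as a missing value instead of being silently kept as a string.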

In [None]:
##data=data.fillna(0)

### 1.2 Handling Missing Values

In [None]:
data['age']=data['age'].fillna(np.mean(data['age']))
data['blood_pressure']=data['blood_pressure'].fillna(np.mean(data['blood_pressure']))
data['albumin']=data['albumin'].fillna(np.mean(data['albumin']))


In [None]:
data['specific_gravity']=data['specific_gravity'].fillna(np.mean(data['specific_gravity']))
data['sugar']=data['sugar'].fillna(np.mean(data['sugar']))
data['blood_glucose_random']=data['blood_glucose_random'].fillna(np.mean(data['blood_glucose_random']))
data['blood_urea']=data['blood_urea'].fillna(np.mean(data['blood_urea']))
data['serum_creatinine']=data['serum_creatinine'].fillna(np.mean(data['serum_creatinine']))
data['haemoglobin']=data['haemoglobin'].fillna(np.mean(data['haemoglobin']))
data['potassium']=data['potassium'].fillna(np.mean(data['potassium']))
data['sodium']=data['sodium'].fillna(np.mean(data['sodium']))


In [None]:
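# Treat the placeholder strings "\t?" and " ?" as missing values
data = data.replace("\t?", np.nan)
data = data.replace(" ?", np.nan)
# Fill the remaining gaps from neighbouring rows: forward fill first, then backward fill
data = data.ffill()
data = data.bfill()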

In [None]:
# Check the missing values again
data.isnull().sum()
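The cells above fill numeric columns with the column mean and everything else by forward and backward fill. As a point of comparison, here is a small sketch of a different, commonly used strategy: the median for numeric columns and the most frequent value for the others. The function name `impute` and the toy frame are assumptions made for illustration; this is not the notebook's method.

```python
import numpy as np
import pandas as pd

def impute(frame):
    """Return a copy where numeric NaNs are set to the column median and
    non-numeric NaNs are set to the column's most frequent value."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out

# Hypothetical mini dataset with one gap per column
toy = pd.DataFrame({
    "age": [48.0, np.nan, 62.0, 51.0],
    "appetite": ["good", "poor", np.nan, "good"],
})
filled = impute(toy)
print(filled)
```

The median is less sensitive to extreme values than the mean, which can matter for skewed laboratory measurements. Unlike forward fill, this approach does not depend on the row order of the file.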

### 1.3 Outliers

In [None]:
# Check for outliers: plot each feature against the class label
fig, ax = plt.subplots()
ax.scatter(x = data['classification'], y = data['specific_gravity'])
plt.ylabel('specific_gravity', fontsize=13)
plt.xlabel('classification', fontsize=13)
plt.show()

In [None]:
fig, ax = plt.subplots()
ax.scatter(x = data['classification'], y = data['sugar'])
plt.ylabel('sugar', fontsize=13)
plt.xlabel('classification', fontsize=13)
plt.show()

In [None]:
# Check for outliers in blood pressure
fig, ax = plt.subplots()
ax.scatter(x = data['classification'], y = data['blood_pressure'])
plt.ylabel('blood_pressure', fontsize=13)
plt.xlabel('classification', fontsize=13)
plt.show()
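The scatter plots above only show outliers visually. One way to flag them numerically is the interquartile range (IQR) rule, sketched below. This is a swapped-in technique for illustration rather than something the notebook applies. The function name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample readings are all assumptions.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies more than
    k * IQR below the first quartile or above the third quartile."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical blood pressure readings with one extreme value
blood_pressure = np.array([70, 80, 80, 70, 90, 60, 80, 180])
mask = iqr_outlier_mask(blood_pressure)
print(blood_pressure[mask])
```

Whether a flagged value should be removed is a clinical judgment: in medical data an extreme reading may be a true measurement for a sick patient rather than an error.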

### 1.4 Visualization

In [None]:
numericalFeatures = data.select_dtypes(include=np.number)
categoricalFeatures = data.select_dtypes(include='object')


In [None]:
numericalFeatures

In [None]:
# Pearson correlation is only defined for numeric columns,
# so it is computed for the numerical features only
datacorrnumerical=numericalFeatures.corr()


In [None]:
sns.pairplot(numericalFeatures)

In [None]:
plt.subplots(figsize=(10, 10))
sns.heatmap(datacorrnumerical,annot=True)

In [None]:
plt.scatter(data['classification'],data['age'])
plt.xlabel('classification',fontsize=10)
plt.ylabel('age',fontsize=10)

In [None]:
plt.scatter(data['classification'],data['blood_pressure'])
plt.xlabel('classification',fontsize=10)
plt.ylabel('blood_pressure',fontsize=10)

In [None]:
plt.scatter(data['classification'],data['albumin'])
plt.xlabel('classification',fontsize=10)
plt.ylabel('albumin',fontsize=10)

In [None]:
sns.boxplot(x='hypertension', y='specific_gravity', data=data, palette='viridis')

In [None]:
sns.boxplot(x='hypertension', y='albumin', data=data, palette='viridis')

In [None]:
# The first 24 columns are the features; 'classification' is the target
X = data.iloc[:, :24]
y = data['classification']

### 1.5 Feature Selection

In [None]:
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

In [None]:
model = DecisionTreeClassifier()
selector = RFE(estimator=model, n_features_to_select=14)

In [None]:
selector.fit(X, y)

In [None]:
selector.get_support(indices=True)

In [None]:
Features=X.columns

In [None]:
selected_features_idx = selector.get_support(indices=True)
selected_features_idx

In [None]:
selected_featuresDT = Features[selected_features_idx]
selected_featuresDT
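`DecisionTreeClassifier()` is created above without a `random_state`, so the features chosen by RFE may differ between runs. A minimal sketch of pinning the seed to make the selection repeatable, using synthetic data from `make_classification`. The data, the helper `select`, and the choice of 4 features are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD features (not the real dataset)
X_toy, y_toy = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

def select(seed):
    """Run RFE with a seeded decision tree and return the kept column indices."""
    selector = RFE(
        estimator=DecisionTreeClassifier(random_state=seed),
        n_features_to_select=4,
    )
    selector.fit(X_toy, y_toy)
    return selector.get_support(indices=True).tolist()

first = select(0)
second = select(0)
print(first, first == second)
```

A stable selection matters later in the notebook, because any hand-written sample passed to a model has to list its values in exactly the order of the selected features.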

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

In [None]:
rfc = RandomForestClassifier(random_state=0, criterion='gini') 

In [None]:
selector = SelectFromModel(estimator=rfc)

In [None]:
selector.fit(X, y)

In [None]:
# Keep only the features selected by RFE with the decision tree
x=X[selected_featuresDT]

In [None]:
x.head()

### 1.6 Splitting Data

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)
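The split above is purely random. When the two classes are not equally frequent, passing `stratify` to `train_test_split` keeps the class ratio the same in both parts. The sketch below uses a toy label vector with a 60/40 ratio, which is an assumed ratio for illustration and not a statement about the CKD data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 60 positive and 40 negative labels (hypothetical ratio)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 60 + [0] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

# Both parts keep the 60/40 ratio of the full label vector
print(y_tr.mean(), y_te.mean())
```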

### Train Decision Tree, Random Forest, SVM, KNN, and XGBoost Models

#### Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier
modeldt = DecisionTreeClassifier()

In [None]:
modeldt.fit(x_train, y_train)

In [None]:
y_preddt = modeldt.predict(x_test)
y_preddt

In [None]:
CMDT=confusion_matrix(y_test,y_preddt)
CMDT

In [None]:
print(classification_report(y_test, y_preddt))

In [None]:
sns.set(font_scale=1.1)
# Draw the confusion matrix and keep the Axes it is drawn on
ax = sns.heatmap(CMDT, annot=True, fmt="g")
plt.title("CM_CKD with DT")
#plt.tight_layout()
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
ax.xaxis.set_ticklabels(['NotCKD', 'CKD']); ax.yaxis.set_ticklabels(['NotCKD', 'CKD']);
plt.show()

In [None]:
############## DT ##########
accuracy = accuracy_score(y_test,y_preddt)
print('Accuracy: %f' % accuracy)
balanced_accuracy = balanced_accuracy_score(y_test,y_preddt)
print('Balanced_Accuracy: %f' % balanced_accuracy)
precision = precision_score(y_test,y_preddt)
print('Precision: %f' % precision)
recall = recall_score(y_test,y_preddt)
print('Recall: %f' % recall)
f1 = f1_score(y_test,y_preddt)
print('F1 score: %f' % f1)
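All of the scores printed above can be derived from the four cells of the confusion matrix. For binary labels 0 and 1, scikit-learn's `confusion_matrix` is laid out as `[[TN, FP], [FN, TP]]`, so `confusion_matrix(...).ravel()` yields `tn, fp, fn, tp`. The sketch below recomputes the metrics by hand. The function name and the counts are made up for illustration and are not results from this notebook.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute the usual binary classification metrics from confusion matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for a 100-sample test set
scores = metrics_from_confusion(tn=35, fp=5, fn=3, tp=57)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In a screening setting, recall for the CKD class is usually the number to watch, since a false negative means a sick patient is missed.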

In [None]:
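from sklearn.preprocessing import StandardScaler

# Standardize the features; the scaler is fitted on the training split only
# so that no information from the test split leaks into it
scaler = StandardScaler()
scaler.fit(x_train)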

In [None]:
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
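From this point on, `x_train` and `x_test` hold standardized values, so every model trained below expects standardized input. One way to avoid tracking that by hand is to wrap the scaler and the model in a scikit-learn `Pipeline`, sketched here on synthetic data. The data and variable names are assumptions for illustration. The notebook itself scales manually.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: one informative small-scale feature, one noisy large-scale feature
X_toy = np.column_stack([
    np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(5.0, 1.0, 50)]),
    rng.normal(0.0, 1000.0, 100),
])
y_toy = np.array([0] * 50 + [1] * 50)

# The pipeline fits the scaler and the classifier together and
# applies the same scaling automatically at prediction time
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))
pipe.fit(X_toy, y_toy)

pred = pipe.predict(X_toy[:5])  # raw, unscaled rows go in
print(pred)
```

Distance-based models such as KNN and SVM are the ones that benefit most from scaling. Tree-based models are largely insensitive to it.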

####  Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
modelRF = RandomForestClassifier()

In [None]:
modelRF.fit(x_train, y_train)

In [None]:
y_predrf = modelRF.predict(x_test)
y_predrf

In [None]:
print(classification_report(y_test, y_predrf))

In [None]:
CMRF=confusion_matrix(y_test, y_predrf)
CMRF

In [None]:
sns.set(font_scale=1.1)
# Draw the confusion matrix and keep the Axes it is drawn on
ax = sns.heatmap(CMRF, annot=True, fmt="g")
plt.title("CM_CKD with RF")
#plt.tight_layout()
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
ax.xaxis.set_ticklabels(['NotCKD', 'CKD']); ax.yaxis.set_ticklabels(['NotCKD', 'CKD']);
plt.show()

In [None]:
###########  RF #############
accuracy = accuracy_score(y_test,y_predrf)
print('Accuracy: %f' % accuracy)
balanced_accuracy = balanced_accuracy_score(y_test,y_predrf)
print('Balanced_Accuracy: %f' % balanced_accuracy)
precision = precision_score(y_test,y_predrf)
print('Precision: %f' % precision)
recall = recall_score(y_test,y_predrf)
print('Recall: %f' % recall)
f1 = f1_score(y_test,y_predrf)
print('F1 score: %f' % f1)

#### Support Vector Machine (SVM)

In [None]:
from sklearn.svm import SVC
modelsvc = SVC(C=0.05)
modelsvc.fit(x_train, y_train)
y_predsvc = modelsvc.predict(x_test)
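`GridSearchCV` is imported at the top of the notebook, while `C=0.05` above is set by hand. A minimal sketch of choosing `C` by cross-validation instead, on synthetic data. The candidate grid, the 5-fold setting, and the scoring choice are illustrative assumptions, not tuned values for the CKD data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training split (not the real dataset)
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)

# Scaling sits inside the pipeline, so it is re-fitted on each training fold
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.05, 0.5, 5.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="balanced_accuracy")
search.fit(X_toy, y_toy)

best_C = search.best_params_["svc__C"]
print(best_C, round(search.best_score_, 3))
```

The search should be run on the training split only, with the test split reserved for the final evaluation.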

In [None]:
print(classification_report(y_test, y_predsvc))

In [None]:
CMsvm=confusion_matrix(y_test, y_predsvc)
CMsvm

In [None]:
sns.set(font_scale=1.1)
# Draw the confusion matrix and keep the Axes it is drawn on
ax = sns.heatmap(CMsvm, annot=True, fmt="g")
plt.title("CM_CKD with SVM")
#plt.tight_layout()
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
ax.xaxis.set_ticklabels(['NotCKD', 'CKD']); ax.yaxis.set_ticklabels(['NotCKD', 'CKD']);
plt.show()

In [None]:
############## SVM ##########
accuracy = accuracy_score(y_test,y_predsvc)
print('Accuracy: %f' % accuracy)
balanced_accuracy = balanced_accuracy_score(y_test,y_predsvc)
print('Balanced_Accuracy: %f' % balanced_accuracy)
precision = precision_score(y_test,y_predsvc)
print('Precision: %f' % precision)
recall = recall_score(y_test,y_predsvc)
print('Recall: %f' % recall)
f1 = f1_score(y_test,y_predsvc)
print('F1 score: %f' % f1)

#### K Nearest Neighbor (KNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
modelknn = KNeighborsClassifier(n_neighbors=7)
modelknn.fit(x_train, y_train)
y_predknn = modelknn.predict(x_test)

In [None]:
print(classification_report(y_test, y_predknn))

In [None]:
CMknn=confusion_matrix(y_test, y_predknn)
CMknn

In [None]:
sns.set(font_scale=1.1)
# Draw the confusion matrix and keep the Axes it is drawn on
ax = sns.heatmap(CMknn, annot=True, fmt="g")
plt.title("CM_CKD with KNN")
#plt.tight_layout()
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
ax.xaxis.set_ticklabels(['NotCKD', 'CKD']); ax.yaxis.set_ticklabels(['NotCKD', 'CKD']);
plt.show()

In [None]:
############# KNN ##############
accuracy = accuracy_score(y_test,y_predknn)
print('Accuracy: %f' % accuracy)
balanced_accuracy = balanced_accuracy_score(y_test,y_predknn)
print('Balanced_Accuracy: %f' % balanced_accuracy)
precision = precision_score(y_test,y_predknn)
print('Precision: %f' % precision)
recall = recall_score(y_test,y_predknn)
print('Recall: %f' % recall)
f1 = f1_score(y_test,y_predknn)
print('F1 score: %f' % f1)

#### XGBoost Classifier

In [None]:
from xgboost import XGBClassifier

modelxgb = XGBClassifier(n_estimators=100)
modelxgb.fit(x_train, y_train)
y_predxgb = modelxgb.predict(x_test)

In [None]:
CMxgb=confusion_matrix(y_test, y_predxgb)
CMxgb

In [None]:
sns.set(font_scale=1.1)
# Draw the confusion matrix and keep the Axes it is drawn on
ax = sns.heatmap(CMxgb, annot=True, fmt="g")
plt.title("CM_CKD with XGB")
#plt.tight_layout()
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
ax.xaxis.set_ticklabels(['NotCKD', 'CKD']); ax.yaxis.set_ticklabels(['NotCKD', 'CKD']);
plt.show()

In [None]:
########### XGBoost ################
accuracy = accuracy_score(y_test,y_predxgb)
print('Accuracy: %f' % accuracy)
balanced_accuracy = balanced_accuracy_score(y_test,y_predxgb)
print('Balanced_Accuracy: %f' % balanced_accuracy)
precision = precision_score(y_test,y_predxgb)
print('Precision: %f' % precision)
recall = recall_score(y_test,y_predxgb)
print('Recall: %f' % recall)
f1 = f1_score(y_test,y_predxgb)
print('F1 score: %f' % f1)

### Save a Model

In [None]:
import joblib

In [None]:
# Save the trained random forest (modelRF), not the unfitted tree used for feature selection
joblib.dump(modelRF,"FinalProject_model_RF.pkl")

In [None]:
joblib.dump(modelxgb,"FinalProject_model_xgb.pkl")
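The models saved above were trained on standardized features, so whoever loads them also needs the fitted scaler. A minimal sketch of saving both objects in one file and checking that predictions survive the round trip. The synthetic data, the dictionary keys, and the file name `ckd_bundle_demo.pkl` are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))              # synthetic features
y_toy = (X_toy[:, 0] > 0).astype(int)         # synthetic labels

scaler = StandardScaler().fit(X_toy)
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(scaler.transform(X_toy), y_toy)

# Store the scaler and the model together so they cannot get separated
bundle = {"scaler": scaler, "model": model}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckd_bundle_demo.pkl")
    joblib.dump(bundle, path)
    restored = joblib.load(path)

before = model.predict(scaler.transform(X_toy))
after = restored["model"].predict(restored["scaler"].transform(X_toy))
same = bool(np.array_equal(before, after))
print(same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same library versions that were used to write them.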

### Try a Model

In [None]:
predictions = [round(value) for value in y_predrf]

In [None]:
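# One patient's values for the 14 selected features, in the same order as selected_featuresDT
case=[1.02,0,0.0,0.0,0.0,148,11.3,38,6000,5.2,0,0,0.0,1]
case = np.array(case).reshape((1,-1))
# modelRF was trained on standardized features, so scale the case the same way
case = scaler.transform(case)
res=modelRF.predict(case)[0]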

In [None]:
str(res)

Many thanks for visiting my notebook and for your time.
If you found this notebook useful, please upvote.