## Building Classifiers for the Diabetes Dataset Using AdaBoost and XGBoost

The objective is to predict whether or not a patient has diabetes, based on the diagnostic measurements included in the dataset. All patients here are females of Pima Indian heritage who are at least 21 years old.

**Independent variables**

1. Pregnancies: number of times pregnant
2. Glucose: plasma glucose concentration
3. BloodPressure: diastolic blood pressure (mm Hg)
4. SkinThickness: triceps skin fold thickness (mm)
5. Insulin: 2-hour serum insulin (mu U/ml)
6. BMI: body mass index
7. DiabetesPedigreeFunction: diabetes pedigree function
8. Age: age in years

**Target variable**

Outcome: binary class label (1 = diabetes, 0 = no diabetes)




In [1]:
import pandas
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier

In [2]:
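dataframe = pandas.read_csv('pima-indians-diabetes.csv')
array = dataframe.values
X = array[:,0:8]  # first 8 columns: the independent variables
Y = array[:,8]    # last column: Outcome
seed = 7          # fixed seed so the runs are reproducible
num_trees = 30    # number of boosting rounds (estimators)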

In [3]:
dataframe.head()

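|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |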
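
The positional slices `array[:,0:8]` and `array[:,8]` in cell 2 depend on the column order shown above. As a minimal sketch, the same split can be made by column name, which keeps working if the columns are reordered. The frame `sample` below is rebuilt by hand from the five rows printed by `dataframe.head()`; the names `sample`, `feature_columns`, `X_named`, and `y_named` are illustrative and not part of the original notebook.

```python
import pandas as pd

# The five rows shown by dataframe.head(), rebuilt by hand for illustration.
sample = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0],
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

# Every column except the target is treated as a feature.
feature_columns = [c for c in sample.columns if c != "Outcome"]
X_named = sample[feature_columns].to_numpy()
y_named = sample["Outcome"].to_numpy()

# The name-based split matches the positional slicing used in cell 2.
values = sample.values
print((X_named == values[:, 0:8]).all(), (y_named == values[:, 8]).all())
```

On the full dataset, the same two selection lines would apply to `dataframe` in place of `sample`.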


In [4]:
# 10-fold cross-validation with shuffled, reproducible folds
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
# cross_val_score returns one accuracy score per fold
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7552802460697198


In [5]:
results

array([0.76623377, 0.71428571, 0.71428571, 0.79220779, 0.79220779,
       0.74025974, 0.68831169, 0.77922078, 0.80263158, 0.76315789])

In [6]:
from xgboost import XGBClassifier

# seed and num_trees are reused from cell 2.
# random_state only takes effect when shuffle=True (recent scikit-learn versions
# raise a ValueError otherwise), so it is omitted; the folds are unshuffled here.
kfold = model_selection.KFold(n_splits=10)
model = XGBClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())



0.768215994531784


In [7]:
results

array([0.7012987 , 0.85714286, 0.71428571, 0.63636364, 0.76623377,
       0.80519481, 0.83116883, 0.84415584, 0.72368421, 0.80263158])
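
The mean alone hides how much the accuracy varies between folds. The sketch below summarizes the per-fold scores printed in cells 5 and 7; the arrays are copied from those outputs, and the helper `summarize` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Per-fold accuracies copied from the outputs of cells 5 and 7.
adaboost_scores = np.array([0.76623377, 0.71428571, 0.71428571, 0.79220779, 0.79220779,
                            0.74025974, 0.68831169, 0.77922078, 0.80263158, 0.76315789])
xgboost_scores = np.array([0.7012987, 0.85714286, 0.71428571, 0.63636364, 0.76623377,
                           0.80519481, 0.83116883, 0.84415584, 0.72368421, 0.80263158])

def summarize(scores):
    """Return mean, sample standard deviation, min and max of fold scores."""
    return {
        "mean": float(scores.mean()),
        "std": float(scores.std(ddof=1)),  # ddof=1: sample standard deviation
        "min": float(scores.min()),
        "max": float(scores.max()),
    }

summary = {
    "AdaBoost": summarize(adaboost_scores),
    "XGBoost": summarize(xgboost_scores),
}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.3f} std={stats['std']:.3f} "
          f"min={stats['min']:.3f} max={stats['max']:.3f}")
```

The XGBoost scores vary more from fold to fold than the AdaBoost scores. The two runs also did not use the same fold splits (cell 4 shuffles the data, cell 6 does not), so the small gap between the two means should be read with caution.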

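Only two classifiers are compared here: AdaBoost (mean accuracy 0.755) and XGBoost (mean accuracy 0.768), each evaluated with 10-fold cross-validation.

As an exercise, try other classifiers and compare their 10-fold cross-validation results against these.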
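
One way to set up such a comparison is sketched below. It scores several scikit-learn classifiers on identical folds by sharing a single `KFold` object. To stay self-contained it uses a synthetic stand-in dataset from `make_classification` with the same shape as the diabetes data (768 rows, 8 features); that data, the choice of classifiers, and the names `X_demo`, `y_demo`, `candidates`, and `mean_scores` are assumptions made for illustration. With the real data, `X` and `Y` from cell 2 would replace `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption): 768 rows and 8 features.
X_demo, y_demo = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=2, random_state=7
)

# One shared splitter so every model is scored on the same 10 folds.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Candidate models; the selection here is illustrative only.
candidates = {
    "AdaBoost": AdaBoostClassifier(n_estimators=30, random_state=7),
    "RandomForest": RandomForestClassifier(n_estimators=30, random_state=7),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Mean accuracy over the 10 folds for each model.
mean_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=kfold).mean()
    for name, model in candidates.items()
}

for name, score in sorted(mean_scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Because the splitter has a fixed integer `random_state`, each call to `cross_val_score` sees the same train/test partitions, so differences in the means reflect the models and not the fold assignment.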