## Building Classifiers for the Diabetes Dataset Using AdaBoost and XGBoost

The objective is to predict whether or not a patient has diabetes, based on the diagnostic measurements included in the dataset. All patients here are females of Pima Indian heritage who are at least 21 years old.

**Independent variables**

1. Pregnancies : Number of times pregnant
2. Glucose : Plasma glucose concentration
3. BloodPressure : Diastolic blood pressure (mm Hg)
4. SkinThickness : Triceps skin fold thickness (mm)
5. Insulin : 2-hour serum insulin (mu U/ml)
6. BMI : Body mass index
7. DiabetesPedigreeFunction : Diabetes pedigree function
8. Age : Age in years

**Target variable**

Outcome : Binary class label (0 = no diabetes, 1 = diabetes)




In [None]:
import pandas
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier
import warnings
# Silence all warnings to keep the notebook output short
warnings.filterwarnings('ignore')

In [None]:
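# Load the data (expects the CSV file in the working directory)
dataframe = pandas.read_csv('pima-indians-diabetes.csv')
print(dataframe.head())
array = dataframe.values
X = array[:,0:8]  # first 8 columns: independent variables
Y = array[:,8]    # last column: Outcome
num_trees = 30    # number of boosting rounds (estimators)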

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4                     2.288   33        1  
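The positional slices `array[:,0:8]` and `array[:,8]` depend on the column order of the CSV. As a minimal sketch, assuming the column names shown in the output above, the same split can be made by name. The frame `sample` and the names `feature_cols`, `X_named`, and `y_named` are illustrative only; `sample` re-enters the five printed rows by hand so the snippet runs without the CSV file.

```python
import pandas as pd

# The five rows printed by dataframe.head() above, typed in by hand
# so that this sketch does not need the CSV file.
sample = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8, 1, 0],
        "Glucose": [148, 85, 183, 89, 137],
        "BloodPressure": [72, 66, 64, 66, 40],
        "SkinThickness": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
        "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Age": [50, 31, 32, 21, 33],
        "Outcome": [1, 0, 1, 0, 1],
    }
)

# Select the target by name and treat every other column as a feature.
target_col = "Outcome"
feature_cols = [c for c in sample.columns if c != target_col]

X_named = sample[feature_cols].to_numpy()
y_named = sample[target_col].to_numpy()

# The named selection holds the same values as the positional slicing.
positional = sample.values
assert (X_named == positional[:, 0:8]).all()
assert (y_named == positional[:, 8]).all()

print(X_named.shape, y_named.shape)
```

Selecting by name keeps the split correct if columns are later reordered or added, at the cost of hard-coding the target's name.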


In [None]:
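seed = 7  # fixed seed for reproducibility; must be defined before the model is built
# 10-fold cross-validation without shuffling (each fold is a contiguous block of rows)
kfold = model_selection.KFold(n_splits=10)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
# For a classifier, cross_val_score defaults to accuracy
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())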

0.760457963089542
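The value printed above is the mean of ten per-fold accuracies. As a rough sketch of what `KFold(n_splits=10)` does without shuffling, the pure-Python helper below splits row indices into contiguous test folds, giving the first `n_samples % n_splits` folds one extra row. The function names `kfold_indices` and `mean_cv_score`, the sample count of 25, and the example scores are hypothetical and chosen only for illustration; this is not scikit-learn's implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds absorb the remainder, one row each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def mean_cv_score(fold_scores):
    """Average the per-fold scores, as results.mean() does."""
    return sum(fold_scores) / len(fold_scores)


# Hypothetical sample count, small enough to inspect by eye.
folds = list(kfold_indices(n_samples=25, n_splits=10))
print([len(test) for _, test in folds])

# Hypothetical per-fold accuracies, only to show the averaging step.
print(mean_cv_score([0.5, 1.0]))
```

Because the folds are contiguous, any ordering in the file (for example, rows sorted by outcome) carries over into the folds; passing `shuffle=True` together with a `random_state` to `KFold` is the usual way to avoid that.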


In [1]:
from xgboost import XGBClassifier

seed = 7
num_trees = 30
kfold = model_selection.KFold(n_splits=10)
# Note: learning_rate=0.001 is very small for only 30 trees, so each tree
# contributes little; a larger value (the XGBoost default is 0.3) may score better.
model = XGBClassifier(n_estimators=num_trees, random_state=seed, eval_metric='logloss', learning_rate=0.001)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())