<a href="https://colab.research.google.com/github/pydevcasts/MLHub/blob/master/detecting_parkinson_s_disease_with_xgboost.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

`XGBClassifier` is the classification estimator of the **XGBoost** (Extreme Gradient Boosting) library. It is designed for classification problems and is very popular in machine learning competitions because of its strong performance and speed. Its main features and details are outlined below:

### 1. **Main features of XGBClassifier**
- **Built on decision trees**: XGBoost uses decision trees as the base learners of its predictive models and combines them with the **Boosting** technique.

- **Gradient Boosting**: Instead of building trees independently, the algorithm builds each new tree from the errors of the previous trees, so the model improves step by step.

- **Regularization**: XGBoost includes regularization techniques (L1 and L2 penalties on the leaf weights) that help prevent overfitting.

- **Speed and efficiency**: XGBoost contains many optimizations for speed and efficiency, including parallel processing and memory optimization.
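
The idea of building each tree on the errors of the previous ones can be sketched with plain gradient boosting for squared error. This is a simplified illustration using scikit-learn's `DecisionTreeRegressor` on synthetic data, not XGBoost's actual implementation (which also uses second-order gradient information and regularization); the data, `learning_rate`, and tree depth are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumed data, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())      # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    residuals = y - prediction              # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                  # the new tree models those errors
    prediction += learning_rate * tree.predict(X)   # shrunken additive update
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Each round lowers the training error a little, which is the gradual improvement described in the list above.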

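### 2. **Applications**
- **Classification**: binary and multiclass classification problems (the task `XGBClassifier` itself targets).
- **Ranking**: ranking problems such as search engines, handled in the XGBoost library by `XGBRanker`.
- **Prediction**: numeric prediction and regression problems, handled in the XGBoost library by `XGBRegressor`.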

### 3. **Usage**
To use `XGBClassifier`, first install the XGBoost library. You can then train the model on your own data.


### 4. **Advantages**
- **High accuracy**: It often achieves high accuracy compared with other machine learning algorithms.
- **Tunability**: It exposes many tunable parameters that can help optimize the model.
- **Suitable for large data**: It works well with large and complex datasets.

### 5. **Disadvantages**
- **Complexity**: Tuning the parameters can be difficult for beginners.
- **Need for careful tuning**: Getting the best results may require careful parameter tuning.


In [None]:
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


In [None]:
#DataFlair - Read the data
df=pd.read_csv('/content/sample_data/parkinsons.data')
df.head()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,...,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,...,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,phon_R01_S01_2,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,...,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674
2,phon_R01_S01_3,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,...,0.0827,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,...,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,...,0.1047,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335


In [None]:
#DataFlair - Get the features and labels
features=df.loc[:,df.columns!='status'].values[:,1:]
labels=df.loc[:,'status'].values
labels.shape

(195,)

In [None]:
#DataFlair - Get the count of each label (0 and 1) in labels
print(labels[labels==1].shape, labels[labels==0].shape)

(147,) (48,)
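
The counts above show that the classes are imbalanced, which matters when reading an accuracy figure. The arithmetic below uses only the two printed counts; the remark about `scale_pos_weight` reflects the negative-to-positive ratio that the XGBoost documentation suggests as a starting point for that parameter, and is offered here as an option rather than something this notebook uses.

```python
# Counts printed above: status == 1 and status == 0
n_positive, n_negative = 147, 48
n_total = n_positive + n_negative

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(n_positive, n_negative) / n_total

# Ratio sometimes used as a starting value for XGBoost's scale_pos_weight
neg_pos_ratio = n_negative / n_positive

print(f"total samples: {n_total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"negative/positive ratio: {neg_pos_ratio:.3f}")
```

A model therefore needs to beat roughly 75% accuracy before it can be said to have learned anything beyond the class frequencies.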


# Normalize

In [None]:
#DataFlair - Scale the features to between -1 and 1
scaler=MinMaxScaler((-1,1))
x=scaler.fit_transform(features)
y=labels
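
For reference, `MinMaxScaler((-1, 1))` rescales each column independently with x' = (x - min) / (max - min) * (hi - lo) + lo. The sketch below checks that formula against scikit-learn on a small array of hypothetical values (`toy` is not taken from the dataset).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values, two feature columns
toy = np.array([[1.0, 10.0],
                [2.0, 30.0],
                [4.0, 20.0]])

lo, hi = -1, 1
col_min, col_max = toy.min(axis=0), toy.max(axis=0)

# Per-column min-max scaling to the range [lo, hi]
manual = (toy - col_min) / (col_max - col_min) * (hi - lo) + lo

from_sklearn = MinMaxScaler((lo, hi)).fit_transform(toy)
print(manual)
print(np.allclose(manual, from_sklearn))
```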

# Split the dataset

In [None]:
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
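
One variation worth knowing, which is not what this notebook does, is to split first and fit the scaler on the training rows only, so that the test set does not influence the scaling; passing `stratify` also keeps the class ratio similar in both parts. The sketch below uses random stand-in features (`toy_features`) with the same number of rows and the same class counts as the dataset; all names in it are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(7)
toy_features = rng.normal(size=(195, 5))       # hypothetical stand-in for `features`
toy_labels = np.array([1] * 147 + [0] * 48)    # same class counts as the dataset

# Split first, keeping the class ratio similar in both parts
f_train, f_test, l_train, l_test = train_test_split(
    toy_features, toy_labels, test_size=0.2, random_state=7, stratify=toy_labels
)

# Fit the scaler on the training rows only, then apply it to both parts
split_scaler = MinMaxScaler((-1, 1)).fit(f_train)
f_train_scaled = split_scaler.transform(f_train)
f_test_scaled = split_scaler.transform(f_test)  # may fall slightly outside [-1, 1]

print(f_train.shape, f_test.shape)
print(round(l_train.mean(), 3), round(l_test.mean(), 3))
```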

# XGBClassifier

In [None]:
#DataFlair - Train the model
model=XGBClassifier()
model.fit(x_train,y_train)

In [None]:
# DataFlair - Calculate the accuracy
y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)

94.87179487179486
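
The score above is consistent with 37 correct predictions out of 39 test samples (20% of 195 rows). Because the classes are imbalanced, a confusion matrix shows where the errors fall, which accuracy alone hides. The labels and predictions below are hypothetical and serve only to show the scikit-learn calls; they are not the notebook's results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# 37 correct out of 39 test samples reproduces the accuracy printed above
reported_accuracy = round(37 / 39 * 100, 2)
print(reported_accuracy)

# Hypothetical labels and predictions, only to illustrate the API
toy_true = [1, 1, 1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 0, 0, 1]

toy_accuracy = accuracy_score(toy_true, toy_pred)
# Rows are true classes (0, 1); columns are predicted classes (0, 1)
toy_cm = confusion_matrix(toy_true, toy_pred)

print(toy_accuracy)
print(toy_cm)
```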


In [None]:
new_row = {
    'name': 'phon_R01_S01_6',
    'MDVP:Fo(Hz)': 120.000,
    'MDVP:Fhi(Hz)': 150.000,
    'MDVP:Flo(Hz)': 100.000,
    'MDVP:Jitter(%)': 0.00800,
    # other features...
    # 'status': 1
}

In [None]:
new_row_df = pd.DataFrame([new_row])
data = pd.concat([df, new_row_df], ignore_index=True)
data

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,...,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,0.00007,0.00370,0.00554,0.01109,0.04374,...,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,phon_R01_S01_2,122.400,148.650,113.819,0.00968,0.00008,0.00465,0.00696,0.01394,0.06134,...,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.335590,2.486855,0.368674
2,phon_R01_S01_3,116.682,131.111,111.555,0.01050,0.00009,0.00544,0.00781,0.01633,0.05233,...,0.08270,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,0.00009,0.00502,0.00698,0.01505,0.05492,...,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,...,0.10470,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.332180,0.410335
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
191,phon_R01_S50_3,209.516,253.017,89.488,0.00564,0.00003,0.00331,0.00292,0.00994,0.02751,...,0.04812,0.01810,19.147,0,0.431674,0.683244,-6.195325,0.129303,2.784312,0.168895
192,phon_R01_S50_4,174.688,240.005,74.287,0.01360,0.00008,0.00624,0.00564,0.01873,0.02308,...,0.03804,0.10715,17.883,0,0.407567,0.655683,-6.787197,0.158453,2.679772,0.131728
193,phon_R01_S50_5,198.764,396.961,74.904,0.00740,0.00004,0.00370,0.00390,0.01109,0.02296,...,0.03794,0.07223,19.020,0,0.451221,0.643956,-6.744577,0.207454,2.138608,0.123306
194,phon_R01_S50_6,214.289,260.277,77.973,0.00567,0.00003,0.00295,0.00317,0.00885,0.01884,...,0.03078,0.04398,21.209,0,0.462803,0.664357,-5.724056,0.190667,2.555477,0.148569


In [None]:
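X = data.drop(columns=['name', 'status'])  # drop the identifier and label columns

# Predict on the new data
new_data = X.iloc[-1:]  # select the appended row as the new sample
# The model was trained on scaled features, so apply the same fitted scaler
new_data_scaled = scaler.transform(new_data.values)
new_prediction = model.predict(new_data_scaled)

# Show the prediction
print(f'Prediction for new data: {new_prediction[0]}')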

Prediction for new data: 1
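
Note that `new_row` defines only a handful of fields, so after `pd.concat` every column it omits is filled with `NaN` for that row. XGBoost treats `NaN` as a missing value by default, so the prediction above rests on the few features that were supplied. The small sketch below shows the effect on a hypothetical frame; `toy_df`, `f1`, and `f2` are made-up names.

```python
import pandas as pd

# Hypothetical frame with the same kind of layout as the dataset
toy_df = pd.DataFrame({
    "name": ["a", "b"],
    "f1": [1.0, 2.0],
    "f2": [3.0, 4.0],
    "status": [1, 0],
})
partial_row = {"name": "c", "f1": 5.0}   # f2 and status deliberately omitted

combined = pd.concat([toy_df, pd.DataFrame([partial_row])], ignore_index=True)

# Columns that were not supplied are NaN in the appended row
missing = combined.iloc[-1].isna()
missing_columns = missing[missing].index.tolist()

print(combined)
print(missing_columns)
```

Supplying a value for every feature column is the safer way to score a new recording.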
