# Decision Tree Learning

### Members
- [Febriawan Ghally Ar Rahman (1359111)](https://github.com/ghallyy)
- [Mgs. Tabrani (13519122)](https://github.com/mgstabrani)

### Content
1. [DecisionTreeClassifier](http://scikit-learn.org/stable/modules/tree.html)
2. [Id3Estimator](https://github.com/svaante/decision-tree-id3)
3. [KMeans](https://scikit-learn.org/0.19/modules/generated/sklearn.cluster.KMeans.html)
4. [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
5. [Neural Network (MLPClassifier)](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)
6. [SVM (SVC)](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)

## Load Datasets

```python
import pandas as pd
from sklearn import datasets

# Load breast cancer dataset (the Bunch object also keeps feature and target names)
breast_cancer = datasets.load_breast_cancer()
X_breast_cancer, y_breast_cancer = datasets.load_breast_cancer(return_X_y=True)

# Load play tennis dataset and drop the 'day' column so it is not used as a feature
df_play_tennis = pd.read_csv('data/play_tennis.csv')
df_play_tennis = df_play_tennis.drop(['day'], axis=1)
```

## Encode Categorical Data

```python
from sklearn import preprocessing

# Encode every categorical column of the play tennis dataframe as integers.
# fit_transform is refit on each column, so every column gets its own mapping.
df_play_tennis = df_play_tennis.apply(preprocessing.LabelEncoder().fit_transform)

# Divide the play tennis dataframe into features (all columns but the last) and target (last column)
dataset_play_tennis = df_play_tennis.to_numpy()
X_play_tennis = dataset_play_tennis[:, :-1]
y_play_tennis = dataset_play_tennis[:, -1]
```
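`LabelEncoder` assigns integer codes in sorted order of the distinct values it sees in a column. The plain-Python sketch below reproduces that mapping for a hypothetical outlook column; the values are illustrative and not read from `data/play_tennis.csv`, and `label_encode` is a hypothetical helper, not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct value to an integer code, following the sorted order of the values."""
    classes = sorted(set(values))
    mapping = {value: code for code, value in enumerate(classes)}
    return [mapping[v] for v in values], classes


# Hypothetical outlook column in the style of the play tennis data
codes, classes = label_encode(["Sunny", "Overcast", "Rain", "Sunny"])
print(classes)  # ['Overcast', 'Rain', 'Sunny']
print(codes)    # [2, 0, 1, 2]
```

Because the codes follow alphabetical order, they carry no meaning as magnitudes; they only identify categories.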

## Split Datasets

```python
# Split each dataset into 80% training data and 20% testing data
from sklearn.model_selection import train_test_split

# Split breast cancer dataset; passing X and y in one call keeps rows and labels aligned
X_training_breast_cancer, X_testing_breast_cancer, y_training_breast_cancer, y_testing_breast_cancer = train_test_split(
    X_breast_cancer, y_breast_cancer, test_size=0.2, random_state=25)

# Split play tennis dataset
X_training_play_tennis, X_testing_play_tennis, y_training_play_tennis, y_testing_play_tennis = train_test_split(
    X_play_tennis, y_play_tennis, test_size=0.2, random_state=25)
```
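Splitting features and targets in one `train_test_split` call guarantees that both are cut with the same shuffled index order. The sketch below shows the idea with hypothetical helpers `split_indices` and `split`: one permutation of row indices, applied to both X and y. It rounds the test share up, as scikit-learn does for a fractional `test_size`, so a 14-row toy table gives 3 test rows; the shuffle uses Python's `random` module, so it will not select the same rows as scikit-learn.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=25):
    """Shuffle row indices once, then cut them into train and test parts."""
    n_test = math.ceil(n_samples * test_size)  # the test share is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


def split(X, y, test_size=0.2, seed=25):
    """Split features and targets with the SAME index order so rows stay paired."""
    train_idx, test_idx = split_indices(len(X), test_size, seed)
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# 14 toy rows; each label is derived from its row, so misalignment would be visible
X = [[i, i * 10] for i in range(14)]
y = [row[0] % 2 for row in X]
X_tr, X_te, y_tr, y_te = split(X, y)
print(len(X_tr), len(X_te))  # 11 3
```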

## Learning with Logistic Regression Algorithm

```python
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
```

### Breast cancer

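```python
# Fit logistic regression; max_iter is raised to give the default lbfgs solver room to converge
clf_breast_cancer = LogisticRegression(random_state=0, max_iter=10000).fit(X_training_breast_cancer, y_training_breast_cancer)
y_predict = clf_breast_cancer.predict(X_testing_breast_cancer)
```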
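For a binary target, `LogisticRegression` scores a sample with a linear function w · x + b and maps the score through the sigmoid to get P(y = 1); `predict` returns class 1 when the score is positive, which is the same as the probability exceeding 0.5. A minimal sketch with hypothetical weights (not the fitted `coef_` and `intercept_` of the model above):

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """P(y = 1 | x) = sigmoid(w . x + b) for a binary logistic regression."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)


def predict(x, weights, bias):
    """Class 1 when the probability is above 0.5 (equivalently, when the score is positive)."""
    return int(predict_proba(x, weights, bias) > 0.5)


# Hypothetical parameters for a two-feature model
weights, bias = [1.5, -2.0], 0.25
print(round(predict_proba([1.0, 1.0], weights, bias), 3))  # 0.438, since the score is -0.25
print(predict([1.0, 1.0], weights, bias))                  # 0
print(predict([2.0, 0.0], weights, bias))                  # 1, since the score is 3.25
```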

#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_breast_cancer, y_predict))
```

```
              precision    recall  f1-score   support

           0       0.90      0.90      0.90        39
           1       0.95      0.95      0.95        75

    accuracy                           0.93       114
   macro avg       0.92      0.92      0.92       114
weighted avg       0.93      0.93      0.93       114
```
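Each class row of the report comes from counting, per class, true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and support is the number of true samples of that class. A minimal sketch on made-up labels; `per_class_report` is a hypothetical helper, not the scikit-learn implementation.

```python
def per_class_report(y_true, y_pred):
    """Precision, recall, F1 and support for each label, in the spirit of classification_report."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # TP + FP
        actual = sum(t == label for t in y_true)     # TP + FN, i.e. the support
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, actual)
    return report


# Made-up labels for illustration
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
for label, (p, r, f, s) in per_class_report(y_true, y_pred).items():
    print(label, round(p, 2), round(r, 2), round(f, 2), s)
# 0 0.5 0.5 0.5 2
# 1 0.75 0.75 0.75 4
```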



### Play tennis

```python
# Fit logistic regression with default settings on the encoded play tennis data
clf_play_tennis = LogisticRegression().fit(X_training_play_tennis, y_training_play_tennis)
y_predict = clf_play_tennis.predict(X_testing_play_tennis)
```
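The play tennis features here are label codes, and `LogisticRegression` gives each feature a single weight, so the model implicitly assumes the categories of a column are ordered and equally spaced. One-hot encoding (available as `pandas.get_dummies` or scikit-learn's `OneHotEncoder`) instead gives every category its own 0/1 column and therefore its own weight. A plain-Python sketch with hypothetical values, shown as an alternative rather than what this notebook does; `one_hot` is a hypothetical helper.

```python
def one_hot(values):
    """Turn one categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return rows, categories


# Hypothetical outlook column
rows, categories = one_hot(["Sunny", "Overcast", "Rain", "Sunny"])
print(categories)  # ['Overcast', 'Rain', 'Sunny']
print(rows)        # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```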


#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_play_tennis, y_predict))
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
```
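The play tennis test split holds only 3 samples (support 1 and 2), so accuracy moves in steps of 1/3 and a perfect report is weak evidence. Leave-one-out evaluation, which scikit-learn offers as `sklearn.model_selection.LeaveOneOut`, uses every row as the test sample exactly once. A plain-Python sketch of the folds; `majority_class` is a hypothetical baseline used only to exercise the loop, and the 9/5 label split in the usage is illustrative.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one row per fold."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


def loo_accuracy(X, y, fit_predict):
    """Fraction of folds in which the held-out row is predicted correctly."""
    hits = 0
    for train, test in leave_one_out(len(X)):
        i = test[0]
        prediction = fit_predict([X[j] for j in train], [y[j] for j in train], X[i])
        hits += int(prediction == y[i])
    return hits / len(X)


def majority_class(X_train, y_train, x):
    """Hypothetical baseline: ignore the features, predict the most common training label."""
    return max(set(y_train), key=y_train.count)


# Illustrative data: 14 rows with 9 labels of 1 and 5 labels of 0
X = [[0]] * 14
y = [1] * 9 + [0] * 5
print(len(list(leave_one_out(14))))        # 14 folds
print(loo_accuracy(X, y, majority_class))  # 9/14, about 0.643
```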



## Learning with Neural Network Algorithm

```python
from sklearn.neural_network import MLPClassifier
```

### Breast cancer

```python
# Fit a multilayer perceptron with the default architecture
clf_breast_cancer = MLPClassifier(random_state=1, max_iter=300).fit(X_training_breast_cancer, y_training_breast_cancer)
y_predict = clf_breast_cancer.predict(X_testing_breast_cancer)
```
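With default settings, `MLPClassifier` has one hidden layer of 100 units (`hidden_layer_sizes=(100,)`) with ReLU activation and, for a binary target, a single logistic output unit. The forward pass can be sketched in plain Python with a much smaller, hypothetical network (2 inputs, 2 hidden units); the weights are made up, and training by backpropagation with the default `adam` solver is not shown.

```python
import math


def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]


def dense(x, W, b):
    """Affine layer: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def forward(x, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a single logistic output unit."""
    hidden = relu(dense(x, W1, b1))
    z = dense(hidden, W2, b2)[0]
    return 1.0 / (1.0 + math.exp(-z))  # probability of class 1


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output
W1 = [[1.0, -1.0],
      [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[1.0], [-1.0]]
b2 = [0.0]
p = forward([1.0, 1.0], W1, b1, W2, b2)
print(round(p, 3), int(p > 0.5))  # 0.818 1
```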

#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_breast_cancer, y_predict))
```

```
              precision    recall  f1-score   support

           0       0.97      0.90      0.93        39
           1       0.95      0.99      0.97        75

    accuracy                           0.96       114
   macro avg       0.96      0.94      0.95       114
weighted avg       0.96      0.96      0.96       114
```
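The `macro avg` row is the unweighted mean of the per-class scores, while `weighted avg` weights each class by its support. The counts below are a reconstruction, not notebook output: they are confusion-matrix counts consistent with the MLP breast cancer report above (supports 39 and 75 plus the printed precision and recall). Feeding them to a hypothetical `averages` helper reproduces the rounded summary rows.

```python
def averages(confusion):
    """confusion[t][p] = number of samples with true label t predicted as label p."""
    labels = range(len(confusion))
    support = [sum(confusion[t]) for t in labels]
    predicted = [sum(confusion[t][p] for t in labels) for p in labels]
    precision = [confusion[k][k] / predicted[k] for k in labels]
    recall = [confusion[k][k] / support[k] for k in labels]
    f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]
    total = sum(support)
    macro = [sum(m) / len(m) for m in (precision, recall, f1)]
    weighted = [sum(v * s for v, s in zip(m, support)) / total for m in (precision, recall, f1)]
    accuracy = sum(confusion[k][k] for k in labels) / total
    return macro, weighted, accuracy


# Reconstructed counts (rows: true 0 / true 1, columns: predicted 0 / predicted 1)
confusion = [[35, 4],
             [1, 74]]
macro, weighted, accuracy = averages(confusion)
print([round(v, 2) for v in macro])     # [0.96, 0.94, 0.95]
print([round(v, 2) for v in weighted])  # [0.96, 0.96, 0.96]
print(round(accuracy, 2))               # 0.96
```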



### Play tennis

```python
# Fit a multilayer perceptron on the encoded play tennis data
clf_play_tennis = MLPClassifier(random_state=1, max_iter=1000).fit(X_training_play_tennis, y_training_play_tennis)
y_predict = clf_play_tennis.predict(X_testing_play_tennis)
```

#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_play_tennis, y_predict))
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3
```



## Learning with SVM Algorithm

```python
from sklearn.svm import SVC
```

### Breast cancer

```python
# Fit a support vector classifier with default settings on the breast cancer data
clf_breast_cancer = SVC().fit(X_training_breast_cancer, y_training_breast_cancer)
y_predict = clf_breast_cancer.predict(X_testing_breast_cancer)
```
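`SVC()` with no arguments uses an RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2), with `C=1.0` and `gamma='scale'`, which sets gamma to 1 / (n_features * X.var()), where `X.var()` is the variance over all entries of the training matrix. The plain-Python sketch below computes both pieces on hypothetical rows (`gamma_scale` and `rbf_kernel` are illustrative helpers). It shows that a feature with a large numeric range dominates the squared distance, which is relevant because the breast cancer features are passed to `SVC` without rescaling.

```python
import math


def gamma_scale(X):
    """gamma='scale': 1 / (n_features * variance of all entries of X)."""
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
    return 1.0 / (len(X[0]) * variance)


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical rows: the second feature lives on a much larger scale than the first
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
g = gamma_scale(X)
print(rbf_kernel(X[0], X[0], g))  # 1.0: a point is maximally similar to itself
print(rbf_kernel(X[0], X[1], g))  # the distance is driven almost entirely by the second feature
```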

#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_breast_cancer, y_predict))
```

```
              precision    recall  f1-score   support

           0       0.97      0.77      0.86        39
           1       0.89      0.99      0.94        75

    accuracy                           0.91       114
   macro avg       0.93      0.88      0.90       114
weighted avg       0.92      0.91      0.91       114
```
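Class 0 recall for the SVM (0.77) is lower than for logistic regression and the MLP (both 0.90). One plausible factor, not tested in this notebook, is that the RBF kernel is sensitive to feature scale and the features were passed in unscaled. A common remedy is to standardize each feature with statistics computed on the training split only; scikit-learn provides this as `StandardScaler`, often combined as `make_pipeline(StandardScaler(), SVC())`. A plain-Python sketch with hypothetical data and helper names:

```python
import math


def fit_standardizer(X_train):
    """Per-feature mean and (population) standard deviation, from training rows only."""
    n = len(X_train)
    columns = list(zip(*X_train))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) for col, m in zip(columns, means)]
    return means, stds


def transform(X, means, stds):
    """Apply z = (x - mean) / std; a zero std is replaced by 1 to avoid dividing by zero."""
    return [[(v - m) / (s or 1.0) for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical training and test rows with very different feature scales
X_train = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
X_test = [[2.0, 250.0]]
means, stds = fit_standardizer(X_train)
print(transform(X_train, means, stds))
print(transform(X_test, means, stds))  # test rows reuse the training statistics
```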



### Play tennis

```python
# Fit a support vector classifier with default settings on the encoded play tennis data
clf_play_tennis = SVC().fit(X_training_play_tennis, y_training_play_tennis)
y_predict = clf_play_tennis.predict(X_testing_play_tennis)
```

#### Metrics Evaluation

```python
print(metrics.classification_report(y_testing_play_tennis, y_predict))
```

```
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       1.00      0.50      0.67         2

    accuracy                           0.67         3
   macro avg       0.75      0.75      0.67         3
weighted avg       0.83      0.67      0.67         3
```

