# Exercise 03: Classification Models

In this exercise, follow the requirements of each question to write the Python code. The following example is for reference:

- Sample Question: Write a program that takes the user's name as input and prints "Hello, [name]!" where [name] is the user's input.

- Potential Answer:

```python
name = input("Enter your name: ")
print("Hello, " + name + "!")
```
- If you enter 'David', the code outputs 'Hello, David!', which satisfies the requirement.

## Attention
- There is usually more than one correct answer to a question. You don't have to follow the tutorial's approach strictly, as long as the output of your code meets the requirements of the question.
- Where possible, keep your code concise and avoid relying heavily on less common libraries.
- You may need to search the Internet for information to complete the exercise.
- Please answer the questions in order.

## Question 01: The following code generates a dataset and visualizes it:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
n_samples = 100
overlap = 0.5  # fraction of samples assigned to class 1 (sets the class balance, not the cluster overlap)

# Class 0: Gaussian cluster centered at (0.3, 0.7)
class_0_samples = int(n_samples * (1 - overlap))
class_0 = np.random.normal(loc=[0.3, 0.7], scale=[0.15, 0.15], size=(class_0_samples, 2))
labels_0 = np.zeros(class_0_samples)

# Class 1: Gaussian cluster centered at (0.7, 0.3)
class_1_samples = int(n_samples * overlap)
class_1 = np.random.normal(loc=[0.7, 0.3], scale=[0.15, 0.15], size=(class_1_samples, 2))
labels_1 = np.ones(class_1_samples)

# Stack both classes into one feature matrix and one label vector
data = np.vstack((class_0, class_1))
labels = np.hstack((labels_0, labels_1))

df = pd.DataFrame(data, columns=['Feature1', 'Feature2'])
df['Label'] = labels

# Scatter plot of the two classes
plt.figure(figsize=(8, 6))
plt.scatter(df[df['Label']==0]['Feature1'], df[df['Label']==0]['Feature2'], marker='*', label='Label 0')
plt.scatter(df[df['Label']==1]['Feature1'], df[df['Label']==1]['Feature2'], marker='*', label='Label 1')
plt.title('Visualization of Data with Two Features')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
```

### Requirements

- First, copy all of the code above into the code frame below to generate and visualize the data.
- Split the dataset into a training set (80%) and a test set (20%). You can use sklearn or write the code yourself.
- Fit a logistic regression model to the training set, evaluate it with `accuracy_score`, and plot its ROC curve.
- Prepare a `dict` to record the test accuracy of each model.
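
A minimal sketch of one possible answer follows, assuming scikit-learn is installed. It repeats a condensed copy of the data-generation code so that it runs on its own and casts the labels to `int`. The split uses `train_test_split` with `random_state=42` and `stratify=y`; the seed, the stratification and the dict name `model_accuracy` are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_curve, auc

# Regenerate the Question 01 dataset (same seed, same order of random draws)
np.random.seed(0)
n_samples = 100
overlap = 0.5
class_0_samples = int(n_samples * (1 - overlap))
class_1_samples = int(n_samples * overlap)
class_0 = np.random.normal(loc=[0.3, 0.7], scale=[0.15, 0.15], size=(class_0_samples, 2))
class_1 = np.random.normal(loc=[0.7, 0.3], scale=[0.15, 0.15], size=(class_1_samples, 2))
df = pd.DataFrame(np.vstack((class_0, class_1)), columns=['Feature1', 'Feature2'])
df['Label'] = np.hstack((np.zeros(class_0_samples), np.ones(class_1_samples))).astype(int)

X = df[['Feature1', 'Feature2']]
y = df['Label']

# 80/20 split; stratify=y keeps the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit logistic regression on the training set and score it on the test set
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

# dict that records the test accuracy of each model
model_accuracy = {'Logistic Regression': accuracy_score(y_test, y_pred)}
print(model_accuracy)

# ROC curve built from the predicted probability of class 1
y_score = log_reg.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'Logistic Regression (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
```

With `stratify=y` the test set holds 10 samples of each class; dropping that argument also satisfies the question.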

### Write your answer in the following code frame

## Question 02: 

For the dataset generated in Question 01 (with the same training and test sets), change the classification model from logistic regression to a decision tree. Plot the tree, evaluate the model with accuracy and the ROC curve, and record its prediction accuracy on the test set.
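
A possible sketch for Question 02, assuming scikit-learn is installed and that the Question 01 split was made with `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)` (an illustrative choice). `max_depth=3` is an assumed setting that keeps the plotted tree readable; the question does not fix a depth. The block rebuilds the data and starts a fresh `model_accuracy` dict only so that it runs alone; in the notebook you would add to the dict from Question 01.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, roc_curve, auc

# Regenerate the Question 01 data (100 samples, 50 per class) and the assumed split
np.random.seed(0)
class_0 = np.random.normal(loc=[0.3, 0.7], scale=[0.15, 0.15], size=(50, 2))
class_1 = np.random.normal(loc=[0.7, 0.3], scale=[0.15, 0.15], size=(50, 2))
df = pd.DataFrame(np.vstack((class_0, class_1)), columns=['Feature1', 'Feature2'])
df['Label'] = np.array([0] * 50 + [1] * 50)
X, y = df[['Feature1', 'Feature2']], df['Label']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Shallow decision tree (max_depth=3 is an assumption, not a requirement)
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X_train, y_train)
y_pred = tree_clf.predict(X_test)

model_accuracy = {}  # in the notebook, reuse the dict from Question 01 instead
model_accuracy['Decision Tree'] = accuracy_score(y_test, y_pred)
print(model_accuracy)

# Draw the fitted tree; filled=True colors each node by its majority class
plt.figure(figsize=(10, 6))
plot_tree(tree_clf, feature_names=['Feature1', 'Feature2'], class_names=['0', '1'], filled=True)
plt.show()

# ROC curve from the class-1 probability stored in each leaf
y_score = tree_clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'Decision Tree (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
```

A fully grown tree tends to have pure leaves, so its probabilities are only 0 or 1 and the ROC curve has a single corner; a depth limit can leave more distinct scores.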

### Write your answer in the following code frame

Based on the test-set predictions you obtained above, calculate the precision, recall and F1-score of each class.
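
One way to do the calculation by hand is to treat each class in turn as the positive class and read the true positives (TP), false positives (FP) and false negatives (FN) off the confusion matrix: `precision = TP / (TP + FP)`, `recall = TP / (TP + FN)` and `F1 = 2 * precision * recall / (precision + recall)`. The sketch below assumes the same split and models as the earlier sketches (logistic regression and a `max_depth=3` decision tree). `per_class_scores` is a hypothetical helper, and the `assert` inside the loop cross-checks it against `sklearn.metrics.precision_recall_fscore_support`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Regenerate the Question 01 data and the assumed 80/20 split
np.random.seed(0)
class_0 = np.random.normal(loc=[0.3, 0.7], scale=[0.15, 0.15], size=(50, 2))
class_1 = np.random.normal(loc=[0.7, 0.3], scale=[0.15, 0.15], size=(50, 2))
X = pd.DataFrame(np.vstack((class_0, class_1)), columns=['Feature1', 'Feature2'])
y = pd.Series([0] * 50 + [1] * 50, name='Label')
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)


def per_class_scores(y_true, y_pred):
    """Hypothetical helper: precision, recall and F1 for classes 0 and 1."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # rows: true class, columns: predicted
    scores = {}
    for k in (0, 1):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp  # predicted k, but actually the other class
        fn = cm[k, :].sum() - tp  # actually k, but predicted the other class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[k] = {'precision': precision, 'recall': recall, 'f1-score': f1}
    return pd.DataFrame(scores).T  # one row per class


models = {
    'Logistic Regression': LogisticRegression(),
    'Decision Tree': DecisionTreeClassifier(max_depth=3, random_state=42),
}
results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = per_class_scores(y_test, y_pred)
    # Cross-check the manual formulas against scikit-learn
    p, r, f, _ = precision_recall_fscore_support(y_test, y_pred, labels=[0, 1], zero_division=0)
    assert np.allclose(results[name].values, np.column_stack([p, r, f]))
    print(name)
    print(results[name].round(3), end='\n\n')
```

`sklearn.metrics.classification_report(y_test, y_pred)` prints the same per-class numbers in one call, which is a quick way to check a hand calculation.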

Compare the prediction accuracy of the different models in a pandas DataFrame.
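
A short sketch of the comparison, rebuilt under the same assumed split and models so that it runs on its own: fill the accuracy `dict`, convert it to a DataFrame with one row per model, and sort by accuracy. The column names `Model` and `Test Accuracy` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Regenerate the Question 01 data and the assumed 80/20 split
np.random.seed(0)
class_0 = np.random.normal(loc=[0.3, 0.7], scale=[0.15, 0.15], size=(50, 2))
class_1 = np.random.normal(loc=[0.7, 0.3], scale=[0.15, 0.15], size=(50, 2))
X = pd.DataFrame(np.vstack((class_0, class_1)), columns=['Feature1', 'Feature2'])
y = pd.Series([0] * 50 + [1] * 50, name='Label')
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Record the test accuracy of each model in a dict
model_accuracy = {}
for name, model in [('Logistic Regression', LogisticRegression()),
                    ('Decision Tree', DecisionTreeClassifier(max_depth=3, random_state=42))]:
    model.fit(X_train, y_train)
    model_accuracy[name] = accuracy_score(y_test, model.predict(X_test))

# Convert the dict to a DataFrame: one row per model, best model first
comparison = (pd.DataFrame(list(model_accuracy.items()), columns=['Model', 'Test Accuracy'])
              .sort_values('Test Accuracy', ascending=False)
              .reset_index(drop=True))
print(comparison)
```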

### Write your answer in the following code frame

## Question 03: The following code generates a dataset and visualizes it:

```python
import numpy as np
import pandas as pd  # needed for pd.DataFrame below
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 50 samples with two informative features and one cluster per class
X, y = make_classification(n_samples=50, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42, class_sep=2.0)

data = pd.DataFrame(data=X, columns=['Feature 1', 'Feature 2'])
data['Target'] = y

plt.scatter(data['Feature 1'], data['Feature 2'], c=data['Target'], cmap=plt.cm.Paired)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Linearly Separable SVM Classification Dataset')
plt.show()
```

### Requirements

- First, copy all of the code above into the code frame below to generate and visualize the data.
- Split the dataset into a training set (80%) and a test set (20%). You can use sklearn or write the code yourself.
- Fit a Support Vector Machine (SVM) model to the training set, evaluate it with `accuracy_score`, and plot its ROC curve.
- Plot the hyperplane (decision boundary) of your SVM model (refer to the code in the tutorial).
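
A possible sketch for Question 03, assuming scikit-learn is installed. `SVC(kernel='linear')` is used so that the decision boundary is a single straight line, and `random_state=42` with `stratify=y` for the split is an illustrative choice. The ROC curve uses `decision_function` scores, so `probability=True` is not needed. The hyperplane plot draws the contours of the decision function at 0 (the hyperplane) and at -1 and +1 (the margins); this is one common way to draw it, and the tutorial's own code may differ.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_curve, auc

# Same dataset as the Question 03 code above
X, y = make_classification(n_samples=50, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42, class_sep=2.0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Linear-kernel SVM: the separating hyperplane is a line in 2-D
svm_clf = SVC(kernel='linear', C=1.0)
svm_clf.fit(X_train, y_train)
svm_accuracy = accuracy_score(y_test, svm_clf.predict(X_test))
print('SVM test accuracy:', svm_accuracy)

# Signed distances from the hyperplane serve as ROC scores
y_score = svm_clf.decision_function(X_test)
fpr, tpr, _ = roc_curve(y_test, y_score)
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'SVM (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

# Evaluate the decision function on a grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = svm_clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Solid line: hyperplane (Z = 0); dashed lines: margins (Z = -1 and Z = +1)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.contour(xx, yy, Z, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
plt.scatter(svm_clf.support_vectors_[:, 0], svm_clf.support_vectors_[:, 1],
            s=100, facecolors='none', edgecolors='k', label='Support vectors')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Linear SVM Hyperplane and Margins')
plt.legend()
plt.show()
```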

### Write your answer in the following code frame

## Question 04: For the dataset generated in Question 03 (with the same training and test sets), change the classification model from SVM to AdaBoost, then evaluate the AdaBoost model with accuracy and the ROC curve.
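
A possible sketch for Question 04, reusing the split assumed in the Question 03 sketch (`random_state=42`, `stratify=y`). `AdaBoostClassifier` keeps its default base learner (a depth-1 decision tree, or "stump") with `n_estimators=50` and `random_state=42`; these are illustrative settings, not values required by the question. The ROC curve uses the predicted probability of class 1.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, roc_curve, auc

# Same dataset and assumed split as in Question 03
X, y = make_classification(n_samples=50, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=42, class_sep=2.0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# AdaBoost with its default base learner (decision stumps)
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
ada_accuracy = accuracy_score(y_test, ada.predict(X_test))
print('AdaBoost test accuracy:', ada_accuracy)

# ROC curve from the predicted probability of class 1
y_score = ada.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'AdaBoost (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
```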