# Electrocardiograms

👇 Import the `electrocardiograms.csv` dataset located in the data folder and display its first 5 rows
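
As a minimal sketch of the loading step, the snippet below reads a CSV with pandas and shows its first rows. The inline text is a made-up stand-in so the example runs without the real file; the column names (`x_1`, `x_2`, `x_3`, `target`) and the path `data/electrocardiograms.csv` are assumptions, not confirmed details of the dataset.

```python
import io

import pandas as pd

# Stand-in for the real file: three tiny made-up rows.
# Column names (x_1, x_2, x_3, target) are assumptions for illustration.
csv_text = """x_1,x_2,x_3,target
0.00,0.04,0.98,0
0.10,0.92,0.13,1
0.02,0.05,0.97,0
"""

# With the real file this would presumably be:
# data = pd.read_csv("data/electrocardiograms.csv")
data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default (only 3 exist in this stand-in)
first_rows = data.head()
print(first_rows)
```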

Each observation in the dataset is a numerically represented heartbeat, taken from a patient's electrocardiogram (ECG). The target is binary and indicates whether the heartbeat is at risk of cardiovascular disease [1] or not [0].


The **task** is to build a model that can **flag at-risk observations**.

## Data Exploration

👇 Visualise an observation of each class to get an idea of what the numbers represent
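
One possible way to approach the plot, sketched on random stand-in data: pick the first row of each class, drop the target, and draw the remaining values in column order. The column names, the `target` label, and the axis labels ("Time step", "Amplitude") are assumptions about how the real dataset is laid out.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in: 4 "heartbeats" of 50 values each, plus a target column
rng = np.random.default_rng(0)
signals = rng.random((4, 50))
data = pd.DataFrame(signals, columns=[f"x_{i}" for i in range(1, 51)])
data["target"] = [0, 1, 0, 1]

fig, axes = plt.subplots(1, 2, figsize=(10, 3), sharey=True)
for ax, label in zip(axes, [0, 1]):
    # First row of the given class, without the target column
    beat = data.loc[data["target"] == label].drop(columns="target").iloc[0]
    ax.plot(beat.to_numpy())
    ax.set_title(f"Class {label}")
    ax.set_xlabel("Time step")
axes[0].set_ylabel("Amplitude")
```

In a notebook the figure is displayed automatically; in a plain script you would add `plt.show()`.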

👇 How many observations of each class are there?
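
A short sketch of counting classes with pandas, using made-up labels in place of the real target column (the name `target` is an assumption):

```python
import pandas as pd

# Made-up labels standing in for the dataset's target column
target = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="target")

counts = target.value_counts()                # absolute count per class
shares = target.value_counts(normalize=True)  # proportion per class
print(counts)
print(shares)
```

Looking at the proportions as well as the counts makes any class imbalance obvious, which matters when interpreting accuracy later on.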

## Logistic Regression

👇 Cross-validate a `LogisticRegression` model and return the following metrics:
- Accuracy
- Recall
- Precision
- F1
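
The cross-validation step could look roughly like the sketch below. It runs on a synthetic, imbalanced dataset from `make_classification` rather than the ECG data, and the choices of `cv=5` and `max_iter=1000` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced stand-in for the ECG features (X) and target (y)
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=0
)

scoring = ["accuracy", "recall", "precision", "f1"]
cv_results = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=scoring
)

# cross_validate returns one array of per-fold scores per metric,
# stored under the key "test_<metric>"
log_scores = {metric: cv_results[f"test_{metric}"].mean() for metric in scoring}
print(log_scores)
```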

❓ What is the model's accuracy score?

❓ What percentage of at-risk heartbeats is the model able to flag?

❓ When the model signals an at-risk heartbeat, how often is it correct?

❓ What is the model's ability to flag as many at-risk heartbeats as possible while limiting false alarms?
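
To connect the four questions above to the four metrics, here is a small worked example computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not results from the ECG dataset.

```python
# Hypothetical counts: true positives, false positives,
# false negatives, true negatives
tp, fp, fn, tn = 30, 10, 20, 940

# Share of all predictions that are correct
accuracy = (tp + tn) / (tp + fp + fn + tn)

# Share of at-risk heartbeats that the model flags
recall = tp / (tp + fn)

# Share of flagged heartbeats that are truly at risk
precision = tp / (tp + fp)

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, round(f1, 3))
```

With these made-up counts, accuracy is 0.97 while recall is only 0.6, which shows how a model can look accurate on an imbalanced dataset and still miss many at-risk cases.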

## KNN Classifier

👇 Cross-validate a `KNeighborsClassifier` model and return the following metrics:
- Accuracy
- Recall
- Precision
- F1

❓ What is the model's ability to correctly predict at-risk heartbeats?

❓ What is the model's precision/recall tradeoff score?

❓ What is the model's percentage of correct predictions?

❓ What percentage of the at-risk heartbeats is the model able to detect?
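
To line the two models up before choosing one, a side-by-side table of mean cross-validated scores can help. The sketch below uses synthetic data again; the dictionary keys, `cv=5`, and default hyperparameters are assumptions, so the numbers it prints say nothing about the ECG dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the ECG features (X) and target (y)
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),  # default n_neighbors=5
}
scoring = ["accuracy", "recall", "precision", "f1"]

rows = {}
for name, model in models.items():
    cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)
    rows[name] = {m: cv_results[f"test_{m}"].mean() for m in scoring}

# Transpose so that each row is a model and each column a metric
comparison = pd.DataFrame(rows).T
print(comparison.round(3))
```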

## Model Selection

❓ Considering your **task** is to **flag at-risk observations** while **limiting false alarms**, which model would you pick?

<details>
<summary>Answer</summary>

By now you have probably noticed that the KNN model is better suited to the task. You should also have noticed that high accuracy does not necessarily mean a well-performing model. Knowing which metric to look at is key, and it depends on the task and the dataset.

</details>



## Confusion Matrix

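👇 Using `plot_confusion_matrix`, visualize the confusion matrix of the KNN model. Note that `plot_confusion_matrix` was removed in scikit-learn 1.2; on recent versions, use `ConfusionMatrixDisplay.from_estimator` instead.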

[`plot_confusion_matrix` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html)

<details>
<summary>💡 Hints</summary>

- `plot_confusion_matrix` takes a trained model as input
    
- You need to go back to the Holdout method
    
- Make sure you generalize
    
- Look into the `normalize` parameter
  
</details>
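
The hints can be sketched as follows, again on synthetic data: split with the holdout method, fit on the training set only, and compute the confusion matrix on the unseen test set. The split size, `stratify=y`, and the variable names are assumptions. The sketch uses `confusion_matrix` to get the raw numbers rather than a plot.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the ECG features (X) and target (y)
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Holdout: fit on the training split, evaluate on unseen test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
knn = KNeighborsClassifier().fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are true classes, columns are predicted classes
cm_counts = confusion_matrix(y_test, y_pred)
# normalize="true" divides each row by its total, so each row sums to 1
cm_rates = confusion_matrix(y_test, y_pred, normalize="true")

false_alarms = cm_counts[0, 1]  # 0's predicted as 1's (count)
miss_rate = cm_rates[1, 0]      # share of 1's predicted as 0's

print(cm_counts)
print(false_alarms, round(miss_rate, 3))
```

For the plotted version on recent scikit-learn releases, `ConfusionMatrixDisplay.from_estimator(knn, X_test, y_test, normalize="true")` draws the same matrix from the fitted model.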



❓ How **many** false alarms does the model produce?

<details>
<summary>Answer</summary>
 
The answer is the count of 0's predicted as 1's.
    
</details>


❓ What **percentage** of at-risk heartbeats does the model miss?

<details>
<summary>💡 Hint</summary>

- Look into the `normalize` parameter 😉
  
</details>



<details>
<summary>Answer</summary>
 
The answer is the 1's predicted as 0's as a percentage.
    
</details>


## Prediction

👇 A patient comes to you for a second opinion on what he was told may be an at-risk heartbeat. Use your model to get some insight.

```python
import pandas as pd

# Load the patient's heartbeat to be classified
new_data = pd.read_csv('data/new_data.csv')

new_data
```
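
A hedged sketch of the prediction step, with synthetic data standing in for both the training set and the patient's heartbeat (`new_obs` is a hypothetical stand-in for `new_data`). It shows the hard class prediction alongside the class probabilities, which give a little more insight than the label alone.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the ECG features (X) and target (y)
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=0
)
knn = KNeighborsClassifier().fit(X, y)

# Stand-in for new_data: one observation with the same 20 features
new_obs = X[:1] + 0.01

# Predicted class for each row: 0 (not at risk) or 1 (at risk)
prediction = knn.predict(new_obs)

# With default uniform weights, this is the share of the 5 nearest
# neighbours belonging to each class, in the order [class 0, class 1]
probabilities = knn.predict_proba(new_obs)

print(prediction, probabilities)
```

The new observation must have the same feature columns, in the same order, as the data the model was trained on.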

⚠️ Please push your exercise once completed 🙃

# 🏁