# Intro to KNN, Decision Trees, and Healthcare Challenges

## Goal of the Session
The purpose of this session is to introduce you to the basics of machine learning and how it applies to healthcare. By the end of this session, you should understand:
- What K-Nearest Neighbors (KNN) and Decision Trees are.
- How these algorithms can be used in healthcare (predicting disease, diagnosing conditions, etc.).
- Real-world challenges of using machine learning in healthcare.

Let's dive into each of these topics!

## 1. Introduction to K-Nearest Neighbors (KNN)
### 1.1 What is K-Nearest Neighbors?
Imagine you are trying to figure out what sport someone plays based on how similar they are to other people you know. For example, if three of your friends play soccer, and all have similar skills, you can assume a new friend with similar skills probably plays soccer too. This is the basic idea of KNN.

**KNN in One Sentence:** KNN is a way for computers to make predictions by finding examples that are most similar to what they are trying to predict.

In healthcare, KNN can be used to predict a patient’s condition based on how similar their symptoms are to those of other patients.

### 1.2 How KNN Works
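1. The computer stores all the labeled examples it has seen before (the training data).
2. For each new data point, it measures how far that point is from every stored example using the features (for example, age, blood pressure, cholesterol level) and picks the *k* closest ones. These closest examples are the "neighbors"; a smaller distance means the patients are more similar.
3. It takes the most common outcome among those *k* neighbors (a majority vote) and uses that as the prediction.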
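The three steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration that assumes Euclidean (straight-line) distance and a simple majority vote; `knn_predict` is a hypothetical helper written for this notebook, not a library function. It reuses the toy [Age, Cholesterol Level] data from Section 1.3.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: classify one point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every stored example
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances, i.e. the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 3: the most common label among those neighbors is the prediction
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Step 1: the stored examples, [Age, Cholesterol Level]
X_train = np.array([[25, 180], [45, 200], [35, 190], [55, 220], [60, 230]])
y_train = np.array([0, 1, 0, 1, 1])  # 0 = low risk, 1 = high risk

print(knn_predict(X_train, y_train, np.array([40, 210]), k=3))
```

In practice you would use scikit-learn's `KNeighborsClassifier`, as in the cells below, which implements the same idea with faster neighbor search and more options.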

### Example of KNN
Let's start by plotting some simple two-group data to build intuition for how KNN works.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Create some data points for demonstration: two groups, labeled 0 and 1
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# Plot the data points, colored by group
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', marker='o', edgecolor='k')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Data Points for KNN Example")
plt.show()
```

**Discussion:** What do you notice about the data? Which points are close to each other? What predictions could we make if a new point is added close to these groups?
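One way to check your answer to the last question is to fit a KNN model on the same data and ask it about a new point. In this sketch the new point is a hypothetical one placed at the average position of the group labeled 0, and `n_neighbors=5` is an arbitrary choice for illustration. `kneighbors` returns the distances and indices of the closest training points, so you can see exactly which neighbors voted.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Same synthetic two-group data as the plot above
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

knn_demo = KNeighborsClassifier(n_neighbors=5)
knn_demo.fit(X, y)

# Hypothetical new point: the center of the group labeled 0
new_point = X[y == 0].mean(axis=0).reshape(1, -1)

# Distances and row indices of the 5 closest training points
distances, indices = knn_demo.kneighbors(new_point)
print("Labels of the 5 nearest neighbors:", y[indices[0]])
print("Predicted group:", knn_demo.predict(new_point)[0])
```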

### 1.3 Coding Exercise: KNN in a Healthcare Example
**Scenario:** We want to predict if a patient is at high risk for a disease based on two features: age and cholesterol level. We'll use KNN to look at similar patients and make predictions.

**Goal:** Understand how a new data point (new patient) would be classified based on similar patients' data.

```python
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Patient data: [Age, Cholesterol Level]
X_health = np.array([[25, 180], [45, 200], [35, 190], [55, 220], [60, 230]])
y_health = np.array([0, 1, 0, 1, 1])  # 0 = low risk, 1 = high risk

# Define the KNN model (use the 3 nearest neighbors)
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_health, y_health)

# Predict for a new patient with age 40 and cholesterol level 210
new_patient = np.array([[40, 210]])
prediction = knn.predict(new_patient)

print(f"Prediction for new patient: {'High risk' if prediction[0] == 1 else 'Low risk'}")
```

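The code above predicts whether the new patient is high or low risk by taking a majority vote among the three most similar patients, where similarity is measured by the straight-line (Euclidean) distance between their [Age, Cholesterol Level] values.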
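Because KNN compares raw distances, features measured on larger numeric scales can dominate the result. A common remedy is to standardize each feature before computing distances. The sketch below assumes scikit-learn's `StandardScaler` (which rescales each feature to mean 0 and standard deviation 1) chained with the same 3-neighbor model via `make_pipeline`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy patient data: [Age, Cholesterol Level]
X_health = np.array([[25, 180], [45, 200], [35, 190], [55, 220], [60, 230]])
y_health = np.array([0, 1, 0, 1, 1])  # 0 = low risk, 1 = high risk

# Scale each feature first, then run KNN on the scaled values
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_health, y_health)

# The pipeline applies the same scaling to the new patient automatically
prediction_scaled = scaled_knn.predict(np.array([[40, 210]]))
print(f"Prediction with scaled features: {'High risk' if prediction_scaled[0] == 1 else 'Low risk'}")
```

On this tiny dataset the prediction is the same with or without scaling, but with real clinical features measured in very different units, scaling can change which patients count as "nearest".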

### 1.4 Additional Example of KNN
Let's try a second example where each patient is described by more features, adding blood pressure and BMI to age and cholesterol, and use them to classify a new patient.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Extended patient data: [Age, Cholesterol Level, Blood Pressure, BMI]
X_health_extended = np.array([
    [25, 180, 120, 25],
    [45, 200, 140, 30],
    [35, 190, 135, 28],
    [55, 220, 150, 33],
    [60, 230, 160, 35],
])
y_health_extended = np.array([0, 1, 0, 1, 1])  # 0 = low risk, 1 = high risk

# Train a new KNN model on this extended data
knn_extended = KNeighborsClassifier(n_neighbors=3)
knn_extended.fit(X_health_extended, y_health_extended)

# Predict for a new patient with age 50, cholesterol level 210, blood pressure 145, and BMI 32
new_patient_extended = np.array([[50, 210, 145, 32]])
prediction_extended = knn_extended.predict(new_patient_extended)

print(f"Prediction for new extended patient: {'High risk' if prediction_extended[0] == 1 else 'Low risk'}")
```
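To see which patients actually drove a KNN prediction, you can ask the fitted model for the new patient's neighbors with `kneighbors`, which returns their distances and row indices sorted from closest to farthest. This sketch repeats the extended data so it runs on its own; the printout format is just one illustrative choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same extended data: [Age, Cholesterol Level, Blood Pressure, BMI]
X_health_extended = np.array([
    [25, 180, 120, 25],
    [45, 200, 140, 30],
    [35, 190, 135, 28],
    [55, 220, 150, 33],
    [60, 230, 160, 35],
])
y_health_extended = np.array([0, 1, 0, 1, 1])  # 0 = low risk, 1 = high risk

knn_extended = KNeighborsClassifier(n_neighbors=3).fit(X_health_extended, y_health_extended)

new_patient_extended = np.array([[50, 210, 145, 32]])
# Distances and row indices of the 3 nearest training patients, closest first
distances, indices = knn_extended.kneighbors(new_patient_extended)

for dist, idx in zip(distances[0], indices[0]):
    print(f"Patient {idx}: features={X_health_extended[idx]}, "
          f"label={y_health_extended[idx]}, distance={dist:.1f}")
```

All three neighbors are labeled high risk here, so the vote is unanimous. Listing the neighbors like this is a simple way to explain a KNN prediction to a clinician.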

## 2. Introduction to Decision Trees
### 2.1 What is a Decision Tree?
Think of a Decision Tree like a game of 20 Questions: Each question you ask narrows down the possibilities until you guess the correct answer. Decision Trees work similarly by splitting data into categories at each “question” or “node.”

**Example in Healthcare:** A Decision Tree can be used to decide if a patient has a particular disease by asking questions like “Does the patient have a fever?” or “Is the patient’s cholesterol level high?”

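In machine learning, the tree chooses each question (split) automatically: it tries the candidate questions and keeps the one that best separates the data into groups that are as "pure" as possible, meaning each group mostly contains a single outcome. scikit-learn's `DecisionTreeClassifier` measures purity with Gini impurity by default.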
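The sketch below computes Gini impurity by hand for the small [Fever, Cough, Fatigue] symptom table used in the next cell. Gini impurity is 1 minus the sum of squared class proportions, so a group containing only one outcome scores 0. The helpers `gini` and `split_impurity` are hypothetical names for this illustration, and the sketch only handles yes/no (0/1) features; the real library searches numeric thresholds and does more bookkeeping.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the group is pure)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(feature_values, labels):
    """Weighted Gini impurity after splitting a yes/no feature into its 0 and 1 groups."""
    total = len(labels)
    impurity = 0.0
    for value in (0, 1):
        group = labels[feature_values == value]
        if len(group) > 0:
            # Each group's impurity is weighted by its share of the patients
            impurity += len(group) / total * gini(group)
    return impurity

# Columns: [Fever, Cough, Fatigue]; labels: 1 = Disease, 0 = No disease
X_symptoms = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]])
y_disease = np.array([1, 1, 0, 1, 0])

print(f"Impurity before any split: {gini(y_disease):.3f}")
for i, name in enumerate(["Fever", "Cough", "Fatigue"]):
    print(f"Split on {name}: weighted impurity = {split_impurity(X_symptoms[:, i], y_disease):.3f}")
```

Splitting on Fever drops the impurity from 0.480 to 0.000 because every patient with a fever has the disease and nobody without one does, so Fever is the best first question for this data.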

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Simple dataset: [Fever, Cough, Fatigue], where 1 = symptom present, 0 = absent
X_symptoms = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]])
y_disease = np.array([1, 1, 0, 1, 0])  # 1 = Disease, 0 = No disease

# Define Decision Tree model (fixed random_state for reproducible results)
dt = DecisionTreeClassifier(random_state=0)

# Train the model
dt.fit(X_symptoms, y_disease)

# Test a new patient with symptoms: Fever (1), Cough (0), Fatigue (1)
# Note: this matches the first training patient exactly
new_patient = np.array([[1, 0, 1]])
prediction = dt.predict(new_patient)

print(f"Prediction for new patient: {'Disease' if prediction[0] == 1 else 'No disease'}")
```
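A strength of decision trees is that the learned questions can be printed and read. The sketch below uses scikit-learn's `export_text` to show the rules the tree learned from the symptom table, then tries a symptom combination that does not appear in the training data (the patient in the cell above, [1, 0, 1], is identical to the first training patient, so it does not test generalization). The unseen patient [0, 1, 1] is a hypothetical example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same symptom table: [Fever, Cough, Fatigue]
X_symptoms = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]])
y_disease = np.array([1, 1, 0, 1, 0])  # 1 = Disease, 0 = No disease

dt = DecisionTreeClassifier(random_state=0).fit(X_symptoms, y_disease)

# Print the learned if/else rules in plain text
rules = export_text(dt, feature_names=["Fever", "Cough", "Fatigue"])
print(rules)

# Hypothetical patient: no fever, but cough and fatigue (not in the training data)
unseen_patient = np.array([[0, 1, 1]])
print("Prediction for unseen patient:", dt.predict(unseen_patient)[0])
```

On this data the tree needs only one question (Fever), which also shows how a tiny dataset can lead a model to ignore symptoms that might matter in practice.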

## 3. Discussion: Challenges of Machine Learning in Healthcare
Using AI in healthcare can be beneficial, but it also has challenges. Here are some points for us to discuss:
1. Data privacy and security
2. Quality and availability of healthcare data
3. Bias in algorithms
4. Interpretability of models
5. Regulatory and ethical concerns
6. Data labeling challenges
7. Integrating AI with existing healthcare systems
8. Accuracy and reliability of models
9. Patient consent and trust
10. Costs and accessibility of AI in healthcare
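For point 8, accuracy and reliability, one standard first check is to measure performance on data the model was not trained on. The sketch below assumes scikit-learn's bundled breast cancer diagnostic dataset (569 patients, 30 features) and uses 5-fold cross-validation, which trains on four fifths of the data and tests on the held-out fifth, five times. The model settings (`n_neighbors=5`, `max_depth=3`) are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Breast cancer diagnostic dataset bundled with scikit-learn (no download needed)
X, y = load_breast_cancer(return_X_y=True)

models = {
    "KNN (k=5, scaled)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree (max_depth=3)": DecisionTreeClassifier(max_depth=3, random_state=0),
}

results = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

A high cross-validated accuracy is only a starting point: it does not show whether the model works equally well for different patient groups (point 3) or at a different hospital whose data looks different (point 2).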

Let's discuss each of these and think about solutions or improvements for the future!