# B2 - First Steps with Machine Learning

I am still learning how to code, so I am going to keep things simple and follow the theory about basic supervised learning. I will write down every step in English so I can practice explaining what I am doing.

## 1. Getting Ready

I start by importing the libraries that I saw in the notes: NumPy for numbers, pandas for tables, matplotlib for charts, and scikit-learn for the model tools.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

plt.style.use('seaborn-v0_8')

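I feel better after importing everything. Calling `plt.style.use('seaborn-v0_8')` gives my charts a cleaner look without extra work. As far as I know, this style name needs Matplotlib 3.6 or newer, because older versions used different names for the seaborn styles.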
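Since style names can differ between Matplotlib versions, a small safety check might help. This is only a sketch: the variable names `preferred` and `style` are my own, and falling back to `'default'` is my assumption about what is acceptable, not something from the notes.

```python
import matplotlib.pyplot as plt

# The style used in this notebook; it may be missing on older Matplotlib versions.
preferred = "seaborn-v0_8"

# plt.style.available lists the style names that this installation knows about.
style = preferred if preferred in plt.style.available else "default"

plt.style.use(style)
print("Using style:", style)
```

With this check the notebook keeps running even if the seaborn style is not installed, and the charts just look plainer.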

## 2. Loading the Data

The theory says I should start with a known dataset. I use the Iris flowers dataset because it is small and friendly.

In [None]:
iris = load_iris()
features = iris['data']
feature_names = iris['feature_names']
target = iris['target']
target_names = iris['target_names']

df = pd.DataFrame(features, columns=feature_names)
df['species'] = target
df.head()

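I check the first rows to be sure the table looks fine. Each row holds four measurements in centimeters for one flower (sepal length, sepal width, petal length, petal width) and the species encoded as a number: 0 for setosa, 1 for versicolor, and 2 for virginica.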
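The species numbers are hard to read on their own, so one option is to add a column with the names. This is a small sketch; the column name `species_name` is hypothetical and not part of the original table.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])
df["species"] = iris["target"]

# Hypothetical extra column: use each numeric code as an index into target_names.
df["species_name"] = iris["target_names"][iris["target"]]

# Show one row per species to confirm the code-to-name mapping.
print(df[["species", "species_name"]].drop_duplicates())
```

I would still train the model on the numeric `species` column and use the names only for reading the table.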

## 3. Quick Exploration

Because I am still practicing, I do a small exploration: the shape of the table, a statistical summary, and a simple chart.

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
plt.figure(figsize=(6, 4))
for label in np.unique(target):
    rows = df['species'] == label
    plt.scatter(
        df.loc[rows, feature_names[0]],
        df.loc[rows, feature_names[1]],
        label=target_names[label],
    )
plt.xlabel(feature_names[0])
plt.ylabel(feature_names[1])
plt.title('First two features of the Iris dataset')
plt.legend()
plt.tight_layout()
plt.show()

The scatter plot shows that setosa forms its own cluster, while versicolor and virginica overlap when I only look at the first two features. This matches the idea from the notes about patterns in the data.
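To put numbers next to what the chart shows, I could compare the average of the first two features for each species. This is a rough sketch; the names `first_two` and `summary` are my own.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])
# Use species names here so the summary table is easier to read.
df["species"] = iris["target_names"][iris["target"]]

# The same two columns that the scatter plot uses.
first_two = iris["feature_names"][:2]

# One row per species, one column per feature, each cell is the mean value.
summary = df.groupby("species")[first_two].mean()
print(summary)
```

If the averages of two species are close together, I would expect their points to overlap in the chart as well.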

## 4. Preparing the Model

The lesson suggests splitting the data to test the model. I separate the features from the label and then create training and testing sets.

In [None]:
X = df[feature_names].to_numpy()
y = df['species'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

X_train.shape, X_test.shape

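Using `stratify=y` keeps the proportion of each species about the same in the training set and the test set, which was recommended in the material. Without it, a random split could put too many flowers of one species in the test set.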
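One way to check that the split really is stratified is to count the labels on each side. This is a small sketch that repeats the same split so it can run on its own; `train_counts` and `test_counts` are names I made up.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# np.bincount counts how many times each label (0, 1, 2) appears.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)

print("train:", train_counts)
print("test: ", test_counts)
```

The three counts in each set should be nearly equal, because the full dataset has 50 flowers per species.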

## 5. Training a Simple Model

I pick the k-nearest neighbors algorithm because it is easy to understand: the model looks at the closest points to decide the class.

In [None]:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

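For k-nearest neighbors, `fit` does not learn any weights. It stores the training samples (scikit-learn may also build a search structure from them) so that the nearest neighbors can be looked up when I ask for a prediction.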
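To understand what happens at prediction time, I tried writing the idea by hand with NumPy. This is a simplified sketch and not how scikit-learn is implemented: the function `knn_predict_one` and the toy data are hypothetical, and I assume Euclidean distance with a plain majority vote.

```python
import numpy as np


def knn_predict_one(X_train, y_train, x, k=5):
    """Predict the class of one point by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every stored training sample.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors; argmax picks the most common one
    # (on a tie it returns the smallest label).
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))


# Toy data (made up): three points near (0, 0) and three points near (5, 5).
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict_one(toy_X, toy_y, np.array([0.3, 0.3]), k=3))  # prints 0
print(knn_predict_one(toy_X, toy_y, np.array([4.8, 5.0]), k=3))  # prints 1
```

Seeing it this way makes it clear why the training step only needs to keep the data: all of the work happens when a new point arrives.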

## 6. Evaluating the Model

To follow the full cycle, I predict on the test set and measure accuracy.

In [None]:
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
accuracy

The accuracy is the fraction of test flowers whose predicted species matches the true species, so a value of 1.0 would mean every prediction was right.
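To be sure I understand the number, I can compute the accuracy by hand and also look at a confusion matrix, which shows where the mistakes are. This is a sketch that rebuilds the same split and model so it runs on its own; `manual_accuracy` and `cm` are my own names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy by hand: the share of predictions that equal the true label.
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, accuracy_score(y_test, y_pred))

# Rows are the true species, columns are the predicted species.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The numbers on the diagonal of the matrix are the correct predictions, so dividing their sum by the total gives the accuracy again.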

## 7. Trying a New Sample

For a final check, I build an artificial flower from the average of each feature over the whole dataset and see which species the model assigns to it.

In [None]:
sample = X.mean(axis=0)
predicted_species = target_names[knn.predict([sample])[0]]
predicted_species

I know this is a simple example, but it shows me the full workflow from the theory: gather data, explore it, train a model, and evaluate it.
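If I want to see why the model chose that species for the average sample, I can ask it for the neighbors it used and for the share of votes per class. This is a sketch that repeats the earlier steps so it runs alone; `neighbor_labels` and `probabilities` are names I picked.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris["data"], iris["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

sample = X.mean(axis=0)

# kneighbors returns the distances and the training-set indices of the 5 nearest points.
distances, indices = knn.kneighbors([sample])
neighbor_labels = y_train[indices[0]]
print("neighbor species:", iris["target_names"][neighbor_labels])

# With the default uniform weights, each probability is the share of neighbors in that class.
probabilities = knn.predict_proba([sample])[0]
print(dict(zip(iris["target_names"], probabilities)))
```

If the votes are split between two species, I would treat the prediction as less certain than a unanimous one.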

## 8. Conclusions

- The Iris dataset is small and clean, which makes it a good fit for practicing the supervised learning pipeline.
- Basic exploration helps me understand the data before modeling.
- K-nearest neighbors is simple to understand, and on a small dataset like this one it is a reasonable first model.
- Evaluating with a test set keeps the process honest.

I feel more confident now, and I can revisit the theory to go deeper next time.