<a href="https://colab.research.google.com/github/guilhermelaviola/IntegrativePracticeInDataScience/blob/main/Class06.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Machine Learning (Learning for Scenario Anticipation)**
Machine Learning (ML) is a subset of Artificial Intelligence focused on building systems that learn from data and experience rather than being explicitly programmed for each task. It includes Supervised Learning, which uses labeled data; Unsupervised Learning, which finds structure in unlabeled data; and Reinforcement Learning, where an agent learns by interacting with an environment. Two common supervised tasks are Predictive Analytics, which estimates numeric outcomes with algorithms such as Linear Regression, and Data-Driven Classification, which assigns items to categories with algorithms such as Decision Trees. Feature Engineering improves model effectiveness by creating and refining input variables. Python is the dominant programming language for ML, supported by platforms like Google Colab and libraries such as Pandas and scikit-learn. The ML workflow covers data preparation, model training, and performance evaluation with metrics such as accuracy.

In [1]:
# Importing all the necessary libraries and resources:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.cluster import KMeans

## **Data Load and Preparation**
This section covers the initial preparation of data for machine learning. First, the dataset is loaded and explored to understand its structure, size, and key characteristics; in this notebook the dataset is synthetic, generated with NumPy. Then, feature engineering creates or transforms variables so that patterns are easier for algorithms to pick up. Finally, the data is cleaned and prepared by handling missing values, removing inconsistencies, scaling features, and splitting it into training and testing sets.

In [2]:
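# Generating a synthetic dataset (no random seed is set, so values change on every run):
data = pd.DataFrame({
    'feature1': np.random.rand(200),                   # uniform in [0, 1)
    'feature2': np.random.rand(200),                   # uniform in [0, 1)
    'target_class': np.random.choice([0, 1], 200),     # classification labels, drawn independently of the features
    'target_value': np.random.randn(200) * 10 + 50     # regression target: Gaussian noise, mean 50, std 10
})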
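
The exploration step mentioned in the section text is not shown in the notebook's cells. A minimal sketch of what it might look like follows, assuming a DataFrame built the same way as `data`. The fixed seed and the name `sample` are illustrative additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative only: rebuild a frame shaped like `data`, with a fixed seed
# (the seed is an assumption added so this sketch is repeatable).
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    'feature1': rng.random(200),
    'feature2': rng.random(200),
    'target_class': rng.choice([0, 1], 200),
    'target_value': rng.standard_normal(200) * 10 + 50,
})

print(sample.shape)        # number of rows and columns
print(sample.dtypes)       # column types
print(sample.describe())   # summary statistics per column

# Count missing values per column and check the class balance.
missing_per_column = sample.isna().sum()
class_counts = sample['target_class'].value_counts()
print(missing_per_column)
print(class_counts)
```

Checking `isna()` and `value_counts()` early is a cheap way to spot missing data or heavily imbalanced classes before any model is trained.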

In [3]:
# Creating a new feature based on old ones:
data['feature_sum'] = data['feature1'] + data['feature2']

# Selecting features and labels:
X = data[['feature1', 'feature2', 'feature_sum']]
y_class = data['target_class']
y_value = data['target_value']

In [4]:
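# Scaling features to zero mean and unit variance (the scaler is fitted on the full dataset here, before any train/test split):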
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
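
Fitting the scaler on the full dataset lets statistics from the test rows influence the training inputs. A common alternative is to split first and fit the scaler on the training portion only. The sketch below shows that ordering on stand-in data; `X_demo`, `y_demo`, `demo_scaler`, and the seed are hypothetical names and values chosen for illustration, not the notebook's own variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data with the same shape as the notebook's feature matrix (assumption).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 3))
y_demo = rng.choice([0, 1], 200)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)   # mean/std computed from training rows only
X_te_scaled = demo_scaler.transform(X_te)       # the same mean/std reused on test rows

print(X_tr_scaled.shape, X_te_scaled.shape)
```

On data as simple as this the difference is negligible, but the split-then-fit order is the safer habit once real datasets are involved.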

## **Example: Supervised Learning - Classification**
In classification tasks, the code trains a model to predict categorical outcomes from labeled data. It separates input features from the target variable, fits a classification algorithm to the training data, and evaluates performance using metrics such as accuracy or precision. The objective is to correctly assign new observations to predefined categories. Because `target_class` in this notebook is drawn at random, independently of the features, there is no pattern to learn and test accuracy is expected to hover around 0.5.

In [5]:
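# Splitting into 70% training and 30% test data:
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_scaled, y_class, test_size=0.3, random_state=42
)

# Training a decision tree and predicting the test labels:
clf = DecisionTreeClassifier()
clf.fit(X_train_c, y_train_c)
pred_class = clf.predict(X_test_c)

# Share of test labels predicted correctly:
print('Classification Accuracy:', accuracy_score(y_test_c, pred_class))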

Classification Accuracy: 0.4
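
An accuracy figure is easier to judge next to a trivial baseline and a second metric. The sketch below computes precision alongside accuracy and compares the tree with scikit-learn's `DummyClassifier`, which always predicts the most frequent training label. The stand-in data, seed, and variable names are assumptions for illustration and do not reproduce the run above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: random labels unrelated to the features, as in the notebook (seed is an assumption).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = rng.choice([0, 1], 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)

tree_pred = tree.predict(X_te)
tree_acc = accuracy_score(y_te, tree_pred)
tree_prec = precision_score(y_te, tree_pred, zero_division=0)  # precision for class 1
base_acc = accuracy_score(y_te, baseline.predict(X_te))

print('Tree accuracy:    ', tree_acc)
print('Tree precision:   ', tree_prec)
print('Baseline accuracy:', base_acc)
```

If the tree does not clearly beat the baseline, as is likely with random labels, the model has not learned anything useful from the features.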


## **Example: Supervised Learning - Predictive Analysis (Regression)**
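Regression focuses on predicting continuous numerical values. The code trains a regression model using labeled data, where the target variable represents a measurable quantity such as price or sales. After training, the model's predictions are evaluated using error metrics like mean squared error (MSE) or R² to measure how accurately it estimates numeric outcomes. Because `target_value` in this notebook is Gaussian noise with standard deviation 10, unrelated to the features, an MSE close to 100 (the variance of the target) is expected.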

In [6]:
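# Splitting into 70% training and 30% test data:
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_scaled, y_value, test_size=0.3, random_state=42
)

# Fitting a linear model and predicting the test targets:
reg = LinearRegression()
reg.fit(X_train_r, y_train_r)
pred_value = reg.predict(X_test_r)

# Average squared difference between true and predicted values:
print('Regression MSE:', mean_squared_error(y_test_r, pred_value))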

Regression MSE: 96.00503680862217
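
The text mentions R² as a second regression metric, but the cell above reports only MSE. A minimal sketch of computing both with scikit-learn's `r2_score` follows. The stand-in data, seed, and variable names are assumptions for illustration, and the numbers will not match the run above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in data: a noisy target unrelated to the features, as in the notebook (seed is an assumption).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = rng.standard_normal(200) * 10 + 50

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

mse = mean_squared_error(y_te, pred)
r2 = r2_score(y_te, pred)   # 1.0 is a perfect fit; 0.0 matches always predicting the test mean

print('MSE:', mse)
print('R2: ', r2)
```

An R² near zero (or slightly negative) means the model does no better than predicting the mean, which is the expected result when the target is pure noise.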


## **Example: Unsupervised Learning - Clustering**
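Clustering is used when no labeled target variable is available. The code applies an algorithm that groups similar data points based on patterns in the features. The purpose is to uncover hidden structures or natural segments within the data, such as customer groups or behavioral patterns. The features in this notebook are uniform random numbers, so the three clusters found below should be read as a partition of the feature space rather than as natural groups.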

In [7]:
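# Grouping the scaled features into 3 clusters (n_init='auto' requires scikit-learn 1.2 or later):
kmeans = KMeans(n_clusters=3, n_init='auto')
clusters = kmeans.fit_predict(X_scaled)

# Cluster ids are arbitrary and, with no random_state set, can differ between runs:
print('Cluster labels (first 10):', clusters[:10])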

Cluster labels (first 10): [1 0 2 1 1 2 1 1 2 0]
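
Cluster labels alone say little about the grouping. One way to inspect a K-Means result is to look at cluster sizes, the fitted centers, and how the within-cluster sum of squares (`inertia_`) changes with the number of clusters. The sketch below does this on stand-in data; the data, seed, range of `k`, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: uniform random points, similar in spirit to the notebook's features (seed is an assumption).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))

# Inertia (within-cluster sum of squared distances) for several values of k.
inertias = {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_demo)
    inertias[k] = km.inertia_
print('Inertia by k:', inertias)

# Cluster sizes and centers for k = 3.
km3 = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_demo)
sizes = np.bincount(km3.labels_, minlength=3)
print('Cluster sizes:', sizes)
print('Cluster centers:')
print(km3.cluster_centers_)
```

When the data has real groups, inertia typically drops sharply up to the true number of clusters and flattens afterwards; on structureless data like this, no clear bend is expected.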


## **Example: Reinforcement Learning with Q-Learning**
In Q-learning, the code trains an agent to make decisions through interaction with an environment. It defines states, actions, and rewards, and iteratively updates a Q-table based on the rewards received. In the cell below the agent picks its actions uniformly at random; because Q-learning is off-policy, the table still converges toward the values of the greedy strategy. The learned strategy, read off by taking the highest-valued action in each state, maximizes the cumulative discounted reward.
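
The update applied on each step of the code below can be written out explicitly. This is the standard tabular Q-learning rule, with $\alpha$ the learning rate, $\gamma$ the discount factor, $r$ the reward for entering the next state $s'$, and $a'$ ranging over the actions available in $s'$:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

With the settings used in this notebook ($\gamma = 0.9$, a reward of 5 for entering D, and no terminal state), repeatedly choosing right in D earns 5 on every step, so the fixed point of the update is $Q(D, \text{right}) = 5 / (1 - 0.9) = 50$.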

In [8]:
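# Environment settings: four states in a row (A - B - C - D) and two actions:
states = ['A', 'B', 'C', 'D']
actions = ['left', 'right']
rewards = {'A': -1, 'B': 0, 'C': 1, 'D': 5}  # reward for entering each state; goal is state D

# Q-table: one value per (state, action) pair, initialised to zero:
Q = {s: {a: 0.0 for a in actions} for s in states}

def next_state(state, action):
    # Moving one step along the row; a move past either end leaves the state unchanged.
    idx = states.index(state)
    if action == 'right' and idx < len(states) - 1:
        return states[idx + 1]
    elif action == 'left' and idx > 0:
        return states[idx - 1]
    return state

alpha = 0.3   # learning rate
gamma = 0.9   # discount factor

# Q-Learning episodes (random start state, 10 random steps each; D is not terminal):
for episode in range(200):
    state = np.random.choice(states)
    for step in range(10):
        action = np.random.choice(actions)   # pure exploration: actions are chosen uniformly at random
        s_next = next_state(state, action)
        reward = rewards[s_next]
        # Moving Q[state][action] toward reward + discounted best value of the next state:
        Q[state][action] += alpha * (reward + gamma * max(Q[s_next].values()) - Q[state][action])
        state = s_next

print('Q-table after training:')
for s in Q:
    print(s, Q[s])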

Q-table after training:
A {'left': 36.22770473578018, 'right': 41.36855434691241}
B {'left': 36.228556788874926, 'right': 45.969145384854095}
C {'left': 41.37105810800532, 'right': 49.96660488769251}
D {'left': 45.96662732361072, 'right': 49.9721936736688}
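
The Q-table itself is not yet a strategy. A minimal sketch of reading off the greedy policy, picking the highest-valued action in each state, is shown below. The values are copied (rounded to two decimals) from the output printed above; `Q_printed` and `greedy_policy` are illustrative names, not part of the original notebook.

```python
# Q-values copied (rounded) from the output printed above.
Q_printed = {
    'A': {'left': 36.23, 'right': 41.37},
    'B': {'left': 36.23, 'right': 45.97},
    'C': {'left': 41.37, 'right': 49.97},
    'D': {'left': 45.97, 'right': 49.97},
}

def greedy_policy(q_table):
    """Return the highest-valued action for each state."""
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q_table.items()
    }

policy = greedy_policy(Q_printed)
print(policy)
# {'A': 'right', 'B': 'right', 'C': 'right', 'D': 'right'}
```

The policy moves right in every state, which matches the reward layout: D pays the most, and pressing right while in D keeps the agent there.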
