**Machine Learning**


Machine learning is a subfield of artificial intelligence concerned with algorithms that learn from data and use what they have learned to make predictions or decisions. Such systems improve their performance on a specific task through experience, without being explicitly programmed with task-specific rules.

**Supervised learning vs. unsupervised learning**
Supervised learning is a type of machine learning where the algorithm is trained on labeled data: each training example is paired with an output label. The goal is to learn a mapping from inputs to outputs that can be used to predict the labels of new, unseen examples. Common tasks: classification and regression.

Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data. The goal is to find hidden patterns or intrinsic structure in the input data. Common tasks: clustering and dimensionality reduction.
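
The contrast can be illustrated with a minimal sketch on the iris dataset. The choice of `LogisticRegression` and `KMeans`, and their parameter values, are assumptions made for illustration; any classifier or clustering algorithm would make the same point.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Supervised: the classifier is given both the features X and the labels y
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
supervised_pred = clf.predict(X[:5])

# Unsupervised: the clustering algorithm is given only X and never sees y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

print(f"Predicted labels for the first 5 samples: {supervised_pred}")
print(f"Cluster ids for the first 5 samples: {cluster_ids[:5]}")
```

Note that the cluster ids are arbitrary integers: cluster 0 need not correspond to class 0, because the clustering never had access to the labels.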

**Validation set**
A validation set is a portion of the data held out from training and used to evaluate a model while it is being developed. It is used to tune hyperparameters and to make decisions about the model architecture. Comparing training and validation performance helps detect overfitting to the training data. Because these choices are themselves fitted to the validation set, the final estimate of how well the model generalizes to new, unseen data should come from a separate test set.

**Repeated cross-validation**
Repeated cross-validation is a technique where the cross-validation process is performed multiple times, each time with a different random split of the data into folds. Within each repetition, every fold serves once as the validation set while the model is trained on the remaining folds. The scores from all folds and repetitions are then averaged to provide a more robust estimate of the model's performance.
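
A minimal sketch of repeated cross-validation using scikit-learn's `RepeatedStratifiedKFold` with `cross_val_score`. The model, the number of folds, and the number of repeats are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds repeated 3 times, each repeat with a different shuffle: 15 scores in total
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
model = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Number of scores: {len(scores)}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the standard deviation alongside the mean gives a sense of how much the estimate depends on the particular split.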

**Creating a validation set**

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% of the data as the test set; the rest is training+validation
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Split the remaining 80% into training and validation sets.
# 25% of the remaining 80% is 20% of the full dataset, giving a 60/20/20 split.
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42)

print(f"Training set size: {len(X_train)}")
print(f"Validation set size: {len(X_val)}")
print(f"Test set size: {len(X_test)}")


Training set size: 90
Validation set size: 30
Test set size: 30
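
One way the three sets might then be used is sketched below: the validation set selects a hyperparameter and the test set is touched only once at the end. The model (`DecisionTreeClassifier`) and the candidate `max_depth` values are assumptions for illustration, not part of the split above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same 60/20/20 split as in the cell above
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42)

# Try each candidate depth (illustrative values) and keep the best validation score
best_depth, best_val_acc = None, -1.0
for depth in [1, 2, 3, 4, 5]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    if val_acc > best_val_acc:
        best_depth, best_val_acc = depth, val_acc

# Refit on training+validation data with the chosen depth, then evaluate once on the test set
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final_model.fit(X_train_val, y_train_val)
test_acc = accuracy_score(y_test, final_model.predict(X_test))

print(f"Selected max_depth: {best_depth} (validation accuracy {best_val_acc:.3f})")
print(f"Test accuracy: {test_acc:.3f}")
```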


**Linear Regression Example**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic data: y = 4 + 3x + Gaussian noise
np.random.seed(42)  # fix the seed so the results are reproducible
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Make predictions
y_pred = lin_reg.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

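# Plot the data and the fitted line
plt.scatter(X, y, label='Data')
plt.plot(X_test, y_pred, color='red', label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()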
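
Because the synthetic data are generated as y = 4 + 3x plus noise, the fitted parameters can be checked against the true values 4 and 3. The sketch below regenerates data from the same model; the seed and the use of `np.random.default_rng` are assumptions, so the exact numbers will differ from the cell above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Regenerate data from the same model, y = 4 + 3x + noise (seed chosen arbitrarily)
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# With a 1-D target, intercept_ is a scalar and coef_ has one entry per feature
intercept = float(lin_reg.intercept_)
slope = float(lin_reg.coef_[0])
print(f"Estimated intercept: {intercept:.2f} (true value 4)")
print(f"Estimated slope: {slope:.2f} (true value 3)")
```

The estimates should land near the true values but not exactly on them, since the noise term shifts the least-squares fit.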


**Decision Tree Example**

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Decision Tree model (random_state fixed so the result is reproducible)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
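
To see what the tree has actually learned, the fitted rules can be printed with scikit-learn's `export_text`. This is a minimal sketch: `max_depth=2` is an arbitrary choice made only to keep the printout short, so the tree here is shallower than the one trained above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# A shallow tree (max_depth=2 is an illustrative choice) keeps the rule listing short
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X_train, y_train)

# Render the learned if/else splits as indented text
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is a threshold test on one feature, and each leaf reports the predicted class.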
