
# **Machine Learning**

Machine Learning is a subset of artificial intelligence concerned with developing algorithms and models that let computers learn from data without being explicitly programmed for each task. It focuses on recognizing patterns in existing data and using them to make predictions.

## **Why is Machine Learning important?**
*   Extract valuable insights from large datasets
*   Make data-driven decisions
*   Automate tasks
*   Improve predictions in domains such as healthcare, finance, and marketing

## **How does Machine Learning work?**
*   A model learns from historical data through a process called training.
*   During training, it identifies patterns and relationships between the input features and the corresponding target labels.
*   Once trained, the model can make predictions on new, unseen data.
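The three steps above can be sketched without any library. This is a minimal illustration, not a production method: the data is made up, and the helper names `fit_line` and `predict` are hypothetical. Training here means estimating a slope and intercept by ordinary least squares, and prediction means applying them to an input the model has not seen.

```python
# Toy historical data (made up for illustration):
# hours studied (input feature) -> exam score (target label).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Training: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Prediction: apply the learned relationship to a new input."""
    return slope * x + intercept


# Train on the historical data, then predict for an unseen input.
slope, intercept = fit_line(hours, scores)
new_prediction = predict(6.0, slope, intercept)

print("slope:", slope, "intercept:", intercept)
print("prediction for 6 hours:", new_prediction)
```

On this toy data the learned slope is 2.0 and the intercept is 50.0, so the prediction for 6 hours is 62.0. Real datasets are noisier and usually have many features, which is where a library such as scikit-learn becomes useful.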

## **Real-life examples of Machine Learning:**

*   Spam email filtering: Classifying emails as spam or non-spam based on their content.
*   Image recognition: Identifying objects or people in images.
*   Autonomous vehicles: Teaching cars to navigate and make decisions based on their surroundings.
*   Medical diagnosis: Assisting doctors in diagnosing diseases based on patient data.

## **Example code (Python) for a simple regression model:**

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
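# Load the California housing dataset: X holds the features, y the median house values
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)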

In [None]:
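# Fit an ordinary least-squares linear regression on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# Predict house values for the held-out test split
y_pred = model.predict(X_test)

# Evaluate: lower MSE is better; R-squared closer to 1 is better
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Linear Regression Mean Squared Error:", mse)
print("Linear Regression R-squared:", r2)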

Linear Regression Mean Squared Error: 0.5558915986952422
Linear Regression R-squared: 0.5757877060324524
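
To make the two reported metrics less opaque, here is a small sketch of how they can be computed by hand. The function names `mean_squared_error_manual` and `r2_score_manual` and the demo values are hypothetical, chosen only for illustration. The formulas are the standard definitions: MSE is the average squared error, and R-squared is one minus the ratio of residual to total sum of squares.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, small enough to check by hand.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.0, 5.0, 8.0, 9.0]

mse_demo = mean_squared_error_manual(y_true_demo, y_pred_demo)
r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
print("MSE:", mse_demo)
print("R-squared:", r2_demo)
```

Read this way, the R-squared of roughly 0.58 reported above means the linear model accounts for a little over half of the variance in the test targets, so there is clear room for improvement.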


# **Popular ML Algorithms (Decision Trees and K-Nearest Neighbors)**

## **Decision Trees:**

**What are Decision Trees?**
Decision Trees are tree-like models that use a series of binary decisions to classify data points. Each internal node tests a specific feature, each branch corresponds to an outcome of that test, and each leaf node holds a class label (or, for regression, a numeric value).

**Why are Decision Trees popular?**
Decision Trees are popular due to their interpretability, ease of visualization, and ability to handle both classification and regression tasks.

**How do Decision Trees work?**
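They recursively split the data: at each step the algorithm picks the feature and threshold that best separate the classes, divides the data accordingly, and repeats on each subset until the subsets are pure or a stopping condition is reached. Each path from the root to a leaf forms a decision rule.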
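
As a rough sketch of what "most informative" can mean, the code below scores candidate thresholds on a single feature using Gini impurity, which is the default split criterion of scikit-learn's `DecisionTreeClassifier`. The text above does not name a criterion, so treating it as Gini is an assumption; the helper names `gini` and `best_split` and the six sample values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for more mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    """Try the midpoint between each pair of adjacent unique values and
    return (threshold, weighted_impurity) for the lowest-impurity split."""
    n = len(labels)
    unique_values = sorted(set(values))
    best = (None, gini(labels))  # baseline: no split at all
    for low, high in zip(unique_values, unique_values[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up petal lengths (cm) for two species, for illustration only.
petal_lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]

threshold, impurity = best_split(petal_lengths, species)
print("best threshold:", threshold, "weighted impurity:", impurity)
```

Here the best threshold is 3.0, which separates the two species perfectly (impurity 0.0). A full tree would repeat this search over every feature and then recurse into each resulting subset.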

**Real-life example:**
Credit risk assessment: Deciding whether to approve a loan based on various features of the applicant.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

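# Load the Iris dataset: 4 flower measurements per sample, 3 species as classes
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fix random_state so tie-breaking between equally good splits is reproducible
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict species for the held-out samples
y_pred = model.predict(X_test)

# Accuracy is the fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print("Decision Tree Accuracy:", accuracy)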

Decision Tree Accuracy: 1.0
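
Since interpretability is one of the reasons given above for the popularity of Decision Trees, it may help to look at the rules a fitted tree has actually learned. The sketch below uses scikit-learn's `export_text` to print them as plain text. Limiting the tree with `max_depth=2` and fitting on the full dataset are assumptions made only to keep the printed rules short; they are not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A deliberately shallow tree so the printed rules stay readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Render the learned decision rules as indented text.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is a threshold test on one feature, and each `class:` line is a leaf, so a prediction can be traced by following the tests from top to bottom.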


## **K-Nearest Neighbors (KNN):**

**What is K-Nearest Neighbors?**
K-Nearest Neighbors is a simple and intuitive algorithm that classifies data points based on the majority class of their 'k' nearest neighbors in the feature space.

**Why is KNN popular?**
KNN is popular for its simplicity, ease of implementation, and the fact that it can be used for both classification and regression tasks.

**How does KNN work?**
Given a new data point, the algorithm identifies the 'k' nearest data points in the training set based on a distance metric (e.g., Euclidean distance). It then assigns the class label by taking a majority vote among those neighbors. For regression, it averages the neighbors' target values instead.
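
The procedure just described is short enough to write from scratch. The following is a minimal sketch for classification only, assuming Euclidean distance and Python 3.8+ (for `math.dist`); the function name `knn_predict` and the toy points are hypothetical. It is meant to show the idea, not to replace `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2D points forming two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.8, 1.5)))
print(knn_predict(train_points, train_labels, (6.2, 6.4)))
```

The first query sits inside group A and the second inside group B, so the votes are unanimous. Note that there is no training step: the algorithm simply stores the data and does all of its work at prediction time.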

**Real-life example:**
Product recommendation: Recommending products to customers based on the preferences of users with similar behavior.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

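# Load the Iris dataset: 4 flower measurements per sample, 3 species as classes
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Classify each point by a majority vote among its 3 nearest training samples
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict species for the held-out samples
y_pred = model.predict(X_test)

# Accuracy is the fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print("K-Nearest Neighbors Accuracy:", accuracy)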

K-Nearest Neighbors Accuracy: 1.0
