
## Machine Learning Overview

Machine Learning (ML) is a subset of Artificial Intelligence that focuses on developing algorithms and statistical models to enable computers to improve performance on a specific task through experience. Below are key concepts and major topics in ML:

### Supervised Learning

Supervised learning involves training algorithms on labeled data. The main types of problems are:

- **Classification:** Predicting a categorical output (e.g., spam detection, image recognition).
- **Regression:** Predicting a continuous output (e.g., house price prediction, stock price forecasting).

Common algorithms:
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- Neural Networks
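
As a rough illustration of the classification case, the sketch below trains a logistic regression model on labeled data and checks it on a held-out split. The synthetic dataset from `make_classification` and the names `X_clf`, `y_clf`, and `clf` are assumptions made for this example, not part of the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data (assumed for illustration): 200 samples, 5 features, 2 classes
X_clf, y_clf = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)

# Hold out 25% of the labeled examples for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clf, y_clf, test_size=0.25, random_state=0
)

# Fit on the training labels, then predict labels for unseen inputs
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))

print("Test accuracy:", acc)
```

Swapping `LogisticRegression` for another classifier from the list above should leave the rest of the workflow unchanged, since scikit-learn estimators share the same `fit` and `predict` interface.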

### Unsupervised Learning

Unsupervised learning deals with unlabeled data to uncover patterns or structures. Key tasks include:

- **Clustering:** Grouping similar data points (e.g., customer segmentation).
- **Dimensionality Reduction:** Reducing the number of features while preserving as much information as possible (e.g., projecting data to 2D for visualization).

Popular algorithms:
- K-Means Clustering
- Hierarchical Clustering
- Principal Component Analysis (PCA)
- t-SNE (t-Distributed Stochastic Neighbor Embedding)
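
Both tasks can be sketched in a few lines with scikit-learn. The example below clusters unlabeled points with K-Means and then projects them to two dimensions with PCA; the synthetic blobs and the variable names are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic unlabeled data (assumed): 150 points in 4 dimensions around 3 centers
X_blobs, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

# Clustering: assign each point to one of 3 groups without using any labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_blobs)

# Dimensionality reduction: keep the 2 directions of highest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_blobs)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```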

### Feature Engineering

Feature engineering improves model performance by creating or modifying features:
- One-hot encoding for categorical variables
- Scaling and normalization
- Binning and discretization
- Interaction and polynomial features
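
The four techniques above map onto scikit-learn preprocessing transformers. The following is a minimal sketch on tiny hand-made arrays; the data values and variable names are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import (
    KBinsDiscretizer,
    OneHotEncoder,
    PolynomialFeatures,
    StandardScaler,
)

# One-hot encoding: one binary column per category (sorted: blue, green, red)
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()

# Scaling: rescale a numeric column to zero mean and unit variance
ages = np.array([[18.0], [25.0], [40.0], [63.0]])
scaled = StandardScaler().fit_transform(ages)

# Binning: replace each value with the index of its equal-width bin
binned = KBinsDiscretizer(
    n_bins=2, encode="ordinal", strategy="uniform"
).fit_transform(ages)

# Interaction and polynomial features: x1, x2, x1^2, x1*x2, x2^2
pairs = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(pairs)

print(onehot.shape, scaled.ravel(), binned.ravel(), poly)
```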

### Model Evaluation and Validation

Evaluating ML models involves techniques and metrics:
- **Cross-validation:** e.g., k-fold cross-validation.
- **Metrics for classification:** Accuracy, Precision, Recall, F1-score, ROC-AUC.
- **Metrics for regression:** Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared.
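
A short sketch of how these pieces might be computed with scikit-learn follows. The synthetic classification data and the three-point regression example are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    f1_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split

X_ev, y_ev = make_classification(n_samples=300, n_features=6, random_state=1)
model_ev = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: one accuracy score per held-out fold
cv_scores = cross_val_score(model_ev, X_ev, y_ev, cv=5, scoring="accuracy")

# Classification metrics on a single held-out split
X_tr, X_te, y_tr, y_te = train_test_split(X_ev, y_ev, test_size=0.3, random_state=1)
model_ev.fit(X_tr, y_tr)
y_hat = model_ev.predict(X_te)
precision = precision_score(y_te, y_hat)
recall = recall_score(y_te, y_hat)
f1 = f1_score(y_te, y_hat)
# ROC-AUC needs scores or probabilities rather than hard labels
auc = roc_auc_score(y_te, model_ev.predict_proba(X_te)[:, 1])

# Regression metrics on a tiny hand-made example
y_true = np.array([3.0, 5.0, 7.0])
y_guess = np.array([2.5, 5.0, 8.0])
mse = mean_squared_error(y_true, y_guess)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_guess)

print("CV accuracy per fold:", cv_scores)
print("Precision, recall, F1, AUC:", precision, recall, f1, auc)
print("MSE, RMSE, R-squared:", mse, rmse, r2)
```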

### Bias-Variance Tradeoff

Understanding the tradeoff between model complexity and generalization:
- **Bias:** Error due to oversimplified assumptions.
- **Variance:** Error due to sensitivity to training data.
- **Overfitting:** High variance, low bias.
- **Underfitting:** High bias, low variance.
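
One way to see the tradeoff is to fit polynomials of increasing degree to the same noisy data and compare training error with error on fresh data. The sketch below assumes a noisy sine curve as the underlying function; the degrees 1, 4, and 15 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def noisy_sine(n):
    """Draw n points from a sine curve with Gaussian noise (assumed data)."""
    X = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=n)
    return X, y

X_bv, y_bv = noisy_sine(40)     # small training set
X_new, y_new = noisy_sine(200)  # fresh data from the same distribution

errors = {}
for degree in (1, 4, 15):
    poly_model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    poly_model.fit(X_bv, y_bv)
    train_mse = mean_squared_error(y_bv, poly_model.predict(X_bv))
    new_mse = mean_squared_error(y_new, poly_model.predict(X_new))
    errors[degree] = (train_mse, new_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  new-data MSE={new_mse:.3f}")
```

The degree-1 model is expected to show high error on both sets (underfitting). A very high degree typically drives training error down while error on new data stops improving or gets worse (overfitting), although the exact numbers depend on the random draw.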

### Regularization

Regularization prevents overfitting and improves generalization:
- L1 Regularization (Lasso)
- L2 Regularization (Ridge)
- Elastic Net (L1 + L2)
- Dropout (for neural networks)
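
The effect of the first three penalties can be sketched by fitting each model to the same data and comparing coefficients. In the example below only two of ten features actually influence the target; the data, the `alpha` values, and the variable names are assumptions for illustration. Dropout is specific to neural networks and is not shown.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)

# Assumed data: 10 features, only the first two carry signal
X_reg = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0] + [0.0] * 8)
y_reg = X_reg @ true_coef + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X_reg, y_reg)                  # no penalty
ridge = Ridge(alpha=10.0).fit(X_reg, y_reg)                 # L2 penalty
lasso = Lasso(alpha=0.5).fit(X_reg, y_reg)                  # L1 penalty
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_reg, y_reg)  # L1 + L2

for name, est in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso), ("ElasticNet", enet)]:
    print(
        f"{name:10s} coefficient norm={np.linalg.norm(est.coef_):.3f}  "
        f"nonzero coefficients={np.count_nonzero(est.coef_)}"
    )
```

Ridge shrinks all coefficients toward zero, while Lasso tends to set the coefficients of uninformative features to exactly zero, which is why L1 regularization is often used for feature selection.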

### Ensemble Methods

Combining models for better performance:
- **Bagging:** Random Forests.
- **Boosting:** AdaBoost, Gradient Boosting.
- **Stacking:** Combining diverse models.
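
The three strategies can be compared side by side with scikit-learn's ensemble module. This is a minimal sketch on synthetic data; the choice of base models for the stacked ensemble is an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_ens, y_ens = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_ens, y_ens, test_size=0.3, random_state=0)

ensembles = {
    # Bagging: many trees trained on bootstrap samples, predictions averaged
    "bagging (random forest)": RandomForestClassifier(n_estimators=50, random_state=0),
    # Boosting: trees added one at a time, each correcting the previous ones
    "boosting (gradient boosting)": GradientBoostingClassifier(random_state=0),
    # Stacking: a final model learns how to combine the base models' predictions
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {name: est.fit(X_tr, y_tr).score(X_te, y_te) for name, est in ensembles.items()}
for name, score in scores.items():
    print(f"{name}: test accuracy = {score:.3f}")
```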

### Deep Learning

Deep learning uses neural networks with multiple layers for complex tasks:
- **Convolutional Neural Networks (CNNs):** Image processing.
- **Recurrent Neural Networks (RNNs):** Sequential data.
- **Transformers:** Natural language processing.
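
To make "multiple layers" concrete, here is a forward pass through a tiny two-layer fully connected network written in plain NumPy. It is a sketch only: the layer sizes and random weights are arbitrary assumptions, no training happens, and real CNNs, RNNs, and Transformers are built with dedicated deep learning frameworks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Element-wise non-linearity applied between layers."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# A batch of 4 inputs with 3 features each (assumed shapes)
X_batch = rng.normal(size=(4, 3))

# Layer 1: 3 inputs -> 8 hidden units; layer 2: 8 hidden units -> 2 classes
W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 2)), np.zeros(2)

hidden = relu(X_batch @ W1 + b1)     # first layer output
probs = softmax(hidden @ W2 + b2)    # class probabilities per input

print(probs)
```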

### Model Interpretability

Understanding model decisions:
- Feature importance
- SHAP (SHapley Additive exPlanations) values
- LIME (Local Interpretable Model-agnostic Explanations)
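
Feature importance can be sketched with scikit-learn alone. The example below compares a random forest's built-in impurity-based importances with permutation importance measured on held-out data. The synthetic dataset is an assumption for illustration; SHAP and LIME live in separate third-party packages and are not shown here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumed data: 5 features, of which only 2 influence the target
X_int, y_int = make_regression(
    n_samples=200, n_features=5, n_informative=2, noise=1.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_int, y_int, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Impurity-based importances: computed during training, normalized to sum to 1
impurity_importance = rf.feature_importances_

# Permutation importance: drop in test score when one feature is shuffled
perm = permutation_importance(rf, X_te, y_te, n_repeats=5, random_state=0)

for i in range(X_int.shape[1]):
    print(
        f"feature {i}: impurity={impurity_importance[i]:.3f}  "
        f"permutation={perm.importances_mean[i]:.3f}"
    )
```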

## Interview Questions and Answers

1. **What is the difference between supervised and unsupervised learning?**  
   Supervised learning uses labeled data to train models, while unsupervised learning works with unlabeled data to discover patterns or structures.

2. **Explain the bias-variance tradeoff.**  
   As model complexity increases, bias typically falls and variance rises. High bias leads to underfitting, while high variance leads to overfitting, so the goal is a level of complexity that minimizes error on unseen data.

3. **What is feature engineering, and why is it important?**  
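   Feature engineering creates or modifies features so that a model can learn from them more easily. It matters because well-chosen features encode domain knowledge and often improve accuracy more than switching to a different algorithm.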

4. **How do you handle missing data in a dataset?**  
   Approaches include:
   - Removing rows with missing values.
   - Imputing with mean, median, or mode.
   - Advanced imputation (e.g., KNN or regression).
   - Using algorithms that handle missing values natively (e.g., some tree-based implementations, such as scikit-learn's histogram-based gradient boosting).

5. **What is the purpose of regularization in ML?**  
   Regularization prevents overfitting by adding a penalty term to the loss function, improving the model’s ability to generalize.
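
The first three approaches to missing data from question 4 can be sketched as follows. The small array with `NaN` entries is made up for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Assumed toy data: NaN marks a missing value
X_missing = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
    [4.0, 6.0],
])

# Option 1: drop every row that contains a missing value
rows_kept = X_missing[~np.isnan(X_missing).any(axis=1)]

# Option 2: replace each missing value with its column mean
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)

# Option 3: fill each missing value from the 2 most similar rows
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(X_missing)

print("Rows kept after dropping:", rows_kept.shape[0])
print("Mean-imputed:\n", mean_imputed)
print("KNN-imputed:\n", knn_imputed)
```

Dropping rows is the simplest option but discards data, so imputation is usually preferred when missing values are common.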

These questions serve as a foundation. Practical experience and a deep understanding of ML concepts are essential for success in interviews.


### Coding Task

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic data for regression
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X + np.random.randn(100, 1) * 2

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print("Model Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("Mean Squared Error:", mse)
```


Output:

```
Model Coefficients: [[2.40982601]]
Intercept: [0.25312854]
Mean Squared Error: 4.6834477008785065
```


### Practice

- Try polynomial regression instead of linear regression.
- Visualize the regression line along with the data points.