# Contents
- API Structure
- Built-in Datasets
- Supervised Learning Algorithms
- Unsupervised Learning Algorithms

# API Structure
Scikit-learn follows a consistent, simple API across all its modules. It is built around three object roles (estimator, predictor, transformer), plus pipelines that chain them together:

## 1. Estimator:
- Any object that can estimate model parameters.
- Examples: `LinearRegression`, `KMeans`.
- Methods:
    - `fit(X, y)`: Train the model on data `X` (features) and `y` (target). Unsupervised estimators such as `KMeans` need only `fit(X)`.
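
A minimal sketch of the estimator role, assuming a made-up toy dataset that follows `y = 2x + 1` exactly. After `fit`, the learned parameters are available as attributes whose names end with an underscore.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration: y = 2x + 1 with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X, y)  # fit() returns the estimator itself

# Learned parameters are stored on the fitted object
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)
```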
## 2. Predictor:
- Any object capable of making predictions.
- Examples: `RandomForestClassifier`, `SVR`.
- Methods:
    - `predict(X)`: Predict target values for input `X`.
    - `score(X, y)`: Evaluate model performance with a default metric (mean accuracy for classifiers, R² for regressors).
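
The predictor methods can be sketched as follows. The train/test split via `train_test_split`, the Iris data, and the hyperparameters are illustrative choices, not something the notes above prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out a quarter of the rows so the score is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)           # one predicted label per test row
accuracy = clf.score(X_test, y_test)   # mean accuracy for a classifier
print(accuracy)
```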
## 3. Transformer:
- Objects used for data transformation.
- Examples: `StandardScaler`, `PCA`.
- Methods:
    - `fit(X)`: Learn transformation parameters from `X`.
    - `transform(X)`: Apply the learned transformation to `X`.
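
A short sketch of the two transformer steps, using small arrays made up for illustration. The point is that `transform` reuses the statistics learned in `fit`, so new data is scaled with the training mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: two features on very different scales
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = StandardScaler()
scaler.fit(X_train)                         # learns per-column mean and std
X_train_scaled = scaler.transform(X_train)  # zero mean, unit variance per column
X_new_scaled = scaler.transform(X_new)      # scaled with the training statistics

print(scaler.mean_)
print(X_new_scaled)
```

`scaler.fit_transform(X_train)` performs both steps in one call on the training data.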
## 4. Pipeline:
- Combines preprocessing and modeling steps into a single workflow.
- Example: Scaling + Logistic Regression in one pipeline.
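
The scaling + logistic regression example could look roughly like this. The step names (`"scaler"`, `"clf"`), the Breast Cancer dataset, and `max_iter=1000` are assumptions chosen for the sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Each step is a (name, object) pair; every step but the last must be a transformer
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(X_train, y_train)                  # fits the scaler, then the classifier
test_accuracy = pipe.score(X_test, y_test)  # scales X_test with training statistics first
print(test_accuracy)
```

Because the scaler is fitted inside the pipeline, it only ever sees the training rows, which avoids leaking test-set statistics into preprocessing.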

# Built-in Datasets
## Types
### 1. Small Toy Datasets:
- Preloaded datasets with small sizes, suitable for experimentation.
- Examples: Iris, Wine, Breast Cancer, Digits.
### 2. Real-world Datasets:
- Larger datasets that are fetched over the network rather than bundled with the library.
- Examples: California Housing, 20 Newsgroups, and OpenML datasets.
### 3. Synthetic Datasets:
- Programmatically generated datasets for testing.
- Examples: `make_classification`, `make_regression`.
## Exploring Built-in Datasets
The `load_*` and `fetch_*` functions return a `Bunch` object, which acts like a dictionary whose keys are also accessible as attributes. Key attributes include:

- `data`: Features (independent variables).
- `target`: Labels (dependent variable).
- `DESCR`: Description of the dataset.
- `feature_names`: Names of features (if available).
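
A quick way to inspect these attributes, sketched here with the Iris dataset:

```python
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)      # feature matrix: (n_samples, n_features)
print(iris.target.shape)    # one label per sample
print(iris.feature_names)   # column names for iris.data
print(iris.DESCR[:200])     # first 200 characters of the description

# Dictionary-style access returns the same objects as attribute access
print(list(iris.keys()))
same_object = iris["data"] is iris.data
```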
## Dataset Functions
### Preloaded Small Datasets
These datasets ship with scikit-learn and are loaded with dedicated `load_*` functions; no download is needed.
- `load_iris()`: Iris flower dataset for classification.
- `load_wine()`: Wine classification dataset.
- `load_breast_cancer()`: Breast cancer classification dataset.
- `load_digits()`: Handwritten digits dataset for image classification.

In [1]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target 

### Downloadable Real-world Datasets
These datasets are larger; the `fetch_*` functions download them on the first call and cache them locally for later use:
- `fetch_california_housing()`: California housing dataset for regression.
- `fetch_20newsgroups()`: Text dataset for classification.

In [2]:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X = housing.data
y = housing.target

### Synthetic Dataset Generators
Generate synthetic datasets with controlled properties:
- `make_classification()`: Generate datasets for classification tasks.
- `make_regression()`: Generate datasets for regression tasks.

In [6]:
from sklearn.datasets import make_classification
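# Two classes with one cluster each; n_classes=1 would give every sample the same label
X, y = make_classification(n_features=2, n_informative=1, n_redundant=0, n_classes=2, n_clusters_per_class=1)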
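
The regression generator works the same way. In this sketch the sample count, noise level, and seed are arbitrary illustrative values. Unlike the loaders, the generators return plain `(X, y)` arrays rather than a `Bunch`.

```python
from sklearn.datasets import make_regression

# 200 samples, 3 features of which 2 actually influence the target
X, y = make_regression(
    n_samples=200,
    n_features=3,
    n_informative=2,
    noise=5.0,        # standard deviation of Gaussian noise added to y
    random_state=42,  # fixed seed so the data is reproducible
)
print(X.shape, y.shape)
```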

# Supervised Learning Algorithms
- Regression
    - Linear Regression
    - Polynomial Regression
    - Ridge and Lasso Regression
    - Decision Tree Regression
    - Random Forest Regression
    - Support Vector Regression (SVR)
- Classification
    - Logistic Regression
    - k-Nearest Neighbors (KNN)
    - Support Vector Machines (SVM)
    - Decision Tree Classifier
    - Random Forest Classifier
    - Gradient Boosting Classifier (including XGBoost, a separate library with a scikit-learn-compatible API)
    - Naive Bayes
    - Neural Networks (basic implementation using `MLPClassifier`)
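
Because all of these models share the `fit` / `predict` / `score` interface, they can be swapped freely. A rough sketch with four of the classifiers listed above; the Wine dataset and the hyperparameters are illustrative assumptions, and the scores are not tuned results.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

# The loop body is identical for every model type
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out rows

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```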

# Unsupervised Learning Algorithms
- Clustering
    - k-Means Clustering
    - Hierarchical Clustering
    - DBSCAN
- Dimensionality Reduction
    - Principal Component Analysis (PCA)
    - t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Anomaly Detection
    - Isolation Forest
    - One-Class SVM
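
One algorithm from each group, sketched on synthetic blobs. `make_blobs` and all parameter values here are illustrative assumptions. Unsupervised estimators are fitted on `X` alone, and `fit_predict` / `fit_transform` combine fitting with producing the output.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic data: 300 points around 3 centers in 5 dimensions (labels discarded)
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: assign each point to one of 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Dimensionality reduction: project 5 features down to 2 components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Anomaly detection: 1 marks an inlier, -1 marks an outlier
iso = IsolationForest(random_state=0)
outlier_flags = iso.fit_predict(X)

print(cluster_labels[:10], X_2d.shape, outlier_flags[:10])
```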