# Data Science & Machine Learning Basics

Rafiq Islam  
2024-09-20

<img src="/_assets/images/uc.jpeg" alt="Post under construction" width="400" height="400"/>

This page is my personal repository of the most common and useful machine
learning algorithms in Python, along with other data science tips and tricks.

## **$\text{Data Science}$**

Data science involves extracting knowledge from structured and
unstructured data. It combines principles from statistics, machine
learning, data analysis, and domain knowledge to understand and interpret
the data.

#### Data Collection & Acquisition

-   **<a href="../../dsandml/datacollection/index.qmd" target="_blank"
    style="text-decoration:none">Web scraping</a>:** Data collection
    through web scraping  
-   API integration  
-   Data Lakes, Data Warehouses

#### Data Cleaning & Preprocessing

-   Handling Missing Values  
-   Data Transformation  
-   Feature Engineering and Selection  
-   Encoding Categorical Variables  
-   Handling Outliers
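
Two of the steps above, handling missing values and encoding categorical variables, can be sketched in plain Python. This is only a minimal illustration: the toy records and the helper names `impute_mean` and `one_hot` are my own assumptions, and in a real project a library such as pandas or scikit-learn would normally do this work.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "city": "Dhaka"},
    {"age": None, "city": "Austin"},
    {"age": 35, "city": "Dhaka"},
]

def impute_mean(rows, key):
    """Replace missing values of a numeric column with the column mean."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def one_hot(rows, key):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[key] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != key}
        for c in categories:
            new_row[f"{key}_{c}"] = int(r[key] == c)
        encoded.append(new_row)
    return encoded

clean = one_hot(impute_mean(rows, "age"), "city")
for row in clean:
    print(row)
```

Mean imputation is just one simple choice; median imputation or dropping rows may suit skewed or sparse data better.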

#### Exploratory Data Analysis (EDA)

-   Descriptive Statistics  
-   Data Visualization  
-   Identifying Patterns, Trends, Correlations
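
As a small illustration of descriptive statistics and correlation, here is a sketch that uses only the Python standard library. The `hours` and `scores` lists are made-up data, and `pearson` is a hand-written helper assumed for this example rather than a library function.

```python
from math import sqrt
from statistics import mean, median, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

summary = {
    "mean": mean(scores),
    "median": median(scores),
    "stdev": stdev(scores),  # sample standard deviation
}
r = pearson(hours, scores)

print(summary)
print(f"correlation between hours and scores: {r:.3f}")
```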

#### Statistical Methods

-   **<a href="../../dsandml/dataengineering/index.qmd" target="_blank"
    style="text-decoration:none">ANOVA - Categorical Features</a>:**
    How do we treat the categorical features for our data science
    project?
-   Hypothesis Testing  
-   Probability Distributions  
-   Inferential Statistics  
-   Sampling Methods

#### Big Data Techniques

-   Hadoop, Spark  
-   Distributed Data Storage (e.g., HDFS, NoSQL)
-   Data Pipelines, ETL (Extract, Transform, Load)

## **$\text{Machine Learning Algorithms}$**

### $\text{Supervised Learning}$

(Training with labeled data: input-output pairs)

#### **Regression**

##### Parametric

-   <a href="../../dsandml/simplelinreg/index.qmd" target="_blank"
    style="text-decoration:none">Simple Linear Regression</a>
-   <a href="../../dsandml/multiplelinreg/index.qmd" target="_blank"
    style="text-decoration:none">Multiple Linear Regression</a>
-   <a href="../../dsandml/polyreg/index.qmd" target="_blank"
    style="text-decoration:none">Polynomial Regression</a>
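
To make the parametric idea concrete, here is a minimal sketch of simple linear regression using the closed-form least-squares solution. The function name `fit_line` and the toy data are assumptions for illustration; the linked posts cover the details.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The model is fully described by two numbers, the slope and the intercept, no matter how many training points there are. That fixed number of parameters is what makes it parametric.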

##### Non-Parametric

-   <a href="../../dsandml/knn/index.qmd" target="_blank"
    style="text-decoration:none">K-Nearest Neighbor (KNN) Regression</a>
-   <a href="../../dsandml/decisiontree/index.qmd" target="_blank"
    style="text-decoration:none">Decision Tree Regression</a>
-   <a href="../../dsandml/randomforest/index.qmd" target="_blank"
    style="text-decoration:none">Random Forest Regression</a>
-   <a href="../../dsandml/svm/index.qmd" target="_blank"
    style="text-decoration:none">Support Vector Machine (SVM) Regression</a>

#### **Classification**

##### Parametric

-   <a href="../../dsandml/logreg/index.qmd" target="_blank"
    style="text-decoration:none">Logistic Regression</a>
-   <a href="../../dsandml/naivebayes/index.qmd" target="_blank"
    style="text-decoration:none">Naive Bayes</a>
-   <a href="../../dsandml/lda/index.qmd" target="_blank"
    style="text-decoration:none">Linear Discriminant Analysis (LDA)</a>  
-   Quadratic Discriminant Analysis (QDA)

##### Non-Parametric

-   <a href="../../dsandml/knn/index.qmd" target="_blank"
    style="text-decoration:none">KNN Classification</a>
-   <a href="../../dsandml/decisiontree/index.qmd" target="_blank"
    style="text-decoration:none">Decision Tree Classification</a>
-   <a href="../../dsandml/randomforest/index.qmd" target="_blank"
    style="text-decoration:none">Random Forest Classification</a>
-   <a href="../../dsandml/svm/index.qmd" target="_blank"
    style="text-decoration:none">Support Vector Machine (SVM)
    Classification</a>

##### Multi-Class Classification

-   <a href="../../dsandml/multiclass/index.qmd" target="_blank"
    style="text-decoration:none">Multi-class Classification</a>

##### Bayesian or Probabilistic Classification

-   <a href="../../dsandml/bayesianclassification/index.qmd" target="_blank"
    style="text-decoration:none">What is Bayesian or Probabilistic
    Classification?</a>  
-   <a href="../../dsandml/lda/index.qmd" target="_blank"
    style="text-decoration:none">Linear Discriminant Analysis (LDA)</a>  
-   Quadratic Discriminant Analysis (QDA)  
-   Naive Bayes
-   Bayesian Network Classifier (Tree Augmented Naive Bayes (TAN))

##### Non-probabilistic Classification

-   <a href="../../dsandml/svm/index.qmd" target="_blank"
    style="text-decoration:none">Support Vector Machine (SVM)
    Classification</a>  
-   <a href="../../dsandml/decisiontree/index.qmd" target="_blank"
    style="text-decoration:none">Decision Tree Classification</a>  
-   <a href="../../dsandml/randomforest/index.qmd" target="_blank"
    style="text-decoration:none">Random Forest Classification</a>  
-   <a href="../../dsandml/knn/index.qmd" target="_blank"
    style="text-decoration:none">KNN Classification</a>  
-   Perceptron

### $\text{Unsupervised Learning}$

(Training with unlabeled data)

##### Clustering

-   <a href="../../dsandml/kmeans/index.qmd" target="_blank"
    style="text-decoration:none">k-Means Clustering</a>  
-   Hierarchical Clustering  
-   DBSCAN (Density-Based Spatial Clustering)  
-   Gaussian Mixture Models (GMM)
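
As a rough sketch of how k-means works, the code below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The naive initialization (the first `k` points) and the toy data are assumptions made to keep the example deterministic; practical implementations use smarter initialization such as k-means++.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100):
    # Assumption: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```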

##### Dimensionality Reduction

-   <a href="../../dsandml/pca/index.qmd" target="_blank"
    style="text-decoration:none">Principal Component Analysis</a>  
-   Latent Dirichlet Allocation (LDA)
-   t-SNE (t-distributed Stochastic Neighbor Embedding)  
-   Factor Analysis  
-   Autoencoders  

##### Anomaly Detection

-   Isolation Forests  
-   One-Class SVM

### $\text{Semi-Supervised Learning}$

(Combination of labeled and unlabeled data)

-   Self-training  
-   Co-training  
-   Label Propagation

### $\text{Reinforcement Learning}$

(Learning via rewards and penalties)

-   Markov Decision Process (MDP)  
-   Q-Learning  
-   Deep Q-Networks (DQN)  
-   Policy Gradient Method
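
The core of tabular Q-learning is a single update rule, sketched below. The state and action names, the learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions, not values taken from any particular environment.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(Q[next_state].values())
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Hypothetical two-state problem with two actions per state.
Q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

new_value = q_update(Q, "s0", "right", reward=1.0, next_state="s1")
print(new_value)
```

A full agent would wrap this update in a loop that picks actions (for example epsilon-greedy) and observes rewards from an environment.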

## **$\text{Deep Learning}$**

-   <a href="../../dsandml/pytorch/index.qmd" target="_blank"
    style="text-decoration:none">PyTorch</a>  
-   Artificial Neural Networks (ANN)  
-   Convolutional Neural Networks (CNN)  
-   Recurrent Neural Networks (RNN)  
-   Long Short-Term Memory (LSTM)  
-   Generative Adversarial Networks (GAN)

## **$\text{Model Evaluation and Fine Tuning}$**

#### Model Evaluation Metrics

-   **For Regression:** Mean Absolute Error (MAE), Mean Squared Error
    (MSE), Root Mean Squared Error (RMSE), $R^2$ score  
-   **For Classification:**
    <a href="../../dsandml/classificationmetrics/index.qmd" target="_blank"
    style="text-decoration:none">Accuracy, Precision, Recall, F1 Score,
    ROC-AUC</a>  
-   **Cross-validation:** k-fold, stratified k-fold, leave-one-out
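
The regression and classification metrics listed above can be computed by hand in a few lines. The sketch below is for intuition only; the function names and toy labels are assumptions, and in practice `sklearn.metrics` provides tested implementations.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse), "r2": 1 - ss_res / ss_tot}

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical predictions.
reg = regression_metrics([3, 5, 7, 9], [2, 5, 8, 10])
clf = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
print(reg)
print(clf)
```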

#### Model Optimization

-   **Bias-Variance:**
    <a href="../../dsandml/biasvariance/index.qmd" target="_blank"
    style="text-decoration:none">Bias-Variance Trade-off</a>  
-   **Hyperparameter Tuning:** Grid Search, Random Search, Bayesian
    Optimization  
-   **Feature Selection Techniques:** Recursive Feature Elimination
    (RFE),
    <a href="../../dsandml/regularization/index.qmd" target="_blank"
    style="text-decoration:none">L1 or Lasso Regularization</a>,
    <a href="../../dsandml/regularization/index.qmd" target="_blank"
    style="text-decoration:none">L2 or Ridge Regularization</a>  
-   **Model Interpretability:** SHAP (Shapley values), LIME (Local
    Interpretable Model-agnostic Explanations)
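
Grid search combined with k-fold cross-validation can be sketched as follows. To keep the example self-contained I assume a one-feature ridge regression without an intercept, whose closed-form solution is $w = \sum x_i y_i / (\sum x_i^2 + \lambda)$. The helper names, the grid of $\lambda$ values, and the noiseless toy data are all assumptions for illustration.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def ridge_fit(xs, ys, lam):
    """Closed-form ridge weight for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=3):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        w = ridge_fit([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in test) / len(test))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical noiseless data from y = 2x, so no regularization should win.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

grid = [0.0, 1.0, 10.0]  # candidate regularization strengths
scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(scores, best_lam)
```

With noisy data a nonzero $\lambda$ can win instead, which is the point of tuning it on held-out folds rather than on the training data.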

#### Ensemble Methods

-   **Bagging:**
    <a href="../../dsandml/randomforest/index.qmd" target="_blank"
    style="text-decoration:none">Random Forest</a>, Bootstrap
    Aggregating  
-   **Boosting:**
    <a href="../../dsandml/gradientboosting/index.qmd" target="_blank"
    style="text-decoration:none">Gradient Boosting</a>, AdaBoost,
    XGBoost, CatBoost  
-   **Stacking:** Stacked Generalization

<table data-quarto-postprocess="true">
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
<col style="width: 33%" />
</colgroup>
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">Learning Type</th>
<th data-quarto-table-cell-role="th">Parametric</th>
<th data-quarto-table-cell-role="th">Non-Parametric</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">Supervised</td>
<td data-quarto-table-cell-role="th"><ul>
<li><a href="../../dsandml/simplelinreg/index.qmd"
style="text-decoration:none" target="_blank">Simple Linear
Regression</a></li>
<li><a href="../../dsandml/multiplelinreg/index.qmd"
style="text-decoration:none" target="_blank">Multiple Linear
Regression</a></li>
<li><a href="../../dsandml/polyreg/index.qmd"
style="text-decoration:none" target="_blank">Polynomial
Regression</a></li>
<li><a href="../../dsandml/logreg/index.qmd"
style="text-decoration:none" target="_blank">Logistic
Regression</a></li>
<li><a href="../../dsandml/naivebayes/index.qmd"
style="text-decoration:none" target="_blank">Naive Bayes</a></li>
</ul></td>
<td data-quarto-table-cell-role="th"><ul>
<li><a href="../../dsandml/knn/index.qmd" style="text-decoration:none"
target="_blank">KNN Regression and Classification</a></li>
<li><a href="../../dsandml/decisiontree/index.qmd"
style="text-decoration:none" target="_blank">Decision Trees</a></li>
<li><a href="../../dsandml/randomforest/index.qmd"
style="text-decoration:none" target="_blank">Random Forest</a></li>
<li><span style="text-decoration:none">Support Vector Machine
(SVM)</span></li>
</ul></td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">Unsupervised</td>
<td data-quarto-table-cell-role="th"><ul>
<li><a href="../../dsandml/pca/index.qmd" style="text-decoration:none"
target="_blank">Principal Component Analysis (PCA)</a></li>
<li><span style="text-decoration:none">Gaussian Mixture Model (GMM)</span></li>
<li><span style="text-decoration:none">Latent Dirichlet Allocation
(LDA)</span></li>
</ul></td>
<td data-quarto-table-cell-role="th"><ul>
<li><a href="../../dsandml/kmeans/index.qmd" style="text-decoration:none"
target="_blank">K-Means</a></li>
<li><span style="text-decoration:none">Hierarchical Clustering</span></li>
<li><span style="text-decoration:none">Density-Based Spatial Clustering of
Applications with Noise (DBSCAN)</span></li>
</ul></td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">Semi-Supervised</td>
<td data-quarto-table-cell-role="th">Self-training</td>
<td></td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">Reinforcement Learning</td>
<td data-quarto-table-cell-role="th"><ul>
<li><span style="text-decoration:none">Q-Learning</span></li>
<li><span style="text-decoration:none">DQN</span></li>
<li><span style="text-decoration:none">Policy Gradient</span></li>
</ul></td>
<td></td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">Dimensionality Reduction</td>
<td data-quarto-table-cell-role="th"><ul>
<li><a href="../../dsandml/pca/index.qmd" style="text-decoration:none"
target="_blank">Principal Component Analysis (PCA)</a></li>
<li><span style="text-decoration:none">Linear Discriminant Analysis
(LDA)</span></li>
</ul></td>
<td data-quarto-table-cell-role="th"><ul>
<li><span style="text-decoration:none">t-SNE</span></li>
<li><span style="text-decoration:none">Autoencoders</span></li>
</ul></td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">Ensemble Methods</td>
<td data-quarto-table-cell-role="th"><ul>
<li><span style="text-decoration:none">Bagging</span></li>
<li><a href="../../dsandml/gradientboosting/index.qmd"
style="text-decoration:none" target="_blank">Gradient Boosting</a></li>
</ul></td>
<td data-quarto-table-cell-role="th"><ul>
<li><span style="text-decoration:none">Stacking</span></li>
</ul></td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">Deep Learning</td>
<td data-quarto-table-cell-role="th"><ul>
<li><span style="text-decoration:none">Artificial Neural Networks
(ANN)</span></li>
<li><span style="text-decoration:none">Convolutional Neural Networks
(CNN)</span></li>
<li><span style="text-decoration:none">Recurrent Neural Networks
(RNN)</span></li>
<li><span style="text-decoration:none">Long Short-Term Memory (LSTM)</span></li>
<li><span style="text-decoration:none">Generative Adversarial Networks
(GAN)</span></li>
</ul></td>
<td></td>
</tr>
</tbody>
</table>

| Techniques | Description |
|----|----|
| <a href="../../dsandml/dataengineering/index.qmd" style="text-decoration:none" target="_blank">Categorical Features</a> | How do we treat the categorical features for our data science project? |
| <a href="../../dsandml/datacollection/index.qmd" style="text-decoration:none" target="_blank">Web scraping</a> | Data collection through web scraping |
| <a href="../../dsandml/biasvariance/index.qmd" style="text-decoration:none" target="_blank">Bias-Variance</a> | Model Fine Tuning: Bias-Variance Trade-off |
| <a href="../../dsandml/regularization/index.qmd" style="text-decoration:none" target="_blank">Regularization</a> | Model Fine Tuning: Regularization |
