Machine Learning Startup File.ReadMe
Version | Date | Author | Notes
0.1 | 19 July 2023 | Ken Dizon | Initial version
0.2 | 26 July 2023 | Ken Dizon | Refined all ML categories
1.0 | 28 July 2023 | Ken Dizon | Alpha phase
Objective: write a sample ML script for initial startup. Machine learning (ML) gives computers the ability to learn from data without being explicitly programmed.
CONTENT
Libraries
Types of Machine Learning
Models & Math
Startup sample content
TYPES: Supervised vs Unsupervised
Supervised
Regression: predicts trends using labeled data
Classification: assigns class labels to new cases, learned from labeled data
Unsupervised
Clustering: finds patterns and groupings from unlabeled data
Data > Feature Extraction > ML Model > Outcome
ML Techniques | Types | Case Prediction
Regression | Simple, Multiple | continuous numerical value
Classification | Binary, Multi-class | class label for an unlabeled test case
Clustering | Partitioned, Hierarchical, Density | grouping of data points by similarity
Regression: numeric prediction, estimation, forecasting
Classification: detection, retention, diagnostics
Clustering: segmentation, target market, recommender systems
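Regression: ordinal, Poisson, linear, polynomial, lasso, Bayesian, neural network regression (NNR), decision forest regression, k-nearest neighbors regression (KNNR)
Classification: decision tree (DT), naive Bayes, linear discriminant analysis, k-nearest neighbors (KNN), logistic regression, neural network (NN), support vector machine (SVM)
Clustering: k-means, k-median, fuzzy c-means, agglomerative, divisive, DBSCAN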
Regression Model | Evaluation | Selection
Linear Regression | MAE, MSE, R2, RSS, variance score | Influence of the X variables in predicting a continuous numeric Y value
Decision Tree Regression [DTR] | MSE, MAE, accuracy | ...
Multiclass Prediction | ... | ...
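The evaluation metrics listed for linear regression can be computed by hand to make their definitions concrete. The sketch below uses only the standard library; the data values and the helper name `regression_metrics` are made up for illustration, and "variance score" is assumed to mean the explained variance score. In practice, scikit-learn's `sklearn.metrics` module provides equivalents such as `mean_absolute_error`, `mean_squared_error`, and `r2_score`.

```python
# Toy targets and predictions, invented for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RSS, R2, and explained variance for two equal-length lists."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    rss = sum(r * r for r in residuals)        # residual sum of squares
    mse = rss / n                              # mean squared error

    y_mean = sum(y_true) / n
    tss = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - rss / tss                            # coefficient of determination

    # Explained variance: 1 - Var(residuals) / Var(y_true).
    r_mean = sum(residuals) / n
    var_residuals = sum((r - r_mean) ** 2 for r in residuals) / n
    explained_variance = 1 - var_residuals / (tss / n)

    return {
        "MAE": mae,
        "MSE": mse,
        "RSS": rss,
        "R2": r2,
        "explained_variance": explained_variance,
    }


metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

R2 and explained variance differ here because the residuals have a non-zero mean; when the predictions are unbiased, the two scores coincide.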
Classification Model | Evaluation | Selection
DT | DT classification accuracy | Categorical identification based on characteristics
KNN | mean accuracy, std of accuracy | ...
Logistic Regression | Jaccard index, confusion matrix (F1-score), log loss | Predicts the probability of a categorical dependent variable
SVM | ... | Image recognition, text category assignment, spam detection, sentiment analysis, gene expression
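The classification metrics named above (Jaccard index, confusion matrix, F1-score, log loss) can be illustrated for the binary case with a short standard-library sketch. The labels, probabilities, and function names are hypothetical, and the code assumes class 1 is the positive class and that every probability lies strictly between 0 and 1. scikit-learn offers `jaccard_score`, `confusion_matrix`, `f1_score`, and `log_loss` for real use.

```python
import math

# Toy binary labels and predicted probabilities, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # predicted P(class == 1)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with class 1 treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_log_loss(y_true, y_prob):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
jaccard = tp / (tp + fp + fn)        # overlap of predicted and true positives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
logloss = binary_log_loss(y_true, y_prob)

print(f"confusion matrix: [[{tn}, {fp}], [{fn}, {tp}]]")
print(f"accuracy={accuracy:.3f} jaccard={jaccard:.3f} f1={f1:.3f} logloss={logloss:.3f}")
```

Accuracy, Jaccard, and F1 score the hard predictions, while log loss scores the probabilities, so a confident wrong prediction is penalized more heavily than a hesitant one.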
Clustering Model | Evaluation | Selection
k-means | non-overlapping clusters that minimize intra-cluster distance and maximize inter-cluster distance | relatively efficient (medium/large data)
... | ... | trees of clusters
... | ... | arbitrary-shaped clusters
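As a rough illustration of the k-means idea (non-overlapping clusters with small intra-cluster and large inter-cluster distance), here is a minimal standard-library sketch. The points are made up, the function names are hypothetical, and seeding the centroids with the first k points is a simplifying assumption; library implementations such as scikit-learn's `KMeans` use better initialization.

```python
import math

# Toy 2-D points forming two visible groups, invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = list(points[:k])  # assumption: first k points seed the centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


labels, centroids = kmeans(points, k=2)

# Intra-cluster: sum of squared distances from each point to its own centroid.
wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
# Inter-cluster: distance between the two centroids.
separation = math.dist(centroids[0], centroids[1])

print("labels:", labels)
print(f"intra-cluster sum of squares: {wcss:.3f}")
print(f"inter-cluster centroid distance: {separation:.3f}")
```

Each point receives exactly one label, which is what makes the clusters non-overlapping.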
Machine Learning TYPE Model Name
Startup File
1. Load Data
2. Data Preprocessing
2.1 Cleaning
2.2 Missing Data
2.3 Scaling
2.4 Feature Engineering/Selection
3. Split - Train & Test
4. Model Selection
5. Model Training
6. Model Evaluation
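The outline above could be filled in along the following lines. This is only a sketch under stated assumptions: it uses scikit-learn, the iris dataset stands in for the project's own data, and the KNN classifier, median imputation, and `k=2` feature selection are illustrative choices rather than recommendations from this file.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load data (assumption: iris stands in for the real dataset).
X, y = load_iris(return_X_y=True)

# Cleaning: iris needs none; for real data, drop duplicates or invalid rows here.

# Split into train and test sets before fitting any preprocessing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing and model selection, chained in one pipeline.
model = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),         # missing data
        ("scale", StandardScaler()),                          # scaling
        ("select", SelectKBest(score_func=f_classif, k=2)),   # feature selection
        ("knn", KNeighborsClassifier(n_neighbors=5)),         # chosen model
    ]
)

# Model training.
model.fit(X_train, y_train)

# Model evaluation on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

The sketch splits the data before fitting the preprocessing steps, which differs slightly from the outline's order. Wrapping imputation, scaling, and feature selection in a `Pipeline` means their statistics are learned from the training rows only, which avoids leaking information from the test set.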