No numpy, no pandas, no sklearn. Only Hardcore.
My own from-scratch implementations of AI/ML algorithms, and their testing.
The reason I do this: to learn the internals of ML algorithms.
Implemented algorithms:
- Regression: linear, non-linear, multi-variable.
- Logistic Regression
- K-Nearest Neighbors
- K-Means
- Principal Component Analysis
To support data manipulation, I developed the data_reader.py package instead of using pandas. Data is stored in a table-like data structure in main memory, which works well for data sets that fit into memory.
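Below is a minimal sketch of the idea behind such a table-like structure. It is illustrative only: the class name, methods, and the CSV file/column names used here are assumptions for the sketch, not the actual data_reader.py API.

```python
# Minimal sketch of an in-memory, table-like structure (illustrative only;
# the real data_reader.py API may differ).
import csv

class Table:
    def __init__(self, headers, rows):
        self.headers = headers  # list of column names
        self.rows = rows        # list of rows, all held in main memory

    @classmethod
    def from_csv(cls, path):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            headers = next(reader)
            rows = list(reader)
        return cls(headers, rows)

    def column(self, name):
        i = self.headers.index(name)
        return [row[i] for row in self.rows]

# Hypothetical usage (file and column names are made up):
# table = Table.from_csv("data/fuel_consumption.csv")
# co2 = [float(v) for v in table.column("CO2EMISSIONS")]
```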
- Regression implementation.
- About the data used.
- Results
Regression was implemented with:
- optimization algorithm: gradient descent
- configurable learning rate
- L2 regularization (Ridge penalty)
- a configurable number of iterations
- log writing for further debugging / plotting (a training-loop sketch is shown below)
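As an illustration, here is a minimal pure-Python sketch of such a training loop for single-feature linear regression: gradient descent with an L2 penalty and periodic cost logging. Function and parameter names are assumptions for this sketch, not the repository's actual code.

```python
# Sketch: gradient descent for y = w*x + b with an L2 (Ridge) penalty.
def train_linear(xs, ys, lr=0.01, l2=0.1, iterations=1000, log_every=100):
    w, b = 0.0, 0.0
    n = len(xs)
    for it in range(iterations):
        dw, db = 0.0, 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            dw += err * x
            db += err
        # L2 adds l2 * w to the weight gradient; the bias is not penalized.
        w -= lr * (dw / n + l2 * w)
        b -= lr * (db / n)
        if it % log_every == 0:  # logs for debugging / plotting
            cost = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)
            print(f"iter={it} cost={cost:.4f}")
    return w, b
```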
Fuel Consumption vs CO2 EMISSIONS in /data
The trained model achieves a prediction error of only ~3%.
The workflow can be found here:
https://kotsky.github.io/projects/ai_from_scratch/regression_workflow.html
- Logistic regression implementation.
- About the data used.
- Results
Logistic regression was implemented with:
- optimization algorithm: gradient descent
- configurable learning rate
- L2 regularization (Ridge penalty)
- a configurable number of iterations
- log writing for further debugging / plotting
- an adjustable threshold on the logistic output for prediction
- evaluation with a confusion matrix, precision and recall (a sketch is shown below)
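To make the pieces concrete, here is a minimal single-feature sketch of this setup: sigmoid output, gradient descent with an L2 penalty, and threshold-based evaluation with precision and recall. All names are illustrative assumptions, not the repository's actual code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, l2=0.01, iterations=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        dw, db = 0.0, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss
            dw += err * x
            db += err
        w -= lr * (dw / n + l2 * w)  # L2 penalty on the weight
        b -= lr * (db / n)
    return w, b

def evaluate(xs, ys, w, b, threshold=0.5):
    # Confusion-matrix counts at an adjustable prediction threshold.
    tp = fp = tn = fn = 0
    for x, y in zip(xs, ys):
        pred = 1 if sigmoid(w * x + b) >= threshold else 0
        if pred == 1 and y == 1: tp += 1
        elif pred == 1 and y == 0: fp += 1
        elif pred == 0 and y == 0: tn += 1
        else: fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```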
Loan data in /data
The trained model has 71% accuracy at predicting whether people from the test data set will repay a previously taken loan.
The F1 score identified the best logistic threshold at 0.27, which gives the best precision (74%) and recall (95%); a threshold-sweep sketch is shown below.
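One way to find such a threshold is a simple grid sweep over candidate thresholds, keeping the one with the highest F1. A hedged sketch (the function name and grid size are assumptions):

```python
def best_threshold(probs, labels, steps=100):
    # probs: predicted probabilities in [0, 1]; labels: 0/1 ground truth.
    best_f1, best_t = 0.0, 0.5
    for i in range(1, steps):
        t = i / steps
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_t, best_f1

# t, f1 = best_threshold(probs, labels)  # e.g. t came out near 0.27 above
```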
The workflow can be found here:
https://kotsky.github.io/projects/ai_from_scratch/logistic_regression_workflow.html
For comparison, 78% accuracy was achieved using standard libraries (sklearn). That workflow can be found here:
https://github.com/kotsky/ai-studies/blob/main/Projects/Project%20Loan/Loan%20Model.ipynb
- KNN implementation.
- About the data used.
- Results
A standard KNN algorithm, with storage and comparison of the nearest points optimized via a k-size max-heap data structure (see the sketch below).
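The heap idea in a nutshell: Python's heapq is a min-heap, so distances are negated to keep a max-heap of the k closest points seen so far; the current farthest of those k sits at the root and is evicted in O(log k) when a closer point arrives. A minimal sketch (names are illustrative, not the repository's code):

```python
import heapq

def k_nearest(train, query, k):
    """train: list of (point, label) pairs; query: list of feature values."""
    heap = []  # holds (-distance, label); root = farthest of the kept k
    for point, label in train:
        d = sum((a - b) ** 2 for a, b in zip(point, query)) ** 0.5
        if len(heap) < k:
            heapq.heappush(heap, (-d, label))
        elif -d > heap[0][0]:  # closer than the current farthest kept point
            heapq.heapreplace(heap, (-d, label))
    return [label for _, label in heap]

# Majority vote over the k nearest labels:
# labels = k_nearest(train, query, k=4)
# prediction = max(set(labels), key=labels.count)
```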
Loan data in /data, chosen so that results can be compared with the logistic regression model.
Roughly, k = 4 is the best for this model, giving 72% accuracy, 74% precision and 94% recall.
Its workflow can be found here:
https://kotsky.github.io/projects/ai_from_scratch/knn_workflow.html
- K-Means implementation.
- About the data used.
- Results
A standard K-Means algorithm based on distance calculations between centroids and training points.
The data set describes customers of a store, so we try to identify clusters of these customers.
With proper visualization and cost-function analysis, the best K turns out to be 5 for the features Income vs Years Employed.
Moreover, the model can predict which cluster fits a new customer best (see the sketch below).
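For illustration, here is a minimal pure-Python sketch of the K-Means loop: assign points to the nearest centroid, recompute centroids, report the cost (inertia) used to compare values of K, and assign a new point to its nearest centroid. Names and initialization details are assumptions, not the repository's exact code.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)  # naive random initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        for i, cluster in enumerate(clusters):  # recompute centroids
            if cluster:
                centroids[i] = [sum(v) / len(cluster) for v in zip(*cluster)]
    # Inertia: total squared distance to the nearest centroid; comparing it
    # across K = 1..10 (the "elbow") is one way to justify K = 5.
    inertia = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, inertia

def predict(point, centroids):
    # A new customer joins the cluster of the nearest centroid.
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))
```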
Its workflow can be found here:
https://kotsky.github.io/projects/ai_from_scratch/kmean_workflow.html