kochlisGit/Data-Science-Algorithms

Data Science & ML-Statistics

Implementations of statistical and data science algorithms for machine learning and data mining in Python, using NumPy, pandas, and scikit-learn. The data were extracted from MovieLens and Google Trends.

This repository includes algorithms and statistical methods for:

  1. Data Visualization
  • Bar Plot
  • Histogram Plot
  • LogLog Plot
  • Line Plot
  • Pie Plot
  2. Distribution Visualization
  • Gaussian Distribution Plot
  • Power Law Plot
  • QQ Plot
  3. Correlation Analysis
  • Covariance
  • Pearson Correlation
  • Spearman Correlation
  • Fisher-Z Transformation
  • Kendall Correlation
  • Weighted Kendall
  • Cosine Similarity
  4. Anomaly Detection (Outlier Detection & Removal)
  • Isolation Forest
  • Local Outlier Factor
  • Elliptic Envelope
  • DBSCAN
  • PCA + DBSCAN
  5. Data Scaling
  • Min-Max
  • Max-Abs
  • Z-Score
  • Robust Scaling
  6. Dimensionality Reduction (Image Compression Example)