ritwikbera/MLrandom

ML Playground

Random ML experiments that illustrate some common concepts.

Algorithms

  • Graham Scan: A stack-based algorithm that computes the convex hull of a set of points. For linearly separable data, a hard-margin SVM's decision boundary is the perpendicular bisector of the shortest segment joining the convex hulls of the two classes. For small 2-D problems, the hulls found by Graham Scan can therefore stand in for a gradient-based optimization algorithm.

  • Isomap: An implementation of the Isomap dimensionality-reduction algorithm, which operates on geodesic distances and is often used for visualization. Uses an sklearn utility function (based on Dijkstra's algorithm with a priority queue) to compute pairwise shortest-path distances from a given graph's combined adjacency and distance matrix.

  • Pairwise Ranking: A simple implementation of Learning to Rank using pairwise ranking. An SVM is trained on item pairs, and the ranked list is obtained through BubbleSort (a comparison-based sort) in which the items' comparator function is overridden by the trained SVM.
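
The stack-based hull construction from the Graham Scan entry can be sketched as follows. This is a minimal illustration, not the repository's script: the function names `cross` and `graham_scan` are hypothetical, points are assumed to be 2-D tuples, and collinear boundary points are dropped from the hull.

```python
from functools import cmp_to_key


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Return the convex hull of 2-D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    # Pivot: lowest point, ties broken by smallest x. Every other point
    # then lies at a polar angle in [0, pi) relative to the pivot.
    pivot = min(pts, key=lambda p: (p[1], p[0]))

    def by_angle(a, b):
        turn = cross(pivot, a, b)
        if turn != 0:
            return -1 if turn > 0 else 1
        # Same angle: the point closer to the pivot comes first.
        da = (a[0] - pivot[0]) ** 2 + (a[1] - pivot[1]) ** 2
        db = (b[0] - pivot[0]) ** 2 + (b[1] - pivot[1]) ** 2
        return (da > db) - (da < db)

    rest = sorted((p for p in pts if p != pivot), key=cmp_to_key(by_angle))

    hull = [pivot]  # used as a stack
    for p in rest:
        # Pop while the last two hull points and p do not make a left turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For example, `graham_scan([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (0, 1)])` returns the four corners of the square and discards the interior point and the point lying on an edge. Sorting with an exact cross-product comparator, rather than floating-point angles, is a design choice made here to keep integer inputs exact.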

Data Structures

  • Count-Min Sketch: A probabilistic data structure used in Big Data applications, where a large number of unique items is hashed into a much smaller table by multiple hash functions. Used in ML systems for tasks such as approximate frequency counting and set-membership checks. It's like a Bloom Filter that can count.

  • Sparsity: A test script that shows the memory efficiency of sparse matrices stored in compressed formats.
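
To make the "Bloom Filter that can count" idea concrete, here is a minimal Count-Min Sketch. It is a sketch under stated assumptions rather than the repository's code: the class name, the default `width` and `depth`, and the use of salted SHA-256 as the family of hash functions are all illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: `depth` hash rows of `width` counters each."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting the item with the row number simulates independent hash functions.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )
```

Because counters are shared between items, `estimate` can overcount but never undercounts. An estimate of zero therefore means the item was definitely never added, which is how the structure doubles as a set-membership check.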

Metrics

  • Wasserstein Distance: A symmetric metric (unlike KL divergence) for the distance between two distributions, which also works when the distributions are defined over supports of different sizes. Uses an iterative optimization step described by Michiel Stock. Used a lot in Computer Vision and Deep Learning (e.g. Wasserstein GAN).

  • L2 Regularization: A fast and memory-efficient vectorized implementation of L2 regularization that runs on NumPy. NumPy's core is implemented in compiled C, and many of its operations release the Python GIL, so the vectorized version is much faster than a looped one.
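
The iterative step mentioned in the Wasserstein Distance entry can be illustrated with Sinkhorn scaling on an entropy-regularized transport problem. Treating the iterative step as Sinkhorn scaling is an assumption made for this sketch, and the repository's script may differ. The function name `sinkhorn` and the parameters `reg` and `n_iters` are hypothetical, and the returned cost only approximates the exact Wasserstein distance.

```python
import numpy as np


def sinkhorn(a, b, cost, reg=0.1, n_iters=500):
    """Approximate optimal transport between histograms a (n,) and b (m,).

    cost is an (n, m) matrix of ground distances; reg is the entropic
    regularization strength (smaller values track the exact cost more closely).
    Returns the transport plan and its total cost.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)

    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)  # rescale so the column sums match b
        u = a / (K @ v)    # rescale so the row sums match a

    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))


# Two distributions over supports of different sizes (2 points vs. 3 points).
x = np.array([0.0, 1.0])
y = np.array([0.0, 0.5, 1.0])
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.5, 0.25])
cost = np.abs(x[:, None] - y[None, :])

plan, distance = sinkhorn(a, b, cost)
```

In this example the exact transport cost is 0.25 (each source point sends half of its mass a distance of 0.5), and the regularized estimate lands slightly above that value. Swapping the two distributions and transposing the cost matrix gives the same result, which is the symmetry the entry refers to.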

ML Models

  • Multi-Armed Bandits: A test script that illustrates multi-armed bandits with updates based on the Thompson Sampling procedure.

  • GMM: Vectorized implementation of the Expectation-Maximization algorithm tested on simple Gaussian Mixture Models.
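
As an illustration of the Thompson Sampling update in the Multi-Armed Bandits entry, the following is a minimal Beta-Bernoulli sketch. It assumes binary rewards and a uniform Beta(1, 1) prior per arm; the function name, the `true_probs` argument, and the default round count are hypothetical and not taken from the repository.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Simulate a Bernoulli bandit and return (pulls, alpha, beta) per arm.

    true_probs holds each arm's hidden success probability; it is used
    only to simulate rewards, never to choose an arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # 1 + observed successes (Beta(1, 1) prior)
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior
        # and play the arm whose draw is highest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)

        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta
```

Calling `thompson_sampling([0.2, 0.5, 0.8])` should concentrate most pulls on the last arm as its posterior sharpens, while the weaker arms are still sampled occasionally early on. Exploration here comes from the randomness of the posterior draws, so no separate exploration rate is needed.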
