Go Data Science Tooling, Packages, Libraries, etc.

This is a curated list of well-maintained or actively developed tools, packages, libraries, etc. for doing data science with Go.

This space also includes a list of proposed packages that would fill certain gaps in the ecosystem or provide enhanced functionality.

Proposed

Arithmetic

Bioinformatics

Classification

Clustering

  • github.com/salkj/kmeans - A ready-to-use naive kmeans package for Go.
  • github.com/mpraski/clusters - Go implementations of several clustering algorithms (k-means++, DBSCAN, OPTICS), as well as utilities for importing data and estimating the optimal number of clusters.
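
For readers new to the technique these packages implement, here is a minimal sketch of naive k-means (Lloyd's algorithm) using only the standard library. It is not the API of either package listed above; the `Point` type, the `KMeans` function, and the choice to pass initial centroids explicitly are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a hypothetical type for this sketch: one observation as a feature vector.
type Point []float64

// sqDist returns the squared Euclidean distance between two points of equal length.
func sqDist(a, b Point) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// nearest returns the index of the centroid closest to p.
func nearest(p Point, centroids []Point) int {
	best, bestDist := 0, math.Inf(1)
	for i, c := range centroids {
		if d := sqDist(p, c); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// KMeans runs Lloyd's algorithm starting from the given initial centroids.
// It returns the final centroids and the cluster index assigned to each point.
func KMeans(points, initial []Point, maxIter int) ([]Point, []int) {
	// Copy the initial centroids so the caller's slices are not modified.
	centroids := make([]Point, len(initial))
	for i, c := range initial {
		centroids[i] = append(Point(nil), c...)
	}
	labels := make([]int, len(points))
	for iter := 0; iter < maxIter; iter++ {
		// Assignment step: attach every point to its nearest centroid.
		changed := iter == 0
		for i, p := range points {
			if l := nearest(p, centroids); l != labels[i] {
				labels[i] = l
				changed = true
			}
		}
		if !changed {
			break // assignments are stable, so the centroids will not move
		}
		// Update step: move each centroid to the mean of its assigned points.
		sums := make([]Point, len(centroids))
		counts := make([]int, len(centroids))
		for i := range sums {
			sums[i] = make(Point, len(centroids[i]))
		}
		for i, p := range points {
			counts[labels[i]]++
			for j, v := range p {
				sums[labels[i]][j] += v
			}
		}
		for i := range centroids {
			if counts[i] == 0 {
				continue // leave the centroid of an empty cluster where it was
			}
			for j := range centroids[i] {
				centroids[i][j] = sums[i][j] / float64(counts[i])
			}
		}
	}
	return centroids, labels
}

func main() {
	points := []Point{{1, 1}, {1.5, 2}, {1, 0.5}, {8, 8}, {9, 9.5}, {8.5, 9}}
	centroids, labels := KMeans(points, []Point{points[0], points[3]}, 100)
	fmt.Println(labels)
	fmt.Println(centroids)
}
```

Taking the starting centroids as an argument keeps the sketch deterministic. A smarter seeding strategy such as k-means++ would replace that argument with an initialization step.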

CSV

Distributed Data Analysis/Pipelining

Geospatial

General data munging

General purpose machine learning

Graphs

JSON

I/O

Matrices/Arrays/Linear Algebra

Neural Networks

NLP

Non-SQL Database Interactions

Parquet

Plotting/dashboarding

Probability/statistics/experiments

Recommendation Systems

  • github.com/jbochi/facts - Matrix Factorization based recsys in Golang. Because facts are more important than ever.
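
As background on the matrix factorization approach mentioned above, the following is a rough sketch of factorizing a ratings matrix with stochastic gradient descent. It does not reflect the API or the algorithm of the package listed; the `Rating` and `Model` types, the hyperparameters, and the tiny data set in `Demo` are all made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Rating is a hypothetical record: user User gave item Item the score Value.
type Rating struct {
	User, Item int
	Value      float64
}

// Model holds the latent factors: P is users x k, Q is items x k.
type Model struct {
	P, Q [][]float64
}

// randMatrix returns an n x k matrix of small random positive values.
func randMatrix(n, k int, rng *rand.Rand) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, k)
		for j := range m[i] {
			m[i][j] = 0.1 * rng.Float64()
		}
	}
	return m
}

// NewModel creates a model with k latent factors per user and per item.
func NewModel(users, items, k int, rng *rand.Rand) *Model {
	return &Model{P: randMatrix(users, k, rng), Q: randMatrix(items, k, rng)}
}

// Predict returns the dot product of the user's and the item's factor vectors.
func (m *Model) Predict(u, i int) float64 {
	var s float64
	for f := range m.P[u] {
		s += m.P[u][f] * m.Q[i][f]
	}
	return s
}

// Train runs SGD over the observed ratings with L2 regularization reg.
func (m *Model) Train(ratings []Rating, epochs int, lr, reg float64) {
	for e := 0; e < epochs; e++ {
		for _, r := range ratings {
			diff := r.Value - m.Predict(r.User, r.Item)
			for f := range m.P[r.User] {
				pu, qi := m.P[r.User][f], m.Q[r.Item][f]
				m.P[r.User][f] += lr * (diff*qi - reg*pu)
				m.Q[r.Item][f] += lr * (diff*pu - reg*qi)
			}
		}
	}
}

// RMSE returns the root mean squared prediction error on the given ratings.
func (m *Model) RMSE(ratings []Rating) float64 {
	var s float64
	for _, r := range ratings {
		d := r.Value - m.Predict(r.User, r.Item)
		s += d * d
	}
	return math.Sqrt(s / float64(len(ratings)))
}

// Demo trains on a tiny made-up data set and reports the training RMSE
// before and after fitting.
func Demo() (before, after float64) {
	ratings := []Rating{
		{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 5},
	}
	m := NewModel(3, 3, 2, rand.New(rand.NewSource(1)))
	before = m.RMSE(ratings)
	m.Train(ratings, 500, 0.01, 0.02)
	return before, m.RMSE(ratings)
}

func main() {
	before, after := Demo()
	fmt.Printf("training RMSE before: %.3f, after: %.3f\n", before, after)
}
```

Once trained, `Predict` can be called for user and item pairs that have no observed rating, which is what turns the factorization into a recommender.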

Regression

SQL-like Database Interactions

Time Series

Web Scraping

Proposed

  • Multi-dimensional slices within Go itself (Proposal).
  • A robust (and concurrent) package to handle minimizations/fits of data and histograms (gonum/optimize would provide a nice foundation for this).
  • A robust (and concurrent) package to describe statistical models (Bayesian and frequentist) with many nuisance parameters.
  • A Go native package for A/B testing.
  • A database with Datalog querying. Inspiration can be drawn from Rich Hickey's Datomic database, but it should be open source.
  • A Datalog query system for distributed computation. Similar to Cascalog for the Hadoop ecosystem, but integrating with some of the Go tools instead.
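
To make the A/B testing item above more concrete, here is one possible core calculation such a package might offer: a pooled two-proportion z-test built on the standard library. This is only a sketch of the statistics, not an existing package; the `Variant` type and the `ZTest` function are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// Variant is a hypothetical record for one arm of an experiment.
type Variant struct {
	Trials      int // number of users exposed to this arm
	Conversions int // number of those users who converted
}

// Rate returns the observed conversion rate of the variant.
func (v Variant) Rate() float64 {
	return float64(v.Conversions) / float64(v.Trials)
}

// ZTest compares the conversion rates of a and b with a pooled
// two-proportion z-test. It returns the z statistic (positive when b
// converts better than a) and the two-sided p-value.
// The result is NaN when the pooled rate is exactly 0 or 1.
func ZTest(a, b Variant) (z, p float64) {
	pooled := float64(a.Conversions+b.Conversions) / float64(a.Trials+b.Trials)
	se := math.Sqrt(pooled * (1 - pooled) * (1/float64(a.Trials) + 1/float64(b.Trials)))
	z = (b.Rate() - a.Rate()) / se
	// Two-sided p-value under the standard normal: 2*(1-Phi(|z|)) = erfc(|z|/sqrt(2)).
	p = math.Erfc(math.Abs(z) / math.Sqrt2)
	return z, p
}

func main() {
	control := Variant{Trials: 1000, Conversions: 100}
	treatment := Variant{Trials: 1000, Conversions: 150}
	z, p := ZTest(control, treatment)
	fmt.Printf("z = %.3f, p = %.5f\n", z, p)
}
```

A full package would also need sample-size planning and corrections for repeated looks at the data, which this sketch leaves out.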