Go Data Science Tooling, Packages, Libraries, etc.
This is a curated list of well-maintained and developing tools, packages, libraries, etc. related to doing data science with Go.
- Distributed Data Analysis/Pipelining
- General data munging
- General purpose machine learning
- Matrices/Linear Algebra
- Neural Networks
- Non-SQL Database Interactions
- Recommendation Systems/Engines
- SQL-like Database Interactions
- Time Series
- Web Scraping
Also, this space includes a list of proposed packages that would fill certain gaps in the ecosystem or provide enhanced functionality.
- math - Stdlib math functions.
- math/cmplx - Package cmplx provides basic constants and mathematical functions for complex numbers.
- gonum.org/v1/gonum/floats - A set of helper routines for dealing with slices of float64.
- gonum.org/v1/gonum/optimize - This is an optimization package for the Go language.
- go-hep.org/x/hep/fit - a WIP package to provide easy fitting models and curve fitting functions.
- github.com/biogo - a bioinformatics library collection for Go
- github.com/ExaScience/elprep - a high-performance tool for preparing sequence alignment/map files in sequencing pipelines
- github.com/MG-RAST/AWE - a workload management system for bioinformatic workflow applications.
- github.com/kelvins/chronobiology - Go package that provides functions for the study of biological temporal rhythms.
- github.com/jbrukh/bayesian - Naive Bayes classification.
- github.com/datastream/libsvm - A Go version of libsvm, derived from LIBSVM 3.14.
- github.com/barnjamin/randomforest - Random Forest classification
- github.com/rikonor/go-ann - Approximate k-Nearest Neighbor search.
- github.com/salkj/kmeans - A ready-to-use naive kmeans package for Go.
- github.com/mpraski/clusters - Go implementations of several clustering algorithms (k-means++, DBSCAN, OPTICS), as well as utilities for importing data and estimating the optimal number of clusters.
- encoding/csv - Stdlib CSV functionality.
- go-hep.org/x/hep/csvutil - A set of types and funcs to deal with CSV data files in a somewhat convenient way.
- go-hep.org/x/hep/csvutil/csvdriver - A CSV driver for Go's database/sql package.
- github.com/dinedal/textql - Execute SQL against structured text like CSV or TSV.
- github.com/shuLhan/dsv - A Go library for working with delimiter-separated values (DSV).
- github.com/frictionlessdata/tableschema-go - Schema inference and table-based tooling (e.g., for working with CSV).
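The standard-library entries above (math and encoding/csv) already cover simple descriptive work without any third-party dependency. As a minimal sketch, assuming an in-memory CSV with a header row and a numeric column (the helper `meanOfColumn` and the sample data are hypothetical), parsing a column and averaging it looks like this:

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// meanOfColumn parses CSV text whose first row is a header and returns
// the mean of the numeric values in column col (0-based).
// Hypothetical helper for illustration only.
func meanOfColumn(data string, col int) (float64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return 0, err
	}
	if len(records) < 2 {
		return 0, errors.New("no data rows after the header")
	}
	// csv.Reader requires every row to have as many fields as the first,
	// so checking the header width covers all rows.
	if col < 0 || col >= len(records[0]) {
		return 0, fmt.Errorf("column %d out of range", col)
	}
	var sum float64
	for _, rec := range records[1:] { // skip the header row
		v, err := strconv.ParseFloat(rec[col], 64)
		if err != nil {
			return 0, err
		}
		sum += v
	}
	return sum / float64(len(records)-1), nil
}

func main() {
	data := "name,score\nalice,90\nbob,80\ncarol,70\n"
	mean, err := meanOfColumn(data, 1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(mean) // 80
}
```

For larger files, reading row by row with `Read` instead of `ReadAll` avoids holding the whole file in memory; packages such as csvutil add conveniences on top of this.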
Distributed Data Analysis/Pipelining
- github.com/pachyderm/pachyderm - Distributed data pipelining and data versioning built on containers http://pachyderm.io.
- github.com/chrislusf/glow - Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Storm, etc.
- github.com/chrislusf/gleam - Another Go-based distributed execution system.
- github.com/flowbase/flowbase - A Flow-based Programming inspired micro-framework for Go (Golang) http://flowbase.org.
- github.com/ExaScience/pargo - Provides functions and data structures for expressing parallel algorithms in Go
- github.com/scipipe/scipipe - A Scientific workflow system written in pure Go (Golang) inspired by Flow-based Programming http://scipipe.org.
- github.com/matryer/vice - Go channels at horizontal scale (powered by message queues).
- github.com/golang/geo - S2 geometry library in Go
- github.com/twpayne/go-geom - Efficient geometry types, encoding and decoding (GeoJSON, IGC, KML, WKB, WKT), and 2D geometry functions.
- github.com/twpayne/go-gpx - Read and write GPX documents
- github.com/twpayne/go-kml - Convenience methods for creating and writing KML documents
- github.com/twpayne/go-polyline - Google Maps Polyline encoding and decoding
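Several of the pipelining tools above (flowbase, scipipe, glow, pargo, vice) build on the pattern Go expresses natively with goroutines and channels. The following is a minimal in-process sketch of a fan-out/fan-in pipeline stage, not the API of any listed package; the function names are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// generate emits the given values on a channel, then closes it.
func generate(vals ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// squareWorkers fans the input out to n goroutines that square each value
// and fans their results back into a single channel.
func squareWorkers(in <-chan int, n int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for v := range in {
				out <- v * v
			}
		}()
	}
	// Close out only after every worker has drained the input.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// sumSquares runs the whole pipeline and sums its output.
// Results arrive in nondeterministic order, but the sum does not depend on it.
func sumSquares(vals []int, workers int) int {
	total := 0
	for v := range squareWorkers(generate(vals...), workers) {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3)) // 30
}
```

The distributed systems listed above extend this idea across processes or machines, adding scheduling, persistence, and fault tolerance that plain channels do not provide.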
General data munging
- github.com/elves/elvish - A shell (bash alternative) supporting working with pipelines of structured objects - not just text - and a more natural scripting syntax.
- github.com/kniren/gota - Dataframes.
- github.com/Shixzie/ly - A flexible, easy-to-use package for working with DataFrames, aimed at ML.
- github.com/gopherdata/gophernotes - Go kernel for Jupyter notebooks.
- github.com/kevinschoon/fit - Toolkit for exploring and manipulating datasets.
- github.com/shuLhan/tabula - A Go library for working with rows, columns, or matrices (tables); in other words, with datasets.
- neugram.io - a programming language written in Go, designed for data munging.
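Dataframe packages such as gota wrap common munging operations like "group by, then aggregate". As a stdlib-only sketch of what that operation computes (the `Row` type and `groupMeans` helper are hypothetical and do not reflect any listed package's API):

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical record type used only for this illustration.
type Row struct {
	Group string
	Value float64
}

// groupMeans returns the mean Value per Group, the equivalent of a
// dataframe "group by" followed by a mean aggregation.
func groupMeans(rows []Row) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, r := range rows {
		sums[r.Group] += r.Value
		counts[r.Group]++
	}
	means := make(map[string]float64, len(sums))
	for g, s := range sums {
		means[g] = s / float64(counts[g])
	}
	return means
}

func main() {
	rows := []Row{{"a", 1}, {"b", 10}, {"a", 3}, {"b", 20}}
	means := groupMeans(rows)

	// Map iteration order is random, so sort the keys for stable output.
	keys := make([]string, 0, len(means))
	for k := range means {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %.1f\n", k, means[k])
	}
}
```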
General purpose machine learning
- github.com/chewxy/gorgonia - Provides the necessary primitives for creating and executing neural networks and machine learning algorithms.
- github.com/sjwhitworth/golearn - GoLearn is a 'batteries included' machine learning library for Go.
- github.com/cdipaolo/goml - goml is a machine learning library written entirely in Go that lets the average developer include machine learning in their applications.
- github.com/xlvector/hector - Golang machine learning lib. Currently, it can be used to solve binary classification problems.
- github.com/shuLhan/go-mining - Small Golang library that contains classifiers (CART, Random Forest, Cascaded Random Forest, and KNN) and resampling (SMOTE and LN-SMOTE).
- github.com/galeone/tfgo - Tensorflow + Go, the gopher way.
- github.com/ctava/tfcgo - Bridging the gap between Go and the TensorFlow C++ framework.
- github.com/pa-m/sklearn - (WIP) port of bits of sklearn to Go.
- github.com/dgraph-io/dgraph - Fast, Distributed Graph Database dgraph.io
- github.com/gyuho/goraph - Package goraph implements graph data structure and algorithms.
- gonum.org/v1/gonum/graph - This is a generalized graph package for the Go language.
- github.com/cayleygraph/cayley - Cayley is an open-source graph database inspired by the graph database behind Freebase and Google's Knowledge Graph.
- encoding/json - Stdlib JSON functionality.
- github.com/tidwall/gjson - A Go package that provides a very fast and simple way to get a value from a JSON document.
- github.com/pquerna/ffjson - ffjson generates static MarshalJSON and UnmarshalJSON functions for structures in Go.
- gonum.org/v1/hdf5 - CGo bindings to the HDF5 C library.
- github.com/sbinet/npyio - Read/Write access to NumPy data files.
- github.com/sbinet/go-arff - Read/Write access to ARFF (Attribute-Relation File Format) files.
Matrices/Linear Algebra
- gonum.org/v1/gonum/lapack - A collection of packages to provide LAPACK functionality for the Go programming language.
- gonum.org/v1/gonum/blas - A collection of packages to provide BLAS functionality for the Go programming language.
- gonum.org/v1/gonum/mat - This is a matrix package for the Go language.
- github.com/akualab/narray - A multidimensional array package optimized with Go assembly.
Neural Networks
- github.com/tleyden/neurgo - Neural Network toolkit in Go.
- github.com/fxsjy/gonn - GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN.
- github.com/NOX73/go-neural - Neural network implementation on golang.
- github.com/milosgajdos83/gosom - Self-organizing maps in Go
- github.com/made2591/go-perceptron-go - A single level perceptron classifier.
Natural Language Processing
- github.com/advancedlogic/go-freeling - This is a partial port of Freeling 3.1 (http://nlp.lsi.upc.edu/freeling/).
- github.com/endeveit/enca - Minimal cgo bindings for libenca.
- github.com/Lazin/go-ngram - N-gram index for Go.
- github.com/reiver/go-porterstemmer - A native Go clean room implementation of the Porter Stemming Algorithm.
- github.com/blevesearch/segment - A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
- github.com/cdipaolo/sentiment - Simple, Drop In Sentiment Analysis in Golang.
- github.com/kljensen/snowball - A Go (golang) implementation of the Snowball stemmer for natural language processing.
- github.com/sajari/word2vec - word2vec is a Go package which provides functions for querying word2vec models.
- github.com/jlubawy/go-gcnl - This package can be used to easily access the Google Cloud Natural Language API.
- github.com/olebedev/when - A natural language date/time parser with pluggable rules.
- github.com/kampsy/gwizo - Simple Go implementation of the Porter Stemmer algorithm with powerful features.
- github.com/Shixzie/nlp - General purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model.
- github.com/abadojack/whatlanggo - Natural language detection for Go.
- github.com/ynqa/word-embedding - Word Embeddings: implementations of word2vec and GloVe in Go.
- github.com/jdkato/prose - prose is a library written in pure Go that supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.
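Tokenization and n-gram extraction underlie several of the NLP entries above (for example go-ngram and prose). A minimal stdlib-only sketch, assuming a naive tokenizer that lower-cases text and splits on any rune that is not a letter or digit; the helpers `tokenize` and `ngrams` are hypothetical, not the API of any listed package:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lower-cases text and splits it into words, treating any rune
// that is not a letter or digit as a separator.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

// ngrams returns the word n-grams of tokens, each joined with a space.
func ngrams(tokens []string, n int) []string {
	if n <= 0 || len(tokens) < n {
		return nil
	}
	out := make([]string, 0, len(tokens)-n+1)
	for i := 0; i+n <= len(tokens); i++ {
		out = append(out, strings.Join(tokens[i:i+n], " "))
	}
	return out
}

func main() {
	toks := tokenize("The cat sat. The cat ran!")
	counts := make(map[string]int)
	for _, g := range ngrams(toks, 2) {
		counts[g]++
	}
	fmt.Println(counts["the cat"]) // 2
}
```

This naive tokenizer splits contractions such as "don't" into two tokens; libraries like prose handle such cases with proper segmentation rules.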
Non-SQL Database Interactions
- gopkg.in/mgo.v2 - mgo (pronounced as mango) is a MongoDB driver for the Go language.
- github.com/gocql/gocql - A fast and robust Cassandra client for the Go programming language.
- github.com/go-redis/redis - Redis client for Golang.
- github.com/garyburd/redigo - Go client for Redis.
- github.com/tsuna/gohbase - Pure Go HBase client.
- github.com/colinmarc/hdfs - Pure Go HDFS client.
- github.com/xitongsys/parquet-go - Read and write Parquet files in Go.
- gonum.org/v1/plot - An API for building and drawing plots.
- github.com/gigablah/dashing-go - A port of dashing for real-time dashboarding.
- github.com/mmcloughlin/globe - Globe wireframe visualizations.
- github.com/ajstarks/svgo - Go Language Library for SVG generation.
- gonum.org/v1/gonum/stat - Statistics package for Go.
- github.com/montanaflynn/stats - A statistics package with common functions that are missing from the Golang standard library.
- github.com/URXtech/planout-golang - An interpreter for PlanOut (multivariate testing) code, written in Go.
- github.com/peleteiro/bandit-server - Bandit-server is a Multi-Armed Bandit API server that needs neither configuration nor a persistent store.
- github.com/dgryski/go-topk - A "filtered space saving" streaming topk algorithm.
- github.com/dgryski/go-kll - An implementation of KLL sketch for "Almost Optimal Streaming Quantiles."
- github.com/dgryski/go-linlog - Linear-log bucketing and histograms.
- github.com/dgryski/go-rbo - Computes the rank-biased overlap for two sorted result sets.
- github.com/jbochi/facts - Matrix Factorization based recsys in Golang. Because facts are more important than ever.
- github.com/sajari/regression - Multivariable linear regression.
- github.com/glycerine/zettalm - Go code to build linear regression models on zettabytes of data.
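The regression entries above fit linear models. For the single-variable case, ordinary least squares has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A stdlib-only sketch, assuming small in-memory data (the helper `fitLine` is hypothetical and is not sajari/regression's API):

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine returns the ordinary least-squares slope and intercept
// for the model y = slope*x + intercept.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two paired observations")
	}
	n := float64(len(xs))
	var meanX, meanY float64
	for i := range xs {
		meanX += xs[i]
		meanY += ys[i]
	}
	meanX /= n
	meanY /= n

	// Accumulate the (unnormalized) covariance and variance; the 1/n
	// factors cancel in the ratio.
	var cov, varX float64
	for i := range xs {
		dx := xs[i] - meanX
		cov += dx * (ys[i] - meanY)
		varX += dx * dx
	}
	if varX == 0 {
		return 0, 0, errors.New("x values are constant")
	}
	slope = cov / varX
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	xs := []float64{1, 2, 3, 4}
	ys := []float64{3, 5, 7, 9} // exactly y = 2x + 1
	m, b, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("slope=%.2f intercept=%.2f\n", m, b)
}
```

Multivariable regression generalizes this to solving the normal equations, which is where a matrix package such as gonum/mat becomes useful.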
SQL-like Database Interactions
- database/sql - Package sql provides a generic interface around SQL (or SQL-like) databases.
- github.com/Boostport/avatica - Apache Phoenix/Avatica driver for Go's database/sql package.
- github.com/lib/pq - A pure Go postgres driver for Go's database/sql package.
- github.com/go-pg/pg - Fast PostgreSQL client and ORM.
- github.com/jackc/pgx - A pure Go PostgreSQL driver that offers performance gains and more features while remaining compatible with Go's database/sql package.
- github.com/go-sql-driver/mysql - A MySQL driver for Go's database/sql package.
- github.com/mattn/go-sqlite3 - sqlite3 driver conforming to the built-in database/sql interface.
- github.com/lukasmartinelli/pgclimb - Export data from PostgreSQL into different data formats (JSON, JSON Lines, CSV, XLSX, XML) or into custom formats using Go templates.
- github.com/lukasmartinelli/pgfutter - Import CSV and JSON into PostgreSQL the easy way
- github.com/omniscale/imposm3 - Import OpenStreetMap data into PostgreSQL/PostGIS database
Time Series
- github.com/influxdata/influxdb - An open-source time series database.
- github.com/dgryski/go-holtwinters - An implementation of Holt-Winters forecasting.
- github.com/dgryski/go-tsz - A time series compression algorithm from Facebook's Gorilla paper.
- github.com/dgryski/go-timewindow - Counters over sliding windows.
Web Scraping
- github.com/yhat/scrape - A simple, higher level interface for Go web scraping.
- github.com/cathalgarvey/sqrape - Simple Query Scraping with CSS and Go Reflection.
- github.com/PuerkitoBio/goquery - Gives you easy access to the HTML structure of a page and enables you to pick which elements you want to access by attribute or content.
- github.com/anaskhan96/soup - soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.
- github.com/schollz/linkcrawler - Cross-platform persistent and distributed web crawler
Proposed Packages
- Multi-dimensional slices within Go itself (Proposal).
- A robust (and concurrent) package to handle minimizations/fits of data and histograms (gonum/optimize would provide a nice foundation for this).
- A robust (and concurrent) package to describe statistical models (Bayesian and frequentist) with many nuisance parameters, etc.
- A Go native package for A/B testing.
- A database with datalog querying. Inspiration can be drawn from Rich Hickey's Datomic database, but open source.
- A datalog query system for distributed computation. Similar to Cascalog for the Hadoop ecosystem, but integrating with some of the Go tools instead.
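Among the time series entries above, go-holtwinters implements Holt-Winters forecasting, which extends simple exponential smoothing with trend and seasonal components. As a hedged, stdlib-only sketch of that base technique only (the helper `exponentialSmoothing` is hypothetical, not that package's API), the recurrence is s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1], with alpha in (0, 1]:

```go
package main

import "fmt"

// exponentialSmoothing applies simple exponential smoothing with factor
// alpha in (0, 1]: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].
// Larger alpha weights recent observations more heavily.
func exponentialSmoothing(xs []float64, alpha float64) []float64 {
	if len(xs) == 0 {
		return nil
	}
	s := make([]float64, len(xs))
	s[0] = xs[0]
	for t := 1; t < len(xs); t++ {
		s[t] = alpha*xs[t] + (1-alpha)*s[t-1]
	}
	return s
}

func main() {
	smoothed := exponentialSmoothing([]float64{10, 20, 30}, 0.5)
	fmt.Println(smoothed) // [10 15 22.5]
	// Without trend or seasonal terms, the last smoothed value serves as
	// the one-step-ahead forecast.
	fmt.Println("next:", smoothed[len(smoothed)-1])
}
```

Because this model has no trend term, its forecast lags a steadily rising series like the one above; Holt-Winters adds the trend and seasonal terms that correct for this.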