💥 A curated list of data science, analysis and visualization tools

jsfs2019/awesome-data-science-viz

 
 

Awesome Data Science & Visualization

A curated list of data science, analysis and visualization tools, with an emphasis on Python, D3 and web applications.

Contents

Machine Learning

Resources

Libraries

  • Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
  • TensorFlow is a library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
  • Keras is a deep learning library for Theano and TensorFlow.
  • Caffe is a deep learning framework made with expression, speed, and modularity in mind. Written in C++ with Python bindings.
  • Torch provides several tools for fast tensor mathematics, storage interfaces and machine learning models. Written in C with a Lua interface.
  • Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. Written in C++ with bindings for Python and other languages.
  • scikit-learn is a Python module for machine learning built on top of SciPy.
  • CNTK is the Computational Network Toolkit, a C++ library by Microsoft Research.
  • OpenNN is a neural network library written in C++.
  • XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. Written in C++ with Python integration.
  • Gym is a toolkit for developing and comparing reinforcement learning algorithms. Written in Python.
  • TPOT is a Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
  • TFLearn is a deep learning library featuring a higher-level API for TensorFlow.
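The data flow graph idea mentioned in the TensorFlow entry can be illustrated with a minimal sketch in plain Python. This is not TensorFlow's API: the names `Node`, `constant`, `binary` and `evaluate` are hypothetical, and plain lists of floats stand in for tensors. Nodes hold operations, edges are the `inputs` links along which values flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Tensor = List[float]  # stand-in for a multidimensional array


@dataclass
class Node:
    """A graph node: an operation plus the nodes whose outputs feed it."""
    name: str
    op: Callable[..., Tensor]
    inputs: Tuple["Node", ...] = ()


def constant(name: str, value: Tensor) -> Node:
    # A source node with no inputs that always yields `value`.
    return Node(name, lambda: list(value))


def binary(name: str, fn: Callable[[float, float], float], a: Node, b: Node) -> Node:
    # An element-wise operation over the outputs of nodes `a` and `b`.
    return Node(name, lambda x, y: [fn(p, q) for p, q in zip(x, y)], (a, b))


def evaluate(node: Node, cache: Optional[Dict[str, Tensor]] = None) -> Tensor:
    """Evaluate `node` by first evaluating its inputs (each node only once)."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [evaluate(n, cache) for n in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Build the graph first, then run it: d = (a + b) * a
a = constant("a", [1.0, 2.0])
b = constant("b", [3.0, 4.0])
c = binary("c", lambda p, q: p + q, a, b)
d = binary("d", lambda p, q: p * q, c, a)
result = evaluate(d)
```

Separating graph construction from evaluation is what lets a real framework optimize or distribute the computation before running it; this sketch only shows the structure.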

Examples

NLP

Analysis

  • Natural Language Toolkit (NLTK) is a suite of Python modules, data sets and tutorials supporting research and development in NLP. Some of its modules are out of date, but it is still a useful resource.
  • SpaCy is a powerful, production-ready NLP library for Python.
  • fastText is a C++ library for sentence classification.
  • TextBlob is a Python library for processing textual data. It provides a simple API for diving into common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
  • simhash is a Python implementation of the Simhash algorithm for detecting near-duplicate web documents.
  • langdetect is a port of Google's language-detection library to Python.
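To give a feel for the near-duplicate detection mentioned in the simhash entry, here is a minimal standard-library sketch of the general Simhash idea. It is not the API of the listed package: the function names, the choice of MD5 as the token hash, and the unweighted word tokens are all assumptions made for illustration.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-wide fingerprint of `text`.

    Assumes `bits` is a multiple of 8 and at most 128 (the MD5 digest size).
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
distance = hamming_distance(simhash(doc_a), simhash(doc_b))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so a threshold on `distance` can flag likely near-duplicates. The right threshold depends on the data and is not fixed by this sketch.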

Tools

  • inflect.py Correctly generate plurals, ordinals, indefinite articles; convert numbers to words

Images

  • tesseract-ocr is a well-tested OCR engine written in C++.
  • OpenCV is a computer vision and machine learning software library. The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. Written in C++ with bindings for most languages including Python.
  • SimpleCV is a framework for machine vision, using OpenCV and Python. It provides a concise, readable interface for cameras, image manipulation, feature extraction, and format conversion.
  • match makes it easy to search for images that look similar to each other
  • Noteshrink Convert scans of handwritten notes to beautiful, compact PDFs
  • srez Image super-resolution through deep learning

Data

Sources

Aggregators

  • pyspider is a web crawler system written in Python.
  • Newspaper News, full-text, and article metadata extraction in Python 3.

Explore

  • Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.

Storage

  • PyTables is a package for managing hierarchical datasets, designed to cope efficiently with extremely large amounts of data. It is built on top of the HDF5 library and the NumPy package.

Visualization

Resources

Libraries

  • dc.js Multi-dimensional charting built to work natively with crossfilter, rendered with d3.js
  • Chart.js HTML5 charts using the <canvas> tag

Languages

Python

  • Awesome Python A curated list of awesome Python frameworks, libraries, software and resources.
  • Interactive coding challenges which focus on algorithms and data structures that are typically found in coding interviews

JavaScript

License

CC0

To the extent possible under law, Quantmind has waived all copyright and related or neighboring rights to this work.
