
Python

Table of Contents

Tutorials

Comparison to R

Coding

Functional Programming

Concurrent Programming

Data science

IDE

Performance

packages 1

packages 2

  • huey - a little multi-threaded task queue for python
  • python-gearman
  • Dask - provides multi-core execution on larger-than-memory datasets using blocked algorithms and task scheduling.
  • Ruffus: a lightweight Python library for computational pipelines
  • Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
  • pyflow - A lightweight parallel task engine http://Illumina.github.io/pyflow/

packages 3

tools

  • vprof - Visual Python profiler

tutorials

Bigdata

Database

  • CodernityDB - a more advanced key-value Native Python Database, with multiple key-values indexes in the same engine
  • zerodb - ZeroDB is an end-to-end encrypted database. Data can be stored on untrusted database servers without ever exposing the encryption key. Clients can execute remote queries against the encrypted data without downloading all of it or suffering an excessive performance hit.
  • Basic MongoDB operations (Python edition)
  • ZODB4 - ZODB makes it really fast and easy to build and distribute Persistent Python applications
  • pickleDB - pickleDB is a lightweight and simple key-value store
  • tinydb - TinyDB is a lightweight document oriented database optimized for your happiness :) https://tinydb.readthedocs.org

orm

  • peewee - Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use.

File formats

Others

Kits

  • boltons - Like builtins, but boltons. Constructs/recipes/snippets that would be handy in the standard library. Nothing like Michael Bolton. https://boltons.readthedocs.org
  • pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. 

Utils

CLI

Data Science

  • GraphLab Create - powerful and usable data science tools that enable you to go quickly from inspiration to production

Statistical and visualization

packages

  • seaborn - Statistical data visualization using matplotlib
  • Statsmodels - Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
  • prettyplotlib - Painlessly create beautiful matplotlib plots
  • matplotlib - matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms
  • ggplot - ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professional-looking plots quickly with minimal code.
  • pandas - Python Data Analysis Library
  • NumPy - NumPy is the fundamental package for scientific computing with Python
  • bokeh - Interactive Web Plotting for Python
  • pygal - A python SVG Charts Creator
  • trendvis - TrendVis is a plotting package that uses matplotlib to create information-dense, sparkline-like, quantitative visualizations of multiple disparate data sets in a common plot area against a common variable.
  • MPLD3 - Bringing Matplotlib to the Browser

tutorials

NumPy

Pandas

matplotlib

Plots

Map

  • folium - Python Data. Leaflet.js Maps

GUI

  • pyglet - a cross-platform windowing and multimedia library for Python.
  • kivy - Open source UI framework written in Python, running on Windows, Linux, OS X, Android and iOS https://kivy.org

Data structure and Algorithm

string

Machine Learning

General

packages

  • mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.
  • PyMC3 - a Python module for Bayesian statistical modeling and model fitting which focuses on advanced Markov chain Monte Carlo fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.
  • deap - Distributed Evolutionary Algorithms in Python http://deap.readthedocs.org/

Scikit-learn

Theano

examples

NLP

Clustering

  • annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
  • pysparnn - Approximate Nearest Neighbor Search for Sparse Data in Python!

Optimization

  • Spearmint - Spearmint Bayesian optimization codebase

Deep learning

Image

GEO

  • geoplotlib - python toolbox for visualizing geographical data and making maps

Web

Flask

Comparison

Others

Fun

  • Jarvis
  • PyLatex
  • pyautogui - a Python module for programmatically controlling the mouse and keyboard.

Finance

More

Misc

update all packages:

  • pip upgrade all: sudo pip freeze --local | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 sudo pip install -U
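
The same pipeline can be sketched in Python for cases where a shell one-liner is awkward (for example on Windows). This is a minimal sketch, not a recommended tool: the helper names `package_names` and `upgrade_all` and the `dry_run` flag are hypothetical and introduced only for illustration. It calls pip through `sys.executable -m pip` so that the upgrade targets the interpreter running the script, and it leaves out `sudo`.

```python
import subprocess
import sys


def package_names(freeze_output):
    """Extract package names from `pip freeze` output (hypothetical helper)."""
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Mirrors grep -v '^\-e': drop editable installs; also skip blanks and comments.
        if not line or line.startswith("-e") or line.startswith("#"):
            continue
        # Mirrors cut -d = -f 1: keep everything before the first '='.
        names.append(line.split("=", 1)[0])
    return names


def upgrade_all(dry_run=True):
    """Upgrade every locally installed package, one pip call per package."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze", "--local"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    for name in package_names(frozen):
        cmd = [sys.executable, "-m", "pip", "install", "-U", name]
        if dry_run:
            # Only show what would be run; nothing is installed.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `upgrade_all()` with the default `dry_run=True` only prints the commands, which makes it easy to review the list before running `upgrade_all(dry_run=False)`. As with the shell version, upgrading everything at once ignores version pins and may break dependent packages.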

My Python Environment Workflow with Conda

Open Sourcing a Python Project the Right Way (Chinese translation)

fake-factory - Faker is a Python package that generates fake data for you.