# TODOs

## Things to look at…

* [modin-project/modin](https://github.com/modin-project/modin) – Modin: Speed up your Pandas workflows by changing a single line of code
* [blue-yonder/tsfresh](https://github.com/blue-yonder/tsfresh) – Automatic extraction of relevant features from time series
* [santosjorge/cufflinks](https://github.com/santosjorge/cufflinks) – Productivity Tools for Plotly + Pandas
* [vi3k6i5/flashtext](https://github.com/vi3k6i5/flashtext) – Find or replace keywords in sentences
* [blaze/blaze](https://github.com/blaze/blaze) – NumPy and Pandas interface to Big Data
* [iterative/dvc](https://github.com/iterative/dvc) – Data & models versioning for ML projects, make them shareable and reproducible
* [JasonKessler/scattertext](https://github.com/jasonkessler/scattertext) – Beautiful visualizations of how language differs among document types
* [spotify/chartify](https://github.com/spotify/chartify) – Python library that makes it easy for data scientists to create charts (Bokeh wrapper)
* [eltonlaw/impyute](https://github.com/eltonlaw/impyute) – Data imputations library to preprocess datasets with missing data
* [PatrikHlobil/Pandas-Bokeh](https://github.com/PatrikHlobil/Pandas-Bokeh)
* [ricklupton/ipysankeywidget](https://github.com/ricklupton/ipysankeywidget) – IPython / Jupyter Sankey diagram widget

* [Liip Data Science Stack](http://datasciencestack.liip.ch/) (interactive web UI)
* [GregHilston/ds_pandas_presentation](https://github.com/GregHilston/ds_pandas_presentation)
* [TiesdeKok/LearnPythonforResearch](https://github.com/TiesdeKok/LearnPythonforResearch#notebooks) – Get started with Python for accounting and finance research

## Useful packages

Initially copied from [jupyter_notebooks/UsefulPackages.txt](https://github.com/pybokeh/jupyter_notebooks/blob/master/PyData/UsefulPackages.txt)



### PyData stack
* numpy
* scipy
* pandas
* jupyter
* statsmodels

### Profiling
* pandas-profiling: https://github.com/pandas-profiling/pandas-profiling
* memory_profiler: https://github.com/pythonprofilers/memory_profiler
* py-spy: https://github.com/benfred/py-spy/blob/master/README.md
* pyflame: https://github.com/uber/pyflame   # Does not support Windows

### Forecasting
* pyramid-arima: https://github.com/tgsmith61591/pyramid
* fbprophet: time series forecasting with an additive model; performs best with high-frequency data
* pyflux: time series analysis

### Niche stats libraries
* lifelines: survival analysis (https://github.com/CamDavidsonPilon/lifelines)
* convoys: https://better.engineering/convoys/

### "Large Data" libraries
* dask
* pyarrow
* fastparquet
* vaex: https://github.com/maartenbreddels/vaex
* Pandas on Ray: https://github.com/modin-project/modin
* dampr: https://github.com/Refefer/Dampr

### Visualization libraries
* matplotlib
* seaborn
* altair / https://github.com/justinbois/altair-catplot
* pdvega
* bokeh / HoloViews / hvplot / pandas-bokeh (pyviz stack)
* chartify: https://github.com/spotify/chartify
* dash: dashboard library from plotly
* dataspyre: dashboard framework with a Flask backend
* folium
* geoplot
* plotnine: clone of R's ggplot2
* joypy: https://github.com/sbebo/joypy/blob/master/Joyplot.ipynb
* bqplot
* jmpy
* pyqtgraph
* plotly / cufflinks (cufflinks must also be installed for pandas DataFrame integration)
* toyplot
* ipyleaflet: https://github.com/jupyter-widgets/ipyleaflet/
* probscale: easily make probability-scaled axes
* adjustText: automatically position text annotations to reduce overlaps (https://github.com/Phlya/adjustText)

### Jupyter Notebook Related
* ipysheet
* ipypivot
* papermill: https://github.com/nteract/papermill (parameterized notebooks)
* jupytext (1.0 has better jupyter integration)

### Database related
* pyodbc
* turbodbc
* ipython-sql
* db.py (dead project?)
* sqlalchemy
* sqlalchemy-turbodbc

### ETL or data engineering related (sorted from lightest to heaviest framework)
* https://github.com/petl-developers/petl
* bonobo
* pypeln - https://github.com/cgarciae/pypeln/
* botflow - https://github.com/kkyon/botflow
* https://github.com/mara/data-integration
* https://www.getdbt.com/
* Spotify Luigi
* Apache Airflow - Windows not supported

### Data validation and cleaning frameworks
* https://github.com/pyeve/cerberus
* https://github.com/great-expectations/great_expectations
* https://github.com/cosmicBboy/pandera
* https://pyjanitor.readthedocs.io/
* https://github.com/keleshev/schema
* https://github.com/TMiguelT/PandasSchema
* https://github.com/TomAugspurger/engarde

### R related
* rpy2
* plydata (dplyr clone)
* plotnine (ggplot2 clone)

### Machine Learning Related
* scikit-learn
* sklearn-pandas
* imbalanced-learn
* hyperopt-sklearn: https://github.com/hyperopt/hyperopt-sklearn # Not pip installable yet
* tpot
* xgboost
* lightgbm
* fastText
* https://github.com/kvh/recurrent - extract datetimes from English sentences

### Webscraping
* beautifulsoup4
* mechanicalsoup
* selenium
* scrapy
* https://github.com/kennethreitz/requests-html
* https://newspaper.readthedocs.io  # easily extract text from articles

### Utilities
* https://github.com/tldr-pages/tldr-python-client # replacement for man pages
* bropages (http://bropages.org/): install with `sudo apt-get install ruby-dev`, then `sudo gem install bropages`
* https://github.com/gleitz/howdoi
* inspect: https://docs.python.org/3/library/inspect.html
* prettypandas
* https://github.com/seatgeek/fuzzywuzzy
* https://github.com/RobinL/fuzzymatcher
* pytest
* requests
* requests-html
* psutil

### Amazon Web Services
* PyAthena: https://github.com/laughingman7743/PyAthena

### CLI
* click - for building CLIs
* fire - for building CLIs
* https://github.com/tmbo/questionary

### Progress Bars
* tqdm: https://github.com/tqdm/tqdm
* fastprogress: https://github.com/fastai/fastprogress

### Misc.
* Sending Windows 10 notifications: https://github.com/jithurjacob/Windows-10-Toast-Notifications
* Another Windows notification library: https://github.com/malja/zroya
* glances: CPU/memory monitoring
* pendulum: a better datetime library (preferable to arrow)
* visidata: https://jsvine.github.io/intro-to-visidata/index.html
* schedule: job scheduling for humans (https://github.com/dbader/schedule)
* pyautogui
* ptpython: better REPL
* xlwings: automate Excel with Python instead of VBA
* https://github.com/SimonBiggs/scriptedforms
* black: source code formatter

### Debugging
* ipdb
* pudb: https://github.com/inducer/pudb - Windows not supported, needs Cygwin

### Logging
* https://github.com/Delgan/loguru