# A brief tour of data wrangling & plotting in Python

In [17]:
%%html
<style>
.jp-CodeCell .jp-Cell-inputWrapper { display: inline; }
.jp-CodeCell.jp-mod-selected .jp-Cell-inputWrapper { display: inline; }
.jp-Cell.jp-mod-selected~.jp-Cell { display: inline; }
</style>

# You win:

- A real programming language (i.e. a more transferable skill)
- Beauty, elegance and **readability**: in short, the Zen of Python (try `import this`)
- The ultimate glue/scripting language
- Non-statistical tasks are usually more straightforward in Python than in R, e.g. fancy preprocessing, string processing, web scraping, file handling, ...
- Connections to many well-developed scientific libraries: NumPy (matrix computations), SciPy (scientific computing), scikit-image (image processing), scikit-learn (machine learning), ...
- Interactive notebooks

# You lose:

- Switching cost (often considerable for non-professional programmers), unless you already program your experiments in Python anyway
- Even basic data analysis functionality is not built in; it is only available through libraries
- Some specific advanced analysis techniques are not available (this is changing rapidly, but the libraries are often still in an alpha/beta state rather than finished, tested and documented)
- The large knowledge base and support that R has as a statistics tool (e.g. within the department)


# But lots of commonalities in the logic of data wrangling, plotting & analyzing

There is no reason to choose: you can use both, depending on your needs or processing stage.


# Useful python packages for psychologists

### In order of importance:
- [pandas](https://pandas.pydata.org/): data handling & descriptive analyses
- Plotting: [matplotlib](https://matplotlib.org/) (comparable to R base plotting), [seaborn](http://seaborn.pydata.org/index.html) and/or [ggplot](http://ggplot.yhathq.com/)
- [statsmodels](https://www.statsmodels.org): statistical models (GLM, t-tests, ...)
- Specialized analyses: Psignifit (psychometric function fitting), Bambi/Kabuki (hierarchical Bayesian models), [NIPY](http://nipy.org/), [MNE](http://martinos.org/mne/stable/index.html), [PyMVPA](http://www.pymvpa.org/)
- [More libraries](https://www.marsja.se/best-python-libraries-psychology/)
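
As a rough illustration of what "data handling & descriptive analyses" with pandas looks like, here is a minimal sketch. The column names and the reaction-time values are invented for the example; they are not real data.

```python
import pandas as pd

# Hypothetical toy data: reaction times (ms) in two conditions (made-up values).
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "condition": ["congruent", "incongruent"] * 3,
    "rt": [512.0, 598.0, 479.0, 640.0, 530.0, 611.0],
})

# Descriptive statistics for one numeric column (count, mean, std, quartiles, ...).
rt_summary = df["rt"].describe()

# Boolean indexing: keep only the trials slower than 500 ms.
slow_trials = df[df["rt"] > 500]

print(rt_summary)
print(slow_trials)
```

`describe()` is roughly what `summary()` gives you in R, and the boolean indexing plays the role of `subset()` or `dplyr::filter()`.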



* [Data wrangling with Python](https://drive.google.com/open?id=0BwlD7q-DXkdWRUMyc19NdGNKRHc): cleaning data and shaping it into the right format(s) for your visualizations and analyses ([cheat sheet](https://drive.google.com/open?id=0BwlD7q-DXkdWVkJQLVhGeHA3elk)). The [split-apply-combine](http://blog.yhat.com/posts/grouping-pandas.html) method is especially important (much more in this [book on Python for data analysis](https://drive.google.com/open?id=0BwlD7q-DXkdWZHB6a0szLVN5WDQ)).
* [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)
* [Python for social scientists](http://nealcaren.github.io/python-tutorials/) 
* [Jupyter notebook tutorial](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook)
* [Interactive pandas tutorial](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python)  
* [Basic pandas tutorial](https://www.dataquest.io/blog/pandas-tutorial-python-2/)
* [Basic tutorial pandas in notebook](http://nikgrozev.com/2015/12/27/pandas-in-jupyter-quickstart-and-useful-snippets/)
* [Advanced pandas/seaborn tutorial](https://tryolabs.com/blog/2017/03/16/pandas-seaborn-a-guide-to-handle-visualize-data-elegantly)
* [Preprocessing issues](https://www.datacamp.com/community/tutorials/the-importance-of-preprocessing-in-data-science-and-the-machine-learning-pipeline-i-centering-scaling-and-k-nearest-neighbours#gs.zBXNP2I)
* [Tidy data in python](https://github.com/jfpuget/Tidy-Data/blob/master/Tidy-Data.ipynb) ([more](http://www.jeannicholashould.com/tidy-data-in-python.html))
* [Notebook keyboard shortcuts & "magic"](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
* [Multilevel Logistic Regression using PyMC3](https://dansaber.wordpress.com/2016/08/27/analyze-your-experiment-with-a-multilevel-logistic-regression-using-pymc3%E2%80%8B/)
* [Understanding Matplotlib](http://pbpython.com/effective-matplotlib.html) 
* [Useful Jupyter notebooks](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#psychology-and-neuroscience)
* [Use R in Jupyter notebooks](http://www.randalolson.com/2013/01/14/filling-in-pythons-gaps-in-statistics-packages-with-rmagic/)
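
The split-apply-combine method mentioned in the first resource can be sketched in a few lines of pandas. This is only a minimal illustration with invented data: the data are split by condition, a summary function is applied to each group, and the results are combined into one table.

```python
import pandas as pd

# Hypothetical toy data (made-up values, for illustration only).
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "condition": ["congruent", "incongruent"] * 3,
    "rt": [500.0, 600.0, 480.0, 640.0, 520.0, 620.0],
})

# Split: group the rows by condition.
# Apply: compute the mean and the number of trials per group.
# Combine: pandas returns one summary table indexed by condition.
condition_summary = df.groupby("condition")["rt"].agg(["mean", "count"])

print(condition_summary)
```

The same pattern corresponds to `group_by()` followed by `summarise()` in R's dplyr, which is one of the commonalities in logic between the two languages.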


# Interfacing between the two

In [33]:
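# IPython.html.widgets was moved to the separate ipywidgets package
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

def f(x):
    # Print the current value of the widget
    print(x)

# An integer default makes interact build an integer slider for x
interact(f, x=10);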


10
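
Perhaps the simplest way to interface between Python and R, without any extra tooling, is to exchange data through a CSV file. The sketch below assumes this file-based route (it is not the R-magic approach from the link above), and the file name, column names and values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data (made-up values).
df = pd.DataFrame({"participant": [1, 2, 3], "score": [12, 7, 15]})

# Write to a temporary folder; index=False avoids an extra row-index column,
# so R sees exactly the two columns defined above.
csv_path = os.path.join(tempfile.mkdtemp(), "scores.csv")
df.to_csv(csv_path, index=False)

# On the R side (not run here): scores <- read.csv("scores.csv")

# Reading the file back in Python checks that nothing was lost on the way.
round_trip = pd.read_csv(csv_path)
print(round_trip)
```

This keeps each processing stage in whichever language suits it best, at the cost of writing intermediate files.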
