# First steps with data analysis in Python

A workshop at [Aspects of Neuroscience](http://neuroaspects.org/) 2016, by [Piotr Migdał](http://p.migdal.pl/).

# 3. Data analysis in Python

## Plots

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

In [None]:
plt.plot([1, 2, 3, 6], [-2, 0, 1, -1])

In [None]:
plt.plot([1, 2, 3, 6], [-2, 0, 1, -1], 'ro')

In [None]:
import seaborn as sns
sns.set_style('whitegrid')

In [None]:
plt.plot([1, 2, 3, 6], [-2, 0, 1, -1])

In [None]:
plt.plot([1, 2, 3, 6], [-2, 0, 1, -1])
plt.plot([1, 2, 3, 6], [-1, -0.5, 0.5, -0.5])
plt.xlabel("some x label")
plt.ylabel("some y label")
plt.title("why am I drawing this?")

## Working with tabular data

In [None]:
# key library for working with tabular data
import pandas as pd

A two-year record (2011 and 2012) of bike rentals in Washington, D.C., from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

The official description of columns goes as follows:

- `instant`: record index
- `dteday`: date
- `season`: season (1: spring, 2: summer, 3: fall, 4: winter)
- `yr`: year (0: 2011, 1: 2012)
- `mnth`: month (1 to 12)
- `hr`: hour (0 to 23)
- `holiday`: whether the day is a holiday or not (extracted from [Web Link])
- `weekday`: day of the week
- `workingday`: 1 if the day is neither a weekend nor a holiday, otherwise 0
- `weathersit`: 
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- `temp`: Normalized temperature in Celsius. The values are derived via `(t - t_min) / (t_max - t_min)`, with `t_min = -8`, `t_max = +39` (only in hourly scale)
- `atemp`: Normalized "feels like" temperature in Celsius. The values are derived via `(t - t_min) / (t_max - t_min)`, with `t_min = -16`, `t_max = +50` (only in hourly scale)
- `hum`: Normalized humidity. The values are divided by 100 (max)
- `windspeed`: Normalized wind speed. The values are divided by 67 (max)
- `casual`: count of casual users
- `registered`: count of registered users
- `cnt`: count of total rental bikes including both casual and registered
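
The normalized columns above can be mapped back to physical units by inverting the stated formulas. The following is a minimal sketch, not part of the original workshop: the `sample` table and the `denormalize` helper are hypothetical, and the values are made up for illustration. Note that the description gives the temperature constants for the hourly scale only, so applying them to the daily file is an assumption.

```python
import pandas as pd

# Illustrative values only; these are NOT rows from the real dataset.
sample = pd.DataFrame({
    "temp": [0.0, 0.5, 1.0],
    "atemp": [0.0, 0.5, 1.0],
    "hum": [0.25, 0.5, 0.75],
})


def denormalize(values, t_min, t_max):
    """Invert the min-max scaling (t - t_min) / (t_max - t_min)."""
    return values * (t_max - t_min) + t_min


# Constants taken from the column descriptions (stated for the hourly scale).
sample["temp_celsius"] = denormalize(sample["temp"], t_min=-8, t_max=39)
sample["atemp_celsius"] = denormalize(sample["atemp"], t_min=-16, t_max=50)

# Humidity was divided by 100, so multiplying restores percent.
sample["hum_percent"] = sample["hum"] * 100

print(sample)
```

With these constants, a normalized `temp` of 0.5 corresponds to 15.5 °C, and a normalized `atemp` of 0.5 to 17 °C.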

In [None]:
days = pd.read_csv("bike_sharing_day.csv", parse_dates=["dteday"], index_col="dteday")

In [None]:
# first few entries
days.head()

In [None]:
# number of rows and columns
days.shape

In [None]:
days.info()

In [None]:
# selecting a column by name
temp = days["temp"]
# or, equivalently, by attribute access (works only for names that are valid Python identifiers)
temp = days.temp

In [None]:
temp.head()

In [None]:
# a simple plot
temp.plot()

In [None]:
# a histogram of values
temp.hist(bins=25)

In [None]:
# selecting columns
weather = days[["temp", "hum", "windspeed"]]
weather.head()

In [None]:
weather.plot()

In [None]:
weather.mean()

In [None]:
weather.describe()

In [None]:
days[["registered", "casual"]].plot()

In [None]:
days[["registered", "casual"]].plot(logy=True)

In [None]:
days.corr()

In [None]:
sns.heatmap(days.corr())

In [None]:
sns.clustermap(days.corr())

In [None]:
# factorplot was renamed to catplot in seaborn 0.9 and later removed
sns.catplot(data=days, x="weekday", y="casual", hue="season", kind="bar")

# Links

* General
    * [A modern guide to getting started with Data Science and Python](http://twiecki.github.io/blog/2014/11/18/python-for-data-science/)
    * [Data science intro for math/phys background](http://p.migdal.pl/2016/03/15/data-science-intro-for-math-phys-background.html) by me    
* Plots
    * [Overview of Python Visualization Tools](http://pbpython.com/visualization-tools-1.html)
    * [Pandas Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
    * [Matplotlib tutorial](http://www.labri.fr/perso/nrougier/teaching/matplotlib/)
    * [Seaborn: statistical data visualization](https://stanford.edu/~mwaskom/software/seaborn/)
    * [Why xkcd-style graphs are important](https://www.chrisstucchio.com/blog/2014/why_xkcd_style_graphs_are_important.html)
* Other
    * [Web Scraping - It’s Your Civic Duty](http://pbpython.com/web-scraping-mn-budget.html)
    * [Scipy Lecture Notes - One document to learn numerics, science, and data with Python](http://www.scipy-lectures.org/)