In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

<div class="exercise">
    
Breakout Exercise #10
============

Problem 1
--------

Suppose you have the following data in a text file (the file `data/stations.txt` contains the full dataset):

    # Station  Lat    Long   Elev 
    BIRA    26.4840 87.2670 0.0120
    BUNG    27.8771 85.8909 1.1910
    GAIG    26.8380 86.6318 0.1660
    HILE    27.0482 87.3242 2.0880
    ... etc.
    
    
These are the names of seismographic stations in the Himalaya, with their coordinates in degrees and elevations in kilometers.

<b>

- Make a scatter plot of all of these, using both the size and the color to (redundantly) encode elevation.  Label each station by its 4-letter code, and add a colorbar on the right that shows the color-elevation map.

</b>

**Tips**
    
* Use the ``matplotlib`` online documentation to learn about drawing the color bar. You will want to save the returned object from the ``scatter`` plotting call.
    
* To label each station with its 4-letter code you can loop through them all and ``annotate`` each.
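
As a minimal sketch of both tips, the snippet below uses only the four sample rows printed above rather than the full file. The variable names (`names`, `lon`, `sc`, and so on) and the marker-size scaling are illustrative choices, not part of the exercise.

```python
import matplotlib.pyplot as plt

# Only the four sample rows shown above, not the full dataset.
names = ["BIRA", "BUNG", "GAIG", "HILE"]
lat = [26.4840, 27.8771, 26.8380, 27.0482]
lon = [87.2670, 85.8909, 86.6318, 87.3242]
elev = [0.0120, 1.1910, 0.1660, 2.0880]

fig, ax = plt.subplots()
# Keep the object returned by scatter: the colorbar is built from it.
# The "+ 10" is an arbitrary offset so low-elevation markers stay visible.
sc = ax.scatter(lon, lat, c=elev, s=[100 * e + 10 for e in elev])
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("Elevation (km)")

# One annotate call per station, offset slightly from the marker.
for name, x, y in zip(names, lon, lat):
    ax.annotate(name, (x, y), xytext=(2, 5), textcoords="offset points")
```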
    

In [None]:
with open('./data/stations.txt', 'r') as f:
    # Skip blank lines and the '#' header line; split on any whitespace
    rows = [line.split() for line in f
            if line.strip() and not line.lstrip().startswith('#')]
# The fourth column is the station elevation in km
data = pd.DataFrame(rows, columns=['name', 'lat', 'long', 'elev'])
data[['lat', 'long', 'elev']] = data[['lat', 'long', 'elev']].astype(float)

In [None]:
f, ax = plt.subplots(figsize=(20, 20))
# Longitude on x and latitude on y, matching the axis labels below
scat = ax.scatter(data['long'], data['lat'],
                  c=data['elev'], s=data['elev'] * 100,
                  cmap=plt.cm.rainbow)
for ix, row in data.iterrows():
    ax.annotate(row['name'], (row['long'], row['lat']),
                xytext=(2, 5), textcoords='offset points')
ax.set_ylabel('Latitude (deg)')
ax.set_xlabel('Longitude (deg)')
ax.set_title('Seismic stations in the Himalaya')
# Build the colorbar from the object returned by scatter
cbar = f.colorbar(scat)
cbar.set_label('Elevation (km)')

Problem 2
---------

Write a notebook where you can load the image ``data/dessert.png`` and then perform the following operations on it:

<b>

- Create a figure with four plots that show the full-color image and each color channel of the image, using the matching colormap for each channel. Ensure that the axes are linked so zooming in one image zooms the same region in the others.

- Compute the luminosity histogram and the three per-channel histograms, and display all four in one figure, giving each a separate plot (hint: a 4x1 plot works best for this). Link the appropriate axes together.

- Create a black-and-white (or more precisely, grayscale) version of the image. Compare the results from a naive average of all three channels with that of a model that uses 30% red, 59% green and 11% blue, by displaying all three (full color and both grayscales) side by side with linked axes for zooming.
</b>
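
One way to link axes is to pass `sharex=True, sharey=True` to `plt.subplots`. The sketch below is only an illustration: it uses a random array (`demo`, a hypothetical stand-in for the real image) and checks that changing the x-limits on one panel, which is what zooming does, changes them on another.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the image: random RGB values in [0, 1].
rng = np.random.default_rng(0)
demo = rng.random((40, 60, 3))

# sharex/sharey tie the view limits of all four panels together.
fig, axs = plt.subplots(1, 4, sharex=True, sharey=True, figsize=(12, 3))
axs[0].imshow(demo)
for i, (ax, cmap) in enumerate(zip(axs[1:], ["Reds", "Greens", "Blues"])):
    ax.imshow(demo[:, :, i], cmap=cmap)

# Changing the limits on the first panel also moves the last one.
axs[0].set_xlim(10, 30)
print(axs[3].get_xlim())
```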

**Tips**
    
* ``matplotlib`` image tutorial: http://matplotlib.org/users/image_tutorial.html
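
To see how the two grayscale recipes differ, it can help to try them on a tiny hand-made array first. The sketch below uses a hypothetical 1x3 "image" (`pixels`) with one pure red, one pure green and one pure blue pixel. Because the 30/59/11 weights already sum to 1, the weighted version is a weighted sum over the channels, not a mean.

```python
import numpy as np

# Hypothetical 1x3 test image: pure red, pure green, pure blue.
pixels = np.array([[[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]]])

# Naive grayscale: every channel counts equally.
naive = pixels.mean(axis=-1)

# Weighted grayscale: 30% red, 59% green, 11% blue.
weights = np.array([0.30, 0.59, 0.11])
weighted = (pixels * weights).sum(axis=-1)

print(naive)
print(weighted)
```

The naive average maps all three pixels to the same gray value (1/3), while the weighted model makes the green pixel brightest and the blue pixel darkest.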

In [None]:
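img = plt.imread('./data/dessert.png')
# Shared x and y axes keep zoom and pan in sync across the four panels
f, axs = plt.subplots(1, 4, sharey=True, sharex=True)
# Reverse the row order (flips the image vertically)
img = img[::-1, ...]
axs[0].imshow(img)
# Show each channel with the sequential colormap of its own color
cols = [plt.cm.Reds, plt.cm.Greens, plt.cm.Blues]
for i, (ax, col) in enumerate(zip(axs[1:], cols)):
    ax.imshow(img[:, :, i], cmap=col)

# Turn off grid lines so they do not cover the images
for ax in axs:
    ax.grid(False)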

In [None]:
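# A 4x1 layout, as the hint suggests, with linked axes
f, axs = plt.subplots(4, 1, sharex=True, sharey=True, figsize=(8, 10))
cols = ['k', 'r', 'g', 'b']
titles = ['Luminosity', 'Red', 'Green', 'Blue']
# Luminosity as the plain mean of the three color channels (alpha, if any, is ignored)
lum = img[..., :3].mean(-1)
# `density=True` replaces the `normed=True` keyword, which newer matplotlib no longer accepts
axs[0].hist(lum.ravel(), density=True, color=cols[0])
for i, ax in enumerate(axs[1:]):
    chan = img[:, :, i]
    col = cols[i+1]
    ax.hist(chan.ravel(), density=True, color=col)

for ax, title in zip(axs, titles):
    ax.set_title(title)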

In [None]:
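# Use only the RGB channels (drops alpha if the PNG has one)
rgb = img[..., :3]
grey_simple = rgb.mean(-1)
weights = np.array([.3, .59, .11])
# The weights sum to 1, so the weighted grey value is a sum, not a mean
grey_weighted = (rgb * weights).sum(-1)

f, axs = plt.subplots(1, 3, figsize=(15, 10), sharex=True, sharey=True)
axs[0].imshow(img)
# Without cmap='gray', imshow applies the default colormap to 2-D data
for ax, imdat in zip(axs[1:], [grey_simple, grey_weighted]):
    ax.imshow(imdat, cmap='gray', vmin=0, vmax=1)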

## Problem 3 - Exploring the matplotlib gallery

Have a look at the matplotlib gallery, find a cool-looking figure, copy the code
into the box below, and modify it. Note that some of the examples might require packages that
are not installed on your machine (in particular those that make maps).
If this is the case, pick another example for the purposes of this exercise.

In IPython, you can use the `%load` magic. Type `%load` followed by the URL of the `.py` file containing the
code, and IPython will pull the code into the cell. Run the cell with the code to see the
figure.

In [None]:
# %load http://matplotlib.org/mpl_examples/pylab_examples/contour_demo.py

## Problem 4
We've collected some data about tipping behavior from many people. Explore this dataset and see if there's anything to conclude.

**In Pandas**
* Read in the data at `./data/tips.csv` using pandas
* Using pandas, plot the total bill amount vs. the size of the tip
* Using pandas, plot a histogram of tip values so we see its distribution
* Modify the histogram so only the first and last xticklabel is displayed

**In Seaborn**
* Create a "violinplot" to see the distribution of tips for males vs. females. Plot each day on the x-axis, and the distributions along the y-axis. Make the color dependent on the gender of the person. *HINT: Use "split=True" to plot the distributions alongside one another*
* Plot the same data, but now as the mean tip amount +/- error bars rather than the full distribution. *HINT: Use `catplot`, which was named `factorplot` in seaborn releases before 0.9*
* Regress the `total_bill` amount against the `tip` to see what the relationship is. Plot one regression line + scatterplot for each gender.
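
To get a feel for what a "mean +/- error bar" plot summarizes, the sketch below computes the group means and standard errors by hand with pandas. The rows in `demo` are hypothetical and for illustration only; they are not taken from `tips.csv`. Note that seaborn's categorical plots draw a bootstrapped confidence interval by default, so their error bars need not match the standard error exactly.

```python
import pandas as pd

# Hypothetical rows for illustration only; these are not from tips.csv.
demo = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "tip": [2.0, 3.0, 4.0, 1.0, 3.0, 5.0],
})

# Mean and standard error of the mean for each day.
summary = demo.groupby("day")["tip"].agg(["mean", "sem"])
print(summary)
```

Both days have the same mean tip here, but the wider spread on Sunday gives it the larger standard error.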

In [None]:
data = pd.read_csv('./data/tips.csv')

In [None]:
# First make a scatterplot of the
# total bill amount against the tip using pandas
f, ax = plt.subplots()
data.plot(x='total_bill', y='tip', kind='scatter', ax=ax)

# Plot a histogram of tip values so we get an idea of its distribution
# Use bins of width .5. Hide all but the first and last tick labels
f, ax = plt.subplots()
ax = data['tip'].hist(bins=np.arange(0, 20, .5), ax=ax)
# Fix the tick positions so the first and last ticks lie inside the plotted range
ax.set_xticks(np.arange(0, 21, 5))
labs = ax.get_xticklabels()
plt.setp(labs, visible=False)
plt.setp([labs[i] for i in [0, -1]], visible=True)

In [None]:
# Create a "violin plot" to display the distribution of tips.
# One column per day of the week, split by gender.
f, axs = plt.subplots(3, 1, figsize=(10, 15))
sns.violinplot(x='day', y='tip', hue='sex', data=data, split=True, ax=axs[0])

# Plot the mean tip size by day with error bars as a bar plot
# One color per gender, one pair of bars per day
# barplot is the axes-level function behind the bar-kind factorplot/catplot;
# the figure-level functions create their own figure and cannot draw into axs[1]
sns.barplot(x='day', y='tip', hue='sex', data=data, ax=axs[1])

# Display a scatterplot and fit a regression model to show how tips scale
# with the size of the bill, one fit per gender
for sex, vals in data.groupby('sex'):
    sns.regplot(x='total_bill', y='tip', data=vals, label=sex, ax=axs[2])
axs[2].legend()