# Data Visualization

In [None]:
import pandas as pd
import numpy as np

In [None]:
airbnb = pd.read_csv('https://raw.githubusercontent.com/ishaandey/node/master/week-4/workshop/airbnb.csv') # For Seattle only

As with all new datasets, let's start by familiarizing ourselves with the dataset:

**Try it!** Print the shape, columns, and show a sample observation
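One possible way to take that first look, sketched on a tiny stand-in frame so the snippet runs without the download. The rows are made up, and the `room_type` and `availability_365` columns are assumptions about what the real file contains; in the notebook, swap in `airbnb`.

```python
import pandas as pd

# Made-up stand-in rows; in the notebook, use `airbnb` instead
listings = pd.DataFrame({
    "neighbourhood_group": ["Downtown", "Capitol Hill", "Ballard"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt"],
    "price": [150, 65, 110],
    "availability_365": [200, 30, 365],
})

print(listings.shape)             # (number of rows, number of columns)
print(listings.columns.tolist())  # column names
print(listings.sample(1))         # one randomly chosen observation
```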

## Scatter, Bars, and Histograms: The Basics

Some imports: Note that we'll rename `plotly express` as `px`.

`plotly express` is a "wrapper" for the base `plotly` package. What that means is we can use incredibly easy and readable functions, and plotly express will do the hard work of converting that input into formats that the software can understand.

Quick aside: If you're a web developer and love JS, or an academic and use R, the same Plotly library is available to use in both languages. 

Let's start off with a simple scatter plot, which we can whip up with `px.scatter()`

What does the association between price and availability look like?

It works, but doesn't really tell us too much. Let's modify the plot by adding parameters to `px.scatter()`

With *any* python package, we can pull up some quick documentation from Jupyter itself using `?`
<br>**Try it!** What parameters does `px.scatter` accept?

So we're still not seeing much of a clear trend here.

There are, however, quite a few outliers in the price. Let's see if we can adjust our graph so the rest of the data isn't squished down.

Peep the histogram on the right: it shows a pretty neat trend across the room types. We can check that out in more depth later.

### Quick aesthetics

Those outliers were causing us a bit of trouble, but they weren't too hard to deal with. 
<br>But that does make me a bit curious: What was so special about those listings?

The power of Plotly is its interactivity: we can literally just hover over the data points to see what's going on. 
<br>All we have to do is suggest what features to display: 

See if you can find out which parameters can be used to show text on hover: 

Plotly is interactive! Play around with the legend and plot area. <br>Double-click a legend entry on the right, and Plotly will automatically update the figure to show only those points.

We can change our colors fairly easily using color scales.

<br>If the feature we pass to `color=` is **discrete or categorical**, we'll add the `color_discrete_sequence` param
* Documentation for the color schemes accepted: https://plotly.com/python/discrete-color/

<br>If the feature is instead **continuous**, we'll use the `color_continuous_scale` param instead
* The corresponding docs for continuous schemes: https://plotly.com/python/colorscales/
* And the available color scales:  https://plotly.com/python/builtin-colorscales/

<br> Open the docs, and try out your favorite below:

Under the hood, each of these sequences is just a list of colors, so we can slice them to use only some of the values.

To finish off, we can add **titles**, **labels** and such pretty easily.

See if you can use the function documentation or google to figure out how to do that:

### Bar Plots & Histograms

Oftentimes we'll want to create visualizations at some aggregate level. 

For example, let's say we want to show neighborhoods with a high median rental price. 
<br>Our data is at a *per-listing* level, meaning that each individual row is its own listing, with its price. 
<br>To get data at the *per-neighborhood* level, we've got to *roll up* all the listing prices per neighborhood. In other words, we group the data by neighborhood, then find the median across all the listings in each group.

In [None]:
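# numeric_only=True skips any text columns, which have no median (recent pandas raises an error otherwise)
airbnb_byN = airbnb.groupby(by=['neighbourhood','neighbourhood_group']).median(numeric_only=True).reset_index()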
airbnb_byN.head(3)

Now, let's drop the columns whose median is meaningless, like IDs and coordinates. 

In [None]:
airbnb_byN = airbnb_byN.drop(columns=['host_id','latitude', 'longitude'])
airbnb_byN.head(3)

In breakout groups, see if you can (1) **build a bar plot** to show median prices in each neighbourhood group, and sort them in a meaningful way

Make it complete! Label axes, hover text, color, the whole nine yards.

Say my friend and I have a budget of $85 per night. Show which regions fit that budget. How you do that is entirely up to you: draw a **horizontal line** or color the ideal regions differently, as long as it communicates which neighborhoods are generally cheaper.


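**Hint**: To draw a horizontal line, use `fig.add_hline()` with corresponding parameters (`fig.add_vline()` draws a vertical one, which is what you'd want if your bars run horizontally)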
<br>**Hint**: To color bars according to some condition, first **create a new column** that describes if the value is below budget.

In [None]:
budget = 85
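# One possible answer: label each row by whether its median price fits the budget
airbnb_byN['budget'] = (airbnb_byN.price <= budget).map({True:'Under Budget', False:'Over Budget'})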

## Geographic Plots

There are quite a few different ways to show geographical data, usually with choropleth maps or scatter plots.
<br>Our friend Plotly has them all: https://plotly.com/python/maps/

A quick note about how this works before letting you leaf through the docs page.

<br> Most of the params in `px.scatter_mapbox()` behave pretty similarly to `px.scatter`, except that we provide **latitude and longitude data** instead of `x` and `y`. Luckily, our dataset already has that included, but oftentimes we'll have to find a lookup table online to convert city names, for example, to lat / lon coordinates. 

<br>We don't necessarily have to provide a value to `size=`, but that usually can help highlight points of interest.
<br> `zoom=` on the other hand, just changes how zoomed in the initial picture is when first loaded.

<br>Finally, we'll have to set the figure's `mapbox_style=` layout parameter to choose which base map to load. 
<br>For more information on what options are available here, check out https://plotly.com/python/mapbox-layers/

In [None]:
airbnb['budget'] = (airbnb.price <= budget).map({True:'Under Budget', False:'Over Budget'})

In [None]:
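# One point per listing, colored by whether it fits the budget
fig = px.scatter_mapbox(airbnb, lat='latitude', lon='longitude',
                        color='budget', zoom=10)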

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

fig.show()

Show places in Downtown, Central Area, and Capitol Hill, and highlight those under budget