# Libraries

In [15]:
from bokeh.embed import components
from bokeh.io import curdoc, output_notebook, show
from bokeh.layouts import column, gridplot, layout, row
from bokeh.models import ColumnDataSource, LinearColorMapper, NumeralTickFormatter, Range1d, HoverTool
from bokeh.models import Span, Label, Legend, Panel, Tabs, LabelSet, SingleIntervalTicker, LinearAxis, DatetimeTickFormatter
from bokeh.palettes import Category20
from bokeh.plotting import figure
from bokeh.themes import Theme
from bokeh.transform import cumsum
import pandas as pd

In [16]:
output_notebook()

# Our Data
This data comes from a little pipeline I built, which is outlined in this [Medium article on AWS Lambda Pipelines](https://towardsdatascience.com/make-data-acquisition-easy-with-aws-lambda-python-in-12-steps-33fe201d1bb4). Our data covers New York City apartment listings scraped from Craigslist over June and July 2019. The Craigslist data has a few enrichments that bring in data from MapQuest and Walk Score, but it should be pretty intuitive to understand.

# Import our data
Let's read in our data and convert the `datetime` column to a plain date.

In [17]:
df = pd.read_csv('data/nyc_apartments.csv')
df['date'] = pd.to_datetime(df['datetime'], infer_datetime_format=True).dt.date
df.head()

Unnamed: 0,id,address,area,bedrooms,bikeScore,datetime,distanceToNearestIntersection,has_image,has_map,name,...,month,dow,day,hour,advertises_no_fee,is_repost,sideOfStreetEncoded,postalCodeChopped,neighborhood,date
0,6911917730,320 Chauncey St,,3.0,64.0,2019-06-21 14:34:00,0.0,1,1,you’re in good hands...t e x t us to view bk’s...,...,6,4,21,14,1,0,1.0,11233.0,Southeast Bronx,2019-06-21
1,6917210186,530 W 143rd St,800.0,1.0,88.0,2019-06-21 14:33:00,203.483553,1,1,spacious 1br penthouse with deck!! near col un...,...,6,4,21,14,0,1,0.0,10031.0,Upper West Side,2019-06-21
2,6914527887,410 Pulaski St,,3.0,79.0,2019-06-21 14:33:00,0.013114,1,1,this is the one you’ve been looking for… call ...,...,6,4,21,14,0,0,1.0,11221.0,Sunset Park,2019-06-21
3,6914529944,410 Pulaski St,,3.0,79.0,2019-06-21 14:33:00,0.013114,1,1,simplify your search with us**pro team w/ big ...,...,6,4,21,14,1,0,1.0,11221.0,Sunset Park,2019-06-21
4,6917173545,4754 Center Blvd,653.0,1.0,81.0,2019-06-21 14:33:00,61.301497,1,1,sunny 1br in long island city. brand new renov...,...,6,4,21,14,1,1,0.0,11109.0,Queens,2019-06-21
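
One detail of the conversion above is worth seeing on its own: `.dt.date` produces plain Python `date` objects, so the resulting column has `object` dtype rather than a pandas datetime dtype. The sketch below uses a made-up three-row frame (the name `stamps` and its values are illustrative, not taken from the apartment data) and shows `.dt.normalize()` as one alternative that keeps a datetime dtype while still dropping the time of day.

```python
import pandas as pd

# Hypothetical toy timestamps, only for illustration
stamps = pd.DataFrame({
    "datetime": [
        "2019-06-21 14:34:00",
        "2019-06-21 14:33:00",
        "2019-06-22 09:05:00",
    ]
})

parsed = pd.to_datetime(stamps["datetime"])

# Python date objects: the column ends up with object dtype
stamps["as_date"] = parsed.dt.date

# Midnight timestamps: the column keeps a datetime64 dtype
stamps["as_normalized"] = parsed.dt.normalize()

print(stamps.dtypes)
```

Either form works as a grouping key for daily aggregates; which one is more convenient depends on what you do with the column afterwards.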


# Column Data Source
Bokeh has something called a ["ColumnDataSource"](https://bokeh.pydata.org/en/latest/docs/reference/models/sources.html), which will quickly become your best friend. You can read about it in the docs, but the high-level way to think about it is that it converts your pandas DataFrame into something Bokeh can easily use: a mapping from column names to sequences of values. You can see how we utilize this weapon of mass plotting in the charts below, but the general process is:

- Get your data in the proper format with pandas
- Make this properly formatted dataframe a ColumnDataSource
- Use this ColumnDataSource when you call your plotting function
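
The three steps above can be sketched on a tiny hand-made DataFrame before we apply them to the real data. Everything named `toy_*` here is a made-up placeholder rather than part of the apartment dataset, and the figure is built but not shown, so treat this as a minimal illustration rather than a finished chart.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Step 1: get the data into shape with pandas (hypothetical toy values)
toy_df = pd.DataFrame({
    "date": pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"]),
    "price": [2500.0, 2650.0, 2400.0],
})

# Step 2: wrap the DataFrame in a ColumnDataSource
toy_source = ColumnDataSource(toy_df)

# Each DataFrame column becomes a named column in the source;
# the unnamed DataFrame index is added under the name "index"
print(sorted(toy_source.column_names))

# Step 3: refer to columns by name when calling a plotting function
toy_fig = figure(x_axis_type="datetime", title="Toy example")
toy_fig.line(x="date", y="price", source=toy_source)
```

Because glyph methods take column names instead of arrays, several glyphs can share the same source.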

# Let's Create Some Visualizations

## Mean Price Over Time

In [21]:
df_prices = df.groupby('date')[['price']].mean().reset_index()

# Create ColumnDataSource
source = ColumnDataSource(df_prices)

# Define a figure
p = figure(sizing_mode='stretch_width', title="Prices Over Time", x_axis_type='datetime')

# Plot the data
p.line(x='date', y='price', source=source)

# Show me our creation!
show(p)

## Mean Price Over Time - Prettier
Beauty is in the eye of the beholder, but I often find myself tweaking my charts to my own personal style. Let's make some changes to our chart above:

- The y-axis defaults to not starting at 0 (shame on you, Bokeh)
- I hate gridlines
- Axis ticks are too small
- Make our title pop more by making it bigger
- Hide those ugly toolbars
- Add some circles on the data points to make individual points stand out a bit

In [30]:
# Define a figure
p = figure(sizing_mode='stretch_width', title="Prices Over Time", x_axis_type='datetime',
          tools=[], toolbar_location=None)

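# Plot the data
p.line(x='date', y='price', source=source)
p.scatter(x='date', y='price', source=source, marker='circle', size=6) # Circles on each data point

# Style our chart (the sizes here are a matter of taste)
p.title.text_font_size = '14pt' # Bigger title
p.axis.major_label_text_font_size = '11pt' # Bigger axis tick labels
p.y_range = Range1d(0, df_prices['price'].max() * 1.05) # Make our y-axis from 0 to slightly over the max
p.xgrid.grid_line_color, p.ygrid.grid_line_color = None, None # No gridlines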

# Show me our creation!
show(p)
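
If you find yourself repeating the same tweaks on every chart, one option is to collect them in a small helper. The function below is a hypothetical sketch (the name `apply_house_style`, its `y_max` parameter, and the font sizes are my own choices, not part of Bokeh); it only uses property assignments of the kind shown in the cell above.

```python
from bokeh.models import Range1d
from bokeh.plotting import figure


def apply_house_style(fig, y_max):
    """Apply a personal set of style tweaks to a Bokeh figure (hypothetical helper)."""
    fig.title.text_font_size = "14pt"            # bigger title
    fig.axis.major_label_text_font_size = "11pt" # bigger tick labels on both axes
    fig.xgrid.grid_line_color = None             # no vertical gridlines
    fig.ygrid.grid_line_color = None             # no horizontal gridlines
    fig.toolbar_location = None                  # hide the toolbar
    fig.y_range = Range1d(0, y_max * 1.05)       # y-axis from 0 to slightly over the max
    return fig


# Usage on an empty demo figure; y_max=3000 is an arbitrary example value
demo = apply_house_style(figure(title="Demo"), y_max=3000)
```

With the chart above, this would be called as `apply_house_style(p, df_prices['price'].max())` before `show(p)`.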