<h1>Charts with bokeh assignment</h1>
Download the nyc taxi data for 2016 January (see below) and prepare the following charts:

<ol>
    <li>A bokeh bar chart with day of the week (Monday, Tuesday, ...) on the x-axis and the average duration of rides on the y-axis. Make sure that the hover tool is activated and that it shows the average duration when the cursor hovers over it</li>
    <li>A bokeh interactive chart with a slider containing the hour of the day (0,1,...23) and the average total amount for each hour for each day of the week. I.e., the chart should contain days of the week on the x-axis and the mean total amount on the y-axis for a particular hour of the day. Moving the slider (e.g., from 10 to 11) should replace the chart for 1000 hrs by the chart for 1100 hrs). Don't forget the tooltip</li>
    <ul><li><a href="https://docs.bokeh.org/en/latest/docs/reference/models/widgets/sliders.html">sliders</a></li>
        <li><a href="https://docs.bokeh.org/en/latest/docs/reference/models/glyphs/vbar.html">vbar</a></li>
        <li>note that column names must be strings for converting a data frame into a column data source</li>
    </ul>
    <li>A piechart that shows how much of the total payment comes from each day of the week. The pie should have seven slices, one for each day, and the size of each slice depends on the fraction it contributes to the total. Again, don't forget the tooltip</li>
    
</ol>
<li>For the purposes of this exercise, remove any taxi rides that are less than 5 minute in duration</li>

<h2>NYC taxi data</h2>
<li>NYC taxi trip data is collected and made available (yellow, green, and black cabs)</li>
<li>We'll use data from January 2016</li>
<li><a href="https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet">https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet</a></li>
<li>The data is in <a href="https://parquet.apache.org/">parquet</a> format. Parquet is a data interchange format created by the <a href="https://www.apache.org/">Apache Foundation</a> for efficient data storage and retreival. Sort of like JSON but in binary</li>
<li>Use pandas <span style="color:blue">read_parquet</span> function to import the data</li>

<li>You may need to install pyarrow and fastparquet (using pip) - not sure!</li>

In [1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()

In [2]:
!pip install pyarrow



In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bokeh.layouts import row
from bokeh.models import ColumnDataSource, Slider, CustomJS
from bokeh.plotting import figure, show
%matplotlib inline

#Get the data
datasource = "yellow_tripdata_2022-01.parquet"
df = pd.read_parquet(datasource)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2463931 entries, 0 to 2463930
Data columns (total 19 columns):
 #   Column                 Dtype         
---  ------                 -----         
 0   VendorID               int64         
 1   tpep_pickup_datetime   datetime64[us]
 2   tpep_dropoff_datetime  datetime64[us]
 3   passenger_count        float64       
 4   trip_distance          float64       
 5   RatecodeID             float64       
 6   store_and_fwd_flag     object        
 7   PULocationID           int64         
 8   DOLocationID           int64         
 9   payment_type           int64         
 10  fare_amount            float64       
 11  extra                  float64       
 12  mta_tax                float64       
 13  tip_amount             float64       
 14  tolls_amount           float64       
 15  improvement_surcharge  float64       
 16  total_amount           float64       
 17  congestion_surcharge   float64       
 18  airport_fee           

<span style="color:blue">Start with a small subset of the data</span>
<br>
<li>After you've completed the assignment with the subset, you can try using all the data</li>

In [4]:
df

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2022-01-01 00:35:40,2022-01-01 00:53:29,2.0,3.80,1.0,N,142,236,1,14.50,3.0,0.5,3.65,0.0,0.3,21.95,2.5,0.0
1,1,2022-01-01 00:33:43,2022-01-01 00:42:07,1.0,2.10,1.0,N,236,42,1,8.00,0.5,0.5,4.00,0.0,0.3,13.30,0.0,0.0
2,2,2022-01-01 00:53:21,2022-01-01 01:02:19,1.0,0.97,1.0,N,166,166,1,7.50,0.5,0.5,1.76,0.0,0.3,10.56,0.0,0.0
3,2,2022-01-01 00:25:21,2022-01-01 00:35:23,1.0,1.09,1.0,N,114,68,2,8.00,0.5,0.5,0.00,0.0,0.3,11.80,2.5,0.0
4,2,2022-01-01 00:36:48,2022-01-01 01:14:20,1.0,4.30,1.0,N,68,163,1,23.50,0.5,0.5,3.00,0.0,0.3,30.30,2.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463926,2,2022-01-31 23:36:53,2022-01-31 23:42:51,,1.32,,,90,170,0,8.00,0.0,0.5,2.39,0.0,0.3,13.69,,
2463927,2,2022-01-31 23:44:22,2022-01-31 23:55:01,,4.19,,,107,75,0,16.80,0.0,0.5,4.35,0.0,0.3,24.45,,
2463928,2,2022-01-31 23:39:00,2022-01-31 23:50:00,,2.10,,,113,246,0,11.22,0.0,0.5,2.00,0.0,0.3,16.52,,
2463929,2,2022-01-31 23:36:42,2022-01-31 23:48:45,,2.92,,,148,164,0,12.40,0.0,0.5,0.00,0.0,0.3,15.70,,


In [5]:
#df = df.sample(frac=0.2)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2463931 entries, 0 to 2463930
Data columns (total 19 columns):
 #   Column                 Dtype         
---  ------                 -----         
 0   VendorID               int64         
 1   tpep_pickup_datetime   datetime64[us]
 2   tpep_dropoff_datetime  datetime64[us]
 3   passenger_count        float64       
 4   trip_distance          float64       
 5   RatecodeID             float64       
 6   store_and_fwd_flag     object        
 7   PULocationID           int64         
 8   DOLocationID           int64         
 9   payment_type           int64         
 10  fare_amount            float64       
 11  extra                  float64       
 12  mta_tax                float64       
 13  tip_amount             float64       
 14  tolls_amount           float64       
 15  improvement_surcharge  float64       
 16  total_amount           float64       
 17  congestion_surcharge   float64       
 18  airport_fee           

<h3>Get the pickup hour (e.g., 11:20 corresponds to 11, 15:30pm corresponds to 15, etc.)</h3>

In [6]:
df['pickup_hour'] = df['tpep_pickup_datetime'].dt.hour
df['pickup_hour']

0           0
1           0
2           0
3           0
4           0
           ..
2463926    23
2463927    23
2463928    23
2463929    23
2463930    23
Name: pickup_hour, Length: 2463931, dtype: int32

<h3>Get the day of week (0-Monday, 1-Tuesday, ...)</h3>

In [7]:
df['day_of_week'] = df['tpep_pickup_datetime'].dt.dayofweek
df['day_of_week']

0          5
1          5
2          5
3          5
4          5
          ..
2463926    0
2463927    0
2463928    0
2463929    0
2463930    0
Name: day_of_week, Length: 2463931, dtype: int32

<h3>Get the taxi ride duration in minutes</h3>
<li>I've done this for you</li>

In [8]:
df['duration'] = (df['tpep_dropoff_datetime'] - df['tpep_pickup_datetime'])/np.timedelta64(1, 's')/60.0

<h3>Remove rides of 5 minutes or less and save in df</h3>

In [9]:
df = df[df['duration'] >= 5]
df

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,...,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee,pickup_hour,day_of_week,duration
0,1,2022-01-01 00:35:40,2022-01-01 00:53:29,2.0,3.80,1.0,N,142,236,1,...,0.5,3.65,0.0,0.3,21.95,2.5,0.0,0,5,17.816667
1,1,2022-01-01 00:33:43,2022-01-01 00:42:07,1.0,2.10,1.0,N,236,42,1,...,0.5,4.00,0.0,0.3,13.30,0.0,0.0,0,5,8.400000
2,2,2022-01-01 00:53:21,2022-01-01 01:02:19,1.0,0.97,1.0,N,166,166,1,...,0.5,1.76,0.0,0.3,10.56,0.0,0.0,0,5,8.966667
3,2,2022-01-01 00:25:21,2022-01-01 00:35:23,1.0,1.09,1.0,N,114,68,2,...,0.5,0.00,0.0,0.3,11.80,2.5,0.0,0,5,10.033333
4,2,2022-01-01 00:36:48,2022-01-01 01:14:20,1.0,4.30,1.0,N,68,163,1,...,0.5,3.00,0.0,0.3,30.30,2.5,0.0,0,5,37.533333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463926,2,2022-01-31 23:36:53,2022-01-31 23:42:51,,1.32,,,90,170,0,...,0.5,2.39,0.0,0.3,13.69,,,23,0,5.966667
2463927,2,2022-01-31 23:44:22,2022-01-31 23:55:01,,4.19,,,107,75,0,...,0.5,4.35,0.0,0.3,24.45,,,23,0,10.650000
2463928,2,2022-01-31 23:39:00,2022-01-31 23:50:00,,2.10,,,113,246,0,...,0.5,2.00,0.0,0.3,16.52,,,23,0,11.000000
2463929,2,2022-01-31 23:36:42,2022-01-31 23:48:45,,2.92,,,148,164,0,...,0.5,0.00,0.0,0.3,15.70,,,23,0,12.050000


<h1>PROBLEM 1: Average duration by day of week bar chart</h1>

<h3>group the data by day of week</h3>

In [10]:
day_of_week_group = df.groupby('day_of_week')
day_of_week_group.size()

day_of_week
0    312725
1    272989
2    287574
3    303481
4    306905
5    303928
6    282476
dtype: int64

In [11]:
df

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,...,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee,pickup_hour,day_of_week,duration
0,1,2022-01-01 00:35:40,2022-01-01 00:53:29,2.0,3.80,1.0,N,142,236,1,...,0.5,3.65,0.0,0.3,21.95,2.5,0.0,0,5,17.816667
1,1,2022-01-01 00:33:43,2022-01-01 00:42:07,1.0,2.10,1.0,N,236,42,1,...,0.5,4.00,0.0,0.3,13.30,0.0,0.0,0,5,8.400000
2,2,2022-01-01 00:53:21,2022-01-01 01:02:19,1.0,0.97,1.0,N,166,166,1,...,0.5,1.76,0.0,0.3,10.56,0.0,0.0,0,5,8.966667
3,2,2022-01-01 00:25:21,2022-01-01 00:35:23,1.0,1.09,1.0,N,114,68,2,...,0.5,0.00,0.0,0.3,11.80,2.5,0.0,0,5,10.033333
4,2,2022-01-01 00:36:48,2022-01-01 01:14:20,1.0,4.30,1.0,N,68,163,1,...,0.5,3.00,0.0,0.3,30.30,2.5,0.0,0,5,37.533333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463926,2,2022-01-31 23:36:53,2022-01-31 23:42:51,,1.32,,,90,170,0,...,0.5,2.39,0.0,0.3,13.69,,,23,0,5.966667
2463927,2,2022-01-31 23:44:22,2022-01-31 23:55:01,,4.19,,,107,75,0,...,0.5,4.35,0.0,0.3,24.45,,,23,0,10.650000
2463928,2,2022-01-31 23:39:00,2022-01-31 23:50:00,,2.10,,,113,246,0,...,0.5,2.00,0.0,0.3,16.52,,,23,0,11.000000
2463929,2,2022-01-31 23:36:42,2022-01-31 23:48:45,,2.92,,,148,164,0,...,0.5,0.00,0.0,0.3,15.70,,,23,0,12.050000


In [12]:
day_of_week_group['duration'].mean().reset_index().set_index(['day_of_week'])

Unnamed: 0_level_0,duration
day_of_week,Unnamed: 1_level_1
0,16.12228
1,15.915687
2,15.878888
3,16.379377
4,16.895126
5,16.205059
6,16.528917


<h3>Get the mean ride duration for each group</h3>
<li>And make a df out of it</li>
<li>day_of_week_mean has the day of week as the index</li>
<li>the dataframe will have seven rows with indexes 0,1,2,..7</li>
<li>add a new column with values Monday, Tuesday, Wedensday,...,Sunday</li>

In [13]:
#day_of_week_mean = day_of_week_group
day_of_week_mean_df = day_of_week_group['duration'].mean().reset_index().set_index(['day_of_week'])
day_of_week_mean_df['day'] = pd.Series([
                                'Monday',
                                'Tuesday',
                                'Wednesday',
                                'Thursday',
                                'Friday',
                                'Saturday',
                                'Sunday'],
                                index=np.arange(0,7))
day_of_week_mean_df

Unnamed: 0_level_0,duration,day
day_of_week,Unnamed: 1_level_1,Unnamed: 2_level_1
0,16.12228,Monday
1,15.915687,Tuesday
2,15.878888,Wednesday
3,16.379377,Thursday
4,16.895126,Friday
5,16.205059,Saturday
6,16.528917,Sunday


<h3>Make a column data source object from this dataframe</h3>

In [14]:
from bokeh.models import ColumnDataSource
cdata = ColumnDataSource(day_of_week_mean_df)
cdata.data

{'day_of_week': array([0, 1, 2, 3, 4, 5, 6], dtype=int32),
 'duration': array([16.12228012, 15.91568683, 15.87888839, 16.37937724, 16.89512564,
        16.20505948, 16.52891703]),
 'day': array(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
        'Sunday'], dtype=object)}

<h3>Draw the vertical bar chart</h3>
<li>You must include tooltips that show the duration when hovering over a bar</li>


In [15]:


tooltips = [
    ('Duration','@duration')
]


p = figure(x_range=day_of_week_mean_df['day'],
           y_range=(0, max(day_of_week_mean_df['duration'].values)),
           width=600,
           height=300,
           title = 'Avg Duration of Trip per Day of Week',
           x_axis_label = 'Day of the Week',
           y_axis_label ='Avg Trip Duration',
           tooltips=tooltips,
           tools='hover'
            )
p.vbar(x='day',
       top='duration',
       width=0.5,
       color = "green",
       legend_label="Duration",
       source=cdata
        )

p.xgrid.grid_line_color = None
p.y_range.start = 0    
show(p)

<h1>PROBLEM 2: Interactive chart with slider</h1>
<li>In this second problem, construct an interactive chart that shows the distribution of total fare amount by day of week while varying the pickup_hour</li>
<li>Each chart will have day of the week on the x-axis and the average total fare as the height of the bars for a single pickup_hour</li>
<li>Construct a slider that slides from 0 to 23 with the graph for all 24 pickup_hours</li>
<li>

<h3>Group the data by day of week and, within day of week by pickup_hour</h3>

In [16]:
hour_group = df.groupby(['day_of_week','pickup_hour'])
hour_group.size()

day_of_week  pickup_hour
0            0               4891
             1               2475
             2               1282
             3               1002
             4               1058
                            ...  
6            19             15383
             20             12994
             21             11450
             22             10247
             23              7587
Length: 168, dtype: int64

<h3>Get the average total amount for each group and unstack so that rows are weekdays (0, 1,...,7) and cols are hours (0,1,...23)</h3>
<li>Then add an additional column (24) as a copy of column 0. Col 24 will be the display column</li>
<li>Finally, convert all column names into str (since pickup_hour is an int and column data source objects need str column names)</li>
<li>amount_df should like like (col names should be strings):</li>
<li>Note that your numbers may be different if you're using a random subset of the data</li>

<pre>
	0	1	2	3	4	5	6	7	8	9	...	16	17	18	19	20	21	22	23	24	dayname
day_of_week																					
0	28.519591	27.871129	21.032270	22.854089	27.553843	27.676799	22.630954	19.790608	18.589532	18.314011	...	19.823463	19.087813	19.056134	19.880450	20.452326	22.545119	23.010316	25.220471	28.519591	Monday
1	26.523835	24.473547	22.464758	25.709178	24.027132	23.652944	21.546370	18.771057	17.414492	17.255911	...	19.631683	19.094055	18.343164	19.008278	19.145718	19.704968	20.285164	21.180154	26.523835	Tuesday
2	22.662570	23.111039	23.067922	19.263433	25.915858	25.043071	19.286858	17.697268	17.354702	16.875423	...	20.199947	18.939048	18.146021	18.688651	18.839771	18.879133	19.636418	19.631235	22.662570	Wednesday
3	20.806747	20.891364	20.104057	21.230155	21.545217	23.838166	19.245900	17.484051	17.593239	17.560638	...	19.601307	19.309099	18.675074	19.065926	18.602721	18.435254	18.848939	18.878703	20.806747	Thursday
4	19.091578	18.271015	19.781767	19.620808	23.030823	25.265687	21.332188	19.119613	18.374634	18.916849	...	19.979930	19.077043	18.743151	18.467401	17.985403	17.955496	17.998007	18.500657	19.091578	Friday
5	18.792271	18.033738	18.594487	19.076232	20.591734	23.261181	27.161993	21.153212	19.545850	17.098222	...	18.881910	18.955416	18.305366	17.865027	18.519547	19.021029	19.285096	18.937453	18.792271	Saturday
6	18.807702	18.348061	18.054653	19.275509	20.891784	28.260720	27.280063	23.220415	21.592732	18.765725	...	20.551586	20.630724	19.809704	20.566784	21.532079	22.483101	24.575294	27.071233	18.807702	Sunday
</pre>

In [17]:
amount_df = pd.DataFrame(hour_group['total_amount'].mean()).unstack()


In [18]:
# gets rid of layering
amount_df.columns= amount_df.columns.droplevel(0)


In [19]:
amount_df.columns = amount_df.columns.astype(str)
amount_df.columns.name = None
amount_df['24']= amount_df['0']
amount_df = amount_df.merge(day_of_week_mean_df)
amount_df = amount_df.drop(columns=['duration'])
amount_df

MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

<h3>Draw the interactive chart by filling in the code below</h3>
<li>Mostly done. You need to fill in the missing parts identified by ??)</li>

In [20]:
source = ColumnDataSource(amount_df)

#Average Total Fare. Note the formatting so that the values
# show up currency formatted
tooltips = [
    ("Average Total Fare", "$@24{0,0.00}"),
]

p = figure(x_range=amount_df['day'], height=400, width=600,
           x_axis_label = 'Day of the Week',
           y_axis_label = 'Avg Total Fare Cost',
           title="Chart",tooltips=tooltips)

p.vbar(x='day',
        top='24', # max bar in bar chart
        source=source, width=0.9,
      fill_color='red', line_color='black',fill_alpha = 0.75,
   hover_fill_alpha = 1.0, hover_fill_color = 'navy')

p.xgrid.grid_line_color = None
p.y_range.start = 0    


slider = Slider(start=0,end=23,value=0,step=1,title="Hour of the Day")


jscallback = CustomJS(args={'source':source,'slider':slider},code="""
        console.log(' changed selected option', slider.value);

        var data = source.data;
        var col = slider.value.toString();
        console.log(' changed selected option', slider.value);
        data['24'] = data[col];

        source.change.emit();
""")


slider.js_on_change('value', jscallback)

layout = row(p,slider)
show(layout)

KeyError: 'day'

<h1>PROBLEM 3: Piechart</h1>
<li>Use the total_amount column</li>
<li>Use the grouped by day of week data</li>
<li>Sum the total amount for each group and then compute the fractional amount for each day</li>
<li>Using the class notebook piechart as a guide, construct the piechart for distribution of total amount collected by day of week</li>

In [21]:
# from lectures
from bokeh.palettes import Turbo256 
from bokeh.models import LabelSet, ColumnDataSource
from bokeh.transform import cumsum 
import math as m

In [22]:
pd.Series([
                        'Monday',
                        'Tuesday',
                        'Wednesday',
                        'Thursday',
                        'Friday',
                        'Saturday',
                        'Sunday'],
                        index=np.arange(0,7))

0       Monday
1      Tuesday
2    Wednesday
3     Thursday
4       Friday
5     Saturday
6       Sunday
dtype: object

In [23]:
p3 = df.groupby('day_of_week')['total_amount'].sum().reset_index(name='value')
total_sum = p3.sum()
fractional_amounts = p3 / total_sum
p3['day'] = pd.Series([
                        'Monday',
                        'Tuesday',
                        'Wednesday',
                        'Thursday',
                        'Friday',
                        'Saturday',
                        'Sunday'],
                        index=np.arange(0,7))
p3

Unnamed: 0,day_of_week,value,day
0,0,6734763.45,Monday
1,1,5531213.74,Tuesday
2,2,5698914.92,Wednesday
3,3,6068786.46,Thursday
4,4,6608037.42,Friday
5,5,6127802.34,Saturday
6,6,6252683.98,Sunday


In [24]:
# get pct and angles
p3['pct'] = (p3['value']/sum(p3['value'])*100).round(2)
p3['angle'] = p3['pct']/(p3['pct'].sum()) * 2*pi
p3['label_vals'] = p3['pct']
p3["label_vals"]=p3["label_vals"].astype(str).str.pad(26, side = "left") 
p3["label_vals"]=p3["label_vals"].apply(lambda x: "" if x.strip()=="0.0" else x)
p3['label_vals'] = p3['label_vals'].astype(str) + '%'

NameError: name 'pi' is not defined

In [25]:
import random
colors = [Turbo256[random.randint(0,255)] for i in range(len(p3))]
p3['colors'] = colors

In [26]:
p = figure(height=500, title="Portion of Total for each Day of the Week: ", 
        tools="hover", tooltips="@day: @pct"+ '%',x_range= (-0.5,1))

In [27]:
source = ColumnDataSource(p3)
source.data

{'index': array([0, 1, 2, 3, 4, 5, 6]),
 'day_of_week': array([0, 1, 2, 3, 4, 5, 6], dtype=int32),
 'value': array([6734763.45, 5531213.74, 5698914.92, 6068786.46, 6608037.42,
        6127802.34, 6252683.98]),
 'day': array(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
        'Sunday'], dtype=object),
 'pct': array([15.65, 12.86, 13.25, 14.11, 15.36, 14.24, 14.53]),
 'colors': array(['#9bfd40', '#bef334', '#42f687', '#42f687', '#d8e335', '#b11901',
        '#55fa76'], dtype=object)}

In [28]:
p3

Unnamed: 0,day_of_week,value,day,pct,colors
0,0,6734763.45,Monday,15.65,#9bfd40
1,1,5531213.74,Tuesday,12.86,#bef334
2,2,5698914.92,Wednesday,13.25,#42f687
3,3,6068786.46,Thursday,14.11,#42f687
4,4,6608037.42,Friday,15.36,#d8e335
5,5,6127802.34,Saturday,14.24,#b11901
6,6,6252683.98,Sunday,14.53,#55fa76


In [29]:
p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), 
        line_color="white", fill_color="colors", legend_field='day', source=source)

labels = LabelSet(x=0, y=1, text='label_vals',
        angle=cumsum('angle', include_zero=True), source=source)

p.add_layout(labels)


p.axis.axis_label=None 
p.axis.visible=False 
p.grid.grid_line_color = None 

show(p)