# Citi Bike Program Analysis for November 2022
## Import Libraries & Data Profiling

### Objectives:
1. To determine the favourable trip start times by station
2. To determine the preferred bike by rideable type
3. To analyse the trip count by user type
4. To determine the trip duration by rideable type

In [73]:
import numpy as np  # numerical computing in Python
import pandas as pd # primary data structure library
import plotly.express as px
import plotly.io as pio

import requests
import seaborn as sns

pio.renderers.default = "notebook_connected"

In [74]:
df = pd.read_csv('JC-202211-citibike-tripdata.csv')
print('Data downloaded and read into a dataframe!')

Data downloaded and read into a dataframe!


In [75]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,7B06EEA21608591A,classic_bike,12/11/2022 14:47,12/11/2022 14:58,Newport PATH,JC066,7 St & Monroe St,HB304,40.727146,-74.03362,40.746413,-74.037977,casual
1,B22A3E8765F1E165,classic_bike,30/11/2022 8:29,30/11/2022 8:45,Newport PATH,JC066,7 St & Monroe St,HB304,40.727216,-74.033609,40.746413,-74.037977,member
2,CFB2EB1663C02BB6,electric_bike,9/11/2022 8:28,9/11/2022 8:44,Newport PATH,JC066,7 St & Monroe St,HB304,40.727224,-74.033759,40.746413,-74.037977,member
3,85700816DAAE1C36,electric_bike,30/11/2022 14:48,30/11/2022 14:58,Newport PATH,JC066,7 St & Monroe St,HB304,40.727224,-74.033759,40.746413,-74.037977,member
4,009EBB1F54C5947E,electric_bike,10/11/2022 18:14,10/11/2022 18:23,Newport PATH,JC066,7 St & Monroe St,HB304,40.727224,-74.033759,40.746413,-74.037977,member


With the `%matplotlib inline` backend, the output of plotting commands is displayed inline in frontends such as Jupyter Notebook, directly below the code cell that produced it, and the resulting plots are stored in the notebook document. By default, Jupyter Notebook uses the inline backend. The Plotly figures in this notebook are rendered through the `notebook_connected` renderer set above.

# Data Preparation
Check for missing values and overall data integrity.

### Handling Missing Values

In [76]:
# first look at a random sample of the data
df.sample(10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
43934,C0D34010FB916D94,classic_bike,15/11/2022 17:59,15/11/2022 18:18,Oakland Ave,JC022,Astor Place,JC077,40.737604,-74.052478,40.719282,-74.071262,member
69308,3EDBB1E6DB2A58BC,classic_bike,5/11/2022 13:14,5/11/2022 13:19,Hamilton Park,JC009,City Hall,JC003,40.727596,-74.044247,40.717732,-74.043845,member
45079,D082B7A8DBF3CFDB,classic_bike,2/11/2022 18:17,2/11/2022 18:21,9 St HBLR - Jackson St & 8 St,HB305,Bloomfield St & 15 St,HB203,40.7481,-74.038364,40.75453,-74.02658,member
60629,53CDCAE72AA1B6AD,classic_bike,16/11/2022 17:52,16/11/2022 17:57,Manila & 1st,JC082,Hamilton Park,JC009,40.721651,-74.042884,40.727596,-74.044247,member
50296,26D05754701D40C1,electric_bike,20/11/2022 9:16,20/11/2022 9:24,Bloomfield St & 15 St,HB203,Church Sq Park - 5 St & Park Ave,HB601,40.75453,-74.02658,40.742659,-74.032233,casual
30275,13122C0624A6FC13,classic_bike,4/11/2022 17:02,4/11/2022 17:07,Christ Hospital,JC034,Hilltop,JC019,40.73475,-74.050498,40.731169,-74.057574,member
70418,F28486D1056E7FCC,classic_bike,1/11/2022 20:58,1/11/2022 21:03,Jersey & 6th St,JC027,Van Vorst Park,JC035,40.725289,-74.045572,40.718489,-74.047727,member
36197,135583B35781BB2D,classic_bike,16/11/2022 17:19,16/11/2022 17:30,Newport Pkwy,JC008,City Hall - Washington St & 1 St,HB105,40.728745,-74.032108,40.73736,-74.03097,member
14326,58C3084F4CCF32FF,classic_bike,15/11/2022 10:07,15/11/2022 10:27,9 St HBLR - Jackson St & 8 St,HB305,Hoboken Terminal - River St & Hudson Pl,HB102,40.747907,-74.038412,40.736068,-74.029127,member
27028,B775475BC508D234,classic_bike,2/11/2022 14:31,2/11/2022 14:36,Bergen Ave & Sip Ave,JC109,Fairmount Ave,JC093,40.731164,-74.064454,40.725726,-74.071959,member


In [77]:
# Some values may be stored as a blank string ' ';
# NaN is easier to handle during cleaning.
# df.replace(' ', np.nan, inplace=True)

# started_at and ended_at are more useful in datetime format (the raw strings are day-first)
df['started_at'] = pd.to_datetime(df['started_at'], format='%d/%m/%Y %H:%M')
df['ended_at'] = pd.to_datetime(df['ended_at'], format='%d/%m/%Y %H:%M')
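The raw timestamps are day-first (for example `12/11/2022 14:47` is 12 November), so passing an explicit `format` avoids month/day ambiguity. A small sketch on two strings copied from the `head()` output above:

```python
import pandas as pd

# two raw timestamps taken from the head() output above (day/month/year)
raw = pd.Series(["12/11/2022 14:47", "30/11/2022 8:29"])

# an explicit format forces day-first parsing; without it,
# "12/11/2022" could be read as 11 December
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

print(parsed.dt.month.tolist())  # both rides are in November
print(parsed.dt.day.tolist())
```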

In [78]:
# check duplicate rows
df.duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
72704    False
72705    False
72706    False
72707    False
72708    False
Length: 72709, dtype: bool
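`df.duplicated()` returns one boolean per row, which is hard to read for 72,709 rows. Summing the mask gives a single count, and checking `ride_id` alone catches a ride that was recorded twice even if another column differs. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

# hypothetical toy frame: the second and third rows share a ride_id
toy = pd.DataFrame({
    "ride_id": ["A1", "B2", "B2"],
    "rideable_type": ["classic_bike", "electric_bike", "electric_bike"],
    "member_casual": ["member", "casual", "member"],
})

# rows identical in every column to an earlier row
n_full_dupes = toy.duplicated().sum()

# rows whose ride_id has already appeared
n_id_dupes = toy.duplicated(subset=["ride_id"]).sum()

print(n_full_dupes, n_id_dupes)
```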

In [79]:
# high-level summary: column names, non-null counts and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72709 entries, 0 to 72708
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   ride_id             72709 non-null  object        
 1   rideable_type       72709 non-null  object        
 2   started_at          72709 non-null  datetime64[ns]
 3   ended_at            72709 non-null  datetime64[ns]
 4   start_station_name  72699 non-null  object        
 5   start_station_id    72699 non-null  object        
 6   end_station_name    72430 non-null  object        
 7   end_station_id      72430 non-null  object        
 8   start_lat           72709 non-null  float64       
 9   start_lng           72709 non-null  float64       
 10  end_lat             72609 non-null  float64       
 11  end_lng             72609 non-null  float64       
 12  member_casual       72709 non-null  object        
dtypes: datetime64[ns](2), float64(4), object(7)
me

In [80]:
df.sample(10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
8963,E777C39BB9F069F4,classic_bike,2022-11-26 18:50:00,2022-11-26 19:12:00,Leonard Gordon Park,JC080,Washington St,JC098,40.74591,-74.057271,40.724294,-74.035483,member
8203,B2680AC52F37F8C8,classic_bike,2022-11-13 15:14:00,2022-11-13 15:45:00,Liberty Light Rail,JC052,Liberty Light Rail,JC052,40.711242,-74.055701,40.711242,-74.055701,member
42287,AA1E5CDB40E24744,classic_bike,2022-11-25 15:28:00,2022-11-25 15:33:00,Harborside,JC104,Newport PATH,JC066,40.719252,-74.034234,40.727224,-74.033759,casual
15440,0F04A6AF485F1DF0,classic_bike,2022-11-15 11:51:00,2022-11-15 11:53:00,Grand St,JC102,Marin Light Rail,JC013,40.715072,-74.037654,40.714584,-74.042817,member
46593,78DC4DAE0C3F8CF2,classic_bike,2022-11-23 17:31:00,2022-11-23 17:55:00,Brunswick St,JC023,Bloomfield St & 15 St,HB203,40.724176,-74.050656,40.75453,-74.02658,member
45625,9990FE66384715F8,electric_bike,2022-11-01 18:20:00,2022-11-01 18:25:00,Marshall St & 2 St,HB408,Clinton St & 7 St,HB303,40.740802,-74.042521,40.74542,-74.03332,casual
25603,27FF9912DC9E4328,classic_bike,2022-11-15 15:22:00,2022-11-15 15:26:00,Newport Pkwy,JC008,Warren St,JC006,40.728854,-74.032108,40.721124,-74.038051,member
66784,B2B98322EAB4865B,electric_bike,2022-11-23 16:07:00,2022-11-23 16:10:00,Newark Ave,JC032,Monmouth and 6th,JC075,40.721525,-74.046305,40.725685,-74.04879,casual
8889,3CC58C63E25BCCAA,classic_bike,2022-11-13 15:34:00,2022-11-13 15:37:00,Newport PATH,JC066,Washington St,JC098,40.727224,-74.033759,40.724294,-74.035483,member
17504,2E474A7927CF9CB1,classic_bike,2022-11-03 15:56:00,2022-11-03 16:04:00,JC Medical Center,JC110,Brunswick St,JC023,40.715391,-74.049692,40.724176,-74.050656,member


In [81]:
df.isnull().sum()

ride_id                 0
rideable_type           0
started_at              0
ended_at                0
start_station_name     10
start_station_id       10
end_station_name      279
end_station_id        279
start_lat               0
start_lng               0
end_lat               100
end_lng               100
member_casual           0
dtype: int64
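Raw null counts are easier to judge as a share of all 72,709 rows. A sketch using the counts reported above; on the real frame, `df.isnull().mean() * 100` gives the same per-column percentages in one step:

```python
import pandas as pd

# null counts copied from the df.isnull().sum() output above
null_counts = pd.Series({
    "start_station_name": 10,
    "end_station_name": 279,
    "end_lat": 100,
})
total_rows = 72709  # from df.info()

# share of rows missing each field, in percent
pct_missing = (null_counts / total_rows * 100).round(2)
print(pct_missing)
```

Each field is missing in well under 1% of rows, which is why dropping those rows costs little.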

In [82]:
# drop every row that contains at least one null value
df.dropna(axis=0, inplace=True)

In [83]:
# remove any rows with the 'docked_bike' rideable type
delete = df[df['rideable_type'] == 'docked_bike'].index

df.drop(delete, inplace=True)
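Dropping by index works, but the same filter can be written as one boolean mask, which avoids the intermediate index and `inplace=True`. A sketch on hypothetical rows:

```python
import pandas as pd

# hypothetical rows; 'docked_bike' is the type removed above
toy = pd.DataFrame({
    "ride_id": ["A1", "B2", "C3"],
    "rideable_type": ["classic_bike", "docked_bike", "electric_bike"],
})

# keep every row whose type is not 'docked_bike'
kept = toy[toy["rideable_type"] != "docked_bike"]

print(kept["rideable_type"].value_counts())
```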

In [84]:
df['rideable_type'].value_counts()

classic_bike     57150
electric_bike    15084
Name: rideable_type, dtype: int64

In [85]:
df.isnull().sum()

ride_id               0
rideable_type         0
started_at            0
ended_at              0
start_station_name    0
start_station_id      0
end_station_name      0
end_station_id        0
start_lat             0
start_lng             0
end_lat               0
end_lng               0
member_casual         0
dtype: int64

In [86]:
# confirm that no missing values remain
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 72234 entries, 0 to 72708
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   ride_id             72234 non-null  object        
 1   rideable_type       72234 non-null  object        
 2   started_at          72234 non-null  datetime64[ns]
 3   ended_at            72234 non-null  datetime64[ns]
 4   start_station_name  72234 non-null  object        
 5   start_station_id    72234 non-null  object        
 6   end_station_name    72234 non-null  object        
 7   end_station_id      72234 non-null  object        
 8   start_lat           72234 non-null  float64       
 9   start_lng           72234 non-null  float64       
 10  end_lat             72234 non-null  float64       
 11  end_lng             72234 non-null  float64       
 12  member_casual       72234 non-null  object        
dtypes: datetime64[ns](2), float64(4), object(7)
me

## Data Manipulation
Derive time-based features (start time, hour, day of month, weekday) and trip duration from the timestamps.

In [87]:
# extract hours, days and weekdays from start times
df['start_time'] = df['started_at'].dt.strftime('%H:%M:%S')
df['start_hour'] = df['start_time'].str[:2] + '00'

df['start_day'] = df['started_at'].dt.day.astype('category')
df['weekday']=df['started_at'].dt.weekday.astype('category')

# extract weekday name from start times
df['weekday_name'] = df['started_at'].dt.day_name()

df.sample()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,start_time,start_hour,start_day,weekday,weekday_name
32282,E558E3B1475C77CC,classic_bike,2022-11-17 06:50:00,2022-11-17 06:52:00,Jersey & 3rd,JC074,Grove St PATH,JC005,40.72329,-74.045686,40.719586,-74.043117,casual,06:50:00,600,17,3,Thursday


# Alternative (not run here: the outputs below still show 'HH00' labels such as 700):
# store the hour as an integer category instead of a string label
# df['start_hour'] = df['started_at'].dt.hour.astype('category')
# df['start_day'] = df['started_at'].dt.day.astype('category')
# df['weekday'] = df['started_at'].dt.weekday.astype('category')
# df['weekday_name'] = df['started_at'].dt.day_name()
# df.sample(30)
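The hour can be stored in two shapes: a zero-padded string label such as `'0600'` (from slicing the `'%H:%M:%S'` string) or an integer from `dt.hour`. A sketch showing both on two start times from the sample output; `strftime('%H00')` is an equivalent shortcut for the slicing approach:

```python
import pandas as pd

# two start times taken from the sample output above
started_at = pd.to_datetime(pd.Series(["2022-11-17 06:50:00", "2022-11-08 15:21:00"]))

# zero-padded string label, same as strftime('%H:%M:%S').str[:2] + '00'
hour_label = started_at.dt.strftime("%H00")

# integer hour, suitable for numeric sorting and a 0-23 axis
hour_int = started_at.dt.hour

print(hour_label.tolist(), hour_int.tolist())
```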

In [88]:
# Trip duration columns, in case a further objective needs them
# (e.g. determining trip counts of Citi Bike stations for Nov 2022).
# .dt.total_seconds() keeps whole days, unlike .dt.seconds, which returns only
# the seconds component (0-86399) of each timedelta
df['tripduration'] = df['ended_at'] - df['started_at']
df['tripduration_sec'] = df['tripduration'].dt.total_seconds()
df['tripduration_min'] = df['tripduration_sec'] / 60

df.sample(10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,...,end_lng,member_casual,start_time,start_hour,start_day,weekday,weekday_name,tripduration,tripduration_sec,tripduration_min
3562,3BA6AF57744BAA72,electric_bike,2022-11-09 07:32:00,2022-11-09 07:35:00,Willow Ave & 12 St,HB505,6 St & Grand St,HB302,40.751867,-74.030377,...,-74.034501,casual,07:32:00,700,9,2,Wednesday,0 days 00:03:00,180,3.0
63838,969ABF5DD1966276,classic_bike,2022-11-14 14:52:00,2022-11-14 14:57:00,Grove St PATH,JC005,Marin Light Rail,JC013,40.719783,-74.043362,...,-74.042817,casual,14:52:00,1400,14,0,Monday,0 days 00:05:00,300,5.0
53261,99A4254FA8B5F6C5,classic_bike,2022-11-29 17:58:00,2022-11-29 18:04:00,14 St Ferry - 14 St & Shipyard Ln,HB202,Hudson St & 4 St,HB607,40.752727,-74.024131,...,-74.028603,member,17:58:00,1700,29,1,Tuesday,0 days 00:06:00,360,6.0
44494,4203D5E8AA5FFE10,classic_bike,2022-11-10 14:58:00,2022-11-10 15:01:00,City Hall - Washington St & 1 St,HB105,Adams St & 2 St,HB407,40.73736,-74.03097,...,-74.036904,member,14:58:00,1400,10,3,Thursday,0 days 00:03:00,180,3.0
59344,972C488FDA5BA713,classic_bike,2022-11-29 22:06:00,2022-11-29 22:12:00,Hoboken Terminal - Hudson St & Hudson Pl,HB101,Marshall St & 2 St,HB408,40.735938,-74.030305,...,-74.042521,member,22:06:00,2200,29,1,Tuesday,0 days 00:06:00,360,6.0
7208,AF6974C96B11F49B,classic_bike,2022-11-05 10:55:00,2022-11-05 11:24:00,Essex Light Rail,JC038,Essex Light Rail,JC038,40.712774,-74.036486,...,-74.036486,casual,10:55:00,1000,5,5,Saturday,0 days 00:29:00,1740,29.0
36334,E8E1625DF9F5FD9C,classic_bike,2022-11-09 16:36:00,2022-11-09 16:52:00,12 St & Sinatra Dr N,HB201,City Hall - Washington St & 1 St,HB105,40.750604,-74.02402,...,-74.03097,member,16:36:00,1600,9,2,Wednesday,0 days 00:16:00,960,16.0
4699,0F1021213895C03A,classic_bike,2022-11-29 20:33:00,2022-11-29 20:35:00,City Hall - Washington St & 1 St,HB105,Church Sq Park - 5 St & Park Ave,HB601,40.73733,-74.03103,...,-74.032233,casual,20:33:00,2000,29,1,Tuesday,0 days 00:02:00,120,2.0
15415,57DE0AB0D76B45E6,classic_bike,2022-11-15 11:12:00,2022-11-15 11:15:00,JC Medical Center,JC110,Marin Light Rail,JC013,40.715391,-74.049692,...,-74.042817,member,11:12:00,1100,15,1,Tuesday,0 days 00:03:00,180,3.0
13195,379151D743811FC4,classic_bike,2022-11-19 20:17:00,2022-11-19 20:23:00,Madison St & 1 St,HB402,Hoboken Terminal - River St & Hudson Pl,HB102,40.73879,-74.0393,...,-74.029127,member,20:17:00,2000,19,5,Saturday,0 days 00:06:00,360,6.0


In [89]:
df.iloc[439]

ride_id                         B8A6E5B8A67C6F43
rideable_type                       classic_bike
started_at                   2022-11-08 15:21:00
ended_at                     2022-11-08 15:27:00
start_station_name                  Newport PATH
start_station_id                           JC066
end_station_name      Hoboken Ave at Monmouth St
end_station_id                             JC105
start_lat                               40.72728
start_lng                             -74.033662
end_lat                                40.735208
end_lng                               -74.046964
member_casual                             member
start_time                              15:21:00
start_hour                                  1500
start_day                                      8
weekday                                        1
weekday_name                             Tuesday
tripduration                     0 days 00:06:00
tripduration_sec                             360
tripduration_min    

In [90]:
df['start_day'].value_counts()

12    3724
5     3688
3     3467
2     3395
4     3384
10    3177
8     3032
1     3013
6     3000
7     2964
9     2875
17    2393
29    2339
16    2333
22    2311
18    2289
23    2241
13    2197
28    2197
14    2120
19    2069
15    2007
21    1936
30    1749
26    1722
11    1668
20    1447
25    1277
24    1152
27    1068
Name: start_day, dtype: int64

In [91]:
# inspect the distinct values of the station columns to confirm they hold
# valid station names/IDs rather than random data

df['start_station_name'].unique()

array(['Newport PATH', '7 St & Monroe St', 'Bergen Ave',
       'Leonard Gordon Park', 'Paulus Hook', 'Newark Ave',
       'Brunswick & 6th', 'Grant Ave & MLK Dr',
       'Hoboken Ave at Monmouth St', 'Madison St & 10 St',
       'Adams St & 11 St', 'Willow Ave & 12 St',
       'Southwest Park - Jackson St & Observer Hwy',
       'Bergen Ave & Stegman St', 'Marin Light Rail', 'Christ Hospital',
       '5 Corners Library', 'Manila & 1st', 'JC Medical Center',
       'Riverview Park', 'City Hall - Washington St & 1 St',
       'York St & Marin Blvd', '14 St Ferry - 14 St & Shipyard Ln',
       '4 St & Grand St', 'Madison St & 1 St', 'Jersey & 3rd',
       'McGinley Square', 'Grand St', 'Montgomery St', 'Van Vorst Park',
       'Heights Elevator', '6 St & Grand St',
       'Church Sq Park - 5 St & Park Ave', 'Morris Canal',
       'Bloomfield St & 15 St', 'Clinton St & 7 St', 'Union St',
       'Dixon Mills', 'Liberty Light Rail', 'City Hall',
       'Essex Light Rail', '11 St & Washingto

In [93]:
df['start_station_id'].unique()

array(['JC066', 'HB304', 'JC095', 'JC080', 'JC002', 'JC032', 'JC081',
       'JC107', 'JC105', 'HB503', 'HB507', 'HB505', 'HB401', 'JC108',
       'JC013', 'JC034', 'JC018', 'JC082', 'JC011', 'JC057', 'HB105',
       'JC110', 'JC097', 'HB202', 'HB301', 'HB402', 'JC074', 'JC055',
       'JC102', 'JC099', 'JC035', 'JC059', 'HB302', 'HB601', 'JC072',
       'HB203', 'HB303', 'JC051', 'JC076', 'JC052', 'JC003', 'JC038',
       'HB502', 'JC093', 'JC024', 'JC098', 'HB305', 'HB603', 'HB602',
       'JC022', 'JC084', 'JC103', 'HB409', 'HB102', 'HB506', 'JC065',
       'JC106', 'JC094', 'JC023', 'JC109', 'JC006', 'JC008', 'JC005',
       'JC019', 'HB201', 'JC077', 'HB407', 'JC104', 'JC027', 'HB408',
       'HB101', 'HB501', 'HB103', 'HB607', 'JC014', 'HB404', 'JC053',
       'JC009', 'JC063', 'JC075', 'JC078', 'JC020'], dtype=object)

In [94]:
df['end_station_name'].unique()

array(['7 St & Monroe St', 'Bergen Ave', 'Hoboken Ave at Monmouth St',
       'Adams St & 11 St', 'Leonard Gordon Park', 'Grant Ave & MLK Dr',
       'Southwest Park - Jackson St & Observer Hwy', 'Willow Ave & 12 St',
       'Bergen Ave & Stegman St', 'Madison St & 10 St',
       '10 Ave & W 14 St', 'Little West St & 1 Pl', '5 Corners Library',
       'Madison St & 1 St', 'Christ Hospital', 'JC Medical Center',
       'Vesey St & Church St', 'Paulus Hook', 'Newark Ave',
       'Newport PATH', 'Manila & 1st', 'City Hall - Washington St & 1 St',
       'Riverview Park', 'Brunswick & 6th', 'York St & Marin Blvd',
       '14 St Ferry - 14 St & Shipyard Ln', 'Union St',
       'Barrow St & Hudson St', 'Jersey & 3rd', 'McGinley Square',
       '12 Ave & W 40 St', 'Murray St & West St', '6 St & Grand St',
       'Grand St', 'Montgomery St', 'Morris Canal',
       'Church Sq Park - 5 St & Park Ave', 'Clinton St & 7 St',
       'Essex Light Rail', 'Liberty Light Rail', 'City Hall',
       'Bloo

In [95]:
df['end_station_name'].value_counts()

Grove St PATH                                   4044
Hoboken Terminal - River St & Hudson Pl         3710
South Waterfront Walkway - Sinatra Dr & 1 St    2317
Hoboken Terminal - Hudson St & Hudson Pl        2299
City Hall - Washington St & 1 St                2009
                                                ... 
E 13 St & 2 Ave                                    1
E 7 St & Avenue A                                  1
W 17 St & 8 Ave                                    1
Bank St & Washington St                            1
Bank St & Hudson St                                1
Name: end_station_name, Length: 130, dtype: int64

In [96]:
df['end_station_name'].count()

72234

In [97]:
df['end_station_id'].unique()

array(['HB304', 'JC095', 'JC105', 'HB507', 'JC080', 'JC107', 'HB401',
       'HB505', 'JC108', 'HB503', '6157.04', '5001.08', 'JC018', 'HB402',
       'JC034', 'JC011', '5216.06', 'JC002', 'JC032', 'JC066', 'JC082',
       'HB105', 'JC057', 'JC081', 'JC097', 'JC110', 'HB202', 'JC051',
       '5805.05', 'JC074', 'JC055', '6765.01', '5329.08', 'HB302',
       'JC102', 'JC099', 'JC072', 'HB601', 'HB303', 'JC038', 'JC052',
       'JC003', 'HB203', 'JC024', 'JC093', 'HB502', 'JC076', 'HB603',
       'JC098', '5184.08', 'HB305', 'HB602', 'JC022', 'JC084', 'HB506',
       'JC109', 'HB102', '6535.04', 'JC065', 'JC013', 'HB409', '5175.08',
       'JC059', '5033.01', '4993.15', '6474.02', '5065.12', 'JC106',
       'JC094', 'JC023', '5114.06', 'HB301', '5539.06', 'JC006', 'JC008',
       '5470.02', '6955.01', '5297.02', 'JC035', '6289.06', '6839.1',
       'JC005', 'JC019', 'HB201', 'JC103', '5820.08', 'JC104', '5626.07',
       '6148.02', '5602.06', 'JC027', 'HB408', '6578.01', 'JC077',
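The `end_station_id` values mix the `JC###`/`HB###` codes seen for every start station with numeric-looking IDs such as `'6157.04'`. A regular expression can separate the two formats; the pattern below is an assumption inferred from the IDs listed above, not a documented Citi Bike convention:

```python
import pandas as pd

# a few end_station_id values copied from the unique() output above
ids = pd.Series(["HB304", "JC095", "6157.04", "5001.08", "JC018", "6839.1"])

# assumed local format: JC or HB followed by exactly three digits
is_local = ids.str.fullmatch(r"(JC|HB)\d{3}")

print(ids[~is_local].tolist())  # IDs in the other (numeric) format
```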
       

# Data Visualization

### b) To determine the favourable time of trip by station

In [98]:
# number of trips with a recorded start station (not the number of distinct stations)
total_trips = df['start_station_id'].count()
print(total_trips)

72234
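`count()` returns the number of non-null entries, which here is the number of trips (72,234). To count stations, `nunique()` counts distinct values instead. A sketch on hypothetical trips:

```python
import pandas as pd

# hypothetical trips starting from only two distinct stations
start_ids = pd.Series(["JC066", "JC066", "HB304", "JC066"])

n_trips = start_ids.count()       # non-null entries, i.e. trips
n_stations = start_ids.nunique()  # distinct station IDs

print(n_trips, n_stations)
```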


# Introduction to Dash

In [101]:
!pip install -q dash
!pip install -q jupyter_dash
#!pip install -q pyngrok dash
import dash
print(dash.__version__)

2.10.0


In [102]:
#Check version
import plotly
print(plotly.__version__)

5.4.0


In [103]:
# Dash is built by Plotly; pin the Plotly version so it matches the one printed above

# install dependencies
!pip install -q plotly==5.4 dash jupyter-dash

# jupyter-dash lets the dashboard render as the output of a notebook/Colab cell (mode='inline')

In [104]:
!pip install bash_kernel
!python -m bash_kernel.install


In [105]:
!pip install dash-bootstrap-components



In [106]:
# import related libraries and modules
# dcc --> access to the core components used to build the dashboard
# html --> HTML elements that help build the web interface
# dash.dependencies --> Input and Output, which let users interact with the dashboard
#from pyngrok import ngrok
#from flask import Flask
import dash
import jupyter_dash
from dash import dcc
from dash import html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import pandas as pd

import os
pd.unique(['os'])

array(['os'], dtype=object)

## Citi Bike Program Analysis for November 2022

In [107]:
#from IPython.display import HTML
#dbc_css = HTML(''.format(open('assets/bootstrap.min.css').read()))

In [108]:
# start building the application
# create the app once, with the Bootstrap theme, so the title set below is not overwritten
app = JupyterDash(__name__, external_stylesheets=[dbc.themes.SUPERHERO])
#app = dash.Dash(__name__)

# assign a title to the application (shown in the browser tab)
app.title = "Citi Bike Program Analysis for November 2022"

server = app.server

### Custom Style

In [109]:
# dashboard styling
# define a colour palette running from light green to dark green
colors = ['#f2fffb', '#98ffe0', '#6df0c8', '#59dab2', '#31c194', '#25a27b', '#188463', '#11684d']

custom_theme = pio.templates["plotly_dark"]
custom_theme.layout.update({
    'paper_bgcolor': '#1f2630',
    'plot_bgcolor': '#1f2630',
    'colorway':colors,
    #'colorscale':colors,
    'font': {
        'color': 'white'  # default font colour for all figures
    },
    'margin': {
        't': 75,
        'r': 50,
        'b': 100,
        'l': 75
    }
})
pio.templates.default = custom_theme 

### Dashboard Content

In [110]:
# create the page header (bike emoji and title)

header = html.Div(
    id = "header",
    children=[
        html.H1(children="🚲", style={'fontSize': "80px",'textAlign': 'center'}, className="header-emoji"), #emoji
        html.H1(
            children="Citi Bike Program Analysis for November 2022",style = {'textAlign': 'center'}, className="header-title"
        ),
    ]
)


### a) To identify the preferred bike type by member type

In [111]:
# let's create a bar plot
# 1. header for the first row (bar plot)
bar_header = html.H3("Preferable Bike Type by Member Type")

In [112]:
df_agg1 = df.groupby(['rideable_type','member_casual'], as_index=False).agg(
    df_count=pd.NamedAgg(column='member_casual', aggfunc='count'))

fig1 = px.bar(df_agg1,
              x="rideable_type",
              y="df_count",
              color="member_casual",
              color_discrete_map={"member": "#9370DB", "casual": "#E0B0FF"},  # Medium Purple for members, Mauve for casual riders
              # labels renames axis/legend titles only; bar category values keep their raw names
              labels={'member_casual': 'Member Type', 'df_count': 'Count', 'rideable_type': 'Rideable Type'},
              title="Preferable Bike Type by Member Type"
             )

fig1.update_layout(
    title=dict(x=0.5), 
    paper_bgcolor="#1b265e", 
    plot_bgcolor="#1b265e",
)

fig1.show()

#2. Connect the dropdown to the bar plot with a callback
# (allow_duplicate is an Output option and is not needed: no other callback targets "bar-graph")

@app.callback(
    Output("bar-graph", "figure"),
    [Input("bar-dropdown", "value")]
)
def select_member_type(membertype):
    if membertype:
        subset = df_agg1.query("member_casual == @membertype")
    else:
        subset = df_agg1

    fig1 = px.bar(subset, 
               x="rideable_type", 
               y="df_count", 
               color="member_casual",
               color_discrete_map={"member": "#70DC6B", "casual": "#708090"},  # Set colors for member and casual
               labels={'member_casual':'Member Type', 'df_count':'Count', 'rideable_type':'Rideable Type'},
               title="Preferable Bike Type by Member Type"
             )

    fig1.update_layout(
    title=dict(x=0.5), 
    paper_bgcolor="#1b265e", 
    plot_bgcolor="#1b265e",
    )
    return fig1

#3. create a dropdown to filter the graph, then insert it in #5. bar_row
bar_dropdown = dcc.Dropdown(
    id = "bar-dropdown", #identify where the input from 
    options =[
        {'label': 'All', 'value' : ''},
        {'label': 'Member', 'value' : 'member'},
        {'label': 'Casual', 'value' : 'casual'},
    ],
    value = ''
)

#4. create the figure component
#bar_graph = dcc.Graph(figure=fig1)  # static version, superseded by the callback-driven graph below
bar_graph = dcc.Graph(id="bar-graph")  # receives its figure from the callback

#5. create the html component that holds the dropdown and the graph
bar_row = html.Div(
    children= [
        bar_dropdown,
        bar_graph
        
    ]
)
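The filtering inside the bar-chart callback can be pulled out into a plain pandas function, which makes it testable without starting the Dash server. A sketch; the helper name `filter_member_type` and the counts in the toy frame are hypothetical:

```python
import pandas as pd


def filter_member_type(agg: pd.DataFrame, member_type: str) -> pd.DataFrame:
    """Return the rows for one member type, or every row when member_type is empty."""
    if member_type:
        # same condition as the query() used in the callback
        return agg[agg["member_casual"] == member_type]
    return agg


# hypothetical aggregated counts shaped like df_agg1
toy_agg = pd.DataFrame({
    "rideable_type": ["classic_bike", "classic_bike", "electric_bike", "electric_bike"],
    "member_casual": ["casual", "member", "casual", "member"],
    "df_count": [10, 40, 5, 15],
})

print(filter_member_type(toy_agg, "member"))
print(len(filter_member_type(toy_agg, "")))
```

The callback body would then reduce to building the figure from `filter_member_type(df_agg1, membertype)`.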

In [113]:
#6. then place the above row inside the page layout

### b) To determine the distribution of start time

In [114]:
# let's create a bar plot
# 1. header for this row (bar plot)
stationbar_header = html.H3(
                    "Favourable trip start time at station")

In [115]:
df_agg2 = df.groupby(['start_hour'], as_index=False).agg(
    df_count=pd.NamedAgg(column='start_hour', aggfunc='count'))

fig2 = px.bar(df_agg2, x="start_hour", y="df_count",
              labels={'start_hour':'Time', 'df_count':'Count'},
              title="Distribution of Start Time"
             )

fig2.update_layout(
    title=dict(x=0.5), #set title in the center
    paper_bgcolor="#1b265e", #set the background color of the chart
    plot_bgcolor="#1b265e",
)
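Counting rows per group with `NamedAgg(..., aggfunc='count')` works, but `groupby(...).size()` says the same thing more directly (it counts every row, whereas `'count'` skips nulls in the named column). A sketch on hypothetical hour labels in the same `'HH00'` format:

```python
import pandas as pd

# hypothetical trips with 'HH00' start-hour labels
toy = pd.DataFrame({
    "start_hour": ["0700", "0800", "0700", "1700"],
    "start_station_id": ["JC066", "HB304", "JC005", "JC066"],
})

# one row per hour with the number of trips, same shape as df_agg2
counts = toy.groupby("start_hour").size().reset_index(name="df_count")
print(counts)
```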

### c) To analyse and monitor weekday and weekend trips for the programme

In [116]:
# let's create a line plot
# 1. header for the second row (line plot)
line_header = html.H3("Monitor and Analyse Trips by Weekdays and Weekends")

In [117]:
df.sample(10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,...,end_lng,member_casual,start_time,start_hour,start_day,weekday,weekday_name,tripduration,tripduration_sec,tripduration_min
35442,47510697E30DCDA7,classic_bike,2022-11-08 19:10:00,2022-11-08 19:41:00,Liberty Light Rail,JC052,City Hall - Washington St & 1 St,HB105,40.711242,-74.055701,...,-74.03097,casual,19:10:00,1900,8,1,Tuesday,0 days 00:31:00,1860,31.0
12259,DC3A83EA18059204,classic_bike,2022-11-10 13:32:00,2022-11-10 13:40:00,Adams St & 11 St,HB507,Hoboken Terminal - River St & Hudson Pl,HB102,40.750597,-74.033608,...,-74.029127,member,13:32:00,1300,10,3,Thursday,0 days 00:08:00,480,8.0
29770,6DE656A6A160548A,electric_bike,2022-11-08 17:07:00,2022-11-08 17:10:00,Bergen Ave & Sip Ave,JC109,McGinley Square,JC055,40.731009,-74.064437,...,-74.067622,casual,17:07:00,1700,8,1,Tuesday,0 days 00:03:00,180,3.0
72617,9AC78D3B069AEB58,classic_bike,2022-11-09 07:59:00,2022-11-09 08:01:00,Hamilton Park,JC009,Jersey & 3rd,JC074,40.727596,-74.044247,...,-74.045953,member,07:59:00,700,9,2,Wednesday,0 days 00:02:00,120,2.0
15020,E31D91B62A815DBC,classic_bike,2022-11-05 15:03:00,2022-11-05 15:18:00,Hoboken Terminal - River St & Hudson Pl,HB102,Grand St & 14 St,HB506,40.736068,-74.029127,...,-74.0316,casual,15:03:00,1500,5,5,Saturday,0 days 00:15:00,900,15.0
29657,6735F66A9F4605E1,electric_bike,2022-11-10 20:58:00,2022-11-10 21:01:00,Bergen Ave & Sip Ave,JC109,McGinley Square,JC055,40.731009,-74.064437,...,-74.067622,casual,20:58:00,2000,10,3,Thursday,0 days 00:03:00,180,3.0
15484,1C35A14F7E786E8D,electric_bike,2022-11-12 12:47:00,2022-11-12 12:49:00,City Hall,JC003,Marin Light Rail,JC013,40.717732,-74.043845,...,-74.042817,member,12:47:00,1200,12,5,Saturday,0 days 00:02:00,120,2.0
17216,74CB9B222BD32E15,classic_bike,2022-11-26 13:57:00,2022-11-26 14:00:00,Grand St,JC102,Columbus Dr at Exchange Pl,JC106,40.715178,-74.037683,...,-74.03281,member,13:57:00,1300,26,5,Saturday,0 days 00:03:00,180,3.0
41634,2096826C5AFF3E6F,classic_bike,2022-11-12 17:06:00,2022-11-12 17:11:00,Harborside,JC104,Newport PATH,JC066,40.719252,-74.034234,...,-74.033759,member,17:06:00,1700,12,5,Saturday,0 days 00:05:00,300,5.0
32860,6B88054D4936044D,electric_bike,2022-11-16 06:53:00,2022-11-16 06:57:00,Brunswick St,JC023,Grove St PATH,JC005,40.724176,-74.050656,...,-74.043117,casual,06:53:00,600,16,2,Wednesday,0 days 00:04:00,240,4.0


df_agg3 = df.groupby(['start_hour', 'weekday_name'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)

fig3 = px.line(df_agg3, x='start_hour', y='df_count', color='weekday_name',
               labels={'start_hour':'Time', 'df_count':'Count', 'weekday_name':'Day of Week'},
               title="Monitor and Analyse Trips by Weekdays and Weekends",
               color_discrete_sequence=px.colors.qualitative.Prism
)
fig3.update_traces(line={'width':3})
fig3.update_layout(
    title=dict(x=0.5), #set title in the center
    paper_bgcolor="#1b265e", #set the background color of the chart
    plot_bgcolor="#1b265e",   
    hovermode='x'
)

In [119]:
import plotly.express as px

# Aggregate trip counts by start hour and day of week (whole week)
df_agg3 = df.groupby(['start_hour', 'weekday_name'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)

# Define custom colors for each day of the week
color_map = {
    'Monday': '#FFC107',
    'Tuesday': '#4CAF50',
    'Wednesday': '#2196F3',
    'Thursday': '#9C27B0',
    'Friday': '#FF5722',
    'Saturday': '#D3D3D3',
    'Sunday': '#CD7F32'
}

# Create the figure
fig3 = px.line(df_agg3, x='start_hour', y='df_count', color='weekday_name',
              labels={'start_hour': 'Time', 'df_count': 'Count', 'weekday_name': 'Day of Week'},
              title="Monitor and Analyse Trips by Whole Week",
              color_discrete_map=color_map)
fig3.update_traces(line={'width': 3})
fig3.update_layout(
    title=dict(x=0.5),  # set title in the center
    paper_bgcolor="#1b265e",  # set the background color of the chart
    plot_bgcolor="#1b265e",
    hovermode='x'
)

# Display the figure
fig3.show()
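Because `groupby` sorts `weekday_name` alphabetically and Plotly adds traces in order of first appearance, the legend lists Friday, Monday, Saturday and so on. Making the column an ordered categorical keeps the days in calendar order when sorting; for Plotly specifically, passing `category_orders={'weekday_name': DAY_ORDER}` to `px.line` has a similar effect. A sketch, where `DAY_ORDER` is a name introduced here for illustration:

```python
import pandas as pd

# calendar order for the weekday names produced by dt.day_name()
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# weekday names in the alphabetical order groupby would produce
days = pd.Series(["Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"])

# an ordered categorical sorts by DAY_ORDER instead of alphabetically
ordered = pd.Categorical(days, categories=DAY_ORDER, ordered=True)
print(pd.Series(ordered).sort_values().tolist())
```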

In [120]:
# Aggregate trip counts, then keep only the weekends
df_agg3 = df.groupby(['start_hour', 'weekday_name'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)

weekends = ['Saturday', 'Sunday']
df_weekends = df_agg3[df_agg3['weekday_name'].isin(weekends)]

color_map = {
    'Saturday': '#D3D3D3',
    'Sunday': '#CD7F32'
}

# Create figure for weekends
fig_weekends = px.line(df_weekends, x='start_hour', y='df_count', color='weekday_name',
                       labels={'start_hour':'Time', 'df_count':'Count', 'weekday_name':'Day of Week'},
                       title="Monitor and Analyse Trips on Weekends",
                       color_discrete_map=color_map)
fig_weekends.update_traces(line={'width': 3})
fig_weekends.update_layout(
    title=dict(x=0.5),  # set title in the center
    paper_bgcolor="#1b265e",  # set the background color of the chart
    plot_bgcolor="#1b265e",
    hovermode='x'
)
fig_weekends.show()

In [121]:
# Aggregate trip counts by start hour and weekday, then keep only weekdays
df_agg3 = df.groupby(['start_hour', 'weekday_name'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
df_weekdays = df_agg3[df_agg3['weekday_name'].isin(weekdays)]

color_map = {
    'Monday': '#FFC107',
    'Tuesday': '#4CAF50',
    'Wednesday': '#2196F3',
    'Thursday': '#9C27B0',
    'Friday': '#FF5722'
}

# Create figure for weekdays
fig_weekdays = px.line(df_weekdays, x='start_hour', y='df_count', color='weekday_name',
                       labels={'start_hour':'Time', 'df_count':'Count', 'weekday_name':'Day of Week'},
                       title="Monitor and Analyse Trips on Weekdays",
                       color_discrete_map=color_map)

fig_weekdays.update_traces(line={'width': 3})
fig_weekdays.update_layout(
    title=dict(x=0.5),  # set title in the center
    paper_bgcolor="#1b265e",  # set the background color of the chart
    plot_bgcolor="#1b265e",
    hovermode='x'
)
fig_weekdays.show()
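The three cells above rebuild the same `df_agg3` table and differ only in which days they keep. The sketch below is a hypothetical helper, `hourly_counts`, that aggregates once and can optionally filter to a list of day names. It returns the `start_hour`, `weekday_name` and `df_count` columns that the `px.line` calls above expect. The toy rows are invented for illustration.

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
WEEKENDS = ['Saturday', 'Sunday']

def hourly_counts(trips, days=None):
    """Trip counts per (start_hour, weekday_name); keep only `days` if given."""
    agg = trips.groupby(['start_hour', 'weekday_name'], as_index=False).agg(
        df_count=('start_station_id', 'count'))
    if days is not None:
        agg = agg[agg['weekday_name'].isin(days)].reset_index(drop=True)
    return agg

# Hypothetical toy trips standing in for the real df
trips = pd.DataFrame({
    'start_hour': [8, 8, 9, 9, 18],
    'weekday_name': ['Monday', 'Monday', 'Saturday', 'Sunday', 'Friday'],
    'start_station_id': ['JC066', 'HB304', 'JC066', 'JC066', 'HB304'],
})

df_all = hourly_counts(trips)                 # whole week
df_weekdays = hourly_counts(trips, WEEKDAYS)  # Monday to Friday
df_weekends = hourly_counts(trips, WEEKENDS)  # Saturday and Sunday
print(df_weekdays)
```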

In [122]:
import dash  # needed for dash.callback_context below
from dash.dependencies import Input, Output, State

# Define the callback function
@app.callback(
    Output("output-graph", "figure"),
    Input("weekdays-button", "n_clicks"),
    Input("weekends-button", "n_clicks"),
    Input("all-button", "n_clicks"),
    State("output-graph", "figure")
)
def update_graph(weekdays_clicks, weekends_clicks, all_clicks, current_figure):
    ctx = dash.callback_context
    button_id = ctx.triggered[0]["prop_id"].split(".")[0]

    # Determine which button was clicked
    if button_id == "weekdays-button":
        # Update the graph for weekdays
        return fig_weekdays
    elif button_id == "weekends-button":
        # Update the graph for weekends
        return fig_weekends
    elif button_id == "all-button":
        # Update the graph for the whole week
        return fig3
    else:
        # Return the current graph if no button was clicked
        return current_figure
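The `if`/`elif` chain in the callback maps each button id to a figure. The same dispatch can be written as a dictionary lookup. The sketch below keeps it free of Dash so the logic can be checked on its own: `pick_figure` is a hypothetical helper, and strings stand in for the Plotly figures. Inside the callback, it would be called as `pick_figure(ctx.triggered[0]["prop_id"], figures, current_figure)`. Recent Dash 2.x versions also expose `dash.ctx.triggered_id`, which returns the component id directly.

```python
def pick_figure(prop_id, figures, current_figure):
    """Map a Dash prop_id such as 'weekdays-button.n_clicks' to a figure.

    Falls back to current_figure when the id is not one of the buttons,
    e.g. the placeholder prop_id reported on the initial call.
    """
    button_id = prop_id.split('.')[0]
    return figures.get(button_id, current_figure)

# Strings stand in for fig_weekdays, fig_weekends and fig3 in this sketch
figures = {
    'weekdays-button': 'fig_weekdays',
    'weekends-button': 'fig_weekends',
    'all-button': 'fig3',
}

print(pick_figure('weekends-button.n_clicks', figures, 'current'))
print(pick_figure('.', figures, 'current'))
```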


### d) To determine trip count of Citi Bike stations for Nov 2022

In [123]:
# Scatter map of start stations
#1. Give the header name for the scatter-map row
scattermap_header = html.H3("Average Trip Duration of Citi Bike Stations November 2022")

In [124]:
df['start_hour_num'] = df['start_hour'].astype(int)
df['start_day_num'] = df['start_day'].astype(int)

In [125]:
# preparing an aggregate table for our first map:
df_agg4 = df.groupby('start_station_id', as_index=False).agg(
    durationmin_avg=pd.NamedAgg(column='tripduration_sec', aggfunc=lambda x: round(np.mean((x/60)),2)),
    starttime_avg=pd.NamedAgg(column='start_hour_num', aggfunc=lambda x: round(np.mean(x))),
    startday_avg=pd.NamedAgg(column='start_day_num', aggfunc=lambda x: round(np.mean(x))),
    classic_count = pd.NamedAgg(column='rideable_type', aggfunc=lambda x: x[x == 'classic_bike'].count()),
    electric_count = pd.NamedAgg(column='rideable_type', aggfunc=lambda x: x[x == 'electric_bike'].count()),
    df_count=pd.NamedAgg(column='start_station_id', aggfunc='count'),
)

df_agg4['latitude'] = 0
df_agg4['longitude'] = 0
df_agg4['station_name'] = 0

for i in df_agg4['start_station_id']:
    df_agg4.loc[df_agg4['start_station_id'] == i, 'latitude'] = df.loc[df['start_station_id'] == i]['start_lat'].iloc[0]
    df_agg4.loc[df_agg4['start_station_id'] == i, 'longitude'] = df.loc[df['start_station_id'] == i]['start_lng'].iloc[0]
    df_agg4.loc[df_agg4['start_station_id'] == i, 'station_name'] = df.loc[df['start_station_id'] == i]['start_station_name'].iloc[0]

# The loop above is needed because the recorded coordinates differ slightly between trips from the same station, so the first recorded coordinates (and name) are assigned to each station ID.

# mapping the prepared data
fig4 = px.scatter_mapbox(df_agg4, lat='latitude', lon='longitude', size='df_count', color='df_count',
                        zoom=12, center={'lat':40.7307, 'lon':-74.0060}, mapbox_style='carto-positron',
                        color_continuous_scale=px.colors.sequential.matter,
                        opacity=0.8,
                        hover_name='station_name',
                        hover_data={'start_station_id':True, 'df_count':True, 'classic_count':True, 'electric_count':True, 'startday_avg':True, 'starttime_avg':True, 'durationmin_avg':True, 'latitude':False, 'longitude':False},
                        labels={'start_station_id':'ID', 'df_count':'Trip Counts', 'classic_count': 'Classic Bikes', 'electric_count': 'Electric Bikes', 'durationmin_avg':'Avg Duration Spent (mins)', 
                                'starttime_avg': 'Avg Start Time (24 hrs)', 'startday_avg': 'Avg Start Day'},
                        title="Bike Availability and Operational Efficiency by Trip Count of Start Stations"
)

fig4.update_layout(
    title=dict(x=0.5), #set title in the center
    paper_bgcolor="#1b265e", #set the background color of the chart
    plot_bgcolor="#1b265e",
    height=600, 
    width=1490,
)
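The row-by-row loop above runs one filter over `df` per station. The sketch below is a vectorised alternative on a made-up three-trip table. `groupby(...).agg(...)` with `'first'` picks each station's first recorded name and coordinates. `pd.crosstab` counts classic and electric bikes in one pass, instead of the two lambda aggregations. One difference to be aware of: `'first'` skips missing values, whereas `.iloc[0]` in the loop takes the first row even if it is NaN.

```python
import pandas as pd

# Hypothetical toy trips: coordinates differ slightly between trips
# from the same station, as noted for the real data
trips = pd.DataFrame({
    'start_station_id': ['JC066', 'JC066', 'HB304'],
    'start_station_name': ['Newport PATH', 'Newport PATH', '7 St & Monroe St'],
    'start_lat': [40.727146, 40.727216, 40.746413],
    'start_lng': [-74.033620, -74.033609, -74.037977],
    'rideable_type': ['classic_bike', 'electric_bike', 'classic_bike'],
})

# First recorded name and coordinates per station (replaces the loop)
stations = trips.groupby('start_station_id', as_index=False).agg(
    station_name=('start_station_name', 'first'),
    latitude=('start_lat', 'first'),
    longitude=('start_lng', 'first'),
    df_count=('rideable_type', 'count'),
)

# Classic / electric counts per station in one pass
bike_counts = (pd.crosstab(trips['start_station_id'], trips['rideable_type'])
                 .reindex(columns=['classic_bike', 'electric_bike'], fill_value=0)
                 .rename(columns={'classic_bike': 'classic_count',
                                  'electric_bike': 'electric_count'})
                 .reset_index())

stations = stations.merge(bike_counts, on='start_station_id', how='left')
print(stations)
```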


In [126]:
#2. create the graph component
scattermap_graph = dcc.Graph(figure=fig4) #static

In [127]:
#3. create the html component to place the header AND graph
scattermap_row = html.Div(
    children= [
        scattermap_header,
        scattermap_graph
    ]
)

## Daily Trip Count

In [128]:
import plotly.express as px

df_agg5 = df.groupby(['start_day'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)

fig5 = px.line(df_agg5, x='start_day', y='df_count', 
               labels={'start_day':'Day', 'df_count':'Count'},
               title="Daily Trip"
)
fig5.update_traces(line={'width': 3, 'color': '#663399'})
fig5.update_layout(
    title=dict(x=0.5), #set title in the center
    paper_bgcolor="#1b265e", #set the background color of the chart
    plot_bgcolor="#1b265e",   
    hovermode='x'
)

#4. wrap the figure in a Dash graph component; the id lets a callback target it
tripday_graph = dcc.Graph(id = "tripday-graph", figure=fig5)
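This section does not show how `start_day` was derived. Assuming it is the day of the month of `started_at`, note that the raw timestamps in the sample rows at the top are day-first (for example `30/11/2022 8:29`). The sketch below parses a few of those sample values with an explicit format, so that `12/11/2022` cannot be read month-first as 11 December. It then counts trips per day.

```python
import pandas as pd

# Raw timestamps copied from the sample rows shown at the top of the notebook
started_at = pd.Series(['12/11/2022 14:47', '30/11/2022 8:29', '9/11/2022 8:28',
                        '30/11/2022 14:48', '10/11/2022 18:14'])

# Explicit day-first format: '12/11/2022' is 12 November
ts = pd.to_datetime(started_at, format='%d/%m/%Y %H:%M')

# Trips per day of the month, in day order
daily = (ts.dt.day.value_counts()
           .sort_index()
           .rename_axis('start_day')
           .reset_index(name='df_count'))
print(daily)
```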

## Top Start Station

In [129]:
df.start_station_name.value_counts()

Grove St PATH                                   3848
Hoboken Terminal - River St & Hudson Pl         3779
South Waterfront Walkway - Sinatra Dr & 1 St    2289
Hoboken Terminal - Hudson St & Hudson Pl        2282
City Hall - Washington St & 1 St                1987
                                                ... 
York St & Marin Blvd                             134
Jackson Square                                    97
Bergen Ave & Stegman St                           70
Grant Ave & MLK Dr                                44
8 St & Washington St                              13
Name: start_station_name, Length: 81, dtype: int64

In [130]:
# Group the data by start_station_name and count the occurrences
df_agg2 = df.groupby('start_station_name', as_index=False).agg(df_count=('start_station_name', 'count'))

# Select the 10 busiest stations, then sort ascending so the busiest station is drawn at the top of the horizontal bar chart
df_top10 = df_agg2.nlargest(10, 'df_count').sort_values('df_count', ascending=True)

# Create the horizontal bar chart
fig6 = px.bar(df_top10, y="start_station_name", x="df_count",
               labels={'start_station_name': 'Start Station', 'df_count': 'Count'},
               title="Top 10 Start Stations",
               orientation='h'  # Set the orientation to horizontal
              )

# Update the color of the bar chart
fig6.update_traces(marker_color="#663399")

# Update the layout and styling
fig6.update_layout(
    title=dict(x=0.5),
    paper_bgcolor="#1b265e",
    plot_bgcolor="#1b265e",
)

# Display the figure
fig6.show()
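Because `value_counts()` already returns counts sorted from largest to smallest, the top-N table can also be taken straight from it. The sketch below uses a toy column with invented values. Stations tied at the cut-off may be picked in a different order than with `nlargest`.

```python
import pandas as pd

# Hypothetical toy column standing in for df['start_station_name']
names = pd.Series(['Grove St PATH', 'Grove St PATH', 'Grove St PATH',
                   'Newport PATH', 'Newport PATH', 'Jackson Square'],
                  name='start_station_name')

# value_counts() is sorted in descending order, so head(n) gives the top n
top_n = (names.value_counts().head(2)
              .rename_axis('start_station_name')
              .reset_index(name='df_count'))

# Ascending order so the busiest station is drawn at the top of a horizontal bar chart
chart_data = top_n.sort_values('df_count', ascending=True)
print(chart_data)
```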


In [131]:
@app.callback(
    Output('top-n-bar-chart', 'figure'),
    [Input('top-n-input', 'value')]
)
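def update_top_n_bar_chart(top_n):
    # Fall back to 10 when the input box is empty or holds a non-positive number
    if not top_n or top_n < 1:
        top_n = 10
    top_n = int(top_n)
    # Select the top 'top_n' stations, then sort ascending so the busiest station is drawn at the top
    df_top_n = df_agg2.nlargest(top_n, 'df_count').sort_values('df_count', ascending=True)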

    # Create the horizontal bar chart
    fig6 = px.bar(df_top_n, y="start_station_name", x="df_count",
               labels={'start_station_name': 'Start Station', 'df_count': 'Count'},
               title=f"Top {top_n} Start Stations",
               orientation='h'  # Set the orientation to horizontal
              )

    # Update the color of the bar chart
    fig6.update_traces(marker_color="#663399")

    # Update the layout and styling
    fig6.update_layout(
        title=dict(x=0.5),
        paper_bgcolor="#1b265e",
        plot_bgcolor="#1b265e",
    )

    return fig6


## To determine the favourable time by start station name

In [132]:
dock = "Bergen Ave"  ## Input the dock name here ##
inputdf = df.query('start_station_name == @dock')
inputdf = inputdf.sort_values(by='start_hour')  # Sort by start_hour in ascending order

fig7 = px.histogram(inputdf, x="start_hour", title=f'Favourable trip start time at station: {dock}',
                    labels={'start_hour':'Time', 'count':'Count'},
)

# Update the color of the bar chart
fig7.update_traces(marker_color="#663399")

# Update the layout and styling
fig7.update_layout(
    title=dict(x=0.5),
    paper_bgcolor="#1b265e",
    plot_bgcolor="#1b265e",
)

fig7.show()


# Earlier version driven by a free-text input (id "dock-name"). It is disabled because that
# input is commented out of the layout below, and Dash reports a duplicate-output error when
# two callbacks target 'favhist-graph'. The dropdown-driven callback in the next cell replaces it.
# @app.callback(
#     Output('favhist-graph', 'figure'),
#     [Input('dock-name', 'value')]
# )
# def update_chart(dock):
#     inputdf = df.query('start_station_name == @dock')
#     inputdf = inputdf.sort_values(by='start_hour')  # Sort by start_hour in ascending order
#
#     fig7 = px.histogram(inputdf, x="start_hour", title=f'Favourable trip start time at station: {dock}',
#                    labels={'start_hour':'Time', 'count':'Count'})
#
#     # Update the layout and styling
#     fig7.update_layout(
#         title=dict(x=0.5),
#         paper_bgcolor="#1b265e",
#         plot_bgcolor="#1b265e",
#     )
#     return fig7

In [133]:
@app.callback(
    Output("favhist-graph", "figure"), 
    [Input("stationbar-dropdown", "value")]
)
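def update_chart(stationname):
    # The empty value belongs to the "All Stations" option, so keep every trip in that case
    if stationname:
        inputdf = df.query('start_station_name == @stationname')
    else:
        inputdf = df
    inputdf = inputdf.sort_values(by='start_hour')  # Sort by start_hour in ascending order

    fig7 = px.histogram(inputdf, x="start_hour", title=f'Favourable trip start time at station: {stationname or "All Stations"}',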
                   labels={'start_hour':'Time', 'count':'Count'})
    
    # Update the color of the bar chart
    fig7.update_traces(marker_color="#663399")

    # Update the layout and styling
    fig7.update_layout(
        title=dict(x=0.5),
        paper_bgcolor="#1b265e",
        plot_bgcolor="#1b265e",
    )
    return fig7
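The histogram shows the whole distribution of start hours. One way to read off a single favourable start time per station is to count trips per (station, hour) and keep the busiest hour. The sketch below does this on invented toy data. `idxmax` returns the first maximum, so ties resolve to the earliest hour in the table.

```python
import pandas as pd

# Hypothetical toy trips standing in for df
trips = pd.DataFrame({
    'start_station_name': ['Grove St PATH'] * 4 + ['Newport PATH'] * 3,
    'start_hour': [8, 8, 17, 8, 17, 17, 9],
})

# Trips per (station, hour)
per_hour = (trips.groupby(['start_station_name', 'start_hour'])
                 .size()
                 .reset_index(name='df_count'))

# Row with the most trips for each station: its busiest start hour
peak = per_hour.loc[per_hour.groupby('start_station_name')['df_count'].idxmax()]
print(peak)
```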


In [134]:
# Dropdown that selects the station shown in the favourable-time histogram ('favhist-graph')
stationbar_dropdown = dcc.Dropdown(
    id = "stationbar-dropdown", #identify where the input from 
    options =[
        {'label': 'All Stations', 'value' : ''},
        {'label': 'Newport PATH', 'value' : 'Newport PATH'},
        {'label': '7 St & Monroe St', 'value' : '7 St & Monroe St'},
        {'label': 'Bergen Ave', 'value' : 'Bergen Ave'},
        {'label': 'Leonard Gordon Park', 'value' : 'Leonard Gordon Park'},
        {'label': 'Paulus Hook', 'value' : 'Paulus Hook'},
        {'label': 'Newark Ave', 'value' : 'Newark Ave'},
        {'label': 'Brunswick & 6th', 'value' : 'Brunswick & 6th'},
        {'label': 'Grant Ave & MLK Dr', 'value' : 'Grant Ave & MLK Dr'},
        {'label': 'Hoboken Ave at Monmouth St', 'value' : 'Hoboken Ave at Monmouth St'},
        {'label': 'Madison St & 10 St', 'value' : 'Madison St & 10 St'},
        {'label': 'Adams St & 11 St', 'value' : 'Adams St & 11 St'},
        {'label': 'Willow Ave & 12 St', 'value' : 'Willow Ave & 12 St'},
        {'label': 'Southwest Park - Jackson St & Observer Hwy', 'value' : 'Southwest Park - Jackson St & Observer Hwy'},
        {'label': 'Bergen Ave & Stegman St', 'value' : 'Bergen Ave & Stegman St'},
        {'label': 'Marin Light Rail', 'value' : 'Marin Light Rail'},
        {'label': 'Christ Hospital', 'value' : 'Christ Hospital'},
        {'label': '5 Corners Library', 'value' : '5 Corners Library'},
        {'label': 'Manila & 1st', 'value' : 'Manila & 1st'},
        {'label': 'JC Medical Center', 'value' : 'JC Medical Center'},
        {'label': 'Riverview Park', 'value' : 'Riverview Park'},
        {'label': 'City Hall - Washington St & 1 St', 'value' : 'City Hall - Washington St & 1 St'},
        {'label': 'York St & Marin Blvd', 'value' : 'York St & Marin Blvd'},
        {'label': '14 St Ferry - 14 St & Shipyard Ln', 'value' : '14 St Ferry - 14 St & Shipyard Ln'},
        {'label': '4 St & Grand St', 'value' : '4 St & Grand St'},
        {'label': 'Madison St & 1 St', 'value' : 'Madison St & 1 St'},
        {'label': 'Jersey & 3rd', 'value' : 'Jersey & 3rd'},
        {'label': 'McGinley Square', 'value' : 'McGinley Square'},
        {'label': 'Grand St', 'value' : 'Grand St'},
        {'label': 'Montgomery St', 'value' : 'Montgomery St'},
        {'label': 'Van Vorst Park', 'value' : 'Van Vorst Park'},
        {'label': 'Heights Elevator', 'value' : 'Heights Elevator'},
        {'label': '6 St & Grand St', 'value' : '6 St & Grand St'},
        {'label': 'Church Sq Park - 5 St & Park Ave', 'value' : 'Church Sq Park - 5 St & Park Ave'},
        {'label': 'Morris Canal', 'value' : 'Morris Canal'},
        {'label': 'Bloomfield St & 15 St', 'value' : 'Bloomfield St & 15 St'},
        {'label': 'Clinton St & 7 St', 'value' : 'Clinton St & 7 St'},
        {'label': 'Union St', 'value' : 'Union St'},
        {'label': 'Dixon Mills', 'value' : 'Dixon Mills'},
        {'label': 'Liberty Light Rail', 'value' : 'Liberty Light Rail'},
        {'label': 'City Hall', 'value' : 'City Hall'},
        {'label': 'Essex Light Rail', 'value' : 'Essex Light Rail'},
        {'label': '11 St & Washington St', 'value' : '11 St & Washington St'},
        {'label': 'Fairmount Ave', 'value' : 'Fairmount Ave'},
        {'label': 'Pershing Field', 'value' : 'Pershing Field'},
        {'label': 'Washington St', 'value' : 'Washington St'},
        {'label': '9 St HBLR - Jackson St & 8 St', 'value' : '9 St HBLR - Jackson St & 8 St'},
        {'label': '8 St & Washington St', 'value' : '8 St & Washington St'},
        {'label': 'Stevens - River Ter & 6 St', 'value' : 'Stevens - River Ter & 6 St'},
        {'label': 'Oakland Ave', 'value' : 'Oakland Ave'},
        {'label': 'Communipaw & Berry Lane', 'value' : 'Communipaw & Berry Lane'},
        {'label': 'Journal Square', 'value' : 'Journal Square'},
        {'label': 'Clinton St & Newark St', 'value' : 'Clinton St & Newark St'},
        {'label': 'Hoboken Terminal - River St & Hudson Pl', 'value' : 'Hoboken Terminal - River St & Hudson Pl'},
        {'label': 'Columbus Park - Clinton St & 9 St', 'value' : 'Columbus Park - Clinton St & 9 St'},
        {'label': 'South Waterfront Walkway - Sinatra Dr & 1 St', 'value' : 'South Waterfront Walkway - Sinatra Dr & 1 St'},
        {'label': 'Hudson St & 4 St', 'value' : 'Hudson St & 4 St'},
        {'label': 'Columbus Drive', 'value' : 'Columbus Drive'},
        {'label': 'Mama Johnson Field - 4 St & Jackson St', 'value' : 'Mama Johnson Field - 4 St & Jackson St'},
        {'label': 'Lincoln Park', 'value' : 'Lincoln Park'},
        {'label': 'Hamilton Park', 'value' : 'Hamilton Park'},
        {'label': 'Jackson Square', 'value' : 'Jackson Square'},
        {'label': 'Monmouth and 6th', 'value' : 'Monmouth and 6th'},
        {'label': 'Lafayette Park', 'value' : 'Lafayette Park'},
        {'label': 'Baldwin at Montgomery', 'value' : 'Baldwin at Montgomery'},
        {'label': 'Grove St PATH', 'value' : 'Grove St PATH'},
        {'label': 'Bergen Ave & Sip Ave', 'value' : 'Bergen Ave & Sip Ave'},
        {'label': 'Hoboken Terminal - Hudson St & Hudson Pl', 'value' : 'Hoboken Terminal - Hudson St & Hudson Pl'},
    ],
    value = ''
)
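The hand-written options list is easy to get out of sync with the data. `value_counts` above reports 81 start stations, but the list is shorter, and a copy-paste slip can attach the wrong value to a label. The sketch below builds the options from the station names themselves. It keeps the `'All Stations'` entry with the empty-string value used above. The toy `names` column stands in for `df['start_station_name']`, and the resulting `options` list would be passed to `dcc.Dropdown(options=...)`.

```python
import pandas as pd

# Hypothetical toy column standing in for df['start_station_name']
names = pd.Series(['Newport PATH', 'Journal Square', 'Newport PATH', None, 'Bergen Ave'])

# Unique, non-missing station names in alphabetical order
stations = sorted(names.dropna().unique())

# 'All Stations' keeps the empty-string value used by the hand-written list
options = [{'label': 'All Stations', 'value': ''}] + [
    {'label': name, 'value': name} for name in stations
]
print(options)
```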


## Trip Frequency by User Type, Bike Type and Weekday (Sunburst, Treemap)
# Distribution graph: header, chart-type selector and graph
distribution_head = html.H4("Trips by User Type, Bike Type and Weekday")

distribution_radio = dcc.RadioItems(
    id = "distribution-radio",
    options = [
        {'label':'Sunburst','value':'sunburst'},
        {'label':'Treemap','value':'treemap'},
    ],
    value = 'sunburst'
)

# Trip counts for the hierarchy user type -> bike type -> weekday
df_agg21 = df.groupby(['member_casual', 'rideable_type', 'weekday_name'], as_index=False).agg(
    df_count= pd.NamedAgg(column='start_station_id', aggfunc='count')
)

# `color` is needed for color_continuous_scale to take effect
fig = px.sunburst(df_agg21, path = ['member_casual', 'rideable_type', 'weekday_name'],
                  values = 'df_count', color = 'df_count',
                  color_continuous_scale=px.colors.sequential.matter)
fig.show()

# Note: distribution_row is not part of app.layout below; add it there so Dash can
# find 'distribution-radio' and 'distribution-graph' when this callback runs.
@app.callback(
    Output("distribution-graph","figure"),
    [Input("distribution-radio","value")]
)
def select_distribution_chart(chart):
    # Same data and hierarchy; only the chart type changes
    chart_fn = px.sunburst if chart == 'sunburst' else px.treemap
    return chart_fn(df_agg21, path = ['member_casual', 'rideable_type', 'weekday_name'],
                    values = 'df_count', color = 'df_count',
                    color_continuous_scale=px.colors.sequential.matter)

distribution_graph = dcc.Graph(id = "distribution-graph")

distribution_row = html.Div(
    children = [
        distribution_head,
        distribution_radio,
        distribution_graph
    ]
)

fig = px.treemap(df_agg21, path = ['member_casual', 'rideable_type', 'weekday_name'],
                 values = 'df_count', color = 'df_count',
                 color_continuous_scale=px.colors.sequential.matter)
fig.show()
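In Plotly Express, `labels` maps column names to display names for axis titles and hover text. It does not rename the values inside a column. To show `Member` instead of `member` in the sunburst or treemap segments, the values can be mapped before plotting. The sketch below assumes the display names shown. Its toy table mimics the shape of `df_agg21` but uses invented counts.

```python
import pandas as pd

# Display names for the raw category values (illustrative mapping)
DISPLAY_NAMES = {
    'member': 'Member', 'casual': 'Casual',
    'classic_bike': 'Classic Bike', 'electric_bike': 'Electric Bike',
}

# Hypothetical aggregated table shaped like df_agg21
agg = pd.DataFrame({
    'member_casual': ['member', 'member', 'casual'],
    'rideable_type': ['classic_bike', 'electric_bike', 'classic_bike'],
    'weekday_name': ['Monday', 'Monday', 'Sunday'],
    'df_count': [120, 45, 30],
})

# Replace raw values with display names in the hierarchy columns only
labelled = agg.copy()
for col in ['member_casual', 'rideable_type']:
    labelled[col] = labelled[col].map(DISPLAY_NAMES)
print(labelled)
```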

## Most Popular Routes

In [135]:
# If needed, install from a command prompt: pip install folium==0.5.0
import folium

print('Folium installed and imported!')

Folium installed and imported!


In [136]:
from folium import plugins

tmp = df.groupby(['start_lat', 'end_lat', 'start_station_name', 'start_lng', 'end_lng', 'end_station_name']).size().nlargest(18).to_frame('size').reset_index()
latstart = tmp['start_lat']
longstart = tmp['start_lng']
namestart = tmp['start_station_name']
latsend = tmp['end_lat']
longsend = tmp['end_lng']
nameend = tmp['end_station_name']
size = tmp['size']
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred',\
          'lightred', 'beige', 'darkblue', 'darkgreen', 'cadetblue',\
          'darkpurple', 'pink', 'lightblue', 'lightgreen', 'gray', 'black', 'lightgray']
placestart = [[x[0],x[1]] for x in zip(latstart, longstart)]
placesend = [[x[0],x[1]] for x in zip(latsend, longsend)]
places = list(zip(placestart, placesend))

m = folium.Map(places[0][0], tiles='Stamen Terrain', zoom_start=12)

# Add a centred title above the map
m.get_root().html.add_child(folium.Element('<h3 align="center" style="font-size:20px"><b>Most Popular Routes</b></h3>'))

# Create one marker cluster for all route endpoints (a new cluster per route never merges markers across routes)
marker_cluster = plugins.MarkerCluster().add_to(m)

for i, pair in enumerate(places):
    # Popups are rendered as HTML, so <br> (not \n) starts a new line
    folium.Marker(pair[0], icon=folium.Icon(color=colors[i]), popup=f"Start: {namestart[i]}<br>Usage in a month: {size[i]}").add_to(marker_cluster)
    folium.Marker(pair[1], icon=folium.Icon(color=colors[i]), popup=f"End: {nameend[i]}<br>Usage in a month: {size[i]}").add_to(marker_cluster)
    folium.PolyLine(pair, color=colors[i]).add_to(m)
    
m
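The `groupby` in the cell above keys on exact coordinates. Coordinates for the same station vary slightly between trips (as noted when building `df_agg4`), so one station-to-station route may be split across several groups and under-counted. The sketch below groups by station names instead and keeps the first recorded coordinates. The resulting `routes` table has the same columns as `tmp`, so the Folium loop could be reused unchanged. The toy trips and coordinates are invented for illustration.

```python
import pandas as pd

# Hypothetical toy trips: the same route logged with slightly different coordinates
trips = pd.DataFrame({
    'start_station_name': ['Grove St PATH'] * 3 + ['Newport PATH'],
    'end_station_name': ['Newport PATH'] * 3 + ['Grove St PATH'],
    'start_lat': [40.71959, 40.71961, 40.71959, 40.72721],
    'start_lng': [-74.04311, -74.04309, -74.04311, -74.03376],
    'end_lat': [40.72721, 40.72722, 40.72721, 40.71959],
    'end_lng': [-74.03376, -74.03375, -74.03376, -74.04311],
})

# Grouping by exact coordinates splits one route into several rows
by_coords = trips.groupby(['start_lat', 'start_lng', 'end_lat', 'end_lng']).size()

# Group by station names instead, keeping the first recorded coordinates
routes = (trips.groupby(['start_station_name', 'end_station_name'], as_index=False)
               .agg(start_lat=('start_lat', 'first'), start_lng=('start_lng', 'first'),
                    end_lat=('end_lat', 'first'), end_lng=('end_lng', 'first'),
                    size=('start_lat', 'count'))
               .nlargest(18, 'size')
               .reset_index(drop=True))
print(routes)
```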


In [137]:
location_map = m
location_map.save('Most_Popular_Routes2.html')

## Website

In [138]:
trips = [
    dbc.CardBody(
        [
            html.H5("Trips", className="card-title"),
            html.P(
                len(df['ride_id']),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]

classic_bike = [
    dbc.CardBody(
        [
            html.H5("Classic Bike", className="card-title"),
            html.P(
                len(df[df['rideable_type'] == 'classic_bike']),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]

electric_bike = [
    dbc.CardBody(
        [
            html.H5("Electric Bike", className="card-title"),
            html.P(
                len(df[df['rideable_type'] == 'electric_bike']),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]

member = [
    dbc.CardBody(
        [
            html.H5("Member", className="card-title"),
            html.P(
                len(df[df['member_casual'] == 'member']),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]

casual = [
    dbc.CardBody(
        [
            html.H5("Casual", className="card-title"),
            html.P(
                len(df[df['member_casual'] == 'casual']),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]

start_station = [
    dbc.CardBody(
        [
            html.H5("Start Station", className="card-title"),
            html.P(
                len(df['start_station_name'].unique()),
                style={
                       'textAlign': 'center',
                       'color': 'white',
                       'fontSize': 40},
                className="card-text",
            ),
        ]
    ),
]
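The six card definitions above differ only in their title and value. The sketch below computes all the values in one place with a hypothetical `summary_stats` helper. Each entry could then be rendered by a single function that builds the `dbc.CardBody`; that part is not shown, to keep the sketch free of Dash. One detail is worth checking: if `start_station_name` contains missing values, `len(df['start_station_name'].unique())` counts NaN as one extra station, whereas `nunique()` skips it.

```python
import pandas as pd

def summary_stats(trips):
    """Headline numbers shown on the dashboard cards."""
    return {
        'Trips': len(trips),
        # nunique() ignores missing names; len(unique()) would count NaN as a station
        'Start Station': trips['start_station_name'].nunique(),
        'Classic Bike': int((trips['rideable_type'] == 'classic_bike').sum()),
        'Electric Bike': int((trips['rideable_type'] == 'electric_bike').sum()),
        'Member': int((trips['member_casual'] == 'member').sum()),
        'Casual': int((trips['member_casual'] == 'casual').sum()),
    }

# Hypothetical toy trips standing in for df
trips = pd.DataFrame({
    'start_station_name': ['Newport PATH', 'Newport PATH', None, 'Grove St PATH'],
    'rideable_type': ['classic_bike', 'electric_bike', 'electric_bike', 'classic_bike'],
    'member_casual': ['member', 'casual', 'member', 'member'],
})

stats = summary_stats(trips)
print(stats)
```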

In [139]:
#COMPONENTS

app.layout = html.Div(
    children=[
        html.Div(
            children=[
                header,
            ],
            className="header",
        ),

        # Summary cards below the header
        html.Div([
            dbc.Row(
            [
                dbc.Col(dbc.Card(trips, color="#4B0082", inverse=True)), #dark purple
                dbc.Col(dbc.Card(start_station, color="#663399", inverse=True)), #deep purple
                dbc.Col(dbc.Card(classic_bike, color="#9932CC", inverse=True)), #Dark Orchid
                dbc.Col(dbc.Card(electric_bike, color="#BA55D3", inverse=True)), #Medium-Light Purple
                dbc.Col(dbc.Card(member, color="#9370DB", inverse=True)), #Medium Purple
                dbc.Col(dbc.Card(casual, color="#E0B0FF", inverse=True)), #Mauve
            ],
            className="mb-4",
            ),
        ]),
        
        html.Div(
            className="row",
            children=[
                html.Div(
                    children=[
                        html.Button("Weekdays", id="weekdays-button", n_clicks=0,
                                   style={"font-size": "20px", "margin-right": "10px", "float": "right"}),
                        html.Button("Weekends", id="weekends-button", n_clicks=0,
                                   style={"font-size": "20px", "margin-right": "10px", "float": "right"}),
                        html.Button("Whole Week", id="all-button", n_clicks=0,
                                   style={"font-size": "20px", "margin-right": "10px", "float": "right"}),
                    ],
                    style={"margin-bottom": "20px"}
                ),
            ],
        ),

        
        html.Div(
            children=[
                html.Div(
                children = dcc.Graph(
                    id = 'bar-graph',
                    figure = fig1,
                    config={"displayModeBar": False},
                ),
                style={'width': '40%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),
                html.Div(
                children = 
                    dcc.Graph(
                    id = 'output-graph',
                    figure = fig3,
                    #config={"displayModeBar": False},
                ),
                style={'width': '60%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),

        ],
        className = 'double-graph',
        ),
        
        #html.Label("Select Station Name:", style={'fontSize': '15px'}),
        #dcc.Input(id="dock-name", type="text", placeholder="", value='Bergen Ave', style={'marginRight':'400px'}),
        html.Div(
            children=[
                html.Label("Select Station Name:", style={'fontSize': '15px', 'marginLeft': '20px'}),
                html.Div(stationbar_dropdown, style={'display': 'inline-block', 'width': '50%'}),
                html.Label("Number of Top Stations:", style={'fontSize': '15px', 'marginLeft': '200px'}),
                html.Div(
                    dcc.Input(
                        id="top-n-input",
                        type="number",
                        placeholder="",
                        value=10,
                        debounce=True,
                    ),
                    style={'display': 'inline-block', 'width': '50%'}
                ),
            ],
            style={'width': '60%', 'display': 'flex', 'justify-content': 'space-between'}
        ),

        #html.Label("Insert Value:", style={'fontSize': '15px'}),
        #dcc.Input(id="top-n-input", type="number", placeholder="", value=10, debounce=True, style={'marginLeft':'500px'}),


        html.Div(
            children=[
                html.Div(
                children = dcc.Graph(
                    id = 'favhist-graph',
                    figure = fig7,
                    #config={"displayModeBar": False},
                ),
                style={'width': '35%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),
                html.Div(
                children = dcc.Graph(
                    id = 'top-n-bar-chart',
                    figure = fig6,
                    #config={"displayModeBar": False},
                ),
                style={'width': '35%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),
                html.Div(
                children = dcc.Graph(
                    id = 'tripday-graph',
                    figure = fig5,
                    #config={"displayModeBar": False},
                ),
                style={'width': '30%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),
                html.Div(
                children = dcc.Graph(
                    figure = fig4,
                    #config={"displayModeBar": False},
                ),
                style={'width': '60%', 'display': 'inline-block', "border": "15px #0F2537 solid"},
            ),

        ],
        className = 'double-graph',
        ),        

        
        html.Iframe(id = 'map', srcDoc = open('Most_Popular_Routes2.html', 'r').read(), width = '100%', height = '600'),
    ]
) 


#app.run_server(debug=True, mode = ("inline"))
app.run_server(debug=True,host = '127.0.0.2')

Dash is running on http://127.0.0.2:8050/

Dash app running on http://127.0.0.2:8050/


pip install --upgrade pip

!pip install dash-tools

git init

!pip install -r requirement.txt