### Install dependencies

This notebook requires the `open-bus-stride-client` package, which can be installed with `pip install open-bus-stride-client`.

You can also launch it online at [this URL](https://mybinder.org/v2/gh/hasadna/open-bus-stride-client/main?labpath=notebooks%2Fload%20siri%20vehicle%20locations%20to%20pandas%20dataframe.ipynb). When launched online, the dependencies are already installed.


In [1]:
# !pip install open-bus-stride-client

In [2]:
import pandas as pd
import datetime
from dateutil import tz

pd.options.display.max_columns = 1000
pd.options.display.max_colwidth = 1000

import stride

### Find a route to investigate

SIRI data doesn't hold the `route_short_name` field (the bus line number), so we will use the GTFS data to find a route.

Let's look for line number `480` (Egged).

In [3]:
pd.DataFrame(stride.get('/gtfs_routes/list', {
    'route_short_name': 480,
    'agency_name': 'אגד',  # Egged
    'route_long_name_contains': 'ירושלים',  # Jerusalem
    'date_from': '2022-03-18',
    'date_to': '2022-03-18',
}))

Unnamed: 0,id,date,line_ref,operator_ref,route_short_name,route_long_name,route_mkt,route_direction,route_alternative,agency_name,route_type
0,26129,2022-03-18,7020,3,480,ת. רכבת תל אביב- סבידור/רציפים-תל אביב יפו<->ת. מרכזית ירושלים/הורדה-ירושלים-1#,10480,1,#,אגד,3
1,26130,2022-03-18,7022,3,480,מסוף רדינג/רציפים-תל אביב יפו<->ממילא/קריב-ירושלים-1ק,10480,1,ק,אגד,3
2,26131,2022-03-18,7023,3,480,ת. מרכזית ירושלים קומה 3/רציפים-ירושלים<->ת. רכבת תל אביב - סבידור/הורדה-תל אביב יפו-3#,10480,3,#,אגד,3
3,26132,2022-03-18,7034,3,480,ממילא/קריב-ירושלים<->מסוף רדינג/הורדה-תל אביב יפו-3ק,10480,3,ק,אגד,3


This line has 4 `line_ref` values (routes). Let's pick route `7020` and get its SIRI data.

### Get rides data

We use the `stride.iterate` method to efficiently iterate over a possibly long list of results.

Behind the scenes it uses the offset/limit parameters, so you don't have to handle pagination yourself.

We pass the iterator directly to Pandas to create a DataFrame.

In [4]:
# Midnight at the start of 2022-03-18, Israel time
day_start = datetime.datetime(2022, 3, 18, tzinfo=tz.gettz('Israel'))

siri_vehicle_locations_480 = pd.DataFrame(stride.iterate('/siri_vehicle_locations/list', {
    'siri_routes__line_ref': '7020',
    'siri_rides__schedualed_start_time_from': day_start,
    'siri_rides__schedualed_start_time_to': day_start + datetime.timedelta(days=1),
    'order_by': 'recorded_at_time desc'
}, limit=1000000))

siri_vehicle_locations_480.shape

(3760, 35)
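
To make the offset/limit mechanism concrete, here is a minimal sketch of how such an iterator could work. It is not the actual `stride.iterate` implementation: `iterate_pages`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and an in-memory list stands in for the API.

```python
# Illustrative sketch only: the names below are hypothetical and are not
# part of the stride client.

def iterate_pages(fetch_page, page_size=1000, limit=None):
    """Yield items one by one, requesting them page by page.

    fetch_page(offset, limit) must return a list with at most `limit` items,
    starting at position `offset` in the full result set.
    """
    offset = 0
    yielded = 0
    while True:
        # Stop before requesting another page once the overall limit is reached.
        if limit is not None and yielded >= limit:
            return
        page = fetch_page(offset, page_size)
        if not page:
            return
        for item in page:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        # A short page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for a paginated API endpoint (assumption: 25 rows in total).
FAKE_ROWS = [{'id': i} for i in range(25)]

def fake_fetch_page(offset, limit):
    return FAKE_ROWS[offset:offset + limit]


rows = list(iterate_pages(fake_fetch_page, page_size=10))
limited = list(iterate_pages(fake_fetch_page, page_size=10, limit=12))
```

Because `iterate_pages` is a generator, it can be handed straight to `pd.DataFrame(...)`, in the same way the cell above passes the stride iterator.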

In [5]:
siri_vehicle_locations_480[[
    'recorded_at_time', 'siri_route__line_ref',
    'siri_route__operator_ref', 'siri_ride__scheduled_start_time',
    'lon', 'lat', 'siri_ride__vehicle_ref',
]].head()

Unnamed: 0,recorded_at_time,siri_route__line_ref,siri_route__operator_ref,siri_ride__scheduled_start_time,lon,lat,siri_ride__vehicle_ref
0,2022-03-18 15:59:54+00:00,7020,3,2022-03-18 14:04:00+00:00,34.796696,32.082924,7658569
1,2022-03-18 15:59:54+00:00,7020,3,2022-03-18 14:04:00+00:00,34.796696,32.082924,7658569
2,2022-03-18 15:59:54+00:00,7020,3,2022-03-18 14:04:00+00:00,34.796696,32.082924,7658569
3,2022-03-18 15:57:57+00:00,7020,3,2022-03-18 14:04:00+00:00,34.797039,32.083153,7658569
4,2022-03-18 15:57:57+00:00,7020,3,2022-03-18 14:04:00+00:00,34.797039,32.083153,7658569


The date columns are in the UTC timezone. Let's convert them to the Israel timezone.

In [6]:
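def localize_dates(data, dt_columns=None):
    """Return a copy of `data` with the given datetime columns converted to Israel time."""
    if dt_columns is None:
        dt_columns = []

    # Work on a copy so the caller's DataFrame is left untouched
    data = data.copy()

    for c in dt_columns:
        # The values are timezone-aware (UTC), so tz_convert shifts them to Israel time
        data[c] = pd.to_datetime(data[c]).dt.tz_convert('Israel')

    return data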

In [7]:
dt_columns = ['recorded_at_time','siri_ride__scheduled_start_time']

siri_vehicle_locations_480 = localize_dates(siri_vehicle_locations_480, dt_columns)

In [8]:
siri_vehicle_locations_480.siri_ride__scheduled_start_time.value_counts().sort_index()

2022-03-18 00:10:00+02:00     45
2022-03-18 06:00:00+02:00     47
2022-03-18 06:30:00+02:00     49
2022-03-18 07:00:00+02:00     46
2022-03-18 07:20:00+02:00     46
2022-03-18 07:40:00+02:00     46
2022-03-18 08:00:00+02:00     46
2022-03-18 08:12:00+02:00     45
2022-03-18 08:36:00+02:00     49
2022-03-18 08:48:00+02:00     46
2022-03-18 09:00:00+02:00     47
2022-03-18 09:10:00+02:00     48
2022-03-18 09:20:00+02:00     52
2022-03-18 09:30:00+02:00     50
2022-03-18 09:40:00+02:00     58
2022-03-18 09:50:00+02:00     67
2022-03-18 10:00:00+02:00     62
2022-03-18 10:10:00+02:00     66
2022-03-18 10:20:00+02:00     73
2022-03-18 10:30:00+02:00     72
2022-03-18 10:40:00+02:00     74
2022-03-18 10:50:00+02:00     79
2022-03-18 11:00:00+02:00     88
2022-03-18 11:10:00+02:00     97
2022-03-18 11:20:00+02:00     98
2022-03-18 11:30:00+02:00     91
2022-03-18 11:40:00+02:00    111
2022-03-18 11:50:00+02:00     98
2022-03-18 12:00:00+02:00     94
2022-03-18 12:10:00+02:00     91
2022-03-18

It looks great! (*Note: 18/03/2022 is a Friday.*)

Now we can use Pandas to get some information from this data.
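
As one possible starting point, the sketch below drops fully identical rows and summarizes the recorded locations per scheduled ride. It runs on a small made-up DataFrame (`toy`) that only borrows a few column names from the output above. Whether the repeated rows in the real data are true duplicates across all columns is not verified here.

```python
import pandas as pd

# Toy rows shaped like the SIRI vehicle locations above (values are made up).
toy = pd.DataFrame({
    'siri_ride__scheduled_start_time': pd.to_datetime([
        '2022-03-18 14:04:00+00:00', '2022-03-18 14:04:00+00:00',
        '2022-03-18 14:04:00+00:00', '2022-03-18 14:30:00+00:00',
    ]),
    'recorded_at_time': pd.to_datetime([
        '2022-03-18 15:59:54+00:00', '2022-03-18 15:59:54+00:00',
        '2022-03-18 15:57:57+00:00', '2022-03-18 16:10:00+00:00',
    ]),
    'lon': [34.796696, 34.796696, 34.797039, 34.8],
    'lat': [32.082924, 32.082924, 32.083153, 32.1],
})

# Remove rows that are identical in every column.
deduped = toy.drop_duplicates()

# Per scheduled ride: number of recorded locations and the time span they cover.
summary = (
    deduped
    .groupby('siri_ride__scheduled_start_time')['recorded_at_time']
    .agg(['count', 'min', 'max'])
)
summary['span'] = summary['max'] - summary['min']
```

The same steps could be applied to `siri_vehicle_locations_480` in place of `toy`, assuming all of its columns hold hashable values so that `drop_duplicates` can compare them.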

### Notes and Resources

siri_rides/list: 
- `siri_route_ids`: a comma-separated string containing a list of route ids.
- All date/time parameters must have a timezone (for example: `datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)`).
- `order_by`: any field can be specified, with an `asc` or `desc` specifier; multiple comma-separated values are allowed.
- `limit`: any number can be specified, as pagination is used behind the scenes; the default is 10,000.
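
The notes above can be sketched as a few small helpers that assemble a parameters dictionary. The helper names (`require_timezone`, `join_ids`, `order_by_clause`) are hypothetical and not part of the stride client, and the date parameter key `scheduled_start_time_from` is an assumption used for illustration; check the API documentation for the exact names.

```python
import datetime

def require_timezone(dt):
    """Return dt unchanged, or raise if it is a naive (timezone-less) datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError(f'datetime {dt!r} has no timezone')
    return dt

def join_ids(ids):
    """Build the comma-separated string form of a list of ids."""
    return ','.join(str(i) for i in ids)

def order_by_clause(fields):
    """Build an order_by value from (field, 'asc' | 'desc') pairs."""
    for _, direction in fields:
        if direction not in ('asc', 'desc'):
            raise ValueError(f'invalid direction: {direction!r}')
    return ','.join(f'{name} {direction}' for name, direction in fields)

yesterday = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)

params = {
    'siri_route_ids': join_ids([1, 2, 3]),  # made-up ids
    # Assumed parameter name, for illustration only
    'scheduled_start_time_from': require_timezone(yesterday),
    'order_by': order_by_clause([('scheduled_start_time', 'desc'), ('id', 'asc')]),
    'limit': 100,
}
```

Validating the timezone on the client side surfaces a naive datetime as an immediate `ValueError` rather than as an error from the API.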

Documentation: https://open-bus-stride-api.hasadna.org.il/docs#/