### Install dependencies

This notebook requires two dependencies, which can be installed with the following commands:

Pandas library:

```
pip install pandas
```

Stride API client library:

```
pip install open-bus-stride
```

In [1]:
import pandas as PD
import stride

### Find a route to investigate

Because the GTFS data is not available yet, we have to find the route in the SIRI data using its `operator_ref` and `line_ref` numbers.

In [4]:
stride.get('/siri_routes/list', {'operator_refs': 14, 'line_refs': 28153})

[{'id': 684, 'line_ref': 28153, 'operator_ref': 14}]

The response shows that the route id is `684`. We can now use it to get the rides for this route.
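Rather than copying the id by hand, it can be read from the response. The following is a minimal sketch, assuming the call returns a list of dicts shaped like the output above. The `routes` variable is hypothetical and is hard-coded here so the snippet runs without network access.

```python
# Response copied from the cell output above (hard-coded for illustration).
routes = [{'id': 684, 'line_ref': 28153, 'operator_ref': 14}]

# Guard against an empty or ambiguous result before taking the id.
if len(routes) != 1:
    raise ValueError(f'expected exactly one route, got {len(routes)}')

route_id = routes[0]['id']
print(route_id)
```

In the notebook itself, `routes` would be the return value of the `stride.get` call rather than a literal.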

### Get rides data

We use the `stride.iterate` method to efficiently iterate over a possibly long list of results.

Behind the scenes, it uses the `offset`/`limit` parameters, so you don't have to manage pagination yourself.

By default, it limits results to 10,000, which should be enough for most cases, but you can raise this cap by passing the `limit` keyword argument.
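To make the offset/limit idea concrete, here is a minimal sketch of how such paging can work in general. It is not the actual implementation of `stride.iterate`: the names `iterate_pages`, `fake_fetch`, and `page_size` are hypothetical, and an in-memory list stands in for the HTTP endpoint.

```python
def iterate_pages(fetch_page, page_size=100, limit=10000):
    """Yield up to `limit` items, requesting `page_size` items at a time."""
    offset = 0
    while offset < limit:
        # Never request more than what is left of the overall limit.
        size = min(page_size, limit - offset)
        page = fetch_page(offset=offset, limit=size)
        yield from page
        # A short page means the source has no more items.
        if len(page) < size:
            break
        offset += len(page)


# Hypothetical stand-in for an HTTP endpoint: serves slices of a list.
ITEMS = list(range(250))


def fake_fetch(offset, limit):
    return ITEMS[offset:offset + limit]


all_items = list(iterate_pages(fake_fetch, page_size=100))
capped_items = list(iterate_pages(fake_fetch, page_size=100, limit=120))
print(len(all_items), len(capped_items))
```

Because the function is a generator, each page is requested only when the consumer asks for more items, so a large result set is never fetched all at once.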

We pass the iterator returned by `stride.iterate` directly to Pandas to create a DataFrame.

In [5]:
df = PD.DataFrame(stride.iterate('/siri_rides/list', {'route_ids': '684'}))
df

| | id | siri_route_id | journey_ref | scheduled_start_time | vehicle_ref | updated_first_last_vehicle_locations | first_vehicle_location_id | last_vehicle_location_id | updated_duration_minutes | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1665155 | 175 | 2021-11-24-747141 | 2021-11-24T12:23:00 | 9260501 | 2021-11-28T10:19:23.345742+00:00 | 91566294.0 | 92083131.0 | 2021-11-28T10:19:23.345825+00:00 | 88.0 |
| 1 | 1665158 | 271 | 2021-11-24-5495747 | 2021-11-24T12:16:00 | 7547069 | 2021-11-28T10:19:23.324854+00:00 | 91566608.0 | 91719635.0 | 2021-11-28T10:19:23.324927+00:00 | 27.0 |
| 2 | 1665164 | 379 | 2021-11-24-10493647 | 2021-11-24T12:20:00 | 9004258 | 2021-11-28T10:19:23.332233+00:00 | 91566738.0 | 91737939.0 | 2021-11-28T10:19:23.332323+00:00 | 30.0 |
| 3 | 1665165 | 394 | 2021-11-24-57890328 | 2021-11-24T12:24:00 | 67287102 | 2021-11-28T10:19:23.337973+00:00 | 91566775.0 | 91755741.0 | 2021-11-28T10:19:23.338045+00:00 | 32.0 |
| 4 | 1770402 | 2063 | 2021-11-25-8432830 | 2021-11-25T13:00:00 | 7728769 | 2021-11-28T10:19:23.226201+00:00 | 97225263.0 | 97554703.0 | 2021-11-28T10:19:23.226271+00:00 | 52.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 1936548 | 585 | 2021-11-28-51150312 | 2021-11-28T12:25:00 | 7718369 | | | | | |
| 9996 | 1936549 | 636 | 2021-11-28-1307844 | 2021-11-28T12:25:00 | 7725269 | | | | | |
| 9997 | 1936550 | 649 | 2021-11-28-270897 | 2021-11-28T12:20:00 | 9008858 | | | | | |
| 9998 | 1936551 | 687 | 2021-11-28-45566148 | 2021-11-28T12:26:00 | 9252401 | | | | | |


### Manipulate the data using Pandas

Now we can use Pandas to extract some information from this data.

First, let's filter out rides that have no duration (most likely recent rides whose duration has not been calculated yet).

In [15]:
df = df[df.duration_minutes.notnull()]
df

| | id | siri_route_id | journey_ref | scheduled_start_time | vehicle_ref | updated_first_last_vehicle_locations | first_vehicle_location_id | last_vehicle_location_id | updated_duration_minutes | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1665155 | 175 | 2021-11-24-747141 | 2021-11-24T12:23:00 | 9260501 | 2021-11-28T10:19:23.345742+00:00 | 91566294.0 | 92083131.0 | 2021-11-28T10:19:23.345825+00:00 | 88.0 |
| 1 | 1665158 | 271 | 2021-11-24-5495747 | 2021-11-24T12:16:00 | 7547069 | 2021-11-28T10:19:23.324854+00:00 | 91566608.0 | 91719635.0 | 2021-11-28T10:19:23.324927+00:00 | 27.0 |
| 2 | 1665164 | 379 | 2021-11-24-10493647 | 2021-11-24T12:20:00 | 9004258 | 2021-11-28T10:19:23.332233+00:00 | 91566738.0 | 91737939.0 | 2021-11-28T10:19:23.332323+00:00 | 30.0 |
| 3 | 1665165 | 394 | 2021-11-24-57890328 | 2021-11-24T12:24:00 | 67287102 | 2021-11-28T10:19:23.337973+00:00 | 91566775.0 | 91755741.0 | 2021-11-28T10:19:23.338045+00:00 | 32.0 |
| 4 | 1770402 | 2063 | 2021-11-25-8432830 | 2021-11-25T13:00:00 | 7728769 | 2021-11-28T10:19:23.226201+00:00 | 97225263.0 | 97554703.0 | 2021-11-28T10:19:23.226271+00:00 | 52.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9905 | 96296 | 2833 | 2021-11-04-28461975 | 2021-11-04T15:05:00 | 7309952 | 2021-11-28T12:19:18.690686+00:00 | 4845577.0 | 4970666.0 | 2021-11-28T12:19:18.690756+00:00 | 20.0 |
| 9906 | 96297 | 6203 | 2021-11-04-9765746 | 2021-11-04T15:00:00 | 7316052 | 2021-11-28T12:19:18.695185+00:00 | 4845583.0 | 5094015.0 | 2021-11-28T12:19:18.695274+00:00 | 39.0 |
| 9907 | 96298 | 5590 | 2021-11-04-56458061 | 2021-11-04T15:05:00 | 14915102 | 2021-11-28T12:19:18.698944+00:00 | 4845628.0 | 4977253.0 | 2021-11-28T12:19:18.699022+00:00 | 21.0 |
| 9908 | 96299 | 3286 | 2021-11-04-58842185 | 2021-11-04T15:05:00 | 8801187 | 2021-11-28T12:19:18.702648+00:00 | 4845655.0 | 5068073.0 | 2021-11-28T12:19:18.702711+00:00 | 34.0 |


Next, compute the mean duration, in minutes, of the remaining rides:

In [16]:
df.duration_minutes.mean()

55.672137832602715
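The filter and the mean can be checked by hand on a tiny sample. This is a minimal sketch using six rows copied from the table output above, five with a duration and one without. The `sample`, `with_duration`, and `mean_minutes` names are hypothetical, and the snippet imports Pandas under the conventional `pd` alias so that it runs on its own.

```python
import pandas as pd

# Six rides taken from the output above; the last one has no duration yet.
sample = pd.DataFrame({
    'id': [1665155, 1665158, 1665164, 1665165, 1770402, 1936548],
    'duration_minutes': [88.0, 27.0, 30.0, 32.0, 52.0, None],
})

# Keep only the rides whose duration has been calculated.
with_duration = sample[sample.duration_minutes.notnull()]

# Mean over the remaining five rides: (88 + 27 + 30 + 32 + 52) / 5.
mean_minutes = with_duration.duration_minutes.mean()
print(len(with_duration), mean_minutes)
```

Note that `Series.mean` skips missing values by default, so the mean would be the same without the filter. Filtering first still keeps the row count and any later per-ride calculations limited to rides with a known duration.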