**Attributes to keep + description:**  
`trip_id`: identifier of the trip; not unique per row (each trip has one row per stop it serves, and `stop_sequence` orders those rows)  
`arrival_time`: scheduled (local) time of arrival at the stop (same as `departure_time` at the first stop of the trip)  
`departure_time`: scheduled (local) time of departure from the stop  
`stop_id`: stop (station) identifier (FK), from stops.txt  
`stop_sequence`: sequence number of the stop within this `trip_id`, starting at 1.  

Note about the attributes:  
If `departure_time` == `arrival_time` in the middle of a trip, the train waits less than 1 minute (in theory) at the station before continuing. Every `stop_id` listed for a trip is a station where the train actually stops.  
`stop_sequence`: 1 = departure station, the highest number = arrival station; every other number is an intermediate station where the train stops.  
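
The `stop_sequence` convention above can be turned into per-row flags. This is a minimal sketch on invented toy rows (the trip and stop IDs are made up for illustration); it assumes the last stop of a trip is the row with the highest `stop_sequence`.

```python
import pandas as pd

# Toy stop_times rows (trip IDs and stop IDs invented for illustration).
toy = pd.DataFrame({
    "trip_id": ["A", "A", "A", "B", "B"],
    "stop_id": ["s1", "s2", "s3", "s3", "s1"],
    "stop_sequence": [1, 2, 3, 1, 2],
})

# The last stop of a trip is the row whose stop_sequence equals the trip's maximum.
last_seq = toy.groupby("trip_id")["stop_sequence"].transform("max")
toy["is_first"] = toy["stop_sequence"] == 1
toy["is_last"] = toy["stop_sequence"] == last_seq
toy["is_intermediate"] = ~(toy["is_first"] | toy["is_last"])
print(toy)
```

Using `transform("max")` keeps the result aligned with the original rows, so no merge back onto the frame is needed.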

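Data cleaning notes:  
- `arrival_time` and `departure_time` contain times of 24:00:00 or later, which fail to parse with the `%H:%M:%S` format. GTFS permits such values for trips that run past midnight of the service day, so they are not necessarily corrupt.  
- No NaN values apart from these unparseable arrival and departure times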
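
If the times past 24h should be kept rather than dropped, one option is to parse both columns as durations since the start of the service day. The following is a sketch under that assumption, using `pd.to_timedelta` on invented sample values; it is not what the cells below do.

```python
import pandas as pd

# Toy rows (invented for illustration); "25:10:00" means 01:10 on the next calendar day.
sample = pd.DataFrame({
    "arrival_time": ["23:55:00", "25:10:00"],
    "departure_time": ["23:56:00", "25:10:00"],
})

# Parse as durations since the start of the service day, so hours >= 24 survive.
for col in ["arrival_time", "departure_time"]:
    sample[col] = pd.to_timedelta(sample[col], errors="coerce")

# Rows that run past midnight are those with a duration of one day or more.
past_midnight = sample["arrival_time"] >= pd.Timedelta(days=1)
print(sample)
print(past_midnight.tolist())
```

Durations also subtract cleanly, which is convenient for computing waiting times at a stop.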





In [1]:
import pandas as pd
#pd.options.display.max_rows = 10
#pd.options.mode.use_inf_as_na = True

In [85]:
filepath = "../data/stop_times.txt"
df = pd.read_csv(filepath)



In [86]:
# print number of nan values in each column
print(df.isna().sum())

trip_id           0
arrival_time      0
departure_time    0
stop_id           0
stop_sequence     0
pickup_type       0
drop_off_type     0
dtype: int64


In [3]:
df.dtypes

trip_id           object
arrival_time      object
departure_time    object
stop_id           object
stop_sequence      int64
pickup_type        int64
drop_off_type      int64
dtype: object

In [7]:
df.head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
0,1.TA.37-1-Y-j20-1.1.H,10:00:00,10:00:00,8505380,1,0,0
1,1.TA.37-1-Y-j20-1.1.H,13:00:00,13:00:00,8578641,2,0,0
2,1.TA.37-2-Y-j20-1.1.H,15:12:00,15:12:00,8574796,1,0,0
3,1.TA.37-2-Y-j20-1.1.H,16:04:00,16:04:00,8574935,2,0,0
4,1.TA.37-3-Y-j20-1.1.H,10:28:00,10:28:00,8574935,1,0,0


In [10]:
# convert arrival_time and departure_time to datetime
df['arrival_time'] = pd.to_datetime(df['arrival_time'], format='%H:%M:%S', errors='coerce')
df['departure_time'] = pd.to_datetime(df['departure_time'], format='%H:%M:%S', errors='coerce')

In [11]:
# compute the percentage of NaT values per column, then drop the rows that contain NaT
malformed_percent = df.isna().sum() / len(df) * 100
df = df.dropna()
malformed_percent

trip_id           0.000000
arrival_time      3.234473
departure_time    3.255544
stop_id           0.000000
stop_sequence     0.000000
pickup_type       0.000000
drop_off_type     0.000000
dtype: float64

In [12]:
print(f"Number of rows: {len(df)}")

Number of rows: 11510548


In [17]:
df.describe()

Unnamed: 0,stop_sequence,pickup_type,drop_off_type
count,11510550.0,11510550.0,11510550.0
mean,9.92048,0.002947644,0.0007077856
std,7.980645,0.05421214,0.02659482
min,1.0,0.0,0.0
25%,3.0,0.0,0.0
50%,8.0,0.0,0.0
75%,14.0,0.0,0.0
max,66.0,1.0,1.0


Each row corresponds to one stop of one trip. The raw file has 11,897,889 rows (about 11.9 million); 11,510,548 remain after dropping the rows with unparseable times.  
We need to group by `trip_id` to get the number of trips.  
However, we only need to keep trips within 15 km of Zürich.
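
A possible way to apply the 15 km restriction is a great-circle (haversine) distance from each stop to a reference point. This is a hedged sketch: the reference coordinates (approximately Zürich HB) and the choice to filter row by row, rather than whole trips, are assumptions, and the toy frames stand in for stops.txt and stop_times.txt.

```python
import numpy as np
import pandas as pd

# Assumed reference point (approximate coordinates of Zürich HB); adjust as needed.
ZURICH_LAT, ZURICH_LON = 47.378177, 8.540192
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat, lon, ref_lat=ZURICH_LAT, ref_lon=ZURICH_LON):
    """Great-circle distance in km from (lat, lon) to the reference point."""
    lat, lon, ref_lat, ref_lon = map(np.radians, (lat, lon, ref_lat, ref_lon))
    a = (np.sin((lat - ref_lat) / 2) ** 2
         + np.cos(lat) * np.cos(ref_lat) * np.sin((lon - ref_lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Toy stops: the reference point itself (hypothetical ID "ref") and a distant stop.
stops = pd.DataFrame({
    "stop_id": ["ref", "8501500"],
    "stop_lat": [ZURICH_LAT, 46.105825],
    "stop_lon": [ZURICH_LON, 7.07912],
})
stops["dist_km"] = haversine_km(stops["stop_lat"], stops["stop_lon"])
nearby_ids = set(stops.loc[stops["dist_km"] <= 15, "stop_id"])

# Keep only the stop_times rows whose stop lies within the radius.
stop_times = pd.DataFrame({"trip_id": ["t1", "t2"], "stop_id": ["ref", "8501500"]})
kept = stop_times[stop_times["stop_id"].isin(nearby_ids)]
print(kept)
```

Filtering per row can leave trips with a single remaining stop, so a follow-up step may be needed to drop or trim such trips.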

In [19]:
df["trip_id"].nunique()

1112257

In [11]:
df.groupby('trip_id').size().sort_values(ascending=False)

trip_id
3.TA.16-623-A-j20-1.3.H       69
249.TA.27-11-j20-1.27.R       66
253.TA.27-11-j20-1.28.R       66
243.TA.27-11-j20-1.27.R       66
87.TA.27-11-j20-1.12.H        66
                              ..
3178.TA.61-101-Y-j20-1.1.H     2
3177.TA.88-146-Y-j20-1.1.H     2
3177.TA.61-333-Y-j20-1.1.H     2
3177.TA.61-281-Y-j20-1.1.H     2
999.TA.90-161-Y-j20-1.1.H      2
Length: 1132287, dtype: int64

The longest trip has 69 stops; the shortest trips have 2 stops (departure and arrival).

Example below:

In [12]:
df.head(2)

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
0,1.TA.37-1-Y-j20-1.1.H,10:00:00,10:00:00,8505380,1,0,0
1,1.TA.37-1-Y-j20-1.1.H,13:00:00,13:00:00,8578641,2,0,0


This trip has two stops: departure (`stop_id`=8505380) and arrival (`stop_id`=8578641).  
The trip departs at 10:00 and arrives at 13:00.  

In [20]:
df[df['trip_id'] == "1.TA.91-1-Y-j20-1.1.H"]

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
8,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:17:00,1900-01-01 08:17:00,8501300:0:8,1,0,0
9,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:19:00,1900-01-01 08:19:00,8501360,2,0,0
10,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:23:00,1900-01-01 08:23:00,8501361,3,0,0
11,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:25:00,1900-01-01 08:25:00,8501362,4,0,0
12,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:28:00,1900-01-01 08:29:00,8501370,5,0,0
13,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:30:00,1900-01-01 08:30:00,8501230,6,0,0
14,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:32:00,1900-01-01 08:32:00,8501308,7,0,0
15,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:34:00,1900-01-01 08:34:00,8501363,8,0,0
16,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:39:00,1900-01-01 08:40:00,8501371,9,0,0
17,1.TA.91-1-Y-j20-1.1.H,1900-01-01 08:43:00,1900-01-01 08:43:00,8501364,10,0,0


In [24]:
df[df['trip_id'] == "7.TA.91-1-Y-j20-1.1.H"]

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
24,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:17:00,1900-01-01 09:17:00,8501300:0:8,1,0,0
25,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:19:00,1900-01-01 09:19:00,8501360,2,0,0
26,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:23:00,1900-01-01 09:23:00,8501361,3,0,0
27,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:25:00,1900-01-01 09:25:00,8501362,4,0,0
28,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:28:00,1900-01-01 09:29:00,8501370,5,0,0
29,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:30:00,1900-01-01 09:30:00,8501230,6,0,0
30,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:32:00,1900-01-01 09:32:00,8501308,7,0,0
31,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:34:00,1900-01-01 09:34:00,8501363,8,0,0
32,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:39:00,1900-01-01 09:40:00,8501371,9,0,0
33,7.TA.91-1-Y-j20-1.1.H,1900-01-01 09:43:00,1900-01-01 09:43:00,8501364,10,0,0


What does this mean? Sometimes `arrival_time` == `departure_time` in the middle of a trip.


# Analyze a known trip

In [25]:
df_stops = pd.read_csv("../data/stops.txt")
df_stops.head()

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
0,1100006,"Zell (Wiesental), Bahnhof",47.704632,7.847772,,
1,1100008,"Zell (Wiesental), Wilder Mann",47.710084,7.859648,,
2,1100009,"Zell (Wiesental), Grönland",47.713191,7.862909,,
3,1100010,Atzenbach,47.714624,7.872341,,
4,1100011,"Mambach, Brücke",47.728209,7.87747,,


In [44]:
# all stops with "Collombey" in the name
df_stops[df_stops['stop_name'].str.contains("Collombey")]

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
6006,8501221,"Muraz (Collombey), collège",46.278389,6.929415,,
6214,8501422:0:1,Collombey,46.271254,6.948756,,
6240,8501462,Collombey-Muraz,46.270168,6.946016,,
6696,8501983,"Saillon, Collombeyres",46.171886,7.187268,,
9442,8504459,"Muraz (Collombey), Pré Géroux",46.28808,6.933215,,
9460,8504473,"Collombey, Barme",46.274874,6.941381,,
9463,8504476,"Collombey, Charmettes",46.272186,6.944651,,
9464,8504478,"Collombey, St Didier",46.268746,6.951846,,
9511,8504497,"Collombey, ch. du Verger",46.265802,6.946178,,
9515,8504502,"Collombey-le-Grand, Chapelle",46.281425,6.945944,,


In [28]:
df_stops[df_stops["stop_name"] == "St-Maurice"]

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
6196,8501403,St-Maurice,46.216372,7.002044,,8501403P
6197,8501403:0:1,St-Maurice,46.216683,7.002044,,8501403P
6198,8501403:0:2,St-Maurice,46.21662,7.002044,,8501403P
6199,8501403:0:3,St-Maurice,46.216745,7.002044,,8501403P
6200,8501403:0:3AB,St-Maurice,46.216807,7.002044,,8501403P
6201,8501403:0:4,St-Maurice,46.216434,7.002044,,8501403P
6202,8501403:0:6,St-Maurice,46.216496,7.002044,,8501403P
6203,8501403P,St-Maurice,46.216372,7.002044,1.0,


In [68]:
df_stops[df_stops["stop_name"] == "Martigny"]

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
6275,8501500,Martigny,46.105825,7.07912,,8501500P
6276,8501500:0:1,Martigny,46.10595,7.07912,,8501500P
6277,8501500:0:1AB,Martigny,46.106386,7.07912,,8501500P
6278,8501500:0:1CD,Martigny,46.106323,7.07912,,8501500P
6279,8501500:0:2,Martigny,46.106012,7.07912,,8501500P
6280,8501500:0:3,Martigny,46.105887,7.07912,,8501500P
6281,8501500:0:40,Martigny,46.106136,7.07912,,8501500P
6282,8501500:0:50,Martigny,46.106074,7.07912,,8501500P
6283,8501500P,Martigny,46.105825,7.07912,1.0,


In [77]:
df_stops[df_stops["stop_name"] == "Monthey"]

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
6213,8501421,Monthey,46.255654,6.955224,,


In [35]:
df[df['stop_id'] == "8501403"].head(20)

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
9769179,1.TA.30-100-Y-j20-1.1.H,1900-01-01 05:08:00,1900-01-01 05:08:00,8501403,1,0,0
9769191,2.TA.30-100-Y-j20-1.1.H,1900-01-01 05:26:00,1900-01-01 05:26:00,8501403,1,0,0
9769203,3.TA.30-100-Y-j20-1.2.H,1900-01-01 05:40:00,1900-01-01 05:40:00,8501403,1,0,0
9769207,4.TA.30-100-Y-j20-1.2.H,1900-01-01 06:40:00,1900-01-01 06:40:00,8501403,1,0,0
9769211,5.TA.30-100-Y-j20-1.2.H,1900-01-01 07:40:00,1900-01-01 07:40:00,8501403,1,0,0
9769215,6.TA.30-100-Y-j20-1.2.H,1900-01-01 08:40:00,1900-01-01 08:40:00,8501403,1,0,0
9769219,7.TA.30-100-Y-j20-1.2.H,1900-01-01 09:40:00,1900-01-01 09:40:00,8501403,1,0,0
9769223,8.TA.30-100-Y-j20-1.2.H,1900-01-01 10:40:00,1900-01-01 10:40:00,8501403,1,0,0
9769227,9.TA.30-100-Y-j20-1.2.H,1900-01-01 11:40:00,1900-01-01 11:40:00,8501403,1,0,0
9769231,10.TA.30-100-Y-j20-1.2.H,1900-01-01 12:40:00,1900-01-01 12:40:00,8501403,1,0,0


## St-Maurice

In [38]:
ex_trip = df[df["trip_id"] == "1.TA.30-100-Y-j20-1.1.H"]

In [41]:
# print all the stop names of the trip
for i, stop_id in enumerate(ex_trip["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Maurice
Stop name 1: Bex
Stop name 2: Aigle
Stop name 3: Roche VD
Stop name 4: Villeneuve
Stop name 5: Veytaux-Chillon
Stop name 6: Territet
Stop name 7: Montreux
Stop name 8: Clarens
Stop name 9: Burier
Stop name 10: La Tour-de-Peilz
Stop name 11: Vevey


**Hypothesis**  
When `arrival_time` == `departure_time` for a `stop_id` in the middle of a trip, the train does not stop at that `stop_id` but passes through it.
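
Checking this kind of hypothesis is easier when stop names sit next to the times. As an alternative to looking up each name in a loop, the two tables can be joined once. This is a sketch on toy rows: the trip ID `T1` is invented, and casting `stop_id` to `str` is an assumption meant to keep plain IDs and platform IDs such as `8501403:0:3` comparable.

```python
import pandas as pd

# Toy stop_times rows; the trip ID "T1" is invented, the stop IDs appear in stops.txt above.
stop_times = pd.DataFrame({
    "trip_id": ["T1", "T1"],
    "stop_id": ["8501421", "8501403"],
    "stop_sequence": [2, 1],
})
stops = pd.DataFrame({
    "stop_id": ["8501403", "8501421"],
    "stop_name": ["St-Maurice", "Monthey"],
})

# Cast both keys to str so the join does not fail on mixed int/str IDs.
stop_times["stop_id"] = stop_times["stop_id"].astype(str)
stops["stop_id"] = stops["stop_id"].astype(str)

# Left join keeps every stop_times row; validate guards against duplicate stop IDs.
named = (
    stop_times.merge(stops[["stop_id", "stop_name"]], on="stop_id",
                     how="left", validate="many_to_one")
    .sort_values(["trip_id", "stop_sequence"])
    .reset_index(drop=True)
)
print(named)
```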

## Collombey

In [65]:
df[df["stop_id"] == "8501422:0:1"].head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
10657893,1.TA.66-130-j20-1.1.H,1900-01-01 22:43:00,1900-01-01 22:43:00,8501422:0:1,6,0,0
10657900,2.TA.66-130-j20-1.2.H,1900-01-01 23:43:00,1900-01-01 23:43:00,8501422:0:1,6,0,0
10657909,79.TA.66-130-j20-1.5.H,1900-01-01 09:10:00,1900-01-01 09:10:00,8501422:0:1,6,0,0
10657937,81.TA.66-130-j20-1.5.H,1900-01-01 09:10:00,1900-01-01 09:10:00,8501422:0:1,6,0,0
10657965,84.TA.66-130-j20-1.5.H,1900-01-01 10:10:00,1900-01-01 10:10:00,8501422:0:1,6,0,0


In [60]:
ex_trip2 = df[df["trip_id"] == "1.TA.66-130-j20-1.1.H"]
# print all the stop names of the trip
for i, stop_id in enumerate(ex_trip2["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Gingolph (Suisse)
Stop name 1: Bouveret
Stop name 2: Les Evouettes
Stop name 3: Vouvry
Stop name 4: Vionnaz
Stop name 5: Collombey
Stop name 6: Monthey


In [61]:
ex_trip2

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
10657888,1.TA.66-130-j20-1.1.H,1900-01-01 22:27:00,1900-01-01 22:27:00,8501428:0:1,1,0,0
10657889,1.TA.66-130-j20-1.1.H,1900-01-01 22:32:00,1900-01-01 22:32:00,8501426:0:1,2,0,0
10657890,1.TA.66-130-j20-1.1.H,1900-01-01 22:34:00,1900-01-01 22:34:00,8501425:0:1,3,0,0
10657891,1.TA.66-130-j20-1.1.H,1900-01-01 22:37:00,1900-01-01 22:37:00,8501424:0:1,4,0,0
10657892,1.TA.66-130-j20-1.1.H,1900-01-01 22:40:00,1900-01-01 22:40:00,8501423:0:1,5,0,0
10657893,1.TA.66-130-j20-1.1.H,1900-01-01 22:43:00,1900-01-01 22:43:00,8501422:0:1,6,0,0
10657894,1.TA.66-130-j20-1.1.H,1900-01-01 22:49:00,1900-01-01 22:49:00,8501421,7,0,0


In [66]:
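ex_trip3 = df[df["trip_id"] == "79.TA.66-130-j20-1.5.H"]
# print all the stop names of the trip
for i, stop_id in enumerate(ex_trip3["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Gingolph (Suisse)
Stop name 1: Bouveret
Stop name 2: Les Evouettes
Stop name 3: Vouvry
Stop name 4: Vionnaz
Stop name 5: Collombey
Stop name 6: Monthey
Stop name 7: Massongex
Stop name 8: St-Maurice
Stop name 9: Evionnaz
Stop name 10: Vernayaz
Stop name 11: Martigny
Stop name 12: Charrat-Fully
Stop name 13: Saxon
Stop name 14: Riddes
Stop name 15: Chamoson
Stop name 16: Ardon
Stop name 17: Châteauneuf-Conthey
Stop name 18: Sion
Stop name 19: St-Léonard
Stop name 20: Sierre/Siders
Stop name 21: Salgesch
Stop name 22: Leuk
Stop name 23: Turtmann
Stop name 24: Gampel-Steg
Stop name 25: Raron
Stop name 26: Visp
Stop name 27: Brig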


In [67]:
ex_trip3

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
10657904,79.TA.66-130-j20-1.5.H,1900-01-01 08:52:00,1900-01-01 08:52:00,8501428:0:1,1,0,0
10657905,79.TA.66-130-j20-1.5.H,1900-01-01 08:57:00,1900-01-01 08:59:00,8501426:0:1,2,0,0
10657906,79.TA.66-130-j20-1.5.H,1900-01-01 09:01:00,1900-01-01 09:01:00,8501425:0:1,3,0,0
10657907,79.TA.66-130-j20-1.5.H,1900-01-01 09:04:00,1900-01-01 09:04:00,8501424:0:1,4,0,0
10657908,79.TA.66-130-j20-1.5.H,1900-01-01 09:07:00,1900-01-01 09:07:00,8501423:0:1,5,0,0
10657909,79.TA.66-130-j20-1.5.H,1900-01-01 09:10:00,1900-01-01 09:10:00,8501422:0:1,6,0,0
10657910,79.TA.66-130-j20-1.5.H,1900-01-01 09:16:00,1900-01-01 09:17:00,8501421,7,0,0
10657911,79.TA.66-130-j20-1.5.H,1900-01-01 09:19:00,1900-01-01 09:19:00,8501420:0:1,8,0,0
10657912,79.TA.66-130-j20-1.5.H,1900-01-01 09:24:00,1900-01-01 09:36:00,8501403:0:3,9,0,0
10657913,79.TA.66-130-j20-1.5.H,1900-01-01 09:40:00,1900-01-01 09:40:00,8501404:0:2,10,0,0


## Martigny

In [71]:
df[df["stop_id"] == "8501500"].head(20)

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
9818682,1.TA.30-515-Y-j20-1.1.H,1900-01-01 05:45:00,1900-01-01 05:45:00,8501500,4,0,0
9818686,57.TA.30-515-Y-j20-1.1.H,1900-01-01 06:09:00,1900-01-01 06:09:00,8501500,4,0,0
9818690,58.TA.30-515-Y-j20-1.1.H,1900-01-01 06:59:00,1900-01-01 06:59:00,8501500,4,0,0
9818694,59.TA.30-515-Y-j20-1.1.H,1900-01-01 08:09:00,1900-01-01 08:09:00,8501500,4,0,0
9818698,60.TA.30-515-Y-j20-1.1.H,1900-01-01 09:09:00,1900-01-01 09:09:00,8501500,4,0,0
9818702,61.TA.30-515-Y-j20-1.1.H,1900-01-01 10:09:00,1900-01-01 10:09:00,8501500,4,0,0
9818706,62.TA.30-515-Y-j20-1.1.H,1900-01-01 11:09:00,1900-01-01 11:09:00,8501500,4,0,0
9818710,63.TA.30-515-Y-j20-1.1.H,1900-01-01 12:09:00,1900-01-01 12:09:00,8501500,4,0,0
9818714,64.TA.30-515-Y-j20-1.1.H,1900-01-01 13:09:00,1900-01-01 13:09:00,8501500,4,0,0
9818718,65.TA.30-515-Y-j20-1.1.H,1900-01-01 14:09:00,1900-01-01 14:09:00,8501500,4,0,0


**This is a direct trip:**

In [72]:
ex_martigny = df[df["trip_id"] == "2.TA.30-515-Y-j20-1.2.H"]
# print all the stop names of the trip
for i, stop_id in enumerate(ex_martigny["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Maurice
Stop name 1: Martigny


In [75]:
ex_martigny

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
9818759,2.TA.30-515-Y-j20-1.2.H,1900-01-01 07:07:00,1900-01-01 07:07:00,8501403,1,0,0
9818760,2.TA.30-515-Y-j20-1.2.H,1900-01-01 07:27:00,1900-01-01 07:27:00,8501500,2,0,0


**This is not a direct trip:**

In [74]:
ex_martigny2 = df[df["trip_id"] == "1.TA.30-515-Y-j20-1.1.H"]
# print all the stop names of the trip
for i, stop_id in enumerate(ex_martigny2["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Maurice
Stop name 1: Evionnaz
Stop name 2: Vernayaz
Stop name 3: Martigny


In [76]:
ex_martigny2

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
9818679,1.TA.30-515-Y-j20-1.1.H,1900-01-01 05:17:00,1900-01-01 05:17:00,8501403,1,0,0
9818680,1.TA.30-515-Y-j20-1.1.H,1900-01-01 05:29:00,1900-01-01 05:29:00,8501404,2,0,0
9818681,1.TA.30-515-Y-j20-1.1.H,1900-01-01 05:37:00,1900-01-01 05:37:00,8501405,3,0,0
9818682,1.TA.30-515-Y-j20-1.1.H,1900-01-01 05:45:00,1900-01-01 05:45:00,8501500,4,0,0


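**Hypothesis:**  
If `departure_time` == `arrival_time` in the middle of a trip, the train only waits briefly (less than 1 min in theory) at the stop before continuing. The example above shows that the earlier hypothesis (the train passes through without stopping) is wrong: the direct St-Maurice to Martigny trip lists only its two end stations, while the non-direct trip lists Evionnaz and Vernayaz with equal arrival and departure times.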
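
The scheduled wait at each stop can be computed directly as `departure_time - arrival_time`. A minimal sketch, using three rows copied from trip 79.TA.66-130-j20-1.5.H shown above and parsing the times as durations (an assumption; the notebook itself parses them as datetimes):

```python
import pandas as pd

# Three rows copied from trip 79.TA.66-130-j20-1.5.H shown above.
rows = pd.DataFrame({
    "stop_id": ["8501421", "8501420:0:1", "8501403:0:3"],
    "arrival_time": ["09:16:00", "09:19:00", "09:24:00"],
    "departure_time": ["09:17:00", "09:19:00", "09:36:00"],
    "stop_sequence": [7, 8, 9],
})

arrival = pd.to_timedelta(rows["arrival_time"])
departure = pd.to_timedelta(rows["departure_time"])

# Scheduled dwell in minutes; 0 means both times fall in the same timetable minute.
rows["dwell_min"] = (departure - arrival).dt.total_seconds() / 60
print(rows[["stop_id", "stop_sequence", "dwell_min"]])
```

A dwell of 0 is therefore consistent with a short stop whose arrival and departure share the same minute in the timetable.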

## Monthey

In [78]:
df[df["stop_id"] == "8501421"].head(20)

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
5484405,22.TA.24-103-j20-1.3.R,1900-01-01 01:03:00,1900-01-01 01:03:00,8501421,3,1,0
5484432,23.TA.24-103-j20-1.3.R,1900-01-01 02:03:00,1900-01-01 02:03:00,8501421,3,1,0
5484459,24.TA.24-103-j20-1.3.R,1900-01-01 03:03:00,1900-01-01 03:03:00,8501421,3,1,0
5484486,25.TA.24-103-j20-1.3.R,1900-01-01 04:03:00,1900-01-01 04:03:00,8501421,3,1,0
10646433,1.TA.63-83-Y-j20-1.1.H,1900-01-01 14:35:00,1900-01-01 14:35:00,8501421,2,0,0
10657894,1.TA.66-130-j20-1.1.H,1900-01-01 22:49:00,1900-01-01 22:49:00,8501421,7,0,0
10657901,2.TA.66-130-j20-1.2.H,1900-01-01 23:48:00,1900-01-01 23:49:00,8501421,7,0,0
10657910,79.TA.66-130-j20-1.5.H,1900-01-01 09:16:00,1900-01-01 09:17:00,8501421,7,0,0
10657938,81.TA.66-130-j20-1.5.H,1900-01-01 09:16:00,1900-01-01 09:17:00,8501421,7,0,0
10657966,84.TA.66-130-j20-1.5.H,1900-01-01 10:16:00,1900-01-01 10:17:00,8501421,7,0,0


In [81]:
ex_monthey1 = df[df["trip_id"] == "79.TA.66-130-j20-1.5.H"]
# print all the stop names of the trip
for i, stop_id in enumerate(ex_monthey1["stop_id"]):
    print("Stop name {}: {}".format(i, df_stops[df_stops["stop_id"] == stop_id]["stop_name"].values[0]))

Stop name 0: St-Gingolph (Suisse)
Stop name 1: Bouveret
Stop name 2: Les Evouettes
Stop name 3: Vouvry
Stop name 4: Vionnaz
Stop name 5: Collombey
Stop name 6: Monthey
Stop name 7: Massongex
Stop name 8: St-Maurice
Stop name 9: Evionnaz
Stop name 10: Vernayaz
Stop name 11: Martigny
Stop name 12: Charrat-Fully
Stop name 13: Saxon
Stop name 14: Riddes
Stop name 15: Chamoson
Stop name 16: Ardon
Stop name 17: Châteauneuf-Conthey
Stop name 18: Sion
Stop name 19: St-Léonard
Stop name 20: Sierre/Siders
Stop name 21: Salgesch
Stop name 22: Leuk
Stop name 23: Turtmann
Stop name 24: Gampel-Steg
Stop name 25: Raron
Stop name 26: Visp
Stop name 27: Brig


In [82]:
ex_monthey1

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
10657904,79.TA.66-130-j20-1.5.H,1900-01-01 08:52:00,1900-01-01 08:52:00,8501428:0:1,1,0,0
10657905,79.TA.66-130-j20-1.5.H,1900-01-01 08:57:00,1900-01-01 08:59:00,8501426:0:1,2,0,0
10657906,79.TA.66-130-j20-1.5.H,1900-01-01 09:01:00,1900-01-01 09:01:00,8501425:0:1,3,0,0
10657907,79.TA.66-130-j20-1.5.H,1900-01-01 09:04:00,1900-01-01 09:04:00,8501424:0:1,4,0,0
10657908,79.TA.66-130-j20-1.5.H,1900-01-01 09:07:00,1900-01-01 09:07:00,8501423:0:1,5,0,0
10657909,79.TA.66-130-j20-1.5.H,1900-01-01 09:10:00,1900-01-01 09:10:00,8501422:0:1,6,0,0
10657910,79.TA.66-130-j20-1.5.H,1900-01-01 09:16:00,1900-01-01 09:17:00,8501421,7,0,0
10657911,79.TA.66-130-j20-1.5.H,1900-01-01 09:19:00,1900-01-01 09:19:00,8501420:0:1,8,0,0
10657912,79.TA.66-130-j20-1.5.H,1900-01-01 09:24:00,1900-01-01 09:36:00,8501403:0:3,9,0,0
10657913,79.TA.66-130-j20-1.5.H,1900-01-01 09:40:00,1900-01-01 09:40:00,8501404:0:2,10,0,0


In [84]:
df_stops[df_stops["stop_id"] == "8501403:0:3"]

Unnamed: 0,stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
6199,8501403:0:3,St-Maurice,46.216745,7.002044,,8501403P


As expected, there is a 12-minute wait in St-Maurice.  
arrival_time=09:24:00  
departure_time=09:36:00