<h1> <a href="https://gtfs.org/">GTFS: General Transit Feed Specification</a></h1>

Around the world, public transit agencies publish data about their services, routes, and stops in a standardized format called <a href="https://gtfs.org/">GTFS</a> (originally developed by Google).

It has two parts. The static component contains information that changes rarely, including the locations of stops, routes, and schedules; a new version of this static information is typically released every few months. Some agencies also provide a real-time component, based on live GPS data from their buses, trains, etc., that gives up-to-the-minute vehicle positions and arrival predictions, typically updated every 30 seconds.

This practical exercise will be based on only the static GTFS data.

Start by downloading the current GTFS schedule data for South East Queensland from:
https://gtfsrt.api.translink.com.au/ (https://gtfsrt.api.translink.com.au/GTFS/SEQ_GTFS.zip)

You will need to upload the following files to your Jupyter account in the cloud:
- <code>calendar.txt</code>
- <code>routes.txt</code>
- <code>stops.txt</code>
- <code>stop_times.txt</code>
- <code>trips.txt</code>
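If you downloaded the full archive, the five files can be pulled out of it before uploading. The following is a minimal sketch using Python's standard `zipfile` module. The function name `extract_gtfs_files` and the paths in the usage note are hypothetical, and the sketch assumes the files sit at the top level of the archive.

```python
import zipfile
from pathlib import Path

# The five files this exercise uses.
NEEDED_FILES = ["calendar.txt", "routes.txt", "stops.txt", "stop_times.txt", "trips.txt"]

def extract_gtfs_files(zip_path, dest_dir, names=NEEDED_FILES):
    """Extract only the named members from a GTFS zip and return those found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        available = set(archive.namelist())
        for name in names:
            # Skip any file the archive does not contain instead of failing.
            if name in available:
                archive.extract(name, path=dest)
                extracted.append(name)
    return extracted
```

For example, `extract_gtfs_files("SEQ_GTFS.zip", "gtfs")` would place the files it finds in a `gtfs` folder, ready to upload.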

# Finding our way to the CBD via public transport
Our goal is to travel from where we live to the Brisbane CBD via public transport.
We don't know where the closest stop is, which routes the trains or buses follow, or when they will arrive.

Once you have <code>stops.txt</code> uploaded to your Jupyter account, open it from the Jupyter File Browser to view its contents.

In [None]:
# Start by reading stops.txt into a Pandas data frame using the read_csv method and set the stop_id column as the index

import pandas
stops = # insert your code here

# display its contents
stops
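To illustrate `read_csv` with an index column, here is a sketch that reads a tiny in-memory table rather than the real file. The two rows are made-up sample data, not taken from the Translink feed.

```python
import io
import pandas

# Made-up sample rows standing in for a GTFS-style stops file.
sample_csv = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "A1,Example Street,-27.47,153.02\n"
    "B2,Sample Road,-27.40,153.05\n"
)

# index_col makes the named column the row index instead of a regular column.
sample_stops = pandas.read_csv(sample_csv, index_col="stop_id")

# Rows can then be looked up directly by their id.
print(sample_stops.loc["A1", "stop_name"])
```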

In [None]:
# There are thousands of stops across south east Queensland. 
# Our first goal is to find some stops near where we live.

# We start by determining the longitude and latitude of the property where we live.
# Open Google Maps (https://www.google.com/maps) and locate the property where you currently live.
# Put a pin on that location and make note of the longitude and latitude. 
# The longitude should be about 153 and the latitude about -27

my_longitude = # insert your longitude
my_latitude = # insert your latitude

In [None]:
# Next we need to be able to measure the distance from our property to each of the stops. 
# To measure the distance between two longitude/latitude pairs, we need a formula
# such as the haversine formula (https://en.wikipedia.org/wiki/Haversine_formula), which gives the
# distance between two points on a sphere (since the earth is not flat).
# The earth is not a perfect sphere and its radius varies from place to place, but we approximate it as 6371 kilometres.

import math

def haversine_distance(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1 = math.radians(lon1)
    lat1 = math.radians(lat1)
    lon2 = math.radians(lon2)
    lat2 = math.radians(lat2)

    # haversine formula
    delta_lon = lon2 - lon1
    delta_lat = lat2 - lat1
    a = math.sin(delta_lat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(delta_lon / 2)**2
    c = 2 * math.asin(math.sqrt(a))
    r = 6371  # Radius of earth in kilometres.
    return c * r
    
# Test case: Brisbane CBD to Nudgee (longitude first, then latitude, matching the function signature)
haversine_distance(153.019079, -27.467834, 153.099357, -27.371936) # should be about 13 kilometres
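For reference, the function above computes the haversine formula in the following form, where $\varphi$ denotes latitude and $\lambda$ denotes longitude (both in radians), $r$ is the approximate radius of the earth, and $d$ is the resulting distance:

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2\,r\,\arcsin\!\left(\sqrt{a}\right)
```

In the code, `delta_lat` and `delta_lon` correspond to $\varphi_2 - \varphi_1$ and $\lambda_2 - \lambda_1$, and `c` corresponds to $2\arcsin(\sqrt{a})$.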

In [None]:
# We can then use this function to compute the distance from our specified longitude and latitude to each stop

def near(stop_row, lon, lat):
    # haversine_distance expects longitude before latitude for both points
    return haversine_distance(lon, lat, stop_row.stop_lon, stop_row.stop_lat)

stops['dist_from_home'] = stops.apply(near, lon=my_longitude, lat=my_latitude, axis=1)
stops # see the new column ...

In [None]:
# We can then sort the stops by this new column using the sort_values method

nearby_stops = # insert your code here
nearby_stops
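A sketch of `sort_values` on a small made-up frame. The column name mirrors the one created earlier, but the ids and distances are invented for illustration.

```python
import pandas

toy = pandas.DataFrame(
    {"dist_from_home": [2.5, 0.4, 1.1]},
    index=pandas.Index(["A1", "B2", "C3"], name="stop_id"),
)

# sort_values returns a new frame ordered by the given column (ascending by default);
# the original frame is left unchanged.
toy_sorted = toy.sort_values("dist_from_home")

# The first index entry is now the id with the smallest distance.
print(toy_sorted.index[0])
```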

In [None]:
# Let's choose the first of these stops and see which buses or trains are coming soon and where they are going to ...
our_stop_id = nearby_stops.index[0]
our_stop_id

In [None]:
# Read stop_times.txt into a data frame using the read_csv method.
# Set the data type of the stop_id column to type string by adding parameter: dtype={'stop_id':'str'}

stop_times = # insert your code here

In [None]:
# View just those stop_time rows that match our stop_id

# insert your code here

In [None]:
# Not all of those trips will necessarily be coming today. 
# Transit agencies run different schedules on different days of the week, especially for weekends and public holidays.
# To learn about these service schedules we need to load the calendar.txt file into a data frame.
# Set the service_id column as the index and parse the two date columns as dates

services = # insert your code here
services

In [None]:
# Start by viewing only those services that run on this day of the week.
# So, for example, if today is a Thursday, then we require services.thursday == 1

# insert your code here
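Rather than hard-coding the day, the matching column name can be derived from a date. The following is a minimal sketch using only the standard library. `weekday_column` is a hypothetical helper name, and the sketch relies on calendar.txt using the lowercase English day names that the GTFS specification defines as column headers.

```python
import datetime

# calendar.txt has one 0/1 column per day, named in lowercase English.
DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def weekday_column(date):
    """Return the calendar.txt column name for the given date."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return DAY_COLUMNS[date.weekday()]

print(weekday_column(datetime.date(2024, 1, 4)))  # 4 January 2024 was a Thursday
```

Given a date object `d`, the column can then be selected dynamically with an expression such as `services[weekday_column(d)] == 1`.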

In [None]:
# We also need to ensure that today falls within the start_date and end_date period of that service.
# For that we need to know today's date ...
import pytz
timezone = pytz.timezone('Australia/Brisbane')
today = pandas.Timestamp.now(tz=timezone).tz_localize(None)

In [None]:
# Find the list of service_ids for services that run today and are within the service start and end dates

todays_services = # insert your code here
todays_services
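One way to combine the day-of-week test with the date-range test is a single boolean mask. The sketch below uses a made-up services table; the service ids and dates are invented and do not come from the real calendar.txt.

```python
import pandas

toy_services = pandas.DataFrame(
    {
        "thursday": [1, 1, 0],
        "start_date": pandas.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01"]),
        "end_date": pandas.to_datetime(["2024-01-31", "2024-02-29", "2024-01-31"]),
    },
    index=pandas.Index(["WEEKDAY_JAN", "WEEKDAY_FEB", "WEEKEND_JAN"], name="service_id"),
)
a_thursday = pandas.Timestamp("2024-01-04")

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds more tightly than == and <=.
running = toy_services[
    (toy_services.thursday == 1)
    & (toy_services.start_date <= a_thursday)
    & (toy_services.end_date >= a_thursday)
]
running_ids = list(running.index)
print(running_ids)
```

Note that a timestamp carrying a time of day compares as later than midnight on the same date, so on the final day of a service the `end_date` test would fail. Calling `normalize()` on the timestamp first, which resets the time to midnight, avoids this.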

In [None]:
# Next we need to learn which trips occur on those services, so we need to load trips.txt into a Pandas data frame.
# Set the trip_id column as the index.

trips = # insert your code here
trips

In [None]:
# To test if a trip is part of a service, we can use the isin method
# trips.service_id.isin(todays_services)

# Find the list of trip_ids for those trips
todays_trips = # insert your code here

todays_trips
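A sketch of how `isin` can filter one table by a list of ids taken from another. The trip and service ids are made up for illustration.

```python
import pandas

toy_trips = pandas.DataFrame(
    {"service_id": ["WEEKDAY", "WEEKEND", "WEEKDAY"]},
    index=pandas.Index(["T1", "T2", "T3"], name="trip_id"),
)
services_today = ["WEEKDAY"]

# isin returns True for each row whose value appears in the given collection.
mask = toy_trips.service_id.isin(services_today)

# Indexing with the mask keeps only the matching rows; their index holds the trip ids.
trip_ids_today = list(toy_trips[mask].index)
print(trip_ids_today)
```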

In [None]:
# We can then use this list of trip ids to find stop times matching these trip ids.
# stop_times.trip_id.isin(todays_trips)

# Find all stop times that stop at our stop today.
# insert your code here

In [None]:
# We aren't interested in trying to catch any trains or buses that have already departed, 
# so view only those stop times that have an arrival_time after the time now.

time_now = today.strftime('%H:%M:%S')

arriving_soon = # insert your code here
arriving_soon
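Comparing the times as strings works when every value is zero-padded to HH:MM:SS. The GTFS specification also permits H:MM:SS, and hours of 24 or more for trips that run past midnight on the same service day, so a numeric comparison can be more robust. The following is a sketch, with `gtfs_time_to_seconds` as a hypothetical helper name:

```python
def gtfs_time_to_seconds(text):
    """Convert a GTFS 'H:MM:SS' or 'HH:MM:SS' string to seconds from the start of the service day."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds

# String comparison misorders an unpadded time; numeric comparison does not.
print("9:05:00" > "10:00:00")
print(gtfs_time_to_seconds("9:05:00") > gtfs_time_to_seconds("10:00:00"))
```

The first comparison is decided by the leading characters `'9'` and `'1'`, which is why an unpadded 9:05 sorts after 10:00 as a string.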

In [None]:
# That's great, but we don't know where any of these trains or buses are going to ...
# So, we start by joining this stop_time data with the trips data frame
stops_with_trips = arriving_soon.join(trips, on='trip_id')
stops_with_trips

In [None]:
# We now have a trip_headsign column, which may help us determine where the bus or train is going
# We also now have a route_id, but it's not particularly meaningful.
# To get information about the route we need to join our stop_time and trip data with the routes.txt data.

In [None]:
# Read routes.txt into a Pandas data frame.
# Set the route_id column as the index
routes = # insert your code here

In [None]:
# Join our stop_time and trip data frame with the routes data frame based on the 'route_id' column

# insert your code here

In [None]:
# Filter the output so that we only see the trip_id, arrival_time, route_short_name, route_long_name and trip_headsign columns

summary = # insert your code here
summary

In [None]:
# Let's select one of those trips to explore precisely where it goes ...
our_trip_id = summary.iloc[0,0]

In [None]:
# Find all stop_times for our trip_id (do not restrict to our stop_id)

my_stops = # insert your code here
my_stops

In [None]:
# Unfortunately, these stop_ids don't mean anything to us,
# so we need to join this data with the stops data frame
# Display only the arrival_time and stop_name columns.

# insert your code here
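A sketch of joining on a column against another frame's index and then keeping two columns. Both frames are made up; the string ids reflect the effect of reading stop_id with `dtype={'stop_id':'str'}`.

```python
import pandas

toy_stop_times = pandas.DataFrame(
    {
        "trip_id": ["T1", "T1"],
        "arrival_time": ["08:00:00", "08:07:00"],
        "stop_id": ["10", "20"],  # strings, not integers
    }
)
toy_stops = pandas.DataFrame(
    {"stop_name": ["Example Street", "Sample Road"]},
    index=pandas.Index(["10", "20"], name="stop_id"),
)

# join matches the left frame's stop_id column against the right frame's index.
# The key types must agree, which is why stop_id is kept as a string on both sides.
named = toy_stop_times.join(toy_stops, on="stop_id")[["arrival_time", "stop_name"]]
print(named)
```

Selecting with a list of column names, as in the final step, returns a frame containing only those columns in the order given.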

In [None]:
# Will this get us towards the Brisbane CBD? If not, explore some other options.