# How many people are in Heathrow terminal 5 now?
I'm not aware of any footfall counting source, but I think the best way to determine this will be understanding the capacity of flights leaving and arriving, relative to the time now.

### Logic:
The number of people in Heathrow T5 right now is equal to: 
1. The capacity of flights leaving now and for the next 90 minutes __plus__ 
2. the capacity of flights which have landed in the last half an hour, as people don't tend to wait in the airport as long after they have landed.

### Steps:
1. Calculate number of seats on flights leaving t5 at __the present time + 90 minutes__
2. Calculate number of seats on flights which have __landed in the last 30 minutes__ as they don't tend to wait in the airport once they have arrived
3. Understanding the capacity of the planes from the plane type. Will need to match the flight # and plane type the flight number corresponds to

### Assumptions:
1. multiple plane types aren't assigned to the same plane number. i.e. BAW952 will always be an A380


### Blockers:
__Head count in each plane__  
presently I haven't found a source of information that tells you the capacity of a flight. This historical data could be bought from the airlines themselves and predicted
this script uses local UTC time within its filters on an updating datasource so it can be run at any time of day to get the most accurate answer

maybe i could go to an aggregator to source flight fullness. 

seats are in a different airport but the seat capacity seems wrong. 

__Terminal 5 issue__  
gates A1-A23, B32 - B48,  c52 - C66


__change of datasource__
- datasource is scraped from the live JSON that feeds the tables on https://www.heathrow.com/departures and https://www.heathrow.com/arrivals
- since I last looked at this, Heathrow have banned get requests to their server that feeds the website. Very annoying. 
- I've found something at https://flightaware.com/ that looks like it can deliver the same deal. 

## ongoing requirements/steps 
1. Can I turn my API call into a reusable function? Can I get it to run every 30 mins?
2. Can I print my results to a dataframe with date time?

In [13]:
import requests
import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize
from datetime import date
from datetime import timedelta
from bs4 import BeautifulSoup

#### Save date and time to variables and format them

#### Flight aware

API documentation is good and can be found at https://flightaware.com/commercial/flightxml/explorer

##### What we need:
- arrival time - last hour (day will do and we can filter)
- departure time
- airport (if we make this function modular we can likely turn this into a variable field
- plane type - needed for capacities


In [125]:
username = "elliotamcbride"
apiKey = 'bbc28d34ef1fa87b15a85578b596fab184af1942'
fxmlUrl = "https://flightxml.flightaware.com/json/FlightXML2/"
endpoint = 'Departed'

payload = {'airport': 'EGLL', 'howMany': 10}
response = requests.get(fxmlUrl + endpoint,
	params=payload, auth=(username, apiKey))

if response.status_code == 200:
	flight_payload = response.json()
	print("Good request")
else:
	print("Error executing request")

Good request


### Retrieve JSON headers for analysis

In [None]:
flight_payload['DepartedResult']['departures']

### Parse departure data into dataframe

In [120]:
departures_payload_df = json_normalize(flight_payload['DepartedResult']['departures'])

In [130]:
departures_payload_df.head(10)

Unnamed: 0,ident,aircrafttype,actualdeparturetime,estimatedarrivaltime,actualarrivaltime,origin,destination,originName,originCity,destinationName,destinationCity
0,AIC112,B788,1603917334,1603946100,0,EGLL,VIDP,London Heathrow,"London, England",Indira Gandhi Int'l,New Delhi
1,ETH701,B789,1603917107,1603944000,0,EGLL,HAAB,London Heathrow,"London, England",Bole Int'l,Addis Ababa
2,CCA938,,1603915762,0,0,EGLL,,London Heathrow,"London, England",,
3,BAW946,A319,1603915602,1603918320,0,EGLL,EDDL,London Heathrow,"London, England",Dusseldorf Int'l,Dusseldorf
4,BAW544,A319,1603915431,1603920360,0,EGLL,LIPE,London Heathrow,"London, England",Bologna (Guglielmo Marconi),Bologna
5,BAW958,A319,1603915205,1603920000,0,EGLL,EDDM,London Heathrow,"London, England",Munich Int'l,Munich
6,BAW444,A320,1603914611,1603916940,1603916940,EGLL,EHAM,London Heathrow,"London, England",Amsterdam Schiphol,Amsterdam
7,ETD69K,,1603914159,0,0,EGLL,,London Heathrow,"London, England",,
8,EIN937,A320,1603914044,1603917180,1603917180,EGLL,EGAC,London Heathrow,"London, England",George Best Belfast City,"Belfast, Northern Ireland"
9,BAW634,32Q,1603913949,1603924800,0,EGLL,LGAV,London Heathrow,"London, England","Athens Int'l, Eleftherios Venizelos",Athens


### Rename column headers

In [132]:
df = departures_payload_df.rename(columns={'actualdeparturetime': 'departureTime'})

df = df[['aircrafttype', 'departureTime', 'ident']]


display(df)

Unnamed: 0,aircrafttype,departureTime,ident
0,B788,1603917334,AIC112
1,B789,1603917107,ETH701
2,,1603915762,CCA938
3,A319,1603915602,BAW946
4,A319,1603915431,BAW544
5,A319,1603915205,BAW958
6,A320,1603914611,BAW444
7,,1603914159,ETD69K
8,A320,1603914044,EIN937
9,32Q,1603913949,BAW634


 ***for plane use to recreate the dataframes, delete when back online***

### Change epoch date to date time format

In [9]:
df['departureTime'] = pd.to_datetime(df['departureTime'],unit='s')

print(df.head(2))

  aircrafttype       departureTime
0              2020-10-28 18:12:13
1         A320 2020-10-28 18:02:21


### get plane capacities

In [None]:
dummy_plane_capacities = pd.DataFrame({
    "capacity": [350, 160, 450, 200],
    "aircrafttype": ['A319', 'A380', '777', 'B772'],
})

display(dummy_plane_capacities)

### Get BA and Iberia plane types

In [None]:
def scrapeforflights(location, fleet_name):
    url = location
    websiteurl = requests.get(url).text 
    soup = BeautifulSoup(websiteurl, 'lxml')
    My_table = soup.find('table',{'class':'wikitable'})
    mytablebody = My_table.find('tbody')
    links = mytablebody.findAll('a')
    
    fleet_name = []
    for link in links:
        fleet_name.append(link.get('title'))
    #good point + return
    good_value_BA_planeTypes = list(filter(None, fleet_name)) 
    display(good_value_BA_planeTypes)
    return good_value_BA_planeTypes

In [110]:
#remove None types
good_value_BA_planeTypes = list(filter(None, ba_planes)) 
display(good_value_BA_planeTypes)

['Wikipedia:Citation needed',
 'Airbus A319-100',
 'Airbus A320-200',
 'Airbus A320neo',
 'Airbus A321-200',
 'Airbus A321neo',
 'Airbus A321XLR',
 'Airbus A330-200',
 'Airbus A330-300',
 'Airbus A350-900']

In [116]:
scrapeforflights('https://en.wikipedia.org/wiki/British_Airways_fleet', 'Brit airways')

[None,
 None,
 'Airbus A319-100',
 'Airbus A320-200',
 'Airbus A320neo',
 None,
 'Airbus A321-200',
 'Airbus A321neo',
 None,
 'Airbus A350-1000',
 None,
 'Airbus A380-800',
 'Boeing 777-200ER',
 'Boeing 777-300ER',
 None,
 'Boeing 777-9',
 None,
 None,
 'Boeing 787-8',
 'Boeing 787-9',
 'Boeing 787-10',
 None,
 None]

[None,
 None,
 'Airbus A319-100',
 'Airbus A320-200',
 'Airbus A320neo',
 None,
 'Airbus A321-200',
 'Airbus A321neo',
 None,
 'Airbus A350-1000',
 None,
 'Airbus A380-800',
 'Boeing 777-200ER',
 'Boeing 777-300ER',
 None,
 'Boeing 777-9',
 None,
 None,
 'Boeing 787-8',
 'Boeing 787-9',
 'Boeing 787-10',
 None,
 None]

In [117]:
#convert to data frame:
good_value_BA_planeTypes_df = pd.DataFrame()
    
good_value_BA_planeTypes_df['BA_planetypes'] = good_value_BA_planeTypes

#get rid of alfavalues
ba_plane_numbers = pd.DataFrame()

ba_plane_numbers['BA_planetypes'] = good_value_BA_planeTypes_df['BA_planetypes'].str.replace('Airbus ', '').str.replace('Boeing ', '').str.replace('neo', '')

print(good_value_BA_planeTypes_df)
#sweet

               BA_planetypes
0  Wikipedia:Citation needed
1            Airbus A319-100
2            Airbus A320-200
3             Airbus A320neo
4            Airbus A321-200
5             Airbus A321neo
6             Airbus A321XLR
7            Airbus A330-200
8            Airbus A330-300
9            Airbus A350-900


In [None]:
url = 'https://en.wikipedia.org/wiki/Iberia_(airline)'
websiteurl = requests.get(url).text 
soup = BeautifulSoup(websiteurl, 'lxml')

My_table = soup.find('table',{'class':'wikitable'})

print(My_table.prettify())
# links = My_table.find('a')

# fleet_name = []
    
# for link in links:
#     fleet_name.append(links.get('title'))
    
# fleet_name 

In [79]:
scrapeforfleet('https://en.wikipedia.org/wiki/Iberia_(airline)', 'Iberia')

[None]

### merge datasets

In [None]:
merged_flight_and_aircraft = dummy_flight_on_flight.merge(dummy_plane_capacities, on="aircrafttype", how="left")

print(merged_flight_and_aircraft)

In [10]:
today = date.today()

timenow = datetime.datetime.now()

stringoftimernow = timenow.strftime("%H:%M:%S")

print (str(today))
print(timenow)
print(stringoftimernow)

2020-10-25
2020-10-25 10:20:49.018556
10:20:49


### ***now get current time + 90 minutes***

Current time values

In [10]:
today = date.today()

timenow = datetime.datetime.now()

stringoftimernow = timenow.strftime("%H:%M:%S")

print (str(today))
print(timenow)
print(stringoftimernow)

2020-10-25
2020-10-25 10:20:49.018556
10:20:49


In [65]:
#filter
#create time filter values

timeplus90 = datetime.datetime.now() + timedelta(hours = 1.5)

time90 = timeplus90.strftime("%H:%M:%S")

#filter out non-relevant times
shortdf = df[(df.departureTime >= timenow) & (df.departureTime <= time90)]

depflights = shortdf.shape[0]

print(depflights)

0


## Filter to get t5 terminal only

### Terminal 5 issue
When I was scraping off the Heathrow page i could subset by terminal, unfortunately terminal information is not available on this API. However information tells us that only Iberia and British airways fly from T5 so I can just filter my results to those. I need to check that no other British airways or Iberia flights leave from outside Terminal 5 and for this - heathrow haven't blocked their jsons for terminal information so I can go scrape it from here:

https://www.heathrow.com/bin/heathrow/terminalLanding.en.json

Annoyingly, I can't scrape the terminal information. And I found out that Iberia and Heathrow also can fly from Terminal 3 so that filtering by carrier will be inaccurate. This sucks, the alternative is that I find out what routes BA and Iberia are from T5. This will likely involve destination information which I can't get now as I'm on a plane. I'll block this out and come back later. 

In [32]:
#Departure info request

# terminal_info = 'https://www.heathrow.com/bin/heathrow/terminalLanding.en.json'

# terminal_info_resp = requests.get(terminal_info)

# if terminal_info_resp.status_code == 200:
# 	terminal_flight_payload = terminal_info_resp.json()
# 	print("Good request")
# else:
# 	print("Error executing request")

Good request


In [None]:
Print head of terminal info

In [None]:
t5 = df[df.terminal == '5']

In [None]:
### mutate colum with just time.

In [None]:
t5['times'] = t5['datentime'].dt.time

---

# Arrival Flights

---
__ARRIVAL Data taken from:__ https://www.heathrow.com/arrivals
https://www.heathrow.com/fihub/activeArrivals/2019-11-19Z

In [139]:
arrurl = 'https://www.heathrow.com/fihub/activeArrivals/' + str(today) + 'Z'

arresp = requests.get(arrurl)
arf = arresp.json()

print(arf.keys())


dict_keys(['header', 'flightSummaryList', 'references'])


In [140]:
# Same wrangle different headers - so no for loop, Sorry!

arf_payload = json_normalize(arf['flightSummaryList']['flight'])

arf_df = arf_payload.rename(columns={'destination.terminalCode': 'terminal', 'destination.scheduledDateTime.local': 'datentime'})

arf_df = arf_df[['terminal', 'datentime']]

arf_df['datentime'] = pd.to_datetime(arf_df['datentime'], format='%Y-%m-%d')

#filter terminal
t5 = arf_df[arf_df.terminal == '5']

arf_df['times'] = arf_df['datentime'].dt.time

In [141]:
#filters

timeminus30 = datetime.datetime.now() + timedelta(hours = -.5)

arr_shortdf = arf_df[(arf_df.datentime >= timeminus30) & (arf_df.datentime <= timenow)]

arrflights = arr_shortdf.shape[0]

print(arrflights)

42


#there are 416 seats in a 747 and as we cannot scrape the airplane information
we are also going to assume that on average a plane is 80% full
there are 416 seats in a 747

In [143]:
departures = (depflights * 186) * .8

arrivals = (arrflights*186) * .8

#Total number of people in Heathrow T5 right now is (rounded to the nearest person)

print(round(departures + arrivals))

21278
