In this notebook we add an **estimate of the daily number of travellers** to the **Demand** table.  
The estimate is based on the Demand_weight field, which is relative to the county. The number of travellers from each county is a weighted mean obtained by grouping data from the Users profile/Variation tables.

In [2]:
import pandas as pd

In [3]:
variation = pd.read_csv('intermediate-data/variation.csv')
demand = pd.read_csv('intermediate-data/demand.csv')

We start by grouping by the county of origin and summing all travellers from each county.

In [4]:
county_sum = variation.groupby('County_of_Origin')[['Average_BusUsers_per_Day_first', 'Average_BusUsers_per_Day_second']].sum().reset_index()
county_sum.head(3)

Unnamed: 0,County_of_Origin,Average_BusUsers_per_Day_first,Average_BusUsers_per_Day_second
0,Alcochete,148.504551,157.793211
1,Almada,8328.202617,7266.927503
2,Amadora,8474.874368,6913.452852


We then calculate the mean number of travellers across these two periods, taking into account that the "Sep-19 to Feb-20" period covers 6 months and the "Sep-20 to Jan-21" period covers 5 months.

In [5]:
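# Weight each period by its length in months (6 + 5 = 11)
county_sum['user_av'] = (county_sum['Average_BusUsers_per_Day_first'] * 6 / 11
                         + county_sum['Average_BusUsers_per_Day_second'] * 5 / 11)
# Pass columns= explicitly; the positional axis argument was removed in pandas 2.0
county_sum = county_sum.drop(columns=['Average_BusUsers_per_Day_first', 'Average_BusUsers_per_Day_second'])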
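
The weighting can be checked by hand on a single row. The sketch below recomputes the figure for Alcochete from the two sums shown in the `county_sum` output above. The helper name `weighted_daily_mean` is hypothetical and not part of the notebook; the 6 and 5 month weights are the ones stated in the text.

```python
def weighted_daily_mean(first, second, months_first=6, months_second=5):
    """Average two daily figures, weighting each by the length of its period."""
    total_months = months_first + months_second
    return (first * months_first + second * months_second) / total_months

# Alcochete sums copied from the county_sum output (first and second period)
alcochete = weighted_daily_mean(148.504551, 157.793211)
print(round(alcochete, 2))
```

For Alcochete this comes out at roughly 152.7 travellers per day, slightly closer to the first-period value because that period is one month longer.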

We can now merge this into the demand table. Multiplying these values by the demand weight gives an estimate of the average number of travellers on each County-Parish "route".

In [7]:
demand_people = pd.merge(demand, county_sum,
                    right_on=['County_of_Origin'],left_on=['County_of_Public_Transportation'])
demand_people['Avg_daily_travellers'] = demand_people['Demand_weight'] * demand_people['user_av']
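
`pd.merge` performs an inner join by default, so demand rows whose county has no counterpart in `county_sum` are dropped without warning. A minimal sketch of how this could be checked, using toy frames with invented values (`demand_toy` and `county_toy` are hypothetical stand-ins for the real tables):

```python
import pandas as pd

# Toy stand-ins for the real tables; all values are made up for illustration
demand_toy = pd.DataFrame({
    'County_of_Public_Transportation': ['Almada', 'Almada', 'Lisboa'],
    'Demand_weight': [0.6, 0.4, 1.0],
})
county_toy = pd.DataFrame({
    'County_of_Origin': ['Almada', 'Amadora'],
    'user_av': [7800.0, 7700.0],
})

# how='left' keeps every demand row, indicator=True adds a '_merge' column,
# and validate raises if county_toy lists the same county more than once
checked = pd.merge(demand_toy, county_toy, how='left',
                   left_on='County_of_Public_Transportation',
                   right_on='County_of_Origin',
                   indicator=True, validate='many_to_one')

# Counties present in the demand table but missing from the county totals
unmatched = checked.loc[checked['_merge'] == 'left_only',
                        'County_of_Public_Transportation'].unique().tolist()
print(unmatched)
```

An empty list on the real data would mean the inner join loses no demand rows.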

In [8]:
demand_people.sort_values('Avg_daily_travellers', ascending=False).head(3)

Unnamed: 0,Region_of_Origin,District_of_Origin,County_of_Origin_x,Region_of_Public_Transportation,District_of_Public_Transportation,County_of_Public_Transportation,Parish_of_Public_Transportation,Demand_weight,County_of_Origin_y,user_av,Avg_daily_travellers
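
If the demand weights of each county sum to 1 (an assumption, suggested by the weight being "relative to the county" but not verified here), the route estimates of a county should add back up to that county's `user_av`. A rough sketch of that consistency check on toy data with invented values:

```python
import pandas as pd

# Toy data; values are invented for illustration only
toy = pd.DataFrame({
    'County_of_Public_Transportation': ['Almada', 'Almada', 'Amadora'],
    'Demand_weight': [0.6, 0.4, 1.0],
    'user_av': [7800.0, 7800.0, 7700.0],
})
toy['Avg_daily_travellers'] = toy['Demand_weight'] * toy['user_av']

# Per county: total weight, total estimated travellers, and the county average
per_county = toy.groupby('County_of_Public_Transportation').agg(
    weight_sum=('Demand_weight', 'sum'),
    travellers=('Avg_daily_travellers', 'sum'),
    user_av=('user_av', 'first'),
)

# Counties whose weights do not sum to 1, within a small tolerance
bad = per_county[(per_county['weight_sum'] - 1).abs() > 1e-6]
print(bad.index.tolist())
```

Any county listed by this check on the real table would have route estimates that do not sum to its county-level average.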


In [7]:
demand_people.to_csv('intermediate-data/demand-people.csv', index=False)