We now have two tables: the user profiles table (condensed into our 'Variations' table, with county-level granularity for origin) and the demand table (which tells us where the traffic from each county goes, with parish-level granularity for destination). Combining them, we can estimate, for **each demographic**, the **variation in travellers on each county-parish route**.  
We can use this information to pinpoint churn more accurately.

In [1]:
import pandas as pd

Loading our variation table and the demand table, in which the dicofre code has already been translated to the parish name

In [2]:
variation = pd.read_csv('intermediate-data/variation.csv')
demand_parish = pd.read_csv('intermediate-data/demand.csv')

Merging both tables

In [3]:
parish_variation = pd.merge(demand_parish, variation,
                    on=['County_of_Origin'])
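As a rough illustration of what this merge produces, here is a minimal sketch on toy data. The frames `demand_toy` and `variation_toy` and all of their values are invented for this example. Every demand row of a county is paired with every demographic row of the same county, and columns that exist in both tables (other than the merge key) get the `_x` (left table) and `_y` (right table) suffixes.

```python
import pandas as pd

# Hypothetical toy data: values are invented for illustration only.
demand_toy = pd.DataFrame({
    'County_of_Origin': ['Lisboa', 'Lisboa', 'Porto'],
    'Region_of_Origin': ['R1', 'R1', 'R2'],
    'Parish_of_Public_Transportation': ['A', 'B', 'C'],
    'Demand_weight': [0.75, 0.25, 1.0],
})
variation_toy = pd.DataFrame({
    'County_of_Origin': ['Lisboa', 'Lisboa', 'Porto'],
    'Region_of_Origin': ['R1', 'R1', 'R2'],
    'GenderDescription': ['Female', 'Male', 'Female'],
    'Variation_abs': [-40.0, 20.0, 10.0],
})

# Same call shape as above: an inner join on the county of origin.
merged_toy = pd.merge(demand_toy, variation_toy, on=['County_of_Origin'])

# 'Region_of_Origin' appears in both inputs, so it is duplicated with suffixes.
print(merged_toy.columns.tolist())
# 2 parishes x 2 demographics for Lisboa, plus 1 x 1 for Porto.
print(len(merged_toy))
```

This is why the duplicated `_x` columns can be dropped later without losing information.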

By multiplying the weight of each parish in the total demand of its county by the variation in travellers for that county, we can estimate the variation in travellers for each county-parish route. This is only an estimate: it assumes that the variation in the number of travellers is similar across the parishes reached from the same county.

In [4]:
parish_variation['Parish_variation_abs'] = parish_variation.Demand_weight * parish_variation.Variation_abs
parish_variation['Parish_variation_rel'] = parish_variation.Demand_weight * parish_variation.Variation_rel
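To make the estimate concrete, here is a small sketch with made-up numbers (the frame `toy` is hypothetical). One county loses 40 travellers in a given demographic, and its demand is split 75% / 25% between two destination parishes. Assuming the weights of a county sum to 1, the parish estimates add back up to the county-level variation.

```python
import pandas as pd

# Hypothetical example: one county, one demographic, two destination parishes.
toy = pd.DataFrame({
    'County_of_Origin': ['X', 'X'],
    'Parish_of_Public_Transportation': ['A', 'B'],
    'Demand_weight': [0.75, 0.25],    # share of the county's demand per parish
    'Variation_abs': [-40.0, -40.0],  # county-level variation, repeated per row
})

# Same formula as above: weight times county-level variation.
toy['Parish_variation_abs'] = toy.Demand_weight * toy.Variation_abs

print(toy[['Parish_of_Public_Transportation', 'Parish_variation_abs']])
# The parish estimates sum to the county-level variation.
print(toy['Parish_variation_abs'].sum())
```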

Dropping columns that do not refer to a specific county-parish route 

In [5]:
# Use the `columns` keyword: passing the axis positionally (`drop([...], 1)`)
# is no longer accepted in pandas 2.0 and later.
parish_variation = parish_variation.drop(columns=['Region_of_Origin_x',
                                                  'District_of_Origin_x',
                                                  'Average_BusUsers_per_Day_first',
                                                  'Average_BusUsers_per_Day_second',
                                                  'Variation_abs', 'Variation_rel'])
# Note: removed duplicate columns, and columns not related to specific county-parish routes

Reordering columns and changing some column names

In [6]:
reordered_columns = ['Region_of_Origin_y', 'District_of_Origin_y', 'County_of_Origin',
                     'Region_of_Public_Transportation', 'District_of_Public_Transportation',
                     'County_of_Public_Transportation', 'Parish_of_Public_Transportation',
                     'GenderDescription', 'AgeClassDescription', 'Demand_weight',
                     'Parish_variation_abs', 'Parish_variation_rel']
parish_variation = parish_variation[reordered_columns]

In [7]:
parish_variation = parish_variation.rename(
                        columns={'Region_of_Origin_y': 'Region_of_Origin',
                                'District_of_Origin_y': 'District_of_Origin',
                                'Demand_weight': 'Demand_rel_county'})

In [8]:
parish_variation.to_csv('intermediate-data/parish-variation.csv', index=False)

Also saving a very simplified version of this table

In [9]:
simplified_columns = ['County_of_Origin', 'Parish_of_Public_Transportation', 
                      'GenderDescription', 'AgeClassDescription',
                      'Parish_variation_abs','Parish_variation_rel']

In [10]:
parish_variation[simplified_columns].to_csv('intermediate-data/parish-variation-simplified.csv', index=False)