# Google Community Mobility Report (on COVID-19)

**DISCLAIMER**: This data is updated more often than the structure of this report, so much of the text will seem out of date unless it is read around the time the report was created (12th May 2020).

Below is a short report and analysis of the Google Community Mobility data for the UK. It uses the data provided by Google [here](https://www.google.com/covid19/mobility/) as well as data available through the Johns Hopkins [GitHub](https://github.com/CSSEGISandData/COVID-19).

```python
import pandas
import plotly.graph_objects as go
import plotly.offline as pyo
from statsmodels.tsa.seasonal import seasonal_decompose, STL

pyo.init_notebook_mode(connected=True)
```

```python
google_url = 'https://raw.githubusercontent.com/nshyam97/Google-Community-Mobility-Data/master/UK_Global_Mobility_Report.csv'
community_data = pandas.read_csv(google_url)
community_data.head()
```

| | country_region | sub_region_1 | date | retail_and_recreation_percent_change_from_baseline | grocery_and_pharmacy_percent_change_from_baseline | parks_percent_change_from_baseline | transit_stations_percent_change_from_baseline | workplaces_percent_change_from_baseline | residential_percent_change_from_baseline |
|---|---|---|---|---|---|---|---|---|---|
| 0 | United Kingdom | NaN | 2020-02-15 | -12.0 | -7.0 | -35.0 | -12.0 | -4.0 | 2.0 |
| 1 | United Kingdom | NaN | 2020-02-16 | -7.0 | -6.0 | -28.0 | -7.0 | -3.0 | 1.0 |
| 2 | United Kingdom | NaN | 2020-02-17 | 10.0 | 1.0 | 24.0 | -2.0 | -14.0 | 2.0 |
| 3 | United Kingdom | NaN | 2020-02-18 | 7.0 | -1.0 | 20.0 | -3.0 | -14.0 | 2.0 |
| 4 | United Kingdom | NaN | 2020-02-19 | 6.0 | -2.0 | 8.0 | -4.0 | -14.0 | 3.0 |


Above is the data made available in the Google Community Mobility Report. The full CSV file provided by Google covers many countries and, within each country, sub-regions. For this analysis, and because of GitHub's storage limits, I am only using the UK data, which also includes sub-regions. For most of the analysis I will look at the UK as a whole, but later on I will look further into the differences between regions of the country.

The data also covers several types of location. Before the main part of the analysis, it is worth defining each of them as Google classifies them.

**Retail**: Places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.

**Grocery**: Places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.

**Parks**: Places like national parks, public beaches, marinas, dog parks, plazas, and public gardens.

**Transit Stations**: Places like public transport hubs such as subway, bus, and train stations.

**Workplace**: Places of work.

**Residential**: Places of residence.

```python
max(community_data.date)
```

```
'2020-05-07'
```

We can see that, at the time of writing (12th May 2020), the dataset covers 15th February 2020 to 7th May 2020. Google states on its website that the data is about 2-3 days behind, which is why it is not fully up to date.

The data also breaks mobility down by type of location, which will allow us to see the difference between various sectors.

First I'm going to clean the data slightly to make it easier to plot and understand: shorten the column names, make the date column the index, and keep only the rows for the UK as a whole (those with no `sub_region_1`).

```python
# Use the date as the index and shorten the long column names
community_data = community_data.set_index('date')
community_data = community_data.rename(columns={
    'retail_and_recreation_percent_change_from_baseline': 'retail',
    'grocery_and_pharmacy_percent_change_from_baseline': 'grocery',
    'parks_percent_change_from_baseline': 'parks',
    'transit_stations_percent_change_from_baseline': 'transit-stations',
    'workplaces_percent_change_from_baseline': 'workplace',
    'residential_percent_change_from_baseline': 'residential'})
# Rows without a sub-region hold the figures for the UK as a whole
community_data_UK = community_data[community_data['sub_region_1'].isnull()]
community_data_UK.head()
```

| date | country_region | sub_region_1 | retail | grocery | parks | transit-stations | workplace | residential |
|---|---|---|---|---|---|---|---|---|
| 2020-02-15 | United Kingdom | NaN | -12.0 | -7.0 | -35.0 | -12.0 | -4.0 | 2.0 |
| 2020-02-16 | United Kingdom | NaN | -7.0 | -6.0 | -28.0 | -7.0 | -3.0 | 1.0 |
| 2020-02-17 | United Kingdom | NaN | 10.0 | 1.0 | 24.0 | -2.0 | -14.0 | 2.0 |
| 2020-02-18 | United Kingdom | NaN | 7.0 | -1.0 | 20.0 | -3.0 | -14.0 | 2.0 |
| 2020-02-19 | United Kingdom | NaN | 6.0 | -2.0 | 8.0 | -4.0 | -14.0 | 3.0 |

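The renaming above spells out every column by hand. As an alternative sketch (not the author's code), the shared `_percent_change_from_baseline` suffix can be stripped programmatically, and `parse_dates` gives a real `DatetimeIndex`, which makes date slicing easier later on. The inline CSV is just the first two national rows from the output above, so the example runs offline. Note that this produces slightly longer names (for example `retail_and_recreation`) than the hand-picked ones.

```python
import io

import pandas

# First two national rows of the UK file (copied from the output above)
csv_text = (
    'country_region,sub_region_1,date,'
    'retail_and_recreation_percent_change_from_baseline,'
    'grocery_and_pharmacy_percent_change_from_baseline,'
    'parks_percent_change_from_baseline,'
    'transit_stations_percent_change_from_baseline,'
    'workplaces_percent_change_from_baseline,'
    'residential_percent_change_from_baseline\n'
    'United Kingdom,,2020-02-15,-12.0,-7.0,-35.0,-12.0,-4.0,2.0\n'
    'United Kingdom,,2020-02-16,-7.0,-6.0,-28.0,-7.0,-3.0,1.0\n'
)

SUFFIX = '_percent_change_from_baseline'

# parse_dates converts the date strings to datetime64 values
data = pandas.read_csv(io.StringIO(csv_text), parse_dates=['date']).set_index('date')
# Strip the shared suffix from every column name that has it
data.columns = data.columns.str.replace(SUFFIX, '', regex=False)

print(data.columns.tolist())
print(data.loc[pandas.Timestamp('2020-02-16'), 'parks'])
```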

First, to get a feel for the data, I've plotted the percentage change in visits to retail locations. There was a sharp fall in the days leading up to the lockdown announcement and a further sharp drop on the day after it was made. After that, the values fluctuate with the day of the week, but no clear longer-term trend appears.

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.retail, mode='lines'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='Mobility Changes to Retail Locations (UK)')
# Vertical red line on the day the lockdown was announced
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
# Horizontal dotted line at the baseline (0% change)
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

We can take this a step further by plotting every location category together to see how they compare.

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.retail, mode='lines', name='Retail'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.grocery, mode='lines', name='Grocery'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.parks, mode='lines', name='Parks'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK['transit-stations'], mode='lines', name='Transit'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.workplace, mode='lines', name='Workplace'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.residential, mode='lines', name='Residential'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='Mobility Changes to All Locations (UK)')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

There are a lot of lines on this graph, but some interesting trends stand out. For example, visits to parks fell when the lockdown was announced but stayed relatively high, at roughly 20-40% below baseline. The most likely reason is that the UK public was allowed to exercise once per day, and parks were used for running or walking. However, at the time of writing (13th May 2020), we have data up to 7th May 2020, and park use has already risen above baseline to nearly +20%.

On 10th May 2020, Boris Johnson announced the first stage of moving from alert level 4 to alert level 3, set out in [this](https://order-order.com/wp-content/uploads/2020/05/FINAL-6.6637_CO_HMG_C19_Recovery_FINAL_110520_v1_WEB.pdf) 50-page document. One of the main changes in this move towards level 3 is that people may exercise outdoors an unlimited number of times per day, rather than once. This will inevitably increase mobility and the number of people in parks, but these rules only come into effect on 13th May 2020, so the rise above baseline before the announcement was even made is worrying.

There is an odd pattern in the workplace data: the values rise every Saturday and Sunday, suggesting more people going to workplaces at weekends, and drop back down during the week. Google does not define workplaces further than "places of work", so the range of workplaces could be vast, and we would need more information to explain why this is happening.
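One possible contributor, offered as a hypothesis rather than an explanation: Google describes its baseline as the median value for the corresponding day of the week during the five weeks from 3rd January to 6th February 2020. Weekend workplace visits therefore start from a much lower baseline, so the same people still going to work (for example, essential workers) produce a much smaller percentage drop at weekends. The visit counts below are made up purely to show the arithmetic.

```python
def percent_change(current, baseline):
    """Percentage change of `current` relative to `baseline`."""
    return 100 * (current - baseline) / baseline


# Made-up visit counts for a typical weekday and a typical weekend day
baseline_visits = {'Weekday': 100, 'Weekend': 30}  # before the pandemic
lockdown_visits = {'Weekday': 35, 'Weekend': 20}   # during lockdown

for day_type, baseline in baseline_visits.items():
    change = percent_change(lockdown_visits[day_type], baseline)
    print(f'{day_type}: {change:+.1f}%')
# Weekday: -65.0%
# Weekend: -33.3%
```

Even though fewer people go to work at weekends in absolute terms, the weekend value sits much closer to baseline, which would show up as a weekly "rise" on the plot.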

We can look at the difference between Retail locations and Grocery locations as well to see how these compare.

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.retail, mode='lines', name='Retail'))
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.grocery, mode='lines', name='Grocery'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='Mobility Changes to Retail and Grocery Locations (UK)')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

In the run-up to the lockdown announcement, retail mobility fell rapidly, while grocery mobility rose sharply and then fell before the lockdown began. There was widespread news coverage of people stockpiling toilet paper, hand wash, hand sanitiser, and other household items in the run-up to lockdown, which would explain the spike at grocery locations.
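To put a date on the grocery spike, one option is to slice a date-indexed series to the days before the announcement and take `idxmax`. This sketch uses made-up values (not the real grocery figures) and assumes a `DatetimeIndex`; the notebook's `date` index is stored as strings, so it would need `pandas.to_datetime` first.

```python
import pandas

# Made-up daily values with a spike before lockdown, for illustration only
grocery = pandas.Series(
    [2.0, 5.0, 14.0, 22.0, 18.0, 6.0, -10.0, -25.0],
    index=pandas.date_range('2020-03-16', periods=8, freq='D'),
)

lockdown = pandas.Timestamp('2020-03-23')
# Keep only the days strictly before the announcement
before_lockdown = grocery[grocery.index < lockdown]

peak_day = before_lockdown.idxmax()
print(peak_day.date(), before_lockdown.max())
```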

In my [Apple Mobility Report](https://github.com/nshyam97/Apple-Mobility-Data/blob/master/trend_analysis.ipynb), I had data available for transit mobility, which we can compare with the transit stations data here to see whether the two show similar results. First, I need to load the [Apple Mobility Data](https://github.com/nshyam97/Apple-Mobility-Trends-Data) and rearrange it into a shape that can be plotted.

```python
apple_url = 'https://raw.githubusercontent.com/nshyam97/Apple-Mobility-Trends-Data/master/applemobility.csv'
apple_data = pandas.read_csv(apple_url)
apple_data.head()
```

| | geo_type | region | transportation_type | alternative_name | 2020-01-13 | 2020-01-14 | 2020-01-15 | 2020-01-16 | 2020-01-17 | 2020-01-18 | ... | 2020-05-05 | 2020-05-06 | 2020-05-07 | 2020-05-08 | 2020-05-09 | 2020-05-10 | 2020-05-11 | 2020-05-12 | 2020-05-13 | 2020-05-14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | country/region | Albania | driving | NaN | 100.0 | 95.3 | 101.43 | 97.2 | 103.55 | 112.67 | ... | 42.61 | 43.11 | 46.13 | 45.78 | 41.59 | 45.39 | NaN | NaN | 49.19 | 50.2 |
| 1 | country/region | Albania | walking | NaN | 100.0 | 100.68 | 98.93 | 98.46 | 100.85 | 100.13 | ... | 46.44 | 52.84 | 52.37 | 48.1 | 44.86 | 68.87 | NaN | NaN | 61.79 | 56.46 |
| 2 | country/region | Argentina | driving | NaN | 100.0 | 97.07 | 102.45 | 111.21 | 118.45 | 124.01 | ... | 33.63 | 35.13 | 35.56 | 40.25 | 33.82 | 19.82 | NaN | NaN | 38.87 | 41.01 |
| 3 | country/region | Argentina | walking | NaN | 100.0 | 95.11 | 101.37 | 112.67 | 116.72 | 114.14 | ... | 22.63 | 23.84 | 23.84 | 30.63 | 24.84 | 15.58 | NaN | NaN | 28.33 | 28.44 |
| 4 | country/region | Australia | driving | NaN | 100.0 | 102.98 | 104.21 | 108.63 | 109.08 | 89.0 | ... | 64.04 | 66.19 | 71.34 | 67.64 | 50.96 | 63.56 | NaN | NaN | 71.12 | 77.24 |


```python
# Function to reshape the Apple data. The dates are currently columns, so we melt them
# into rows to make plotting the data easier
def transpose_df(region_name, transportation):
    # First choose the region and transportation type to create the usable dataframe
    df = apple_data[(apple_data['region'] == region_name) &
                    (apple_data['transportation_type'] == transportation)]
    # Drop the geo_type and alternative_name columns as they aren't needed any more
    df = df.drop(['geo_type', 'alternative_name'], axis=1)
    # Pivot the dataframe from a wide to a tall format: each date becomes a row,
    # with its value in a separate 'Value' column
    df_t = df.melt(['region', 'transportation_type'], var_name='Date', value_name='Value')
    # Convert the date column to datetime
    df_t['Date'] = pandas.to_datetime(df_t['Date'], format='%Y-%m-%d')
    # Make the date column the index to allow for easier plotting
    df_t = df_t.set_index('Date')
    # Apple indexes each series to 100 on its first day (13th January 2020). Subtracting
    # the first value turns the values into a percentage change from that baseline,
    # rounded to 2 decimal places
    df_t['Value'] = (df_t['Value'] - df_t['Value'].iloc[0]).round(2)
    # Return the finished dataframe, ready to plot
    return df_t
```

```python
apple_uk_transit = transpose_df('UK', 'transit')
apple_uk_transit.head()
```

| Date | region | transportation_type | Value |
|---|---|---|---|
| 2020-01-13 | UK | transit | 0.0 |
| 2020-01-14 | UK | transit | 4.2 |
| 2020-01-15 | UK | transit | 5.37 |
| 2020-01-16 | UK | transit | 3.89 |
| 2020-01-17 | UK | transit | 9.38 |

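The reshape inside `transpose_df` can be checked on a tiny example. This one-row toy frame mimics Apple's wide layout (one column per date); the raw values 100.0, 104.2 and 105.37 are the first three UK transit values implied by the output above, assuming Apple indexes the first day to 100. It repeats the same steps outside the function so it runs without downloading anything.

```python
import pandas

# One-row toy version of Apple's wide layout: one column per date
wide = pandas.DataFrame({
    'geo_type': ['country/region'],
    'region': ['UK'],
    'transportation_type': ['transit'],
    'alternative_name': [None],
    '2020-01-13': [100.0],
    '2020-01-14': [104.2],
    '2020-01-15': [105.37],
})

# Wide to tall: each date column becomes a row with its value in 'Value'
tall = (wide.drop(columns=['geo_type', 'alternative_name'])
            .melt(['region', 'transportation_type'], var_name='Date', value_name='Value'))
tall['Date'] = pandas.to_datetime(tall['Date'], format='%Y-%m-%d')
tall = tall.set_index('Date')
# Subtract the first-day value (100) to get a percentage change
tall['Value'] = (tall['Value'] - tall['Value'].iloc[0]).round(2)

print(tall)
```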

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK['transit-stations'], mode='lines',
                         name='Google Transit Stations'))
fig.add_trace(go.Scatter(x=apple_uk_transit.index, y=apple_uk_transit.Value, mode='lines', name='Apple Transit',
                         line=dict(color='green')))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='Mobility Changes of Google Transit Stations and Apple Transit Mobility (UK)')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

The Apple data shows a larger drop in transit than the Google data. However, the two measure slightly different things: Google tracks visits to transit stations specifically, while Apple tracks requests for transit directions in Apple Maps. It is interesting to note that the fall in mobility at transit stations slightly lagged behind the Apple transit data, possibly showing that people were still travelling back home in the run-up to and just after the lockdown announcement.

We are also starting to see an increase in the mobility at transit stations from Google, but no such trend is apparent in the Apple data.
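To put a number on the lag between the two transit curves, one option is to shift one series day by day and keep the shift with the highest correlation. This is only a sketch: `best_lag` is a hypothetical helper, the toy series stand in for the real Google and Apple data, and correlations between trending series are easily inflated, so any lag found this way is indicative at best. For the real data you would pass `apple_uk_transit.Value` and `community_data_UK['transit-stations']` (after converting its string index with `pandas.to_datetime`).

```python
import numpy as np
import pandas


def best_lag(leader, follower, max_lag=7):
    """Shift (in days) of `follower` that correlates best with `leader`.

    A positive result means `follower` trails `leader` by that many days.
    Both series are aligned on their shared dates before correlating.
    """
    leader, follower = leader.align(follower, join='inner')
    scores = {
        lag: leader.corr(follower.shift(-lag))  # corr() skips the NaNs that shift() introduces
        for lag in range(-max_lag, max_lag + 1)
    }
    return max(scores, key=scores.get)


# Toy data: a smooth drop centred on day 20, and a copy that trails it by 2 days
dates = pandas.date_range('2020-03-01', periods=60, freq='D')
days = np.arange(len(dates))
leader = pandas.Series(-60 / (1 + np.exp(-(days - 20) / 2)), index=dates)
follower = leader.shift(2)

print(best_lag(leader, follower))  # 2
```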

We can also compare Apple's walking data with Google's parks data, as parks are where most people chose to walk during their permitted once-a-day exercise.

```python
apple_uk_walking = transpose_df('UK', 'walking')
apple_uk_walking.head()
```

| Date | region | transportation_type | Value |
|---|---|---|---|
| 2020-01-13 | UK | walking | 0.0 |
| 2020-01-14 | UK | walking | 6.14 |
| 2020-01-15 | UK | walking | 14.37 |
| 2020-01-16 | UK | walking | 12.59 |
| 2020-01-17 | UK | walking | 28.99 |


```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=community_data_UK.index, y=community_data_UK.parks, mode='lines',
                         name='Google Parks'))
fig.add_trace(go.Scatter(x=apple_uk_walking.index, y=apple_uk_walking.Value, mode='lines', name='Apple Walking',
                         line=dict(color='green')))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='Mobility Changes of Google Parks and Apple Walking Mobility (UK)')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-28'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

There are a lot of spikes in both data sources, which makes trends hard to see. We can smooth out the weekly cycle in both series and plot that instead. Here the smoothing uses the trend component of an STL decomposition (statsmodels' `STL` with `period=7`), which serves the same purpose as a 7-day moving average: it removes the day-of-week pattern and leaves the underlying trend.
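For comparison, a plain centred 7-day rolling mean is a simpler smoother than STL. The sketch below uses synthetic data (a straight-line decline plus a repeating weekly pattern that sums to zero over a week) to show that a centred 7-day window removes the weekly cycle exactly and leaves the trend. Unlike the STL trend, the rolling mean is undefined (NaN) for the first and last three days.

```python
import numpy as np
import pandas

# Synthetic data: a steady decline plus a weekly pattern that sums to 0 over 7 days
dates = pandas.date_range('2020-02-15', periods=42, freq='D')
trend = np.linspace(0, -60, len(dates))
weekly = np.tile([5.0, 3.0, 0.0, -2.0, -4.0, -3.0, 1.0], 6)
series = pandas.Series(trend + weekly, index=dates)

# center=True places each average in the middle of its 7-day window,
# so the smoothed curve is not shifted 3 days later than the data
smoothed = series.rolling(window=7, center=True).mean()

print(smoothed.head(5))
```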

```python
apple_uk_walking = apple_uk_walking.dropna()
apple_uk_walking = apple_uk_walking.assign(average=STL(apple_uk_walking.Value,
                                                       period=7, seasonal=7).fit().trend)
apple_uk_walking.head()
```

| Date | region | transportation_type | Value | average |
|---|---|---|---|---|
| 2020-01-13 | UK | walking | 0.0 | 13.723451 |
| 2020-01-14 | UK | walking | 6.14 | 15.006524 |
| 2020-01-15 | UK | walking | 14.37 | 16.321487 |
| 2020-01-16 | UK | walking | 12.59 | 17.678344 |
| 2020-01-17 | UK | walking | 28.99 | 19.078312 |


```python
google_parks = community_data_UK.dropna(subset=['retail', 'grocery', 'parks',
                                                'transit-stations', 'workplace', 'residential'])
google_parks = google_parks.assign(average_parks=STL(google_parks.parks,
                                                     period=7, seasonal=7).fit().trend)
google_parks.head()
```

| date | country_region | sub_region_1 | retail | grocery | parks | transit-stations | workplace | residential | average_parks |
|---|---|---|---|---|---|---|---|---|---|
| 2020-02-15 | United Kingdom | NaN | -12.0 | -7.0 | -35.0 | -12.0 | -4.0 | 2.0 | -7.438162 |
| 2020-02-16 | United Kingdom | NaN | -7.0 | -6.0 | -28.0 | -7.0 | -3.0 | 1.0 | -4.531083 |
| 2020-02-17 | United Kingdom | NaN | 10.0 | 1.0 | 24.0 | -2.0 | -14.0 | 2.0 | -1.832865 |
| 2020-02-18 | United Kingdom | NaN | 7.0 | -1.0 | 20.0 | -3.0 | -14.0 | 2.0 | 0.78979 |
| 2020-02-19 | United Kingdom | NaN | 6.0 | -2.0 | 8.0 | -4.0 | -14.0 | 3.0 | 3.391201 |


```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=google_parks.index, y=google_parks.average_parks, mode='lines',
                         name='Google Parks'))
fig.add_trace(go.Scatter(x=apple_uk_walking.index, y=apple_uk_walking.average, mode='lines', name='Apple Walking',
                         line=dict(color='green')))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='7 Day Moving Average of Mobility Changes of Google Parks and Apple Walking Mobility (UK)')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-28'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

Smoothing gives a clearer view of the overall trend in both series. The trends generally match, with the Apple walking data showing a much larger drop. This could be because people still cycle or run through parks, so walking isn't the only activity that shows up there. The dip in park activity around 30th April 2020 is also reflected in the Apple data, though it is smaller.

However, the increase seen in the Google parks data is not reflected in the Apple data, even though the Apple data is available up to 10th May 2020 whereas the Google data only goes up to 7th May 2020 (at the time of writing, 14th May 2020). This suggests that, even allowing for the lag, the walking data provided by Apple does not show the increase seen in parks.

A recent [report](https://www.itv.com/news/2020-05-14/data-study-cambridge-university-public-health-england-rate-of-infection-coronavirus-covid19-highest-north-east-yorkshire/) has suggested that the rate of COVID-19 infection is higher in the North East of England. We can investigate this by comparing mobility across sub-regions to see whether, for example, the North East is isolating less than London, which has one of the lowest rates despite being more densely populated.

First, let's see which sub-regions are available so we can pick suitable North East regions to compare with London.

```python
community_data['sub_region_1'].unique()
```

```
array([nan, 'Aberdeen City', 'Aberdeenshire', 'Angus Council',
       'Antrim and Newtownabbey', 'Ards and North Down',
       'Argyll and Bute Council', 'Armagh City, Banbridge and Craigavon',
       'Bath and North East Somerset', 'Bedford', 'Belfast',
       'Blackburn with Darwen', 'Blackpool', 'Blaenau Gwent',
       'Borough of Halton', 'Bracknell Forest', 'Bridgend County Borough',
       'Brighton and Hove', 'Buckinghamshire',
       'Caerphilly County Borough', 'Cambridgeshire', 'Cardiff',
       'Carmarthenshire', 'Causeway Coast and Glens',
       'Central Bedfordshire', 'Ceredigion', 'Cheshire East',
       'Cheshire West and Chester', 'City of Bristol', 'Clackmannanshire',
       'Conwy Principal Area', 'Cornwall', 'County Durham', 'Cumbria',
       'Darlington', 'Denbighshire', 'Derby', 'Derbyshire',
       'Derry and Strabane', 'Devon', 'Dorset', 'Dumfries and Galloway',
       'Dundee City Council', 'East Ayrshire Council',
       'East Dunbartonshire Council', 'East Lo
```

According to Wikipedia, the North East of England is made up of Northumberland, County Durham and Tyne and Wear, all of which are sub-regions in our data. We can also add the smaller areas within the North East: Hartlepool, Redcar and Cleveland, Middlesbrough, Darlington, and Stockton-on-Tees. An important thing to note is that Greater London's population dwarfs that of the North East, with nearly 9 million people in Greater London compared with about 2.7 million in the entire North East. However, this does not affect the percentage change data, because the baselines are calculated for each region rather than nationwide. A 5% change in London and a 5% change in the North East therefore represent different numbers of people but the same relative change.

First, we should identify and extract our regions for comparison.

```python
tandw = community_data[community_data['sub_region_1'] == 'Tyne and Wear']
durham = community_data[community_data['sub_region_1'] == 'County Durham']
northumberland = community_data[community_data['sub_region_1'] == 'Northumberland']
hartlepool = community_data[community_data['sub_region_1'] == 'Hartlepool']
redcar = community_data[community_data['sub_region_1'] == 'Redcar and Cleveland']
middlesbrough = community_data[community_data['sub_region_1'] == 'Middlesbrough']
darlington = community_data[community_data['sub_region_1'] == 'Darlington']
stockton = community_data[community_data['sub_region_1'] == 'Stockton-on-Tees']
london = community_data[community_data['sub_region_1'] == 'Greater London']

# The smaller North East regions have missing values, so fill them with 0s to prevent
# nulls appearing in the final dataframe. This treats a missing day as "no change
# from baseline", which pulls the regional average towards 0 on those days.
hartlepool = hartlepool.fillna(0)
redcar = redcar.fillna(0)
middlesbrough = middlesbrough.fillna(0)
darlington = darlington.fillna(0)
stockton = stockton.fillna(0)
```

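Two alternatives are worth weighing here, sketched on a made-up toy frame rather than the real data: selecting several sub-regions at once with `isin`, and averaging with pandas' default NaN-skipping instead of `fillna(0)`. Filling with 0 assumes a missing day means no change from baseline, which pulls averages towards 0 on days when a small region has no data.

```python
import numpy as np
import pandas

# Toy stand-in for community_data (values are made up): one location, two days
toy = pandas.DataFrame({
    'date': ['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-02'],
    'sub_region_1': ['Tyne and Wear', 'Hartlepool', 'Tyne and Wear', 'Hartlepool'],
    'parks': [-40.0, np.nan, -30.0, -20.0],
})

# Select several sub-regions in one step
north_east = ['Tyne and Wear', 'Hartlepool']
selected = toy[toy['sub_region_1'].isin(north_east)]

# pandas skips NaN when averaging by default
skip_missing = selected.groupby('date')['parks'].mean()
# Filling NaN with 0 treats a missing day as "no change from baseline"
fill_zero = selected.fillna({'parks': 0}).groupby('date')['parks'].mean()

print(pandas.DataFrame({'skip_missing': skip_missing, 'fill_zero': fill_zero}))
```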
To make the plots easier to read, I will combine the North East sub-regions into one North East dataframe, using a simple average of each location for each day. This also gives an overall figure for the North East. Unfortunately, this is only an approximation: we don't have access to the underlying visit counts or to each region's baseline, so an unweighted mean of percentage changes won't reflect the true change for the North East as a whole.
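To see why the unweighted mean is only an approximation, consider two sub-regions whose baselines differ tenfold. The numbers are hypothetical (Google does not publish baseline visit counts); the sketch only shows that a simple mean of percentage changes can differ substantially from the change in total visits, which is a baseline-weighted mean.

```python
# Hypothetical figures for two sub-regions
baseline_visits = [1000, 100]      # a large and a small sub-region
percent_change = [-50.0, 10.0]     # each relative to its own baseline

# Simple mean of the percentages, as used for the North East dataframe
unweighted = sum(percent_change) / len(percent_change)

# Pooled change: weight each region's percentage by its baseline visits
weighted = (sum(b * p for b, p in zip(baseline_visits, percent_change))
            / sum(baseline_visits))

print(f'unweighted: {unweighted:.1f}%, weighted: {weighted:.1f}%')
# unweighted: -20.0%, weighted: -44.5%
```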

```python
# Average each location across the North East sub-regions, matching rows by date
# rather than by position. Note: Middlesbrough is extracted above but is not part
# of this 7-region average.
ne_regions = [tandw, durham, northumberland, hartlepool, redcar, darlington, stockton]
locations = ['retail', 'grocery', 'parks', 'transit-stations', 'workplace', 'residential']

northeast = sum(region[locations] for region in ne_regions) / len(ne_regions)
# Keep the same dates (and order) as Tyne and Wear, and carry over the country column
northeast = northeast.reindex(tandw.index)
northeast.insert(0, 'country_region', tandw['country_region'])

northeast.head()
```

| date | country_region | retail | grocery | parks | transit-stations | workplace | residential |
|---|---|---|---|---|---|---|---|
| 2020-02-15 | United Kingdom | -8.571429 | -4.428571 | -33.714286 | -7.571429 | -1.857143 | 1.428571 |
| 2020-02-16 | United Kingdom | 1.428571 | -2.857143 | -8.142857 | -2.571429 | -0.428571 | 0.142857 |
| 2020-02-17 | United Kingdom | 8.285714 | -0.857143 | 16.285714 | 1.0 | -15.285714 | 2.285714 |
| 2020-02-18 | United Kingdom | 8.714286 | -1.571429 | 11.571429 | -2.142857 | -14.285714 | 2.714286 |
| 2020-02-19 | United Kingdom | 7.714286 | -1.714286 | 13.857143 | -0.428571 | -14.571429 | 2.142857 |

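When averaging across regions, it matters whether rows are matched by position or by date. If one region were missing a day, pairing rows by position would silently combine different dates. The toy series below are hypothetical, with region `b` missing 2nd March; `mean(axis=1)` skips the missing value, whereas summing the aligned series would give NaN for that day.

```python
import pandas

# Hypothetical values; region b has no row for 2020-03-02
a = pandas.Series([-10.0, -20.0, -30.0], index=['2020-03-01', '2020-03-02', '2020-03-03'])
b = pandas.Series([-12.0, -32.0], index=['2020-03-01', '2020-03-03'])

# Pairing by position matches a's 2020-03-02 with b's 2020-03-03
by_position = (a.to_numpy()[:len(b)] + b.to_numpy()) / 2

# Pairing by date aligns on the index before averaging
by_date = pandas.concat([a, b], axis=1).mean(axis=1)

print(by_position)
print(by_date)
```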

The following graphs compare the two regions for each location. As with the parks graph above, I will smooth every series with a 7-day STL trend to make large-scale trends easier to see and to minimise the spikes caused by the weekly cycle in the data.
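The cells below repeat the same smoothing for each location one at a time. As a lighter-weight alternative sketch, every column can be smoothed in one call. This swaps the STL trend for a centred 7-day rolling mean, so it will not reproduce the plotted curves exactly, and it uses a made-up toy frame.

```python
import numpy as np
import pandas

# Made-up toy frame with two location columns
dates = pandas.date_range('2020-03-01', periods=14, freq='D')
toy = pandas.DataFrame({
    'retail': np.arange(14, dtype=float),
    'parks': np.tile([7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2),
}, index=dates)

# Smooth every column at once and name the results average_<column>
averages = toy.rolling(window=7, center=True).mean().add_prefix('average_')
toy = toy.join(averages)

print(toy.iloc[3:6])
```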

```python
northeast = northeast.assign(average_retail=STL(northeast.retail, period=7, seasonal=7).fit().trend)
london = london.assign(average_retail=STL(london.retail, period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_retail, mode='lines', name='NE Retail'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_retail, mode='lines', name='LDN Retail'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='7 day moving average of Mobility Changes to Retail Locations in the North East and London')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-30, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

```python
northeast = northeast.assign(average_grocery=STL(northeast.grocery, period=7, seasonal=7).fit().trend)
london = london.assign(average_grocery=STL(london.grocery, period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_grocery, mode='lines', name='NE Grocery'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_grocery, mode='lines', name='LDN Grocery'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)',
                  title='7 day moving average of Mobility Changes to Grocery Locations in the North East and London')
fig.add_shape(dict(type='line', y0=0, y1=1, yref='paper',
                   x0=pandas.Timestamp('2020-03-23'), x1=pandas.Timestamp('2020-03-23'),
                   line=dict(color='red', width=2)))
fig.add_shape(dict(type='line', y0=0, y1=0,
                   x0=0, x1=1, xref='paper',
                   line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=5, xref='x', yref='y',
                   text='Lockdown announced', showarrow=True, ax=65,
                   ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
```

In [154]:
northeast = northeast.assign(average_parks = STL(northeast.parks, period=7, seasonal=7).fit().trend)
london = london.assign(average_parks = STL(london.parks, period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_parks, mode='lines', name='NE Parks'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_parks, mode='lines', name='LDN Parks'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)', 
                  title='7 day moving average of Mobility Changes to Park Locations in the North East and London')
fig.add_shape(dict(type = "line", y0 = 0, y1 = 1, yref = "paper",
            x0 = pandas.Timestamp('2020-03-23'), x1 = pandas.Timestamp('2020-03-23'), 
            line=dict(color='red', width=2)))
fig.add_shape(dict(type = 'line', y0 = 0, y1 = 0,
            x0 = 0, x1 = 1, xref='paper', 
            line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=20, xref='x', yref='y', 
                   text='Lockdown announced', showarrow=True, ax=65, 
                  ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
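The plot titles call these curves 7 day moving averages. The code, however, plots the trend component of an STL decomposition (`period=7`), which is a LOESS-based smoother rather than a simple average. For comparison, a plain centred 7 day moving average can be computed with pandas `rolling`. The sketch below uses a synthetic series, not the Google data: a made-up linear trend plus a repeating weekly pattern. It shows that a centred 7 day window cancels a pattern that repeats every 7 days and leaves only the trend, apart from three missing values at each end where the window is incomplete.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustration only): a linear trend plus a weekly
# pattern whose seven values sum to zero
dates = pd.date_range('2020-02-15', periods=35, freq='D')
trend = np.linspace(-10, -60, len(dates))
weekly_pattern = np.array([3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0])
values = trend + np.tile(weekly_pattern, len(dates) // 7)
series = pd.Series(values, index=dates)

# Centred 7 day moving average: each point is the mean of itself and the
# 3 days on either side, so the first and last 3 points are NaN
moving_average = series.rolling(window=7, center=True).mean()

# Away from the edges the weekly pattern cancels and the trend is recovered
recovered = moving_average.dropna()
print(np.allclose(recovered.to_numpy(), trend[3:-3]))  # True
```

On the notebook's data this would be something like `northeast.parks.rolling(window=7, center=True).mean()`. Unlike the STL trend, it leaves gaps at both ends of the series.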

In [155]:
northeast = northeast.assign(average_transit = STL(northeast['transit-stations'], period=7, seasonal=7).fit().trend)
london = london.assign(average_transit = STL(london['transit-stations'], period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_transit, mode='lines', 
                         name='NE Transit Stations'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_transit, mode='lines', 
                         name='LDN Transit Stations'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)', 
                  title='7 day moving average of Mobility Changes to Transit Station Locations in the North East and London')
fig.add_shape(dict(type = "line", y0 = 0, y1 = 1, yref = "paper",
            x0 = pandas.Timestamp('2020-03-23'), x1 = pandas.Timestamp('2020-03-23'), 
            line=dict(color='red', width=2)))
fig.add_shape(dict(type = 'line', y0 = 0, y1 = 0,
            x0 = 0, x1 = 1, xref='paper', 
            line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=2.5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-25, xref='x', yref='y', 
                   text='Lockdown announced', showarrow=True, ax=65, 
                  ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')

In [156]:
northeast = northeast.assign(average_wp = STL(northeast.workplace, period=7, seasonal=7).fit().trend)
london = london.assign(average_wp = STL(london.workplace, period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_wp, mode='lines', 
                         name='NE Workplaces'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_wp, mode='lines', 
                         name='LDN Workplaces'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)', 
                  title='7 day moving average of Mobility Changes to Workplace Locations in the North East and London')
fig.add_shape(dict(type = "line", y0 = 0, y1 = 1, yref = "paper",
            x0 = pandas.Timestamp('2020-03-23'), x1 = pandas.Timestamp('2020-03-23'), 
            line=dict(color='red', width=2)))
fig.add_shape(dict(type = 'line', y0 = 0, y1 = 0,
            x0 = 0, x1 = 1, xref='paper', 
            line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=2.5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=-25, xref='x', yref='y', 
                   text='Lockdown announced', showarrow=True, ax=65, 
                  ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')

In [157]:
# There is (currently) 1 NA value in this data, so replace it with the
# average of the values on either side of it.
# .iloc is used because the index holds dates, not integer positions
null_position = northeast.residential.isna().to_numpy().nonzero()[0][0]
next_position, previous_position = null_position + 1, null_position - 1
average_for_null = (northeast.residential.iloc[next_position]
                    + northeast.residential.iloc[previous_position]) / 2
northeast.residential = northeast.residential.fillna(average_for_null)
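When a missing value has valid neighbours on both sides, pandas' `Series.interpolate()` (default `method='linear'`) gives the same result as averaging those two neighbours, and it also handles more than one gap. Here is a minimal sketch on a small made-up series:

```python
import numpy as np
import pandas as pd

# Made-up residential values with one missing day (illustration only)
dates = pd.date_range('2020-04-01', periods=5, freq='D')
residential = pd.Series([20.0, 22.0, np.nan, 26.0, 24.0], index=dates)

# Linear interpolation treats the points as equally spaced, so a single
# gap is filled with the mean of its two neighbours: (22 + 26) / 2 = 24
filled = residential.interpolate()
print(filled.iloc[2])  # 24.0
```

In the notebook, the equivalent would be `northeast.residential = northeast.residential.interpolate()`. A NaN at the very start of a series would remain NaN, because there is no earlier value to interpolate from.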

In [158]:
northeast = northeast.assign(average_res = STL(northeast.residential, period=7, seasonal=7).fit().trend)
london = london.assign(average_res = STL(london.residential, period=7, seasonal=7).fit().trend)

fig = go.Figure()
fig.add_trace(go.Scatter(x=northeast.index, y=northeast.average_res, mode='lines', 
                         name='NE Residential'))
fig.add_trace(go.Scatter(x=london.index, y=london.average_res, mode='lines', 
                         name='LDN Residential'))
fig.update_layout(xaxis_title='Date', yaxis_title='Percentage Change from baseline(%)', 
                  title='7 day moving average of Mobility Changes to Residential Locations in the North East and London')
fig.add_shape(dict(type = "line", y0 = 0, y1 = 1, yref = "paper",
            x0 = pandas.Timestamp('2020-03-23'), x1 = pandas.Timestamp('2020-03-23'), 
            line=dict(color='red', width=2)))
fig.add_shape(dict(type = 'line', y0 = 0, y1 = 0,
            x0 = 0, x1 = 1, xref='paper', 
            line=dict(color='black', width=2, dash='dot')))
fig.add_annotation(x=pandas.Timestamp('2020-04-26'), y=2.5, xref='x', yref='y', text='Baseline', showarrow=False)
fig.add_annotation(x=pandas.Timestamp('2020-03-24'), y=5, xref='x', yref='y', 
                   text='Lockdown announced', showarrow=True, ax=65, 
                  ay=-40, arrowhead=2, arrowsize=1)
fig.show(renderer='notebook_connected')
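To put rough numbers on the differences between the two regions, one option is to compare the average of each trend after the lockdown announcement. The sketch below assumes date-indexed series like the trend columns of `northeast` and `london` above. The helper name `mean_after` and the short made-up series used to exercise it are hypothetical.

```python
import pandas as pd

def mean_after(series: pd.Series, start: str) -> float:
    """Mean of a date-indexed series from `start` (inclusive) onwards."""
    return float(series[series.index >= pd.Timestamp(start)].mean())

# Made-up transit trends for two regions (illustration only)
dates = pd.date_range('2020-03-20', periods=6, freq='D')
ne_transit = pd.Series([-10.0, -20.0, -30.0, -50.0, -55.0, -60.0], index=dates)
ldn_transit = pd.Series([-20.0, -35.0, -50.0, -70.0, -75.0, -80.0], index=dates)

lockdown = '2020-03-23'
ne_mean = mean_after(ne_transit, lockdown)
ldn_mean = mean_after(ldn_transit, lockdown)
print(ne_mean, ldn_mean, ne_mean - ldn_mean)  # -55.0 -75.0 20.0
```

With the notebook's data this might be `mean_after(northeast.average_transit, '2020-03-23') - mean_after(london.average_transit, '2020-03-23')`. A positive value means the North East's figure was higher than London's over that period.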

As the graphs above show, mobility in the North East generally seems to have changed less from its baseline than mobility in London. Transit stations show one of the biggest differences between the two regions. London has far more transit stations than the North East, but this should not distort the comparison much, because each region is measured as a percentage change from its own baseline. Workplace activity in the North East has also stayed closer to baseline, which could explain why its transit figures are higher too: people who still travel to work may still need public transport to get there. One possible reason is that London has a large concentration of IT jobs, which are among the easiest to do from home, whereas the North East has a smaller concentration of such jobs.
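A little arithmetic shows why measuring each region against its own baseline removes differences in scale. Google describes its baseline as the median value for the corresponding day of the week during the five weeks from 3 January to 6 February 2020. The visit counts below are invented purely to illustrate the calculation.

```python
def percent_change_from_baseline(value: float, baseline: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) * 100 / baseline

# Invented visit counts (illustration only): London has ten times the
# baseline traffic of the North East, but both fall by the same proportion
london_change = percent_change_from_baseline(value=4000, baseline=10000)
north_east_change = percent_change_from_baseline(value=400, baseline=1000)
print(london_change, north_east_change)  # -60.0 -60.0
```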

Park activity, on the other hand, has risen much more in London, with much larger swings between days and weeks. Given the numerous reports of London parks still being busy, and the recent warm weather across the UK, it is not surprising that parks in both regions are above their baseline, with London showing a far greater increase. This should not be encouraged while the virus is still at large: even though the numbers are falling, busy parks can still put people at risk.
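The larger day-to-day swings in London's park activity could be checked numerically. One option is the standard deviation of the day-on-day differences of the raw (unsmoothed) series. This is just one possible measure of volatility, not one used in the analysis above, and the two short series below are made up for illustration.

```python
import pandas as pd

def daily_volatility(series: pd.Series) -> float:
    """Standard deviation of day-on-day changes (in percentage points)."""
    return float(series.diff().std())

# Made-up park values (illustration only)
dates = pd.date_range('2020-04-01', periods=5, freq='D')
steady = pd.Series([10.0, 12.0, 10.0, 12.0, 10.0], index=dates)
volatile = pd.Series([10.0, 40.0, 10.0, 40.0, 10.0], index=dates)

print(daily_volatility(steady) < daily_volatility(volatile))  # True
```

On the notebook's data this would mean comparing `daily_volatility(london.parks)` with `daily_volatility(northeast.parks)`.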