## <span style="color:blue">Attaching geo data to survey locations</span>

### <span style="color:#008891">The geo data is calculated weekly</span>

__What this does:__ Maps values from the GIS data to the survey data for each location, and flags locations that have no GIS data.

__When to use it:__ Run this notebook after running 'getdataforrepo'.

The GIS data is updated weekly. 


#### Tasks:

1. Define the ranking boundaries for the different geographic and demographic attributes at each survey location
2. Export a file to the resources/location_data directory:
   1. A .csv file with survey location data, attribute rankings and attribute values
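
This notebook does not show how the ranking boundaries are defined. One possible approach, offered only as a sketch, is to rank each attribute by its percentile among all survey locations with `Series.rank(pct=True)`. The slugs, the column names and the values below are made up for illustration and are not project data.

```python
import pandas as pd

# hypothetical attribute values for four survey locations
locations = pd.DataFrame(
    {"streets": [100.0, 400.0, 200.0, 300.0]},
    index=["beach-a", "beach-b", "beach-c", "beach-d"],
)

# percentile rank of each location for the attribute (0 < rank <= 1)
locations["streets_rank"] = locations["streets"].rank(pct=True)
print(locations)
```

A percentile rank keeps the comparison relative to the other survey locations, so the boundaries move when the weekly GIS update changes the underlying values.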

Questions or comments: analyst@hammerdirt.ch

In [1]:
# sys, file and nav packages:
import os
import datetime as dt
import csv
import json

# math packages:
import pandas as pd
import numpy as np
from scipy import stats


# charting:
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import ticker
import seaborn as sns
import matplotlib.gridspec as gridspec
from IPython.display import display, HTML


# home brew utilities
import utilities.utility_functions as ut

# documenting
from IPython.display import Markdown as md

display(HTML("""
<style>
.output {
    display: flex;
    align-items: center;
    text-align: center;
}
</style>
"""))

In [2]:
# the local file structure. The resources are located in the corresponding directory.
# the purpose and date should be included in the filename when saving results to output
most_recent, survey_data, location_data, code_defs, stat_ent, geo_data, output = ut.make_local_paths()

In [3]:
today = dt.datetime.now().date().strftime("%Y-%m-%d")

In [4]:
# this is qgis output with aggregated values for meters of streets and number of river intersects
# within 1500 meters of the survey location
geo_data = pd.read_csv(F"{location_data}/beaches_no_buildings_data.csv")
geo_data.rename(columns={'length':'streets', 'water_na_1':'water_name_slug'}, inplace=True)
geo_data.set_index('slug', inplace=True)

# the surface area of buildings within 1500 meters of the survey site
blds_data = pd.read_csv(F"{location_data}/beaches_buildings.csv")
blds_data.set_index('slug', inplace=True)

# series used to look up the results, with the index (slug) as the key
blds = blds_data['surface']
streets = geo_data['streets']
intersects = geo_data['intersects']

In [5]:
# aggregated survey data
dfAgg = pd.read_csv(F"{survey_data}/results_with_zeroes_aggregated_parent.csv")
dfAgg['date'] = pd.to_datetime(dfAgg['date'])

# non aggregated survey data
dfSurveys = pd.read_csv(F"{survey_data}/results_with_zeroes.csv")
dfSurveys['date'] = pd.to_datetime(dfSurveys['date'])

# beach data
dfBeaches = pd.read_csv(F"{location_data}/beaches_pop_bfs.csv")

# population data
popdata = pd.read_csv(F"{stat_ent}/STATPOP2018_GMDE.csv")
popdata.set_index('GDENR', inplace=True, drop=True)

# bfs number and commune keys
popkeys = pd.read_csv(F"{stat_ent}/bfs_num.csv")

In [6]:
project_directory = ut.make_project_folder(location_data, 'infrastructure_rankings')

### <span style="color:#008891">Map GIS data to beach data</span>

When the attributes surrounding each survey location are calculated in QGIS, the data is keyed to the survey location 'slug'. The 'slug' column of the beach data can therefore be used to map the results from the GIS data to the beach data.
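
As a minimal sketch of this keyed lookup, the example below uses made-up slugs and values, not project data. It relies on `Series.map` with a Series argument, which leaves `NaN` wherever a slug has no match. This is an alternative to the `'none'` marker used in the next cell, not the notebook's own method.

```python
import pandas as pd

# hypothetical GIS output, indexed by survey location slug
gis_streets = pd.Series({"beach-a": 1200.0, "beach-b": 310.5}, name="streets")

# hypothetical beach data; 'beach-c' has no GIS record
beaches = pd.DataFrame({"slug": ["beach-a", "beach-b", "beach-c"]})

# map the GIS values to the beach data using the slug as the key
beaches["streets"] = beaches["slug"].map(gis_streets)

# slugs with no GIS data show up as NaN
missing = beaches.loc[beaches["streets"].isna(), "slug"].tolist()
print(missing)
```

Using `NaN` as the missing marker keeps the column numeric, whereas a string marker such as `'none'` turns it into an object column.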

In [7]:
def check_values(df, x):
    # return the value stored at key x, or 'none' if the key is not in the index
    try:
        package = df.loc[x]
    except KeyError:
        package = 'none'
    return package
    
dfBeaches['buildings'] = dfBeaches.slug.map(lambda x:check_values(blds, x))
no_buildings = dfBeaches[dfBeaches.buildings == 'none']

dfBeaches['streets'] = dfBeaches.slug.map(lambda x:check_values(streets, x))
no_streets = dfBeaches[dfBeaches.streets == 'none']

dfBeaches['intersects'] = dfBeaches.slug.map(lambda x:check_values(intersects, x))
no_intersects = dfBeaches[dfBeaches.intersects == 'none']

no_geo_data = list(set(no_intersects.slug.unique()) | set(no_streets.slug.unique()) | set(no_buildings.slug.unique()))

print(F"These are the locations that have no geo data:\n\n{no_geo_data}\n")

print(F"{no_intersects}\n")
print(F"{no_streets}\n")
print(F"{no_buildings}\n")

These are the locations that have no geo data:

['via-brunari-spiaggia', 'spiaggia-parco-ciani', 'clean-up-event-test', 'foce-del-cassarate']

                     slug              location   latitude  longitude  post  \
49    clean-up-event-test   Clean up event test  46.457879   6.847148  1800   
60     foce-del-cassarate    Foce del Cassarate  46.002411   8.961477  6900   
194  spiaggia-parco-ciani  Spiaggia Parco Ciani  46.002510   8.960820  6900   
214  via-brunari-spiaggia  Via Brunari spiaggia  46.202350   9.016910  6500   

    country water           water_name   city_slug      water_name_slug  \
49       CH     r  Clean up tour vevey       vevey  clean-up-tour-vevey   
60       CH     r            Cassarate      lugano            cassarate   
194      CH     l       Lago di Lugano      lugano       lago-di-lugano   
214      CH     r               Ticino  bellinzona               ticino   

     is_2020        city  bfsnum  population buildings streets intersects  
49      T

#### <span style="color:#008891">Export the data</span>

In [8]:
filename = F"{location_data}/beaches_with_gis.csv"
dfBeaches.to_csv(filename, index=False)
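
The export above writes to a fixed filename. The comment in the paths cell asks for the purpose and date in the filename when saving results to output, so a dated variant could look like the sketch below. The data frame, the temporary directory and the filename pattern are assumptions for illustration only.

```python
import datetime as dt
import os
import tempfile

import pandas as pd

# hypothetical stand-in for the beach data with GIS columns
beaches = pd.DataFrame({"slug": ["beach-a"], "streets": [1200.0]})

# date stamp in the same format as the `today` variable defined earlier
today = dt.datetime.now().date().strftime("%Y-%m-%d")

# a temporary directory stands in for the output directory
out_dir = tempfile.mkdtemp()
dated_file = os.path.join(out_dir, f"beaches_with_gis_{today}.csv")
beaches.to_csv(dated_file, index=False)

# read the file back to confirm that the export can be loaded again
check = pd.read_csv(dated_file)
print(check)
```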

#### Hopefully that just worked for you

If not, contact analyst@hammerdirt.ch

In [9]:
author = "roger@hammerdirt.ch"
my_message = "Statistics is fun when you do it outside"
print(F"\nProduced by: {author}\nDate: {today}\n\n{my_message}")


Produced by: roger@hammerdirt.ch
Date: 2021-05-19

Statistics is fun when you do it outside
