# Sign processing

This notebook fetches observations from the configured WFS layer and then processes them. The main idea is to convert the SPSS (`.sps`) scripts to Python code, which can then be run directly from GitHub or processed without having SPSS installed.
The notebook requires `pandas` (see the requirements file) and imports `owslib` for the WFS requests.

In [None]:
import json
from datetime import datetime

import pandas as pd
from owslib.wfs import WebFeatureService

## Configuration
Configuration variables are defined here. This is only temporary, since this code will eventually be converted to scripts.

In [None]:
url = "https://opendata.apps.mow.vlaanderen.be/opendata-geoserver/awv/wfs?version=2.0.0" #&service=wfs&request=GetCapabilities"
wfs = WebFeatureService(url=url, version="2.0.0", timeout=3600)
vb_type_name = "awv:Verkeersborden.Vlaanderen_Borden"

# Configuration
# Output file where we will store the WFS results
feature_output_file = "output.csv"
# Date of the previous processing run (DD/MM/YYYY), used to filter out signs that were already processed
previous_processed_date = "31/07/2022"
# Traffic sign metadata, joined with the features on bordcode
traffic_signs_info = "../find-interesting-signs/road_signs_cleaned.csv"
# Traffic sign processing output file
processing_output_file = "maproulette.csv"


## Fetch number of features
Fetch the total number of features for the required layer from the WFS service. This count is used later on to request all of them in a single query.

In [None]:
def get_total_features_by_type(feature_type):
    response = wfs.getfeature(typename=feature_type, outputFormat="json", maxfeatures=1)
    r = response.read()
    d = r.decode('UTF-8')
    j = json.loads(d)
    return j['totalFeatures'] 

total_features = get_total_features_by_type(vb_type_name)
print("{}: #features = {}".format(datetime.now(), total_features))

## Obtain and store the signs
Fetch the data from the WFS service, remove line breaks inside field values, and store the result in the configured CSV file.

In [None]:
def remove_linebreaks(data):
    replace1 = data.replace(b'\n',b' ')
    replace2 = replace1.replace(b'\r ',b'\r\n')
    return replace2

def get_and_store_features(file_name, feature_type, max_features):
    response = wfs.getfeature(typename=feature_type, maxfeatures=max_features, outputFormat="csv", startindex=0)
    cleaned_response = remove_linebreaks(response.read())
    decoded_response = cleaned_response.decode('UTF-8')

    with open(file=file_name, encoding='UTF-8', mode='w', newline='') as csvfile:
        csvfile.write(decoded_response)
        
print("{}: Starting fetching data from WFS service".format(datetime.now()))
get_and_store_features(feature_output_file, vb_type_name, total_features)
print("{}: WFS data stored in {}".format(datetime.now(), feature_output_file))
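
The effect of the two-step replacement is easier to see on a small payload. The sketch below repeats the same logic on a hypothetical two-record response (the field values are invented for illustration) and assumes, as the replacement itself does, that the service terminates each record with `\r\n`.

```python
def remove_linebreaks(data: bytes) -> bytes:
    # Step 1: turn every "\n" into a space, including the ones in "\r\n".
    without_newlines = data.replace(b"\n", b" ")
    # Step 2: restore the record terminators, which are now "\r ".
    return without_newlines.replace(b"\r ", b"\r\n")


# Hypothetical payload: the quoted field in the first record contains a bare "\n".
sample = b'FID,parameters\r\nBorden.1,"line one\nline two"\r\nBorden.2,plain\r\n'
cleaned = remove_linebreaks(sample)
print(cleaned.decode("UTF-8"))
```

Only bare `\n` characters inside a field are flattened. A field that itself contains `\r\n` would keep its line break, so this is a heuristic rather than a full CSV-aware cleanup.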

## Process data

Load the signs data into a `pandas` dataframe. The data is then filtered by `previous_processed_date` and joined with the signs metadata on `bordcode`.

**Note:** All this code is dataset-specific. Ideally it should be abstracted away, including the column definitions.

In [None]:
feature_df = pd.read_csv(feature_output_file)

In [None]:
feature_df.dtypes

### Date filtering

Keep only the signs whose date is later than the `previous_processed_date` configuration value. This is done by: 1) parsing `datum_plaatsing` into a datetime stored in the `date` column, and 2) filtering the dataframe on that column.

In [None]:
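print(f"The file contains {len(feature_df)} features before filtering by date.")
# Unparseable or missing dates become NaT and are excluded by the mask below.
feature_df['date'] = pd.to_datetime(feature_df['datum_plaatsing'], errors='coerce')
# The cutoff is written as DD/MM/YYYY, so parse it day-first.
cutoff_date = pd.to_datetime(previous_processed_date, dayfirst=True)
filter_mask = feature_df['date'].notna() & (feature_df['date'] > cutoff_date)
# Copy so that later column assignments do not act on a view of feature_df.
filtered_df = feature_df[filter_mask].copy()
print(f"The file contains {len(filtered_df)} features after filtering by date greater than {previous_processed_date}.")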
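
As a minimal sketch of the two steps described above, the example below runs the same parse-then-filter logic on a few invented rows. The sample values and the ISO date format are assumptions. The real `datum_plaatsing` values come from the WFS export.

```python
import pandas as pd

# Hypothetical sample rows, including an unparseable and a missing date.
sample_df = pd.DataFrame({
    "FID": ["a", "b", "c", "d"],
    "datum_plaatsing": ["2022-08-15", "2022-07-01", "not a date", None],
})

# Parse the cutoff with an explicit day/month/year format so it is unambiguous.
cutoff = pd.to_datetime("31/07/2022", format="%d/%m/%Y")

# errors="coerce" turns unparseable or missing values into NaT instead of raising.
sample_df["date"] = pd.to_datetime(sample_df["datum_plaatsing"], errors="coerce")

# Comparisons against NaT evaluate to False, so those rows drop out.
recent = sample_df[sample_df["date"] > cutoff]
print(recent["FID"].tolist())
```

Because `NaT` never compares as greater than the cutoff, the explicit `notna()` check in the notebook cell is a readability aid rather than a strict requirement.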

### Data parsing and conversion

Apply some small conversions to the `bordcode` field, as in the SPSS code. This cell also creates the identifier by removing the layer-name prefix from the `FID` value.

In [None]:
# Bordcode processing: strip the leading "Z", append " (zone)", and remove slashes.
filtered_df['bordcode'] = filtered_df.apply(lambda row: (f"{row['bordcode'][1:]} (zone)" if row['bordcode'].startswith('Z') else row['bordcode']).replace("/", ""), axis=1)
# Strip the layer prefix from FID (literal match, not a regex) to obtain the identifier
filtered_df['id'] = filtered_df['FID'].str.replace('Verkeersborden.Vlaanderen_Borden.', '', regex=False)
# drop() returns a new dataframe, so the result has to be assigned back
filtered_df = filtered_df.drop(columns=['FID'])
# The parameters column will also need some cleaning, probably best done before saving.
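
To make the `bordcode` rule concrete, here is a small sketch that applies the same conversion to two invented codes. The helper name `normalise_bordcode` and the sample codes are hypothetical. Only the `Z` prefix rule, the slash removal, and the `FID` prefix are taken from the notebook.

```python
import pandas as pd


def normalise_bordcode(code: str) -> str:
    # Zone signs carry a "Z" prefix: drop it and append " (zone)".
    if code.startswith("Z"):
        code = f"{code[1:]} (zone)"
    # Slashes are removed in both cases.
    return code.replace("/", "")


# Hypothetical sample rows.
sample = pd.DataFrame({
    "FID": [
        "Verkeersborden.Vlaanderen_Borden.101",
        "Verkeersborden.Vlaanderen_Borden.102",
    ],
    "bordcode": ["ZC43", "B1/2"],
})

sample["bordcode"] = sample["bordcode"].map(normalise_bordcode)
# regex=False treats the dots in the prefix as literal characters.
sample["id"] = sample["FID"].str.replace(
    "Verkeersborden.Vlaanderen_Borden.", "", regex=False
)
sample = sample.drop(columns=["FID"])
print(sample)
```

Using `Series.map` on the single column avoids the row-wise `apply(..., axis=1)`, which may matter on a large export. Both forms are expected to give the same values.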

In [None]:
sign_metadata = pd.read_csv(traffic_signs_info, sep=";", encoding = "ISO-8859-1")
sign_metadata.dtypes

### Join and grouping

Merge the sign metadata with the current dataset on the `bordcode` field, then group by `id_aanzicht` to identify clustered signs. Finally, select the required values and store them in the file named by the `processing_output_file` configuration value.

In [None]:
# Join both datasets by the bordcode
joined_df = filtered_df.join(sign_metadata.set_index("bordcode"), on='bordcode')
# Remove NaN parameters and name
joined_df[['parameters', 'name']] = joined_df[['parameters','name']].fillna('')
joined_df.dtypes
display(joined_df)

In [None]:
grouped_df = joined_df.groupby('id_aanzicht', as_index=False).agg({
     'opinion': 'max', 
     'bordcode': ' | '.join,
     'locatie_x': 'max',
     'locatie_y': 'max',
     'parameters': lambda x : '|'.join(y for y in x if y != ''),
     'name': lambda x : '|'.join(y for y in x if y != ''),
     'datum_plaatsing': 'max',
     'id': 'max'})
grouped_df = grouped_df[grouped_df['opinion'] > 0]
print(f"Found {len(grouped_df)} signs after grouping by id_aanzicht")
display(grouped_df)
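
The join and grouping steps can be traced on a toy dataset. In this sketch all rows, names, and `opinion` values are invented for illustration. It shows how a sign without metadata ends up with an empty name and a missing `opinion`, and is therefore removed by the final filter.

```python
import pandas as pd

# Hypothetical signs: two on the same view (id_aanzicht 1), one on another.
signs = pd.DataFrame({
    "id_aanzicht": [1, 1, 2],
    "bordcode": ["C43", "G1", "X9"],
    "parameters": ["50", None, None],
    "id": ["10", "11", "12"],
})
# Hypothetical metadata; "X9" is deliberately missing.
metadata = pd.DataFrame({
    "bordcode": ["C43", "G1"],
    "name": ["speed limit", "distance plate"],
    "opinion": [1, 0],
})

# Left join on bordcode: unmatched signs get NaN in the metadata columns.
joined = signs.join(metadata.set_index("bordcode"), on="bordcode")
joined[["parameters", "name"]] = joined[["parameters", "name"]].fillna("")

grouped = joined.groupby("id_aanzicht", as_index=False).agg({
    "opinion": "max",
    "bordcode": " | ".join,
    "parameters": lambda values: "|".join(v for v in values if v != ""),
    "name": lambda values: "|".join(v for v in values if v != ""),
    "id": "max",
})
# NaN > 0 is False, so groups without any metadata are dropped here.
kept = grouped[grouped["opinion"] > 0]
print(kept)
```

Taking the maximum of `opinion` means a cluster is kept as soon as one of its signs has a positive value, even when the other signs in the cluster do not.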

In [None]:
result = grouped_df.rename(columns = {
    "bordcode": "traffic_sign_code", 
    "parameters": "extra_text",
    "datum_plaatsing": "date_installed",
    "name": "traffic_sign_description"
})[['id', 'traffic_sign_code', 'extra_text', 'traffic_sign_description', 'date_installed', 'locatie_x', 'locatie_y']]

In [None]:
result.to_csv(processing_output_file, sep=";")

## TODO

The pandas dataframe should be stored in GeoJSON format directly, including the conversion to EPSG:4326. For now this is a manual step:

```
Open in QGIS as Lambert 72 (EPSG:31370), save as GeoJSON in EPSG:4326
```
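
One possible shape for this step, as a rough sketch rather than a finished implementation: build the GeoJSON structure with the standard library and leave the reprojection to a pluggable function. The helper `to_feature_collection`, the `transform` parameter, and the sample row are all hypothetical. The `identity` function is only a placeholder and does not reproject anything.

```python
import json


def to_feature_collection(rows, transform):
    """Build a GeoJSON FeatureCollection from dict rows.

    `transform` is any callable mapping (x, y) in the source CRS to (lon, lat).
    """
    features = []
    for row in rows:
        lon, lat = transform(row["locatie_x"], row["locatie_y"])
        # Everything except the coordinate columns becomes a property.
        properties = {
            key: value
            for key, value in row.items()
            if key not in ("locatie_x", "locatie_y")
        }
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


def identity(x, y):
    # Placeholder for illustration only: returns the coordinates unchanged.
    return (x, y)


# Hypothetical row with Lambert 72 style coordinates.
rows = [{"id": "10", "traffic_sign_code": "C43",
         "locatie_x": 150000.0, "locatie_y": 170000.0}]
collection = to_feature_collection(rows, identity)
print(json.dumps(collection, indent=2))
```

In the notebook, the rows could come from `result.to_dict("records")`. Assuming `pyproj` is installed, `Transformer.from_crs("EPSG:31370", "EPSG:4326", always_xy=True).transform` could be passed as `transform` to obtain longitude and latitude.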