# Scraping Telraam Data

Inspired by the Datathon in 2020 (https://lstat.kuleuven.be/Datathon2020/data) and the results of our NLP notebook, we chose to scrape additional traffic data from Telraam. A Telraam device counts the pedestrians, cyclists, and vehicles that pass its sensor. This traffic data could be a valuable addition to the noise and weather data, since humans and vehicles are often the source of the noise. We found the locations of the Telraam monitors on the Telraam map (https://www.telraam.net/en#15/50.8748/4.7005). In the neighbourhood of Naamsestraat, 3 Telraam monitors measure traffic passing locations in our noise dataset. However, they did not all record in 2022, and those that did covered different periods.

- The first Telraam on Naamsestraat tracked traffic passing Naamsestraat 81 and Parkstraat 2 (La Filosofia), but recorded no data in 2022. (https://www.telraam.net/en/location/351160)
- The second Telraam on Naamsestraat monitors traffic passing by the Calvariekapel and His&Hears, for the period of 01-01-2022 until 29-09-2022. (https://www.telraam.net/en/location/347295)
- The third Telraam on Naamsestraat measures traffic data close to (not passing) Naamsestraat 62 (Taste), for the period 01-01-2022 until 07-10-2022. (https://www.telraam.net/en/location/9000000637)

Based on these monitoring periods, we chose to scrape only the data of the second (segment 347295) and third (segment 9000000637) Telraam devices.

In this notebook, we scrape the data and explore it with descriptive statistics and basic visualizations. The dataset is used afterwards for modelling in another notebook. More info on Telraam can be found at https://telraam.net/self-measure/what-is-telraam. To obtain an API key, we created a personal account on the Telraam website. More info on how to query the API can be found at https://documenter.getpostman.com/view/8210376/TWDRqyaV and https://telraam.helpspace-docs.io/article/27/you-wish-more-data-and-statistics-telraam-api.

_Note: the third Telraam monitors the traffic that is very close to Naamsestraat 62, but that might not actually pass the location. This might influence the modelling results._ 




### Scraping segment 347295

In [1]:
# Loading packages
import requests
import json
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

In [3]:
#url = "https://telraam-api.net/v1/reports/traffic"

#body = {
#    "id": "347295",  # the segment id for the 2nd Telraam device
#    # The request time interval is limited to 3 months, so we choose June-August 2022,
#    # since Calvariekapel has noise data for this period
#    "time_start": "2022-06-01 00:00:00Z",
#    "time_end": "2022-09-01 00:00:00Z",
#    "level": "segments",
#    "format": "per-hour"
#}

#headers = {
#    'X-Api-Key': '<YOUR_API_KEY>'  # personal API key; remove it again after the data is exported
#}

#payload = json.dumps(body)  # serialize the body as valid JSON (str(body) yields single-quoted text)

#response = requests.request("POST", url, headers=headers, data=payload)
#print(response.text)


{"status_code": 200, "message": "ok", "report": [{"instance_id": -1, "segment_id": 347295, "date": "2022-06-01T00:00:00.000Z", "interval": "hourly", "uptime": 0.0, "heavy": 0.0, "car": 0.0, "bike": 0.0, "pedestrian": 0.0, "heavy_lft": 0.0, "heavy_rgt": 0.0, "car_lft": 0.0, "car_rgt": 0.0, "bike_lft": 0.0, "bike_rgt": 0.0, "pedestrian_lft": 0.0, "pedestrian_rgt": 0.0, "direction": 1, "car_speed_hist_0to70plus": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], "car_speed_hist_0to120plus": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], "timezone": "Europe/Brussels", "v85": null}, {"instance_id": -1, "segment_id": 347295, "date": "2022-06-01T01:00:00.000Z", "interval": "hourly", "uptime": 0.0, "heavy": 0.0, "car": 0.0, "bike": 0.0, "pedestrian": 0.0, "heavy_lft": 0.0, "heavy_rgt": 0.0, "car_lft": 0.0, "car_rgt": 0.0, "bike_lft": 0.0, "bike_rgt": 0.0, "pedestrian_lft": 0.0, "pedestrian_rgt": 0.0, "direction": 1, "car_s
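Since a single request is limited to about 3 months, a longer monitoring period would have to be requested in several pieces. The sketch below shows one way to split a period into consecutive windows and build one request body per window. The helper names `split_period` and `to_api_time` are hypothetical, and the 90-day window length is our own conservative assumption, since the exact limit is not spelled out here. No request is sent.

```python
from datetime import datetime, timedelta


def split_period(start, end, max_days=90):
    """Split [start, end) into consecutive windows of at most max_days days.

    max_days=90 is an assumed, conservative stand-in for the 3-month limit.
    """
    windows = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + timedelta(days=max_days), end)
        windows.append((window_start, window_end))
        window_start = window_end
    return windows


def to_api_time(moment):
    """Format a datetime in the same style as the request body above."""
    return moment.strftime("%Y-%m-%d %H:%M:%SZ")


# Example: the full 2022 monitoring period of segment 347295
windows = split_period(datetime(2022, 1, 1), datetime(2022, 9, 29))

# One request body per window, with the same fields as the single request above
bodies = [
    {
        "id": "347295",
        "time_start": to_api_time(window_start),
        "time_end": to_api_time(window_end),
        "level": "segments",
        "format": "per-hour",
    }
    for window_start, window_end in windows
]
```

Each body could then be posted in turn and the resulting `report` lists concatenated into one dataframe.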

In [4]:
#report_json = response.json()  # not named `json`, which would shadow the imported json module
#data_segment347295 = pd.DataFrame(report_json['report'])  # create dataframe from the JSON report
#print(data_segment347295.head(10))


   instance_id  segment_id                      date interval    uptime   
0           -1      347295  2022-06-01T00:00:00.000Z   hourly  0.000000  \
1           -1      347295  2022-06-01T01:00:00.000Z   hourly  0.000000   
2           -1      347295  2022-06-01T02:00:00.000Z   hourly  0.000000   
3           -1      347295  2022-06-01T03:00:00.000Z   hourly  0.146111   
4           -1      347295  2022-06-01T04:00:00.000Z   hourly  0.791667   
5           -1      347295  2022-06-01T05:00:00.000Z   hourly  0.794167   
6           -1      347295  2022-06-01T06:00:00.000Z   hourly  0.752778   
7           -1      347295  2022-06-01T07:00:00.000Z   hourly  0.755556   
8           -1      347295  2022-06-01T08:00:00.000Z   hourly  0.756111   
9           -1      347295  2022-06-01T09:00:00.000Z   hourly  0.750556   

       heavy         car        bike  pedestrian  heavy_lft  ...    car_rgt   
0   0.000000    0.000000    0.000000    0.000000   0.000000  ...   0.000000  \
1   0.000000    

In [5]:
#data_segment347295.to_csv('Data/TelraamCalvariekapel.csv')

In [2]:
df = pd.read_csv('Data/TelraamCalvariekapel.csv', header = 0, sep=',')
df.head(10)

Unnamed: 0.1,Unnamed: 0,instance_id,segment_id,date,interval,uptime,heavy,car,bike,pedestrian,...,car_rgt,bike_lft,bike_rgt,pedestrian_lft,pedestrian_rgt,direction,car_speed_hist_0to70plus,car_speed_hist_0to120plus,timezone,v85
0,0,-1,347295,2022-06-01T00:00:00.000Z,hourly,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",Europe/Brussels,
1,1,-1,347295,2022-06-01T01:00:00.000Z,hourly,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",Europe/Brussels,
2,2,-1,347295,2022-06-01T02:00:00.000Z,hourly,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",Europe/Brussels,
3,3,-1,347295,2022-06-01T03:00:00.000Z,hourly,0.146111,20.532319,6.844106,20.532319,0.0,...,0.0,13.688213,6.844106,0.0,0.0,1,"[0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0]","[0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0...",Europe/Brussels,21.5
4,4,-1,347295,2022-06-01T04:00:00.000Z,hourly,0.791667,37.894737,87.157895,56.842105,2.526316,...,18.947368,29.052632,27.789474,0.0,2.526316,1,"[7.2463768116, 15.9420289855, 55.0724637681, 8...","[0.0, 7.2463768116, 2.8985507246, 13.043478260...",Europe/Brussels,35.0
5,5,-1,347295,2022-06-01T05:00:00.000Z,hourly,0.794167,33.997901,156.13851,175.026233,2.518363,...,26.442812,81.8468,93.179433,0.0,2.518363,1,"[1.6129032258, 21.7741935484, 54.0322580645, 9...","[1.6129032258, 0.0, 6.4516129032, 15.322580645...",Europe/Brussels,35.5
6,6,-1,347295,2022-06-01T06:00:00.000Z,hourly,0.752778,49.151292,263.02583,475.571956,19.926199,...,50.479705,257.712177,217.859779,2.656827,17.269373,1,"[4.5454545455, 33.3333333333, 48.4848484848, 5...","[2.0202020202, 2.5252525253, 7.0707070707, 26....",Europe/Brussels,27.0
7,7,-1,347295,2022-06-01T07:00:00.000Z,hourly,0.755556,29.117647,210.441176,338.823529,9.264706,...,70.147059,203.823529,135.0,2.647059,6.617647,1,"[16.9811320755, 22.641509434, 34.5911949686, 1...","[11.320754717, 5.6603773585, 7.5471698113, 15....",Europe/Brussels,37.0
8,8,-1,347295,2022-06-01T08:00:00.000Z,hourly,0.756111,34.386481,197.060985,234.092579,13.225569,...,67.450404,122.997796,111.094783,6.612785,6.612785,1,"[24.1610738255, 20.8053691275, 33.5570469799, ...","[10.7382550336, 13.4228187919, 9.3959731544, 1...",Europe/Brussels,33.5
9,9,-1,347295,2022-06-01T09:00:00.000Z,hourly,0.750556,14.655811,242.487047,182.531458,9.326425,...,70.61436,125.240563,57.290896,5.329386,3.997039,1,"[29.1208791209, 28.5714285714, 26.3736263736, ...","[23.6263736264, 5.4945054945, 6.5934065934, 21...",Europe/Brussels,28.5
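The `Unnamed: 0` column in the loaded data is the row index that `to_csv` writes by default. The sketch below, on a small made-up frame and with in-memory buffers instead of files, shows two common ways to avoid it: reading the stored index back with `index_col=0`, or not writing it at all with `index=False`. It is an illustration only; the columns are dropped later anyway, so the results in this notebook are not affected.

```python
import io

import pandas as pd

# Small made-up frame standing in for the scraped data
raw = pd.DataFrame({"segment_id": [347295, 347295], "car": [0.0, 6.844106]})

# Default behaviour: the row index is written as an unnamed first column
buffer_with_index = io.StringIO()
raw.to_csv(buffer_with_index)

buffer_with_index.seek(0)
reloaded_default = pd.read_csv(buffer_with_index)  # gains an 'Unnamed: 0' column

# Option 1: tell read_csv that the first column is the index
buffer_with_index.seek(0)
reloaded_index_col = pd.read_csv(buffer_with_index, index_col=0)

# Option 2: do not write the index in the first place
buffer_no_index = io.StringIO()
raw.to_csv(buffer_no_index, index=False)
buffer_no_index.seek(0)
reloaded_no_index = pd.read_csv(buffer_no_index)
```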


### Preprocessing segment 347295

In the following section, we preprocess the Telraam data for segment 347295. We start by keeping only the columns that are useful for modelling: the counted numbers of pedestrians, cyclists, cars, and heavy vehicles. The speed-related measurements are not included, since Telraam reports that their accuracy is around 10%. The date and timezone variables are kept in order to create a date variable on which the noise and weather data can be merged later on.
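One point to keep in mind when building the merge key: the `date` strings end in `Z`, which normally denotes UTC, while the `timezone` column says `Europe/Brussels`. If the noise and weather data are recorded in local time (an assumption we have not verified here), the timestamps would need to be converted before extracting the day and hour. A minimal sketch of such a conversion, where the column names `local_date` and `local_hour` are hypothetical:

```python
import pandas as pd

# Two sample timestamps in the format returned by the API
sample = pd.DataFrame({
    "date": ["2022-06-01T00:00:00.000Z", "2022-06-01T22:00:00.000Z"],
    "timezone": ["Europe/Brussels", "Europe/Brussels"],
})

# Parse as timezone-aware UTC, then convert to local Brussels time
utc_time = pd.to_datetime(sample["date"], utc=True)
local_time = utc_time.dt.tz_convert("Europe/Brussels")

# Day and hour as they would appear on a local clock
sample["local_date"] = local_time.dt.strftime("%d-%m-%Y")
sample["local_hour"] = local_time.dt.hour
```

In June, Brussels is two hours ahead of UTC, so the second sample row falls on the next local day. Parsing the strings without a time zone, on the other hand, keeps the UTC day and hour.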

In [17]:
# Pipeline
# Step 1: Drop unnecessary columns
def drop_columns(df):
    columns_to_keep = [ 'date', 'heavy', 'car', 'bike', 'pedestrian', 'timezone']
    columns_to_drop = set(df.columns) - set(columns_to_keep)
    return df.drop(columns=columns_to_drop)

# Step 2: Add Calvariekapel as location 
def add_location(df):
    df['description'] = 'MP 05: Calvariekapel KU Leuven'
    return df

# Step 3: Convert timestamps to datetime
def convert_to_datetime(df):
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ')
    return df

# Step 4: Extract month, day, hour from timestamps
def extract_time(df):
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df['hour'] = df['date'].dt.hour
    return df

# Step 5: Define a custom transformer that adds the year and rewrites 'date' as a DD-MM-YYYY string
class DateTransformer:
    def transform(self, df):
        # Take the year from the timestamp itself (2022 for all scraped rows)
        df['year'] = df['date'].dt.year
        # Vectorized formatting; same result as rebuilding the date row by row from day/month/year
        df['date'] = df['date'].dt.strftime('%d-%m-%Y')
        return df

    def fit(self, df, y=None):
        return self
    
# Define the pipeline
pipeline_traffic_hourly = Pipeline([
    ('drop_columns', FunctionTransformer(drop_columns)),
    ('add_location', FunctionTransformer(add_location)),
    ('convert_to_datetime', FunctionTransformer(convert_to_datetime)),
    ('extract_time', FunctionTransformer(extract_time)),
    ('date_transformer', DateTransformer())
])

In [19]:
# Apply the pipeline
traffic_preprocessed = pipeline_traffic_hourly.fit_transform(df)
traffic_preprocessed.head()

Unnamed: 0,date,heavy,car,bike,pedestrian,timezone,description,month,day,hour,year
0,01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,0,2022
1,01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,1,2022
2,01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,2,2022
3,01-06-2022,20.532319,6.844106,20.532319,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,3,2022
4,01-06-2022,37.894737,87.157895,56.842105,2.526316,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,4,2022
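As a first descriptive step on the preprocessed frame, the hourly counts can be summed per day. The sketch below uses a small frame with made-up, rounded values in the same column layout; `count_columns` and `daily_totals` are names introduced only for this illustration.

```python
import pandas as pd

# Made-up hourly rows in the layout of the preprocessed frame
hourly = pd.DataFrame({
    "date": ["01-06-2022", "01-06-2022", "02-06-2022"],
    "hour": [3, 4, 0],
    "heavy": [20.5, 37.9, 1.0],
    "car": [6.8, 87.2, 2.0],
    "bike": [20.5, 56.8, 3.0],
    "pedestrian": [0.0, 2.5, 4.0],
})

count_columns = ["heavy", "car", "bike", "pedestrian"]

# Sum the hourly counts per day; sort=False keeps the days in order of appearance,
# which matters because DD-MM-YYYY strings do not sort chronologically
daily_totals = hourly.groupby("date", sort=False)[count_columns].sum()
```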


In [20]:
# Export the file (only needs to be run once; comment this line out afterwards)
traffic_preprocessed.to_csv('hourly_traffic_2022.csv', index=False)  

In [8]:
'''
# Drop unnecessary columns
columns_to_keep = [ 'date', 'heavy', 'car', 'bike', 'pedestrian', 'timezone']
df = df[columns_to_keep]
df.head(5)
'''

Unnamed: 0,date,heavy,car,bike,pedestrian,timezone
0,2022-06-01T00:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels
1,2022-06-01T01:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels
2,2022-06-01T02:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels
3,2022-06-01T03:00:00.000Z,20.532319,6.844106,20.532319,0.0,Europe/Brussels
4,2022-06-01T04:00:00.000Z,37.894737,87.157895,56.842105,2.526316,Europe/Brussels


We add the location to the dataframe; it corresponds to the 'description' variable in the noise data.

In [10]:
'''
# Add Calvariekapel as location 
df['description'] = 'MP 05: Calvariekapel KU Leuven'
df.head(5)
'''

Unnamed: 0,date,heavy,car,bike,pedestrian,timezone,description
0,2022-06-01T00:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven
1,2022-06-01T01:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven
2,2022-06-01T02:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven
3,2022-06-01T03:00:00.000Z,20.532319,6.844106,20.532319,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven
4,2022-06-01T04:00:00.000Z,37.894737,87.157895,56.842105,2.526316,Europe/Brussels,MP 05: Calvariekapel KU Leuven


We create new variables based on the recorded date.

In [11]:
'''
df['year'] = pd.to_datetime(df['date']).dt.year
df['month'] = pd.to_datetime(df['date']).dt.month
df['day'] = pd.to_datetime(df['date']).dt.day
df['hour'] = pd.to_datetime(df['date']).dt.hour
df.head(5)
'''

Unnamed: 0,date,heavy,car,bike,pedestrian,timezone,description,year,month,day,hour
0,2022-06-01T00:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,2022,6,1,0
1,2022-06-01T01:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,2022,6,1,1
2,2022-06-01T02:00:00.000Z,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,2022,6,1,2
3,2022-06-01T03:00:00.000Z,20.532319,6.844106,20.532319,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,2022,6,1,3
4,2022-06-01T04:00:00.000Z,37.894737,87.157895,56.842105,2.526316,Europe/Brussels,MP 05: Calvariekapel KU Leuven,2022,6,1,4


We create a new 'result_date' variable which corresponds to the 'result_date' variable in the hourly noise dataset.