# Scraping Telraam Data

Inspired by the 2020 Datathon (https://lstat.kuleuven.be/Datathon2020/data) and the results of our NLP notebook, we chose to scrape additional traffic data from Telraam. A Telraam device counts the pedestrians, cyclists, and vehicles that pass its sensor. This traffic data could be a valuable addition to the noise and weather data, since the noise sources are often people or vehicles. We found the locations of the Telraam monitors on the Telraam map (https://www.telraam.net/en#15/50.8748/4.7005). In the neighbourhood of Naamsestraat, three Telraam monitors measure traffic passing some of the locations in our noise dataset. However, the monitors tracked traffic for different periods in 2022, and one of them recorded nothing that year.

- The first Telraam on Naamsestraat tracked traffic passing Naamsestraat 81 and Parkstraat 2 (La Filosofia), but recorded no data in 2022. (https://www.telraam.net/en/location/351160)
- The second Telraam on Naamsestraat monitors traffic passing the Calvariekapel and His&Hears, and covers the period from 01-01-2022 until 29-09-2022. (https://www.telraam.net/en/location/347295)
- The third Telraam on Naamsestraat measures traffic close to (but not passing) Naamsestraat 62 (Taste), and covers the period from 01-01-2022 until 07-10-2022. (https://www.telraam.net/en/location/9000000637)

Based on the location and the monitoring period, we chose to scrape only the data of the second device (segment 347295).

In this notebook, we scrape the data and explore it with some descriptive statistics and basic visualizations. The dataset will later be used for modelling in another notebook. More information on Telraam can be found at https://telraam.net/self-measure/what-is-telraam. To obtain an API key, we created a personal account on the Telraam website. More information on how to scrape the data can be found at https://documenter.getpostman.com/view/8210376/TWDRqyaV and https://telraam.helpspace-docs.io/article/27/you-wish-more-data-and-statistics-telraam-api.


### Scraping segment 347295

In [25]:
# Loading packages
import requests
import json
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

In [26]:
#url = "https://telraam-api.net/v1/reports/traffic"

#body = {
#    "id":"347295", # the segment id for the 2nd Telraam device
#    "time_start":"2022-06-01 00:00:00Z", # request time interval is limited to 3 months, therefore we choose the period June-August 2022 since Calvariekapel has noise data for this period
#    "time_end":"2022-09-01 00:00:00Z",
#    "level":"segments",
#    "format":"per-hour"
#}

#headers = {
#    'X-Api-Key': 'YOUR_TELRAAM_API_KEY' # placeholder: never commit a real key, load it from a local config or environment variable
#}

#payload = json.dumps(body) # serialise to valid JSON; str(body) would produce single-quoted Python syntax

#response = requests.request("POST", url, headers=headers, data=payload)
#print(response.text)
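
The request comment above notes that a single request may span at most three months. The following is a minimal sketch of how a longer period could be split into several request bodies, each serialised with `json.dumps`. The 90-day window, the helper name `build_payloads`, and the environment variable `TELRAAM_API_KEY` are assumptions made for illustration; they are not part of the Telraam API or of this notebook. No request is sent.

```python
import json
import os
from datetime import datetime, timedelta, timezone

# Assumption: 90 days is used as a conservative stand-in for the
# "3 months" limit mentioned in the request comment.
MAX_WINDOW = timedelta(days=90)


def build_payloads(segment_id, start, end, max_window=MAX_WINDOW):
    """Return one JSON request body per window of at most `max_window`."""
    payloads = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + max_window, end)
        body = {
            "id": segment_id,
            "time_start": window_start.strftime("%Y-%m-%d %H:%M:%SZ"),
            "time_end": window_end.strftime("%Y-%m-%d %H:%M:%SZ"),
            "level": "segments",
            "format": "per-hour",
        }
        payloads.append(json.dumps(body))  # valid JSON string
        window_start = window_end
    return payloads


# Hypothetical variable name; reading the key from the environment keeps it
# out of the notebook source.
headers = {"X-Api-Key": os.environ.get("TELRAAM_API_KEY", "")}

payloads = build_payloads(
    "347295",
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 9, 1, tzinfo=timezone.utc),
)
print(len(payloads))
```

Each element of `payloads` could then be passed as `data=` in a separate POST request, with the resulting data frames concatenated afterwards.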


In [27]:
#report = response.json() # named `report` so that the imported json module is not shadowed
#data_segment347295 = pd.DataFrame(report['report']) # create dataframe from json object
#print(data_segment347295.head(10))


In [28]:
#data_segment347295.to_csv('Data for modelling/TelraamCalvariekapel.csv')

In [29]:
df = pd.read_csv('Data for modelling/TelraamCalvariekapel.csv', header = 0, sep=',') # same relative path as the export above
df.head(410)

In [None]:
len(df)

410
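
A full hourly series for June through August would contain 92 × 24 = 2208 rows, so it can be useful to list which hours are absent from the export. The sketch below is one possible way to do that on a toy example. The helper name `missing_hours` and the toy timestamps are illustrative assumptions, and the notebook itself does not state why the file holds 410 rows.

```python
import pandas as pd


def missing_hours(timestamps, start, end):
    """Return the hourly UTC timestamps in [start, end) absent from `timestamps`."""
    start_ts = pd.Timestamp(start, tz="UTC")
    end_ts = pd.Timestamp(end, tz="UTC")
    expected = pd.date_range(start_ts, end_ts, freq="h")
    expected = expected[expected < end_ts]  # make the end exclusive
    observed = pd.DatetimeIndex(pd.to_datetime(timestamps, utc=True)).floor("h")
    return expected.difference(observed)


# Number of hourly slots between 1 June 00:00 and 31 August 23:00 inclusive
expected_rows = len(pd.date_range("2022-06-01", "2022-08-31 23:00", freq="h"))

# Toy stand-in for df['date']; the hour 02:00 is left out on purpose
toy_dates = [
    "2022-06-01T00:00:00.000Z",
    "2022-06-01T01:00:00.000Z",
    "2022-06-01T03:00:00.000Z",
]
gaps = missing_hours(toy_dates, "2022-06-01", "2022-06-01 04:00")
print(expected_rows, list(gaps))
```

Applied to the real data, `missing_hours(df['date'], '2022-06-01', '2022-09-01')` would return the hours for which the export has no row.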

### Preprocessing segment 347295

In the following section, we preprocess the Telraam data for segment 347295. We start by keeping only the columns that are useful for modelling: the counted numbers of pedestrians, cyclists, cars, and heavy vehicles. The speed-related measurements are not included in the modelling, since Telraam reports that their accuracy is around 10%. The date and timezone variables are kept in order to create a date variable on which the noise and weather data can be merged later on.

- Traffic data for modelling

In [None]:
# Pipeline
# Step 1: Add Calvariekapel as location 
def add_location(df):
    df['description'] = 'MP 05: Calvariekapel KU Leuven'
    return df

# Step 2: Convert timestamps to datetime
def convert_to_datetime(df):
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ')
    return df

# Step 3: Extract month, day, hour from timestamps
def extract_time(df):
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df['hour'] = df['date'].dt.hour
    return df

# Step 4: Define a custom transformer to create the new column date
class DateTransformer:
    def transform(self, df):
        df['year'] = 2022
        # Rebuild the timestamp, truncated to the hour, from its parts in one vectorized call
        df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour']])
        df['date'] = df['date'].dt.strftime('%H:%M %d-%m-%Y')
        return df

    def fit(self, df, y=None):
        return self
    
# Step 5: Drop unnecessary columns
def drop_columns(df):
    columns_to_keep = ['date','heavy','car','bike','pedestrian','timezone','description','month','day','hour','year']
    columns_to_drop = set(df.columns) - set(columns_to_keep)
    return df.drop(columns=columns_to_drop)
    
# Define the pipeline
pipeline_traffic_hourly = Pipeline([
    ('add_location', FunctionTransformer(add_location)),
    ('convert_to_datetime', FunctionTransformer(convert_to_datetime)),
    ('extract_time', FunctionTransformer(extract_time)),
    ('date_transformer', DateTransformer()),
    ('drop_columns', FunctionTransformer(drop_columns))
])

In [None]:
# Apply the pipeline
traffic_preprocessed = pipeline_traffic_hourly.fit_transform(df)
traffic_preprocessed.head()

Unnamed: 0,date,heavy,car,bike,pedestrian,timezone,description,month,day,hour,year
0,00:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,0,2022
1,01:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,1,2022
2,02:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,2,2022
3,03:00 01-06-2022,20.532319,6.844106,20.532319,0.0,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,3,2022
4,04:00 01-06-2022,37.894737,87.157895,56.842105,2.526316,Europe/Brussels,MP 05: Calvariekapel KU Leuven,6,1,4,2022
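
The pipeline parses timestamps with a trailing `Z`, which conventionally marks UTC, and keeps the hour unchanged while the `timezone` column reads Europe/Brussels. If the noise data is recorded in local time (an assumption this notebook does not confirm), the two series would be offset by two hours in summer. The sketch below shows how UTC timestamps could be converted to local time before building the `date` label; the helper name `to_local_hour_label` is hypothetical.

```python
import pandas as pd


def to_local_hour_label(utc_strings, tz="Europe/Brussels"):
    """Parse UTC timestamps, convert them to `tz`, and format them like the `date` column."""
    utc = pd.to_datetime(pd.Series(utc_strings), utc=True)
    local = utc.dt.tz_convert(tz)
    return local.dt.strftime("%H:%M %d-%m-%Y")


labels = to_local_hour_label(
    ["2022-06-01T03:00:00.000Z", "2022-08-31T22:00:00.000Z"]
)
print(labels.tolist())
# → ['05:00 01-06-2022', '00:00 01-09-2022']
```

Note that the second timestamp moves into September after conversion, so the month filter used later would also be affected by this choice.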


In [None]:
min(traffic_preprocessed['month'])
max(traffic_preprocessed['month'])

8

In [None]:
# exporting file (only needs to be run one time so comment it out)
# traffic_preprocessed.to_csv('Data for modelling/hourly_traffic_2022.csv', index=False)

- Merge traffic, noise, and weather data together for reference

In [None]:
# Load hourly noise data
noise_hourly = pd.read_csv('Data for visualization/hourly_noisedata_2022.csv', header = 0, sep=',')
noise_hourly = noise_hourly.drop(['lamax_standardized', 'laeq_standardized'], axis = 1)
noise_hourly.head(10)

Unnamed: 0,month,day,hour,description,lamax,laeq,date
0,1,1,0,MP 03: Naamsestraat 62 Taste,60.322528,57.126833,00:00 01-01-2022
1,1,1,0,MP 05: Calvariekapel KU Leuven,53.230972,49.987639,00:00 01-01-2022
2,1,1,0,MP 06: Parkstraat 2 La Filosovia,53.666056,50.752,00:00 01-01-2022
3,1,1,0,MP 07: Naamsestraat 81,50.056861,47.440222,00:00 01-01-2022
4,1,1,1,MP 03: Naamsestraat 62 Taste,53.033583,50.853806,01:00 01-01-2022
5,1,1,1,MP 05: Calvariekapel KU Leuven,53.599639,50.578806,01:00 01-01-2022
6,1,1,1,MP 06: Parkstraat 2 La Filosovia,59.880694,56.942167,01:00 01-01-2022
7,1,1,1,MP 07: Naamsestraat 81,50.097278,47.878333,01:00 01-01-2022
8,1,1,2,MP 03: Naamsestraat 62 Taste,52.173702,50.049903,02:00 01-01-2022
9,1,1,2,MP 05: Calvariekapel KU Leuven,51.078083,47.974361,02:00 01-01-2022


In [None]:
# Select noise data only at MP 05 from june to august
noise_hourly_MP05 = noise_hourly[noise_hourly['description'] == 'MP 05: Calvariekapel KU Leuven']
noise_hourly_MP05 = noise_hourly_MP05[noise_hourly_MP05['month'].isin([6, 7, 8])]
noise_hourly_MP05.head(100000)

Unnamed: 0,month,day,hour,description,lamax,laeq,date
20397,6,1,0,MP 05: Calvariekapel KU Leuven,46.389500,43.854917,00:00 01-06-2022
20404,6,1,1,MP 05: Calvariekapel KU Leuven,42.325056,39.724889,01:00 01-06-2022
20411,6,1,2,MP 05: Calvariekapel KU Leuven,40.794889,38.388694,02:00 01-06-2022
20418,6,1,3,MP 05: Calvariekapel KU Leuven,40.153528,38.418528,03:00 01-06-2022
20425,6,1,4,MP 05: Calvariekapel KU Leuven,41.507194,39.652750,04:00 01-06-2022
...,...,...,...,...,...,...,...
35818,8,31,19,MP 05: Calvariekapel KU Leuven,57.021222,55.048833,19:00 31-08-2022
35825,8,31,20,MP 05: Calvariekapel KU Leuven,55.358528,53.073083,20:00 31-08-2022
35832,8,31,21,MP 05: Calvariekapel KU Leuven,54.855583,52.532750,21:00 31-08-2022
35839,8,31,22,MP 05: Calvariekapel KU Leuven,52.758611,50.213583,22:00 31-08-2022


In [None]:
# Merge traffic data with noise data
merge_columns = ['date', 'month','day','hour','description']
merged_traffic_noise = pd.merge(noise_hourly_MP05, traffic_preprocessed, on=merge_columns,  how='left')
merged_traffic_noise = merged_traffic_noise.drop('year', axis=1)
merged_traffic_noise['timezone'] = 'Europe/Brussels'
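
Because the merge above is a left join, noise rows without a matching traffic row are kept with missing traffic counts. One way to see how many rows matched is the `indicator` option of `pd.merge`, sketched here on toy frames whose values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for noise_hourly_MP05 and traffic_preprocessed
noise = pd.DataFrame({
    "date": ["00:00 01-06-2022", "01:00 01-06-2022", "02:00 01-06-2022"],
    "laeq": [43.9, 39.7, 38.4],
})
traffic = pd.DataFrame({
    "date": ["00:00 01-06-2022", "02:00 01-06-2022"],
    "car": [0.0, 0.0],
})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
checked = pd.merge(noise, traffic, on="date", how="left", indicator=True)
match_counts = checked["_merge"].value_counts()
print(match_counts)
```

Rows marked `left_only` are noise measurements for which no traffic count exists, which is worth knowing before the merged file is used for modelling.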


In [None]:
# Load weather data
weather_hourly = pd.read_csv('Data for visualization/hourly_weatherdata_2022.csv', header = 0, sep=',')
weather_hourly.head()

Unnamed: 0,Month,Day,Hour,LC_RAININ,LC_DAILYRAIN,LC_WINDDIR,LC_WINDSPEED,LC_TEMP_QCL3,DATECEST
0,1,1,0,2.3e-05,0.002997,-33.566358,1.487099,15.513391,2023-01-01 00:25:00.000000000
1,1,1,1,1.9e-05,0.002174,-29.188272,1.465571,15.770757,2023-01-01 01:25:00.000000000
2,1,1,2,3e-06,0.00036,-18.197324,0.389565,13.100358,2022-03-08 00:28:59.799331072
3,1,1,3,7e-06,0.0,-16.227891,0.222602,12.669197,2022-01-01 03:25:00.000000000
4,1,1,4,9e-06,0.0,-13.710884,0.217194,12.520271,2022-01-01 04:25:00.000000000


In [None]:
# Change column names for merging
new_column_names = {'Month': 'month', 'Day': 'day', 'Hour': 'hour'}
weather_hourly = weather_hourly.rename(columns=new_column_names)

In [None]:
# Select weather data only in jun, jul, and aug
weather_hourly = weather_hourly[weather_hourly['month'].isin([6, 7, 8])]
weather_hourly.head()

Unnamed: 0,month,day,hour,LC_RAININ,LC_DAILYRAIN,LC_WINDDIR,LC_WINDSPEED,LC_TEMP_QCL3,DATECEST
3624,6,1,0,0.0,0.0,-1.339506,0.007145,10.378574,2022-06-01 00:25:00.000000000
3625,6,1,1,0.0,0.0,-2.313272,0.020062,9.603619,2022-06-01 01:25:00.000000000
3626,6,1,2,0.0,0.0,-4.881173,0.014985,9.042755,2022-06-01 02:25:00.000000000
3627,6,1,3,0.0,0.0,-3.787037,0.013488,8.435343,2022-06-01 03:25:00.000000000
3628,6,1,4,0.0,0.0,-1.902778,0.025355,7.976961,2022-06-01 04:25:00.000000000


In [None]:
# Merge traffic and noise data with weather data
merge_columns = ['month','day','hour']
merged_traffic_noise_weather = pd.merge(merged_traffic_noise, weather_hourly, on=merge_columns,  how='left')
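
The weather merge uses only `month`, `day`, and `hour` as keys, so it relies on the weather table holding exactly one row per hour. The weather preview shows `DATECEST` values from different years, so checking key uniqueness before merging may be worthwhile. The sketch below uses toy frames with made-up values to show a duplicate check and the `validate` option of `pd.merge`.

```python
import pandas as pd

keys = ["month", "day", "hour"]

# Toy stand-ins; hour 1 appears twice in the weather frame on purpose
weather = pd.DataFrame({
    "month": [6, 6, 6],
    "day": [1, 1, 1],
    "hour": [0, 1, 1],
    "LC_TEMP_QCL3": [10.4, 9.6, 9.7],
})
noise = pd.DataFrame({
    "month": [6, 6],
    "day": [1, 1],
    "hour": [0, 1],
    "laeq": [43.9, 39.7],
})

# All rows whose key combination occurs more than once
duplicate_keys = weather[weather.duplicated(subset=keys, keep=False)]
print(len(duplicate_keys))

# validate="many_to_one" raises if the right-hand keys are not unique
try:
    pd.merge(noise, weather, on=keys, how="left", validate="many_to_one")
    merge_ok = True
except pd.errors.MergeError as err:
    merge_ok = False
    print(f"merge rejected: {err}")

# After removing duplicate keys the same merge passes validation
deduped = weather.drop_duplicates(subset=keys)
merged = pd.merge(noise, deduped, on=keys, how="left", validate="many_to_one")
```

Without such a check, duplicate keys would silently multiply rows in the merged output.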

In [None]:
merged_traffic_noise_weather.head()

Unnamed: 0,month,day,hour,description,lamax,laeq,date,heavy,car,bike,pedestrian,timezone,LC_RAININ,LC_DAILYRAIN,LC_WINDDIR,LC_WINDSPEED,LC_TEMP_QCL3,DATECEST
0,6,1,0,MP 05: Calvariekapel KU Leuven,46.3895,43.854917,00:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,0.0,0.0,-1.339506,0.007145,10.378574,2022-06-01 00:25:00.000000000
1,6,1,1,MP 05: Calvariekapel KU Leuven,42.325056,39.724889,01:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,0.0,0.0,-2.313272,0.020062,9.603619,2022-06-01 01:25:00.000000000
2,6,1,2,MP 05: Calvariekapel KU Leuven,40.794889,38.388694,02:00 01-06-2022,0.0,0.0,0.0,0.0,Europe/Brussels,0.0,0.0,-4.881173,0.014985,9.042755,2022-06-01 02:25:00.000000000
3,6,1,3,MP 05: Calvariekapel KU Leuven,40.153528,38.418528,03:00 01-06-2022,20.532319,6.844106,20.532319,0.0,Europe/Brussels,0.0,0.0,-3.787037,0.013488,8.435343,2022-06-01 03:25:00.000000000
4,6,1,4,MP 05: Calvariekapel KU Leuven,41.507194,39.65275,04:00 01-06-2022,37.894737,87.157895,56.842105,2.526316,Europe/Brussels,0.0,0.0,-1.902778,0.025355,7.976961,2022-06-01 04:25:00.000000000


In [None]:
# exporting file (only needs to be run one time so we comment it out)
#merged_traffic_noise_weather.to_csv('Data for modelling/hourly_traffic_noise_weather_2022.csv', index=False)