# **INF161 - Bike Traffic Prediction Project**
### *Ole Kristian Westby | owe009@uib.no | H23*

This project uses data from Statens vegvesen and Geofysisk institutt. The goal is to build a model that predicts the number of cyclists crossing Nygårdsbroen at a given time. To get there, I need to prepare, strip and sort the raw data so that only the data I consider valuable for the task remains. That's what this Jupyter notebook is for, and I'll explain my steps along the way. At the end, the cleaned data is written to /ready_data/, ready for the model to work on.

I recognize that over the years there have been periods when people used bikes more or less frequently because of certain factors. I'll keep a list here and update it as I find them.
- Covid-19 likely kept more people at home, especially at peak times. Fewer people cycled to work because they worked from home. I'm only interested in the peak Covid-19 periods, though.
- 2017 UCI Road World Championships. I've checked the routes and it doesn't look like any of them crossed Nygårdsbroen, but I will look closer at the data later.
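
One possible way to account for periods like these is to mark them with a boolean column, so the model (or a later filter) can treat them separately. The sketch below is only an illustration: the helper `add_period_flag`, the column names and the date window are hypothetical, and the dates are placeholders rather than the actual peak Covid-19 period, which still has to be decided.

```python
import pandas as pd

# Hypothetical placeholder window; the real "peak" dates still need to be chosen.
PEAK_COVID_START = pd.Timestamp("2020-03-01")
PEAK_COVID_END = pd.Timestamp("2020-06-30")


def add_period_flag(df, column, start, end, flag_name):
    """Return a copy of df with a boolean column marking rows inside [start, end]."""
    out = df.copy()
    # Series.between is inclusive on both ends by default.
    out[flag_name] = out[column].between(start, end)
    return out


# Tiny made-up frame, used only to show the mechanics of the flag.
demo = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2020-02-15", "2020-04-01", "2021-01-10"])}
)
flagged = add_period_flag(
    demo, "timestamp", PEAK_COVID_START, PEAK_COVID_END, "peak_covid"
)
print(flagged)
```

The same helper could be reused for the UCI championship dates once the data has been checked.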

#### Let's start by importing some libraries.

In [13]:
import numpy as np
import pandas as pd
import os

#### We'll handle the weather data first.

In [14]:
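# Directory containing the raw weather CSV files
dir_weather = "raw_data/weather_data/"

# Sort the file names so the concatenation order is deterministic
files = sorted(f for f in os.listdir(dir_weather) if f.endswith(".csv"))

# Interesting columns
columns = ["Dato", "Tid", "Solskinstid", "Lufttemperatur", "Vindstyrke", "Vindkast"]

# Read only the interesting columns from each file
dfs = []
for file in files:
    file_path = os.path.join(dir_weather, file)
    df = pd.read_csv(file_path, usecols=columns)
    dfs.append(df)

# Stack all files into one dataframe with a fresh 0..n-1 index
merged_df = pd.concat(dfs, ignore_index=True)
print(merged_df)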

              Dato    Tid  Solskinstid  Lufttemperatur  Vindstyrke  Vindkast
0       2010-01-01  00:00          0.0            -4.6         1.1       NaN
1       2010-01-01  00:10          0.0            -4.1         1.6       NaN
2       2010-01-01  00:20          0.0            -3.5         1.3       NaN
3       2010-01-01  00:30          0.0            -4.1         0.7       NaN
4       2010-01-01  00:40          0.0            -4.4         0.8       NaN
...            ...    ...          ...             ...         ...       ...
709216  2023-06-30  23:10          0.0            13.7         2.3       3.6
709217  2023-06-30  23:20          0.0            13.6         1.9       3.3
709218  2023-06-30  23:30          0.0            13.6         1.7       3.0
709219  2023-06-30  23:40          0.0            13.6         1.9       3.3
709220  2023-06-30  23:50          0.0            13.5         1.9       3.0

[709221 rows x 6 columns]
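
The printed frame shows `Dato` and `Tid` as separate string columns, and `Vindkast` is NaN in the earliest rows. A minimal sketch of how one might inspect this, using four rows copied from the output above as a stand-in for `merged_df`; the column name `timestamp` is an assumption made for this example, not something the notebook defines at this point.

```python
import numpy as np
import pandas as pd

# A few rows copied from the printed output, standing in for merged_df.
sample = pd.DataFrame({
    "Dato": ["2010-01-01", "2010-01-01", "2023-06-30", "2023-06-30"],
    "Tid": ["00:00", "00:10", "23:40", "23:50"],
    "Solskinstid": [0.0, 0.0, 0.0, 0.0],
    "Lufttemperatur": [-4.6, -4.1, 13.6, 13.5],
    "Vindstyrke": [1.1, 1.6, 1.9, 1.9],
    "Vindkast": [np.nan, np.nan, 3.3, 3.0],
})

# Combine the date and time strings into a single timestamp column
# ("timestamp" is a name chosen for this sketch).
sample["timestamp"] = pd.to_datetime(
    sample["Dato"] + " " + sample["Tid"], format="%Y-%m-%d %H:%M"
)

# Count missing values per column; Vindkast is NaN in the 2010 rows.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Running the same two steps on the full `merged_df` would show how much of `Vindkast` is missing before deciding whether to keep that column.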


#### Now we've created one big dataframe containing all interesting weather data from 2010 to 2023. However, the traffic data only goes from 2015-2023, and so I want to clear the dataset for 