# Flow visualization - Oslo City Bike Station Network

### Authors: Niclas Classen & Manuel Knepper
**TL;DR:** This notebook visualizes the flow of bikes between the stations of the Oslo City Bike network. The data for the bike rides is taken from https://oslobysykkel.no/en/open-data and the data for the city districts is taken from https://kartkatalog.dev.geonorge.no/metadata/administrative-enheter-kommuner/041f1e6e-bdbc-4091-b48f-8a5990f3cc5b. The purpose of this notebook is to analyze the flow and cycling patterns of the Oslo City Bike network.
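One way to make "flow of bikes between stations" concrete is to count trips per ordered (start station, end station) pair, and to compare departures with arrivals per station. The sketch below does this with pandas `groupby`. It assumes the ride data has `start_station_id` and `end_station_id` columns; this is an assumption about the Oslo City Bike schema, not something this notebook states. `station_flows` and `net_outflow` are hypothetical helper names.

```python
import pandas as pd


def station_flows(rides: pd.DataFrame) -> pd.DataFrame:
    """Count trips per ordered (start, end) station pair, largest flows first.

    Assumes (hypothetically) that `rides` has the columns
    'start_station_id' and 'end_station_id'.
    """
    return (
        rides.groupby(["start_station_id", "end_station_id"])
        .size()
        .reset_index(name="trips")
        .sort_values("trips", ascending=False, ignore_index=True)
    )


def net_outflow(rides: pd.DataFrame) -> pd.Series:
    """Departures minus arrivals per station (positive = station loses bikes)."""
    departures = rides["start_station_id"].value_counts()
    arrivals = rides["end_station_id"].value_counts()
    # Stations that only appear as start or only as end get 0 for the missing side
    return departures.sub(arrivals, fill_value=0).astype(int).sort_index()


if __name__ == "__main__":
    # Tiny made-up example, not real Oslo data
    rides = pd.DataFrame({
        "start_station_id": [1, 1, 2, 3, 1],
        "end_station_id": [2, 2, 1, 1, 3],
    })
    print(station_flows(rides))
    print(net_outflow(rides))
```

Directed pairs are kept separate here (A to B is not B to A), since the imbalance between directions is what drives bikes to accumulate at some stations.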

**Reproducibility:** The repository contains the raw and preprocessed data as well as the code to reproduce the analysis. Depending on where you want to start the analysis, you can skip certain parts of the code as indicated in the notebook. 

## Part 1: Data Preprocessing

#### Data Sources:
- Oslo City Bike Data: https://oslobysykkel.no/en/open-data
- Oslo Districts Geojson: https://kartkatalog.dev.geonorge.no/metadata/administrative-enheter-kommuner/041f1e6e-bdbc-4091-b48f-8a5990f3cc5b

#### Data Structure
The data is stored in the following folder structure:
- `data/`: Contains all the raw and preprocessed data
- `data/monthly`: Contains the monthly raw bike ride data, one CSV file per month named `<year>_<month>.csv`
- `data/oslo_districts.geojson`: Contains the geojson file of the Oslo districts

We recommend using this folder structure, since the analysis then runs with only minor changes to the code (mainly setting the path to the data directory).
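Before running the preprocessing, it can help to check that the data directory matches the layout above. This is a minimal sketch, assuming only the two paths listed there; `check_data_layout` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path


def check_data_layout(data_dir):
    """Return a list of problems with the expected data layout (empty if OK).

    Expected layout (as described in this notebook):
      <data_dir>/monthly/                 monthly raw CSV files
      <data_dir>/oslo_districts.geojson   Oslo districts
    """
    data_dir = Path(data_dir)
    problems = []

    monthly = data_dir / "monthly"
    if not monthly.is_dir():
        problems.append(f"missing directory: {monthly}")
    elif not any(monthly.glob("*.csv")):
        problems.append(f"no CSV files in: {monthly}")

    geojson = data_dir / "oslo_districts.geojson"
    if not geojson.is_file():
        problems.append(f"missing file: {geojson}")

    return problems
```

Calling `check_data_layout(_path)` once the data path is set should return an empty list.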

As a result of the data preprocessing, we obtain the following file:
- `data/preprocessed_bike_rides.csv`: Contains the bike rides of all months combined, with added `year` and `month` columns
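Later parts can load this file back and run a quick sanity check, e.g. rides per month, to spot missing or empty months. This is a sketch under assumptions: the timestamp column names `started_at` and `ended_at` are assumed from the Oslo City Bike export format and are skipped if absent; `load_preprocessed_rides` and `rides_per_month` are hypothetical helpers.

```python
import pandas as pd


def load_preprocessed_rides(csv_path, time_columns=("started_at", "ended_at")):
    """Load the combined ride file and parse assumed timestamp columns as UTC."""
    rides = pd.read_csv(csv_path)
    # Column names are an assumption about the raw schema; skip any that are absent
    for col in time_columns:
        if col in rides.columns:
            rides[col] = pd.to_datetime(rides[col], utc=True)
    return rides


def rides_per_month(rides):
    """Number of rides per (year, month), sorted chronologically."""
    return rides.groupby(["year", "month"]).size().sort_index()
```

For example: `rides_per_month(load_preprocessed_rides(os.path.join(_path, 'preprocessed_bike_rides.csv')))`.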

```python
import os

import pandas as pd
from tqdm import tqdm
```

```python
# Path to your data directory
_path = "/Users/niclasclassen/Code/Master/geospatial-ds-exam/data"  #! Change this to your data directory
```
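Instead of editing the hard-coded path, the data directory could be read from an environment variable with a relative `data` fallback. A minimal sketch; the variable name `OSLO_BIKE_DATA` and the helper `resolve_data_dir` are hypothetical choices for illustration.

```python
import os
from pathlib import Path


def resolve_data_dir(env_var="OSLO_BIKE_DATA", default="data"):
    """Return the data directory from an environment variable, else a relative default.

    `OSLO_BIKE_DATA` is a hypothetical variable name chosen for this sketch.
    """
    return Path(os.environ.get(env_var, default)).expanduser().resolve()
```

With this, `_path = str(resolve_data_dir())` works unchanged on other machines, as long as either the variable is set or the notebook runs from the repository root.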

#### Create one file for all bike rides

```python
# The following runs for about 2 minutes on a 2019 MacBook Pro with 16 GB RAM

# Directory containing the monthly raw files (<year>_<month>.csv)
dir_path = os.path.join(_path, 'monthly')

# Directory to export the combined file
export_path = _path

# Sorted list of all CSV files in the directory, for a reproducible row order
csv_files = sorted(f for f in os.listdir(dir_path) if f.endswith('.csv'))

# List to hold one DataFrame per monthly file
dataframes = []

for file in tqdm(csv_files):
    # Extract year and month from the filename (expected format: <year>_<month>.csv)
    year, month = os.path.splitext(file)[0].split('_')

    # Read the CSV file into a DataFrame
    df = pd.read_csv(os.path.join(dir_path, file))

    # Skip files that are empty or contain only NaN values. This check must run
    # before adding the year/month columns, which are never NaN.
    if df.empty or df.isna().all().all():
        continue

    # Add columns for the month and year
    df['month'] = int(month)
    df['year'] = int(year)

    dataframes.append(df)

# Concatenate all DataFrames into a single DataFrame
if dataframes:
    combined_df = pd.concat(dataframes, ignore_index=True)

    # Write the combined DataFrame to a new CSV file in the export directory
    combined_df.to_csv(os.path.join(export_path, 'preprocessed_bike_rides.csv'), index=False)
else:
    print(f"No non-empty CSV files found in {dir_path}")
```

```
100%|██████████| 61/61 [00:33<00:00,  1.85it/s]
```
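If memory becomes a constraint, the monthly files can instead be appended to the output one at a time with `DataFrame.to_csv(mode="a")`, so only one month is held in memory. This sketch assumes all monthly files share the same columns in the same order (unlike `pd.concat`, appending does not align differing columns). The filename regex encodes the `<year>_<month>.csv` convention implied by the parsing code above; `parse_year_month` and `combine_monthly_streaming` are hypothetical helper names.

```python
import os
import re

import pandas as pd

# Expected monthly filename pattern (assumed): <year>_<month>.csv
FILENAME_RE = re.compile(r"^(\d{4})_(\d{1,2})\.csv$")


def parse_year_month(filename):
    """Return (year, month) as ints, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def combine_monthly_streaming(dir_path, out_path):
    """Append each monthly file to out_path in chronological order.

    Returns the number of rows written.
    """
    # Sort by the parsed (year, month) tuple, so "2019_4" comes before "2019_10"
    files = []
    for name in os.listdir(dir_path):
        parsed = parse_year_month(name)
        if parsed is not None:
            files.append((parsed, name))

    rows_written = 0
    header_written = False
    for (year, month), name in sorted(files):
        df = pd.read_csv(os.path.join(dir_path, name))
        # Skip empty or all-NaN files before adding the (never-NaN) year/month columns
        if df.empty or df.isna().all().all():
            continue
        df["month"] = month
        df["year"] = year
        # "w" creates/overwrites the file with a header; later chunks append without one
        df.to_csv(out_path, mode="a" if header_written else "w",
                  header=not header_written, index=False)
        header_written = True
        rows_written += len(df)
    return rows_written
```

Usage would be `combine_monthly_streaming(os.path.join(_path, 'monthly'), os.path.join(_path, 'preprocessed_bike_rides.csv'))`. The trade-off is that a schema change between months goes unnoticed, so this fits best when the monthly exports are known to be uniform.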


## Part 2: Data Analysis
### 2.1 General data analysis