# Introduction

## Background

There are nearly 1 million motor vehicles [[1]](https://data.gov.sg/dataset/annual-motor-vehicle-population-by-vehicle-type?view_id=6aca1157-ea79-4e39-9e58-3e5313a9a715&resource_id=dec53407-9f97-47b8-ba89-b2070569a09e) on Singapore's road network of more than 9,500 lane-kilometers, which occupies around 12% of Singapore's total land area [[2]](https://www.lta.gov.sg/content/ltagov/en/who_we_are/our_work/road.html#:~:text=Singapore's%20road%20network%20connects%20all,km%20of%20roads%20and%20expressways). Despite this high proportion of land devoted to roads, Singapore still ranks 88th out of 405 cities worldwide for peak-hour traffic congestion [[3]](https://www.tomtom.com/traffic-index/ranking/). This makes Singapore slightly more congested than Sydney (rank 97) and slightly less congested than Bangkok (rank 74). Singapore's roads have an average congestion level of 29% [[4]](https://www.tomtom.com/traffic-index/singapore-traffic/#), defined as the percentage of additional time drivers spend on their commute during peak hours compared with non-peak hours. On average, Singaporean commuters waste about 105 additional hours per year driving in rush hour.
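
As a rough illustration of the congestion-level definition above, the sketch below computes it for a single hypothetical trip. The trip durations are made up for illustration, and `congestion_level` is a hypothetical helper; TomTom's published index aggregates many trips using its own methodology.

```python
def congestion_level(off_peak_minutes: float, peak_minutes: float) -> float:
    """Extra peak-hour travel time, expressed as a fraction of the off-peak travel time."""
    if off_peak_minutes <= 0:
        raise ValueError("off-peak travel time must be positive")
    return (peak_minutes - off_peak_minutes) / off_peak_minutes

# Hypothetical trip: 100 minutes off-peak vs 129 minutes during rush hour
level = congestion_level(off_peak_minutes=100, peak_minutes=129)
print(f"{level:.0%}")  # 29%
```

A level of 0.29 therefore means that the peak-hour trip takes 29% longer than the same trip off-peak.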

Therefore, it is critical for drivers and commuters to be able to monitor the current traffic conditions, as well as get a prediction on when the conditions are ideal for travel. This will then enable road users to decide on the best time and route for their travel. 

## Existing Solutions

#### Static traffic images
Road users typically rely on various traffic monitoring apps and websites to plan their travel and try to beat the traffic. To assess current traffic conditions, commuters usually turn to websites that display static images from traffic cameras (e.g. [[5]](https://onemotoring.lta.gov.sg/content/onemotoring/home/driving/traffic_information/traffic-cameras/bke.html#trafficCameras), [[6]](https://www.motorist.sg/traffic-cameras), [[7]](https://www.trafficiti.com/)), all of which are sourced from the same traffic cameras made available through the Land Transport Authority (LTA) Datamall API [[8]](https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html).

Because these websites provide only static images, users have to estimate the number of cars in each image and guess the current traffic conditions from a single snapshot. Furthermore, these websites typically do not provide historical traffic images, making it difficult for users to assess the current traffic conditions in the context of overall traffic patterns. Ultimately, since neither vehicle counts nor historical data are provided, users cannot obtain a reasonable forecast of traffic conditions from these websites.

![image](../images/notebook_images/other_traffic_images_website.jpeg)

#### Crowd-sourced traffic density estimations
An alternative to traffic image websites is the apps and websites that provide crowd-sourced traffic density estimates, such as Google Maps or Apple Maps. Besides navigation, some of these apps can display current traffic conditions based on crowd-sourced information (i.e. data collected from fellow app users).

However, this type of crowd-sourced information cannot match the accuracy of actual sensors and traffic measurements. Furthermore, it is vulnerable to manipulation, as demonstrated when someone used 99 smartphones to create a cluster of fake "traffic jams" on Berlin's roads [[9]](https://www.youtube.com/watch?v=k5eL_al_m7Q).

With a recent update, Google Maps can now display 'typical' traffic conditions based on historical crowd-sourced traffic data. This gives users a very basic estimate of future traffic conditions based on average weekly traffic. However, as we shall see later in this project, this so-called 'naive seasonal' prediction based on the previous week's data does not achieve the best accuracy compared with more sophisticated models.
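
The 'naive seasonal' baseline is simple enough to sketch directly. The snippet below is a minimal illustration, not the model code used later in this project: it assumes a regularly spaced pandas series of vehicle counts (30-minute intervals here, so one week is 336 steps) and forecasts each future interval with the value observed exactly one week earlier. The function name and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def naive_seasonal_forecast(history: pd.Series, season_length: int, horizon: int) -> pd.Series:
    """Repeat the last observed season: the forecast for each step is the value one season earlier."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    step = history.index[1] - history.index[0]  # assumes a regular DatetimeIndex
    last_season = history.to_numpy()[-season_length:]
    # tile the last season so that horizons longer than one season are also covered
    n_repeats = int(np.ceil(horizon / season_length))
    values = np.tile(last_season, n_repeats)[:horizon]
    future_index = pd.date_range(start=history.index[-1] + step, periods=horizon, freq=step)
    return pd.Series(values, index=future_index, name=history.name)

# Two weeks of synthetic 30-minute counts (48 intervals per day x 7 days = 336 per week)
season = 48 * 7
index = pd.date_range("2022-10-01", periods=2 * season, freq="30min")
history = pd.Series(np.arange(2 * season), index=index, name="vehicle_count")

# Forecast the following week: it simply replays the second week of the history
forecast = naive_seasonal_forecast(history, season_length=season, horizon=season)
```

Because it only copies past values, this baseline cannot react to anything that differs from the previous week, which is one reason more sophisticated models can outperform it.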

Finally, the traffic information provided by Google Maps lacks granularity: users are shown only a four-level, color-coded indication of traffic conditions, ranging from low to high. While this should be sufficient for road users planning their commute, it may not be sufficient for long-term traffic monitoring or surveys conducted by government agencies for transportation planning.

![image](../images/notebook_images/other_traffic_images_website_gmaps.jpeg)

#### Traffic monitoring and survey for government
Government agencies typically employ a variety of sensors and techniques to count traffic for monitoring and survey purposes. These range from manual counting, which is highly labour-intensive, to radar-based sensors, which carry a high equipment cost. Traffic counts derived from existing traffic cameras could therefore be an efficient, relatively low-cost alternative, especially for preliminary surveys before more accurate but more costly solutions are deployed.

![image](../images/notebook_images/radar_traffic_monitoring.jpg)

## Problem Statement

The goal of this project is to develop an app for **detecting, monitoring, and predicting the vehicle count** along various stretches of highway in Singapore. This will enable users to rapidly assess **historical, current, and future traffic conditions** when making commuting decisions.

The same app can also be deployed by government agencies for **vehicle count monitoring** over longer periods, in order to inform regulatory decisions on transportation planning in Singapore.

## Objectives

The objectives of this project are as follows:
- Apply computer vision and deep learning techniques to extract real-time traffic count from traffic camera images
- Apply time series analysis and modelling to obtain prediction / forecast of future traffic count
- Build a data pipeline for downloading traffic images, counting vehicles, and predicting traffic for various camera locations along Singapore's highways
- Architect a database solution for storing processed and unprocessed traffic camera images, as well as traffic counts and traffic forecasts
- Develop a traffic monitoring app that displays current and historical traffic data as well as future predictions of traffic conditions

# Project Summary

The steps taken in this project are as follows:
1. Scraping of image links and traffic images from data.gov.sg traffic images API (notebooks 1a, 1b) [get links]()
2. Vehicle detection and counting with a pre-trained YOLOv7 model in OpenCV (notebook 2)
    - Object detection done using a pre-trained darknet implementation of YOLOv7 [weights location]()
    - Image masking is done to isolate the region of interest in the traffic images
3. Time series analysis on the vehicle counts obtained from the previous step (notebook 3a)
    - Explore data aggregation methods (e.g. aggregating traffic counts into 30-minute intervals)
    - Perform exploratory data analysis on the seasonal patterns of vehicle counts for the different cameras
    - Obtain descriptive statistics of the vehicle counts for the different cameras
4. Time series modelling for generating predictions and forecast of future vehicle counts (notebook 3b)
    - Investigate the model that yielded the best prediction for a 7-day prediction window
    - Assess the accuracy of the model with and without retraining
    - Assess the accuracy of the model with different prediction windows
5. Develop and productionize the code required for continuous download and detection of traffic images (notebook 4a)
6. Develop the front-end application for displaying the traffic count information and forecasts (notebook 4b)
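
Step 3 mentions aggregating traffic counts into 30-minute intervals. A minimal pandas sketch of that aggregation is shown below, assuming a long table with one row per processed image. The `vehicle_count` column name, the camera IDs, and the use of the mean (rather than, say, the sum or maximum) are illustrative assumptions, not necessarily what the project notebooks use.

```python
import pandas as pd

# Illustrative per-image vehicle counts for two hypothetical cameras
counts = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-10-01 00:00", "2022-10-01 00:05", "2022-10-01 00:35",
        "2022-10-01 00:10", "2022-10-01 00:40",
    ]),
    "camera_id": ["1001", "1001", "1001", "1002", "1002"],
    "vehicle_count": [10, 14, 20, 4, 6],
})

# Average each camera's counts within fixed 30-minute bins (bins are labelled by their start time)
agg = (
    counts.groupby(["camera_id", pd.Grouper(key="timestamp", freq="30min")])["vehicle_count"]
    .mean()
)
print(agg)
```

Camera `1001` ends up with a mean of 12.0 in the 00:00 bin (from 10 and 14) and 20.0 in the 00:30 bin.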

# Key Findings

# Future Works

# Process Overview for Scraping Images

The first step in this traffic analysis project is to scrape the traffic images from LTA. The traffic images are provided by LTA through the [Datamall API](https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html):
- The API returns traffic images from 87 different locations in Singapore's expressways
- The images are updated every 1 to 5 minutes
- These images are retrieved by data.gov.sg and wrapped in another API
- We will be using the data.gov.sg API for the purposes of this project

The process for interacting with the data.gov.sg [Traffic Images API](https://data.gov.sg/dataset/traffic-images) is as follows:
- Query the API with a specific datetime
- The API will retrieve the latest available data at that moment in time, which will contain the following:
    - image metadata (image dimensions, camera longitude & latitude)
    - datetime of acquisition
    - a static link to the traffic image
    - an MD5 hash
    
Therefore, to obtain the traffic images, we would first need to use the data.gov.sg traffic images API to obtain the links to the images.
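
The MD5 hash returned by the API can be used to check that a downloaded image is intact. The sketch below assumes that the `md5` field is the hex digest of the raw image bytes, which we have not verified against the API documentation; `md5_matches` is a hypothetical helper name.

```python
import hashlib

def md5_matches(image_bytes: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 hex digest of the downloaded bytes equals the expected hash."""
    return hashlib.md5(image_bytes).hexdigest() == expected_md5.strip().lower()

# Sanity check with the RFC 1321 test vector MD5("abc")
print(md5_matches(b"abc", "900150983cd24fb0d6963f7d28e17f72"))  # True
```

In practice, the bytes would come from downloading an image link (for example the `.content` of a `requests` response), and a mismatch would indicate a truncated or corrupted download worth retrying.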

For the purposes of this study, we will be scraping traffic images from selected cameras for the entire month of October.
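
To get a sense of the scale of a month of scraping, the snippet below counts how many API calls are needed, assuming October 2022 and the same 5-minute resolution used in the one-hour example later in this notebook.

```python
import datetime

start = datetime.datetime(2022, 10, 1)
end = datetime.datetime(2022, 11, 1)          # exclusive upper bound
resolution = datetime.timedelta(minutes=5)    # assumed, matching the example below

# floor division of two timedeltas returns an int
num_calls = (end - start) // resolution
print(num_calls)  # 8928
```

At the roughly 3 calls per second observed in the one-hour example run, this would take around 47 minutes, although actual throughput depends on network conditions and API response times.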

# Imports

```python
import datetime
import pytz
import requests
import ast
import numpy as np
import urllib
import shutil
import matplotlib.pyplot as plt
import pandas as pd
import os
from tqdm import tqdm
```

# Function Definition for Image Link Scraping

First, we need to define a few functions to interact with the data.gov.sg traffic images API and transform the returned information into a dataframe.

```python
def convert_to_dataframe(input_list):
    '''
    This function converts the output from the API (a list of dictionaries) into a pandas dataframe
    '''
    
    # casting the input list as a dataframe
    output = pd.DataFrame(input_list)

    # expand the 'location' dictionaries into separate columns (latitude, longitude),
    # replacing the original 'location' column
    output = pd.concat([output.drop('location', axis=1),
                        output['location'].apply(pd.Series)],
                       axis=1)

    # expand the 'image_metadata' dictionaries into separate columns (image dimensions and MD5 hash),
    # replacing the original 'image_metadata' column
    output = pd.concat([output.drop('image_metadata', axis=1),
                        output['image_metadata'].apply(pd.Series)],
                       axis=1)
    
    # returning the output dataframe
    return output


def get_lta_traffic_camera_data(datetime_call):
    '''
    This function converts the datetime_call (a datetime object) to the required format,
    calls the traffic images API, and returns the camera data as a dataframe
    '''
    
    # format the datetime_call as YYYY-MM-DDTHH:MM:00, with the colons URL-encoded as %3A
    datetime_call_formatted = datetime_call.strftime("%Y-%m-%d") + "T" + datetime_call.strftime("%H") + "%3A" + datetime_call.strftime("%M") + "%3A00"
    
    # building the API call
    api = 'https://api.data.gov.sg/v1/transport/traffic-images?date_time=' + datetime_call_formatted

    # reading the camera data from data.gov.sg; the response is JSON, so it is parsed with .json()
    # (ast.literal_eval would fail on JSON literals such as true, false, or null)
    response = requests.get(api, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    camera_data = response.json()["items"][0]["cameras"]

    # returning the output, converted as a dataframe
    return convert_to_dataframe(camera_data)


def lta_traffic_camera_scraping(start_datetime, end_datetime, resolution_minute):
    '''
    This function takes in the start and end datetime as well as the required resolution,
    and returns a dataframe that contains the API calls with links to the traffic images
    '''
    # calculate the number of observations required (floor division of two timedeltas returns an int)
    num_obs = (end_datetime - start_datetime) // datetime.timedelta(minutes=resolution_minute)
    
    # create a list of datetimes to be called inside the API
    datetime_list = [start_datetime + datetime.timedelta(minutes=resolution_minute*x) for x in range(num_obs)]
    
    # wrap the datetime list in tqdm to display a progress bar
    datetime_list_pbar = tqdm(datetime_list)
    
    # instantiating a dictionary to contain the API calls
    output_dict = {}
    
    # iterating through all the datetime_call values from the datetime_list and calling the API for each
    for datetime_call in datetime_list_pbar:
        
        # printing out the current datetime_call
        datetime_list_pbar.set_description(f'Currently scraping: {datetime_call.strftime("%Y-%m-%d_%H:%M")}')

        # actual API call, stored in the output dictionary under the formatted datetime
        current_output = get_lta_traffic_camera_data(datetime_call)
        output_dict[datetime_call.strftime("%Y-%m-%d_%H:%M")] = current_output
        
    # CONVERTING THE DICTIONARY TO A DATAFRAME
    # concatenating into one long dataframe with a fresh index
    output_df = pd.concat(output_dict.values(), axis=0, ignore_index=True)
    
    # keeping only the required columns (.copy() avoids a SettingWithCopyWarning below)
    output_df = output_df[['timestamp', 'image', 'camera_id', 'md5']].copy()
    
    # removing the image link prefix (same for all images) in order to save space
    image_link_prefix = 'https://images.data.gov.sg/api/traffic-images/'
    output_df['image'] = output_df['image'].str.replace(image_link_prefix, '', regex=False)
    
    # returns the dataframe
    return output_df
```
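
The manual percent-encoding of the colons (`%3A`) in `get_lta_traffic_camera_data` could also be delegated to `requests` through its `params` argument. The sketch below performs no network call: it builds both URLs for the same datetime and compares them. It relies on `requests` encoding query values with Python's `urllib.parse.urlencode`, which turns `:` into `%3A`.

```python
import datetime
import requests

dt = datetime.datetime(2022, 10, 1, 0, 0, 0)

# Manual encoding, mirroring get_lta_traffic_camera_data
manual_url = ('https://api.data.gov.sg/v1/transport/traffic-images?date_time='
              + dt.strftime("%Y-%m-%d") + "T" + dt.strftime("%H") + "%3A" + dt.strftime("%M") + "%3A00")

# Letting requests build and percent-encode the query string instead
prepared = requests.Request(
    "GET",
    "https://api.data.gov.sg/v1/transport/traffic-images",
    params={"date_time": dt.strftime("%Y-%m-%dT%H:%M:%S")},
).prepare()

print(prepared.url == manual_url)  # True
```

Passing `params` keeps the datetime readable in the code and avoids hand-written escape sequences; inside the function, `requests.get(base_url, params=..., timeout=30)` would then produce the same request.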

# Scraping Image Links

We first need to scrape the image links from the data.gov.sg API. The actual images will be downloaded in the next notebook.

```python
# for this example, we will only be scraping the data for 1 hour on 01/10/2022
# the actual data required for the project has already been scraped beforehand

# DEFINING THE PARAMETERS FOR SCRAPING
start_datetime = datetime.datetime(2022,10,1,0,0,0)
end_datetime = datetime.datetime(2022,10,1,1,0,0)
resolution_minute = 5

df = lta_traffic_camera_scraping(start_datetime=start_datetime,
                                 end_datetime=end_datetime,
                                 resolution_minute=resolution_minute)
```

```
Currently scraping: 2022-10-01_00:55: 100%|████████████████████████████████████████████| 12/12 [00:03<00:00,  3.16it/s]
```
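
Because the cameras refresh every 1 to 5 minutes while we sample every 5 minutes, two consecutive API calls can point at the same image. A minimal sketch for dropping such repeats before downloading is shown below, assuming that an identical (`camera_id`, `md5`) pair means an identical image. The rows are made-up placeholders with the same columns as the scraped dataframe.

```python
import pandas as pd

# Placeholder rows in the same shape as the scraped output (timestamp, image, camera_id, md5)
scraped = pd.DataFrame({
    "timestamp": ["2022-10-01T00:00:00+08:00", "2022-10-01T00:05:00+08:00", "2022-10-01T00:05:00+08:00"],
    "image": ["a.jpg", "a.jpg", "b.jpg"],
    "camera_id": ["1001", "1001", "1002"],
    "md5": ["hash-a", "hash-a", "hash-b"],
})

# Keep the first occurrence of each (camera_id, md5) pair so each distinct image is downloaded once
unique_images = (scraped
                 .drop_duplicates(subset=["camera_id", "md5"], keep="first")
                 .reset_index(drop=True))
print(len(unique_images))  # 2
```

Keeping the first occurrence retains the earliest timestamp at which each image was seen, which is the closer estimate of when it was actually captured.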


# Saving Scraped Data

```python
# getting the filename for saving the file
filename = 'LTA_traffic_cam_' + start_datetime.strftime('%Y%m%d') + '-' + end_datetime.strftime('%Y%m%d') + '.csv'

# saving the data
df.to_csv(f'../data/{filename}')
```