# Traffic forecasting using Neural Networks (SAEs, LSTM, GRU)

**Author:** JIANHONG LIU<br>
**Date created:** 2023/4/1<br>
**Last modified:** 2023/4/23<br>
**Description:** This example demonstrates how to do time series forecasting with neural networks.

## Introduction

Accurate, real-time traffic flow prediction is crucial in intelligent transportation systems (ITS), especially for traffic control. Existing models such as ARMA and ARIMA are mainly linear, so they cannot capture the randomness and nonlinearity of traffic flow, which leads to poor prediction accuracy.

In recent years, deep learning methods have been proposed as a new alternative for traffic flow prediction. However, it is still unclear which type of deep neural network is the most suitable model for traffic flow prediction. In this paper, we use Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Stacked Autoencoder (SAE) models to predict short-term traffic flow. 

The experimental results show that all three deep neural network models can effectively predict traffic flow, and that their prediction performance is greatly improved compared to traditional linear models such as ARMA. Among the three, SAE performs slightly better: its MAE, MSE, and RMSE are slightly lower than those of LSTM and GRU, and its R² and explained variance score are slightly higher, indicating a better fit to the traffic flow data used here.
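For reference, the five metrics named above can be computed as in the following sketch. It uses plain NumPy; the function name `regression_metrics` and the toy arrays are illustrative assumptions, not the code or data behind the results reported here.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R^2 and explained variance for 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    # R^2 compares the squared error with the spread of the observations.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Explained variance ignores any constant bias in the predictions.
    evs = 1.0 - np.var(err) / np.var(y_true)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "explained_variance": evs}


# Toy values, for illustration only.
metrics = regression_metrics([10, 12, 14, 16], [11, 12, 13, 17])
```

R² and explained variance coincide when the mean prediction error is zero, so a gap between the two hints at a systematic bias in the forecasts.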


## Literature review



## Setup

In [5]:
import pandas as pd
import numpy as np

## Data preparation

### Data description

We use a real-world traffic speed dataset named `PeMSD7`, in the version
collected and prepared by [JIANHONG LIU](https://github.com/jianhongliu99/CASA0006/tree/main/data)
and available
[here](https://github.com/VeritasYin/STGCN_IJCAI-18/tree/master/data_loader).

The data consists of two files:

- `W_228.csv` contains the distances between 228
stations across District 7 of California.
- `V_228.csv` contains traffic
speeds collected at those stations on the weekdays of May and June 2012.

The full description of the dataset can be found in
[Yu et al., 2018](https://arxiv.org/abs/1709.04875).
The data come from the Caltrans Performance Measurement System (PeMS), which collects them in real time from individual detectors spanning the freeway system across all major metropolitan areas of the State of California.
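As a minimal sketch of how the two files described above could be read, the snippet below loads them into NumPy arrays. It assumes both files are plain numeric grids with no header row and that they sit in a local directory; the helper name `load_pemsd7` and the `data_dir` argument are hypothetical.

```python
import os

import numpy as np
import pandas as pd


def load_pemsd7(data_dir):
    """Read the distance and speed matrices as NumPy arrays.

    Assumes both CSV files are plain numeric grids with no header row.
    """
    distances = pd.read_csv(os.path.join(data_dir, "W_228.csv"), header=None).to_numpy()
    speeds = pd.read_csv(os.path.join(data_dir, "V_228.csv"), header=None).to_numpy()
    # The distance matrix is square, with one speed column per station.
    if distances.shape[0] != distances.shape[1] or speeds.shape[1] != distances.shape[0]:
        raise ValueError(f"unexpected shapes: {distances.shape} and {speeds.shape}")
    return distances, speeds
```

The shape check is only a sanity guard: it fails early if the station count implied by the two files does not match.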



### Loading data

In [37]:
import requests
import pandas as pd

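# Download the CSV file.
# Note: this is a GitHub "blob" page URL, which serves an HTML page rather than the raw CSV.
url = "https://github.com/jianhongliu99/CASA0006/blob/main/datasets.csv"
response = requests.get(url)

# Save the CSV file locally
with open("datasets.csv", "wb") as f:
    f.write(response.content)

# Load the CSV file
df = pd.read_csv("datasets.csv")

# Preview the DataFrame
print(df.head())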


ParserError: Error tokenizing data. C error: Expected 1 fields in line 26, saw 367
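The `ParserError` above is consistent with the request returning GitHub's HTML page for the file rather than the CSV itself, since a `/blob/` URL addresses the web viewer. One possible workaround, sketched below, is to rewrite the URL to its `raw.githubusercontent.com` form before downloading. The helper name `to_raw_github_url` is hypothetical, and whether the rewritten URL actually serves the intended file has not been verified here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Rewrite a github.com '/blob/' page URL to its raw-content equivalent."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # Not a blob URL; leave it unchanged.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


raw_url = to_raw_github_url(
    "https://github.com/jianhongliu99/CASA0006/blob/main/datasets.csv"
)
# raw_url can then be passed to requests.get(...) or directly to pd.read_csv(...).
```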


In [38]:
data = pd.read_csv('datas.csv')

In [39]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12096 entries, 0 to 12095
Data columns (total 4 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   5 Minutes                    12096 non-null  object
 1   Lane 1 Flow (Veh/5 Minutes)  12096 non-null  int64 
 2   Lane Points                  12096 non-null  int64 
 3    Observed_percent            12096 non-null  object
dtypes: int64(2), object(2)
memory usage: 378.1+ KB


In [40]:
data.head()

|   | 5 Minutes | Lane 1 Flow (Veh/5 Minutes) | Lane Points | Observed_percent |
|---|---|---|---|---|
| 0 | 04/01/2016 00:00 | 12 | 1 | 100.00% |
| 1 | 04/01/2016 00:05 | 13 | 1 | 100.00% |
| 2 | 04/01/2016 00:10 | 11 | 1 | 100.00% |
| 3 | 04/01/2016 00:15 | 13 | 1 | 100.00% |
| 4 | 04/01/2016 00:20 | 10 | 1 | 100.00% |
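The `info()` output shows that the timestamp and observed-percentage columns are stored as `object` (strings). A minimal sketch of converting them to usable types follows. The helper name `clean_flow_frame` is hypothetical, the month/day/year timestamp format is an assumption (the preview rows alone do not disambiguate it), and the two-row `sample` frame only stands in for `data`.

```python
import pandas as pd


def clean_flow_frame(df):
    """Return a copy with a datetime index and a numeric observed percentage."""
    out = df.copy()
    # Strip stray whitespace from headers such as " Observed_percent".
    out.columns = [c.strip() for c in out.columns]
    # Assumption: timestamps are month/day/year, e.g. "04/01/2016 00:05".
    out["5 Minutes"] = pd.to_datetime(out["5 Minutes"], format="%m/%d/%Y %H:%M")
    # "100.00%" -> 100.0
    out["Observed_percent"] = out["Observed_percent"].str.rstrip("%").astype(float)
    return out.set_index("5 Minutes")


# Two rows copied from the preview, used here as a stand-in for `data`.
sample = pd.DataFrame(
    {
        "5 Minutes": ["04/01/2016 00:00", "04/01/2016 00:05"],
        "Lane 1 Flow (Veh/5 Minutes)": [12, 13],
        "Lane Points": [1, 1],
        " Observed_percent": ["100.00%", "100.00%"],
    }
)
cleaned = clean_flow_frame(sample)
```

Indexing by timestamp makes it straightforward to check later that the series is evenly spaced at 5-minute intervals before building model inputs.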
