# UTD19: Understanding traffic capacity of urban networks

## Summary

UTD19 is a large-scale traffic dataset from 23541 stationary detectors on urban roads in 40 cities worldwide, making it the largest publicly available multi-city traffic dataset. The data mainly consist of measurements from loop detectors, which record vehicle flow and occupancy (or speed) over relatively short aggregation intervals, typically 3-5 min. The detectors are stationary traffic sensors: inductive loop detectors, supersonic detectors, cameras, Bluetooth detectors and similar devices. All of the data were collected between 2017 and 2019 and amount to 3.8 years of data in total.

For each city, the dataset provides at least two of the three fundamental traffic variables: speed, flow and density. Density is usually not measured directly; instead, the sensors report occupancy, i.e., the fraction of time a detector is occupied during an observation period. For most cities, the flow-occupancy data were received directly from the local authorities, while for the others they were obtained through APIs or OpenData access points.
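The occupancy definition above can be made concrete with a small sketch: sum the times during which vehicles covered the detector and divide by the length of the observation period. The helper names are hypothetical, the on-times are made up, and the density conversion is a common textbook approximation (occupancy divided by an effective vehicle length, i.e. vehicle plus detector length), not necessarily what UTD19 itself uses.

```python
def occupancy(on_times_s, period_s):
    """Fraction of the observation period during which the detector was occupied.

    on_times_s: durations (seconds) for which individual vehicles covered the detector.
    period_s: length of the observation (aggregation) period in seconds.
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    occ = sum(on_times_s) / period_s
    # Clamp to [0, 1]; overlapping or rounded on-times could otherwise push it above 1.
    return min(max(occ, 0.0), 1.0)


def density_from_occupancy(occ, effective_length_km):
    """Approximate density (veh/km) as occupancy / effective vehicle length (assumption)."""
    return occ / effective_length_km


# Hypothetical example: three vehicles within a 180 s (3 min) interval.
occ = occupancy([0.4, 0.5, 0.6], 180)
print(occ)
# Hypothetical effective length of 5 m = 0.005 km.
print(density_from_occupancy(0.1, 0.005))
```

The effective length is a calibration parameter that depends on vehicle mix and detector size, so any density derived this way is only an estimate.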

The cities from which we obtained the data are:
* Augsburg 
* Basel 
* Bern 
* Birmingham
* Bolton
* Bordeaux
* Bremen
* Cagliari
* Constance
* Darmstadt
* Essen
* Frankfurt
* Graz
* Groningen
* Hamburg
* Innsbruck
* Kassel
* London
* Los Angeles
* Luzern
* Madrid
* Melbourne
* Manchester
* Marseille
* Munich
* Paris
* Rotterdam
* Santander
* Speyer
* Strasbourg
* Stuttgart
* Taipeh
* Tokyo
* Torino
* Toronto
* Toulouse
* Utrecht
* Vilnius
* Wolfsburg
* Zurich

The dataset also provides the location of each sensor with respect to the road network. This made it easier to obtain additional attributes: the road name (variable `road`), the functional road class (variable `fclass`) and the speed limit (variable `limit`), when available.

The dataset consists of three CSV files. The detectors file contains the location and attributes of the detectors used to collect the measurements. The links file stores the geometry of each monitored lane or link and is joined to the detectors via `linkid`. The UTD19 file contains the measurements recorded by the detectors in the different cities and is joined to the detectors via `detid`. Together, these files make UTD19 one of the most complete traffic datasets available for analyzing and understanding the traffic capacity of urban networks.
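The relationships between the three files can be sketched with two pandas merges. The rows below are made-up toy values with the real column names. Including the city in the measurement-to-detector key is a precaution that assumes detector ids are only unique within a city and that `city` (measurements) and `citycode` (detectors) use the same codes.

```python
import pandas as pd

# Made-up toy rows with the same column names as the three UTD19 files.
utd = pd.DataFrame({
    "day": ["2017-05-01", "2017-05-01"],
    "interval": [0, 300],
    "detid": ["A1", "A1"],
    "flow": [420.0, 480.0],
    "occ": [0.05, 0.06],
    "city": ["toycity", "toycity"],
})
dtc = pd.DataFrame({
    "detid": ["A1", "B2"],
    "citycode": ["toycity", "toycity"],
    "lanes": [2.0, 1.0],
    "linkid": [10.0, float("nan")],  # B2 is not matched to a link
})
links = pd.DataFrame({
    "citycode": ["toycity"] * 3,
    "linkid": [10, 10, 10],
    "order": [1, 2, 3],
    "long": [8.540, 8.541, 8.542],
    "lat": [47.370, 47.371, 47.372],
})

# Measurements -> detectors via detid (city added to the key as a precaution).
meas = utd.merge(dtc.rename(columns={"citycode": "city"}), on=["city", "detid"], how="left")

# Detectors -> link waypoints via linkid. linkid is float in detectors (it has NaNs)
# but int in links, so drop unmatched detectors and cast before merging.
det_links = (
    dtc.dropna(subset=["linkid"])
    .astype({"linkid": "int64"})
    .merge(links, on=["citycode", "linkid"], how="inner")
)
print(meas[["day", "interval", "detid", "flow", "lanes"]])
print(det_links[["detid", "linkid", "order", "long", "lat"]])
```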

## Dictionary

### UTD19

Traffic measurements with original and filtered data. Note that not all detectors report
occupancy, speed and flow; where no data was recorded or is available, a missing value is
stored. Detectors do not provide all variables at every interval: some detectors only provide
flow and occupancy, while others report flow and speed. In some cases (e.g., Melbourne), loops
provide either flow or speed.
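Since detectors differ in which variables they report, a quick check is which combination each detector actually provides. A minimal sketch on made-up rows, assuming the lowercase column names that appear in the `utd.info()` output (`city`, `detid`, `flow`, `occ`, `speed`):

```python
import pandas as pd

# Made-up rows: D1 reports flow+occ, D2 flow+speed, D3 only speed.
utd = pd.DataFrame({
    "city": ["x"] * 5,
    "detid": ["D1", "D1", "D2", "D2", "D3"],
    "flow": [300.0, 320.0, 500.0, 510.0, None],
    "occ": [0.04, 0.05, None, None, None],
    "speed": [None, None, 42.0, 40.0, 35.0],
})

# For each detector, flag whether each variable is ever non-missing.
reported = (
    utd[["flow", "occ", "speed"]].notna()
    .groupby([utd["city"], utd["detid"]])
    .any()
)

# Label each detector by the combination of variables it reports.
combo = reported.apply(lambda row: "+".join(c for c in row.index if row[c]), axis=1)
print(combo.value_counts())
```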

Records: 168074643. 

Columns: 8.


* **City**: Name of the city. $\rightarrow$ *Dtype Object*
* **Detid**: Detector identification. $\rightarrow$ *Dtype Object* 
* **Day**: Day of recording. $\rightarrow$ *Dtype Object* 
* **Interval**: Beginning of recording interval in seconds from midnight.  $\rightarrow$ *Dtype Int64* 
* **Flow**: Flow in vehicles per hour for that detector.  $\rightarrow$ *Dtype Float64* 
* **Occ**: Detector occupancy.  $\rightarrow$ *Dtype Float64* 
* **Speed**: Average speed in recording interval in km per hour. $\rightarrow$ *Dtype Float64* 
* **Error**: identified or reported error if a non-missing value is reported.  $\rightarrow$ *Dtype Float64* 
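Because `interval` is given in seconds from midnight and `flow` is an hourly rate, two common derived quantities are an absolute timestamp and the number of vehicles per interval. A minimal sketch on made-up rows; it assumes `day` parses as a date, that flow is the interval count scaled to vehicles per hour, and it infers each detector's aggregation period from the median gap between consecutive intervals (the column names `start`, `step_s`, `period_s` and `vehicles` are hypothetical).

```python
import pandas as pd

# Made-up rows for one detector with a 3-minute aggregation interval.
df = pd.DataFrame({
    "day": ["2017-05-01"] * 3,
    "detid": ["A1"] * 3,
    "interval": [0, 180, 360],      # seconds from midnight
    "flow": [600.0, 540.0, 720.0],  # vehicles per hour
})

# Absolute start time of each recording interval.
df["start"] = pd.to_datetime(df["day"]) + pd.to_timedelta(df["interval"], unit="s")

# Infer each detector's aggregation period from gaps between consecutive starts.
# The median tolerates occasional missing intervals.
df = df.sort_values(["detid", "start"]).reset_index(drop=True)
df["step_s"] = df.groupby("detid")["start"].diff().dt.total_seconds()
df["period_s"] = df.groupby("detid")["step_s"].transform("median")

# Flow is an hourly rate, so vehicles counted in one interval = flow * period / 3600.
df["vehicles"] = df["flow"] * df["period_s"] / 3600
print(df[["start", "flow", "period_s", "vehicles"]])
```

Working from `start` rather than `interval` keeps the gaps correct across midnight, where `interval` resets to 0.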

### Detectors

Geospatial information about detector location with OpenStreetMap attributes and
lat-long coordinates. Linked to measurements via detid and to links via linkid.

Records: 23626.

Columns: 11.


* **Detid**: Detector identification. $\rightarrow$ *Dtype Object* 
* **Citycode**: Name of the city. $\rightarrow$ *Dtype Object* 
* **Length**: Length of the monitored lane in km. $\rightarrow$ *Dtype Float64* 
* **Pos**: Distance to downstream traffic signal in km. $\rightarrow$ *Dtype Float64* 
* **Long**: Longitude of detector location. $\rightarrow$ *Dtype Float64* 
* **Lat**: Latitude of detector location. $\rightarrow$ *Dtype Float64* 
* **Lanes**: Number of lanes monitored. $\rightarrow$ *Dtype Float64* 
* **Linkid**: Link id of the monitored lane. $\rightarrow$ *Dtype Float64* 
* **Fclass**: OpenStreetMap’s functional road class classification.  $\rightarrow$ *Dtype Object* 
* **Road**: Road name. $\rightarrow$ *Dtype Object* 
* **Limit**: Speed limit, if available. $\rightarrow$ *Dtype Object* 
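`limit` is stored with dtype object, so it is likely to contain entries that are not plain numbers. One way to get a numeric column is `pd.to_numeric` with `errors="coerce"`, which turns anything unparsable into NaN. The raw values below (including the string `"none"`) are made up, the column name `limit_num` is hypothetical, and the unit of the limit is not stated in the dictionary, so inspect the real values before relying on the conversion.

```python
import pandas as pd

# Made-up raw values; the real column may mix formats.
dtc = pd.DataFrame({
    "detid": ["a", "b", "c", "d"],
    "limit": ["50", "30", None, "none"],
})

# Inspect the distinct raw values first, including missing ones.
print(dtc["limit"].value_counts(dropna=False))

# Anything that cannot be parsed as a number becomes NaN.
dtc["limit_num"] = pd.to_numeric(dtc["limit"], errors="coerce")
print(dtc)
```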

### Links

Spatial lines object for each monitored traffic lane or link, converted to a text file.
Linked to detectors via linkid. Detectors not matched to a link or lane have a missing value for
the linkid. Points are ordered in the direction of traffic.

Records: 140858. 

Columns: 7.


* **Citycode**: Name of the city. $\rightarrow$ *Dtype Object* 
* **Linkid**: Link id of the monitored lane. $\rightarrow$ *Dtype Int64* 
* **Order**: Order of waypoint sequence. $\rightarrow$ *Dtype Int64* 
* **Piece**: Spatial feature number of that link id. $\rightarrow$ *Dtype Int64* 
* **Group**: Group number of that spatial feature. $\rightarrow$ *Dtype Float64* 
* **Long**: Longitude of waypoint. $\rightarrow$ *Dtype Float64* 
* **Lat**: Latitude of waypoint. $\rightarrow$ *Dtype Float64* 
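Since links stores each lane as ordered waypoints, the geometry can be turned into a polyline length, for example to sanity-check it against the detectors' `length`. A minimal sketch under several assumptions: waypoints within each `piece` are consecutive in `order`, the lengths of pieces of the same link can simply be added, and distances use a spherical-Earth haversine formula. `haversine_km` and `link_lengths_km` are hypothetical helpers and the rows are made up.

```python
import math

import pandas as pd


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance between two lon/lat points in km (spherical Earth)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_lengths_km(links: pd.DataFrame) -> pd.Series:
    """Sum segment lengths of each link's polyline, following `order` within each piece."""
    lengths = {}
    for key, g in links.sort_values("order").groupby(["citycode", "linkid", "piece"]):
        pts = list(zip(g["long"], g["lat"]))
        seg = sum(haversine_km(*p, *q) for p, q in zip(pts, pts[1:]))
        city, linkid, _ = key
        lengths[(city, linkid)] = lengths.get((city, linkid), 0.0) + seg
    return pd.Series(lengths, name="length_km")


# Made-up waypoints; link 1 is deliberately stored out of order.
links = pd.DataFrame({
    "citycode": ["toy", "toy", "toy"],
    "linkid": [1, 1, 2],
    "piece": [1, 1, 1],
    "order": [2, 1, 1],
    "long": [8.54, 8.54, 8.60],
    "lat": [47.38, 47.37, 47.40],
})
lengths = link_lengths_km(links)
print(lengths)
```

A link with a single waypoint gets length 0, which is a hint that its geometry is incomplete.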

```python
import pandas as pd
import numpy as np
```

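```python
# Read detector ids as strings so they are not parsed as numbers.
# Empty fields are parsed as NaN by default, so na_values is not needed.
utd = pd.read_csv("UTD19.csv", dtype={"detid": str})             # measurements
dtc = pd.read_csv("detectors_public.csv", dtype={"detid": str})  # detector metadata
links = pd.read_csv("links.csv")                                 # link geometries
```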

```python
utd.shape
```

```
(168074643, 8)
```

```python
utd.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 168074643 entries, 0 to 168074642
Data columns (total 8 columns):
 #   Column    Dtype  
---  ------    -----  
 0   day       object 
 1   interval  int64  
 2   detid     object 
 3   flow      float64
 4   occ       float64
 5   error     float64
 6   city      object 
 7   speed     float64
dtypes: float64(4), int64(1), object(3)
memory usage: 10.0+ GB
```


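With about 168 million rows, the full table needs more than 10 GB in memory. One way to reduce this, assuming only part of the data is needed at a time, is to read the file in chunks with narrower dtypes and filter early. The toy CSV and the dtype choices are illustrative assumptions: `int32` is enough for `interval` because it never exceeds 86,400 s, and `float32` trades some precision for half the memory.

```python
import io

import pandas as pd

# Stand-in for UTD19.csv: made-up rows with the real column names.
csv = io.StringIO(
    "day,interval,detid,flow,occ,error,city,speed\n"
    "2017-05-01,0,A1,420,0.05,,toycity,\n"
    "2017-05-01,300,A1,480,0.06,,toycity,\n"
    "2017-05-01,0,B7,300,,1,othercity,38.5\n"
)

# Narrower numeric types than the default int64/float64.
dtypes = {"day": str, "detid": str, "city": str, "interval": "int32",
          "flow": "float32", "occ": "float32", "error": "float32", "speed": "float32"}

# Read chunk by chunk and keep only the city of interest.
parts = [chunk[chunk["city"] == "toycity"]
         for chunk in pd.read_csv(csv, dtype=dtypes, chunksize=2)]
toy = pd.concat(parts, ignore_index=True)

# Convert repeated strings to categoricals after concatenation, so chunks with
# different category sets do not fall back to object dtype.
toy = toy.astype({"day": "category", "detid": "category", "city": "category"})
print(toy.dtypes)
print(toy.memory_usage(deep=True).sum(), "bytes")
```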
```python
utd.isnull().sum()
```

```
day                 0
interval            0
detid               0
flow                0
occ           3279527
error        82288700
city                0
speed       163444688
dtype: int64
```

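The missing-value counts are easier to interpret as shares of all rows. On the loaded frame, `utd.isnull().mean()` returns these shares directly; the snippet below only reproduces the arithmetic from the printed counts. It shows that speed is missing in about 97% of rows, while occupancy is missing in about 2%.

```python
# Row count and missing-value counts copied from utd.shape and utd.isnull().sum().
n_rows = 168_074_643
missing = {"occ": 3_279_527, "error": 82_288_700, "speed": 163_444_688}

shares = {col: n / n_rows for col, n in missing.items()}
for col, share in shares.items():
    print(f"{col}: {share:.1%} missing")
```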
```python
dtc.shape
```

```
(23626, 11)
```

```python
dtc.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23626 entries, 0 to 23625
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   detid     23626 non-null  object 
 1   length    23626 non-null  float64
 2   pos       23581 non-null  float64
 3   fclass    23626 non-null  object 
 4   road      22085 non-null  object 
 5   limit     16994 non-null  object 
 6   citycode  23626 non-null  object 
 7   lanes     23622 non-null  float64
 8   linkid    23021 non-null  float64
 9   long      23626 non-null  float64
 10  lat       23626 non-null  float64
dtypes: float64(6), object(5)
memory usage: 2.0+ MB
```


```python
dtc.isnull().sum()
```

```
detid          0
length         0
pos           45
fclass         0
road        1541
limit       6632
citycode       0
lanes          4
linkid       605
long           0
lat            0
dtype: int64
```

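`linkid` is missing for 605 detectors, i.e. detectors not matched to a link or lane. A minimal sketch that counts these per city and also runs an anti-join to find detectors whose `linkid` is set but does not appear in links. The toy rows are made up, and keying on `citycode` plus `linkid` assumes link ids are only unique within a city.

```python
import pandas as pd

# Made-up rows: detector "b" has no link, detector "d" points to a missing link.
dtc = pd.DataFrame({
    "detid": ["a", "b", "c", "d"],
    "citycode": ["x", "x", "y", "y"],
    "linkid": [1.0, None, 2.0, 9.0],
})
links = pd.DataFrame({
    "citycode": ["x", "x", "y"],
    "linkid": [1, 1, 2],
    "order": [1, 2, 1],
})

# Detectors not matched to any link, per city.
unmatched_per_city = dtc["linkid"].isna().groupby(dtc["citycode"]).sum()

# Anti-join: detectors whose linkid is set but absent from links.
matched = dtc.dropna(subset=["linkid"]).astype({"linkid": "int64"})
check = matched.merge(links[["citycode", "linkid"]].drop_duplicates(),
                      on=["citycode", "linkid"], how="left", indicator=True)
orphans = check.loc[check["_merge"] == "left_only", "detid"].tolist()
print(unmatched_per_city.to_dict(), orphans)
```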
```python
links.shape
```

```
(140858, 7)
```

```python
links.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 140858 entries, 0 to 140857
Data columns (total 7 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   long      140858 non-null  float64
 1   lat       140858 non-null  float64
 2   order     140858 non-null  int64  
 3   piece     140858 non-null  int64  
 4   linkid    140858 non-null  int64  
 5   group     140858 non-null  float64
 6   citycode  140858 non-null  object 
dtypes: float64(3), int64(3), object(1)
memory usage: 7.5+ MB
```


```python
links.isnull().sum()
```

```
long        0
lat         0
order       0
piece       0
linkid      0
group       0
citycode    0
dtype: int64
```