# Dublin Traffic Analysis


### Traffic sensors in Dublin


Dublin employs the Sydney Coordinated Adaptive Traffic System (SCATS) to monitor traffic at key intersections and adapt traffic signals for high-priority vehicles like public buses. Periodically, the city publishes, on the SmartDublin open data store, samples of the traffic counts that SCATS records. This sample covers January to April 2012, so a lot has changed in Dublin since then regarding the transit system, traffic levels, and how traffic data is collected. The data contains a table with:


- *streetSegId* - street segment ids which can be matched to an accompanying spatial .gps file
- *upperTime* - the timestamp closing the previous 6-minute aggregation
- *armNumber* - an id for the arm where the sensor is located within a street segment id (not used)
- *aggregateCount* - the number of vehicles detected by the sensor for the given time period (not used)
- *flow* - ratio of the volume count to the maximum value in a 1-week sliding window
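
To make the *flow* definition concrete, here is a minimal sketch of the calculation on a toy series. The counts are invented, the 3-step window stands in for a full week (1,680 six-minute intervals), and the assumption that the window trails the current observation is mine; the data description doesn't say how the window is positioned.

```r
# Hypothetical 6-minute vehicle counts for one sensor (invented values)
counts <- c(10, 40, 80, 20, 60, 100, 50)

# Toy window length; a real 1-week window would span 7 * 24 * 10 = 1680 intervals
window <- 3

# Maximum count within the trailing window ending at each observation (assumed alignment)
rolling_max <- sapply(seq_along(counts), function(i) {
  max(counts[max(1, i - window + 1):i])
})

# flow = count relative to the recent maximum, so values fall in (0, 1]
flow <- counts / rolling_max
print(flow)
```

Because each count is divided by a recent maximum for the same sensor, busy and quiet sensors end up on a comparable 0 to 1 scale, which is presumably why *flow* rather than *aggregateCount* is used below.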

In the future, I will use the forthcoming sample from Jan-Apr 2020 and compare the two periods. For now, we can use this older sample to demonstrate the process of clustering time series and learn about the spatial and temporal patterns of commuting in Dublin.

We are going to use the time series created by 4 months of 6-minute resolution data to find prototypical daily time series for 4 distinct commuting patterns. These patterns also vary spatially, which lets us differentiate between regions of Dublin by how they experience daily traffic flows. We will do some R data table manipulation, use Dynamic Time Warping to determine the 'distance' between sensors' time series, apply partitional clustering, and finally use ggplot and tmap to view the results.
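
To give a feel for what a Dynamic Time Warping 'distance' measures, here is a minimal base R sketch. It is a textbook, unconstrained DTW with absolute difference as the local cost, written only for illustration; it is not the implementation used by the clustering package later on, and the function name `dtw_distance` and the two toy series are made up.

```r
# Illustrative DTW: cumulative cost of the cheapest alignment between x and y
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] holds the best cumulative cost of aligning x[1:i] with y[1:j]
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],  # repeat y[j]
                                    cost[i + 1, j],  # repeat x[i]
                                    cost[i, j])      # advance both
    }
  }
  cost[n + 1, m + 1]
}

# Two toy series with the same peak, one shifted by a single time step
a <- c(0, 1, 2, 1, 0, 0)
b <- c(0, 0, 1, 2, 1, 0)

pointwise <- sum(abs(a - b))   # compares values at identical time steps
warped    <- dtw_distance(a, b)
print(c(pointwise = pointwise, warped = warped))
```

The pointwise comparison penalizes the shift, while DTW aligns the two peaks first. That matters for traffic data, where two sensors can see the same rush-hour shape a few minutes apart.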

First, download the data. Save the zipped folder in the same folder as your Jupyter notebook. Then extract into a new folder with the default name. The data is available from Smart Dublin [here](https://data.smartdublin.ie/dataset/volume-data-for-dublin-city-from-dublin-city-council-traffic-departments-scats-system/resource/b111de96-47ff-44fa-85d4-81155b8f2f83).

In R, we'll extract the .dis files (Oracle Discoverer workbook files, but R can read them as tables) and check for symmetry.

In [5]:
# Replace the placeholder with the folder that holds the extracted .dis files
filenames <- list.files(path="YOUR LOCAL FOLDER PATH HERE",
    pattern="\\.dis$")   # pattern is a regular expression, not a glob

filenames

In [None]:
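library(data.table)

allfiles <- list()

# read each daily file as a comma-separated table
for(file in filenames){
    # file.path() adds the separator between the folder and the file name
    allfiles[[file]] <- as.data.table(read.table(file.path("YOUR LOCAL FOLDER PATH AGAIN", file),
                                sep=",",
                                header=TRUE,
                                strip.white=TRUE))
}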

# manually check for daily differences 
numobservations <- list()
uniquesensors<-list()
uniquetimestamps<-list()
for(readfile in names(allfiles)){
    numobservations[[readfile]] <- nrow(allfiles[[readfile]])
    uniquesensors[[readfile]]<-length(unique(allfiles[[readfile]]$streetSegId))
    uniquetimestamps[[readfile]]<-length(unique(allfiles[[readfile]]$upperTime))         
}

There are some discrepancies in the unique lists. The number of sensors changes, with several being added to the network throughout the 4-month period. On one date, a couple of timestamps are missing from the time series. Thus, the number of observations fluctuates. But when we create transposed versions of these tables, which are readable as time series objects for the tsclust package, these discrepancies will be accounted for.
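
One way to spot these discrepancies without reading the lists by eye is to put the per-day counts side by side. The sketch below uses invented stand-ins for `uniquesensors` and `uniquetimestamps` (the file names and numbers are hypothetical), and it assumes a complete day has 24 * 10 = 240 six-minute timestamps.

```r
# Hypothetical stand-ins for the lists built in the loop above (invented values)
uniquesensors    <- list("day1.dis" = 480, "day2.dis" = 480, "day3.dis" = 483)
uniquetimestamps <- list("day1.dis" = 240, "day2.dis" = 238, "day3.dis" = 240)

# One row per daily file
daily <- data.frame(
  file             = names(uniquesensors),
  sensors          = unlist(uniquesensors),
  timestamps       = unlist(uniquetimestamps),
  row.names        = NULL,
  stringsAsFactors = FALSE
)

# Days with fewer timestamps than a complete day (assumed to be 240)
short_days <- daily$file[daily$timestamps < 240]

# Days on which the sensor count differs from the previous day
sensor_changes <- daily$file[c(FALSE, diff(daily$sensors) != 0)]

print(short_days)
print(sensor_changes)
```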

Let's create new tables for each day, with the timestamps in the columns and the street segments as the rows.
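
Before running this on the real files, it may help to see the reshape on a toy table. The sketch below uses base R's `tapply` rather than the `dcast.data.table` call in the next cell, so it only mirrors the idea; the ids, timestamps, and flow values are invented.

```r
# Hypothetical long-format readings: sensor 2 has no reading at time 960
long <- data.frame(
  streetSegId = c(1, 1, 1, 2, 2),
  upperTime   = c(600, 960, 1320, 600, 1320),
  flow        = c(0.2, 0.5, 0.9, 0.1, 0.4)
)

# Rows become street segments, columns become timestamps;
# mean() collapses any duplicate readings for the same segment and time
wide <- tapply(long$flow, list(long$streetSegId, long$upperTime), mean)
print(wide)
```

The missing reading shows up as `NA` in the wide matrix instead of shifting the other values, which is how a gap in one sensor's day stays aligned with everyone else's timestamps.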

In [3]:
#unique list of all of the identifiers of each sensor
allSegIds<-list()
#putting the sensor readings in a time series readable format
transposedfiles<-list()
for(fullfile in names(allfiles)) {
    transpose <- dcast.data.table(allfiles[[fullfile]], 
                                  streetSegId ~ upperTime, fun.aggregate=mean, value.var="flow")
    streetSegIDs <- as.list(transpose[,1][[1]])
    transpose <- transpose[,-1]
    rownames(transpose) <- streetSegIDs
    allSegIds<-c(allSegIds, streetSegIDs)
    transposedfiles[[fullfile]]<-transpose
}
                        
allSegIds<-unique(allSegIds)

#convert text to dates and times and store in usable formats
dates<-as.POSIXct(as.numeric(as.character(names(transposedfiles[[1]]))),
                  origin="1970-01-01",tz="GMT")
times <- lapply(dates, FUN=function(x) as.POSIXct(x, format="%H:%M:%S"))
timestamps<-colnames(transposedfiles[[1]])



In [10]:
print(numobservations)

list()
