# Trajectory Prediction with LSTM
### CS230 - Deep Learning - Final Submission
#### Mitchell Dawson, Benjamin Goeing, Tyler Hughes.  

## Project introduction
In this project, we predict the trajectories of traffic objects (pedestrians, cyclists, cars, etc.) as they move and interact with one another. Our model could be used to predict the movements of crowds of people and vehicles given overhead images of a scene. Potential applications include video surveillance, making public spaces less prone to crowding or accidents, and improving the control of autonomous vehicles.

To train and test our model, we use the Stanford Drone Dataset (http://cvgl.stanford.edu/projects/uav_data/), which contains a large number of overhead images of crowded spaces on the Stanford campus. Here are some examples of drone footage from this dataset:

<img src="bookstore.jpg?raw=true" width="250"> 
<img src="deathCircle.jpg?raw=true" width="250">   

The dataset also contains the (x,y) coordinates of each moving object in each frame.

Our final goal for this project is to predict the trajectories of multiple objects in complex scenes with high accuracy. As a first step, we have implemented a simple linear model and a simple LSTM that predict the (x,y) coordinates of objects in the later frames of a scene given the earlier frames of the same scene. We then evaluate our model by predicting the trajectories of objects in completely unseen scenes.

In addition to the provided data, which contains the (x,y) coordinates of all moving objects over a number of sequential frames, we have used MIT's LabelMe (http://labelme.csail.mit.edu/Release3.0/) to classify each pixel in the background of an image. We have manually labeled the following classes:
- a) road
- b) sidewalk
- c) grass
- d) inaccessible (describing objects such as building walls, trees etc.) 

Here is an example of a labeled image: 

<img src="segImage2.jpg?raw=true" width="250">
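To feed these labels to a network, one option is to map each class to an integer ID and one-hot encode the label map into one channel per class. The sketch below is illustrative only: it assumes the IDs 0 to 3 follow the order of the list above and that the labeled image has already been converted into an (H, W) integer array. `CLASS_IDS` and `one_hot_label_map` are hypothetical names, not part of our pipeline.

```python
import numpy as np

# Hypothetical integer IDs for the four manually labeled classes
CLASS_IDS = {"road": 0, "sidewalk": 1, "grass": 2, "inaccessible": 3}


def one_hot_label_map(label_map, num_classes=len(CLASS_IDS)):
    """Convert an (H, W) integer label map into a (num_classes, H, W) float32 array."""
    label_map = np.asarray(label_map)
    # Broadcast (C, 1, 1) against (1, H, W): channel c is 1 exactly where label_map == c
    return (np.arange(num_classes)[:, None, None] == label_map[None, :, :]).astype(np.float32)


labels = np.array([[CLASS_IDS["road"], CLASS_IDS["sidewalk"]],
                   [CLASS_IDS["inaccessible"], CLASS_IDS["inaccessible"]]])
channels = one_hot_label_map(labels)
print(channels.shape)  # (4, 2, 2)
```

One channel per class avoids implying an ordering between classes (road is not "less than" grass), which an integer map fed directly into a network would suggest.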

As a first step, we are running our model without this additional information to get a sense of the baseline performance. We will then run the model again with the background information to see if and by how much it improves our accuracy.


## Module Imports
Our LSTM model is built on PyTorch.

We import helper modules for processing the dataset, the `TrajectoryPredictor` LSTM class, and the linear baseline used for comparison.


```python
# Module imports (add here as necessary)
import sys
import os
import numpy as np
import torch
import torch.utils.data

from simple_processing import load_simple_array
from lstm import TrajectoryPredictor
from linear_error import compute_linear_error
```

## Define Constants

```python
# constants
Nf = 10         # number of frames to observe before making prediction
batch_size = 4  # TrajectoryPredictor training batch size
num_epochs = 10 # number of training epochs
```

## Load Training Data
We will load in and process the drone dataset.  
For now, we will just load (x,y) pairs of positions (normalized between -1 and 1) at a series of frames for each person in the scene.  
Our data is taken from the 'stanford' dataset, which contains drone footage from places on Stanford's campus.
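The normalization is presumably handled inside our helper module and is not shown in this notebook. As an illustration only, one common way to map pixel coordinates into [-1, 1] is a linear rescaling by the image width and height. The helper names `normalize_xy` and `denormalize_xy` and the 1400 x 1900 frame size are assumptions.

```python
import numpy as np


def normalize_xy(xy, width, height):
    """Map pixel coordinates (x in [0, width], y in [0, height]) linearly to [-1, 1]."""
    scale = np.array([width, height], dtype=float)
    return 2.0 * np.asarray(xy, dtype=float) / scale - 1.0


def denormalize_xy(xy_norm, width, height):
    """Inverse of normalize_xy: map [-1, 1] back to pixel coordinates."""
    scale = np.array([width, height], dtype=float)
    return (np.asarray(xy_norm, dtype=float) + 1.0) * scale / 2.0


# Hypothetical 1400 x 1900 pixel frame: two corners and the center
pixels = np.array([[0.0, 0.0], [700.0, 950.0], [1400.0, 1900.0]])
print(normalize_xy(pixels, 1400, 1900))
```

The corners map to -1 and 1 and the center maps to 0. Keeping the inverse around makes it easy to report prediction errors in pixels rather than in normalized units.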

The data is loaded into a PyTorch dataset for feeding into the LSTM.

```python
def load_split(split):
    """Load all trajectories from the .txt annotation files of one split ('train', 'dev' or 'test')."""
    annotation_dir = os.path.join(split, 'stanford', 'annotations')
    trajectories = []
    for filename in os.listdir(annotation_dir):
        if filename.endswith('.txt'):
            trajectories += load_simple_array(os.path.join(annotation_dir, filename))
    return trajectories


def load_data():
    return load_split('train'), load_split('dev'), load_split('test')


train_trajectories, dev_trajectories, test_trajectories = load_data()

# Stack into an array of shape (num_trajectories, num_frames, 2)
np_train_trajectories = np.stack(train_trajectories)

# The first Nf frames are the observed input; the remaining frames are the prediction target
np_train_data = np_train_trajectories[:, :Nf, :]
np_train_target = np_train_trajectories[:, Nf:, :]

train_data_tensor = torch.Tensor(np_train_data)
train_target_tensor = torch.Tensor(np_train_target)

train_dataset = torch.utils.data.TensorDataset(train_data_tensor, train_target_tensor)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)
```

Here we print the trajectory of the first person in the training set. The first `Nf` (x,y) pairs are for observation; the second `Nf` (x,y) pairs are those we want to predict.
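`np.stack` requires every trajectory to have the same number of frames. If raw tracks have different lengths, one way to obtain fixed-length samples is to cut each track into overlapping windows of 2 * `Nf` frames and split every window into observation and target exactly as in the loading cell. This is a sketch under that assumption: `sliding_windows` and its `stride` parameter are hypothetical, and we do not claim that `load_simple_array` windows tracks this way.

```python
import numpy as np


def sliding_windows(track, Nf, stride=1):
    """Cut one (T, 2) track into overlapping windows of 2 * Nf frames.

    Returns an array of shape (num_windows, 2 * Nf, 2); it is empty if T < 2 * Nf.
    """
    track = np.asarray(track, dtype=float)
    win = 2 * Nf
    windows = [track[s:s + win] for s in range(0, len(track) - win + 1, stride)]
    return np.stack(windows) if windows else np.empty((0, win, track.shape[1]))


Nf = 10
track = np.column_stack([np.linspace(-1, 1, 25), np.zeros(25)])  # 25-frame toy track
samples = sliding_windows(track, Nf, stride=5)                    # windows start at frames 0 and 5
observed, future = samples[:, :Nf, :], samples[:, Nf:, :]         # same split as in the notebook
print(samples.shape, observed.shape, future.shape)  # (2, 20, 2) (2, 10, 2) (2, 10, 2)
```

A smaller stride yields more (but more correlated) training samples from the same track.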

```python
print(train_dataset[0])
```

```
(
 0.0120  0.8285
 0.0120  0.8285
 0.0134  0.8382
 0.0219  0.8382
 0.0318  0.8405
 0.0403  0.8424
 0.0502  0.8442
 0.0598  0.8465
 0.0697  0.8507
 0.0792  0.8526
[torch.FloatTensor of size 10x2]
,
 0.0905  0.8526
 0.1011  0.8526
 0.1124  0.8549
 0.1234  0.8563
 0.1347  0.8581
 0.1457  0.8600
 0.1563  0.8600
 0.1676  0.8600
 0.1772  0.8563
 0.1885  0.8535
[torch.FloatTensor of size 10x2]
)
```


## Training the LSTM
Now that we have our data loaded into a pytorch dataset and data loader, we are ready to train the LSTM.  

We have implemented a `TrajectoryPredictor` class that trains an LSTM on batches from a PyTorch data loader.
Its constructor takes the input dimension (2 here, for the (x,y) inputs), the output dimension (2 here, for the (x,y) predictions), and the batch size.
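`TrajectoryPredictor` lives in our `lstm` module and is not reproduced here. For readers unfamiliar with the setup, the following is a minimal PyTorch sketch of one way such a predictor could work: an LSTM reads the `Nf` observed points and a linear layer regresses all future points from its final hidden state, trained with mean squared error. The class name `LSTMTrajectorySketch`, `hidden_dim`, the Adam optimizer, and the MSE loss are assumptions for illustration, not necessarily what `TrajectoryPredictor` does.

```python
import torch
import torch.nn as nn
import torch.utils.data


class LSTMTrajectorySketch(nn.Module):
    """Hypothetical LSTM predictor: encode Nf observed points, regress pred_len future points."""

    def __init__(self, input_dim=2, output_dim=2, hidden_dim=64, pred_len=10):
        super().__init__()
        self.output_dim = output_dim
        self.pred_len = pred_len
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, pred_len * output_dim)

    def forward(self, obs):
        # obs: (batch, Nf, input_dim). Initial states default to zeros sized to this batch.
        _, (h_n, _) = self.lstm(obs)      # h_n: (num_layers, batch, hidden_dim)
        out = self.head(h_n[-1])          # (batch, pred_len * output_dim)
        return out.view(-1, self.pred_len, self.output_dim)


def train_one_epoch(model, loader, optimizer):
    """One pass over (observed, target) batches; returns the mean per-batch MSE."""
    loss_fn = nn.MSELoss()
    model.train()
    losses = []
    for obs, target in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(obs), target)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return sum(losses) / max(len(losses), 1)


@torch.no_grad()
def evaluate(model, loader):
    """Mean per-batch MSE on held-out data, without tracking gradients."""
    loss_fn = nn.MSELoss()
    model.eval()
    losses = [loss_fn(model(obs), target).item() for obs, target in loader]
    return sum(losses) / max(len(losses), 1)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random stand-in data shaped like the notebook's tensors: (N, Nf, 2) inputs and targets
    obs, target = torch.randn(10, 10, 2), torch.randn(10, 10, 2)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(obs, target), batch_size=4, shuffle=True)
    model = LSTMTrajectorySketch()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(1, 4):
        print(f"epoch {epoch}. mean loss: {train_one_epoch(model, loader, optimizer):.4f}")
    print(f"eval loss: {evaluate(model, loader):.4f}")
```

Because `nn.LSTM` creates zero initial states sized to whatever batch it receives, this sketch does not need the batch size at construction time and handles a smaller final batch (10 samples with `batch_size=4` leaves a batch of 2) without special casing.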

We now initialize a trajectory predictor and train it. This will take about 1 minute per epoch on a CPU.

```python
p = TrajectoryPredictor(2, 2, batch_size)
p.train(train_loader, num_epochs)
```

```
epoch 1. mean loss: 1.82455535136
epoch 2. mean loss: 0.546421991492
epoch 3. mean loss: 0.386153386321
epoch 4. mean loss: 0.323843483411
epoch 5. mean loss: 0.289107988643
epoch 6. mean loss: 0.266656400561
epoch 7. mean loss: 0.250179112599
epoch 8. mean loss: 0.236991802132
```


## Model Prediction
With our model trained, we predict on the validation (dev) trajectories to get a sense of how well our LSTM performs.

```python
np_dev_trajectories = np.stack(dev_trajectories)

np_dev_data = np_dev_trajectories[:, :Nf, :]
np_dev_target = np_dev_trajectories[:, Nf:, :]

dev_data_tensor = torch.Tensor(np_dev_data)
dev_target_tensor = torch.Tensor(np_dev_target)

dev_dataset = torch.utils.data.TensorDataset(dev_data_tensor, dev_target_tensor)
dev_loader = torch.utils.data.DataLoader(dev_dataset, batch_size=batch_size)
```

```python
p.test(dev_loader)
```

```
mean loss: 0.534271395503
```


For comparison, we evaluate a linear model on the same dev trajectories: the (x,y) coordinates at each future frame are extrapolated from the person's velocity at the final observed frame.
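A constant-velocity baseline of this kind fits in a few lines. The sketch below takes the velocity as the displacement between the last two observed frames and scores predictions by mean squared error. `compute_linear_error` may estimate the velocity or measure the error differently, so the function names and the metric here are assumptions.

```python
import numpy as np


def constant_velocity_predict(observed, n_future):
    """Extrapolate an (Nf, 2) observed track n_future frames ahead at constant velocity.

    Velocity is the displacement between the last two observed frames (needs Nf >= 2).
    """
    observed = np.asarray(observed, dtype=float)
    velocity = observed[-1] - observed[-2]        # (2,) displacement per frame
    steps = np.arange(1, n_future + 1)[:, None]   # (n_future, 1): 1, 2, ..., n_future
    return observed[-1] + steps * velocity        # (n_future, 2)


def linear_baseline_error(trajectories, Nf):
    """Mean squared error of the constant-velocity prediction, averaged over trajectories.

    The metric is an assumption; compute_linear_error may define its loss differently.
    """
    errors = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        observed, target = traj[:Nf], traj[Nf:]
        pred = constant_velocity_predict(observed, len(target))
        errors.append(np.mean((pred - target) ** 2))
    return float(np.mean(errors))


# A person walking in a straight line at constant speed is predicted perfectly
straight = np.column_stack([np.arange(20.0), 2 * np.arange(20.0)])
print(linear_baseline_error([straight], Nf=10))  # 0.0
```

The baseline is exact for straight, constant-speed motion and only accumulates error when people turn, stop, or change speed.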

```python
compute_linear_error(dev_trajectories, Nf)
```

```
mean loss: 0.128018440883
```


The linear baseline reaches a lower mean loss on the dev set than the LSTM (0.128 vs. 0.534), so we still have a way to go in refining our model.

## Future Directions

We plan to improve our model by incorporating:
- Presence of other people in the scene
- The scene's features (sidewalks, grass, roads, buildings/obstacles)

For the presence of other people, we have implemented a data processing step that, for each frame:
- looks at the positions of all other people in the scene;
- constructs an array of 0s and 1s, where a 1 indicates the presence of another person.
    - This array is centered at the tracked person's location and is recomputed at each time step.
    - The parameter `N_pixels` gives the number of pixels along each of x and y in this discretized representation.

We have yet to test this input on the LSTM, as we believe it would be more useful to first embed this information into a more compact representation, perhaps by passing it through a small CNN for feature extraction.
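The occupancy step described above can be sketched as follows. This is an illustrative reimplementation, not our actual processing code: besides `N_pixels`, it assumes a hypothetical `half_width` parameter giving half the side length of the square window (in the same units as the coordinates), and it sets a cell to 1 if at least one other person falls inside it.

```python
import numpy as np


def occupancy_grid(center, others, N_pixels, half_width):
    """Binary (N_pixels, N_pixels) grid centered on the tracked person.

    center:     (x, y) of the tracked person.
    others:     (M, 2) positions of the other people in the same frame.
    half_width: hypothetical parameter, half the side of the square window,
                in the same units as the coordinates.
    Row index follows y and column index follows x; a cell is 1 if at least
    one other person falls inside it. People outside the window are ignored.
    """
    grid = np.zeros((N_pixels, N_pixels), dtype=np.uint8)
    others = np.asarray(others, dtype=float).reshape(-1, 2)
    # Shift so the window spans [0, 2 * half_width) along each axis
    rel = others - np.asarray(center, dtype=float) + half_width
    idx = np.floor(rel / (2 * half_width) * N_pixels).astype(int)
    inside = np.all((idx >= 0) & (idx < N_pixels), axis=1)
    grid[idx[inside, 1], idx[inside, 0]] = 1
    return grid


# One neighbor close to the tracked person, one far outside the window
print(occupancy_grid((1.0, 1.0), [[1.05, 0.95], [3.0, 3.0]], N_pixels=4, half_width=0.2))
```

Flattened, the grid adds `N_pixels**2` inputs per frame, most of them zero, which is the motivation for compressing it before it reaches the LSTM.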

For the scene information, we plan to:
- Segment the underlying image of each scene into feature classes, each labeled by a number.
- Feed a featurized version of this segmented scene into the LSTM.

We hope that including this information will allow the LSTM to learn rules such as:
- People usually avoid roads.
- People never walk through obstacles.
- etc.
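One simple way to featurize the segmented scene for the LSTM is to summarize a small window around the person's current pixel position as the fraction of each class it contains, and to append those fractions to the (x, y) input at every frame. This is a sketch, not our method: the class IDs, the window radius, counting off-image pixels as inaccessible, and the name `local_class_fractions` are all assumptions.

```python
import numpy as np

NUM_CLASSES = 4    # hypothetical IDs: 0 road, 1 sidewalk, 2 grass, 3 inaccessible
INACCESSIBLE = 3


def local_class_fractions(label_map, row, col, radius):
    """Fraction of each class in the (2*radius+1)^2 window centered on pixel (row, col).

    label_map is an (H, W) array of non-negative integer class IDs. Pixels beyond
    the image border are counted as inaccessible (an assumption).
    """
    padded = np.pad(label_map, radius, mode="constant", constant_values=INACCESSIBLE)
    # After padding by `radius`, original pixel (row, col) sits at (row + radius, col + radius)
    window = padded[row:row + 2 * radius + 1, col:col + 2 * radius + 1]
    counts = np.bincount(window.ravel(), minlength=NUM_CLASSES)
    return counts / window.size


# Toy 5x5 scene: a vertical road in column 0, sidewalk everywhere else
scene = np.ones((5, 5), dtype=np.int64)
scene[:, 0] = 0
# Placeholder normalized (x, y) followed by the 4 class fractions around pixel (2, 1)
features = np.concatenate([[0.1, -0.3], local_class_fractions(scene, 2, 1, 1)])
print(features)
```

With four classes, each frame's input grows from 2 to 6 values, so the predictor's input dimension would change accordingly. In practice the normalized (x, y) position would first have to be mapped back to pixel (row, col) indices of the segmented image.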

Eventually, we will incorporate both the information about other people in the scene and the scene features into our LSTM, and we hope to show that including these features improves tracking accuracy.