# Perceptive Automata Research Coding Challenge 2021

## Introduction

Hi, there! 

This is the Research Coding Challenge for the Data Scientist position at Perceptive Automata. This challenge focuses primarily on data science and machine learning. We've tried to minimize the number of imports, and the ones we use are all fairly standard packages. If you find yourself spending lots of time installing things before you can even start, shoot me an email, as that is not the point of this challenge, and I'll try to help you get up and coding as fast as I can.

There are instructions throughout that should help guide you through the challenge. You should be able to run all of the code on your machine without any long runtimes. Also, if you find any errors or typos, please be sure to email me at till@perceptiveautomata.com.

This should be inherently clear, but it's worth reiterating: **all work should be your own!** Please make sure your code is clear and commented so we understand what you did, and when you send the challenge back, include a brief description of your approach.

We're excited to see what you come up with!  

### Imports

In [1]:
import ast
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

In [2]:
# Older version of sklearn
# from sklearn.cross_validation import train_test_split

# Newer versions of sklearn
from sklearn.model_selection import train_test_split

# Pedestrian Crossing Prediction

In this problem we use the open-source JAAD dataset. If you are interested, you can read more about it here: http://data.nvision2.eecs.yorku.ca/JAAD_dataset/

The dataset consists of tracked people in some videos from a car's dashcam. Each of these people has been carefully annotated with a bunch of different attributes, such as whether they are stopped, moving fast, or moving slow. For this problem we will try to predict whether or not the pedestrian will cross the street in the next frame, based on all previous data we have about the pedestrian. You will use the bounding boxes of the pedestrians, along with the other actions that they take, to try to predict this for a test set.

## Dataframe

Each row of the dataframe that we construct for you consists of some metadata (the video id and the ped id, so that you can match rows up with the JAAD videos), followed by an ordered list of frames where that pedestrian appears.

* frame_numbers - These should be contiguous, with no gaps in the list. The other fields all align with the frame_numbers field.
* bounding_boxes - This field is a series of boxes that aligns with the frame_numbers field. Each box is given as [box x, box y, box width, box height], where x and y represent the upper-left corner of the box.
* moving slow, stopped, handwave, look, clear path, moving fast, looking, standing, slow down, nod, speed up - The annotated attributes you will use to train the model. Each is a list, aligned with the frame_numbers field, of whether or not the attribute is true for that frame number.
* crossing - This is the field that you will try to predict: whether or not the pedestrian is crossing for the corresponding frame number.
* cross_overall - Whether or not the person crossed at any point in the sequence.
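As a small illustration of the box format described above, here is a sketch that turns `[x, y, width, height]` boxes into a center point, an area, and a frame-to-frame center displacement. The helper names (`box_center`, `box_area`, `center_deltas`) are hypothetical and not part of the challenge, and whether these quantities are useful features is an assumption you would need to check.

```python
def box_center(box):
    """Center (cx, cy) of a box given as [x, y, width, height],
    where (x, y) is the upper-left corner."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def box_area(box):
    """Area of the box in square pixels."""
    _, _, w, h = box
    return w * h


def center_deltas(boxes):
    """Displacement of the box center from the previous frame.
    The first frame has no predecessor, so it gets (0.0, 0.0)."""
    if not boxes:
        return []
    centers = [box_center(b) for b in boxes]
    deltas = [(0.0, 0.0)]
    for (px, py), (cx, cy) in zip(centers, centers[1:]):
        deltas.append((cx - px, cy - py))
    return deltas


# The first two boxes of the first row shown in the dataframe preview
boxes = [[1209, 598, 51, 191], [1214, 598, 52, 192]]
print(box_center(boxes[0]), box_area(boxes[0]), center_deltas(boxes))
```

Because every list in a row aligns with frame_numbers, helpers like these could be applied to a whole row's `bounding_boxes` list at once.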

In [4]:
pedestrians_df = pd.read_csv('pedestrian_df.csv')
for col_name in ['bounding_boxes', 'frame_numbers', 'moving slow', 'stopped', 'handwave', 'look', 'clear path', 'crossing', 'moving fast', 'looking', 'standing', 'slow down', 'nod', 'speed up']:
    pedestrians_df[col_name] = pedestrians_df[col_name].apply(ast.literal_eval)
pedestrians_df.head()

Unnamed: 0,video_id,ped_ind,frame_numbers,bounding_boxes,moving slow,stopped,handwave,look,clear path,crossing,moving fast,looking,standing,slow down,nod,speed up,cross_overall
0,video_0071,1,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1209, 598, 51, 191], [1214, 598, 52, 192], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False
1,video_0071,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1249, 621, 51, 127], [1254, 620, 51, 129], [...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
2,video_0204,1,"[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...","[[1135, 673, 28, 97], [1139, 672, 29, 92], [11...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
3,video_0204,3,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[906, 670, 35, 65], [906, 672, 32, 65], [907,...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
4,video_0204,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1152, 657, 42, 114], [1158, 657, 42, 117], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False


In [4]:
# Total number of pedestrian-frames: the sum of each row's frame-list length
count = pedestrians_df['frame_numbers'].apply(len).sum()

print("Number of Pedestrian-Frames: %d" % count)

Number of Pedestrian-Frames: 128220


In [5]:
pedestrians_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 666 entries, 0 to 665
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   video_id        666 non-null    object
 1   ped_ind         666 non-null    object
 2   frame_numbers   666 non-null    object
 3   bounding_boxes  666 non-null    object
 4   moving slow     666 non-null    object
 5   stopped         666 non-null    object
 6   handwave        666 non-null    object
 7   look            666 non-null    object
 8   clear path      666 non-null    object
 9   crossing        666 non-null    object
 10  moving fast     666 non-null    object
 11  looking         666 non-null    object
 12  standing        666 non-null    object
 13  slow down       666 non-null    object
 14  nod             666 non-null    object
 15  speed up        666 non-null    object
 16  cross_overall   666 non-null    bool  
dtypes: bool(1), object(16)
memory usage: 84.0+ KB


In [8]:
# Let's take a more in-depth look at the last row (index 665):
print(pedestrians_df.iloc[665])
print(pedestrians_df.iloc[665]['frame_numbers'])

video_id                                                 video_0179
ped_ind                                                           2
frame_numbers     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
bounding_boxes    [[1033, 594, 32, 64], [1033, 594, 32, 64], [10...
moving slow       [False, False, False, False, False, False, Fal...
stopped           [True, True, True, True, True, True, True, Tru...
handwave          [False, False, False, False, False, False, Fal...
look              [False, False, False, False, False, False, Fal...
clear path        [False, False, False, False, False, False, Fal...
crossing          [False, False, False, False, False, False, Fal...
moving fast       [False, False, False, False, False, False, Fal...
looking           [False, False, False, False, False, False, Fal...
standing          [False, False, False, False, False, False, Fal...
slow down         [False, False, False, False, False, False, Fal...
nod               [False, False, False, False, F

# More info

Your task is to predict, for each pedestrian, whether or not they will be crossing the road at each frame. For example, for row 0 of the above dataframe, the pedestrian appears in frames 0-329. For each of those frames, you need to predict whether or not they will be crossing in the next frame. So, for frame number 5, you can use whatever data you want from frames 0-4 to predict whether or not they will be crossing in frame 5. And for frame 329, you can use whatever data you want from frames 0-328 to predict whether or not they will be crossing in frame 329.

You can skip the first few frames for each pedestrian if your solution requires a certain number of frames to be initialized.
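To make the framing above concrete, here is a minimal sketch that builds (history, label) pairs for one pedestrian: the label for frame t is paired only with data from the `history` frames before t, and the first `history` frames are skipped. The function name `make_examples`, the fixed-length window, and the toy two-attribute data are all assumptions for illustration; the task itself allows any data from earlier frames.

```python
def make_examples(frame_features, crossing, history=4):
    """Build one training example per frame t >= history.

    frame_features: list of per-frame feature lists, aligned with `crossing`
    crossing:       list of per-frame booleans (the label)
    Returns a list of (t, flattened_window, label) tuples, where the window
    covers frames t-history .. t-1 and never includes frame t itself.
    """
    if len(frame_features) != len(crossing):
        raise ValueError("features and labels must align frame by frame")
    examples = []
    for t in range(history, len(crossing)):
        window = frame_features[t - history:t]
        flat = [value for frame in window for value in frame]
        examples.append((t, flat, crossing[t]))
    return examples


# Toy pedestrian with two made-up attributes per frame
toy_features = [[0, 1], [0, 1], [1, 0], [1, 0], [1, 0], [1, 0]]
toy_crossing = [False, False, False, True, True, True]
for example in make_examples(toy_features, toy_crossing, history=2):
    print(example)
```

Keeping the window strictly before frame t is what prevents the label from leaking into the features.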

You will need to:
- explore the dataset
- redefine the problem and select an appropriate metric
- unravel the existing per-pedestrian dataframe to build your new per-pedestrian-frame dataframe.  
- extract features
- split the data into train and validation sets (70%-30% split is probably about the right size)
- build a baseline that simply predicts the previous frame's "crossing" value for the next frame
- make some models
- test your final model on your validation set
- write up your analysis
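The baseline step in the list above can be sketched for a single pedestrian as follows. This is an illustrative sketch, not a reference solution: `persistence_predictions` and `evaluate` are hypothetical names, and reporting precision, recall, and F1 next to accuracy is one possible metric choice. If labels rarely change between consecutive frames, accuracy alone may make this baseline look very strong.

```python
def persistence_predictions(crossing):
    """Predict each frame's label as the previous frame's label.
    Frame 0 has no predecessor, so predictions cover frames 1..n-1."""
    return crossing[:-1]


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy label sequence for one pedestrian
toy_crossing = [False, False, True, True, True, False]
y_pred = persistence_predictions(toy_crossing)  # predictions for frames 1..5
y_true = toy_crossing[1:]                       # actual labels for frames 1..5
print(evaluate(y_true, y_pred))
```

On the toy sequence the baseline is wrong exactly on the two frames where the label changes, which is the behaviour a learned model would have to improve on.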

In [5]:


### YOUR CODE HERE ###
df_temp = pedestrians_df.set_index(['video_id','ped_ind','cross_overall']).apply(pd.Series.explode).reset_index()
df_temp.head()

Unnamed: 0,video_id,ped_ind,cross_overall,frame_numbers,bounding_boxes,moving slow,stopped,handwave,look,clear path,crossing,moving fast,looking,standing,slow down,nod,speed up
0,video_0071,1,False,0,"[1209, 598, 51, 191]",False,True,False,False,False,False,False,False,False,False,False,False
1,video_0071,1,False,1,"[1214, 598, 52, 192]",False,True,False,False,False,False,False,False,False,False,False,False
2,video_0071,1,False,2,"[1218, 597, 53, 193]",False,True,False,False,False,False,False,False,False,False,False,False
3,video_0071,1,False,3,"[1223, 597, 54, 193]",False,True,False,False,False,False,False,False,False,False,False,False
4,video_0071,1,False,4,"[1228, 597, 55, 194]",False,True,False,False,False,False,False,False,False,False,False,False


In [69]:
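# Metadata columns plus every annotated attribute (bounding boxes are left out here)
cols_to_select = ['video_id','ped_ind','cross_overall','frame_numbers','moving slow','stopped','handwave','look','clear path',
                               'moving fast','looking','standing','slow down','nod','speed up','crossing']

# Per-frame inputs; the current frame's 'crossing' value is itself used as a feature
feature_columns = ['moving slow','stopped','handwave','look','clear path',
                               'moving fast','looking','standing','slow down','nod','speed up','crossing']

# Copy so that adding a column below does not write into a view of df_temp
df_temp1 = df_temp.loc[:,cols_to_select].copy()

# Target: the same pedestrian's 'crossing' value in the next frame
# (missing on each pedestrian's last frame, which has no successor)
df_temp1['shifted_crossing'] = df_temp1.groupby(by=['video_id','ped_ind'])['crossing'].shift(-1)
df_temp1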

Unnamed: 0,video_id,ped_ind,cross_overall,frame_numbers,moving slow,stopped,handwave,look,clear path,moving fast,looking,standing,slow down,nod,speed up,crossing,shifted_crossing
0,video_0071,1,False,0,False,True,False,False,False,False,False,False,False,False,False,False,False
1,video_0071,1,False,1,False,True,False,False,False,False,False,False,False,False,False,False,False
2,video_0071,1,False,2,False,True,False,False,False,False,False,False,False,False,False,False,False
3,video_0071,1,False,3,False,True,False,False,False,False,False,False,False,False,False,False,False
4,video_0071,1,False,4,False,True,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
128215,video_0179,2,True,325,False,False,False,False,False,False,False,False,False,False,False,True,True
128216,video_0179,2,True,326,False,False,False,False,False,False,False,False,False,False,False,True,True
128217,video_0179,2,True,327,False,False,False,False,False,False,False,False,False,False,False,True,True
128218,video_0179,2,True,328,False,False,False,False,False,False,False,False,False,False,False,True,True
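With one row per pedestrian-frame, a row-level random split would place neighbouring frames of the same pedestrian on both sides of the split, and neighbouring frames are nearly identical. One way to avoid that, sketched here under the assumption that a pedestrian is identified by the pair (video_id, ped_ind), is to assign whole pedestrians to either train or validation. The function name `split_by_pedestrian` and the toy keys are hypothetical.

```python
import random


def split_by_pedestrian(keys, val_fraction=0.3, seed=0):
    """Split pedestrian identifiers into train and validation sets.

    keys: iterable of (video_id, ped_ind) tuples, one per frame row
          (duplicates are expected and collapsed).
    Returns (train_keys, val_keys) as two disjoint sets.
    """
    unique_keys = sorted(set(keys))      # sort first so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_keys)
    n_val = int(round(len(unique_keys) * val_fraction))
    return set(unique_keys[n_val:]), set(unique_keys[:n_val])


# Toy data: 10 pedestrians with 3 frame rows each
toy_keys = [("video_%04d" % v, str(p))
            for v in range(5) for p in (1, 2) for _ in range(3)]
train_keys, val_keys = split_by_pedestrian(toy_keys, val_fraction=0.3)
print(len(train_keys), len(val_keys))
```

Each frame row then goes to whichever side its pedestrian was assigned. scikit-learn's `GroupShuffleSplit` serves a similar purpose, and splitting by video rather than by pedestrian would be a stricter variant.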


In [101]:
import numpy as np

num = 9  # number of frames to experiment with
# Experiment on the first `num` frames of a single pedestrian
df_to_play = df_temp1[(df_temp1.video_id == 'video_0071') & (df_temp1.ped_ind == '1')]
df_to_play = df_to_play.loc[:,feature_columns]
df_to_play = df_to_play.head(n=num)
tau = 4  # window length in frames
total_rows = num - tau + 1  # number of length-tau sliding windows
features = np.zeros((total_rows, tau,len(feature_columns)))
for i in range(total_rows):
    # Window i holds frames i .. i+tau-1 (index with i, not 0, so every window is filled)
    features[i] = df_to_play.iloc[i:i+tau].values.astype(int)


features

array([[[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
      

In [94]:
df_to_play

Unnamed: 0,moving slow,stopped,handwave,look,clear path,moving fast,looking,standing,slow down,nod,speed up,crossing
0,False,True,False,False,False,False,False,False,False,False,False,False
1,False,True,False,False,False,False,False,False,False,False,False,False
2,False,True,False,False,False,False,False,False,False,False,False,False
3,False,True,False,False,False,False,False,False,False,False,False,False
4,False,True,False,False,False,False,False,False,False,False,False,False
5,False,True,False,False,False,False,False,False,False,False,False,False
6,False,True,False,False,False,False,False,False,False,False,False,False
7,False,True,False,False,False,False,False,False,False,False,False,False
8,False,True,False,False,False,False,False,False,False,False,False,False


## Analysis

In [7]:
### Please write up a bit about what you did and your findings ###


In [7]:
from d2l import torch as d2l
import torch
from torch import nn
from torch.nn import functional as F

batch_size, num_steps = 32, 35
train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)

In [16]:
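# nn.Linear acts on the last dimension only, so (128, 20, 20) maps to (128, 20, 1)
m = nn.Linear(20, 1)
x = torch.randn(128, 20, 20)  # named x to avoid shadowing the built-in input()
output = m(x)
print(output.size())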

torch.Size([128, 20, 1])
