# CNN Image Preprocessing

This notebook follows a method similar to [Deep Feature Extraction from Trajectories for
Transportation Mode Estimation (Endo et al.)](http://www.npal.cs.tsukuba.ac.jp/~endo/pdf/pakdd2016_endo_preprint.pdf) to produce histograms from the trajectory data.

The issue of sample-rate inconsistencies affecting pixel intensities will be addressed later.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('../Metadata/trajFeatures.csv')

We want to determine a suitable window area for taking a snapshot of each trajectory. We define this as the median window area across all trajectories, since the mean is extremely skewed.

In [4]:
print('Side Length:', np.sqrt(df['Window Area'].median()))

Side Length: 994.6504151675744


This is the side length of a square whose area equals the median window area of the trajectories.
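
As a small illustration of why the median is the safer choice here, the sketch below computes the side length from both statistics. The `areas` values are made up for the example and are not taken from `trajFeatures.csv`.

```python
import numpy as np

# Hypothetical window areas (illustrative only): one very large
# trajectory is enough to drag the mean far away from typical values.
areas = np.array([4.0e5, 9.0e5, 1.0e6, 1.2e6, 2.5e8])

# Side length of a square with the median / mean window area.
side_from_median = np.sqrt(np.median(areas))
side_from_mean = np.sqrt(np.mean(areas))

print('Side from median:', side_from_median)
print('Side from mean:', side_from_mean)
```

With a single outlier the mean-based side is several times larger than the median-based one, so most trajectories would occupy only a few pixels of the resulting image.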

Now we need to choose the centre of the trajectory. We approximate it by the middle point in the series.

In [8]:
from Scripts.chooseTraj import randTraj
from Scripts.trajAnalysis import trajectory

In [11]:
t = trajectory(randTraj('../../Data'))

In [12]:
t.points

array([[-494.79765909, 1709.11357406],
       [-494.79765909, 1709.11357406],
       [-492.32721981, 1714.67187357],
       [-492.32721981, 1714.67187357],
       [-489.68640541, 1717.67335529],
       [-489.68640541, 1717.67335529],
       [-490.02715565, 1719.56317711],
       [-490.02715565, 1719.56317711],
       [-489.09009248, 1719.45201112],
       [-489.09009248, 1719.45201112],
       [-488.66415467, 1719.45201112],
       [-488.66415467, 1719.45201112],
       [-489.00490491, 1721.00833497],
       [-489.00490491, 1721.00833497],
       [-489.00490491, 1721.7864969 ],
       [-489.00490491, 1721.7864969 ],
       [-489.17528004, 1721.45299893],
       [-489.17528004, 1721.45299893],
       [-489.2604676 , 1721.45299893],
       [-489.2604676 , 1721.45299893],
       [-489.68640541, 1721.56416492],
       [-489.68640541, 1721.56416492],
       [-489.60121785, 1721.56416492],
       [-489.60121785, 1721.56416492],
       [-489.43084272, 1721.56416492],
       [-489.43084272, 17

In [13]:
t.points[len(t.points)//2]

array([-489.00490491, 1721.56416492])

I have written this into the `Scripts.traj2image.imgCentre()` function. Ideally this will be extended to the geometric median in future, but that is computationally expensive.
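
To make the trade-off concrete, here is a rough sketch of both options. The names `middle_point` and `geometric_median` are hypothetical and are not the contents of `imgCentre()`. The geometric median is approximated with Weiszfeld's iteration, which is one common choice and not necessarily the one that would be used here.

```python
import numpy as np

def middle_point(points):
    """Return the point at the middle index of an (N, 2) array."""
    points = np.asarray(points, dtype=float)
    return points[len(points) // 2]

def geometric_median(points, n_iter=100, tol=1e-7):
    """Approximate the geometric median with Weiszfeld's iteration."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dists = np.linalg.norm(points - estimate, axis=1)
        dists = np.maximum(dists, 1e-12)  # guard against division by zero
        weights = 1.0 / dists
        new_estimate = (points * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

# Made-up points: a small cluster plus one distant outlier.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [50.0, 50.0]])
print('Middle point:', middle_point(pts))
print('Geometric median:', geometric_median(pts))
```

The middle-index lookup is a single array access, whereas each Weiszfeld step touches every point, which is where the extra cost comes from on long trajectories.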

From this centre we must now cut out a square window (assuming journeys are approximately isotropic in frequency) with sides of the length computed previously, and convert the window into a histogram. This requires specifying the bin range of the `numpy` 2D histogram explicitly. I have modified `makeMat()` accordingly.
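
The idea can be sketched with `np.histogram2d` and its `range` argument. The helpers `window_range` and `make_matrix` below are hypothetical stand-ins, not the actual `windowRange()` and `makeMat()` from `Scripts.traj2image`. The sketch assumes the second argument of the window helper is the full side length.

```python
import numpy as np

def window_range(centre, side):
    """Return [[xmin, xmax], [ymin, ymax]] for a square window around centre."""
    half = side / 2.0
    cx, cy = centre
    return [[cx - half, cx + half], [cy - half, cy + half]]

def make_matrix(points, bin_range, bins):
    """Bin (N, 2) points into a bins x bins count matrix over a fixed range."""
    points = np.asarray(points, dtype=float)
    hist, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=bin_range
    )
    return hist

# Made-up points; the last one lies outside the window and is dropped.
points = np.array([[0.0, 0.0], [10.0, 10.0], [-400.0, 300.0], [5000.0, 5000.0]])
bin_range = window_range((0.0, 0.0), 1000)
mat = make_matrix(points, bin_range, 4)
print(mat)
```

Because the range is fixed, every image covers the same physical extent regardless of how far the trajectory travels, and points outside the window simply do not contribute to any bin.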

In [14]:
from Scripts.traj2image import imgCentre, makeMat, windowRange

In [17]:
makeMat(t, windowRange(imgCentre(t), 1000), 12)

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 2.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0., 14.,  8.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  4., 30.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

The images will be much sparser, but hopefully this still translates to improved classification, as they will all be standardised.