# Data Preprocessing

The data needs to be preprocessed before it can be used by any machine learning algorithm. Some image datetimes are incorrect, every image needs a unique name, and the information bars need to be cropped out.

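```python
# image_preprocessing is a project-local module; it presumably provides
# change_datetimes and rename_image, which are used below.
from image_preprocessing import *
import os
import datetime
import exiftool
import re
```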

## Datetime Correction
Images from the Browning camera initially had incorrect datetimes. To calculate the offset, I compared the incorrect and corrected datetimes of the same trigger image series.

```python
time_delta = datetime.datetime(2019, 6, 30, 10, 9, 0) - datetime.datetime(2018, 1, 28, 13, 23, 0)
time_delta
# datetime.timedelta(517, 74760), i.e. 517 days, 20:46:00
```

Images from the Reconyx camera also initially had incorrect datetimes. I only have reference information at the resolution of a day, so the times of day may still be incorrect. Judging by the lighting in the images, however, the times seem to match up reasonably well. The following is the offset, in whole days, used to correct the Reconyx images.

```python
time_delta = datetime.datetime(2019, 4, 11, 3, 38, 0) - datetime.datetime(2018, 1, 2, 3, 38, 0)
time_delta
# datetime.timedelta(464), i.e. 464 days
```
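Because the same `time_delta` variable is reused for both cameras, the two offsets could instead be kept side by side, keyed by camera. A minimal sketch, assuming the camera codes c1 (Reconyx) and c2 (Browning) from the naming section below; `CAMERA_OFFSETS` and `correct_datetime` are hypothetical names, not part of `image_preprocessing`.

```python
from datetime import datetime, timedelta

# Hypothetical per-camera offsets, using the values computed above.
# c1 = Reconyx (whole days only), c2 = Browning (days plus a time-of-day shift).
CAMERA_OFFSETS = {
    "c1": timedelta(days=464),
    "c2": timedelta(days=517, seconds=74760),
}


def correct_datetime(original: datetime, camera: str) -> datetime:
    """Return the corrected datetime for an image from the given camera."""
    return original + CAMERA_OFFSETS[camera]


# The reference images used to derive each offset map onto their corrected times.
print(correct_datetime(datetime(2018, 1, 28, 13, 23), "c2"))  # 2019-06-30 10:09:00
print(correct_datetime(datetime(2018, 1, 2, 3, 38), "c1"))    # 2019-04-11 03:38:00
```

Since the Reconyx offset is a whole number of days, it leaves the time of day unchanged, which is consistent with only the date having been checked.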

I corrected the image datetimes with the change_datetimes function, moving each camera's images in and out of the data directory so that every batch was shifted by its own offset. The same function should also be useful for adjusting datetimes for daylight saving time.

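```python
import os

active_dir = "./data/"
# Shift the datetime of every image currently in active_dir by time_delta
for image in os.listdir(active_dir):
    change_datetimes(os.path.join(active_dir, image), time_delta)
```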
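The source of `change_datetimes` is not shown here. As a sketch of the core arithmetic only, assuming the timestamps are stored in the standard EXIF `YYYY:MM:DD HH:MM:SS` format (as in the `DateTimeOriginal` tag), shifting a single value could look like this. `shift_exif_datetime` is a hypothetical helper; reading and writing the tags (for example with ExifTool) is left out.

```python
from datetime import datetime, timedelta

# EXIF stores datetimes as "YYYY:MM:DD HH:MM:SS" (no time zone, no sub-seconds).
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def shift_exif_datetime(value: str, delta: timedelta) -> str:
    """Parse an EXIF datetime string, shift it by delta, and format it back.

    Hypothetical helper: illustrates the arithmetic only, not how
    change_datetimes reads or writes the image metadata.
    """
    shifted = datetime.strptime(value, EXIF_FORMAT) + delta
    return shifted.strftime(EXIF_FORMAT)


# Browning correction computed above.
print(shift_exif_datetime("2018:01:28 13:23:00", timedelta(days=517, seconds=74760)))  # 2019:06:30 10:09:00
# Daylight saving time: move a timestamp back by one hour.
print(shift_exif_datetime("2019:06:30 10:09:00", timedelta(hours=-1)))  # 2019:06:30 09:09:00
```

A daylight saving adjustment is then just another call with a one-hour `timedelta`, positive or negative.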

## Unique Naming
I decided to name each image by site (s1), camera (c1 = Reconyx, c2 = Browning), datetime, and a unique number. The unique number is needed because image timestamps only have a resolution of one second, so several images can share the same datetime. For the Reconyx images I used the image sequence number as the unique tag. I cannot yet extract the maker note metadata from the Browning images, so for those I append a running count instead.
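A minimal sketch of the naming scheme described above. The exact layout (underscore separators, a `YYYYMMDD-HHMMSS` datetime, a zero-padded four-digit number) is an assumption for illustration and not necessarily what `rename_image` produces; `make_name` is hypothetical.

```python
from datetime import datetime


def make_name(site: int, camera: int, taken: datetime, unique: int, ext: str = ".jpg") -> str:
    """Build a file name from site, camera, datetime, and a unique number.

    Hypothetical format: s<site>_c<camera>_<YYYYMMDD-HHMMSS>_<unique:04d><ext>.
    For Reconyx (c1) images `unique` would be the sequence number; for
    Browning (c2) images it would be a running count.
    """
    return f"s{site}_c{camera}_{taken:%Y%m%d-%H%M%S}_{unique:04d}{ext}"


# Two images from the same second still get distinct names.
print(make_name(1, 2, datetime(2019, 6, 30, 10, 9, 0), 1))  # s1_c2_20190630-100900_0001.jpg
print(make_name(1, 2, datetime(2019, 6, 30, 10, 9, 0), 2))  # s1_c2_20190630-100900_0002.jpg
```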

TODO:
* Extract maker note metadata from Browning images (might need to use OCR instead)
* Determine the best naming system:
  * Option 1: site, camera, trigger datetime of the sequence's initial image, sequence number
  * Option 2: site, camera, image datetime, estimated milliseconds
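Option 2 needs some way to estimate sub-second times. One simple possibility, assuming images that share a timestamp are already in capture order (for example, sorted by file name), is to spread them evenly across that second. `estimate_milliseconds` is a hypothetical helper, and even spacing is an assumption rather than the cameras' actual burst interval.

```python
from collections import Counter
from datetime import datetime


def estimate_milliseconds(timestamps: list[datetime]) -> list[int]:
    """Assign an estimated millisecond value to each timestamp.

    Images sharing the same one-second timestamp are spaced evenly across
    that second in the order given; a lone image gets 0 ms.
    """
    totals = Counter(timestamps)  # how many images share each second
    seen: Counter = Counter()     # how many of each second have been assigned so far
    result = []
    for ts in timestamps:
        index = seen[ts]
        seen[ts] += 1
        result.append(index * 1000 // totals[ts])
    return result


shots = [datetime(2019, 6, 30, 10, 9, 0)] * 3 + [datetime(2019, 6, 30, 10, 9, 1)]
print(estimate_milliseconds(shots))  # [0, 333, 666, 0]
```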

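```python
import os

active_dir = "./data/"
# Rename every image currently in active_dir using the naming scheme above
for image in os.listdir(active_dir):
    rename_image(os.path.join(active_dir, image))
```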

## Image Cropping
To reduce noise in the image data, I need to crop out the information bars and logos. From experimentation, I need to crop out the ... for the Reconyx images and the ... for the Browning images.

```python
1440 - 1404  # 36
3024 - 2830  # 194
```

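Reconyx:
* 1152 px tall: crop 32 px from the top and 70 px from the bottom
* 1440 px tall: crop 36 px from the top and 70 px from the bottom

Browning:
* 3024 px tall: crop 194 px from the bottom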
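The margins above can be turned into crop boxes. A minimal sketch, taking the first number in each entry to be the image height in pixels and assuming only the top and bottom are cropped; `CROP_MARGINS` and `crop_box` are hypothetical names, and the widths in the example are arbitrary. The returned `(left, upper, right, lower)` tuple is the box format that Pillow's `Image.crop` expects.

```python
# Hypothetical margins (in pixels) from the lists above, keyed by camera and image height.
CROP_MARGINS = {
    ("reconyx", 1152): (32, 70),  # (top, bottom)
    ("reconyx", 1440): (36, 70),
    ("browning", 3024): (0, 194),
}


def crop_box(camera: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box that removes the information bars."""
    top, bottom = CROP_MARGINS[(camera, height)]
    return (0, top, width, height - bottom)


print(crop_box("browning", 4032, 3024))  # (0, 0, 4032, 2830)
print(crop_box("reconyx", 2560, 1440))   # (0, 36, 2560, 1370)
```

With Pillow, this could be applied as `img.crop(crop_box("browning", *img.size))`, since `Image.size` is `(width, height)`. Note that the Browning lower edge of 2830 matches the `3024 - 2830` calculation above.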

TODO:
* Determine an algorithm for animal event times: I now have many images, but how should they be grouped into separate animal events?
* Re-do the functions with the pyexif wrapper (which uses only one package instead of three).
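One common starting point for grouping images into events is to sort them by time and start a new event wherever the gap between consecutive images exceeds a threshold. A minimal sketch; `group_events` and the five-minute default threshold are hypothetical choices, not a settled algorithm.

```python
from datetime import datetime, timedelta


def group_events(
    timestamps: list[datetime], max_gap: timedelta = timedelta(minutes=5)
) -> list[list[datetime]]:
    """Group timestamps into events, starting a new event when the gap exceeds max_gap."""
    events: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)  # close enough to the previous image: same event
        else:
            events.append([ts])    # first image, or gap too large: new event
    return events


times = [
    datetime(2019, 6, 30, 10, 9, 0),
    datetime(2019, 6, 30, 10, 9, 1),
    datetime(2019, 6, 30, 10, 12, 0),
    datetime(2019, 6, 30, 11, 0, 0),
]
print([len(event) for event in group_events(times)])  # [3, 1]
```

The threshold is the main design choice: too short and one animal visit splits into several events, too long and separate visits merge.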