# Stay In Your Lane!
## Automated Bike Lane Enforcement With Neural Network Image Classification: Technical Notebook
### Author: Jesse Markowitz, October 2021

<img src="readme_images/cab_in_bikelane.png" alt="a scene often seen in NYC" width="600"/>

In [2]:
# Load dependencies
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

from functions import *

%load_ext autoreload
%autoreload 2
%matplotlib inline

## Business Understanding

Biking is my primary mode of transportation in and around New York City, as it is for an increasing number of people every year. I bike to commute to work, for groceries or other personal trips, and for exercise/pleasure. However, on every single trip I make in the city, I face a serious safety issue: **cars parked in bike lanes force me to weave in and out of traffic.** Although it is illegal to stop, stand, or park in a bike lane, vehicles in the city frequently do. Despite an increase in bike infrastructure and ridership in NYC, this problem continues, seemingly unenforced and unabated. The worst offenders are Taxi and Limousine Commission (T&LC) cars (yellow and green cabs, as well as rideshare vehicles for Uber, Lyft, etc.), delivery trucks, and police vehicles (personal and service). While it is possible to report offenders via 311 (the [Reported app](https://reportedly.weebly.com/) makes this especially easy), in general it is only T&LC drivers who are held accountable for these violations, as there is a set of prosecutors specifically for that regulatory purpose. When it comes to personal and police vehicles, 311 forwards the complaint to the local police precinct, where it is up to the responding officer's discretion to follow up. This rarely occurs. **Insufficient enforcement of bike lane traffic laws creates serious safety issues for cyclists.**

<img src="readme_images/blocked_bike_lane_nj_port_authority.png" alt="yet another cop in the bike lane" width="600"/>

On September 15, 2021, the NYC DOT released a ["Request for Expressions of Interest"](https://a856-cityrecord.nyc.gov/RequestDetail/20210907107) to create a system for automated bike lane enforcement. A comparable system for bus lanes, the Automated Bus Lane Enforcement (ABLE) system, was created by [Siemens Mobility](https://www.mobility.siemens.com/us/en/company/newsroom/short-news/first-ever-mobile-bus-lane-enforcement-solution-in-new-york.html) and installed in 2010. It has been expanded since then with great success, as measured by increased route speed and ridership. Automating enforcement of bike lane traffic laws would have the immediate effect of increasing enforcement from what seem to be negligible levels. Automated enforcement would also benefit the city's cyclists by reducing the need for active police involvement with the issue, especially on streets where the problem is the greatest.

## Data Understanding and Preparation

The dataset consists of just over 1,800 images of New York City bike lanes, up from about 1,600 at the beginning of the project. Just over half of these images show a bike lane obstructed by a vehicle; these make up the target class. The rest of the images are of bike lanes without vehicular obstruction, showing entirely empty bike lanes or, on occasion, bike lanes with cyclists or pedestrians. The small size of the dataset is one of the most significant limitations of this project.

The images in the dataset were collected from a variety of sources:
 - The [Reported app's Twitter page](https://twitter.com/Reported_NYC), which tweets all traffic violations reported through the app
 - A large dataset of images provided by [Ryan Gravener](https://github.com/snooplsm), who is working on an image recognition project for Reported
 - Screenshots from Google Maps Street View
 - Manual collection (i.e., taking photos while biking around the city--this is the source of the vast majority of the non-target images of unobstructed bike lanes)

### Preprocessing

#### Re-orienting images
All of the images collected manually were taken with an iPhone X and saved as JPEGs. So that photos display with the correct orientation, digital cameras and smartphone cameras attach an Orientation tag to each photo's EXIF data. Most image display programs read this tag to orient the image correctly without altering the underlying image data, but a Keras `ImageDataGenerator` does not. As a result, the raw images collected manually are improperly oriented when fed into the model.
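One way to bake the orientation into the pixel data before training is Pillow's `ImageOps.exif_transpose`, which returns a copy of the image rotated or flipped according to its Orientation tag. The sketch below is an assumption about how this step could look, not necessarily the code used in this project; the helper names and the file paths passed to them are hypothetical.

```python
from pathlib import Path

from PIL import Image, ImageOps


def apply_exif_orientation(img):
    """Return a copy of `img` rotated/flipped as its EXIF Orientation tag requests."""
    return ImageOps.exif_transpose(img)


def reorient_file(src_path, dst_path):
    """Hypothetical helper: write an upright copy of the image at src_path."""
    with Image.open(src_path) as img:
        upright = apply_exif_orientation(img)
        # Create the destination folder if it does not exist yet
        Path(dst_path).parent.mkdir(parents=True, exist_ok=True)
        upright.save(dst_path)
```

Because the rotation is written into the saved pixels, any loader that ignores EXIF data will then see the image the same way an image viewer does.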

#### Cropping images
Many of the images collected via Reported contain timestamps printed at the top of the image: 

In [6]:
# EXAMPLE OF TIMESTAMP IMAGES

This is to enhance the photo's value as evidence, but creates a potentially confounding factor in the dataset because images with timestamps will be overrepresented in the target class. Without removing this feature, it's possible that the model will use it to predict the target class, rather than attending to real features in the image.

Cropping the top of the image is an easy way to remove the timestamp.
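A minimal sketch of such a crop with Pillow is shown here. The 10% default is an assumed value for illustration only (the actual crop height used in this project is not stated), and `crop_top` is a hypothetical helper name.

```python
from PIL import Image


def crop_top(img, fraction=0.1):
    """Return a copy of `img` with the top `fraction` of its height removed."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    width, height = img.size
    top = int(round(height * fraction))
    # The crop box is (left, upper, right, lower) in pixel coordinates
    return img.crop((0, top, width, height))
```

Applying the same crop to every image in both classes, not only to the timestamped ones, keeps the crop itself from becoming a cue for the target class.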

In [3]:
# Filepaths
train_dir = 'input_images/full_combined'

train_open_dir = os.path.join(train_dir, 'open_bike_lane')
train_vehicle_dir = os.path.join(train_dir, 'vehicle_bike_lane')

train_open_dir

'input_images/full_combined/open_bike_lane'

#### Removing unclear images
The final step before a train-val-test split is to manually review images, ensuring they are in the correct class directory and removing any that are inappropriate or unclear. Images were generally removed if they:
 - did not show both lane lines of a bike lane or were too "close up" to a vehicle
 - contained too many cyclists, motorbikes, or pedestrians such that the bike lane was significantly obstructed
 - were taken at night (these were extremely overrepresented in the target class)
 - showed a car crossing a bike lane legally (i.e., crossing an intersection)
 - were deemed to not adequately contain the information 

In [7]:
# EXAMPLES OF UNUSED IMAGES

Many of these decisions were subjective judgments, and a surprisingly large number of images were ambiguous. These images were kept in a separate `unused_images` folder for later inclusion or testing and as non-examples.
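Setting images aside rather than deleting them can be done with the standard library. The following is a sketch under assumptions: `set_aside` is a hypothetical helper, and keeping each image's class folder name inside `unused_images` is a choice made here for illustration, not something the project necessarily did.

```python
import shutil
from pathlib import Path


def set_aside(image_paths, unused_dir="unused_images"):
    """Move rejected images into `unused_dir`, keeping their class folder name."""
    moved = []
    for path in map(Path, image_paths):
        # Keep the parent (class) folder name so the image can be restored later
        target_dir = Path(unused_dir) / path.parent.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```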

In [4]:
# Delete metadata files created by Mac OS
!find . -name ".DS_Store" -delete

In [5]:
# Check functionality and number of images
print('There are', len(os.listdir(train_open_dir)), 'non-target images in the training set')
print('There are', len(os.listdir(train_vehicle_dir)), 'target images in the training set')

# Expecting:
# 758 non-target
# 861 target

There are 758 non-target images in the training set
There are 861 target images in the training set


### Train-Validation-Test Split

Only 100 images (50 of each class) were set aside as a testing/holdout set for final model evaluation in order to maximize the training set. An additional 100 images (50 of each class) were set aside as a `validation` set to use during model training. The [split-folders](https://pypi.org/project/split-folders/) package provides an easy way to accomplish this and has methods for splitting either by a ratio or a fixed number.
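To make the fixed-number split concrete, here is a standard-library stand-in that shuffles each class and copies 50 images to `validation`, 50 to `test`, and the rest to `train`. It is an illustration of the idea only, not the split-folders API and not necessarily how the split was produced in this project; the function name, the seed, and the output layout are assumptions.

```python
import random
import shutil
from pathlib import Path


def fixed_split(src_dir, out_dir, n_val=50, n_test=50, seed=42):
    """Copy each class folder in src_dir into out_dir/{train,validation,test}.

    n_val and n_test images per class are drawn at random; the rest go to train.
    """
    rng = random.Random(seed)
    counts = {}
    for class_dir in sorted(p for p in Path(src_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        if len(files) <= n_val + n_test:
            raise ValueError(f"not enough images in {class_dir.name}")
        rng.shuffle(files)  # shuffle so the held-out sets are random
        parts = {
            "validation": files[:n_val],
            "test": files[n_val:n_val + n_test],
            "train": files[n_val + n_test:],
        }
        for split, members in parts.items():
            dest = Path(out_dir) / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, dest / f.name)
            counts[(split, class_dir.name)] = len(members)
    return counts
```

Fixing the seed makes the split reproducible, and copying rather than moving leaves the source folder untouched.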

I actually split twice because I did not start with a `validation` set, opting instead to use the `validation_split` parameter in Keras's `ImageDataGenerator` class for my first split.

However, this causes two serious data issues. First, it creates a non-random validation set because `validation_split=0.1` simply withholds a fixed, unshuffled 10% slice of the images in each class folder. Since my images are

As new images were collected, they were added to a `new` folder, kept apart from the original `train` image set. The two sets were then combined in a `full_combined` folder used for continued model training.
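The combining step can be sketched with the standard library as follows. This is an illustration under assumptions, not the project's actual code: `combine_folders` is a hypothetical helper, and prefixing a clashing filename with its source folder name is one possible way to avoid overwriting images.

```python
import shutil
from pathlib import Path


def combine_folders(source_dirs, combined_dir):
    """Copy class subfolders from each source dir into combined_dir.

    Returns the number of files copied.
    """
    copied = 0
    for source in map(Path, source_dirs):
        for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
            dest = Path(combined_dir) / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in sorted(class_dir.iterdir()):
                if not f.is_file():
                    continue
                target = dest / f.name
                if target.exists():
                    # Same filename in two sources: prefix with the source folder name
                    target = dest / f"{source.name}_{f.name}"
                shutil.copy2(f, target)
                copied += 1
    return copied
```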

The images are arranged in the following file structure:

```
└── input_images
    ├── full_combined
    ├── new
    ├── test
    ├── train
    └── validation
```
Each folder of images contains 2 subfolders to designate image classes, as shown below with one example:
```
└── input_images
    ├── full_combined
    │    ├──open_bike_lane
    │    └──vehicle_bike_lane
```
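A quick way to confirm that every folder follows this layout is to count the files in each class subfolder. The helper below is a sketch; `count_images` is a hypothetical name, and skipping dotfiles such as `.DS_Store` is an assumption about which files should not be counted as images.

```python
from pathlib import Path

CLASSES = ("open_bike_lane", "vehicle_bike_lane")


def count_images(root="input_images"):
    """Return {folder_name: {class_name: n_files}} for every folder under root."""
    summary = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        summary[split_dir.name] = {}
        for class_name in CLASSES:
            class_dir = split_dir / class_name
            if not class_dir.is_dir():
                raise FileNotFoundError(f"missing class folder: {class_dir}")
            # Count regular files only, ignoring hidden files such as .DS_Store
            summary[split_dir.name][class_name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and not f.name.startswith(".")
            )
    return summary
```

A missing class folder raises an error immediately, which surfaces layout mistakes before any model training starts.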

### Class distribution

In [None]:
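# Visualize class distribution in training data
from matplotlib.ticker import PercentFormatter

# Assumption: the image lists are the files in the class directories defined above
open_images = os.listdir(train_open_dir)
vehicle_images = os.listdir(train_vehicle_dir)
total_images = len(open_images) + len(vehicle_images)

fig, ax = plt.subplots(figsize=(8, 8))
ax.bar(x=['Open', 'Vehicle'],
       height=[len(open_images) / total_images,
               len(vehicle_images) / total_images])
ax.set_title('Class Distribution in Training Data', size=15)
ax.set_ylabel('Percentage of Dataset', size=13)
ax.set_xlabel('Image Class', size=13)
# Bar heights are fractions of 1, so format the tick labels as percentages
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
plt.show()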

### Image Samples

Below are samples of images from each class.