# IDEA League exercise - 100car Naturalistic data analysis

Contact: Alexander Rasch ([alexander.rasch@chalmers.se](mailto:alexander.rasch@chalmers.se)), Pierluigi Olleja ([pierluigi.olleja@chalmers.se](mailto:pierluigi.olleja@chalmers.se))

Institution: Chalmers University of Technology

Date: September 2022

Course: IDEA League summer school (Analysis and modelling road user behaviour)

_Note:_ This notebook was developed with the conda environment specified in _environment.yml_. You can run it in one of the following ways:
- Create the environment with conda (this requires e.g. [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/products/individual)): run `conda env create -f environment.yml`, then activate it with `conda activate idea-league-chalmers`
- Upload it to [Google Colab](https://colab.research.google.com/) (you will also need to upload the files _Event.mat_ and _util.py_ to the same location; please do _not_ share the data publicly!)
- Install all required packages in your local Python installation (see the _environment.yml_ file for detailed specifications)

In [5]:
# If you use Google Colab, you may need the following lines to upload the Event.mat and util.py files

# from google.colab import files
# uploaded = files.upload()

In [6]:
import numpy as np
from scipy.io import loadmat
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd

# Utilities for loading the data from MATLAB and calculating odds ratios
import util

np.set_printoptions(precision=2)

## Task 1 (MATLAB-only!)

This first task requires no code. Instead, use the SAFER100car software (including its documentation) and the 100car dictionaries.

Follow the steps below:
1. Start the SAFER100car GUI in MATLAB: execute the file _SAFER100car.m_
2. Load a 'crash' event
3. Plot the speed signal (and check its quality)
4. Read the narrative of the event
5. Explore and understand the video annotations (more info in the dictionaries)
6. Explore and understand the glance behavior (more info in the dictionaries)
7. Can you make a story? For example: What was the event's contributing factor? Was the driver distracted? Were there other vehicles involved in the event?
8. Repeat for some more events, including near-crashes.

## Task 2

This task is about understanding the relation between crash severity and distracting activities. 

Load the data from _Event.mat_ with the given function `util.get_data`. Once the data are loaded (which can take about half a minute!), the pandas DataFrame `data`, which contains the information on all events in the 100car dataset, is available.

In [7]:
data = util.get_data("Event.mat")

data.head()



Unnamed: 0,ID,Narratives,Sensor,Video,Glance
0,8296,Subject driver is adjusting the radio with her...,<Python.util.Sensor object at 0x0000025910E72670>,<Python.util.Video object at 0x00000259184F34C0>,<Python.util.Glances object at 0x00000259184F3...
1,8297,"Subject vehicle is traveling in the left lane,...",<Python.util.Sensor object at 0x00000259184F35E0>,<Python.util.Video object at 0x00000259184F3BB0>,<Python.util.Glances object at 0x00000259184F3...
2,8298,Subject driver begins to change lanes (from mi...,<Python.util.Sensor object at 0x00000259184F3CD0>,<Python.util.Video object at 0x00000259184F3C40>,<Python.util.Glances object at 0x00000259184F3...
3,8299,Subject driver is in the entrance/exit only la...,<Python.util.Sensor object at 0x00000259184F3580>,<Python.util.Video object at 0x00000259184F3C70>,<Python.util.Glances object at 0x00000259184F3...
4,8300,Subject vehicle pulls off of the road on the r...,<Python.util.Sensor object at 0x00000259184F3EE0>,<Python.util.Video object at 0x0000025918830430>,<Python.util.Glances object at 0x0000025918830...


Each row in the DataFrame `data` has five columns: the `ID` of the event as an integer, a description of the event (`Narratives`) as a string, a `Sensor` object containing sensor data, a `Video` object containing video annotations and settings, and a `Glances` object (in the column `Glance`) containing the glance data.

The cell below shows one way to list all the attributes of, for instance, a `Video` object (here for the first event, at index 0).

In [8]:
print(*list(data.iloc[0]["Video"].__dict__.keys()), sep=', ')

ID, vehicle_webid, start, end, severity, subject_ID, age, gender, nature, incident_type, pre_incident_maneuver, maneuver_judgment, precipitating_event, driver_reaction, post_maneuver_control, driver_behaviour_1, driver_behaviour_2, driver_behaviour_3, driver_impairments, infrastructure, distraction_1, distraction_1_start_sync, distraction_1_end_sync, distraction_1_outcome, distraction_2, distraction_2_start_sync, distraction_2_end_sync, distraction_2_ouctome, distraction_3, distraction_3_start_sync, distraction_3_end_sync, distraction_3_outcome, hands_on_wheel, vehicle_contributing_factors, visual_obstructions, surface_condition, traffic_flow, travel_lanes, traffic_density, relation_to_junction, alignment, locality, lighting, weather, driver_seatbelt_use, number_of_other_vehicles, fault, vehicle_2_location, vehicle_2_type, vehicle_2_maneuver, vehicle_2_driver_reaction, vehicle_3_location, vehicle_3_type, vehicle_3_maneuver, vehicle_3_driver_reaction, traffic_control


Each object in the `Glance` column holds a pandas DataFrame (`glances`) with all the annotated glances of the driver. See below for an example event:

In [9]:
data.iloc[0]["Glance"].glances

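|   | start | stop | duration | location        |
|---|-------|------|----------|-----------------|
| 0 | 754   | 767  | 14       | Cell phone      |
| 1 | 895   | 905  | 11       | forward         |
| 2 | 805   | 846  | 42       | forward         |
| 3 | 783   | 785  | 3        | forward         |
| 4 | 880   | 894  | 15       | right forward   |
| 5 | 720   | 753  | 34       | forward         |
| 6 | 778   | 782  | 5        | left window     |
| 7 | 786   | 804  | 19       | Interior object |
| 8 | 933   | 938  | 6        | center stack    |
| 9 | 958   | 1017 | 60       | Cell phone      |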


To access specific attributes of the `Video` objects, you could, for instance, use a for loop with the `iloc` indexer, or `apply` with a `lambda` function, as follows:

In [10]:
data["Video"].apply(lambda v: v.nature)

0                Conflict with a lead vehicle
1                Conflict with a lead vehicle
2      Conflict with vehicle in adjacent lane
3                     Single vehicle conflict
4      Conflict with vehicle in adjacent lane
                        ...                  
823                      Conflict with animal
824              Conflict with a lead vehicle
825              Conflict with a lead vehicle
826              Conflict with a lead vehicle
827         Conflict with a following vehicle
Name: Video, Length: 828, dtype: object
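
When several attributes are needed at once, it can be convenient to pull them into a flat DataFrame first. The following is a minimal sketch, not part of _util.py_: the helper name `extract_attributes` is hypothetical, and the `SimpleNamespace` objects are made-up stand-ins for the real `Video` objects (only the attribute names `severity`, `nature`, and `distraction_1` are taken from the list above).

```python
from types import SimpleNamespace

import pandas as pd

# Made-up stand-ins for the Video objects returned by util.get_data.
# "other (placeholder)" is NOT a real category of the dataset.
videos = pd.Series([
    SimpleNamespace(severity="Crash", nature="Single vehicle conflict",
                    distraction_1="Talking/listening on cell phone"),
    SimpleNamespace(severity="Near Crash", nature="Conflict with a lead vehicle",
                    distraction_1="other (placeholder)"),
    SimpleNamespace(severity="Near Crash", nature="Conflict with a lead vehicle",
                    distraction_1="Talking/listening on cell phone"),
])


def extract_attributes(video_column, names):
    """Return a DataFrame with one column per requested attribute name."""
    return pd.DataFrame(
        # n=name binds the current attribute name to each lambda
        {name: video_column.apply(lambda v, n=name: getattr(v, n)) for name in names},
        index=video_column.index,
    )


flat = extract_attributes(videos, ["severity", "nature", "distraction_1"])
print(flat)
```

With the real data, the call would presumably be `extract_attributes(data["Video"], [...])`; ordinary pandas filtering then works on the resulting columns.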

Count the "Crashes" with "Talking/listening on cell phone" as _first_ distraction. Look at the dictionary "ResearcherDictionaryVideoReductionDatav1_1" (you can find it in the _SAFER100car_v1.5.zip_ archive) to understand how to extract this information. You may, for instance, use a for loop to search the data or apply a lambda function to the relevant column.

In [11]:
Crashes_with_talking_on_cellphone = data[(data["Video"].apply(lambda v: v.distraction_1) == "Talking/listening on cell phone") & \
    (data["Video"].apply(lambda v: v.severity) == "Crash")].shape[0]
Crashes_with_talking_on_cellphone

6

Count "Crashes" without "Talking/listening on cell phone" as _first_ distraction.

In [12]:
Crashes_without_talking_on_cellphone = data[(data["Video"].apply(lambda v: v.distraction_1) != "Talking/listening on cell phone") & \
    (data["Video"].apply(lambda v: v.severity) == "Crash")].shape[0]
Crashes_without_talking_on_cellphone

62

Count "Near Crashes" with "Talking/listening on cell phone" as _first_ distraction.

In [13]:
NearCrashes_with_talking_on_cellphone = data[(data["Video"].apply(lambda v: v.distraction_1) == "Talking/listening on cell phone") & \
    (data["Video"].apply(lambda v: v.severity) == "Near Crash")].shape[0]
NearCrashes_with_talking_on_cellphone

38

Count "Near Crashes" without "Talking/listening on cell phone" as _first_ distraction.

In [14]:
NearCrashes_without_talking_on_cellphone = data[(data["Video"].apply(lambda v: v.distraction_1) != "Talking/listening on cell phone") & \
    (data["Video"].apply(lambda v: v.severity) == "Near Crash")].shape[0]
NearCrashes_without_talking_on_cellphone

722
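
The four counts above can also be obtained in a single step with `pd.crosstab`. The sketch below uses a handful of made-up rows rather than the real events; the column values other than "Crash", "Near Crash", and "Talking/listening on cell phone" are placeholders, and the row labels are chosen to match the contingency table of this task.

```python
import pandas as pd

# Made-up rows for illustration only. In the notebook, these two columns
# would come from data["Video"].apply(lambda v: v.severity) and
# data["Video"].apply(lambda v: v.distraction_1).
events = pd.DataFrame({
    "severity": ["Crash", "Near Crash", "Near Crash", "Crash", "Near Crash"],
    "distraction_1": [
        "Talking/listening on cell phone",
        "Talking/listening on cell phone",
        "other (placeholder)",
        "other (placeholder)",
        "other (placeholder)",
    ],
})

exposed = "Talking/listening on cell phone"
unexposed = "Not Talking/listening on cell phone"

# Label every event as exposed or unexposed
exposure = (events["distraction_1"] == exposed).map({True: exposed, False: unexposed})

# Cross-tabulate exposure against severity, then fix the row and column order
counts = pd.crosstab(exposure, events["severity"]).reindex(
    index=[exposed, unexposed], columns=["Crash", "Near Crash"], fill_value=0
)
print(counts)
```

The `reindex` call fixes the row and column order (and fills empty cells with 0), which matters because the odds ratio depends on which row comes first.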

Fill in the contingency table. You should organize the table as follows:

|                                     | Crashes | Near-crashes |
|-------------------------------------|---------|--------------|
| Talking/listening on cell phone     | x       | x            |
| Not Talking/listening on cell phone | x       | x            |

In [15]:
contingency_table = pd.DataFrame(
    [[Crashes_with_talking_on_cellphone, NearCrashes_with_talking_on_cellphone],
     [Crashes_without_talking_on_cellphone, NearCrashes_without_talking_on_cellphone]],
    index=["Talking/listening on cell phone", "Not Talking/listening on cell phone"],
    columns=["Crashes", "Near-crashes"],
)

contingency_table

|                                     | Crashes | Near-crashes |
|-------------------------------------|---------|--------------|
| Talking/listening on cell phone     | 6       | 38           |
| Not Talking/listening on cell phone | 62      | 722          |


Calculate the odds ratio (OR) and its confidence interval (CI) (you may want to use the given function `util.get_odds_ratio_ci`).

In [16]:
OR, CI95 = util.get_odds_ratio_ci(contingency_table)

print(f"Odds ratio: {OR:.2f}, 95% confidence interval: {CI95}")

Odds ratio: 1.84, 95% confidence interval: [0.75 4.52]
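
To see where such numbers can come from, the odds ratio of a 2×2 table and a Wald-type confidence interval on the log-odds scale can be computed by hand. This is only a sketch: whether `util.get_odds_ratio_ci` uses exactly this method is an assumption, and the function name `odds_ratio_wald_ci` is hypothetical. For the table above, it gives values that round to the ones printed.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table [[a, b], [c, d]].

    a, b: crashes / near-crashes with the exposure (first row)
    c, d: crashes / near-crashes without the exposure (second row)
    z:    standard normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)


odds_ratio, (lower, upper) = odds_ratio_wald_ci(6, 38, 62, 722)
print(f"Odds ratio: {odds_ratio:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

Note that the formula breaks down if any cell count is zero; handling that case (for instance with a continuity correction) is left out of this sketch.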


Interpret the results and discuss with your colleague. No code needed for this.

## Task 3

Task 3 is similar to what you did in Task 2. However, here you decide what odds ratio to calculate. It can be another type of distracting activity or another attribute of the event. Then, interpret the results and discuss with your colleague.

In [17]:
# ...

## Task 4

This task is about understanding the relation between crash severity and glance behaviour. The code for solving this task is a bit more involved, because we need to inspect the glance time series for each event separately. Feel free to modify it to your needs.

In [18]:
Crashes_with_eyes_on_road = 0
NearCrashes_with_eyes_on_road = 0

Crashes_with_eyes_off_road = 0
NearCrashes_with_eyes_off_road = 0

for idx_data in range(data.shape[0]):
    
    # Extract the variable that defines the beginning of the `precipitating event`.
    # Look at the dictionary "ResearcherDictionaryVideoReductionDatav1_1"
    # to understand how to extract and interpret this information. 
    
    precipitating_event_start = data.iloc[idx_data]["Video"].start

    # For each event, loop through the glances time-series...
    for idx_glance in range(data.iloc[idx_data]["Glance"].glances.shape[0]):

        # Identify the glance that includes the beginning of the
        # precipitating event. You may want to use the variables `start`
        # and `stop` of each glance.
        
        glance = data.iloc[idx_data]["Glance"].glances.iloc[idx_glance]
        
        if glance["start"] <= precipitating_event_start and precipitating_event_start <= glance["stop"]:
            
            # Check if the glance is valid. That is, exclude when the video
            # was not available

            if glance.location != "No Video":

                # Check if the glance was on road (consult the glance 
                # dictionary to decide about the appropriate glance 
                # location variable name)
                if glance.location == "forward":
                    # Count Crashes and Near-crashes with eyes on-road at precipitating event
                    if data.iloc[idx_data]["Video"].severity == "Crash":
                        Crashes_with_eyes_on_road = Crashes_with_eyes_on_road + 1
                    else:
                        NearCrashes_with_eyes_on_road = NearCrashes_with_eyes_on_road + 1
                else: # The glance was not on road
                    # Count Crashes and Near-crashes with eyes off-road at precipitating event
                    if data.iloc[idx_data]["Video"].severity == "Crash":
                        Crashes_with_eyes_off_road = Crashes_with_eyes_off_road + 1
                    else:
                        NearCrashes_with_eyes_off_road = NearCrashes_with_eyes_off_road + 1
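
The inner loop can also be replaced by a boolean mask on the `glances` DataFrame. The sketch below is one possible alternative, not the notebook's method: the helper name `glance_location_at` is hypothetical, the rows are the first five glances of the example event shown earlier, and the frame values passed in are arbitrary.

```python
import pandas as pd

# First five glances of the example event shown earlier in the notebook
glances = pd.DataFrame({
    "start": [754, 895, 805, 783, 880],
    "stop": [767, 905, 846, 785, 894],
    "location": ["Cell phone", "forward", "forward", "forward", "right forward"],
})


def glance_location_at(glances, frame):
    """Return the location of the glance covering `frame`, or None if no glance does."""
    covering = glances[(glances["start"] <= frame) & (frame <= glances["stop"])]
    if covering.empty:
        return None
    # If several glances cover the frame, only the first one is used
    return covering.iloc[0]["location"]


print(glance_location_at(glances, 760))  # frame inside the first "Cell phone" glance
print(glance_location_at(glances, 700))  # frame before any glance listed here
```

Returning `None` for uncovered frames makes it explicit that such events are skipped, just as they are silently skipped by the loop above.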


Fill in the contingency table. You should organize the table as follows:

|               | Crashes | Near-crashes |
|---------------|---------|--------------|
| Eyes on-road  | x       | x            |
| Eyes off-road | x       | x            |

In [19]:
contingency_table = pd.DataFrame(
    [[Crashes_with_eyes_on_road, NearCrashes_with_eyes_on_road],
     [Crashes_with_eyes_off_road, NearCrashes_with_eyes_off_road]],
    index=["Eyes on-road", "Eyes off-road"],
    columns=["Crashes", "Near-crashes"],
)

contingency_table

|               | Crashes | Near-crashes |
|---------------|---------|--------------|
| Eyes on-road  | 27      | 393          |
| Eyes off-road | 22      | 156          |


Calculate the odds ratio (OR) and its confidence interval (CI) (you may want to use the given function `util.get_odds_ratio_ci`).

In [20]:
OR, CI95 = util.get_odds_ratio_ci(contingency_table)

print(f"Odds ratio: {OR:.2f}, 95% confidence interval: {CI95}")

Odds ratio: 0.49, 95% confidence interval: [0.27 0.88]
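
When interpreting an odds ratio below 1, keep in mind that it depends on the row order: here "Eyes on-road" is the first row. Swapping the rows yields the reciprocal odds ratio for "Eyes off-road". The sketch below illustrates this with a Wald-type interval on the log-odds scale; that `util.get_odds_ratio_ci` works the same way is an assumption, and the function name `odds_ratio_wald_ci` is hypothetical.

```python
import math


def odds_ratio_wald_ci(table, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    log_or = math.log((a * d) / (b * c))
    half_width = z * math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), (math.exp(log_or - half_width), math.exp(log_or + half_width))


# Counts from the contingency table above: [crashes, near-crashes] per row
on_road_first = [[27, 393], [22, 156]]
# Same counts with the two rows swapped
off_road_first = [on_road_first[1], on_road_first[0]]

or_on, ci_on = odds_ratio_wald_ci(on_road_first)
or_off, ci_off = odds_ratio_wald_ci(off_road_first)

print(f"Eyes on-road vs. off-road: {or_on:.2f}, 95% CI: [{ci_on[0]:.2f}, {ci_on[1]:.2f}]")
print(f"Eyes off-road vs. on-road: {or_off:.2f}, 95% CI: [{ci_off[0]:.2f}, {ci_off[1]:.2f}]")
```

Both versions carry the same information; the interval bounds are also reciprocals of each other, with lower and upper bound exchanged.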


Interpret the results and discuss with your colleague. No code needed for this.