# IDEA League exercise - 100car Naturalistic data analysis

Contact: Alexander Rasch ([alexander.rasch@chalmers.se](mailto:alexander.rasch@chalmers.se)), Pierluigi Olleja ([pierluigi.olleja@chalmers.se](mailto:pierluigi.olleja@chalmers.se))

Institution: Chalmers University of Technology

Date: September 2022

Course: IDEA League summer school (Analysis and modelling road user behaviour)

_Note:_ This notebook was developed with the conda environment specified in _environment.yml_. To run it, use one of the following options:
- Install the environment from conda (you need to have e.g. [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/products/individual) installed): `conda env create -f environment.yml` and activate it via `conda activate idea-league-chalmers`
- Upload it to [Google Colab](https://colab.research.google.com/). You will also need to upload the files _Event.mat_ and _util.py_ to the same location. Please do _not_ share the data publicly!
- Have all the packages installed in your local Python installation (see the _environment.yml_ file for detailed specifications)

In [1]:
# If you use Google Colab, you may need the following lines to upload the Event.mat and util.py files

# from google.colab import files
# uploaded = files.upload()

In [2]:
import numpy as np
from scipy.io import loadmat
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd

# Utilities for loading the data from MATLAB and calculating odds ratios
import util

np.set_printoptions(precision=2)

## Task 1 (MATLAB-only!)

For this first task you do not need to write any code. Instead, use the SAFER100car software (including its documentation) and the 100car dictionaries.

Follow the steps below:
1. Start the SAFER100car GUI in MATLAB: execute the file _SAFER100car.m_
2. Load a 'crash' event
3. Plot the speed signal (and check its quality)
4. Read the narrative of the event
5. Explore and understand the video annotations (more info in the dictionaries)
6. Explore and understand the glance behavior (more info in the dictionaries)
7. Can you piece together a story of the event? For example: What was the event's contributing factor? Was the driver distracted? Were other vehicles involved in the event?
8. Repeat for some more events, including near-crashes.

## Task 2

This task is about understanding the relation between crash severity and distracting activities. 

Load the data from _Event.mat_ with the given function `util.get_data`.

In [None]:
print(util.get_data.__doc__)

Once the data are loaded (which can take about half a minute!), the pandas DataFrame `data`, which holds the information on all events in the 100car dataset, is available.

In [None]:
data = util.get_data("Event.mat")

data.head()

Each row in the DataFrame `data` has five columns: the `ID` of the event (an integer), a description of the event (`Narratives`, a string), a `Sensor` object containing sensor data, a `Video` object containing video annotations and settings, and a `Glance` object containing the glance data.

See below a way to show all the attributes of, for instance, a `Video` object (for the first event at index `0`).

In [None]:
print(*list(data.iloc[0]["Video"].__dict__.keys()), sep=', ')

The `Glance` object contains a pandas DataFrame (`glances`) which contains all the annotated glances of the driver. See below for an example event:

In [None]:
data.iloc[0]["Glance"].glances

To access a particular attribute of the `Video` objects across all events, you can loop over the rows with `iloc`, or call `apply` on the column with a `lambda` function, as follows:

In [None]:
data["Video"].apply(lambda v: v.nature)
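
The two access patterns mentioned above can be compared on a small toy example. This is only a sketch: the class `ToyVideo` and its attribute values are made up for illustration and are not taken from the 100car data or from `util.py`.

```python
import pandas as pd


class ToyVideo:
    """Hypothetical stand-in for the Video objects stored in `data`."""

    def __init__(self, nature, severity):
        self.nature = nature
        self.severity = severity


# Three made-up events (values are placeholders, not dictionary entries)
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Video": [
        ToyVideo("type A", "Crash"),
        ToyVideo("type B", "Near Crash"),
        ToyVideo("type A", "Near Crash"),
    ],
})

# Option 1: apply a lambda function to every object in the column
nature_apply = toy["Video"].apply(lambda v: v.nature)

# Option 2: explicit loop over the rows with positional indexing (iloc)
nature_loop = [toy.iloc[i]["Video"].nature for i in range(toy.shape[0])]

# Both options return the same values in the same order
same = nature_apply.tolist() == nature_loop
print(same)
```

The `apply` version returns a pandas Series, so it can be compared directly against a string to build a boolean mask for filtering rows.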

Count "Crashes" with "Talking/listening on cell phone" as _first_ distraction. Look at the dictionary "ResearcherDictionaryVideoReductionDatav1_1" (you can find it in the _SAFER100car_v1.5.zip_ archive) to understand how to extract this information. You may, for instance, use a for-loop to search the data or apply a lambda function to the relevant column.

In [None]:
Crashes_with_talking_on_cellphone = data[(data["Video"].apply(lambda v: v.distraction_1) == "Talking/listening on cell phone") & \
    (data["Video"].apply(lambda v: v.severity) == "Crash")].shape[0]
Crashes_with_talking_on_cellphone

Count "Crashes" without "Talking/listening on cell phone" as _first_ distraction.

In [None]:
Crashes_without_talking_on_cellphone = []

Count "Near Crashes" with "Talking/listening on cell phone" as _first_ distraction.

In [None]:
NearCrashes_with_talking_on_cellphone = []

Count "Near Crashes" without "Talking/listening on cell phone" as _first_ distraction.

In [None]:
NearCrashes_without_talking_on_cellphone = []

Fill in the contingency table. You should organize the table as follows:

|                                     | Crashes | Near-crashes |
|-------------------------------------|---------|--------------|
| Talking/listening on cell phone     | x       | x            |
| Not Talking/listening on cell phone | x       | x            |

In [None]:
contingency_table = pd.DataFrame(
    [[Crashes_with_talking_on_cellphone, NearCrashes_with_talking_on_cellphone],
     [Crashes_without_talking_on_cellphone, NearCrashes_without_talking_on_cellphone]],
    index=["Talking/listening on cell phone", "Not Talking/listening on cell phone"],
    columns=["Crashes", "Near-crashes"],
)

contingency_table
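
As a cross-check of the four manual counts, a table with the same layout can also be built in one step with `pd.crosstab`. The sketch below uses six made-up events. The column names `distraction_1` and `severity` mirror the `Video` attributes used above, but the toy DataFrame and the label "Near Crash" are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Six made-up events (not taken from the 100car data)
toy = pd.DataFrame({
    "distraction_1": ["Talking/listening on cell phone", "None", "None",
                      "Talking/listening on cell phone", "None", "None"],
    "severity": ["Crash", "Crash", "Near Crash",
                 "Near Crash", "Near Crash", "Near Crash"],
})

row_labels = ["Talking/listening on cell phone", "Not Talking/listening on cell phone"]
col_labels = ["Crashes", "Near-crashes"]

# Translate the two conditions into the row and column labels of the table
on_phone = toy["distraction_1"] == "Talking/listening on cell phone"
is_crash = toy["severity"] == "Crash"
exposure = pd.Series(np.where(on_phone, row_labels[0], row_labels[1]), name="exposure")
outcome = pd.Series(np.where(is_crash, col_labels[0], col_labels[1]), name="outcome")

# Cross-tabulate, then fix the row/column order (missing cells become 0)
toy_table = pd.crosstab(exposure, outcome).reindex(
    index=row_labels, columns=col_labels, fill_value=0
)
print(toy_table)
```

If the cross-tabulated counts differ from the manual ones, the most likely cause is a mismatch in the exact label strings, so check them against the dictionary.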

Calculate odds ratios (ORs) and confidence intervals (CIs). You may want to use the given function `util.get_odds_ratio_ci`.

In [None]:
print(util.get_odds_ratio_ci.__doc__)
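
For reference, the odds ratio of a 2x2 table and a common approximate confidence interval (the Wald or Woolf interval on the log scale) can be computed by hand. The sketch below is a generic textbook version and is not necessarily what `util.get_odds_ratio_ci` does internally. The function name `odds_ratio_wald_ci` and the counts are made up for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Layout (rows: exposure, columns: outcome):
        a = exposed crashes,   b = exposed near-crashes
        c = unexposed crashes, d = unexposed near-crashes
    z = 1.96 corresponds to an approximate 95% interval.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)


# Made-up counts, only to show the call
toy_or, toy_ci = odds_ratio_wald_ci(10, 20, 30, 120)
print(f"Odds ratio: {toy_or:.2f}, 95% CI: ({toy_ci[0]:.2f}, {toy_ci[1]:.2f})")
```

A confidence interval that includes 1 means the data are compatible with no association between exposure and outcome at that confidence level.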

In [None]:
OR, CI95 = []

print(f"Odds ratio: {OR:.2f}, 95% confidence interval: {CI95}")

Interpret the results and discuss with your colleague. No code needed for this.

## Task 3

Task 3 is similar to what you did in Task 2. However, here you decide what odds ratio to calculate. It can be another type of distracting activity or another attribute of the event. Then, interpret the results and discuss with your colleague.

In [None]:
# ...

## Task 4

This task is about understanding the relation between crash severity and glance behaviour. The code for solving this task is a bit more involved, because we need to inspect the glance time series for each event separately. Feel free to modify it to your needs.

In [None]:
Crashes_with_eyes_on_road = 0
NearCrashes_with_eyes_on_road = 0

Crashes_with_eyes_off_road = 0
NearCrashes_with_eyes_off_road = 0

for idx_data in range(data.shape[0]):
    
    # Extract the variable that defines the beginning of the `precipitating event`.
    # Look at the dictionary "ResearcherDictionaryVideoReductionDatav1_1"
    # to understand how to extract and interpret this information. 
    
    precipitating_event_start = []

    # For each event, loop through the glances time-series...
    for idx_glance in range(data.iloc[idx_data]["Glance"].glances.shape[0]):

        # Identify the glance that includes the beginning of the
        # precipitating event. You may want to use the variables `start`
        # and `stop` of each glance.
        
        glance = data.iloc[idx_data]["Glance"].glances.iloc[idx_glance]
        
        if []:
            
            # Check if the glance is valid. That is, exclude glances for
            # which the video was not available

            if []:

                # Check if the glance was on road (consult the glance 
                # dictionary to decide about the appropriate glance 
                # location variable name)
                if []:
                    # Count Crashes and Near-crashes with eyes on-road at precipitating event
                    if []:
                        Crashes_with_eyes_on_road = Crashes_with_eyes_on_road + 1
                    else:
                        NearCrashes_with_eyes_on_road = NearCrashes_with_eyes_on_road + 1
                else: # The glance was not on road
                    # Count Crashes and Near-crashes with eyes off-road at precipitating event
                    if []:
                        Crashes_with_eyes_off_road = Crashes_with_eyes_off_road + 1
                    else:
                        NearCrashes_with_eyes_off_road = NearCrashes_with_eyes_off_road + 1
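
The central step of the loop above, finding the glance whose interval contains a given time point, can be tried out on a toy glance table first. In this sketch, `start` and `stop` follow the variable names mentioned in the comments above. The column name `location`, its values, the time unit, and `t_event` are hypothetical, so consult the glance dictionary for the actual names and codes.

```python
import pandas as pd

# Made-up glance sequence; each row is one glance
toy_glances = pd.DataFrame({
    "start": [0.0, 1.2, 2.0, 3.5],
    "stop": [1.2, 2.0, 3.5, 5.0],
    "location": ["Forward", "Left mirror", "Forward", "Cell phone"],  # hypothetical
})

t_event = 2.4  # hypothetical start of the precipitating event

# Half-open interval [start, stop): a time point on the boundary between
# two consecutive glances is assigned to the later glance only
mask = (toy_glances["start"] <= t_event) & (t_event < toy_glances["stop"])
containing = toy_glances[mask]

print(containing)
```

Whether the boundary should be treated as half-open or closed is a design choice. With closed intervals, a time point that falls exactly on a glance boundary would be counted twice.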


Fill in the contingency table. You should organize the table as follows:

|               | Crashes | Near-crashes |
|---------------|---------|--------------|
| Eyes on-road  | x       | x            |
| Eyes off-road | x       | x            |

In [None]:
contingency_table = pd.DataFrame(
    [[Crashes_with_eyes_on_road, NearCrashes_with_eyes_on_road],
     [Crashes_with_eyes_off_road, NearCrashes_with_eyes_off_road]],
    index=["Eyes on-road", "Eyes off-road"],
    columns=["Crashes", "Near-crashes"],
)

contingency_table

Calculate odds ratios (ORs) and confidence intervals (CIs). You may want to use the given function `util.get_odds_ratio_ci`.

In [None]:
OR, CI95 = []

print(f"Odds ratio: {OR:.2f}, 95% confidence interval: {CI95}")

Interpret the results and discuss with your colleague. No code needed for this.