# Project Instructions

Your project is to create a module named `sakaydb` for managing ride-hailing data. This is to be done by LT. Your code should be committed to a private repo in GitHub with repo name `sakaydb`; then submit this notebook via the `Assignments` tab of `nbgrader`. The project is due on September 7, 2022, 11:59PM. Only one member of the LT should submit; there will be a penalty for submissions from multiple members of an LT. The module should be at the top-level directory of this repo. Grant read access to this repo to Damian Dailisan and Michael Dorosan (GH accounts `temetski` and `mikedataCrunch`). **Do not write your code in this notebook nor submit it along with this notebook**. Just specify your repo URL in the cell below.

Only the following packages may be used for the implementation:

* Python standard libraries
* Numpy (but not scipy)
* Pandas
* Matplotlib


## The `SakayDB` class

The module should contain one class named `SakayDB`. It should have the following specifications:

### Initialization

The class initializer should accept a string `data_dir` which is the directory path to where the data files are located. This path should be stored in the `data_dir` attribute of the object.
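A minimal sketch of the initializer described above. The `_path` helper is a hypothetical convenience for building file paths and is not part of the required interface.

```python
import os


class SakayDB:
    """Skeleton showing only the initializer described above."""

    def __init__(self, data_dir):
        # Keep the path exactly as passed in.
        self.data_dir = data_dir

    def _path(self, filename):
        # Hypothetical helper: full path of a data file inside data_dir.
        return os.path.join(self.data_dir, filename)


db = SakayDB('.')
print(db.data_dir)
print(db._path('trips.csv'))
```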

### Data persistence

Data are stored in the CSV files `trips.csv`, `drivers.csv`, and `locations.csv` in `data_dir`. The CSV files begin with a column header and the columns of each CSV file are:

`trips.csv`:
  - `trip_id`: an integer assigned to the trip
  - `driver_id`: an integer assigned to the driver
  - `pickup_datetime`: datetime of pickup as string with format "hh:mm:ss,DD-MM-YYYY"
  - `dropoff_datetime`: datetime of dropoff as string with format "hh:mm:ss,DD-MM-YYYY"
  - `passenger_count`: number of passengers as integer
  - `pickup_loc_id`: an integer assigned to the pickup location
  - `dropoff_loc_id`: an integer assigned to the dropoff location
  - `trip_distance`: distance in meters as float
  - `fare_amount`: total fare amount as float

`drivers.csv`:
  - `driver_id`: an integer assigned to the driver
  - `last_name`: last name of driver
  - `given_name`: given name of driver
  
`locations.csv`:
  - `location_id`: an integer assigned to the location
  - `loc_name`: zone location name
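The datetime format "hh:mm:ss,DD-MM-YYYY" used in `trips.csv` corresponds to the `strptime` pattern `%H:%M:%S,%d-%m-%Y`. The sketch below shows one way to convert between the string form and `datetime` objects; the helper names are illustrative and not part of the required interface.

```python
from datetime import datetime

# Pattern matching "hh:mm:ss,DD-MM-YYYY" as described above.
DATETIME_FORMAT = '%H:%M:%S,%d-%m-%Y'


def parse_trip_datetime(text):
    """Convert a trip datetime string into a datetime object."""
    return datetime.strptime(text, DATETIME_FORMAT)


def format_trip_datetime(value):
    """Convert a datetime object back into the CSV string format."""
    return value.strftime(DATETIME_FORMAT)


parsed = parse_trip_datetime('08:13:00,15-05-2022')
print(parsed.isoformat())            # 2022-05-15T08:13:00
print(format_trip_datetime(parsed))  # 08:13:00,15-05-2022
```

`strptime` raises a plain `ValueError` on malformed input, so a caller that must raise `SakayDBError` would need to catch and re-raise it.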

### Exception

The class has the associated exception `SakayDBError` which is a `ValueError`.
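Since `SakayDBError` "is a" `ValueError`, subclassing is the natural way to express it. A minimal sketch:

```python
class SakayDBError(ValueError):
    """Raised when a SakayDB operation receives invalid input."""


try:
    raise SakayDBError('trip is already in the database')
except ValueError as err:
    # The subclass is caught by handlers written for ValueError.
    print(type(err).__name__, '-', err)
```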


### Features

#### Adding a single trip to the database

Create a method `add_trip` that accepts the following parameters:

  - `driver`: trip driver as a string in `Last name, Given name` format
  - `pickup_datetime`: datetime of pickup as string with format "hh:mm:ss,DD-MM-YYYY"
  - `dropoff_datetime`: datetime of dropoff as string with format "hh:mm:ss,DD-MM-YYYY"
  - `passenger_count`: number of passengers as integer
  - `pickup_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `dropoff_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `trip_distance`: distance in meters (float)
  - `fare_amount`: amount paid by passenger (float)
  
The method should append the trip data to the end of `trips.csv` if it exists, or create the file otherwise. 
  
The `trip_id` is _last_ `trip_id` _in the file_ + 1, or `1` if there's no trip in the file yet. 
  
The `driver` should be mapped to its corresponding `driver_id` in `drivers.csv` by a case-insensitive match of `last_name` and `given_name` against the `driver` argument.

It should append the driver in `drivers.csv` if the driver is not yet there. 

The `driver_id` of a new driver is _last `driver_id` in the file_ + 1, or `1` if there's no driver in the file yet.
The method should return the `trip_id` or raise a `SakayDBError` exception if the trip is already in `trips.csv`.

The same rules apply to `pickup_loc_name` and `dropoff_loc_name` and their corresponding `location_id`s in `locations.csv`.

A trip is said to be in `trips.csv` if there is a trip that matches the `driver` (case-insensitive), `pickup_datetime`, `dropoff_datetime`, `passenger_count`, `pickup_loc_name`, `dropoff_loc_name`, `trip_distance` and `fare_amount`.
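The ID rules above ("last ID in the file + 1, or `1` if empty") and the case-insensitive driver match can be sketched on in-memory data frames as follows. The helper names `next_id` and `find_driver_id` are hypothetical, and the sketch assumes the `driver` string contains exactly one comma. Stripping surrounding whitespace is an assumption suggested by the visible tests rather than stated in the text.

```python
import pandas as pd


def next_id(df, id_column):
    """Return the last id in the table + 1, or 1 if the table is empty."""
    if df.empty:
        return 1
    return int(df[id_column].iloc[-1]) + 1


def find_driver_id(df_drivers, driver):
    """Return the driver_id matching 'Last name, Given name', or None."""
    # Assumes exactly one comma separating last name and given name.
    last_name, given_name = (part.strip().lower()
                             for part in driver.split(','))
    match = df_drivers[
        (df_drivers['last_name'].str.lower() == last_name)
        & (df_drivers['given_name'].str.lower() == given_name)
    ]
    return None if match.empty else int(match['driver_id'].iloc[0])


drivers = pd.DataFrame({
    'driver_id': [1, 2],
    'last_name': ['Dailisan', 'Dorosan'],
    'given_name': ['Damian', 'Michael'],
})
print(find_driver_id(drivers, ' dailisan, DAMIAN '))  # 1
print(find_driver_id(drivers, 'Alis, Christian'))     # None
print(next_id(drivers, 'driver_id'))                  # 3
```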

#### Adding trips in the database

Create a method `add_trips` that accepts a list of trips in the form of dictionaries with the following keys:
  - `driver`: trip driver as a string in `Last name, Given name` format
  - `pickup_datetime`: datetime of pickup as string with format "hh:mm:ss,DD-MM-YYYY"
  - `dropoff_datetime`: datetime of dropoff as string with format "hh:mm:ss,DD-MM-YYYY"
  - `passenger_count`: number of passengers as integer
  - `pickup_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `dropoff_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `trip_distance`: distance in meters (float)
  - `fare_amount`: amount paid by passenger (float)
  
The method should add each trip to the database. 
It returns a list of the `trip_id`s of successfully added trips. If a trip is already in the database, skip it and print: `Warning: trip index {i} is already in the database. Skipping...`

If a trip has invalid or incomplete information, skip it and print `Warning: trip index {i} has invalid or incomplete information. Skipping...` instead. The *trip index* is the zero-based index of the trip in the passed list of trips.
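The skip-and-warn loop can be sketched as below. To keep the example self-contained, `add_trip` is passed in as a callable and a toy in-memory stand-in replaces the CSV-backed method. Both are illustrative assumptions, and "invalid or incomplete" is reduced here to "a required key is missing"; the toy stand-in also compares trips exactly instead of case-insensitively.

```python
REQUIRED_KEYS = ('driver', 'pickup_datetime', 'dropoff_datetime',
                 'passenger_count', 'pickup_loc_name', 'dropoff_loc_name',
                 'trip_distance', 'fare_amount')


class SakayDBError(ValueError):
    """Stand-in for the module's exception."""


def add_trips(trips, add_trip):
    """Add each trip via add_trip; return the ids of the added trips."""
    trip_ids = []
    for i, trip in enumerate(trips):
        if any(key not in trip for key in REQUIRED_KEYS):
            print(f'Warning: trip index {i} has invalid or incomplete '
                  'information. Skipping...')
            continue
        try:
            trip_ids.append(add_trip(**{k: trip[k] for k in REQUIRED_KEYS}))
        except SakayDBError:
            print(f'Warning: trip index {i} is already in the database. '
                  'Skipping...')
    return trip_ids


def make_in_memory_add_trip():
    """Toy add_trip that stores trips in a list instead of CSV files."""
    stored = []

    def add_trip(**trip):
        if trip in stored:
            raise SakayDBError('trip is already in the database')
        stored.append(trip)
        return len(stored)

    return add_trip


base = {'driver': 'Dailisan, Damian',
        'pickup_datetime': '08:13:00,15-05-2022',
        'dropoff_datetime': '08:46:00,15-05-2022',
        'passenger_count': 2,
        'pickup_loc_name': 'UP Campus',
        'dropoff_loc_name': 'Legazpi Village',
        'trip_distance': 17.6,
        'fare_amount': 412}
incomplete = {k: v for k, v in base.items() if k != 'passenger_count'}

# Index 1 is incomplete and index 2 duplicates index 0, so only id 1 is added.
print(add_trips([base, incomplete, base], make_in_memory_add_trip()))
```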
  
#### Deleting a trip in the database

Create a method `delete_trip` that accepts the `trip_id` to delete then removes it from `trips.csv`.
It will raise a `SakayDBError` if the `trip_id` is not found.
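A minimal sketch of the deletion logic on an in-memory data frame, assuming the rows of `trips.csv` have already been loaded. The function name `drop_trip` is hypothetical; a full method would also need to write the result back to `trips.csv` and, judging by the visible tests, raise the error when the file does not exist yet.

```python
import pandas as pd


class SakayDBError(ValueError):
    """Stand-in for the module's exception."""


def drop_trip(df_trips, trip_id):
    """Return df_trips without the given trip_id, or raise if it is absent."""
    mask = df_trips['trip_id'] == trip_id
    if not mask.any():
        raise SakayDBError(f'trip_id {trip_id} not found')
    return df_trips[~mask].reset_index(drop=True)


trips = pd.DataFrame({'trip_id': [1, 2, 3],
                      'fare_amount': [412.0, 371.0, 235.0]})
print(drop_trip(trips, 2)['trip_id'].tolist())  # [1, 3]
```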
  
#### Searching for trips in the database

Create a method `search_trips` that accepts the following keyword arguments:
  - `key`: string, any of the following: `driver_id`, `pickup_datetime`, `dropoff_datetime`, `passenger_count`, `trip_distance`, `fare_amount`. 
For each of the valid keyword arguments, the values passed may be of the following types:
  - `exact` : for single value search. Some *value* with data type and format conforming to that of `key`
  - `range` : for range search
      - Case 1: tuple like (*value*, `None`) sorts by `key` (chronological or ascending) and returns all entries from *value* onward, begin inclusive
      - Case 2: tuple like (`None`, *value*) sorts by `key` (chronological or ascending) and returns all entries up to *value*, end inclusive
      - Case 3: tuple like (*value1*, *value2*) sorts by `key` and returns values between *value1* and *value2*, end inclusive.

This method should raise a `SakayDBError` in any of the following cases:
* Invalid keyword `key`, i.e., not in the keys listed above
* Invalid values for `range`, i.e., tuples with sizes greater than 2, or either *value1* or *value2* is not parsable as a datetime object.
<!-- * When either *value1* or *value2* is beyond the datetime range available in the database in any of Cases 1 to 3 for `range` -->
<!-- * Invalid input to `exact` i.e. data type or format not algined with values in `key`
 -->
 
The method should return a `pd.DataFrame` of all the entries aligned with search key and values.
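The three `range` cases reduce to an optional lower bound and an optional upper bound. The sketch below applies them to a numeric column of an in-memory data frame. The helper name `filter_range` is hypothetical, both bounds are treated as inclusive here (an assumption consistent with the visible tests), and datetime keys would first need to be parsed so that comparisons are chronological rather than lexical.

```python
import pandas as pd


def filter_range(df, key, bounds):
    """Return rows whose `key` lies within bounds, sorted by `key`."""
    low, high = bounds
    mask = pd.Series(True, index=df.index)
    if low is not None:
        mask &= df[key] >= low   # Case 1 and Case 3: lower bound
    if high is not None:
        mask &= df[key] <= high  # Case 2 and Case 3: upper bound
    return df[mask].sort_values(key)


trips = pd.DataFrame({'trip_id': [1, 2, 3, 4],
                      'fare_amount': [412, 371, 235, 716]})
print(filter_range(trips, 'fare_amount', (None, 300))['trip_id'].tolist())
print(filter_range(trips, 'fare_amount', (200, 500))['trip_id'].tolist())
print(filter_range(trips, 'fare_amount', (400, None))['trip_id'].tolist())
```

With this toy data the three calls select trip `[3]`, trips `[3, 2, 1]` (ascending fare), and trips `[1, 4]` respectively.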

#### Exporting data

Create a method `export_data` that returns all of the trips in the database as a pandas data frame with the following columns:
  - `driver_lastname`: trip driver last name as string, capitalize first letter of each word in lastname
  - `driver_givenname`: trip driver given name as string, capitalize first letter of each word in given name
  - `pickup_datetime`: datetime of pickup as string with format "hh:mm:ss,DD-MM-YYYY"
  - `dropoff_datetime`: datetime of dropoff as string with format "hh:mm:ss,DD-MM-YYYY"
  - `passenger_count`: number of passengers as integer
  - `pickup_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `dropoff_loc_name`: zone as a string, (e.g., Pine View, Legazpi Village)
  - `trip_distance`: distance in meters (float)
  - `fare_amount`: amount paid by passenger (float)
  
Sort the rows by the corresponding `trip_id` of each trip.
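One way to assemble such an export is to join the three tables with `DataFrame.merge`. The sketch below uses small made-up tables and only a subset of the columns; the location IDs and the helper `with_loc_name` are illustrative. It uses `str.title()` for the capitalization, which also lowercases the remaining letters of each word; whether that matches the hidden tests is an assumption.

```python
import pandas as pd

trips = pd.DataFrame({
    'trip_id': [2, 1],
    'driver_id': [2, 1],
    'pickup_loc_id': [3, 1],
    'dropoff_loc_id': [4, 2],
    'fare_amount': [371.0, 412.0],
})
drivers = pd.DataFrame({
    'driver_id': [1, 2],
    'last_name': ['dailisan', 'DOROSAN'],
    'given_name': ['damian', 'michael'],
})
locations = pd.DataFrame({
    'location_id': [1, 2, 3, 4],
    'loc_name': ['UP Campus', 'Legazpi Village', 'Fairview', 'Highway Hills'],
})


def with_loc_name(df, id_column, name_column):
    """Attach the location name for id_column as a new name_column."""
    lookup = locations.rename(columns={'location_id': id_column,
                                       'loc_name': name_column})
    return df.merge(lookup, on=id_column, how='left')


merged = trips.merge(drivers, on='driver_id', how='left')
merged = with_loc_name(merged, 'pickup_loc_id', 'pickup_loc_name')
merged = with_loc_name(merged, 'dropoff_loc_id', 'dropoff_loc_name')
merged = merged.sort_values('trip_id').reset_index(drop=True)

export = pd.DataFrame({
    'driver_lastname': merged['last_name'].str.title(),
    'driver_givenname': merged['given_name'].str.title(),
    'pickup_loc_name': merged['pickup_loc_name'],
    'dropoff_loc_name': merged['dropoff_loc_name'],
    'fare_amount': merged['fare_amount'],
})
print(export)
```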
  
#### Generating statistics

Create a method `generate_statistics` that returns a dictionary depending on the `stat` parameter passed to it:
  - `trip`: key is day name (e.g., Monday), value is the average number of trips with pick-ups for that day name in the entire dataset
  - `passenger`: key is each unique `passenger_count`, value is another dictionary with day name (e.g., `Monday`) as key, and value is the average number of trips with that `passenger_count` with pick-ups for that day name in the entire dataset
  - `driver`: key is driver name following the format `Last name, Given name`, value is another dictionary with day name as key and average number of trips of that driver for that day name as value
  - `all`: keys are `trip`, `passenger` and `driver`, values are the corresponding `stat` dictionaries returned by those keywords

The `stat` values are case-sensitive and the method should raise `SakayDBError` if the passed `stat` is unknown.
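The `trip` statistic can be read as: count the trips on each calendar date, then average those daily counts per day name. The sketch below implements that reading with pandas on made-up pickup times. Averaging only over dates that have at least one trip is an assumption; check it against the expected values in the visible tests.

```python
import pandas as pd

raw_pickups = ['08:13:00,15-05-2022',   # Sunday
               '09:00:00,15-05-2022',   # Sunday
               '10:00:00,22-05-2022',   # Sunday
               '14:13:00,16-05-2022']   # Monday

df = pd.DataFrame({
    'pickup': pd.to_datetime(raw_pickups, format='%H:%M:%S,%d-%m-%Y')
})
df['date'] = df['pickup'].dt.date
df['day_name'] = df['pickup'].dt.day_name()

# Step 1: number of trips on each calendar date.
per_date = df.groupby(['day_name', 'date']).size()
# Step 2: average of those daily counts for each day name.
stats_trip = per_date.groupby(level='day_name').mean().to_dict()
print(stats_trip)
```

Here the two Sundays have 2 and 1 trips, so Sunday averages 1.5, and the single Monday averages 1.0.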

#### Plotting statistics

Create a method `plot_statistics` that returns a matplotlib `Axes` depending on the `stat` parameter passed to it:
  - `trip`: bar plot of the average number of trips per day of week (e.g., Monday, Tuesday, etc). Consider the following plot specs:
      - `figsize` : (12,8)
      - `title` : 'Average trips per day'
      - `ylabel` and `xlabel` are 'Ave Trips' and 'Day of week' respectively
      - `Day of week` is sorted from Monday to Sunday.
 
  - `passenger`: a line plot with marker 'o' showing the average number of trips per day of week. Each line represents a passenger count (e.g., 0, 1, 2, 3). Consider the following plot specs:
      - `figsize` : (12,8)
      - No title
      - `ylabel` and `xlabel` are 'Ave Trips' and 'Day of week' respectively
      - `legend` must indicate which line corresponds to a particular passenger count and must be labeled accordingly.
      
  - `driver`: In a 7x1 grid, plot the drivers with the top average trips per day as a horizontal bar plot. Each subplot must correspond to a day (e.g., Monday, etc). Consider the following:
      - `figsize` : (8,25)
      - Plot only the top 5 drivers.
      - No titles, but the day of week must be indicated as a legend handle for each subplot.
      - The x-axis (Ave Trips) must be shared by all plots (i.e., only labeled in the bottom subplot, tick locations consistent across all subplots).
      - No `ylabel` but y-tick labels must indicate the driver's name sorted first by decreasing average trip count then alphabetically.

Please be guided by the visible asserts on other considerations not explicitly mentioned here.

The `stat` values are case-sensitive and the method should raise `SakayDBError` if the passed `stat` is unknown.
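As an illustration of the `trip` plot specs, the sketch below draws a bar plot from a dictionary shaped like the output of `generate_statistics('trip')`. The function name `plot_trip_stats` and the sample values are made up, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

DAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
             'Saturday', 'Sunday']


def plot_trip_stats(stats):
    """Bar plot of average trips per day name, ordered Monday to Sunday."""
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.bar(DAY_ORDER, [stats.get(day, 0) for day in DAY_ORDER])
    ax.set_title('Average trips per day')
    ax.set_xlabel('Day of week')
    ax.set_ylabel('Ave Trips')
    return ax


# Made-up values for illustration only.
sample_stats = {'Monday': 40.0, 'Tuesday': 39.5, 'Wednesday': 41.0,
                'Thursday': 42.0, 'Friday': 41.8, 'Saturday': 40.2,
                'Sunday': 41.7}
ax = plot_trip_stats(sample_stats)
print(ax.get_title(), ax.get_xlabel(), ax.get_ylabel())
```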

#### Generate Origin-Destination Matrix

Create a method `generate_odmatrix` that takes in a `date_range` input parameter and returns a `pandas.DataFrame` with the `trips.csv` `pickup_loc_name` as the row names (dataframe index) and `dropoff_loc_name` as the columns. The value for each row-column combination is the average daily number of trips that occurred within the specified `date_range`.

  - `date_range` : takes a tuple of datetime strings, and filters trips based on `pickup_datetime`. Defaults to `None`, in which case all dates are included.
      - Case 1: tuple like (*value*, `None`) includes all trips with `pickup_datetime` from *value* onward, begin inclusive
      - Case 2: tuple like (`None`, *value*) includes all trips with `pickup_datetime` up to *value*, end inclusive
      - Case 3: tuple like (*value1*, *value2*) includes all trips with `pickup_datetime` between *value1* and *value2*, end inclusive.
      
Input errors to the `date_range` parameter should be handled like that of `search_trips`.
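A sketch of the origin-destination aggregation on made-up data: count trips per (pickup, dropoff, date), then average the daily counts per (pickup, dropoff) pair with `pivot_table`. Filling pairs that have no trips with 0, and averaging only over dates on which a pair has trips, are assumptions that should be checked against the visible tests. The `pickup_date` column is assumed to have been derived from `pickup_datetime` beforehand.

```python
import pandas as pd

trips = pd.DataFrame({
    'pickup_loc_name': ['A', 'A', 'A', 'B'],
    'dropoff_loc_name': ['B', 'B', 'B', 'A'],
    'pickup_date': ['2022-05-15', '2022-05-15', '2022-05-16', '2022-05-15'],
})

# Step 1: number of trips per origin, destination, and calendar date.
daily = (trips
         .groupby(['pickup_loc_name', 'dropoff_loc_name', 'pickup_date'])
         .size()
         .reset_index(name='trips'))

# Step 2: average the daily counts for each origin-destination pair.
od_matrix = daily.pivot_table(index='pickup_loc_name',
                              columns='dropoff_loc_name',
                              values='trips',
                              aggfunc='mean',
                              fill_value=0)
print(od_matrix)
```

In this toy data the pair A to B has 2 trips on one date and 1 on another, giving an average of 1.5; B to A averages 1.0, and the diagonal is 0.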



## Grading guide [Still To Change]

* The project has a highest possible score of 150 points.

* Each cell with an assert statement is worth 10 pts. Passing all of the tests in a cell earns the entire 10 pts. Failing any of the tests in the cell, including hidden tests, earns no points. No partial points will be given, so make sure that you run and pass all the visible tests in the test suite before submitting.

* Successful git cloning is worth 15 pts. Successful importing of the module is worth 5 pts. 

* If the module fails to clone or import, the professor will attempt to make it work, but this will incur additional deductions of up to 10% of the highest possible score.

* Methods should have a sensible docstring. The professor will deduct up to a total of 15 pts for missing, misleading or nonsensical docstrings. If you reasonably follow the NumPy docstring format then you will likely not receive any deductions.

* The code should follow PEP8. The professor will run your Python code through [pycodestyle](https://pypi.org/project/pycodestyle/) and will deduct one point for every instance of a PEP8 violation (including warnings), up to a total of 15 points.

In [None]:
# THIS IS THE ONLY CELL THAT YOU WILL MODIFY IN THIS NOTEBOOK.
# Store the SSH clone URL for your `sakaydb` repo as a string
git_repo_url = ''
# YOUR CODE HERE
raise NotImplementedError()

# Automated tests

## Cloning

In [None]:
import pickle
import shutil
import os
import pandas as pd
from tempfile import TemporaryDirectory
from numpy.testing import (assert_equal, assert_almost_equal,
                           assert_raises, assert_allclose)

In [None]:
#### Test clone and pip install
# The tests here will run `git clone {git_repo_url} sakaydb` then copy all
# of the repo contents back to the directory where this notebook is

## Code style

PEP 8 violations and warnings:

In [None]:
!find . -iname "*.py" | xargs pycodestyle | wc -l

## The `SakayDB` class

In [None]:
from sakaydb import SakayDB, SakayDBError

### Initialization

In [None]:
sakay_db = SakayDB('.')
assert_equal(sakay_db.data_dir, '.')


### Adding a trip to the database

In [None]:
trip_columns = ['trip_id', 'driver_id', 'pickup_datetime', 'dropoff_datetime',
                'passenger_count', 'pickup_loc_id', 'dropoff_loc_id',
                'trip_distance', 'fare_amount']
driver_columns = ['driver_id', 'given_name', 'last_name']
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '08:13:00,15-05-2022',
                          '08:46:00,15-05-2022', 2,
                          'UP Campus', 'Legazpi Village', 17.6, 412),
        1
    )
    assert_raises(
        SakayDBError,
        lambda: sakay_db.add_trip('Dailisan, Damian', '08:13:00,15-05-2022',
                                  '08:46:00,15-05-2022', 2, 'UP Campus',
                                  'Legazpi Village', 17.6, 412)
    )
    df_trips = pd.read_csv(os.path.join(temp_dir, 'trips.csv'))
    assert_equal(
        set(df_trips.columns.tolist()),
        set(trip_columns)
    )
    assert_equal(
        df_trips.to_numpy().tolist(),
        [[1, 1, '08:13:00,15-05-2022',
          '08:46:00,15-05-2022',
          2, 1, 2, 17.6, 412]]
    )
    assert_equal(
        df_trips.index.tolist(),
        [0]
    )
    df_drivers = pd.read_csv(os.path.join(temp_dir, 'drivers.csv'))
    assert_equal(
        set(df_drivers.columns.tolist()),
        set(driver_columns)
    )
    assert_equal(
        df_drivers[driver_columns].to_numpy().tolist(),
        [[1, 'Damian', 'Dailisan']]
    )
    assert_equal(
        df_drivers.index.tolist(),
        [0]
    )

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '08:13:00,15-05-2022',
                          '08:46:00,15-05-2022', 2,
                          'UP Campus', 'Legazpi Village', 17.6, 412),
        1
    )
    assert_equal(
        sakay_db.add_trip('Dorosan, Michael', '14:13:00,31-12-2022',
                          '14:46:00,31-12-2022', 1,
                          'Fairview', 'Highway Hills', 15.1, 371),
        2
    )
    assert_equal(
        sakay_db.add_trip('Alis, Christian', '09:13:00,16-08-2022',
                          '09:46:00,16-08-2022', 3,
                          'Loyola Heights', 'Legazpi Village', 8.9, 235),
        3
    )
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '15:13:00,09-09-2022',
                          '15:46:00,09-09-2022', 2,
                          'Pasong Putik', 'San Antonio', 31.2, 716),
        4
    )
    assert_raises(
        SakayDBError,
        lambda: sakay_db.add_trip('Alis, Christian', '09:13:00,16-08-2022',
                                  '09:46:00,16-08-2022', 3,
                                  'Loyola Heights', 'Legazpi Village', 8.9, 235)
    )

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '08:13:00,15-05-2022',
                          '08:46:00,15-05-2022', 2,
                          'UP Campus', 'Legazpi Village', 17.6, 412),
        1
    )
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '14:13:00,31-12-2022',
                          '14:46:00,31-12-2022', 1,
                          'Fairview', 'Highway Hills', 15.1, 371),
        2
    )
    assert_equal(
        sakay_db.add_trip('Dailisan, Damian', '09:13:00,16-08-2022',
                          '09:46:00,16-08-2022', 3,
                          'Fairview', 'Highway Hills', 17.6, 412),
        3
    )
    assert_raises(
        SakayDBError,
        lambda: sakay_db.add_trip(' Dailisan, Damian ', '09:13:00,16-08-2022',
                                  '09:46:00,16-08-2022', 3,
                                  ' Fairview ', ' Highway Hills', 17.6, 412)
    )

### Adding trips in the database

In [None]:
%%capture out
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    sakay_db.add_trips([
        {'driver': 'Dailisan, Damian',
         'pickup_datetime': '08:13:00,15-05-2022',
         'dropoff_datetime': '08:46:00,15-05-2022',
         'passenger_count': 2,
         'pickup_loc_name': 'UP Campus',
         'dropoff_loc_name': 'Legazpi Village',
         'trip_distance': 17.6,
         'fare_amount': 412},
        {'driver': 'Dorosan, Michael',
         'pickup_datetime': '14:13:00,31-12-2022',
         'dropoff_datetime': '14:46:00,31-12-2022',
         'passenger_count': 1,
         'pickup_loc_name': 'Fairview',
         'dropoff_loc_name': 'Highway Hills',
         'trip_distance': 15.1,
         'fare_amount': 371},
        {'driver': 'Alis, Christian',
         'pickup_datetime': '09:13:00,16-08-2022',
         'dropoff_datetime': '09:46:00,16-08-2022',
         'pickup_loc_name': 'Loyola Heights',
         'dropoff_loc_name': 'Legazpi Village',
         'trip_distance': 8.9,
         'fare_amount': 235},
        {'driver': 'Dailisan, Damian',
         'pickup_datetime': '15:13:00,09-09-2022',
         'dropoff_datetime': '15:46:00,09-09-2022',
         'passenger_count': 2,
         'pickup_loc_name': 'Pasong Putik',
         'dropoff_loc_name': 'San Antonio',
         'trip_distance': 31.2,
         'fare_amount': 716},
        {'driver': 'Dorosan, Michael',
         'pickup_datetime': '14:13:00,31-12-2022',
         'dropoff_datetime': '14:46:00,31-12-2022',
         'passenger_count': 1,
         'pickup_loc_name': 'Fairview',
         'dropoff_loc_name': 'Highway Hills',
         'trip_distance': 15.1,
         'fare_amount': 371}
    ])

In [None]:
assert_equal(
    out.stdout,
    'Warning: trip index 2 has invalid or incomplete information. '
    'Skipping...\n'
    'Warning: trip index 4 is already in the database. Skipping...\n'
)

### Deleting a trip in the database

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    assert_raises(SakayDBError, lambda: sakay_db.delete_trip(1))
    shutil.copy('trips_test.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test.csv', os.path.join(temp_dir, 'drivers.csv'))
    sakay_db.delete_trip(1)
    assert_raises(SakayDBError, lambda: sakay_db.delete_trip(1))

### Searching for trips in the database

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    assert_raises(SakayDBError, lambda: sakay_db.search_trips())
    assert_equal(sakay_db.search_trips(driver_id=1), [])
    shutil.copy('trips_test.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test.csv', os.path.join(temp_dir, 'drivers.csv'))
    assert_raises(SakayDBError, lambda: sakay_db.search_trips())

    assert_equal(sakay_db.search_trips(driver_id=1).to_numpy().tolist(),
                 [[1, 1, '08:13:00,15-05-2022', '08:46:00,15-05-2022', 2, 1, 2, 17.6, 412],
                  [4, 1, '15:13:00,09-09-2022', '15:46:00,09-09-2022', 2, 6, 7, 31.2, 716]]
                 )
    assert_equal(sakay_db.search_trips(fare_amount=(None, 300)).to_numpy().tolist(),
                 [[3, 3, '09:13:00,16-08-2022', '09:46:00,16-08-2022', 3, 5, 2, 8.9, 235]]
                 )
    assert_equal(sakay_db.search_trips(driver_id=1, fare_amount=(None, 300)).to_numpy().tolist(),
                 [])
    assert_equal(sakay_db.search_trips(driver_id=1, fare_amount=(200, 500)).to_numpy().tolist(),
                 [[1, 1, '08:13:00,15-05-2022', '08:46:00,15-05-2022', 2, 1, 2, 17.6, 412]])

    assert_equal(sakay_db.search_trips(
                 pickup_datetime=('00:00:00,1-05-2022', '23:59:59,31-08-2022')
                 ).to_numpy().tolist(),
                 [[1, 1, '08:13:00,15-05-2022', '08:46:00,15-05-2022', 2, 1, 2, 17.6, 412],
                  [3, 3, '09:13:00,16-08-2022', '09:46:00,16-08-2022', 3, 5, 2, 8.9, 235]]
                 )

### Exporting data

In [None]:
export_columns = ['driver_givenname', 'driver_lastname', 'pickup_datetime', 'dropoff_datetime',
                  'passenger_count', 'pickup_loc_name', 'dropoff_loc_name',
                  'trip_distance', 'fare_amount']
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    df_export = sakay_db.export_data()
    assert_equal(isinstance(df_export, pd.DataFrame), True)
    assert_equal(
        set(df_export.columns.tolist()),
        set(export_columns)
    )
    assert_equal(len(df_export), 0)
    shutil.copy('trips_test.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test.csv', os.path.join(temp_dir, 'drivers.csv'))
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    df_export = sakay_db.export_data()
    assert_equal(isinstance(df_export, pd.DataFrame), True)
    assert_equal(
        set(df_export.columns.tolist()),
        set(export_columns)
    )
    assert_equal(
        df_export.loc[df_export.index[:2], export_columns].to_numpy().tolist(),
        [['Damian', 'Dailisan', '08:13:00,15-05-2022', '08:46:00,15-05-2022', 2, 'UP Campus', 'Legazpi Village', 17.6, 412],
         ['Michael', 'Dorosan', '14:13:00,31-12-2022', '14:46:00,31-12-2022', 1, 'Fairview', 'Highway Hills', 15.1, 371]]
    )

### Generating statistics

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    assert_raises(SakayDBError, lambda: sakay_db.generate_statistics('Trips'))
    assert_equal(sakay_db.generate_statistics('trip'), {})
    assert_equal(sakay_db.generate_statistics('passenger'), {})
    assert_equal(sakay_db.generate_statistics('driver'), {})
    assert_equal(
        sakay_db.generate_statistics('all'),
        {'trip': {}, 'passenger': {}, 'driver': {}}
    )
    shutil.copy('trips_test2.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test2.csv',
                os.path.join(temp_dir, 'drivers.csv'))
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    stats_trip = sakay_db.generate_statistics('trip')
    assert_equal(len(stats_trip), 7)
    assert_almost_equal(stats_trip['Friday'], 41.794117647058826)
    assert_almost_equal(stats_trip['Sunday'], 41.714285714285715)

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('trips_test2.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test2.csv',
                os.path.join(temp_dir, 'drivers.csv'))
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    stats_passenger = sakay_db.generate_statistics('passenger')
    assert_equal(len(stats_passenger), 4)
    assert_equal(len(stats_passenger[0]), 7)
    assert_almost_equal(
        stats_passenger[0]['Monday'],
        9.371428571428572
    )
    stats_driver = sakay_db.generate_statistics('driver')

    assert_equal(len(stats_driver), 200)
    assert_almost_equal(
        stats_driver['Dome, Benyamin']['Saturday'],
        1
    )
    stats_all = sakay_db.generate_statistics('all')
    assert_equal(set(stats_all.keys()), {'trip', 'passenger', 'driver'})
    assert_equal(len(stats_all['trip']), 7)
    assert_almost_equal(stats_all['trip']['Tuesday'], 39.74285714285714)
    assert_equal(len(stats_all['passenger']), 4)
    assert_almost_equal(stats_all['passenger'][0]['Monday'],
                        9.371428571428572)
    assert_almost_equal(
        stats_all['driver']['Dome, Benyamin']['Saturday'],
        1
    )
    assert_equal(len(stats_all['driver']), 200)

### Plotting statistics

In [None]:
from matplotlib.legend import Legend
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    shutil.copy('trips_test2.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test2.csv',
                os.path.join(temp_dir, 'drivers.csv'))
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    
    # trip tests
    ax_trip = sakay_db.plot_statistics('trip')
    ax_trip.get_figure().canvas.draw()
    
    assert_equal(ax_trip.get_title(), 'Average trips per day')
    assert_equal(
        [t.get_text() for t in ax_trip.get_xticklabels()],
        ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 
         'Saturday', 'Sunday']
    )
    assert_equal(
        [t.get_text() for t in ax_trip.get_yticklabels()],
        ['0', '5', '10', '15', '20', '25', '30', '35', '40', '45']
    )
    assert_equal(ax_trip.figure.get_size_inches(), (12, 8))
    assert_equal(ax_trip.get_ylabel(), 'Ave Trips')
    assert_equal(ax_trip.get_xlabel(), 'Day of week')

    
    # passenger tests
    ax_passenger = sakay_db.plot_statistics('passenger')
    ax_passenger.get_figure().canvas.draw()
    
    assert_equal(ax_passenger.get_ylabel(), 'Ave Trips')
    assert_equal(ax_passenger.get_xlabel(), 'Day of week')
    assert_equal(
        [t for t in ax_passenger.get_yticks()],
        [9.0, 9.25, 9.5, 9.75, 10.0, 10.25, 10.5, 10.75, 11.0, 
         11.25, 11.5]
    )
    
    assert_equal(
        ax_passenger.get_legend_handles_labels()[1],
        ['0', '1', '2', '3']
    )
    assert_equal(ax_passenger.lines[0].get_label(), '0')
    assert_equal(ax_passenger.lines[0].get_marker(), 'o')
    assert_equal(ax_passenger.lines[0].get_ls(), '-')


    # driver tests
    ax_driver = sakay_db.plot_statistics('driver')
    axes = ax_driver.get_axes()
    
    assert_equal(ax_driver.get_size_inches(), (8.0, 25.0))
    assert_equal(
        axes[0].get_position().get_points().tolist(), 
        [[0.125, 0.7879268292682927], [0.9, 0.88]]
    )
    assert_equal(axes[0].get_legend_handles_labels()[1][0], 'Monday')
    assert_equal(axes[6].get_legend_handles_labels()[1][0], 'Sunday')
    assert_equal(axes[6].get_title(), '')
    assert_equal(axes[0].get_xticks(), axes[6].get_xticks())
    

### Generate OD matrix

In [None]:
with TemporaryDirectory() as temp_dir:
    sakay_db = SakayDB(temp_dir)
    assert_equal(sakay_db.generate_odmatrix().to_numpy().tolist(), [])
    shutil.copy('trips_test2.csv', os.path.join(temp_dir, 'trips.csv'))
    shutil.copy('drivers_test2.csv',
                os.path.join(temp_dir, 'drivers.csv'))
    shutil.copy('locations.csv', os.path.join(temp_dir, 'locations.csv'))
    od_df = sakay_db.generate_odmatrix()
    assert_equal(od_df.shape, (48, 48))
    assert_equal(od_df.loc['Macpherson', 'UP Campus'], 1.25)
    assert_equal(od_df.iloc[-2, -2], 0)
    assert_equal(sakay_db
                   .generate_odmatrix(date_range=('00:00:00,1-08-2022', '23:59:59,31-08-2022'))
                   .shape, (48, 48))