# Python Catch-up Exam (DU, 08/06/2020): Car consumption

> + **Due date:** 15/06/2020
> + **Send your final notebook** to [romain.madar@cern.ch](mailto:romain.madar@cern.ch)
> 
> The final mark will be a number between 0 (very bad) and 20 (very good). The evaluation is based mainly on the correctness of the answers, but also on the clarity of the explanations and the quality of the code.

## General information


### A bit of context

Car consumption is a key element from many points of view (ecology, cost, fuel choice for a new car, ...). Most of the time, the consumption announced by car manufacturers does not reflect the consumption observed in real use cases. The goal of this work is to analyze 280 car travels in order to estimate the real consumption of the car under study, and to understand how it could be optimized. The first part focuses on the analysis of the travels (distance, duration, etc.) while the second part focuses on the consumption.


### Data description

The data associated with the 280 car travels to be analyzed are provided *via* a `csv` file containing 7 pieces of information for each travel:
  1. `distance_Km`: travel distance [unit: kilometer]
  1. `conso_L100km`: average consumption over the travel [unit: liter / 100 km]
  1. `duration_H`: duration of the travel [unit: hours, *e.g.* 0.3 means $60 \times 0.3 = 18\,$min]
  1. `price_EuroL`: price of the fuel for this travel [unit: euros / liter]
  1. `dayTime_H`: hour of the day at which the travel was done [number between 0 and 23]
  1. `weekDay`: day of the week on which the travel was done [number between 1 (Monday) and 7 (Sunday)]
  1. `yearMonth`: month of the year in which the travel was done [number between 1 (January) and 12 (December)]
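
The units listed above can be combined into derived quantities. The following is a minimal sketch, assuming a toy dataframe that reuses the column names of the file; the values are made up for illustration, and the new column names (`duration_min`, `fuel_L`, `cost_Euro`) are hypothetical and not part of the provided data.

```python
import pandas as pd

# Toy rows reusing the column names described above. The values are made up
# for illustration and are NOT taken from CarData.csv.
toy = pd.DataFrame({
    "distance_Km": [15.0, 6.0],
    "conso_L100km": [6.0, 5.0],
    "duration_H": [0.3, 0.25],
    "price_EuroL": [1.40, 1.50],
})

# Duration in minutes: 0.3 h corresponds to 60 * 0.3 = 18 min.
toy["duration_min"] = toy["duration_H"] * 60

# Fuel used over the travel, in liters (the consumption is given per 100 km).
toy["fuel_L"] = toy["conso_L100km"] * toy["distance_Km"] / 100

# Fuel cost of the travel, in euros.
toy["cost_Euro"] = toy["fuel_L"] * toy["price_EuroL"]

print(toy)
```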

### Import packages

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Import data

In [2]:
df = pd.read_csv('CarData.csv')
df.head()

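| Unnamed: 0 | distance_Km | conso_L100km | duration_H | price_EuroL | dayTime_H | weekDay | yearMonth |
|---|---|---|---|---|---|---|---|
| 0 | 15.5 | 5.6 | 0.55 | 1.34 | 7 | 1 | 2 |
| 1 | 12.4 | 4.5 | 0.216667 | 1.45 | 19 | 7 | 2 |
| 2 | 4.5 | 4.0 | 0.133333 | 1.45 | 18 | 7 | 2 |
| 3 | 0.4 | 11.7 | 0.033333 | 1.45 | 18 | 7 | 2 |
| 4 | 0.5 | 10.6 | 0.066667 | 1.45 | 11 | 7 | 2 |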


## Part I: travel analysis (10 pts)

### 1. Average values (2 pts + 2 bonus pts)

 - what is the average traveled distance (in km)?

 - what is the average travel duration (in minutes)?

 - *bonus:* are these averages representative of the sample? A quantitative criterion and a discussion are expected.
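
As an illustration of what a quantitative criterion could look like, the sketch below compares the mean with the median and computes the relative spread (standard deviation divided by the mean). This is only one possible choice, not necessarily the expected one, and the distances are made up for the example.

```python
import pandas as pd

# Made-up distances (km) including one long travel, for illustration only.
distances = pd.Series([2.0, 3.0, 4.0, 5.0, 86.0])

mean = distances.mean()
median = distances.median()
std = distances.std()  # sample standard deviation (ddof=1 by default in pandas)

# Relative spread: a value comparable to or larger than 1 means that the
# spread is as large as the mean itself.
rel_spread = std / mean

print(f"mean = {mean:.1f} km, median = {median:.1f} km, std/mean = {rel_spread:.2f}")
```

In this toy sample, the mean is pulled far above the median by a single long travel, which is the kind of behaviour such a criterion is meant to reveal.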

### 2. Distributions (6 pts)

 - plot the histogram of the traveled distances

 - plot the histogram of the travel durations

 - compute the distance $D$ which satisfies $f_{\text{travels}}(d<D) = 90\%$ (with a precision of 1\%), where $f_{\text{travels}}(d<D)$ is the fraction of travels having a distance below $D$. Improve the histogram of distances using this information.

 - compute the average speed for *each travel* (in km/h) and plot the corresponding distribution. Add this information to the original dataframe.
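
The definition of $D$ and the per-travel speed can be sketched on toy data as follows. The values are made up, the `speed_Kmh` column name is a hypothetical choice, and the use of `quantile` is one possible approach among others. The counts are computed with `np.histogram`; `plt.hist` accepts the same `bins` and `range` arguments.

```python
import numpy as np
import pandas as pd

# Made-up travels reusing the column names of the file (illustration only).
toy = pd.DataFrame({
    "distance_Km": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],
    "duration_H": [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.25, 0.25, 0.3, 1.0],
})

# Candidate for D: the 90% quantile of the distances.
D = toy["distance_Km"].quantile(0.90)

# Check: fraction of travels with a distance strictly below D.
fraction_below = (toy["distance_Km"] < D).mean()

# Histogram restricted to [0, D]; travels outside this range are ignored.
counts, edges = np.histogram(toy["distance_Km"], bins=5, range=(0, D))

# Average speed of each travel in km/h, stored as a new column.
toy["speed_Kmh"] = toy["distance_Km"] / toy["duration_H"]

print(f"D = {D:.1f} km, fraction below D = {fraction_below:.2f}")
```

Restricting the histogram range to $[0, D]$ keeps the bulk of the travels readable instead of letting a few long travels stretch the axis.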

### 3. Is there a favored time to travel? (2 pts)

Are there periods of the day (*e.g.* morning or afternoon) or days of the week (*e.g.* Sunday or Monday) with more travels than others?
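
Counting travels per category is one way to approach this question. The sketch below uses made-up values, and the boundaries of the periods (morning before 12h, afternoon from 12h to 18h, evening from 18h on) are an arbitrary assumption rather than a prescribed definition.

```python
import pandas as pd

# Made-up travels reusing the column names of the file (illustration only).
toy = pd.DataFrame({
    "dayTime_H": [7, 8, 8, 13, 18, 18, 18, 22],
    "weekDay": [1, 1, 2, 5, 6, 7, 7, 7],
})

# Number of travels for each day of the week, ordered from 1 (Monday) to 7 (Sunday).
travels_per_day = toy["weekDay"].value_counts().sort_index()

# Group the hours into periods; right=False gives the intervals
# [0, 12), [12, 18) and [18, 24). These edges are an arbitrary choice.
periods = pd.cut(
    toy["dayTime_H"],
    bins=[0, 12, 18, 24],
    right=False,
    labels=["morning", "afternoon", "evening"],
)
travels_per_period = periods.value_counts()

print(travels_per_day)
print(travels_per_period)
```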

## Part II: consumption analysis (10 pts)

### 1. Typical consumption values (2 pts)

What are the minimum, mean and maximum consumptions?

### 2. Distributions (8 pts)

 - Plot the distribution of the consumption.

 - How does this distribution change when a subset of travels (*e.g.* long, short, fast, slow, ...) is selected?

 - In your opinion, is the maximum consumption found above representative?

 - Can you isolate types of travels having a generally lower consumption?
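
Selecting a subset of travels can be done with a boolean mask. The sketch below is a minimal example on made-up values, which say nothing about the real data; the 5 km threshold and the `short`/`long` labels are arbitrary assumptions.

```python
import pandas as pd

# Made-up travels reusing the column names of the file (illustration only).
toy = pd.DataFrame({
    "distance_Km": [0.5, 1.0, 12.0, 15.0, 40.0, 60.0],
    "conso_L100km": [11.0, 9.0, 5.0, 5.5, 4.5, 4.0],
})

# Boolean mask selecting the short travels (arbitrary 5 km threshold).
is_short = toy["distance_Km"] < 5.0

# Consumption of each subset, ready to be plotted as two histograms.
conso_short = toy.loc[is_short, "conso_L100km"]
conso_long = toy.loc[~is_short, "conso_L100km"]

# Minimum, mean and maximum consumption for each subset.
label = is_short.map({True: "short", False: "long"})
summary = toy.groupby(label)["conso_L100km"].agg(["min", "mean", "max"])

print(summary)
```

The same pattern applies to any other selection (duration, speed, day of the week) by changing the condition that defines the mask.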