# Introduction

**Fleet Fuel Efficiency Analysis**

This project analyzes the fuel efficiency of a vehicle fleet using trip-level telemetry data, with the aim of finding ways to improve it. Each record describes one trip: the vehicle, driver, and route type (e.g. urban, rural, mixed), the start and end times, distance, average speed, idle time, and fuel consumed. The goal is to identify driving patterns that contribute to inefficient fuel use and to suggest possible optimizations.

The analysis will be divided into several key steps:

1. **Exploratory Data Analysis (EDA)**: We will load the dataset and perform basic data exploration to understand its structure, check for missing values, and identify any patterns.
2. **Data Cleaning**: We will clean the data by handling missing values, converting columns to appropriate types (for example, parsing the timestamp columns), and removing invalid records or unnecessary columns.
3. **Data Visualization**: Various visualizations will be created to highlight important insights related to fuel efficiency and driving patterns.
4. **Statistical Analysis**: We may also run statistical tests to check for correlations between variables such as idle time, average speed, and fuel consumption.
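As a rough sketch of step 4, the snippet below correlates average speed and idle time with fuel economy in litres per 100 km. Normalizing fuel by distance is our assumption, made so that long trips do not dominate simply because they burn more fuel. The helper name `fuel_economy_correlations` is hypothetical, and the five sample rows are copied from the `df.head()` output shown in Step 1.

```python
import pandas as pd

# Five trips copied from the df.head() output in Step 1, so the sketch runs standalone.
sample = pd.DataFrame({
    "distance_km": [110.11, 84.68, 114.06, 83.83, 69.27],
    "avg_speed_kmph": [73.16, 47.35, 78.89, 77.04, 80.43],
    "idle_time_min": [8.5, 20.2, 30.8, 15.1, 12.5],
    "fuel_consumed_liters": [26.57, 21.54, 28.09, 19.03, 21.26],
})


def fuel_economy_correlations(df: pd.DataFrame, method: str = "pearson") -> pd.Series:
    """Correlate speed and idle time with litres per 100 km (hypothetical helper)."""
    # Skip trips with non-positive distance, which would make the ratio meaningless.
    valid = df[df["distance_km"] > 0]
    economy = valid["fuel_consumed_liters"] / valid["distance_km"] * 100
    # corrwith pairs each feature column with the economy Series on the shared index.
    return valid[["avg_speed_kmph", "idle_time_min"]].corrwith(economy, method=method)


print(fuel_economy_correlations(sample))
```

With only five rows the coefficients say nothing reliable; on the full `df` the same call gives a first look. pandas also accepts `method="spearman"`, a rank-based coefficient that is less sensitive to outliers than Pearson. A proper significance test would still be needed before drawing conclusions.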

The outcome of this analysis will help optimize fleet operations, reduce fuel consumption, and potentially improve overall operational efficiency.

Let's begin by loading and inspecting the data!


## Step 1: Load data and check structure


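```python
import pandas as pd

# Load the trip-level telemetry export (path is relative to the notebook's working directory)
df = pd.read_csv("../data/fleet_telemetry_dataset.csv")
df.head()  # first five trips
```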


| | trip_id | vehicle_id | driver_id | route_type | start_time | end_time | distance_km | avg_speed_kmph | idle_time_min | fuel_consumed_liters | trip_day_of_week |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T00000 | V039 | D079 | rural | 2023-12-04 04:26:00 | 2023-12-04 06:04:00 | 110.11 | 73.16 | 8.5 | 26.57 | Monday |
| 1 | T00001 | V029 | D036 | urban | 2023-07-09 05:01:00 | 2023-07-09 07:08:00 | 84.68 | 47.35 | 20.2 | 21.54 | Sunday |
| 2 | T00002 | V015 | D055 | rural | 2023-10-29 00:03:00 | 2023-10-29 02:00:00 | 114.06 | 78.89 | 30.8 | 28.09 | Sunday |
| 3 | T00003 | V043 | D064 | mixed | 2023-07-09 21:04:00 | 2023-07-09 22:24:00 | 83.83 | 77.04 | 15.1 | 19.03 | Sunday |
| 4 | T00004 | V008 | D091 | urban | 2023-07-29 05:16:00 | 2023-07-29 06:20:00 | 69.27 | 80.43 | 12.5 | 21.26 | Saturday |
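The `trip_day_of_week` column looks derivable from `start_time`, so it is worth checking that the two agree. A minimal sketch, assuming the weekday is meant to refer to the trip's start rather than its end; `day_of_week_mismatches` is a hypothetical helper, and the sample rows come from the `df.head()` output above.

```python
import pandas as pd

# Rows copied from the df.head() output above.
head = pd.DataFrame({
    "start_time": ["2023-12-04 04:26:00", "2023-07-09 05:01:00", "2023-10-29 00:03:00",
                   "2023-07-09 21:04:00", "2023-07-29 05:16:00"],
    "trip_day_of_week": ["Monday", "Sunday", "Sunday", "Sunday", "Saturday"],
})


def day_of_week_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose trip_day_of_week disagrees with the weekday of start_time."""
    start = pd.to_datetime(df["start_time"], format="%Y-%m-%d %H:%M:%S")
    # Series.dt.day_name() returns English names such as "Monday" by default.
    mismatch = start.dt.day_name() != df["trip_day_of_week"]
    return df[mismatch]


print(len(day_of_week_mismatches(head)))  # → 0 for the five rows shown
```

Run on the full `df`, an empty result would suggest `trip_day_of_week` is redundant and could be dropped during cleaning.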


```python
df.tail()  # last five trips
```

| | trip_id | vehicle_id | driver_id | route_type | start_time | end_time | distance_km | avg_speed_kmph | idle_time_min | fuel_consumed_liters | trip_day_of_week |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7495 | T07495 | V015 | D062 | mixed | 2023-11-30 08:21:00 | 2023-11-30 09:21:00 | 41.26 | 62.29 | 21.1 | 9.22 | Thursday |
| 7496 | T07496 | V044 | D038 | urban | 2023-03-28 23:42:00 | 2023-03-29 01:30:00 | 77.44 | 60.0 | 30.8 | 21.66 | Tuesday |
| 7497 | T07497 | V027 | D026 | urban | 2023-05-06 03:01:00 | 2023-05-06 03:27:00 | 8.83 | 53.0 | 16.1 | 3.54 | Saturday |
| 7498 | T07498 | V026 | D024 | urban | 2023-01-29 18:49:00 | 2023-01-29 20:50:00 | 104.96 | 68.13 | 29.0 | 25.83 | Sunday |
| 7499 | T07499 | V033 | D039 | urban | 2023-11-25 00:47:00 | 2023-11-25 03:24:00 | 148.75 | 66.36 | 23.2 | 35.97 | Saturday |
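Trip T07496 starts at 23:42 and ends at 01:30 the next day, so trip durations should be computed from full timestamps rather than from clock times alone. A minimal sketch, assuming the timestamps are naive local times with no time-zone shifts during a trip; `add_trip_duration` and the `duration_min` column are hypothetical names.

```python
import pandas as pd

# Rows copied from the df.tail() output above; T07496 crosses midnight.
tail = pd.DataFrame({
    "trip_id": ["T07495", "T07496", "T07497", "T07498", "T07499"],
    "start_time": ["2023-11-30 08:21:00", "2023-03-28 23:42:00", "2023-05-06 03:01:00",
                   "2023-01-29 18:49:00", "2023-11-25 00:47:00"],
    "end_time": ["2023-11-30 09:21:00", "2023-03-29 01:30:00", "2023-05-06 03:27:00",
                 "2023-01-29 20:50:00", "2023-11-25 03:24:00"],
})


def add_trip_duration(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a duration_min column (hypothetical helper)."""
    out = df.copy()
    start = pd.to_datetime(out["start_time"], format="%Y-%m-%d %H:%M:%S")
    end = pd.to_datetime(out["end_time"], format="%Y-%m-%d %H:%M:%S")
    # Subtracting full date+time values handles trips that cross midnight.
    out["duration_min"] = (end - start).dt.total_seconds() / 60
    return out


print(add_trip_duration(tail)["duration_min"].tolist())
# → [60.0, 108.0, 26.0, 121.0, 157.0]
```

A derived duration could later be compared against `idle_time_min`, since idle time should never exceed the total trip time.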


```python
df.describe()  # summary statistics for the numeric columns
```

| | distance_km | avg_speed_kmph | idle_time_min | fuel_consumed_liters |
|---|---|---|---|---|
| count | 7500.0 | 7500.0 | 7500.0 | 7500.0 |
| mean | 99.882167 | 65.045955 | 15.597653 | 23.768443 |
| std | 29.207303 | 10.135071 | 8.959436 | 6.706084 |
| min | -4.69 | 29.07 | 0.0 | -0.19 |
| 25% | 79.9175 | 58.14 | 8.7 | 19.1375 |
| 50% | 99.65 | 65.055 | 15.0 | 23.81 |
| 75% | 119.66 | 71.95 | 21.7 | 28.32 |
| max | 217.08 | 104.39 | 50.2 | 53.07 |
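The summary shows physically impossible minimums: a `distance_km` of -4.69 and a `fuel_consumed_liters` of -0.19. One possible cleaning rule, sketched below, drops trips with non-positive distance or negative fuel and then adds fuel economy in litres per 100 km. Dropping rather than imputing is our assumption, and `clean_and_score` and `fuel_l_per_100km` are hypothetical names. The first two sample rows are taken from `df.head()`; the third is invented to mimic the negative values.

```python
import pandas as pd

trips = pd.DataFrame({
    "trip_id": ["T00000", "T00001", "HYPOTHETICAL"],
    "distance_km": [110.11, 84.68, -4.69],        # third row is made up
    "fuel_consumed_liters": [26.57, 21.54, -0.19],
})


def clean_and_score(df: pd.DataFrame) -> pd.DataFrame:
    """Drop impossible trips, then add litres per 100 km (hypothetical helper)."""
    # Flag records that cannot describe a real trip.
    invalid = (df["distance_km"] <= 0) | (df["fuel_consumed_liters"] < 0)
    out = df.loc[~invalid].copy()
    # Litres per 100 km: lower values mean better fuel economy.
    out["fuel_l_per_100km"] = out["fuel_consumed_liters"] / out["distance_km"] * 100
    return out


scored = clean_and_score(trips)
print(scored["fuel_l_per_100km"].round(2).tolist())  # → [24.13, 25.44]
```

Before dropping anything, `(df["distance_km"] <= 0).sum()` on the full data shows how many trips the rule would remove.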


```python
df.info()  # column dtypes and non-null counts
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   trip_id               7500 non-null   object 
 1   vehicle_id            7500 non-null   object 
 2   driver_id             7500 non-null   object 
 3   route_type            7500 non-null   object 
 4   start_time            7500 non-null   object 
 5   end_time              7500 non-null   object 
 6   distance_km           7500 non-null   float64
 7   avg_speed_kmph        7500 non-null   float64
 8   idle_time_min         7500 non-null   float64
 9   fuel_consumed_liters  7500 non-null   float64
 10  trip_day_of_week      7500 non-null   object 
dtypes: float64(4), object(7)
memory usage: 644.7+ KB
```
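`df.info()` shows no missing values, but `start_time` and `end_time` are stored as plain `object` strings. A sketch of the type conversions planned for the cleaning step, assuming the timestamps always follow the `YYYY-MM-DD HH:MM:SS` format seen above; `convert_column_types` is a hypothetical helper, and the sample rows come from `df.head()`.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def convert_column_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed timestamps and categorical labels (hypothetical helper)."""
    out = df.copy()
    for col in ("start_time", "end_time"):
        # An explicit format raises an error on any timestamp that deviates from it.
        out[col] = pd.to_datetime(out[col], format="%Y-%m-%d %H:%M:%S")
    # route_type has few distinct values, so a categorical saves memory.
    out["route_type"] = out["route_type"].astype("category")
    # Ordered weekdays so sorting and groupby follow calendar order, not alphabetical.
    out["trip_day_of_week"] = pd.Categorical(
        out["trip_day_of_week"], categories=WEEKDAYS, ordered=True
    )
    return out


sample = pd.DataFrame({
    "route_type": ["rural", "urban"],
    "start_time": ["2023-12-04 04:26:00", "2023-07-09 05:01:00"],
    "end_time": ["2023-12-04 06:04:00", "2023-07-09 07:08:00"],
    "trip_day_of_week": ["Monday", "Sunday"],
})

print(convert_column_types(sample).dtypes)
```

Calling `df.info()` again after the conversion is a quick way to confirm the new dtypes and the change in memory usage.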
