# Car Fuel Emissions Dataset Dashboard

> This dashboard helps you explore the $CO_2$ emissions of different car models and their fuel consumption.

<b>[Data](https://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64) Updated as of June 15, 2023</b>

In [1]:
import ipywidgets as widgets
from ipywidgets import interact

import matplotlib.pyplot as plt
import seaborn as sns

from IPython import get_ipython

from sql.ggplot import ggplot, aes, geom_boxplot, geom_histogram

import numpy as np

In [2]:
%load_ext sql

%sql duckdb:///../data/database/car_data.duckdb

%config SqlMagic.displaycon = False

In [3]:
years = %sql select DISTINCT(model_year) from all_vehicles
years = [model_year[0] for model_year in years]

makes = %sql select DISTINCT(make_) from all_vehicles
makes = [m[0] for m in makes]

classes = %sql select DISTINCT(vehicleclass_) from all_vehicles
classes = [c[0] for c in classes]

co2 = %sql select DISTINCT(co2_rating) from all_vehicles
co2 = [c[0] for c in co2]
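
`DISTINCT` does not guarantee any particular order, so the widget options built from these lists may appear shuffled. Below is a minimal sketch of tidying such a list before it is handed to a widget, assuming the result rows behave like one-element tuples. The sample `rows` are made up for illustration and `clean_options` is a hypothetical helper, not part of this notebook.

```python
def clean_options(rows):
    """Flatten one-column result rows, drop NULLs, and sort the values."""
    values = {row[0] for row in rows if row[0] is not None}
    return sorted(values)


# Made-up rows standing in for a `SELECT DISTINCT(model_year)` result
rows = [(2021,), (2012,), (None,), (2023,), (2012,)]
years_sorted = clean_options(rows)
print(years_sorted)
```

Sorted options make it easier to shift-select a contiguous range of model years in a `SelectMultiple` widget.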

In [4]:
def init_widgets():
    """Initialize widgets"""
    widget_year = widgets.SelectMultiple(
        options=years,
        description="Model Year",
        value=years,
    )

    widget_make = widgets.SelectMultiple(
        options=makes,
        description="Car Brand",
        value=makes,
    )

    widget_vehicle_class = widgets.SelectMultiple(
        options=classes,
        description="Vehicle Class (Size)",
        value=classes,
        style={"description_width": "initial"},
    )

    widget_co2 = widgets.IntSlider(
        value=5,
        min=0,
        max=10,
        step=1,
        description="CO2 Rating >=",
        disabled=False,
        style={"description_width": "initial"},
    )

    widget_row = widgets.IntSlider(
        value=5,
        min=0,
        max=10,
        step=1,
        description="Rows to Show",
        disabled=False,
        style={"description_width": "initial"},
    )
    return (
        widget_year,
        widget_make,
        widget_vehicle_class,
        widget_co2,
        widget_row,
    )  # noqa E501

In [5]:
ip = get_ipython()
sql_magic = ip.find_cell_magic("sql")

## Visualizing Interactive Tables

> Multiple values can be selected with shift and/or ctrl (or command) pressed and mouse clicks or arrow keys.

### Fuel Cars Only

In [6]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [7]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM fuel 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

interactive(children=(SelectMultiple(description='Model Year', index=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…
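
Each `{{widget_...}}` placeholder is filled with the widget's current value before the query runs, so a `SelectMultiple` selection ends up as the list inside `IN (...)`. As a rough illustration of the same filter outside the magic, the sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB, with a made-up three-row `fuel` table. `filter_rows` is a hypothetical helper and is not part of JupySQL.

```python
import sqlite3

# In-memory toy table; the real `fuel` table has many more columns
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fuel (model_year INTEGER, make_ TEXT, co2_rating INTEGER)"
)
conn.executemany(
    "INSERT INTO fuel VALUES (?, ?, ?)",
    [(2020, "ford", 4), (2021, "honda", 6), (2022, "ford", 7)],
)


def filter_rows(conn, years, makes, min_co2, limit):
    """Apply the WHERE ... IN ... filter with bound parameters."""
    year_marks = ", ".join("?" for _ in years)
    make_marks = ", ".join("?" for _ in makes)
    query = (
        "SELECT * FROM fuel "
        f"WHERE model_year IN ({year_marks}) "
        f"AND make_ IN ({make_marks}) "
        "AND co2_rating >= ? "
        "ORDER BY model_year LIMIT ?"
    )
    params = [*years, *makes, min_co2, limit]
    return conn.execute(query, params).fetchall()


result = filter_rows(
    conn, years=(2021, 2022), makes=("ford", "honda"), min_co2=5, limit=5
)
print(result)
```

Bound parameters are used here instead of string templating because this sketch runs outside the notebook's widget machinery; the filtering logic is otherwise the same.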

### Hybrid Cars Only

In [8]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [9]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM hybrid 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

interactive(children=(SelectMultiple(description='Model Year', index=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…

### Electric Cars Only

In [10]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [11]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_row
SELECT * 
FROM electric 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
LIMIT {{widget_row}}

interactive(children=(SelectMultiple(description='Model Year', index=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…

### All Cars

In [12]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [13]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM all_vehicles 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

interactive(children=(SelectMultiple(description='Model Year', index=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…

## Plots

This dashboard contains 5 plots, created using the `ggplot` API, `seaborn`, and `ipywidgets`:

1. Bar Plot of Car Manufacturing Trends
2. Boxplot of Fuel Consumption and $CO_2$ Emission for All Vehicle Types
3. Scatter Plot of Electric Vehicle Ranges and Charging Time by Car Size and Model Year 
4. Histogram of $CO_2$ Emissions by Vehicle and Fuel Type
5. $CO_2$ Emissions of Hybrid and Fuel-Only US Car Brands by Transmission Type

## Bar Plot of Car Manufacturing Trends

In [14]:
%%sql --save q_1_hybrid_electric --no-execute
SELECT DISTINCT model_year, vehicle_type, COUNT(id) AS num_vehicles
FROM all_vehicles
WHERE vehicle_type = 'hybrid' OR vehicle_type = 'electric'
GROUP BY model_year, vehicle_type
ORDER BY num_vehicles DESC;

In [15]:
%%sql --save q_1_fuel --no-execute
SELECT DISTINCT model_year, vehicle_type, COUNT(id) AS num_vehicles
FROM fuel
GROUP BY model_year, vehicle_type
ORDER BY model_year;

In [16]:
hybrid_electric_count = %sql SELECT * FROM q_1_hybrid_electric
fuel_count = %sql SELECT * FROM q_1_fuel

hybrid_electric_count = hybrid_electric_count.DataFrame()
fuel_count = fuel_count.DataFrame()
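
The grouping that the two saved queries perform can be mirrored in plain Python as a sanity check. The sketch below counts made-up `(model_year, vehicle_type)` pairs with `collections.Counter`; the sample data is purely illustrative and does not come from the dataset.

```python
from collections import Counter

# Made-up (model_year, vehicle_type) pairs, one per vehicle record
vehicles = [
    (2012, "electric"),
    (2012, "electric"),
    (2012, "hybrid"),
    (2023, "electric"),
]

# Equivalent of COUNT(id) ... GROUP BY model_year, vehicle_type
num_vehicles = Counter(vehicles)
for (model_year, vehicle_type), n in sorted(num_vehicles.items()):
    print(model_year, vehicle_type, n)
```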

In [17]:
radio_button = widgets.RadioButtons(
    options=["fuel_count", "hybrid_electric_count"],
    description="Select Data:",
    disabled=False,
    style={"description_width": "initial"},
)


def draw_bar_year_count(data):
    plt.figure(figsize=(10, 5), dpi=300)
    if data == "fuel_count":
        sns.barplot(
            data=fuel_count,
            x="model_year",
            y="num_vehicles",
            color="orange",
            errorbar=None,
            width=0.4,
        )
        sns.pointplot(
            data=fuel_count,
            x="model_year",
            y="num_vehicles",
            color="red",
            linestyles="--",
            ax=plt.gca(),
            errorbar=None,
        )
        plt.xlabel("Car Model Year")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        plt.title("Count of Unique Fuel-Only Cars by Model Year")
        plt.show()
    else:
        sns.barplot(
            data=hybrid_electric_count,
            x="model_year",
            y="num_vehicles",
            hue="vehicle_type",
            palette={"hybrid": "blue", "electric": "green"},
            width=0.4,
        )
        sns.pointplot(
            data=hybrid_electric_count,
            x="model_year",
            y="num_vehicles",
            color="red",
            linestyles="--",
            ax=plt.gca(),
            errorbar=None,
        )
        plt.xlabel("Car Model Year")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        plt.title("Count of Unique Hybrid and Electric Cars by Model Year")
        plt.legend(bbox_to_anchor=(1, 1), loc="upper right")
        plt.show()


interact(draw_bar_year_count, data=radio_button)

interactive(children=(RadioButtons(description='Select Data:', options=('fuel_count', 'hybrid_electric_count')…

<function __main__.draw_bar_year_count(data)>

### Interesting Insights

From the bar plot of fuel-only cars, we can see that the <b>number of unique car brand models</b> introduced to the Canadian automobile market increased from the turn of the 21st century until 2005. The trend then plateaued and remained fairly constant until 2022, with 2015 experiencing the largest spike. On December 21, 2022, Steven Guilbeault, Canada's Minister of Environment and Climate Change, unveiled a regulation that would require an increasing percentage of vehicle sales in Canada to be zero-emission vehicles, reaching 100% by 2035[$^1$](https://www.canada.ca/en/environment-climate-change/news/2022/12/let-it-roll-government-of-canada-moves-to-increase-the-supply-of-electric-vehicles-for-canadians.html). These efforts seem to have had an immediate impact on the number of fuel-only cars introduced to the Canadian market, with 2023 experiencing a sharp decline to 2003 levels.

The above insights are reinforced by the bar plot of unique hybrid and electric car models introduced to the Canadian automobile market. In 2012, only two electric car models, Nissan's Leaf and Mitsubishi's i-MiEV, and one hybrid car model, Chevrolet's Volt, were present in the market. By 2023, this had grown to 134 electric car models and 32 hybrid car models in Canada.

## Boxplot of Fuel Consumption and $CO_2$ Emission for All Vehicle Types

In [18]:
%%sql --save boxplot_fuel_consum --no-execute
SELECT fuelconsumption_city_l_100km, fuelconsumption_hwy_l_100km, fuelconsumption_comb_l_100km, co2emissions_g_km
FROM all_vehicles

In [19]:
columns = widgets.SelectMultiple(
    options=[
        "fuelconsumption_city_l_100km",
        "fuelconsumption_hwy_l_100km",
        "fuelconsumption_comb_l_100km",
        "co2emissions_g_km",
    ],
    value=["fuelconsumption_comb_l_100km"],
    description="Column(s):",
    disabled=False,
)

In [20]:
plt.rcParams["figure.figsize"] = (12, 3)  # increase size of canvas


def plot(columns):
    (
        ggplot(
            table="boxplot_fuel_consum",
            with_="boxplot_fuel_consum",
            mapping=aes(x=columns),
        )
        + geom_boxplot()
    )


interact(plot, columns=columns)

interactive(children=(SelectMultiple(description='Column(s):', index=(2,), options=('fuelconsumption_city_l_10…

<function __main__.plot(columns)>

### Interesting Insights

The boxplots above show the distribution of fuel consumption, measured in litres per 100 kilometers, in the city, on the highway, and combined, for all types of cars. The median fuel consumption in the city is around 12 litres per 100 kilometers, while the median on the highway is around 10 litres per 100 kilometers. The combined figure blends each vehicle's city and highway fuel consumption, and its median is around 11 litres per 100 kilometers.
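
The numbers a boxplot summarizes can be reproduced by hand. The sketch below computes the median, the quartiles, and the points that fall outside the whiskers, assuming the conventional 1.5 × IQR whisker rule (the exact rule used by the plotting backend is not confirmed here). The sample values are made up and `box_stats` is a hypothetical helper.

```python
import statistics


def box_stats(values):
    """Return quartiles, median, and points beyond the 1.5 * IQR whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up combined fuel consumption values in L/100 km
sample = [8.5, 9.0, 10.2, 11.0, 11.4, 12.1, 13.0, 26.0]
stats = box_stats(sample)
print(stats)
```

In this toy sample only the 26.0 L/100 km value lies beyond the upper whisker, which is how a single very thirsty car would show up as an outlier dot.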

Fuel consumption and $CO_2$ emissions have a strong, positive relationship: the higher the fuel consumption, the higher the $CO_2$ emissions. The boxplot of $CO_2$ emissions, measured in grams per kilometer, shows their distribution for all types of cars. The median $CO_2$ emission for all cars is around 250 grams per kilometer. Moreover, this column has outliers on both sides of the boxplot, which reflects that electric cars have zero $CO_2$ emissions while fuel-only luxury sports cars have very high $CO_2$ emissions.
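
The link between the two quantities can be approximated with simple arithmetic. The sketch below assumes roughly 2.3 kg of carbon dioxide per litre of gasoline burned, a commonly cited approximate figure that is not taken from this dataset. `co2_g_per_km` is a hypothetical helper.

```python
# Assumed approximate emission factor for gasoline, in grams of CO2 per litre
GASOLINE_CO2_G_PER_L = 2300


def co2_g_per_km(l_per_100km, g_per_litre=GASOLINE_CO2_G_PER_L):
    """Convert fuel consumption (L/100 km) to tailpipe CO2 (g/km)."""
    return l_per_100km / 100 * g_per_litre


estimate = co2_g_per_km(11)
print(round(estimate))
```

With a combined consumption of about 11 L/100 km, this gives roughly 253 g/km, which is close to the median of around 250 g/km read off the boxplot.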

## Scatter Plot of Electric Vehicle Ranges and Charging Time by Car Size and Model Year 

In [21]:
%%sql --save q_2_electric_range --no-execute
SELECT range1_km, recharge_time_h, vehicleclass_, model_year
FROM electric

In [22]:
electric_range = %sql SELECT * FROM q_2_electric_range

electric_range = electric_range.DataFrame()

# group vehicle class into sedan or SUV

electric_range["vehicle_size"] = np.where(
    electric_range["vehicleclass_"].isin(
        ["subcompact", "compact", "mid-size", "full-size", "two-seater"]
    ),
    "Sedan or smaller",
    "SUV or larger",
)

# group model year into 2012-2021 and 2022-2023

electric_range["model_year_grouped"] = np.where(
    electric_range["model_year"] <= 2021, "2012-2021", "2022-2023"
)

In [23]:
hue_button = widgets.Dropdown(
    options=["vehicle_size", "model_year_grouped", None],
    description="(Un)select Hue:",
    disabled=False,
    style={"description_width": "initial"},
)


def draw_scatter_electric_range(hue):
    plt.figure(figsize=(10, 5), dpi=300)
    sns.scatterplot(
        data=electric_range, x="recharge_time_h", y="range1_km", hue=hue
    )  # noqa E501
    # avoid a "by None" title when no hue is selected
    title = "Scatter Plot of Electric Vehicle Range and Recharge Time"
    if hue is not None:
        title += f" by {hue}"
    plt.title(title)
    plt.xlabel("Recharge Time (hrs)")
    plt.ylabel("Range (km)")
    plt.show()


interact(draw_scatter_electric_range, hue=hue_button)

interactive(children=(Dropdown(description='(Un)select Hue:', options=('vehicle_size', 'model_year_grouped', N…

<function __main__.draw_scatter_electric_range(hue)>

### Interesting Insights

The above scatterplot helps us compare the ranges and charging times of electric cars by their size or model year. Although one could expect longer recharge times (depending on the car's battery size, quality, etc.) to go hand in hand with greater ranges, the graph offers more details that are worth exploring. For example, recently manufactured electric cars (2022 onwards) have a much higher range, on average, than those manufactured between 2012 and 2021. This is likely due to advancements in battery technology and the increased demand for electric cars. Moreover, some recently manufactured electric cars provide a better range with 10 hours of recharge time than earlier ones do with 12 hours. Furthermore, some new electric cars with recharge times of 10 hours provide as good a range as both new and older electric cars with recharge times greater than 10 hours (13 hours being the outlier). Maybe 10 hours is the sweet spot for recharge time?

If we shift our focus to vehicle size, there are more electric sedans (and smaller) than SUVs (and larger) at lower recharge times of between 4 and 7 hours, which is expected given the difference in car sizes. Sedans, on average, also seem to provide greater ranges than SUVs for recharge times greater than 7 hours. However, for recharge times of less than 7 hours, SUVs provide greater ranges than sedans. This could be because SUVs have larger batteries and, therefore, can travel greater ranges with less recharge time. Moreover, some sedans with 10 hours of recharge time provide better ranges than all SUVs do with more than 10 hours of recharge time!

Therefore, consumers have a wide range of options to choose from when it comes to electric cars! Choosing wisely by assessing the tradeoff between recharge time and range is key and this graph helps us do just that.
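
One simple way to quantify this tradeoff is range gained per hour of recharging. The sketch below ranks a few made-up cars by that ratio; the names and numbers are hypothetical and are not taken from the dataset, and `ElectricCar` is an illustrative class only.

```python
from dataclasses import dataclass


@dataclass
class ElectricCar:
    name: str
    range_km: float
    recharge_time_h: float

    @property
    def km_per_recharge_hour(self):
        """Range gained per hour of recharging."""
        return self.range_km / self.recharge_time_h


# Made-up cars for illustration
cars = [
    ElectricCar("car_a", 400, 10),
    ElectricCar("car_b", 420, 12),
    ElectricCar("car_c", 240, 4),
]

ranked = sorted(cars, key=lambda c: c.km_per_recharge_hour, reverse=True)
for car in ranked:
    print(car.name, car.km_per_recharge_hour)
```

A single ratio hides the absolute range, so it is best read alongside the scatter plot rather than instead of it.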

## Histogram of $CO_2$ Emissions by Vehicle and Fuel Type

In [24]:
%%sql --save hist_co2 --no-execute
SELECT vehicle_type, mapped_fuel_type, co2emissions_g_km
FROM all_vehicles
WHERE co2emissions_g_km IS NOT NULL

In [25]:
b = widgets.IntSlider(
    value=10,
    min=1,
    max=20,
    step=1,
    description="Bins:",
    orientation="horizontal",
)
cmap = widgets.Dropdown(
    options=["viridis", "plasma", "inferno", "magma", "cividis"],
    value="plasma",
    description="Colormap:",
    disabled=False,
)
fill = widgets.RadioButtons(
    options=["vehicle_type", "mapped_fuel_type"],
    description="Fill by:",
    disabled=False,
)

In [26]:
def plot(b, cmap, fill):
    (
        ggplot(
            table="hist_co2",
            with_="hist_co2",
            mapping=aes(x="co2emissions_g_km"),
        )  # noqa E501
        + geom_histogram(bins=b, fill=fill, cmap=cmap)
    )


interact(plot, b=b, cmap=cmap, fill=fill)

interactive(children=(IntSlider(value=10, description='Bins:', max=20, min=1), Dropdown(description='Colormap:…

<function __main__.plot(b, cmap, fill)>

### Interesting Insights

The histogram above represents the distribution of $CO_2$ emissions, measured in grams per kilometer. If we set the `fill` attribute to `vehicle_type`, we see clearly that fuel-only cars emit the most $CO_2$. In fact, they can emit up to 6x more than hybrid cars! Hybrid cars have both an electric motor and a gasoline engine, which allows them to emit less $CO_2$ than fuel-only cars. Emissions from hybrid vehicles range from 10 to 80 grams per kilometer, while those from fuel-only cars range from 100 to 500 grams per kilometer, with the bulk of vehicles emitting between 200 and 300 grams per kilometer. Electric cars have zero carbon dioxide emissions and are, hence, fittingly also known as zero-emission vehicles.

Given these findings, the efforts of the Canadian government to increase the supply of electric vehicles in Canada by 2035[$^2$](https://www.canada.ca/en/environment-climate-change/news/2022/12/let-it-roll-government-of-canada-moves-to-increase-the-supply-of-electric-vehicles-for-canadians.html) will likely have a positive impact on the environment. 

Setting the `fill` attribute to `mapped_fuel_type` and adjusting the histogram to 12 bins shows that the majority of vehicles in Canada run on gasoline. Premium gasoline appears more harmful to the environment than regular, as it is the only fuel type with which some cars emit more than 450 grams per kilometer. However, since most cars run on regular gasoline, it occupies a larger area of the histogram. Diesel and ethanol (E85) are slightly cleaner than gasoline: their emissions range from 150 to 400 grams per kilometer, with the bulk of vehicles emitting between 200 and 300 grams per kilometer (similar to both gasoline types).
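
Changing the number of bins changes how the same emissions are grouped. The sketch below shows the idea with equal-width bins, which is assumed here to be how the histogram is built. The emission values are made up and `histogram_counts` is a hypothetical helper.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Assumes `values` contains at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Made-up CO2 emissions in g/km
emissions = [100, 150, 210, 230, 250, 260, 290, 340, 420, 500]
edges, counts = histogram_counts(emissions, bins=4)
print(edges)
print(counts)
```

More bins give narrower intervals, which is why a detail such as emissions above 450 grams per kilometer only becomes visible once the bin count is high enough.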

## $CO_2$ Emissions of Hybrid and Fuel-Only US Car Brands by Transmission Type

In [27]:
%%sql --save co2_usa --no-execute
SELECT vehicle_type, make_, co2emissions_g_km, transmission_type
FROM all_vehicles
WHERE co2emissions_g_km is not null AND
vehicle_type IN ('fuel-only', 'hybrid') AND
make_ IN ('cadillac', 'chevrolet', 'chrysler', 'ford', 'jeep', 'lincoln')

In [28]:
co2_usa = %sql SELECT * FROM co2_usa
co2_usa = co2_usa.DataFrame()



In [29]:
hue_button = widgets.Dropdown(
    options=["vehicle_type", "transmission_type", None],
    description="(Un)select Hue:",
    disabled=False,
    style={"description_width": "initial"},
)


def draw_boxplot_usa(hue):
    plt.figure(figsize=(15, 6), dpi=300)
    sns.boxplot(data=co2_usa, x="make_", y="co2emissions_g_km", hue=hue)
    plt.xticks(rotation=90)
    plt.xlabel("Car Make")
    plt.ylabel("CO2 Emissions (g/km)")
    plt.title("CO2 Emissions (g/km) by Gas and Hybrid Run US Car Brands")
    plt.show()


interact(draw_boxplot_usa, hue=hue_button)

interactive(children=(Dropdown(description='(Un)select Hue:', options=('vehicle_type', 'transmission_type', No…

<function __main__.draw_boxplot_usa(hue)>

### Interesting Insights

The boxplots above show the distribution of $CO_2$ emissions for hybrid and fuel-only cars from US brands. Viewing the boxplot at its highest level, i.e., without a `hue`, suggests that Chrysler has the lowest median $CO_2$ emission of the six brands, at around 250 grams per kilometer. Chevrolet, on the other hand, has the highest median, at around 300 grams per kilometer. Chrysler also has the lowest interquartile range, which could imply that the $CO_2$ emissions of its cars are more consistent than those of the other brands.
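
The per-brand median and interquartile range that these boxplots display can also be computed directly. The sketch below groups made-up `(make, emission)` rows with the standard library; the numbers are hypothetical and chosen only to echo the pattern described above.

```python
import statistics
from collections import defaultdict

# Made-up (make_, co2emissions_g_km) rows
rows = [
    ("chrysler", 240), ("chrysler", 250), ("chrysler", 260),
    ("chevrolet", 200), ("chevrolet", 300), ("chevrolet", 400),
]

# Collect emissions per brand
by_make = defaultdict(list)
for make, co2 in rows:
    by_make[make].append(co2)

# Median and interquartile range per brand
summary = {}
for make, values in by_make.items():
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary[make] = {"median": median, "iqr": q3 - q1}

print(summary)
```

A narrow interquartile range, as for the toy Chrysler rows, is what the text above reads as more consistent emissions across a brand's models.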

However, upon selecting `hue` as `vehicle_type`, we see that Chevrolet's hybrid cars have the lowest median $CO_2$ emission out of all US hybrid car brands. Yet, its fuel-only cars pollute the most on average. Jeep's hybrid cars pollute the most, on average, out of all US hybrid car brands, while its fuel-only cars' median $CO_2$ emissions are on par with Chrysler's, the cleanest fuel-only US brand.

Lastly, the boxplot of $CO_2$ emissions for hybrid and fuel-only US-brand cars by transmission type shows that cars with a continuously variable transmission pollute the least of all available transmission types. These cars likely correspond to the hybrid cars of the US brands, which are the cleanest of all hybrid cars. Another interesting observation is that all brands, apart from Chrysler, have lower median $CO_2$ emissions for manual transmission cars than for automatic transmission cars. For context, the Environmental Protection Agency (EPA) found that vehicles with a manual transmission were more efficient than their automatic counterparts through about 2010, but that modern automatic transmissions are now more efficient [$^3$](https://www.epa.gov/sites/default/files/2021-01/documents/420r21003.pdf). Only Ford has an automated manual transmission available for its cars. Its $CO_2$ emissions have a notably wide distribution, similar to that of Cadillac's continuously variable transmission cars, but a median that is lower than that of Ford's automatic transmission cars.

## References

Canada, Service. “Government of Canada.” Service Canada, n.d. https://www.canada.ca/. 

The 2020 EPA Automotive Trends Report: Greenhouse gas emissions, fuel ..., n.d. https://www.epa.gov/sites/default/files/2021-01/documents/420r21003.pdf. 