# Car Fuel Emissions Dataset Dashboard

> This dashboard helps you explore the $CO_2$ emissions of different car models and their fuel consumption.

<b>[Data](https://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64) Updated as of June 15, 2023</b>

In [1]:
import ipywidgets as widgets
from ipywidgets import interact

import matplotlib.pyplot as plt
import seaborn as sns

from IPython import get_ipython

from sql.ggplot import ggplot, aes, geom_boxplot, geom_histogram

import numpy as np

In [2]:
%load_ext sql

%sql duckdb:///../data/database/car_data.duckdb

%config SqlMagic.displaycon = False

In [3]:
years = %sql select DISTINCT(model_year) from all_vehicles
years = [model_year[0] for model_year in years]

makes = %sql select DISTINCT(make_) from all_vehicles
makes = [m[0] for m in makes]

classes = %sql select DISTINCT(vehicleclass_) from all_vehicles
classes = [c[0] for c in classes]

co2 = %sql select DISTINCT(co2_rating) from all_vehicles
co2 = [c[0] for c in co2]

In [4]:
def init_widgets():
    """Initialize widgets"""
    widget_year = widgets.SelectMultiple(
        options=years,
        description="Model Year",
        value=years,
    )

    widget_make = widgets.SelectMultiple(
        options=makes,
        description="Car Brand",
        value=makes,
    )

    widget_vehicle_class = widgets.SelectMultiple(
        options=classes,
        description="Vehicle Class (Size)",
        value=classes,
        style={"description_width": "initial"},
    )

    widget_co2 = widgets.IntSlider(
        value=5,
        min=0,
        max=10,
        step=1,
        description="CO2 Rating >=",
        disabled=False,
        style={"description_width": "initial"},
    )

    widget_row = widgets.IntSlider(
        value=5,
        min=0,
        max=10,
        step=1,
        description="Rows to Show",
        disabled=False,
        style={"description_width": "initial"},
    )
    return (
        widget_year,
        widget_make,
        widget_vehicle_class,
        widget_co2,
        widget_row,
    )  # noqa E501

In [5]:
ip = get_ipython()
sql_magic = ip.find_cell_magic("sql")

## Visualizing Interactive Tables

> Multiple values can be selected with shift and/or ctrl (or command) pressed and mouse clicks or arrow keys.
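
Each multi-select yields a tuple of chosen values, which the queries below filter on with `IN`. The sketch here shows the same kind of filter outside the notebook. It is an illustration under assumptions: the standard-library `sqlite3` module stands in for DuckDB, the `vehicles` table and its rows are made up, and `filter_vehicles` is a hypothetical helper that is not part of the dashboard.

```python
import sqlite3

# Toy stand-in for the vehicle tables; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicles (model_year INTEGER, make_ TEXT, co2_rating INTEGER)"
)
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?, ?)",
    [
        (2021, "acura", 4),
        (2022, "acura", 6),
        (2022, "ford", 7),
        (2023, "ford", 3),
        (2023, "kia", 8),
    ],
)


def filter_vehicles(years, makes, min_co2, limit):
    """Return rows matching the selected years and makes, capped at `limit`."""
    # One "?" placeholder per selected value keeps the IN lists parameterized.
    year_marks = ", ".join("?" for _ in years)
    make_marks = ", ".join("?" for _ in makes)
    query = (
        "SELECT model_year, make_, co2_rating FROM vehicles "
        f"WHERE model_year IN ({year_marks}) "
        f"AND make_ IN ({make_marks}) "
        "AND co2_rating >= ? "
        "ORDER BY model_year, make_ "
        "LIMIT ?"
    )
    return conn.execute(query, (*years, *makes, min_co2, limit)).fetchall()


selected = filter_vehicles((2022, 2023), ("acura", "ford"), 5, 10)
print(selected)
```

Binding the selected values as parameters, rather than formatting them into the SQL string, is a design choice of this sketch and not a description of how the `--interact` templating works internally.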

### Fuel Cars Only

In [6]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [7]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM fuel 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

### Hybrid Cars Only

In [8]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [9]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM hybrid 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

### Electric Cars Only

In [10]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [11]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_row
SELECT * 
FROM electric 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
LIMIT {{widget_row}}

### All Cars

In [12]:
(
    widget_year,
    widget_make,
    widget_vehicle_class,
    widget_co2,
    widget_row,
) = init_widgets()  # noqa E501

In [13]:
%%sql --interact widget_year --interact widget_make --interact widget_vehicle_class --interact widget_co2 --interact widget_row
SELECT * 
FROM all_vehicles 
WHERE model_year IN {{widget_year}}
AND make_ IN {{widget_make}}
AND vehicleclass_ IN {{widget_vehicle_class}}
AND co2_rating >= {{widget_co2}}
LIMIT {{widget_row}}

## Plots

1. Bar plot with three groups (fuel, hybrid, electric). X axis is `model_year` and y axis is `num_vehicles`. This way we can examine car manufacturing trends - ggplot API
2. Scatter plot of electric vehicle ranges and charging time (by vehicle class?)- seaborn 
3. Bar plot with three groups (fuel, hybrid, electric). X axis is `vehicleclass_` and y axis is `num_vehicles`. 
4. Bubble plot of fuel vehicles, x axis is `co2emissions_g(g/km)`, y axis is `fuelconsumption_comb(mpg)`, and bubble size is `enginesize_(l)`
5. Boxplot for statistical comparison of `fuelconsumption_city` across all three groups (fuel, hybrid, electric) -> tough to do in a single CTE because of the different column names. Could do it separately (1 boxplot for each group) or look at another variable.
6. Heatmap (makes sense for only Gas Cars) of `fuelconsumption_comb_l_100km`, `enginesize_l`, `cylinders_`, `co2emissions_g_km`, `co2_rating`, `smog_rating`, and `number_of_gears`. CTE is not possible, will have to use `df.corr()`. Don't count this for CTE use so technically still at 5.
7. Histogram of `co2emissions_g_km` with widgets for `cmap`, `fill`, and `bins`. Fill cols include `vehicle_type` and `mapped_fuel_type`. (hybrid and fuel-only) -> outputting 2 plots (bug?)
8. Boxplot with seaborn for `co2emissions_g_km` by `make_` and `vehicleclass_` (hybrid and fuel-only)  
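
Item 6 above mentions `df.corr()` as the route to the heatmap. A minimal sketch of that step follows, assuming pandas is available. The column names come from the list, but the values are hypothetical and only three columns are shown.

```python
import pandas as pd

# Hypothetical values for three of the listed columns; not taken from the dataset.
toy = pd.DataFrame(
    {
        "enginesize_l": [1.5, 2.0, 2.5, 3.5, 5.0],
        "cylinders_": [4, 4, 4, 6, 8],
        "co2emissions_g_km": [150, 180, 200, 260, 340],
    }
)

# Pairwise Pearson correlation between the numeric columns.
corr = toy.corr()
print(corr.round(2))

# The matrix could then be drawn with, for example, seaborn's heatmap:
# sns.heatmap(corr, annot=True, vmin=-1, vmax=1)
```

The resulting matrix is symmetric with ones on the diagonal, so a heatmap of it shows each pair of variables twice.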

## Bar Plot of Car Manufacturing Trends

In [14]:
%%sql --save q_1_hybrid_electric --no-execute
SELECT model_year, vehicle_type, COUNT(id) AS num_vehicles
FROM all_vehicles
WHERE vehicle_type = 'hybrid' OR vehicle_type = 'electric'
GROUP BY model_year, vehicle_type
ORDER BY num_vehicles DESC;

In [15]:
%%sql --save q_1_fuel --no-execute
SELECT model_year, vehicle_type, COUNT(id) AS num_vehicles
FROM fuel
GROUP BY model_year, vehicle_type
ORDER BY model_year;
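
The grouped counts that feed the bar plot can be checked on a toy table. In this sketch the standard-library `sqlite3` module stands in for DuckDB, and the `vehicles` table and its rows are made up. Grouping by `model_year` and `vehicle_type` returns one row per pair, so no further de-duplication is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicles (id INTEGER, model_year INTEGER, vehicle_type TEXT)"
)
# Made-up rows: two hybrids and one electric in 2022, three electrics in 2023.
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?, ?)",
    [
        (1, 2022, "hybrid"),
        (2, 2022, "hybrid"),
        (3, 2022, "electric"),
        (4, 2023, "electric"),
        (5, 2023, "electric"),
        (6, 2023, "electric"),
    ],
)

counts = conn.execute(
    """
    SELECT model_year, vehicle_type, COUNT(id) AS num_vehicles
    FROM vehicles
    GROUP BY model_year, vehicle_type
    ORDER BY model_year, vehicle_type
    """
).fetchall()
print(counts)
```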

In [16]:
hybrid_electric_count = %sql SELECT * FROM q_1_hybrid_electric
fuel_count = %sql SELECT * FROM q_1_fuel

hybrid_electric_count = hybrid_electric_count.DataFrame()
fuel_count = fuel_count.DataFrame()

In [17]:
radio_button = widgets.RadioButtons(
    options=["fuel_count", "hybrid_electric_count"],
    description="Select Data:",
    disabled=False,
    style={"description_width": "initial"},
)


def draw_bar_year_count(data):
    plt.figure(figsize=(10, 5), dpi=300)
    if data == "fuel_count":
        sns.barplot(
            data=fuel_count,
            x="model_year",
            y="num_vehicles",
            color="orange",
            errorbar=None,
            width=0.4,
        )
        sns.pointplot(
            data=fuel_count,
            x="model_year",
            y="num_vehicles",
            color="red",
            linestyles="--",
            ax=plt.gca(),
            errorbar=None,
        )
        plt.xlabel("Car Model Year")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        plt.title("Count of Unique Fuel-Only Cars by Model Year")
        plt.show()
    else:
        sns.barplot(
            data=hybrid_electric_count,
            x="model_year",
            y="num_vehicles",
            hue="vehicle_type",
            palette={"hybrid": "blue", "electric": "green"},
            width=0.4,
        )
        sns.pointplot(
            data=hybrid_electric_count,
            x="model_year",
            y="num_vehicles",
            color="red",
            linestyles="--",
            ax=plt.gca(),
            errorbar=None,
        )
        plt.xlabel("Car Model Year")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        plt.title("Count of Unique Hybrid and Electric Cars by Model Year")
        plt.legend(bbox_to_anchor=(1, 1), loc="upper right")
        plt.show()


interact(draw_bar_year_count, data=radio_button)

### Interesting Insights

From the bar plot of fuel-only cars, we can see that the <b>number of unique car models</b> introduced to the Canadian automobile market increased from the turn of the 21st century until 2005. The trend then plateaued and remained fairly constant until 2022, with 2015 showing the largest spike. On December 21, 2022, Steven Guilbeault, Canada's minister of environment and climate change, unveiled a regulation that would require an increasing percentage of vehicle sales in Canada to be zero-emission vehicles, reaching 100% by 2035[$^1$](https://www.canada.ca/en/environment-climate-change/news/2022/12/let-it-roll-government-of-canada-moves-to-increase-the-supply-of-electric-vehicles-for-canadians.html). These efforts seem to have had an immediate impact on the number of fuel-only cars introduced to the Canadian market, with 2023 showing a sharp decline to 2003 levels.

The above insights are reinforced by the bar plot of the number of unique hybrid and electric car models introduced to the Canadian automobile market. In 2012, only two electric car models, Nissan's Leaf and Mitsubishi's i-MiEV, and one hybrid car model, Chevrolet's Volt, were present in the market. By 2023, these figures had grown to 134 electric car models and 32 hybrid car models in Canada.

## Boxplot of Fuel Consumption for All Vehicle Types

In [27]:
%%sql --save boxplot_fuel_consum --no-execute
SELECT fuelconsumption_city_l_100km, fuelconsumption_hwy_l_100km, fuelconsumption_comb_l_100km	
FROM all_vehicles

In [28]:
columns = widgets.SelectMultiple(
    options=[
        "fuelconsumption_city_l_100km",
        "fuelconsumption_hwy_l_100km",
        "fuelconsumption_comb_l_100km",
    ],
    value=["fuelconsumption_comb_l_100km"],
    description="Column(s):",
    disabled=False,
)

In [29]:
plt.rcParams["figure.figsize"] = (12, 3)  # increase size of canvas


def plot(columns):
    (
        ggplot(
            table="boxplot_fuel_consum",
            with_="boxplot_fuel_consum",
            mapping=aes(x=columns),
        )
        + geom_boxplot()
    )


interact(plot, columns=columns)

### Interesting Insights

The three available boxplots above show the distribution of fuel consumption in the city, on the highway, and combined, for all types of cars. The median city fuel consumption across all cars is around 12 litres per 100 kilometers, while the median highway fuel consumption is around 10 litres per 100 kilometers. The combined figure averages each vehicle's city and highway consumption and sits at around 11 litres per 100 kilometers.
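
To make the boxplot's summary statistics concrete, the sketch below computes the median, quartiles, and outlier fences for a handful of hypothetical consumption values. It assumes NumPy and the conventional 1.5 × IQR whisker rule; the plotting library may compute its whiskers differently.

```python
import numpy as np

# Hypothetical combined consumption figures in L/100 km; not from the dataset.
values = np.array([6.0, 8.0, 9.5, 11.0, 12.5, 14.0, 18.0])

# Quartiles and median form the box and its centre line.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Conventional rule: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers.tolist()}")
```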

## Scatter Plot of Electric Vehicle Ranges and Charging Time by Car Size and Model Year 

In [30]:
%%sql --save q_2_electric_range --no-execute
SELECT range1_km, recharge_time_h, vehicleclass_, model_year
FROM electric

In [31]:
electric_range = %sql SELECT * FROM q_2_electric_range

electric_range = electric_range.DataFrame()

# group vehicle class into sedan or SUV

electric_range["vehicle_size"] = np.where(
    electric_range["vehicleclass_"].isin(
        ["subcompact", "compact", "mid-size", "full-size", "two-seater"]
    ),
    "Sedan or smaller",
    "SUV or larger",
)

# group model year into 2012-2021 and 2022-2023

electric_range["model_year_grouped"] = np.where(
    electric_range["model_year"] <= 2021, "2012-2021", "2022-2023"
)

In [33]:
hue_button = widgets.Dropdown(
    options=["vehicle_size", "model_year_grouped", None],
    description="(Un)select Hue:",
    disabled=False,
    style={"description_width": "initial"},
)


def draw_scatter_electric_range(hue):
    plt.figure(figsize=(10, 5), dpi=300)
    sns.scatterplot(
        data=electric_range, x="recharge_time_h", y="range1_km", hue=hue
    )  # noqa E501
    plt.title(
        f"Scatter Plot of Electric Vehicle Range and Recharge Time by {hue}"
    )  # noqa E501
    plt.xlabel("Recharge Time (hrs)")
    plt.ylabel("Range (km)")
    plt.show()


interact(draw_scatter_electric_range, hue=hue_button)

### Interesting Insights

The above scatterplot helps us compare the ranges and charging times of electric cars by their size or model year. Although one could deduce that higher recharge times (depending on the car's battery size, quality, etc.) would lead to travelling greater ranges, the graph offers more details that are worth exploring. For example, electric cars manufactured recently (2022 and onwards) have a much higher range, on average, than those manufactured between 2012 and 2021. This is likely due to the advancements in battery technology and the increased demand for electric cars. Moreover, some electric cars recently manufactured provide a better range with 10 hours of recharge time than those manufactured previously with 12 hours of recharge time. Furthermore, some new electric cars with recharge times of 10 hours provide as good a range as both new and older electric cars with recharge times greater than 10 hours (13 hours being the outlier). Maybe 10 hours is the sweet spot for recharge time?

If we shift our focus to vehicle size, there are more electric sedans (and smaller) than SUVs (and larger) at lower recharge times of between 4 and 7 hours, which is expected given the difference in car sizes. Sedans, on average, also seem to provide greater ranges than SUVs for recharge times greater than 7 hours. However, for recharge times less than 7 hours, SUVs provide greater ranges than sedans. This could be because SUVs have larger batteries and, therefore, can travel greater ranges with less recharge time. Moreover, some sedans with 10 hours of recharge time provide better ranges than all SUVs with more than 10 hours of recharge time!

Therefore, consumers have a wide range of options to choose from when it comes to electric cars! Choosing wisely by assessing the tradeoff between recharge time and range is key and this graph helps us do just that.
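
The "on average" comparisons above can be backed by a grouped summary. The sketch below assumes pandas and NumPy and uses hypothetical rows; it reuses the notebook's model-year split and reports the mean range and recharge time per group.

```python
import numpy as np
import pandas as pd

# Hypothetical electric-vehicle rows; values are illustrative only.
toy = pd.DataFrame(
    {
        "model_year": [2015, 2019, 2021, 2022, 2023, 2023],
        "recharge_time_h": [7.0, 8.0, 12.0, 10.0, 10.0, 9.0],
        "range1_km": [135, 240, 380, 420, 460, 400],
    }
)

# Same split as the notebook: up to 2021 versus 2022 onwards.
toy["model_year_grouped"] = np.where(
    toy["model_year"] <= 2021, "2012-2021", "2022-2023"
)

# Mean range and recharge time for each model-year group.
summary = toy.groupby("model_year_grouped")[["range1_km", "recharge_time_h"]].mean()
print(summary)
```

Swapping `model_year_grouped` for `vehicle_size` would give the equivalent comparison between sedans and SUVs.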

## Histogram of $CO_2$ Emissions by Vehicle and Fuel Type

In [21]:
%%sql --save hist_co2 --no-execute
SELECT vehicle_type, mapped_fuel_type, co2emissions_g_km	
FROM all_vehicles
WHERE co2emissions_g_km is not null 

In [22]:
b = widgets.IntSlider(
    value=10,
    min=1,
    max=20,
    step=1,
    description="Bins:",
    orientation="horizontal",
)
cmap = widgets.Dropdown(
    options=["viridis", "plasma", "inferno", "magma", "cividis"],
    value="plasma",
    description="Colormap:",
    disabled=False,
)
fill = widgets.RadioButtons(
    options=["vehicle_type", "mapped_fuel_type"],
    description="Fill by:",
    disabled=False,
)

In [23]:
def plot(b, cmap, fill):
    (
        ggplot(
            table="hist_co2",
            with_="hist_co2",
            mapping=aes(x="co2emissions_g_km"),
        )  # noqa E501
        + geom_histogram(bins=b, fill=fill, cmap=cmap)
    )


interact(plot, b=b, cmap=cmap, fill=fill)

### Interesting Insights

The histogram above shows the distribution of $CO_2$ emissions, measured in grams per kilometer. Setting the `fill` attribute to `vehicle_type` makes it clear that fuel-only cars emit the most $CO_2$. In fact, they can pollute up to 6x more than hybrid cars! Hybrid cars have both an electric motor and a gasoline engine, which allows them to emit less $CO_2$ than fuel-only cars. Hybrid vehicles emit between 10 and 80 grams per kilometer, while fuel-only cars emit between 100 and 500 grams per kilometer, with the bulk of vehicles emitting between 200 and 300 grams per kilometer. Electric cars have zero carbon dioxide emissions and are, hence, fittingly also known as zero-emission vehicles.

Given these findings, the efforts of the Canadian government to increase the supply of electric vehicles in Canada by 2035[$^2$](https://www.canada.ca/en/environment-climate-change/news/2022/12/let-it-roll-government-of-canada-moves-to-increase-the-supply-of-electric-vehicles-for-canadians.html) will likely have a positive impact on the environment. 

Setting the `fill` attribute to `mapped_fuel_type` and adjusting the histogram to 12 bins shows that the majority of vehicles in Canada run on gasoline. Premium gasoline appears more harmful to the environment than regular, as it is the only fuel type that exceeds 450 grams per kilometer in some cars. However, since most cars run on regular gasoline, it occupies a larger area of the histogram. Diesel and Ethanol (E85) are slightly cleaner than gasoline: their emissions range from 150 to 400 grams per kilometer, with the bulk of vehicles emitting between 200 and 300 grams per kilometer (similar to both gasoline types).
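
A filled histogram compares groups fairly only if every group is counted against the same bins. The sketch below shows one way to do that with NumPy on hypothetical emissions values. It illustrates the idea and is not a description of how the plotting library bins its data.

```python
import numpy as np

# Hypothetical emissions in g/km, keyed by vehicle type; not from the dataset.
emissions = {
    "fuel": np.array([120, 210, 230, 250, 280, 310, 450]),
    "hybrid": np.array([20, 35, 50, 70]),
}

# Compute the bin edges once over all values so every group is counted
# against the same 12 bins.
all_values = np.concatenate(list(emissions.values()))
edges = np.histogram_bin_edges(all_values, bins=12)

# Per-group counts for each shared bin.
counts = {
    vehicle_type: np.histogram(values, bins=edges)[0]
    for vehicle_type, values in emissions.items()
}

for vehicle_type, per_bin in counts.items():
    print(vehicle_type, per_bin.tolist())
```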

## References

Canada, Service. “Government of Canada.” Service Canada, n.d. https://www.canada.ca/. 