# The relation between car use and fuel prices

TIL Python Project group 9
* Tessa van de Hulst (4963601)
* Maartje van den Broek (4964837)
* Lara de Geus (4965868)
* Pien Biersteker (4888375)

The group contribution statement can be found at the bottom of the file.

## Introduction & research questions 

In the past couple of months, fuel prices have been increasing and fluctuating a lot because of multiple factors. For the end users of fuel, this could have influenced their usage and travel behavior. But have fuel prices always influenced the choice of travel mode in the past decades? This research focuses on the relation between fuel prices (petrol, diesel and LPG) and car use in the Netherlands in the years 2006 through 2019. The main research question therefore reads: 
“How does the fluctuation of fuel prices (petrol, diesel and LPG) affect the car utilization per year in the Netherlands?”

To further explore the main research question, three subquestions have been established: 
1. How do the fuel prices fluctuate over the years in the Netherlands? 
2. Does the fuel type affect the annual mileage of cars in the Netherlands? 
3. How does the type of property (company/private) affect the annual mileage of cars in the Netherlands? 

These subquestions have been investigated and are used for answering the main research question. 

## Data used 

Two data sets are used in this research. 
   * Fuel prices data set 2006 through 2019
       - https://opendata.cbs.nl/statline/portal.html?_la=nl&_catalog=CBS&tableId=80416ned&_theme=426 

   * Traffic performance data set 2006 through 2019
       - https://opendata.cbs.nl/#/CBS/nl/dataset/80428ned/table
       
These data sets are adjusted and made suitable for this research in the data processing notebook.

## Part 1: Import libraries and combined data sets 

In [22]:
# First, the necessary libraries were imported.
import pandas as pd
import numpy as np
import math
import scipy
import itertools
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

As visible in the table below, both data sets are combined into one table. This is done in the Data Processing Notebook. Because the data sets are combined into one table, they can be used directly to plot figures and graphs.  

In [5]:
# In this piece of code the combined dataset, created in the data processing notebook, is imported and the first 5 rows are printed
file_path = "downloads/total_data_traffic.csv"
traffic_data = pd.read_csv(file_path, delimiter = ',')
traffic_data.head()

Unnamed: 0,Property,Fuel type,Date,Annual mileage in the Netherlands (km),Annual Mileage Company (km),Annual Mileage Private (km),Fuel price,Annual mileage Petrol (km),Annual mileage Diesel (km),Annual mileage LPG (km)
0,Total,Petrol,2006,9711,,,1.373255,9711.0,,
1,Total,Petrol,2007,9667,,,1.414156,9667.0,,
2,Total,Petrol,2008,9359,,,1.476393,9359.0,,
3,Total,Petrol,2009,9379,,,1.354011,9379.0,,
4,Total,Petrol,2010,9369,,,1.503186,9369.0,,


The adjusted Fuel prices data set is also imported, because this data set is necessary for the first subquestion. This data set is made suitable in the Data Processing Notebook as well.

In [6]:
# In this piece of code the fuel prices data set, adjusted in the data processing notebook, is imported and the first 5 rows are printed
file_path2 = "downloads/fueldata.csv"
df_fuel = pd.read_csv(file_path2)
df_fuel.head()

Unnamed: 0,Date,Petrol,Diesel,LPG
0,01-01-2006,1.325,1.003,0.543
1,02-01-2006,1.328,1.007,0.542
2,03-01-2006,1.332,1.007,0.54
3,04-01-2006,1.348,1.02,0.55
4,05-01-2006,1.347,1.021,0.55


## Subquestion 1

Subquestion 1: How do the fuel prices fluctuate over the years in the Netherlands?

To answer this subquestion, the daily prices of the three fuel types are plotted in a line graph for the years 2006 through 2019. The prices of all three fuel types fluctuate over this period, and the figure shows that the three lines follow a quite similar course. LPG shows the least extreme fluctuations, probably because its price is generally lower than that of petrol and diesel. Around 2008 there was a big decrease in the price of all three fuel types. This could be due to the economic crisis in those years. Another example is the decrease in price around the end of 2015. OPEC normally cuts production when oil prices fall, but it did not do so in 2015 [1]. As a result, supply exceeded demand, which led to low fuel prices. 

When the fuel price fluctuates a lot within a year, car usage in that year may change. For example, when the fuel price increases sharply during the year, as is happening now, people might use their car less than before. It is therefore useful to take the fluctuations into account when answering the research questions.   

[1] https://www.unitedconsumers.com/tanken/nieuws/2015/12/16/brandstofprijzen-historisch-laag.jsp# 

In [25]:
# In this code a line graph of the fuel prices is created with the adjusted fuel prices data set
# The chart shows the values of fuel prices per day from 2006 through 2019. 
labels= ['Petrol', 'Diesel', 'LPG']
fig1 = px.line(df_fuel, x='Date', y= labels)

#these lines of code are used to transform the layout of the graph
fig1.update_layout(
    title="Fluctuation in fuel prices from 2006 through 2019",
    xaxis_title="Date",
    yaxis_title="Price (€)",
    legend_title="Fuel type",
)

fig1.show()
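
The fluctuation that the line graph shows visually can also be summarized in numbers. The block below is a minimal sketch, not part of the original analysis: it assumes the `Date` column is stored as day-month-year text (as the first rows of the fuel data suggest) and uses a small made-up sample instead of `fueldata.csv`. The helper name `yearly_fluctuation` is hypothetical. It reports the mean and standard deviation of each fuel price per calendar year.

```python
import pandas as pd

# Hypothetical sample in the same layout as the fuel prices data set (made-up rows).
sample = pd.DataFrame({
    "Date": ["01-01-2006", "02-01-2006", "01-01-2007", "02-01-2007"],
    "Petrol": [1.325, 1.328, 1.400, 1.420],
    "Diesel": [1.003, 1.007, 1.050, 1.070],
    "LPG": [0.543, 0.542, 0.560, 0.580],
})

def yearly_fluctuation(df):
    """Return the mean and standard deviation of each fuel price per calendar year."""
    out = df.copy()
    # Assumption: dates are day-month-year, so 02-01-2006 is 2 January 2006.
    out["Date"] = pd.to_datetime(out["Date"], format="%d-%m-%Y")
    fuel_columns = ["Petrol", "Diesel", "LPG"]
    return out.groupby(out["Date"].dt.year)[fuel_columns].agg(["mean", "std"])

summary = yearly_fluctuation(sample)
print(summary.round(4))
```

A larger standard deviation within a year would indicate stronger price fluctuation in that year. Parsing the dates explicitly also lets plotly treat the x-axis as a time axis rather than as text labels.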

## Subquestion 2

Subquestion 2: Does the fuel type affect the annual mileage of cars in the Netherlands? 

For the second subquestion, it is analyzed how the price of each fuel type relates to the average number of kilometers driven per vehicle per year in the Netherlands. To investigate the relationship between the two variables, a scatterplot is made for each of the three fuel types. In these plots, no distinction is made for property status because only the influence of the different fuel types is examined.
 
The average fuel price of petrol is relatively high, between 1.35 and 1.76 euro per liter. This goes together with the lowest number of kilometers driven of the three fuel types, namely about 9,500 kilometers per vehicle per year on average. Furthermore, a very small decrease in the average number of kilometers driven can be seen when the price of petrol increases. It can be concluded that an increase in the price of petrol goes together with a decrease in the number of kilometers driven per vehicle. However, this is only a very small decrease. 
 
The fuel price of diesel is lower than that of petrol and ranges between 1.00 and 1.44 euro per liter. The lower price goes together with a higher number of kilometers driven per year, namely between 19,000 and 22,000 kilometers. From the plot it can be concluded that when the price of diesel increases, the average number of kilometers driven per year decreases. 
 
The fuel price of LPG is the lowest of the three fuel types, between 0.50 and 0.76 euro per liter. However, this lowest price does not go together with the highest number of kilometers driven per year, compared to diesel and petrol vehicles. LPG vehicles drive on average between 12,000 and 16,000 kilometers per year. This may be because there are fewer LPG vehicles on the road (Bovag, 2021) [2]. Furthermore, no clear linear relationship can be seen between the price of LPG and the average number of kilometers driven. An increase in the price does not necessarily lead to a decrease in the number of kilometers driven. This can also be explained by the fact that fewer and fewer cars on the road run on LPG.

[2] https://mijn.bovag.nl/actueel/nieuws/2021/oktober/aantal-lpg-auto-s-daalt-onder-100-000#:~:text=In%20september%202021%20is%20het,procent%20in%20het%20totale%20wagenpark. 

In [56]:
# These 2 lines of code make sure the value 'Total' is selected in the column 'Property', so that the graphs only  
# examine the fuel types. No distinction is made between private and company cars. 
traffic_data_property = traffic_data.groupby('Property')
traffic_data_property_total = traffic_data_property.get_group('Total')

# Here the scatterplot is created for the fuel price & annual mileage; each fuel type gets its own panel.
# category_orders must refer to the faceted column ("Fuel type") to fix the order of the panels.
fig2 = px.scatter(traffic_data_property_total, x="Fuel price", y="Annual mileage in the Netherlands (km)", color="Fuel type", facet_col="Fuel type",
       category_orders={"Fuel type": ["Petrol", "Diesel", "LPG"]})

# These lines of code are used to transform the layout of the scatterplot
fig2.update_layout(
    title="Influence of the fuel price on the average number of kilometers per car",
    xaxis_title="Fuel price",
    yaxis_title="Number of kilometers",
    legend_title="Fuel type",
)

fig2.show()

The correlation between the fuel price of each fuel type and the average number of kilometers driven per car per year in the Netherlands is also calculated. The Pearson correlation coefficient measures the strength and direction of the linear association between two variables. All calculated correlations are negative, which means that higher fuel prices go together with lower annual mileage. This is also visible in the scatterplots above. The correlation for diesel is the strongest (-0.783), which can be seen in the second calculation. This means the price of diesel has the strongest association with the average number of kilometers driven. Besides, the coefficient is quite close to -1, which indicates a strong correlation. The correlation coefficients of petrol and LPG are -0.665 and -0.640 respectively, which are fairly strong as well. All correlation coefficients are statistically significant at the 0.05 significance level, which means they are unlikely to be the result of chance alone. 

In [57]:
# In these codelines the correlation between the price of petrol and the average number of kilometers driven is calculated
traffic_data_property_total_fueltype = traffic_data_property_total.groupby('Fuel type')
traffic_data_property_total_fueltype_petrol = traffic_data_property_total_fueltype.get_group('Petrol')

stats.pearsonr(traffic_data_property_total_fueltype_petrol['Fuel price'], traffic_data_property_total_fueltype_petrol['Annual mileage in the Netherlands (km)'])

(-0.6649994211988541, 0.009458835967883517)

In [58]:
# In these codelines the correlation between the price of diesel and the average number of kilometers driven is calculated
traffic_data_property_total_fueltype = traffic_data_property_total.groupby('Fuel type')
traffic_data_property_total_fueltype_diesel = traffic_data_property_total_fueltype.get_group('Diesel')

stats.pearsonr(traffic_data_property_total_fueltype_diesel['Fuel price'], traffic_data_property_total_fueltype_diesel['Annual mileage in the Netherlands (km)'])

(-0.7828958254507818, 0.0009298231141449003)

In [59]:
# In these codelines the correlation between the price of LPG and the average number of kilometers driven is calculated
traffic_data_property_total_fueltype = traffic_data_property_total.groupby('Fuel type')
traffic_data_property_total_fueltype_lpg = traffic_data_property_total_fueltype.get_group('LPG')

stats.pearsonr(traffic_data_property_total_fueltype_lpg['Fuel price'], traffic_data_property_total_fueltype_lpg['Annual mileage in the Netherlands (km)'])

(-0.6396227878233751, 0.013767455799138717)
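
The three cells above repeat the same steps for each fuel type. As a sketch of how this could be written once, the block below loops over the groups and collects the Pearson coefficient, p-value and sample size in one table. The helper name `correlation_per_group` is hypothetical and the numbers are made up for illustration; they are not the CBS data.

```python
import pandas as pd
from scipy import stats

def correlation_per_group(df, group_col, x_col, y_col):
    """Return Pearson r, p-value and sample size of x_col versus y_col per group."""
    rows = []
    for name, group in df.groupby(group_col):
        r, p = stats.pearsonr(group[x_col], group[y_col])
        rows.append({group_col: name, "r": r, "p_value": p, "n": len(group)})
    return pd.DataFrame(rows).set_index(group_col)

# Made-up numbers for illustration only.
demo = pd.DataFrame({
    "Fuel type": ["Petrol"] * 4 + ["Diesel"] * 4,
    "Fuel price": [1.35, 1.45, 1.55, 1.65, 1.00, 1.10, 1.20, 1.30],
    "Annual mileage in the Netherlands (km)": [9700, 9650, 9500, 9400,
                                               22000, 21500, 20500, 20000],
})

result = correlation_per_group(
    demo, "Fuel type", "Fuel price", "Annual mileage in the Netherlands (km)"
)
print(result)
```

Applied to `traffic_data_property_total`, the same call should reproduce the three coefficients reported above, assuming the column names match those in the combined data set.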

## Subquestion 3

Subquestion 3: How does the type of property (company/private) affect the annual mileage of cars in the Netherlands? 

For the third subquestion, the difference between the property types of the car is researched. To answer this subquestion, a scatterplot is again created for the different fuel types, but this time the property types are included. The property types are indicated by different colors. 

As can be concluded from the graphs, company cars drive far more kilometers per year than private cars. Furthermore, there is a very weak linear relationship between the price of petrol and the annual mileage of a private car, a linear relationship between the price of diesel and the annual mileage of both private and company cars, and a linear relationship between the price of LPG and the annual mileage of a private car. For these linear relations, an increase in the fuel price goes together with a decrease in the average number of kilometers driven. However, the relations between the prices of petrol and LPG and the annual mileage of a company car are not linear. This means that an increase in the price does not necessarily lead to a decrease in the number of kilometers driven. For both fuel types, the average number of kilometers driven with a company car both grows and declines as the price changes.  

In [29]:
# In these code lines, a distinction is made between private (red) and company (blue).
labels= ['Annual Mileage Company (km)', 'Annual Mileage Private (km)']
fig3 = px.scatter(traffic_data, x='Fuel price', y= labels, facet_col='Fuel type')

# These lines of code are used to transform the layout of the scatterplot
fig3.update_layout(
    title="Influence of the fuel price on the average number of kilometers per company and private car",
    xaxis_title="Fuel price",
    yaxis_title="Number of kilometers",
    legend_title="Type of car",
)

fig3.show()

Furthermore, the correlation between the fuel price of each fuel type and the average number of kilometers driven per year with a company or private car in the Netherlands is calculated. It is noticeable that all correlation coefficients are negative, except the one between the price of petrol and the annual mileage of a company car. However, this coefficient is not statistically significant at the 0.05 significance level (p-value = 0.849), meaning that no conclusion can be drawn from it.

It is also remarkable that for each fuel type the correlation between the fuel price and the annual mileage of a private car is statistically significant at the 0.05 significance level. These three coefficients (-0.729, -0.721 and -0.631) are fairly close to -1, meaning that the association between the variables is quite strong. For the annual mileage of a company car, only the correlation with the price of diesel is statistically significant, so only this correlation allows a conclusion. Its coefficient (-0.744) is also fairly close to -1, meaning that the association between the two variables is quite strong. 

In [48]:
# In these codelines the correlation between the price of petrol and the average number of kilometers driven with a private and company car is calculated
traffic_data_fueltype = traffic_data.groupby('Fuel type')
traffic_data_fueltype_petrol = traffic_data_fueltype.get_group('Petrol')
traffic_data_fueltype_petrol_property = traffic_data_fueltype_petrol.groupby('Property')
traffic_data_fueltype_petrol_private = traffic_data_fueltype_petrol_property.get_group('Private')
traffic_data_fueltype_petrol_company = traffic_data_fueltype_petrol_property.get_group('Company')

In [37]:
stats.pearsonr(traffic_data_fueltype_petrol_private['Fuel price'], traffic_data_fueltype_petrol_private['Annual mileage in the Netherlands (km)'])

(-0.72855600060414, 0.003123352379629446)

In [38]:
stats.pearsonr(traffic_data_fueltype_petrol_company['Fuel price'], traffic_data_fueltype_petrol_company['Annual mileage in the Netherlands (km)'])

(0.055987838076011, 0.8492280147931854)

In [39]:
# In these codelines the correlation between the price of diesel and the average number of kilometers driven with a private and company car is calculated
traffic_data_fueltype = traffic_data.groupby('Fuel type')
traffic_data_fueltype_diesel = traffic_data_fueltype.get_group('Diesel')
traffic_data_fueltype_diesel_property = traffic_data_fueltype_diesel.groupby('Property')
traffic_data_fueltype_diesel_private = traffic_data_fueltype_diesel_property.get_group('Private')
traffic_data_fueltype_diesel_company = traffic_data_fueltype_diesel_property.get_group('Company')

In [40]:
stats.pearsonr(traffic_data_fueltype_diesel_private['Fuel price'], traffic_data_fueltype_diesel_private['Annual mileage in the Netherlands (km)'])

(-0.721364842310528, 0.003591431002758882)

In [41]:
stats.pearsonr(traffic_data_fueltype_diesel_company['Fuel price'], traffic_data_fueltype_diesel_company['Annual mileage in the Netherlands (km)'])

(-0.7441415752966531, 0.0022734622703575245)

In [42]:
# In these codelines the correlation between the price of LPG and the average number of kilometers driven with a private and company car is calculated
traffic_data_fueltype = traffic_data.groupby('Fuel type')
traffic_data_fueltype_lpg = traffic_data_fueltype.get_group('LPG')
traffic_data_fueltype_lpg_property = traffic_data_fueltype_lpg.groupby('Property')
traffic_data_fueltype_lpg_private = traffic_data_fueltype_lpg_property.get_group('Private')
traffic_data_fueltype_lpg_company = traffic_data_fueltype_lpg_property.get_group('Company')

In [43]:
stats.pearsonr(traffic_data_fueltype_lpg_private['Fuel price'], traffic_data_fueltype_lpg_private['Annual mileage in the Netherlands (km)'])

(-0.6308196936320142, 0.015567414854313172)

In [44]:
stats.pearsonr(traffic_data_fueltype_lpg_company['Fuel price'], traffic_data_fueltype_lpg_company['Annual mileage in the Netherlands (km)'])

(-0.5061322214656161, 0.06479569950673499)
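
The six results above can be easier to compare when they are collected in one table with a significance flag. The block below is a sketch under stated assumptions: the helper name `correlation_table` is hypothetical, the 0.05 threshold follows the text, and the data are made up for illustration rather than taken from the CBS data sets.

```python
import pandas as pd
from scipy import stats

def correlation_table(df, x_col, y_col, alpha=0.05):
    """Pearson r, p-value and significance flag for every fuel type / property pair."""
    rows = []
    for (fuel, prop), group in df.groupby(["Fuel type", "Property"]):
        r, p = stats.pearsonr(group[x_col], group[y_col])
        rows.append({"Fuel type": fuel, "Property": prop,
                     "r": r, "p_value": p, "significant": p < alpha})
    return pd.DataFrame(rows).set_index(["Fuel type", "Property"])

# Made-up numbers for illustration only.
prices = [1.35, 1.45, 1.55, 1.65, 1.75]
demo = pd.DataFrame({
    "Fuel type": ["Petrol"] * 10,
    "Property": ["Private"] * 5 + ["Company"] * 5,
    "Fuel price": prices + prices,
    "Annual mileage in the Netherlands (km)": [9000, 8900, 8850, 8700, 8600,
                                               20000, 20500, 19800, 20400, 20100],
})

table = correlation_table(demo, "Fuel price", "Annual mileage in the Netherlands (km)")
print(table)
```

In this made-up sample the private cars show a clear negative relation while the company cars do not, which mirrors the pattern reported for petrol above.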

## Main research question

The main question of this research is: “How does the fluctuation of fuel prices (petrol, diesel and LPG) affect the car utilization per year in the Netherlands?”

The correlation between fuel price and annual mileage in the Netherlands, over all fuel types and property types together, is calculated below. The correlation coefficient is -0.25. The negative sign indicates that a higher fuel price goes together with a lower average number of kilometers driven. However, the coefficient is small, meaning that the association between the two variables is weak. 

In [53]:
# In these codelines the correlation between the annual mileage and FuelPrice is calculated
stats.pearsonr(traffic_data['Fuel price'], traffic_data['Annual mileage in the Netherlands (km)'])

(-0.2509149317715653, 0.004597994456137454)
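
One possible reason why this overall coefficient is much weaker than the coefficients per fuel type is that the pooled sample mixes groups with very different price and mileage levels. The block below is only an illustration of that effect with made-up numbers that loosely mimic the levels described in subquestion 2; it is not a claim about the CBS data.

```python
import pandas as pd
from scipy import stats

# Made-up numbers: each fuel type has its own price level and mileage level.
demo = pd.DataFrame({
    "Fuel type": ["Petrol"] * 4 + ["Diesel"] * 4 + ["LPG"] * 4,
    "Fuel price": [1.40, 1.50, 1.60, 1.70,
                   1.05, 1.15, 1.25, 1.35,
                   0.50, 0.60, 0.70, 0.75],
    "Annual mileage (km)": [9700, 9600, 9450, 9400,
                            22000, 21200, 20300, 19500,
                            16000, 14500, 13000, 12500],
})

# Correlation over all rows together, ignoring the fuel type.
pooled_r, pooled_p = stats.pearsonr(demo["Fuel price"], demo["Annual mileage (km)"])

# Correlation within each fuel type separately.
within_r = {
    fuel: stats.pearsonr(group["Fuel price"], group["Annual mileage (km)"])[0]
    for fuel, group in demo.groupby("Fuel type")
}

print("pooled:", round(pooled_r, 3))
print("within:", {fuel: round(r, 3) for fuel, r in within_r.items()})
```

In this sample every fuel type shows a strong negative relation on its own, while the pooled coefficient is much closer to zero because the differences between the groups dominate.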

In addition to the calculated correlation coefficient, an animated graph is made which shows the general course of the annual mileage of a vehicle per fuel type from 2006 until 2019, without a distinction in property type. It can be seen that in 2008 there is a relatively big decrease in the average number of kilometers driven for each fuel type. This can be explained by the extreme fluctuations of the fuel price in 2008, which can be seen in the line graph in subquestion 1. In 2008, the fuel price first increases and then decreases a lot. Although a decrease in the fuel price in general goes together with an increase in the number of kilometers driven, the major decrease in the fuel price in 2008 does not lead to an increase in the number of kilometers driven. This can be explained by the great economic crisis in that period, in which people might have used their car less because more people were unemployed and people did fewer activities. In 2009, the annual mileage remained fairly stable, but in 2010 and 2011 there is a second big decrease in the annual mileage. This might be because of the crisis as well, because the crisis did not end until the end of 2011. After 2011, the annual mileage stayed quite stable until 2019. In 2019 there was a lot of fluctuation in the fuel price, as can be seen in the line graph in subquestion 1, which might have caused the decrease in the annual mileage. 

Furthermore, it can be seen that the number of kilometers driven with a petrol car is quite stable compared to the other two fuel types, even though the price of petrol fluctuates considerably. Thus, it can be concluded that the price of petrol had little influence on the annual mileage of a vehicle. 

In [54]:
# In these lines of code, an animated barchart is created. It shows the progression of mileage for each FuelType over the years.
fig_4 = px.bar(traffic_data_property_total, x= 'Annual mileage in the Netherlands (km)', y='Fuel type', orientation='h',color='Fuel type',
                animation_frame='Date', animation_group='Fuel type')

# These lines of code are used to transform the layout of the barchart
fig_4.update_layout(title= 'The average number of kilometers driven per year per fuel type', 
                    xaxis_title= "Number of kilometers driven",
                    yaxis_title= "Fuel type")
fig_4.update_layout(showlegend=False,xaxis_range = [0,25000])
fig_4.update_layout(yaxis={'categoryorder':'total ascending'})
fig_4.show()

## Conclusion

Based on the results of the sub research questions, the correlation between fuel price and annual mileage in the Netherlands and the animated graph of the number of driven kilometers, an answer to the main research question has been obtained. 

The main research question is: “How does the fluctuation of fuel prices (petrol, diesel and LPG) affect the car utilization per year in the Netherlands?”

From the graphs in subquestion 2, it can be concluded that for diesel, an increase in the fuel price goes together with a decrease in the number of kilometers driven. For petrol and LPG this relationship is less clear. An increase in the price of petrol goes together with only a very small decrease in the average number of kilometers driven. For LPG, there is no clear linear relationship between the price and the number of kilometers driven, so an increase in the price does not necessarily lead to a decrease in the number of kilometers driven. However, when looking at the calculated correlations, it can be concluded that the price of each fuel type has a negative, fairly strong association with the annual mileage of a car. 

When looking at the distinction between company and private cars, investigated in subquestion 3, it can be concluded from the graph that mainly the relationships for private cars are linear. Only for diesel is there also a linear relation between the fuel price and the annual mileage of a company car. When looking at the correlations between the fuel price of each fuel type and the annual mileage of a private or company car, it can be concluded that the correlations for private cars are all statistically significant, negative and quite strong, which means there is a strong negative association between the fuel price of each fuel type and the annual mileage of a private car. For company cars, however, the correlation between the fuel price and the annual mileage is only statistically significant for diesel. 

The general correlation between fuel price and annual mileage in the Netherlands is negative, but not very strong (-0.251). So when looking at the variables in general, the association between them is weak. When comparing the fluctuation in annual mileage of a car to the fuel price fluctuations, it can be concluded that the annual mileage of a petrol car remains quite stable, despite the fluctuations in fuel price. This can also be seen in the scatterplots, where there is only a very weak linear relationship between the price of petrol and the annual mileage of a (private) petrol car. For diesel and LPG, there is some fluctuation in the annual mileage. This can be explained by the fluctuation in fuel price and the economic crisis between 2008 and 2011. 

All in all, it can be concluded that there is a strong, negative linear association between the fuel price of each fuel type and the annual mileage of a private car. For a company car, there is only a significant negative linear association for diesel. Moreover, the relationship between the price of petrol and the annual mileage is only very small. The negative linear relationships can be explained by the fluctuation in fuel price during a year, but they might also be caused by other variables not investigated in this research. Thus, this should be further investigated in follow-up research. 

## Contribution Statement 

* Tessa van de Hulst (4963601)
    - Final responsibility for subquestions 1 and 3 and the introduction. Joint responsibility for all lines of code for the various charts.
* Maartje van den Broek (4964837)
    - Final responsibility for subquestion 2. Joint responsibility for all lines of code for the various charts.
* Lara de Geus (4965868)
    - Final responsibility for the main research question and conclusion. Joint responsibility for all lines of code for the various charts. Fine-tuning the whole report.
* Pien Biersteker (4888375)
    - Final responsibility for the data processing notebook. Comments on the code in both notebooks. Joint responsibility for all lines of code for the various charts.

