# The relation between car use and fuel prices

TIL Python Project group 9 

For this project we're going to look at the relation between the fuel prices for petrol, diesel and LPG and car use.

In [25]:
import pandas as pd
import numpy as np
import math
import scipy
import itertools

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

import plotly.io as pio

# Part 1 - Combine the datasets

Two datasets are used in this study:
   * Fuel prices dataset, 2006 to 2019
       - https://opendata.cbs.nl/statline/portal.html?_la=nl&_catalog=CBS&tableId=80416ned&_theme=426

   * Traffic use dataset, 2006 to 2019
       - https://opendata.cbs.nl/#/CBS/nl/dataset/80428ned/table

First, the fuel dataset is loaded and prepared for our research.

In [26]:
#import fuel data
file_path = "C:/Users/tessa/Downloads/fuelprices goed 2006-2019.csv"
df_fuel = pd.read_csv(file_path, delimiter = ';')

# Replace the Dutch column names with English column names
df_fuel.rename(columns ={'Perioden': 'Date', 
                    'BenzineEuro95_1': 'Petrol', 
                    'Diesel_2': 'Diesel', 
                    'Lpg_3': 'LPG'}, inplace=True)
df_fuel.head()

Unnamed: 0,Date,Petrol,Diesel,LPG
0,1-1-2006,1.325,1.003,0.543
1,2-1-2006,1.328,1.007,0.542
2,3-1-2006,1.332,1.007,0.54
3,4-1-2006,1.348,1.02,0.55
4,5-1-2006,1.347,1.021,0.55


The table above gives the average price of the three fuel types per day, from 1 January 2006 until 31 December 2019. To work with this data, the daily prices are averaged per year (2006-2019). The yearly average prices are shown in the table below.

In [27]:
# Calculate the average fuel price per year for each fuel type.
# The dates are written day-first (D-M-YYYY), so they are parsed with an explicit format.
years = pd.to_datetime(df_fuel['Date'], format='%d-%m-%Y').dt.to_period("Y")
average_petrol = df_fuel.groupby(years)['Petrol'].mean()
average_diesel = df_fuel.groupby(years)['Diesel'].mean()
average_LPG = df_fuel.groupby(years)['LPG'].mean()
# Double brackets are needed to select several columns at once
averageFuel = df_fuel.groupby(years)[['Petrol', 'Diesel', 'LPG']].mean()
averageFuel


Date,Petrol,Diesel,LPG
2006,1.373255,1.043211,0.520479
2007,1.414156,1.059721,0.538271
2008,1.476393,1.241314,0.59879
2009,1.354011,1.012929,0.509482
2010,1.503186,1.170838,0.644268
2011,1.639871,1.347986,0.700367
2012,1.758383,1.443951,0.768587
2013,1.735597,1.420614,0.731671
2014,1.694627,1.401058,0.757405
2015,1.55777,1.230197,0.618951

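Since the dates in the fuel table are written day-first (for example `5-1-2006` follows `4-1-2006`), it can help to parse them with an explicit format before grouping. The following is a minimal sketch on made-up toy prices; the frame `toy` and its values are illustrative and not taken from the CBS dataset:

```python
import pandas as pd

# Toy data: illustrative values only, not from the CBS dataset
toy = pd.DataFrame({
    "Date": ["1-1-2006", "13-10-2006", "2-1-2007", "31-12-2007"],
    "Petrol": [1.30, 1.40, 1.40, 1.50],
    "Diesel": [1.00, 1.10, 1.05, 1.15],
})

# Parse day-first dates explicitly so that 2-1-2007 is read as 2 January
parsed = pd.to_datetime(toy["Date"], format="%d-%m-%Y")

# Group by calendar year and average every fuel column at once
yearly = toy.groupby(parsed.dt.year)[["Petrol", "Diesel"]].mean()
print(yearly)
```

Selecting the columns with a list (double brackets) gives one yearly average per fuel type in a single table.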

To eventually add the fuel price data to the other dataset, the yearly prices have to be stacked underneath each other in one list:

In [28]:
# Stack the yearly average prices for petrol, diesel and LPG underneath each other in one list
average_f = list(itertools.chain(average_petrol, average_diesel, average_LPG))
# Repeat the list once more, because the traffic table has both private and company rows
averageFuel = list(itertools.chain(average_f, average_f))

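How `itertools.chain` stacks the yearly series can be seen on a small example. This is a sketch with made-up two-year series (the names `petrol`, `diesel`, `lpg` and their values are hypothetical). It also shows that the resulting list carries no year or fuel labels, so its order has to match the row order of the table it is attached to:

```python
import itertools
import pandas as pd

# Hypothetical yearly averages for two years
petrol = pd.Series([1.37, 1.41], index=[2006, 2007])
diesel = pd.Series([1.04, 1.06], index=[2006, 2007])
lpg = pd.Series([0.52, 0.54], index=[2006, 2007])

# chain() iterates over the values of each Series in turn
stacked = list(itertools.chain(petrol, diesel, lpg))

# Repeat the block once more, e.g. for a second ownership category
doubled = list(itertools.chain(stacked, stacked))

print(stacked)       # petrol values, then diesel, then LPG
print(len(doubled))  # 2 categories x 3 fuels x 2 years
```
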
Next, the traffic use dataset is imported and made suitable for the research: 

In [29]:
# Import traffic use data
file_p =  "C:/Users/tessa/Downloads/trafic personcars 2006-2019.csv"
df_traffic = pd.read_csv(file_p, delimiter = ';')

# Remove unnecessary columns 
df_traffic = df_traffic.drop(columns = ['Leeftijd voertuig', 'Kilometers personenautos in Nederland/Totaal kilometers in Nederland (x mln km)',
         'Kilometers personenauto\'s in Nederland/Kilometers door Nederlandse voertuigen (x mln km)',
         'Kilometers personenauto\'s in Nederland/Kilometers door buitenlandse voertuigen (x mln km)',
         'Kilometers Nederlandse personenauto\'s/Totaal kilometers (x mln km)', 
         'Kilometers Nederlandse personenauto\'s/Kilometers in Nederland (x mln km)', 
         'Kilometers Nederlandse personenauto\'s/Kilometers in het buitenland (x mln km)', 
         'Gemiddeld jaarkilometrage/Totaal gemiddeld jaarkilometrage (aantal km)', 
         'Gemiddeld jaarkilometrage/Gemiddeld jaarkilometrage in buitenland (aantal km)'] )

# Rename the Dutch column names to English column names
df_traffic.rename(columns ={'Eigendomssituatie': 'Property', 
                      'Brandstofsoort': 'FuelType', 
                      'Perioden': 'Date', 
                      'Gemiddeld jaarkilometrage/Gemiddeld jaarkilometrage in Nederland (aantal km)': 'annual mileage in the Netherlands (km)',
                      'Nederlandse personenauto\'s in gebruik (aantal)': 'Dutch Passenger cars in use (number)'}, inplace=True)

# Remove the rows with value 'Totaal' in the columns FuelType and Property
df_traffic = df_traffic[df_traffic.FuelType != 'Totaal']
df_traffic = df_traffic[df_traffic.Property != 'Totaal']


As you can see, several columns of the original traffic use dataset are dropped because they are not relevant to this study. The 'Totaal' rows for Property and FuelType are also removed, because they are totals over the other categories.
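
The row filter can be illustrated on a toy table. This is a minimal sketch, assuming made-up counts in a hypothetical `Cars` column and the column names used after renaming; both conditions are combined into one boolean mask:

```python
import pandas as pd

# Toy data: illustrative values only
toy = pd.DataFrame({
    "Property": ["Totaal", "Particulier", "Bedrijf", "Particulier"],
    "FuelType": ["Diesel", "Totaal", "LPG", "Diesel"],
    "Cars": [10, 20, 3, 7],
})

# Keep a row only if neither column holds the aggregate label 'Totaal'
mask = (toy["Property"] != "Totaal") & (toy["FuelType"] != "Totaal")
filtered = toy[mask]
print(filtered)
```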

In [30]:
# Add columns for the number of company and private cars, and for the annual mileage (km) of company and private cars
df_traffic['Cars Company'] = np.where(df_traffic['Property'] == 'Bedrijf', df_traffic['Dutch Passenger cars in use (number)'], np.nan)
df_traffic['Cars Private'] = np.where(df_traffic['Property'] == 'Particulier', df_traffic['Dutch Passenger cars in use (number)'], np.nan)

df_traffic['Annual Mileage Company (km)'] = np.where(df_traffic['Property'] == 'Bedrijf', df_traffic['annual mileage in the Netherlands (km)'], np.nan) 
df_traffic['Annual Mileage Private (km)'] = np.where(df_traffic['Property'] == 'Particulier', df_traffic['annual mileage in the Netherlands (km)'], np.nan)

# Then the list of average fuel prices created earlier is added to the table
df_traffic['FuelPrice'] = averageFuel

# Add columns for the number of cars in use per fuel type
df_traffic['Car use Petrol'] = np.where(df_traffic['FuelType'] == 'Benzine/overige', df_traffic['Dutch Passenger cars in use (number)'], np.nan)
df_traffic['Car use Diesel'] = np.where(df_traffic['FuelType'] == 'Diesel', df_traffic['Dutch Passenger cars in use (number)'], np.nan)
df_traffic['Car use LPG'] = np.where(df_traffic['FuelType'] == 'LPG', df_traffic['Dutch Passenger cars in use (number)'], np.nan)

# Add columns for the annual mileage (km) per fuel type
df_traffic['Annual mileage Petrol (km)'] = np.where(df_traffic['FuelType'] == 'Benzine/overige', df_traffic['annual mileage in the Netherlands (km)'], np.nan)
df_traffic['Annual mileage Diesel (km)'] = np.where(df_traffic['FuelType'] == 'Diesel', df_traffic['annual mileage in the Netherlands (km)'], np.nan)
df_traffic['Annual mileage LPG (km)'] = np.where(df_traffic['FuelType'] == 'LPG', df_traffic['annual mileage in the Netherlands (km)'], np.nan)

# The dataset now contains the fuel prices and has separate columns per ownership category and fuel type
df_traffic

Unnamed: 0,Property,FuelType,Date,annual mileage in the Netherlands (km),Dutch Passenger cars in use (number),Cars Company,Cars Private,Annual Mileage Company (km),Annual Mileage Private (km),FuelPrice,Car use Petrol,Car use Diesel,Car use LPG,Annual mileage Petrol (km),Annual mileage Diesel (km),Annual mileage LPG (km)
70,Particulier,Benzine/overige,2006,9217,5935907,,5935907.0,,9217.0,1.373255,5935907.0,,,9217.0,,
71,Particulier,Benzine/overige,2007,9191,6009671,,6009671.0,,9191.0,1.414156,6009671.0,,,9191.0,,
72,Particulier,Benzine/overige,2008,8859,6091810,,6091810.0,,8859.0,1.476393,6091810.0,,,8859.0,,
73,Particulier,Benzine/overige,2009,8817,6163467,,6163467.0,,8817.0,1.354011,6163467.0,,,8817.0,,
74,Particulier,Benzine/overige,2010,8786,6276280,,6276280.0,,8786.0,1.503186,6276280.0,,,8786.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
163,Bedrijf,LPG,2015,16271,4642,4642.0,,16271.0,,0.618951,,,4642.0,,,16271.0
164,Bedrijf,LPG,2016,15788,4065,4065.0,,15788.0,,0.571298,,,4065.0,,,15788.0
165,Bedrijf,LPG,2017,15863,3757,3757.0,,15863.0,,0.632584,,,3757.0,,,15863.0
166,Bedrijf,LPG,2018,15604,3651,3651.0,,15604.0,,0.685408,,,3651.0,,,15604.0

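The `np.where` pattern used in the cell above copies a value where the condition holds and writes `NaN` elsewhere. A small sketch with hypothetical numbers (the frame `toy` and the column `Cars` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data: illustrative values only
toy = pd.DataFrame({
    "Property": ["Particulier", "Bedrijf", "Particulier"],
    "Cars": [100, 20, 110],
})

# Keep the count only in company rows; NaN everywhere else
toy["Cars Company"] = np.where(toy["Property"] == "Bedrijf", toy["Cars"], np.nan)
# Keep the count only in private rows; NaN everywhere else
toy["Cars Private"] = np.where(toy["Property"] == "Particulier", toy["Cars"], np.nan)
print(toy)
```

Because `NaN` is a floating-point value, the new columns are stored as floats, which is why the counts in the table appear as `5935907.0` rather than `5935907`.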

As the table above shows, both datasets are now combined into one table, with the yearly price of each fuel type added to the matching rows. This combined table can be used to plot figures and graphs.
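
Assigning a list by position only works if the row order of the table matches the order of the list exactly. As an alternative sketch (not the approach used in this notebook), the prices could be attached by matching on year and fuel type with `DataFrame.merge`. The toy frames, the `Year` and `Cars` columns and the `fuel_map` dictionary below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical yearly prices in wide form (one column per fuel)
prices = pd.DataFrame({
    "Year": [2006, 2007],
    "Petrol": [1.37, 1.41],
    "Diesel": [1.04, 1.06],
})

# Hypothetical traffic rows, using the Dutch fuel labels from the dataset
traffic = pd.DataFrame({
    "FuelType": ["Diesel", "Benzine/overige", "Diesel"],
    "Year": [2007, 2006, 2006],
    "Cars": [30, 100, 28],
})

# Wide -> long: one row per (Year, Fuel) pair
long_prices = prices.melt(id_vars="Year", var_name="Fuel", value_name="FuelPrice")

# Map the English price column names onto the Dutch labels
fuel_map = {"Petrol": "Benzine/overige", "Diesel": "Diesel"}
long_prices["FuelType"] = long_prices["Fuel"].map(fuel_map)

# A left merge keeps every traffic row and looks up its price by key
merged = traffic.merge(long_prices[["Year", "FuelType", "FuelPrice"]],
                       on=["Year", "FuelType"], how="left")
print(merged)
```

With this approach each row gets its price from the matching year and fuel type, regardless of the order in which the rows appear.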

In [31]:
# Make a line graph of the number of cars per fuel type, with the fuel price on the x-axis
labels = ['Car use Petrol', 'Car use Diesel', 'Car use LPG']
fig = px.line(df_traffic, x='FuelPrice', y=labels)

# Adjust the layout of the graph
fig.update_layout(
    title="Number of cars per fuel type",
    xaxis_title="Fuel price",
    yaxis_title="Number of cars",
    legend_title="Fuel type",
)


fig.show()

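A line trace connects the points in the order of the rows, so with the price on the x-axis it can help to sort the rows by price first. A minimal pandas-only sketch with made-up numbers (the frame `toy` and its columns are hypothetical); the sorted frame could then be passed to `px.line` in place of the unsorted one:

```python
import pandas as pd

# Toy data: illustrative values only
toy = pd.DataFrame({
    "Year": [2006, 2007, 2008, 2009],
    "FuelPrice": [1.37, 1.41, 1.48, 1.35],
    "Cars": [100, 102, 104, 105],
})

# Sort by the x-axis variable so the line runs from left to right
by_price = toy.sort_values("FuelPrice").reset_index(drop=True)
print(by_price)
```
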
In [32]:
# Make a scatterplot of the number of cars per fuel type, for 2006 only
labels = ['Car use Petrol', 'Car use Diesel', 'Car use LPG']
fig = px.scatter(df_traffic.query('Date==2006'), x='FuelPrice', y=labels)

# Adjust the layout of the graph
fig.update_layout(
    title="Number of cars per fuel type (2006)",
    xaxis_title="Fuel price",
    yaxis_title="Number of cars",
    legend_title="Fuel type",
)

fig.show()

In [33]:
# Make a scatterplot of the annual mileage for petrol cars only
fig = px.scatter(df_traffic, x='FuelPrice', y='Annual mileage Petrol (km)')

# Adjust the layout of the graph
fig.update_layout(
    title="Annual mileage Petrol",
    xaxis_title="Fuel price",
    yaxis_title="Annual mileage (km)",
)

fig.show()

In [34]:
# Make a line graph of the annual mileage per fuel type against the fuel price (2006 to 2019)
labels = ['Annual mileage Petrol (km)', 'Annual mileage Diesel (km)', 'Annual mileage LPG (km)']
fig = px.line(df_traffic, x='FuelPrice', y=labels)

# Adjust the layout of the graph
fig.update_layout(
    title="Annual mileage per fuel type",
    xaxis_title="Fuel price",
    yaxis_title="Annual mileage (km)",
    legend_title="Fuel type",
)


fig.show()

The line graph below shows the daily fuel prices from 2006 through 2019. The fluctuations in the prices are clearly visible.

In [35]:
# Make a line graph of the fuel prices from 2006 to 2019
labels = ['Petrol', 'Diesel', 'LPG']
fig = px.line(df_fuel, x='Date', y=labels)

# Adjust the layout of the graph
fig.update_layout(
    title="Fuel prices",
    xaxis_title="Date",
    yaxis_title="Price",
    legend_title="Fuel type",
)


fig.show()

In [36]:
# Make a scatterplot of annual mileage against fuel price, with one panel per fuel type

# category_orders must refer to the faceted column (FuelType) and use the labels that occur in the data
fig2 = px.scatter(df_traffic, x="FuelPrice", y="annual mileage in the Netherlands (km)", color="FuelType", facet_col="FuelType",
       category_orders={"FuelType": ["Benzine/overige", "Diesel", "LPG"]})

fig2.update_layout(
    title="Influence of fuel prices",
    xaxis_title="Fuel price",
    yaxis_title="Annual mileage (km)",
    legend_title="Fuel type",
)

fig2.show()
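
To put a number on the relation that the scatterplots suggest, a correlation coefficient per fuel type is one option. This is a hedged sketch on made-up values, not a result from the CBS data; the toy frame and its `Mileage` column are assumptions for illustration:

```python
import pandas as pd

# Toy data: illustrative values only, not from the CBS dataset
toy = pd.DataFrame({
    "FuelType": ["Diesel"] * 4 + ["LPG"] * 4,
    "FuelPrice": [1.0, 1.1, 1.2, 1.3, 0.5, 0.6, 0.7, 0.8],
    "Mileage": [24000, 23500, 23000, 22500, 16000, 16500, 15800, 16200],
})

# Pearson correlation between price and mileage within each fuel type
corr = {
    fuel: group["FuelPrice"].corr(group["Mileage"])
    for fuel, group in toy.groupby("FuelType")
}
print(corr)
```

A coefficient near -1 or 1 indicates a strong linear relation. With only 14 yearly points per series (2006 to 2019), such values should be read with caution.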