# Dataset 2 creation

The goal is to create a dataset that contains the market share of electric vehicles for each country. For each country we take the maximal value observed so far (in most cases, this coincides with the most recent value). We consider the European countries from dataset 1 as well as the U.S. and China. The dataset is then augmented with a set of country characteristics that will later be used as independent variables.

In [124]:
import pandas as pd
import numpy as np

## Import useful data from dataset 1

In [125]:
df_eu = pd.read_csv('data/ev_share_europe.csv')
df_eu.head()

Unnamed: 0,time,country,elec_percent
0,2013,Austria,1.003395
1,2013,Belgium,1.276297
2,2013,Croatia,0.006443
3,2013,Cyprus,1.172058
4,2013,Denmark,0.0


In [126]:
df_max = df_eu.drop('time',axis=1).groupby('country').max()
df_max.columns = ['max_ev_p']
max_country = df_max.max_ev_p.idxmax()
print(max_country,df_max.loc[max_country].values)
df_max.head()

Norway [40.35865784]


country,max_ev_p
Austria,3.795771
Belgium,4.672967
Croatia,0.107844
Cyprus,3.723006
Denmark,0.0


In [127]:
# Sanity check
df_eu[df_eu.country=='Austria']

Unnamed: 0,time,country,elec_percent
0,2013,Austria,1.003395
28,2014,Austria,1.191123
56,2015,Austria,1.662784
84,2016,Austria,2.55358
112,2017,Austria,3.795771
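
The introduction states that the maximum usually coincides with the newest value. A minimal sketch of how this could be checked per country, using made-up toy data with the same column names as `df_eu` (the names `toy`, `peak_year`, `latest_year` and `max_is_latest` are illustrative and not part of the notebook):

```python
import pandas as pd

# Toy data with the same columns as df_eu; the values are invented.
toy = pd.DataFrame({
    'time': [2013, 2014, 2015, 2013, 2014, 2015],
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'elec_percent': [1.0, 1.5, 2.0, 0.5, 0.9, 0.7],
})

# Row label of the maximal share per country, then the year of that row.
peak_rows = toy.loc[toy.groupby('country')['elec_percent'].idxmax()]
peak_year = peak_rows.set_index('country')['time']

# Most recent year available per country.
latest_year = toy.groupby('country')['time'].max()

# True where the maximum is also the newest observation.
max_is_latest = peak_year.eq(latest_year)
print(max_is_latest)
```

Applied to `df_eu` instead of `toy`, the countries flagged `False` would be the ones where the maximum is not the latest observation.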


## China 

### Alternative 1

* https://en.wikipedia.org/wiki/Electric_car_use_by_country (Passenger plug-in market share of total new car sales between 2013 and 2018 for selected countries and selected regional markets, value for 2018)
    * According to this source, the share was 4.2% in 2018
    * Refers to a blogpost: http://ev-sales.blogspot.com/2019/01/china-december-2018.html
    * Not very reliable, since the figure comes from a blog post

### Alternative 2

* Total
    * https://www.statista.com/statistics/233743/vehicle-sales-in-china/
    * 23.71m passenger vehicles sold in China in 2018
* Electric
    * https://en.wikipedia.org/wiki/New_energy_vehicles_in_China
    * (It refers to https://www.d1ev.com/news/shuju/85937)
    * 1.256m NEV sold in China in 2018
    * **Warning:** The Chinese government defines NEVs as battery electric vehicles (BEV), plug-in hybrid electric vehicles (PHEV) and fuel cell electric vehicles (FCEV). However, this figure also includes buses, vans and commercial vehicles, whereas the data on the European market only covers passenger cars. The number therefore overestimates electric passenger-car sales in China.
* Compute the share: 1.256 / 23.71 = 0.05297342893293969, i.e. about 5.3%
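
The share listed above is NEV sales divided by total passenger-vehicle sales. A minimal check of that arithmetic, using the two figures quoted from the sources (the variable names are illustrative):

```python
# Figures quoted above for 2018.
nev_sold = 1.256e6          # NEVs sold in China
passenger_sold = 23.71e6    # passenger vehicles sold in China

share = nev_sold / passenger_sold
print(f'{share:.4f} as a fraction, {share * 100:.1f}%')
```

Expressed as a percentage this rounds to the 5.3 used in the weighted average below.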

### Conclusion

* Since the NEV count (as defined by the Chinese government) covers more than passenger cars, the 5.3% figure is probably biased upwards
* So we take a weighted average of the two estimates, with weight 0.6 on the 4.2% figure and weight 0.4 on the 5.3% figure

In [128]:
df_max.loc['China','max_ev_p'] = 4.2 * 0.6 + 5.3 * 0.4
df_max.loc['China','max_ev_p']

4.640000000000001

## United States

* Reference value
    * https://en.wikipedia.org/wiki/Electric_car_use_by_country
    * 2.1%
* Electric
    * http://www.ev-volumes.com/country/usa/
    * 361000 PEVs sold in 2018
* Total
    * https://www.statista.com/statistics/199983/us-vehicle-sales-since-1951/
    * `17213500` vehicles sold
* Compute the share: 361000 / 17213500 = 0.02097, i.e. about 2.1%
* The sources agree!

In [129]:
df_max.loc['United States','max_ev_p'] = 361000*100/17213500


## Some European countries

All values are taken from https://en.wikipedia.org/wiki/Electric_car_use_by_country

In [130]:
new_data = {
    'Norway': 49.1,
    'Iceland': 19,
    'Denmark':2
}
for k,v in new_data.items():
    df_max.loc[k,'max_ev_p'] = v

In [131]:
# Drop countries whose maximal share is still zero
df_max = df_max[df_max.max_ev_p != 0]

In [132]:
df_max.to_csv('data/ev_max_p.csv')

# Country characteristics

In [133]:
df_2 = df_max.copy()

# Drop Liechtenstein due to insufficient data
df_2 = df_2.drop('Liechtenstein',axis=0)

## Same-sex marriages

https://en.wikipedia.org/wiki/Same-sex_marriage

The list of countries is taken from the text in the introductory paragraph of the article.

In [134]:
ssm = 'Austria,Belgium,Denmark,Finland,France,Germany,Iceland,Ireland,Luxembourg,Malta,Netherlands,Norway,Portugal,Spain, Sweden,United Kingdom,US'.split(',')
print(ssm)
df_2['ssm'] = np.zeros(len(df_2))
for country in df_2.index:
    if country in ssm:
        df_2.loc[country,'ssm'] = 1
    
df_2.head()

['Austria', 'Belgium', 'Denmark', 'Finland', 'France', 'Germany', 'Iceland', 'Ireland', 'Luxembourg', 'Malta', 'Netherlands', 'Norway', 'Portugal', 'Spain', ' Sweden', 'United Kingdom', 'US']


country,max_ev_p,ssm
Austria,3.795771,1.0
Belgium,4.672967,1.0
Croatia,0.107844,0.0
Cyprus,3.723006,0.0
Denmark,2.0,1.0
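
The printed list contains `' Sweden'` (with a leading space) and `'US'`, while the index uses `'United States'`, so an exact string match leaves such entries at 0. A minimal sketch of a more forgiving lookup, assuming a shortened country list and a hypothetical `aliases` mapping (neither is part of the notebook):

```python
import pandas as pd

# Shortened, illustrative version of the country string.
raw = 'Austria,Spain, Sweden,US'

# Assumed mapping from abbreviations to the labels used in the index.
aliases = {'US': 'United States'}

# Strip stray whitespace, then translate aliases to index labels.
names = [aliases.get(n.strip(), n.strip()) for n in raw.split(',')]

# Toy frame indexed by country, standing in for df_2.
toy = pd.DataFrame(
    index=pd.Index(['Austria', 'Croatia', 'Sweden', 'United States'], name='country')
)

# Vectorised membership test instead of a row-by-row loop.
toy['ssm'] = toy.index.isin(names).astype(float)
print(toy)
```

Comparing the resulting column against the source list is a quick way to spot labels that silently failed to match.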


In [135]:
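def add_column_from_csv(filename, column, df):
    """Add `column` to `df` in place from a CSV indexed by country; return the loaded data."""
    data = pd.read_csv(filename, index_col='country').squeeze()
    # Default to 0 so that countries missing from the CSV can be spotted afterwards
    df[column] = np.zeros(len(df))
    for i in data.index:
        if i in df.index:
            df.loc[i,column] = data[i]
    return data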
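
The helper above marks missing countries with 0, which only works as long as 0 is never a legitimate value. As an alternative sketch (not the approach used in this notebook), `Series.reindex` aligns the loaded data to the target index and leaves `NaN` where a country is absent. The toy frame and values below are illustrative:

```python
import pandas as pd

# Toy target frame and toy source data, standing in for df_2 and a loaded CSV.
df = pd.DataFrame(
    {'max_ev_p': [3.8, 1.7]},
    index=pd.Index(['Austria', 'Malta'], name='country'),
)
data = pd.Series({'Austria': 86.76, 'Italy': 86.04}, name='spi')

# Align on the country index; countries absent from `data` become NaN.
df['spi'] = data.reindex(df.index)

# Rows that still need a value.
missing = df[df['spi'].isna()]
print(missing)
```

With this variant the follow-up check becomes `isna()` rather than a comparison with 0.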

## Gini index

In [136]:
add_column_from_csv('data/gini.csv','gini',df_2)
df_2.head()

country,max_ev_p,ssm,gini
Austria,3.795771,1.0,30.5
Belgium,4.672967,1.0,28.1
Croatia,0.107844,0.0,32.2
Cyprus,3.723006,0.0,35.6
Denmark,2.0,1.0,28.5


In [137]:
df_2[df_2.gini == 0]

country,max_ev_p,ssm,gini


## GDP (PPP) per capita

In [138]:
add_column_from_csv('data/ppp_per_capita.csv','ppp',df_2)
df_2.head()

country,max_ev_p,ssm,gini,ppp
Austria,3.795771,1.0,30.5,52558.0
Belgium,4.672967,1.0,28.1,47561.0
Croatia,0.107844,0.0,32.2,25264.0
Cyprus,3.723006,0.0,35.6,34504.0
Denmark,2.0,1.0,28.5,50541.0


In [139]:
df_2[df_2.ppp == 0]

country,max_ev_p,ssm,gini,ppp


## Social Progress Index

In [140]:
spi_data = add_column_from_csv('data/social_progress_index.csv','spi',df_2)
df_2.head()

country,max_ev_p,ssm,gini,ppp,spi
Austria,3.795771,1.0,30.5,52558.0,86.76
Belgium,4.672967,1.0,28.1,47561.0,87.39
Croatia,0.107844,0.0,32.2,25264.0,79.6
Cyprus,3.723006,0.0,35.6,34504.0,82.85
Denmark,2.0,1.0,28.5,50541.0,89.96


In [141]:
df_2[df_2.spi == 0]

country,max_ev_p,ssm,gini,ppp,spi
Malta,1.746633,1.0,29.4,39534.0,0.0


In [142]:
# Malta is missing from the SPI data, so Italy's value is used instead
df_2.loc['Malta','spi'] = spi_data.Italy
df_2.loc['Malta','spi']

86.04

## Representation of nationalist parties in parliament

In [143]:
nationalist = pd.read_csv('data/nationalist_party_europe.csv')
nationalist_tot = nationalist.groupby('country').sum()

In [144]:
nationalist_tot.head()

country,vote_percent
Albania,0.28
Armenia,6.6
Austria,26.0
Belgium,24.0
Bulgaria,9.24


In [145]:
df_2['nat_p'] = np.zeros(len(df_2))
for i in df_2.index:
    if i in nationalist_tot.index:
        df_2.loc[i,'nat_p'] = nationalist_tot.loc[i,'vote_percent']

df_2.head()

country,max_ev_p,ssm,gini,ppp,spi,nat_p
Austria,3.795771,1.0,30.5,52558.0,86.76,26.0
Belgium,4.672967,1.0,28.1,47561.0,87.39,24.0
Croatia,0.107844,0.0,32.2,25264.0,79.6,5.8
Cyprus,3.723006,0.0,35.6,34504.0,82.85,3.71
Denmark,2.0,1.0,28.5,50541.0,89.96,21.1
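
The two cells above sum the vote shares of all nationalist parties per country and default to 0 where a country has no entry. A compact sketch of the same idea on toy data, assuming a hypothetical `party` column (the real CSV layout is not shown here) and selecting `vote_percent` explicitly so that non-numeric columns are not summed:

```python
import pandas as pd

# Toy party-level data; the 'party' column and all values are invented.
parties = pd.DataFrame({
    'country': ['Austria', 'Austria', 'Belgium'],
    'party': ['P1', 'P2', 'P3'],
    'vote_percent': [20.0, 6.0, 24.0],
})

# Total nationalist vote share per country.
totals = parties.groupby('country')['vote_percent'].sum()

# Align to the countries of interest; absent countries get 0.
countries = pd.Index(['Austria', 'Belgium', 'Iceland'], name='country')
nat_p = totals.reindex(countries, fill_value=0.0)
print(nat_p)
```

The result can be assigned to a frame with the same index in one step, which replaces the explicit loop.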


In [146]:
# Since China only has one party
df_2.loc['China','nat_p'] = 50

# Popular vote for Republicans in 2016 election
df_2.loc['United States','nat_p'] = 46.1

In [147]:
df_2.loc[['China','United States'],:]

country,max_ev_p,ssm,gini,ppp,spi,nat_p
China,4.64,0.0,38.6,16807.0,64.57,50.0
United States,2.097191,0.0,41.5,59532.0,84.78,46.1


## Write to file

In [164]:
# Move max_ev_p from the first to the last column
cols = list(df_2.columns.values)
new_cols = cols[1:] + [cols[0]]
df_2 = df_2[new_cols]

In [165]:
df_2.to_csv('data/dataset2.csv')