# 2. Put on market (POM) data

This dataset contains the amount of electrical and electronic equipment (EEE) put on the market (POM). POM data were extrapolated for 1980-1995, are empirical for 1996-2017, and are forecast for the years after 2017 (Source: https://github.com/Statistics-Netherlands/ewaste and Kees Balde, United Nations University).

Relevant variables include:

1. **Country**: three-letter (ISO 3166-1 alpha-3) country code
2. **UNU_Key**: key identifying a category of electrical/electronic items, as defined by the United Nations University (UNU)
3. **Year**: year of the POM entry
4. **POM_t**: amount of a given electrical/electronic item put on the market, in tonnes
5. **POM_pieces**: amount of a given electrical/electronic item put on the market, in number of pieces
6. **Inhabitants**: total population of the country in that year
7. **kpi**: amount of a given electrical/electronic item put on the market per inhabitant, in kg per inhabitant
8. **ppi**: amount of a given electrical/electronic item put on the market per inhabitant, in pieces per inhabitant
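The per-inhabitant columns appear to be derived from the others: the rows shown in `pom.head()` are consistent with kpi = POM_t × 1000 / Inhabitants (tonnes converted to kg) and ppi = POM_pieces / Inhabitants. The sketch below recomputes both ratios to check this relationship row by row. The helper name `per_capita_consistent` and the tolerance are assumptions, and the function expects the renamed columns used later in the notebook (`pom_kpi`, `pom_ppi`).

```python
import numpy as np
import pandas as pd


def per_capita_consistent(df: pd.DataFrame, atol: float = 1e-6) -> pd.Series:
    """Return True for rows whose pom_kpi/pom_ppi match the recomputed ratios.

    Assumption: pom_t is in tonnes, so it is multiplied by 1000 to get kg.
    Rows where both the stored and the recomputed value are NaN count as consistent.
    """
    kpi_expected = df["pom_t"] * 1000 / df["inhabitants"]  # kg per inhabitant
    ppi_expected = df["pom_pieces"] / df["inhabitants"]    # pieces per inhabitant
    kpi_ok = np.isclose(df["pom_kpi"].to_numpy(), kpi_expected.to_numpy(),
                        atol=atol, equal_nan=True)
    ppi_ok = np.isclose(df["pom_ppi"].to_numpy(), ppi_expected.to_numpy(),
                        atol=atol, equal_nan=True)
    return pd.Series(kpi_ok & ppi_ok, index=df.index)
```

On the cleaned frame, `pom[~per_capita_consistent(pom)]` would list any rows where the stored ratios disagree with the recomputed ones.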


A new data frame is created from the raw data. The variables listed above are inspected and cleaned where necessary, and some columns are renamed to avoid conflicts with other data frames. The final data frame is saved to a CSV file.
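The cleaning steps described above can be collected into a single function, which makes them easy to rerun on an updated raw file. This is a minimal sketch that mirrors the notebook cells; the function name `clean_pom` is hypothetical, and it assumes the raw file contains `stratum` and `flag` columns (in any letter case), as the notebook's `drop` call implies.

```python
import pandas as pd


def clean_pom(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps to a raw POM table (sketch)."""
    pom = raw.copy()  # leave the caller's frame untouched
    # Lower-case the column names, e.g. "UNU_Key" -> "unu_key"
    pom.columns = pom.columns.str.lower()
    # Rename to avoid clashes with the WEEE data frames
    pom = pom.rename(columns={"unu_key": "eee_key", "kpi": "pom_kpi", "ppi": "pom_ppi"})
    # Drop columns that are not used in the analysis
    pom = pom.drop(columns=["stratum", "flag"])
    # Treat keys and years as labels rather than numbers
    pom["eee_key"] = pom["eee_key"].astype(str)
    pom["year"] = pom["year"].astype(str)
    return pom
```

Usage would be `pom = clean_pom(pd.read_csv("../data/raw/4_tbl_POM.csv"))`, which should give the same frame as running the cells below one by one.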

### Data loading and cleaning

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Import raw data

pom = pd.read_csv("../data/raw/4_tbl_POM.csv")

In [3]:
# Transform column names into lower case for easier handling

pom.columns = pom.columns.str.lower()

In [4]:
# Rename unu_key to eee_key, and kpi/ppi to pom_kpi/pom_ppi, to avoid conflicts
# with the WEEE (Waste Electrical & Electronic Equipment) data

pom = pom.rename(columns={"unu_key": "eee_key", "kpi": "pom_kpi", "ppi": "pom_ppi"})
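To see why the rename matters: when two frames that share non-key column names are merged, pandas keeps both columns and appends `_x`/`_y` suffixes, which makes it unclear which value is POM and which is WEEE. The sketch below uses two hypothetical one-row frames; the WEEE frame, its column names, and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical demo frames: the POM row copies pom.head(); the WEEE values are made up
pom_demo = pd.DataFrame({"country": ["AUT"], "eee_key": ["1"], "year": ["1980"],
                         "kpi": [0.366225], "ppi": [0.011871]})
weee_demo = pd.DataFrame({"country": ["AUT"], "eee_key": ["1"], "year": ["1980"],
                          "kpi": [0.1], "ppi": [0.004]})
keys = ["country", "eee_key", "year"]

# Without renaming, the overlapping kpi/ppi columns get _x/_y suffixes
clashing = pom_demo.merge(weee_demo, on=keys)
print(list(clashing.columns))

# With POM-specific names, both sets of columns keep readable names
renamed = pom_demo.rename(columns={"kpi": "pom_kpi", "ppi": "pom_ppi"})
clean = renamed.merge(weee_demo, on=keys)
print(list(clean.columns))
```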

In [5]:
# Drop the stratum and flag columns, which are not needed for the analysis

pom = pom.drop(["stratum", "flag"], axis=1)

In [6]:
pom.head()

|   | country | eee_key | year | pom_t | pom_pieces | inhabitants | pom_kpi | pom_ppi |
|---|---|---|---|---|---|---|---|---|
| 0 | AUT | 1 | 1980 | 2761.339647 | 89508.57849 | 7540000.0 | 0.366225 | 0.011871 |
| 1 | AUT | 1 | 1981 | 2898.970646 | 93969.87508 | 7556000.0 | 0.383665 | 0.012436 |
| 2 | AUT | 1 | 1982 | 3034.351975 | 98358.2488 | 7565000.0 | 0.401104 | 0.013002 |
| 3 | AUT | 1 | 1983 | 3157.072369 | 102336.2194 | 7543000.0 | 0.418543 | 0.013567 |
| 4 | AUT | 1 | 1984 | 3289.053034 | 106614.3609 | 7544000.0 | 0.435983 | 0.014132 |


In [7]:
pom.dtypes

country         object
eee_key          int64
year             int64
pom_t          float64
pom_pieces     float64
inhabitants    float64
pom_kpi        float64
pom_ppi        float64
dtype: object

In [8]:
# Convert eee_key and year to strings so they are treated as labels, not numbers

pom["eee_key"] = pom["eee_key"].astype(str)
pom["year"] = pom["year"].astype(str)

In [9]:
pom.dtypes

country         object
eee_key         object
year            object
pom_t          float64
pom_pieces     float64
inhabitants    float64
pom_kpi        float64
pom_ppi        float64
dtype: object

In [10]:
# Check for missing values

pom.isnull().sum()

country          0
eee_key          0
year             0
pom_t          398
pom_pieces     398
inhabitants      0
pom_kpi        398
pom_ppi        398
dtype: int64

In [11]:
# There are 398 missing values in each of pom_t, pom_pieces, pom_kpi and pom_ppi

In [12]:
pom.loc[pom["pom_ppi"].isnull(), "eee_key"].value_counts()

2    398
Name: eee_key, dtype: int64

In [13]:
# All rows with a missing pom_ppi have eee_key "2". Since each of the four columns
# has the same number of missing values (398), the gaps are likely in the same rows
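The value counts above only look at `pom_ppi`. To confirm that the four columns are missing in the same rows, and to see which countries and years are affected, a check along these lines could be used. This is a sketch; `missing_pattern` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def missing_pattern(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return the rows where any of `cols` is missing.

    The added boolean column `all_missing` is True when every one of `cols`
    is missing in that row, i.e. the gaps line up across the columns.
    """
    mask = df[cols].isnull()
    rows = df.loc[mask.any(axis=1)].copy()
    rows["all_missing"] = mask.loc[rows.index].all(axis=1)
    return rows


# Usage on the notebook's frame (not run here):
# gaps = missing_pattern(pom, ["pom_t", "pom_pieces", "pom_kpi", "pom_ppi"])
# gaps["all_missing"].all()                                 # same rows in all four columns?
# gaps.groupby("country")["year"].agg(["min", "max", "count"])  # where the gaps are
```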

In [14]:
pom["eee_key"].nunique()

54

In [15]:
# There are 54 distinct UNU keys (eee_key values)

In [16]:
pom["country"].nunique()

28

In [17]:
# There are 28 countries: the EU-28 member states (including the United Kingdom, GBR)

In [18]:
# Count the rows per country to check that every country has the same amount of data

pom["country"].value_counts()

FRA    2268
EST    2268
CZE    2268
BEL    2268
ITA    2268
BGR    2268
SVN    2268
HRV    2268
ROU    2268
LTU    2268
FIN    2268
GRC    2268
MLT    2268
CYP    2268
SVK    2268
DEU    2268
IRL    2268
ESP    2268
GBR    2268
PRT    2268
DNK    2268
HUN    2268
LUX    2268
NLD    2268
SWE    2268
LVA    2268
AUT    2268
POL    2268
Name: country, dtype: int64
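Every country has 2268 rows. With 54 UNU keys that is 2268 / 54 = 42 rows per key, which matches one row per year if every key covers the same 42 years. Equal counts alone do not rule out duplicated or missing (country, eee_key, year) combinations, so a stricter balanced-panel check could look like the sketch below; the function name and the layout of the returned dictionary are assumptions.

```python
import pandas as pd


def check_balanced_panel(df: pd.DataFrame, keys=("country", "eee_key", "year")) -> dict:
    """Check that each (country, eee_key, year) combination appears exactly once."""
    keys = list(keys)
    n_dupes = int(df.duplicated(subset=keys).sum())
    # A fully balanced panel has one row per combination of the key values
    expected = 1
    for key in keys:
        expected *= df[key].nunique()
    return {
        "duplicates": n_dupes,
        "rows": len(df),
        "expected_rows": expected,
        "balanced": n_dupes == 0 and len(df) == expected,
    }
```

On this data, `check_balanced_panel(pom)` would report the row count against 28 × 54 × (number of distinct years).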

In [22]:
pom

|   | country | eee_key | year | pom_t | pom_pieces | inhabitants | pom_kpi | pom_ppi |
|---|---|---|---|---|---|---|---|---|
| 0 | AUT | 1 | 1980 | 2761.339647 | 89508.578490 | 7540000.0 | 0.366225 | 0.011871 |
| 1 | AUT | 1 | 1981 | 2898.970646 | 93969.875080 | 7556000.0 | 0.383665 | 0.012436 |
| 2 | AUT | 1 | 1982 | 3034.351975 | 98358.248800 | 7565000.0 | 0.401104 | 0.013002 |
| 3 | AUT | 1 | 1983 | 3157.072369 | 102336.219400 | 7543000.0 | 0.418543 | 0.013567 |
| 4 | AUT | 1 | 1984 | 3289.053034 | 106614.360900 | 7544000.0 | 0.435983 | 0.014132 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63499 | SWE | 1002 | 2017 | 62.595063 | 678.757999 | 10177000.0 | 0.006151 | 0.000067 |
| 63500 | SWE | 1002 | 2018 | 65.937837 | 715.005818 | 10297000.0 | 0.006404 | 0.000069 |
| 63501 | SWE | 1002 | 2019 | 69.284121 | 751.291707 | 10404000.0 | 0.006659 | 0.000072 |
| 63502 | SWE | 1002 | 2020 | 72.596441 | 787.209292 | 10510000.0 | 0.006907 | 0.000075 |


In [20]:
# Save the data frame to a csv file

pom.to_csv("../Data/clean_data/2_put_on_market.csv", index=False)

In [21]:
pom.dtypes

country         object
eee_key         object
year            object
pom_t          float64
pom_pieces     float64
inhabitants    float64
pom_kpi        float64
pom_ppi        float64
dtype: object
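The `object` dtypes shown above exist only in memory: a CSV file stores no type information, so `pd.read_csv` will infer `eee_key` and `year` as integers again when the saved file is reloaded. Passing `dtype` to `read_csv` keeps them as strings. The sketch below demonstrates this on an in-memory buffer holding one row copied from `pom.head()`, so it runs without the data files.

```python
import io

import pandas as pd

# One-row stand-in for the cleaned frame (first row of pom.head())
pom_small = pd.DataFrame({
    "country": ["AUT"], "eee_key": ["1"], "year": ["1980"], "pom_t": [2761.339647],
})

buffer = io.StringIO()
pom_small.to_csv(buffer, index=False)

buffer.seek(0)
naive = pd.read_csv(buffer)  # eee_key and year come back as integers

buffer.seek(0)
typed = pd.read_csv(buffer, dtype={"eee_key": str, "year": str})  # kept as strings
print(naive.dtypes)
print(typed.dtypes)
```

The same `dtype` argument would apply when a later notebook reloads the saved put-on-market file.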