# <p style="text-align: center;"> Storytelling data visualization of the euro exchange rate</p>

## Introduction:

In this project we use a dataset about the **euro currency**. Our objective is to find the best way to represent the data so that the evolution of the euro/US dollar exchange rate is easy to understand just by looking at the visualizations.

Before starting the project, let us clarify some concepts:


- The euro (symbol: €) is the official currency of most European Union countries.

- Like any other currency, the euro has an **exchange rate** with respect to other currencies, i.e., the rate at which one currency is exchanged for another.

- For example, if the exchange rate of the euro to the US dollar is 1.5, we get 1.5 US dollars for each euro we pay (at this rate, one euro is worth more than one US dollar).
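
The conversion in this example is a single multiplication, and dividing by the same rate converts back. A minimal sketch; the two function names are hypothetical helpers, not part of the project:

```python
def euros_to_dollars(euros: float, rate: float) -> float:
    """Convert euros to US dollars; rate = US dollars per 1 euro (hypothetical helper)."""
    return euros * rate


def dollars_to_euros(dollars: float, rate: float) -> float:
    """Convert US dollars back to euros with the same euro/US dollar rate (hypothetical helper)."""
    return dollars / rate


rate = 1.5  # the rate from the example: 1 euro buys 1.5 US dollars
print(euros_to_dollars(1.0, rate))   # 1.5
print(euros_to_dollars(10.0, rate))  # 15.0
print(dollars_to_euros(3.0, rate))   # 2.0
```

Read the other way round, at a rate of 1.5 one US dollar is worth 1 / 1.5 ≈ 0.667 euros.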



The dataset describes the daily euro exchange rates between 1999 and 2021, and its source is the European Central Bank. The data is updated frequently and can be downloaded [here](https://www.kaggle.com/lsind18/euro-exchange-daily-rates-19992020). We use the version downloaded in January 2021.

Although the dataset contains many other exchange rates, this project focuses on the exchange rate between the euro and the US dollar.


## First overview of the dataset:

Let us start by importing the dataset and checking its first and last five rows.

```python
# First: import the libraries needed for this project
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
```

```python
# Second: read the CSV file into a pandas DataFrame
exchange_rates = pd.read_csv("euro-daily-hist_1999_2020.csv")
```

```python
# Third: display some rows
exchange_rates  # alternatively, use exchange_rates.head() and exchange_rates.tail()
```

|  | Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-08 | 1.5758 | 1.9558 | 6.5748 | 1.5543 | 1.0827 | 7.9184 | NaN | 26.163 | 7.4369 | ... | 4.8708 | 90.8000 | 10.0510 | 1.6228 | NaN | NaN | 36.8480 | 9.0146 | 1.2250 | 18.7212 |
| 1 | 2021-01-07 | 1.5836 | 1.9558 | 6.5172 | 1.5601 | 1.0833 | 7.9392 | NaN | 26.147 | 7.4392 | ... | 4.8712 | 91.2000 | 10.0575 | 1.6253 | NaN | NaN | 36.8590 | 8.9987 | 1.2276 | 18.7919 |
| 2 | 2021-01-06 | 1.5824 | 1.9558 | 6.5119 | 1.5640 | 1.0821 | 7.9653 | NaN | 26.145 | 7.4393 | ... | 4.8720 | 90.8175 | 10.0653 | 1.6246 | NaN | NaN | 36.9210 | 9.0554 | 1.2338 | 18.5123 |
| 3 | 2021-01-05 | 1.5927 | 1.9558 | 6.5517 | 1.5651 | 1.0803 | 7.9315 | NaN | 26.227 | 7.4387 | ... | 4.8721 | 91.6715 | 10.0570 | 1.6180 | NaN | NaN | 36.7760 | 9.0694 | 1.2271 | 18.4194 |
| 4 | 2021-01-04 | 1.5928 | 1.9558 | 6.3241 | 1.5621 | 1.0811 | 7.9484 | NaN | 26.141 | 7.4379 | ... | 4.8713 | 90.3420 | 10.0895 | 1.6198 | NaN | NaN | 36.7280 | 9.0579 | 1.2296 | 17.9214 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5694 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
| 5695 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
| 5696 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
| 5697 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |


```python
# Fourth: check data details (dtypes, null values, ...)
exchange_rates.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5699 entries, 0 to 5698
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Period\Unit:              5699 non-null   object 
 1   [Australian dollar ]      5699 non-null   object 
 2   [Bulgarian lev ]          5297 non-null   object 
 3   [Brazilian real ]         5431 non-null   object 
 4   [Canadian dollar ]        5699 non-null   object 
 5   [Swiss franc ]            5699 non-null   object 
 6   [Chinese yuan renminbi ]  5431 non-null   object 
 7   [Cypriot pound ]          2346 non-null   object 
 8   [Czech koruna ]           5699 non-null   object 
 9   [Danish krone ]           5699 non-null   object 
 10  [Estonian kroon ]         3130 non-null   object 
 11  [UK pound sterling ]      5699 non-null   object 
 12  [Greek drachma ]          520 non-null    object 
 13  [Hong Kong dollar ]       5699 non-null   object 
 14  [Croatia
```

From this output we can note the following:
- The dataset has 5699 rows and 41 columns.


- Some currencies have many null values. For example, `[Greek drachma ]`, `[Maltese lira ]`, `[Cypriot pound ]`, `[Slovenian tolar ]` and `[Slovak koruna ]` have more than half of their values missing. This will matter if we later work with these exchange rates.


- Although every column except the first holds numerical values, almost all of them have the `object` type; the only exceptions are three columns: `[Iceland krona ]`, `[Romanian leu ]` and `[Turkish lira ]`. We will therefore need to convert the remaining rate columns to a numerical type (float64), and the first column, `Period\Unit:`, to datetime.


- The column names are awkward to work with because of the square brackets, capital letters and spaces. To make things easier, we should simplify the names of at least the columns we will work with.
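
The first two notes can be checked and acted on directly in pandas. A minimal sketch on a toy frame (a few real values from the outputs above, with one US dollar rate replaced by `-` for illustration; real code would use `exchange_rates`): `isna().mean()` gives the share of missing values per column, and `pd.to_numeric(..., errors="coerce")` converts text to float64, turning anything it cannot parse into NaN.

```python
import pandas as pd

# Toy stand-in for exchange_rates: a few real values from the outputs above,
# with one US dollar rate replaced by "-" for illustration.
rates = pd.DataFrame({
    "Period\\Unit:": ["2021-01-08", "2021-01-07", "1999-01-05", "1999-01-04"],
    "[US dollar ]": ["1.2250", "1.2276", "-", "1.1789"],
    "[Cypriot pound ]": [None, None, "0.58230", "0.58231"],
})

# Share of missing values per column, largest first. Note that "-" is a string,
# not a missing value, so it is NOT counted here.
null_share = rates.isna().mean().sort_values(ascending=False)
print(null_share)

# Convert every rate column (all but the first) to float64; entries that are
# not numbers, such as "-", become NaN instead of raising an error.
for col in rates.columns[1:]:
    rates[col] = pd.to_numeric(rates[col], errors="coerce")
print(rates.dtypes)
```

The comment about `-` is worth keeping in mind: a text placeholder hides from a null count, which is exactly what happens with the `US_dollar` column later in this section.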


## Data Cleaning:

### Rename columns:

As mentioned before, we focus on the US dollar, so we rename only the `[US dollar ]` and `Period\Unit:` columns:

```python
exchange_rates.rename({"[US dollar ]": "US_dollar", r"Period\Unit:": "Time"}, axis=1, inplace=True)
# The r prefix makes "Period\Unit:" a raw string. Without it (or a doubled "\\"),
# Python reads "\U" as the start of an eight-hex-digit Unicode escape and raises
# a SyntaxError ("unicode error").
```
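
The effect of the `r` prefix is easy to check in plain Python. A small sketch (variable names are illustrative) that also compiles the unprefixed literal to show the error:

```python
# A raw string keeps the backslash literally, exactly like writing "\\".
raw_name = r"Period\Unit:"
escaped_name = "Period\\Unit:"
print(raw_name == escaped_name)  # True
print(len(raw_name))             # 12: the backslash is a single character

# Without r or "\\", "\U" starts an escape that expects eight hex digits
# (as in "\U0001F600"), so the literal does not even compile.
try:
    compile('name = "Period\\Unit:"', "<demo>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```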

```python
exchange_rates.columns  # display the column names to check the changes
```

```
Index(['Time', '[Australian dollar ]', '[Bulgarian lev ]', '[Brazilian real ]',
       '[Canadian dollar ]', '[Swiss franc ]', '[Chinese yuan renminbi ]',
       '[Cypriot pound ]', '[Czech koruna ]', '[Danish krone ]',
       '[Estonian kroon ]', '[UK pound sterling ]', '[Greek drachma ]',
       '[Hong Kong dollar ]', '[Croatian kuna ]', '[Hungarian forint ]',
       '[Indonesian rupiah ]', '[Israeli shekel ]', '[Indian rupee ]',
       '[Iceland krona ]', '[Japanese yen ]', '[Korean won ]',
       '[Lithuanian litas ]', '[Latvian lats ]', '[Maltese lira ]',
       '[Mexican peso ]', '[Malaysian ringgit ]', '[Norwegian krone ]',
       '[New Zealand dollar ]', '[Philippine peso ]', '[Polish zloty ]',
       '[Romanian leu ]', '[Russian rouble ]', '[Swedish krona ]',
       '[Singapore dollar ]', '[Slovenian tolar ]', '[Slovak koruna ]',
       '[Thai baht ]', '[Turkish lira ]', 'US_dollar',
       '[South African rand ]'],
      dtype='object')
```
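
The notebook renames only the two columns it needs. If we wanted to simplify every name at once, as the earlier notes suggest, a small function passed to `rename` could do it. A sketch under assumptions: `simplify_name` is a hypothetical helper, it is meant to run after `Period\Unit:` has already been renamed, and because it lowercases everything it would produce `time` and `us_dollar` rather than the `Time` and `US_dollar` names used in this notebook.

```python
import re

import pandas as pd


def simplify_name(name: str) -> str:
    """Hypothetical helper: '[US dollar ]' -> 'us_dollar'."""
    name = name.strip("[] ")          # drop brackets and surrounding spaces
    name = re.sub(r"\s+", "_", name)  # inner spaces -> underscores
    return name.lower()


columns = ["Time", "[Australian dollar ]", "[US dollar ]", "[Chinese yuan renminbi ]"]
print([simplify_name(c) for c in columns])
# ['time', 'australian_dollar', 'us_dollar', 'chinese_yuan_renminbi']

# DataFrame.rename also accepts a function, applied to every column label.
df = pd.DataFrame(columns=columns)
df = df.rename(columns=simplify_name)
print(list(df.columns))
```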

### Convert the `Time` column to datetime type:

```python
# Values in this column look like 2021-01-08 (year-month-day)
exchange_rates["Time"] = pd.to_datetime(exchange_rates["Time"], format="%Y-%m-%d")
```

```python
exchange_rates["Time"].dtype  # check the changes
```

```
dtype('<M8[ns]')
```
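
The `dtype('<M8[ns]')` output is NumPy's short code for `datetime64[ns]`: `M8` means datetime64, `[ns]` nanosecond resolution and `<` little-endian byte order. Passing `format="%Y-%m-%d"` tells pandas exactly how to read each string instead of guessing. A short sketch on three dates from the dataset; the `.dt` accessor used at the end is only available once the column has a datetime dtype:

```python
import pandas as pd

# Three dates in the same year-month-day layout as the "Time" column.
times = pd.Series(["2021-01-08", "1999-01-04", "2021-01-07"])
parsed = pd.to_datetime(times, format="%Y-%m-%d")

print(parsed.dtype)                                  # a datetime64 dtype
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.dt.year.tolist())                       # [2021, 1999, 2021]
```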

### Sort values by `Time` in ascending order:

Since our goal is to create plots that show how the exchange rate evolves over time, the rows must be in chronological order. The file lists the most recent dates first, so this step is important.

```python
exchange_rates.sort_values("Time", ascending=True, inplace=True)  # sort rows by date, oldest first
exchange_rates.reset_index(drop=True, inplace=True)  # renumber the index (and drop the old one)
```

```python
exchange_rates  # display the changes
```

|  | Time | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | US_dollar | [South African rand ] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
| 1 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
| 2 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
| 3 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
| 4 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5694 | 2021-01-04 | 1.5928 | 1.9558 | 6.3241 | 1.5621 | 1.0811 | 7.9484 | NaN | 26.141 | 7.4379 | ... | 4.8713 | 90.3420 | 10.0895 | 1.6198 | NaN | NaN | 36.7280 | 9.0579 | 1.2296 | 17.9214 |
| 5695 | 2021-01-05 | 1.5927 | 1.9558 | 6.5517 | 1.5651 | 1.0803 | 7.9315 | NaN | 26.227 | 7.4387 | ... | 4.8721 | 91.6715 | 10.0570 | 1.6180 | NaN | NaN | 36.7760 | 9.0694 | 1.2271 | 18.4194 |
| 5696 | 2021-01-06 | 1.5824 | 1.9558 | 6.5119 | 1.5640 | 1.0821 | 7.9653 | NaN | 26.145 | 7.4393 | ... | 4.8720 | 90.8175 | 10.0653 | 1.6246 | NaN | NaN | 36.9210 | 9.0554 | 1.2338 | 18.5123 |
| 5697 | 2021-01-07 | 1.5836 | 1.9558 | 6.5172 | 1.5601 | 1.0833 | 7.9392 | NaN | 26.147 | 7.4392 | ... | 4.8712 | 91.2000 | 10.0575 | 1.6253 | NaN | NaN | 36.8590 | 8.9987 | 1.2276 | 18.7919 |
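
The notebook sorts and then resets the index in two in-place steps. Since pandas 1.0, `sort_values` also accepts `ignore_index=True`, which renumbers the rows in the same call. A small sketch on four rows from the outputs above, checking that both versions agree:

```python
import pandas as pd

# Four real rows, in the file's original order (newest date first).
df = pd.DataFrame({
    "Time": pd.to_datetime(["2021-01-08", "2021-01-07", "1999-01-05", "1999-01-04"]),
    "US_dollar": [1.2250, 1.2276, 1.1790, 1.1789],
})

# Two-step version, as in the notebook: sort, then renumber the index.
two_step = df.sort_values("Time").reset_index(drop=True)

# One-step alternative: ignore_index renumbers the result 0..n-1 directly.
one_step = df.sort_values("Time", ignore_index=True)

print(one_step)
print(one_step.equals(two_step))  # True
```

Both forms return new DataFrames instead of modifying `df` in place, which makes the cleaning steps easier to chain.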


### Isolate `Time` and `US_dollar` in a new DataFrame:

```python
euro_to_dollar = exchange_rates[["Time", "US_dollar"]]
```

```python
euro_to_dollar  # check the new DataFrame
```

|  | Time | US_dollar |
|---|---|---|
| 0 | 1999-01-04 | 1.1789 |
| 1 | 1999-01-05 | 1.1790 |
| 2 | 1999-01-06 | 1.1743 |
| 3 | 1999-01-07 | 1.1632 |
| 4 | 1999-01-08 | 1.1659 |
| ... | ... | ... |
| 5694 | 2021-01-04 | 1.2296 |
| 5695 | 2021-01-05 | 1.2271 |
| 5696 | 2021-01-06 | 1.2338 |
| 5697 | 2021-01-07 | 1.2276 |
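
The double brackets in `exchange_rates[["Time", "US_dollar"]]` matter: a list of labels returns a DataFrame, while a single label returns a Series. A short sketch on two real rows, with the rates still stored as strings. The `.copy()` call is not in the notebook's selection; it is one common way to make the new frame explicitly independent of the original:

```python
import pandas as pd

# Two rows from the outputs above, with the rates still stored as strings.
df = pd.DataFrame({
    "Time": pd.to_datetime(["1999-01-04", "1999-01-05"]),
    "[Swiss franc ]": ["1.6168", "1.6123"],
    "US_dollar": ["1.1789", "1.1790"],
})

subset = df[["Time", "US_dollar"]]  # a list of labels -> DataFrame
column = df["US_dollar"]            # a single label   -> Series
print(type(subset).__name__, type(column).__name__)  # DataFrame Series

# .copy() makes the new frame independent of df, so modifying it does not
# trigger a SettingWithCopyWarning and never touches df.
euro_to_dollar = df[["Time", "US_dollar"]].copy()
euro_to_dollar["US_dollar"] = euro_to_dollar["US_dollar"].astype(float)
print(euro_to_dollar.dtypes)
print(df["US_dollar"].tolist())  # ['1.1789', '1.1790'] (unchanged)
```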


```python
euro_to_dollar.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5699 entries, 0 to 5698
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Time       5699 non-null   datetime64[ns]
 1   US_dollar  5699 non-null   object        
dtypes: datetime64[ns](1), object(1)
memory usage: 89.2+ KB
```


From this output we can see that:
- there are no null values in the `US_dollar` column;
- `US_dollar` still has the `object` type, so we will need to convert it to a numerical type;
- before converting it, we should inspect the unique values in the `US_dollar` column to check for anything unexpected.

```python
euro_to_dollar["US_dollar"].value_counts()
```

```
-         62
1.2276     9
1.1215     8
1.1305     7
1.1797     6
          ..
1.4109     1
1.4594     1
0.8723     1
1.3022     1
1.4302     1
Name: US_dollar, Length: 3528, dtype: int64
```

This output shows that the placeholder `-` appears 62 times in this column. These rows contain no exchange-rate value, so they are not useful for our project and we remove them.
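
`value_counts()` revealed the `-` placeholder because it happens to be the most frequent value; a placeholder that occurred only once would be buried among the 3528 unique values. One alternative check, sketched here on a small hand-made sample (real rates from the outputs above plus two `-` entries), is to try converting the strings with `pd.to_numeric(..., errors="coerce")` and list whatever fails:

```python
import pandas as pd

# Hand-made sample of the column as strings, including the "-" placeholder.
us_dollar = pd.Series(["1.1789", "-", "1.1790", "-", "1.2276"])

# Anything that cannot be parsed as a number becomes NaN.
as_numbers = pd.to_numeric(us_dollar, errors="coerce")
bad_entries = us_dollar[as_numbers.isna()]

print(bad_entries.value_counts())    # "-" with a count of 2
print(as_numbers.dropna().tolist())  # [1.1789, 1.179, 1.2276]
```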

### Drop rows without a rate value in the `US_dollar` column:

```python
euro_to_dollar = euro_to_dollar[euro_to_dollar["US_dollar"] != "-"]  # keep rows with a real rate
```

```python
euro_to_dollar  # check the filtered DataFrame
```

|  | Time | US_dollar |
|---|---|---|
| 0 | 1999-01-04 | 1.1789 |
| 1 | 1999-01-05 | 1.1790 |
| 2 | 1999-01-06 | 1.1743 |
| 3 | 1999-01-07 | 1.1632 |
| 4 | 1999-01-08 | 1.1659 |
| ... | ... | ... |
| 5694 | 2021-01-04 | 1.2296 |
| 5695 | 2021-01-05 | 1.2271 |
| 5696 | 2021-01-06 | 1.2338 |
| 5697 | 2021-01-07 | 1.2276 |


### Convert the `US_dollar` column to float type:

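```python
# Work on an explicit copy: euro_to_dollar was derived from exchange_rates by
# filtering, and modifying such a derived frame triggers a SettingWithCopyWarning.
euro_to_dollar = euro_to_dollar.copy()
# Assign the whole column rather than via .loc[:, ...], which recent pandas may write in place (keeping object dtype).
euro_to_dollar["US_dollar"] = euro_to_dollar["US_dollar"].astype(float)
euro_to_dollar.info()
```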

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5637 entries, 0 to 5698
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Time       5637 non-null   datetime64[ns]
 1   US_dollar  5637 non-null   float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 132.1 KB
```
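
For reference, the cleaning steps of this section can be chained into one function. This is a sketch, not the notebook's code: `load_euro_to_dollar` is a hypothetical name, the CSV is a tiny in-memory stand-in with the same header layout as `euro-daily-hist_1999_2020.csv` (three columns only, with one rate replaced by `-` for illustration), and it uses `ignore_index=True` and an explicit `.copy()` instead of `inplace=True`.

```python
import io

import pandas as pd


def load_euro_to_dollar(source) -> pd.DataFrame:
    """Hypothetical helper chaining this section's cleaning steps."""
    rates = pd.read_csv(source)
    # Raw string keeps the backslash in "Period\Unit:" literal.
    rates = rates.rename(columns={"[US dollar ]": "US_dollar", r"Period\Unit:": "Time"})
    rates["Time"] = pd.to_datetime(rates["Time"], format="%Y-%m-%d")
    rates = rates.sort_values("Time", ignore_index=True)  # oldest date first

    # Keep only Time and US_dollar, dropping rows whose rate is the "-" placeholder.
    has_rate = rates["US_dollar"] != "-"
    euro_to_dollar = rates.loc[has_rate, ["Time", "US_dollar"]].copy()
    euro_to_dollar["US_dollar"] = euro_to_dollar["US_dollar"].astype(float)
    return euro_to_dollar


# In-memory stand-in for the CSV file, newest date first like the real file.
sample_csv = io.StringIO(
    "Period\\Unit:,[Swiss franc ],[US dollar ]\n"
    "2021-01-08,1.0827,1.2250\n"
    "2021-01-07,1.0833,-\n"
    "1999-01-04,1.6168,1.1789\n"
)

euro_to_dollar = load_euro_to_dollar(sample_csv)
print(euro_to_dollar)
print(euro_to_dollar.dtypes)
```

As in the notebook, the index is not reset after dropping the `-` rows, so it keeps gaps where those rows used to be.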
