### <span style="color:black"><b>Pandas Tutorial 11</b></span>

<ins>Tidy Data</ins>

Definitions of tidy data vary, but most people agree that a tidy dataset has the following characteristics:

* Each variable has its own column.
* Each observation is its own row.
* Each value has its own cell.

Pandas has a DataFrame method, `df.melt()`, that helps with this by reshaping a wide table (one column per year or date, say) into a long one.
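As a minimal sketch of what `melt()` does, the frame below is a made-up wide table. The names `wide` and `tidy`, the students, and the scores are illustrative assumptions, not part of this tutorial's data.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per test.
wide = pd.DataFrame({
    "Student": ["Ana", "Ben"],
    "Test 1": [70, 85],
    "Test 2": [90, 60],
})

# 'Student' stays as an identifier column. Every other column header becomes
# a value in the new 'Test' column, and its cell moves into 'Score'.
tidy = wide.melt(id_vars=["Student"], var_name="Test", value_name="Score")
print(tidy)
```

Each row of `tidy` is now one observation (one student sitting one test), which is the shape the three characteristics above describe.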

Why bother?

---

Here is what [Hadley Wickham](http://hadley.nz/) (a major contributor to the R programming language) has to say:
* There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity.
* Tidy datasets are easy to manipulate, model and visualise, and have a specific structure.
* Tidy datasets provide a standardised way to link the structure of a dataset (its physical layout) with its semantics (its meaning).

---

[Easily my favourite resource on tidy data](https://vita.had.co.nz/papers/tidy-data.pdf)



In [19]:
import pandas as pd

In [20]:
f_rates = pd.read_csv('fertility_rates.csv')
f_rates.head()

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Aruba,ABW,4.82,4.655,4.471,4.271,4.059,3.842,3.625,3.417,...,1.779,1.795,1.813,1.834,1.854,1.872,1.886,1.896,1.901,
1,Africa Eastern and Southern,AFE,6.723308,6.738651,6.752818,6.7654,6.775406,6.783357,6.789885,6.79604,...,4.956842,4.882058,4.804516,4.72622,4.647637,4.569675,4.493744,4.420264,4.349433,
2,Afghanistan,AFG,7.45,7.45,7.45,7.45,7.45,7.45,7.45,7.45,...,5.77,5.562,5.359,5.163,4.976,4.8,4.633,4.473,4.321,
3,Africa Western and Central,AFW,6.439002,6.455523,6.471399,6.487246,6.502619,6.51905,6.537615,6.560078,...,5.557872,5.503781,5.446144,5.384336,5.319473,5.251674,5.182391,5.113003,5.044144,
4,Angola,AGO,6.708,6.79,6.872,6.954,7.036,7.116,7.194,7.267,...,6.12,6.039,5.953,5.864,5.774,5.686,5.6,5.519,5.442,


In [22]:
f_rates.melt(id_vars=['Country Name', 'Country Code'], var_name='Year', value_name='Fertility Rates')

Unnamed: 0,Country Name,Country Code,Year,Fertility Rates
0,Aruba,ABW,1960,4.820000
1,Africa Eastern and Southern,AFE,1960,6.723308
2,Afghanistan,AFG,1960,7.450000
3,Africa Western and Central,AFW,1960,6.439002
4,Angola,AGO,1960,6.708000
...,...,...,...,...
16221,Kosovo,XKX,2020,
16222,"Yemen, Rep.",YEM,2020,
16223,South Africa,ZAF,2020,
16224,Zambia,ZMB,2020,
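Two follow-up steps are often useful after a melt like this one: the `Year` values come from column headers, so they are strings, and years with no data (2020 here) survive as empty rows. The sketch below uses a two-row stand-in built from values in the `head()` output above; the names `wide` and `tidy` are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for the fertility table. The 1960 values are copied from the
# head() output above; the 2020 column is empty there, so it is NaN here.
wide = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "Country Code": ["ABW", "AGO"],
    "1960": [4.82, 6.708],
    "2020": [float("nan"), float("nan")],
})

tidy = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="Fertility Rates",
)

# The years were column headers (strings), so convert them to integers.
tidy["Year"] = tidy["Year"].astype(int)

# Drop observations with no recorded rate (every 2020 row in this stand-in).
tidy = tidy.dropna(subset=["Fertility Rates"])
print(tidy)
```

Whether to drop the missing rows depends on the analysis; keeping them is equally valid if the gaps themselves matter.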


In [23]:
df = pd.read_csv('exchange_rates.csv')
df.head()

Unnamed: 0,Currency,20 Aug 2021,23 Aug 2021,24 Aug 2021,25 Aug 2021
0,United States dollar,0.7133,0.7161,0.7234,0.7245
1,Chinese renminbi,4.6375,4.6494,4.6869,4.6925
2,Japanese yen,78.24,78.74,79.45,79.48
3,European euro,0.6103,0.6112,0.6161,0.617
4,South Korean won,841.75,840.36,843.81,845.73


In [28]:
df_melted = df.melt(id_vars=['Currency'], var_name='Date', value_name='Units of foreign currency per Australian Dollar')
df_melted

Unnamed: 0,Currency,Date,Units of foreign currency per Australian Dollar
0,United States dollar,20 Aug 2021,0.7133
1,Chinese renminbi,20 Aug 2021,4.6375
2,Japanese yen,20 Aug 2021,78.2400
3,European euro,20 Aug 2021,0.6103
4,South Korean won,20 Aug 2021,841.7500
...,...,...,...
67,Vietnamese dong,25 Aug 2021,16526.0000
68,Hong Kong dollar,25 Aug 2021,5.6398
69,Papua New Guinea kina,25 Aug 2021,2.5421
70,Swiss franc,25 Aug 2021,0.6623


In [30]:
df_melted['Date'] = pd.to_datetime(df_melted['Date'])
df_melted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 3 columns):
 #   Column                                           Non-Null Count  Dtype         
---  ------                                           --------------  -----         
 0   Currency                                         72 non-null     object        
 1   Date                                             72 non-null     datetime64[ns]
 2   Units of foreign currency per Australian Dollar  72 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 1.8+ KB


In [31]:
df_melted

Unnamed: 0,Currency,Date,Units of foreign currency per Australian Dollar
0,United States dollar,2021-08-20,0.7133
1,Chinese renminbi,2021-08-20,4.6375
2,Japanese yen,2021-08-20,78.2400
3,European euro,2021-08-20,0.6103
4,South Korean won,2021-08-20,841.7500
...,...,...,...
67,Vietnamese dong,2021-08-25,16526.0000
68,Hong Kong dollar,2021-08-25,5.6398
69,Papua New Guinea kina,2021-08-25,2.5421
70,Swiss franc,2021-08-25,0.6623
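`pd.to_datetime` inferred the date layout on its own here. When the layout is known, passing it explicitly through the `format` parameter makes the intent clearer and avoids ambiguous day/month guesses. A small sketch, assuming the `20 Aug 2021` style shown above and English month abbreviations (the `dates` and `parsed` names are illustrative):

```python
import pandas as pd

# Date strings in the same style as the melted 'Date' column.
dates = pd.Series(["20 Aug 2021", "23 Aug 2021", "25 Aug 2021"])

# %d = day of month, %b = abbreviated month name, %Y = four-digit year.
parsed = pd.to_datetime(dates, format="%d %b %Y")
print(parsed)
```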


In [26]:
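# The wide frame has no 'Date' column (the dates are still column headers),
# so this lookup fails with a KeyError. Melt first, as done for df_melted.
df['Date'] = pd.to_datetime(df['Date'])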
df

KeyError: 'Date'

<u>**Example 1**</u>

In [None]:
df1 = pd.read_csv("gdp_stats.csv")
df1

In [None]:
df1 = df1.melt(id_vars=['Country Name'], var_name='Year', value_name='GDP Growth Percentage')

In [None]:
df1

<u>**Example 2**</u>

In [None]:
df2 = pd.read_csv("exchange_rates.csv")
df2

In [None]:
df2 = df2.melt(id_vars='Currency', var_name='Date', value_name='Units of Foreign Currency per Australian Dollar')

In [None]:
# With the dates in a single column, we can now convert them to datetime
df2['Date'] = pd.to_datetime(df2['Date'])
df2
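Once the data is tidy and `Date` is a real datetime, ordinary filtering and grouping work directly on the columns. The sketch below uses a four-row stand-in with values taken from the outputs earlier in this tutorial; the short column name `Rate` and the variable names are assumptions made to keep the example compact.

```python
import pandas as pd

# Stand-in for the melted exchange-rate frame (two currencies, two dates).
tidy = pd.DataFrame({
    "Currency": ["United States dollar", "Japanese yen"] * 2,
    "Date": pd.to_datetime(
        ["2021-08-20", "2021-08-20", "2021-08-25", "2021-08-25"]
    ),
    "Rate": [0.7133, 78.24, 0.7245, 79.48],
})

# Filter rows by date with a plain comparison on the datetime column.
latest = tidy[tidy["Date"] >= pd.Timestamp("2021-08-25")]

# Average rate per currency across all dates.
mean_rate = tidy.groupby("Currency")["Rate"].mean()

print(latest)
print(mean_rate)
```

Neither step would be this direct on the wide layout, where each date is a separate column.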