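Let's load our data and do a little cleanup. The file has a two-row header: column names on the first row and HXL tags such as #date+start on the second. Because of that, pandas reads the columns as a MultiIndex. There are two date columns, StartDate and EndDate, which we parse as dates. Four columns (Iso3, Area Code, Area Code (M49) and Area) hold the same value in every row, because the whole file covers the Philippines, so we drop them. Finally, we keep only the first header level as the column names, which makes the data easier to work with.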

In [1]:
import pandas as pd

DATA = '/kaggle/input/philippines-food-security-and-nutrition-indicators/suite-of-food-security-indicators_phl.csv'
df = pd.read_csv(filepath_or_buffer=DATA, header=[0, 1],
                 parse_dates=[('StartDate', '#date+start'), ('EndDate', '#date+end')])
# drop the four columns that hold a single value for the whole file
df = df.drop(columns=[('Iso3', '#country+code'), ('Area Code', 'Unnamed: 3_level_1'),
                      ('Area Code (M49)', 'Unnamed: 4_level_1'), ('Area', '#country+name')])
# keep only the first header row (the names) as the column labels
df.columns = df.columns.get_level_values(0)
df.head()

,StartDate,EndDate,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag,Note
0,2000-01-01,2002-12-31,21010,Average dietary energy supply adequacy (percen...,6121,Value,20002002,2002,%,109.0,E,
1,2001-01-01,2003-12-31,21010,Average dietary energy supply adequacy (percen...,6121,Value,20012003,2003,%,109.0,E,
2,2002-01-01,2004-12-31,21010,Average dietary energy supply adequacy (percen...,6121,Value,20022004,2004,%,110.0,E,
3,2003-01-01,2005-12-31,21010,Average dietary energy supply adequacy (percen...,6121,Value,20032005,2005,%,111.0,E,
4,2004-01-01,2006-12-31,21010,Average dietary energy supply adequacy (percen...,6121,Value,20042006,2006,%,112.0,E,
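The cell above lists the four constant columns by hand. Below is a minimal sketch of finding them programmatically instead. It runs on a tiny invented CSV that copies the file's two-row header layout. drop_constant_columns is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd

# A tiny stand-in for the real file (invented rows): a name row, then an HXL tag row.
CSV = """Iso3,StartDate,Item Code,Area,Value
#country+code,#date+start,,#country+name,
PHL,2000-01-01,21010,Philippines,109.0
PHL,2001-01-01,21010,Philippines,109.0
PHL,2000-01-01,21031,Philippines,40.0
"""


def drop_constant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return frame without the columns that hold one distinct value (NaN counts as a value)."""
    constant = [col for col in frame.columns if frame[col].nunique(dropna=False) == 1]
    return frame.drop(columns=constant)


raw = pd.read_csv(io.StringIO(CSV), header=[0, 1])
clean = drop_constant_columns(raw)
# Flatten the MultiIndex to its first level, then parse the date column.
clean.columns = clean.columns.get_level_values(0)
clean['StartDate'] = pd.to_datetime(clean['StartDate'])
print(list(clean.columns))  # ['StartDate', 'Item Code', 'Value']
```

This version behaves differently from the hand-written list in one way. Because nunique(dropna=False) counts NaN as a value, a column that is empty in every row also counts as constant. If Note is blank throughout the real file, this helper would drop it as well, whereas the cell above keeps it.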


How many items do we have?

In [2]:
df['Item'].nunique()

49

That's 49 distinct series, so if we plot them all on one graph we'll need a lot of colors that are easy to tell apart.
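Why does sampling a cyclic colorscale such as HSV give distinguishable colors? The rough sketch below uses the standard library's colorsys module, not plotly's own implementation. It spaces n hues evenly around the hue wheel and stops short of hue 1.0, because on a cyclic scale hue 1.0 wraps back to the same red as hue 0.0. distinct_hues is a hypothetical helper.

```python
import colorsys


def distinct_hues(n: int) -> list[str]:
    """Return n hex colors spaced evenly around the HSV hue wheel.

    Hues run from 0 up to, but not including, 1: on a cyclic scale hue 1.0
    is the same red as hue 0.0, so including it would repeat a color.
    """
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)  # full saturation and brightness
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = distinct_hues(49)  # one color per series
print(len(set(palette)))
```

Neighbouring hues differ by only 1/49 of the wheel. Every color is distinct, but adjacent series can still look alike, which is worth keeping in mind when reading a 49-line plot.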

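Let's make an exploratory plot. We have many series and relatively few years of data, so a line plot is a good way to see year-over-year changes. We also put the y-axis on a log scale. Our series are measured in different units (percentages, kcal/cap/day and so on) and sit at very different magnitudes, so on a linear axis the small-valued series would be squashed flat.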
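To make the scale argument concrete, the sketch below measures how much of a shared y-axis each series would cover on a linear axis versus a log axis. Two of the series reuse values from the tables in this notebook. The third, in the thousands, is invented to stand in for a large-magnitude indicator. axis_share is a hypothetical helper.

```python
import math

# Two series echo values from the tables above; 'thousands' is made up.
series = {
    'percent': [109.0, 110.0, 112.0],
    'kcal/cap/day': [17.0, 25.0, 40.0],
    'thousands': [15000.0, 16000.0, 17000.0],
}


def axis_share(values, lo, hi, transform):
    """Fraction of an axis running from lo to hi that values cover after transform."""
    covered = transform(max(values)) - transform(min(values))
    return covered / (transform(hi) - transform(lo))


everything = [v for values in series.values() for v in values]
lo, hi = min(everything), max(everything)  # the shared y-axis range
for name, values in series.items():
    linear = axis_share(values, lo, hi, lambda x: x)
    log = axis_share(values, lo, hi, math.log10)
    print(f'{name:>13}: linear {linear:.3f}, log {log:.3f}')
```

On the linear axis, the kcal series covers about 0.1% of the height and looks flat. On the log axis it covers about 12%, because a log axis shows relative change: 17 to 40 is more than a doubling, while 15,000 to 17,000 is roughly a 13% rise.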

In [3]:
from plotly import colors
from plotly import express
from plotly.offline import init_notebook_mode

init_notebook_mode(connected=True)

# color by Item Code; sample 50 evenly spaced colors from the cyclic HSV scale for our 49 series
express.line(data_frame=df, x='Year', y='Value', color='Item Code',
             color_discrete_sequence=colors.sample_colorscale('HSV', 50),
             log_y=True, height=900).show(renderer='iframe_connected')

What do we see? Most of our variables have only partial series, and most of them change very little from year to year.
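One way to put a number on "only partial series" is to pivot the long table to one row per item and one column per year, then count the non-missing cells. Below is a minimal sketch on a small hypothetical frame with the notebook's Item Code, Year and Value columns. The codes and values are invented.

```python
import pandas as pd

# Hypothetical long-format data with the same three columns the plot uses.
sample = pd.DataFrame({
    'Item Code': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'Year': [2000, 2001, 2002, 2003, 2002, 2003, 2003],
    'Value': [10.0, 10.5, 10.4, 10.6, 50.0, 51.0, 7.0],
})

# One row per item, one column per year; years an item lacks become NaN.
wide = sample.pivot(index='Item Code', columns='Year', values='Value')
coverage = wide.notna().sum(axis=1)           # years observed per item
partial = coverage[coverage < wide.shape[1]]  # items missing at least one year
print(coverage.to_dict())
print(partial.index.tolist())
```

On the real df, pivot_table would be the safer choice if any Item Code and Year pair repeats, because pivot raises an error on duplicate index/column pairs.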

In [4]:
df[df['Item Code'] == '21031'].head()

,StartDate,EndDate,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag,Note
574,2000-01-01,2000-12-31,21031,Per capita food supply variability (kcal/cap/day),6128,Value,2000,2000,kcal/cap/d,40.0,E,
575,2001-01-01,2001-12-31,21031,Per capita food supply variability (kcal/cap/day),6128,Value,2001,2001,kcal/cap/d,40.0,E,
576,2002-01-01,2002-12-31,21031,Per capita food supply variability (kcal/cap/day),6128,Value,2002,2002,kcal/cap/d,29.0,E,
577,2003-01-01,2003-12-31,21031,Per capita food supply variability (kcal/cap/day),6128,Value,2003,2003,kcal/cap/d,25.0,E,
578,2004-01-01,2004-12-31,21031,Per capita food supply variability (kcal/cap/day),6128,Value,2004,2004,kcal/cap/d,17.0,E,


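The only variable that changes much from year to year is Per capita food supply variability (kcal/cap/day). That is fitting, since it measures variability itself. In the rows above it drops from 40 to 17 kcal/cap/day between 2000 and 2004.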
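To back this up with a number, we can rank series by their mean absolute year-over-year change. Because the units differ, we use relative rather than absolute change. The sketch below uses only the ten rows printed in this notebook for items 21010 and 21031. volatility is our own name, not a field in the dataset.

```python
import pandas as pd

# The rows shown for Item Code 21010 (3-year windows, labeled by end year)
# and 21031 (single years), reduced to the columns we need.
sample = pd.DataFrame({
    'Item Code': ['21010'] * 5 + ['21031'] * 5,
    'Year': list(range(2002, 2007)) + list(range(2000, 2005)),
    'Value': [109.0, 109.0, 110.0, 111.0, 112.0, 40.0, 40.0, 29.0, 25.0, 17.0],
}).sort_values(['Item Code', 'Year'])

by_item = sample.groupby('Item Code')['Value']
# Relative change from the previous year within each series (NaN for the first year).
sample['yoy'] = by_item.diff() / by_item.shift()
# Mean absolute relative change per series, largest first.
volatility = (
    sample['yoy'].abs()
    .groupby(sample['Item Code'])
    .mean()
    .sort_values(ascending=False)
)
print(volatility)
```

On these rows, 21031 moves by about 18% a year on average and 21010 by under 1%. Part of the gap may come from the windows themselves. Each 21010 row spans three years (2000-01-01 to 2002-12-31, and so on), and a value taken over a three-year window will typically move less than a single-year value like those for 21031.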