<img src="https://nyp-aicourse.s3.ap-southeast-1.amazonaws.com/agods/nyp_ago_logo.png" width='300'/>

Welcome to the lab! Before we get started, here are a few pointers on using this notebook.

1. The notebook is composed of cells; cells can contain code which you can run, or they can hold text and/or images which are there for you to read.

2. You can execute code cells by clicking the ```Run``` icon in the menu, or via the following keyboard shortcuts: ```Shift-Enter``` (run and advance) or ```Ctrl-Enter``` (run and stay in the current cell).

3. To interrupt cell execution, click the ```Stop``` button on the toolbar, or navigate to the ```Kernel``` menu and select ```Interrupt```.
    

# Long format vs Wide format

We often encounter data that has been collected in either long format (long-form data) or wide format (wide-form data).

<img src='images/long_form.png' />

**Long Format** 

The dataframe on the left has a long format. The ‘Series ID’ and ‘Item’ columns represent the food category. The ‘Year Month’ is a single column that has all the months from Jan. 2020 to Apr. 2022, and the ‘Avg. Price ($)’ has a value corresponding to each month in the ‘Year Month’ column.

Notice how the dataframe on the left is structured in a long format: each food category (‘Item’) has multiple repeating rows, each of which represents a specific year/month and the average food price corresponding to that year/month. Though we only have 5 food categories (‘Item’ values), we have a total of 139 rows, giving the dataframe a ‘long’ shape.

A long-form data table has the following characteristics:
- Each variable is a column
- Each observation is a row

**Wide format**

In contrast, the dataframe on the right-hand side has a wide format, more like a spreadsheet. In this format, each row represents a unique food category. We pivot the ‘Year Month’ column of the left dataframe so that each month gets its own column, giving the right dataframe a ‘wide’ shape. The values of the ‘Year Month’ column in the left table become the column names in the right table, and each of these month columns holds the corresponding ‘Avg. Price ($)’.

The variables in this dataset are linked to the dimensions of the table, rather than to named fields. 
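To make the two layouts concrete, here is a minimal sketch that builds the same information in both formats. The item names and prices are made-up toy values, not taken from the lab dataset.

```python
import pandas as pd

# Hypothetical toy data: two items observed over three months (long format).
long_df = pd.DataFrame({
    "Item": ["Bread", "Bread", "Bread", "Eggs", "Eggs", "Eggs"],
    "Year Month": ["2020 Jan", "2020 Feb", "2020 Mar"] * 2,
    "Avg. Price ($)": [1.35, 1.37, 1.37, 1.46, 1.45, 1.53],
})

# The same information in wide format: one row per item, one column per month.
wide_df = pd.DataFrame({
    "Item": ["Bread", "Eggs"],
    "2020 Jan": [1.35, 1.46],
    "2020 Feb": [1.37, 1.45],
    "2020 Mar": [1.37, 1.53],
})

print(long_df.shape)  # one row per (item, month) pair
print(wide_df.shape)  # one row per item
```

The long table grows by one row for every new month, while the wide table grows by one column.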

## Convert Long-form to Wide-form

We will read in a dataset that is collected in long format and learn to convert it to wide format.


In [None]:
import pandas as pd

df = pd.read_csv("datasets/long_data.csv")
df.head()

To reshape the dataframe from long to wide, we can use the pandas `pd.pivot()` function:

`pd.pivot(df, index=, columns=, values=)`

`columns`: Column to use to make new frame’s columns (e.g., ‘Year Month’).

`values`: Column(s) to use for populating new frame’s values (e.g., ‘Avg. Price ($)’).

`index`: Column to use to make new frame’s index (e.g., ‘Series ID’ and ‘Item’). If None, use the existing index.
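As a small, self-contained illustration of how the three arguments fit together, the sketch below pivots a toy dataframe. The data and the names `toy_long` and `toy_wide` are hypothetical, and passing a list to `index` assumes pandas 1.1 or later. Note that `pd.pivot()` raises a `ValueError` if any combination of `index` and `columns` values appears more than once.

```python
import pandas as pd

# Hypothetical long-format data: each (Series ID, Item, Year Month) is unique.
toy_long = pd.DataFrame({
    "Series ID": ["A1", "A1", "B2", "B2"],
    "Item": ["Bread", "Bread", "Eggs", "Eggs"],
    "Year Month": ["2020 Jan", "2020 Feb", "2020 Jan", "2020 Feb"],
    "Avg. Price ($)": [1.35, 1.37, 1.46, 1.45],
})

toy_wide = pd.pivot(
    toy_long,
    index=["Series ID", "Item"],   # these columns become the row index
    columns="Year Month",          # each unique value becomes a column
    values="Avg. Price ($)",       # these values fill the cells
)
print(toy_wide)
```

The result has one row per (‘Series ID’, ‘Item’) pair and one column per month.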

In [None]:
df_wide = pd.pivot(df, index=['Series ID', 'Item'], columns='Year Month', values='Ave Price')
df_wide.head()

## Convert Wide-form to Long-form

To convert from wide-form to long-form, we can use pandas [`melt()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html) to unpivot a dataframe from wide to long:

`pd.melt(df, id_vars=, value_vars=, var_name=, value_name=, ignore_index=)`

`id_vars`: Column(s) to use as identifier variables.

`value_vars`: Column(s) to unpivot. In our example, it would be the list of year/month columns (‘2020 Jan’, ‘2020 Feb’, ‘2020 Mar’, etc.).

`var_name`: Name to use for the ‘variable’ column.

`value_name`: Name to use for the ‘value’ column.

`ignore_index`: If ‘True’, the original index is ignored. If ‘False’, the original index is retained.
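The parameters above can be sketched on a toy wide dataframe as follows. The data and the names `toy_wide` and `toy_long` are hypothetical and only illustrate what each argument controls.

```python
import pandas as pd

# Hypothetical wide-format data: one row per item, one column per month.
toy_wide = pd.DataFrame({
    "Item": ["Bread", "Eggs"],
    "2020 Jan": [1.35, 1.46],
    "2020 Feb": [1.37, 1.45],
})

toy_long = pd.melt(
    toy_wide,
    id_vars=["Item"],                     # kept as-is and repeated per month
    value_vars=["2020 Jan", "2020 Feb"],  # columns to unpivot into rows
    var_name="Year Month",                # column that holds the old column names
    value_name="Avg. Price ($)",          # column that holds the cell values
    ignore_index=True,                    # discard the original index and renumber rows
)
print(toy_long)
```

Each month column contributes one row per item, so two items and two months give four rows.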

In [None]:
year_list = list(df_wide.columns)
print(year_list)

In [None]:
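# ignore_index=False keeps the ('Series ID', 'Item') index of df_wide.
# var_name is not given, so melt uses the columns' name ('Year Month').
df_long = pd.melt(df_wide, value_vars=year_list, value_name='Ave Price $', ignore_index=False)
df_long.head(10)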

## Exercise 

The dataset below is somewhere between long and wide format. Convert the data into completely long format.

In [None]:
data = pd.read_csv("datasets/faang.csv")
data.head()

In [None]:
# TODO: Convert to long form