# Reshaping DataFrames Using Pandas




## Outline
* Pivoting dataframes
* Melting dataframes





## Pivoting DataFrames

The `pivot()` method reshapes a DataFrame according to the values of a chosen index column and a chosen columns field. It lets us specify which column to use as the index and which column supplies the column labels of the new DataFrame.

The pivot() method takes three parameters:

- **index**: Which column should be used to identify and order your rows vertically
- **columns**: Which column should be used to create the new columns in our reshaped DataFrame. Each unique value in the column stated here will create a column in our new DataFrame.
- **values**: Which column(s) should be used to fill the values in the cells of our DataFrame.

In [1]:
import pandas as pd

df = pd.DataFrame({'year': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'average': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'student_name': ["John", "Alex", "Teresa", "Amber", "Joe", "Mary"],
                   'age': ['18', '19', '20', '21', '22', '23']})
df.head(6)

,year,average,student_name,age
0,one,A,John,18
1,one,B,Alex,19
2,one,C,Teresa,20
3,two,A,Amber,21
4,two,B,Joe,22
5,two,C,Mary,23


In [2]:
df.pivot(index = 'year', columns = 'average', values = 'student_name')

average,A,B,C
year,,,
one,John,Alex,Teresa
two,Amber,Joe,Mary


In [3]:
df.pivot(index = 'year', columns = 'average', values = 'age')

average,A,B,C
year,,,
one,18,19,20
two,21,22,23
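
The `values` parameter is described above as accepting one or more columns. The sketch below rebuilds the same toy `df` and passes a list to `values`, and it also shows what happens when an index/columns pair is repeated. The names `wide`, `dup`, and `failed` are illustrative only and not part of the original notebook.

```python
import pandas as pd

# Same toy data as in the cells above.
df = pd.DataFrame({
    "year": ["one", "one", "one", "two", "two", "two"],
    "average": ["A", "B", "C", "A", "B", "C"],
    "student_name": ["John", "Alex", "Teresa", "Amber", "Joe", "Mary"],
    "age": ["18", "19", "20", "21", "22", "23"],
})

# Passing a list to `values` yields two-level column labels:
# (value column, unique entry from the `columns` field).
wide = df.pivot(index="year", columns="average", values=["student_name", "age"])
print(wide.columns.tolist())

# Select one block of the result by its top-level label.
print(wide["age"])

# pivot() needs every (index, columns) pair to be unique;
# here the first row is duplicated on purpose to trigger the error.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
failed = False
try:
    dup.pivot(index="year", columns="average", values="age")
except ValueError as err:
    failed = True
    print("pivot failed:", err)
```

When duplicates are expected, `pivot_table()` with an aggregation function is the usual alternative, since `pivot()` itself does no aggregation.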


## Melting DataFrames


The pandas `DataFrame.melt()` method unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

This method is useful for massaging a DataFrame into a format where one or more columns are *identifier variables* (`id_vars`), while all other columns, considered *measured variables* (`value_vars`), are "unpivoted" to the row axis, leaving just two non-identifier columns, `variable` and `value`.

The data below has a different column for each variable, which is known as "wide format". Use `DataFrame.melt()` to convert it to long format, where the resulting table has a `variable` column containing the variable name and a `value` column containing the value of that variable.

Pick the identifying columns for the `id_vars` argument, and include all variables you want unpivoted in `value_vars`.

In [4]:
# Build a sample DataFrame. In the next cell, melt() uses "Job Position" as the identifier variable and "Start Date" as the value variable.

df2 = pd.DataFrame({"Job Position":['Data Scientist', 'Professor', 'Business Analyst', 'Computer Engineer'],  
                   "Salary":[130, 110, 90, 100],  
                   "Start Date":[2017, 2019, 2020, 2019],  
                   "Performance Rating":[4, 3.5, 3.7, 4]}) 

df2 

,Job Position,Salary,Start Date,Performance Rating
0,Data Scientist,130,2017,4.0
1,Professor,110,2019,3.5
2,Business Analyst,90,2020,3.7
3,Computer Engineer,100,2019,4.0


In [5]:
df2.melt(id_vars = ["Job Position"], value_vars = ["Start Date"])

,Job Position,variable,value
0,Data Scientist,Start Date,2017
1,Professor,Start Date,2019
2,Business Analyst,Start Date,2020
3,Computer Engineer,Start Date,2019


`melt()` will also assume all columns are `value_vars` if they aren't included as `id_vars`. Use `var_name` and `value_name` to control the names of the output dataframe columns.
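
As a quick illustration of the default described above, the following sketch rebuilds the same `df2` and calls `melt()` with only `id_vars`. The name `long_all` is made up for this example.

```python
import pandas as pd

# Same sample data as df2 above.
df2 = pd.DataFrame({
    "Job Position": ["Data Scientist", "Professor", "Business Analyst", "Computer Engineer"],
    "Salary": [130, 110, 90, 100],
    "Start Date": [2017, 2019, 2020, 2019],
    "Performance Rating": [4, 3.5, 3.7, 4],
})

# No value_vars given: every column other than "Job Position" is unpivoted.
long_all = df2.melt(id_vars=["Job Position"])

print(long_all["variable"].unique())  # the three melted column names
print(long_all.shape)                 # 4 original rows x 3 melted columns = 12 rows
```

Because the melted columns mix integers and floats, the single `value` column is upcast to a common dtype.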

In [6]:
df2.head()

,Job Position,Salary,Start Date,Performance Rating
0,Data Scientist,130,2017,4.0
1,Professor,110,2019,3.5
2,Business Analyst,90,2020,3.7
3,Computer Engineer,100,2019,4.0


In [9]:
# Unpivot the DataFrame and give the variable and value columns custom names

df2_melt = df2.melt(id_vars=["Job Position"],
                    value_vars=["Salary", "Start Date"],
                    var_name="Variable Column",
                    value_name="Value Column")

df2_melt

,Job Position,Variable Column,Value Column
0,Data Scientist,Salary,130
1,Professor,Salary,110
2,Business Analyst,Salary,90
3,Computer Engineer,Salary,100
4,Data Scientist,Start Date,2017
5,Professor,Start Date,2019
6,Business Analyst,Start Date,2020
7,Computer Engineer,Start Date,2019


Predictably, the resulting dataframe has a row count equal to the original number of rows multiplied by the number of features included as `value_vars`.

In [None]:
if len(df2_melt) == len(df2) * 2:  # two columns ("Salary", "Start Date") were passed as value_vars
    print('okay!')
else:
    print('ERROR')
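
Since `pivot()` goes from long to wide and `melt()` goes from wide to long, the two can be chained as a round trip. The sketch below is one way to do that under the assumption that every `Job Position` value is unique; `restored` is a hypothetical name introduced here.

```python
import pandas as pd

# Same sample data and melt call as above.
df2 = pd.DataFrame({
    "Job Position": ["Data Scientist", "Professor", "Business Analyst", "Computer Engineer"],
    "Salary": [130, 110, 90, 100],
    "Start Date": [2017, 2019, 2020, 2019],
    "Performance Rating": [4, 3.5, 3.7, 4],
})
df2_melt = df2.melt(id_vars=["Job Position"],
                    value_vars=["Salary", "Start Date"],
                    var_name="Variable Column",
                    value_name="Value Column")

# Pivot the long table back to wide: one row per job, one column per variable.
restored = df2_melt.pivot(index="Job Position",
                          columns="Variable Column",
                          values="Value Column")

# Turn the index back into an ordinary column and drop the leftover axis name.
restored = restored.reset_index()
restored.columns.name = None
print(restored)
```

The row order of `restored` may differ from `df2`, and `Performance Rating` is absent because it was never melted.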

## Summary

- We can reshape a DataFrame with the `pivot()` method.
- We can convert a DataFrame from wide to long format with `melt()`.

## Pandas Stack

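`DataFrame.stack()` moves one or more levels of column labels into the innermost level of the row index. The examples below follow the pandas reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html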

In [13]:
df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                    index=['cat', 'dog'],
                                    columns=['weight', 'height'])

df_single_level_cols.head()

,weight,height
cat,0,1
dog,2,3


In [15]:
df_stacked = df_single_level_cols.stack()

# With single-level columns, the result is a Series
df_stacked.head()

cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

In [18]:
multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)

In [19]:
df_multi_level_cols2

,weight,height
,kg,m
cat,1.0,2.0
dog,3.0,4.0


In [20]:
df_multi_level_cols2.stack(0)

# Stacking only one of the two column levels gives a DataFrame

,,kg,m
cat,height,,2.0
cat,weight,1.0,
dog,height,,4.0
dog,weight,3.0,


In [21]:
df_multi_level_cols2.stack([0, 1])

# Stacking both column levels gives a Series

cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64