# Rearranging and reshaping data
Here, you will learn how to reshape your DataFrames by pivoting, melting, stacking, and unstacking. These techniques let you tidy and rearrange your data into the format that is easiest to analyze.

# 1. Pivoting DataFrames
## 1.1 Pivoting a single variable
Suppose you started a blog for a band, and you would like to log how many visitors you have had and how many signed up for your newsletter. To help plan the tours later, you track where the visitors are. A DataFrame called `users` containing this information has been pre-loaded for you.

Inspect `users` in the IPython Shell and make a note of which variable you want to use to index the rows (`'weekday'`), which variable you want to use to index the columns (`'city'`), and which variable will populate the values in the cells (`'visitors'`). Try to visualize what the result should be.

For example, in the video, Dhavide used `'treatment'` to index the rows, `'gender'` to index the columns, and `'response'` to populate the cells. Prior to pivoting, the DataFrame looked like this:
```
   id treatment gender  response
0   1         A      F         5
1   2         A      M         3
2   3         B      F         8
3   4         B      M         9
```
After pivoting:
```
gender     F  M
treatment      
A          5  3
B          8  9
```
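As a minimal sketch, the pivot shown above can be reproduced as follows. The variable names `trials` and `trials_pivot` are assumptions made for illustration; the data is typed in by hand from the table above.

```python
import pandas as pd

# Hypothetical reconstruction of the small DataFrame printed above
trials = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'treatment': ['A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'F', 'M'],
    'response': [5, 3, 8, 9],
})

# Rows come from 'treatment', columns from 'gender', cell values from 'response'
trials_pivot = trials.pivot(index='treatment', columns='gender', values='response')
print(trials_pivot)
```

Each (`treatment`, `gender`) pair must appear only once in the input; otherwise `.pivot()` cannot decide which value belongs in the cell and raises an error.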
In this exercise, your job is to pivot `users` so that the focus is on `'visitors'`, with the columns indexed by `'city'` and the rows indexed by `'weekday'`.

In [1]:
import pandas as pd
users = pd.read_csv('_datasets/users.csv', index_col=0)
users

```
  weekday    city  visitors  signups
0     Sun  Austin       139        7
1     Sun  Dallas       237       12
2     Mon  Austin       326        3
3     Mon  Dallas       456        5
```


###### Instructions:
* Pivot the `users` DataFrame with the rows indexed by `'weekday'`, the columns indexed by `'city'`, and the values populated with `'visitors'`.
* Print the pivoted DataFrame.

In [2]:
# Pivot the users DataFrame: visitors_pivot
visitors_pivot = users.pivot(index='weekday', columns='city', values='visitors')

# Print the pivoted DataFrame
visitors_pivot

```
city     Austin  Dallas
weekday
Mon         326     456
Sun         139     237
```


Notice how in the pivoted DataFrame, the index is labeled `'weekday'`, the columns are labeled `'city'`, and the values are populated by the number of visitors.

## 1.2 Pivoting all variables
If you do not pass an argument for the `values` parameter, all remaining columns will be pivoted. For the `users` DataFrame, this means both `'visitors'` and `'signups'` are pivoted, creating hierarchical column labels.

You will explore this for yourself now in this exercise.

###### Instructions:
* Pivot the `users` DataFrame with the `'signups'` indexed by `'weekday'` in the rows and `'city'` in the columns.
* Print the new DataFrame. This has been done for you.
* Pivot the `users` DataFrame with both `'signups'` and `'visitors'` pivoted, that is, all the variables. This happens automatically if you do not specify an argument for the `values` parameter of `.pivot()`.
* Print the pivoted DataFrame. This has been done for you.

In [3]:
# Pivot users with signups indexed by weekday and city: signups_pivot
signups_pivot = users.pivot(index='weekday', columns='city', values='signups')

# Print signups_pivot
signups_pivot

```
city     Austin  Dallas
weekday
Mon           3       5
Sun           7      12
```


In [4]:
# Pivot users pivoted by both signups and visitors: pivot
pivot = users.pivot(index='weekday', columns='city')

# Print the pivoted DataFrame
pivot

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


Notice how in the second DataFrame, both `'signups'` and `'visitors'` were pivoted by default since you didn't provide an argument for the `values` parameter.
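To make the hierarchical column labels more concrete, here is a small sketch of how they might be used. It rebuilds `users` inline from the rows shown earlier rather than reading the CSV file, and the names `visitors_only` and `austin_signups` are assumptions made for illustration.

```python
import pandas as pd

# Same rows as the users preview above, typed in by hand
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
})

# No values argument: every remaining column is pivoted
pivot = users.pivot(index='weekday', columns='city')

# Selecting on the outer column level returns one variable for all cities
visitors_only = pivot['visitors']

# A tuple selects a single column from both levels
austin_signups = pivot[('signups', 'Austin')]

print(visitors_only)
print(austin_signups)
```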

# 2. Stacking & unstacking DataFrames
## 2.1 Stacking & unstacking I
You are now going to practice stacking and unstacking DataFrames. The `users` DataFrame you have been working with in this chapter has been pre-loaded for you, this time with a MultiIndex. Explore it in the IPython Shell to see the data layout. Pay attention to the index: its levels are `['city', 'weekday']`, so `'weekday'`, the second entry, has position 1. This position is what you pass as the `level` parameter in `.stack()` and `.unstack()` calls. Alternatively, you can pass the level name `'weekday'` instead of its position.
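The equivalence between a level's position and its name can be checked with a short sketch. It builds `users` inline instead of loading the CSV, and the names `by_position` and `by_name` are assumptions made for illustration.

```python
import pandas as pd

# Inline stand-in for the pre-loaded users DataFrame with a MultiIndex
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
}).set_index(['city', 'weekday']).sort_index()

print(list(users.index.names))

# 'weekday' is the second index level, so position 1 refers to it
by_position = users.unstack(level=1)
by_name = users.unstack(level='weekday')

# Both calls should produce the same layout
print(by_position.equals(by_name))
```

Passing the name is usually easier to read and keeps working if the index levels are later reordered.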

Your job in this exercise is to unstack `users` by `'weekday'`. You will then use `.stack()` on the unstacked DataFrame to see if you get back the original layout of `users`.

In [5]:
users = pd.read_csv('_datasets/users.csv').drop(columns='Unnamed: 0').set_index(['city', 'weekday']).sort_index()
users

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


###### Instructions:
* Define a DataFrame `byweekday` with the `'weekday'` level of `users` unstacked.
* Print the `byweekday` DataFrame to see the new data layout. This has been done for you.
* Stack `byweekday` by `'weekday'` and print it to check if you get the same layout as the original `users` DataFrame.

In [6]:
# Unstack users by 'weekday': byweekday
byweekday = users.unstack(level='weekday')

# Print the byweekday DataFrame
byweekday

```
        visitors      signups
weekday      Mon  Sun     Mon Sun
city
Austin       326  139       3   7
Dallas       456  237       5  12
```


In [7]:
# Stack byweekday by 'weekday' and print it
byweekday.stack(level='weekday')

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


By unstacking `users` by `'weekday'` and then stacking the result, you ended up with the same layout as the original DataFrame.

## 2.2 Stacking & unstacking II
You are now going to continue working with the `users` DataFrame. As always, first explore it in the IPython Shell to see the layout and note the index.

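Your job in this exercise is to unstack and then stack the `'city'` level, as you did previously for `'weekday'`. Note that you won't get the same DataFrame back: `.stack()` places the stacked level innermost in the index, so the levels come back in the order `['weekday', 'city']`.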

###### Instructions:
* Define a DataFrame `bycity` with the `'city'` level of `users` unstacked.
* Print the `bycity` DataFrame to see the new data layout. This has been done for you.
* Stack `bycity` by `'city'` and print it to check if you get the same layout as the original `users` DataFrame.

In [8]:
users = pd.read_csv('_datasets/users.csv').set_index(['city', 'weekday']).sort_index()
users = users.drop(columns='Unnamed: 0')
users

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [9]:
# Unstack users by 'city': bycity
bycity = users.unstack(level='city')

# Print the bycity DataFrame
bycity

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


In [10]:
# Stack bycity by 'city' and print it
bycity.stack(level='city')

```
                visitors  signups
weekday city
Mon     Austin       326        3
        Dallas       456        5
Sun     Austin       139        7
        Dallas       237       12
```
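The difference from the original layout comes down to the order of the index levels. The sketch below compares the level names before and after the round trip. It builds `users` inline rather than loading the CSV, `restacked` is a name assumed for illustration, and some pandas versions may emit a FutureWarning about the `.stack()` implementation.

```python
import pandas as pd

# Inline stand-in for the users DataFrame indexed by city and weekday
users = pd.DataFrame({
    'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
    'signups': [7, 12, 3, 5],
}).set_index(['city', 'weekday']).sort_index()

# Move 'city' into the columns, then back into the index
bycity = users.unstack(level='city')
restacked = bycity.stack(level='city')

# The stacked level comes back as the innermost index level
print(list(users.index.names))
print(list(restacked.index.names))
```

The values are unchanged; only the order of the index levels (and therefore the row order) differs.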


## 2.3 Restoring the index order
Continuing from the previous exercise, you will now use `.swaplevel(0, 1)` to flip the index levels. Swapping only changes the order of the levels, so the rows won't be sorted afterwards. To sort them, follow up with `.sort_index()`; you will then obtain the original DataFrame. Sorting matters because label-based slicing on a MultiIndex fails when the index is unsorted.
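To illustrate the slicing problem, here is a hedged sketch that builds a small DataFrame with a deliberately unsorted MultiIndex by hand. The names `unsorted`, `sorted_users`, `slice_failed`, and `austin_rows` are assumptions made for illustration.

```python
import pandas as pd

# Index in the order produced by swapping levels without sorting
idx = pd.MultiIndex.from_tuples(
    [('Austin', 'Mon'), ('Dallas', 'Mon'), ('Austin', 'Sun'), ('Dallas', 'Sun')],
    names=['city', 'weekday'],
)
unsorted = pd.DataFrame(
    {'visitors': [326, 456, 139, 237], 'signups': [3, 5, 7, 12]},
    index=idx,
)

# Label slicing on the unsorted MultiIndex is expected to fail
try:
    unsorted.loc['Austin':'Dallas']
    slice_failed = False
except KeyError as err:  # pandas' UnsortedIndexError is a KeyError subclass
    slice_failed = True
    print(type(err).__name__, err)

# After sorting, the same kind of slice works
sorted_users = unsorted.sort_index()
austin_rows = sorted_users.loc['Austin':'Austin']
print(austin_rows)
```

Checking `index.is_monotonic_increasing` is one way to confirm whether an index is sorted before slicing.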

To begin, print both `users` and `bycity` in the IPython Shell. The goal here is to convert `bycity` back to something that looks like `users`.

###### Instructions:
* Define a DataFrame `newusers` with the `'city'` level stacked back into the index of `bycity`.
* Swap the levels of the index of `newusers`.
* Print `newusers` and verify that the index is not sorted. This has been done for you.
* Sort the index of `newusers`.
* Print `newusers` and verify that the index is now sorted. This has been done for you.
* Assert that `newusers` equals `users`. This has been done for you, so hit 'Submit Answer' to see the result.

In [11]:
bycity

```
        visitors        signups
city      Austin Dallas  Austin Dallas
weekday
Mon          326    456       3      5
Sun          139    237       7     12
```


In [12]:
# Stack 'city' back into the index of bycity: newusers
newusers = bycity.stack(level='city')

# Swap the levels of the index of newusers: newusers
newusers = newusers.swaplevel(0,1)

# Print newusers and verify that the index is not sorted
newusers

```
                visitors  signups
city   weekday
Austin Mon           326        3
Dallas Mon           456        5
Austin Sun           139        7
Dallas Sun           237       12
```


In [13]:
# Sort the index of newusers: newusers
newusers = newusers.sort_index()

# Print newusers and verify that the index is now sorted
newusers

```
                visitors  signups
city   weekday
Austin Mon           326        3
       Sun           139        7
Dallas Mon           456        5
       Sun           237       12
```


In [14]:
# Verify that the new DataFrame is equal to the original
newusers.equals(users)

True