---
<center><h1>Lesson 2 - Basic intro into pandas</h1></center> 
---
---
<center><h2>Part 6. Work with pandas DataFrames: reshaping and pivot tables</h2></center>
---

## Table of Contents

- [Work with pandas DataFrames: reshaping and pivot tables](#Work-with-pandas-DataFrames:-reshaping-and-pivot-tables)
    * [Pivot table](#Pivot-table)
    * [Reshaping](#Reshaping)
    - [*Exercise 6.1*](#Exercise-6.1)

In [1]:
import pandas as pd
import numpy as np
import random

## Work with pandas DataFrames: reshaping and pivot tables

[[back to top]](#Table-of-Contents)

This part covers a feature of working with tables that is familiar to MS Excel users: the pivot table. We will also look at reshaping, that is, changing the shape of a pandas DataFrame (similar to transposing a matrix).

Let’s create two small DataFrames by selecting rows from `movies`, so that the examples are easy to inspect visually

In [2]:
movies = pd.read_csv('data/movies.csv', encoding="ISO-8859-1")

In [3]:
x = movies[['user_id', 'movie_id', 'rating', 'timestamp']]
short_df = pd.DataFrame([x.loc[0],x.loc[0], x.loc[13], x.loc[43],x.loc[43],x.loc[55],x.loc[90],x.loc[245]])
short_df

Unnamed: 0,user_id,movie_id,rating,timestamp
0,196,242,3,881250949
0,196,242,3,881250949
13,18,242,5,880129305
43,9,242,4,886958715
43,9,242,4,886958715
55,417,242,3,879645999
90,680,242,4,876815942
245,591,393,4,891031644


In [4]:
y = movies[['user_id', 'movie_id', 'rating', 'timestamp','gender','occupation']]
longer_df = pd.DataFrame([y.loc[2], y.loc[156],y.loc[765],y.loc[1234],y.loc[2432],y.loc[3765],y.loc[5324],y.loc[6332],y.loc[8676]])
longer_df

Unnamed: 0,user_id,movie_id,rating,timestamp,gender,occupation
2,6,242,4,883268170,M,executive
156,43,393,4,883956417,F,librarian
765,712,67,3,874957086,F,
1234,836,663,5,885754266,M,artist
2432,653,94,2,880153494,M,executive
3765,459,108,1,879563796,M,student
5324,548,13,1,891415677,M,writer
6332,871,269,3,888192970,M,executive
8676,342,591,3,875318629,F,other


### Pivot table

[[back to top]](#Table-of-Contents)

A pivot table is a multifunctional data summarization tool that can automatically sort, count, total, or average the data stored in a table. Creating neat, informative summaries out of huge lists of raw data is a common challenge. pandas gives us all the tools needed to build such summaries by hand as a new DataFrame, but doing so can be extremely tedious. Even worse, that approach isn’t very flexible. Suppose we have created the perfect summary that compares, say, age across occupations; if we then want to compare age across ratings or some other criterion, we need to start from scratch and build a whole new report.
Fortunately, pandas has a feature called pivot tables (as MS Excel does, although, as noted earlier, pandas is considerably more functional and flexible) that solves these problems. Pivot tables quickly summarize long lists of data: you can calculate summary information without writing a single formula or copying a single cell.

The `pivot_table` function of pandas creates a new, derived table out of a given one. Its three main arguments are `index`, `columns`, and `values`; for each of them you specify a column name (or a list of column names) from the original table. The function then creates a new table whose row and column labels are the unique values of the `index` and `columns` columns, respectively. The cell values of the new table are taken from the column given as `values`.

The following picture visualizes how a pivot table is formed for the DataFrame `df`

||city|state|precipitation|wind_speed|
|----|----|----|----|----|
|**0**|cityA|stateA|0|1.0|
|**1**|cityB|stateA|12|4.5|
|**2**|cityC|stateB|1|0.8|
|**3**|cityD|stateD|10|2.5|

So, the command 

    pd.pivot_table(df, index='state', columns='city', values='wind_speed')

works like this

<img src="images/pivot_table1.jpg">
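
To try the command directly, the table above can be typed in as a small DataFrame. This is only a minimal sketch: the values are copied from the table, and the variable name `wind` is an illustrative choice.

```python
import pandas as pd

# The example table from above, entered by hand
df = pd.DataFrame({
    'city': ['cityA', 'cityB', 'cityC', 'cityD'],
    'state': ['stateA', 'stateA', 'stateB', 'stateD'],
    'precipitation': [0, 12, 1, 10],
    'wind_speed': [1.0, 4.5, 0.8, 2.5],
})

# Rows are the unique states, columns are the unique cities;
# state/city pairs that never occur are filled with NaN
wind = pd.pivot_table(df, index='state', columns='city', values='wind_speed')
print(wind)
```

Most cells of `wind` are `NaN`, because each city belongs to exactly one state.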

But the previous example does not fully demonstrate the role of a pivot table, because each pair of state and city has only one value in the `"wind_speed"` column. A pivot table is most useful when the table contains many rows for the same identifiers and you need to collect or aggregate this data in some way. Let's consider another DataFrame, `df2`

||city|state|pressure|precipitation|wind_speed|
|----|----|----|----|----|----|
|0|cityA|stateA|1012|0|1.0|
|1|cityA|stateA|1024|0|1.2|
|13|cityB|stateA|NaN|12|4.5|
|14|cityB|stateA|995|11|3.8|
|20|cityC|stateB|1024|1|0.8|

Suppose we write

    pd.pivot_table(df2, index='state', columns='city', values='wind_speed')

So what happens here? When two or more rows of the original table fall into the same index/column cell of the pivot table, pandas cannot know which single value to pick, so by default it computes the mean of all of them and puts it into that cell.

<img src="images/pivot_table2.jpg">
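
The averaging can be checked on `df2` itself. The sketch below rebuilds the table by hand (values copied from above; the name `avg_wind` is illustrative) and relies on the default aggregation.

```python
import numpy as np
import pandas as pd

# The df2 table from above, entered by hand (note the missing pressure value)
df2 = pd.DataFrame({
    'city': ['cityA', 'cityA', 'cityB', 'cityB', 'cityC'],
    'state': ['stateA', 'stateA', 'stateA', 'stateA', 'stateB'],
    'pressure': [1012, 1024, np.nan, 995, 1024],
    'precipitation': [0, 0, 12, 11, 1],
    'wind_speed': [1.0, 1.2, 4.5, 3.8, 0.8],
}, index=[0, 1, 13, 14, 20])

# Two rows share (stateA, cityA) and two share (stateA, cityB);
# with the default aggfunc each pair is reduced to its mean
avg_wind = pd.pivot_table(df2, index='state', columns='city', values='wind_speed')
print(avg_wind)
```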

Let’s look at an example. Assume we need to collect the `rating` for each `user_id` and each `movie_id`. A good way is to transform the DataFrame (creating a new DataFrame as a result) so that the `user_id` values become the index, the `movie_id` values become the columns, and the cells hold the `rating`. A pivot table does exactly this

In [5]:
pd.pivot_table(longer_df, index='user_id', columns='movie_id', values='rating')

movie_id,13,67,94,108,242,269,393,591,663
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
6,,,,,4.0,,,,
43,,,,,,,4.0,,
342,,,,,,,,3.0,
459,,,,1.0,,,,,
548,1.0,,,,,,,,
653,,,2.0,,,,,,
712,,3.0,,,,,,,
836,,,,,,,,,5.0
871,,,,,,3.0,,,


Of course, you may swap the row and column indexes.

In [6]:
pd.pivot_table(longer_df, index='movie_id', columns='user_id', values='rating')

user_id,6,43,342,459,548,653,712,836,871
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
13,,,,,1.0,,,,
67,,,,,,,3.0,,
94,,,,,,2.0,,,
108,,,,1.0,,,,,
242,4.0,,,,,,,,
269,,,,,,,,,3.0
393,,4.0,,,,,,,
591,,,3.0,,,,,,
663,,,,,,,,5.0,


Thus, the last result is the transpose of the previous table.

As with `df` above, each user/movie pair in `longer_df` has only one rating, so nothing is actually aggregated. Let’s see an example with the `short_df` DataFrame, which contains duplicated rows

In [7]:
pd.pivot_table(short_df, index='movie_id', columns='user_id', values='rating')

user_id,9,18,196,417,591,680
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
242,4.0,5.0,3.0,3.0,,4.0
393,,,,,4.0,


So what happened here? Users 9 and 196 each appear twice for movie 242, and by default pandas put the mean of the duplicated ratings into the corresponding cell. A different aggregation can be chosen with the `aggfunc` argument, for example the sum

In [8]:
pd.pivot_table(short_df, index='movie_id', columns='user_id', values='rating', aggfunc='sum')

user_id,9,18,196,417,591,680
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
242,8.0,5.0,6.0,3.0,,4.0
393,,,,,4.0,
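
For comparison, here is a sketch of how the stricter `DataFrame.pivot` method behaves on duplicated index/column pairs: it only reshapes and does not aggregate, so it raises a `ValueError`. The tiny `ratings` frame and the flag `pivot_failed` are made up for illustration.

```python
import pandas as pd

# Illustrative data: user 9 rated movie 242 twice
ratings = pd.DataFrame({
    'user_id': [9, 9, 18],
    'movie_id': [242, 242, 242],
    'rating': [4, 4, 5],
})

# pivot_table aggregates the two rows of user 9 (mean by default)
aggregated = pd.pivot_table(ratings, index='movie_id', columns='user_id', values='rating')
print(aggregated)

# DataFrame.pivot cannot decide between duplicates and refuses to reshape
try:
    ratings.pivot(index='movie_id', columns='user_id', values='rating')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)
```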


It’s possible to apply more than one function at once

In [9]:
pd.pivot_table(short_df, index='movie_id', columns='user_id', values='rating', aggfunc=[np.sum, len])

Unnamed: 0_level_0,sum,sum,sum,sum,sum,sum,len,len,len,len,len,len
user_id,9,18,196,417,591,680,9,18,196,417,591,680
movie_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
242,8.0,5.0,6.0,3.0,,4.0,2.0,1.0,2.0,1.0,,1.0
393,,,,,4.0,,,,,,1.0,
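
Aggregation functions can also be given by name as strings. The following is a minimal sketch on hand-made data that mirrors the duplicated rows of `short_df`; the names `ratings` and `summary` are illustrative.

```python
import pandas as pd

# Illustrative data: user 196 rated movie 242 twice
ratings = pd.DataFrame({
    'user_id': [196, 196, 18, 591],
    'movie_id': [242, 242, 242, 393],
    'rating': [3, 3, 5, 4],
})

summary = pd.pivot_table(ratings, index='movie_id', columns='user_id',
                         values='rating', aggfunc=['sum', 'count'])

# The function name becomes the outermost column level,
# so each aggregation can be selected as its own sub-table
print(summary['sum'])
print(summary['count'])
```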


or your own function (here, the root mean square of the ratings in each cell)

In [10]:
import math
pd.pivot_table(short_df, index='movie_id', columns='user_id', values='rating',  aggfunc=lambda x: math.sqrt(sum(x**2)/(len(x))))

user_id,9,18,196,417,591,680
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
242,4.0,5.0,3.0,3.0,,4.0
393,,,,,4.0,
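
A named function is often easier to read than a lambda. The sketch below computes the same kind of root mean square with NumPy; the function name `root_mean_square` and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def root_mean_square(values):
    """Square root of the mean of the squared values of one pivot cell."""
    return np.sqrt(np.mean(np.square(values)))

# Illustrative data: user 9 gave two different ratings to movie 242
ratings = pd.DataFrame({
    'user_id': [9, 9, 18],
    'movie_id': [242, 242, 242],
    'rating': [3, 5, 4],
})

rms = pd.pivot_table(ratings, index='movie_id', columns='user_id',
                     values='rating', aggfunc=root_mean_square)
print(rms)
```

With identical duplicates, as in `short_df`, the root mean square equals the rating itself, which is why the output above looks the same as the mean.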


If necessary, you can pass several columns as the `values` argument

In [11]:
pd.pivot_table(short_df, index='movie_id', columns='user_id', values=['rating', 'timestamp'])

Unnamed: 0_level_0,rating,rating,rating,rating,rating,rating,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp
user_id,9,18,196,417,591,680,9,18,196,417,591,680
movie_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
242,4.0,5.0,3.0,3.0,,4.0,886958715.0,880129305.0,881250949.0,879645999.0,,876815942.0
393,,,,,4.0,,,,,,891031644.0,


Similarly, you can use several columns for the index of the pivot table and several columns for pivoting

In [12]:
res = pd.pivot_table(longer_df, index='user_id', columns=['movie_id', 'gender'], values='rating')
res

movie_id,13,67,94,108,242,269,393,591,663
gender,M,F,M,M,M,M,F,F,M
user_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2
6,,,,,4.0,,,,
43,,,,,,,4.0,,
342,,,,,,,,3.0,
459,,,,1.0,,,,,
548,1.0,,,,,,,,
653,,,2.0,,,,,,
712,,3.0,,,,,,,
836,,,,,,,,,5.0
871,,,,,,3.0,,,


The result can also be presented in another, sometimes more readable, form by using the `to_string` method of the resulting DataFrame

In [13]:
print(res.to_string(na_rep=''))

movie_id  13   67   94   108  242  269  393  591  663
gender      M    F    M    M    M    M    F    F    M
user_id                                              
6                             4.0                    
43                                      4.0          
342                                          3.0     
459                      1.0                         
548       1.0                                        
653                 2.0                              
712            3.0                                   
836                                               5.0
871                                3.0               


In [14]:
pd.pivot_table(longer_df, index=['user_id', 'occupation'], columns='movie_id')

Unnamed: 0_level_0,Unnamed: 1_level_0,rating,rating,rating,rating,rating,rating,rating,rating,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp
Unnamed: 0_level_1,movie_id,13,94,108,242,269,393,591,663,13,94,108,242,269,393,591,663
user_id,occupation,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2
6,executive,,,,4.0,,,,,,,,883268170.0,,,,
43,librarian,,,,,,4.0,,,,,,,,883956417.0,,
342,other,,,,,,,3.0,,,,,,,,875318629.0,
459,student,,,1.0,,,,,,,,879563796.0,,,,,
548,writer,1.0,,,,,,,,891415677.0,,,,,,,
653,executive,,2.0,,,,,,,,880153494.0,,,,,,
836,artist,,,,,,,,5.0,,,,,,,,885754266.0
871,executive,,,,,3.0,,,,,,,,888192970.0,,,


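Note that when we don’t specify the `values` argument, as in the example above, pandas tries to apply `aggfunc` to each remaining column that it can aggregate (here `rating` and `timestamp`). Note also that user 712, whose `occupation` is missing, does not appear in the result above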

In [15]:
pd.pivot_table(longer_df, index='user_id', columns='movie_id')

Unnamed: 0_level_0,rating,rating,rating,rating,rating,rating,rating,rating,rating,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp
movie_id,13,67,94,108,242,269,393,591,663,13,67,94,108,242,269,393,591,663
user_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2
6,,,,,4.0,,,,,,,,,883268170.0,,,,
43,,,,,,,4.0,,,,,,,,,883956417.0,,
342,,,,,,,,3.0,,,,,,,,,875318629.0,
459,,,,1.0,,,,,,,,,879563796.0,,,,,
548,1.0,,,,,,,,,891415677.0,,,,,,,,
653,,,2.0,,,,,,,,,880153494.0,,,,,,
712,,3.0,,,,,,,,,874957086.0,,,,,,,
836,,,,,,,,,5.0,,,,,,,,,885754266.0
871,,,,,,3.0,,,,,,,,,888192970.0,,,


pandas allows replacing the `NaN` values of a pivot table with the help of the `fill_value` argument

In [16]:
pd.pivot_table(longer_df, index='user_id', columns='movie_id', fill_value=0)

Unnamed: 0_level_0,rating,rating,rating,rating,rating,rating,rating,rating,rating,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp
movie_id,13,67,94,108,242,269,393,591,663,13,67,94,108,242,269,393,591,663
user_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2
6,0,0,0,0,4,0,0,0,0,0,0,0,0,883268170,0,0,0,0
43,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,883956417,0,0
342,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,875318629,0
459,0,0,0,1,0,0,0,0,0,0,0,0,879563796,0,0,0,0,0
548,1,0,0,0,0,0,0,0,0,891415677,0,0,0,0,0,0,0,0
653,0,0,2,0,0,0,0,0,0,0,0,880153494,0,0,0,0,0,0
712,0,3,0,0,0,0,0,0,0,0,874957086,0,0,0,0,0,0,0
836,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,885754266
871,0,0,0,0,0,3,0,0,0,0,0,0,0,0,888192970,0,0,0
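
The effect of `fill_value` is easier to see on a tiny table. This is a hedged sketch with made-up data; keep in mind that after filling, a `0` that stands for "no rating" can no longer be told apart from a real value of zero.

```python
import pandas as pd

# Illustrative data: each user rated a different movie
ratings = pd.DataFrame({
    'user_id': [6, 43],
    'movie_id': [242, 393],
    'rating': [4, 4],
})

# Without fill_value the missing user/movie pairs are NaN
sparse = pd.pivot_table(ratings, index='user_id', columns='movie_id', values='rating')
# With fill_value=0 the same cells hold 0 instead
filled = pd.pivot_table(ratings, index='user_id', columns='movie_id', values='rating',
                        fill_value=0)
print(sparse)
print(filled)
```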


You can also apply `aggfunc` not only to each index/column pair but also to each whole row and column of the resulting pivot table. The argument `margins=True` does this, adding a row and a column labelled `All`

In [17]:
pd.pivot_table(longer_df, index='user_id', columns='movie_id',  margins=True)

Unnamed: 0_level_0,rating,rating,rating,rating,rating,rating,rating,rating,rating,rating,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp,timestamp
movie_id,13,67,94,108,242,269,393,591,663,All,13,67,94,108,242,269,393,591,663,All
user_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2
6,,,,,4.0,,,,,4.0,,,,,883268170.0,,,,,883268200.0
43,,,,,,,4.0,,,4.0,,,,,,,883956417.0,,,883956400.0
342,,,,,,,,3.0,,3.0,,,,,,,,875318629.0,,875318600.0
459,,,,1.0,,,,,,1.0,,,,879563796.0,,,,,,879563800.0
548,1.0,,,,,,,,,1.0,891415677.0,,,,,,,,,891415700.0
653,,,2.0,,,,,,,2.0,,,880153494.0,,,,,,,880153500.0
712,,3.0,,,,,,,,,,874957086.0,,,,,,,,
836,,,,,,,,,5.0,5.0,,,,,,,,,885754266.0,885754300.0
871,,,,,,3.0,,,,3.0,,,,,,888192970.0,,,,888193000.0
All,1.0,,2.0,1.0,4.0,3.0,4.0,3.0,5.0,2.875,891415677.0,,880153494.0,879563796.0,883268170.0,888192970.0,883956417.0,875318629.0,885754266.0,883452900.0
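
A small sketch makes the margins easy to verify by hand. The data and the name `with_totals` are illustrative; `aggfunc='mean'` is written out only for clarity, since it is the default.

```python
import pandas as pd

# Illustrative data: user 1 rated two movies, user 2 rated one
ratings = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [10, 20, 10],
    'rating': [4, 2, 5],
})

# margins=True appends an 'All' row and an 'All' column that hold
# the aggregate over each whole column and each whole row
with_totals = pd.pivot_table(ratings, index='user_id', columns='movie_id',
                             values='rating', aggfunc='mean', margins=True)
print(with_totals)
```

The label `All` can be changed with the `margins_name` argument.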


### Reshaping

[[back to top]](#Table-of-Contents)

In fact, pivoting a table is a special case of stacking a pandas DataFrame. We already met one example of stacking when learning the grouping options. Stacking and unstacking (reshaping) are most useful when a DataFrame has a MultiIndex on the rows and columns (like the example below). Stacking a DataFrame means moving (rotating, or pivoting) the innermost column index level to become the innermost row index level. The inverse operation is called unstacking: it moves the innermost row index level to become the innermost column index level.
Let’s create a pivot table that will help us explore the reshaping of a DataFrame

In [18]:
table = pd.pivot_table(short_df, index=['user_id', 'timestamp'], columns='movie_id',values='rating', \
                       fill_value=0, aggfunc=[np.sum, np.mean])
table

Unnamed: 0_level_0,Unnamed: 1_level_0,sum,sum,mean,mean
Unnamed: 0_level_1,movie_id,242,393,242,393
user_id,timestamp,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
9,886958715,8,0,4,0
18,880129305,5,0,5,0
196,881250949,6,0,3,0
417,879645999,3,0,3,0
591,891031644,0,4,0,4
680,876815942,4,0,4,0


Thus, we have two row index levels and two column index levels. Let’s apply the `unstack()` method to the `table` DataFrame

In [19]:
unstacked = table.unstack()
unstacked

Unnamed: 0_level_0,sum,sum,sum,sum,sum,sum,sum,sum,sum,sum,...,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean
movie_id,242,242,242,242,242,242,393,393,393,393,...,242,242,242,242,393,393,393,393,393,393
timestamp,876815942,879645999,880129305,881250949,886958715,891031644,876815942,879645999,880129305,881250949,...,880129305,881250949,886958715,891031644,876815942,879645999,880129305,881250949,886958715,891031644
user_id,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
9,,,,,8.0,,,,,,...,,,4.0,,,,,,0.0,
18,,,5.0,,,,,,0.0,,...,5.0,,,,,,0.0,,,
196,,,,6.0,,,,,,0.0,...,,3.0,,,,,,0.0,,
417,,3.0,,,,,,0.0,,,...,,,,,,0.0,,,,
591,,,,,,0.0,,,,,...,,,,0.0,,,,,,4.0
680,4.0,,,,,,0.0,,,,...,,,,,0.0,,,,,


As you can see, the `unstack()` method moves one level of the row index (note: the innermost one) to the lowest level of the column index. The `stack()` method works conversely

In [20]:
stacked = table.stack()
stacked

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,sum,mean
user_id,timestamp,movie_id,Unnamed: 3_level_1,Unnamed: 4_level_1
9,886958715,242,8,4
9,886958715,393,0,0
18,880129305,242,5,5
18,880129305,393,0,0
196,881250949,242,6,3
196,881250949,393,0,0
417,879645999,242,3,3
417,879645999,393,0,0
591,891031644,242,0,0
591,891031644,393,4,4


To confirm that `stack()` is the inverse of `unstack()`, consider the two following examples

In [21]:
stacked.unstack()

Unnamed: 0_level_0,Unnamed: 1_level_0,sum,sum,mean,mean
Unnamed: 0_level_1,movie_id,242,393,242,393
user_id,timestamp,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
9,886958715,8,0,4,0
18,880129305,5,0,5,0
196,881250949,6,0,3,0
417,879645999,3,0,3,0
591,891031644,0,4,0,4
680,876815942,4,0,4,0


In [22]:
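# The built-in all() applied to a DataFrame only iterates over its column labels,
# so reduce the element-wise comparison explicitly over both axes
(table.unstack().stack() == table.stack().unstack()).all().all()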

True
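
The same round trip can be followed on a frame small enough to read at a glance. This is a minimal sketch with made-up numbers; the names `scores`, `long_form`, and `wide_again` are illustrative.

```python
import pandas as pd

# Illustrative wide table: one row per user, one column per statistic
scores = pd.DataFrame(
    {'sum': [8, 5], 'mean': [4, 5]},
    index=pd.Index([9, 18], name='user_id'),
)

# stack: the column labels become the innermost row index level,
# turning the wide table into a long Series
long_form = scores.stack()
print(long_form)

# unstack: the innermost row index level goes back to the columns
wide_again = long_form.unstack()
print(wide_again)
```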

>### Exercise 6.1

> - The `movies` DataFrame contains many movies which belong to more than one genre. For example, the movie "Star Wars: Episode VII - The Force Awakens (2015)" has the genres Adventure, Fantasy, War; the movie "Rough Magic (1995)" has the genres Drama and Romance, etc. First of all, add a new column `genres` to the `movies` DataFrame, containing a string with all genres of each movie, i.e. the movies from the example above will have the records "Adventure|Fantasy|War" and "Drama|Romance" respectively (keep this order). After that, create a pivot table that calculates the average rating for each genre category and the total number of movies in each category. Genre names should be placed in the index. Note that "unknown" or empty fields should not be included. Sort the result in descending order by the number of movies. Name the resulting DataFrame `pivot_genres`.

In [23]:
# type your code here
# Genre indicator columns (see the printed Index below); 'unknown' is excluded from the result
list1 = movies.columns[11:-1]
print(list1)
genre_cols = [c for c in list1 if c != 'unknown']
# Join the names of all genres flagged for a row, e.g. "Drama|Romance"
movies['genres'] = movies.apply(lambda x: "|".join(c for c in genre_cols if x[c] != 0), axis=1)
# One row per genre string: mean rating and number of records (assumed here to be
# the intended "amount of movies"); rows without any known genre are skipped
p_table = pd.pivot_table(movies[movies['genres'] != ''], index='genres', values='rating',
                         aggfunc=['mean', 'count'])
pivot_genres = p_table.sort_values(by=('count', 'rating'), ascending=False)

Index(['unknown', 'Action', 'Adventure', 'Animation', 'Childrens', 'Comedy',
       'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
       'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War'],
      dtype='object')

In [69]:
from test_helper import Test

Test.assertEqualsHashed(pivot_genres, '7a956f74572f45bae745af08e046eeb2aed61097', 
                                      'Incorrect content of "pivot_genres" DataFrame', "Exercise 6.1 is successful")

1 test failed. Incorrect content of "pivot_genres" DataFrame


<center><h3>Presented by <a target="_blank" rel="noopener noreferrer nofollow" href="http://datascience-school.com">datascience-school.com</a></h3></center>