#### 1 Introduction to Data Reshaping
Let's start by understanding the concept of wide and long formats and the advantages of each. You’ll then learn how to pivot data from long to wide format and get summary statistics from a large DataFrame.

##### The long and the wide
As part of a data scientist job interview, you need to answer some technical questions. One of the challenges is to show the differences between long and wide data formats.
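To make the difference concrete, here is a minimal sketch. The `long_df` table and its `attribute`/`value` column names are made up for illustration, and the heights and weights are borrowed from the players shown later in these notes. Long format has one row per player and attribute; wide format has one row per player and one column per attribute.

```python
import pandas as pd

# Hypothetical long-format data: one row per (player, attribute) pair.
long_df = pd.DataFrame({
    "name": ["Lionel Messi", "Lionel Messi", "Jan Oblak", "Jan Oblak"],
    "attribute": ["height", "weight", "height", "weight"],
    "value": [170, 72, 188, 87],
})

# Wide format: one row per player, one column per attribute.
wide_df = long_df.pivot(index="name", columns="attribute", values="value")

# melt() goes the other way, from wide back to long.
back_to_long = wide_df.reset_index().melt(
    id_vars="name", var_name="attribute", value_name="value"
)

print(long_df.shape)  # (4, 3)
print(wide_df.shape)  # (2, 2)
print(back_to_long)
```

Long format tends to be easier to filter and group. Wide format is usually easier to read and to compare across columns.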

##### Flipping players
Congratulations! You got the data scientist job! In your first project, you will work with the fifa_players dataset. It contains data on the players included in the latest version of the video game. Before you start any analysis, you need to clean and format your dataset.

As a first step, you need to explore your dataset and reshape it using basic steps, such as setting different indices, filtering columns and flipping the DataFrame. You would like to see if that is enough for further analysis.

The fifa_players dataset is available for you. The pandas module will be preloaded as pd in your session throughout all the exercises of the course.

In [3]:
import pandas as pd

filename = 'fifa_players.csv'
fifa_players = pd.read_csv(filename, index_col = 0)

print(fifa_players.head())

# Set name as index
fifa_transpose = fifa_players.set_index('name')

# Print fifa_transpose
print(fifa_transpose)

                name  age  height  weight nationality                 club
0       Lionel Messi   32     170      72   Argentina         FC Barcelona
1  Cristiano Ronaldo   34     187      83    Portugal             Juventus
2    Neymar da Silva   27     175      68      Brazil  Paris Saint-Germain
3          Jan Oblak   26     188      87    Slovenia      Atlético Madrid
4        Eden Hazard   28     175      74     Belgium          Real Madrid
                   age  height  weight nationality                 club
name                                                                   
Lionel Messi        32     170      72   Argentina         FC Barcelona
Cristiano Ronaldo   34     187      83    Portugal             Juventus
Neymar da Silva     27     175      68      Brazil  Paris Saint-Germain
Jan Oblak           26     188      87    Slovenia      Atlético Madrid
Eden Hazard         28     175      74     Belgium          Real Madrid


In [4]:
# Modify the DataFrame to keep only height and weight columns
fifa_transpose = fifa_players.set_index('name')[['height','weight']]

# Print fifa_transpose
print(fifa_transpose)

                   height  weight
name                             
Lionel Messi          170      72
Cristiano Ronaldo     187      83
Neymar da Silva       175      68
Jan Oblak             188      87
Eden Hazard           175      74


In [5]:
# Change the DataFrame so rows become columns and vice versa
fifa_transpose = fifa_players.set_index('name')[['height', 'weight']].transpose()

# Print fifa_transpose
print(fifa_transpose)

name    Lionel Messi  Cristiano Ronaldo  Neymar da Silva  Jan Oblak  \
height           170                187              175        188   
weight            72                 83               68         87   

name    Eden Hazard  
height          175  
weight           74  
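A side note on why the columns were filtered before flipping: the sketch below rebuilds two of the rows above by hand (the `players` and `mixed` names are only for illustration). It suggests that `.T` is a shorthand for `.transpose()`, and that transposing a frame that mixes numbers and text leaves every column with the generic `object` dtype.

```python
import pandas as pd

# Two players from the output above, rebuilt by hand for illustration.
players = pd.DataFrame({
    "name": ["Lionel Messi", "Cristiano Ronaldo"],
    "height": [170, 187],
    "weight": [72, 83],
}).set_index("name")

# .T is an accessor for transpose(); both give the same result.
flipped = players.T
same_result = flipped.equals(players.transpose())

# Transposing twice returns the original table.
restored = flipped.T

# Adding a text column makes each transposed column hold mixed values,
# so pandas falls back to the object dtype.
mixed = players.assign(club=["FC Barcelona", "Juventus"])
print(mixed.T.dtypes)
```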


##### Dribbling the pivot method
It's time to keep working with the fifa_players dataset. After you explored the dataset, you realized the dataset contains player scores on different movements: shooting, dribbling, and passing. There are attacking scores as well as overall scores.

The goal of the project is to analyze the scores to create an optimized team, so you decide to explore which score is better. But the current data is in a long format. You'll need to pivot your DataFrame in different ways to discover a pattern.

The fifa_players dataset is available for you. Make sure to examine it in the console!



In [6]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

filename = "fifa_movements.csv"
fifa_players = pd.read_csv(filename, index_col = 0)

print(fifa_players.head())

                name   movement  overall  attacking
0           L. Messi   shooting       92         70
1  Cristiano Ronaldo   shooting       93         89
2           L. Messi    passing       92         92
3  Cristiano Ronaldo    passing       82         83
4           L. Messi  dribbling       96         88


In [16]:
# Pivot the overall scores: one row per player, one column per movement
fifa_overall = fifa_players.pivot(values='overall', index='name', columns='movement')
print(fifa_overall)
print()

# Pivot the attacking scores the same way
fifa_attacking = fifa_players.pivot(values='attacking', index='name', columns='movement')
print(fifa_attacking)
print()

# Swap the roles: movements as rows, players as columns
fifa_names = fifa_players.pivot(values='overall', index='movement', columns='name')
print(fifa_names)

movement           dribbling  passing  shooting
name                                           
Cristiano Ronaldo         89       82        93
L. Messi                  96       92        92

movement           dribbling  passing  shooting
name                                           
Cristiano Ronaldo         84       83        89
L. Messi                  88       92        70

name       Cristiano Ronaldo  L. Messi
movement                              
dribbling                 89        96
passing                   82        92
shooting                  93        92


In [17]:
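# Pivot both score columns at once; the result has two-level column labels
fifa_over_attack = fifa_players.pivot(values=['overall','attacking'], index='name', columns='movement')
print(fifa_over_attack)
print()

# Omitting values pivots every remaining column, which gives the same table here
fifa_all = fifa_players.pivot(index='name', columns='movement')
print(fifa_all)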

                    overall                  attacking                 
movement          dribbling passing shooting dribbling passing shooting
name                                                                   
Cristiano Ronaldo        89      82       93        84      83       89
L. Messi                 96      92       92        88      92       70

                    overall                  attacking                 
movement          dribbling passing shooting dribbling passing shooting
name                                                                   
Cristiano Ronaldo        89      82       93        84      83       89
L. Messi                 96      92       92        88      92       70
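The pivoted table above has two levels of column labels, so it helps to know how to pull pieces back out. The following is a sketch only: the DataFrame is rebuilt by hand from the six rows of the dataset instead of being read from the CSV, and the variable names on the left are illustrative.

```python
import pandas as pd

# The six rows of the movements dataset, typed in by hand.
fifa_players = pd.DataFrame({
    "name": ["L. Messi", "Cristiano Ronaldo"] * 3,
    "movement": ["shooting", "shooting", "passing",
                 "passing", "dribbling", "dribbling"],
    "overall": [92, 93, 92, 82, 96, 89],
    "attacking": [70, 89, 92, 83, 88, 84],
})

fifa_all = fifa_players.pivot(index="name", columns="movement")

# Selecting the outer label returns an ordinary single-level table.
overall_only = fifa_all["overall"]

# A tuple addresses one cell through both column levels.
messi_dribbling = fifa_all.loc["L. Messi", ("overall", "dribbling")]

# xs() takes a cross-section on the inner level: both scores for passing.
passing_scores = fifa_all.xs("passing", axis=1, level="movement")

print(overall_only)
print(messi_dribbling)
print(passing_scores)
```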


##### Replay that last move!
Amazing! You were able to pivot all columns of fifa_players. You saw that the overall and attacking scores are different and decided to extend your analysis to more players. However, you found an error.

You suspect that there are different scores for the same index and column values. You remember that using the .pivot() method for all the columns does not work in that case.
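The failure can be reproduced on a tiny table. In this sketch, `dup_players` is a hypothetical DataFrame with a made-up second dribbling score for the same player, so one index/column pair appears twice and `.pivot()` cannot decide which value belongs in the cell.

```python
import pandas as pd

# Hypothetical data: ("L. Messi", "dribbling") appears twice
# with different scores (the 88 is invented for illustration).
dup_players = pd.DataFrame({
    "name": ["L. Messi", "L. Messi", "L. Messi"],
    "movement": ["shooting", "dribbling", "dribbling"],
    "overall": [92, 96, 88],
})

try:
    dup_players.pivot(index="name", columns="movement", values="overall")
    pivot_failed = False
except ValueError as err:
    # pandas refuses to reshape when index/column pairs are not unique.
    pivot_failed = True
    print("pivot failed:", err)
```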

First, you decide to delete the problematic row so you can reshape the DataFrame afterwards.

The fifa_players dataset is available for you. Make sure you examine the dataset in the console and notice the repeated rows.
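Instead of spotting repeated rows by eye, you could ask pandas to flag them. This is a sketch under assumptions: `dup_players` is a hypothetical table with one invented repeated pair, and keeping the first occurrence is just one possible policy.

```python
import pandas as pd

# Hypothetical data with one repeated (name, movement) pair.
dup_players = pd.DataFrame({
    "name": ["L. Messi", "L. Messi", "L. Messi"],
    "movement": ["shooting", "dribbling", "dribbling"],
    "overall": [92, 96, 88],
})

# keep=False marks every row that takes part in a repeated pair.
mask = dup_players.duplicated(subset=["name", "movement"], keep=False)
repeated = dup_players[mask]
print(repeated)

# One option: keep the first occurrence of each pair, then pivot.
deduped = dup_players.drop_duplicates(subset=["name", "movement"], keep="first")
pivoted = deduped.pivot(index="name", columns="movement", values="overall")
print(pivoted)
```

Dropping rows throws information away, so it is only appropriate when the repeats really are errors.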

In [35]:
print(fifa_players)
print()

# Drop the fifth row to delete all repeated rows
fifa_no_rep = fifa_players.drop(4, axis=0)

fifa_pivot = fifa_no_rep.pivot(index='name', columns='movement') 

# Print fifa_no_rep, then fifa_pivot
print(fifa_no_rep)

print()
print(fifa_pivot)

                name   movement  overall  attacking
0           L. Messi   shooting       92         70
1  Cristiano Ronaldo   shooting       93         89
2           L. Messi    passing       92         92
3  Cristiano Ronaldo    passing       82         83
4           L. Messi  dribbling       96         88
5  Cristiano Ronaldo  dribbling       89         84

                name   movement  overall  attacking
0           L. Messi   shooting       92         70
1  Cristiano Ronaldo   shooting       93         89
2           L. Messi    passing       92         92
3  Cristiano Ronaldo    passing       82         83
5  Cristiano Ronaldo  dribbling       89         84

                    overall                  attacking                 
movement          dribbling passing shooting dribbling passing shooting
name                                                                   
Cristiano Ronaldo      89.0    82.0     93.0      84.0    83.0     89.0
L. Messi                NaN    92.0     92.0       NaN    92.0     70.0

In [33]:
# Drop the name column instead (axis=1 targets columns, not rows)
fifa_no_rep = fifa_players.drop('name', axis=1)

# Print fifa_no_rep
print(fifa_no_rep)

    movement  overall  attacking
0   shooting       92         70
1   shooting       93         89
2    passing       92         92
3    passing       82         83
4  dribbling       96         88
5  dribbling       89         84


##### Reviewing the moves
Wow! You have now learned about pivot tables. In the last analysis that you did, you encountered a DataFrame that had non-unique index/column pairs. In order to pivot your DataFrame, you wrote code to drop the problematic row, and then reshaped it.

In this exercise, you will modify the code using pivot tables and compare it with your strategy of using the pivot method.

The fifa_players dataset is available for you.

In [37]:
# Discard the fifth row to delete all repeated rows
fifa_drop = fifa_players.drop(4, axis=0)

# Use pivot method to get all scores by name and movement
fifa_pivot = fifa_drop.pivot(index='name', columns='movement') 

# Print fifa_pivot
print(fifa_pivot)  
print()

# Use pivot table to get all scores by name and movement
fifa_pivot_table = fifa_players.pivot_table(index='name', columns='movement', aggfunc='mean')

# Print fifa_pivot_table
print(fifa_pivot_table)

                    overall                  attacking                 
movement          dribbling passing shooting dribbling passing shooting
name                                                                   
Cristiano Ronaldo      89.0    82.0     93.0      84.0    83.0     89.0
L. Messi                NaN    92.0     92.0       NaN    92.0     70.0

                  attacking                    overall                 
movement          dribbling passing shooting dribbling passing shooting
name                                                                   
Cristiano Ronaldo        84      83       89        89      82       93
L. Messi                 88      92       70        96      92       92
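In the dataset printed earlier there are no actual repeats, which is why the pivot table simply shows every original score. To see what `aggfunc` does when a pair really is repeated, here is a sketch on a hypothetical table with an invented second dribbling score.

```python
import pandas as pd

# Hypothetical data: two different dribbling scores for the same player.
dup_players = pd.DataFrame({
    "name": ["L. Messi", "L. Messi", "L. Messi"],
    "movement": ["shooting", "dribbling", "dribbling"],
    "overall": [92, 96, 88],
})

# pivot_table combines repeated pairs with the aggregation function.
mean_table = dup_players.pivot_table(
    index="name", columns="movement", values="overall", aggfunc="mean"
)
max_table = dup_players.pivot_table(
    index="name", columns="movement", values="overall", aggfunc="max"
)

print(mean_table)  # dribbling is the average of 96 and 88, i.e. 92
print(max_table)   # dribbling is the larger score, 96
```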


##### Exploring the big match
Now, it's time to continue working on the fifa_players exploration. Your next task is to examine the characteristics of players belonging to different teams.

Particularly, you are interested in players from two big rival teams: Barcelona and Real Madrid.

You decide that .pivot_table() is the best tool to get your results since it's an easy way to generate a report. Also, it allows you to define aggregation functions and work with multiple indices.

The fifa_players dataset is available for you. Make sure you explore it, and check which data it contains about the players of each team.

In [41]:
filename = "fifa_big_match.csv"
fifa_players = pd.read_csv(filename, index_col = 0)

# Use pivot table to display mean age of players by club and nationality 
mean_age_fifa = fifa_players.pivot_table(index='nationality', columns='club', values='age', aggfunc='mean')

# Print mean_age_fifa
print(mean_age_fifa)
print()

# Use pivot table to display max height of any player by club and nationality
tall_players_fifa = fifa_players.pivot_table(index='nationality', columns='club', values='height', aggfunc='max')

# Print tall_players_fifa
print(tall_players_fifa)
print()

# Use pivot table to count players by club and nationality, with row and column totals
players_country = fifa_players.pivot_table(index='nationality', columns='club', values='name', aggfunc='count', margins=True)

# Print players_country
print(players_country)
print()

club         FC Barcelona  Real Madrid
nationality                           
Brazil          25.666667         23.5
Croatia         31.000000         33.0
France          23.600000         27.0
Germany         27.000000         29.0
Uruguay         32.000000         20.0

club         FC Barcelona  Real Madrid
nationality                           
Brazil                190          186
Croatia               184          172
France                190          191
Germany               187          183
Uruguay               182          182

club         FC Barcelona  Real Madrid  All
nationality                                
Brazil                  3            6    9
Croatia                 1            1    2
France                  5            3    8
Germany                 1            1    2
Uruguay                 1            1    2
All                    11           12   23
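The `All` row and column above come from `margins=True`. The sketch below uses a hypothetical four-player `squad` table (names and ages invented) to show how the totals are built, and how `fill_value` can replace the gap left by a nationality/club combination that has no players.

```python
import pandas as pd

# Hypothetical squad: no French player at Real Madrid.
squad = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C", "Player D"],
    "club": ["FC Barcelona", "FC Barcelona", "Real Madrid", "Real Madrid"],
    "nationality": ["Brazil", "France", "Brazil", "Brazil"],
    "age": [25, 30, 22, 28],
})

# Count players per cell; empty cells become 0 and totals are appended.
counts = squad.pivot_table(
    index="nationality", columns="club", values="name",
    aggfunc="count", fill_value=0, margins=True,
)
print(counts)

# Mean age per cell; without fill_value the empty cell stays NaN.
mean_age = squad.pivot_table(
    index="nationality", columns="club", values="age", aggfunc="mean"
)
print(mean_age)
```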



##### The tallest and the heaviest
You will continue your exploration of characteristics of players in fifa_players belonging to two teams: FC Barcelona and Real Madrid. As your last task, you are interested in exploring the maximum height and weight separated by teams and nationality. You will also compare two years, 2000 and 2010.

You have two columns that you want to set as an index, so you will need to use pivot_table().

The fifa_players dataset is available for you. It contains data about the club, nationality, height, weight, and year of the players playing for each team.

In [44]:
filename = "fifa_years.csv"
fifa_players = pd.read_csv(filename, index_col = 0)

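# Maximum height and weight by nationality and club, split by year
# (the variable is called fifa_mean, but aggfunc='max' makes it hold maxima)
fifa_mean = fifa_players.pivot_table(index=['nationality','club'], columns='year', aggfunc='max', margins=True)

# Print fifa_mean
print(fifa_mean)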
print()


                         height           weight         
year                       2000 2010  All   2000 2010 All
nationality club                                         
Croatia     FC Barcelona    184  185  185     78   76  78
            Real Madrid     172  173  173     66   68  68
Germany     FC Barcelona    187  189  189     85   87  87
            Real Madrid     183  185  185     76   77  77
All                         187  189  189     85   87  87
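A table with two index levels and two column levels can be sliced in several ways. This sketch assumes a hypothetical `fifa_years` frame with one invented player per group, using the maxima printed above as values. The real CSV likely has more rows, and margins are left out to keep the labels simple.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: one player per group.
fifa_years = pd.DataFrame({
    "nationality": ["Croatia"] * 4 + ["Germany"] * 4,
    "club": ["FC Barcelona", "FC Barcelona", "Real Madrid", "Real Madrid"] * 2,
    "year": [2000, 2010] * 4,
    "height": [184, 185, 172, 173, 187, 189, 183, 185],
    "weight": [78, 76, 66, 68, 85, 87, 76, 77],
})

table = fifa_years.pivot_table(
    index=["nationality", "club"], columns="year", aggfunc="max"
)

# Outer column label: only the height columns, one per year.
heights = table["height"]

# Outer row label: both clubs for one nationality.
croatia = table.loc["Croatia"]

# Tuples address a single cell through both levels on each axis.
tallest = table.loc[("Germany", "FC Barcelona"), ("height", 2010)]

# Cross-section on the inner row level: one club across nationalities.
real_madrid = table.xs("Real Madrid", level="club")

print(heights)
print(croatia)
print(tallest)
print(real_madrid)
```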

