# Table of contents
1. [Introduction](#introduction)
2. [Manually Completing the Task](#paragraph1)
3. [Creating the Function](#paragraph2)
    1. [Parameter Inputs](#subparagraph1)
4. [Appendix: Full Code for the Function](#paragraph3)

## Introduction <a name="introduction"></a>

You are given a task to complete a report that summarizes the top 3 publishers for each year.

**This Jupyter Notebook assumes you've completed and read through each of the previous notebooks. This includes:**

1. Setting up the Initial py File
2. Reading in Files
3. Function 1 - Best Selling Games by Region by Year

Lets import the ``vgsales.csv`` dataset to start.

In [4]:
import pandas as pd
import os
import pythonguides as pg

In [6]:
games_file = pg.find_directory_file('vgsales')
df = pg.load_file(games_file)
df.head(5)

Using directory file: vgsales.csv


Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


## Manually Completing the Task <a name="paragraph1"></a>
    
Lets start by calculating the total global sales for each publisher each year.

We'll be utilizing the ``groupby()`` function with Pandas. ``groupby()`` is an extremely powerful function that can be chained with other functions.

In [9]:
df2 = df.groupby(['Year', 'Publisher'])['Global_Sales'].sum().reset_index().sort_values(['Year', 'Global_Sales'], ascending=[True, False])
df2.tail(15)

Unnamed: 0,Year,Publisher,Global_Sales
2301,2016.0,Rising Star Games,0.02
2303,2016.0,Screenlife,0.02
2305,2016.0,Sold Out,0.02
2316,2016.0,Yeti,0.02
2269,2016.0,Epic Games,0.01
2270,2016.0,Experience Inc.,0.01
2278,2016.0,Inti Creates,0.01
2285,2016.0,Marvelous Entertainment,0.01
2298,2016.0,Paradox Development,0.01
2300,2016.0,Prototype,0.01


Lets break down what happened here.

1. We grouped the dataset into year, then publisher.
2. Now that we have the groups, we summed up the ``Global_Sales`` for each publisher for each year
3. We utilized ``reset_index()`` to move ``Year`` and ``Publisher`` back into columns for the next few steps
4. We sorted the top ``Publisher`` by ``Global_Sales`` each ``Year``. The top ``Publisher`` will appear first

In [10]:
df2 = df2.groupby('Year').head(3)
df2.head(15)

Unnamed: 0,Year,Publisher,Global_Sales
1,1980.0,Atari,8.36
0,1980.0,Activision,3.02
3,1981.0,Activision,8.5
4,1981.0,Atari,8.45
7,1981.0,Imagic,4.82
19,1982.0,Atari,19.43
17,1982.0,Activision,1.86
28,1982.0,Parker Bros.,1.12
36,1983.0,Nintendo,10.96
35,1983.0,Atari,3.39


Since we know sales is sorted by descending order, we can utilize the ``groupby()`` function once more to grab the top 3 publishers each year.

Now lets rank them. Create a new column with their rank in their respective year:

In [11]:
df2.insert(0, 'Rank', df2.groupby('Year').cumcount())
df2.head(10)

Unnamed: 0,Rank,Year,Publisher,Global_Sales
1,0,1980.0,Atari,8.36
0,1,1980.0,Activision,3.02
3,0,1981.0,Activision,8.5
4,1,1981.0,Atari,8.45
7,2,1981.0,Imagic,4.82
19,0,1982.0,Atari,19.43
17,1,1982.0,Activision,1.86
28,2,1982.0,Parker Bros.,1.12
36,0,1983.0,Nintendo,10.96
35,1,1983.0,Atari,3.39


In [12]:
# Drop the global sales column as we no longer need it
del df2['Global_Sales']

Now lets pivot the dataframe. This means we are going to transpose the dataframe (columns become rows and rows become columns).

In [13]:
df2 = df2.pivot(index='Year', columns='Rank', values='Publisher')
df2.head(10)

Rank,0,1,2
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1980.0,Atari,Activision,
1981.0,Activision,Atari,Imagic
1982.0,Atari,Activision,Parker Bros.
1983.0,Nintendo,Atari,Activision
1984.0,Nintendo,Namco Bandai Games,Hudson Soft
1985.0,Nintendo,Namco Bandai Games,Hudson Soft
1986.0,Nintendo,Capcom,Namco Bandai Games
1987.0,Nintendo,Namco Bandai Games,Enix Corporation
1988.0,Nintendo,Enix Corporation,Capcom
1989.0,Nintendo,Palcom,Capcom


Lets clean up the dataframe for better user readability once exported.

In [14]:
df2.reset_index(inplace=True)
df2.rename(columns={0.0: 'Top_Publisher', 1.0: '2nd_Place', 2.0: '3rd_Place'}, inplace=True)
df2.fillna("No Publisher", inplace=True)

Here is the finalized dataframe:

In [15]:
df2

Rank,Year,Top_Publisher,2nd_Place,3rd_Place
0,1980.0,Atari,Activision,No Publisher
1,1981.0,Activision,Atari,Imagic
2,1982.0,Atari,Activision,Parker Bros.
3,1983.0,Nintendo,Atari,Activision
4,1984.0,Nintendo,Namco Bandai Games,Hudson Soft
5,1985.0,Nintendo,Namco Bandai Games,Hudson Soft
6,1986.0,Nintendo,Capcom,Namco Bandai Games
7,1987.0,Nintendo,Namco Bandai Games,Enix Corporation
8,1988.0,Nintendo,Enix Corporation,Capcom
9,1989.0,Nintendo,Palcom,Capcom


## Creating the Function <a name="paragraph2"></a>

Lets create the function. The function is very similar to the manual steps above except for a few things. I will go over them momentarily. I will not go over the export functions as they have already been explained in the last example, [Function 1 - Best Selling Game by Region this Year](link)

I have also included the documentation in the function as well.

In [16]:
def top_three_publishers_by_year(filename='vgsales'):
    """
    
    Generates the top three publishers for all years found in the input file. If no parameter is passed, the default filename will be ``vgsales``.

    The final Data Frame will be exported to the current directory location. 
    
    Parameters
    ----------
    filename : string, optional
        Will default to ``vgsales`` if no input is given.

    Returns
    -------
    This function does not return a value but exports a dataframe to the current directory.

    """
    
    # Attempt to find the correct filename in the current directory
    games_file = find_directory_file(filename)
    
    # Load in the csv file into the dataframe, df
    df = load_file(games_file)
    
    # Calcuate the total global sales each for publisher for each year
    df2 = df.groupby(['Year', 'Publisher'])['Global_Sales'].sum().reset_index().sort_values(['Year', 'Global_Sales'], ascending=[True, False])
    
    # Once you've sorted by top sales per year, grab the top 3 values from each year
    df2 = df2.groupby('Year').head(3)
    
    # To calculate rank, create a new column with the values: [0, 1, 2]
    df2.insert(0, 'Rank', df2.groupby('Year').cumcount())
    
    # Drop the global sales column as we no longer need it
    del df2['Global_Sales']
    
    # Pivot your dataframe so the rankings are now columns for easier user readability
    df2 = df2.pivot(index='Year', columns='Rank', values='Publisher')
    
    # Clean the dataframe before exporting
    df2.reset_index(inplace=True)
    df2.rename(columns={0.0: 'Top_Publisher', 1.0: '2nd_Place', 2.0: '3rd_Place'}, inplace=True)
    df2.fillna("No Publisher", inplace=True)
    
    # Export finalized dataframe
    filename = os.path.join(DIRECTORY_LOCATION, f'Top_Three_Publishers_by_Sales_Each_Year.csv')
    df2.to_csv(filename, index=False)   
    print(f'Exporting completed file to this folder path: {filename}')

### Parameter Inputs <a name="subparagraph1"></a>

The following are our parameters:

``top_three_publishers_by_year(filename='vgsales')``

- ``filename``: Will automatically attempt to find a filename with the name vgsales, unless specified by the user

In our example, we use the ``vgsales.csv`` file. Nothing is needed on the users side as they can simply call the function to complete it:

In [18]:
pg.top_three_publishers_by_year()

Using directory file: vgsales.csv
Exporting completed file to this folder path: C:/Users/Kevin/Desktop/Data\Top_Three_Publishers_by_Sales_Each_Year.csv


## Full Code for the Function <a name="paragraph3"></a>

Here is the full completed code without any comments.

In [None]:
def top_three_publishers_by_year(filename='vgsales'):
    """
    
    Generates the top three publishers for all years found in the input file. If no parameter is passed, the default filename will be ``vgsales``.

    The final Data Frame will be exported to the current directory location. 
    
    Parameters
    ----------
    filename : string, optional
        Will default to ``vgsales`` if no input is given.

    Returns
    -------
    This function does not return a value but exports a dataframe to the current directory.

    """
    
    games_file = find_directory_file(filename)
    df = load_file(games_file)
    
    df2 = df.groupby(['Year', 'Publisher'])['Global_Sales'].sum().reset_index().sort_values(['Year', 'Global_Sales'], ascending=[True, False])
    df2 = df2.groupby('Year').head(3)
    df2.insert(0, 'Rank', df2.groupby('Year').cumcount())
    
    del df2['Global_Sales']
    
    df2 = df2.pivot(index='Year', columns='Rank', values='Publisher')
    
    df2.reset_index(inplace=True)
    df2.rename(columns={0.0: 'Top_Publisher', 1.0: '2nd_Place', 2.0: '3rd_Place'}, inplace=True)
    df2.fillna("No Publisher", inplace=True)
    
    filename = os.path.join(DIRECTORY_LOCATION, f'Top_Three_Publishers_by_Sales_Each_Year.csv')
    df2.to_csv(filename, index=False)   
    print(f'Exporting completed file to this folder path: {filename}')