# Table of contents
1. [Introduction](#introduction)
2. [Manually Completing the Task](#paragraph1)
3. [Creating the Function](#paragraph2)
    1. [Parameter Inputs](#subparagraph1)
    2. [Exporting Final Dataframe](#subparagraph2)
4. [Appendix: Full Code for the Function](#paragraph3)

## Introduction <a name="introduction"></a>

Suppose you are asked to produce a yearly report that summarizes the top-selling game in each region for the current year.

This walkthrough assumes you have already set up these two functions:

- ``find_directory_file()``
-  ``load_file()``
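
The bodies of these two helpers are not shown in this guide. As a rough sketch only, they might look something like the following; the ``directory`` parameter, the matching rule, and the value of ``DIRECTORY_LOCATION`` are all assumptions made for illustration, not the original implementation.

```python
import os

import pandas as pd

# Assumption: the data folder is the current working directory.
# The original value of DIRECTORY_LOCATION is not shown in this guide.
DIRECTORY_LOCATION = os.getcwd()


def find_directory_file(name, directory=DIRECTORY_LOCATION):
    """Return the first file in `directory` whose name contains `name`."""
    for entry in sorted(os.listdir(directory)):
        if name in entry and os.path.isfile(os.path.join(directory, entry)):
            print(f'Using directory file: {entry}')
            return entry
    raise FileNotFoundError(f'No file matching {name!r} in {directory}')


def load_file(file_name, directory=DIRECTORY_LOCATION):
    """Load a CSV file from `directory` into a dataframe."""
    return pd.read_csv(os.path.join(directory, file_name))
```

Raising ``FileNotFoundError`` when nothing matches is a design choice in this sketch; it surfaces a missing file immediately instead of failing later inside ``pd.read_csv``.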

Let's start by importing the ``vgsales.csv`` dataset.

In [7]:
import pythonguides as pg
import pandas as pd

In [9]:
file_name = pg.find_directory_file('vgsales')
df = pg.load_file(file_name)

df.head(5)

Using directory file: vgsales.csv


Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


## Manually Completing the Task <a name="paragraph1"></a>

Let's start by defining a variable that captures the current year.

In [21]:
import datetime
import os

In [10]:
a = datetime.datetime.now().year
a

2020

Next, filter the dataset by that year.

In [12]:
df1 = df[df['Year'] == a]
df1.head(5)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
5957,5959,Imagine: Makeup Artist,DS,2020.0,Simulation,Ubisoft,0.27,0.0,0.0,0.02,0.29


This dataset has only one entry for 2020, so for the sake of the example let's use 2015 as our year to get some working data.

In [13]:
a = 2015
df1 = df[df['Year'] == a]
df1.head(5)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
33,34,Call of Duty: Black Ops 3,PS4,2015.0,Shooter,Activision,5.77,5.81,0.35,2.31,14.24
77,78,FIFA 16,PS4,2015.0,Sports,Electronic Arts,1.11,6.06,0.06,1.26,8.49
92,93,Star Wars Battlefront (2015),PS4,2015.0,Shooter,Electronic Arts,2.93,3.29,0.22,1.23,7.67
101,102,Call of Duty: Black Ops 3,XOne,2015.0,Shooter,Activision,4.52,2.09,0.01,0.67,7.3
109,110,Fallout 4,PS4,2015.0,Role-Playing,Bethesda Softworks,2.47,3.15,0.24,1.1,6.96
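
Two details of this filter are worth noting. The ``Year`` column is stored as a float (``2015.0``), yet comparing it with the integer ``2015`` still matches. And a year with no rows produces an empty dataframe, on which a later ``argmax`` call would raise a ``ValueError``. The sketch below wraps the filter in a hypothetical helper, ``filter_by_year``, that fails early with a clearer message; the helper name and the sample frame are illustrative only.

```python
import pandas as pd


def filter_by_year(df, year):
    """Return the rows of `df` whose Year equals `year`, or raise if none match."""
    subset = df[df['Year'] == year]
    if subset.empty:
        raise ValueError(f'No rows found for year {year}')
    return subset


# Tiny stand-in for vgsales.csv, built from rows shown in the outputs above.
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'FIFA 16', 'Fallout 4'],
    'Year': [2006.0, 2015.0, 2015.0],
})

print(filter_by_year(sample, 2015)['Name'].tolist())  # ['FIFA 16', 'Fallout 4']
```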


Now we can create four dataframes, one per region, each containing that region's top-selling game. Once we have all four, we can concatenate them into a finalized dataframe.

In [14]:
df_NA = pd.DataFrame(df1.iloc[df1['NA_Sales'].values.argmax()])
df_NA.columns = ['NA_Sales']

df_EU = pd.DataFrame(df1.iloc[df1['EU_Sales'].values.argmax()])
df_EU.columns = ['EU_Sales']

df_JP = pd.DataFrame(df1.iloc[df1['JP_Sales'].values.argmax()])
df_JP.columns = ['JP_Sales']

df_Other = pd.DataFrame(df1.iloc[df1['Other_Sales'].values.argmax()])
df_Other.columns = ['Other_Sales']
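
To see why each column is renamed, it may help to break one of these statements into steps. ``df1.iloc[...]`` returns a single row as a Series named after that row's index label, so wrapping it in ``pd.DataFrame`` gives one column headed by that label rather than by the region. The sketch below uses a two-row stand-in for ``df1`` built from the 2015 output above.

```python
import pandas as pd

# Two rows from the 2015 output above, keeping their original index labels.
df1 = pd.DataFrame(
    {'Name': ['Call of Duty: Black Ops 3', 'FIFA 16'],
     'NA_Sales': [5.77, 1.11],
     'EU_Sales': [5.81, 6.06]},
    index=[33, 77],
)

# Position (not label) of the largest EU_Sales value.
pos = df1['EU_Sales'].values.argmax()

top_row = df1.iloc[pos]          # a Series named after the row label, 77
df_EU = pd.DataFrame(top_row)    # one column, headed 77
df_EU.columns = ['EU_Sales']     # rename so the header names the region

print(df_EU)
```

Because ``argmax`` works on the underlying NumPy array, it returns a position, which is why it is paired with ``.iloc`` rather than ``.loc``.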

In [15]:
df_NA

Unnamed: 0,NA_Sales
Rank,34
Name,Call of Duty: Black Ops 3
Platform,PS4
Year,2015
Genre,Shooter
Publisher,Activision
NA_Sales,5.77
EU_Sales,5.81
JP_Sales,0.35
Other_Sales,2.31
Global_Sales,14.24


Since all four dataframes share the same index, we can easily concatenate them side by side as columns (``axis = 1``).

In [17]:
df_final = pd.concat([df_NA,df_EU,df_JP,df_Other],axis = 1).reset_index()
df_final

Unnamed: 0,index,NA_Sales,EU_Sales,JP_Sales,Other_Sales
0,Rank,34,78,415,34
1,Name,Call of Duty: Black Ops 3,FIFA 16,Monster Hunter X,Call of Duty: Black Ops 3
2,Platform,PS4,PS4,3DS,PS4
3,Year,2015,2015,2015,2015
4,Genre,Shooter,Sports,Action,Shooter
5,Publisher,Activision,Electronic Arts,Capcom,Activision
6,NA_Sales,5.77,1.11,0.25,5.77
7,EU_Sales,5.81,6.06,0.19,5.81
8,JP_Sales,0.35,0.06,2.78,0.35
9,Other_Sales,2.31,1.26,0.04,2.31
10,Global_Sales,14.24,8.49,3.26,14.24


Great! Now let's remove the 'Rank' row, as we don't need it, and then export our dataframe.

In [19]:
df_final = df_final[df_final['index'] != 'Rank']
df_final

Unnamed: 0,index,NA_Sales,EU_Sales,JP_Sales,Other_Sales
1,Name,Call of Duty: Black Ops 3,FIFA 16,Monster Hunter X,Call of Duty: Black Ops 3
2,Platform,PS4,PS4,3DS,PS4
3,Year,2015,2015,2015,2015
4,Genre,Shooter,Sports,Action,Shooter
5,Publisher,Activision,Electronic Arts,Capcom,Activision
6,NA_Sales,5.77,1.11,0.25,5.77
7,EU_Sales,5.81,6.06,0.19,5.81
8,JP_Sales,0.35,0.06,2.78,0.35
9,Other_Sales,2.31,1.26,0.04,2.31
10,Global_Sales,14.24,8.49,3.26,14.24
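
The four near-identical blocks used to build the regional dataframes could also be written as a loop. The sketch below is one possible alternative, not the approach used in the rest of this guide: it swaps ``.values.argmax()`` with ``.iloc`` for ``idxmax()`` with ``.loc``, and it assumes the index labels are unique. The names ``REGION_COLUMNS`` and ``top_games_by_region`` are hypothetical.

```python
import pandas as pd

# Hypothetical constant listing the regional sales columns.
REGION_COLUMNS = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']


def top_games_by_region(df_year, region_columns=REGION_COLUMNS):
    """Return one column per region holding that region's top-selling row."""
    columns = {}
    for region in region_columns:
        # idxmax returns the index label of the largest value in the column.
        columns[region] = df_year.loc[df_year[region].idxmax()]
    # The dict keys become the column names of the combined dataframe.
    result = pd.concat(columns, axis=1).reset_index()
    return result[result['index'] != 'Rank']


# Stand-in data taken from the 2015 results shown above.
sample = pd.DataFrame({
    'Rank': [34, 78, 415],
    'Name': ['Call of Duty: Black Ops 3', 'FIFA 16', 'Monster Hunter X'],
    'NA_Sales': [5.77, 1.11, 0.25],
    'EU_Sales': [5.81, 6.06, 0.19],
    'JP_Sales': [0.35, 0.06, 2.78],
    'Other_Sales': [2.31, 1.26, 0.04],
})

print(top_games_by_region(sample))
```

With this shape, adding a region means adding one entry to the list rather than copying another block.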


In [23]:
# Export finalized dataframe
df_final.to_csv('Top_Selling_Game_By_Region_2015.csv', index=False)   

## Creating the Function <a name="paragraph2"></a>

Let's create the function. It is very similar to the manual steps above, apart from a few differences that are covered after the code.

The documentation is included in the function's docstring.

In [24]:
def generate_top_region_games_by_year(currentYear = datetime.datetime.now().year, filename='vgsales'):
    """
    
    Generates the top-selling game in each region for a given year. By default, if no argument is passed for ``currentYear``, the function uses the system's current year.

    The final Data Frame will be exported to the current directory location.  
    
    Parameters
    ----------
    currentYear : int, optional
        Will default to the system's current year if no input is given.
        
    filename : string, optional
        Will default to ``vgsales`` if no input is given.
        

    Returns
    -------
    This function does not return a value but exports a dataframe to the current directory.

    """
    
    # Attempt to find the correct filename in the current directory
    games_file = find_directory_file(filename)
    
    # Load in the csv file into the dataframe, df
    df = load_file(games_file)
    
    # Filter for the current year
    df = df[df['Year'] == currentYear]
    
    # Find the top selling game by finding the highest value in each respective column using .argmax()
    df_NA = pd.DataFrame(df.iloc[df['NA_Sales'].values.argmax()])
    df_NA.columns = ['NA_Sales']
    
    df_EU = pd.DataFrame(df.iloc[df['EU_Sales'].values.argmax()])
    df_EU.columns = ['EU_Sales']

    df_JP = pd.DataFrame(df.iloc[df['JP_Sales'].values.argmax()])
    df_JP.columns = ['JP_Sales']

    df_Other = pd.DataFrame(df.iloc[df['Other_Sales'].values.argmax()])
    df_Other.columns = ['Other_Sales']
    
    # Concatenate the four single-column dataframes into a finalized dataframe
    df_final = pd.concat([df_NA, df_EU, df_JP, df_Other], axis=1).reset_index()
    
    # Remove unnecessary row
    df_final = df_final[df_final['index'] != 'Rank']
    
    # Export finalized dataframe
    filename = os.path.join(DIRECTORY_LOCATION, f'Top_Selling_Game_By_Region_{currentYear}.csv')
    df_final.to_csv(filename, index=False)   
    print(f'Exporting completed file to this folder path: {filename}')

There are two notable differences:

- The parameter inputs
- Exporting the final dataframe

### Parameter Inputs <a name="subparagraph1"></a>

The following are our parameters:

``generate_top_region_games_by_year(currentYear = datetime.datetime.now().year, filename='vgsales')``

- ``currentYear``: Will automatically capture the current year by default, unless specified by the user
- ``filename``: Will automatically attempt to find a filename with the name vgsales, unless specified by the user
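
One subtlety with the ``currentYear`` default: Python evaluates default argument values once, when the ``def`` statement runs, not on each call. In a session that stays open across New Year, the default would therefore keep the old year. A common alternative, sketched below with a hypothetical helper named ``resolve_year``, is to default to ``None`` and look up the year inside the function.

```python
import datetime


def resolve_year(currentYear=None):
    """Return `currentYear`, or the system's current year when it is None."""
    if currentYear is None:
        # Evaluated on every call rather than once at definition time.
        currentYear = datetime.datetime.now().year
    return currentYear


print(resolve_year(2015))  # prints 2015
print(resolve_year())      # prints the year at the time of the call
```

For a short script run once a year the difference is unlikely to matter, so treat this as an optional hardening step.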




As you can see, both parameters are optional, so you can run the function in a single line with no inputs:

In [25]:
pg.generate_top_region_games_by_year()

Using directory file: vgsales.csv
Exporting completed file to this folder path: C:/Users/Kevin/Desktop/Data\Top_Selling_Game_By_Region_2020.csv


However, you can specify the parameters if you wish. Say we want to find the top games for 2015, as in our previous example; just pass 2015 as the first argument:

In [26]:
pg.generate_top_region_games_by_year(2015)

Using directory file: vgsales.csv
Exporting completed file to this folder path: C:/Users/Kevin/Desktop/Data\Top_Selling_Game_By_Region_2015.csv


**Notice the filename changed to 2015? The next section explains why...**

### Exporting Final Dataframe <a name="subparagraph2"></a>

The last three lines are as follows:

    filename = os.path.join(DIRECTORY_LOCATION, f'Top_Selling_Game_By_Region_{currentYear}.csv')
    df_final.to_csv(filename, index=False)   
    print(f'Exporting completed file to this folder path: {filename}')

Prefixing a string with ``f`` turns it into an f-string, which lets us embed variables directly in the text. We use the ``currentYear`` variable to name the file automatically after the year we analyzed.

Finally, it's always a good idea to print a message so the user knows where the file is saved. That is what the last statement does.
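
The path-building step can be tried on its own. The sketch below isolates it in a hypothetical helper, ``build_export_path``; the folder name ``reports`` is a placeholder and not the original ``DIRECTORY_LOCATION``.

```python
import os


def build_export_path(directory, year):
    """Build the report path for `year` inside `directory`."""
    # The expression inside {} is evaluated and formatted into the string.
    file_name = f'Top_Selling_Game_By_Region_{year}.csv'
    return os.path.join(directory, file_name)


# 'reports' is a placeholder folder name used only for this example.
print(build_export_path('reports', 2015))
```

``os.path.join`` inserts the separator of the operating system it runs on, which is why the path printed earlier mixes ``/`` and ``\`` on Windows.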

## Appendix: Full Code for the Function <a name="paragraph3"></a>

Here is the complete code for the function, without the inline comments.

In [27]:
def generate_top_region_games_by_year(currentYear = datetime.datetime.now().year, filename='vgsales'):
    """
    
    Generates the top-selling game in each region for a given year. By default, if no argument is passed for ``currentYear``, the function uses the system's current year.

    The final Data Frame will be exported to the current directory location.  
    
    Parameters
    ----------
    currentYear : int, optional
        Will default to the system's current year if no input is given.
        
    filename : string, optional
        Will default to ``vgsales`` if no input is given.
        

    Returns
    -------
    This function does not return a value but exports a dataframe to the current directory.

    """
    
    
    games_file = find_directory_file(filename)
    df = load_file(games_file)
    
    df = df[df['Year'] == currentYear]
    
    df_NA = pd.DataFrame(df.iloc[df['NA_Sales'].values.argmax()])
    df_NA.columns = ['NA_Sales']
    
    df_EU = pd.DataFrame(df.iloc[df['EU_Sales'].values.argmax()])
    df_EU.columns = ['EU_Sales']

    df_JP = pd.DataFrame(df.iloc[df['JP_Sales'].values.argmax()])
    df_JP.columns = ['JP_Sales']

    df_Other = pd.DataFrame(df.iloc[df['Other_Sales'].values.argmax()])
    df_Other.columns = ['Other_Sales']
    
    df_final = pd.concat([df_NA, df_EU, df_JP, df_Other], axis=1).reset_index()
    df_final = df_final[df_final['index'] != 'Rank']
    
    filename = os.path.join(DIRECTORY_LOCATION, f'Top_Selling_Game_By_Region_{currentYear}.csv')
    df_final.to_csv(filename, index=False)   
    print(f'Exporting completed file to this folder path: {filename}')