# Setting up the py File

1. [Introduction](#introduction)
2. [Saving your starting py file](#paragraph1)
3. [Overview of the Games Dataset](#paragraph2)
4. [Creating a Working Directory](#paragraph3)
5. [Summary](#paragraph4)

## Introduction <a name="introduction"></a>

Setting up your py file is similar to setting up packages like pandas or numpy: you create the .py file and place it in the folder where Python looks for all your installed packages.

This guide uses the Anaconda distribution of Python. Anaconda is commonly used for data science work because it comes prepackaged with many useful libraries and utilities, so you do not need to download each one from its own website.

Anaconda can be downloaded from its website: [Link](https://www.anaconda.com/)

Once you have Anaconda installed, open up your preferred IDE to start creating your py file. Some common IDEs include:

- [PyCharm](https://www.jetbrains.com/pycharm/)
- [Microsoft Visual Studio](https://visualstudio.microsoft.com/)
- Spyder (packaged with Anaconda)

## Saving your starting py file <a name="paragraph1"></a>

Create your py file and choose its name carefully: the file name is the name you will use to import your library.

For example purposes, this guide will call it ``pythonguides.py``

Save your py file in the folder where your Python distribution keeps all your other packages. For Anaconda on Windows, you will most likely find it here:

``C:/Users/Your_Username/AppData/Local/Continuum/anaconda3/Lib/site-packages``
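
The exact folder differs between Anaconda versions and operating systems, so the path above may not exist on your machine. As a minimal sketch, assuming you run it with the same Python interpreter your IDE uses, the standard-library ``sysconfig`` module can report where that interpreter installs pure-Python packages. The variable name ``site_packages_folder`` is only illustrative.

```python
import os
import sysconfig

# Folder where the running interpreter installs pure-Python packages.
site_packages_folder = sysconfig.get_paths()["purelib"]

print(f"site-packages folder: {site_packages_folder}")
print(f"Folder exists: {os.path.isdir(site_packages_folder)}")
```

If the printed folder exists, that is a reasonable place to save ``pythonguides.py``.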

Save your py file once you've figured out the correct folder path and we can start creating some functions!

## Overview of the Games Dataset <a name="paragraph2"></a>

The dataset used in the various examples is a video game sales dataset found on the following Kaggle page: [Link](https://www.kaggle.com/gregorut/videogamesales)

The fields include:

- ``Rank`` - Ranking of overall sales
- ``Name`` - The game's name
- ``Platform`` - Platform of the game's release (e.g. PC, PS4)
- ``Year`` - Year of the game's release
- ``Genre`` - Genre of the game
- ``Publisher`` - Publisher of the game
- ``NA_Sales`` - Sales in North America (in millions)
- ``EU_Sales`` - Sales in Europe (in millions)
- ``JP_Sales`` - Sales in Japan (in millions)
- ``Other_Sales`` - Sales in the rest of the world (in millions)
- ``Global_Sales`` - Total worldwide sales (in millions)

In [1]:

    import pandas as pd
    df = pd.read_csv('vgsales.csv')

In [2]:

    df.head(5)

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |
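
To make the relationship between the sales fields concrete, here is a small sketch that retypes the five rows from the ``df.head(5)`` output by hand and compares ``Global_Sales`` with the sum of the four regional columns. The names ``sample``, ``Regional_Total`` and ``Difference`` are introduced only for this illustration and are not part of the dataset.

```python
import pandas as pd

# The five rows from the df.head(5) output, typed in by hand.
sample = pd.DataFrame(
    {
        "Name": [
            "Wii Sports",
            "Super Mario Bros.",
            "Mario Kart Wii",
            "Wii Sports Resort",
            "Pokemon Red/Pokemon Blue",
        ],
        "NA_Sales": [41.49, 29.08, 15.85, 15.75, 11.27],
        "EU_Sales": [29.02, 3.58, 12.88, 11.01, 8.89],
        "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
        "Other_Sales": [8.46, 0.77, 3.31, 2.96, 1.00],
        "Global_Sales": [82.74, 40.24, 35.82, 33.00, 31.37],
    }
)

# Add up the regional columns row by row and compare with the global figure.
regional_columns = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]
sample["Regional_Total"] = sample[regional_columns].sum(axis=1)
sample["Difference"] = (sample["Global_Sales"] - sample["Regional_Total"]).abs()

print(sample[["Name", "Global_Sales", "Regional_Total", "Difference"]].round(2))
```

For these five rows the two totals agree to within about 0.01 (million), a gap consistent with rounding in the published figures.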


## Creating a Working Directory <a name="paragraph3"></a>

You will commonly work with specific folders that hold your CSV, Excel, and other data files.

Storing those files in a single working directory keeps them easy to locate. In your py file, let's define the working directory you will be using.

Keep in mind that ``os.getcwd()``, used below, returns the current working directory, which will most likely be ``C:/Users/Your_Username`` if you have not changed any settings.

In [3]:

    import os

In [5]:

    DIRECTORY_LOCATION = os.getcwd()
    print(f"Database directory is {DIRECTORY_LOCATION}")

Database directory is C:\Users\Kevin


Or you could define it directly:

In [4]:

    DIRECTORY_LOCATION = r'C:/Users/Kevin/Desktop/Data'
    print(f"Database directory is {DIRECTORY_LOCATION}")

Database directory is C:/Users/Kevin/Desktop/Data
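
A hard-coded path only works on a machine where that folder exists. As a hedged sketch, one way to guard against that is to check the folder with ``os.path.isdir`` and fall back to the current working directory. The name ``PREFERRED_DIRECTORY`` and the fallback behaviour are assumptions made for illustration, not part of the original setup.

```python
import os

# Hard-coded location from the example; replace it with a folder on your machine.
PREFERRED_DIRECTORY = r"C:/Users/Kevin/Desktop/Data"

if os.path.isdir(PREFERRED_DIRECTORY):
    DIRECTORY_LOCATION = PREFERRED_DIRECTORY
else:
    # Fall back to the current working directory when the folder is missing.
    DIRECTORY_LOCATION = os.getcwd()

print(f"Database directory is {DIRECTORY_LOCATION}")
```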


**The first option is recommended: you can set your working directory at any time before importing your package by calling ``os.chdir("Your_Folder_Path")``.**

In [6]:

    os.chdir(r'C:/Users/Kevin/Desktop')
    DIRECTORY_LOCATION = os.getcwd()
    print(f"Database directory is {DIRECTORY_LOCATION}")

Database directory is C:\Users\Kevin\Desktop
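
The order matters because a module-level ``DIRECTORY_LOCATION = os.getcwd()`` is evaluated once, when the py file is imported. The sketch below illustrates this with a throwaway module called ``demo_guides`` (a hypothetical stand-in for ``pythonguides.py``) written to a temporary folder, so it can run anywhere without touching your real files.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for pythonguides.py: it captures the working
# directory once, at the moment it is imported.
MODULE_SOURCE = "import os\nDIRECTORY_LOCATION = os.getcwd()\n"

original_directory = os.getcwd()
with tempfile.TemporaryDirectory() as package_folder, \
        tempfile.TemporaryDirectory() as data_folder, \
        tempfile.TemporaryDirectory() as other_folder:
    # Write the stand-in module to a folder Python can import from.
    module_path = os.path.join(package_folder, "demo_guides.py")
    with open(module_path, "w") as module_file:
        module_file.write(MODULE_SOURCE)
    sys.path.insert(0, package_folder)
    importlib.invalidate_caches()

    try:
        # Change directory first, then import: the module records data_folder.
        os.chdir(data_folder)
        demo_guides = importlib.import_module("demo_guides")
        captured_at_import = demo_guides.DIRECTORY_LOCATION

        # Changing directory after the import does not update the constant.
        os.chdir(other_folder)
        captured_after_chdir = demo_guides.DIRECTORY_LOCATION
        current_directory = os.getcwd()
    finally:
        # Restore state so the temporary folders can be removed cleanly.
        os.chdir(original_directory)
        sys.path.remove(package_folder)
        sys.modules.pop("demo_guides", None)

print(f"Constant unchanged after chdir: {captured_at_import == captured_after_chdir}")
print(f"Constant matches new directory: {captured_after_chdir == current_directory}")
```

The constant keeps the directory that was current at import time, which is why ``os.chdir`` has to be called before the import rather than after it.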


## Summary <a name="paragraph4"></a>

To recap, place the following lines at the top of your py file. They import the packages we will need and define your working directory in two lines of code.

In [7]:

    import glob
    import os
    import pandas as pd
    import numpy as np
    import datetime

    # Initialize the py file
    DIRECTORY_LOCATION = os.getcwd()
    print(f"Database directory is {DIRECTORY_LOCATION}")

Database directory is C:\Users\Kevin\Desktop
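
As a rough sketch of how ``DIRECTORY_LOCATION`` and the ``glob`` import might be used together, the hypothetical helper ``list_csv_files`` below collects the CSV files in a folder. The helper name and the temporary demonstration folder are assumptions for illustration. In your own py file you would pass ``DIRECTORY_LOCATION`` instead.

```python
import glob
import os
import tempfile


def list_csv_files(directory):
    """Return the sorted paths of every .csv file directly inside directory."""
    pattern = os.path.join(directory, "*.csv")
    return sorted(glob.glob(pattern))


# Demonstration in a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_directory:
    for file_name in ("vgsales.csv", "notes.txt"):
        with open(os.path.join(demo_directory, file_name), "w") as handle:
            handle.write("placeholder\n")

    csv_paths = list_csv_files(demo_directory)
    csv_names = [os.path.basename(path) for path in csv_paths]
    print(csv_names)
```

Building paths with ``os.path.join`` rather than string concatenation keeps the code independent of the path separator used by the operating system.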
