This notebook experiments with reading and editing CSV files, with the eventual goal of combining two of them. Python's built-in csv module handles the parsing.

In [1]:
import csv
filename = "popular-games-no-reviews.csv"

# Starter code from https://www.geeksforgeeks.org/working-csv-files-python/#
# initializing the titles and rows list
original_fields = []
original_rows = []
 
# reading csv file
with open(filename, 'r', encoding='Latin1') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)
 
    # extracting field names through first row
    original_fields = next(csvreader)
 
    # extracting each data row one by one
    for row in csvreader:
        original_rows.append(row)
 
    # line_num counts every line read from the file, including the header row
    print("Total no. of rows: %d" % (csvreader.line_num))
 
# printing the field names
print('Field names are:' + ', '.join(original_fields))

Total no. of rows: 1513
Field names are:, Title, Release Date, Team, Rating, Times Listed, Number of Reviews, Genres, Plays, Playing, Backlogs, Wishlist


So now we have the list of rows stored in the variable "original_rows". Each row is a list of strings, one per field.

In [2]:
print(original_rows[0:5])

[['0', 'Elden Ring', '25-Feb-22', "['Bandai Namco Entertainment', 'FromSoftware']", '4.5', '3.9K', '3.9K', "['Adventure', 'RPG']", '17K', '3.8K', '4.6K', '4.8K'], ['1', 'Hades', '10-Dec-19', "['Supergiant Games']", '4.3', '2.9K', '2.9K', "['Adventure', 'Brawler', 'Indie', 'RPG']", '21K', '3.2K', '6.3K', '3.6K'], ['2', 'The Legend of Zelda: Breath of the Wild', '3-Mar-17', "['Nintendo', 'Nintendo EPD Production Group No. 3']", '4.4', '4.3K', '4.3K', "['Adventure', 'RPG']", '30K', '2.5K', '5K', '2.6K'], ['3', 'Undertale', '15-Sep-15', "['tobyfox', '8-4']", '4.2', '3.5K', '3.5K', "['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']", '28K', '679', '4.9K', '1.8K'], ['4', 'Hollow Knight', '24-Feb-17', "['Team Cherry']", '4.4', '3K', '3K', "['Adventure', 'Indie', 'Platform']", '21K', '2.4K', '8.3K', '2.3K']]
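
Every value in these rows is still a string, including the list-like Team and Genres fields and the abbreviated counts such as '3.9K'. The sketch below shows one way to convert them. The helper names parse_list_field and parse_count are hypothetical, and it assumes that "K" is the only suffix that appears in the count columns.

```python
import ast

def parse_list_field(text):
    """Turn a string such as "['Adventure', 'RPG']" into a real Python list."""
    # literal_eval only accepts Python literals, so it is safer than eval()
    return list(ast.literal_eval(text))

def parse_count(text):
    """Convert a count such as '3.9K' or '679' into an integer.

    Assumption: 'K' (thousands) is the only suffix used in the data.
    """
    text = text.strip()
    if text.upper().endswith("K"):
        return round(float(text[:-1]) * 1000)
    return int(text)

# First data row, copied from the output above
sample = ['0', 'Elden Ring', '25-Feb-22',
          "['Bandai Namco Entertainment', 'FromSoftware']",
          '4.5', '3.9K', '3.9K', "['Adventure', 'RPG']",
          '17K', '3.8K', '4.6K', '4.8K']

genres = parse_list_field(sample[7])
times_listed = parse_count(sample[5])
print(genres)
print(times_listed)
```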


In [3]:
# These constants correspond to the index of the field
TITLE = 1
RELEASE_DATE = 2
TEAM = 3
RATING = 4
TIMES_LISTED = 5
NUMBER_OF_REVIEWS = 6
GENRES = 7
PLAYS = 8
PLAYING = 9
BACKLOGS = 10
WISHLIST = 11

print(original_rows[3][TITLE])
print(original_rows[3][GENRES])

Undertale
['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']
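
As an alternative to hand-maintained index constants, csv.DictReader keys each row by the header names. This is only a sketch on a small in-memory sample (the two-line CSV text is made up to mirror the file's layout), not a replacement for the cells above. Note that the first header in this file is empty, so that column ends up under the key ''.

```python
import csv
import io

# A tiny in-memory stand-in for the real file (illustrative sample only)
sample_csv = io.StringIO(
    ',Title,Release Date,Genres\n'
    '3,Undertale,15-Sep-15,"[\'Adventure\', \'Indie\']"\n'
)

reader = csv.DictReader(sample_csv)   # uses the first row as the field names
rows_by_name = list(reader)

print(rows_by_name[0]["Title"])
print(rows_by_name[0]["Genres"])
print(rows_by_name[0][""])            # the unnamed first column
```

The trade-off is that dictionary rows are easier to read but have to be written back with csv.DictWriter rather than csv.writer.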


Next, create the new fields that we want to add. Don't run the cell directly below this more than once, because each run appends "Developer" and "Publisher" to the field list again!

In [4]:
DEVELOPER = 12
PUBLISHER = 13

original_fields.append("Developer")
original_fields.append("Publisher")
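
One way to make this step safe to re-run is to append a field only when it is missing. The helper add_field_once below is a hypothetical name, and the field list is a shortened stand-in for original_fields.

```python
def add_field_once(fields, name):
    """Append name to fields only if it is absent, and return its index."""
    if name not in fields:
        fields.append(name)
    return fields.index(name)

# Shortened stand-in for original_fields
fields = ["", "Title", "Release Date"]

developer_index = add_field_once(fields, "Developer")
developer_index_again = add_field_once(fields, "Developer")  # no duplicate added
print(fields)
print(developer_index, developer_index_again)
```

Returning the index also means the DEVELOPER and PUBLISHER constants could be derived from the list instead of being hard-coded.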

In [5]:
# Read in VGChartz DB. Do it with a dictionary this time!
# initializing the titles and rows list
VGCHARTZ_NAME = 0
VGCHARTZ_PUBLISHER = 3
VGCHARTZ_DEVELOPER = 4

vgchartz_fields = []
vgchartz_rows = []
vgchartz_dict = {}    # store TITLE -> list of row INDICES, so that we can quickly find the rows for any given game.
                      # Each value must be a list, because multiple games may be published under the same name (e.g. Tetris).

# reading csv file
with open("vgchartz.csv", 'r', encoding='Latin1') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)
 
    # extracting field names through first row
    vgchartz_fields = next(csvreader)
 
    # extracting each data row one by one
    index = 0
    for row in csvreader:
        vgchartz_rows.append(row)
        # Next, take care of the dictionary
        if row[VGCHARTZ_NAME] in vgchartz_dict:
            # If there already is a value here, append the new value to the list
            vgchartz_dict[row[VGCHARTZ_NAME]].append(index)
        else:
            # If there isn't a value here yet, create a list for the dictionary
            vgchartz_dict[row[VGCHARTZ_NAME]] = [index]
            
        index += 1
 
    # line_num counts every line read from the file, including the header row
    print("Total no. of rows: %d" % (csvreader.line_num))
 
# printing the field names
print('Field names are:' + ', '.join(vgchartz_fields))

Total no. of rows: 64006
Field names are:name, date, platform, publisher, developer, shipped, total, america, europe, japan, other, vgc, critic, user


In [6]:
# Test cases
print(vgchartz_dict["Tetris"]) # This means that rows 0, 92, 693, 49762, and 49763 contain versions of Tetris


[0, 92, 693, 49762, 49763]
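
The same title-to-indices lookup can be built more compactly with collections.defaultdict and enumerate. This is a sketch on made-up rows; build_title_index is a hypothetical helper name, and the platform values are placeholders.

```python
from collections import defaultdict

def build_title_index(rows, name_column=0):
    """Map each title to the list of row indices where it appears."""
    index = defaultdict(list)          # missing keys start as an empty list
    for i, row in enumerate(rows):
        index[row[name_column]].append(i)
    return index

# Made-up sample rows: [name, platform]
sample_rows = [
    ["Tetris", "Platform A"],
    ["Other Game", "Platform B"],
    ["Tetris", "Platform C"],
]

title_index = build_title_index(sample_rows)
print(title_index["Tetris"])
print("Missing Game" in title_index)   # membership test does not create a key
```

One caveat: indexing a defaultdict with a missing key inserts an empty list, so membership checks should use `in` (as the loop in the next cell does) rather than a direct lookup.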


Fill out the new fields. We can do that by looping through original_rows and extracting data from vgchartz_rows using vgchartz_dict for each entry.

In [8]:
for row in original_rows:
    # First, find the corresponding entry in vgchartz
    title = row[TITLE]
    date = row[RELEASE_DATE]

    # Check that it exists
    if title in vgchartz_dict:
        # Once we get here, there is at least one entry in vgchartz for this title
        possible_vgchartz_indices = vgchartz_dict[title]

        # TODO: Pick out which of the possible indices is the correct one using the release date
        # TODO: Then, append the correct developer and publisher information as found from vgchartz

    else:
        # There was no entry in vgchartz for this title, so we have no publisher/developer data
        row.append(None)    # csv.writer writes None as an empty string, so None works as a placeholder for an empty entry
        row.append(None)
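
The two TODOs above could be approached as sketched below: among the candidate rows, pick the one whose date is closest to the release date. This is one possible heuristic, not a settled design. It assumes that vgchartz dates use the same '25-Feb-22' style as the original file (not verified here), and the helpers parse_date and pick_closest as well as the sample rows are hypothetical. The last lines also show that csv.writer emits None as an empty field.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%d-%b-%y"   # matches '25-Feb-22'; ASSUMED to also fit vgchartz dates
VGCHARTZ_DATE = 1          # 'date' is the second vgchartz field
VGCHARTZ_PUBLISHER = 3
VGCHARTZ_DEVELOPER = 4

def parse_date(text, fmt=DATE_FORMAT):
    """Return a date, or None if the text does not match the format."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

def pick_closest(target_text, candidate_indices, candidate_rows):
    """Return the candidate index whose date is nearest the target, or None."""
    target = parse_date(target_text)
    if target is None:
        return None
    best_index, best_gap = None, None
    for i in candidate_indices:
        candidate = parse_date(candidate_rows[i][VGCHARTZ_DATE])
        if candidate is None:
            continue               # skip rows with missing or odd dates
        gap = abs((candidate - target).days)
        if best_gap is None or gap < best_gap:
            best_index, best_gap = i, gap
    return best_index

# Made-up vgchartz-style rows: name, date, platform, publisher, developer
sample_vgchartz = [
    ["Some Game", "01-Jan-90", "Platform A", "Pub A", "Dev A"],
    ["Some Game", "20-Feb-22", "Platform B", "Pub B", "Dev B"],
]
row = ["0", "Some Game", "25-Feb-22"]

best = pick_closest(row[2], [0, 1], sample_vgchartz)
if best is not None:
    row.append(sample_vgchartz[best][VGCHARTZ_DEVELOPER])
    row.append(sample_vgchartz[best][VGCHARTZ_PUBLISHER])
else:
    row.extend([None, None])       # keep every row the same length
print(row)

# None is written as an empty field by csv.writer
buffer = io.StringIO()
csv.writer(buffer).writerow(["Some Game", None, None])
print(repr(buffer.getvalue()))
```

Appending two values in both branches keeps every row the same length as the extended field list, which matters once the rows are written back out.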
    