
Video game localization prioritization tool proposal
---

My goal is to scrape and analyze data from the video game platform
Steam in order to help studios or localization service providers
choose which languages they should localize into in order to
maximize their localization ROI.

The dataset would consist of a database of games with columns including
genre, sales, price, number of reviews, percent of positive reviews,
available languages, and the language that positive or negative reviews
are written in. Any relationships found between these variables
(especially between language, genre, sales, price, and positive reviews)
could help drive business decisions at the studio or language service
provider level.

The code below is the beginning of my scraper. It scrapes a search result
page to gather the name, price, number of reviews, percent of positive
reviews, and individual game page url for all listed games. In order to
suit the needs of my project, it must be expanded to also perform the
following:

1. Scroll through a results list in order to cause the page to load more
results (current max is 50). Tools exist for this, but I haven't had the
time to study them yet.

2. Perform a secondary scraping of the individual games' pages to collect
the remainder of the column info that I haven't scraped yet. This is
theoretically possible with my current limited skillset, though I worry
that so many rapid calls will cause Steam to ban my IP address, so I should
also study tools that slow down and/or randomize the request timing.

3. Be able to ascertain the language in which a review is written. I think
there are tools available for this - worst case scenario, I just ask
ChatGPT 3.5 which language it is, using a rotating cast of IP addresses to
bypass the daily message limit.
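
For item 3, a cheap first pass (before reaching for a real language-identification tool) is to check which writing system dominates a review. The sketch below uses only Python's standard `unicodedata` module. `dominant_script` is a hypothetical helper name, and it can only tell scripts apart (Cyrillic vs. Hangul vs. Latin, and so on), not languages that share a script, such as German and French.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most common script name among the letters in text, or None."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. 'CYRILLIC SMALL LETTER A'.
            name = unicodedata.name(ch, '')
            if name:
                scripts[name.split()[0]] += 1
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]

print(dominant_script("Это отличная игра"))
print(dominant_script("great game"))
```

Reviews flagged as LATIN would still need a proper language detector, so this only narrows the problem rather than solving it.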

In [1]:
# Basic DS stuff
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Trying not to get blocked while scraping by inputting
# random delays between Get requests.
import random
import time

# Web scraping
from bs4 import BeautifulSoup
from urllib.request import urlopen
import chardet

# I needed some extra help locating specific parts within a
# bs4 tag object, so I got this.
import re

# For file tracking when exporting files.
from datetime import date

# I didn't expect to need this one, but it ends up serving as a
# fallback in Step 4 for pages that urlopen() can't scrape effectively.
import requests

Step 1: Learn about the page
---

In [2]:
# THIS CODE ONLY WORKS IF YOUR STEAM SETTINGS ARE SET TO PAGINATED
# SEARCH RESULTS, NOT INFINITE SCROLL.

# This url is for the "all products" search with the result type
# limited to "Games" (category1=998)
url = "https://store.steampowered.com/search/?category1=998"
html = urlopen(url)
current_page_soup = BeautifulSoup(html, 'lxml')

In [3]:
# From looking at the whole page's HTML, I can tell which tag to call in order
# to get the information relevant to only a single game.

single_game_example = current_page_soup.find('a', class_='search_result_row ds_collapse_flag')

print(single_game_example.prettify())

<a class="search_result_row ds_collapse_flag" data-ds-appid="1086940" data-ds-crtrids="[6879350]" data-ds-descids="[1,2,5]" data-ds-itemkey="App_1086940" data-ds-steam-deck-compat-handled="true" data-ds-tagids="[122,6426,1742,4747,21,4474,3843]" data-gpnav="item" data-search-page="1" href="https://store.steampowered.com/app/1086940/Baldurs_Gate_3/?snr=1_7_7_230_150_1" onmouseout="HideGameHover( this, event, 'global_hover' )" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:1086940,&quot;public&quot;:1,&quot;v6&quot;:1} );">
 <div class="col search_capsule">
  <img src="https://cdn.cloudflare.steamstatic.com/steam/apps/1086940/capsule_sm_120.jpg?t=1692294127" srcset="https://cdn.cloudflare.steamstatic.com/steam/apps/1086940/capsule_sm_120.jpg?t=1692294127 1x, https://cdn.cloudflare.steamstatic.com/steam/apps/1086940/capsule_231x87.jpg?t=1692294127 2x"/>
 </div>
 <div class="responsive_search_name_combined">
  <div class="col search_na

In [4]:
# I learned the hard way that not all listings are identical. Most listings are for 'app's, but
# some are for 'bundle's. Bundles have a slightly different leading 'a' tag - not different enough
# that we need to use a different attribute to access them, but different enough that we need to
# use different attributes to scrape some of the data.

# Let's pull one up for reference.

for listing in current_page_soup.find_all('a', class_='search_result_row ds_collapse_flag') :
    if listing.has_attr('data-ds-bundle-data') :
        print(listing.prettify())
        break

Step 2: Scrape the first set of data from the search results pages
---

In [5]:
# Now that we know what our soups will look like, we can write functions to do the scraping.
# The first function will scrape all the relevant data off of the current results page.
# The second function will programmatically switch to the next page of results.
# Later, we will run both functions within a loop in order to scrape all results data
# from all pages.

# This is only the first round of scraping. Later, we will scrape more data from each
# game's store page. Since that process is completely different, we will define new
# functions for it later, after this round of scraping is complete.

# Loop through the HTML blocks for each game and scrape the key info into a dictionary,
# then add the dictionaries to the list.
# I'm not cleaning up the data types at this point - I'm learning as I'm going, so I'm
# prioritizing getting all the info I need into the df, and then working with data
# types later either by doing operations on the df or re-writing some of this code.
def scrape_current_page(current_page_soup) :

    """
    This function takes the soup of a paginated Steam search results page (NOT infinite scroll)
    and scrapes the:
    
    app_id
    title
    release_date
    positive_review_percent
    number_of_reviews
    price
    game_page_link
    tags
    
    from every game on the page. It puts these values into dictionaries and appends them to
    the list called "games". 
    """

    for listing in current_page_soup.find_all('a', class_='search_result_row ds_collapse_flag') :

        # Create (or clean out) an empty dictionary to hold the new info.
        game = {}

        # Listings on results pages can be one of two types - standalone games, or bundles.
        # We only want to work with standalone games.
        # Only apps have this attribute in their listing.
        if listing.has_attr('data-ds-appid') :
            game['app_id'] = listing.get('data-ds-appid')

            # The title and release date seem to be at uniform locations in all listings.
            game['title'] = listing.find('span', class_='title').get_text()
            game['release_date'] = listing.find('div', class_='col search_released responsive_secondrow').get_text()

            # Not all games have reviews listed, so we have to account for code blocks that omit this part.
            # I might eventually remove this part and scrape the review data from the individual game pages
            # instead, since it seems to be more complete there. This is just proof of concept for now.
            try:
                review_string = re.split('>| of|the | user', listing.find('div', class_='col search_reviewscore responsive_secondrow') \
                                                            .find('span').get('data-tooltip-html'))
                game['positive_review_percent'] = review_string[1]
                game['number_of_reviews'] = review_string[3]
            except: 
                game['positive_review_percent'] = np.nan
                game['number_of_reviews'] = np.nan
            
            # Same for price - many unreleased games do not have price info, so we have to skip them.
            # Some games have an original price and a discounted price listed, but for the time being
            # I've decided to only go by original prices, so I'll default to that and only return
            # a null value if no kind of price whatsoever is listed.
            try: 
                game['price'] = listing.find('div', class_="discount_original_price").get_text()
            except:
                try:
                    game['price'] = listing.find('div', class_="discount_final_price").get_text()
                except:
                    game['price'] = np.nan

            # Weirdly enough, not every game seems to have its own page.
            try:
                game['game_page_link'] = listing.get('href')
            except:
                game['game_page_link'] = 'Failed'

            # Now we grab the tags, which will be a major feature in our analysis.
            try :
                game['tags'] = listing.get('data-ds-tagids')
            except :
                game['tags'] = 'Failed'

            # Now we add this dict to the list, rinse and repeat.
            games.append(game)
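
The positional `re.split` above depends on exactly how the tooltip breaks into pieces. A minimal alternative sketch is to capture both numbers with one regular expression. It assumes the tooltip text looks like the sample string below; that shape is inferred from the split pattern, not confirmed against Steam's markup. `parse_review_tooltip` is a hypothetical helper, not part of the scraper.

```python
import re

# Assumed tooltip shape: '<summary><br>NN% of the N,NNN user reviews ...'
REVIEW_PATTERN = re.compile(r'(\d+)% of the ([\d,]+) user reviews')

def parse_review_tooltip(tooltip):
    """Return (percent, count) as ints, or (None, None) if nothing matches."""
    match = REVIEW_PATTERN.search(tooltip or '')
    if match is None:
        return None, None
    percent = int(match.group(1))
    count = int(match.group(2).replace(',', ''))
    return percent, count

sample = 'Very Positive<br>96% of the 123,456 user reviews for this game are positive.'
print(parse_review_tooltip(sample))
```

Because a missing or oddly shaped tooltip comes back as `(None, None)`, this approach would not need a bare `except` to handle games without reviews.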

In [6]:
# Now we create the function that determines if there is a next page of
# results, or if we're already at the last page.

def get_next_page_url(current_page_soup) :

        """
        This function takes the soup of a paginated Steam search results page (NOT infinite scroll)
        and determines whether it is the last page of results.

        If it is not the last page, the URL of the next page is stored in "next_link".

        If it is the last page, "next_link" will be set to False.
        """

        # First, we check to make sure there IS a next page. We can tell by looking
        # at the 'pagebtn' tags.
        pagebtn_tags = current_page_soup.find_all('a', class_='pagebtn')

        # This is the variable that we will use to store the next link, or set it to
        # False to let the loop know that we're done scraping.
        global next_link

        # If it is any of the middle pages, there will be two pagebtn tags.
        # The link we need is inside the pagebtn tag that displays the text '>'.
        if len(pagebtn_tags) == 2 :
                next_link = pagebtn_tags[1].get('href')

        # If there is only one pagebtn tag, that means we're on the first page or the
        # last page. If it's the first page, then the pagebtn tag will contain the
        # character '>'. (Checking the length first avoids an IndexError when the
        # results fit on a single page and there are no pagebtn tags at all.)
        elif len(pagebtn_tags) == 1 and pagebtn_tags[0].get_text() == ">" :
                next_link = pagebtn_tags[0].get('href')

        # If neither of the above conditions are met, then we're on the last page and
        # we can set "next_link" to False, triggering the loop to stop scraping.
        else :
                next_link = False
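
The same decision can also be written as a function that returns its answer instead of setting a global, which makes it easy to test without a live page. This is only a sketch: `pick_next_link` is a hypothetical name and the example.com URLs are placeholders. It takes plain (text, href) pairs, which could be built from the pagebtn tags with `[(t.get_text(), t.get('href')) for t in pagebtn_tags]`.

```python
def pick_next_link(buttons):
    """Return the href of the '>' button, or None when there is no next page.

    buttons is a list of (text, href) pairs, one per pagebtn tag.
    """
    for text, href in buttons:
        if text.strip() == '>':
            return href
    return None

first_page = [('>', 'https://example.com/search?page=2')]
middle_page = [('<', 'https://example.com/search?page=1'),
               ('>', 'https://example.com/search?page=3')]
last_page = [('<', 'https://example.com/search?page=2')]

print(pick_next_link(first_page))
print(pick_next_link(middle_page))
print(pick_next_link(last_page))
```

Searching for the '>' text directly means the function does not need to know how many buttons a given page has.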

In [7]:
# Now that we have our functions, we'll iterate over them to scrape the data.

# Set the first url to be processed to the first page of search results.
next_link = url

# Create the list that will hold the dictionaries of game info.
games = []

# Now we decide how many results we want. 
# 
# The main constraint here is time - since
# we don't want to get IP banned, we'll have to set a delay between each get request.
# This isn't so important for this loop, since we can get 25 games in one get request.
# However, later we'll be going through the games' pages one-by-one, and in some cases
# we'll have to do 10 different get requests per game to scrape language-specific data.
# Therefore, adding 1 game adds at least 11 get requests & delays to our process.
# (I ended up scraping for over 10 hours.)
#
# The limit only works in intervals of 25 (as there are 25 results per page).
# If games_to_scrape is greater than the number of games in the search results, then
# the loop will automatically stop trying to scrape when it reaches the end of the
# final page of results, because get_next_page_url will set the next_link variable to False.
#
# I want to play with a set of about 3,000 games, but some will be unusable or duplicated,
# so let's overshoot and just play with what we get. 
games_to_scrape = 3200

# Now, loop. Keep scraping as long as our games list is shorter than the games_to_scrape var.
while len(games) < games_to_scrape :
    
    # Soup up the page in question.
    html = urlopen(next_link)
    current_page_soup = BeautifulSoup(html, 'lxml')
    
    # Scrape that page.
    scrape_current_page(current_page_soup)

    # Set "next_link" to the next URL we want to scrape.
    get_next_page_url(current_page_soup)

    # Include a random delay to prevent getting IP blocked.
    interval = 1.5 + random.random() * 0.5
    time.sleep(interval)

    if next_link == False :
        print('Fewer than '+str(games_to_scrape)+' games in the search results.')
        print(str(len(games))+' games scraped.')
        break

# Check our work!
print(len(games))

3200
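
A single failed request anywhere in this loop stops the whole run. One way to soften that, sketched here under assumptions, is to retry a failed request a few times with a growing, randomized wait. `fetch_with_retries` and its parameters are hypothetical names, and the demonstration uses a fake opener so that no network access is needed.

```python
import random
import time

def fetch_with_retries(url, opener, max_attempts=3, base_delay=1.5, sleep=time.sleep):
    """Call opener(url), retrying with a growing, jittered delay on failure."""
    for attempt in range(max_attempts):
        try:
            return opener(url)
        except OSError:
            # urllib's URLError and HTTPError are both subclasses of OSError.
            if attempt == max_attempts - 1:
                raise
            # Double the wait on each retry and add up to 0.5 s of jitter.
            sleep(base_delay * (2 ** attempt) + random.random() * 0.5)

# Demonstration with a fake opener that fails twice before succeeding.
calls = []
def flaky_opener(url):
    calls.append(url)
    if len(calls) < 3:
        raise OSError('temporary failure')
    return 'page html'

waits = []
result = fetch_with_retries('https://example.com', flaky_opener, sleep=waits.append)
print(result, len(calls), len(waits))
```

In the notebook this would be called as `fetch_with_retries(next_link, urlopen)` in place of the bare `urlopen(next_link)`.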


In [8]:
# Frame it and check.
scraped_search_results_df = pd.DataFrame(games)

# This results in some duplicates - sometimes different versions of the game have the same app id.
# Because we're interested in the relative ratio of comment frequencies, not in the total number
# of games or total number of comments, we can safely drop duplicates even if they have different
# comments.
scraped_search_results_df = scraped_search_results_df.drop_duplicates(subset='app_id', keep='first')
scraped_search_results_df = scraped_search_results_df.reset_index(drop=True)

# We'll save this as a csv for convenience and because I don't trust %store yet.
scraped_search_results_df.to_csv('../data/raw/Scraped Search Results.csv')

# Peek peek.
print(scraped_search_results_df.info())
print(scraped_search_results_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3171 entries, 0 to 3170
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   app_id                   3171 non-null   object
 1   title                    3171 non-null   object
 2   release_date             3171 non-null   object
 3   positive_review_percent  3145 non-null   object
 4   number_of_reviews        3145 non-null   object
 5   price                    2905 non-null   object
 6   game_page_link           3171 non-null   object
 7   tags                     3171 non-null   object
dtypes: object(8)
memory usage: 198.3+ KB
None
    app_id                               title  release_date  \
0  1086940                     Baldur's Gate 3   Aug 3, 2023   
1      730    Counter-Strike: Global Offensive  Aug 21, 2012   
2  1888160  ARMORED CORE™ VI FIRES OF RUBICON™  Aug 24, 2023   
3  1085660                           Destiny 2   Oct 1, 2019 
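
Every column in this frame is still dtype object. Below is a minimal sketch of the type cleanup deferred earlier, written as plain functions so each can be applied to a column with `.map()`. The helper names are hypothetical, and the price format (US-dollar strings such as '$59.99', or text starting with 'Free') is an assumption, since no price values appear in the output above.

```python
import json
from datetime import datetime

def parse_price(price_text):
    """Turn '$59.99' into 59.99 and 'Free...' into 0.0; return None otherwise."""
    if not isinstance(price_text, str):
        return None
    text = price_text.strip()
    if text.lower().startswith('free'):
        return 0.0
    try:
        return float(text.lstrip('$').replace(',', ''))
    except ValueError:
        return None

def parse_release_date(date_text):
    """Parse dates shaped like 'Aug 3, 2023'; return None for anything else."""
    try:
        return datetime.strptime(date_text.strip(), '%b %d, %Y').date()
    except (AttributeError, ValueError):
        return None

def parse_tag_ids(tag_text):
    """Turn the string '[122,6426]' into the list [122, 6426]."""
    try:
        return json.loads(tag_text)
    except (TypeError, ValueError):
        return []

print(parse_price('$59.99'), parse_release_date('Aug 3, 2023'), parse_tag_ids('[122,6426]'))
```

Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale like the one the dates were scraped in.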

Step 3: Scrape additional data for each game from its individual game page
---

In [9]:
# Now we're ready to use the URLs we just scraped to go through the pages
# one-by-one and scrape more data.

# We'll put all this data in a completely different df, then join them
# on app_id when we're done.
def scrape_game_page_data(current_page_soup) :

    """
    This function scrapes info from a single game's store page (passed in
    as soup). We put the info in a dict "game", then append it to
    "games_extend_list".
    
    Later, we will turn that list into another df and merge it with
    scraped_search_results_df on app_id.

    Scraped information is:

    app_id
    developer
    publisher
    description
    interface_languages
    full_audio_languages
    subtitles_languages
    english     <-- the number of user comments in English
    """
    # For bugfixing
    global touched_ids
    
    # Create/clear out the dictionary.
    game = {}


    # Weirdly, the best place to find the app_id is in the page's canonical URL.
    # If we split the url by slashes, the app id is third from the end.
    try:
        url_string = current_page_soup.find('link', rel='canonical').get('href')
        url_string = re.split('/', url_string)
        game["app_id"] = url_string[-3]
        touched_ids.append(game['app_id'])
    except:
        game["app_id"] = "Failed"

    # We can get the developer and publisher from the same code block.
    try :
        code_block = current_page_soup.find('div', attrs={'id':'appHeaderGridContainer'})
    except :
        pass

    # The developer name is at a fixed location.
    try:
        game['developer'] = code_block.find('div', class_='grid_content').get_text()
        # The text comes in with a newline at the beginning of the string and a
        # space at the end, so let's strip the surrounding whitespace.
        game['developer'] = game['developer'].strip()
    except :
        game['developer'] = 'Failed'

    # The publisher name is also at a fixed location. Not every game has a publisher, though.
    try :
        game['publisher'] = code_block.find('div', class_='grid_label', string='Publisher').find_next('a').get_text()
    except :
        game['publisher'] = 'None'

    # Descriptions are at a fixed location.
    try:
        game['description'] = current_page_soup.find('meta', attrs={'name':'Description'}).get('content')
    except :
        game['description'] = 'Failed'

    # The languages are listed as rows of a table.
    # There are three different ways languages can be implemented in the game.
    # As we look through the table, we'll store the languages in separate lists.
    interface_languages = []
    full_audio_languages = []
    subtitles_languages = []
    language_types = [interface_languages, full_audio_languages, subtitles_languages]

    # The source code is complex, so let's isolate the relevant block for safety.
    # Note that .find() returns None rather than raising when nothing matches,
    # so a missing table gets caught by the try/except around the loop below.
    languages_code_block = current_page_soup.find('table', class_='game_language_options')

    # Each "row" of the table is separated by a tr tag. However, there's an extra
    # tr tag at the beginning of languages_code_block that I couldn't find a better
    # way to work around - since it has no text, it'll throw an error on .get_text,
    # so we can just try/except our way out of it.
    try :
        for row in languages_code_block.find_all('tr', class_='') :
            try :
                current_language = row.find('td', class_='ellipsis').get_text()
                # The text has a lot of formatting in it. No more!
                current_language = re.sub('\t|\n|\r', '', current_language)

                # The code block represents each cell of the row with a td class='checkcol'
                # tag. In order, the three cells of each row are interface, full audio,
                # and subtitles. If the language of that row does not have one of those
                # services, then there will be no more code inside the tags. If it does,
                # then there will be a "span" tag in there along with a checkmark.

                # Since the three types of language services are always in order,
                # we can basically use 'counter' to iterate through the list of lists
                # of language service types and only append the name of the language
                # if that section of code has the "span" tag that indicates a checkmark.
                counter = 0
                for column in row.find_all('td', class_='checkcol') :
                    if column.find('span') :
                        language_types[counter].append(current_language)
                    counter += 1
            except :
                pass
    # For bugfixing.
    except :
        language_types[0] = 'Found code block, failed to parse within code block'


    # Now we add the lists to our dictionary. We can access the lists via
    # the index of the language_types list of lists.
    game['interface_languages'] = language_types[0]
    game['full_audio_languages'] = language_types[1]
    game['subtitles_languages'] = language_types[2]

    # I would love to have rating data available for the games, but Steam does not
    # present it systematically (probably because so many games are not rated,
    # and because there are different rating systems.)
    # Maybe someday.
    # game['rating'] = PG, Mature Audiences, etc...

    # Now we get the number of reviews that are in English.
    # To get the numbers for other languages, we'll have to modify the URL parameters
    # and get the page again, so that'll be a big ol'loop that we'll do later.
    try:
        game['english'] = current_page_soup.find('label', attrs={'for':'review_language_mine'}) \
                                                            .find_next('span', class_='user_reviews_count').get_text()
    except:
        game['english'] = 0

    # Rinse and repeat.
    games_extend_list.append(game)
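
The counter-based bookkeeping for the language table can also be expressed with `zip`, which pairs each column's flag with the list it belongs to. The sketch below works on plain (language, flags) pairs so it can run without a page. `assign_languages` and `SERVICE_TYPES` are hypothetical names, and the sample rows are made up for illustration.

```python
SERVICE_TYPES = ('interface_languages', 'full_audio_languages', 'subtitles_languages')

def assign_languages(rows):
    """Group language names by the services each one supports.

    rows is a list of (language, flags) pairs, where flags holds three
    booleans in the table's column order: interface, full audio, subtitles.
    """
    result = {service: [] for service in SERVICE_TYPES}
    for language, flags in rows:
        # zip pairs each column's flag with the list it belongs to.
        for service, supported in zip(SERVICE_TYPES, flags):
            if supported:
                result[service].append(language)
    return result

rows = [('English', (True, True, True)),
        ('French', (True, False, True)),
        ('Japanese', (True, False, False))]
print(assign_languages(rows))
```

With a real row from the soup, the flags could be built as `[bool(col.find('span')) for col in row.find_all('td', class_='checkcol')]`.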

In [10]:
# I'm declaring/cleaning out the list in a different cell because I hit a lot of 
# exceptions while testing this, and I didn't want to accidentally clean out all
# my previous hard work each time I made a fix and continued the process. 
games_extend_list = []

# Since running the following cell requires repeated get requests and sleep intervals,
# and since many failures tend to happen 20 minutes or more into the process,
# we can build in a ticker that keeps track of how far we got LAST time.
# Then, after we debug, we can start right over from where we left off. 
ticker = 0

In [12]:
# For bugfixing.
touched_ids = []

# Now we loop over all app_ids in the df we created earlier.
for index, row in scraped_search_results_df.iterrows() :
    
    # This is for bugfixing. If the loop throws an exception, I can use the ticker
    # variable to quickly pick up where we left off.
    if index == ticker :
        # Soup up the page.
        url = row['game_page_link']
        html = urlopen(url)
        current_page_soup = BeautifulSoup(html, 'lxml')

        # Scrape the page.
        scrape_game_page_data(current_page_soup)

        # Include a random delay to prevent getting IP blocked.
        interval = 1.5 + random.random() * 0.5
        time.sleep(interval)
        
        # If the loop throws an exception on a game, 'ticker' will thus be equal
        # to that game's index in the df, and I can go see what the problem was.
        ticker = index + 1
        

# Turn the new list of dicts into a new df.
scraped_game_pages_df = pd.DataFrame(games_extend_list)
scraped_game_pages_df.to_csv('../data/raw/Scraped Game Pages.csv')
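
The ticker only survives as long as the kernel does. A minimal sketch of a sturdier variant, assuming the scraped rows are JSON-serializable dicts, writes the position and the rows to disk after each game. `load_checkpoint`, `save_checkpoint`, and the file name are hypothetical.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Return (next_index, rows) from a checkpoint file, or (0, []) if none exists."""
    if not os.path.exists(path):
        return 0, []
    with open(path, encoding='utf-8') as f:
        state = json.load(f)
    return state['next_index'], state['rows']

def save_checkpoint(path, next_index, rows):
    """Write progress to a temporary file, then swap it into place."""
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w', encoding='utf-8') as f:
        json.dump({'next_index': next_index, 'rows': rows}, f)
    # Replacing in one step means a crash mid-write leaves the old checkpoint intact.
    os.replace(tmp_path, path)

# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    checkpoint_path = os.path.join(folder, 'checkpoint.json')
    fresh = load_checkpoint(checkpoint_path)
    save_checkpoint(checkpoint_path, 5, [{'app_id': '730'}])
    resumed = load_checkpoint(checkpoint_path)
print(fresh, resumed)
```

In the loop above, `ticker, games_extend_list = load_checkpoint(...)` would replace the two manual resets, with `save_checkpoint` called right after `ticker = index + 1`.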

In [13]:
# Now we join our dataframes to create our core dataset.
# I say "core," even though our all-important label has yet to be scraped.
# Bear with me. I'm new at this.
joined_games_df = pd.merge(scraped_search_results_df, scraped_game_pages_df, on="app_id", how='inner')
joined_games_df.to_csv('../data/imterim/Joined Games DF.csv')
print(joined_games_df.info())
print(joined_games_df.head())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3154 entries, 0 to 3153
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   app_id                   3154 non-null   object
 1   title                    3154 non-null   object
 2   release_date             3154 non-null   object
 3   positive_review_percent  3128 non-null   object
 4   number_of_reviews        3128 non-null   object
 5   price                    2889 non-null   object
 6   game_page_link           3154 non-null   object
 7   tags                     3154 non-null   object
 8   developer                3154 non-null   object
 9   publisher                3154 non-null   object
 10  description              3154 non-null   object
 11  interface_languages      3154 non-null   object
 12  full_audio_languages     3154 non-null   object
 13  subtitles_languages      3154 non-null   object
 14  english                  3154 non-null  
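
The inner merge keeps 3154 of the 3171 search-result rows, so a few app_ids have no match on the game-page side. One plausible cause (an assumption, not something the output confirms) is pages whose app_id was recorded as "Failed". The sketch below uses a hypothetical helper and made-up ids to show how to list the ids that appear on only one side.

```python
def unmatched_ids(left_ids, right_ids):
    """Return the ids found on only one side, as two sorted lists."""
    left, right = set(left_ids), set(right_ids)
    return sorted(left - right), sorted(right - left)

# Made-up ids for illustration, including a 'Failed' placeholder row.
search_ids = ['730', '1085660', '1086940']
page_ids = ['730', '1086940', 'Failed']

only_in_search, only_in_pages = unmatched_ids(search_ids, page_ids)
print(only_in_search, only_in_pages)
```

Here it would be called with `scraped_search_results_df['app_id']` and `scraped_game_pages_df['app_id']`. Within pandas, `pd.merge(..., how='outer', indicator=True)` gives similar information through the `_merge` column.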

Step 4: Scrape the number of comments in each language from the games' pages
---

In [14]:
# Now we begin the task of getting all the comment counts for each different language.
# Since this process requires a huge amount of get requests/time, we'll limit our exploration
# to the 10 most common languages for game localization (assuming the source text is English).

# Here's a list of all the language codes on Steam, for good measure.
# Don't know if we'll use it, but here it is.
all_languages = ['schinese', 'tchinese', 'japanese', 'koreana', 'thai', 'bulgarian', 'czech', 'danish',
                 'german', 'english', 'spanish', 'latam', 'greek', 'french', 'italian', 'indonesian',
                 'hungarian', 'dutch', 'norwegian', 'polish', 'portuguese', 'brazilian', 'romanian',
                 'russian', 'finnish', 'swedish', 'turkish', 'vietnamese', 'ukrainian']

# These are the generally-accepted top 10 languages to localize into from EN.
# The count of EN comments is important for our analysis, but it's already in the df.
# No idea why they put an a on the end of Korean.
top_10_languages = ['german', 'french', 'spanish', 'brazilian', 'russian', 'italian', 'schinese', \
                    'japanese', 'koreana', 'polish']

%store top_10_languages

Stored 'top_10_languages' (list)


In [15]:
# Now we build a function that will find the number of reviews in each target language for a given game.
# It collects the counts in a dict (one key per language) and appends that dict to a list, which we'll
# turn into a df after looping over every app_id in our first df.
app_comment_languages = []
single_app_comment_languages = {}

def comments_in_all_languages(app_id, languages) :
    """
    Takes a Steam app id and a list of languages (as spelled in Steam's html)
    and creates a dictionary, then appends that dictionary to a list.

    Intended to be iterated over.

    The first key in the dictionary is "app_id", and the value is the app id.

    The rest of the keys are the names of the languages, and the values are
    the number of comments on that game/app's page that are in that language.
    """
    
    # Make sure the dict is empty at the beginning of each loop.
    single_app_comment_languages = {}
    
    # Store the app_id in the dict.
    single_app_comment_languages['app_id'] = app_id

    # Soup up the game's page in the current language.
    for language in languages :
        url = 'https://store.steampowered.com/app/'+app_id+'/?l='+language
        html = urlopen(url)
        current_page_soup = BeautifulSoup(html, 'lxml')

        # There are 2 types of game page source code, used on games with different language settings.
        # We'll try the most common one first, then try to execute the other type if this throws an exception.
        try :
            single_app_comment_languages[language] = current_page_soup.find('label', attrs={'for':'review_language_mine'}) \
                                                                        .find_next('span').get_text()
        
        # If that's no good, we try scraping the other way.
        # The 'other way' can't be scraped effectively by urlopen(), so we'll use requests.get() instead.
        except :
            try :
                url = 'https://store.steampowered.com/app/'+app_id+'/?l='+language
                html = requests.get(url)
                html_string = str(html.content)

                single_app_comment_languages[language] = re.split('<span class="user_reviews_count">|</span> <a class="tooltip" data-tooltip-html=', html_string)[-2]

            # If both fail, then it's a loss.
            except:
                single_app_comment_languages[language] = 'Failed'
                
        interval = 1.5 + random.random() * 0.5
        time.sleep(interval)

    # Rinse and repeat.
    app_comment_languages.append(single_app_comment_languages)
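
The URL is built twice above with string concatenation, which raises a TypeError if app_id ever arrives as an int (for example, after reloading the CSV). A small sketch of a single builder using the standard library's `urlencode` follows. `build_language_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def build_language_url(app_id, language):
    """Build a store-page URL for one app with the l= language parameter set."""
    query = urlencode({'l': language})
    # str() lets the function accept app ids stored as ints as well as strings.
    return 'https://store.steampowered.com/app/' + str(app_id) + '/?' + query

print(build_language_url(730, 'koreana'))
```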
        

In [26]:
# Now we iterate over that function for all app ids.
# I'm also resetting the dict/list variables here since I ran these cells out of order a lot
# during bugfixing.
app_comment_languages = []
single_app_comment_languages = {}

# Pass each app_id into the function along with our list of target languages.
for index, row in scraped_search_results_df.iterrows() :
    comments_in_all_languages(row['app_id'], top_10_languages)

# Save as a .csv because I'm risk-averse.
comment_languages_df = pd.DataFrame(app_comment_languages)
comment_languages_df.to_csv('../data/raw/Comment Languages DF.csv')

# Peek peek
print(comment_languages_df.info())
print(comment_languages_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3171 entries, 0 to 3170
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   app_id     3171 non-null   object
 1   german     3171 non-null   object
 2   french     3171 non-null   object
 3   spanish    3171 non-null   object
 4   brazilian  3171 non-null   object
 5   russian    3171 non-null   object
 6   italian    3171 non-null   object
 7   schinese   3171 non-null   object
 8   japanese   3171 non-null   object
 9   koreana    3171 non-null   object
 10  polish     3171 non-null   object
dtypes: object(11)
memory usage: 272.6+ KB
None
    app_id       german     french    spanish  brazilian      russian  \
0  1086940    (196,574)    (7,174)    (5,780)    (7,374)     (17,432)   
1      730  (2,267,349)  (124,244)  (277,384)  (433,297)  (1,976,542)   
2  1888160     (22,412)      (378)      (391)      (228)        (609)   
3  1085660    (341,920)   (10,917)   (17,154) 
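
The counts come back as strings such as '(196,574)', so they need converting before any arithmetic. Here is a minimal sketch with a hypothetical helper that keeps only the digits and returns None for values like 'Failed'.

```python
import re

def parse_review_count(raw):
    """Turn a string like '(196,574)' into 196574; return None if no digits."""
    digits = re.sub(r'[^\d]', '', str(raw))
    return int(digits) if digits else None

print(parse_review_count('(196,574)'))
print(parse_review_count('Failed'))
```

It could be applied per language column, for example `comment_languages_df['german'].map(parse_review_count)`.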

In [28]:
# Now we merge our dfs into our big main one.
games_df = pd.merge(joined_games_df, comment_languages_df, on="app_id", how='inner')
games_df.to_csv('0 - Raw Scraped Games DF.csv')
print(games_df.info())
print(games_df.head())

# Data scraped!

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3154 entries, 0 to 3153
Data columns (total 25 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   app_id                   3154 non-null   object
 1   title                    3154 non-null   object
 2   release_date             3154 non-null   object
 3   positive_review_percent  3128 non-null   object
 4   number_of_reviews        3128 non-null   object
 5   price                    2889 non-null   object
 6   game_page_link           3154 non-null   object
 7   tags                     3154 non-null   object
 8   developer                3154 non-null   object
 9   publisher                3154 non-null   object
 10  description              3154 non-null   object
 11  interface_languages      3154 non-null   object
 12  full_audio_languages     3154 non-null   object
 13  subtitles_languages      3154 non-null   object
 14  english                  3154 non-null  
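
Since the analysis cares about the relative ratio of comment frequencies rather than raw totals, a natural next step is to turn each game's per-language counts into shares. The sketch below assumes the counts have already been converted to integers. `language_shares` is a hypothetical helper and the numbers are made up for illustration.

```python
def language_shares(counts):
    """Return each language's fraction of the total review count."""
    usable = {lang: n for lang, n in counts.items() if isinstance(n, int)}
    total = sum(usable.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in usable.items()}

# Made-up counts for illustration; None marks a failed scrape.
counts = {'english': 600, 'german': 250, 'french': 150, 'polish': None}
print(language_shares(counts))
```

Languages with a failed scrape are left out of the total instead of being counted as zero, so one failure does not inflate the other shares' denominators with a false value.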