##DATA102 HW1
##Webscraping Steam Top 100 Charts
###**by Alfonso Rafael L. Reyes**

<br>

###**This webscraping project scrapes the Top 100 Most Played Games on Steam and the Top 100 Best Selling Games on Steam at the time of execution**
<br>
---


###Top 100 Most Played Games on Steam

###Features: Rank, Name, Price, Current Players, Peak Today, URL



<br>

###Top 100 Best Selling Games on Steam

###Features: Rank, Name, Price, Change, Weeks, URL


###So let's first import the necessary libraries

In [None]:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


##Let's start with the Top 100 Most Played Games on Steam

##Now, let's first set the appropriate URL

In [None]:
mostplayed_url = 'https://store.steampowered.com/charts/mostplayed'

###The following block of code does the following:

###1. Sets up the Edge driver that Selenium will be using
###2. Tells the web driver to wait (up to 10s) until the first table row element has loaded
###3. Creates an empty list for each of the features we will be scraping
###4. Finds each feature inside every row (game_items) using its class name
###5. Handles the exception raised when an element is not found by storing a placeholder string
###6. Appends the scraped data to the corresponding lists
###7. Saves and organizes the data in a dataframe
###8. Closes the Edge driver
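
Steps 4 to 6 repeat the same try/except pattern once per feature. A minimal sketch of how that pattern could be factored into a single helper is shown here. The names `safe_text`, `StubRow`, and `StubElement` are hypothetical and are not part of the notebook or of Selenium; the stubs exist only so the sketch runs without opening a browser.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so the sketch still runs where Selenium is not installed.
    class NoSuchElementException(Exception):
        pass

BY_CLASS_NAME = "class name"  # the string value behind Selenium's By.CLASS_NAME


def safe_text(row, class_name, default):
    """Return the stripped text of the first matching child, or `default`."""
    try:
        return row.find_element(BY_CLASS_NAME, class_name).text.strip()
    except NoSuchElementException:
        return default


class StubElement:
    """Hypothetical stand-in for a Selenium WebElement cell."""
    def __init__(self, text):
        self.text = text


class StubRow:
    """Hypothetical stand-in for a Selenium WebElement table row."""
    def __init__(self, cells):
        self._cells = cells

    def find_element(self, by, value):
        if value not in self._cells:
            raise NoSuchElementException(value)
        return StubElement(self._cells[value])


# A row that has a rank cell but no price cell.
row = StubRow({"weeklytopsellers_RankCell_34h48": " 1 "})
rank = safe_text(row, "weeklytopsellers_RankCell_34h48", "Rank not available")
price = safe_text(row, "salepreviewwidgets_StoreSalePriceBox_Wh0L8", "Price not available")
print(rank, "|", price)
```

With a real driver, the same helper would presumably be called on each element of `game_items`, since a Selenium row element exposes the same `find_element` method and `text` attribute that the stub imitates.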

<br>

###Features: Rank, Name, Price, Current Players, Peak Today, URL


In [None]:
driver = webdriver.Edge()
driver.get(mostplayed_url)
item_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "weeklytopsellers_TableRow_2-RN6"))
)

game_items = driver.find_elements(By.CLASS_NAME, "weeklytopsellers_TableRow_2-RN6")

game_rank = []
game_names = []
game_prices = []
game_currentplayers = []
game_peaktoday = []
game_storeurl = []

for item in game_items:
    # Each lookup falls back to a placeholder string when the cell is missing from the row.
    try:
        rank = item.find_element(By.CLASS_NAME, "weeklytopsellers_RankCell_34h48").text.strip()
    except NoSuchElementException:
        rank = "Rank not available"

    try:
        name = item.find_element(By.CLASS_NAME, "weeklytopsellers_GameName_1n_4-").text.strip()
    except NoSuchElementException:
        name = "Name not available"

    try:
        price = item.find_element(By.CLASS_NAME, "salepreviewwidgets_StoreSalePriceBox_Wh0L8").text.strip()
    except NoSuchElementException:
        price = "Price not available"

    try:
        currentplayers = item.find_element(By.CLASS_NAME, "weeklytopsellers_ConcurrentCell_3L0CD").text.strip()
    except NoSuchElementException:
        currentplayers = "Current Players not available"

    try:
        peaktoday = item.find_element(By.CLASS_NAME, "weeklytopsellers_PeakInGameCell_yJB7D").text.strip()
    except NoSuchElementException:
        peaktoday = "Peak Today not available"

    # The store link is read from the anchor's href attribute rather than its text.
    try:
        storeurl = item.find_element(By.CLASS_NAME, "weeklytopsellers_TopChartItem_2C5PJ").get_attribute('href')
    except NoSuchElementException:
        storeurl = "Store URL not available"


    game_rank.append(rank)
    game_names.append(name)
    game_prices.append(price)
    game_currentplayers.append(currentplayers)
    game_peaktoday.append(peaktoday)
    game_storeurl.append(storeurl)

mostplayed_df = pd.DataFrame({'Rank': game_rank,
                   'Name': game_names,
                   'Price': game_prices,
                   'Current Players': game_currentplayers,
                   'Peak Today': game_peaktoday,
                   'Store URL': game_storeurl})

print(mostplayed_df)

driver.quit()


   Rank                 Name         Price Current Players Peak Today  \
0     1     Counter-Strike 2  Free To Play         502,404  1,018,492   
1     2             Palworld       ₱910.00         488,688    784,385   
2     3               Dota 2  Free To Play         288,784    621,987   
3     4  PUBG: BATTLEGROUNDS  Free To Play         121,856    448,159   
4     5      Baldur's Gate 3     ₱2,599.00         120,043    125,475   
..  ...                  ...           ...             ...        ...   
95   96   Deep Rock Galactic       ₱649.95           9,610     12,667   
96   97       VPet-Simulator  Free To Play           9,001     16,575   
97   98               Arma 3     ₱1,599.95           8,964     14,204   
98   99    Street Fighter™ 6     ₱2,979.00           8,707     20,452   
99  100       Risk of Rain 2       ₱274.97           8,674      8,674   

                                            Store URL  
0   https://store.steampowered.com/app/730/Counter...  
1   https:/

###Now, we can see the scraped data saved to our dataframe

In [None]:
print(mostplayed_df.to_string())

   Rank                                                      Name                Price Current Players Peak Today                                                                                                   Store URL
0     1                                          Counter-Strike 2         Free To Play         502,404  1,018,492                                https://store.steampowered.com/app/730/CounterStrike_2?snr=1_7001_7005__7003
1     2                                                  Palworld              ₱910.00         488,688    784,385                                   https://store.steampowered.com/app/1623730/Palworld?snr=1_7001_7005__7003
2     3                                                    Dota 2         Free To Play         288,784    621,987                                         https://store.steampowered.com/app/570/Dota_2?snr=1_7001_7005__7003
3     4                                       PUBG: BATTLEGROUNDS         Free To Play         121,856    448,15
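
Every column in the dataframe above holds text, including the player counts ("502,404") and prices ("₱910.00"). If numeric values are needed later, a conversion along these lines could be applied. This is only a sketch: `parse_count` and `parse_price` are hypothetical helpers that are not part of the notebook, and treating "Free To Play" as a price of 0.0 is an assumption.

```python
def parse_count(text):
    """Convert a player count such as '502,404' to an integer."""
    return int(text.strip().replace(",", ""))


def parse_price(text):
    """Convert a price such as '₱2,599.00' to a float.

    'Free To Play' is treated as 0.0 (an assumption). Anything else that is
    not a number, such as 'Prepurchase' or 'Price not available', gives None.
    """
    text = text.strip()
    if text == "Free To Play":
        return 0.0
    digits = text.lstrip("₱").replace(",", "")
    try:
        return float(digits)
    except ValueError:
        return None


players = parse_count("502,404")
paid = parse_price("₱2,599.00")
free = parse_price("Free To Play")
unknown = parse_price("Prepurchase")
print(players, paid, free, unknown)
```

The peso sign is stripped explicitly because the sample output shows prices in ₱; a chart loaded from another region would need its own currency handling.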

###Now, let's scrape the Top 100 Best Selling Games on Steam

###The code will be very similar to the previous code but with a little more work

###Let's first set the appropriate URL (We're using the global charts URL but any country charts URL should work)

In [None]:
topselling_url = 'https://store.steampowered.com/charts/topselling/global'

###The following block of code does the following:

###1. Sets up the Edge driver that Selenium will be using
###2. Tells the web driver to wait (up to 20s) until the first table row element has loaded (this page takes a bit longer to load than the previous one, so the timeout is increased)
###3. Creates an empty list for each of the features we will be scraping
###4. Stores the three class names Steam uses for the Change feature (rank increase, rank decrease, or new entrant) in change_class_names. If there was no change in ranking, the Change element is not present.
###5. Finds each feature inside every row using its class name
###6. Handles the exception raised when an element is not found. For the Change feature, we try each class name in change_class_names; if none is found, we fill the missing value with "No Change".
###7. Appends the scraped data to the corresponding lists
###8. Saves and organizes the data in a dataframe
###9. Closes the Edge driver
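
The Change lookup described in steps 4 and 6 can be sketched on its own as "return the first candidate with non-empty text, otherwise the default". In the sketch, `first_nonempty_text`, `StubRow`, and `StubElement` are hypothetical names, and the stubs only imitate a Selenium row so the example runs without a browser. The candidates are written as explicit CSS selectors (with a leading dot) as an assumption about how the multi-class names are meant to match.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so the sketch still runs where Selenium is not installed.
    class NoSuchElementException(Exception):
        pass

BY_CSS_SELECTOR = "css selector"  # the string value behind Selenium's By.CSS_SELECTOR

# Same three class names as in the notebook, written as CSS selectors (assumption).
CHANGE_SELECTORS = [
    ".weeklytopsellers_ChangeCell_1ZdIh.Up.Focusable",
    ".weeklytopsellers_ChangeCell_1ZdIh.Down",
    ".weeklytopsellers_ListWeeksDebut_3VOg1",
]


def first_nonempty_text(row, selectors, default="No Change"):
    """Return the text of the first selector that matches with visible text."""
    for selector in selectors:
        try:
            text = row.find_element(BY_CSS_SELECTOR, selector).text.strip()
        except NoSuchElementException:
            continue  # this candidate is absent, try the next one
        if text:
            return text
    return default


class StubElement:
    """Hypothetical stand-in for a Selenium WebElement cell."""
    def __init__(self, text):
        self.text = text


class StubRow:
    """Hypothetical stand-in for a Selenium WebElement table row."""
    def __init__(self, cells):
        self._cells = cells

    def find_element(self, by, value):
        if value not in self._cells:
            raise NoSuchElementException(value)
        return StubElement(self._cells[value])


dropped = first_nonempty_text(StubRow({CHANGE_SELECTORS[1]: "▼ 34"}), CHANGE_SELECTORS)
debut = first_nonempty_text(StubRow({CHANGE_SELECTORS[0]: "", CHANGE_SELECTORS[2]: "NEW"}), CHANGE_SELECTORS)
unchanged = first_nonempty_text(StubRow({}), CHANGE_SELECTORS)
print(dropped, debut, unchanged)
```

Returning the default whenever every match is empty keeps a blank string from ending up in the Change column.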

<br>


###Features: Rank, Name, Price, Change, Weeks, URL

<br>

###Change refers to the change in ranking from last week

###Weeks refers to how many weeks the game has been on the Top 100 chart
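
To make the Change values concrete, here is a small sketch that turns the scraped text into a signed number of places. `parse_change` is a hypothetical helper that is not part of the notebook. The sample output only shows "No Change", "NEW", and a down arrow ("▼ 34"), so the up-arrow branch assumes that rank increases are displayed as "▲ n".

```python
def parse_change(text):
    """Map a Change cell to a signed rank movement.

    Returns 0 for 'No Change', None for a new entrant, a negative number
    for a drop in rank, and a positive number for a rise in rank.
    """
    text = text.strip()
    if text == "No Change":
        return 0
    if text == "NEW":
        return None
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"unrecognized Change value: {text!r}")
    arrow, places = parts[0], int(parts[1])
    if arrow == "▼":
        return -places
    if arrow == "▲":  # assumption: up moves mirror the down-arrow format
        return places
    raise ValueError(f"unrecognized Change value: {text!r}")


fell = parse_change("▼ 34")
same = parse_change("No Change")
new_entry = parse_change("NEW")
print(fell, same, new_entry)
```

Keeping new entrants as `None` instead of 0 separates "was not on last week's chart" from "stayed in the same place".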


In [None]:
driver = webdriver.Edge()
driver.get(topselling_url)
item_element = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, "weeklytopsellers_TableRow_2-RN6"))
)

game_items = driver.find_elements(By.CLASS_NAME, "weeklytopsellers_TableRow_2-RN6")

game_rank = []
game_names = []
game_prices = []
game_rankchange = []
game_weeks = []
game_storeurl = []

change_class_names = ["weeklytopsellers_ChangeCell_1ZdIh.Up.Focusable", "weeklytopsellers_ChangeCell_1ZdIh.Down", "weeklytopsellers_ListWeeksDebut_3VOg1"]


for item in game_items:
    # Each lookup falls back to a placeholder string when the cell is missing from the row.
    try:
        rank = item.find_element(By.CLASS_NAME, "weeklytopsellers_RankCell_34h48").text.strip()
    except NoSuchElementException:
        rank = "Rank not available"

    try:
        name = item.find_element(By.CLASS_NAME, "weeklytopsellers_GameName_1n_4-").text.strip()
    except NoSuchElementException:
        name = "Name not available"

    try:
        price = item.find_element(By.CLASS_NAME, "salepreviewwidgets_StoreSalePriceBox_Wh0L8").text.strip()
    except NoSuchElementException:
        price = "Price not available"

    # Change: try each candidate class and keep the first one with visible text.
    # No matching element (or only empty text) means the rank did not change.
    rankchange = "No Change"
    for class_name in change_class_names:
        try:
            text = item.find_element(By.CLASS_NAME, class_name).text.strip()
        except NoSuchElementException:
            continue
        if text:
            rankchange = text
            break

    try:
        weeks = item.find_element(By.CLASS_NAME, "weeklytopsellers_WeeksCell_xm7Jp").text.strip()
    except NoSuchElementException:
        weeks = "Weeks not available"

    # The store link is read from the anchor's href attribute rather than its text.
    try:
        storeurl = item.find_element(By.CLASS_NAME, "weeklytopsellers_TopChartItem_2C5PJ").get_attribute('href')
    except NoSuchElementException:
        storeurl = "Store URL not available"


    game_rank.append(rank)
    game_names.append(name)
    game_prices.append(price)
    game_rankchange.append(rankchange)
    game_weeks.append(weeks)
    game_storeurl.append(storeurl)


topsellers_df = pd.DataFrame({'Rank': game_rank,
                   'Name': game_names,
                   'Price': game_prices,
                   'Change': game_rankchange,
                   'Weeks': game_weeks,
                   'Store URL': game_storeurl})

print(topsellers_df)

driver.quit()


   Rank                                 Name         Price     Change Weeks  \
0     1                        HELLDIVERS™ 2     ₱1,990.00  No Change     3   
1     2                  PUBG: BATTLEGROUNDS  Free To Play  No Change   359   
2     3  HELLDIVERS™ 2 Super Citizen Edition     ₱2,990.00        NEW     1   
3     4                     Counter-Strike 2  Free To Play  No Change   600   
4     5                               Dota 2  Free To Play  No Change   323   
..  ...                                  ...           ...        ...   ...   
95   96                       Limbus Company  Free To Play        NEW     1   
96   97                        Overcooked! 2       ₱198.75        NEW     1   
97   98                   Train Sim World® 4       ₱420.00        NEW     1   
98   99                     Dragon's Dogma 2   Prepurchase        NEW     1   
99  100                Football Manager 2024     ₱2,299.00       ▼ 34    21   

                                            Store U

###Now, we can see the scraped data saved to our dataframe

In [None]:
print(topsellers_df.to_string())

   Rank                                                           Name                Price     Change Weeks                                                                                                                    Store URL
0     1                                                  HELLDIVERS™ 2            ₱1,990.00  No Change     3                                           https://store.steampowered.com/app/553850/HELLDIVERS_2?snr=1_7001_topselling__7003
1     2                                            PUBG: BATTLEGROUNDS         Free To Play  No Change   359                                     https://store.steampowered.com/app/578080/PUBG_BATTLEGROUNDS?snr=1_7001_topselling__7003
2     3                            HELLDIVERS™ 2 Super Citizen Edition            ₱2,990.00        NEW     1                                                     https://store.steampowered.com/bundle/33608/?snr=1_7001_topselling__7003
3     4                                               Counter-St

  

---


  # -END OF CODE-





## Q&A

<br>

## Why did you choose to scrape this site?

###*I am interested in video games, data science, and Steam as a platform, so I chose to create a webscraping program to save and display Steam charts' real-time data*

<br>

##What were the challenges you encountered?

###I encountered some issues with the website not loading its elements properly, which resulted in no data being scraped and an empty dataframe being created. Furthermore, assigning the correct class names took some time to fix, especially for the Change feature, which has 3 possible class names or may not exist at all. While these individual issues were not overwhelmingly difficult, they each produced similar errors and outcomes, which made debugging very difficult, especially considering how slowly Selenium runs.
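
One way to make the empty-dataframe failure easier to spot is to poll until a minimum number of rows has loaded and to raise an error otherwise. The sketch here is plain Python; `wait_for_rows` and `fake_fetch` are hypothetical names that are not part of the notebook or of Selenium. With a real driver, `fetch_rows` could be something like `lambda: driver.find_elements(By.CLASS_NAME, "weeklytopsellers_TableRow_2-RN6")`.

```python
import time


def wait_for_rows(fetch_rows, minimum=1, timeout=20.0, interval=0.5):
    """Call fetch_rows() until it returns at least `minimum` rows.

    Raises TimeoutError if the rows have not appeared within `timeout`
    seconds, so a page that never loads fails loudly instead of producing
    an empty dataframe.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = fetch_rows()
        if len(rows) >= minimum:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {len(rows)} rows loaded after {timeout}s")
        time.sleep(interval)


# Stand-in for a page whose rows only appear on the third poll.
calls = {"n": 0}


def fake_fetch():
    calls["n"] += 1
    return ["row"] * 3 if calls["n"] >= 3 else []


rows = wait_for_rows(fake_fetch, minimum=3, timeout=5.0, interval=0.0)
print(len(rows), calls["n"])
```

Asking for all 100 rows (`minimum=100`) instead of just the first one would also catch a partially loaded chart.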

<br>

##Does the data you collected contain any personally identifiable information (PII)?

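###The collected data does not appear to contain any personally identifiable information. Every scraped feature (rank, name, price, player counts, rank change, weeks, and store URL) describes a game, and the player counts are aggregate totals, not data about individual users. This is to be expected given that the pages being scraped are public store charts.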

<br>

#-END OF NOTEBOOK-


---


