# Project 2 - Steam Sale!

## 1. Data Scraping and Data Munging

## Reflection

I began by studying the web page's structure closely to identify the relevant elements, then systematically extracted them and 
organized them into separate lists. The result is a well-structured dataset of game information.

An important part of the code was handling different conditions and exceptions to make sure the extracted information was correct. 
I ran into challenges with varying HTML structures, particularly when extracting review ratings and percentages correctly.
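
To illustrate the kind of defensive parsing this refers to, here is a minimal sketch that pulls the review summary, percentage, and review count out of a tooltip string with a regular expression. The tooltip format, the helper name `parse_review_tooltip`, and the example string (including its 86% figure) are assumptions made for illustration; the notebook itself splits the string on `<br>` and on spaces instead.

```python
import re

# Assumed tooltip format: "<summary><br><percent>% of the <count> user reviews ..."
TOOLTIP_RE = re.compile(
    r"^(?P<summary>[^<]+)<br>(?P<percent>\d+)% of the (?P<count>[\d,]+) user reviews"
)


def parse_review_tooltip(tooltip):
    """Return (summary, percent, count), or ("", None, None) if the text does not match."""
    match = TOOLTIP_RE.match(tooltip or "")
    if match is None:
        return "", None, None
    # Remove thousands separators before converting the count to an integer
    count = int(match.group("count").replace(",", ""))
    return match.group("summary"), int(match.group("percent")), count


# Hypothetical tooltip text, used only to exercise the helper
example = "Very Positive<br>86% of the 605,869 user reviews for this game are positive."
print(parse_review_tooltip(example))
```

A regular expression fails as a whole when the layout differs, so an entry without reviews yields empty values rather than an `IndexError` from a positional split.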

For the variable names I chose to follow the snake_case convention, which makes the code easier to 
read and follows good practice in Python programming.

## Imports

In [1]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import os.path


### The task

In [33]:
title_list = []
rating_word_list = []
rating_percent_list = []
sale_list = []
price_list = []
price_original_list = []
year_list = []
system_win_list = []
system_lin_list = []
system_osx_list = []
timestamp_list = []

# Loop through 5 pages of content
for i in range(1, 6):
    res = requests.get("https://store.steampowered.com/search/?specials=1&page=" + str(i))
    soup = bs(res.text, "html.parser")

    # Each search result is a div with class "responsive_search_name_combined"
    for game in soup.find_all("div", class_="responsive_search_name_combined"):
        title_list.append(game.find("span", class_="title").text)
        timestamp_list.append(pd.to_datetime("today").strftime("%d/%m/%Y"))

        # Review tooltip: the text before "<br>" is the summary (e.g. "Very Positive");
        # token 3 of the text after it is the value stored in the "Reviews" column
        review_col = game.find(class_="col search_reviewscore responsive_secondrow")
        review_span = review_col.span if review_col is not None else None
        if review_span is not None:
            tooltip_parts = review_span["data-tooltip-html"].split("<br>")
            rating_word_list.append(tooltip_parts[0])
            rating_percent_list.append(tooltip_parts[1].split(" ")[3])
        else:
            rating_word_list.append("")
            rating_percent_list.append("")

        # Discount and prices are stored as the matched elements (None when a price is missing)
        discount = game.find(class_="discount_pct")
        sale_list.append("" if discount is None else discount)
        price_list.append(game.find(class_="discount_final_price"))
        price_original_list.append(game.find(class_="discount_original_price"))
        year_list.append(game.find(class_="col search_released responsive_secondrow").text[2:])

        # Add a "1" for every supported operating system and a "0" for unsupported ones
        system_win_list.append("1" if game.find("span", class_="platform_img win") else "0")
        system_osx_list.append("1" if game.find("span", class_="platform_img mac") else "0")
        system_lin_list.append("1" if game.find("span", class_="platform_img linux") else "0")
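
The sale and price columns hold the scraped elements as they are, which is why values such as `[-50%]` and `[29,99€]` show up in the output further down. As a possible follow-up, the sketch below converts the element text into numbers. The helper names `parse_price` and `parse_discount` are hypothetical and not part of the notebook, and the price format (decimal comma, trailing euro sign) is assumed from the values shown in the output.

```python
def parse_price(text):
    """Convert a price string such as "29,99€" to a float; None if empty or unparseable."""
    if not text:
        return None
    # Assumes a decimal comma and no thousands separator
    cleaned = text.strip().replace("€", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_discount(text):
    """Convert a discount string such as "-50%" to the integer 50; None if missing."""
    if not text:
        return None
    cleaned = text.strip().lstrip("-").rstrip("%")
    return int(cleaned) if cleaned.isdigit() else None


print(parse_price("29,99€"), parse_discount("-50%"))
```

In the scraping loop these helpers would receive the `.text` of an element only when the element is not `None`. Prices with a thousands separator are not handled by this sketch and come back as `None`.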



### The final table, using .replace() to get a sensible rating

In [34]:
percentile_list = pd.DataFrame(
    {'Title': title_list,
     'Rating 1-8': rating_word_list,
     'Reviews': rating_percent_list,
     'Sale' : sale_list,
     'Sale Price' : price_list,
     'Original Price' : price_original_list,
     'Year of release' : year_list,
     'Win' : system_win_list,
     'Lin' : system_lin_list,
     'Osx' : system_osx_list,
     'Info retreived': timestamp_list
    })
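# Map the review summaries to numeric scores. Note that the mapping runs from 1 to 9,
# although the column is named "Rating 1-8". replace() returns a new DataFrame, so the
# result is only displayed here and percentile_list keeps the original text labels.
percentile_list.replace(
    ["Very Positive", "Overwhelmingly Positive", "Positive", "Mostly Positive",
     "Mixed", "Mostly Negative", "Negative", "Very Negative", "Overwhelmingly Negative"],
    ["8", "9", "7", "6", "5", "4", "3", "2", "1"],
)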

| | Title | Rating 1-8 | Reviews | Sale | Sale Price | Original Price | Year of release | Win | Lin | Osx | Info retreived |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cyberpunk 2077 | 8 | 605869 | [-50%] | [29,99€] | [59,99€] | 9 Dec, 2020 | 1 | 0 | 0 | 08/12/2023 |
| 1 | Red Dead Redemption 2 | 8 | 455206 | [-67%] | [19,79€] | [59,99€] | 5 Dec, 2019 | 1 | 0 | 0 | 08/12/2023 |
| 2 | Grand Theft Auto V | 8 | 1531011 | [-77%] | [9,24€] | [39,98€] | 13 Apr, 2015 | 1 | 0 | 0 | 08/12/2023 |
| 3 | Cyberpunk 2077: Ultimate Edition | 8 | 617237 | [-43%] | [51,04€] | [89,98€] | | 1 | 0 | 0 | 08/12/2023 |
| 4 | Cyberpunk 2077: Phantom Liberty | 8 | 11368 | [-15%] | [25,49€] | [29,99€] | 25 Sep, 2023 | 1 | 0 | 0 | 08/12/2023 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 120 | ATOM RPG 2-in-1 Edition | 8 | 11516 | [-33%] | [17,44€] | [25,98€] | | 1 | 1 | 1 | 08/12/2023 |
| 121 | DEATH STRANDING DIRECTOR'S CUT | 8 | 16379 | | | | 30 Mar, 2022 | 1 | 0 | 0 | 08/12/2023 |
| 122 | METAL GEAR SOLID V: THE PHANTOM PAIN | 8 | 58295 | [-40%] | [17,99€] | [29,99€] | 1 Sep, 2015 | 1 | 0 | 0 | 08/12/2023 |
| 123 | Sifu | 8 | 6421 | [-50%] | [19,99€] | [39,99€] | 28 Mar, 2023 | 1 | 0 | 0 | 08/12/2023 |
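
Because `.replace()` returns a new DataFrame, the numeric ratings above exist only in the displayed result. One way to keep them is to map the labels into a column of their own, as in the sketch below. The scale is the same 1 to 9 mapping used in the cell; the dictionary name `RATING_SCORES`, the integer scores (the cell uses strings), and the small demo series are assumptions for illustration.

```python
import pandas as pd

# Same label-to-score mapping as in the cell above, with integer scores
RATING_SCORES = {
    "Overwhelmingly Positive": 9,
    "Very Positive": 8,
    "Positive": 7,
    "Mostly Positive": 6,
    "Mixed": 5,
    "Mostly Negative": 4,
    "Negative": 3,
    "Very Negative": 2,
    "Overwhelmingly Negative": 1,
}

# Hypothetical sample of review labels; "" stands for a game without reviews
ratings = pd.Series(["Very Positive", "Overwhelmingly Positive", "Mixed", ""])

# Labels missing from the mapping (such as "") become NaN instead of staying as text
scores = ratings.map(RATING_SCORES)
print(scores)
```

Assigning the result to a new column, for example a hypothetical `percentile_list["Rating score"]`, would leave the original text labels intact alongside the numbers.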


### Creating and updating the CSV file


In [35]:
path = './SteamSales.csv'

# Append to the CSV file; write the header row only if the file does not exist yet
percentile_list.to_csv(path, mode="a", index=False, header=not os.path.exists(path))
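
The append-or-create logic can also be wrapped in a small helper so it is easy to reuse and check. The sketch below is one way to do that; the function name `append_to_csv`, the demo DataFrame, and the temporary file are assumptions for illustration and do not touch `SteamSales.csv`.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to the CSV at path, writing the header only when the file is created."""
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", index=False, header=write_header)


# Demonstration against a temporary file: two appends should give one header and two rows
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    demo = pd.DataFrame({"Title": ["Sifu"], "Win": ["1"]})
    append_to_csv(demo, demo_path)
    append_to_csv(demo, demo_path)
    round_trip = pd.read_csv(demo_path)

print(round_trip)
```

Each run of the notebook appends a full snapshot, so the timestamp column is what distinguishes rows for the same game scraped on different days.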

### Using head() to get a better look at the data

In [36]:
percentile_list.head(10)

| | Title | Rating 1-8 | Reviews | Sale | Sale Price | Original Price | Year of release | Win | Lin | Osx | Info retreived |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cyberpunk 2077 | Very Positive | 605869 | [-50%] | [29,99€] | [59,99€] | 9 Dec, 2020 | 1 | 0 | 0 | 08/12/2023 |
| 1 | Red Dead Redemption 2 | Very Positive | 455206 | [-67%] | [19,79€] | [59,99€] | 5 Dec, 2019 | 1 | 0 | 0 | 08/12/2023 |
| 2 | Grand Theft Auto V | Very Positive | 1531011 | [-77%] | [9,24€] | [39,98€] | 13 Apr, 2015 | 1 | 0 | 0 | 08/12/2023 |
| 3 | Cyberpunk 2077: Ultimate Edition | Very Positive | 617237 | [-43%] | [51,04€] | [89,98€] | | 1 | 0 | 0 | 08/12/2023 |
| 4 | Cyberpunk 2077: Phantom Liberty | Very Positive | 11368 | [-15%] | [25,49€] | [29,99€] | 25 Sep, 2023 | 1 | 0 | 0 | 08/12/2023 |
| 5 | Resident Evil 4 | Overwhelmingly Positive | 71893 | [-50%] | [29,99€] | [59,99€] | 23 Mar, 2023 | 1 | 0 | 0 | 08/12/2023 |
| 6 | Grand Theft Auto V: Premium Edition | Very Positive | 1541254 | [-77%] | [9,24€] | [39,98€] | | 1 | 0 | 0 | 08/12/2023 |
| 7 | GUILTY GEAR -STRIVE- | Very Positive | 30440 | [-50%] | [19,99€] | [39,99€] | 11 Jun, 2021 | 1 | 0 | 0 | 08/12/2023 |
| 8 | Hogwarts Legacy | Very Positive | 173654 | [-40%] | [35,99€] | [59,99€] | 10 Feb, 2023 | 1 | 0 | 0 | 08/12/2023 |
| 9 | Medieval Dynasty | Very Positive | 27798 | [-25%] | [22,49€] | [29,99€] | 23 Sep, 2021 | 1 | 0 | 0 | 08/12/2023 |


### Using describe() to get summary statistics, e.g. the number of values (count) and the most frequent game title (top)

In [40]:
percentile_list.describe()

| | Title | Rating 1-8 | Reviews | Sale | Sale Price | Original Price | Year of release | Win | Lin | Osx | Info retreived |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 125 | 125 | 125 | 125 | 124 | 124 | 125.0 | 125 | 125 | 125 | 125 |
| unique | 124 | 4 | 124 | 37 | 82 | 49 | 100.0 | 2 | 2 | 2 | 1 |
| top | Shadow of the Tomb Raider: Definitive Edition | Very Positive | 1541254 | [-50%] | [19,99€] | [59,99€] | | 1 | 0 | 0 | 08/12/2023 |
| freq | 2 | 82 | 2 | 16 | 6 | 19 | 24.0 | 124 | 107 | 99 | 125 |
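
The summary above lists `count`, `unique`, `top`, and `freq` because every column holds text or objects. To get numeric statistics such as mean and max, a column has to be converted first. The sketch below shows this for the review counts on a small demo DataFrame; the demo frame is an assumption for illustration, with values taken from the first rows of the output above.

```python
import pandas as pd

# Small stand-in for percentile_list; "" represents a game without review data
demo = pd.DataFrame(
    {
        "Title": ["Cyberpunk 2077", "Red Dead Redemption 2", "Unrated game"],
        "Reviews": ["605869", "455206", ""],
    }
)

# errors="coerce" turns values that cannot be parsed (such as "") into NaN
demo["Reviews"] = pd.to_numeric(demo["Reviews"], errors="coerce")

# describe() on a numeric column reports count, mean, std, min, quartiles and max
summary = demo["Reviews"].describe()
print(summary)
```

Missing values are excluded from the count, so the number of games without review data can be read off as the difference between the row count and `count`.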
