# Extracting data from Steam 

In this project, you'll explore scraping data from Steam. We'll get you started by importing the packages you'll need and connecting to a sample page on the site (https://store.steampowered.com/tags/en/Action/). Below are a few ideas for what to scrape, but feel free to roam around!

## Initial Setup

In [97]:
from bs4 import BeautifulSoup
import requests

## Connect to Steam webpage

In [2]:
response = requests.get("https://store.steampowered.com/tags/en/Action/")
response.status_code

200

In [3]:
html = response.content

In [4]:
soup = BeautifulSoup(html, "lxml")

## Instructions
`1` Try extracting the names of the top games from this page.<br>
`2` What tags contain the prices?  Can you extract the price information?<br>
`3` Can you get the text from each span tag with class equal to "top_tag"?<br>
`4` Under the "Narrow by Tag" section, there is a collection of tags (e.g. "Indie", "Adventure", etc.). <br>Write code to return these tags.<br>

## Code

**`1` Try extracting the names of the top games from this page.<br>
`2` What tags contain the prices?  Can you extract the price information?**

In [85]:
# An id is unique on the page, so find() returns the single matching div directly
tag_topSellers = soup.find('div', id='TopSellersTable')
tag_topSellers

<div id="TopSellersTable">
<div id="TopSellersRows">
<a class="tab_item" data-ds-appid="1118010" data-ds-crtrids="[33273264,34827959]" data-ds-itemkey="App_1118010" data-ds-tagids="[19,3859,1695,9564,1685,5144,4026]" href="https://store.steampowered.com/app/1118010/Monster_Hunter_World_Iceborne/?snr=1_241_4_action_104" onmouseout="HideGameHover( this, event, 'global_hover' )" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:1118010,&quot;params&quot;:{&quot;bDisableHover&quot;:false},&quot;public&quot;:1,&quot;v6&quot;:1} );">
<div class="tab_item_cap">
<img class="tab_item_cap_img" src="https://cdn.cloudflare.steamstatic.com/steam/apps/1118010/capsule_184x69.jpg?t=1596524584"/>
</div>
<div class="discount_block tab_item_discount no_discount" data-price-final="31200"><div class="discount_prices"><div class="discount_final_price">HK$ 312.00</div></div></div> <div class="tab_item_content">
<div class="tab_item_name">Monster Hunter Worl

In [99]:
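# Top-seller results from page 1 only
tag_titles = tag_topSellers.find_all('div', {'class':'tab_item_name'})

for title in tag_titles:
    # get_text() also works if a name div ever contains nested tags, where .string would be None
    print(title.get_text(strip=True))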

Monster Hunter World: Iceborne
Borderlands 3
Monster Hunter: World
Sea of Thieves
Grand Theft Auto V
Granblue Fantasy: Versus
Conqueror's Blade
Destroy All Humans!
Risk of Rain 2
Marvel's Avengers
Warframe
SWORD ART ONLINE Alicization Lycoris
Counter-Strike: Global Offensive
Red Dead Redemption 2
Destiny 2
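Each game in the markup shown earlier sits inside its own `<a class="tab_item">` element, so one way to keep a game's fields together is to walk the rows and read every field from within its row. The sketch below tries this on a small inline HTML snippet modeled on that markup. The snippet itself, the placeholder second row, and the `html.parser` choice (used so the sketch runs without `lxml`) are assumptions for illustration; the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Inline sample modeled on the TopSellersTable markup shown above.
# The second row is a made-up placeholder, not real Steam data.
sample_html = """
<div id="TopSellersRows">
  <a class="tab_item" data-ds-appid="1118010"
     href="https://store.steampowered.com/app/1118010/Monster_Hunter_World_Iceborne/">
    <div class="tab_item_name">Monster Hunter World: Iceborne</div>
  </a>
  <a class="tab_item" data-ds-appid="123" href="https://example.com/app/123/">
    <div class="tab_item_name"> Example Game </div>
  </a>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for row in sample_soup.find_all("a", class_="tab_item"):
    # Look up the name inside this row only, so fields never mix between games
    name_tag = row.find("div", class_="tab_item_name")
    rows.append({
        "appid": row.get("data-ds-appid"),
        "name": name_tag.get_text(strip=True) if name_tag else None,
        "url": row.get("href"),
    })

for entry in rows:
    print(entry["appid"], entry["name"])
```

Collecting one dictionary per row avoids relying on two separately built lists staying in the same order.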


In [100]:
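# The final (post-discount) price of each top seller is in a div with class "discount_final_price"
tag_prices = tag_topSellers.find_all('div', {'class':'discount_final_price'})

for price in tag_prices:
    print(price.get_text(strip=True))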

HK$ 312.00
HK$ 234.00
HK$ 234.00
HK$ 269.00
HK$ 124.50
HK$ 438.00
Free to Play
HK$ 229.00
HK$ 103.20
HK$ 479.00
Free to Play
HK$ 349.00
Free to Play
HK$ 468.00
Free To Play
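The prices above are plain strings, and the free-to-play label appears with two different capitalizations. If you want to sort or average them, a small helper can turn each string into a number. The sketch below is one possible approach, not part of the original exercise: `parse_price` is a hypothetical helper, treating free-to-play games as `0.0` is an assumption, and so is reading a comma as a thousands separator.

```python
import re

def parse_price(text):
    """Return the numeric amount in a price string, or 0.0 for free-to-play labels."""
    cleaned = text.strip()
    # Case-insensitive match covers both "Free to Play" and "Free To Play"
    if cleaned.lower() == "free to play":
        return 0.0
    # Digits with optional thousands separators and an optional decimal part
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", cleaned)
    if match is None:
        return None  # unrecognized label; left for the caller to handle
    return float(match.group().replace(",", ""))

# A few strings copied from the output above
sample_prices = ["HK$ 312.00", "HK$ 124.50", "Free to Play", "Free To Play"]
amounts = [parse_price(p) for p in sample_prices]
print(amounts)
```

The currency prefix is ignored here, which is only safe while every price on the page uses the same currency.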


**`3` Can you get the text from each span tag with class equal to "top_tag"?**

In [109]:
tags_topTag = soup.find_all('span', {'class':'top_tag'})

# Count how often each tag appears across all games on the page
topTagDict = {}
for topTag in tags_topTag:
    entry = topTag.string.replace(', ','')  # drop the ', ' separator from the tag text
    topTagDict[entry] = topTagDict.get(entry, 0) + 1

print(topTagDict)

{'Third-Person Shooter': 4, 'Action Roguelike': 6, 'Multiplayer': 32, 'Action': 52, 'Shooter': 8, 'FPS': 15, 'First-Person': 2, 'Adventure': 20, 'RPG': 13, 'Female Protagonist': 3, 'Free to Play': 11, 'MMORPG': 2, 'Villain Protagonist': 3, 'Aliens': 2, 'Simulation': 2, 'Comedy': 1, 'Family Friendly': 1, 'Survival': 2, 'Crafting': 1, 'Roguelite': 1, 'Dungeon Crawler': 1, 'Pixel Graphics': 1, 'Masterpiece': 4, 'Singleplayer': 1, 'Story Rich': 2, 'Anime': 3, 'Side Scroller': 1, 'Indie': 3, 'Pirates': 2, 'Star Wars': 1, 'Massively Multiplayer': 2, "Shoot 'Em Up": 1, 'Arcade': 1, 'Psychedelic': 1, 'Open World': 11, 'Hunting': 1, 'Looter Shooter': 7, 'Online Co-Op': 2, 'Co-op': 11, 'Automobile Sim': 3, 'Fighting': 3, '2D Fighter': 1, 'Strategy': 4, 'Superhero': 2, 'Competitive': 4, 'Battle Royale': 3, 'Hero Shooter': 4, 'Tactical': 2, 'MOBA': 1, 'Soccer': 1, 'Sports': 2, '2D': 1, 'Horror': 1, 'Survival Horror': 1, 'Heist': 2, 'Utilities': 1, 'Software': 1, 'Mature': 1, 'Design & Illustration
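The manual tally with `dict.get` works, and the standard library's `collections.Counter` does the same counting with less code and can also rank the results. The sketch below runs on a hypothetical list of raw strings shaped like the span text (some carrying a `', '` separator), since the real input depends on the live page.

```python
from collections import Counter

# Hypothetical raw strings, shaped like the text of the top_tag spans;
# some of them carry a ', ' separator that should not be counted as part of the tag.
raw_tags = ["Action", ", Multiplayer", ", FPS", "Action", ", Adventure", "Action"]

# Counter tallies each cleaned tag name in a single pass
tag_counts = Counter(text.replace(", ", "") for text in raw_tags)

# most_common(n) returns the n most frequent (tag, count) pairs
print(tag_counts.most_common(3))
```

With the real data, `Counter(t.string.replace(', ', '') for t in tags_topTag)` would be the equivalent one-liner, assuming every span holds plain text.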

**`4` Under the "Narrow by Tag" section, there is a collection of tags (e.g. "Indie", "Adventure", etc.). <br>Write code to return these tags.**

In [112]:
tag_collections = soup.find_all('div',{'class':'tag_count_button'})
tag_collections

[<div class="tag_count_button">
 <span class="tag_name">Indie</span>
 <span class="tag_count tab_filter_control_count">15,629</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">Adventure</span>
 <span class="tag_count tab_filter_control_count">9,036</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">Singleplayer</span>
 <span class="tag_count tab_filter_control_count">6,625</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">Casual</span>
 <span class="tag_count tab_filter_control_count">6,617</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">RPG</span>
 <span class="tag_count tab_filter_control_count">3,709</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">Early Access</span>
 <span class="tag_count tab_filter_control_count">3,439</span>
 </div>,
 <div class="tag_count_button">
 <span class="tag_name">2D</span>
 <span class="tag_count tab_filter_control_count">3,341</span>
 </div>

In [126]:
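collectionList = []

for tag in tag_collections:
    # Look up each span by class instead of by position in .contents,
    # which also includes the whitespace between the tags
    collectionName = tag.find('span', class_='tag_name').string
    collectionCount = tag.find('span', class_='tag_count').string

    collectionList.append({
        'name':collectionName, 'count':collectionCount
    })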

In [127]:
print(collectionList)

[{'name': 'Indie', 'count': '15,629'}, {'name': 'Adventure', 'count': '9,036'}, {'name': 'Singleplayer', 'count': '6,625'}, {'name': 'Casual', 'count': '6,617'}, {'name': 'RPG', 'count': '3,709'}, {'name': 'Early Access', 'count': '3,439'}, {'name': '2D', 'count': '3,341'}, {'name': 'Simulation', 'count': '3,212'}, {'name': 'Multiplayer', 'count': '3,169'}, {'name': 'Strategy', 'count': '3,142'}, {'name': 'Shooter', 'count': '2,982'}, {'name': 'Great Soundtrack', 'count': '2,638'}]
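The counts come back as strings with thousands separators, so they cannot be summed or compared numerically as they are. A minimal sketch of the conversion, using a few entries copied from the output above (`sample_collections` and `numeric_collections` are names introduced here for illustration):

```python
# A few entries copied from the printed output above
sample_collections = [
    {'name': 'Indie', 'count': '15,629'},
    {'name': 'Adventure', 'count': '9,036'},
    {'name': 'Singleplayer', 'count': '6,625'},
]

# Remove the thousands separator, then convert each count to an int
numeric_collections = [
    {'name': item['name'], 'count': int(item['count'].replace(',', ''))}
    for item in sample_collections
]

total = sum(item['count'] for item in numeric_collections)
print(numeric_collections[0], total)
```

Once the counts are integers, `sorted(numeric_collections, key=lambda item: item['count'])` orders the tags by size.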
