# Scraping Steam Using lxml

Every HTML page is made up of HTML elements, the building blocks of a web page. Each element consists of tags and attributes: a tag tells the web browser where an element begins and ends, whereas an attribute describes a characteristic of the element, such as its `id` or `class`.

To read these tags from Python, we download the page with `requests` and parse it with `lxml`, as shown in the following steps.
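To make the tag/attribute distinction concrete, here is a minimal sketch that parses a small hand-written snippet and reads the element's tag name, two of its attributes, and its text with lxml. The markup is hypothetical, not taken from Steam:

```python
import lxml.html

# Hypothetical one-element snippet: tag "div", attributes "id" and "class", plus text
snippet = '<div id="demo" class="tab_item_name">Martial Law</div>'
element = lxml.html.fromstring(snippet)

print(element.tag)           # div
print(element.get('id'))     # demo
print(element.get('class'))  # tab_item_name
print(element.text)          # Martial Law
```

The XPath queries in the rest of this notebook select elements by exactly these kinds of attributes (`@id`, `@class`).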


In [1]:
import requests
import lxml.html

In [2]:
html = requests.get('https://store.steampowered.com/explore/new/')
doc = lxml.html.fromstring(html.content)

Select a particular section of the page, such as the 'Popular New Releases' tab, by its `id`.

In [3]:
new_releases = doc.xpath('//div[@id="tab_newreleases_content"]')[0]
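Indexing the XPath result with `[0]` raises an `IndexError` if Steam renames or removes the tab. A small guard such as the hypothetical `first_match` helper below returns `None` instead, which makes that failure easier to detect. The sample page and the second `id` are made up for illustration:

```python
import lxml.html


def first_match(tree, query):
    """Return the first node matching an XPath query, or None if nothing matches."""
    matches = tree.xpath(query)
    return matches[0] if matches else None


# Hypothetical page that contains only the new-releases tab
page = lxml.html.fromstring(
    '<html><body><div id="tab_newreleases_content"><p>ok</p></div></body></html>'
)
section = first_match(page, '//div[@id="tab_newreleases_content"]')
missing = first_match(page, '//div[@id="tab_not_on_this_page"]')  # made-up id
print(section is not None, missing)  # True None
```

In the notebook this would read `new_releases = first_match(doc, '//div[@id="tab_newreleases_content"]')`, followed by a check for `None` before continuing.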

The title is contained in a div with a class of tab_item_name.

In [4]:
titles = new_releases.xpath('.//div[@class="tab_item_name"]/text()')
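Note that `@class="tab_item_name"` compares the whole `class` attribute as a single string, so it would miss a div that carries an extra class. If that ever happens, a common XPath 1.0 idiom matches one class token instead. A sketch on hypothetical markup:

```python
import lxml.html

# Hypothetical list: the second title div carries an extra class
sample = lxml.html.fromstring(
    '<div>'
    '<div class="tab_item_name">Game A</div>'
    '<div class="tab_item_name highlighted">Game B</div>'
    '</div>'
)

# Exact string comparison on the whole class attribute
exact = sample.xpath('.//div[@class="tab_item_name"]/text()')
# Token match: pad the normalized class list with spaces and look for " tab_item_name "
token = sample.xpath(
    './/div[contains(concat(" ", normalize-space(@class), " "), " tab_item_name ")]/text()'
)
print(exact)  # ['Game A']
print(token)  # ['Game A', 'Game B']
```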

Print the extracted titles.

In [5]:
print(titles)

['Martial Law', 'The Planet Crafter: Prologue', 'The Two of Us', 'Arena of Kings', 'Chill Corner', "Five Nights at Freddy's: Security Breach", 'Eraser', 'Bitburner', 'The Chronicles Of Myrtana: Archolos', 'GTFO', 'Halo Infinite (Campaign)', 'Knell of St. Godhrkar', 'Century: Age of Ashes', 'Propnight', 'Farming Simulator 22', 'Snake Force', 'Gunfire Reborn', 'DYSMANTLE', 'Halo Infinite', 'Car Detailing Simulator: Prologue', 'Jurassic World Evolution 2', 'Forza Horizon 5', 'Cell to Singularity - Evolution Never Ends', 'Animal Shelter: Prologue', 'Idle Big Devil', 'Unpacking', 'Sea Brawl Autobattler', 'Crab Game', 'Age of Empires IV', 'Age of Empires IV', "Marvel's Guardians of the Galaxy", 'SCP: Containment Breach Multiplayer', 'The Dark Pictures Anthology: House of Ashes', 'Gloomhaven', 'Inscryption', 'Escape Simulator']
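The `/text()` step works here because each title is a text node sitting directly inside its div. It returns only those direct text nodes, whereas lxml's `text_content()` (used later for the tags) concatenates all nested text. A sketch on hypothetical markup where the text is wrapped in child spans:

```python
import lxml.html

# Hypothetical markup: two tags wrapped in spans, separated by ", "
div = lxml.html.fromstring(
    '<div class="tab_item_top_tags"><span>Horror</span>, <span>Indie</span></div>'
)

print(div.xpath('./text()'))  # [', ']  (only text nodes directly inside the div)
print(div.text_content())     # Horror, Indie  (all nested text, concatenated)
```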


Extract the prices from the same 'Popular New Releases' section. Each price sits in a div with the class discount_final_price.

In [6]:
prices = new_releases.xpath('.//div[@class="discount_final_price"]/text()')

In [7]:
print(prices)

['Free', 'Free To Play', 'Free', 'Free To Play', 'Free', '$12.99', 'Free', 'Free To Play', 'Free', '$9.74', '$59.99', 'Free To Play', 'Free To Play', '$7.37', '$34.99', 'Free To Play', '$6.14', '$5.15', 'Free To Play', 'Free', '$13.99', '$29.50', 'Free To Play', 'Free', 'Free To Play', '$8.19', 'Free To Play', 'Free', '$17.99', '$17.99', '$32.49', 'Free To Play', '$19.79', '$9.59', '$7.99', '$5.59']
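The prices come back as display strings. To sort or sum them, a small converter helps. This sketch assumes US-dollar labels like the ones printed above and treats any label starting with "Free" as zero; `parse_price` is a hypothetical helper, not part of lxml:

```python
from decimal import Decimal


def parse_price(label):
    """Convert a Steam price label to a Decimal amount.

    Assumes labels look like 'Free', 'Free To Play', or '$12.99'.
    """
    label = label.strip()
    if label.lower().startswith('free'):
        return Decimal('0')
    # Drop the currency symbol and any thousands separators before converting
    return Decimal(label.lstrip('$').replace(',', ''))


sample_prices = ['Free', 'Free To Play', '$12.99', '$59.99']
amounts = [parse_price(p) for p in sample_prices]
print(amounts)
print(sum(amounts))  # 72.98
```

`Decimal` is used rather than `float` so that sums of cent amounts do not pick up binary rounding error. Other store currencies would need different parsing.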


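Next, collect the platforms each game supports. Each platform icon is a span whose class list ends with the platform name (for example win, mac, or linux), so the last class token is kept and the hmd_separator entry, which is not a platform, is dropped.

In [8]:
# One tab_item_details div per game
platforms_div = new_releases.xpath('.//div[@class="tab_item_details"]')
total_platforms = []

for game in platforms_div:
    temp = game.xpath('.//span[contains(@class, "platform_img")]')
    # Keep the last class token, e.g. "platform_img win" -> "win"
    platforms = [t.get('class').split(' ')[-1] for t in temp]
    if 'hmd_separator' in platforms:
        platforms.remove('hmd_separator')
    # Append for every game, not only for those that had a separator
    total_platforms.append(platforms)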
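`list.remove` deletes only the first occurrence of hmd_separator. A variant that filters out every occurrence is sketched below; `platforms_for` is an illustrative name, and the sample markup (including the second separator) is hypothetical, shaped like the spans the loop targets:

```python
import lxml.html

# Hypothetical tab_item_details block with two separator spans
sample = lxml.html.fromstring(
    '<div class="tab_item_details">'
    '<span class="platform_img win"></span>'
    '<span class="platform_img hmd_separator"></span>'
    '<span class="platform_img mac"></span>'
    '<span class="platform_img hmd_separator"></span>'
    '</div>'
)


def platforms_for(game):
    """Return platform names for one game, dropping every separator entry."""
    spans = game.xpath('.//span[contains(@class, "platform_img")]')
    # The last class token names the platform, e.g. "platform_img win" -> "win"
    names = [span.get('class').split()[-1] for span in spans]
    return [name for name in names if name != 'hmd_separator']


print(platforms_for(sample))  # ['win', 'mac']
```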

Finally, extract all of the details (titles, prices, tags, and platforms) in one go and combine them into a list of dictionaries, one per game.
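One caveat about combining separately collected lists: `zip` pairs them purely by position. If one game lacks, say, a discount_final_price div, every later price shifts onto the wrong title, and `zip` silently stops at the shortest list. One way around this is to walk each game's container and read its fields relative to it. The sketch below assumes each game is wrapped in an element whose class includes tab_item; that is an assumption about Steam's markup, and the sample HTML and helper names are hypothetical:

```python
import lxml.html

# Hypothetical markup: each game sits in its own element whose class includes
# "tab_item"; the second game deliberately has no price element.
SAMPLE = """
<div id="tab_newreleases_content">
  <div class="tab_item">
    <div class="discount_final_price">$12.99</div>
    <div class="tab_item_name">Game A</div>
    <div class="tab_item_details">
      <span class="platform_img win"></span>
      <div class="tab_item_top_tags">Horror, Singleplayer</div>
    </div>
  </div>
  <div class="tab_item">
    <div class="tab_item_name">Game B</div>
    <div class="tab_item_details">
      <span class="platform_img win"></span>
      <span class="platform_img mac"></span>
      <div class="tab_item_top_tags">Puzzle</div>
    </div>
  </div>
</div>
"""

# XPath 1.0 idiom for "the class attribute contains the token tab_item"
ROW_QUERY = './/*[contains(concat(" ", normalize-space(@class), " "), " tab_item ")]'


def first_text(node, query, default=None):
    """Stripped text of the first element matching query, or default if none match."""
    found = node.xpath(query)
    return found[0].text_content().strip() if found else default


def parse_rows(section):
    """Build one dict per game row so a missing field cannot shift the others."""
    games = []
    for row in section.xpath(ROW_QUERY):
        spans = row.xpath('.//span[contains(@class, "platform_img")]')
        platforms = [span.get('class').split()[-1] for span in spans]
        tag_text = first_text(row, './/div[@class="tab_item_top_tags"]', '')
        games.append({
            'title': first_text(row, './/div[@class="tab_item_name"]'),
            'price': first_text(row, './/div[@class="discount_final_price"]'),
            'tags': [tag.strip() for tag in tag_text.split(',') if tag.strip()],
            'platforms': [p for p in platforms if p != 'hmd_separator'],
        })
    return games


section = lxml.html.fromstring(SAMPLE)
for game in parse_rows(section):
    print(game)
```

Game B comes out with `'price': None` instead of borrowing the next game's price, which is the point of reading fields per row.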

In [13]:
import requests
import lxml.html

In [15]:
# Download and parse the page, then narrow down to the 'Popular New Releases' tab
html = requests.get('https://store.steampowered.com/explore/new/')
doc = lxml.html.fromstring(html.content)
new_releases = doc.xpath('//div[@id="tab_newreleases_content"]')[0]

titles = new_releases.xpath('.//div[@class="tab_item_name"]/text()')
prices = new_releases.xpath('.//div[@class="discount_final_price"]/text()')

# Each game's tags come out as one comma-separated string; split it into a list
tags = [tag.text_content() for tag in new_releases.xpath('.//div[@class="tab_item_top_tags"]')]
tags = [tag.split(', ') for tag in tags]

# Platforms: last class token of each platform_img span, minus the separator
platforms_div = new_releases.xpath('.//div[@class="tab_item_details"]')
total_platforms = []
for game in platforms_div:
    temp = game.xpath('.//span[contains(@class, "platform_img")]')
    platforms = [t.get('class').split(' ')[-1] for t in temp]
    if 'hmd_separator' in platforms:
        platforms.remove('hmd_separator')
    total_platforms.append(platforms)

# Combine the four parallel lists into one dictionary per game
output = []
for title, price, game_tags, game_platforms in zip(titles, prices, tags, total_platforms):
    output.append({
        'title': title,
        'price': price,
        'tags': game_tags,
        'platforms': game_platforms,
    })

print(output)

[{'title': 'Martial Law', 'price': 'Free', 'tags': ['Pixel Graphics', 'Atmospheric', 'Choices Matter', 'Adventure'], 'platforms': ['win', 'mac', 'linux']}, {'title': 'The Planet Crafter: Prologue', 'price': 'Free To Play', 'tags': ['Open World Survival Craft', 'Survival', 'Open World', 'Exploration'], 'platforms': ['win']}, {'title': 'The Two of Us', 'price': 'Free', 'tags': ['Strategy', '2D Platformer', 'Online Co-Op', 'Puzzle Platformer'], 'platforms': ['win']}, {'title': 'Arena of Kings', 'price': 'Free To Play', 'tags': ['Free to Play', 'PvP', 'MOBA', 'Competitive'], 'platforms': ['win']}, {'title': 'Chill Corner', 'price': 'Free', 'tags': ['Relaxing', 'Atmospheric', 'Casual', 'Utilities'], 'platforms': ['win']}, {'title': "Five Nights at Freddy's: Security Breach", 'price': '$12.99', 'tags': ['Horror', 'Survival Horror', 'Singleplayer', 'Robots'], 'platforms': ['win']}, {'title': 'Eraser', 'price': 'Free', 'tags': ['3D Platformer', 'Indie', 'Online Co-Op', 'Difficult'], 'platforms