# Accessing data from Twitch

Let's get started with retrieving data from Twitch.tv. For this demonstration, we will retrieve data from one of Twitch's category pages: the directory for [Grand Theft Auto V](https://www.twitch.tv/directory/game/Grand%20Theft%20Auto%20V). Open that category page and have a look at it!

## Selenium

Before we can start this demonstration, we have to install a new package called `Selenium`. Selenium is a flexible tool that can automate a web browser such as Chrome. It simulates actual browsing behavior, including content that a site adds with JavaScript after the page has loaded. This differs from simply downloading the HTML source code of a website, as you would with `requests` and then parse with `BeautifulSoup`.

__Using Jupyter Notebook on your own computer__

If you're running this notebook on your own computer, you first have to install Selenium & Chromedriver.

--> Check out the [installation instructions](https://tilburgsciencehub.com/configure/python-for-scraping) before you continue.

__Using Google Colab__

If you're using Google Colab, you can run the cell below to install Selenium and Chromedriver right away. It takes about a minute to run.


In [2]:
# Simply run this cell to install Selenium/Chromedriver on Google Colab! 

# If you're using Jupyter Notebook on your own computer instead, you need to install
# selenium and chromedriver yourself: https://tilburgsciencehub.com/configure/python-for-scraping

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver

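from selenium import webdriver

# Run Chrome without a visible window; --no-sandbox and --disable-dev-shm-usage
# are commonly needed when Chrome runs inside a container such as Colab
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# Recent Selenium 4 releases take the options via `options=` and no longer accept
# `chrome_options=` or a positional driver path; Selenium 4.6+ locates chromedriver itself
driver = webdriver.Chrome(options=chrome_options)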



## Let's get started

Let's first point our Chrome browser to the URL below. Recall that it is the category overview page for the game "Grand Theft Auto V".

In [5]:
url = 'https://www.twitch.tv/directory/game/Grand%20Theft%20Auto%20V'
driver.get(url)
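Twitch builds its directory page with JavaScript, so the stream titles may not exist yet at the moment `driver.get()` returns. Selenium ships `WebDriverWait` for this situation. The sketch below is only a standard-library illustration of the same idea: poll a condition until it succeeds or a timeout expires. The `wait_until` helper and the fake page are hypothetical, written so the example runs without a browser.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical offline demo: a fake "page" whose titles only appear on the third check
calls = {"n": 0}


def fake_find_titles():
    calls["n"] += 1
    return ["stream title"] if calls["n"] >= 3 else []


titles = wait_until(fake_find_titles, timeout=5, interval=0.01)
print(titles, calls["n"])  # → ['stream title'] 3
```

In the notebook, the condition could be `lambda: driver.find_elements(By.TAG_NAME, 'h3')` (with `By` imported from `selenium.webdriver.common.by`). An empty list counts as falsy, so the loop keeps polling until at least one title is present.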

We can now view the HTML of the page, as currently loaded in the browser, through the driver's `page_source` attribute.

In [6]:
driver.page_source



When inspecting the source code (Chrome/Firefox --> right click --> inspect elements), we notice that the titles of streams are tagged with `h3`.

Selenium can search for these elements with `driver.find_element()`, which takes a locator strategy (here `By.TAG_NAME`) and the value to look for. For example:

In [15]:
from selenium.webdriver.common.by import By

driver.find_element(By.TAG_NAME, 'h3')

<selenium.webdriver.remote.webelement.WebElement (session="646833254c4e77518a34dff7c112d3b9", element="a1acf5e7-55be-4dbb-b91f-f5fb3f555530")>

Seeing a message like `<selenium.webdriver.remote.webelement.WebElement ...>` tells us that Selenium was able to locate an element. `find_element()` returns the first match and raises a `NoSuchElementException` if there is none.

We can also view the element's properties, for example `.text`, which holds its visible text (here, the stream title).

In [16]:
driver.find_element(By.TAG_NAME, 'h3').text

'Vinny "Material King" Pistone | Chang Gang | NoPixel | !VBI !PObox !GFuel !Merch'

Open the [category page](https://www.twitch.tv/directory/game/Grand%20Theft%20Auto%20V) again. Can you find that stream somewhere?

(When this notebook was run, it was the first stream on the page. The directory changes constantly, so you will likely see a different title.)
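`driver.page_source` is just a string of HTML, so you could also pull the `h3` texts out of it yourself with Python's built-in `html.parser`, without further browser calls. This is a minimal sketch. The `H3TextCollector` class and the sample HTML are hypothetical stand-ins, since Twitch's real markup is much larger and may change. Unlike Selenium's `.text`, this parser does not know which text is actually visible on screen.

```python
from html.parser import HTMLParser


class H3TextCollector(HTMLParser):
    """Collect the text inside every <h3>...</h3> element of an HTML string."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h3 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text of nested tags such as <b> also arrives here while we are inside an <h3>
        if self._in_h3:
            self._buffer.append(data)


# Hypothetical snippet standing in for driver.page_source
sample_html = """
<div class="card"><h3 title="First stream">First <b>stream</b> &amp; chat</h3></div>
<div class="card"><h3>Second stream</h3></div>
<p>Not a title</p>
"""

collector = H3TextCollector()
collector.feed(sample_html)
collector.close()
print(collector.titles)  # → ['First stream & chat', 'Second stream']
```

In the notebook, you would call `collector.feed(driver.page_source)` instead of feeding the sample string.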


## Extracting multiple elements

By switching from `find_element()` to its plural form, `find_elements()`, we extract *all* elements tagged with `h3`. It returns a list, which is simply empty if nothing matches.

In [21]:
from selenium.webdriver.common.by import By

all_streams = driver.find_elements(By.TAG_NAME, 'h3')

The built-in `len()` function ("length") tells us how many elements were extracted.

In [22]:
len(all_streams)

30

Using a simple `for` loop, we can now extract the titles of all streams.

In [23]:
for stream in all_streams:
    print(stream.text)

Vinny "Material King" Pistone | Chang Gang | NoPixel | !VBI !PObox !GFuel !Merch
NoPixel | FLIPPY | #TONOR #ad | !MERCH OUT NOW | SileNt Live on !youtube
Abdul "Mayor of Los Santos" | !NoPixel | !Subtember | !Mayor | !NordVPN | !SneakEnergy | !Socials | #GoodVibes #ToxicFree #COYB
NoPixel WL | Mike Wadum | Day 8 of 30 | @Xiceman126
Speedy Vagos jefe -subathon at 2k subs! -Nopixel-twitter@Sayeeedblack
Sergeant Matthew Espinoz - !subtember - NoPixel 3.0 WL
Jesse BBMC - Subtember vibes. [Nopixel WL]
Capitan KJ | NoPixel [WL] | !Subtember! Meth, weed and Boosting. Come chill with me <3 !RP !meta !Vagos
Sherry Paie - UWU and mayor stuff - Nopixel 3.0 WL !subtember
NoPixel Dev | UwU Cafe construction, rushing before Tsunami - !SUBtember | !NordVPN !links | #Polglish
Nopixel 3.0 | 353 'The Enforcer' Deputy Nova | !Discord To Join Us | Morning FTO'ing
Lilith Ailhart | HOB | NoPixel WL 3.0 | !discord
Pez vs Chicken Wizard | NoPixel WL | Subs 20% off - !RAID !Discord !Merch !Trivia !pez !podcast
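To keep the titles for later analysis, you could collect them in a list and write them to a CSV file. Below is a minimal sketch using Python's `csv` module. It writes to an in-memory `io.StringIO` buffer so it runs anywhere. The `titles` list holds two titles copied from the output above plus a hypothetical empty string, standing in for `[stream.text for stream in all_streams]`. Dropping empty strings is a precaution in case some `h3` elements have no visible text.

```python
import csv
import io

# Stand-in for: titles = [stream.text for stream in all_streams]
titles = [
    'Vinny "Material King" Pistone | Chang Gang | NoPixel | !VBI !PObox !GFuel !Merch',
    "NoPixel WL | Mike Wadum | Day 8 of 30 | @Xiceman126",
    "",  # hypothetical element without visible text
]

# Drop empty titles
titles = [title for title in titles if title.strip()]

# For a real file, use: open("twitch_streams.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["position", "title"])
for position, title in enumerate(titles, start=1):
    # Quotes inside a title are escaped automatically by the csv writer
    writer.writerow([position, title])

print(buffer.getvalue())
```

The `position` column records the order in which the titles appeared on the page at scraping time.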

## Want more?

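As you may have seen in other tutorials, there are multiple ways to locate elements on a website: by id, by class name, by tag name, and more. In Selenium 4, each strategy is passed to `find_element()` or `find_elements()` as a `By` constant, such as `By.ID`, `By.CLASS_NAME`, or `By.CSS_SELECTOR`. Check [this page](https://selenium-python.readthedocs.io/locating-elements.html) to learn about these *other* ways to locate elements with the Selenium package.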
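Older tutorials use helpers such as `find_element_by_tag_name()`, which current Selenium 4 releases no longer provide. Each of those helpers corresponds to a `By` constant. The hypothetical `modernize` function below is a plain-text rewrite, not part of Selenium. It spells out that mapping by converting old-style calls you may find in copied code.

```python
import re

# Legacy suffix in find_element(s)_by_<suffix> -> name of the matching By constant
LEGACY_TO_BY = {
    "id": "ID",
    "name": "NAME",
    "xpath": "XPATH",
    "link_text": "LINK_TEXT",
    "partial_link_text": "PARTIAL_LINK_TEXT",
    "tag_name": "TAG_NAME",
    "class_name": "CLASS_NAME",
    "css_selector": "CSS_SELECTOR",
}

# Matches e.g. ".find_elements_by_tag_name(" and captures the method and strategy
LEGACY_CALL = re.compile(r"\.(find_elements?)_by_(\w+)\(")


def modernize(code: str) -> str:
    """Rewrite legacy locator calls into the find_element(s)(By.X, ...) form."""

    def replace(match) -> str:
        method, strategy = match.group(1), match.group(2)
        return f".{method}(By.{LEGACY_TO_BY[strategy]}, "

    return LEGACY_CALL.sub(replace, code)


print(modernize("all_streams = driver.find_elements_by_tag_name('h3')"))
# → all_streams = driver.find_elements(By.TAG_NAME, 'h3')
print(modernize("driver.find_element_by_css_selector('h3').text"))
# → driver.find_element(By.CSS_SELECTOR, 'h3').text
```

Calls that already use `By` are left untouched, because they do not contain the `_by_` suffix.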