# Web Scraping using Selenium - SGX Website

**Objectives:** 

* How to use Selenium to extract HTML from a web page
* How to use Selenium to interact with a website before extracting its HTML

We will use the following website for the demonstrations:

* https://www.sgx.com/securities/equities/D05


Install the Python libraries `selenium` and `webdriver_manager` using `pip`.

Import the libraries.

## 1. Extract Data without Interaction

We will demonstrate how to extract company announcements and company news from the SGX website.

### Open Website and Get HTML

Get an instance of a web browser.
* The `webdriver_manager` library provides managers for different browsers. It downloads the driver version that matches your installed browser.
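A minimal sketch of this step, assuming Selenium 4 and a locally installed Chrome. The helper name `make_browser` is hypothetical, and the imports sit inside the function so that it can be defined before the libraries are installed.

```python
def make_browser():
    """Start a Chrome browser whose driver is fetched by webdriver_manager."""
    # Imported here (an assumption of this sketch) so that defining the
    # function does not require the libraries; calling it does.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # install() downloads a driver matching the installed Chrome if needed
    # and returns the path to the driver executable.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service)
```

Calling `browser = make_browser()` opens a new Chrome window controlled by the script.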

Use the `browser` object to open a web page.

Close the browser.
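The open, read, and close steps can be sketched as one helper. `fetch_html` is a hypothetical name, and `browser` is assumed to be a Selenium WebDriver instance (only its `get`, `page_source`, and `quit` members are used).

```python
def fetch_html(browser, url):
    """Open `url`, return the rendered HTML, and always close the browser."""
    try:
        browser.get(url)            # navigate to the page
        return browser.page_source  # HTML of the page as currently rendered
    finally:
        browser.quit()              # close all windows and end the driver session
```

Putting `quit()` in a `finally` clause means the browser is closed even when loading the page raises an exception.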

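To find multiple items, call `find_elements()` with a `By` locator strategy (imported from `selenium.webdriver.common.by`). The older `find_elements_by_*()` methods were removed in Selenium 4.3.
* `find_elements(By.ID, ...)`
* `find_elements(By.NAME, ...)`
* `find_elements(By.XPATH, ...)`
* `find_elements(By.LINK_TEXT, ...)`
* `find_elements(By.PARTIAL_LINK_TEXT, ...)`
* `find_elements(By.TAG_NAME, ...)`
* `find_elements(By.CLASS_NAME, ...)`
* `find_elements(By.CSS_SELECTOR, ...)`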
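As a small illustration, assuming Selenium 4 where the locator is passed as a strategy plus a value, the hypothetical helper below collects the text of every matching element. The `h2` tag in the usage comment is only an example.

```python
def element_texts(browser, by, value):
    """Return the visible text of every element matching the locator."""
    # find_elements() returns a list, which is empty when nothing matches.
    return [element.text for element in browser.find_elements(by, value)]

# With a live browser (not run here):
#   from selenium.webdriver.common.by import By
#   headings = element_texts(browser, By.TAG_NAME, "h2")
```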

### Examine HTML Code and Make Soup

Save the HTML code to a file, then open the file to confirm that it contains the data you are interested in.
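One way to do this is sketched below. `save_html` is a hypothetical helper, and `sample_html` is a tiny stand-in for `browser.page_source`; the demo writes to a temporary folder so that it leaves no files behind.

```python
import tempfile
from pathlib import Path

def save_html(html, path):
    """Write `html` to `path` as UTF-8 and return the number of characters written."""
    return Path(path).write_text(html, encoding="utf-8")

# Stand-in for browser.page_source, used only to demonstrate the call.
sample_html = "<html><body><widget-company-announcements></widget-company-announcements></body></html>"

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "sgx_page.html"
    written = save_html(sample_html, target)
    saved = target.read_text(encoding="utf-8")

# Confirm that the element of interest made it into the saved file.
found = "widget-company-announcements" in saved
print(written, found)
```

In the notebook you would more likely pass a path in the working folder, such as `"sgx_page.html"`, so that the file can be opened in an editor afterwards.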

Let's "make a soup" from the downloaded HTML code.
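A minimal sketch, assuming the `beautifulsoup4` package is installed. The `html` string is a made-up stand-in for the HTML returned by the browser.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML returned by the browser (hypothetical content).
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # text of the <title> element
print(soup.p.get_text())  # text of the first <p> element
```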

## 2. Example: Extract Company Announcements

In Chrome, inspect the Company Announcements element. It uses a custom tag, `widget-company-announcements`.

```html
<widget-company-announcements class="website-template-widget print-format-d-none" data-analytics-category="Company Announcements">
      ...
</widget-company-announcements>
```

Use `soup.find()` to find the above element by its tag name.
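The lookup can be sketched on a small fragment. The opening tag is copied from the snippet above, while the inner `div` is a placeholder for the real content.

```python
from bs4 import BeautifulSoup

# The opening tag matches the inspected element; the inner <div> is a placeholder.
html = """
<body>
  <widget-company-announcements class="website-template-widget print-format-d-none"
      data-analytics-category="Company Announcements">
    <div>placeholder content</div>
  </widget-company-announcements>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None when there is no match.
widget = soup.find("widget-company-announcements")
if widget is None:
    print("Widget not found; the page may not have finished rendering.")
else:
    print(widget["data-analytics-category"])
    print(widget["class"])  # multi-valued attribute, returned as a list
```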

Use `soup.find_all()` to find all items.

### Experiment with One Announcement

Let's use the first item to try out our code.

Get the timestamp from the item.

Get the news title and its link.

Get the tag of the news item.
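The three extractions might look like the sketch below. The markup and the class names `item`, `timestamp`, and `tag` are hypothetical, because the real structure has to be read from the saved HTML; only the BeautifulSoup calls carry over.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one announcement; the real SGX class names may differ.
item_html = """
<div class="item">
  <span class="timestamp">01 Jan 2024 08:30 AM</span>
  <a href="/announcements/example-1">Example announcement title</a>
  <span class="tag">General Announcement</span>
</div>
"""
item = BeautifulSoup(item_html, "html.parser").find("div", class_="item")

# get_text(strip=True) removes the surrounding whitespace and newlines.
timestamp = item.find("span", class_="timestamp").get_text(strip=True)

link = item.find("a")
title = link.get_text(strip=True)
url = link.get("href")  # .get() returns None instead of raising if href is missing

tag = item.find("span", class_="tag").get_text(strip=True)

print(timestamp, title, url, tag, sep=" | ")
```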

### Package into Function

Package the above code into a function.

Test the function.
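A sketch of such a function follows, together with a quick test on two items. The function names and the class names in the markup are hypothetical. The function returns `None` for a missing field, so one incomplete item does not stop the whole run.

```python
from bs4 import BeautifulSoup

def text_or_none(parent, name, class_name):
    """Return the stripped text of the first matching child, or None if absent."""
    node = parent.find(name, class_=class_name)
    return node.get_text(strip=True) if node is not None else None

def parse_announcement(item):
    """Turn one announcement element into a dict (class names are hypothetical)."""
    link = item.find("a")
    return {
        "timestamp": text_or_none(item, "span", "timestamp"),
        "title": link.get_text(strip=True) if link is not None else None,
        "url": link.get("href") if link is not None else None,
        "tag": text_or_none(item, "span", "tag"),
    }

# Hypothetical markup: one complete item and one with missing fields.
html = """
<div class="item">
  <span class="timestamp">01 Jan 2024 08:30 AM</span>
  <a href="/announcements/example-1">Example announcement title</a>
  <span class="tag">General Announcement</span>
</div>
<div class="item"><a>Item without link target or tag</a></div>
"""
items = BeautifulSoup(html, "html.parser").find_all("div", class_="item")
results = [parse_announcement(item) for item in items]
for row in results:
    print(row)
```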

### Process All Announcements

Print the result.

Close the web browser after the task is completed.

## 3. Exercise: Extract Company News

### Download HTML

Open the web browser and go to the website.

Wait for the page to load, then return the HTML code.
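Selenium's `WebDriverWait` is the usual tool for waiting. The sketch below uses a plain polling loop instead, so that it depends only on the browser's `page_source` attribute. `wait_for_html` and its parameters are hypothetical names.

```python
import time

def wait_for_html(browser, marker, timeout=15.0, poll=0.5):
    """Return browser.page_source once it contains `marker`.

    Raises TimeoutError if the marker has not appeared after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = browser.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} seconds")
        time.sleep(poll)  # give the page's scripts time to render more content

# With a live browser (not run here):
#   html = wait_for_html(browser, "article-list-result")
```

Waiting for text that only appears once the content is rendered, such as the class name of the news items, is more reliable than a fixed `time.sleep()`.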

Close the web browser.

### Make Soup and Find Items

Make a soup from the HTML code.

Inspect the News elements in Chrome. The news list is contained in a `<widget-stocks-news>` tag.

Use `soup.find()` to find the relevant HTML code using the tag name.

Each news item is enclosed in a `div` with the class name `article-list-result`.

Use the `find_all()` method to extract all news items.
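The two lookups can be sketched on a made-up fragment. Only the tag name `widget-stocks-news` and the class name `article-list-result` come from the page; the item contents are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; only the tag name and the class name come from the page.
html = """
<widget-stocks-news>
  <div class="article-list-result">First placeholder item</div>
  <div class="article-list-result">Second placeholder item</div>
</widget-stocks-news>
<div class="article-list-result">Item outside the news widget</div>
"""
soup = BeautifulSoup(html, "html.parser")

news_widget = soup.find("widget-stocks-news")

# Searching from the widget (not from soup) ignores matches elsewhere on the page.
if news_widget is not None:
    news_items = news_widget.find_all("div", class_="article-list-result")
else:
    news_items = []

print(len(news_items))
```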

### Extract One News Item

Use the first news item to try out your code for extracting data from a single item.

Extract the `timestamp` of the news item.

Extract the `title` of the news item.

Extract the tags of the news item.

### Create Function

Package the above code into a function.

### Extract All News

Use the function to extract all news items.

Print the result.

## 4. Example: Extract Data after Interaction

**Task:** Extract the details of all news items of a company from the SGX website.

For each news item on an SGX equity page, e.g. https://www.sgx.com/securities/equities/D05, the user has to click the item to view more details in a pop-up window. We therefore have to simulate a click on each item before we can extract the data.

Open the web browser and go to the website `https://www.sgx.com/securities/equities/D05`.

### Close Cookie Banner

Find the `Accept` button and click it to accept the cookie agreement. This closes the banner so that it won't block clicks on the news items.
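A possible sketch of this step is shown below. The default XPath assumes the banner uses a `<button>` element whose text is exactly `Accept`, which should be checked in Chrome's inspector. `accept_cookies` is a hypothetical helper name.

```python
def accept_cookies(browser, by="xpath", value="//button[normalize-space()='Accept']"):
    """Click the first element matching the locator; return True if one was clicked.

    The default locator is an assumption about the banner's markup.
    "xpath" is the string value of selenium's By.XPATH constant.
    """
    # find_elements() returns an empty list when the banner is absent,
    # so a page without the banner does not raise an exception.
    buttons = browser.find_elements(by, value)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Using `find_elements()` rather than `find_element()` keeps the script running on later visits, when the banner may no longer be shown.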

Extract the elements from the browser using XPath.
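Clicking needs live browser elements rather than soup tags, so the items are located through the browser itself. The XPath below is a sketch built from the tag and class names seen earlier (`widget-stocks-news`, `article-list-result`); `find_news_elements` is a hypothetical helper name.

```python
# XPath built from the tag name and class name of the news list (see Section 3).
NEWS_ITEM_XPATH = "//widget-stocks-news//div[contains(@class, 'article-list-result')]"

def find_news_elements(browser):
    """Return the live WebElements for the news items (empty list if none)."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    return browser.find_elements("xpath", NEWS_ITEM_XPATH)
```

Each returned element supports `.click()` and `.text`, which a BeautifulSoup tag does not.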

### Experiment with One News Item

Experiment with code to extract one news item from the pop-up.
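The click-then-read step might be sketched as follows. `read_popup` is a hypothetical helper, and `popup_xpath` must be supplied by you after inspecting the pop-up in Chrome, because its markup is not described here.

```python
import time

def read_popup(browser, item, popup_xpath, timeout=10.0, poll=0.25):
    """Click `item`, wait for the pop-up to appear, and return its visible text."""
    item.click()
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the string value of selenium's By.XPATH constant.
        popups = browser.find_elements("xpath", popup_xpath)
        if popups and popups[0].is_displayed():
            return popups[0].text
        if time.monotonic() >= deadline:
            raise TimeoutError("pop-up did not appear after the click")
        time.sleep(poll)

# With a live browser (not run here); the XPath is only a placeholder:
#   text = read_popup(browser, news_elements[0], "//div[@role='dialog']")
```

The pop-up probably has to be closed before the next item can be clicked; how to do that depends on the pop-up's own close control.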

### Package into Function

Use the function to extract all news items.

Print the result.

Clean up by closing the web browser.