# Python Intermediate - Day 1
---
# `requests` - the Python library for fetching data over HTTP

**Import the library before using it**

```
import requests
```

Pass the URL you want to fetch data from to `requests.get()`, then inspect the response:

```
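r = requests.get('https://data.weather.gov.hk/weatherAPI/opendata/weather.php?dataType=fnd&lang=tc')  # HKO 9-day weather forecast, Traditional Chinese
r                          # the Response object, e.g. <Response [200]>
r.status_code              # HTTP status code; 200 means OK
r.headers                  # response headers (a case-insensitive dictionary)
r.headers['content-type']  # the media type of the body
r.encoding                 # the text encoding requests uses to build r.text
r.text                     # the body decoded as a string
r.json()                   # the body parsed as JSON into Python dicts and lists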
```
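To run a cell:
- Press the `▶ Run` button, or
- Hold `Shift` and press `Enter`

A cell only displays the value of its last line, so run the lines above one at a time to see each result.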
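Rather than typing the query string by hand, you can let `requests` build it from a dictionary with the `params=` argument. This sketch only prepares the request (it does not send it), so it runs offline; the `timeout=10` in the comment is an arbitrary choice, not a required value.

```python
import requests

# Endpoint and query parameters taken from the URL used above
base_url = 'https://data.weather.gov.hk/weatherAPI/opendata/weather.php'
params = {'dataType': 'fnd', 'lang': 'tc'}

# Build the GET request without sending it, so we can inspect the final URL
prepared = requests.Request('GET', base_url, params=params).prepare()
print(prepared.url)

# To actually send the request (needs network access):
#   r = requests.get(base_url, params=params, timeout=10)
#   r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
```

The printed URL is the same one typed by hand above.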

# Retrieving JSON Data
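`r.json()` parses the response body into Python dictionaries and lists, so you can drill into it by key and by position. Example commands: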
```
data = r.json()                                 # parse the JSON body once into Python dicts and lists
data                                            # the whole parsed document
data['generalSituation']                        # a text summary of the general weather situation
data['weatherForecast'][0]                      # the first day of the forecast (a dict)
data['weatherForecast'][0]['week']              # the day of the week
data['weatherForecast'][0]['forecastWeather']   # the forecast description for that day
day = data['weatherForecast'][0]
print(day['week'],
      ' | ',
      day['forecastMintemp']['value'],
      day['forecastMintemp']['unit'],
      ' | ',
      day['forecastWeather']
     )
```
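The example above prints only the first forecast day. To print every day, loop over the `weatherForecast` list. The sketch below uses the standard `json` module on a small, made-up string with the same shape as the fields used above (the values are placeholders, and a real response has more keys and more days), so it runs without network access.

```python
import json

# Made-up sample with the same structure as the fields used above
sample_text = '''
{
  "generalSituation": "Sample text",
  "weatherForecast": [
    {"week": "Monday",  "forecastMintemp": {"value": 20, "unit": "C"}, "forecastWeather": "Sunny"},
    {"week": "Tuesday", "forecastMintemp": {"value": 21, "unit": "C"}, "forecastWeather": "Cloudy"}
  ]
}
'''

data = json.loads(sample_text)  # the same kind of result r.json() returns

lines = []
for day in data['weatherForecast']:  # every forecast day, not just [0]
    low = day['forecastMintemp']
    lines.append(f"{day['week']} | {low['value']} {low['unit']} | {day['forecastWeather']}")

print('\n'.join(lines))
```

With live data, replace `json.loads(sample_text)` with `r.json()`.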

# Simple HTML Extraction

Import `BeautifulSoup` (from the `bs4` package) before you use it:

```
from bs4 import BeautifulSoup
```


# Sample HTML Document

Assign the following HTML document to a variable:

```
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<b>Sample HTML Contents</b>
<p class="title purple"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister purple" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie (<b>Important</b>)</a>;
and they lived at the bottom of a well.</p>

<p class="story">a paragraph ... </p>
</body>
</html>
"""
```

**Parse the document into a `BeautifulSoup` object (usually named `soup`) and inspect it**:
```
soup = BeautifulSoup(html_doc, 'html.parser')
soup
type(soup)
```

# Call `prettify()` to show neatly indented HTML
`prettify()` returns the document as a string with one tag per line, indented by nesting depth; `print()` it to display it neatly:
```
print(soup.prettify())
```

# Use `find()` to retrieve an element
Examples:
```
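soup.find('html')
soup.find('head')
soup.find('title')
soup.find('body')
soup.find('p')    # the first <p> in the document
p = soup.find('p')
type(p)           # bs4.element.Tag
b = p.find('b')   # find() also works on a tag, searching only inside it
type(b)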
```
_`find()` returns only the first matching element, even when several elements match; it returns `None` when nothing matches._
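A small sketch of both cases, using a made-up fragment: several `<p>` tags match but only the first comes back, and a search with no match gives `None`, which you should check for before using the result.

```python
from bs4 import BeautifulSoup

html = '<p><b>first</b></p><p><b>second</b></p>'
soup = BeautifulSoup(html, 'html.parser')

first_p = soup.find('p')      # two <p> tags match, but only the first is returned
missing = soup.find('table')  # nothing matches, so find() returns None

print(first_p.b.string)  # first
print(missing)           # None

# Check for None before calling methods on the result
if missing is None:
    print('no <table> found')
```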

# Use the dot `.` as a shorthand
`soup.title` is a shorthand for `soup.find('title')`. To retrieve the `<title>` tag:
```
soup.title
```
Other examples
```
soup.html
soup.head
soup.body
title_tag = soup.title
print(title_tag.name)    # title
print(title_tag.string)  # The Dormouse's story
print(title_tag.text)    # The Dormouse's story
```

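To retrieve only the text of a tag, use `.string` or `.text`: `.string` returns the tag's single text child (or `None` if the tag has more than one child), while `.text` joins all the text inside the tag.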
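The difference between `.string` and `.text` shows up once a tag contains nested tags. This sketch uses a made-up fragment modelled on the Tillie link in the sample document.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<title>Story</title><p>Tillie (<b>Important</b>)</p>', 'html.parser')

print(soup.title.string)  # Story: <title> has exactly one text child
print(soup.p.string)      # None: <p> has three children (text, <b>, text)
print(soup.p.text)        # Tillie (Important): all text inside <p>, joined
```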


# Get the parent tag
`.parent` gives the parent tag of the current tag:
```
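title_tag.parent         # the <head> tag
title_tag.parent.name    # 'head'
title_tag.parent.string  # The Dormouse's story (<head> has a single child, so .string looks inside it)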
```
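`.parent` goes up one level; `.parents` walks all the way up to the top of the document. A sketch with a made-up fragment, listing the name of each ancestor of a `<b>` tag, nearest first (the `BeautifulSoup` object itself is named `[document]`):

```python
from bs4 import BeautifulSoup

html = '<html><body><p><a href="#"><b>Important</b></a></p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

b_tag = soup.b
chain = [parent.name for parent in b_tag.parents]  # nearest parent first
print(chain)  # ['a', 'p', 'body', 'html', '[document]']
```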

# Extracting the attributes of a tag
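Access a single attribute by indexing the tag like a dictionary:
```
a_tag = soup.a    # the first <a> tag
a_tag
a_tag["class"]    # ['sister', 'purple']: class can hold several values, so it is a list
a_tag["href"]     # 'http://example.com/elsie'
a_tag["id"]       # 'link1'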
```
Show all the attributes of a tag at once, as a dictionary:
```
a_tag.attrs
```
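Indexing with `["..."]` raises `KeyError` when the attribute is missing, while `.get()` returns `None` (or a default you choose). A sketch using the first link of the sample document; the `title` attribute is deliberately one it does not have.

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/elsie" class="sister purple" id="link1">Elsie</a>'
a_tag = BeautifulSoup(html, 'html.parser').a

print(a_tag.attrs)                # all attributes as a dict
print(a_tag['class'])             # ['sister', 'purple']: a list, because class is multi-valued
print(a_tag.get('title'))         # None: .get() does not raise for a missing attribute
print(a_tag.get('title', 'n/a'))  # n/a: a default of your choice

try:
    a_tag['title']                # square brackets raise KeyError for a missing attribute
except KeyError:
    print('no title attribute')
```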

# Find all the matching tags with `find_all()`
`find_all()` returns every matching tag as a list (a `ResultSet`, which behaves like a Python list):
Example:
```
soup.find_all('a')
links = soup.find_all('a')
print(links)
type(links)
links[0]
links[1]
links[0]["href"]
```
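Because the result behaves like a list, a list comprehension is a compact way to pull the same piece of data out of every match. A sketch with the three links from the sample document:

```python
from bs4 import BeautifulSoup

html = '''
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie (<b>Important</b>)</a>
'''
soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a')

# Collect the href and the text of every link
hrefs = [a['href'] for a in links]
names = [a.text for a in links]

for name, href in zip(names, hrefs):
    print(f'{name}\t{href}')
```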


# Retrieving by CSS class name
Examples (the keyword is `class_` with a trailing underscore, because `class` is a reserved word in Python):
```
soup.find(class_='sister')           # the first tag with the CSS class sister
soup.find_all(class_='sister')       # all tags with the CSS class sister
soup.find_all('a', class_='purple')  # all <a> tags with the CSS class purple
soup.find_all('p', class_='purple')  # all <p> tags with the CSS class purple
```
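`class_` matches a tag if the given name is any one of its classes, so `class="title purple"` and `class="sister purple"` both match `class_='purple'`. A sketch with a trimmed copy of the sample document:

```python
from bs4 import BeautifulSoup

html = '''
<p class="title purple"><b>The Dormouse's story</b></p>
<a class="sister purple" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
'''
soup = BeautifulSoup(html, 'html.parser')

purple = soup.find_all(class_='purple')           # any tag that has purple among its classes
print([tag.name for tag in purple])               # ['p', 'a']

purple_links = soup.find_all('a', class_='purple')  # restrict to <a> tags
print([a.text for a in purple_links])               # ['Elsie']
```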

# Limit the number of search results
Example:
```
soup.find_all('a')
soup.find_all('a', limit=2)  # return at most 2 matches
```
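Even with `limit=1`, `find_all()` still returns a list, while `find()` returns the tag itself. A sketch with made-up links showing that both point at the same first tag:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a>Elsie</a><a>Lacie</a><a>Tillie</a>', 'html.parser')

two = soup.find_all('a', limit=2)       # stops searching after 2 matches
one_list = soup.find_all('a', limit=1)  # still a list, of length 1
one_tag = soup.find('a')                # a single Tag, not a list

print([a.text for a in two])   # ['Elsie', 'Lacie']
print(one_list[0] is one_tag)  # True: the same first <a>
```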

# Retrieving by HTML `id`
Examples:
```
soup.find(id='link1')
```

**Note**:
- An `id` should be unique within a page, so you should expect only one matching tag.
- However, real-world HTML is often buggy and messy, and duplicate `id`s do occur; `find(id=...)` then returns only the first match.
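To check whether a page really has a unique `id`, search for it with `find_all()` as well. A sketch with deliberately messy, made-up HTML:

```python
from bs4 import BeautifulSoup

# Two tags share the same id
soup = BeautifulSoup('<a id="link1">Elsie</a><a id="link1">Copy</a>', 'html.parser')

first = soup.find(id='link1')       # only the first match
dupes = soup.find_all(id='link1')   # every match

print(first.text)  # Elsie
print(len(dupes))  # 2
```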

# Advanced CSS Selectors
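`select()` takes a CSS selector and returns a list of all matching tags. If you have written CSS before, these selectors will look familiar:
```
soup.select('body b')     # every <b> anywhere inside <body>
soup.select('p b')        # every <b> anywhere inside a <p>
soup.select('body>b')     # only <b> tags that are direct children of <body>
soup.select('body>p>b')   # <b> directly inside a <p> that is directly inside <body>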
```
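The difference between a space (any descendant) and `>` (direct child) is easiest to see on a small, made-up document with `<b>` tags at different depths. `select_one()` returns the first match, or `None`, much like `find()`.

```python
from bs4 import BeautifulSoup

html = ('<body><b>top</b>'
        '<p class="story"><b>inner</b>'
        '<a id="link2" href="http://example.com/lacie"><b>deep</b></a></p>'
        '</body>')
soup = BeautifulSoup(html, 'html.parser')

anywhere = [b.text for b in soup.select('body b')]  # descendant combinator (space)
direct = [b.text for b in soup.select('body > b')]  # child combinator (>)
in_p = [b.text for b in soup.select('p > b')]       # <b> directly inside a <p>

print(anywhere)  # ['top', 'inner', 'deep']
print(direct)    # ['top']
print(in_p)      # ['inner']

link = soup.select_one('#link2')  # first match for an id selector
print(link['href'])               # http://example.com/lacie
print(soup.select_one('table'))   # None: nothing matches
```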

# Practical Session
#### Use requests and BeautifulSoup together
BeautifulSoup is NOT an HTTP client: it only parses HTML. We use `requests` to download the HTML from an actual website, then pass it to BeautifulSoup.

**Required imports**:
```
import requests
from bs4 import BeautifulSoup
```

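### Declaring the URL to retrieve
```
url = "https://stock360.hkej.com/marketWatch/Top20/topGainers"
webpage_request = requests.get(url, timeout=10)  # give up if the server does not answer within 10 seconds
webpage_request.raise_for_status()               # stop with an error if the download failed (4xx/5xx)
soup = BeautifulSoup(webpage_request.content, 'html.parser')  # parse the raw bytes of the page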
```
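The code above passes `webpage_request.content` (raw bytes) rather than `.text`, which lets BeautifulSoup work out the character encoding itself, for example from a `<meta charset>` tag. A small offline sketch with hand-written bytes standing in for a downloaded page (`b'\xe6\xb8\xaf'` is the UTF-8 encoding of 港):

```python
from bs4 import BeautifulSoup

# Hand-written bytes standing in for webpage_request.content
raw = b'<html><head><meta charset="utf-8"></head><body><p>\xe6\xb8\xaf</p></body></html>'

soup = BeautifulSoup(raw, 'html.parser')
print(soup.p.text)             # 港
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
```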

### Use `find()` to locate the top stocks table, then `find_all()` to get its rows
```
top_stocks_table = soup.find(class_='dt640')  # the table with CSS class dt640 (None if not found)
print(type(top_stocks_table))
stock_rows = top_stocks_table.find_all("tr")  # every row of that table
print(len(stock_rows))
for row in stock_rows[:4]:                    # look at the first few rows
    print(row)
```
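If the site changes its layout, `soup.find(class_='dt640')` returns `None` and the next line fails with an `AttributeError`. One way to guard against that is a small helper; `get_table_rows` is a hypothetical name, and the HTML below is made up, so the real page's structure may differ.

```python
from bs4 import BeautifulSoup

def get_table_rows(soup, table_class):
    """Hypothetical helper: return the <tr> rows of the first element with
    the given CSS class, or an empty list if there is no such element."""
    table = soup.find(class_=table_class)
    if table is None:
        return []
    return table.find_all('tr')

# Made-up HTML standing in for the downloaded page
html = '<table class="dt640"><tr><th>Code</th></tr><tr><td>1</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

print(len(get_table_rows(soup, 'dt640')))    # 2
print(len(get_table_rows(soup, 'no-such')))  # 0
```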

### Looping the top stock rows
```
for stock in stock_rows[2:]:              # skip the first two rows, which are not stock rows
    code = stock.find(class_='code')
    name = stock.find(class_='name')
    print(f'{code}\t{name}')              # prints the whole tags
    #print(f'{code.string}\t{name.string}')  # uncomment to print only the text
```

### Retrieving more stock columns
```
for stock in stock_rows[2:]:              # skip the first two rows, as above
    code = stock.find(class_='code')
    name = stock.find(class_='name')
    latest = stock.find(class_='latest')
    change = stock.find(class_='change')
    change_p = stock.find(class_='change_p')
    volume = stock.find(class_='volumn')  # keep the class string exactly as it appears in the page's HTML
    turnover = stock.find(class_='turnover')
    market_cap = stock.find(class_='marketCap')
    # print a few of the columns; the others can be added in the same way
    print(f'{code.string}\t{name.string}\t{latest.string}\t{change_p.text}')
```
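Instead of printing each row, you can collect the columns into a list of dictionaries and save them as CSV. This sketch assumes the same CSS class names as the loop above, uses made-up rows with placeholder values (not real market data), returns an empty string for a missing cell instead of crashing, and writes to an in-memory `io.StringIO` buffer in place of a real file. `cell_text` is a hypothetical helper name.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up table; the second stock row deliberately lacks a change_p cell
html = '''
<table class="dt640">
<tr><th>header 1</th></tr>
<tr><th>header 2</th></tr>
<tr><td class="code">00001</td><td class="name">Sample A</td><td class="latest">10.0</td><td class="change_p">+1.0%</td></tr>
<tr><td class="code">00002</td><td class="name">Sample B</td><td class="latest">20.0</td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')
stock_rows = soup.find(class_='dt640').find_all('tr')

FIELDS = ['code', 'name', 'latest', 'change_p']

def cell_text(row, css_class):
    """Text of the first cell with this class, or '' if the cell is missing."""
    cell = row.find(class_=css_class)
    return cell.get_text(strip=True) if cell is not None else ''

# Skip the first two rows, as the loop above does, and build one dict per stock
stocks = [{field: cell_text(row, field) for field in FIELDS} for row in stock_rows[2:]]

# Write the result as CSV
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(stocks)
print(buffer.getvalue())
```

To write a real file, open it with `open('stocks.csv', 'w', newline='')` and pass that file object to `csv.DictWriter` instead of the buffer (the file name is only an example).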

# Run the complete Python script
A complete Python script named `get_active_stock.py` is in the script folder of the downloaded course materials.

**To run the script**:
- Open a command-line window / Terminal
- Change to the script folder (for example with `cd`)
- Type the command: `python3 get_active_stock.py`