# HTML

Online Resources:
* https://youtu.be/CKlh1lwe2rY

In this section, we will learn how to load HTML tables directly into Pandas and cover the basics of web scraping, a very popular way to gather data.

# HTML Tutorial
If a webpage's HTML contains a table, we can easily extract it with `Pandas` and `requests`.

In [1]:
import pandas as pd
import requests

Download the HTML source of the page:

In [2]:
url = 'https://www.worldcoinindex.com/'
crypto_url = requests.get(url)
crypto_url

<Response [200]>

We already covered these responses in the APIs section: `<Response [200]>` means the request succeeded. To get the body of the response, we read its `text` attribute.

In [3]:
body = crypto_url.text
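The download step above can be wrapped in a small helper that fails loudly when the request does not succeed. This is only a sketch: the function name `fetch_html`, the `session` parameter, and the 10-second timeout are our own assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the HTML body of `url` as text, raising if the request fails.

    `session` is a hypothetical hook: any object with a requests-style
    `get(url, timeout=...)` method works, which makes the helper easy to test.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so we never try to parse an error page by accident.
    response.raise_for_status()
    return response.text
```

With this helper, `body = fetch_html('https://www.worldcoinindex.com/')` would replace the two steps shown above.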

`body` now holds the full HTML source of our webpage. If that source contains a table, marked by the HTML tag `<table></table>` (the tag used for defining a table in HTML), Pandas can extract it with `read_html()`.

So, whenever you pass an HTML page to Pandas and expect a nice-looking DataFrame back, make sure the page actually contains a table!
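To see what `read_html()` does without depending on a live website, here is a minimal sketch on a small hand-written HTML string. The HTML snippet is made up for illustration, and the example assumes an HTML parser that Pandas can use (such as `lxml`) is installed.

```python
from io import StringIO

import pandas as pd

# A tiny, hand-written page: one paragraph and one <table>.
html = """
<html><body>
<p>Some text outside the table.</p>
<table>
  <tr><th>Name</th><th>Ticker</th></tr>
  <tr><td>Bitcoin</td><td>BTC</td></tr>
  <tr><td>Ethereum</td><td>ETH</td></tr>
</table>
</body></html>
"""

# read_html() returns a list with one DataFrame per <table> it finds.
# Wrapping the string in StringIO gives Pandas a file-like object.
tables = pd.read_html(StringIO(html))
first = tables[0]

print(type(tables), len(tables))
print(first)
```

Only the content of the `<table>` ends up in the DataFrame; the paragraph outside the table is ignored.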

In [4]:
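from io import StringIO  # newer pandas versions expect a file-like object rather than a raw HTML string

crypto_data = pd.read_html(StringIO(body))  # one DataFrame per <table> found
print(type(crypto_data))
print(len(crypto_data))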

<class 'list'>
1


The output shows that `read_html()` returned a `list` with one element, which is our table. We therefore take the first element:

In [5]:
crypto_data = crypto_data[0]
crypto_data.head()

| Unnamed: 0 | # | Unnamed: 1 | Name | Ticker | Last price | % | 24 high | 24 low | Price Charts 7d | 24 volume | # Coins | Market cap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | Ethereum | ETH | $ 2,727.42 | -4.55% | $ 2,858.96 | $ 2,556.62 | | $ 17.47B | 116.15M | $ 316.81B |
| 1 | 2 | | Bitcoin | BTC | $ 37,187 | -5.24% | $ 39,261 | $ 35,634 | | $ 14.66B | 18.72M | $ 696.39B |
| 2 | 3 | | Dogecoin | DOGE | $ 0.385950 | -3.73% | $ 0.401433 | $ 0.350594 | | $ 5.67B | 129.40B | $ 49.94B |
| 3 | 4 | | Ripple | XRP | $ 0.981902 | -6.09% | $ 1.05 | $ 0.917729 | | $ 3.54B | 46.15B | $ 45.31B |
| 4 | 5 | | Binancecoin | BNB | $ 398.36 | -7.10% | $ 429.00 | $ 366.22 | | $ 3.35B | 154.53M | $ 61.55B |
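Our page had a single table, so taking element `0` was enough. When a page contains several tables, the list has one entry per table, and the `match` argument of `read_html()` can narrow the result to tables whose text matches a pattern. The sketch below uses a made-up HTML string with two tables and assumes the `lxml` parser is installed.

```python
from io import StringIO

import pandas as pd

# Made-up page with two unrelated tables.
html = """
<html><body>
<table>
  <tr><th>Name</th><th>Ticker</th></tr>
  <tr><td>Bitcoin</td><td>BTC</td></tr>
</table>
<table>
  <tr><th>Exchange</th><th>Country</th></tr>
  <tr><td>Example Exchange</td><td>Nowhere</td></tr>
</table>
</body></html>
"""

# Without `match`, every table on the page is returned.
all_tables = pd.read_html(StringIO(html))

# With `match`, only tables containing text that matches the pattern are kept.
ticker_tables = pd.read_html(StringIO(html), match="Ticker")

print(len(all_tables), len(ticker_tables))
```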


### What if there is no table in HTML?

If we want to extract information from an HTML page that doesn't contain a table, we need a different approach: scraping. Fortunately, Python has a great package for this called Beautiful Soup. [See here for a simple scraping tutorial](https://www.dataquest.io/blog/web-scraping-tutorial-python/)
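As a rough illustration of what scraping with Beautiful Soup looks like, the sketch below pulls a heading and a list of links out of a small hand-written HTML string. The HTML, the `coin` class name, and the link paths are all invented for this example; a real page needs selectors that match its own structure.

```python
from bs4 import BeautifulSoup

# Made-up page without any <table>: the data lives in a list of links.
html = """
<html><body>
  <h1>Coins</h1>
  <ul>
    <li class="coin"><a href="/coin/bitcoin">Bitcoin</a></li>
    <li class="coin"><a href="/coin/ethereum">Ethereum</a></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser from Python's standard library.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
title = soup.find("h1").get_text(strip=True)

# find_all() returns every matching tag; here, each <li class="coin">.
coins = []
for item in soup.find_all("li", class_="coin"):
    link = item.find("a")
    coins.append({"name": link.get_text(strip=True), "href": link["href"]})

print(title)
print(coins)
```

A list of dictionaries like `coins` can be passed straight to `pd.DataFrame(coins)` if we want to continue working in Pandas.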

# Scraping / Beautiful Soup

In addition to following any and all explicit rules about web scraping posted on a site, it’s also a good idea to follow these best practices:
* Never scrape more frequently than you need to.
* Consider caching the content you scrape so that it’s only downloaded once as you work on the code that filters and analyzes it, rather than re-downloading it every time you run your code.
* Consider building pauses into your code using functions like `time.sleep()` to keep from overwhelming servers with too many requests in too short a timespan.
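The caching and pausing advice above can be sketched as a small helper. Everything here is an assumption made for illustration: the name `cached_fetch`, the `html_cache` folder, the one-second default pause, and the idea of passing the download function in as `fetch`.

```python
import hashlib
import time
from pathlib import Path


def cached_fetch(url, fetch, cache_dir="html_cache", delay_seconds=1.0):
    """Return the page body for `url`, downloading it at most once.

    `fetch` is any function that takes a URL and returns the page text.
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Use a hash of the URL as a safe, unique file name.
    file_name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    cache_file = cache_path / file_name

    # Cache hit: reuse the saved copy and skip the network entirely.
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")

    # Cache miss: pause first so repeated calls do not hammer the server.
    time.sleep(delay_seconds)
    body = fetch(url)
    cache_file.write_text(body, encoding="utf-8")
    return body
```

With `requests`, a call could look like `cached_fetch(url, lambda u: requests.get(u, timeout=10).text)`. Deleting the cache folder forces a fresh download.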