HTMLParser

HTMLParser is a class for scraping and parsing a webpage. It is especially useful for converting an HTML table into a pandas.DataFrame.

Example

Here we scrape a page from Wikipedia, parse it for tables, and convert the first table found into a pandas.DataFrame.

from htmlparser import HTMLParser
import pandas

# Wikipedia page that lists the S&P 500 companies
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
hp = HTMLParser(url)
# scrape the webpage
page = hp.scrap_url()
# extract only the tables from the webpage
element = 'table'
params = {'class': 'wikitable sortable'}
elements = hp.get_page_elements(page, element=element, params=params)
# build a pandas.DataFrame from the first HTML table
df = hp.parse_html_table(elements[0])
print(df.columns.values)

This results in the following output (column headers):

['Symbol' 'Security' 'SEC filings' 'GICS Sector' 'GICS Sub Industry'
 'Headquarters Location' 'Date first added' 'CIK' 'Founded']
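
To illustrate the general idea of turning an HTML table into rows that a pandas.DataFrame can be built from, here is a minimal sketch that uses only the Python standard library. It is not how this package's parse_html_table is implemented; the class name TableRowCollector and the sample HTML are hypothetical and exist only for this illustration. The standard-library parser is imported under an alias to avoid confusion with this package's HTMLParser class.

```python
from html.parser import HTMLParser as StdlibHTMLParser


class TableRowCollector(StdlibHTMLParser):
    """Hypothetical helper: collects the cell text of every <tr> it sees."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a header or data cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up sample table, not real data from the Wikipedia page.
html = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>AAA</td><td>Example Corp</td></tr>
  <tr><td>BBB</td><td>Sample Inc</td></tr>
</table>
"""

collector = TableRowCollector()
collector.feed(html)
header, *records = collector.rows
print(header)
print(records)
```

With pandas installed, pandas.DataFrame(records, columns=header) would then turn these rows into a DataFrame whose column names come from the header row.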
