GitHub - eliaSchenker/HtmlParser: HtmlParser offers an easy way to parse HTML-Files in Python.

HtmlParser

HtmlParser is a simple Python class that helps you parse and interpret HTML files.

How to use:
First import the parser class from its module:

from HtmlParser import HtmlParser

Create a new parser object using the following code:

parser = HtmlParser()

You can either pass the HTML string to the constructor:

parser = HtmlParser("<html></html>")

Or feed it later:

parser = HtmlParser()
parser.feed("<html></html>")

You can then read the list of top-level tags from the parser object:

topLevelTags = parser.topTags

Each tag object exposes its child tags through its children attribute:

children = topLevelTags[0].children  # Get the children of the first top-level tag
myTag = children[0]  # Get the first child
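
Because every tag carries its own children, the whole tree can be walked recursively. The sketch below is only an illustration: `Tag` is a hypothetical stand-in class (not the library's own tag class) that mirrors just the `tag` and `children` attributes described here, so the example runs without HtmlParser installed. With the real parser you would presumably pass `parser.topTags` to `outline` instead of the hand-built tree.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for a parsed tag; mirrors only `tag` and `children`."""
    tag: str
    children: list = field(default_factory=list)


def outline(tags, depth=0):
    """Return one indented line per tag, visiting children depth-first."""
    lines = []
    for t in tags:
        lines.append("  " * depth + t.tag)
        lines.extend(outline(t.children, depth + 1))
    return lines


# Hand-built tree equivalent to <html><body><p></p><p></p></body></html>
top_level_tags = [Tag("html", [Tag("body", [Tag("p"), Tag("p")])])]
print("\n".join(outline(top_level_tags)))
```

Returning a list of lines rather than printing inside the recursion keeps the traversal easy to reuse, for example to write the outline to a file.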

To get every tag in the HTML as a single flat list, use:

allTags = parser.getAllTags()
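
The order in which getAllTags() returns its tags is not stated here. As a rough mental model only, a flat list can be produced from a nested tree with a depth-first traversal. The `Tag` class and the `flatten` function below are hypothetical illustrations of that idea, not the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for a parsed tag; mirrors only `tag` and `children`."""
    tag: str
    children: list = field(default_factory=list)


def flatten(tags):
    """Collect every tag in pre-order (parent before its children)."""
    result = []
    stack = list(reversed(tags))  # reversed so the first tag is popped first
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(reversed(current.children))
    return result


# Hand-built tree equivalent to <html><head></head><body><p></p></body></html>
tree = [Tag("html", [Tag("head"), Tag("body", [Tag("p")])])]
names = [t.tag for t in flatten(tree)]
print(names)
```

An explicit stack is used instead of recursion so that very deeply nested documents do not hit Python's recursion limit.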

To access information from a tag object (its tag name, attributes, and data), use:

print(myTag.tag)
print(myTag.attributes)
print(myTag.data)
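
To make the three fields concrete without depending on HtmlParser itself, here is a minimal sketch that builds a similar tree with the standard library's `html.parser.HTMLParser`. It is not how HtmlParser is implemented. `Node` and `TreeBuilder` are hypothetical names, and storing attributes as a dict is an assumption; the real library may represent them differently.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tag object with the fields described above."""

    def __init__(self, tag, attributes):
        self.tag = tag                # tag name, e.g. "a"
        self.attributes = attributes  # assumed here to be a dict
        self.data = ""                # text found directly inside the tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a nested Node tree; ignores void tags such as <br> for brevity."""

    def __init__(self):
        super().__init__()
        self.top = []    # top-level nodes
        self._open = []  # stack of currently open nodes

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        siblings = self._open[-1].children if self._open else self.top
        siblings.append(node)
        self._open.append(node)

    def handle_endtag(self, tag):
        # Only close the innermost node when the names match.
        if self._open and self._open[-1].tag == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            self._open[-1].data += data


builder = TreeBuilder()
builder.feed('<div id="main"><a href="/home">Home</a></div>')
builder.close()

link = builder.top[0].children[0]
print(link.tag)
print(link.attributes)
print(link.data)
```

The sketch assumes well-formed, properly nested input; a real parser needs extra handling for void elements and mismatched closing tags.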
