HtmlParser
HtmlParser is a simple Python class that parses HTML into a tree of tag objects.
How to use:
First, import the HtmlParser class from its module:
from HtmlParser import HtmlParser
Create a new parser object using the following code:
parser = HtmlParser()
You can either pass the HTML source to the constructor as a string:
parser = HtmlParser("<html></html>")
Or feed it later:
parser = HtmlParser()
parser.feed("<html></html>")
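For illustration, here is a minimal sketch of how a parser class can accept HTML both in its constructor and through feed(). It assumes the class builds on Python's standard html.parser.HTMLParser, which is an assumption and not HtmlParser's actual source. The class SketchParser and its tagNames attribute are hypothetical.

```python
from html.parser import HTMLParser


class SketchParser(HTMLParser):
    """Hypothetical parser that accepts HTML up front or later via feed()."""

    def __init__(self, html=None):
        super().__init__()
        self.tagNames = []  # names of every start tag seen, in document order
        if html is not None:
            # Parsing in the constructor is just an early call to feed()
            self.feed(html)

    def handle_starttag(self, tag, attrs):
        # Called by HTMLParser for each opening tag it encounters
        self.tagNames.append(tag)


# Style 1: pass the HTML to the constructor
eager = SketchParser("<html><body></body></html>")

# Style 2: create an empty parser and feed it later
lazy = SketchParser()
lazy.feed("<html><body></body></html>")

print(eager.tagNames)  # ['html', 'body']
print(lazy.tagNames)   # ['html', 'body']
```

Because the constructor only forwards its argument to feed(), both styles leave the parser in the same state.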
You can then get a list of the top-level tags from the parser's topTags attribute:
topLevelTags = parser.topTags
Each tag object exposes its child tags through its children attribute:
children = topLevelTags[0].children  # Get the children of the first top-level tag
myTag = children[0]  # Get the first child
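A tree of top-level tags and children like this can be built with a stack of currently open tags. The sketch below is hypothetical: TreeParser and its Tag class are illustrative names, not part of HtmlParser, and it assumes every tag in the input is explicitly closed, so a void element such as an unclosed <br> would need extra handling.

```python
from html.parser import HTMLParser


class Tag:
    """Hypothetical tag node: a name plus a list of child Tag objects."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeParser(HTMLParser):
    """Sketch that builds a tag tree, assuming every tag is explicitly closed."""

    def __init__(self, html=None):
        super().__init__()
        self.topTags = []  # tags that have no parent
        self._stack = []   # currently open tags, innermost last
        if html is not None:
            self.feed(html)

    def handle_starttag(self, tag, attrs):
        node = Tag(tag)
        if self._stack:
            # Nested inside the innermost open tag
            self._stack[-1].children.append(node)
        else:
            # Nothing is open, so this is a top-level tag
            self.topTags.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open tag if it matches; ignore stray end tags
        if self._stack and self._stack[-1].tag == tag:
            self._stack.pop()


parser = TreeParser("<html><head></head><body><p></p></body></html>")
topLevelTags = parser.topTags
children = topLevelTags[0].children
print([t.tag for t in topLevelTags])  # ['html']
print([t.tag for t in children])      # ['head', 'body']
```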
To get every tag in the document in a single list, use:
allTags = parser.getAllTags()
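A flat list of all tags can be produced by walking the tree depth-first. This sketch uses a hypothetical flatten_tags function and a hypothetical Tag dataclass, and it assumes parents are listed before their children; the order that getAllTags() actually uses is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical tag node with a name and nested children."""
    tag: str
    children: list = field(default_factory=list)


def flatten_tags(top_tags):
    """Hypothetical helper: flatten a tag tree, parents before children (pre-order)."""
    result = []
    for node in top_tags:
        result.append(node)
        # Recurse into this node's subtree before moving to its next sibling
        result.extend(flatten_tags(node.children))
    return result


# Tree for <html><head><title></title></head><body><p></p></body></html>
tree = [Tag("html", [Tag("head", [Tag("title")]), Tag("body", [Tag("p")])])]
print([t.tag for t in flatten_tags(tree)])
# ['html', 'head', 'title', 'body', 'p']
```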
To access a tag object's information (its tag name, attributes and data), use:
print(myTag.tag)
print(myTag.attributes)
print(myTag.data)
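The sketch below shows where a tag's name, attributes and data can come from when parsing with the standard html.parser module. It is hypothetical: TagInfoParser and its Tag class are illustrative, and storing attributes as a dict and accumulating text into data are assumptions, so HtmlParser may store these fields differently.

```python
from html.parser import HTMLParser


class Tag:
    """Hypothetical record of one tag's name, attributes and text."""

    def __init__(self, tag, attributes):
        self.tag = tag
        self.attributes = attributes
        self.data = ""


class TagInfoParser(HTMLParser):
    """Hypothetical sketch recording name, attributes and text for each tag."""

    def __init__(self):
        super().__init__()
        self.tags = []   # one Tag per start tag, in document order
        self._open = []  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        # html.parser passes attributes as a list of (name, value) pairs;
        # this sketch assumes a dict is the more convenient form
        node = Tag(tag, dict(attrs))
        self.tags.append(node)
        self._open.append(node)

    def handle_endtag(self, tag):
        if self._open and self._open[-1].tag == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open tag
        if self._open:
            self._open[-1].data += data


parser = TagInfoParser()
parser.feed('<a href="/docs" class="link">Read the docs</a>')
parser.close()
myTag = parser.tags[0]
print(myTag.tag)         # a
print(myTag.attributes)  # {'href': '/docs', 'class': 'link'}
print(myTag.data)        # Read the docs
```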