HtmlParser offers an easy way to parse HTML files in Python.

eliaSchenker/HtmlParser

HtmlParser

HtmlParser is a simple Python class that helps you parse and inspect HTML documents.

How to use:
First, import the HtmlParser class from its module:

from HtmlParser import HtmlParser

Create a new parser object using the following code:

parser = HtmlParser()

You can either pass the HTML source as a string to the constructor:

parser = HtmlParser("<html></html>")

Or feed it later:

parser = HtmlParser()
parser.feed("<html></html>")

You can then get a list of the top-level tags from the parser object:

topLevelTags = parser.topTags

Each tag object exposes its child tags through its children attribute:

children = topLevelTags[0].children  # Get the children of the first top-level tag
myTag = children[0]  # Get the first child

If you want every tag in the HTML as a single list, use the following code:

allTags = parser.getAllTags()
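
The relationship between the nested tree (topTags and children) and a flat list of all tags can be illustrated with a small stand-in. The sketch below does not import HtmlParser: Tag and flatten are hypothetical names, and the parent-before-children ordering is only an assumption about what getAllTags() might return.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for a parsed tag (not the library's own class)."""
    tag: str
    attributes: dict = field(default_factory=dict)
    data: str = ""
    children: list = field(default_factory=list)


def flatten(tags):
    """Return every tag in the tree, each parent before its children."""
    result = []
    for t in tags:
        result.append(t)
        result.extend(flatten(t.children))
    return result


# A tiny tree equivalent to:
# <html><body><p>Hi</p><a href="/x">link</a></body></html>
p = Tag("p", data="Hi")
a = Tag("a", {"href": "/x"}, "link")
top_tags = [Tag("html", children=[Tag("body", children=[p, a])])]

names = [t.tag for t in flatten(top_tags)]
print(names)  # ['html', 'body', 'p', 'a']
```

The recursion visits a tag, then everything beneath it, before moving on to the next sibling, so the flat list follows document order.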

To read a tag's name, attributes, and data, use the corresponding attributes of the tag object:

print(myTag.tag)
print(myTag.attributes)
print(myTag.data)
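
To make the three fields concrete, here is a minimal sketch of how a comparable tag tree could be built with the standard library's html.parser module. It is not HtmlParser's implementation: Node, TreeBuilder, and top_tags are hypothetical names, attributes are assumed to be stored as a dict, and the input is assumed to be properly nested with a closing tag for every opening tag.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tag object mirroring the fields used above."""

    def __init__(self, tag, attributes):
        self.tag = tag
        self.attributes = attributes
        self.data = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.top_tags = []  # nodes without a parent
        self._stack = []    # currently open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        siblings = self._stack[-1].children if self._stack else self.top_tags
        siblings.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Assumes matching end tags; void elements such as <br> are not handled.
        if self._stack and self._stack[-1].tag == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text is attached to the innermost open tag.
        if self._stack:
            self._stack[-1].data += data


builder = TreeBuilder()
builder.feed('<div id="main"><p class="intro">Hello</p></div>')
builder.close()

my_tag = builder.top_tags[0].children[0]
print(my_tag.tag)         # p
print(my_tag.attributes)  # {'class': 'intro'}
print(my_tag.data)        # Hello
```

A stack of open tags is enough here because each new tag becomes a child of whichever tag is currently innermost.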
