Webstruct

Webstruct is a library for creating statistical named entity recognition (NER) systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from webpages.

Unlike most NER systems, Webstruct works on HTML data, not only on text data. This makes it possible to define features that use HTML structure, and also to embed annotation results back into the HTML.
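
To illustrate what "features that use HTML structure" can mean, here is a minimal sketch built only on Python's standard-library html.parser. It is not Webstruct's API: the names TokenFeatureExtractor and extract_token_features, and the particular features chosen (parent_tag, inside_link, is_title_case), are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class TokenFeatureExtractor(HTMLParser):
    """Hypothetical helper: collect text tokens with simple HTML-structure features."""

    # Void elements have no closing tag, so they are never pushed on the stack.
    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open, outermost first
        self.tokens = []  # list of (token, feature dict) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        for token in data.split():
            features = {
                "token": token,
                "parent_tag": parent,             # structural feature
                "inside_link": "a" in self.stack,  # structural feature
                "is_title_case": token.istitle(),  # plain-text feature
            }
            self.tokens.append((token, features))


def extract_token_features(html):
    """Return (token, features) pairs for every whitespace-separated token."""
    parser = TokenFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    page = '<div><p>Visit <a href="/x">Acme Corp</a> today</p></div>'
    for token, features in extract_token_features(page):
        print(token, features)
```

A plain-text NER system would see only the words "Visit Acme Corp today". Here each token also records that "Acme" and "Corp" sit inside a link, which is the kind of signal a statistical model can use when deciding whether a span is an organization name.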

Read the docs for more info.

License is MIT.

Contributing

Source code: https://github.com/scrapinghub/webstruct
Bug tracker: https://github.com/scrapinghub/webstruct/issues

To run tests, make sure tox is installed, then run tox from the source root.
