
jyyyyylim/htmlparser-cpp


htmlparser-cpp

babby's first C++ project. refactor in progress

raison d'etre

- i needed parsing capability for another one of my projects

progress

  • Just Works
  • supports CJK character sets
  • given an html5-compliant input file, accurately produces a doubly linked general tree representation of the DOM
  • accurately preserves ALL tag attributes
  • shouldn't discriminate against even the most horrendously formatted markup
  • any facility whatsoever to process the parsed tree
  • parses at reasonable speed
  • support of emmet-like input rules to the parser
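
for readers wondering what a "doubly linked general tree" could look like, here is a minimal sketch. it is not the repository's actual code: `Node`, `append`, `make_element` and `count_tag` are hypothetical names, and the layout (owning child pointers plus a non-owning parent pointer, attributes kept in source order) is only one reasonable assumption.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type; names are illustrative, not taken from the repository.
struct Node {
    std::string tag;   // element name, e.g. "div"; left empty for text nodes
    std::string text;  // text content, used by text nodes
    std::vector<std::pair<std::string, std::string>> attrs;  // attributes in source order
    Node* parent = nullptr;                       // non-owning link back up the tree
    std::vector<std::unique_ptr<Node>> children;  // owning links down the tree

    // Takes ownership of `child`, wires its parent pointer, and returns a
    // raw pointer so the caller can keep building below it.
    Node* append(std::unique_ptr<Node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Convenience constructor for an element node (hypothetical helper).
std::unique_ptr<Node> make_element(std::string tag) {
    auto node = std::make_unique<Node>();
    node->tag = std::move(tag);
    return node;
}

// Example of processing the parsed tree: a depth-first count of elements named `tag`.
std::size_t count_tag(const Node& root, const std::string& tag) {
    std::size_t n = (root.tag == tag) ? 1 : 0;
    for (const auto& child : root.children) {
        n += count_tag(*child, tag);
    }
    return n;
}
```

the parent pointer is what makes the tree "doubly linked": a parser can walk back up when it meets a closing tag, and the `unique_ptr` children mean the whole tree is freed when the root goes out of scope.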

limitations

- a 1-week-old cpp dev birthed this into existence. do point out any better approach to the spaghetti that is the parsing logic
- will break down at javascript embeds if the raw string </script> is involved. clueless as to how to deal with it at the moment
- built with no consideration given to handling unsafe input. use recklessly at your own risk
- no "parse exceptions" of any kind implemented... yet?
- discards certain data such as script and stylesheet embeds
- excessive whitespace within the content is not collapsed, though it's, at worst, an annoyance that doesn't affect the accuracy of the structure
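
on the `</script>` limitation: in the HTML standard, script content is raw text that ends at the first `</script` followed by whitespace, `/` or `>`, so a literal `</script>` inside a JS string ends the element in browsers too (authors usually write `<\/script>` instead). a parser can therefore skip to that closing tag without tokenizing what is inside. below is a hedged sketch of that scan. `find_raw_text_end` is a hypothetical helper, not part of this project, and it deliberately ignores the standard's `<!--` escaped-script states.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns the index of the '<' that begins the closing tag for a raw-text
// element such as script or style, or std::string_view::npos if there is none.
// Assumptions: `tag` is lowercase ASCII and the search starts at `from`.
std::size_t find_raw_text_end(std::string_view src, std::string_view tag,
                              std::size_t from = 0) {
    for (std::size_t i = from; i + 2 + tag.size() <= src.size(); ++i) {
        if (src[i] != '<' || src[i + 1] != '/') continue;

        // Compare the tag name case-insensitively.
        bool match = true;
        for (std::size_t k = 0; k < tag.size(); ++k) {
            const auto c = static_cast<unsigned char>(src[i + 2 + k]);
            if (static_cast<char>(std::tolower(c)) != tag[k]) {
                match = false;
                break;
            }
        }
        if (!match) continue;

        // The name must be followed by whitespace, '/' or '>', otherwise
        // something like "</scripty>" would end the element too early.
        const std::size_t after = i + 2 + tag.size();
        if (after == src.size()) continue;  // name runs into end of input
        const char t = src[after];
        // '\r' is accepted because this sketch does no newline normalization.
        if (t == '>' || t == '/' || t == ' ' || t == '\t' ||
            t == '\n' || t == '\f' || t == '\r') {
            return i;
        }
    }
    return std::string_view::npos;
}
```

everything between the end of the opening tag and the returned index can be stored as one text node (or dropped, as the parser currently does), so operators like `<` inside the script never reach the tag-parsing logic.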

About

reasonably robust html parser, babby's first
