pageman/Aldo

 
 

Aldo

Simple, yet relatively advanced HTML scraper.

Most page scrapers built in PHP are tedious to use and often return unexpected results. They iterate through the HTML again for each independent "DOM extraction", which makes them slow. Once you have the results, you still need to manipulate and sort the data yourself, which can be difficult without knowledge of JavaScript.

Aldo aims to make it almost effortless to fetch results from a remote website.
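One way to avoid walking the HTML again for every extraction is to parse it once and build a lookup index. The sketch below illustrates that idea with PHP's built-in `DOMDocument`; it is not Aldo's actual API. The function name `buildIndex` and the array layout are assumptions made for illustration.

```php
<?php
// Hypothetical sketch, not Aldo's API: parse once, index by id and class.

function buildIndex(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $index = ['id' => [], 'class' => []];

    // Single pass over every element in document order.
    foreach ($doc->getElementsByTagName('*') as $el) {
        $attributes = [];
        foreach ($el->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }

        $entry = [
            'tag'        => $el->tagName,
            'text'       => $el->textContent,
            'attributes' => $attributes,
        ];

        if (isset($attributes['id'])) {
            // If an id is duplicated, the last element wins.
            $index['id'][$attributes['id']] = $entry;
        }

        if (isset($attributes['class'])) {
            // An element may carry several space-separated classes.
            $classes = preg_split('/\s+/', $attributes['class'], -1, PREG_SPLIT_NO_EMPTY);
            foreach ($classes as $class) {
                $index['class'][$class][] = $entry;
            }
        }
    }

    return $index;
}

// Usage: every lookup after the single parse is a plain array access.
$html  = '<div id="main" class="box wide"><a class="link" href="/a">A</a><a class="link" href="/b">B</a></div>';
$index = buildIndex($html);

echo $index['id']['main']['tag'], "\n";
echo count($index['class']['link']), "\n";
```

Storing plain arrays rather than DOM nodes keeps the index independent of the parser, at the cost of copying each element's text.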

TODO

  • HTTP Requests
  • Element Manager
  • Selectors for ID, class (TODO: and other types)
  • Sorting
  • Filtering (getting emails)
  • Rebuild HTML
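The "Filtering (getting emails)" and "Sorting" items could look roughly like the sketch below. It works on any string, uses only built-in PHP functions, and the function name `extractEmails` is hypothetical; the README does not describe how Aldo implements this.

```php
<?php
// Hypothetical sketch, not Aldo's API: pull unique, sorted email addresses out of text.

function extractEmails(string $text): array
{
    // Loose candidate pattern; filter_var() applies the stricter check afterwards.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $matches);

    $valid = array_filter(
        $matches[0],
        static fn (string $candidate): bool => filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false
    );

    // Drop duplicates, reindex, and sort alphabetically.
    $emails = array_values(array_unique($valid));
    sort($emails, SORT_STRING);

    return $emails;
}

$emails = extractEmails('Contact bob@example.com or alice@example.org, again bob@example.com.');
print_r($emails);
```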

Small TODO

  • Parent/children
  • Set value of element, instead of creating a new array for value
  • Handle HTML empty elements: input, br, etc
  • Do not include comments in sequence
  • Alias functions for certain attributes; href => link(), src => source(), value => val(), etc
  • Support multiple classes in element
  • Turn arrays into objects
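The alias idea (href => link(), src => source(), value => val()) could be sketched with PHP's `__call` magic method. The `Element` class, its constructor, and the `attr()` method are assumptions for illustration only, not the project's actual design. The constructor syntax requires PHP 8.0 or later.

```php
<?php
// Hypothetical sketch, not Aldo's API: alias methods that map onto attributes.

final class Element
{
    // Alias method name => attribute name, following the pairs in the list above.
    private const ALIASES = [
        'link'   => 'href',
        'source' => 'src',
        'val'    => 'value',
    ];

    /** @param array<string, string> $attributes */
    public function __construct(private array $attributes)
    {
    }

    // Returns the attribute value, or null when the attribute is absent.
    public function attr(string $name): ?string
    {
        return $this->attributes[$name] ?? null;
    }

    // Routes calls such as $el->link() to attr('href').
    public function __call(string $method, array $args): ?string
    {
        if (!isset(self::ALIASES[$method])) {
            throw new BadMethodCallException("Unknown method: $method");
        }

        return $this->attr(self::ALIASES[$method]);
    }
}

$anchor = new Element(['href' => 'https://example.com/', 'class' => 'nav']);
echo $anchor->link(), "\n";
```

A lookup table keeps new aliases to a one-line change, though explicit methods would be easier for IDEs and static analysis to follow.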

About

Revamp of the previous page scraper, "Damon".

Languages

  • PHP 98.1%
  • HTML 1.9%