a fully customizable web contents crawler for collecting ml dataset

gridaco/contents-crawler

contents-crawler

A fully customizable web content crawler for collecting ML datasets.

Packages

  • text crawler (general-purpose text crawler)
  • classified text crawler (crawls text together with the UI element that contains it, such as a button or an input placeholder)
  • image crawler
  • screenshot crawler
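
To make the "classified text" idea concrete, here is a minimal sketch using only the Python standard library. It is not this project's implementation: the class name `ClassifiedTextParser`, the function `classify_text`, and the label names (`text`, `button`, `input_placeholder`) are all hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class ClassifiedTextParser(HTMLParser):
    """Collects (label, text) pairs; label names the UI element holding the text.

    Hypothetical illustration, not the crawler shipped in this repository.
    """

    def __init__(self):
        super().__init__()
        self.records = []        # list of (label, text) tuples
        self._in_button = False  # True while inside a <button> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "button":
            self._in_button = True
        elif tag == "input" and attrs.get("placeholder"):
            # Placeholder text lives in an attribute, not in a text node.
            self.records.append(("input_placeholder", attrs["placeholder"]))

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        label = "button" if self._in_button else "text"
        self.records.append((label, text))


def classify_text(html):
    """Return the (label, text) pairs found in an HTML string."""
    parser = ClassifiedTextParser()
    parser.feed(html)
    parser.close()
    return parser.records


sample = '<p>Welcome</p><input placeholder="Email"><button>Sign up</button>'
print(classify_text(sample))
```

Each record pairs a piece of text with the kind of element it came from, which is the kind of labeling an ML dataset of UI text would likely need.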

Contribution

This project follows the general Bridged contributing guidelines.

Development

The crawlers are built with Scrapy on Python 3. Selenium support is planned for collecting screenshots and for crawling client-side rendered apps.

Run it on your own

(WIP) A tutorial will be provided soon.

No, I just want the ready-made dataset.

Go to ui-dataset for an ML-ready dataset.
