
princesaroj111/WebScraper


WebScraper

Smart Web Scraper

Code structure

Part 1 of the assignment: see the package src/main/java/TreeNode.

Part 2 of the assignment: see the package src/main/java/WebScrapper.

Sample HTML payloads for testing are in src/main/resources.

Notes:

  1. Each of the two packages contains its own Main class; run it to see the result.
  2. If a grid is not being detected, lower SIMILARITY_THRESHOLD (currently 0.80).
  3. In the WebScrapper package, each grid is returned as a list of "a" elements. This can easily be changed to return only the links by calling getAttr("href"). For readability, the hrefs of each grid are printed.
  4. One of the similarity scores given in the assignment (the last one) is incorrect; the code returns the correct score.
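The threshold-based grid detection described in notes 2 and 3 can be illustrated with a minimal, self-contained sketch. It does not reproduce the repository's code. The Node class, the helpers paths, similarity, grids and collectHrefs, and the similarity measure itself (a Jaccard index over tag paths) are all hypothetical stand-ins, and the assignment's own similarity score may be defined differently. Siblings whose score against a grid's first member reaches the threshold (0.80, as above) are grouped into a grid, and the href of every "a" element in that grid is then collected.

```java
import java.util.*;

public class GridSketch {
    // Hypothetical minimal DOM node; the repository's own TreeNode/element classes may differ.
    static final class Node {
        final String tag;
        final Map<String, String> attrs = new HashMap<>();
        final List<Node> children = new ArrayList<>();
        Node(String tag) { this.tag = tag; }
        Node attr(String key, String value) { attrs.put(key, value); return this; }
        Node add(Node... kids) { children.addAll(Arrays.asList(kids)); return this; }
    }

    // Assumed name and default, mirroring the README's SIMILARITY_THRESHOLD of 0.80.
    static final double SIMILARITY_THRESHOLD = 0.80;

    // Collect the tag path (e.g. "li/a") of every node in the subtree rooted at n.
    static void paths(Node n, String prefix, Set<String> out) {
        String p = prefix.isEmpty() ? n.tag : prefix + "/" + n.tag;
        out.add(p);
        for (Node c : n.children) paths(c, p, out);
    }

    // Hypothetical similarity: Jaccard index (|A ∩ B| / |A ∪ B|) of the two tag-path sets.
    static double similarity(Node a, Node b) {
        Set<String> pa = new HashSet<>();
        Set<String> pb = new HashSet<>();
        paths(a, "", pa);
        paths(b, "", pb);
        Set<String> inter = new HashSet<>(pa);
        inter.retainAll(pb);
        Set<String> union = new HashSet<>(pa);
        union.addAll(pb);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // A child joins the first group whose first member it matches at or above the threshold.
    static List<List<Node>> grids(Node parent, double threshold) {
        List<List<Node>> groups = new ArrayList<>();
        for (Node child : parent.children) {
            List<Node> target = null;
            for (List<Node> group : groups) {
                if (similarity(group.get(0), child) >= threshold) { target = group; break; }
            }
            if (target == null) { target = new ArrayList<>(); groups.add(target); }
            target.add(child);
        }
        // Only repeated structures (two or more similar siblings) count as a grid.
        groups.removeIf(g -> g.size() < 2);
        return groups;
    }

    // Depth-first, in document order: gather the href of every "a" element under n.
    static void collectHrefs(Node n, List<String> out) {
        if (n.tag.equals("a") && n.attrs.containsKey("href")) out.add(n.attrs.get("href"));
        for (Node c : n.children) collectHrefs(c, out);
    }

    public static void main(String[] args) {
        Node ul = new Node("ul").add(
            new Node("li").add(new Node("a").attr("href", "/item/1")),
            new Node("li").add(new Node("a").attr("href", "/item/2")),
            new Node("li").add(new Node("a").attr("href", "/item/3")),
            new Node("p"));
        for (List<Node> grid : grids(ul, SIMILARITY_THRESHOLD)) {
            List<String> links = new ArrayList<>();
            for (Node item : grid) collectHrefs(item, links);
            System.out.println(links); // prints [/item/1, /item/2, /item/3]
        }
    }
}
```

With a measure like this, lowering the threshold lets siblings whose markup differs slightly fall into the same grid, which is the effect note 2 relies on. Returning the "a" nodes themselves (as the repository does) instead of their hrefs would only require collecting the Node objects in collectHrefs.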
