Smart Web Scraper
Part 1 of the assignment: see the package src/main/java/TreeNode.
Part 2 of the assignment: see the package src/main/java/WebScrapper.
Sample HTML payloads for testing are in src/main/resources.
- Each of the two packages has its own Main class; run it to see the results.
- If a grid is not being detected, lower SIMILARITY_THRESHOLD. It is currently 0.80.
- For WebScrapper, each grid is returned as a list of "a" elements. It can easily be changed to return only the href values using getAttr("href"). For readability, the hrefs are printed grid by grid.
- One of the similarity scores given in the assignment (the last one) is incorrect; the code here returns the correct answer.
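
To illustrate why lowering SIMILARITY_THRESHOLD can make a missed grid show up, here is a minimal sketch. It does not reproduce the code in this repository: the class name ThresholdSketch, the methods similarity and sameGrid, and the position-by-position metric are all assumptions made for the example. Only the constant name and its value of 0.80 come from the notes above.

```java
import java.util.List;

class ThresholdSketch {
    // Value quoted in this README; lower it if a grid is being missed.
    static final double SIMILARITY_THRESHOLD = 0.80;

    // Hypothetical metric (an assumption, not the project's own):
    // the fraction of positions at which two child-tag sequences agree,
    // measured against the longer of the two sequences.
    static double similarity(List<String> a, List<String> b) {
        int longer = Math.max(a.size(), b.size());
        if (longer == 0) {
            return 1.0; // two empty sequences are treated as identical
        }
        int shorter = Math.min(a.size(), b.size());
        int matches = 0;
        for (int i = 0; i < shorter; i++) {
            if (a.get(i).equals(b.get(i))) {
                matches++;
            }
        }
        return (double) matches / longer;
    }

    // Two candidates belong to the same grid when their score reaches the threshold.
    static boolean sameGrid(List<String> a, List<String> b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        List<String> card = List.of("a", "img", "span", "p");
        List<String> variant = List.of("a", "img", "span", "div");

        // 3 of 4 positions match, so the score is 0.75.
        System.out.println(similarity(card, variant));
        // Rejected at 0.80, accepted once the threshold is lowered to 0.70.
        System.out.println(sameGrid(card, variant, SIMILARITY_THRESHOLD));
        System.out.println(sameGrid(card, variant, 0.70));
    }
}
```

With these inputs the two items differ in one of four positions, so they are split at 0.80 and grouped at 0.70. A lower threshold finds more grids but also raises the chance of grouping unrelated elements.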