yongxiangtang/JavaWebScrapper


BrightEdge Java WebScrapper

Summary 
-------
This Java web scraper is a robust text scraper that connects to a page on
www.shopping.com and returns a result for a given keyword. It uses the Jsoup
HTML parser to fetch pages from www.shopping.com. The program supports two
queries. The first returns the total number of results for a given search
keyword. The second returns the detailed information of every result on a given
page of the search. The program also handles invalid arguments, invalid page
numbers, and searches that return no page, either because the keyword has no
matches or because the page number exceeds the maximum number of result pages.

Queries:
    Query 1: (requires a single argument)
    java -jar Assignment.jar <keyword> 
    (e.g. java -jar Assignment.jar "baby strollers")
    
    Query 2: (requires two arguments)
    java -jar Assignment.jar <keyword> <page number> 
    (e.g. java -jar Assignment.jar "baby strollers" 2)
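
The choice between the two queries depends only on the number of arguments. A
minimal sketch of that dispatch is shown here; the class name QuerySelector, the
enum constants, and the usage message are illustrative assumptions, not names
taken from the program itself.

```java
// Hypothetical helper: the real program may organize this differently.
final class QuerySelector {
    /** The two queries described above. */
    enum Query { TOTAL_RESULTS, PAGE_DETAILS }

    /** Picks the query from the number of command-line arguments. */
    static Query select(String[] args) {
        switch (args.length) {
            case 1:
                return Query.TOTAL_RESULTS;  // Query 1: <keyword>
            case 2:
                return Query.PAGE_DETAILS;   // Query 2: <keyword> <page number>
            default:
                // Assumed usage text; the original error message is not documented.
                throw new IllegalArgumentException(
                        "Usage: java -jar Assignment.jar <keyword> [page number]");
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(select(args));
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```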


Approach
--------
The program can be divided into two sections. The first section handles user
input. Query 1 is the default query: if the user provides one argument, the
program runs Query 1. It parses the argument to obtain the keyword and appends
the keyword to the search URL. After calling Jsoup's connect method, the program
obtains a Document object, which represents the web page returned for that
keyword. The getTotalResultNumber() method then locates the total number of
search results in the page. Finally, the program prints the result to the
screen.
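
Appending the keyword to the search URL can be sketched as follows. The exact
URL pattern the program uses is not stated here, so the base URL is passed in as
a parameter; the class name SearchUrlBuilder and the method forKeyword are
hypothetical. Keywords with spaces, such as "baby strollers", need to be
URL-encoded before they are appended.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: not part of the original program.
final class SearchUrlBuilder {
    /**
     * Appends the URL-encoded keyword to the given base URL.
     * The base URL (including its query parameter name) is an assumption
     * supplied by the caller.
     */
    static String forKeyword(String baseUrl, String keyword) {
        try {
            // URLEncoder turns spaces into '+' and escapes reserved characters.
            return baseUrl + URLEncoder.encode(keyword.trim(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder base URL for illustration only.
        String url = forKeyword("http://example.test/search?q=", "baby strollers");
        System.out.println(url);
        // With Jsoup on the classpath, the page could then be fetched with
        // Jsoup.connect(url).get(), which returns a Document.
    }
}
```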

If the user provides two arguments, the second argument is the page number. The
page number must be a positive integer. If the user provides 0 or a negative
number, the program stops with an InvalidPageNumber exception. If the user
provides a number that exceeds the maximum page number, shopping.com returns a
no-match page; the program detects this page and exits with a PageNotFound
exception. If the user provides a valid keyword and page number, the program
runs Query 2, and Jsoup connects to shopping.com with a different URL that
includes both the keyword and the page number. The program then scrapes the
page to find each product's title, price, shipping price, and vendor. Once the
required information for a product has been gathered, a Product object is
created. All Product objects are stored in an ArrayList. Finally, the program
prints every product in the list.
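
The parts of Query 2 that do not touch the network can be sketched as below:
validating the page number and collecting Product objects in an ArrayList. The
names InvalidPageNumber and Product come from the description above, but their
fields, constructors, superclass, and output format are assumptions, as are the
wrapper class PageQuery and the method parsePageNumber. The product values in
main are made-up placeholders, not scraped data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class: the real program's layout is not documented.
final class PageQuery {
    /** Assumed to be unchecked here; the original may use a checked exception. */
    static class InvalidPageNumber extends IllegalArgumentException {
        InvalidPageNumber(String message) { super(message); }
    }

    /** Holds the four fields scraped for each product. */
    static final class Product {
        final String title;
        final String price;
        final String shippingPrice;
        final String vendor;

        Product(String title, String price, String shippingPrice, String vendor) {
            this.title = title;
            this.price = price;
            this.shippingPrice = shippingPrice;
            this.vendor = vendor;
        }

        @Override
        public String toString() {
            // Assumed output format.
            return title + " | " + price + " | shipping: " + shippingPrice + " | " + vendor;
        }
    }

    /** Accepts only positive integers; rejects 0, negatives, and non-numbers. */
    static int parsePageNumber(String raw) {
        int page;
        try {
            page = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new InvalidPageNumber("Page number must be an integer: " + raw);
        }
        if (page <= 0) {
            throw new InvalidPageNumber("Page number must be positive: " + page);
        }
        return page;
    }

    public static void main(String[] args) {
        int page = parsePageNumber("2");
        List<Product> products = new ArrayList<>();
        // Placeholder values; a real run would fill these from the scraped page.
        products.add(new Product("Example title", "$0.00", "Free", "Example vendor"));
        System.out.println("Page " + page);
        for (Product p : products) {
            System.out.println(p);
        }
    }
}
```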

Command Line Argument Example
-----------------------------
Query 1: java -jar Assignment.jar "digital camera"
Query 2: java -jar Assignment.jar "digital camera" 2

Contact
-------
Author: Yongxiang Tang
Date: Oct 22, 2014
Email: yongxiat@usc.edu



About

This scraper was designed and implemented for the BrightEdge online assignment.
