Web Scraping Assessment

See the assessment requirements for additional context about the project.

Instructions

  1. Install dependencies (npm install)
  2. Compile the program (npm run build)
  3. Run the program (npm start)

Overview

This project is a CLI that automates gathering a user's Amazon order history data. The user is prompted to enter their Amazon credentials and is then given the option to either retrieve their most recent orders or retrieve orders that match specific search strings. A looping dialogue allows the user to continue retrieving and searching order data until they choose to exit.

The dialogue-style CLI design was chosen because the requirements imply the user's presence to manually handle MFA scenarios.
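
The following is a minimal sketch of the looping dialogue, assuming Node's built-in readline/promises module; the actual CLI may use a different prompt library, and the menu options and scraping calls are illustrative placeholders.

```ts
import * as readline from "node:readline/promises";
import { stdin, stdout } from "node:process";

async function main(): Promise<void> {
  const rl = readline.createInterface({ input: stdin, output: stdout });
  // Loop until the user chooses to exit.
  while (true) {
    const choice = await rl.question(
      "Choose an option: (1) recent orders, (2) search orders, (3) exit: "
    );
    if (choice === "1") {
      // A retrieveRecentOrders() step would scrape the most recent orders (hypothetical).
      console.log("Retrieving recent orders...");
    } else if (choice === "2") {
      // A searchOrders() step would scrape orders matching search strings (hypothetical).
      console.log("Searching orders...");
    } else {
      break;
    }
  }
  rl.close();
}

void main();
```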

Sign-in

The user is prompted to enter their Amazon email address and password. The email is not currently validated for format because the sign-in flow was intended to be generalized for future extension of the program to other websites. A failed sign-in gracefully closes the program.
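
As a rough sketch of this step, assuming Puppeteer as the browser driver (the project may use a different tool), the sign-in could look like the following. The landing URL and form selectors are assumptions based on Amazon's current sign-in page and may change.

```ts
import puppeteer, { type Page } from "puppeteer";

async function signIn(email: string, password: string): Promise<Page> {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  try {
    // URL and selectors below are illustrative assumptions, not guaranteed stable.
    await page.goto("https://www.amazon.com/gp/sign-in.html");
    await page.type("#ap_email", email);
    await page.click("#continue");
    await page.waitForSelector("#ap_password");
    await page.type("#ap_password", password);
    await page.click("#signInSubmit");
    await page.waitForNavigation();
    return page;
  } catch (err) {
    // A failed sign-in closes the browser and exits gracefully.
    console.error("Sign-in failed:", err);
    await browser.close();
    process.exit(1);
  }
}
```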

Retrieve Order History

The user is prompted to enter the maximum number of recent orders to retrieve. Orders are retrieved in reverse chronological order, so the most recent orders come first. The default results limit is 10 orders.
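
A small sketch of how the limit input could be handled, with the default of 10 described above (the function name is illustrative, not taken from the source):

```ts
// Parse the user's results limit, falling back to 10 when the input
// is blank or not a positive integer.
function parseResultsLimit(input: string, fallback = 10): number {
  const n = Number.parseInt(input.trim(), 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// e.g. parseResultsLimit("") === 10, parseResultsLimit("25") === 25
```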

Search Order History

The user is prompted to enter a comma-separated list of search strings, and the program retrieves all orders that match each search string. The user is then prompted for additional search strings, and the program continues retrieving matching orders until the user chooses to exit.
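
A minimal sketch of turning the comma-separated input into individual search strings, each of which would drive one pass over the order-history search (function name is illustrative):

```ts
// Split on commas, trim whitespace, and drop empty entries.
function parseSearchTerms(input: string): string[] {
  return input
    .split(",")
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
}

// e.g. parseSearchTerms("usb cable, batteries") -> ["usb cable", "batteries"]
```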

Caveats

Amazon's order history is organized around "orders": a single purchase instance that may contain multiple items. Because of that, the order price listed is the total for all items in the order. Even clicking into the "order details" page for an individual item listed within order history search results displays the total order price and lists all items for that order.

Retrieving individual item prices would require a further step of loading the order invoices and parsing the item-price pairs. In the case of search results, item prices would also need to be filtered from the invoice data to include only the items matched by the search.

I chose to stop the scraping at the order level and include the total order price along with a list of item descriptions and page links. For an assessment project, I felt that was a reasonable stopping point that sufficiently demonstrated my abilities while meeting the rough requirements provided.
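
The order-level data described above could be represented roughly as follows; the field names here are my own, not necessarily those used in the source.

```ts
// One item within an order: its description and a link to its page.
interface OrderItem {
  description: string;
  link: string;
}

// One order: the total for all items in the order, plus the item list.
interface Order {
  orderDate: string;
  orderTotal: string; // total order price, not per-item prices
  items: OrderItem[];
}
```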

Next Steps

Were this a paid/production project, the following would be my next steps:

  1. Scrape individual item prices from order invoices
  2. Add a thorough set of feature tests
  3. Deduplicate search results when multiple matched items belong to the same order
  4. Add performance-based batch limits when launching many "order details" pages simultaneously (see the sketch after this list)
  5. Troubleshoot cases where order data is "not found" but expected
  • This may be isolated to Whole Foods grocery orders
  6. Add an option to limit the total (or individual) number of search results
  7. Add an option to use a .env file to configure login credentials
  8. Add an option to save results to a JSON file
  9. Add further user input validation (e.g. email format)
  10. Add an option to retrieve orders from a specific date range
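
For step 4, a hypothetical sketch of batch-limited processing: open "order details" pages a few at a time rather than all at once. The batch size and the worker function are illustrative assumptions.

```ts
// Run the worker over the inputs in fixed-size batches, awaiting each
// batch before starting the next, to cap concurrent browser pages.
async function processInBatches<T, R>(
  inputs: T[],
  batchSize: number,
  worker: (input: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < inputs.length; i += batchSize) {
    const batch = inputs.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```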
