The LLM Fetcher is a Python script that automates the task of fetching user-level order details from the Amazon website. By leveraging Selenium, the script navigates to the order history page, extracts key order details from each historical order, and saves them in a structured format.
Features:
- Automated navigation to the Amazon order history page.
- Extraction of order details such as order number, product names, quantities, prices, and delivery status.
- Saving of order details in raw HTML files with a structured naming convention.
- Structured storage of order details in JSON or CSV format.
- Secure handling of user authentication credentials.
- Robust handling of unexpected events and website changes.
For more information, see Intuition-and-Proposed-Solution.md.
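The structured-storage features listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the field names, the `save_orders` and `raw_html_name` helpers, and the output file names are all assumptions chosen to mirror the feature list.

```python
import csv
import json
from pathlib import Path

# Hypothetical field names that mirror the feature list; the real script may differ.
FIELDS = ["order_number", "product_name", "quantity", "price", "delivery_status"]


def raw_html_name(order_number, page=1):
    """Build a predictable file name for a saved raw HTML page (assumed convention)."""
    return f"order_{order_number}_page{page}.html"


def save_orders(orders, out_dir):
    """Write a list of order dicts to orders.json and orders.csv inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    json_path = out / "orders.json"
    json_path.write_text(json.dumps(orders, indent=2), encoding="utf-8")

    csv_path = out / "orders.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        # Keys not listed in FIELDS are skipped; missing keys become empty cells.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(orders)

    return json_path, csv_path
```

Writing both formats from the same list of dicts keeps the JSON and CSV outputs in sync, and a fixed column list keeps the CSV stable even if an order is missing a field.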
-
Clone this repository using:
git clone https://github.com/safffrron/LLM-Scraper.git
For more information on cloning, refer to https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository
-
Install Selenium by running this in your terminal or Python notebook:
pip install selenium
Other required dependencies, such as the LLM model, are installed automatically during execution.
-
You may encounter an error while logging in to Amazon, specifically when a captcha page appears. In that case, enter the captcha manually and press Enter. If the problem persists, sign in through the browser manually and rerun the block of code. This should resolve the error.
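The manual captcha step can be pictured as a small pause-and-retry loop. The sketch below is an assumption about how such a pause could look, not the script's actual logic: `looks_like_captcha` and `wait_for_manual_captcha` are hypothetical helpers, and the check for the word "captcha" in the page HTML is only a heuristic. The `driver` argument needs nothing more than the `page_source` attribute that a Selenium WebDriver exposes.

```python
def looks_like_captcha(page_source):
    """Heuristic (assumed): the page is a captcha page if its HTML mentions one."""
    return "captcha" in page_source.lower()


def wait_for_manual_captcha(driver, prompt=input, max_attempts=3):
    """Pause until the user has solved the captcha in the browser window.

    Returns True once no captcha is detected, or False if the captcha is
    still present after max_attempts prompts.
    """
    for _ in range(max_attempts):
        if not looks_like_captcha(driver.page_source):
            return True
        # Block until the user confirms they have solved the captcha.
        prompt("Captcha detected. Solve it in the browser, then press Enter: ")
    return not looks_like_captcha(driver.page_source)
```

If the function returns False, signing in manually in the browser and rerunning the code block is the fallback described above.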
-
This scraper uses GPT-Neo, a free, openly available LLM loaded through the Hugging Face transformers library. To improve performance, you can replace it with a stronger model of your own, such as GPT-4.
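One way to keep the model swappable is to hide it behind a plain text-in, text-out callable. The sketch below assumes this design; it is not taken from the repository. The checkpoint name `EleutherAI/gpt-neo-1.3B`, the prompt wording, and the helpers `make_gpt_neo_generator`, `build_prompt`, and `extract_order_details` are all illustrative assumptions.

```python
def make_gpt_neo_generator(model_name="EleutherAI/gpt-neo-1.3B"):
    """Return a prompt -> text callable backed by a Hugging Face pipeline.

    The checkpoint name is an assumption; the script may use another size.
    """
    from transformers import pipeline  # imported lazily; requires transformers

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt):
        result = pipe(prompt, max_new_tokens=128)
        return result[0]["generated_text"]

    return generate


def build_prompt(order_text):
    """Wrap raw order text in a simple extraction instruction (assumed wording)."""
    return (
        "Extract the order number, product names, quantities, prices, "
        "and delivery status from the following order text.\n\n"
        f"{order_text}\n\nDetails:"
    )


def extract_order_details(order_text, generate):
    """Ask any prompt -> text callable for the order details."""
    prompt = build_prompt(order_text)
    completion = generate(prompt)
    # Text-generation pipelines echo the prompt by default, so strip it if present.
    if completion.startswith(prompt):
        completion = completion[len(prompt):]
    return completion.strip()
```

With this shape, replacing GPT-Neo means passing a different `generate` callable, for example one that wraps a hosted model's API, while the extraction code stays unchanged.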
-
When fetching the order history, the script only considers the default page that appears. To include other pages when a next-page button is present, uncomment the section marked for this in the code.
To run from the terminal, first cd into this repository and execute:
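Following the next-page button amounts to a loop that extracts the current page, looks for the button, and clicks it until it is gone. The sketch below is a hedged illustration of that loop rather than the commented-out section itself: `find_next_link`, `collect_all_pages`, and the CSS selector are assumptions, and the selector in particular should be checked against the live page markup.

```python
def find_next_link(driver):
    """Return the next-page link element, or None on the last page."""
    from selenium.webdriver.common.by import By  # imported lazily; requires selenium

    # Assumed selector for the pagination control; verify it on the real page.
    links = driver.find_elements(By.CSS_SELECTOR, "ul.a-pagination li.a-last a")
    return links[0] if links else None


def collect_all_pages(driver, extract_page, find_next=find_next_link, max_pages=50):
    """Run extract_page(driver) on each page, clicking 'next' until it disappears.

    max_pages caps the loop so that a page-structure change cannot make it
    run forever.
    """
    orders = []
    for _ in range(max_pages):
        orders.extend(extract_page(driver))
        link = find_next(driver)
        if link is None:
            break
        link.click()
    return orders
```

In a real run, a wait for the next page to finish loading would normally follow `link.click()`; it is left out here to keep the loop itself visible.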
python main.py
To understand the codebase in more depth, a fully commented notebook is provided.