GitHub - safffrron/LLM-Scraper: Scraping your Amazon order history using LLM and selenium.

LLM-Scraper

Introduction

The LLM Fetcher is a Python script that automates the task of fetching user-level order details from the Amazon website. By leveraging Selenium, the script navigates to the order history page, extracts key order details from each historical order, and saves them in a structured format.

Features:

Automated navigation to the Amazon order history page.
Extraction of order details such as order number, product names, quantities, prices, and delivery status.
Saving of order details in raw HTML files with a structured naming convention.
Structured storage of order details in JSON or CSV format.
Secure handling of user authentication credentials.
Robust handling of unexpected events and website changes.

For more information, see Intuition-and-Proposed-Solution.md.
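
As an illustration of the storage features listed above, here is a minimal sketch using only the Python standard library. The field names, the `order_<number>.html` naming convention, and the `orders.json` / `orders.csv` file names are assumptions made for this example; the actual script may use different ones.

```python
import csv
import json
from pathlib import Path

# Hypothetical field names; the real script may use different keys.
FIELDS = ["order_number", "product_name", "quantity", "price", "delivery_status"]


def raw_html_name(order_number: str) -> str:
    """Build a predictable file name for one order's raw HTML (assumed convention)."""
    return f"order_{order_number}.html"


def save_orders(orders: list, out_dir: Path) -> None:
    """Write the same order records to orders.json and orders.csv in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # JSON keeps the records as a list of objects.
    json_path = out_dir / "orders.json"
    json_path.write_text(json.dumps(orders, indent=2), encoding="utf-8")

    # CSV writes one row per order, with a header row taken from FIELDS.
    with open(out_dir / "orders.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(orders)
```

Writing both formats from the same list of dictionaries keeps the JSON and CSV outputs in sync; each dictionary is expected to use only the keys listed in `FIELDS`.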

Set-up

Clone this repository using:
```
git clone https://github.com/safffrron/LLM-Scraper.git
```
For more information on cloning, refer to https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository
Install Selenium by running this in your terminal or your Python notebook:
```
pip install selenium 
```
The other required dependencies, such as the LLM model, are installed automatically during execution.
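
For readers curious how install-on-demand can work, the sketch below shows one common pattern: try the import, and fall back to `pip` if the module is missing. This is an assumption about the general technique, not necessarily what `main.py` does, and `ensure_package` is a hypothetical helper name.

```python
import importlib
import subprocess
import sys
from typing import Optional


def ensure_package(module_name: str, pip_name: Optional[str] = None):
    """Import module_name, installing it with pip first if it is missing.

    pip_name is only needed when the PyPI name differs from the import name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Use the current interpreter so the package lands in the same environment.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

For example, `ensure_package("transformers")` would return the imported module, installing it first only if the import fails.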

Things to consider -

You may encounter an error while logging in to Amazon, specifically when a captcha page appears. In that case, enter the captcha manually and press Enter. If the problem persists, sign in through the browser manually and re-run the block of code; this resolves the error.
This scraper uses GPT Neo, a free, openly available LLM loaded through the Hugging Face transformers library. To improve performance, you can replace it with a stronger model of your own, such as GPT-4.
When fetching the order history, the script only considers the default page that opens. To include other pages when a next-page button is present, uncomment the section marked for this in the code.
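
The model swap mentioned in the second point can be sketched by keeping the text generator behind a plain callable. Everything here is illustrative: `build_prompt`, `gpt_neo_generate`, and `extract_order_details` are hypothetical names, the prompt wording is made up, and the `EleutherAI/gpt-neo-125M` checkpoint is an assumed choice since the model size used by the script is not stated here.

```python
from typing import Callable


def build_prompt(order_text: str) -> str:
    """Wrap the scraped order text in a simple extraction instruction (assumed wording)."""
    return (
        "Extract the order number, product names, quantities, prices and "
        "delivery status from the following order text.\n\n"
        f"{order_text}\n\nAnswer:"
    )


def gpt_neo_generate(prompt: str) -> str:
    """Default backend: GPT Neo through the Hugging Face transformers pipeline."""
    # Imported lazily so that other backends do not need transformers installed.
    from transformers import pipeline

    # Assumed checkpoint. The pipeline is rebuilt on every call for brevity;
    # a real script would create it once and reuse it.
    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
    result = generator(prompt, max_new_tokens=128, return_full_text=False)
    return result[0]["generated_text"]


def extract_order_details(
    order_text: str, generate: Callable[[str], str] = gpt_neo_generate
) -> str:
    """Run the prompt through whichever backend is passed in and return its answer."""
    return generate(build_prompt(order_text)).strip()
```

To use a different model, pass any function that maps a prompt string to a response string, for example `extract_order_details(text, generate=my_model_call)`, where `my_model_call` is your own wrapper around the model you prefer.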

Running -

To run from the terminal, first cd into this repo and execute -

python main.py

Understanding the code base -

If you want to understand the codebase in depth, a fully commented notebook, LLM-Scraper-Notebook.ipynb, is provided.
