feature request #1

Open
Svalbard92 opened this issue May 31, 2023 · 11 comments
Comments

@Svalbard92

Hi,
The code works flawlessly.
Can you please modify it to get the results from https://www.amazon.com/gp/goldbox too?

Thanks.

@sushil-rgb
Owner

Svalbard92

Thank you, I appreciate it. I will look into this, please wait for the update.

@Svalbard92
Author

Thanks for your prompt response.
After a closer look, I saw that some items on that page are nested inside others, which may be why it is returning null data.

One more feature request:

Currently only one image of the product is fetched and stored, and the product description (the "About this item" section, under id feature-bullets) is not fetched or stored at all.

Can you modify the code to store all the available images (links) and the features (the texts of "About this item") of the item?

Thanks.
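
For readers following this request, here is a minimal sketch of the kind of extraction being asked for. It is not taken from the repository: the class name `ProductDetailParser` and the sample HTML are hypothetical, it uses only the standard library rather than whatever parser the scraper uses, and it collects every `<img>` on the page, so a real product page would need a narrower filter for the image gallery.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ProductDetailParser(HTMLParser):
    """Collects every <img> src and the text of each <li> under #feature-bullets."""

    def __init__(self):
        super().__init__()
        self.images = []    # image links in document order, without duplicates
        self.features = []  # one string per bullet
        self._depth = 0     # nesting depth inside #feature-bullets (0 = outside)
        self._in_li = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src") and attrs["src"] not in self.images:
            self.images.append(attrs["src"])
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
            if tag == "li":
                self._in_li = True
                self._buffer = []
        elif attrs.get("id") == "feature-bullets":
            self._depth = 1

    def handle_endtag(self, tag):
        if not self._depth or tag in VOID_TAGS:
            return
        if tag == "li" and self._in_li:
            # Collapse runs of whitespace inside the bullet text.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.features.append(text)
            self._in_li = False
        self._depth -= 1

    def handle_data(self, data):
        if self._in_li:
            self._buffer.append(data)


# Hypothetical sample markup, not copied from a real product page.
sample = """
<div id="imgs"><img src="https://example.com/a.jpg"><img src="https://example.com/b.jpg"/></div>
<div id="feature-bullets"><ul>
  <li><span> Lightweight   frame </span></li>
  <li><span>Two-year warranty</span></li>
</ul></div>
<p>Unrelated <li>stray</li></p>
"""
parser = ProductDetailParser()
parser.feed(sample)
print(parser.images)
print(parser.features)
```

The stray `<li>` outside the feature-bullets container is ignored, which is the point of tracking the nesting depth.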

@sushil-rgb
Owner

Certainly, I will look into this and will update the code as soon as possible.

@Svalbard92
Author

Hi,

Any update on the same?

@sushil-rgb
Owner

I haven't had a chance to look further yet. I am stuck on pagination: the goldbox page is JS-rendered, so I am using Playwright. You can see the newly added Goldbox method in my scraper class, but for now the script only extracts the next page's URL. I estimate I will finish the script by next week.
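
As a rough illustration of the missing step (going from "extract the next page's URL" to "visit every page"), a loop along these lines could work. This is only a sketch, not the repository's code: `crawl_pages` and `get_next_url` are hypothetical names, and `get_next_url` stands in for whatever browser call actually finds the next link.

```python
def crawl_pages(start_url, get_next_url, max_pages=500):
    """Follow next-page links from start_url and return every URL visited, in order."""
    visited = []
    seen = set()
    url = start_url
    # Stop when there is no next link, when a link loops back, or at the page cap.
    while url and url not in seen and len(visited) < max_pages:
        seen.add(url)
        visited.append(url)
        url = get_next_url(url)  # hypothetical callable: returns the next URL or None
    return visited


# Stand-in for a real site: page3 links back to page2, which must not loop forever.
fake_links = {"page1": "page2", "page2": "page3", "page3": "page2"}
pages = crawl_pages("page1", fake_links.get)
print(pages)
```

The `seen` set and the `max_pages` cap are there so that a page whose "next" link points backwards cannot keep the crawl running indefinitely.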

@Svalbard92
Author

I saw that you made some changes and committed them to the master branch, so I was not sure whether the modifications worked for you, as the script was returning null data for me.

Thanks for the update.

@sushil-rgb
Owner

Hello @Svalbard92, I have made some updates to the script. The new method, called concurrent_scraping_gb, is responsible for scraping Amazon deals. However, several errors are still occurring. One of them is that the script skips certain URLs during scraping. This seems to happen because some of the deal URLs lead directly to a product. I still need to investigate this issue. I would appreciate it if you could try running the script and see the results for yourself. The current method successfully scrapes various product information, including product breakdown, description, saved deals, and a list of images, and stores it in an Excel database. Please keep in mind that these fields only work for goldbox URLs.
Cheers!!
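
One way to avoid skipping such links would be to sort them before scraping, so that direct product links and links needing a second hop take different paths. The sketch below is an assumption-laden illustration rather than the project's method: `split_deal_links` is a hypothetical helper, the sample URLs are made up, and the pattern assumes product pages carry a 10-character ASIN after `/dp/` or `/gp/product/`.

```python
import re
from urllib.parse import urlparse

# Assumption: product pages carry a 10-character ASIN after /dp/ or /gp/product/.
PRODUCT_PATH = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def split_deal_links(urls):
    """Separate links that point straight at a product from links that need a second hop."""
    products, nested = [], []
    for url in urls:
        # Match against the path only, so query strings cannot interfere.
        match = PRODUCT_PATH.search(urlparse(url).path)
        if match:
            products.append((url, match.group(1)))
        else:
            nested.append(url)
    return products, nested


# Made-up example links for illustration.
links = [
    "https://www.amazon.com/dp/B000000001?ref=deal",
    "https://www.amazon.com/deal/abc123",
    "https://www.amazon.com/Some-Name/dp/B000000002/ref=x",
]
products, nested = split_deal_links(links)
print(products)
print(nested)
```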

@Svalbard92
Author

Thanks for the update.
While running the code, I get ValueError: No objects to concatenate (from raise ValueError("No objects to concatenate")) in scaper.py after the following output:

Crawling page | 436.
Content loading error beyond this page. Error message | Element is not attached to the DOM
=========================== logs ===========================
attempting click action
  waiting for element to be visible, enabled and stable
============================================================.
The extraction process has begun and is currently in progress. 
The web scraper is scanning through all the links and collecting relevant information. 
Please be patient while the data is being gathered.
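
A note on this error: "No objects to concatenate" is the message pandas.concat raises when it is handed an empty sequence, which would be consistent with every page having come back empty after the content loading error above. Assuming that is the cause, a guard before the final combine step avoids the crash. The sketch below uses plain lists instead of DataFrames so it runs without dependencies; `collect_rows` and the sample data are hypothetical.

```python
def collect_rows(page_results):
    """Flatten per-page results, ignoring pages that returned nothing."""
    rows = []
    for result in page_results:
        if result:  # skips None and empty lists from pages that failed to load
            rows.extend(result)
    return rows


# Hypothetical per-page output: two failed pages and one page with a single item.
page_results = [None, [], [{"name": "Example item", "price": "9.99"}]]
rows = collect_rows(page_results)
if rows:
    print(f"{len(rows)} row(s) ready to export")
else:
    print("No data scraped; skipping export")
```

With pandas the equivalent check is to test that the list of frames is non-empty before calling pd.concat on it, and to report "nothing scraped" otherwise.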

@Svalbard92
Author

Hi @sushil-rgb, have you had time to look into the issue?

@sushil-rgb
Owner

Hey @Svalbard92. I will look into this today and will update the codebase as soon as possible.

@Svalbard92
Author

Hi,
I tried the code today. Unfortunately, it is giving me an error.
