Fixed issue with links not being found #298

Open: Joeclinton1 wants to merge 44 commits into base: master

@Joeclinton1 Joeclinton1 commented Feb 5, 2020

Google recently changed the way they present the image data, and so the links were no longer being scraped.
I figured out how to get the image urls with the new system and made the appropriate changes so it would work.

Unfortunately, google no longer provides file format data so I had to try and retrieve it from the url of the image, which does not work in some cases.
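
A minimal sketch of that kind of fallback, not the code from this PR: it guesses the format from the file extension in the URL path and returns None when the URL carries no recognisable extension, which is the case that "does not work". The function name and the extension list are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical list of extensions treated as image formats.
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".svg", ".ico"}


def guess_format_from_url(url):
    """Return the image format implied by the URL path, or None if unclear."""
    path = urlparse(url).path  # ignores the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    if ext in KNOWN_EXTENSIONS:
        return ext.lstrip(".")
    return None


print(guess_format_from_url("https://example.com/a/cat.JPG?w=200"))
print(guess_format_from_url("https://example.com/image?id=42"))
```

URLs like the second one have no extension at all, so the caller still needs a default or a check of the response's Content-Type header.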

EDIT: Since this keeps being asked, here's the code to download the patch for windows:

git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && python setup.py install


@landing-insights-bot landing-insights-bot left a comment

Seems like this will only get the first 100 images, correct?
The rest of the images get dynamically loaded through the batchexecute call.

@Joeclinton1
Author

Joeclinton1 commented Feb 5, 2020

> Seems like this will only get the first 100 images, correct?
> The rest of the images get dynamically loaded through the batchexecute call.

Sorry, I wasn't downloading more than 100, so I didn't think about this. I have not tested whether this works with more than 100 images, but my guess is that it will not.

However, I know that downloading fewer than 100 images does not work without these changes.

@landing-insights-bot

landing-insights-bot commented Feb 5, 2020 via email

@MarlonHie

I get this error every time after about 20 downloaded images.
I tried from the command line and with a Python file.

Traceback (most recent call last):
  File "/home/user/.local/bin/googleimagesdownload", line 11, in <module>
    load_entry_point('google-images-download==2.8.0', 'console_scripts', 'googleimagesdownload')()
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 1005, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 832, in download
    paths, errors = self.download_executor(arguments)
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 959, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 769, in _get_all_items
    object = self.format_object(image_objects[i])
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 276, in format_object
    main = data[3]
TypeError: 'NoneType' object is not subscriptable

@vk379

vk379 commented Feb 8, 2020

Hey, much like MarlonHie,
I have also received the same error.
Could you please advise? It keeps saying NoneType object is not subscriptable.
Thanks,

@Rian-T

Rian-T commented Feb 8, 2020

I made a quick fix for the NoneType error. I was working on a project using this, so I needed it to work again quickly. It still only works for fewer than 100 images, though.

Joeclinton1#1

@Joeclinton1
Author

Sorry for not replying faster. The NoneType error occurs because every so often an item with a null value for the image data is returned. Fortunately, all of these items are marked with 2 in the data[0] column, so I will just remove them. This should fix the problem. Rian-T's solution also works.

By filtering out the image objects which had data[0]==2, I have removed the null items and it will no longer give the error: "TypeError: 'NoneType' object is not subscriptable".
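
A minimal sketch of that filter, under an assumed layout: each item is a list whose first element is the type marker and whose second element is the image payload (None for the null items). Only the marker value 2 comes from the comment above; the payload position and the function name are hypothetical and may not match the real response.

```python
def drop_null_items(image_objects):
    """Keep only the items that carry image data."""
    return [
        item
        for item in image_objects
        # Marker 2 flags the null items; the None check is an extra guard.
        if item[0] != 2 and item[1] is not None
    ]


# Made-up sample in the assumed layout: [marker, payload].
sample = [
    [1, ["thumb-a", "full-a"]],
    [2, None],
    [1, ["thumb-b", "full-b"]],
]
print(len(drop_null_items(sample)))
```

Checking the payload for None as well as the marker keeps the filter working if a null item ever arrives with a different marker.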
@greg-oz

greg-oz commented Feb 10, 2020

I am still getting these errors with the latest Joeclinton1 version:

  File "google_images_download.py", line 1017, in <module>
    main()
  File "google_images_download.py", line 1006, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "google_images_download.py", line 842, in download
    paths, errors = self.download_executor(arguments)
  File "google_images_download.py", line 960, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "google_images_download.py", line 763, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "google_images_download.py", line 752, in _get_image_objects
    object_decode = bytes(object_raw, "utf-8").decode("unicode_escape")
TypeError: str() takes at most 1 argument (2 given)
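
For context: "str() takes at most 1 argument (2 given)" is the message Python 2 produces for bytes(object_raw, "utf-8"), because bytes is an alias of str there. Whether this particular run used Python 2 is an assumption. The sketch below shows the same decode step written without the two-argument bytes() call; the helper name is hypothetical and the example is only exercised on Python 3.

```python
def decode_escapes(object_raw):
    """Turn literal \\uXXXX sequences in a text string into real characters."""
    # str.encode() avoids the two-argument bytes() constructor.
    # Note: unicode_escape reads non-escape bytes as Latin-1, so this is
    # only safe when the input is ASCII plus escape sequences.
    return object_raw.encode("utf-8").decode("unicode_escape")


raw = "https://example.com/img?a\\u003db"  # contains a literal backslash-u sequence
print(decode_escapes(raw))
```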

This system is not very flexible: it seems Google does not keep the same positions of target items, so sometimes it doesn't work. I added a try-except just in case there are more problems.
@hodsonus

hodsonus commented Feb 10, 2020

Doesn't seem to work with more than 100 photos. I attempted it with 1000 and it gave me this:
[Screenshot: Screen Shot 2020-02-10 at 2 29 01 PM]

edit: Oops, read a little bit closer and that's a known issue

@edgabaldi

I ran 20 queries, and some of them raise this exception:

Traceback (most recent call last):
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/deploy/curador/releases/20200201133617/apps/ean/tasks.py", line 14, in download_image
    cmd.download()
  File "/home/deploy/curador/releases/20200201133617/apps/ean/domain/googleimages.py", line 32, in download
    response.download(config_dict)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 838, in download
    paths, errors = self.download_executor(arguments)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 965, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments)    #get all image items and download images
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 768, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 758, in _get_image_objects
    image_objects = json.loads(object_decode)[31][0][12][2]
IndexError: list index out of range
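
The failing line indexes a fixed path, [31][0][12][2], into the decoded JSON, so any change in the response shape raises IndexError. A minimal sketch of a more defensive lookup follows; the helper name dig is hypothetical, and it only turns the crash into a None result. It does not find the images if they have moved elsewhere.

```python
def dig(data, path):
    """Follow a sequence of indexes or keys into nested data; None if any step fails."""
    current = data
    for index in path:
        try:
            current = current[index]
        except (IndexError, KeyError, TypeError):
            # Index out of range, missing key, or a None/scalar in the way.
            return None
    return current


nested = [[0, [1, 2]]]
print(dig(nested, (0, 1, 1)))
print(dig([], (31, 0, 12, 2)))
```

With this, the caller can report "no image objects found" for that query and continue with the next one instead of aborting the whole task.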

@codefreak404

Hi all,

For the time being, a probable workaround is to add the Image Downloader extension to your Chrome browser (https://chrome.google.com/webstore/detail/image-downloader/cnpniohnfphhjihaiiggeabnkjhpaldj?hl=en-US).
I am working on a fix for the issue and will give an update shortly.

Thanks.

@Joeclinton1
Author

Joeclinton1 commented Feb 11, 2020

I believe the solution I have is too inflexible for deployment, as Google does not seem to keep a stable enough structure for the data sent back in the callback. A different solution, perhaps one that collects the non-thumbnail links inside the callback, might work better.
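
A rough sketch of what such a position-independent approach might look like: scan the callback text for anything that looks like an image URL and skip thumbnail hosts. This is an illustration, not code from this PR. The regex, the extension list, and the assumption that thumbnails are served from hosts starting with "encrypted-tbn" are all guesses that would need checking against a real response.

```python
import re

# Assumed pattern: an http(s) URL ending in a common image extension.
IMAGE_URL = re.compile(
    r"""https?://[^"'\s\\]+?\.(?:jpe?g|png|gif|webp|bmp)""", re.IGNORECASE
)
# Assumption: thumbnail links live on hosts with this prefix.
THUMBNAIL_HOST_PREFIXES = ("encrypted-tbn",)


def collect_image_links(text):
    """Return unique non-thumbnail image URLs in order of first appearance."""
    links = []
    for url in IMAGE_URL.findall(text):
        host = url.split("/")[2]
        if host.startswith(THUMBNAIL_HOST_PREFIXES):
            continue
        if url not in links:
            links.append(url)
    return links


sample = (
    '["https://encrypted-tbn0.gstatic.com/images/t.jpg",1,2],'
    '["https://example.com/full/cat.jpg",800,600],'
    '["https://example.com/full/cat.jpg"]'
)
print(collect_image_links(sample))
```

The trade-off is that URLs without a file extension are missed entirely, so this would lose some images that the index-based approach finds.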

@decidev22

How do you import this fixed version and run it?

@hodsonus

There isn't a working solution right now.

@seth814

seth814 commented Feb 18, 2020

I've been trying to get limit > 100 to work. It seems selenium's browser.page_source returns lots of new lines compared to the other raw_html you typically get. I've tried stripping newlines off, but no success. Eventually it will search for: "AF_initDataCallback({key: \'ds:2\'" but returns -1. If I search just "AF_initDataCallback" I can get a start index, but this will still just result in JSONDecodeError. So it seems the entire raw_html from download_extended_page is getting parsed incorrectly.

EDIT: Converting the string to a bytearray and back to a string allowed the image_objects to parse correctly. len(image_objects) was only 100 though so maybe selenium isn't scrolling far down enough? Will keep looking...

EDIT2: It seems my string from download_extended_page is larger, but the object length stays at 100. Running with a short length vs. length > 100, the delta between the start and stop indexes is ~122400 for both raw_html strings after parsing. So no new images seem to be actually included in the expanded page_source, despite it being a larger string.
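
One possible reason the exact-string search returns -1 is that the whitespace or quoting around the key differs in Selenium's page_source. That is a guess, not something confirmed in this thread. A small sketch of a whitespace-tolerant search, with a hypothetical helper name:

```python
import re


def find_callback_start(page_source, key="ds:2"):
    """Return the index where AF_initDataCallback for `key` starts, or -1."""
    pattern = re.compile(
        r"AF_initDataCallback\(\s*\{\s*key:\s*['\"]" + re.escape(key) + r"['\"]"
    )
    match = pattern.search(page_source)
    return match.start() if match else -1


compact = "x AF_initDataCallback({key: 'ds:2', data: []})"
spread_out = "x AF_initDataCallback({\n  key: \"ds:2\",\n  data: []})"
print(find_callback_start(compact), find_callback_start(spread_out))
```

This only locates the start of the callback; it does nothing about the object count staying at 100.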

MoveAngel pushed a commit to MoveAngel/One4uBot that referenced this pull request Feb 22, 2020
Unfortunately, it appears the google image formatting has been changed
this is a temporary solution from "hardikvasa/google-images-download#298"

Change-Id: Iadcfa995e6b7c6229505ec0872810876575d738e
goodmeow pushed a commit to goodmeow/OpenUbot that referenced this pull request Feb 23, 2020
Unfortunately, it appears the google image formatting has been changed
this is a temporary solution from "hardikvasa/google-images-download#298"

Change-Id: Iadcfa995e6b7c6229505ec0872810876575d738e
Signed-off-by: goodmeow <harunbam3@gmail.com>
goodmeow pushed a commit to goodmeow/OpenUbot that referenced this pull request Mar 4, 2020
Unfortunately, it appears the google image formatting has been changed
this is a temporary solution from "hardikvasa/google-images-download#298"

Change-Id: Iadcfa995e6b7c6229505ec0872810876575d738e
Signed-off-by: goodmeow <harunbam3@gmail.com>
goodmeow added a commit to goodmeow/OpenUbot that referenced this pull request Mar 9, 2020
Unfortunately, it appears the google image formatting has been changed
this is a temporary solution from "hardikvasa/google-images-download#298"

Change-Id: Iadcfa995e6b7c6229505ec0872810876575d738e
Signed-off-by: goodmeow <harunbam3@gmail.com>

scrappers.py:
@RetroSeasons

RetroSeasons commented Aug 21, 2022

Getting an error with every try, for example:

googleimagesdownload --keywords "ty cobb" --limit 10
Item no.: 1 --> Item name = ty cobb
Evaluating...
str() takes at most 1 argument (2 given)
Image objects data unpacking failed. Please leave ...

Python 2.7.17
Ubuntu 18.04.4 LTS
Selenium 3.141.0

@mrclean789

@RetroSeasons pip uninstall google-images-download and then run setup.py again

@mrclean789

mrclean789 commented Aug 26, 2022

I'm now getting this error too. I've run the command multiple times and it always works in the beginning, but then the error appears at random. Sometimes it's after the first 1 or 2 keywords - the most it's gone up to is around 30 keywords before it gives me the error.

list index out of range
Image objects data unpacking failed.

@Jerick5555

I am also getting this error.

list index out of range
Image objects data unpacking failed.

It seems to happen after at least 2 keywords then it fails somewhat randomly at the start of any keyword afterwards.

@ignaciodamiang

ignaciodamiang commented Sep 23, 2022

I'm getting this error, the same as @Jerick5555.

Evaluating...
list index out of range
Image objects data unpacking failed.

I've tried it in a virtual machine and I'm getting the same error. It's very strange because yesterday I used the program and it worked fine. If anyone comes up with something, let me know.

Btw I have Ubuntu 22.

Update:

I executed the test provided in the project like this:
python3 -m unittest test_google_images_download.py
and obtained this output:
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine (exception: Message: 'chromedriver.exe' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
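
That message is about chromedriver not being found rather than about the parsing error. A quick way to check whether a chromedriver executable is visible on PATH before running the tests, as a sketch using only the standard library (the helper name is hypothetical):

```python
import shutil


def find_chromedriver(explicit_path=None):
    """Return the chromedriver path to use, or None if it cannot be found."""
    if explicit_path:
        return explicit_path
    # shutil.which searches the directories listed in PATH.
    return shutil.which("chromedriver")


location = find_chromedriver()
print(location if location else "chromedriver is not on PATH")
```

If it prints a path, that path is what the '--chromedriver' argument mentioned in the message expects.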

@Joeclinton1
Author

Joeclinton1 commented Sep 23, 2022

@mrclean789
@Jerick5555
@ignaciodamiang

I ran test_google_images_downloads.py and was able to reproduce the error. Thank you for alerting me!
The problem occurs for both <100 and >100 images.

The issue is likely caused by google once again changing the way they format their image object array.
I'll try to fix the issue when I have more time.

@ignaciodamiang

Great. I hope you find the time. If I knew Python I would try to fix it. Thank you!

@ellisbrown

ellisbrown commented Sep 23, 2022

@mrclean789 @Jerick5555 @ignaciodamiang

> I ran test_google_images_downloads.py and was able to reproduce the error. Thank you for alerting me! The problem occurs for both <100 and >100 images.
>
> The issue is likely caused by google once again changing the way they format their image object array. I'll try to fix the issue when I have more time.

@Joeclinton1 looks like they changed it. I found the issue and am fixing it, I'll raise a PR.

Update: Joeclinton1#26

@eamonnkenny

It seems that the download list is always empty now since yesterday or the day before. This is using the joe clinton version. It was working for quite some time with some strange periodic problems that would occur for 1/2 a day at a time and then disappear, but since yesterday no search term has downloaded anything for me. Are others finding this?

@ellisbrown

> It seems that the download list is always empty now since yesterday or the day before. This is using the joe clinton version. It was working for quite some time with some strange periodic problems that would occur for 1/2 a day at a time and then disappear, but since yesterday no search term has downloaded anything for me. Are others finding this?

@eamonnkenny see #298 (comment)

fix breaking change due to google's response format
@modikush80

Getting this error. Did anyone find a solution to this?

Evaluating...
'NoneType' object is not subscriptable
Image objects data unpacking failed.

@tallevy22

Is there a way to fix the encoding of the returned metadata? I get \u05de\u05d3\u05d5\u05d6\u05d4 \u05d7\u05d5\u05e3 \u05d0\u05e9\u05d3\u05d5\u05d3 instead of Hebrew. I tried adding the following at line 1130:
json_file = open("logs/" + search_keyword[i] + ".json", "w",encoding="utf-8")
json.dump(items, json_file, indent=4, sort_keys=True, ensure_ascii=False)
json_file.close()
but it didn't help.
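
For reference, a small standalone check of what ensure_ascii does in the standard json module. The key name is made up for illustration. It also shows one possible explanation, which is only an assumption here: if the strings already contain literal backslash-u sequences before they reach json.dump, ensure_ascii=False cannot turn them back into Hebrew.

```python
import json

# Real Hebrew characters, written with escapes in the source for portability.
items = [{"description": "\u05de\u05d3\u05d5\u05d6\u05d4"}]

escaped = json.dumps(items)                       # default: non-ASCII becomes \uXXXX
readable = json.dumps(items, ensure_ascii=False)  # keeps the Hebrew characters

# A string holding a literal backslash followed by 'u05de' (six characters).
already_escaped = [{"description": "\\u05de"}]
still_escaped = json.dumps(already_escaped, ensure_ascii=False)

print(escaped)
print(readable)
print(still_escaped)
```

Both of the first two outputs decode to the same data with json.loads, so the escaped form is not corrupt, only harder to read.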

@Jerick5555

Evaluating...
'NoneType' object is not subscriptable
Image objects data unpacking failed.

Got this error too

@Jerick5555

> Evaluating... 'NoneType' object is not subscriptable Image objects data unpacking failed.
>
> Got this error too

Never mind, I updated to the latest version and it is working now. @modikush80

@Jerick5555

Just pull from the repo and run the setup again.

@Joeclinton1
Author

Currently, I think Google has changed their JSON again and it no longer works. I am very busy at the moment and have not had a chance to fix it, but on GitHub there are a few PRs which claim to have fixed the problem: https://github.com/Joeclinton1/google-images-download/pulls

I will test these at some point, but in the meantime, if you need it to work, you may consider one of their forks. If it works for you, please tell me and I'll just merge their fork.

Thank you for your understanding.

@galantra

I've tried Joeclinton1#35 and it works for me (using it as part of https://github.com/galantra/FluentForeverVocabBuilder/)

@ellisbrown

I am working on a project that depends heavily on this functionality. I refactored it and am maintaining it here https://github.com/ellisbrown/google-images-download/tree/wrapperless if it helps anyone

@copperwiring

Doesn't work for me.

What are the correct instructions to use the updated version? I used the following:

git clone https://github.com/d0codesoft/google-images-download.git
cd google-images-download
git checkout patch-1
python setup.py install
googleimagesdownload -k "children in park" -l 10

I get


Item no.: 1 --> Item name = children in park
Evaluating...
Starting Download...
'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "/Users/srishtiy/anaconda3/bin/googleimagesdownload", line 33, in <module>
    sys.exit(load_entry_point('google-images-download==2.8.0', 'console_scripts', 'googleimagesdownload')())
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 1167, in main
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 971, in download
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 1119, in download_executor
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 907, in _get_all_items
TypeError: 'NoneType' object is not subscriptable

@ellisbrown

@copperwiring see my comment above for a working fork. The following worked for me just now:

git clone git@github.com:ellisbrown/google-images-download.git

cd google-images-download

pip install .

python tests/test_google_images_download.py --limit 10
