
GoogleImageCrawler: TypeError: 'NoneType' object is not iterable #96

LostInDarkMath opened this issue Mar 6, 2021 · 11 comments

@LostInDarkMath

Hi there, I just tried out your library, but unfortunately, I get an error:

2021-03-06 13:56:54,609 - INFO - icrawler.crawler - start crawling...
2021-03-06 13:56:54,609 - INFO - icrawler.crawler - starting 1 feeder threads...
2021-03-06 13:56:54,610 - INFO - feeder - thread feeder-001 exit
2021-03-06 13:56:54,611 - INFO - icrawler.crawler - starting 1 parser threads...
2021-03-06 13:56:54,612 - INFO - icrawler.crawler - starting 1 downloader threads...
2021-03-06 13:56:55,160 - INFO - parser - parsing result page https://www.google.com/search?q=cat&ijn=0&start=0&tbs=&tbm=isch
Exception in thread parser-001:
Traceback (most recent call last):
  File "C:\Users\WILLI\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\WILLI\AppData\Local\Programs\Python\Python38\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Projekte\Foo\source\backend_prototyp\venv\lib\site-packages\icrawler\parser.py", line 104, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2021-03-06 13:56:59,615 - INFO - downloader - no more download task for thread downloader-001
2021-03-06 13:56:59,615 - INFO - downloader - thread downloader-001 exit
2021-03-06 13:56:59,616 - INFO - icrawler.crawler - Crawling task done!

Process finished with exit code 0
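The crash is explained by the last two frames of the traceback: icrawler's parser thread iterates directly over whatever `parse()` returns, and when Google changes its result-page markup the extraction finds nothing, `parse()` falls through without a `return`, and iterating over the resulting `None` raises the `TypeError`. A minimal sketch of that failure mode (the regex and function names here are illustrative, not icrawler's actual source):

```python
import re

def parse(html):
    """Illustrative parser: when the regex no longer matches, it falls through and returns None."""
    uris = re.findall(r"http\S+?\.(?:jpg|png)", html)
    if uris:
        return [{"file_url": uri} for uri in uris]
    # implicit `return None` here is what breaks the worker loop

def worker_exec(html):
    """Mimics the worker loop: iterates directly over parse()'s return value."""
    return [task for task in parse(html)]  # TypeError if parse() returned None

def safe_parse(html):
    """Defensive variant: always return a list, empty when nothing matches."""
    uris = re.findall(r"http\S+?\.(?:jpg|png)", html)
    return [{"file_url": uri} for uri in uris]
```

With the `safe_parse` shape, a markup change would simply download nothing instead of killing the parser thread.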

But I am just using your example code:

from icrawler.builtin import GoogleImageCrawler

google_crawler = GoogleImageCrawler(storage={'root_dir': 'D:'})
google_crawler.crawl(keyword='cat', max_num=100)

How can I fix the problem?

@ZhiyuanChen
Collaborator

Sorry for the inconvenience. Could you please try cloning this project and building it manually?
This has been fixed by #93, but I have been too busy to build and release a new package.

@LostInDarkMath
Author

LostInDarkMath commented Mar 9, 2021

I don't have the time to build it either. I just wanted to quickly test your library to see if it was suitable for my use case. And I'm probably not the only one having this problem either. Should all users now build this manually?

@ZhiyuanChen
Collaborator


Sorry again for the inconvenience; I have updated the package on PyPI.

@LostInDarkMath
Author

It works now! Thank you very much :)

@Viachaslau85

I have the same problem. Works with Bing and Baidu, but does not work with Google. I keep getting the following errors:
2022-07-27 18:52:22,851 - INFO - icrawler.crawler - start crawling...
2022-07-27 18:52:22,852 - INFO - icrawler.crawler - starting 1 feeder threads...
2022-07-27 18:52:22,852 - INFO - icrawler.crawler - starting 1 parser threads...
2022-07-27 18:52:22,853 - INFO - icrawler.crawler - starting 4 downloader threads...
2022-07-27 18:52:23,323 - INFO - parser - parsing result page https://www.google.com/search?q=cat&ijn=0&start=0&tbs=isz%3Al%2Cic%3Aspecific%2Cisc%3Aorange%2Csur%3Afmc%2Ccdr%3A1%2Ccd_min%3A01%2F01%2F2017%2Ccd_max%3A11%2F30%2F2017&tbm=isch
Exception in thread parser-001:
Traceback (most recent call last):
  File "C:\Python310\lib\threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "C:\Python310\lib\threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python310\lib\site-packages\icrawler\parser.py", line 104, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2022-07-27 18:52:27,857 - INFO - downloader - no more download task for thread downloader-001
2022-07-27 18:52:27,858 - INFO - downloader - thread downloader-001 exit
2022-07-27 18:52:27,858 - INFO - downloader - no more download task for thread downloader-003
2022-07-27 18:52:27,858 - INFO - downloader - thread downloader-003 exit
2022-07-27 18:52:27,858 - INFO - downloader - no more download task for thread downloader-004
2022-07-27 18:52:27,858 - INFO - downloader - thread downloader-004 exit
2022-07-27 18:52:27,859 - INFO - downloader - no more download task for thread downloader-002
2022-07-27 18:52:27,859 - INFO - downloader - thread downloader-002 exit
2022-07-27 18:52:27,894 - INFO - icrawler.crawler - Crawling task done!

@ZhiyuanChen
Collaborator


This is not relevant to this issue, looks like #107

@gustavozantut

Seems this problem is back.

@gijhi

gijhi commented Jan 17, 2024

Any solution for this problem?

@gustavozantut

Looks like some or many websites' hosts are identifying bots and asking for human validation, which causes the problem.
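If that is the cause, one way to confirm is to inspect the raw response: validation interstitials usually contain telltale phrases instead of image markup. A rough heuristic sketch (the marker strings are assumptions, not an official list):

```python
def looks_like_bot_check(html):
    """Heuristic: does the page look like a human-validation interstitial rather than results?"""
    markers = ("unusual traffic", "captcha", "verify you are a human")  # illustrative phrases
    lowered = html.lower()
    return any(marker in lowered for marker in markers)
```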

@OxFF00FF

OxFF00FF commented Apr 3, 2024

Change the code like this; it helped for me.
File: ....\site-packages\icrawler\parser.py

uris = re.findall(r"http[^\[]*?.(?:jpg|png|bmp)", txt)
uris = [bytes(uri, 'utf-8').decode('unicode-escape') for uri in uris]
if uris:
    return [{"file_url": uri} for uri in uris]
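For anyone who would rather not patch an installed package in place, the guard pattern behind that snippet can be stated on its own: always return a list (possibly empty) so the worker's `for` loop never receives `None`, and escape the dot in the extension regex so it only matches a real `.jpg`/`.png`/`.bmp` extension. This is a self-contained sketch under those assumptions, not icrawler's actual `parse` method:

```python
import re

def parse_image_urls(txt):
    """Extract image URLs from raw page text; returns [] (never None) when nothing matches."""
    # \. is escaped so e.g. "catjpg" without a real ".jpg" extension is not matched;
    # quotes are excluded from the character class to stop at attribute boundaries
    uris = re.findall(r"http[^\[\"']*?\.(?:jpg|png|bmp)", txt)
    # decode escape sequences such as \u002d that appear literally in the raw page source
    uris = [bytes(uri, "utf-8").decode("unicode-escape") for uri in uris]
    return [{"file_url": uri} for uri in uris]
```

Returning an empty list keeps the downloader threads idle on a bad page instead of crashing the parser thread.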

@ZhiyuanChen
Collaborator


Would you mind submitting a PR?

@ZhiyuanChen ZhiyuanChen reopened this Apr 3, 2024