Not downloading any images #24

Closed
Ajithbalakrishnan opened this issue Apr 26, 2020 · 14 comments

@Ajithbalakrishnan

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.0:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.
```

I tried the GUI as well, but it doesn't work either. Please guide me.

@sczhengyabin
Collaborator

@Ajithbalakrishnan I believe you have a typo in `127.0.0.0:1080`, which should be `127.0.0.1:1080`.

@Ajithbalakrishnan
Author

@sczhengyabin Thanks for your quick reply. But I have tried every combination and I get the same result.

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.1:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.
```

I tried the same with the GUI, but got the same results.

@sczhengyabin
Collaborator

@Ajithbalakrishnan I can download images using exactly the same args as yours.
It's more likely to be a network issue: maybe your network is too slow, or the proxy server has an internal error.
In my tests, when my network has trouble reaching Google, I get exactly the same output as you posted.
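Before blaming the scraper, it can help to confirm that something is actually listening on the address passed to `--proxy_socks5`. The sketch below is a hypothetical helper (not part of Image-Downloader) that splits a `host:port` string and attempts a plain TCP connection using only the standard library. It only shows that some process accepts connections on that port, not that the process is a working SOCKS5 proxy.

```python
import socket


def proxy_is_listening(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to `address` ("host:port") succeeds.

    Hypothetical helper for illustration: it checks only that *something*
    accepts connections on that port, not that it speaks SOCKS5.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # OSError: connection refused, timeout, or name-resolution failure.
        # ValueError: non-numeric port. OverflowError: port outside 0-65535.
        return False
```

If `proxy_is_listening("127.0.0.1:1080")` returns `False`, the problem is on the proxy side, and no change to the scraper's arguments will help.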

@Ajithbalakrishnan
Author

Ajithbalakrishnan commented Apr 26, 2020

@sczhengyabin My network is fine. But I am working on Ubuntu in an Anaconda environment; I hope that is not a problem. I installed the requirements through pip.

@sczhengyabin
Collaborator

@Ajithbalakrishnan Try using chrome mode, in which you can watch the actions in the Chrome browser and see where it goes wrong.

@Ajithbalakrishnan
Author

@sczhengyabin I tried chrome mode in the GUI; see the result below. Chrome popped up for a second, but then closed. I also checked ChromeDriver, and its version matches as well.
Screenshot from 2020-04-26 21-31-30

@sczhengyabin
Collaborator

@Ajithbalakrishnan No clue yet. Does the Bing engine work?

@Ajithbalakrishnan
Author

Ajithbalakrishnan commented Apr 26, 2020

@sczhengyabin Nope, same result. Chrome does not show the search results. I checked my internet connection, and it is fine.
Screenshot from 2020-04-27 00-39-32

@sczhengyabin Please share the dependencies and the versions you used.

@sczhengyabin
Collaborator

sczhengyabin commented Apr 27, 2020

@Ajithbalakrishnan

requests==2.18.4
selenium==3.141.0
PyQt5==5.14.2

generated using pipreqs
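To compare a local environment against the pins listed above, a short script can read each `name==version` line and look up the installed version with `importlib.metadata` (Python 3.8+). This is a sketch, not a tool shipped with the project; the `check_pins` name is hypothetical, and it only does an exact string comparison of `==` pins.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the comment above.
PINS = """\
requests==2.18.4
selenium==3.141.0
PyQt5==5.14.2
"""


def check_pins(pins_text: str) -> dict:
    """Map each pinned package whose installed version differs to (installed, pinned).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for line in pins_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blank lines, comments, and non-exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (installed, pinned)
    return mismatches


if __name__ == "__main__":
    for name, (installed, pinned) in check_pins(PINS).items():
        print(f"{name}: installed={installed}, pinned={pinned}")
```

For instance, the traceback in the original report shows `selenium-4.0.0a5` installed, which this check would flag against `selenium==3.141.0`. On Python 3.7 (the version in that traceback), `importlib.metadata` is not in the standard library; the `importlib_metadata` backport on PyPI provides the same functions.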

It still looks like a network issue to me, at least for this project.

To verify, you can set up the proxy with `proxychains` instead of using this project's proxy option.

```
# config in /etc/proxychains.conf
proxychains python3 image_downloader.py ...
```

@Ajithbalakrishnan
Author

```
proxychains python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.1:1080 apple
ProxyChains-3.1 (http://proxychains.sf.net)

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-request| localhost
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-response|: localhost does not exist
|DNS-request| localhost
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-response|: localhost does not exist
```
I am attaching my proxychains config file below.

proxychains.zip

I tried changing the line `socks4 127.0.0.1 9050` in the proxychains config file to use `127.0.0.1 1080`, but it didn't help.

@sczhengyabin
Collaborator

@Ajithbalakrishnan
The proxychains config line should be `socks5 127.0.0.1 1080`.
If you can use proxychains to download other things (e.g. with apt-get), then it's an issue with Image-Downloader; otherwise, something is definitely wrong with your SOCKS5 proxy configuration.
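One way to tell "nothing is listening" apart from "something is listening but it is not a usable SOCKS5 proxy" is to perform only the first step of the SOCKS5 handshake from RFC 1928: the client offers the no-authentication method (bytes `05 01 00`), and a SOCKS5 server answers with two bytes, version `05` followed by the chosen method (`00` if no authentication is needed). The sketch below, a hypothetical helper `socks5_greeting_ok` using only the standard library, does exactly that and nothing more; it does not test whether the proxy can reach Google.

```python
import socket


def socks5_greeting_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if host:port accepts a SOCKS5 greeting with "no auth".

    Sends VER=0x05, NMETHODS=1, METHODS=[0x00] and expects the reply
    0x05 0x00 (RFC 1928, section 3). Any error, short read, or other
    reply (e.g. 0x05 0xFF, "no acceptable methods") yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"\x05\x01\x00")
            reply = b""
            while len(reply) < 2:
                chunk = sock.recv(2 - len(reply))
                if not chunk:  # peer closed the connection early
                    break
                reply += chunk
    except OSError:
        return False
    return reply == b"\x05\x00"
```

For example, `socks5_greeting_ok("127.0.0.1", 1080)` checks the proxy used in the commands above; a `False` result would be consistent with the `127.0.0.1:1080-<--timeout` lines in the proxychains output.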

@Ajithbalakrishnan
Author

Ajithbalakrishnan commented Apr 27, 2020

@sczhengyabin It's working now. I made some changes to the `/etc/proxychains.conf` config file:

  1. Changed `strict_chain` to `dynamic_chain`.
  2. Added one more line at the end: `socks5 127.0.0.1 9050`.
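The two config changes above can be expressed as a small text transformation. This is a sketch under assumptions: it expects a stock-style config in which `strict_chain` appears as its own uncommented line, `dynamic_chain` may appear commented out as `#dynamic_chain`, and `[ProxyList]` is the last section, so appending a line adds it to the proxy list. The function name `enable_dynamic_chain` is hypothetical, and it returns the new text instead of writing `/etc/proxychains.conf`, which requires root.

```python
def enable_dynamic_chain(conf_text: str, proxy_line: str = "socks5 127.0.0.1 9050") -> str:
    """Return conf_text with strict_chain commented out, dynamic_chain enabled,
    and proxy_line appended at the end (assumed to be the [ProxyList] section)."""
    out = []
    saw_dynamic = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped == "strict_chain":
            out.append("#strict_chain")  # disable strict mode
        elif stripped.lstrip("#").strip() == "dynamic_chain":
            out.append("dynamic_chain")  # enable (uncomment) dynamic mode
            saw_dynamic = True
        else:
            out.append(line)
    if not saw_dynamic:
        out.insert(0, "dynamic_chain")
    # Append the proxy only if an identical, uncommented line is not already there.
    if proxy_line not in (existing.strip() for existing in out):
        out.append(proxy_line)
    return "\n".join(out) + "\n"
```

Per the comments in the stock proxychains config, `dynamic_chain` skips dead proxies in the list, whereas `strict_chain` requires every listed proxy to be online, which may explain why the switch helped here.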

Then I installed Tor and PySocks in my environment:


```
sudo apt-get install tor
pip install PySocks
```


Since the SOCKS5 port has changed, the command becomes:

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images/kerlaflood --proxy_socks5 127.0.0.1:9050 kerlaflood2018
```

Hope this is helpful for others. Sorry for taking up your valuable time.

@sczhengyabin
Collaborator

@Ajithbalakrishnan It's ok, as long as the problem is solved.

@lucidBrot

FWIW, I have a similar issue, but only with Google. I think the reason is that Google shows a "Before you continue to Google" page; that's what I briefly see in the interactive Chrome mode before it closes.
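If the Google consent interstitial is the cause, a headless run could check the loaded page for it before reporting zero images. The sketch below is a heuristic under stated assumptions: the marker phrase comes from the comment above, and treating a `consent.google.com` host as the interstitial is an assumption about Google's redirect, not something confirmed in this thread. It works on plain URL and HTML strings; with Selenium those would come from `driver.current_url` and `driver.page_source`.

```python
from urllib.parse import urlparse

# Assumptions: these markers identify Google's consent page; Google may change them.
CONSENT_PHRASE = "before you continue to google"
CONSENT_HOST = "consent.google.com"


def looks_like_consent_page(url: str, html: str) -> bool:
    """Heuristically detect Google's "Before you continue" consent page."""
    host = urlparse(url).hostname or ""
    return host == CONSENT_HOST or CONSENT_PHRASE in html.lower()
```

Logging a warning when this returns `True` would turn a silent "Find 0 images." into a clear signal that the search page never loaded.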

Using Bing instead works.
