A strange error message #14

Closed
BadReese opened this issue Mar 10, 2018 · 5 comments

Comments

@BadReese

BadReese commented Mar 10, 2018

haipproxy_1 | 2018/03/10 09:53:43| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.plu.cn/user/collect?roomId=2115304 AKA userapi.plu.cn/user/collect?roomId=2115304

Context:
haipproxy_1 | 2018/03/10 09:53:18| TCP connection to 160.16.223.156/8080 failed
haipproxy_1 | 2018/03/10 09:53:20| TCP connection to 160.16.223.156/8080 failed
haipproxy_1 | 2018/03/10 09:53:20| Detected DEAD Parent: proxy-34
haipproxy_1 | 2018/03/10 09:53:20| Detected REVIVED Parent: proxy-34
haipproxy_1 | 2018/03/10 09:53:24| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.longzhu.com/user/collect?roomId=2220760 AKA userapi.longzhu.com/user/collect?roomId=2220760
haipproxy_1 | 2018/03/10 09:53:28| local=172.18.0.4:3128 remote=120.76.222.200:63864 FD 223 flags=1: read/write failure: (32) Broken pipe
haipproxy_1 | 2018/03/10 09:53:30| TCP connection to 160.16.214.186/8080 failed
haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 160.16.223.146/8080 failed
haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 160.16.214.186/8080 failed
haipproxy_1 | 2018/03/10 09:53:35| TCP connection to 95.215.25.233/8080 failed
haipproxy_1 | 2018/03/10 09:53:35| Detected DEAD Parent: proxy-31
haipproxy_1 | 2018/03/10 09:53:35| Detected REVIVED Parent: proxy-31
haipproxy_1 | 2018/03/10 09:53:42| TCP connection to 95.215.25.233/8080 failed
haipproxy_1 | 2018/03/10 09:53:42| TCP connection to 160.16.213.241/8080 failed
haipproxy_1 | 2018/03/10 09:53:43| WARNING: HTTP: Invalid Response: Bad header encountered from http://userapi.plu.cn/user/collect?roomId=2115304 AKA userapi.plu.cn/user/collect?roomId=2115304

I never wrote a crawler for Longzhu, so why are Longzhu live-streaming API URLs showing up in the log? Has the server been hacked?
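One way to see which destinations are actually passing through the proxy is to tally the hosts named in these warnings. The following is a minimal sketch, assuming the log lines keep the "Bad header encountered from <url>" shape shown above; the function name `count_upstream_hosts` is hypothetical and not part of haipproxy.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Captures the URL that follows Squid's "Bad header encountered from" warning.
BAD_HEADER = re.compile(r"Bad header encountered from (\S+)")


def count_upstream_hosts(log_lines):
    """Tally the hostnames of URLs that appear in bad-header warnings."""
    hosts = Counter()
    for line in log_lines:
        match = BAD_HEADER.search(line)
        if match is None:
            continue  # not a bad-header warning
        host = urlsplit(match.group(1)).hostname
        if host:
            hosts[host] += 1
    return hosts


if __name__ == "__main__":
    # Two lines copied from the log in this issue.
    sample = [
        "haipproxy_1 | 2018/03/10 09:53:20| Detected DEAD Parent: proxy-34",
        "haipproxy_1 | 2018/03/10 09:53:24| WARNING: HTTP: Invalid Response: "
        "Bad header encountered from "
        "http://userapi.longzhu.com/user/collect?roomId=2220760 "
        "AKA userapi.longzhu.com/user/collect?roomId=2220760",
    ]
    for host, count in count_upstream_hosts(sample).most_common():
        print(host, count)
```

Hosts that none of your own crawlers request are a hint that someone else is sending traffic through the proxy.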

@BadReese
Author

Then, after it had collected a little over a hundred IPs, it stopped on its own:
haipproxy_1 | 2018/03/10 09:55:07| Closing HTTP port [::]:3128
haipproxy_1 | 2018/03/10 09:55:07| storeDirWriteCleanLogs: Starting...
haipproxy_1 | 2018/03/10 09:55:07| Finished. Wrote 0 entries.
haipproxy_1 | 2018/03/10 09:55:07| Took 0.00 seconds ( 0.00 entries/sec).
haipproxy_1 | Aborted (core dumped)
haipproxymaster_haipproxy_1 exited with code 134

A little later, requesting IPs returned 0 again.

@ResolveWang
Member

ResolveWang commented Mar 10, 2018

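Please check the following:
(1) Look at the state of the IPs in Redis; I recommend using redisdesktopmanager.
(2) Check whether your squid has any access control configured. If squid is exposed to the public internet with no access control, then congratulations: your server has almost certainly been found by port scanners and is being used as an open proxy. In that case, look into setting up access control for squid.
(3) Running out of IPs can indeed happen, although it is very rare. When it does, py_cli relaxes its criteria for selecting IPs, but it seems I forgot to apply the same relaxation in the squid-related code, so at times the number of proxies fetched is 0.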
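For point (2), the restriction is usually expressed with Squid's `acl ... src` and `http_access` directives. The sketch below only generates such a fragment from a list of client networks. It is an illustration under stated assumptions, not haipproxy's own configuration: the helper name `squid_acl_fragment`, the ACL name `trusted_clients`, and the example networks are all hypothetical.

```python
import ipaddress


def squid_acl_fragment(allowed_networks, acl_name="trusted_clients"):
    """Build squid.conf lines that allow only the given client networks.

    `acl_name` and the networks are illustrative; pick values that match
    your own deployment.
    """
    lines = []
    for net in allowed_networks:
        # Validate and normalise the address or CIDR block before emitting it.
        network = ipaddress.ip_network(net, strict=False)
        lines.append(f"acl {acl_name} src {network}")
    lines.append(f"http_access allow {acl_name}")
    # Everything not explicitly allowed above is rejected.
    lines.append("http_access deny all")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example networks only: a Docker bridge network and one external client.
    print(squid_acl_fragment(["172.18.0.0/16", "203.0.113.7"]))
```

Squid evaluates `http_access` rules in order and stops at the first match, so a fragment like this only has an effect if it is placed before any broader `allow` rule already present in squid.conf.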

@BadReese
Author

I'll check the first point shortly.
As for the second point, I had already commented out these lines in the Dockerfile:
RUN apt install squid -yq
RUN sed -i 's/http_access deny all/http_access deny all/g' /etc/squid/squid.conf
RUN cp /etc/squid/squid.conf /etc/squid/squid.conf.backup
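Those are the three squid-related lines. However, I did not comment out squid-update.py in Run.sh.
I have no use for squid and it doesn't really fit my needs, so I'd like to ask for an official way to disable squid in the Docker image, or to not install it at all. Thanks.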

@ResolveWang
Member

Isn't the log you posted a squid log?

haipproxy_1 | 2018/03/10 09:55:07| Closing HTTP port [::]:3128

Port 3128 is clearly squid's port.

In principle, once the squid install commands are commented out, squid should not be able to start at all.

If you don't want squid installed, then besides deleting the related lines from the Dockerfile, also change the contents of run.sh to:

#!/bin/bash
nohup python crawler_booter.py --usage crawler common > crawler.log 2>&1 &
nohup python scheduler_booter.py --usage crawler common > crawler_scheduler.log 2>&1 &
nohup python crawler_booter.py --usage validator init > init_validator.log 2>&1 &
nohup python crawler_booter.py --usage validator https > https_validator.log 2>&1 &
python scheduler_booter.py --usage validator https

Something along those lines.

Then check with telnet or docker exec; squid should no longer be present.

Then see whether the problem still occurs.
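The telnet check can also be scripted. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and the host and port in the usage example are assumptions (3128 is the port seen in the log above).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed target: the local machine and the squid port from the log.
    host, port = "127.0.0.1", 3128
    state = "open" if port_is_open(host, port) else "closed"
    print(f"{host}:{port} is {state}")
```

Running it from a machine outside your network against the server's public address shows whether the port is reachable from the internet, which is what matters for the open-proxy concern.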

@BadReese
Author

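I tested it: even with the squid parts removed from both run.sh and the Dockerfile, the squid service still gets started and the host still ends up acting as a proxy. Very strange.
So I'd still like to ask the maintainer to add an option that makes installing squid optional. Otherwise, if the port is left unchanged, I suspect many people who deploy this IP pool will have their servers used for other purposes.
Also, a reminder to other users: remember to change the splash host port as well.
Thanks for your hard work.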
