get_all_answers still raises a list index out of range error #48

Open
Altriaex opened this issue Mar 25, 2016 · 2 comments

Comments

@Altriaex

qid = '27099248'
q = Question('https://www.zhihu.com/question/'+qid)
A = [i for i in q.get_all_answers()]

This bug does not occur for every question, only for a small number of them, such as the qid above.

256 soup = BeautifulSoup(self.soup.encode("utf-8"))
257 print "j",j
--> 258 answer_soup = BeautifulSoup(answer_list[j])
259
260 if answer_soup.find("div", class_="zm-editable-content clearfix") == None:

IndexError: list index out of range

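I added print statements around line 258 and found that when i=1, answer_list has a length of only 16, while min(answers_num - i * 20, 20) = 20, so the index goes out of range.

It looks like soup was not able to fetch the remaining answers.
If for j in xrange(min(answers_num - i * 20, 20)): is changed to for j in xrange(len(answer_list)):

then the i=1 iteration passes,
but at i=2 the length of answer_list is 0.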
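
A minimal sketch of the loop change described above, assuming the library pages through answers 20 at a time. The names collect_answers, fetch_page, and fake_fetch are hypothetical and are not part of the library; fetch_page stands in for whatever request returns answer_list for page i. The idea is to cap the inner loop at the number of answers actually returned and to stop as soon as a page comes back empty.

```python
def collect_answers(answers_num, fetch_page, page_size=20):
    """Collect answers page by page without trusting the advertised count.

    answers_num: the count shown on the question page (may be too high).
    fetch_page:  hypothetical callable returning the answer list for page i.
    """
    collected = []
    pages = -(-answers_num // page_size)  # ceiling division
    for i in range(pages):
        answer_list = fetch_page(i)
        if not answer_list:
            # The site returned nothing more, so stop instead of indexing.
            break
        expected = min(answers_num - i * page_size, page_size)
        # Never index past what was actually returned.
        for j in range(min(expected, len(answer_list))):
            collected.append(answer_list[j])
    return collected


# Stand-in data mirroring the report: 50 advertised, 20 + 16 + 0 returned.
fake_pages = [["a%d" % n for n in range(20)],
              ["b%d" % n for n in range(16)],
              []]


def fake_fetch(i):
    return fake_pages[i] if i < len(fake_pages) else []


answers = collect_answers(50, fake_fetch)
print(len(answers))  # prints 36
```

The sketch is written for Python 3; under Python 2, as used in the traceback, range could be swapped for xrange.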

@egrcc
Owner

egrcc commented Mar 25, 2016

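This feels like a bug in Zhihu itself... It shows 50 answers, but there are actually only 36. Even on the Zhihu website, clicking the "More" button only loads 16 answers.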

@Altriaex
Author

http://www.zhihu.com/question/24671496

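This question shows 5 answers but only has four, so the i=0 branch raises the error as well.

It looks like this can be fixed by adding a check inside the j loop for i=0, right after soup is assigned:
if len(soup.find_all("div", class_="zm-item-answer")) == j:
    break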
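
As a rough illustration of the first-page check, the sketch below counts the div elements carrying the zm-item-answer class and uses that count as the loop bound. It uses the standard library's html.parser in place of BeautifulSoup so that it runs on its own; AnswerCounter, count_answers, and the sample markup are hypothetical and only stand in for the real page.

```python
from html.parser import HTMLParser


class AnswerCounter(HTMLParser):
    """Counts <div> start tags whose class list contains zm-item-answer."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "zm-item-answer" in classes:
            self.count += 1


def count_answers(html):
    parser = AnswerCounter()
    parser.feed(html)
    return parser.count


# Made-up first page: 5 answers advertised, only 4 present in the markup.
page = '<div class="zm-item-answer">text</div>' * 4
advertised = 5
available = count_answers(page)
safe_bound = min(advertised, available)

for j in range(safe_bound):
    pass  # process answer j here; j never exceeds the answers on the page

print(safe_bound)  # prints 4
```

Computing the bound once before the loop has the same effect as the break inside it, and it avoids re-running the search on every iteration.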
