Returns 200, but no full text #12

Closed
yilu1015 opened this issue Jul 5, 2020 · 15 comments

yilu1015 commented Jul 5, 2020

Using your method, I successfully obtained the three cookies and then constructed the request headers and data.

data = 'docID=97e53a7245264aaeacd4abde01272f72&ciphertext=1101101 1101110 110101 1110111 110100 110010 110010 1110010 1110010 1110001 1101011 110010 1110010 1110001 110100 110100 1100011 111000 111000 1101101 111001 110111 110010 1101011 110010 110000 110010 110000 110000 110111 110000 110100 1010100 1010010 110100 1101101 1111010 1011000 1000100 110000 110000 1110111 1101111 1000001 1000110 1100010 1001111 1000101 1000111 1001110 101011 1110010 1000111 1100111 111101 111101&cfg=com.lawyee.judge.dc.parse.dto.SearchDataDsoDTO@docInfoSearch&__RequestVerificationToken=2zf38eflt4p7u9nu86iv7jwf'

The request returns 200 but no full text, as shown below. What is going on here?

{'code': 1, 'description': None, 'secretKey': 'v8fMYwbUVeZ35ogwx4wquzX4', 'result': 'DJBOPNsJwcs=', 'success': True}
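For anyone trying to sanity-check a ciphertext value: judging only from the samples in this thread, it looks like plain text in which every character is written as the binary form of its code point. The sketch below decodes such a string (the helper names are made up for illustration). Applied to the value posted above, it gives a 24-character string, an eight-digit date-like string, and a Base64-looking tail. What the server does with those parts is not stated in this thread.

```python
def decode_binary_text(ciphertext: str) -> str:
    """Turn groups of binary digits (separated by spaces or '+') into text."""
    groups = ciphertext.replace("+", " ").split()
    return "".join(chr(int(group, 2)) for group in groups)


def encode_binary_text(text: str) -> str:
    """Inverse of decode_binary_text: one binary group per character."""
    return " ".join(format(ord(ch), "b") for ch in text)


# Groups 25 to 32 of the ciphertext posted above.
date_part = "110010 110000 110010 110000 110000 110111 110000 110100"
print(decode_binary_text(date_part))  # 20200704
```

Decoding does not tell you whether the server accepts the value, but it does make an obviously malformed ciphertext easy to spot.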
@nciefeiniu
Owner

@yilu1015 Is the ciphertext parameter correct?

@nciefeiniu
Owner

nciefeiniu commented Jul 6, 2020

import requests


headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0) Gecko/20100101 Firefox/76.0",
    "Cookie": "HM4hUBT0dDOn80S=_wkz59snzaPmdO69oJWw7RwKvOLLkQX0DikwmBGlkQPmxpSSOx12K0bMQsZbsAnM; HM4hUBT0dDOn80T=4Cy7lu21LZTzgJixkyHmmEWZbRc8ka8p5n4j3VjY4QGDG1SvaYh_7s905F2vIqvUGdeqkJsKzN4nn207l2ZD5vCAYFgItnHeaHE9BgfeeFdxdrkPoybXDL1RJ7ZP_5WTlOs5R7awSBB_ft9xbGTXkYY4Yk3Cg4H5_iirToB6gyrJi67k95Ce8R.uGobThrdX2fAuiZF2ME1Wi9uIefdYS9UEajx44DAw2oi3R7X6o7XKmuyrMkU7h1DSW3I5XrUYu3wrrpNRSiTZoFndIDsOuiA9iKs2RnTnS3.v9Gi34m_msrGtVPkMlqjZxrXHzsjfKtO7; SESSION=ff5520e2-66e0-4bce-998e-02062e95b414"

}


res = requests.post(url="http://wenshu.court.gov.cn/website/parse/rest.q4w", data={
    "docId": "83451b69d9ff46b6af96abeb00d51326",
    "ciphertext": "110010+1000110+1100100+110100+1001001+1001101+1001010+1100001+1100100+1000100+1110111+110011+1001011+110010+1001100+1110101+110101+1110110+1101000+1101011+110000+1010010+1101101+1001111+110010+110000+110010+110000+110000+110111+110000+110110+1110100+110110+1101111+110010+1101001+1110011+1010011+110010+110100+1101110+1001000+1000011+1000100+1110110+1110111+1110001+110111+1000110+1001110+1110110+110100+1000001+111101+111101",
    "cfg": "com.lawyee.judge.dc.parse.dto.SearchDataDsoDTO@docInfoSearch",
    "__RequestVerificationToken": "SnhEAA5fkrhLG4Yqhv6ySDvi"
}, headers=headers)


print(res.text)

It works when I test it, no problem @yilu1015

The response is as follows:

{"code":1,"description":null,"secretKey":"YuNfjorc70mO1Cllf6Isxf2B","result":"","success":true}
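A side note on separators, offered as something to check rather than a diagnosis: the first post joins the binary groups with spaces, while the snippet above joins them with `+` inside a dict. When a dict is form-encoded, a literal `+` and a space end up as different bytes on the wire. The standard library's `urlencode` follows the same `application/x-www-form-urlencoded` rules and shows the difference; the thread does not say which form the server expects.

```python
from urllib.parse import urlencode, parse_qs

with_plus = urlencode({"ciphertext": "110010+1000110"})
with_space = urlencode({"ciphertext": "110010 1000110"})

print(with_plus)   # ciphertext=110010%2B1000110
print(with_space)  # ciphertext=110010+1000110

# What a standard form decoder recovers from each body.
print(parse_qs(with_plus))   # {'ciphertext': ['110010+1000110']}
print(parse_qs(with_space))  # {'ciphertext': ['110010 1000110']}
```

Comparing the encoded body with what the browser sends in a packet capture is the reliable way to settle which one is right.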

@yilu1015
Author

yilu1015 commented Jul 6, 2020

Regarding whether the ciphertext parameter is correct: it should be fine. I used it to fetch the list entries successfully. All I get back from this request is:

{'code': 1, 'description': None, 'secretKey': 'c6LrFHW57hQQraFRWLcgLcFh', 'result': '7DMzlEH7ahk=', 'success': True}

@yilu1015
Author

yilu1015 commented Jul 9, 2020

@nciefeiniu Is there a method for working out how the parameters should be set? I read #4 at the time and assumed that was not needed.

@nciefeiniu
Owner

@yilu1015 See #13 (comment)

@huangsiyuan924

I am also getting a 200 response with no full text. Has the original poster solved this?

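@yilu1015
Author

yilu1015 commented Aug 8, 2020

Sorry, I have been busy with other projects for the past two weeks and have not looked into this closely yet. Are you fetching the full text through the app or through the website? Feel free to refer to #13.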

@huangsiyuan924

It is solved now. There is one more problem, though: PyQt5 can fetch the cookie, but fetching it a second time in a row makes the process quit with "Process finished with exit code -1073741819 (0xC0000005)". Have you run into this as well?
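For reference, the two numbers in that message are the same value: the negative exit code is the signed 32-bit reading of 0xC0000005, which on Windows is the access-violation status (a native crash rather than a Python exception). A small sketch of the conversion, with a helper name chosen just for illustration; the thread does not establish what triggers the crash in PyQt5.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF


exit_code = -1073741819
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
```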

@yilu1015
Author

yilu1015 commented Aug 8, 2020

Oh? Where was the problem? After reading the maintainer's answer I assumed it was a cookie issue, and since that looked like it required setting up pyppeteer + asyncio, I had not tried it yet. So in the end it was the request setup after all? Thanks for the pointers!

@huangsiyuan924

The cookie obtained through PyQt was fine. In my case, the queryCondition field of the form data had an extra comma.
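A stray comma like this is easy to catch before the request goes out. The sketch below assumes queryCondition is sent as a JSON string; that format and the sample values are assumptions for illustration, since the thread does not show the field. The helper name is made up as well.

```python
import json


def check_query_condition(raw: str):
    """Return the parsed value, or raise ValueError pointing at the problem."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"queryCondition is not valid JSON at char {exc.pos}: {exc.msg}"
        ) from exc


good = '[{"key": "s8", "value": "03"}]'   # hypothetical values
bad = '[{"key": "s8", "value": "03"},]'   # same, with a trailing comma

print(check_query_condition(good))
try:
    check_query_condition(bad)
except ValueError as err:
    print(err)
```

This only catches commas that make the string invalid JSON; an extra comma inside a value would still need a comparison against the browser's request.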

@yilu1015
Author

yilu1015 commented Aug 9, 2020

Thanks for the hint. Below is the request data for my POST call, which looks fine to me. How did you set up your form data?

As for the PyQt exit problem, I am seeing the same thing. I am still testing full-text retrieval, so for now I just restart the Jupyter kernel. How to handle it in real use is something I am still hoping someone more experienced can advise on.

data = {
    'docID': '97e53a7245264aaeacd4abde01272f72',
    'ciphertext': make_ciphertext(),
    'cfg': 'com.lawyee.judge.dc.parse.dto.SearchDataDsoDTO@docInfoSearch',
    '__RequestVerificationToken': verification_token()
}


headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "wenshu.court.gov.cn",
    "Origin": "https://wenshu.court.gov.cn",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "cookie": cookie_string
}

@huangsiyuan924

The only thing I can see is that the "c" in the cookie key of your headers is lowercase, whereas the site sends it capitalized.
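On the lowercase cookie key: HTTP header field names are case-insensitive, so `cookie` and `Cookie` name the same header to a conforming server. Whether this particular site's protection layer treats them differently is not something the thread establishes. A minimal sketch of a case-insensitive comparison, with a helper name chosen for illustration:

```python
def same_headers(a: dict, b: dict) -> bool:
    """Compare two header dicts the way HTTP does: names ignore case."""
    def normalize(headers: dict) -> dict:
        return {name.lower(): value for name, value in headers.items()}

    return normalize(a) == normalize(b)


print(same_headers({"cookie": "SESSION=abc"}, {"Cookie": "SESSION=abc"}))  # True
print(same_headers({"cookie": "SESSION=abc"}, {"Cookie": "SESSION=xyz"}))  # False
```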

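@nciefeiniu
Owner

@yilu1015 Man, this problem of yours...

I had some free time today, so I took a look at it.

This approach can still fetch data at the moment.

The reason you cannot get the detail data is that the data sent with your request is wrong!

data = {
    'docId': '199a3ed2137846f1bf17ac1d01116358'  # note the capitalization: docId, not docID
}

I stared at it for a long time without spotting the mistake. A packet capture made it obvious right away.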
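Since this kind of mistake is hard to see by eye, a quick check against the field names that work can help. The sketch below uses the four field names from the working request earlier in this thread; the function name and constant are made up for illustration.

```python
# Field names taken from the working request posted earlier in this thread.
EXPECTED_FIELDS = ("docId", "ciphertext", "cfg", "__RequestVerificationToken")


def find_case_mismatches(data: dict, expected=EXPECTED_FIELDS) -> list:
    """Return (sent, expected) pairs that differ only by letter case."""
    by_lower = {name.lower(): name for name in expected}
    mismatches = []
    for key in data:
        wanted = by_lower.get(key.lower())
        if wanted is not None and wanted != key:
            mismatches.append((key, wanted))
    return mismatches


payload = {
    "docID": "97e53a7245264aaeacd4abde01272f72",
    "cfg": "com.lawyee.judge.dc.parse.dto.SearchDataDsoDTO@docInfoSearch",
}
print(find_case_mismatches(payload))  # [('docID', 'docId')]
```

An empty list means no key differs only by case; it does not prove the payload is otherwise correct.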

@nciefeiniu
Owner

nciefeiniu commented Aug 23, 2020

@yilu1015

[Two screenshots attached; images not included.]

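@hujisong

I would like to ask a basic question. I used your method to fetch the visit count shown on the wenshu home page:

res = requests.post(url="http://wenshu.court.gov.cn/website/parse/rest.q4w", data={
    "cfg": "com.lawyee.judge.dc.parse.dto.SearchDataDsoDTO@wsCountSearch",
    "__RequestVerificationToken": "Vy3UDgRWHtqQdQG14quguqDm"
}, headers=header01)

Both cfg and header01 were taken from the XHR request in the browser, but I never get any data back. The server returns a 405 error, and even though I am clearly using POST, the error message is: Request method 'GET' not supported

Here is the output after running it: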
<!doctype html><title>HTTP Status 405 – Method Not Allowed</title><style type="text/css">
H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}
A {color : black;}A.name {color : black;}.line {height: 1px; background-color: #525D76; border: none;}
</style>

HTTP Status 405 – Method Not Allowed


Type Status Report

Message Request method 'GET' not supported

Description The method received in the request-line is known by the origin server but not supported by the target resource.


Apache Tomcat/8.0.53
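One possible explanation, not confirmed anywhere in this thread: the request above goes to an `http://` URL, while the headers posted earlier give `https://wenshu.court.gov.cn` as the Origin. If the server answers the POST with a redirect, `requests` follows it and, like browsers, switches POST to GET for 301, 302 and 303 responses, which would produce exactly this "GET not supported" message. The sketch below mirrors that rewriting rule as a plain function (the function name is made up); that the site redirects at all is an assumption.

```python
def method_after_redirect(method: str, status: int) -> str:
    """Method a browser-like client uses for the follow-up request."""
    if status in (302, 303) and method != "HEAD":
        return "GET"
    if status == 301 and method == "POST":
        return "GET"
    return method  # 307 and 308 preserve the method


for status in (301, 302, 303, 307, 308):
    print(status, method_after_redirect("POST", status))
```

To check whether this is what happened, pass `allow_redirects=False` to `requests.post` and look at `res.status_code` and `res.headers.get("Location")`, or inspect `res.history` on the original call. If a redirect shows up, posting directly to the redirect target is the thing to try.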
