HTTP: the page may be gzipped or plain; I want one common way that unzips automatically when it is gzipped and does nothing otherwise. #323

Closed
kibear opened this issue Jul 21, 2016 · 12 comments

@kibear

kibear commented Jul 21, 2016

I want to use it in a web crawler, but some pages come back gzipped and others come back as plain HTML, and I do not know in advance whether a page is compressed. Is there a common way to unzip the page automatically when it is gzipped?

@igr
Member

igr commented Jul 21, 2016

Simply call response.unzip(). It checks whether the returned headers contain:

Content-Encoding: gzip

and, if so, unzips the content. After unzip() you can keep using the response as usual, since the content is no longer zipped.
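The behaviour described above (inflate only when the response carries Content-Encoding: gzip, otherwise leave the body alone) can be sketched with plain JDK classes. This is not Jodd's source: the class ContentDecoding and its methods are hypothetical names, and the sketch assumes the body has already been read into a byte array. It also treats "x-gzip" as gzip, since HTTP/1.1 asks recipients to consider the two equivalent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class ContentDecoding {

    private ContentDecoding() {}

    /** True if the Content-Encoding header value names gzip (x-gzip is an old alias). */
    static boolean isGzipEncoding(String contentEncoding) {
        if (contentEncoding == null) {
            return false;
        }
        String value = contentEncoding.trim().toLowerCase(Locale.ROOT);
        return value.equals("gzip") || value.equals("x-gzip");
    }

    /** Inflates the body when the header says gzip; otherwise returns it unchanged. */
    public static byte[] decodeBody(String contentEncoding, byte[] body) {
        if (!isGzipEncoding(contentEncoding)) {
            return body; // plain page: nothing to unzip
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException("Content-Encoding says gzip but body is not valid gzip", e);
        }
    }

    /** Demo/test helper: gzip-compresses data the way a server would. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // gz is closed here, so the trailer is written
    }

    public static void main(String[] args) {
        byte[] html = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        // The same call handles both kinds of response, like calling unzip() unconditionally.
        byte[] fromGzipped = decodeBody("gzip", gzip(html));
        byte[] fromPlain = decodeBody(null, html);
        System.out.println(new String(fromGzipped, StandardCharsets.UTF_8)); // <html>hello</html>
        System.out.println(new String(fromPlain, StandardCharsets.UTF_8));   // <html>hello</html>
    }
}
```

With Jodd itself, the advice in this thread is simply to call response.unzip() on the response returned by send(); the sketch only makes visible why a plain page passes through unchanged.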

@igr igr added the howto label Jul 21, 2016
@igr igr self-assigned this Jul 21, 2016
@igr
Member

igr commented Jul 21, 2016

Yes. It works even if the page is not gzipped.
Now you've got me thinking; I might add a flag to call this method every time :)

On Thu, Jul 21, 2016 at 11:28, govert notifications@github.com wrote:
ok, thanks. Do you mean that response.unzip() works fine in every case? Does calling unzip on a normal web page also work fine?



@kibear
Author

kibear commented Jul 21, 2016

OK, thanks.

@igr
Member

igr commented Jul 24, 2016

Is it working for you, @pzn4jc?

@kibear
Author

kibear commented Jul 28, 2016

OK, thanks for your answer. It works fine.
HTTP is really good, but I still have some issues when using it:

  1. When the response is gzipped, viewing the source in a desktop browser shows the correct page, but when I request it with this library the response is sometimes correct and sometimes not; it does not look stable.
  2. When it meets HTTPS, I get response code 503. What should I do?

When requesting HTTPS: sending with HttpRequest send() works, but sending with HttpBrowser sendRequest() does not.

I wrote some tests for HTTPS and found:
when I do not set connectionTimeout, I can get the source of the web page;
when I set connectionTimeout on the HttpRequest, I get an exception:

Exception in thread "main" jodd.http.HttpException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target; <--- sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:90)
at jodd.http.HttpRequest.open(HttpRequest.java:662)
at jodd.http.HttpRequest.open(HttpRequest.java:646)
at jodd.http.HttpRequest._send(HttpRequest.java:744)
at jodd.http.HttpRequest.send(HttpRequest.java:739)
at com.hiekn.spider.uyint.common.http.impl.JoddHttpReader.main(JoddHttpReader.java:36)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1886)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:276)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:270)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1341)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
at jodd.http.net.SocketHttpSecureConnection.init(SocketHttpSecureConnection.java:43)
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:85)
... 5 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:385)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:326)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:231)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323)
... 14 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
... 20 more
---[cause]------------------------------------------------------------------------
sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:326)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:231)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
at jodd.http.net.SocketHttpSecureConnection.init(SocketHttpSecureConnection.java:43)
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:85)
at jodd.http.HttpRequest.open(HttpRequest.java:662)
at jodd.http.HttpRequest.open(HttpRequest.java:646)
at jodd.http.HttpRequest._send(HttpRequest.java:744)
at jodd.http.HttpRequest.send(HttpRequest.java:739)

@igr
Member

igr commented Jul 30, 2016

  1. When the response is gzipped, viewing the source in a desktop browser shows the correct page, but when I request it with this library the response is sometimes correct and sometimes not; it does not look stable.

Hm, that is strange. When you say "sometimes not correct", do you mean on the same page, or on different pages? Maybe some web sites do not play by the HTTP rules and, for example, do not set the headers.

Would it be possible to give me a URL that does not work? I guess it is something trivial.
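If the suspicion is right that some sites send gzip without a Content-Encoding header (or label a plain body as gzip), one defensive option is to look at the body itself: every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952). The sketch below is a hypothetical helper, not part of Jodd, and it is an assumption that this is what goes wrong for the pages in question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipSniffing {

    private GzipSniffing() {}

    /** Every gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952). */
    public static boolean hasGzipMagic(byte[] body) {
        return body != null && body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    /** Inflates the body if it looks like gzip, whatever the headers claimed. */
    public static byte[] decodeBySniffing(byte[] body) {
        if (!hasGzipMagic(body)) {
            return body; // no magic bytes: treat as an uncompressed page
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Starts like gzip but is truncated or corrupt: hand back the raw bytes.
            return body;
        }
    }

    /** Test helper: gzip-compresses data the way a server would. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Returning the raw bytes for a corrupt stream is a design choice suited to a crawler that prefers a best-effort page over an exception; a stricter client might rethrow instead. An HTML page is very unlikely to begin with 0x1f, so false positives on plain text should be rare.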

@igr
Member

igr commented Jul 30, 2016

For the second issue, you need to add the missing certificate to Java as a trusted certificate. You can read more about this issue here
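The PKIX error means the JVM could not chain the server's certificate to anything in its trust store. The usual fix is importing the certificate into the JRE's cacerts with keytool -importcert, or pointing the JVM at another store with -Djavax.net.ssl.trustStore. For doing it in code, the sketch below builds an SSLContext from a trust store using only JDK APIs. The class ExtraTrust and its methods are hypothetical names; note that a store built by trustStoreWith holds only that one certificate, so the default CAs are no longer trusted. Whether a given Jodd version picks up such a context (for example after SSLContext.setDefault) is an assumption to check against its SocketHttpConnectionProvider.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public final class ExtraTrust {

    private ExtraTrust() {}

    /** Builds an in-memory trust store containing one X.509 certificate (PEM or DER). */
    public static KeyStore trustStoreWith(String alias, InputStream certificate) {
        try {
            KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
            store.load(null, null); // initialise an empty store
            Certificate cert = CertificateFactory.getInstance("X.509")
                    .generateCertificate(certificate);
            store.setCertificateEntry(alias, cert);
            return store;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("could not build trust store", e);
        }
    }

    /** SSLContext that trusts the given store; null means the JDK default (cacerts). */
    public static SSLContext sslContextTrusting(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null); // no client certificate
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not initialise TLS", e);
        }
    }
}
```

For example, SSLContext.setDefault(ExtraTrust.sslContextTrusting(store)) would make that context the JVM-wide default, which only makes sense for a process that talks to the sites covered by that store.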

@kibear
Author

kibear commented Jul 30, 2016

Yes, I mean on different pages, not the same page. I also found that when the "connectionTimeout" parameter is not set, the HTTPS page works fine; when I set connectionTimeout, the HTTPS page throws the exception. @IgorSpasic
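That HTTPS works without connectionTimeout but fails with it suggests the two code paths open the TLS socket differently. The thread does not say what commit c3b3b50 changed, so the sketch below only shows one common JDK pattern for an HTTPS connect timeout: connect a plain socket with a timeout, then layer TLS on top with SSLSocketFactory.createSocket(Socket, String, int, boolean), passing the host name so the JDK can send it as SNI. That a missing server name is what triggered the PKIX error here is an assumption, not something the thread confirms. TimedTlsConnect and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public final class TimedTlsConnect {

    private TimedTlsConnect() {}

    /** Opens a plain TCP connection, failing if it is not established within connectTimeoutMs. */
    public static Socket connectWithTimeout(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Connects with a timeout, then layers TLS over the already-connected socket. */
    public static SSLSocket connectTls(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket raw = connectWithTimeout(host, port, connectTimeoutMs);
        try {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            // Passing host lets the JDK send it as the SNI server name, so a server
            // hosting several sites can present the matching certificate.
            SSLSocket tls = (SSLSocket) factory.createSocket(raw, host, port, true);
            // Also ask JSSE to check that the certificate matches host.
            SSLParameters params = tls.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            tls.setSSLParameters(params);
            tls.startHandshake();
            return tls;
        } catch (IOException e) {
            raw.close();
            throw e;
        }
    }
}
```

Keeping the TCP connect separate from the TLS handshake means the timeout applies only to establishing the connection; a read timeout would still be set separately with setSoTimeout.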

@kibear kibear closed this as completed Aug 1, 2016
@kibear kibear reopened this Aug 1, 2016
@igr
Member

igr commented Aug 1, 2016

Working on this :) I was able to reproduce it.

Please send me the URLs of the sites where you had the gzip issue!

@igr igr added this to the 3.8 milestone Aug 1, 2016
@igr igr added the bug label Aug 1, 2016
@kibear
Author

kibear commented Aug 1, 2016

@igr igr closed this as completed in c3b3b50 Aug 3, 2016
@igr
Member

igr commented Aug 3, 2016

I just made a fix for this HTTPS and connectionTimeout issue!

Please, if you find any other problematic URL, just open a new issue!

@kibear
Author

kibear commented Aug 3, 2016

ok, thanks.
