HTTP: a common way to handle gzipped and normal web pages (auto-unzip if gzipped, no unzip otherwise) #323

kibear · 2016-07-21T03:34:11Z

I want to use it in a web crawler, but some web pages are returned gzipped and others are returned uncompressed. I do not know in advance whether a page is compressed, so I want a common way to handle both: when it encounters gzip, can it unzip the web page automatically?

igr · 2016-07-21T06:37:51Z

Simply use the method response.unzip(). It checks whether the returned headers contain:

Content-Encoding: gzip

and, if so, unzips the content. After unzip() you continue to use the response as usual, since the content is no longer zipped.
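
To illustrate the check described above, here is a minimal sketch that uses only the JDK. It is not Jodd's implementation: the class name BodyDecoder and its methods are hypothetical, and it assumes you already have the Content-Encoding header value and the raw body bytes in hand. It gunzips the body only when the header says gzip and otherwise returns the body untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Jodd: decodes a body based on Content-Encoding.
final class BodyDecoder {

    /** Returns the body unchanged unless the Content-Encoding value is "gzip". */
    static byte[] decode(String contentEncoding, byte[] body) throws IOException {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return body; // normal page: nothing to unzip
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Gzips bytes in memory; used here only to fake a compressed response. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        // Gzipped response: decoded back to the original page.
        System.out.println(new String(decode("gzip", gzip(page)), StandardCharsets.UTF_8));
        // Normal response without the header: passed through as-is.
        System.out.println(new String(decode(null, page), StandardCharsets.UTF_8));
    }
}
```

With a crawler, the same call can be applied to every response, because a page without the gzip header simply comes back unchanged.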

igr · 2016-07-21T09:31:09Z

Yes. It will work even if the page is not gzipped.
Now you got me thinking, I might add a flag to call this method every time :)

On Thu, Jul 21, 2016 at 11:28, govert notifications@github.com wrote:
OK, thanks. Do you mean that calling response.unzip() works fine in every case? Even for a normal web page, calling the unzip method also works fine?


kibear · 2016-07-21T09:33:50Z

OK, thanks.

igr · 2016-07-24T20:07:28Z

Is it working for you @pzn4jc ?

kibear · 2016-07-28T07:24:13Z

OK, thanks for your answer. It works fine.
HTTP is really good, but I still have some issues when using it:

When the response is gzipped, the source looks correct when I view it in a desktop browser. But when I request it with this library, the response is sometimes correct and sometimes incorrect; it does not seem stable.
When it encounters HTTPS, I get response code 503. What should I do?

When requesting HTTPS:
HttpRequest send() works, but HttpBrowser sendRequest() does not.

I wrote some tests for HTTPS and found:
When I do not set connectionTimeout, I can get the source of the web page.
When I set connectionTimeout on HttpRequest, I get an exception:

Exception in thread "main" jodd.http.HttpException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target; <--- sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:90)
at jodd.http.HttpRequest.open(HttpRequest.java:662)
at jodd.http.HttpRequest.open(HttpRequest.java:646)
at jodd.http.HttpRequest._send(HttpRequest.java:744)
at jodd.http.HttpRequest.send(HttpRequest.java:739)
at com.hiekn.spider.uyint.common.http.impl.JoddHttpReader.main(JoddHttpReader.java:36)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1886)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:276)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:270)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1341)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
at jodd.http.net.SocketHttpSecureConnection.init(SocketHttpSecureConnection.java:43)
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:85)
... 5 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:385)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:326)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:231)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323)
... 14 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
... 20 more
---[cause]------------------------------------------------------------------------
sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:326)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:231)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
at jodd.http.net.SocketHttpSecureConnection.init(SocketHttpSecureConnection.java:43)
at jodd.http.net.SocketHttpConnectionProvider.createHttpConnection(SocketHttpConnectionProvider.java:85)
at jodd.http.HttpRequest.open(HttpRequest.java:662)
at jodd.http.HttpRequest.open(HttpRequest.java:646)
at jodd.http.HttpRequest._send(HttpRequest.java:744)
at jodd.http.HttpRequest.send(HttpRequest.java:739)

igr · 2016-07-30T05:39:49Z

"When the response is gzipped, the source looks correct when I view it in a desktop browser. But when I request it with this library, the response is sometimes correct and sometimes incorrect; it does not seem stable."

Hm, that is strange. When you say "sometimes incorrect", do you mean on the same page or on different pages? Maybe some web sites do not play by the HTTP rules and do not set the headers, for example.

Would it be possible to give me a URL that does not work? I guess it is something trivial.

igr · 2016-07-30T05:41:34Z

For the second issue, you need to add the missing certificate to Java as a trusted certificate. You can read more about this issue here
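
As a rough illustration of what trusting an extra certificate means on the Java side, the sketch below builds an in-memory trust store holding one certificate and an SSLContext backed by it, using only JDK classes. The class name ExtraTrust, its methods, and the alias are hypothetical. How (or whether) such a context can be plugged into the HTTP client is not covered here and is an assumption left to the reader; the more common route is importing the certificate into the JVM's trust store. Note that this store contains only the given certificate, so it replaces the JDK's default trusted roots rather than extending them.

```java
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, JDK only: trust one explicitly supplied certificate.
final class ExtraTrust {

    /** Reads one X.509 certificate (PEM or DER) from a stream. */
    static Certificate readCertificate(InputStream in) throws Exception {
        return CertificateFactory.getInstance("X.509").generateCertificate(in);
    }

    /** Builds an in-memory trust store that contains only the given certificate. */
    static KeyStore trustStoreWith(String alias, Certificate cert) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // start from an empty store
        ks.setCertificateEntry(alias, cert);
        return ks;
    }

    /** Creates a TLS context whose trust decisions come from the given store. */
    static SSLContext sslContextTrusting(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

A certificate file exported from the failing site (for example with a browser) could be passed to readCertificate through a FileInputStream; the file itself is assumed here, not provided by the thread.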

kibear · 2016-07-30T06:46:12Z

Yes, I mean on different pages, not the same page. I also found that when the "connectionTimeout" param is not set, the HTTPS page works fine; when I set connectionTimeout, the HTTPS page throws the exception. @IgorSpasic

igr · 2016-08-01T08:28:14Z

Working on this :) I was able to reproduce.

Please send me the URLs of sites where you had the gzip issue!

kibear · 2016-08-01T08:30:16Z

https://www.itjuzi.com/investfirm/6486

igr · 2016-08-03T11:53:04Z

I just made a fix for this HTTPS and connectionTimeout issue!

Please, if you find any other problematic URL, just open a new issue!

kibear · 2016-08-03T13:30:15Z

OK, thanks.

igr added the howto label Jul 21, 2016

igr self-assigned this Jul 21, 2016

kibear closed this as completed Aug 1, 2016

kibear reopened this Aug 1, 2016

igr added this to the 3.8 milestone Aug 1, 2016

igr added the bug label Aug 1, 2016

igr closed this as completed in c3b3b50 Aug 3, 2016
