Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Change the truncation limit of Proxy TunnelError from 32 to 1000 #5007

Merged
merged 8 commits into from
May 11, 2021

Conversation

Bhavesh0327
Copy link
Contributor

@Bhavesh0327 Bhavesh0327 commented Feb 24, 2021

Resolves #4881, changed the truncation value of proxy tunnel error message from 32 to 64.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a good chance to move that number into a _-prefixed constant in the file.

@codecov
Copy link

codecov bot commented Feb 24, 2021

Codecov Report

Merging #5007 (833cec7) into master (ab037ab) will decrease coverage by 0.18%.
The diff coverage is 75.00%.

❗ Current head 833cec7 differs from pull request most recent head ba1bc1f. Consider uploading reports for the commit ba1bc1f to get more accurate results

@@            Coverage Diff             @@
##           master    #5007      +/-   ##
==========================================
- Coverage   88.01%   87.83%   -0.19%     
==========================================
  Files         158      158              
  Lines        9736     9738       +2     
  Branches     1435     1435              
==========================================
- Hits         8569     8553      -16     
- Misses        911      929      +18     
  Partials      256      256              
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http11.py 92.73% <75.00%> (+0.04%) ⬆️
scrapy/robotstxt.py 75.30% <0.00%> (-22.23%) ⬇️

@Bhavesh0327
Copy link
Contributor Author

yeah Makes sense, I will make the change

@Gallaecio
Copy link
Member

It would be great if you could increase the limit further to 1 000, as Mikhail suggested. After that, I think this is ready to land. Thanks!

@Bhavesh0327
Copy link
Contributor Author

I am a bit confused here now, @Gallaecio sir any suggestions? Should we decrease the value of truncation length from 1000 at line 170

@@ -167,7 +169,7 @@ def processProxyResponse(self, rcvd_bytes):
extra = {'status': int(respm.group('status')),
'reason': respm.group('reason').strip()}
else:
extra = rcvd_bytes[:32]
extra = rcvd_bytes[:_truncatedLength]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused as to why tests were passing, because you are using _truncatedLength instead of self._truncatedLength here.

Then I saw your comment about being confused yourself, and I think you mean that you don’t understand why the coverage complains. It turns out our current tests do not test this line, that’s why code coverage is incomplete. It was not covered before either. And new code should be covered by tests.

In this specific case, the code would have failed, which is why tests are so important. Please, have a look at the existing tests, and see if you can create a new test that covers this line. Otherwise, please let me know, maybe I can add one myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the corrections for other changes you told me and went through how the tests work. It would be great if you can add a test for this particular case or give a clear idea about how to go through it.

scrapy/core/downloader/handlers/http11.py Outdated Show resolved Hide resolved
scrapy/core/downloader/handlers/http11.py Outdated Show resolved Hide resolved
@Bhavesh0327 Bhavesh0327 changed the title [Fix] Change the truncation limit oof Proxy TunnelError from 32 to 64 [Fix] Change the truncation limit of Proxy TunnelError from 32 to 64 Mar 16, 2021
@Bhavesh0327 Bhavesh0327 changed the title [Fix] Change the truncation limit of Proxy TunnelError from 32 to 64 [Fix] Change the truncation limit of Proxy TunnelError from 32 to 1000 Mar 16, 2021
@Bhavesh0327
Copy link
Contributor Author

Uploading the coverage report along with the PR can actually make a difference, wondering why is it missing in .gitignore file.

@Gallaecio
Copy link
Member

I think you may have merged an outdated master branch from your fork, and that may be why the diff shows so many unrelated changes. Please, see https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork

@Bhavesh0327
Copy link
Contributor Author

Bhavesh0327 commented Mar 17, 2021

Yes, I understood the reasons after I pushed, it's quite intimidating to work on this kinda codebase that's why I am facing such stupid issues, I think I should close this pull request, fork it again and push making the changes again?
Also was quite confused, a few hours ago one test showed failed, but now suddenly it showed that it was passed.

@Gallaecio
Copy link
Member

I think I should close this pull request, fork it again and push making the changes again?

I would avoid that if possible. I would first try to fix the issue in the branch associated to this pull requests. You can try merging the upstream master branch as described in the documentation I linked. If that still does not work, you could try to rebase, instead of merging. And if nothing works, you could recreate the branch from the upstream master, and cherry-pick your changes or apply them again manually, which is like creating a separate pull request, but using the same branch name.

If you decide to create a separate pull request nonetheless, please remember to include a reference to this one for background, and use the current description wording (“Resolves #4881”) to mark your new pull request as one that aims to fix #4881 (by using that wording, merging your pull request closes that issue).

Also was quite confused, a few hours ago one test showed failed, but now suddenly it showed that it was passed.

I’m not sure which one it was. If it was codecov, yes, that jumps up and down sometimes, as long as the Diff tab on the codecov page (the one you find when you follow their link) looks OK, that’s fine.

@Bhavesh0327
Copy link
Contributor Author

I’m not sure which one it was. If it was codecov, yes, that jumps up and down sometimes, as long as the Diff tab on the codecov page (the one you find when you follow their link) looks OK, that’s fine.

It was the ubutntu test for pypy3, pypy3-pinned, 3.6-v7.2.0, and codecov test yes

@Gallaecio
Copy link
Member

The next and final step would be to write a test that for the case where line 171 is used, which as you can see in https://app.codecov.io/gh/scrapy/scrapy/compare/5007/diff is not covered by tests now and was not covered before.

I’m not sure how hard it would be to extend the current tests to cover that line. Please, give it a try if you can. If it’s hard to figure out how to do it (I’m not even sure we have tests specifically covering this part of the code, current coverage may be accidental due to unrelated tests), let me know, maybe I can give it a try.

@Bhavesh0327
Copy link
Contributor Author

Bhavesh0327 commented Apr 2, 2021

Apologies @Gallaecio for replying to this so late, I was stuck up due to urgent issues and the proposal.
I tried looking into the code, there was no test for this particular class i.e. TunnelingTCP4ClientEndpoint, and I couldn't understand why this class is mentioned nowhere else. If it's just the case about writing a test for that particular line 171, maybe I can just pass a random byte string of size more than 1000 to the processProxyResponse method in order to test it.

Maybe it can go around something like

from scrapy.core.downloader.handlers.http11 import TunnelingTCP4ClientEndpoint
class TunnelingTCP4ClientEndpointTestCase(TunnelingTCP4ClientEndpoint)
        def setUp(self):
            self.host = '127.0.0.1'
            self.port = reactor.listenTCP(0, test_site(), interface="127.0.0.1")
        def proxyLength(self):
            rcvd_bytes = b'some_string_of_size_more_than_the_truncated_length''
            recv_error = TunnelingTCP4ClientEndpoint.processProxyResponse(rcvd_bytes) 
            self.assertGreaterEqual(recv_error.size(), 1000)
                

@Gallaecio
Copy link
Member

I was hoping for something more like an end-to-end test.

I’ve had a look at the history of the original code, which was introduced in #2069, but the change did not include any test related specifically to that part. However, @redapple makes a good point there which I believe still applies:

Note:
HTTPS proxy tests need some love.
We could perhaps reuse https://github.com/fmoo/twisted-connect-proxy

I run some proxy-related tests with an exception on the relevant parts of code, to see what failed, and found a mitmproxy-powered test that comes close, but I don’t think mitmproxy will allow to reproduce what we want here (large status code messages). So getting proper tests here would require using a custom Twisted implementation (like https://github.com/fmoo/twisted-connect-proxy ) where we have total control over the proxy server.

Now, given how complicated adding test coverage for these changes would be, I’m OK with not adding tests here. I’ll create a separate issue to eventually work on those.

@kmike kmike merged commit ee682af into scrapy:master May 11, 2021
@kmike
Copy link
Member

kmike commented May 11, 2021

Looks good, thanks @Bhavesh0327!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proxy Tunnel Error message length
3 participants