Minimal centralized request fingerprints #4524
…entralized-request-keys
Codecov Report
@@ Coverage Diff @@
## master #4524 +/- ##
==========================================
+ Coverage 88.79% 88.84% +0.04%
==========================================
Files 163 163
Lines 10671 10744 +73
Branches 1819 1834 +15
==========================================
+ Hits 9475 9545 +70
- Misses 923 926 +3
Partials 273 273
@Gallaecio ❤️ the design, and I can see how this can be extended in Scrapy itself with something more elaborate if we decide to do so. The only thing I'd change is making REQUEST_FINGERPRINTER a Scrapy extension instead of a callable, i.e. load it using create_instance. This would allow the fingerprinter to access Scrapy settings, which would make it possible to implement e.g. a fingerprinter which reads a SESSION_IDENTIFIERS option and removes those identifiers from the request GET arguments before taking a fingerprint. There is no reason not to use create_instance for all extension points we provide :) It always comes up later that we'd like to add from_crawler support to some component.
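To make the suggestion above concrete, here is a minimal sketch of what such a fingerprinter could look like. It is not code from this PR: the class name, the `fingerprint` method name and the SESSION_IDENTIFIERS setting are assumptions for illustration, and it only relies on the crawler exposing `settings.getlist()` and on requests having `method` and `url` attributes.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class SessionAwareFingerprinter:
    """Hypothetical fingerprinter that ignores session parameters in URLs."""

    def __init__(self, session_identifiers=()):
        self.session_identifiers = frozenset(session_identifiers)

    @classmethod
    def from_crawler(cls, crawler):
        # SESSION_IDENTIFIERS is the illustrative option from the comment,
        # not an existing Scrapy setting.
        return cls(crawler.settings.getlist("SESSION_IDENTIFIERS"))

    def _strip_session_args(self, url):
        parts = urlsplit(url)
        # Drop the session parameters and sort the rest, so that argument
        # order does not change the fingerprint.
        query = sorted(
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in self.session_identifiers
        )
        return urlunsplit(parts._replace(query=urlencode(query), fragment=""))

    def fingerprint(self, request):
        # Simplification: only the method and the cleaned URL are hashed;
        # the request body and headers are ignored in this sketch.
        canonical = f"{request.method} {self._strip_session_args(request.url)}"
        return hashlib.sha1(canonical.encode("utf-8")).digest()
```

Loading the component through from_crawler is what gives it access to the settings; a plain callable would have no clean way to read SESSION_IDENTIFIERS.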
I like the idea, but I’m not convinced about 0092962. It may be cleaner if we only support Also, being able to define |
…tter, but before extension initialization
I wrote a new benchmark to test alternative JSON libraries, as well as including the performance of The results seem to indicate that But the results vary greatly from one run to another on my system, so I’m not sure adding a new dependency is worth the performance boost, especially since the new system gives freedom to implement fingerprinters focused on performance only, if desired.
Note: release procedure updated to make sure that last commit gets handled properly before release: https://github.com/scrapy/scrapy/wiki/Scrapy-release-procedure#release-notes
Even though this is the default value for backward compatibility reasons,
it is a deprecated value.

- ``'VERSION'``
Is it "VERSION" as a string, or is the plan to replace it with a certain version (e.g. "2.6")?
The same applies for "PREVIOUS_VERSION"; sorry if the question is stupid, and answered in some previous comment :)
They are both meant to be replaced, see #4524 (comment)
Hey @Gallaecio! I re-checked the PR, and it still looks great. Are you OK with merging it after resolving merge conflicts?
I count on tests passing now. @wRAR, @kmike, please check the latest merge to resolve conflicts, as it includes some non-trivial changes.
👍
Fork of #4113, with minimal code changes, based on @nyov’s feedback.
Fixes #900, fixes #3420, closes #4113, fixes #4762.
To do after merging:
PS: Please, do not merge without squashing.