Add possibility to use Selector (bytes) added in parsel 1.8.0. #5906
Comments
We discussed this briefly yesterday and found that this may not be useful if we detect the response encoding at the same time as converting the body to str (I think the relevant code is at scrapy/scrapy/http/response/text.py, line 105 in 5a37af1).
As we can see from scrapy/scrapy/http/response/text.py Lines 62 to 72 in 5a37af1
Response object with valid encoding received without call of It means that on unified Selector scrapy/scrapy/selector/unified.py Lines 66 to 83 in 5a37af1
Summary
Please consider using, by default in Scrapy, the memory-efficient Selector that can accept bytes as input. It was added as a result of scrapy/parsel#210 (released in parsel 1.8.0).
Currently, UnifiedSelector (the subclass of parsel's Selector used in Scrapy) is configured to use Response.text as input.
Response.text is a property that creates a new object (Response.body converted to str), which is memory intensive and not needed.
scrapy/scrapy/selector/unified.py
Lines 66 to 73 in 5a37af1
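To illustrate the concern about Response.text, here is a minimal standard-library sketch (not Scrapy code) showing that decoding a bytes body produces a second object of comparable size, so both copies occupy memory at once. The names body and text are stand-ins for Response.body and Response.text, and the payload is made up for the example.

```python
import sys

# Hypothetical payload standing in for Response.body (about 1 MB of ASCII HTML).
body = ("<p>" + "x" * 1_000_000 + "</p>").encode("utf-8")

# Stand-in for what a text property has to do: decode the bytes into a new str.
text = body.decode("utf-8")

# Both objects are alive at the same time, so the memory cost roughly doubles
# for ASCII content (and can grow further for non-ASCII text in CPython).
body_size = sys.getsizeof(body)
text_size = sys.getsizeof(text)
combined = body_size + text_size

print(f"body: {body_size} bytes, text: {text_size} bytes, combined: {combined} bytes")
```

The exact numbers depend on the Python implementation and the characters in the document, so treat this only as an indication of the order of magnitude.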
Motivation
As mentioned in scrapy/parsel#210, using str input to create a Selector object is more memory intensive.
Describe alternatives you've considered
At this stage, the only alternative is to create a Selector object with bytes input separately inside the spider's callback method (and not use Response.selector).
Additional context
Removing other usages of Response.text would also significantly reduce the RAM required to process a response.