Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove lambda attributes in default link extractor #4554

Merged
merged 2 commits into from May 8, 2020

Conversation

elacuesta
Copy link
Member

@elacuesta elacuesta commented May 8, 2020

Removes lambda attributes in the default link extractor, allowing instances to be serialized using pickle.

Motivated by scrapinghub/scrapy-autounit#74, but this would also mean that requests which include link extractors in their meta attribute would be serializable too. I imagine that might be a rare use case, most link extractors I've seen are stored as spider attributes, but the possibility exists.

I intentionally left extractors other than LxmlParserLinkExtractor unmodified because they're deprecated. Let me know if that's an issue, I could update them as well.

/cc @fcanobrash

Copy link
Member

@Gallaecio Gallaecio left a comment

Nice!

@codecov
Copy link

codecov bot commented May 8, 2020

Codecov Report

Merging #4554 into master will decrease coverage by 0.42%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4554      +/-   ##
==========================================
- Coverage   84.51%   84.08%   -0.43%     
==========================================
  Files         164      164              
  Lines        9904     9784     -120     
  Branches     1482     1475       -7     
==========================================
- Hits         8370     8227     -143     
- Misses       1266     1289      +23     
  Partials      268      268              
Impacted Files Coverage Δ
scrapy/linkextractors/lxmlhtml.py 95.00% <100.00%> (+2.89%) ⬆️
scrapy/utils/py36.py 20.00% <0.00%> (-80.00%) ⬇️
scrapy/robotstxt.py 73.33% <0.00%> (-24.20%) ⬇️
scrapy/utils/spider.py 62.50% <0.00%> (-12.50%) ⬇️
scrapy/extensions/memdebug.py 45.00% <0.00%> (-2.62%) ⬇️
scrapy/core/downloader/__init__.py 89.47% <0.00%> (-1.51%) ⬇️
scrapy/extensions/debug.py 44.73% <0.00%> (-1.42%) ⬇️
scrapy/extensions/throttle.py 44.89% <0.00%> (-1.11%) ⬇️
scrapy/extensions/telnet.py 81.03% <0.00%> (-0.64%) ⬇️
scrapy/extensions/memusage.py 48.35% <0.00%> (-0.57%) ⬇️
... and 66 more

@codecov
Copy link

codecov bot commented May 8, 2020

Codecov Report

Merging #4554 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #4554   +/-   ##
=======================================
  Coverage   84.51%   84.51%           
=======================================
  Files         164      164           
  Lines        9904     9908    +4     
  Branches     1482     1475    -7     
=======================================
+ Hits         8370     8374    +4     
- Misses       1266     1267    +1     
+ Partials      268      267    -1     
Impacted Files Coverage Δ
scrapy/linkextractors/lxmlhtml.py 95.00% <100.00%> (+2.89%) ⬆️
scrapy/core/downloader/__init__.py 89.47% <0.00%> (-1.51%) ⬇️

@kmike kmike merged commit 333910f into scrapy:master May 8, 2020
2 checks passed
@kmike
Copy link
Member

kmike commented May 8, 2020

👍

@elacuesta elacuesta deleted the linkextractor-remove-lambdas branch May 8, 2020
@ryonlife
Copy link

ryonlife commented May 15, 2020

I opened scrapinghub/scrapy-autounit#74, so just wanted to say thanks!

I also took a crack at a generalized fix inside of scrapy-autounit at scrapinghub/scrapy-autounit#75.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants