Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4005: Move ItemLoader to a new library #4516

Merged
merged 59 commits into from Jul 15, 2020

Conversation

ejulio
Copy link
Contributor

@ejulio ejulio commented Apr 27, 2020

Fix #4005

Installed the new library itemloaders from github for now.
Remove tests that are already in itemloaders. Kept some that seem to be a bit specific to scrapy. Not completely on this removal. Thoughts?

Made scrapy's ItemLoader an extension of the base class from the library an added support for Item

Removed processors and imported them from itemloaders

Removed common.py with wrap_loader_context which moved to the new library

Before merging:

  • Define the new library name
    • itemloaders
    • data_loaders: move away from scrapy's idea of items
    • xhtml_loaders: loads data from any XHTML text/parsel selector
  • Update scrapy docs mentioning the library

scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/common.py Outdated Show resolved Hide resolved
scrapy/loader/processors.py Outdated Show resolved Hide resolved
scrapy/loader/processors.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
tests/test_loader.py Outdated Show resolved Hide resolved
@kmike
Copy link
Member

kmike commented Apr 27, 2020

As for the package name, my vote is for itemloaders.

@ejulio
Copy link
Contributor Author

ejulio commented Apr 29, 2020

@kmike @Gallaecio @wRAR @victor-torres
Thanks for the comments.
JUst left some thoughts on the deprecation and imports from the new library.
Once we get it sorted out, I'll update it.

@ejulio
Copy link
Contributor Author

ejulio commented May 2, 2020

@victor-torres @kmike @Gallaecio @wRAR
I updated the changes to log warnings whenever something that was ported is used.
Not sure if that's the best approach, so fell free to raise any concerns about it.

Also, updated the documentation removing most of the content.
The idea is to include a link to itemloaders library docs once it is up and running.
Regarding the docs, not sure if we should keep some sort of TOC pointing to stuff that was moved from scrapy..

docs/topics/loaders.rst Outdated Show resolved Hide resolved
docs/topics/loaders.rst Show resolved Hide resolved
docs/topics/loaders.rst Show resolved Hide resolved
docs/topics/loaders.rst Outdated Show resolved Hide resolved
scrapy/loader/processors.py Outdated Show resolved Hide resolved
docs/topics/loaders.rst Outdated Show resolved Hide resolved
@ejulio ejulio requested a review from Gallaecio Jul 3, 2020
docs/topics/loaders.rst Outdated Show resolved Hide resolved
docs/topics/loaders.rst Outdated Show resolved Hide resolved
docs/topics/loaders.rst Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
scrapy/loader/__init__.py Outdated Show resolved Hide resolved
ejulio and others added 8 commits Jul 8, 2020
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
@ejulio ejulio requested a review from Gallaecio Jul 8, 2020
docs/topics/loaders.rst Outdated Show resolved Hide resolved
kmike
kmike approved these changes Jul 10, 2020
Copy link
Member

@kmike kmike left a comment

Looks good to me, besides a small question about docs. Thanks for the monumental work @ejulio!

Copy link
Member

@Gallaecio Gallaecio left a comment

Looking at the coverage of the diff, we should add a test about the deprecation of wrap_loader_context.

Looking at the coverage changes, it looks like we should also deprecate scrapy.utils.misc.extract_regex.

scrapy/loader/__init__.py Outdated Show resolved Hide resolved
tests/test_loader.py Outdated Show resolved Hide resolved
tests/test_loader_deprecated.py Outdated Show resolved Hide resolved
@ejulio
Copy link
Contributor Author

ejulio commented Jul 13, 2020

@Gallaecio , I've updated the PR with your ideas.
Just bear in mind in that, extract_regex was already in parsel.utils, so I just added a simple test in test_loader_deprecated so we don't need a new file only for this case.

Copy link
Contributor

@victor-torres victor-torres left a comment

LGTM 🚀

Copy link
Member

@Gallaecio Gallaecio left a comment

Awesome work!

@Gallaecio Gallaecio merged commit 38496a0 into scrapy:master Jul 15, 2020
2 checks passed
@ejulio
Copy link
Contributor Author

ejulio commented Jul 15, 2020

I can't believe this was finally merged 🎉 😛
Thanks for all the effort here @victor-torres @Gallaecio @kmike 👏

@ejulio ejulio deleted the feat-itemloaders branch Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

move item loaders to a separate library
7 participants