Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to make Scrapy Item class customizable via configuration #145

Merged
merged 9 commits into from
May 13, 2020

Conversation

thihara
Copy link
Contributor

@thihara thihara commented Apr 22, 2020

No description provided.

thihara and others added 3 commits April 20, 2020 07:15
The LOG_ENABLED variable is not honored in the spider (using RecursiveSpider) because it's being unset here.
Stopped the LOG_ENABLED variable from being unset
@fhamborg fhamborg self-assigned this Apr 24, 2020
@fhamborg
Copy link
Owner

fhamborg commented Apr 24, 2020

Hi, thanks for the PR. Could you please elaborate which problem these changes solve, i.e., what would be the use case when one wants to customize the ScrapyItem? Thank you!

@thihara
Copy link
Contributor Author

thihara commented Apr 24, 2020

Hi,

Sorry I should've explained better.

If I want to modify the plugin to include secondary processing - let's say add a word count or something, the pipeline can't add additional fields to the current Scrapy Item object being moved forward in the pipeline.

This PR is to allow a sub class of the current Scrapy Item class to be substituted via the config.cfg file. This will allow the pipeline to be customized extensively - without forking the project.

@fhamborg
Copy link
Owner

Thanks for your elaborations. After reviewing the code, I think there are a few things that should be addressed before it can be merged into master.

  1. Because in the thihara:item_class branch only the config.cfg but not the config_lib.cfg is changed to contain the new attribute, your changes cannot be used in library mode, but only in CLI mode. I think the feature makes sense in both modes and for consistency thus should be added to lib mode, too.
  2. f-strings were used, which required py>=3.6, whereas news-please also runs on py3.5. Could you replace the f-strings with "{}".format()?
    Please let me know if you have any questions and thanks for the work you already put into this.

@fhamborg fhamborg self-requested a review April 27, 2020 15:28
Copy link
Owner

@fhamborg fhamborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my other comment

@thihara
Copy link
Contributor Author

thihara commented Apr 28, 2020

@fhamborg I didn't think there would be much use for setting up an customized pipeline in the library mode. For me using library mode means you would use existing pipeline elements and add your own customizations programmatically, in which case you have complete control anyway.

But I'll defer to you on this and add the configuration to the config_lib.cfg file.

Sorry about the f strings - should've realized that would increase the library supported python version. I'll fix it too.

@fhamborg
Copy link
Owner

@fhamborg I didn't think there would be much use for setting up an customized pipeline in the library mode. For me using library mode means you would use existing pipeline elements and add your own customizations programmatically, in which case you have complete control anyway.

Thanks for the hint. Do I understand you correctly that the changes you made initially in the PR are for the use case of being in CLI mode and having news-please set up by pypi?

@thihara
Copy link
Contributor Author

thihara commented Apr 28, 2020

@fhamborg That is correct.

@thihara thihara requested a review from fhamborg April 28, 2020 08:36
@thihara
Copy link
Contributor Author

thihara commented Apr 28, 2020

@fhamborg Changes are done. Please review and let me know if there are additional issues.

@thihara
Copy link
Contributor Author

thihara commented May 2, 2020

@fhamborg Is there anything else blocking this PR?

@fhamborg
Copy link
Owner

fhamborg commented May 4, 2020

Sorry for the delay. Before I can merge this PR, I need to comprehensively test the changes that come with it, as they are partly in core files of news-please.

@thihara
Copy link
Contributor Author

thihara commented May 4, 2020

@fhamborg Understood. Let me know if there's anything I can do to help speed things up.

Copy link
Owner

@fhamborg fhamborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minors, other than that we can merge. Thanks for your effort!

@classmethod
def from_string(cls, class_name):
if "." not in class_name:
raise ImportError("{0} does't look like a module path".format(class_name))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. please replace does't with doesn't
  2. Why this test condition on containing "."? IIRC, is not only some.path.modulename a valid path, but for example only modulename is also a valid import path.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename the file to a more descriptive name, e.g., class_loader.py? Because currently it only contains the ClassLoader class and nothing else, thus we should go with the more descriptive name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhamborg About no2; the fully qualified name will be module.class at least. The use of a non names-paced class seemed rare enough I didn't see the value in handling that case. Do you disagree?

I'll do the other changes and update the PR soon. Apologies about the long delay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhamborg Changes are done.

…ing for loading custom news item module and class. Fixed a typo.
@thihara thihara requested a review from fhamborg May 12, 2020 02:39
@fhamborg fhamborg merged commit 07174cc into fhamborg:master May 13, 2020
@fhamborg
Copy link
Owner

Thanks for the PR and the subsequent changes you implemented. I will keep this in the master branch for a while and if no issues arise push a new version to Pypi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants