Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from_crawler constructor for feed exporters and storages #1605

Merged

Conversation

@jdemaeyer
Copy link
Contributor

@jdemaeyer jdemaeyer commented Nov 16, 2015

Addresses #1567. This moves the "which constructor should I call" logic from the middleware manager into a util function, and re-uses it in the feed exporter extension.

Feed exporters and storages can now provide a from_crawler() class method and properly access settings.

@@ -116,4 +116,28 @@ def md5sum(file):
def rel_has_nofollow(rel):
"""Return True if link rel attribute has nofollow type"""
return True if rel is not None and 'nofollow' in rel.split() else False


def construct_from_settings_or_crawler(objcls, settings, crawler, *args,

This comment has been minimized.

@jdemaeyer

jdemaeyer Nov 16, 2015
Author Contributor

Any suggestions for a shorter name? ;)

This comment has been minimized.

@eliasdorneles

eliasdorneles Nov 16, 2015
Member

create_instance ?

This comment has been minimized.

@nramirezuy

nramirezuy Nov 16, 2015
Contributor

@jdemaeyer just call it from_crawler, from_settings is deprecated for most components already.

This comment has been minimized.

@eliasdorneles

eliasdorneles Nov 16, 2015
Member

hm, from_crawler is a good factory method name only in a class, it's a bit weird for an util function IMO.

This comment has been minimized.

@nramirezuy

nramirezuy Nov 16, 2015
Contributor

woops didn't even noticed it was a util function. I think the point is we should deprecate the few instances when this happens, so there is no point on having such a function. this makes sense?

This comment has been minimized.

@jdemaeyer

jdemaeyer Nov 16, 2015
Author Contributor

Here is a use case where we cannot deprecate from_settings(), but it should be possible for middlewares/pipelines/extensions. Maybe we could call it construct_from_crawler() and add a deprecation note that triggers when from_settings() exists, but from_crawler() does not?

This comment has been minimized.

@nramirezuy

nramirezuy Nov 17, 2015
Contributor

@jdemaeyer Well that's a very specific case that must happen before the crawler, on current version; but I wouldn't say its not.
I like your idea, but I will always trigger the warning when from_settings is present.

@codecov-io
Copy link

@codecov-io codecov-io commented Nov 16, 2015

Codecov Report

Merging #1605 into master will increase coverage by 0.12%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1605      +/-   ##
==========================================
+ Coverage    84.7%   84.83%   +0.12%     
==========================================
  Files         163      163              
  Lines        9191     9209      +18     
  Branches     1367     1370       +3     
==========================================
+ Hits         7785     7812      +27     
+ Misses       1154     1144      -10     
- Partials      252      253       +1
Impacted Files Coverage Δ
scrapy/extensions/feedexport.py 78.84% <100%> (+5.88%) ⬆️
scrapy/middleware.py 90% <100%> (-0.75%) ⬇️
scrapy/utils/misc.py 95.71% <100%> (+0.71%) ⬆️
@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/alternate-feedexport-constructors branch 2 times, most recently from 8b5e4eb to 0af564d Nov 30, 2015
@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/alternate-feedexport-constructors branch from 0af564d to fc20bc1 Jul 24, 2017
@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Jul 24, 2017

@redapple @kmike Polished this up, please take a look :)

@jdemaeyer jdemaeyer force-pushed the jdemaeyer:enhancement/alternate-feedexport-constructors branch from fc20bc1 to 4d77c30 Jul 25, 2017
@redapple
Copy link
Contributor

@redapple redapple commented Jul 27, 2017

Looks good to me @jdemaeyer

@defer.inlineCallbacks
def test_store(self):
assert_aws_environ()
uri = os.environ.get('S3_TEST_FILE_URI')
if not uri:
raise unittest.SkipTest("No S3 URI available for testing")
storage = S3FeedStorage(uri)
storage = S3FeedStorage(uri, Settings())

This comment has been minimized.

@kmike

kmike Jul 19, 2018
Member

It seems Settings() object is passed in access_key argument, it doesn't look correct.

def __init__(self, uri):
def __init__(self, uri, access_key=None, secret_key=None):
# BEGIN Backwards compatibility for initialising without keys (and
# without using from_crawler)
from scrapy.conf import settings

This comment has been minimized.

@kmike

kmike Jul 19, 2018
Member

it can be a good time to do this deprecated import only if no_defaults is True

@@ -93,12 +93,28 @@ def store(self, file):

class S3FeedStorage(BlockingFeedStorage):

def __init__(self, uri):
def __init__(self, uri, access_key=None, secret_key=None):
# BEGIN Backwards compatibility for initialising without keys (and

This comment has been minimized.

@kmike

kmike Jul 19, 2018
Member

there is BEGIN, no END. I think it is better to either remove this BEGIN from the comment, or add the END comment.

@kmike
Copy link
Member

@kmike kmike commented Jul 19, 2018

It looks good to me, besides a few minor comments 👍 I'd also prefer less mocks in tests, but this is not a blocker in any way. create_instance function is needed in other pull requests as well, e.g. in #2956.

@@ -117,3 +117,27 @@ def md5sum(file):
def rel_has_nofollow(rel):
"""Return True if link rel attribute has nofollow type"""
return True if rel is not None and 'nofollow' in rel.split() else False

def create_instance(objcls, settings, crawler, *args, **kwargs):

This comment has been minimized.

@kmike

kmike Jul 19, 2018
Member

nitpick: there should be 2 blank lines before this function, not one

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jul 20, 2018

Just opened ☝️ to address Mikhail's comments, I hope you don't mind @jdemaeyer 🙏

@jdemaeyer
Copy link
Contributor Author

@jdemaeyer jdemaeyer commented Jul 23, 2018

@elacuesta Perfect, thanks a lot! 🙏 (<- hoping this is a high five and not a prayer emoji :D)

@dangra dangra merged commit 4d77c30 into scrapy:master Jul 25, 2018
2 checks passed
2 checks passed
codecov/patch 100% of diff hit (target 84.7%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants