Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] FilesPipeline supports ACL for Google Cloud Storage #3199

Merged
merged 6 commits into from Apr 25, 2018

Conversation

@rhoboro
Copy link
Contributor

@rhoboro rhoboro commented Apr 3, 2018

  • In #2923 , FilesPipeline added support for Google Cloud Storage.
  • However, the PR did not implement support for ACL.
  • So, this PR add support ACL for Google Cloud Storage.
(venv) [rhoboro]~/github/scrapy % GCS_PROJECT_ID="xxx" GCS_TEST_FILE_URI="gs://xxx" tox -- tests/test_pipeline_files.py -v                                                           
...
=================================================================================================== test session starts ====================================================================================================
platform darwin -- Python 2.7.10, pytest-2.9.2, py-1.5.3, pluggy-0.3.1 -- /Users/suyamar/github/scrapy/.tox/py27/bin/python
cachedir: .cache
rootdir: /Users/suyamar/github/scrapy, inifile: pytest.ini
plugins: twisted-1.7.1, cov-2.2.1
collected 18 items 

tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_not_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_path PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_fs_store PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_default_file_key_method PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_overridden_file_key_method PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_default PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_override_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_cls_attrs_with_DEFAULT_prefix PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_and_class_attrs_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_different_settings_for_different_instances PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_no_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attributes_preserved_if_no_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attrs_preserved_custom_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_user_defined_subclass_default_key_names PASSED
tests/test_pipeline_files.py::TestS3FilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py SKIPPED
tests/test_pipeline_files.py::TestGCSFilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED

=========================================================================================== 17 passed, 1 skipped in 8.53 seconds ===========================================================================================
_________________________________________________________________________________________________________ summary __________________________________________________________________________________________________________
  py27: commands succeeded
  congratulations :)
@codecov
Copy link

@codecov codecov bot commented Apr 3, 2018

Codecov Report

Merging #3199 into master will increase coverage by <.01%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master    #3199      +/-   ##
==========================================
+ Coverage   82.12%   82.12%   +<.01%     
==========================================
  Files         228      228              
  Lines        9593     9599       +6     
  Branches     1385     1385              
==========================================
+ Hits         7878     7883       +5     
- Misses       1456     1457       +1     
  Partials      259      259
Impacted Files Coverage Δ
scrapy/utils/test.py 54.54% <0%> (-1.02%) ⬇️
scrapy/pipelines/images.py 90.44% <100%> (+0.07%) ⬆️
scrapy/pipelines/files.py 68.24% <100%> (+0.23%) ⬆️
scrapy/settings/default_settings.py 98.62% <100%> (+0.01%) ⬆️
You can modify the Access Control List (ACL) policy used for the stored files,
which is defined by the :setting:`FILES_STORE_GCS_ACL` and
:setting:`IMAGES_STORE_GCS_ACL` settings. By default, the ACL is set to
``projectPrivate``. To make the files publicly available use the ``publicRead``

This comment has been minimized.

@kmike

kmike Apr 11, 2018
Member

From https://cloud.google.com/storage/docs/access-control/lists#predefined-acl:

This is the default ACL for newly created buckets. This is also the default ACL for newly created objects unless the default object ACL for that bucket has been changed.

Does it mean this PR is not backwards compatible for GCS buckets where default object ACL is changed - projectPrivate will be used, instead of a configured default? If so, do you think it is possible to add a special value (empty?) which tells "use whatever configured for a bucket", and have it by default?

This comment has been minimized.

@rhoboro

rhoboro Apr 13, 2018
Author Contributor

Thank you for review!!

I found this description from https://cloud.google.com/storage/docs/access-control/lists#defaultobjects

If you don't specify an ACL, Cloud Storage applies the bucket's default object ACL to the object.

And predefined_acl of upload_from_string() has None as default value .
https://googlecloudplatform.github.io/google-cloud-python/latest/storage/blobs.html#google.cloud.storage.blob.Blob.upload_from_file

So, I modified the default value from projectPrivate to None.

@kmike
Copy link
Member

@kmike kmike commented Apr 11, 2018

Thanks @rhoboro! The PR looks good: simple, clear docs and consistent option names 👍
I have a question regarding default value - could you please check it?

@rhoboro
Copy link
Contributor Author

@rhoboro rhoboro commented Apr 13, 2018

@kmike I revised the default value, please review again.

@@ -159,6 +159,7 @@
FEED_EXPORT_INDENT = 0

FILES_STORE_S3_ACL = 'private'
FILES_STORE_GCS_ACL = None

This comment has been minimized.

@kmike

kmike Apr 13, 2018
Member

I wonder how this default value plays with command line usage. Let's say FILES_STORE_GCS_ACL is set to some value in project settings (e.g. to "publicRead"), and user wants to override it, to use default bucket settings instead. scrapy crawl myspider -s FILES_STORE_GCS_ACL=None won't work in this case, because value will be set to "None" (string), not to None. That was the reason I suggested to use '' as a default value of Scrapy setting. In this case it is pipeline who should convert this empty value to None, in order to pass a correct value to blob.upload_from_string.

This comment has been minimized.

@rhoboro

rhoboro Apr 13, 2018
Author Contributor

@kmike
Sorry, I understand your suggestion just now.
That's right, so I fixed code and docs. Please check once more.

@kmike kmike changed the title FilesPipeline supports ACL for Google Cloud Storage [MRG+1] FilesPipeline supports ACL for Google Cloud Storage Apr 13, 2018
@kmike
Copy link
Member

@kmike kmike commented Apr 13, 2018

Looks good, thanks @rhoboro!

@lopuhin lopuhin merged commit f36e1b5 into scrapy:master Apr 25, 2018
1 of 2 checks passed
1 of 2 checks passed
codecov/patch 71.42% of diff hit (target 82.12%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@lopuhin
Copy link
Member

@lopuhin lopuhin commented Apr 25, 2018

Thanks @rhoboro !

@rhoboro rhoboro deleted the rhoboro:gcs_acl branch Apr 26, 2018
@kmike kmike added this to the v1.6 milestone Jul 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants