Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Fixes #685 FilesPipeline support for Google Cloud Storage. #2923

Merged
merged 4 commits into from Oct 2, 2017

Conversation

@rhoboro
Copy link
Contributor

@rhoboro rhoboro commented Sep 13, 2017

  • Fixes #685
  • FilesPipeline support for Google Cloud Storage.
  • This feature use two setting values.
    • FILES_STORE = 'gs://your_bucket/'
    • GCS_PROJECT_ID = 'your_project_id'
  • Using google-cloud-storage as optional dependency. For information about authenticating, see this page.
  • Added some tests and passed. (the last line)
(venv) ~/github/scrapy/scrapy % GCS_PROJECT_ID="myproject" GCS_TEST_FILE_URI="gs://my-bucket/" tox -- tests/test_pipeline_files.py -v

...

tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_not_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_path PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_fs_store PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_default_file_key_method PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_overridden_file_key_method PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_default PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_override_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_cls_attrs_with_DEFAULT_prefix PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_and_class_attrs_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_different_settings_for_different_instances PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_no_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attributes_preserved_if_no_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attrs_preserved_custom_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_user_defined_subclass_default_key_names PASSED
tests/test_pipeline_files.py::TestS3FilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py SKIPPED
tests/test_pipeline_files.py::TestGCSFilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED

========================================================================================== 17 passed, 1 skipped in 11.54 seconds ===========================================================================================
_________________________________________________________________________________________________________ summary __________________________________________________________________________________________________________
  py27: commands succeeded
  congratulations :)
@codecov
Copy link

@codecov codecov bot commented Sep 13, 2017

Codecov Report

Merging #2923 into master will decrease coverage by 0.23%.
The diff coverage is 34.14%.

@@            Coverage Diff            @@
##           master   #2923      +/-   ##
=========================================
- Coverage   84.83%   84.6%   -0.24%     
=========================================
  Files         164     164              
  Lines        9192    9241      +49     
  Branches     1370    1375       +5     
=========================================
+ Hits         7798    7818      +20     
- Misses       1138    1166      +28     
- Partials      256     257       +1
Impacted Files Coverage Δ
scrapy/pipelines/images.py 90.37% <100%> (+0.14%) ⬆️
scrapy/utils/test.py 55.55% <27.27%> (-7.24%) ⬇️
scrapy/pipelines/files.py 67.04% <32.14%> (-4.15%) ⬇️
scrapy/utils/versions.py 78.57% <0%> (-4.04%) ⬇️
scrapy/commands/version.py 100% <0%> (ø) ⬆️
@dangra dangra changed the title Fixes #685 FilesPipeline support for Google Cloud Storage. [MRG+1] Fixes #685 FilesPipeline support for Google Cloud Storage. Sep 27, 2017
@dangra dangra self-requested a review Sep 27, 2017
@dangra
dangra approved these changes Sep 27, 2017
Copy link
Member

@dangra dangra left a comment

Good work. It looks clean and good to merge.

@kmike any pushback?

@rhoboro
Copy link
Contributor Author

@rhoboro rhoboro commented Oct 2, 2017

@dangra @kmike
Thank you for review.
I modified some docs. Please check my changes.

@kmike
Copy link
Member

@kmike kmike commented Oct 2, 2017

👍 Thanks @rhoboro!

FTR: I haven't tried to run tests or use this code, but it looks good to merge.

@kmike kmike changed the title [MRG+1] Fixes #685 FilesPipeline support for Google Cloud Storage. [MRG+2] Fixes #685 FilesPipeline support for Google Cloud Storage. Oct 2, 2017
@zinyosrim
Copy link

@zinyosrim zinyosrim commented Oct 2, 2017

can you please point me to the docs for GCP as I need this feature asap. With S3 the authentication params need to be entered into settings.py. Couldn't find a similar code for GCP

@rhoboro
Copy link
Contributor Author

@rhoboro rhoboro commented Oct 2, 2017

@zinyosrim
If locally, gcloud auth login command(in cloud sdk) is the easiest way to check this feature.

In another way, google-cloud-storage gets credential information from GOOGLE_APPLICATION_CREDENTIALS. Please see this page.

@dangra dangra merged commit 5fac2d7 into scrapy:master Oct 2, 2017
1 of 2 checks passed
1 of 2 checks passed
codecov/patch 34.14% of diff hit (target 84.83%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dangra
Copy link
Member

@dangra dangra commented Oct 2, 2017

@rhoboro I hope you can stand by your code in case of issues with this storage backend. But not least, thanks for contributing it.

@rhoboro rhoboro deleted the rhoboro:fixes-685 branch Oct 2, 2017
@rhoboro rhoboro restored the rhoboro:fixes-685 branch Oct 3, 2017
@rhoboro
Copy link
Contributor Author

@rhoboro rhoboro commented Oct 3, 2017

Sorry, I'm still using my branch. So, I leave it as it is released.

@kmike kmike added this to the v1.4.1 milestone Oct 5, 2017
@kmike kmike modified the milestones: v1.4.1, v1.5 Dec 22, 2017
@zinyosrim
Copy link

@zinyosrim zinyosrim commented Dec 27, 2017

I have the issue #3044
Any idea what's going wrong here? I am using the splash docker image, because I am crawling a site using JS. I first tried with Twisted-17.5.0 then upgraded to Twisted-17.9.0 with no effect.

ZIn

@rhoboro
Copy link
Contributor Author

@rhoboro rhoboro commented Dec 28, 2017

@zinyosrim
hmm... I tried to store images to GCS bucket with splash docker image, and it worked fine.
I pushed my source code to following repository, so please check it.
https://github.com/rhoboro/splash_gcs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants