Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Fixes #685 FilesPipeline support for Google Cloud Storage. #2923

Merged
merged 4 commits into from Oct 2, 2017

Conversation

rhoboro
Copy link
Contributor

@rhoboro rhoboro commented Sep 13, 2017

(venv) ~/github/scrapy/scrapy % GCS_PROJECT_ID="myproject" GCS_TEST_FILE_URI="gs://my-bucket/" tox -- tests/test_pipeline_files.py -v

...

tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_not_expired <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_file_path PASSED
tests/test_pipeline_files.py::FilesPipelineTestCase::test_fs_store PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_default_file_key_method PASSED
tests/test_pipeline_files.py::DeprecatedFilesPipelineTestCase::test_overridden_file_key_method PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_default PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseFields::test_item_fields_override_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_cls_attrs_with_DEFAULT_prefix PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_and_class_attrs_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_different_settings_for_different_instances PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_no_custom_settings_for_subclasses PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attributes_preserved_if_no_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_subclass_attrs_preserved_custom_settings PASSED
tests/test_pipeline_files.py::FilesPipelineTestCaseCustomSettings::test_user_defined_subclass_default_key_names PASSED
tests/test_pipeline_files.py::TestS3FilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py SKIPPED
tests/test_pipeline_files.py::TestGCSFilesStore::test_persist <- .tox/py27/lib/python2.7/site-packages/twisted/internet/defer.py PASSED

========================================================================================== 17 passed, 1 skipped in 11.54 seconds ===========================================================================================
_________________________________________________________________________________________________________ summary __________________________________________________________________________________________________________
  py27: commands succeeded
  congratulations :)

@codecov
Copy link

codecov bot commented Sep 13, 2017

Codecov Report

Merging #2923 into master will decrease coverage by 0.23%.
The diff coverage is 34.14%.

@@            Coverage Diff            @@
##           master   #2923      +/-   ##
=========================================
- Coverage   84.83%   84.6%   -0.24%     
=========================================
  Files         164     164              
  Lines        9192    9241      +49     
  Branches     1370    1375       +5     
=========================================
+ Hits         7798    7818      +20     
- Misses       1138    1166      +28     
- Partials      256     257       +1
Impacted Files Coverage Δ
scrapy/pipelines/images.py 90.37% <100%> (+0.14%) ⬆️
scrapy/utils/test.py 55.55% <27.27%> (-7.24%) ⬇️
scrapy/pipelines/files.py 67.04% <32.14%> (-4.15%) ⬇️
scrapy/utils/versions.py 78.57% <0%> (-4.04%) ⬇️
scrapy/commands/version.py 100% <0%> (ø) ⬆️

@dangra dangra changed the title Fixes #685 FilesPipeline support for Google Cloud Storage. [MRG+1] Fixes #685 FilesPipeline support for Google Cloud Storage. Sep 27, 2017
@dangra dangra self-requested a review September 27, 2017 18:29
Copy link
Member

@dangra dangra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. It looks clean and good to merge.

@kmike any pushback?

@kmike
Copy link
Member

kmike commented Sep 28, 2017

It looks good to me. @rhoboro would you mind adding some docs? See https://doc.scrapy.org/en/latest/topics/media-pipeline.html#supported-storage and https://doc.scrapy.org/en/latest/topics/media-pipeline.html#amazon-s3-storage.

@rhoboro
Copy link
Contributor Author

rhoboro commented Oct 2, 2017

@dangra @kmike
Thank you for review.
I modified some docs. Please check my changes.

@kmike
Copy link
Member

kmike commented Oct 2, 2017

👍 Thanks @rhoboro!

FTR: I haven't tried to run tests or use this code, but it looks good to merge.

@kmike kmike changed the title [MRG+1] Fixes #685 FilesPipeline support for Google Cloud Storage. [MRG+2] Fixes #685 FilesPipeline support for Google Cloud Storage. Oct 2, 2017
@zinyosrim
Copy link

can you please point me to the docs for GCP as I need this feature asap. With S3 the authentication params need to be entered into settings.py. Couldn't find a similar code for GCP

@rhoboro
Copy link
Contributor Author

rhoboro commented Oct 2, 2017

@zinyosrim
If locally, gcloud auth login command(in cloud sdk) is the easiest way to check this feature.

In another way, google-cloud-storage gets credential information from GOOGLE_APPLICATION_CREDENTIALS. Please see this page.

@dangra dangra merged commit 5fac2d7 into scrapy:master Oct 2, 2017
@dangra
Copy link
Member

dangra commented Oct 2, 2017

@rhoboro I hope you can stand by your code in case of issues with this storage backend. But not least, thanks for contributing it.

@rhoboro rhoboro deleted the fixes-685 branch October 2, 2017 17:01
@rhoboro rhoboro restored the fixes-685 branch October 3, 2017 05:00
@rhoboro
Copy link
Contributor Author

rhoboro commented Oct 3, 2017

Sorry, I'm still using my branch. So, I leave it as it is released.

@kmike kmike added this to the v1.4.1 milestone Oct 5, 2017
@kmike kmike modified the milestones: v1.4.1, v1.5 Dec 22, 2017
@zinyosrim
Copy link

I have the issue #3044
Any idea what's going wrong here? I am using the splash docker image, because I am crawling a site using JS. I first tried with Twisted-17.5.0 then upgraded to Twisted-17.9.0 with no effect.

ZIn

@rhoboro
Copy link
Contributor Author

rhoboro commented Dec 28, 2017

@zinyosrim
hmm... I tried to store images to GCS bucket with splash docker image, and it worked fine.
I pushed my source code to following repository, so please check it.
https://github.com/rhoboro/splash_gcs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Google Cloud Storage Support (Storage backends)
4 participants