feature: Added Cerberus as a new option for item validation #201

Open: wants to merge 81 commits into base: master

@vipulgupta2048 (Contributor) commented Aug 20, 2019

🐣
This brings an end to a great and fulfilling period of contributing to Spidermon and the Scrapy Project as part of Google Summer of Code 2019.

Google Summer of Code 2019 with The Scrapy Project

Project - Integrate Cerberus (solves #182)
Project Description - Google Archive

Co-Org Admin - Cathal
Mentors - @rennerocha , @ejulio
Personal GSoC Blog - Mixster x GSoC
PSF Blog - https://blogs.python-gsoc.org/en/blogs/vipulgupta2048s-blog/

Description

  • Spidermon is a recommended tool for monitoring spiders created using Scrapy. Users can currently choose between two libraries for item validation rules: jsonschema and schematics. This project provides a third option, Cerberus.
  • Cerberus provides powerful yet simple and lightweight data validation functionality out of the box and is designed to be easily extensible, allowing for custom validation. It has no dependencies and is thoroughly tested on several Python versions.
  • The goal of this project was to integrate and test Cerberus, and to make it available to users as a new option for item validation.
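For readers who have not used Cerberus, the following is a minimal sketch of schema-based validation with the library on its own, outside Spidermon. The field names (`quote`, `author`, `tags`) and the helper `validate_item` are illustrative assumptions, not code from this pull request.

```python
# Cerberus schema for a scraped quote item (field names are illustrative).
QUOTE_SCHEMA = {
    "quote": {"type": "string", "required": True, "minlength": 1},
    "author": {"type": "string", "required": True},
    "tags": {"type": "list", "schema": {"type": "string"}},
}


def validate_item(item, schema=QUOTE_SCHEMA):
    """Return Cerberus' error dict for ``item``; an empty dict means valid."""
    # Imported lazily so the schema above can be inspected even when
    # Cerberus is not installed.
    from cerberus import Validator

    validator = Validator(schema)
    validator.validate(item)
    # ``errors`` maps each failing field to a list of messages.
    return validator.errors
```

Calling `validate_item` with a complete item should return an empty dict, while an item without `author` should return an error entry keyed by that field.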

Deliverables & Work Done

  1. Code meeting high quality standards, with detailed documentation, black styling and thorough tests (Pull request – #201).
    This Pull Request includes:
  • CerberusValidator() class for item validation through Cerberus. (vipulgupta2048#2)
  • Translator that maps validation errors onto a unified set of messages, so they work consistently with the other validation methods. (vipulgupta2048#4)
  • Complete integration with Scrapy pipelines, working with raw schemas, URLs, and file paths. (vipulgupta2048#5)
  2. Unit and integration tests for each component.
  3. Documentation for the Cerberus validation method. (vipulgupta2048#6)
  4. A detailed, well-documented tutorial, developed over the course of the summer, that implements almost every feature of Spidermon as a reference for developers, along with accompanying blog posts.
  5. One blog post each week about Spidermon and what I learned through the project, published on Mixster x GSoC.
  6. A tracker with week-to-week updates on my latest developments and minutes of the mentor meetings (MoM), so the community could follow progress. This helped maintain accountability and transparency.
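The translator idea from the list above can be sketched as a lookup from library-specific error messages to shared labels. This is only an illustration under stated assumptions: the names `UNIFIED_MESSAGES`, `translate` and `translate_errors`, the labels, and the example message strings are hypothetical and are not taken from the pull request.

```python
import re

# Hypothetical mapping from library-specific messages (regex patterns)
# to labels shared by every validation backend.
UNIFIED_MESSAGES = {
    r"^required field$": "Missing required field",
    r"^must be of \w+ type$": "Invalid type",
    r"^min length is \d+$": "Field too short",
    r"^'.+' is a required property$": "Missing required field",
}


def translate(message):
    """Return the unified label for ``message``, or the message unchanged."""
    for pattern, unified in UNIFIED_MESSAGES.items():
        if re.search(pattern, message):
            return unified
    return message


def translate_errors(errors):
    """Translate a {field: [messages]} mapping into unified labels."""
    return {
        field: [translate(message) for message in messages]
        for field, messages in errors.items()
    }
```

With a mapping like this, two backends that word the same problem differently end up reporting it under one label, which is what lets monitors treat all validators alike.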

For system testing, you can use the pre-configured Quotes spider at https://github.com/vipulgupta2048/testing_quotes and install Spidermon from the master branch of my fork.
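A rough sketch of the project settings such a test setup would need is shown here. The extension and pipeline paths are the ones Spidermon documents; `SPIDERMON_VALIDATION_CERBERUS` is assumed to be the setting name used for Cerberus schemas, and the inline schema, file path and URL are placeholders. Check the documentation in this pull request before relying on any of them.

```python
# settings.py (sketch, not taken from the test project)

SPIDERMON_ENABLED = True

EXTENSIONS = {
    "spidermon.contrib.scrapy.extensions.Spidermon": 500,
}

ITEM_PIPELINES = {
    "spidermon.contrib.scrapy.pipelines.ItemValidationPipeline": 800,
}

# Assumed setting name. The three entries illustrate the three schema
# sources the description mentions: raw schema, file path, and URL.
SPIDERMON_VALIDATION_CERBERUS = [
    {"quote": {"type": "string", "required": True}},  # raw schema (placeholder)
    "/path/to/schema.json",                            # file path (placeholder)
    "https://example.com/schema.json",                 # URL (placeholder)
]
```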

Looking forward to improving the source code even further through your opinions, reviews, and comments. I would love to clarify anything and help explain the work done.

This project was completed through long nights of reading and writing code, learning new concepts on the fly, and asking hundreds of pop questions on Slack, all duly answered by my mentors @ejulio and @rennerocha. Without their constant help, motivation, and guidance, completing this uphill task would never have been possible.

vipulgupta2048 added 30 commits Jul 6, 2019
Signed-off-by: Vipul Gupta (@vipulgupta2048) <vipulgupta2048@gmail.com>
* Cerberus Validator : Validate Only
* Implement Translator
* Changes implemented as suggested
* Change location for validator tests
* Changes Implemented as suggested, also code formatted
* Changes implemented as suggested
* Ported to PyTest
* Changes implemented as suggested
* Remove extra lines
Implement Translator
a
@vipulgupta2048 force-pushed the vipulgupta2048:master branch 2 times, most recently from ee442e3 to 766a6b2, on Aug 24, 2019
@rennerocha changed the title from "Integrate Cerberus with Spidermon #GSoC 2019" to "feature: Added Cerberus as a new option for item validation" on Aug 26, 2019
@rennerocha added the GSOC label on Aug 26, 2019
raphapassini and others added 2 commits Aug 22, 2019
* Add documentation for Expression Monitors
Co-Authored-By: Adrián Chaves <adrian@chaves.io>
@vipulgupta2048 force-pushed the vipulgupta2048:master branch from aeb705d to c94f198 on Aug 29, 2019
spidermon/contrib/validation/utils.py

            str(e) + "\nCould not parse schema in '{}'".format(source)
        )
else:
    schema = load_object(source)

@Gallaecio (Member) commented Aug 30, 2019

The documentation does not make it obvious that the schema may be a string defining the import path of a variable defining the schema.
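To illustrate the behaviour described in this comment: the `load_object(source)` call in the snippet above resolves a dotted import path to the object it names, which is how Scrapy's `scrapy.utils.misc.load_object` works. The stand-in below is a minimal sketch using only `importlib`; it is not the code from this pull request, and any project path such as `myproject.schemas.quote_schema` would be a hypothetical example.

```python
import importlib


def load_object(path):
    """Resolve a dotted path like 'package.module.name' to the named object."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError("Not a full dotted path: {!r}".format(path))
    module = importlib.import_module(module_path)
    # The attribute can be any module-level variable, e.g. a schema dict.
    return getattr(module, name)
```

A settings value that is neither a URL nor a `.json` path therefore ends up treated as the import path of a variable holding the schema, which is the case the reference documentation would need to mention.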

@vipulgupta2048 (Author, Contributor) commented Aug 30, 2019

Umm, should we give an example of how a schema should be like? I feel that could be done in a more practical way with the tutorial I am building.

@Gallaecio (Member) commented Sep 3, 2019

"Umm, should we give an example of how a schema should be like?"

In the reference documentation you wrote above, the first paragraph describes the different ways a schema may be defined, and then gives an example with all three ways used at once.

If we intend to support an import string as well, I think the documentation above should be expanded accordingly: mention this possibility in the introductory paragraph and feature this possibility in the example.

"I feel that could be done in a more practical way with the tutorial I am building."

Usually, API reference documentation covers all the information, while tutorials cover only the most important information, to make things simpler, and link to the reference documentation for additional details.

So, while I’m not sure whether this way of defining a schema should be mentioned in the tutorial, I’m sure it should be mentioned in the reference documentation.

@vipulgupta2048 (Author, Contributor) commented Sep 3, 2019

I agree with you on the purpose of reference documentation versus tutorials, but I am not entirely sure what you mean by an import string, or by the review comment you made. I feel there is a bit of a misunderstanding here, as I don't see how the conclusion (schema may be a string defining the import path of a variable defining the schema.) was reached.

@vipulgupta2048 (Author, Contributor) commented Sep 18, 2019

@Gallaecio Can you take another go at explaining what you meant here? It would be better to clear the air by knowing exactly what you mean by this:

"If we intend to support an import string as well, I think the documentation above should be expanded accordingly: mention this possibility in the introductory paragraph and feature this possibility in the example."

        )
else:
    schema = load_object(source)
    if isinstance(schema, six.string_types):

@Gallaecio (Member) commented Aug 30, 2019

The documentation does not mention either that the schema may be specified as a string.

I wonder if we should remove support for this, and instead allow paths to JSON files regardless of their file extension, just as we don’t require a file extension or Content-Type header when using a URL.
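The alternative suggested here can be sketched as follows: treat http(s) sources as URLs, and parse any existing file as JSON regardless of its extension. This is a hypothetical illustration, not the code in `utils.py`; the names `is_url` and `load_schema` are assumptions, and the import-path branch is left out to keep the sketch short.

```python
import json
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(source):
    """Treat only http(s) sources as URLs."""
    return urlparse(source).scheme in ("http", "https")


def load_schema(source):
    """Hypothetical loader: a URL first, then any existing file."""
    if is_url(source):
        # No Content-Type or extension check: the body is parsed as JSON.
        with urlopen(source) as response:
            return json.loads(response.read().decode("utf-8"))
    if os.path.isfile(source):
        # No extension check either: any readable file is parsed as JSON.
        with open(source, "r", encoding="utf-8") as schema_file:
            return json.load(schema_file)
    raise ValueError("Unsupported schema source: {!r}".format(source))
```

Under this design a schema stored as `quote_schema.txt` loads the same way as `quote_schema.json`, which mirrors how URLs are already handled.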

@vipulgupta2048 (Author, Contributor) commented Aug 31, 2019

From my perspective, adding raw schemas directly to the settings.py file as strings for validation is something users wouldn't feel like doing. Also, only my code (Cerberus validation) has tests and docs for adding raw schemas, because I thought it would be a good feature to work with.

Bit of context behind this change:
The code comes from the original file, /jsonschema/tools.py. It was moved, tweaked, and made available for both Cerberus and JSONSchema to use. Hence, these lines are unchanged from the original file.

@Gallaecio (Member) commented Sep 3, 2019

@rennerocha What are your thoughts on this?

@vipulgupta2048 force-pushed the vipulgupta2048:master branch from 4f82df1 to d7b0cbc on Aug 31, 2019
@vipulgupta2048 (Contributor, Author) left a comment

Hi @rennerocha, we need your comments on some of the reviews 😅. Have you checked them out yet?

5 participants