Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint the output file with Rubocop #12

Closed
wants to merge 1 commit into from
Closed

Conversation

alanlacerda
Copy link

This PR appends a call to rubocop -a after the schema is written to a file.

The command is only executed if the Rubocop binstub is available.

Copy link
Contributor

@GabrielNagy GabrielNagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks for your contribution on this issue. As mentioned internally I don't believe this is an optimal solution because:

  1. We are linting generated code which we shouldn't care too much about - we will never touch this by hand. Additionally, these generated schemas are long and having rubocop run on each and every one of them is a waste of resources in my opinion (here I'm thinking both of the linting being done by schema-test, and subsequent linting done outside of it like CI or local runs of rubocop).
  2. The rewriter uses PP which for better or worse is an established output format, I'm not a fan of us altering it after generation. I would much prefer if we could directly generate already compliant output (but this is a time sink which is not worth approaching imo).
  3. Shelling out to external commands (especially in the context of Ruby and Bundler) is costly. Doing this for every expanded file greatly increases the duration of the test run. For reference, I ran your implementation on 5 test files, and while minitest reported a duration of 20.4 seconds, the actual duration was 44 seconds. This can be circumvented in CI if rubocop is not a test dependency, but has a visible impact in development environments.
  4. By linting the entire file this gem oversteps its boundaries and touches code parts outside of its documented methods. At worst this should be a configurable, opt-in setting, not something that's on by default.

There are probably more arguments to make but I would much prefer if we went with the way of disabling rubocop in these auto-generated blocks, which is faster and has less footprint on already existing schema tests.

@alanlacerda
Copy link
Author

alanlacerda commented Jan 15, 2024

Thank you for the through write up @GabrielNagy!

TL;DR: I overall agree with your points, specially that the changes in this PR are definitely not the right solution.

Just to give some context about this PR: I created it under the assumption that we only expand the JSON schema when calling the test for the first time. That's why I suggested an approach similar as the one that Rubocop Rails uses for autocorrecting generated code. But even thinking about it now, for VS Code users that's actually useless, because Format on save would basically fix the code automatically. Anyway...

As it's clear now, I missed the fact that we re-generate the JSON schema nearly every time assert_valid_json_for_schema is called. In this case my change actually introduces a race condition in the CI: even if the linted code is pushed to a branch, the JSON schema would be overwritten when the tests are executed.

  1. We are linting generated code which we shouldn't care too much about - we will never touch this by hand. Additionally, these generated schemas are long and having rubocop run on each and every one of them is a waste of resources in my opinion (here I'm thinking both of the linting being done by schema-test, and subsequent linting done outside of it like CI or local runs of rubocop).

Although I agree that we shouldn't care for generated code in general, we have a pretty unique case here. For instance schema.rb or .rubocop_todo.yml are files generated by other libraries and we should not care about even opening them most of the time. Additionally, we interact with those libraries in very specific workflows via their APIs. In contrast, the expanded JSON schema lives among code which we do interact with in a daily basis. Following the adage that we spend more time reading than writing code, I'd rather have it optimised for the former. Furthermore, I wonder why we even have generated code that shouldn't be changed inside our tests to start with, but that's a discussion for another day.

Regarding the waste of resources, I agree. If we were to lint only once, I think that's actually part of the usual Rubocop development workflow. But running it every time the test runs is a terrible idea.

  1. The rewriter uses PP which for better or worse is an established output format, I'm not a fan of us altering it after generation. I would much prefer if we could directly generate already compliant output (but this is a time sink which is not worth approaching imo).

As I mentioned above, I would be fine with that but I wonder:

  • why Ruby was chosen in the first place, since JSON schemas are JSON files to begin with?
  • why do write a JSON schema to the test file during an assertion? I think we'd benefit from splitting schema generation and validation in two separate well-defined worflows:
    • Schema generation (dump): similar to Rails with rails db:schema:dump or graphql:schema:dump used by the GraphQL library
    • Schema validation (tests): similar to what we do now, but without generating new schemas. Instead, the assertion fails if the current one doesn't match the schema definition
  1. Shelling out to external commands (especially in the context of Ruby and Bundler) is costly. Doing this for every expanded file greatly increases the duration of the test run. For reference, I ran your implementation on 5 test files, and while minitest reported a duration of 20.4 seconds, the actual duration was 44 seconds. This can be circumvented in CI if rubocop is not a test dependency, but has a visible impact in development environments.

After considering how often the command actually runs, I 💯 agree with you.

  1. By linting the entire file this gem oversteps its boundaries and touches code parts outside of its documented methods. At worst this should be a configurable, opt-in setting, not something that's on by default.

I agree. If we were to move forward with the approach on this PR (which we shouldn't), I though of turning this in an opt-in setting like you did on your PR.

There are probably more arguments to make but I would much prefer if we went with the way of disabling rubocop in these auto-generated blocks, which is faster and has less footprint on already existing schema tests.

I agree. In the interest of unblocking the development workflow, I think your approach is the best now.
I'd like to, however, discuss whether the current approach for code generation is the best. We could potentially save time in our CI if we rethink how we deal with JSON schemas; maybe even consider how we could leverage it for some interesting stuff such as OpenAPI (e.g. rswag).

@GabrielNagy
Copy link
Contributor

In contrast, the expanded JSON schema lives among code which we do interact with in a daily basis. Following the adage that we spend more time reading than writing code, I'd rather have it optimised for the former. Furthermore, I wonder why we even have generated code that shouldn't be changed inside our tests to start with, but that's a discussion for another day.

Yes I agree, ideally this "golden" data should live somewhere outside the test files, but this brings some complications such as how do we handle moving assertions throughout a test file (assuming we store the json data using an unique key of class name + test name + ...?). We'd need to add something unique to the assert call to be able to retroactively locate the generated files, to be able to compare the previous data with the new data. Maybe schema-test can generate an unique hash and write it as a comment on the assert line and as part of the golden file name.

  • why Ruby was chosen in the first place, since JSON schemas are JSON files to begin with?
  • why do write a JSON schema to the test file during an assertion? I think we'd benefit from splitting schema generation and validation in two separate well-defined worflows

I haven't worked on this gem but I believe this was a compromise for simplicity's sake, and the best way to handle existing projects with no established schema definitions. Thus, schemas have to be manually defined by the user (under schema_definitions), then during tests the API responses are compared to the user-defined schema. The JSON representation is dumped to the test file, and on subsequent runs schema-test will either:

  • flunk the test if the API response changes and doesn't conform to the existing schema dump, if the CI environment variable is set
  • update the test with the new schema otherwise

@alanlacerda
Copy link
Author

alanlacerda commented Jan 15, 2024

So if I get it correctly, we don't even need the schema dump. 🤔

Below we should have all we need:

def assert_valid_json_for_schema(json, name, version) 
  # Load the definition from file
  definition = SchemaTest::Definition.find(name, version)

  # Raises if no definition found
  raise "Unknown definition #{name}, version: #{version}" unless definition.present? 

  # Validate json payload against definition defined by user
  errors = SchemaTest.validate_json(json, definition) 
   
  # Fails if the payload is incompatible with the definition
  assert_empty errors, "JSON did not pass schema:\n#{errors.join("\n")}" 
end 

Calling it should be simpler:

assert_valid_json_for_schema(response_json, :student_splash_partner, 1)

❌ Assertion fails if:

  • Controller changed and the payload is now incompatible with the schema definition

    • Solution: Update the definition or refactor controller to non-breaking change
  • Schema definition changed and the controller payload is now incompatible with it

    • Solution: Update controller accordingly

✅ Assertion only passes if:

  • Controller and definition were both updated in lockstep

In both failure cases the developer would be aware that a breaking change was introduced either on the payload side or the definition side, without the need for a schema dump.

@GabrielNagy
Copy link
Contributor

That's true, but refer to this portion of the readme:

Keeping the full schema directly in the tests means that it is impossible for us to accidentally impact any API endpoints with a distant schema change without also producing some change in the test files for those endpoints. That is the main benefit that this library tries to acheive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants