Lint the output file with Rubocop #12

alanlacerda · 2024-01-12T14:06:40Z

This PR appends a call to rubocop -a after the schema is written to a file.

The command is only executed if the Rubocop binstub is available.

GabrielNagy

First of all, thanks for your contribution on this issue. As mentioned internally I don't believe this is an optimal solution because:

We are linting generated code which we shouldn't care too much about - we will never touch this by hand. Additionally, these generated schemas are long and having rubocop run on each and every one of them is a waste of resources in my opinion (here I'm thinking both of the linting being done by schema-test, and subsequent linting done outside of it like CI or local runs of rubocop).
The rewriter uses PP which for better or worse is an established output format, I'm not a fan of us altering it after generation. I would much prefer if we could directly generate already compliant output (but this is a time sink which is not worth approaching imo).
Shelling out to external commands (especially in the context of Ruby and Bundler) is costly. Doing this for every expanded file greatly increases the duration of the test run. For reference, I ran your implementation on 5 test files, and while minitest reported a duration of 20.4 seconds, the actual duration was 44 seconds. This can be circumvented in CI if rubocop is not a test dependency, but has a visible impact in development environments.
By linting the entire file this gem oversteps its boundaries and touches code parts outside of its documented methods. At worst this should be a configurable, opt-in setting, not something that's on by default.

There are probably more arguments to make but I would much prefer if we went with the way of disabling rubocop in these auto-generated blocks, which is faster and has less footprint on already existing schema tests.

alanlacerda · 2024-01-15T11:13:23Z

Thank you for the through write up @GabrielNagy!

TL;DR: I overall agree with your points, specially that the changes in this PR are definitely not the right solution.

Just to give some context about this PR: I created it under the assumption that we only expand the JSON schema when calling the test for the first time. That's why I suggested an approach similar as the one that Rubocop Rails uses for autocorrecting generated code. But even thinking about it now, for VS Code users that's actually useless, because Format on save would basically fix the code automatically. Anyway...

As it's clear now, I missed the fact that we re-generate the JSON schema nearly every time assert_valid_json_for_schema is called. In this case my change actually introduces a race condition in the CI: even if the linted code is pushed to a branch, the JSON schema would be overwritten when the tests are executed.

We are linting generated code which we shouldn't care too much about - we will never touch this by hand. Additionally, these generated schemas are long and having rubocop run on each and every one of them is a waste of resources in my opinion (here I'm thinking both of the linting being done by schema-test, and subsequent linting done outside of it like CI or local runs of rubocop).

Although I agree that we shouldn't care for generated code in general, we have a pretty unique case here. For instance schema.rb or .rubocop_todo.yml are files generated by other libraries and we should not care about even opening them most of the time. Additionally, we interact with those libraries in very specific workflows via their APIs. In contrast, the expanded JSON schema lives among code which we do interact with in a daily basis. Following the adage that we spend more time reading than writing code, I'd rather have it optimised for the former. Furthermore, I wonder why we even have generated code that shouldn't be changed inside our tests to start with, but that's a discussion for another day.

Regarding the waste of resources, I agree. If we were to lint only once, I think that's actually part of the usual Rubocop development workflow. But running it every time the test runs is a terrible idea.

The rewriter uses PP which for better or worse is an established output format, I'm not a fan of us altering it after generation. I would much prefer if we could directly generate already compliant output (but this is a time sink which is not worth approaching imo).

As I mentioned above, I would be fine with that but I wonder:

why Ruby was chosen in the first place, since JSON schemas are JSON files to begin with?
why do write a JSON schema to the test file during an assertion? I think we'd benefit from splitting schema generation and validation in two separate well-defined worflows:
- Schema generation (dump): similar to Rails with rails db:schema:dump or graphql:schema:dump used by the GraphQL library
- Schema validation (tests): similar to what we do now, but without generating new schemas. Instead, the assertion fails if the current one doesn't match the schema definition

Shelling out to external commands (especially in the context of Ruby and Bundler) is costly. Doing this for every expanded file greatly increases the duration of the test run. For reference, I ran your implementation on 5 test files, and while minitest reported a duration of 20.4 seconds, the actual duration was 44 seconds. This can be circumvented in CI if rubocop is not a test dependency, but has a visible impact in development environments.

After considering how often the command actually runs, I 💯 agree with you.

By linting the entire file this gem oversteps its boundaries and touches code parts outside of its documented methods. At worst this should be a configurable, opt-in setting, not something that's on by default.

I agree. If we were to move forward with the approach on this PR (which we shouldn't), I though of turning this in an opt-in setting like you did on your PR.

There are probably more arguments to make but I would much prefer if we went with the way of disabling rubocop in these auto-generated blocks, which is faster and has less footprint on already existing schema tests.

I agree. In the interest of unblocking the development workflow, I think your approach is the best now.
I'd like to, however, discuss whether the current approach for code generation is the best. We could potentially save time in our CI if we rethink how we deal with JSON schemas; maybe even consider how we could leverage it for some interesting stuff such as OpenAPI (e.g. rswag).

GabrielNagy · 2024-01-15T11:57:08Z

In contrast, the expanded JSON schema lives among code which we do interact with in a daily basis. Following the adage that we spend more time reading than writing code, I'd rather have it optimised for the former. Furthermore, I wonder why we even have generated code that shouldn't be changed inside our tests to start with, but that's a discussion for another day.

Yes I agree, ideally this "golden" data should live somewhere outside the test files, but this brings some complications such as how do we handle moving assertions throughout a test file (assuming we store the json data using an unique key of class name + test name + ...?). We'd need to add something unique to the assert call to be able to retroactively locate the generated files, to be able to compare the previous data with the new data. Maybe schema-test can generate an unique hash and write it as a comment on the assert line and as part of the golden file name.

why Ruby was chosen in the first place, since JSON schemas are JSON files to begin with?

why do write a JSON schema to the test file during an assertion? I think we'd benefit from splitting schema generation and validation in two separate well-defined worflows

I haven't worked on this gem but I believe this was a compromise for simplicity's sake, and the best way to handle existing projects with no established schema definitions. Thus, schemas have to be manually defined by the user (under schema_definitions), then during tests the API responses are compared to the user-defined schema. The JSON representation is dumped to the test file, and on subsequent runs schema-test will either:

flunk the test if the API response changes and doesn't conform to the existing schema dump, if the CI environment variable is set
update the test with the new schema otherwise

alanlacerda · 2024-01-15T14:05:58Z

So if I get it correctly, we don't even need the schema dump. 🤔

Below we should have all we need:

def assert_valid_json_for_schema(json, name, version) 
  # Load the definition from file
  definition = SchemaTest::Definition.find(name, version)

  # Raises if no definition found
  raise "Unknown definition #{name}, version: #{version}" unless definition.present? 

  # Validate json payload against definition defined by user
  errors = SchemaTest.validate_json(json, definition) 
   
  # Fails if the payload is incompatible with the definition
  assert_empty errors, "JSON did not pass schema:\n#{errors.join("\n")}" 
end

Calling it should be simpler:

assert_valid_json_for_schema(response_json, :student_splash_partner, 1)

❌ Assertion fails if:

Controller changed and the payload is now incompatible with the schema definition
- Solution: Update the definition or refactor controller to non-breaking change
Schema definition changed and the controller payload is now incompatible with it
- Solution: Update controller accordingly

✅ Assertion only passes if:

Controller and definition were both updated in lockstep

In both failure cases the developer would be aware that a breaking change was introduced either on the payload side or the definition side, without the need for a schema dump.

GabrielNagy · 2024-01-15T14:15:41Z

That's true, but refer to this portion of the readme:

Keeping the full schema directly in the tests means that it is impossible for us to accidentally impact any API endpoints with a distant schema change without also producing some change in the test files for those endpoints. That is the main benefit that this library tries to acheive.

Lint the output file with Rubocop

e4556d2

GabrielNagy reviewed Jan 13, 2024

View reviewed changes

alanlacerda closed this Jan 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lint the output file with Rubocop #12

Lint the output file with Rubocop #12

alanlacerda commented Jan 12, 2024

GabrielNagy left a comment

alanlacerda commented Jan 15, 2024 •

edited

Loading

GabrielNagy commented Jan 15, 2024

alanlacerda commented Jan 15, 2024 •

edited

Loading

GabrielNagy commented Jan 15, 2024

Lint the output file with Rubocop #12

Lint the output file with Rubocop #12

Conversation

alanlacerda commented Jan 12, 2024

GabrielNagy left a comment

Choose a reason for hiding this comment

alanlacerda commented Jan 15, 2024 • edited Loading

GabrielNagy commented Jan 15, 2024

alanlacerda commented Jan 15, 2024 • edited Loading

GabrielNagy commented Jan 15, 2024

alanlacerda commented Jan 15, 2024 •

edited

Loading

alanlacerda commented Jan 15, 2024 •

edited

Loading