Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate JSON files provided to OPA could be dealt with more gracefully #2983

Open
anderseknert opened this issue Dec 7, 2020 · 24 comments

Comments

@anderseknert
Copy link
Member

anderseknert commented Dec 7, 2020

If a JSON file is provided more than once when starting OPA, a merge error is thrown. This came up on Slack when a user mistakenly used something like opa test . data.json and got a cryptic merge error as a result.

Expected Behavior

OPA should be able to either 1) filter out duplicates from the list of provided files or 2) give a more specific error message if duplicates are found.

Actual Behavior

1 error occurred during loading: data.json: merge error

Steps to Reproduce the Problem

$ echo '{"a":"b"}' > data.json
$ opa test . data.json
@faraway
Copy link

faraway commented May 29, 2021

@anderseknert just saw you added the "good first issue" label. Do you think this is something I can take can fix?

@anderseknert
Copy link
Member Author

@faraway I think @olamiko might be working on this - seems like I can't assign issues unless that person has interacted with an issue first, but I know I tried. Are you on the OPA Slack? We could sync there if you're looking for a good first issue to tackle.

@olamiko
Copy link
Contributor

olamiko commented May 30, 2021

Hi @faraway, I am currently working on this

@stale
Copy link

stale bot commented Nov 22, 2021

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Nov 22, 2021
@stale stale bot removed the inactive label Dec 3, 2021
@stale
Copy link

stale bot commented Jan 2, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Jan 2, 2022
@anderseknert
Copy link
Member Author

This came up again today, when someone ran something like opa check src/ test/ . and saw merge errors.

@stale stale bot removed the inactive label Jan 2, 2023
@stale
Copy link

stale bot commented Feb 1, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@stale stale bot added the inactive label Feb 1, 2023
@renatosc
Copy link
Contributor

renatosc commented May 9, 2023

I am seeing this error when trying to do opa test . (from my src folder) which is giving me

1 error occurred during loading: src\schemas\check1_schema.json

I do not have duplicated files inside the src folder and I started getting this error when I added a 2nd schema json file to my schema folder.

Any ideas?

And if no one is currently investigating this issue, I can take care of it.

@stale stale bot removed the inactive label May 9, 2023
@anderseknert
Copy link
Member Author

OPA merges all data it encounters according to its document model. You don't need to have duplicate files for this error to be displayed — just data that exists at the same path in two different JSON documents. I.e.

foo.json

{"foo": 1}

bar.json

{"foo": 2}

Would be a conflict, as foo will be merged into the same document from two different sources.

@renatosc
Copy link
Contributor

renatosc commented May 9, 2023

Thanks for the response. I added a new schema file and its contents is just a new json object, so no same key was being used. I even made sure to have a different "id" for the schema, no difference.

The Doc page about Schema give us example of using multiple schema files and even provides a repo example here.

I just tried running opa test . on that kubernetes folder and got merge error back:

3 errors occurred during loading:
pod.json: merge error
schemas\input-anyOf.json: merge error
schemas\input.json: merge error

Maybe we have an issue merging schema objects?

PS: Funny that I get the error (on my code, not the k8s above) during tests (which I do not even need the schema) and when I do opa eval to check the schema it works fine. So I am thinking if I can just exclude the schema folder when running the tests.

PS2: Yes, if I run opa test. --ignore schemas the tests work fine. So at least I can get my tests working here. Do you think we may have a bug on opa test merging (since opa eval works fine)?

@anderseknert
Copy link
Member Author

I believe some attribute is merged with a conflict :) If you can create a minimal repo to reproduce I'd be happy to take a look.

@renatosc
Copy link
Contributor

There you go: https://github.com/renatosc/opa-merge-error

@anderseknert
Copy link
Member Author

Thanks Renato! That is a conflict, as both of your schemas contain the same keys at the same level of the document. You'd either need to put each under it's own top-level key, like:

{
  "schema1": { ... }
}

Or use a directory structure where each file is in its own directory, and treat the topmost directory as a bundle (i.e. --bundle) in which case the directory structure will be mirrored in the document.

@renatosc
Copy link
Contributor

renatosc commented May 15, 2023

Hi @anderseknert, thanks for checking it out.

I tried your suggestion of adding a key (like as "schema1") before the current schema json object and although it fixes the merge error it appears to make the schema validation to not work properly.

I updated the repo by updating the policy1_schema.json to have a "schema1" key as you suggested. I also changed the schema to expect a role2 property which currently does not exist in policy1.rego (so we can see the schema validation working).

So, by running

opa eval data.policy1.allow --data regos/policy1.rego --schema schemas/policy1_schema.json

I am now getting no error back when I should in fact be receiving an error (since schema is expecting role2 and rego is using role.)

If I remove that "schema1" key added the schema file and run the command above, I get the schema validation error correctly.

Any ideas?

@anderseknert
Copy link
Member Author

Ah okay, I think I see what's going on here. OPA merges all documents that are included for policy evaluation — i.e. all documents that contribute to the data document. The documents loaded using the --schema flag do not contribute directly to the data document, but are only used as schemas, while running e.g. opa test . naturally will include all documents in the directory, includsing the schemas subdirectory but treat them as data, not as schemas. Does that make any sense?

@renatosc
Copy link
Contributor

Yes, which goes to my point that maybe we could have a bug on opa test. Want me to take a look on that or do you think it may be too complex for opa first-time contributor to take care of?

@anderseknert
Copy link
Member Author

Not sure what the bug would be though. If you run opa test . you're explicitly asking it to use all files in the directory, including in any subdirectories. In that context a JSON schema is just any other JSON file.

Rather than running opa test ., I'd suggest keeping your policy in a separate directory, e.g.

.
├── data
│   └── data1.json
├── policies
│   ├── policy1.rego
│   └── policy2.rego
└── schemas
    ├── schema1.json
    └── schema2.json

And then you'll be able to run opa test policy/ to run your tests, and opa eval -d data/ -d policies/ --schema schemas to run evaluation.

If you'd like to contribute, perhaps adding the --schema flag to opa test would be a good candidate? I can't find an issue for it, but I know it's been requested in the past.

@renatosc
Copy link
Contributor

I think adding a --schema flag to opa test makes sense and keeps it consistent with opa eval.

Right now I was able to make it work by calling opa test --ignore schemas, but having the --schema param and making opa test also validate the schema I think would be a nice addition (for instance, right now I call opa test for unit test and then opa eval for schema validation. If opa test did both I do not need to call opa eval anymore)

@anderseknert
Copy link
Member Author

Agreed, that'd be a nice addition. @johanfylling do you remember where this was discussed in the past?
I doubt that anyone objects to this, but if you do, now would be a good time to say so :)

@johanfylling
Copy link
Contributor

@anderseknert , this is probably the issue you're thinking of: #5430

@anderseknert
Copy link
Member Author

Thanks @johanfylling! That was it, yes. @renatosc if you'd like to work on that, I'd be happy to create an issue.

@renatosc
Copy link
Contributor

@anderseknert Yes, I can work on that.

@anderseknert
Copy link
Member Author

There you go :) #5923
If you comment something in that issue, I'll be able to assign it to you.

@stale
Copy link

stale bot commented Jun 16, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@stale stale bot added the inactive label Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants