Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds std.parseYaml #888

Closed
wants to merge 11 commits into from
Closed

Adds std.parseYaml #888

wants to merge 11 commits into from

Conversation

groodt
Copy link
Contributor

@groodt groodt commented Jan 26, 2021

Adds std.parseYaml to address the YAML aspect of: #460

go-jsonnet implemented here: google/go-jsonnet#339

@google-cla google-cla bot added the cla: yes label Jan 26, 2021
@sbarzowski
Copy link
Collaborator

Great! I need to read it carefully. It adds so many files, Github won't even display the table of contents....

YAML Stream (multiple docs with ---) is not yet implemented. Not sure if it's necessary for intial support, but will be popular for k8s users.

Does the Go version support it? We would like to avoid major differences if possible. (We can never guarantee full parity, but we should be compatible when it comes to big and obvious stuff).

Thoughts for a path forwards? I know many people are requesting YAML support in jsonnet.

Do we have tests for parseYaml? It's especially important for checking compatibility between implementations.

This is Bazel only at the moment. I might need some help to add CMake and Makefile options if necesary, but shouldn't be too bad.

We'll need to add that, but we can wait until we decide which yaml implementation we will use. It shouldn't be a big deal.

@groodt
Copy link
Contributor Author

groodt commented Jan 26, 2021

Great! I need to read it carefully. It adds so many files, Github won't even display the table of contents....

Yes. Apologies. I wasn't sure which approach I should take when I vendored the third_party code, so I've kept 99% of it, even though not all of it is required. If we only need to support Bazel, I could probably get it to work using a repository rule like http_archive or git_repository, but because we need to vendor the source to build it with make and cmake, I don't know which approach we should take. I've taken the simple approach of "copy-paste the code verbatim", but maybe submodules are better? They can be fragile in my experience and I don't know how they would work recursively. I can also strip back all of the copy pasted code only to the code we use (remove tests other files etc), but then I'm worried about tracking any upstream changes in future (maybe not a major risk) and perhaps licensing complexity?

The main changes I would suggest to orient yourself around the PR:
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/third_party/rapidyaml/README.md
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/third_party/yaml-cpp/README.md
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/core/vm.cpp#L1599
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/core/vm.cpp#L1620
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/test_suite/stdlib.jsonnet#L1039
https://github.com/groodt/jsonnet/blob/03c86d2135ebcc62d2451a58856f83858016f3ef/test_suite/stdlib.jsonnet#L1042

Does the Go version support it?

Yes, it does. Originally the go-jsonnet implementation I proposed pulled in k8s.io/apimachinery/pkg/util/yaml to handle the yaml parsing. We decided that this was too heavy so ported over the few functions that were being used in k8s.io/apimachinery/pkg/util/yaml and the implementation is now based on kubernetes-sigs/yaml which is a permanent fork of ghodss/yaml.

Perhaps a demonstration will make things clearer.

❯ cat parse_yaml_single.jsonnet
std.parseYaml(
|||
 a:
   b:
   - c
   - 42
   - 1.0
|||
)

❯ cat parse_yaml_multi.jsonnet
std.parseYaml(
|||
 ---
 a:
   b:
   - c
   - 42
   - 1.0
 ---
 x: "document 2"
|||
)

❯ go-jsonnet parse_yaml_single.jsonnet
[
   {
      "a": {
         "b": [
            "c",
            42,
            1
         ]
      }
   }
]

❯ go-jsonnet parse_yaml_multi.jsonnet
[
   {
      "a": {
         "b": [
            "c",
            42,
            1
         ]
      }
   },
   {
      "x": "document 2"
   }
]

We would like to avoid major differences if possible.

I agree. We definitely should make it clear that yaml support is best effort, but for things such as single vs multiple documents, there should be parity.

Perhaps there needs to be a different API for single vs multiple yaml document parsing?

Not sure what your appetite is for this, but if we do implement something on the CPP side, we could perhaps use the same CPP implementation on the go-jsonnet side if we create a small cgo wrapper which seems like it should work in Bazel? That might be making life too difficult for ourselves if the go-jsonnet version is becoming the canonical implementation of jsonnet over time. Maybe this more complex approach can be considered only if there are significant issues raised where there are large inconsistencies between std.parseYaml between the Go and CPP versions of jsonnet.

Do we have tests for parseYaml? It's especially important for checking compatibility between implementations.

Some very basic tests to kick the tyres:

https://github.com/google/jsonnet/pull/888/files#diff-147cbf28163b0d9e2e652b47b10f41806fba3d02041ae9f5672784a1617798b4R1038

Similar or more can and should be added to the go-jsonnet implementation of parseYaml. If "smoke" testing on the jsonnet stdlib side is enough, then it seems fairly straightforward to add some tests. There only seem to be a small number of smoke tests for parseJson, so doing the same for parseYaml will hopefully be acceptable.

We'll need to add that, but we can wait until we decide which yaml implementation we will use. It shouldn't be a big deal.

Agree. Thanks!

@groodt
Copy link
Contributor Author

groodt commented Jan 26, 2021

Similar demo output for the CPP implementation.

❯ cat parse_yaml_single.jsonnet
std.parseYaml(
|||
 a:
   b:
   - c
   - 42
   - 1.0
|||
)

❯ cat parse_rapid_yaml_single.jsonnet
std.parseRapidYaml(
|||
 a:
   b:
   - c
   - 42
   - 1.0
|||
)

❯ bazel-bin/cmd/jsonnet parse_yaml_single.jsonnet
{
   "a": {
      "b": [
         "c",
         "42",
         "1.0"
      ]
   }
}

❯ bazel-bin/cmd/jsonnet parse_rapid_yaml_single.jsonnet
{
   "a": {
      "b": [
         "c",
         42,
         1
      ]
   }
}

As mentioned above, no multidoc support at the moment.

@sbarzowski
Copy link
Collaborator

I got an error when compiling without optimization:

In file included from third_party/rapidyaml/rapidyaml/ext/c4core/src/c4/substr.hpp:11,
                 from third_party/rapidyaml/rapidyaml/src/c4/yml/./common.hpp:5,
                 from third_party/rapidyaml/rapidyaml/src/c4/yml/preprocess.hpp:17,
                 from third_party/rapidyaml/rapidyaml/src/c4/yml/preprocess.cpp:1:
third_party/rapidyaml/rapidyaml/ext/c4core/src/c4/error.hpp:63:13: fatal error: c4/ext/debugbreak/debugbreak.h: No such file or directory
   63 | #   include <c4/ext/debugbreak/debugbreak.h>
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Probably it needs to be added as an explicit dependency if we go with Rapidyaml.

If we only need to support Bazel, I could probably get it to work using a repository rule like http_archive or git_repository, but because we need to vendor the source to build it with make and cmake, I don't know which approach we should take. I've taken the simple approach of "copy-paste the code verbatim", [...]

Yes. I think this is the best approach.

but maybe submodules are better? They can be fragile in my experience and I don't know how they would work recursively.

In this case probably not. It tends to be a bit of a pain with Bazel (with go-jsonnet we have a submodule setup and a separate download-tar-over-http setup for sharing cpp-jsonnet test suite etc.). This is supposed to be pretty stable, so updating it manually every now and then should be enough.

I can also strip back all of the copy pasted code only to the code we use (remove tests other files etc), but then I'm worried about tracking any upstream changes in future (maybe not a major risk) and perhaps licensing complexity?

Not worth it. Mostly, because it makes updating the vendored version harder.

Perhaps there needs to be a different API for single vs multiple yaml document parsing?

Hmmm... Python has safe_load and safe_load_all. So there is a precedent for that. I'm not sure here, but at the moment I don't see what could go wrong with having just the version which supports multiple documents.

Not sure what your appetite is for this, but if we do implement something on the CPP side, we could perhaps use the same CPP implementation on the go-jsonnet side if we create a small cgo wrapper which seems like it should work in Bazel?

Sounds complicated. Also, it should be possible for third parties to achieve the same level of parity as the official implementations. We don't want to depend too much on specific library being used. With YAML we'll have to accept differences in some "dark corners". As long as we can get the same result in obvious cases, I'll be happy.

Similar or more can and should be added to the go-jsonnet implementation of parseYaml.

We run go-jsonnet against cpp-jsonnet tests, so it's enough to add some here. If we're going to test any error cases, we may need to override their results for go-jsonnet (we have a mechanism for that).

There only seem to be a small number of smoke tests for parseJson, so doing the same for parseYaml will hopefully be acceptable.

Yeah. The point is not testing the underlying YAML implementation for subtle differences, but just verifying that the thing actually works and doesn't have any interface quirks (e.g. that it handles various top-level types correctly).

@groodt
Copy link
Contributor Author

groodt commented Feb 7, 2021

Thanks for the review!

Probably it needs to be added as an explicit dependency if we go with Rapidyaml.

I can add this dependency as well 👍

Yes. I think this is the best approach.

Ok, if I go this route, what is the plan for CMake and Make? Work that out after we get the Bazel version implemented?

Hmmm... Python has safe_load and safe_load_all. So there is a precedent for that. I'm not sure here, but at the moment I don't see what could go wrong with having just the version which supports multiple documents.

Ok, I think we can have more discussions on details later. As long as they're consistently applied in both CPP and Go it should be fine.

Ok, it all sounds fine to me. I may not have much time soon to work further on this, but I think my next steps will be:

  • Remove vendored code
  • Add rapidyaml as a Bazel dependency

Then we can discuss any further outstanding questions.

@sbarzowski
Copy link
Collaborator

Ok, it all sounds fine to me. I may not have much time soon to work further on this,

This is perfectly fine, there's no rush. In case you need to drop this completely, please let me know, so I can pick this up.

Yes. I think this is the best approach.

Ok, if I go this route, what is the plan for CMake and Make? Work that out after we get the Bazel version implemented?

I was referring to copy-paste approach in the last sentence. Sorry I didn't make it more explicit. So, we'll leave the vendored version and we'll have no problem in any build system. There's also less work, since it's basically what you already did.

Ok, if I go this route, what is the plan for CMake and Make? Work that out after we get the Bazel version implemented?

We can work out the details after we have Bazel working. Since everything is vendored it should be quite easy.

@groodt
Copy link
Contributor Author

groodt commented Feb 8, 2021

I was referring to copy-paste approach in the last sentence. Sorry I didn't make it more explicit. So, we'll leave the vendored version and we'll have no problem in any build system. There's also less work, since it's basically what you already did.

Ah, ok. Got it. So then my plan will change to:

  • Remove yaml-cpp
  • Add debugbreak to current rapidyaml implementation
  • Tidy up implementation and add multi-doc support to be consistent with go-jsonnet

@sbarzowski
Copy link
Collaborator

Sounds good!

@groodt
Copy link
Contributor Author

groodt commented Feb 20, 2021

TODO

  • Tidy up implementation and add multi-doc support to be consistent with go-jsonnet

@groodt
Copy link
Contributor Author

groodt commented Feb 20, 2021

If the YAML document is a single document with a top-level object, a jsonnet object is returned.

If the document is a YAML stream, with multiple top-level objects, a jsonnet array is returned.

This behaviour seems correct to me. It does currently differ from my go-jsonnet PR, so that should be updated to match this behaviour.

See below for details:

❯ cat parse_yaml_single.jsonnet
std.parseYaml(
|||
 a:
   b:
   - c
   - 42
   - 1.0
|||
)

~/repos/jsonnet groodt-jsonnet-yaml
❯ cat parse_yaml_multi.jsonnet
std.parseYaml(
|||
 ---
 a:
   b:
   - c
   - 42
   - 1.0
 ---
  x:
    b:
    - c
    - 42
    - 1.0
|||
)

~/repos/jsonnet groodt-jsonnet-yaml
❯ bazel-bin/cmd/jsonnet parse_yaml_single.jsonnet
{
   "a": {
      "b": [
         "c",
         42,
         1
      ]
   }
}

~/repos/jsonnet groodt-jsonnet-yaml
❯ bazel-bin/cmd/jsonnet parse_yaml_multi.jsonnet
[
   {
      "a": {
         "b": [
            "c",
            42,
            1
         ]
      }
   },
   {
      "x": {
         "b": [
            "c",
            42,
            1
         ]
      }
   }
]

@groodt
Copy link
Contributor Author

groodt commented Feb 20, 2021

PTAL @sbarzowski

@groodt groodt changed the title [WIP] Adds std.parseYaml Adds std.parseYaml Feb 21, 2021
@sbarzowski
Copy link
Collaborator

Looks good. Sorry for the delay. Now we just need to make sure it works with all build systems.

@groodt
Copy link
Contributor Author

groodt commented Feb 28, 2021

Looks good. Sorry for the delay. Now we just need to make sure it works with all build systems.

Thanks!

I'd appreciate any help here. My make and cmake is a bit rusty, so I would be slower than somebody who is familiar with these tools or with the specifics of this repo.

Some news, I've tested parseYaml in the jsonnet repo and the go-jsonnet repo against a repo we have at work where I've been maintaining a fork of bitnami/kubecfg and this should be a drop-in replacement! 🍾

I tested against ~3k lines of YAML in-the-wild from vendors such as AWS, kubernetes cluster autoscaler, datadog, weaveworks and bitnami and it was able to parse them successfully, so that gives me some comfort that it will work against the popular YAML formats out there.

@sbarzowski
Copy link
Collaborator

I'd appreciate any help here. My make and cmake is a bit rusty, so I would be slower than somebody who is familiar with these tools or with the specifics of this repo.

Sure. I don't know how you want to go about it. Do you want to give it a try (I can see you added the CMake thing)? I can help if you get stuck. I can also just pick it up from here. Please let me know how I can help.

I tested against ~3k lines of YAML in-the-wild from vendors such as AWS, kubernetes cluster autoscaler, datadog, weaveworks and bitnami and it was able to parse them successfully, so that gives me some comfort that it will work against the popular YAML formats out there.

That's great!

@groodt
Copy link
Contributor Author

groodt commented Mar 1, 2021

Sure. I don't know how you want to go about it. Do you want to give it a try (I can see you added the CMake thing)? I can help if you get stuck. I can also just pick it up from here. Please let me know how I can help.

I would definitely appreciate it if you gave it a try. CMake is a big mystery to me. I tried it, but I'm not sure that it works correctly.

This is failing:

cmake . -Bbuild

However, this appears to compile everything and runs the tests:

cmake --build build --target run_tests

🤷

@sbarzowski
Copy link
Collaborator

Ok, so I'll handle the build systems. With some luck, I'll be able to do it this weekend.

@sbarzowski
Copy link
Collaborator

Update: I started working on this form the Makefile side. I doesn't work so far and it's not officially supported (only CMake is), but it doesn't seem to difficult. I'll update here when I have something concrete.

@groodt
Copy link
Contributor Author

groodt commented Mar 27, 2021

Any news on this @sbarzowski ?

@sbarzowski
Copy link
Collaborator

sbarzowski commented Mar 28, 2021

I got it to work for the most part: #899. Ran into some stupid issues unrelated mostly unrelated to the core change (about available compiler version in Travis).

The Makefile part is a bit hacky, but should work fine. CMake required removing a small part of upstream CMakeLists.txt to make it work (167113c#diff-399a771633b7813f785ae66df1d986e82aec414cab5a33cf22e99df04a5354abL68).

@groodt
Copy link
Contributor Author

groodt commented Mar 29, 2021

Ok! Great! Any next steps?

@sbarzowski
Copy link
Collaborator

I just need to do more trial and error to get it to pass.

I'm wondering how to handle submitting this. There are two ways. Either you'll incorporate my changes in your PR or I need your consent in #899. Either way I'll squash all commits into one with you as the author.

Also, sorry it's taking me so long. I've been a bit overwhelmed by other obligations.

@groodt
Copy link
Contributor Author

groodt commented Apr 12, 2021

Also, sorry it's taking me so long. I've been a bit overwhelmed by other obligations.

Totally understand. Finding time for open source is difficult. It took me ages to find some moments to work on this as well. Just let me know if there is any way I can help.

In terms of submitting all this once we've got the necessary things working, I don't mind how we do it. I can include your changes into the original PR, or you can do a new one with #899.

@groodt
Copy link
Contributor Author

groodt commented Apr 26, 2021

@sbarzowski Is there anything I can do to assist?

@sbarzowski
Copy link
Collaborator

It was submitted in #899 (it's basically this PR + a few build system fixes).

Thank you very much for implementing this much needed feature and for your patience!

@sbarzowski sbarzowski closed this May 20, 2021
@groodt
Copy link
Contributor Author

groodt commented May 21, 2021

Thank you very much for implementing this much needed feature and for your patience!

It really has been a journey! 😄 Thanks for your help and support.

@groodt groodt deleted the groodt-jsonnet-yaml branch May 21, 2021 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants