Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use owl-diff library for diff operation. Add markdown and html diff output formats. #461

Merged
merged 12 commits into from
Jun 11, 2019

Conversation

balhoff
Copy link
Contributor

@balhoff balhoff commented Apr 10, 2019

This addresses some of the issues in #432, especially "separate diff logic from diff rendering". It also includes a Markdown diff rendering, although currently different levels of granularity are not configurable.

Diff in this PR is implemented by calling out to functionality in my owl-diff library, which is also used in the owl-diff server. This diff is faster than the old version (I think the main slowdown in the old one was rendering of all axioms to string before performing comparisons). Also, label rendering in the simple output is done with a custom ShortFormProvider rather than regex operations.

This version passes the existing tests, although I made one small change to one test data file to correct what I believe to be an inconsistency in CURIE rendering in the previous code.

@cmungall
Copy link
Contributor

Perfect timing as we work on making the "end parts" of the release pipeline (diff for release notes and publishing large files to github) easier here: INCATools/ontology-development-kit#205

@balhoff
Copy link
Contributor Author

balhoff commented Apr 10, 2019

@cmungall please take a look at the Markdown output and let me know if I can make any improvements.

@jamesaoverton
Copy link
Member

Thanks @balhoff.

Are we using other Scala stuff in ROBOT already? Does the dependency on scala-lang make a big difference to the size of robot.jar?

@balhoff
Copy link
Contributor Author

balhoff commented Apr 11, 2019

@jamesaoverton I think this is the first Scala dependency. The change to robot.jar from this PR is about 5 MB, and I would guess most of that is the Scala lib, since owl-diff is tiny and doesn't add any other new dependencies.

@cmungall
Copy link
Contributor

cmungall commented May 1, 2019

As well as getting complete diffs, it's useful to get simple lists e.g of new terms and of obsoleted terms. These are particularly useful for including in release notes, e.g https://github.com/EnvironmentOntology/envo/releases

Should there be separate report types?

Having a JSON format could also be useful, particularly for summary stats and lists

Would the logic for this reside in owl-diff, robot, or external? The way James Malone had bubastis work had a nice separation of logic, with the html generated from xslt. Here it could be something like mustache.

But I think this can be done incrementally, I am good with merging this

cc @lpalbou

@jamesaoverton
Copy link
Member

@cmungall I'd prefer to merge this PR before adding more features, unless you want to head in a very different direction.

@rctauber added some comments and I would like @balhoff to address them.

@jamesaoverton
Copy link
Member

@balhoff I would like to merge this, but there are a few small things that still need to be resolved.

@balhoff
Copy link
Contributor Author

balhoff commented Jun 5, 2019

Thanks for the reminder, @jamesaoverton, I have been meaning to finish this up. Will do so soon.

@balhoff
Copy link
Contributor Author

balhoff commented Jun 10, 2019

@jamesaoverton @rctauber this should be ready now.

@jamesaoverton
Copy link
Member

The docs/examples/release-diff.txt file has been changed to use CURIEs instead of full IRIs. This breaks the old behaviour, which we try hard not to do. The CURIEs are nicer but somebody downstream might be depending on the full IRIs.

I have two suggestions and a problem. To keep the old behaviour we could either/both:

  1. introduce a --curies option that defaults to false
  2. add a plain default/legacy format, and a pretty text format with labels and CURIEs

When I tried to do implement (1) by using just labels and not CURIEs, it broke the testCompareModifiedWithLabels unit test. I think it's because the labelProvider returns an empty string when there's no label. I didn't see a way to just use the IRI in this case.

@cmungall
Copy link
Contributor

cmungall commented Jun 11, 2019

I like the idea of having an open-ended list of formats. This could be orthogonal to whether CURIEs, URIs are used; and orthogonal to whether the markdown uses [..](URL) or plain embedding.

I don't think we want to get boxed in by early formatting choices. I think we will learn more as people use this feature and we will want to incorporate new formats or options.

Also if we have a standard JSON people can easily customize the display layer as they like.

[edited original comment as I realized I was repeating myself]

@balhoff
Copy link
Contributor Author

balhoff commented Jun 11, 2019

@jamesaoverton I was trying to address this comment from @rctauber:

With simple format, no labels, I get outputs like:
- SubClassOf(<http://purl.obolibrary.org/obo/ECO_0000000> owl:Thing)

When using labels, I see:
- SubClassOf(<ECO:0000000>[evidence] <owl:Thing>)

Could you use OBOShortenerShortFormProvider iriProvider for the else case of useLabels as well?

We could go back to the previous output (which I think did automatically shorten some IRIs, such as owl:Thing).

For your label issue, we ought to be able to create a label provider with an alternative backup using this constructor.

@jamesaoverton
Copy link
Member

Ok, I added a plain format using the renderPlain function, and a pretty format that uses labels and CURIEs. It passes all the old tests. Is this ok with you @balhoff ?

@balhoff
Copy link
Contributor Author

balhoff commented Jun 11, 2019

This sounds pretty good. I think now --labels is kind of a holdover, since it's just a way to trigger pretty when used in combination with plain. But that is kind of unavoidable when introducing a larger set of formats.

@jamesaoverton
Copy link
Member

Yes, --labels becomes a legacy way to upgrade plain to pretty. It's lame but it should prevent anybody's code from breaking.

@jamesaoverton jamesaoverton merged commit ed86761 into ontodev:master Jun 11, 2019
@jamesaoverton
Copy link
Member

Thanks for the PR. It's nice to have Markdown and HTML formats supported, at long last! I'm happy to include more formats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants