Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track changes using pandiff #200

Merged
merged 27 commits into from
Apr 12, 2019
Merged

Track changes using pandiff #200

merged 27 commits into from
Apr 12, 2019

Conversation

dhimmel
Copy link
Contributor

@dhimmel dhimmel commented Apr 9, 2019

Closes #198

Work in progress.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 9, 2019

As of cfe911c, here's what I've been able to create using pandiff: manuscript-diff.pdf (printed to PDF from HTML).

In general, it's pretty good. Some limitations / possible pandiff bugs:

  • pandoc-citeproc and pandoc-xnos filters are never executed. Therefore, the citations are in standard ids. This is good to minimize irrelevant diffing (like changed reference numbers), but makes it hard to look up what reference is being cited:

    image

  • Old/new figures are not shown side-by-side. Currently, just the new figure is shown with no indication of change. It looks like pandiff is not detecting changes to the figure caption (specified as markdown alt text based on Pandoc's implicit_figures):

    image

  • table cells get shifted/munged, likely pandiff bug:

image

So very impressed with pandiff as a potential diffing solution. Still it looks like we have a bit more work to make it production ready. @davidar is pandiff something you're actively maintaining? We are happy to open issues for things like the table cell problem if that would be helpful?

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 9, 2019

Currently, our pandiff usage is similar to GitHub's rich diff compare changes of manuscript.md from the output branch. However, we potentially can also use pandiff on the output manuscript.html documents.

@slochower
Copy link
Collaborator

pandoc-citeproc and pandoc-xnos filters are never executed. Therefore, the citations are in standard ids. This is good to minimize irrelevant diffing (like changed reference numbers), but makes it hard to look up what reference is being cited:

Agree, but are you going to show the diff including the new citations? I think reviewers will want to see that. Wouldn't that require passing the diff through manubot again?

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 9, 2019

In 4863e0a, I created a standalone HTML output based on the diff of manuscript.html. View it here. This includes diff of rendered reference numbers, but for some reason the references list disappears. Still some issues with the tables.

Wouldn't that require passing the diff through manubot again?

It is possible we could create a markdown diff using pandiff and then pass through pandoc to render citations.

@slochower
Copy link
Collaborator

This includes diff of rendered reference numbers, but for some reason the references list disappears.

Right :) Also agree about the tables. Maybe a quick fix upstream.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 9, 2019

It is possible we could create a markdown diff using pandiff and then pass through pandoc to render citations.

The issue is that the markdown diff has the citations escaped like:

science open, while supporting new massively collaborative models of
{~~research.\~\>research \[\@1DiVJ3t6P\].~~} However, the scientific
community requires tools whose workflows encourage
{~~openness.\~\>openness \[\@IWBJQIkl\].~~} Manuscripts are the

@davidar
Copy link

davidar commented Apr 10, 2019

Thanks for the ping, this looks cool. Yes, if you're able to submit issues with a minimal example reproducing the problem, that would be great. I have some suspicions about what might be going wrong here, but I haven't explicitly tested pandiff against these kinds of cases yet.

@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

In addition to this diff, and possibly a manually generated diff I create in Word, should we also link to the source updates? We could include one of the following in the response letter:

I'm not sure how readable it is because 71 files changed, with more to come.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

The v2 to v3 GitHub diff would be a great addition to response-to-reviewers. It's nice to see all the commits listed like that. Currently GitHub chokes on the rich diff for main-text, but that could be useful if it works in future. I'll add as part of this PR.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

@agitter once we are basing this PR on the the final output, can you create a diff using your Microsoft Word between plos-comp-bio/diff/old/manuscript.docx and plos-comp-bio/diff/new/manuscript.docx?

We will then add that file as plos-comp-bio/diff/diff-from-docx.pdf (or whatever) to match response to reviewers.

@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

@dhimmel I reviewed https://github.com/greenelab/meta-review/tree/5c6a68a721a9866ea2baa7d82ccb0d0e05986acd and didn't find anything else that needs to be changed. I'm working on the docx diff now. I'll finish that before reviewing this pull request in full.

Do we want to date stamp this diff directory? I don't anticipate needing another round of revisions for this paper, but it would establish good general practices for Manubot journal submission that others can adopt.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

Do we want to date stamp this diff directory? I don't anticipate needing another round of revisions for this paper, but it would establish good general practices for Manubot journal submission that others can adopt.

I changed the directory to diff-v3.0 to match the upcoming release. I think that may actually be better than a date since I think it implies the diff since the previous release.

Sort of related: we should go back to our existing releases and add a couple more notes like, "this release corresponds to the initial PLOS Comp Bio submission"

Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.
@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

I changed the directory to diff-v3.0 to match the upcoming release.

Good idea

we should go back to our existing releases and add a couple more notes

That makes sense to me now that we're doing it in retrospect. For other manuscripts that I tag, I generally avoid the overhead of updating each release with the journal(s) that it was sent to in case of desk rejections.

I pushed 628f71c with the Word docx diff. It looks pretty good except for all the broken svgs. I'm going to the Word diff another way as well.

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

It looks pretty good except for all the broken svgs.

Also okay to manually edit that diff to delete the broken SVG placeholders.

Create old/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/12fd1685f01d992e37f997a1daf6b523e1dc152a/ into Word.
Create new/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/879971180dfd0f4cb654b05144fe15e4043d22c1/ into Word.
Diff generated from old/manuscript-manual.docx and new/manuscript-manual.docx with
Microsoft Word's Compare feature.
Show all revisions inline.
@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

In d5acec0 I manually made docx files by copying the HTML into Word. I primarily tried this to see whether it is still necessary or if pandoc doxc works as well. I'll review both Word diffs side-by-side now.

@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

Here are the differences I noticed in the Pandoc docx relative to the manual docx:

  • Need to manually delete broken svg images
  • Table diff looks better
  • Manubot features section written entirely as a header
  • Figure 3 is sized correctly

Both worked well. I prefer the Pandoc version so I'll remove the broken images and regenerate the PDF. I'll remove the manual files.

@agitter
Copy link
Collaborator

agitter commented Apr 12, 2019

@dhimmel diff-from-docx.pdf now looks good to me. However, I noticed that it doesn't include the changes from #216. Can you please update the new version of these files and I'll make the diff again?

@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

However, I noticed that it doesn't include the changes

Ah shoot, my bad. I guess I didn't rerun the download commands. Fixed in 50995b5

Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.
@dhimmel
Copy link
Contributor Author

dhimmel commented Apr 12, 2019

Okay @agitter squash merge this whenever you're ready to.

@agitter agitter merged commit 4ef4d9f into greenelab:master Apr 12, 2019
agitter pushed a commit that referenced this pull request Apr 12, 2019
This build is based on
4ef4d9f.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/meta-review/builds/519456958
https://travis-ci.org/greenelab/meta-review/jobs/519456959

[ci skip]

The full commit message that triggered this build is copied below:

Track changes using pandiff (#200)

* Diff in progress

* git mv diff analysis/

* updates

* Enables images

* Fix image widths

* create diff from manuscript.html

* Move diff to plos-comp-bio

* Add script in README to make downloads

* Rearrange pandiff output

* indent

* Update response to reviewers

* Update NEW version and standalone md diff

* Update response to reviewers

* Update diff

* Update response to reviewers

* newline

* git mv diff diff-v3.0

* Add docx diff from Microsoft Word
Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Manually create docx files and diff
Create old/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/12fd1685f01d992e37f997a1daf6b523e1dc152a/ into Word.
Create new/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/879971180dfd0f4cb654b05144fe15e4043d22c1/ into Word.
Diff generated from old/manuscript-manual.docx and new/manuscript-manual.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Revert "Manually create docx files and diff"

This reverts commit d5acec0.

* Remove broken images and regenerate docx diff

* Download latest files from NEW_COMMIT

* Update Word docx diff
Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Minor formatting change
agitter pushed a commit that referenced this pull request Apr 12, 2019
This build is based on
4ef4d9f.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/meta-review/builds/519456958
https://travis-ci.org/greenelab/meta-review/jobs/519456959

[ci skip]

The full commit message that triggered this build is copied below:

Track changes using pandiff (#200)

* Diff in progress

* git mv diff analysis/

* updates

* Enables images

* Fix image widths

* create diff from manuscript.html

* Move diff to plos-comp-bio

* Add script in README to make downloads

* Rearrange pandiff output

* indent

* Update response to reviewers

* Update NEW version and standalone md diff

* Update response to reviewers

* Update diff

* Update response to reviewers

* newline

* git mv diff diff-v3.0

* Add docx diff from Microsoft Word
Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Manually create docx files and diff
Create old/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/12fd1685f01d992e37f997a1daf6b523e1dc152a/ into Word.
Create new/manuscript-manual.docx by copying https://greenelab.github.io/meta-review/v/879971180dfd0f4cb654b05144fe15e4043d22c1/ into Word.
Diff generated from old/manuscript-manual.docx and new/manuscript-manual.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Revert "Manually create docx files and diff"

This reverts commit d5acec0.

* Remove broken images and regenerate docx diff

* Download latest files from NEW_COMMIT

* Update Word docx diff
Diff generated from old/manuscript.docx and new/manuscript.docx with
Microsoft Word's Compare feature.
Show all revisions inline.

* Minor formatting change
@davidar davidar mentioned this pull request Jul 20, 2019
@davidar
Copy link

davidar commented Jul 20, 2019

@dhimmel I've just released a new version of pandiff which should fix the problems you noticed, if you'd like to try it again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track changes from initial to revised journal submission
4 participants