Exporting manubot manuscript to Rmd #381

Open
taylorreiter opened this issue Oct 30, 2020 · 6 comments

@taylorreiter

Hello! I recently used manubot to draft a collaborative document, and it was a wonderful experience. Thank you for building such a great tool! I now need to export the manuscript to Rmarkdown. Using the output manuscript.md, I find that with very few changes everything knits appropriately and generates a rendered PDF of the document. However, I could not get citations to render properly. When I knit using bibliography: references.json, I get output like:

pandoc-citeproc: reference doi:10.1038/s41587-020-0439-x not found
pandoc-citeproc: reference doi:10.1371/journal.pcbi.1005755 not found

It seems like manubot outputs enough information between manuscript.md, references.json, and citations.tsv that citations/references in Rmarkdown should work relatively easily, but I couldn't figure out how to make this work. My current plan is to replace all ~125 citations by hand with BibTeX references and generate a new bibliography, but I would love to avoid this if at all possible!

@taylorreiter
Author

Adding more information to include examples!

The RMarkdown might look something like this:

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@doi:10.1038/s41587-020-0439-x].

The references.json looks like this:

  {
    "type": "article-journal",
    "id": "wq4G2CfQ",
    "author": [
      {
        "family": "Ewels",
        "given": "Philip A."
      },
      {
        "family": "Peltzer",
        "given": "Alexander"
      },
      {
        "family": "Fillinger",
        "given": "Sven"
      },
      {
        "family": "Patel",
        "given": "Harshil"
      },
      {
        "family": "Alneberg",
        "given": "Johannes"
      },
      {
        "family": "Wilm",
        "given": "Andreas"
      },
      {
        "family": "Garcia",
        "given": "Maxime Ulysse"
      },
      {
        "family": "Di Tommaso",
        "given": "Paolo"
      },
      {
        "family": "Nahnsen",
        "given": "Sven"
      }
    ],
    "issued": {
      "date-parts": [
        [
          2020,
          2,
          13
        ]
      ]
    },
    "container-title": "Nature Biotechnology",
    "DOI": "10.1038/s41587-020-0439-x",
    "volume": "38",
    "issue": "3",
    "page": "276-278",
    "publisher": "Springer Science and Business Media LLC",
    "title": "The nf-core framework for community-curated bioinformatics pipelines",
    "URL": "https://doi.org/ggk3qh",
    "PMID": "32055031",
    "note": "This CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: doi:10.1038/s41587-020-0439-x"
  }

And the citations.tsv looks like this:

input_id dealiased_id standard_id short_id
doi:10.1038/s41587-020-0439-x doi:10.1038/s41587-020-0439-x doi:10.1038/s41587-020-0439-x wq4G2CfQ
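The three snippets above can be cross-checked mechanically: the manuscript cites `doi:10.1038/s41587-020-0439-x`, while the `id` of the matching item in references.json is `wq4G2CfQ`. Below is a minimal Python sketch that lists citation keys with no matching `id`. The function names and the citation-key regex are assumptions made for illustration, not part of Manubot or pandoc, and the regex will also pick up things like email addresses.

```python
import re

# Assumed approximation of a pandoc citation key: "@", then key characters.
# Internal punctuation is allowed, but the key must end in an alphanumeric
# character or underscore, so a trailing "." or "]" is not swallowed.
CITE_KEY = re.compile(
    r"@([A-Za-z0-9_](?:[A-Za-z0-9_:.#$%&\-+?<>~/]*[A-Za-z0-9_])?)"
)


def cited_keys(markdown_text):
    """Return the set of citation keys found in the markdown text."""
    return set(CITE_KEY.findall(markdown_text))


def find_unresolved_keys(markdown_text, csl_items):
    """Return the cited keys that match no "id" in the CSL JSON items."""
    known_ids = {item["id"] for item in csl_items}
    return sorted(cited_keys(markdown_text) - known_ids)


# Inline stand-ins for the Rmd body and the parsed references.json list.
text = "A thing that we all observed[@doi:10.1038/s41587-020-0439-x]."
items = [{"id": "wq4G2CfQ", "DOI": "10.1038/s41587-020-0439-x"}]
unresolved = find_unresolved_keys(text, items)
print(unresolved)
```

With real files, `items` would come from `json.load` on references.json (assumed to be a JSON list of CSL items) and `text` from reading the Rmd.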

@taylorreiter
Author

Andddd in posting this update, I realized that replacing

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@doi:10.1038/s41587-020-0439-x].

with

---
bibliography: references.json
title: My document
output: pdf_document
---

## Introduction

Historically, this has led to a thing that we all observed[@wq4G2CfQ].

allows Rmd/pandoc-citeproc to resolve the citation, so I just need to programmatically replace each input_id with its short_id throughout the file!
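One way to do that replacement is sketched below in Python. It assumes citations.tsv is tab-delimited with the header columns shown earlier (`input_id`, `short_id`); the function names `load_id_map` and `replace_citation_keys` are hypothetical helpers, not Manubot APIs.

```python
import csv
import io
import re


def load_id_map(tsv_text):
    """Map each input_id to its short_id from the text of citations.tsv."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["input_id"]: row["short_id"] for row in reader}


def replace_citation_keys(markdown_text, id_map):
    """Rewrite every @input_id citation as @short_id in a single pass."""
    if not id_map:
        return markdown_text
    # Try longer keys first so a key that is a prefix of another cannot
    # shadow it; the lookahead stops a match in the middle of a longer word.
    keys = sorted(id_map, key=len, reverse=True)
    pattern = re.compile(
        "@(" + "|".join(re.escape(key) for key in keys) + ")(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda match: "@" + id_map[match.group(1)], markdown_text)


# Inline stand-in for the contents of citations.tsv.
tsv_text = (
    "input_id\tdealiased_id\tstandard_id\tshort_id\n"
    "doi:10.1038/s41587-020-0439-x\tdoi:10.1038/s41587-020-0439-x\t"
    "doi:10.1038/s41587-020-0439-x\twq4G2CfQ\n"
)
id_map = load_id_map(tsv_text)
before = "A thing that we all observed[@doi:10.1038/s41587-020-0439-x]."
after = replace_citation_keys(before, id_map)
print(after)
```

A single regex pass is used instead of chained `str.replace` calls so that text already rewritten is never matched again. Keys that are missing from citations.tsv are left untouched.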

@dhimmel

This comment has been minimized.

@dhimmel
Member

dhimmel commented Oct 30, 2020

My above comment is wrong and applied to an old version of Manubot.

Now that pandoc-manubot-cite is its own pandoc filter I see two options.

Option 1: Calling the pandoc-manubot-cite filter from Rmarkdown

As per the docs at https://rmarkdown.rstudio.com/docs/articles/lua-filters.html, you might be able to add something like the following to your Rmarkdown document:

---
output:
  html_document:
    pandoc_args:
    - --filter=pandoc-manubot-cite
    - --filter=pandoc-citeproc
---

The pandoc options used by Manubot are specified at https://github.com/manubot/rootstock/blob/8b9b5ced2c7c963bf3ea5afb8f31f9a4a54ab697/build/pandoc/defaults/common.yaml

Option 2: Running pandoc to export to markdown

Here is the command Manubot runs to export to HTML.

I think what you want is to export to markdown, so possibly:

pandoc --verbose \
  --data-dir="$PANDOC_DATA_DIR" \
  --defaults=common.yaml \
  --to=markdown \
  --output=output/manuscript-post-filters.md

Haven't tested this, but the goal is to run the pandoc-manubot-cite filter to process the citations but to write to markdown and not HTML.

I think this option might be better than option 1. I haven't tested either, but I'm happy to help debug any issues.

Option 2 should also run the other pandoc filters to number figures, tables, and equations.
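If the export needs to be scripted outside of build.sh, the same pandoc call can be assembled from Python. This is an untested sketch under the same assumptions as the command above: `common.yaml` is resolvable by pandoc and `PANDOC_DATA_DIR` points at the data directory. The helper names `build_pandoc_command` and `export_markdown` are hypothetical.

```python
import shutil
import subprocess


def build_pandoc_command(
    data_dir,
    to_format="markdown",
    output_path="output/manuscript-post-filters.md",
):
    """Return the argv list mirroring the pandoc command shown above."""
    return [
        "pandoc",
        "--verbose",
        f"--data-dir={data_dir}",
        "--defaults=common.yaml",
        f"--to={to_format}",
        f"--output={output_path}",
    ]


def export_markdown(data_dir, **kwargs):
    """Run pandoc, failing early if the executable is not on PATH."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc executable not found on PATH")
    subprocess.run(build_pandoc_command(data_dir, **kwargs), check=True)


# Only build and print the command here; "build/pandoc" is a placeholder
# for whatever PANDOC_DATA_DIR is set to in your checkout.
command = build_pandoc_command("build/pandoc")
print(" ".join(command))
```

Passing a different pandoc writer name as `to_format` would change the markdown flavor of the export without touching the rest of the command.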

@dhimmel
Member

dhimmel commented Oct 30, 2020

One thing we might consider is adding an opt-in BUILD_MD option to rootstock, so you could enable this environment variable and get a more portable markdown output. One question would be which markdown format to export to: markdown (pandoc's markdown), commonmark, or commonmark_x. Perhaps this could be an option.

Update: I opened PR #382, which demonstrates running pandoc to export to markdown. I think this should get you what you need (running the filters for citations and figure/table/equation numbering).

@dhimmel dhimmel transferred this issue from manubot/manubot Oct 30, 2020
dhimmel added a commit to dhimmel/manubot-rootstock that referenced this issue Oct 30, 2020
@dhimmel
Member

dhimmel commented Nov 1, 2020

Okay, I think the following code in #382 will create a markdown file you can use with RMarkdown:

rootstock/build/build.sh

Lines 38 to 43 in 6645e8b

echo >&2 "Exporting Pandoc Markdown manuscript after performing filters"
pandoc --verbose \
  --data-dir="$PANDOC_DATA_DIR" \
  --defaults=common.yaml \
  --to=markdown \
  --output=output/manuscript-post-filters.md

Let us know how that works.
