How to use tinkr to find Markdown image insertions and convert them into .Rmd code chunks that call knitr::include_graphics()? #37

Closed
pokyah opened this issue Apr 22, 2021 · 15 comments

Comments


pokyah commented Apr 22, 2021

Hello,

My goal is to convert a Markdown document to an R Markdown (.Rmd) document in which the pictures are inserted with knitr::include_graphics() inside code chunks, so that I can knit it back to Markdown with numbered figures. I have already changed the extension to .Rmd and added a YAML header containing the rendering options. Now I'm stuck on the figure numbering.

I was thinking of using regex to find and replace all the image insertions, but someone suggested that I use the tinkr package.

I guess that by parsing the document, I can retrieve all the expressions corresponding to an image insertion and then replace each one with a character string containing the code chunk.

Here is a reprex :

=====
dummy.md markdown document content :

# This is a dummy markdown document to be parsed

This document contains text and images.

The following picture is the first of the document

![logoR](logoR.jpg)

And here is the second picture of the document:

![logoTidyverse](logoTidyverse.png)

====

How can I parse this document in order to find all the image insertions:

![logoR](logoR.jpg)
![logoTidyverse](logoTidyverse.png)

and replace them with .Rmd code chunks as follows:

```{r logoR, echo=FALSE, fig.cap="logoR", out.width = '100%'}
knitr::include_graphics("logoR.jpg")
```

```{r logoTidyverse, echo=FALSE, fig.cap="logoTidyverse", out.width = '100%'}
knitr::include_graphics("logoTidyverse.png")
```

?

Maybe tinkr is not the best tool for this. Sorry if that is the case.
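For comparison, the regex route mentioned above could be sketched in base R as follows. This is only a minimal sketch: `images_to_chunks()` is a hypothetical helper, not part of tinkr or knitr, and it assumes each image sits alone on its own line in the inline form `![alt](path)`, with alt texts that are unique and valid as chunk labels.

```r
# Hypothetical helper (not part of tinkr or knitr): turn lines of the form
# ![alt](path) into knitr chunks that call knitr::include_graphics().
images_to_chunks <- function(lines) {
  # Whole-line inline image: capture the alt text and the path
  pattern <- "^!\\[([^]]*)\\]\\(([^)]*)\\)[[:space:]]*$"
  is_image <- grepl(pattern, lines)
  alt <- sub(pattern, "\\1", lines[is_image])
  path <- sub(pattern, "\\2", lines[is_image])
  # One string per chunk; the embedded newlines become separate lines on write
  lines[is_image] <- sprintf(
    "```{r %s, echo=FALSE, fig.cap=\"%s\", out.width = '100%%'}\nknitr::include_graphics(\"%s\")\n```",
    alt, alt, path
  )
  lines
}

md <- c(
  "The following picture is the first of the document",
  "",
  "![logoR](logoR.jpg)"
)
rmd <- images_to_chunks(md)
cat(rmd, sep = "\n")
```

A regex like this would also match image-like lines inside fenced code blocks, which is one reason a parser-based approach such as tinkr may be more robust.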

Thanks for your support

Member

maelle commented Apr 22, 2021

Here is what I have for now.

Problems:

  • It is not easy to choose where to add the new Markdown.
  • The original image doesn't disappear.

I'll try a bit more.

library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
  c("![logoR](logoR.jpg)", "![logoTidyverse](logoTidyverse.png)"),
  path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
  x = ex1$body, 
  xpath = ".//md:image", 
  ns = ex1$ns
)

handle_image <- function(image, ex1) {
  destination <- xml2::xml_attr(image, "destination")
  text <- xml2::xml_text(image)
  new_text <- glue::glue(
    '```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
 knitr::include_graphics("@destination$")
```\n',
    .open = "@",
    .close = "$"
  )
  # Add new Markdown text
  ex1$add_md(new_text, where = 1L)
  # Remove original image node
  xml2::xml_remove(image)
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#>  [1] "![logoR](logoR.jpg)"                                                              
#>  [2] "![logoTidyverse](logoTidyverse.png)"                                              
#>  [3] ""                                                                                 
#>  [4] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#>  [5] " knitr::include_graphics(\"logoTidyverse.png\")"                                  
#>  [6] "```"                                                                              
#>  [7] ""                                                                                 
#>  [8] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"                
#>  [9] " knitr::include_graphics(\"logoR.jpg\")"                                          
#> [10] "```"                                                                              
#> [11] ""                                                                                 
#> [12] ""

Created on 2021-04-22 by the reprex package (v1.0.0.9001)

Member

maelle commented Apr 22, 2021

Ah, now the image removal works.

library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
  c("![logoR](logoR.jpg)", "![logoTidyverse](logoTidyverse.png)"),
  path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
  x = ex1$body, 
  xpath = ".//md:image", 
  ns = ex1$ns
)

images_copy <- images
xml2::xml_remove(images)

handle_image <- function(image, ex1) {
  destination <- xml2::xml_attr(image, "destination")
  text <- xml2::xml_text(image)
  new_text <- glue::glue(
    '```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
 knitr::include_graphics("@destination$")
```\n',
    .open = "@",
    .close = "$"
  )
  
  # Add new Markdown text
  ex1$add_md(new_text, where = 1L)
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#>  [1] ""                                                                                 
#>  [2] ""                                                                                 
#>  [3] ""                                                                                 
#>  [4] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#>  [5] " knitr::include_graphics(\"logoTidyverse.png\")"                                  
#>  [6] "```"                                                                              
#>  [7] ""                                                                                 
#>  [8] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"                
#>  [9] " knitr::include_graphics(\"logoR.jpg\")"                                          
#> [10] "```"                                                                              
#> [11] ""                                                                                 
#> [12] ""

Created on 2021-04-22 by the reprex package (v1.0.0.9001)

Member

maelle commented Apr 22, 2021

Now for adding the chunks in the right position I need to find the original position of images. 🤔

Member

maelle commented Apr 22, 2021

I wonder whether tinkr is lacking a feature @zkamvar: maybe something like replace_md where you'd give new text like for add_md, but also a node to replace. 🤔

Author

pokyah commented Apr 22, 2021

> Now for adding the chunks in the right position I need to find the original position of images. 🤔

Is there no way to retrieve the position via the "character number" corresponding to the retrieved XPath node?

Member

maelle commented Apr 22, 2021

> "character number" corresponding to the retrieved XPath node

What do you mean?

Member

maelle commented Apr 22, 2021

The code below seems to work, but it uses the unexported tinkr:::clean_content() function.

library("tinkr")
library("xml2")
path <- tempfile()
brio::write_lines(
  c("![logoR](logoR.jpg)", "something else", "![logoTidyverse](logoTidyverse.png)"),
  path
)
ex1 <- tinkr::yarn$new(path)
# find all images
images <- xml_find_all(
  x = ex1$body, 
  xpath = ".//md:image", 
  ns = ex1$ns
)

handle_image <- function(image, ex1) {
  destination <- xml2::xml_attr(image, "destination")
  text <- xml2::xml_text(image)
  new_text <- glue::glue(
    '\n```{r @text$, echo=FALSE, fig.cap="@text$", out.width = "100%"}
 knitr::include_graphics("@destination$")
```\n',
    .open = "@",
    .close = "$"
  )
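  # Parse the chunk text into commonmark XML (clean_content() is unexported)
  new <- tinkr:::clean_content(new_text)
  new <- commonmark::markdown_xml(new, extensions = TRUE)
  # Remove the default namespace from the parsed fragment
  new <- xml2::xml_ns_strip(xml2::read_xml(new))
  # Swap the image node for the parsed chunk, keeping its position
  xml2::xml_replace(image, new)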
}
purrr::walk(images, handle_image, ex1 = ex1)
ex1$write(path)
brio::read_lines(path)
#>  [1] "```{r logoR, echo=FALSE, fig.cap=\"logoR\", out.width = \"100%\"}"                
#>  [2] " knitr::include_graphics(\"logoR.jpg\")"                                          
#>  [3] "```"                                                                              
#>  [4] ""                                                                                 
#>  [5] "something else"                                                                   
#>  [6] "```{r logoTidyverse, echo=FALSE, fig.cap=\"logoTidyverse\", out.width = \"100%\"}"
#>  [7] " knitr::include_graphics(\"logoTidyverse.png\")"                                  
#>  [8] "```"                                                                              
#>  [9] ""                                                                                 
#> [10] ""                                                                                 
#> [11] ""

Created on 2021-04-22 by the reprex package (v1.0.0.9001)

Author

pokyah commented Apr 22, 2021

> > "character number" corresponding to the retrieved XPath node
>
> what do you mean?

By position I mean this:

position_exclamation= stringr::str_locate_all(string = "a special character ! in a string", pattern = "!")
> position_exclamation
[[1]]
     start end
[1,]    21  21

Member

maelle commented Apr 22, 2021

ah no the position for xml2 is something different I think :-)

Member

zkamvar commented Apr 22, 2021

> > Now for adding the chunks in the right position I need to find the original position of images. 🤔
>
> Is there no way to retrieve the position via the "character number" corresponding to the retrieved XPath node?

This is what the sourcepos argument is for: https://docs.ropensci.org/tinkr/reference/to_xml.html
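As a minimal sketch of what sourcepos provides, the snippet below calls commonmark and xml2 directly rather than going through tinkr, so it makes no assumptions about tinkr's own interface. It looks up the paragraphs that contain an image and reads their starting line from the `sourcepos` attribute; the sample text and variable names are made up for illustration.

```r
library(commonmark)
library(xml2)

md <- paste(
  c("![logoR](logoR.jpg)", "", "something else", "",
    "![logoTidyverse](logoTidyverse.png)"),
  collapse = "\n"
)

# sourcepos = TRUE records "startline:startcol-endline:endcol" on the nodes
xml <- commonmark::markdown_xml(md, sourcepos = TRUE)
doc <- xml2::xml_ns_strip(xml2::read_xml(xml))

# Paragraphs that directly contain an image node
image_paragraphs <- xml2::xml_find_all(doc, ".//paragraph[image]")
positions <- xml2::xml_attr(image_paragraphs, "sourcepos")

# Keep only the part before the first colon: the starting line
start_lines <- as.integer(sub(":.*$", "", positions))
start_lines
```

The paragraph is used here instead of the image node itself because block-level nodes reliably carry a sourcepos value.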

I use this to get the line positions of elements in {pegboard}:

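# Get one field of an element's sourcepos attribute ("line:col-line:col")
# e = 1: start line, 2: start column, 3: end line, 4: end column
get_pos <- function(x, e = 1) {
  as.integer(
    gsub(
      # the quantifier sits inside group 4 so it captures every digit
      "^(\\d+?):(\\d+?)[-](\\d+?):(\\d+?)$",
      glue::glue("\\{e}"),
      xml2::xml_attr(x, "sourcepos")
    )
  )
}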

# helpers for get_pos
get_linestart <- function(x) get_pos(x, e = 1)
get_colstart  <- function(x) get_pos(x, e = 2)
get_lineend   <- function(x) get_pos(x, e = 3)
get_colend    <- function(x) get_pos(x, e = 4)
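
To make the `startline:startcol-endline:endcol` layout of a sourcepos value concrete, here is a base R sketch that splits such strings into their four integer fields. `parse_sourcepos()` is a hypothetical name used for illustration, not a function from tinkr or {pegboard}, and the input strings are made-up examples.

```r
# Hypothetical helper: split sourcepos strings into an integer matrix
parse_sourcepos <- function(pos) {
  # Split on "-" and ":" to get the four numbers of each string
  fields <- lapply(strsplit(pos, "[-:]"), as.integer)
  out <- do.call(rbind, fields)
  colnames(out) <- c("linestart", "colstart", "lineend", "colend")
  out
}

pos <- parse_sourcepos(c("1:1-1:19", "5:1-5:35"))
pos[, "linestart"]
pos[, "colend"]
```

Returning all four fields at once makes it easy to check multi-digit values such as an end column of 19 or 35.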

Member

maelle commented Apr 23, 2021

ooooh that's how it works, thank you @zkamvar!

Now I wonder whether we still need to use replacement instead of adding. 🤔

Member

zkamvar commented Apr 23, 2021

> Now I wonder whether we still need to use replacement instead of adding. 🤔

I think replacement might still be a worthwhile option.
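
One argument for replacement is that the new node simply takes over the old node's place, so no line arithmetic is needed. The toy sketch below uses xml2 only, on hand-written XML that loosely mimics commonmark output without namespaces; it is not tinkr's API, and the node contents are illustrative.

```r
library(xml2)

# Hand-written stand-in for a parsed Markdown document (an assumption)
doc <- read_xml(paste0(
  "<document>",
  "<paragraph><image destination='logoR.jpg'><text>logoR</text></image></paragraph>",
  "<paragraph><text>something else</text></paragraph>",
  "</document>"
))

image <- xml_find_first(doc, ".//image")
label <- xml_text(image)
path <- xml_attr(image, "destination")

# Build a replacement code_block node from the image's label and path
chunk <- read_xml(sprintf(
  "<code_block info='r %s, echo=FALSE'>knitr::include_graphics(\"%s\")\n</code_block>",
  label, path
))

# Replace the paragraph that wraps the image; the new node keeps its position
xml_replace(xml_parent(image), chunk)

xml_name(xml_children(doc))
```

With an add-style approach, the insertion point would instead have to be worked out separately, for example from sourcepos.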

Member

maelle commented May 17, 2021

@pokyah do you have remaining problems?

maelle closed this as completed May 27, 2021

Author

pokyah commented Aug 17, 2021

Hi @maelle and @zkamvar.

I'm back to this problem.

How could we develop the replace method for the md images as suggested in #41?

Thanks for the support and the investigations :)

Member

maelle commented Aug 23, 2021

👋 @pokyah, could you please comment in that issue as it's still open, with some more context on what you're trying to achieve? Thank you!
