String measurement incorrect? #143

Closed
dmurdoch opened this issue Sep 26, 2023 · 7 comments

@dmurdoch

I'm trying to draw text to a bitmap (for an rgl update) and finding that the reported dimensions of the drawn text are sometimes inaccurate. For example:

library(grid)
library(ragg)

agg_png("test.png")
pushViewport(viewport(gp = gpar(cex = 5)))
y <-  c(0.2, 0.4, 0.6)

texts <- c("東京", "Tokyo", "abc")

h <- convertHeight(stringDescent(texts), "npc")
h
#> [1] 0npc                   0.0270833333333333npc  0.00208333333333333npc
as.numeric(h)*480 # Should give pixels
#> [1]  0 13  1

grid.text(texts,
          x = 0, y = y, just = c(0,0))
grid.segments(x0 = 0, x1 = 1, y0 = y, y1 = y) 

popViewport()
dev.off()
#> quartz_off_screen 
#>                 2

Created on 2023-09-26 with reprex v2.0.2

This reports that the 3 strings descend by 0, 13, and 1 pixels respectively, and produces this image on my system:
[Image: test.png]

Clearly the kanji string "東京" descends below the baseline. If I zoom in, I see a descent of 6 pixels. Is this a bug?
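
The pixel figures quoted above are the npc fractions multiplied by the device height. A quick check of that arithmetic, assuming the `agg_png()` default height of 480 pixels and a viewport that fills the device (as in the reprex); the variable names are only for illustration:

```r
# Assumed: agg_png() default height of 480 px, viewport spanning the full device.
device_height_px <- 480

# Descents as printed by convertHeight() in the reprex, in npc units.
descent_npc <- c(0, 0.0270833333333333, 0.00208333333333333)

# npc is a fraction of the viewport height, so pixels = fraction * height.
descent_px <- round(descent_npc * device_height_px)
descent_px
#> [1]  0 13  1

# The 6 px descent seen when zooming in on the kanji, expressed in npc.
observed_kanji_px <- 6
observed_kanji_px / device_height_px
#> [1] 0.0125
```

So a correct measurement for "東京" would have been about 0.0125npc rather than 0npc.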

@dmurdoch
Author

Here's my sessionInfo:


R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] ragg_1.2.5 rgl_1.2.6 

loaded via a namespace (and not attached):
 [1] jsonlite_1.8.7    compiler_4.3.1    reprex_2.0.2     
 [4] clipr_0.8.0       callr_3.7.3       systemfonts_1.0.4
 [7] textshaping_0.3.6 yaml_2.3.7        fastmap_1.1.1    
[10] R6_2.5.1          knitr_1.44        htmlwidgets_1.6.2
[13] tibble_3.2.1      R.cache_0.16.0    pillar_1.9.0     
[16] R.utils_2.12.2    rlang_1.1.1       utf8_1.2.3       
[19] Rttf2pt1_1.3.12   xfun_0.40         fs_1.6.3         
[22] cli_3.6.1         withr_2.5.0       magrittr_2.0.3   
[25] ps_1.7.5          processx_3.8.2    digest_0.6.33    
[28] rstudioapi_0.15.0 base64enc_0.1-3   lifecycle_1.0.3  
[31] R.methodsS3_1.8.2 R.oo_1.25.0       vctrs_0.6.3      
[34] extrafont_0.19    evaluate_0.21     glue_1.6.2       
[37] extrafontdb_1.0   styler_1.10.2     fansi_1.0.4      
[40] rmarkdown_2.25    purrr_1.0.2       tools_4.3.1      
[43] pkgconfig_2.0.3   htmltools_0.5.6  

@dmurdoch
Author

I think I know what is happening, but I don't know how to fix it.

The kanji characters aren't present in the default font, so the device falls back to the "PingFang SC" family, which does have them, and uses a font from that family to draw them. However, it doesn't use metrics based on that family when calculating the overall string metric in grid::stringDescent().

The systemfonts package has a font_fallback() function, which led me to that family. Its docs basically say you shouldn't use it though -- if different characters need different fallbacks, it will still only give one. Presumably ragg devices work on a character by character basis internally when rendering, and the info returned to grid::stringDescent() should also do that.

@dmurdoch
Author

Here's a function that does better at calculating the measurements for strings that have font substitution. I think it does a good enough job for my needs, but for ragg it should probably be redone in C/C++. One other thing: the formula I found for descent that is used in extract_ascent_descent doesn't really make sense to me, but it seemed to give the right results. I was unable to find informative definitions of the results returned from systemfonts::shape_string, so I did it mostly by guesswork. What doesn't make sense is subtracting 2*top_border.

# This function computes ascent and descent of each of a vector
# of strings using systemfonts::shape_string().  shape_string()
# alone gives incorrect results when some of the string 
# entries include glyphs that are not in the specified font;
# this function recursively uses systemfonts::font_fallback()
# to do the correct calculation for the whole thing.

bbox_with_subs <- function(strings, family = "", 
                           italic = FALSE,
                           bold = FALSE, 
                           size = 12,
                           path = NULL, index = 0, 
                           ...) {
  
  # Function to make an empty result dataframe
  
  ascent_descent_df <- function(n) 
    data.frame(string = character(n), width = numeric(n),
               ascent = numeric(n), descent = numeric(n),
               pen_x = numeric(n))
  
  # Function to compute ascent and descent from the shape_string() results
  
  extract_ascent_descent <- function(metrics)
    with(metrics, 
         data.frame(string = string, 
                    width = width, 
                    ascent = top_border - top_bearing,
                    descent = height - 2*top_border -
                      bottom_bearing,
                    pen_x = pen_x))
  
  # Try shape_string() on the whole vector.  If no glyphs are
  # missing, trust the result
  
  shape0 <- systemfonts::shape_string(strings, family = family, path = path, index = index, italic = italic, bold = bold, size = size, ...)
  
  shape <- shape0$shape
  if (all(shape$index > 0))
    result <- extract_ascent_descent(shape0$metrics)
  
  else {
    # Some glyphs are missing.
    
    # We're going to be subsetting the strings, so all vectors
    # should be the same length
    
    n <- length(strings)
    stopifnot(nrow(shape0$metrics) == n)
    
    family <- rep_len(family, n)
    italic <- rep_len(italic, n)
    bold <- rep_len(bold, n)
    size <- rep_len(size, n)
    if (!is.null(path)) path <- rep_len(path, n)
    index <- rep_len(index, n)
    
    result <- ascent_descent_df(n)
    
    missings <- unique(shape$metric_id[shape$index == 0]) + 1
    nonmissings <- setdiff(unique(shape$metric_id) + 1, missings)
    
    # The nonmissings are fine; just extract those results
    if (length(nonmissings))
      result[nonmissings, ] <- extract_ascent_descent(shape0$metrics[nonmissings,])
    
    # For the missings, we need to find fallbacks.  Do them
    # one at a time.
    
    for (m in missings) {
      # Choose the glyph entries corresponding to this string
      rows <- which(shape$metric_id == m - 1)
        
      # Some glyphs are missing.  Break the string up into 
      # sequences of non-missing and missing glyphs
        
      parts <- rle(shape[rows, "index"] == 0)
      lens <- parts$lengths
      starts <- c(1, 1 + cumsum(lens)[-length(lens)])
      subs <- parts$values
      indx <- seq_along(parts$lengths)
      parts <- substring(shape0$metrics[m,"string"], 
                         starts, starts + lens - 1)
        
      n0 <- length(parts)
      result0 <- ascent_descent_df(n0)
      
      # Redo shape_string() on the parts with non-missing glyphs,
      # and save those results
      if (!all(subs)) {
        shape1 <- systemfonts::shape_string(parts[!subs], 
                                          family = family[m], 
                                          italic = italic[m], 
                                          bold = bold[m],
                                          size = size[m],
                                          path = if (!is.null(path)) path[m], 
                                          index = index[m], ...)
        

        result0[!subs, ] <- extract_ascent_descent(shape1$metrics)
      }
      
      # Find the fallback fonts for missing parts, and use them
      # in a recursive call
        
      fallback <- systemfonts::font_fallback(parts[subs], 
                                             family = family[m],
                                             italic = italic[m],
                                             bold = bold[m],
                                             path = if (!is.null(path)) path[m], 
                                             index = index[m])
      # Forward ... so extra shape_string() arguments also reach the fallback parts
      result0[subs, ] <- bbox_with_subs(parts[subs],
                                        path = fallback$path,
                                        index = fallback$index,
                                        size = size[m], ...)
        
      # Combine the results from the parts of the string
      # back into a single result for the whole string
      result[m, ] <- data.frame(string = shape0$metrics[m, "string"],
                                width = sum(result0$pen_x[-n0]) + result0$width[n0],
                                ascent = max(result0$ascent),
                                descent = max(result0$descent),
                                pen_x = sum(result0$pen_x))
    }
  }
  result
}
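
The step that breaks a string into runs of present and missing glyphs can be tried on its own in base R. This is a minimal sketch: `split_runs()` is a hypothetical helper (not part of systemfonts or ragg), the glyph ids are made up, and it assumes one glyph per character, which text shaping does not guarantee.

```r
# Hypothetical helper: split one string into runs of glyphs that are present
# in, or missing from, the requested font. glyph_index follows the convention
# used in bbox_with_subs(): 0 means the glyph is missing.
# Assumes exactly one glyph per character.
split_runs <- function(string, glyph_index) {
  stopifnot(nchar(string) == length(glyph_index))
  runs   <- rle(glyph_index == 0)                  # consecutive TRUE/FALSE runs
  lens   <- runs$lengths
  starts <- c(1, 1 + cumsum(lens)[-length(lens)])  # first character of each run
  data.frame(part = substring(string, starts, starts + lens - 1),
             needs_fallback = runs$values)
}

# "\u6771\u4eac" is the kanji string from the reprex; non-zero ids are made up.
split_runs("ab\u6771\u4eacCD", c(68, 69, 0, 0, 38, 39))
```

This returns three rows ("ab", the two kanji, "CD"), with only the middle one flagged as needing a fallback font.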

@thomasp85
Member

Thanks for the detailed exploration. You are right that the code paths for rendering and for string dimension calculations diverge. I'll look into a fix so that fallback fonts are handled correctly.

@thomasp85
Member

So, having poked at this for some time, I've come to the conclusion that there is no "real" fix for it. The main issue is that the R graphics engine measures the total height of a string by iterating over each character in the string, asking for its dimensions, and compounding that information. This approach is flawed because text shaping can break the 1-to-1 relationship between the characters in the string and the glyphs rendered. The fallback problem you are reporting is one instance, but ligatures can also produce different ascenders and descenders, and it is impossible to get this right when queried one character at a time...
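
A toy model of that per-character measurement, in base R, shows how the reported numbers can arise. Everything here is an assumption made for illustration: the per-character descents are invented to echo the figures in this thread (0, 13, 1 reported; 6 observed), `string_descent()` is a hypothetical function, and a character missing from the font is assumed to report a descent of 0. It is not how the graphics engine or ragg is actually implemented.

```r
# Invented per-character descents in pixels for the default font.
primary_font <- c(T = 0, o = 1, k = 0, y = 13, a = 1, b = 1, c = 1)

# Invented descents for the two kanji ("\u6771\u4eac") in a fallback font.
fallback_font <- setNames(c(6, 6), c("\u6771", "\u4eac"))

# Ask for each character's descent separately and keep the largest.
# Assumed: a character with no entry reports a descent of 0.
string_descent <- function(string, metrics) {
  chars <- strsplit(string, "")[[1]]
  idx <- match(chars, names(metrics))
  max(ifelse(is.na(idx), 0, metrics[idx]))
}

texts <- c("\u6771\u4eac", "Tokyo", "abc")

# Measuring against the default font only: the kanji contribute nothing.
vapply(texts, string_descent, numeric(1), metrics = primary_font,
       USE.NAMES = FALSE)
#> [1]  0 13  1

# Measuring with the fallback metrics included gives the descent seen on screen.
string_descent("\u6771\u4eac", c(primary_font, fallback_font))
#> [1] 6
```

The same per-character lookup has no way to represent a ligature, where one glyph stands in for several characters and may have its own ascent and descent.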

The best way to get accurate measurements is to use textshaping::shape_text() rather than rely on the values reported by the graphics device.

@thomasp85 closed this as not planned on Oct 4, 2023
@dmurdoch
Author

dmurdoch commented Oct 4, 2023

Thanks for looking into this. Just one quick question: is there a good source of documentation (maybe including a diagram) for the actual meaning of the textshaping::shape_text return values, or at least a formula for ascent and descent based on them? I found them empirically:

 ascent = top_border - top_bearing

 descent = height - 2*top_border - bottom_bearing

but the descent formula doesn't really make sense to me, so I worry it might not always work.

@thomasp85
Member

There is not... The R-facing text shaping functions in systemfonts and textshaping have always been slightly experimental, meant for tinkering while Paul and I discussed a better text rendering API for the graphics engine.
