support emoji encoding for Flux jobids #5174

grondo · 2023-05-16T18:57:27Z

This PR implements the emoji encoding for FLUIDs as proposed in the pending PR flux-framework/rfc#381.
It is a WIP pending acceptance of that RFC PR.

Instead of trying to implement a generic binary to emoji encoding, the "basemoji" implementation here simply converts uint64_t to base 576 using the same pattern as the F58 implementation. This allows high-order zero bits to be dropped since the result is a "number" with emoji as the base 576 digits.

A named emoji encoding is then added to the fluid.h with support for detection and parsing in fluid_parse(3), along with an id.emoji property for the JobID class and flux-jobs output formats.

$ flux job id --to=emoji 0 1 512 65535 123456789
😃
😄
🔞
💁📚
🎰💗📣

chu11

Idea: update the cute format to use id.emoji? However, getting things to align is hard. I haven't been able to figure anything out yet that looks ok.

Edit: I think the emojis are double width in my terminal, so I need to "space over" the JOBID header. We need like a "header offset' or something.

Edit: heh, this fixed the alignment :P (12 spaces)

diff --git a/src/bindings/python/flux/job/info.py b/src/bindings/python/flux/job/info.py
index 393d40a..12f0ccb 100644
--- a/src/bindings/python/flux/job/info.py
+++ b/src/bindings/python/flux/job/info.py
@@ -633,7 +633,7 @@ class JobInfoFormat(flux.util.OutputFormat):
         "id.dec": "JOBID",
         "id.hex": "JOBID",
         "id.f58": "JOBID",
-        "id.emoji": "JOBID",
+        "id.emoji": "           JOBID",
         "id.kvs": "JOBID",
         "id.words": "JOBID",
         "id.dothex": "JOBID",

chu11 · 2023-05-16T19:30:28Z

src/common/libutil/basemoji.c

+
+/* Maximum number of emoji "digits" in a basemoji string is
+ *
+ *  ciel (ln (2^64-1)/ln (576)) = 7


chu11 · 2023-05-16T19:40:35Z

src/common/libutil/basemoji.c

+    }
+
+    /*  Check for overflow of provided buffer:
+     *  Need space for prefix + count bytes for emoji + NUL


chu11 · 2023-05-16T19:40:53Z

src/common/libutil/basemoji.c

+    /*  Check for overflow of provided buffer:
+     *  Need space for prefix + count bytes for emoji + NUL
+     */
+    if (count + 1 > buflen) {


parens around (count + 1)

Ok, but operator precedence puts + before > so what is the need here? Just preference?

FWIW I would not add parens there (precedence is clear).

ok no prob, I tend to add parens when there's more than "1 thingie" on one side of the gt or lt.

chu11 · 2023-05-16T19:44:00Z

src/common/libutil/basemoji.c

+    /* Check for expected length of a basemoji string, and if the
+     * first two bytes match the expected UTF-8 encoding.
+     * This doesn't guarantee that `s` is a valid basemoji string,
+     * but this will catch most obvious cases and other invalid strings
+     * are left to be detected in decode.
+     */
+    if (len >= BASEMOJI_MINLEN
+        && len <= BASEMOJI_MAXLEN
+        && (uint8_t)s[0] == 0xf0
+        && (uint8_t)s[1] == 0x9f)
+        return true;


should check strlen (s) % 4 == 0? I'm waffling on the need for it, would be good to catch obvious errors, but code below would handle it ....

hm, good point. It is a simple check so might as well add it.

Note we could go through and ensure ever other two bytes start with F0 9F, but that seemed a bit heavyweight to me. This isn't validating the emoji string, but just detecting if we should try to decode it as one..

chu11 · 2023-05-16T19:57:10Z

src/common/libutil/test/fluid.c

@@ -236,7 +245,7 @@ void test_basic (void)
    ok (fluid_encode (buf, sizeof (buf), id, FLUID_STRING_DOTHEX) == 0,
        "fluid_encode type=DOTHEX works");
    ok (fluid_decode (buf, &id2, FLUID_STRING_DOTHEX) == 0 && id == id2,
-        "fluid_decode type=MNEMONIC works");
+        "fluid_decode type=DOTHEX works");


nit: should be in another commit? another one below too.

Oh yeah, those were cut-and-paste fixes I missed adding to a different commit :-(

Problem: There is no implementation for encoding FLUIDs to emoji as described in RFC 19. Add basemoji, an encoding which uses a table of 576 standard Unicode emoji to represent 64 bit unsigned integers in base 576.

Problem: There are no unit tests for the basemoji encoding for 64 bit unsigned integers. Add a set of simple unit tess for basemoji to libutil/test.

Problem: Encoding FLUIDs to a string of emoji is not supported. Use the libutil/basemoji implementation to add a FLUID_STRING_EMOJI string type to fluid_string_type_t. Make sure FLUID_STRING_EMOJI is supported with fluid_parse(3).

Problem: There are a couple cut-and-paste errors in the fluid unit test that print type=MNEMONIC when they mean type=DOTHEX. Fix the typos.

Problem: There are no unit tests for the FLUID_STRING_EMOJI encoding. Amend the existing fluid unit tests to exercise this encoding.

Problem: libjob does not support encoding flux_jobid_t to emoji. Add an "emoji" encoding that uses FLUX_STRING_EMOJI to encode flux_jobid_t as a string of emoji using the basemoji implementation.

Problem: No tests in the libjob unit testsuite ensure the "emoji" jobid encoding works as intended. Add some expected emoji output to the libjob unit tests.

Problem: A Python JobID object can't return the emoji encoding of a jobid since there's no corresponding class property. Add an `emoji` property to the JobID class which returns the emoji encoding of the jobid.

Problem: The Python job tests do not test the emoji encoding for JobIDs. Add a tests for expected 'emoji' encodings of JobIDs.

Problem: The Python job formatting class does not support id.emoji, even though emoji is valid jobid encoding. Add `id.emoji` to various dictionaries as required to support this encoding in output formats.

Problem: `id.emoji` is missing from the list of available field names in flux-jobs(1). Add id.emoji to the list of valid field names in the flux-jobs(1) manual.

chu11 · 2023-05-16T23:37:25Z

I was completely messing around to see if I could get the emoji alignment to work and this ended up working:

@@ -443,6 +444,17 @@ class UtilFormatter(Formatter):
             basecases = ("", "0s", "0.0", "0:00:00", "1970-01-01T00:00:00", localepoch)
             value = "-" if str(value) in basecases else str(value)
             spec = spec[:-1] + "s"
+
+        if isinstance (value, str):
+            widecount = 0
+            for chr in value:
+                if unicodedata.east_asian_width(chr) == 'W':
+                    widecount += 1
+
+            if widecount and spec == ">12":
+                count = 12 - widecount
+                spec = f">{count}"
+
         retval = super().format_field(value, spec)

Obviously I hard coded it based on what I was expecting for the spec, but perhaps the basics of something to align wide chars is doable / reasonable ... I think there's a lot of corner cases on emoji chars and some are half-chars, but the output would atleast be sensible in most cases, vs the super shifted one I showed above.

grondo · 2023-05-16T23:58:25Z

Yeah, that might work! I wonder if we should have a special conversion so this doesn't have to be done for every string that is being formatted, or maybe there's some other way to better detect a possible "wide" string (or maybe it is fast enough that it doesn't matter)

grondo · 2023-05-17T00:00:33Z

BTW I stumbled across this library https://gitlab.com/fgallaire/cjkwrap/-/blob/master/cjkwrap.py
That has a ckjlen() function (len of a possible wide string) that we could maybe pilfer:

def cjklen(text):
     """cjklen(object) -> integer
         Return the real width of an unicode text, the len of any other type.
     """
     if not isinstance(text, text_type):
         return len(text)
     return sum(2 if is_wide(char) else 1 for char in text)

grondo · 2023-05-17T00:03:47Z

Just removed WIP since the RFC 19 changes went in.

codecov · 2023-05-17T00:13:21Z

Codecov Report

Merging #5174 (631e983) into master (c0e977c) will increase coverage by 0.02%.
The diff coverage is 90.66%.

@@            Coverage Diff             @@
##           master    #5174      +/-   ##
==========================================
+ Coverage   83.13%   83.15%   +0.02%     
==========================================
  Files         454      455       +1     
  Lines       77961    78035      +74     
==========================================
+ Hits        64814    64892      +78     
+ Misses      13147    13143       -4

Impacted Files	Coverage Δ
src/bindings/python/flux/job/info.py	`92.85% <ø> (ø)`
src/common/libutil/fluid.c	`93.17% <85.71%> (-0.30%)`	⬇️
src/common/libutil/basemoji.c	`90.62% <90.62%> (ø)`
src/bindings/python/flux/job/JobID.py	`100.00% <100.00%> (ø)`
src/common/libjob/id.c	`90.69% <100.00%> (+0.22%)`	⬆️

... and 5 files with indirect coverage changes

chu11 · 2023-05-17T00:20:42Z

I wonder if we should have a special conversion so this doesn't have to be done for every string that is being formatted

Oh that's a good idea. And honestly, we could just limit it to certain formatting in that case. like >\d+ and <\d+ or something.

chu11

LGTM

chu11 · 2023-05-17T04:46:10Z

ehhh, this wasn't too awful, maybe could propose in a follow up PR (probably can cleanup some logic here).

diff --git a/src/bindings/python/flux/util.py b/src/bindings/python/flux/util.py
index f753f5e..1c2f82e 100644
--- a/src/bindings/python/flux/util.py
+++ b/src/bindings/python/flux/util.py
@@ -29,6 +29,7 @@ from string import Formatter
 from typing import Mapping
 
 import yaml
+import unicodedata
 
 # tomllib added to standard library in Python 3.11
 # flux-core minimum is Python 3.6.
@@ -443,6 +444,28 @@ class UtilFormatter(Formatter):
             basecases = ("", "0s", "0.0", "0:00:00", "1970-01-01T00:00:00", localepoch)
             value = "-" if str(value) in basecases else str(value)
             spec = spec[:-1] + "s"
+
+        if spec.endswith("W"):
+            if isinstance (value, str):
+                match = re.search(r"^([<>])(\d+)W", spec)
+                if match:
+                    align = match[1]
+                    width = int(match[2])
+                    widecount = 0
+                    for chr in value:
+                        if unicodedata.east_asian_width(chr) == 'W':
+                            widecount += 1
+
+                    if width > widecount:
+                        width -= widecount;
+                        spec = f"{align}{width}s"
+                    else:
+                        spec = ""
+                else:
+                    spec = spec[:-1]
+            else:
+                spec = spec[:-1]
+
         retval = super().format_field(value, spec)
 
         if denote_truncation and len(retval) < len(str(value)):
diff --git a/src/cmd/flux-jobs.py b/src/cmd/flux-jobs.py
index 5211fb6..a4a14bd 100755
--- a/src/cmd/flux-jobs.py
+++ b/src/cmd/flux-jobs.py
@@ -39,7 +39,7 @@ class FluxJobsConfig(UtilConfig):
         "cute": {
             "description": "Cute flux-jobs format string (default with emojis)",
             "format": (
-                "{id.f58:>12} ?:{queue:<8.8} {username:<8.8} {name:<10.10+} "
+                "{id.emoji:>12W} ?:{queue:<8.8} {username:<8.8} {name:<10.10+} "
                 "{status_emoji:>5.5} {ntasks:>6} {nnodes:>6h} "
                 "{contextual_time!F:>8h} {contextual_info}"
             ),

I did not add W to the status_emoji output though. I think there is some combo of wide width, total width, and width of the header that is coming into play here for alignment that I haven't quite figured out. I think this is a decent enough hack to get us something more sensible to update the cute format though. Edit: oh duh ... the header JOBID is always going to be shorter than the output of a basemoji, so all of those extra chars mess up alignment. But with STATUS, there is only one emoji. So it'll always be shorter than the header, so using W could mess up alignment.

grondo · 2023-05-17T13:47:32Z

Ok, I guess I'll set MWP then.. 😆

chu11 reviewed May 16, 2023

View reviewed changes

grondo added 11 commits May 16, 2023 16:16

libutil: add basemoji, an emoji uint64_t encoding

33d428d

Problem: There is no implementation for encoding FLUIDs to emoji as described in RFC 19. Add basemoji, an encoding which uses a table of 576 standard Unicode emoji to represent 64 bit unsigned integers in base 576.

libutil/test: add tests for basemoji implementation

805bc6d

Problem: There are no unit tests for the basemoji encoding for 64 bit unsigned integers. Add a set of simple unit tess for basemoji to libutil/test.

libutil: fluid: support FLUID_STRING_EMOJI

eb4dc88

Problem: Encoding FLUIDs to a string of emoji is not supported. Use the libutil/basemoji implementation to add a FLUID_STRING_EMOJI string type to fluid_string_type_t. Make sure FLUID_STRING_EMOJI is supported with fluid_parse(3).

libutil/test: fix typos in fluid unit test

a3ab55f

Problem: There are a couple cut-and-paste errors in the fluid unit test that print type=MNEMONIC when they mean type=DOTHEX. Fix the typos.

libutil/test: test FLUID_STRING_EMOJI

2a01426

Problem: There are no unit tests for the FLUID_STRING_EMOJI encoding. Amend the existing fluid unit tests to exercise this encoding.

libjob: support emoji jobid encoding

3c28891

Problem: libjob does not support encoding flux_jobid_t to emoji. Add an "emoji" encoding that uses FLUX_STRING_EMOJI to encode flux_jobid_t as a string of emoji using the basemoji implementation.

libjob/test: test the emoji flux_jobid_t encoding

6e9499c

Problem: No tests in the libjob unit testsuite ensure the "emoji" jobid encoding works as intended. Add some expected emoji output to the libjob unit tests.

python: add emoji property to JobID class

3c8de96

Problem: A Python JobID object can't return the emoji encoding of a jobid since there's no corresponding class property. Add an `emoji` property to the JobID class which returns the emoji encoding of the jobid.

testsuite: test JobID emoji encoding

be5a75c

Problem: The Python job tests do not test the emoji encoding for JobIDs. Add a tests for expected 'emoji' encodings of JobIDs.

python: support id.emoji in output formats

2a45ed7

Problem: The Python job formatting class does not support id.emoji, even though emoji is valid jobid encoding. Add `id.emoji` to various dictionaries as required to support this encoding in output formats.

doc: add id.emoji to flux-jobs(1)

631e983

Problem: `id.emoji` is missing from the list of available field names in flux-jobs(1). Add id.emoji to the list of valid field names in the flux-jobs(1) manual.

grondo force-pushed the f576 branch from 46915bb to 631e983 Compare May 16, 2023 23:23

grondo changed the title ~~WIP: support emoji encoding for Flux jobids~~ support emoji encoding for Flux jobids May 17, 2023

chu11 approved these changes May 17, 2023

View reviewed changes

grondo added the merge-when-passing label May 17, 2023

mergify bot merged commit 11d4362 into flux-framework:master May 17, 2023
32 checks passed

grondo deleted the f576 branch May 17, 2023 13:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support emoji encoding for Flux jobids #5174

support emoji encoding for Flux jobids #5174

grondo commented May 16, 2023

chu11 left a comment •

edited

chu11 May 16, 2023

chu11 May 16, 2023

chu11 May 16, 2023

grondo May 16, 2023

garlick May 16, 2023 •

edited

chu11 May 16, 2023

chu11 May 16, 2023

grondo May 16, 2023

grondo May 16, 2023

chu11 May 16, 2023 •

edited

grondo May 16, 2023

chu11 commented May 16, 2023 •

edited

grondo commented May 16, 2023

grondo commented May 17, 2023

grondo commented May 17, 2023

codecov bot commented May 17, 2023

chu11 commented May 17, 2023

chu11 left a comment

chu11 commented May 17, 2023 •

edited

grondo commented May 17, 2023

support emoji encoding for Flux jobids #5174

support emoji encoding for Flux jobids #5174

Conversation

grondo commented May 16, 2023

chu11 left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garlick May 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chu11 May 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chu11 commented May 16, 2023 • edited

grondo commented May 16, 2023

grondo commented May 17, 2023

grondo commented May 17, 2023

codecov bot commented May 17, 2023

Codecov Report

chu11 commented May 17, 2023

chu11 left a comment

Choose a reason for hiding this comment

chu11 commented May 17, 2023 • edited

grondo commented May 17, 2023

chu11 left a comment •

edited

garlick May 16, 2023 •

edited

chu11 May 16, 2023 •

edited

chu11 commented May 16, 2023 •

edited

chu11 commented May 17, 2023 •

edited