support emoji encoding for Flux jobids #5174
Idea: update the `cute` format to use `id.emoji`? However, getting things to align is hard. I haven't been able to figure out anything that looks OK yet.
Edit: I think the emoji are double-width in my terminal, so I need to "space over" the JOBID header. We need something like a "header offset".
Edit: heh, this fixed the alignment :P (12 spaces)
diff --git a/src/bindings/python/flux/job/info.py b/src/bindings/python/flux/job/info.py
index 393d40a..12f0ccb 100644
--- a/src/bindings/python/flux/job/info.py
+++ b/src/bindings/python/flux/job/info.py
@@ -633,7 +633,7 @@ class JobInfoFormat(flux.util.OutputFormat):
"id.dec": "JOBID",
"id.hex": "JOBID",
"id.f58": "JOBID",
- "id.emoji": "JOBID",
+ "id.emoji": " JOBID",
"id.kvs": "JOBID",
"id.words": "JOBID",
"id.dothex": "JOBID",
src/common/libutil/basemoji.c
/* Maximum number of emoji "digits" in a basemoji string is
 *
 *  ciel (ln (2^64-1)/ln (576)) = 7
Typo: ciel → ceil
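As a quick sanity check on the bound in this comment, the following Python sketch (not part of the PR) computes the maximum digit count for a 64-bit value both with logarithms and by exact integer division. `base576_digits` is a hypothetical helper name used only for illustration.

```python
import math

UINT64_MAX = 2**64 - 1
BASE = 576  # number of emoji "digits" in the basemoji table

# Floating-point estimate: ceil(ln(2^64 - 1) / ln(576))
est = math.ceil(math.log(UINT64_MAX) / math.log(BASE))


def base576_digits(n: int) -> int:
    """Exact count of base-576 digits in n (zero still needs one digit)."""
    count = 1
    while n >= BASE:
        n //= BASE
        count += 1
    return count


exact = base576_digits(UINT64_MAX)
print(est, exact)  # both 7, since 576**6 < 2**64 - 1 < 576**7
```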
src/common/libutil/basemoji.c
}

/* Check for overflow of provided buffer:
 * Need space for prefix + count bytes for emoji + NUL
No prefix? (The comment mentions a prefix, but basemoji strings don't have one.)
/* Check for overflow of provided buffer:
 * Need space for prefix + count bytes for emoji + NUL
 */
if (count + 1 > buflen) {
parens around (count + 1)
Ok, but operator precedence puts + before >, so what is the need here? Just preference?
FWIW I would not add parens there (precedence is clear).
ok no prob, I tend to add parens when there's more than "1 thingie" on one side of the gt or lt.
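A Python sketch of the buffer check being discussed, for illustration only. It assumes every emoji in the table is a 4-byte UTF-8 sequence (consistent with the `0xf0 0x9f` prefix test in this file) and that `count` is the number of emoji bytes; `emoji_byte_count` and `buffer_fits` are hypothetical names. Python, like C, gives `+` higher precedence than `>`, so `count + 1 > buflen` already means `(count + 1) > buflen`.

```python
BASE = 576
BYTES_PER_EMOJI = 4  # assumption: all table entries are 4-byte UTF-8 (U+1Fxxx)


def emoji_byte_count(n: int) -> int:
    """Bytes of emoji needed to encode n, excluding the NUL terminator."""
    digits = 1
    while n >= BASE:
        n //= BASE
        digits += 1
    return digits * BYTES_PER_EMOJI


def buffer_fits(n: int, buflen: int) -> bool:
    """Mirror of the C check: fail if count + 1 (for the NUL) exceeds buflen."""
    count = emoji_byte_count(n)
    # No parentheses needed: '+' binds tighter than '>' (same as in C)
    if count + 1 > buflen:
        return False
    return True
```

Under those assumptions, 7 emoji × 4 bytes + 1 NUL means a 29-byte buffer is always enough for any 64-bit value.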
src/common/libutil/basemoji.c
/* Check for expected length of a basemoji string, and if the
 * first two bytes match the expected UTF-8 encoding.
 * This doesn't guarantee that `s` is a valid basemoji string,
 * but this will catch most obvious cases and other invalid strings
 * are left to be detected in decode.
 */
if (len >= BASEMOJI_MINLEN
    && len <= BASEMOJI_MAXLEN
    && (uint8_t)s[0] == 0xf0
    && (uint8_t)s[1] == 0x9f)
    return true;
Should this also check `strlen (s) % 4 == 0`? I'm waffling on the need for it. It would be good to catch obvious errors, but the code below would handle it...
hm, good point. It is a simple check so might as well add it.
Note that we could go through and ensure every 4-byte emoji starts with F0 9F, but that seemed a bit heavyweight to me. This isn't validating the emoji string, just detecting whether we should try to decode it as one.
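The detection heuristic with the extra `% 4` check might look like this in Python. This is a sketch, not the C code: `BASEMOJI_MINLEN` and `BASEMOJI_MAXLEN` are given assumed values (one and seven 4-byte emoji), which may not match the constants in basemoji.c, and `looks_like_basemoji` is a hypothetical name.

```python
BASEMOJI_MINLEN = 4   # assumed: one 4-byte emoji
BASEMOJI_MAXLEN = 28  # assumed: seven 4-byte emoji (excluding NUL)


def looks_like_basemoji(s: bytes) -> bool:
    """Cheap check whether s is worth trying to decode as basemoji.

    Length must be in range and a multiple of 4, and the first two
    bytes must be the UTF-8 prefix 0xF0 0x9F. Full validation is
    left to the decoder.
    """
    n = len(s)
    return (
        BASEMOJI_MINLEN <= n <= BASEMOJI_MAXLEN
        and n % 4 == 0
        and s[0] == 0xF0
        and s[1] == 0x9F
    )
```

Like the C version, only the first emoji's prefix is inspected; a malformed later emoji is still caught at decode time.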
@@ -236,7 +245,7 @@ void test_basic (void)
 ok (fluid_encode (buf, sizeof (buf), id, FLUID_STRING_DOTHEX) == 0,
 "fluid_encode type=DOTHEX works");
 ok (fluid_decode (buf, &id2, FLUID_STRING_DOTHEX) == 0 && id == id2,
- "fluid_decode type=MNEMONIC works");
+ "fluid_decode type=DOTHEX works");
Nit: should this be in another commit? There's another one below too.
Oh yeah, those were cut-and-paste fixes I missed adding to a different commit :-(
Problem: There is no implementation for encoding FLUIDs to emoji as described in RFC 19. Add basemoji, an encoding which uses a table of 576 standard Unicode emoji to represent 64 bit unsigned integers in base 576.
Problem: There are no unit tests for the basemoji encoding for 64 bit unsigned integers. Add a set of simple unit tests for basemoji to libutil/test.
Problem: Encoding FLUIDs to a string of emoji is not supported. Use the libutil/basemoji implementation to add a FLUID_STRING_EMOJI string type to fluid_string_type_t. Make sure FLUID_STRING_EMOJI is supported with fluid_parse(3).
Problem: There are a couple cut-and-paste errors in the fluid unit test that print type=MNEMONIC when they mean type=DOTHEX. Fix the typos.
Problem: There are no unit tests for the FLUID_STRING_EMOJI encoding. Amend the existing fluid unit tests to exercise this encoding.
Problem: libjob does not support encoding flux_jobid_t to emoji. Add an "emoji" encoding that uses FLUID_STRING_EMOJI to encode flux_jobid_t as a string of emoji using the basemoji implementation.
Problem: No tests in the libjob unit testsuite ensure the "emoji" jobid encoding works as intended. Add some expected emoji output to the libjob unit tests.
Problem: A Python JobID object can't return the emoji encoding of a jobid since there's no corresponding class property. Add an `emoji` property to the JobID class which returns the emoji encoding of the jobid.
Problem: The Python job tests do not test the emoji encoding for JobIDs. Add tests for expected 'emoji' encodings of JobIDs.
Problem: The Python job formatting class does not support id.emoji, even though emoji is a valid jobid encoding. Add `id.emoji` to various dictionaries as required to support this encoding in output formats.
Problem: `id.emoji` is missing from the list of available field names in flux-jobs(1). Add id.emoji to the list of valid field names in the flux-jobs(1) manual.
I was completely messing around to see if I could get the emoji alignment to work and this ended up working:
Obviously I hard-coded it based on what I was expecting for the spec, but perhaps the basics of something to align wide chars is doable/reasonable. I think there are a lot of corner cases with emoji chars and some are half-chars, but the output would at least be sensible in most cases, vs. the super-shifted one I showed above.
Yeah, that might work! I wonder if we should have a special conversion so this doesn't have to be done for every string that is being formatted, or maybe there's some other way to better detect a possible "wide" string (or maybe it is fast enough that it doesn't matter).
BTW I stumbled across this library: https://gitlab.com/fgallaire/cjkwrap/-/blob/master/cjkwrap.py
def cjklen(text):
    """cjklen(object) -> integer
    Return the real width of an unicode text, the len of any other type.
    """
    if not isinstance(text, text_type):
        return len(text)
    return sum(2 if is_wide(char) else 1 for char in text)
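Without pulling in an extra dependency, Python's standard `unicodedata.east_asian_width()` can give a similar estimate. This is a sketch that treats the 'W' (wide) and 'F' (fullwidth) categories as two columns; it is not necessarily how cjkwrap's `is_wide()` is implemented, and `display_width` is a hypothetical name.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal column width of text.

    Characters whose East Asian Width property is 'W' (wide) or
    'F' (fullwidth) count as two columns; everything else counts as one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )
```

It ignores the corner cases mentioned above, such as zero-width joiners, combining marks, and variation selectors, so multi-codepoint emoji sequences will be over-counted.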
Just removed WIP since the RFC 19 changes went in.
Codecov Report
@@ Coverage Diff @@
## master #5174 +/- ##
==========================================
+ Coverage 83.13% 83.15% +0.02%
==========================================
Files 454 455 +1
Lines 77961 78035 +74
==========================================
+ Hits 64814 64892 +78
+ Misses 13147 13143 -4
Oh that's a good idea. And honestly, we could just limit it to certain formatting in that case, like
LGTM
ehhh, this wasn't too awful; maybe I could propose it in a follow-up PR (I can probably clean up some logic here).
diff --git a/src/bindings/python/flux/util.py b/src/bindings/python/flux/util.py
index f753f5e..1c2f82e 100644
--- a/src/bindings/python/flux/util.py
+++ b/src/bindings/python/flux/util.py
@@ -29,6 +29,7 @@ from string import Formatter
from typing import Mapping
import yaml
+import unicodedata
# tomllib added to standard library in Python 3.11
# flux-core minimum is Python 3.6.
@@ -443,6 +444,28 @@ class UtilFormatter(Formatter):
basecases = ("", "0s", "0.0", "0:00:00", "1970-01-01T00:00:00", localepoch)
value = "-" if str(value) in basecases else str(value)
spec = spec[:-1] + "s"
+
+ if spec.endswith("W"):
+ if isinstance (value, str):
+ match = re.search(r"^([<>])(\d+)W", spec)
+ if match:
+ align = match[1]
+ width = int(match[2])
+ widecount = 0
+ for chr in value:
+ if unicodedata.east_asian_width(chr) == 'W':
+ widecount += 1
+
+ if width > widecount:
+ width -= widecount;
+ spec = f"{align}{width}s"
+ else:
+ spec = ""
+ else:
+ spec = spec[:-1]
+ else:
+ spec = spec[:-1]
+
retval = super().format_field(value, spec)
if denote_truncation and len(retval) < len(str(value)):
diff --git a/src/cmd/flux-jobs.py b/src/cmd/flux-jobs.py
index 5211fb6..a4a14bd 100755
--- a/src/cmd/flux-jobs.py
+++ b/src/cmd/flux-jobs.py
@@ -39,7 +39,7 @@ class FluxJobsConfig(UtilConfig):
"cute": {
"description": "Cute flux-jobs format string (default with emojis)",
"format": (
- "{id.f58:>12} ?:{queue:<8.8} {username:<8.8} {name:<10.10+} "
+ "{id.emoji:>12W} ?:{queue:<8.8} {username:<8.8} {name:<10.10+} "
"{status_emoji:>5.5} {ntasks:>6} {nnodes:>6h} "
"{contextual_time!F:>8h} {contextual_info}"
),
I did not add
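A standalone sketch of the same idea as the `W` spec suffix in the diff above, written as a small `string.Formatter` subclass so it can be tried in isolation. `WideFormatter` is a hypothetical name; unlike the diff it also accepts `^` alignment and counts East Asian Width 'F' (fullwidth) characters as wide, so it is not the code proposed in the PR.

```python
import re
import string
import unicodedata


class WideFormatter(string.Formatter):
    """Formatter accepting a trailing 'W' on a string spec, e.g. '{:>12W}'.

    The field is padded to the requested number of terminal columns,
    counting wide (East Asian Width 'W' or 'F') characters as two columns.
    """

    _wide_spec = re.compile(r"^([<>^])(\d+)W$")

    def format_field(self, value, format_spec):
        match = self._wide_spec.match(format_spec)
        if match and isinstance(value, str):
            align, width = match.group(1), int(match.group(2))
            wide = sum(
                1
                for ch in value
                if unicodedata.east_asian_width(ch) in ("W", "F")
            )
            # str.format pads by code point count, so every wide character
            # needs one less column of padding.
            width -= wide
            format_spec = f"{align}{width}s" if width > 0 else ""
        elif format_spec.endswith("W"):
            # Non-string value or unrecognized spec: drop the 'W' suffix.
            format_spec = format_spec[:-1]
        return super().format_field(value, format_spec)


fmt = WideFormatter()
print(fmt.format("{:>6W}|", "\U0001F606\U0001F606"))  # 2 spaces + 2 emoji
print(fmt.format("{:>6W}|", "ab"))  # 4 spaces + "ab"
```

Both lines occupy six terminal columns before the `|`, which is the alignment the `cute` format needs for the JOBID column.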
Ok, I guess I'll set MWP then 😆
This PR implements the emoji encoding for FLUIDs as proposed in the pending PR flux-framework/rfc#381.
It is a WIP pending acceptance of that RFC PR.
Instead of trying to implement a generic binary-to-emoji encoding, the "basemoji" implementation here simply converts `uint64_t` to base 576 using the same pattern as the F58 implementation. This allows high-order zero bits to be dropped, since the result is a "number" with emoji as the base-576 digits. A named `emoji` encoding is then added to `fluid.h`, with support for detection and parsing in `fluid_parse(3)`, along with an `id.emoji` property for the `JobID` class and `flux-jobs` output formats.
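The positional base-576 idea can be illustrated with a short Python sketch. The digit table here is a hypothetical placeholder (576 consecutive code points starting at U+1F300), not the curated table of standard emoji in basemoji.c, and most-significant-digit-first ordering is an assumption; `encode` and `decode` are illustrative names.

```python
BASE = 576
# Hypothetical placeholder digit table: 576 consecutive code points from
# U+1F300. The real basemoji table is a curated list and is NOT reproduced.
TABLE = [chr(0x1F300 + i) for i in range(BASE)]
INDEX = {ch: i for i, ch in enumerate(TABLE)}


def encode(n: int) -> str:
    """Encode a uint64_t as base-576 emoji digits, most significant first."""
    if not 0 <= n < 2**64:
        raise ValueError("value out of uint64_t range")
    digits = []
    while True:
        n, rem = divmod(n, BASE)
        digits.append(TABLE[rem])
        if n == 0:
            break
    # High-order zero digits are never emitted, so small ids stay short.
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Inverse of encode(); raises ValueError on bad input."""
    if not s:
        raise ValueError("empty string")
    n = 0
    for ch in s:
        if ch not in INDEX:
            raise ValueError(f"not a basemoji digit: {ch!r}")
        n = n * BASE + INDEX[ch]
    if n >= 2**64:
        raise ValueError("value overflows uint64_t")
    return n
```

With 4-byte emoji digits, the longest possible result is 7 emoji (28 bytes of UTF-8).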