support emoji encoding for Flux jobids #5174
Merged
Commits
11 commits
33d428d libutil: add basemoji, an emoji uint64_t encoding (grondo)
805bc6d libutil/test: add tests for basemoji implementation (grondo)
eb4dc88 libutil: fluid: support FLUID_STRING_EMOJI (grondo)
a3ab55f libutil/test: fix typos in fluid unit test (grondo)
2a01426 libutil/test: test FLUID_STRING_EMOJI (grondo)
3c28891 libjob: support emoji jobid encoding (grondo)
6e9499c libjob/test: test the emoji flux_jobid_t encoding (grondo)
3c8de96 python: add emoji property to JobID class (grondo)
be5a75c testsuite: test JobID emoji encoding (grondo)
2a45ed7 python: support id.emoji in output formats (grondo)
631e983 doc: add id.emoji to flux-jobs(1) (grondo)
@@ -0,0 +1,228 @@
/************************************************************\
 * Copyright 2023 Lawrence Livermore National Security, LLC
 * (c.f. AUTHORS, NOTICE.LLNS, COPYING)
 *
 * This file is part of the Flux resource manager framework.
 * For details, see https://github.com/flux-framework.
 *
 * SPDX-License-Identifier: LGPL-3.0
\************************************************************/

/* basemoji.c - an emoji encoding for unsigned 64 bit integers
 */

#if HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <stdbool.h>

#include "ccan/array_size/array_size.h"
#include "basemoji.h"

/* Minimum length of a b576 string is 1 emoji, or 4 bytes */
#define BASEMOJI_MINLEN 4

/* Maximum number of emoji "digits" in a basemoji string is
 *
 * ceil (ln (2^64-1)/ln (576)) = 7
 *
 * 4 bytes per emoji, so 4*7 = 28 bytes.
 */
#define BASEMOJI_MAXLEN 28

/* The following is a selection of 576 emoji in CLDR[1] collation order[2]
 * taken from the version 2010 Unicode emoji set[3]. Note: Selected code
 * points are all represented in 4 bytes, which is assumed in the
 * implementation in this module. Additionally, every character in this
 * selected set has a common first two bytes of F0 9F in UTF-8 encoding,
 * which aids in detection of a valid basemoji string.
 *
 * 1. https://cldr.unicode.org
 * 2. https://unicode.org/emoji/charts-12.1/emoji-ordering.txt
 * 3. https://unicode.org/emoji/charts/emoji-versions.html
 *
 */
const char *emojis[] = {
    "😃", "😄", "😁", "😆", "😅", "😂", "😉", "😊", "😍", "😘", "😚", "😋",
    "😜", "😝", "😏", "😒", "😌", "😔", "😪", "😷", "😵", "😲", "😳", "😨",
    "😰", "😥", "😢", "😭", "😱", "😖", "😣", "😞", "😓", "😩", "😫", "😤",
    "😡", "😠", "👿", "💀", "💩", "👹", "👺", "👻", "👽", "👾", "😺", "😸",
    "😹", "😻", "😼", "😽", "🙀", "😿", "😾", "🙈", "🙉", "🙊", "💌", "💘",
    "💝", "💖", "💗", "💓", "💞", "💕", "💟", "💔", "💛", "💚", "💙", "💜",
    "💋", "💯", "💢", "💥", "💫", "💦", "💨", "💬", "💤", "👋", "👌", "👈",
    "👉", "👆", "👇", "👍", "👎", "👊", "👏", "🙌", "👐", "🙏", "💅", "💪",
    "👂", "👃", "👀", "👅", "👄", "👶", "👦", "👧", "👱", "👨", "👩", "👴",
    "👵", "🙍", "🙎", "🙅", "🙆", "💁", "🙋", "🙇", "👮", "💂", "👷", "👸",
    "👳", "👲", "👰", "👼", "🎅", "💆", "💇", "🚶", "🏃", "💃", "👯", "🏂",
    "🏄", "🏊", "🛀", "👫", "💏", "💑", "👪", "👤", "👣", "🐵", "🐒", "🐶",
    "🐩", "🐺", "🐱", "🐯", "🐴", "🐎", "🐮", "🐷", "🐗", "🐽", "🐑", "🐫",
    "🐘", "🐭", "🐹", "🐰", "🐻", "🐨", "🐼", "🐾", "🐔", "🐣", "🐤", "🐥",
    "🐦", "🐧", "🐸", "🐢", "🐍", "🐲", "🐳", "🐬", "🐟", "🐠", "🐡", "🐙",
    "🐚", "🐌", "🐛", "🐜", "🐝", "🐞", "💐", "🌸", "💮", "🌹", "🌺", "🌻",
    "🌼", "🌷", "🌱", "🌴", "🌵", "🌾", "🌿", "🍀", "🍁", "🍂", "🍃", "🍄",
    "🍇", "🍈", "🍉", "🍊", "🍌", "🍍", "🍎", "🍏", "🍑", "🍒", "🍓", "🍅",
    "🍆", "🌽", "🌰", "🍞", "🍖", "🍗", "🍔", "🍟", "🍕", "🍳", "🍲", "🍱",
    "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🍡",
    "🍦", "🍧", "🍨", "🍩", "🍪", "🎂", "🍰", "🍫", "🍬", "🍭", "🍮", "🍯",
    "🍵", "🍶", "🍷", "🍸", "🍹", "🍺", "🍻", "🍴", "🔪", "🌏", "🗾", "🌋",
    "🗻", "🏠", "🏡", "🏢", "🏣", "🏥", "🏦", "🏨", "🏩", "🏪", "🏫", "🏬",
    "🏭", "🏯", "🏰", "💒", "🗼", "🗽", "🌁", "🌃", "🌄", "🌅", "🌆", "🌇",
    "🌉", "🎠", "🎡", "🎢", "💈", "🎪", "🚃", "🚄", "🚅", "🚇", "🚉", "🚌",
    "🚑", "🚒", "🚓", "🚕", "🚗", "🚙", "🚚", "🚲", "🚏", "🚨", "🚥", "🚧",
    "🚤", "🚢", "💺", "🚀", "🕛", "🕐", "🕑", "🕒", "🕓", "🕔", "🕕", "🕖",
    "🕗", "🕘", "🕙", "🕚", "🌑", "🌓", "🌔", "🌕", "🌙", "🌛", "🌟", "🌠",
    "🌌", "🌀", "🌈", "🌂", "🔥", "💧", "🌊", "🎃", "🎄", "🎆", "🎇", "🎈",
    "🎉", "🎊", "🎋", "🎍", "🎎", "🎏", "🎐", "🎑", "🎀", "🎁", "🎫", "🏆",
    "🏀", "🏈", "🎾", "🎳", "🎣", "🎽", "🎿", "🎯", "🔫", "🎱", "🔮", "🎮",
    "🎰", "🎲", "🃏", "🀄", "🎴", "🎭", "🎨", "👓", "👔", "👕", "👖", "👗",
    "👘", "👙", "👚", "👛", "👜", "👝", "🎒", "👞", "👟", "👠", "👡", "👢",
    "👑", "👒", "🎩", "🎓", "💄", "💍", "💎", "🔊", "📢", "📣", "🔔", "🎼",
    "🎵", "🎶", "🎤", "🎧", "📻", "🎷", "🎸", "🎹", "🎺", "🎻", "📱", "📲",
    "📞", "📟", "📠", "🔋", "🔌", "💻", "💽", "💾", "💿", "📀", "🎥", "🎬",
    "📺", "📷", "📹", "📼", "🔍", "🔎", "💡", "🔦", "🏮", "📔", "📕", "📖",
    "📗", "📘", "📙", "📚", "📓", "📒", "📃", "📜", "📄", "📰", "📑", "🔖",
    "💰", "💴", "💵", "💸", "💳", "💹", "📧", "📨", "📩", "📤", "📥", "📦",
    "📫", "📪", "📮", "📝", "💼", "📁", "📂", "📅", "📆", "📇", "📈", "📉",
    "📊", "📋", "📌", "📍", "📎", "📏", "📐", "🔒", "🔓", "🔏", "🔐", "🔑",
    "🔨", "💣", "🔧", "🔩", "🔗", "📡", "💉", "💊", "🚪", "🚽", "🚬", "🗿",
    "🏧", "🚹", "🚺", "🚻", "🚼", "🚾", "🚫", "🚭", "🔞", "🔃", "🔙", "🔚",
    "🔛", "🔜", "🔝", "🔯", "🔼", "🔽", "🎦", "📶", "📳", "📴", "💱", "💲",
    "🔱", "📛", "🔰", "🔟", "🔠", "🔡", "🔢", "🔣", "🔤", "🆎", "🆑", "🆒",
    "🆓", "🆔", "🆕", "🆖", "🆗", "🆘", "🆙", "🆚", "🈁", "🈶", "🈯", "🉐",
    "🈹", "🈚", "🈲", "🉑", "🈸", "🈴", "🈳", "🈺", "🈵", "🔴", "🔵", "🔶",
    "🔷", "🔸", "🔹", "🔺", "🔻", "💠", "🔘", "🔳", "🔲", "🏁", "🚩", "🎌",
};

bool is_basemoji_string (const char *s)
{
    int len = strlen (s);

    /* This code assumes length of emoji array is 576
     * Generate error at build time if this becomes untrue:
     */
    BUILD_ASSERT(ARRAY_SIZE(emojis) == 576);

    /* Check for expected length of a basemoji string, and if the
     * first two bytes match the expected UTF-8 encoding.
     * This doesn't guarantee that `s` is a valid basemoji string,
     * but this will catch most obvious cases and other invalid strings
     * are left to be detected in decode.
     */
    if (len >= BASEMOJI_MINLEN
        && len <= BASEMOJI_MAXLEN
        && len % 4 == 0
        && (uint8_t)s[0] == 0xf0
        && (uint8_t)s[1] == 0x9f)
        return true;
    return false;
}

/* Encode id into buf in reverse (i.e. the least significant digit is
 * encoded and placed first into 'buf' since we're doing progressive
 * division.)
 */
static int emoji_revenc (char *buf, int buflen, uint64_t id)
{
    int index = 0;
    memset (buf, 0, buflen);
    if (id == 0) {
        memcpy (buf, emojis[0], 4);
        return 4;
    }
    while (id > 0) {
        int rem = id % 576;
        memcpy (buf+index, emojis[rem], 4);
        index += 4;
        id = id / 576;
    }
    return index;
}

int uint64_basemoji_encode (uint64_t id, char *buf, int buflen)
{
    int count;
    int n;
    char reverse[BASEMOJI_MAXLEN+1];

    if (buf == NULL || buflen <= 0) {
        errno = EINVAL;
        return -1;
    }

    /* Encode bytes to emoji (in reverse), which also gives us a count
     * of the total bytes required for this encoding.
     */
    if ((count = emoji_revenc (reverse, sizeof (reverse), id)) < 0) {
        errno = EINVAL;
        return -1;
    }

    /* Check for overflow of provided buffer:
     * Need space for count bytes for emoji + NUL
     */
    if (count + 1 > buflen) {
        errno = EOVERFLOW;
        return -1;
    }

    memset (buf, 0, buflen);
    n = 0;

    /* Copy 4-byte emojis back in order so that most significant bits are
     * on the left:
     */
    for (int i = count - 4; i >= 0; i-=4) {
        memcpy (buf+n, reverse+i, 4);
        n+=4;
    }
    return 0;
}

static int basemoji_lookup (const char *c, int *result)
{
    for (int i = 0; i < 576; i++) {
        if (memcmp (c, emojis[i], 4) == 0) {
            *result = i;
            return 0;
        }
    }
    errno = EINVAL;
    return -1;
}

int uint64_basemoji_decode (const char *str, uint64_t *idp)
{
    uint64_t id = 0;
    uint64_t scale = 1;
    int len;

    if (str == NULL
        || idp == NULL
        || !is_basemoji_string (str)) {
        errno = EINVAL;
        return -1;
    }

    /* Move through basemoji string in reverse since least significant
     * bits are at the end. Since all emoji are 4 bytes, start at 4 from
     * the end to point to the final emoji.
     */
    len = strlen (str);
    for (int i = len - 4; i >= 0; i-=4) {
        int c;
        if (basemoji_lookup (str+i, &c) < 0) {
            errno = EINVAL;
            return -1;
        }
        id += c * scale;
        scale *= 576;
    }
    *idp = id;
    return 0;
}
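The encode and decode paths above can be sketched in miniature. What follows is a hedged, self-contained illustration, not the PR's implementation: it swaps in a hypothetical 4-symbol alphabet (`toy_digits`, the first four entries of the table) so the arithmetic is easy to follow, and all `toy_*` names are made up for this example. Unlike the decode loop in the diff, this sketch walks the string left to right and rejects values that would not fit in 64 bits; that check is an addition of the sketch, not something the PR does.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy alphabet: 4 symbols instead of the PR's 576.
 * Each symbol is 4 bytes in UTF-8, as basemoji.c assumes. */
static const char *toy_digits[] = { "😃", "😄", "😁", "😆" };
#define TOY_BASE   4
#define TOY_WIDTH  4                  /* bytes per symbol */
#define TOY_MAXLEN (32 * TOY_WIDTH)   /* 64 bits at 2 bits per digit */

/* Encode id with the most significant digit on the left.
 * Returns 0 on success, -1 with errno set on failure. */
static int toy_encode (uint64_t id, char *buf, int buflen)
{
    char reverse[TOY_MAXLEN];
    int count = 0;

    if (buf == NULL || buflen <= 0) {
        errno = EINVAL;
        return -1;
    }
    /* Progressive division yields the least significant digit first */
    do {
        memcpy (reverse + count, toy_digits[id % TOY_BASE], TOY_WIDTH);
        count += TOY_WIDTH;
        id /= TOY_BASE;
    } while (id > 0);

    /* Need room for count bytes of symbols plus the NUL */
    if (count + 1 > buflen) {
        errno = EOVERFLOW;
        return -1;
    }
    /* Copy symbols back in the opposite order, then terminate */
    for (int i = count - TOY_WIDTH, n = 0; i >= 0; i -= TOY_WIDTH) {
        memcpy (buf + n, reverse + i, TOY_WIDTH);
        n += TOY_WIDTH;
    }
    buf[count] = '\0';
    return 0;
}

/* Return the digit value of the symbol at c, or -1 if unknown */
static int toy_lookup (const char *c)
{
    for (int i = 0; i < TOY_BASE; i++) {
        if (memcmp (c, toy_digits[i], TOY_WIDTH) == 0)
            return i;
    }
    return -1;
}

/* Decode left to right, rejecting results larger than UINT64_MAX */
static int toy_decode (const char *str, uint64_t *idp)
{
    uint64_t id = 0;
    size_t len;

    if (str == NULL || idp == NULL) {
        errno = EINVAL;
        return -1;
    }
    len = strlen (str);
    if (len == 0 || len % TOY_WIDTH != 0) {
        errno = EINVAL;
        return -1;
    }
    for (size_t i = 0; i < len; i += TOY_WIDTH) {
        int d = toy_lookup (str + i);
        if (d < 0) {
            errno = EINVAL;
            return -1;
        }
        /* id * TOY_BASE + d must not exceed UINT64_MAX */
        if (id > (UINT64_MAX - (uint64_t)d) / TOY_BASE) {
            errno = EOVERFLOW;
            return -1;
        }
        id = id * TOY_BASE + (uint64_t)d;
    }
    *idp = id;
    return 0;
}
```

With this toy alphabet, 6 is `12` in base 4 and so encodes as the second symbol followed by the third; decoding that string gives 6 back.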
@@ -0,0 +1,46 @@
/************************************************************\
 * Copyright 2023 Lawrence Livermore National Security, LLC
 * (c.f. AUTHORS, NOTICE.LLNS, COPYING)
 *
 * This file is part of the Flux resource manager framework.
 * For details, see https://github.com/flux-framework.
 *
 * SPDX-License-Identifier: LGPL-3.0
\************************************************************/

#ifndef _UTIL_BASEMOJI_H
#define _UTIL_BASEMOJI_H

#include <stdint.h>
#include <stdbool.h>

/* basemoji - an implementation of the RFC 19 FLUID emoji encoding
 */

/* Convert a 64 bit unsigned integer to basemoji, placing the result
 * in buffer 'buf' of size 'buflen'.
 *
 * Returns 0 on success, -1 on failure with errno set:
 * EINVAL: Invalid arguments
 * EOVERFLOW: buffer too small for encoded string
 */
int uint64_basemoji_encode (uint64_t id, char *buf, int buflen);

/* Decode a string in basemoji to an unsigned 64 bit integer.
 *
 * Returns 0 on success, -1 on failure with errno set:
 * EINVAL: Invalid arguments
 */
int uint64_basemoji_decode (const char *str, uint64_t *idp);

/* Return true if 's' could be a basemoji string, i.e. it falls
 * within the minimum and maximum lengths, and starts with the
 * expected bytes.
 */
bool is_basemoji_string (const char *s);

#endif /* !_UTIL_BASEMOJI_H */

/*
 * vi:tabstop=4 shiftwidth=4 expandtab
 */
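Since `uint64_basemoji_encode` fails with EOVERFLOW when the buffer is too small, a caller may want to know how large a buffer is needed. The sketch below works that out from the two constants in basemoji.c (base 576, 4 bytes per symbol). The helper names `basemoji_digits` and `basemoji_bufsize` are hypothetical and are not part of the header; this is only an illustration of the arithmetic behind the 28-byte maximum.

```c
#include <stdint.h>

#define EMOJI_BASE  576   /* number of symbols in the basemoji.c table */
#define EMOJI_WIDTH 4     /* UTF-8 bytes per symbol in that table */

/* Hypothetical helper: number of base-576 digits needed to represent id.
 * Zero still takes one digit. */
static int basemoji_digits (uint64_t id)
{
    int n = 1;
    while (id >= EMOJI_BASE) {
        id /= EMOJI_BASE;
        n++;
    }
    return n;
}

/* Hypothetical helper: bytes a caller must provide for the encoded
 * string, including the terminating NUL. */
static int basemoji_bufsize (uint64_t id)
{
    return basemoji_digits (id) * EMOJI_WIDTH + 1;
}
```

For the largest 64-bit value this gives 7 digits, so a 29-byte buffer (28 bytes of emoji plus the NUL) is enough for any id.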
parens around (count + 1)
Ok, but operator precedence puts `+` before `>`, so what is the need here? Just preference?
FWIW I would not add parens there (precedence is clear).
ok no prob, I tend to add parens when there's more than "1 thingie" on one side of the gt or lt.
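For anyone following the precedence point, here is a small sketch of the two spellings discussed in this thread. In C, additive operators bind more tightly than relational ones, so both functions compute the same thing. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* As written in the PR: parsed as (count + 1) > buflen */
static int needs_more_space (int count, int buflen)
{
    return count + 1 > buflen;
}

/* With the suggested explicit parentheses */
static int needs_more_space_parens (int count, int buflen)
{
    return (count + 1) > buflen;
}

/* Compare both spellings over a small range of inputs */
static void check_equivalence (void)
{
    for (int count = 0; count <= 32; count++) {
        for (int buflen = 0; buflen <= 32; buflen++) {
            assert (needs_more_space (count, buflen)
                    == needs_more_space_parens (count, buflen));
        }
    }
}
```

So the parentheses are a readability choice rather than a correctness fix.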