Summary
Repeater node names containing multi-byte UTF-8 characters can be advertised with invalid UTF-8 when the advert also includes location data. The firmware currently accepts such names, but later truncates them by raw byte count while building advert appdata. This can cut an emoji/Unicode code point in the middle.
This makes names with emoji flags fail or appear corrupted in clients.
Example
I tried to use this repeater name:
Example RPT 🔋🇵🇱
UTF-8 byte length: 24 bytes.
When advertised with location enabled, the transmitted name payload becomes:
Example RPT 🔋🇵 plus an incomplete UTF-8 sequence for the second regional indicator.
Expected final bytes for 🇵🇱:
F0 9F 87 B5 F0 9F 87 B1
Observed advert tail:
F0 9F 87 B5 F0 9F 87
The last byte B1 is missing, so the advert contains invalid UTF-8.
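The byte arithmetic above can be checked off-device. The following is a minimal sketch, not firmware code: `endsWithIncompleteUtf8()` and `kName` are hypothetical names introduced here, and the helper only inspects the tail of a buffer rather than validating all of it.

```cpp
#include <cstddef>
#include <cstdint>

// "Example RPT 🔋🇵🇱" written as explicit UTF-8 bytes (24 bytes).
static const char kName[] =
    "Example RPT \xF0\x9F\x94\x8B\xF0\x9F\x87\xB5\xF0\x9F\x87\xB1";
static_assert(sizeof(kName) - 1 == 24, "name is 24 bytes long");

// Hypothetical helper (not in the firmware): returns true if the buffer
// ends in the middle of a multi-byte UTF-8 sequence. Tail check only; it
// does not detect other kinds of malformed input.
static bool endsWithIncompleteUtf8(const uint8_t* buf, size_t len) {
  // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
  size_t back = 0;
  while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80) back++;
  if (back == len) return len > 0;  // nothing but continuation bytes

  // The byte before them should be the lead byte of the last sequence.
  uint8_t lead = buf[len - 1 - back];
  size_t need = 1;
  if ((lead & 0xE0) == 0xC0) need = 2;
  else if ((lead & 0xF0) == 0xE0) need = 3;
  else if ((lead & 0xF8) == 0xF0) need = 4;
  return back + 1 < need;  // fewer bytes present than the lead byte promises
}
```

Cutting `kName` at 23 bytes leaves `F0 9F 87` at the end, which this check reports as incomplete; the full 24 bytes and a cut at 20 bytes (after 🇵) are both reported as complete.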
Steps to Reproduce
- Configure a repeater to include location in adverts, for example using the default/prefs location advert policy.
- Set the repeater name to:
Example RPT 🔋🇵🇱
- Send or wait for a repeater advert.
- Decode the advertised node name from appdata.
Expected Behavior
The firmware should do one of the following:
- accept and advertise the full valid name if it fits, or
- reject the name at configuration time with a clear error if it does not fit the active advert constraints, or
- truncate it only at a valid UTF-8 code point boundary.
It should never emit invalid UTF-8 in the advertised node name.
Actual Behavior
The name passes current firmware validation. Later, while building advert appdata, it is truncated by byte count. If the byte limit lands inside a multi-byte UTF-8 sequence, the resulting advertised name is invalid UTF-8.
Current Validation
There is basic validation in src/helpers/CommonCLI.cpp:
static bool isValidName(const char *n) {
  while (*n) {
    if (*n == '[' || *n == ']' || *n == '\\' || *n == ':' || *n == ',' || *n == '?' || *n == '*') return false;
    n++;
  }
  return true;
}
This validation only rejects a small blacklist of ASCII characters:
[ ] \ : , ? *
It does not currently check:
- whether the input is valid UTF-8
- whether the name fits in the persistent node_name[32] buffer without cutting UTF-8
- whether the name fits in advert appdata with the currently selected advert location policy
- whether truncation would happen at a valid UTF-8 boundary
The set name command then stores the name with StrHelper::strncpy():
StrHelper::strncpy(_prefs->node_name, &config[5], sizeof(_prefs->node_name));
That helper also copies raw bytes and can cut UTF-8 in the middle if the input exceeds the destination buffer.
Probable Cause
MAX_ADVERT_DATA_SIZE is 32 bytes.
When location is included, advert appdata uses:
- 1 byte for flags
- 4 bytes latitude
- 4 bytes longitude
That leaves only 23 bytes for the name, not 24.
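The budget can be written down as compile-time arithmetic. This is a sketch under the assumption that only the fields listed above precede the name; `kMaxAdvertDataSize`, `kFlagsBytes`, `kLatLonBytes` and `maxAdvertNameBytes()` are hypothetical names, and the value 32 is the one quoted in this issue, not read from the firmware headers.

```cpp
#include <cstddef>

// Value quoted in this issue for MAX_ADVERT_DATA_SIZE (assumption).
constexpr size_t kMaxAdvertDataSize = 32;
constexpr size_t kFlagsBytes = 1;       // flags byte
constexpr size_t kLatLonBytes = 4 + 4;  // latitude + longitude

// Hypothetical helper: bytes left for the name in advert appdata.
constexpr size_t maxAdvertNameBytes(bool has_location) {
  return kMaxAdvertDataSize - kFlagsBytes - (has_location ? kLatLonBytes : 0);
}

static_assert(maxAdvertNameBytes(true) == 23, "name budget with location");
static_assert(maxAdvertNameBytes(false) == 31, "name budget without location");
```

Keeping the limit in one helper like this would let both `set name` validation and the advert encoder agree on the same number.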
In src/helpers/AdvertDataHelpers.cpp, AdvertDataBuilder::encodeTo() copies the name byte by byte:
while (*sp && i < MAX_ADVERT_DATA_SIZE) {
  app_data[i++] = *sp++;
}
This does not respect UTF-8 character boundaries.
Documentation Note
The CLI docs currently say:
If a location is set, the max length is 24 bytes; 32 otherwise.
But with the current advert format, the effective advertised name limits seem to be:
- 23 bytes with location: 32 - 1 flags - 8 lat/lon
- 31 bytes without location: 32 - 1 flags
So this looks like an off-by-one documentation issue as well.
Suggested Fix
The most robust fix would be to extend the existing isValidName() validation instead of only fixing advert encoding.
Since set name is the point where the user input is accepted, it should probably validate:
- the existing forbidden ASCII characters
- valid UTF-8 structure
- persistent storage byte limit, currently sizeof(_prefs->node_name) - 1
- effective advert name byte limit for the current advert location policy
For example, with location in adverts enabled, the max advertised name length should be 23 bytes. Without location, it should be 31 bytes.
A helper could return both validity and the UTF-8-safe byte length, so set name can reject names that are too long before storing them.
The firmware should at least perform strict UTF-8 validation and UTF-8-safe truncation so it never emits invalid UTF-8. A more complete solution would be grapheme-cluster-aware truncation, but UTF-8 code point boundary truncation would already fix the invalid payload bug.
Pseudo-logic:
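#include <stddef.h>
#include <stdint.h>

// Returns the byte length of the longest prefix of s that consists of whole
// UTF-8 sequences and fits in max_bytes. Sets *valid to false if a malformed
// lead or continuation byte is found. Structural check only: overlong forms
// and surrogates are not rejected.
static size_t utf8ValidPrefixLen(const char* s, size_t max_bytes, bool* valid) {
  size_t i = 0;
  *valid = true;
  while (s[i]) {
    uint8_t c = (uint8_t)s[i];
    size_t n = 1;  // sequence length implied by the lead byte
    if ((c & 0x80) == 0x00) n = 1;
    else if ((c & 0xE0) == 0xC0) n = 2;
    else if ((c & 0xF0) == 0xE0) n = 3;
    else if ((c & 0xF8) == 0xF0) n = 4;
    else {
      *valid = false;  // stray continuation byte or invalid lead byte
      return i;
    }
    if (i + n > max_bytes) {
      return i; // valid prefix, but does not fit
    }
    // A NUL inside the sequence fails this check, so we never read past the end.
    for (size_t j = 1; j < n; j++) {
      if (((uint8_t)s[i + j] & 0xC0) != 0x80) {
        *valid = false;
        return i;
      }
    }
    i += n;
  }
  return i;
}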
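The pseudo-logic above checks only the lead/continuation byte pattern. If strict validation is wanted, overlong encodings, UTF-16 surrogates and code points above U+10FFFF also need to be rejected. A possible sketch, where `isStrictUtf8()` is a hypothetical name and the length is passed explicitly:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical strict validator: true if s[0..len) is well-formed UTF-8.
static bool isStrictUtf8(const char* s, size_t len) {
  size_t i = 0;
  while (i < len) {
    uint8_t c = (uint8_t)s[i];
    size_t n;           // sequence length implied by the lead byte
    uint32_t cp, min;   // decoded code point, smallest value allowed for n
    if (c < 0x80)                { n = 1; cp = c;        min = 0; }
    else if ((c & 0xE0) == 0xC0) { n = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { n = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { n = 4; cp = c & 0x07; min = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte

    if (n > len - i) return false;  // sequence runs past the end
    for (size_t j = 1; j < n; j++) {
      uint8_t cc = (uint8_t)s[i + j];
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp < min) return false;                      // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += n;
  }
  return true;
}
```

For a node name this would be called as `isStrictUtf8(name, strlen(name))`. The extra checks cost a few comparisons per code point, which should be negligible for a name of at most 31 bytes.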
Then isValidName() / set name could reject invalid or too-long names explicitly, for example:
if (!isValidNameChars(name)) {
  strcpy(reply, "Error, bad chars");
} else if (!isValidUtf8(name)) {
  strcpy(reply, "Error, bad UTF-8");
} else if (utf8ByteLen(name) > maxAdvertNameBytes()) {
  strcpy(reply, "Error, name too long");
} else {
  StrHelper::strncpy(_prefs->node_name, name, sizeof(_prefs->node_name));
  savePrefs();
  strcpy(reply, "OK");
}
Even if validation is added at set name, AdvertDataBuilder::encodeTo() should still avoid emitting invalid UTF-8 as a defensive measure. That protects against names already stored from older firmware versions and against future call sites that bypass CLI validation.
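One way the defensive truncation could look is sketched below. `utf8SafeTruncLen()` is a hypothetical helper, not an existing firmware function, and it assumes the stored name is itself well-formed UTF-8; a name already cut mid-sequence by older firmware would still need validation first.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: largest length <= max_bytes that does not split a
// UTF-8 sequence. Assumes s[0..len) is already well-formed UTF-8.
static size_t utf8SafeTruncLen(const char* s, size_t len, size_t max_bytes) {
  if (len <= max_bytes) return len;  // everything fits, nothing to cut
  size_t n = max_bytes;
  // If the first dropped byte is a continuation byte (10xxxxxx), the cut is
  // inside a sequence: back up to that sequence's lead byte and drop it too.
  while (n > 0 && ((uint8_t)s[n] & 0xC0) == 0x80) n--;
  return n;
}
```

For the 24-byte example name and a 23-byte budget this returns 20, so the advert would carry `Example RPT 🔋🇵`. That is valid UTF-8, although the flag is reduced to a single regional indicator; avoiding that would need the grapheme-cluster-aware truncation mentioned above.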
Workaround
Disable location in adverts:
gps advert none
or shorten the name so it fits in 23 bytes when location is included. For example:
Example RPT🔋🇵🇱
This is 23 bytes and should fit with location enabled.