Repeater advert truncates UTF-8 node names in the middle of a character when location is included (#2613)

## Summary

Repeater node names containing multi-byte UTF-8 characters can be advertised with invalid UTF-8 when the advert also includes location data. The firmware currently accepts such names, but later truncates them by raw byte count while building advert appdata. This can cut an emoji/Unicode code point in the middle.

This makes names with emoji flags fail or appear corrupted in clients.

## Example

I tried to use this repeater name:

`Example RPT 🔋🇵🇱`

UTF-8 byte length: 24 bytes.

When advertised with location enabled, the transmitted name payload becomes:

`Example RPT 🔋🇵` plus an incomplete UTF-8 sequence for the second regional indicator.

Expected final bytes for `🇵🇱`:

`F0 9F 87 B5 F0 9F 87 B1`

Observed advert tail:

`F0 9F 87 B5 F0 9F 87`

The last byte `B1` is missing, so the advert contains invalid UTF-8.
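
The byte counts above can be reproduced with a small host-side sketch. This is not firmware code: `truncateBytes` and `hexTail` are hypothetical helpers written for illustration, and the 23-byte limit is the value worked out under Probable Cause.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical helper: mimics a truncation by raw byte count.
static std::string truncateBytes(const std::string& name, size_t max_bytes) {
  return name.substr(0, max_bytes);
}

// Hypothetical helper: formats the last `count` bytes of `s` as uppercase hex.
static std::string hexTail(const std::string& s, size_t count) {
  static const char digits[] = "0123456789ABCDEF";
  size_t start = s.size() > count ? s.size() - count : 0;
  std::string out;
  for (size_t i = start; i < s.size(); i++) {
    uint8_t b = (uint8_t)s[i];
    if (!out.empty()) out += ' ';
    out += digits[b >> 4];
    out += digits[b & 0x0F];
  }
  return out;
}

// "Example RPT " (12 bytes) + battery emoji (4 bytes) + two regional
// indicator symbols (4 bytes each), written as explicit UTF-8 bytes.
static const std::string kName =
    "Example RPT \xF0\x9F\x94\x8B\xF0\x9F\x87\xB5\xF0\x9F\x87\xB1";
```

Here `kName.size()` is 24, and `hexTail(truncateBytes(kName, 23), 7)` gives `F0 9F 87 B5 F0 9F 87`, matching the observed advert tail.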

## Steps to Reproduce

1. Configure a repeater to include location in adverts, for example using the default/prefs location advert policy.
2. Set the repeater name to:

   `Example RPT 🔋🇵🇱`

3. Send or wait for a repeater advert.
4. Decode the advertised node name from appdata.

## Expected Behavior

The firmware should either:

- accept and advertise the full valid name if it fits, or
- reject the name at configuration time with a clear error if it does not fit the active advert constraints, or
- truncate it only at a valid UTF-8 code point boundary.

It should never emit invalid UTF-8 in the advertised node name.

## Actual Behavior

The name passes current firmware validation. Later, while building advert appdata, it is truncated by byte count. If the byte limit lands inside a multi-byte UTF-8 sequence, the resulting advertised name is invalid UTF-8.

## Current Validation

There is basic validation in `src/helpers/CommonCLI.cpp`:

```cpp
static bool isValidName(const char *n) {
  while (*n) {
    if (*n == '[' || *n == ']' || *n == '\\' || *n == ':' || *n == ',' || *n == '?' || *n == '*') return false;
    n++;
  }
  return true;
}
```

This validation only rejects a small blacklist of ASCII characters:

`[` `]` `\` `:` `,` `?` `*`

It does not currently check:

- whether the input is valid UTF-8
- whether the name fits in the persistent `node_name[32]` buffer without cutting UTF-8
- whether the name fits in advert appdata with the currently selected advert location policy
- whether truncation would happen at a valid UTF-8 boundary

The `set name` command then stores the name with `StrHelper::strncpy()`:

`StrHelper::strncpy(_prefs->node_name, &config[5], sizeof(_prefs->node_name));`

That helper also copies raw bytes and can cut UTF-8 in the middle if the input exceeds the destination buffer.

## Probable Cause

`MAX_ADVERT_DATA_SIZE` is 32 bytes.

When location is included, advert appdata uses:

- 1 byte for flags
- 4 bytes latitude
- 4 bytes longitude

That leaves only 23 bytes for the name, not 24.

In `src/helpers/AdvertDataHelpers.cpp`, `AdvertDataBuilder::encodeTo()` copies the name byte by byte:

```cpp
while (*sp && i < MAX_ADVERT_DATA_SIZE) {
  app_data[i++] = *sp++;
}
```

This does not respect UTF-8 character boundaries.

## Documentation Note

The CLI docs currently say:

> If a location is set, the max length is 24 bytes; 32 otherwise.

But with the current advert format, the effective advertised name limits seem to be:

- 23 bytes with location: 32 - 1 flags - 8 lat/lon
- 31 bytes without location: 32 - 1 flags

So this looks like an off-by-one documentation issue as well.
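
The arithmetic behind these two limits can be written down as a compile-time sketch. Only `MAX_ADVERT_DATA_SIZE` and the field sizes come from this report; the other constant names and the `maxAdvertNameBytes(bool)` signature are assumptions made for illustration.

```cpp
#include <stddef.h>

// Value quoted in this report.
constexpr size_t MAX_ADVERT_DATA_SIZE = 32;

// Hypothetical names for the fixed fields that precede the name.
constexpr size_t ADVERT_FLAGS_BYTES = 1;       // 1 byte for flags
constexpr size_t ADVERT_LATLON_BYTES = 4 + 4;  // latitude + longitude

// Bytes left for the name once the fixed fields are accounted for.
constexpr size_t maxAdvertNameBytes(bool has_location) {
  return MAX_ADVERT_DATA_SIZE - ADVERT_FLAGS_BYTES -
         (has_location ? ADVERT_LATLON_BYTES : 0);
}

static_assert(maxAdvertNameBytes(true) == 23, "23 bytes with location");
static_assert(maxAdvertNameBytes(false) == 31, "31 bytes without location");
```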

## Suggested Fix

The most robust fix would be to extend the existing `isValidName()` validation instead of only fixing advert encoding.

Since `set name` is the point where the user input is accepted, it should probably validate:

1. the existing forbidden ASCII characters
2. valid UTF-8 structure
3. persistent storage byte limit, currently `sizeof(_prefs->node_name) - 1`
4. effective advert name byte limit for the current advert location policy

For example, with location in adverts enabled, the max advertised name length should be 23 bytes. Without location, it should be 31 bytes.

A helper could return both validity and the UTF-8-safe byte length, so `set name` can reject names that are too long before storing them.

The firmware should at least perform strict UTF-8 validation and UTF-8-safe truncation so it never emits invalid UTF-8. A more complete solution would be grapheme-cluster-aware truncation, but UTF-8 code point boundary truncation would already fix the invalid payload bug.

Pseudo-logic:

```cpp
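#include <stddef.h>
#include <stdint.h>

// Returns the byte length of the longest prefix of `s` that is made of whole,
// structurally well-formed UTF-8 sequences and fits in `max_bytes`.
// *valid is set to false if a malformed sequence is found; it only describes
// the bytes that were scanned, not anything past a "does not fit" return.
// Note: this checks lead/continuation byte structure only. It does not reject
// overlong encodings, surrogates, or code points above U+10FFFF.
static size_t utf8ValidPrefixLen(const char* s, size_t max_bytes, bool* valid) {
  size_t i = 0;

  *valid = true;

  while (s[i]) {
    uint8_t c = (uint8_t)s[i];
    size_t n = 1;

    // Sequence length is determined by the lead byte.
    if ((c & 0x80) == 0x00) n = 1;
    else if ((c & 0xE0) == 0xC0) n = 2;
    else if ((c & 0xF0) == 0xE0) n = 3;
    else if ((c & 0xF8) == 0xF0) n = 4;
    else {
      *valid = false; // stray continuation byte or invalid lead byte
      return i;
    }

    if (i + n > max_bytes) {
      return i; // valid prefix, but does not fit
    }

    // Each following byte must be a continuation byte (10xxxxxx). A NUL
    // terminator fails this test, so the loop never reads past the end.
    for (size_t j = 1; j < n; j++) {
      if (((uint8_t)s[i + j] & 0xC0) != 0x80) {
        *valid = false;
        return i;
      }
    }

    i += n;
  }

  return i;
}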
```

Then `isValidName()` / `set name` could reject invalid or too-long names explicitly, for example:

```cpp
if (!isValidNameChars(name)) {
  strcpy(reply, "Error, bad chars");
} else if (!isValidUtf8(name)) {
  strcpy(reply, "Error, bad UTF-8");
} else if (utf8ByteLen(name) > maxAdvertNameBytes()) {
  strcpy(reply, "Error, name too long");
} else {
  StrHelper::strncpy(_prefs->node_name, name, sizeof(_prefs->node_name));
  savePrefs();
  strcpy(reply, "OK");
}
```
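
The four checks listed earlier could also be folded into one function that reports why a name was rejected. The following is a sketch under assumptions, not the firmware's API: `NameCheck`, `isStrictUtf8`, and `checkName` are hypothetical names, and the two limits are passed in by the caller. Unlike a purely structural check, `isStrictUtf8` also rejects overlong encodings, surrogates, and code points above U+10FFFF.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum class NameCheck { Ok, BadChars, BadUtf8, TooLong };

// Hypothetical strict validator: decodes each sequence and checks its range.
static bool isStrictUtf8(const char* s) {
  const uint8_t* p = (const uint8_t*)s;
  while (*p) {
    uint32_t cp, min;
    size_t n;
    if (*p < 0x80)                { cp = *p;        n = 1; min = 0; }
    else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; n = 2; min = 0x80; }
    else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; n = 3; min = 0x800; }
    else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; n = 4; min = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte
    for (size_t j = 1; j < n; j++) {
      if ((p[j] & 0xC0) != 0x80) return false;  // also stops at an early NUL
      cp = (cp << 6) | (p[j] & 0x3F);
    }
    // Reject overlong forms, out-of-range values, and UTF-16 surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    p += n;
  }
  return true;
}

// Hypothetical combined check. Both limits are byte counts that exclude the
// NUL terminator, e.g. sizeof(node_name) - 1 and the advert name limit.
static NameCheck checkName(const char* name, size_t max_store_bytes,
                           size_t max_advert_bytes) {
  // Same forbidden ASCII set as the existing isValidName().
  if (strpbrk(name, "[]\\:,?*") != nullptr) return NameCheck::BadChars;
  if (!isStrictUtf8(name)) return NameCheck::BadUtf8;
  size_t len = strlen(name);
  if (len > max_store_bytes || len > max_advert_bytes) return NameCheck::TooLong;
  return NameCheck::Ok;
}
```

With limits of 31 and 23 bytes, the 24-byte name from the Example section would come back as `NameCheck::TooLong`, and `set name` could map each result to its own reply string.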

Even if validation is added at `set name`, `AdvertDataBuilder::encodeTo()` should still avoid emitting invalid UTF-8 as a defensive measure. That protects against names already stored from older firmware versions and against future call sites that bypass CLI validation.
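
One way such a defensive truncation might look is to back the cut point up to the start of the sequence it would split. This is a minimal sketch, assuming the stored name is otherwise well-formed UTF-8; `utf8SafeLen` and `copyNameUtf8Safe` are hypothetical helpers, not existing firmware functions.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: largest length <= max_bytes that does not end inside
// a multi-byte sequence. Assumes s[0..len) is otherwise well-formed UTF-8.
static size_t utf8SafeLen(const char* s, size_t len, size_t max_bytes) {
  if (len <= max_bytes) return len;
  size_t cut = max_bytes;
  // s[cut] is the first byte that would be dropped. If it is a continuation
  // byte (10xxxxxx), the cut falls inside a sequence, so move back until the
  // first dropped byte is a lead byte or plain ASCII.
  while (cut > 0 && ((uint8_t)s[cut] & 0xC0) == 0x80) cut--;
  return cut;
}

// Hypothetical helper: copies as much of `name` as fits in `room` bytes
// without splitting a code point. Returns the number of bytes written and
// does not add a NUL terminator.
static size_t copyNameUtf8Safe(uint8_t* dest, size_t room, const char* name) {
  size_t n = utf8SafeLen(name, strlen(name), room);
  memcpy(dest, name, n);
  return n;
}

// The 24-byte name from the Example section, as explicit UTF-8 bytes.
static const char kExampleName[] =
    "Example RPT \xF0\x9F\x94\x8B\xF0\x9F\x87\xB5\xF0\x9F\x87\xB1";
```

With 23 bytes of room, the example name is cut to 20 bytes, that is `Example RPT 🔋🇵`. The result is valid UTF-8 but ends in a lone regional indicator, which is where grapheme-cluster-aware truncation would go one step further.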

## Workaround

Disable location in adverts:

`gps advert none`

or shorten the name so it fits in 23 bytes when location is included. For example:

`Example RPT🔋🇵🇱`

This is 23 bytes and should fit with location enabled.

