@pfcoperez pfcoperez commented Oct 28, 2025

Adds the TO_ASCII string function to ES|QL.

It escapes:

  • Unicode characters that have no direct ASCII equivalent.
  • Characters represented by Java escape sequences, such as \n and \t.
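
The two escaping rules listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the implementation from this PR: the class name `ToAsciiSketch`, the choice of `\uXXXX` for BMP characters and `\UXXXXXXXX` for supplementary ones, and the decision to escape the backslash itself are all hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of TO_ASCII-style escaping; not the code from this PR. */
final class ToAsciiSketch {

    /** Escapes everything outside printable ASCII; the escape syntax is an assumption. */
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Walk by code point so a surrogate pair (for example an emoji) is escaped as one unit.
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\n') {
                out.append("\\n");
            } else if (cp == '\t') {
                out.append("\\t");
            } else if (cp == '\r') {
                out.append("\\r");
            } else if (cp == '\\') {
                out.append("\\\\"); // assumption: escape the backslash to keep output unambiguous
            } else if (cp >= 0x20 && cp < 0x7f) {
                out.append((char) cp); // printable ASCII passes through unchanged
            } else if (cp <= 0xffff) {
                out.append(String.format(Locale.ROOT, "\\u%04x", cp)); // BMP: 4 hex digits
            } else {
                out.append(String.format(Locale.ROOT, "\\U%08x", cp)); // supplementary: 8 hex digits
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input is "Caf" + e-acute + newline + the globe emoji (as a surrogate pair).
        System.out.println(toAscii("Caf\u00e9\n\uD83C\uDF0D"));
    }
}
```

Under these assumptions the program prints `Caf\u00e9\n\U0001f30d` on a single line. Iterating by code point rather than by `char` is what lets an emoji come out as one eight-digit escape instead of two surrogate escapes.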

Closes: #137282

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 28, 2025
@pfcoperez pfcoperez requested a review from nik9000 October 28, 2025 23:23
@pfcoperez pfcoperez self-assigned this Oct 28, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Oct 28, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@github-actions
Contributor

github-actions bot commented Oct 28, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

@nik9000
Member

nik9000 commented Oct 29, 2025

I looked at the docs for PostgreSQL and a few others, and it looks like the function called ASCII returns an integer that is the ASCII code of the first character.

I'm fine having a function that escapes non-ASCII characters though. Is there a better name for it?
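
For contrast with the escaping function proposed in this PR, the behavior described above for a SQL-style ASCII function could be sketched in Java roughly as follows. The class and method names are hypothetical, and returning 0 for an empty string is an assumption of this sketch rather than something stated in the thread.

```java
/** Hypothetical sketch of a SQL-style ASCII function: the code of the first character. */
final class AsciiCodeSketch {

    /** Returns the code of the first character; 0 for an empty string (an assumption). */
    static int ascii(String s) {
        return s.isEmpty() ? 0 : s.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(ascii("Hello")); // prints 72, the code of 'H'
    }
}
```

The return type is the point of the comparison: this function yields an integer, whereas the function in this PR yields an escaped string, which is why sharing the name ASCII would be confusing.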

@pfcoperez
Author

> I looked at the docs for PostgreSQL and a few others, and it looks like the function called ASCII returns an integer that is the ASCII code of the first character.

Thank you @nik9000!

> I'm fine having a function that escapes non-ASCII characters though. Is there a better name for it?

ESCAPE sounds simple enough, but it implies that the target alphabet is ASCII, when you could just as well be escaping only quotes for JSON, for example.

Following the convention of other functions in the catalog, like TO_BASE64, what do you think about TO_ASCII?

formatStr = "\\\\U%08x";
}

resultStr = Strings.format(formatStr, code);
Member

I believe format needs Locale.ROOT as its first argument. Without it, the build complains.
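
A minimal sketch of the locale-explicit form, using the JDK's own `String.format(Locale, String, Object...)` rather than the `Strings.format` helper from the diff, whose signature is not visible in this thread. The class and method names are hypothetical, and the single-backslash output is an assumption of this sketch.

```java
import java.util.Locale;

/** Hypothetical sketch: formatting a code point with an explicit, locale-independent formatter. */
final class LocaleRootFormatSketch {

    /** Formats a code point as an 8-digit, zero-padded hex escape. */
    static String escapeCodePoint(int codePoint) {
        // Passing Locale.ROOT explicitly avoids depending on the JVM's default locale.
        return String.format(Locale.ROOT, "\\U%08x", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(escapeCodePoint(0x1F680)); // code point of the rocket emoji
    }
}
```

With this input the program prints `\U0001f680`.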

@nik9000
Member

nik9000 commented Oct 29, 2025

TO_ASCII feels fine to me.

@@ -2796,3 +2748,14 @@ book_no:keyword | author_encoded:keyword | title_encoded:keyword
1463 | J.%20R.%20R.%20Tolkien | Realms%20of%20Tolkien%3A%20Images%20of%20Middle-earth
;

ascii
required_capability: ascii
Member

Could you name the capability fn_to_ascii instead please?

required_capability: ascii
// tag::ascii[]
ROW a = "Hello\n\t 世界! 🌍 Café naïve résumé こんにちは 🎉 中文测试 αβγδε 日本語テスト 🚀🔥💧🪨" | EVAL x = ASCII(a) | KEEP x;
// end::ascii[]
Member

Because this is going to be displayed in the docs, it might be nicer to format it like:

ROW s = [
  "Hello world!\n\t",
  "世界!",
  "🌍",
  "Café naïve résumé",
  "こんにちは",
  " 🎉",
  "中文测试",
  "αβγδε",
  "日本語テスト",
  "🚀🔥💧🪨"
]
| EVAL s = TO_ASCII(s)

Otherwise it gets too wide to fit on the screen.

Member

It'd be nice to also have a csv-spec test for this that uses an index. Something like:

FROM airports
| WHERE abbrev = "BTS"
| EVAL ascii_name = TO_ASCII(name)
| KEEP name, ascii_name

Maybe even:

FROM airports
| WHERE TO_ASCII(abbrev) LIKE "%Mu\u(whatever ñ is)oz%"
| KEEP name

@pfcoperez pfcoperez changed the title from "ESQL: ASCII function" to "ESQL: TO_ASCII function" Oct 30, 2025
@ParametersFactory
public static Iterable<Object[]> parameters() {

List<TestCaseSupplier> cases = new ArrayList<>();
Contributor

The common logic in these 11 test cases could be delegated to a private method that only needs to be passed the input, the output, and the test title. That would be nicer to read.

Also, there is a way to add random tests for string functions. We should probably add those here, as they test for more than just input/output. An example can be found in AbstractUrlEncodeDecodeTestCase.java#L78.
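
The refactoring suggested in the first paragraph could take roughly the following shape. This sketch deliberately avoids the Elasticsearch test framework: `Case` and `addCase` are hypothetical stand-ins for the real test case supplier, and the expected values are illustrative rather than taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one helper that builds each test case from a title, input, and output. */
final class ToAsciiTestCasesSketch {

    /** Stand-in for a framework test case; not an Elasticsearch type. */
    static final class Case {
        final String title;
        final String input;
        final String expected;

        Case(String title, String input, String expected) {
            this.title = title;
            this.input = input;
            this.expected = expected;
        }
    }

    /** The shared logic lives here, so each case is declared on a single line. */
    private static void addCase(List<Case> cases, String title, String input, String expected) {
        cases.add(new Case(title, input, expected));
    }

    static List<Case> cases() {
        List<Case> cases = new ArrayList<>();
        addCase(cases, "plain ASCII is unchanged", "Hello", "Hello");
        addCase(cases, "newline is escaped", "a\nb", "a\\nb"); // illustrative expectation
        return cases;
    }

    public static void main(String[] args) {
        for (Case c : cases()) {
            System.out.println(c.title);
        }
    }
}
```

Adding a twelfth case then costs one line, and any change to how cases are constructed happens in one place.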

@FunctionInfo(
returnType = { "keyword" },
description = "Escape non ASCII characters.",
examples = @Example(file = "string", tag = "ascii"),
Contributor

Is 9.2.0 still the correct version?

phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 6, 2025
BASE=0b984b4cf53145085c87445d39a3a5c7bc37dde5
HEAD=be5beefa2a699bff708a14b10c2a58da9bc0ebb5
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Nov 7, 2025
BASE=0b984b4cf53145085c87445d39a3a5c7bc37dde5
HEAD=be5beefa2a699bff708a14b10c2a58da9bc0ebb5
Branch=main
Labels

:Analytics/ES|QL AKA ESQL >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Development

Successfully merging this pull request may close these issues.

ESQL: String functions ASCII

4 participants