-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: TO_ASCII function #137299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: TO_ASCII function #137299
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
|
I looked at the docs for Postgresql and a few others and it looks like the function called ASCII returns an integer that is the ascii code for the first character. I'm fine having a function that escapes non-ascii characters though. Is there a better name for it? |
Thank you @nik9000 !
Following the convention of other functions in the catalog, like |
...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/Ascii.java
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/Ascii.java
Outdated
Show resolved
Hide resolved
...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/Ascii.java
Outdated
Show resolved
Hide resolved
| formatStr = "\\\\U%08x"; | ||
| } | ||
|
|
||
| resultStr = Strings.format(formatStr, code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe format needs Locale.ROOT on the front of it. Without it the build complains.
|
|
| @@ -2796,3 +2748,14 @@ book_no:keyword | author_encoded:keyword | title_encoded:keyword | |||
| 1463 | J.%20R.%20R.%20Tolkien | Realms%20of%20Tolkien%3A%20Images%20of%20Middle-earth | |||
| ; | |||
|
|
|||
| ascii | |||
| required_capability: ascii | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you name the capability fn_to_ascii instead please?
| required_capability: ascii | ||
| // tag::ascii[] | ||
| ROW a = "Hello\n\t 世界! 🌍 Café naïve résumé こんにちは 🎉 中文测试 αβγδε 日本語テスト 🚀🔥💧🪨" | EVAL x = ASCII(a) | KEEP x; | ||
| // end::ascii[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is going to be displayed in the docs it might be nicer to format it like:
ROW s = [
"Hello world!\n\t",
"世界!",
"🌍",
"Café naïve résumé",
"こんにちは",
" 🎉",
"中文测试",
"αβγδε",
"日本語テスト"
"🚀🔥💧🪨"
]
| EVAL s = TO_ASCII(s)
Otherwise it gets really wide to fit into the screen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to also have a csv-spec test for this that uses an index. Something like:
FROM airports
| WHERE abbrev = "BTS"
| EVAL ascii_name = TO_ASCII(name)
| KEEP name, ascii_name
Maybe even:
FROM airports
| WHERE TO_ASCII(abbrev) LIKE "%Mu\u(whatever ñ is)oz%"
| KEEP name
| @ParametersFactory | ||
| public static Iterable<Object[]> parameters() { | ||
|
|
||
| List<TestCaseSupplier> cases = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common logic in these 11 test cases could be delegated to a private method, that only needs to be passed the input, the output and the test title. Would be nicer to read.
Also there is a way to add random tests for string functions, we should probably add those here as they test for more than just input/output. An example can be found in AbstractUrlEncodeDecodeTestCase.java#L78.
| @FunctionInfo( | ||
| returnType = { "keyword" }, | ||
| description = "Escape non ASCII characters.", | ||
| examples = @Example(file = "string", tag = "ascii"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 9.2.0 still the correct version?
…ultivalued inputs
BASE=0b984b4cf53145085c87445d39a3a5c7bc37dde5 HEAD=be5beefa2a699bff708a14b10c2a58da9bc0ebb5 Branch=main
BASE=0b984b4cf53145085c87445d39a3a5c7bc37dde5 HEAD=be5beefa2a699bff708a14b10c2a58da9bc0ebb5 Branch=main
Adds
TO_ASCIIstring function to ESQL functions.It escapes:
\n,\t...Closes: #137282