feat(search): support searchNormalize across non-Latin characters and intra-word punctuation #466
Replace the previous NON_WORD_REGEX with a COMBINING_MARKS_REGEX (\u0300-\u036f) so normalizeString only removes Unicode combining diacritical marks after NFD normalization. This preserves valid characters (letters, digits, punctuation) instead of stripping all non-word characters.
Replace the normalization regex to strip Unicode combining marks (\u0300-\u036f) so searchNormalize correctly handles Greek and Cyrillic diacritics (e.g. Ένα, ё, й). Update the minified build accordingly. Add example initializations for Greek and Cyrillic selects in docs/assets/script.js and add Cypress E2E tests (cypress/e2e/examples.cy.ts) that verify search behavior with searchNormalize true/false and a regression check for Latin diacritics. Also update docs/examples.md to reflect the new examples.
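The effect of this commit can be illustrated with a minimal sketch. The function name `stripDiacritics` is hypothetical and only stands in for the library's `normalizeString`; the regex range is the one quoted in the commit message.

```javascript
// Range quoted in the commit message: the Combining Diacritical Marks block.
const COMBINING_MARKS_REGEX = /[\u0300-\u036f]/g;

// Hypothetical stand-in for normalizeString (not the library's actual code).
function stripDiacritics(text) {
  // NFD splits a precomposed character into base letter + combining mark,
  // then the regex removes only the marks.
  return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
}

console.log(stripDiacritics('Ένα'));          // Ενα
console.log(stripDiacritics('Ёжик'));         // Ежик
console.log(stripDiacritics('Crème brûlée')); // Creme brulee
```

Base letters, spaces, and punctuation pass through untouched, which is the difference from the earlier strip-all-non-word approach.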
Regenerate distribution artifacts: update dist/virtual-select.js, dist/virtual-select.min.js, and dist-archive/virtual-select-1.1.5.min.js. This updates the built/minified output to include recent changes from the source (no source code logic changes in this commit).
Bump multiple dev dependencies (Babel toolchain, babel-loader, css-loader, autoprefixer, cypress, cypress-real-events, sass, sass-loader, stylelint, webpack, webpack-cli, filemanager-webpack-plugin, postcss-loader, ts-api-utils/TypeScript, etc.). package-lock.json regenerated to lock the updated versions.
Pull request overview
This PR updates the search normalization logic to support Greek and Cyrillic text when searchNormalize: true, and adds docs/examples + Cypress coverage to validate the behavior.
Changes:
- Update normalizeString() to strip Unicode combining marks after NFD normalization (instead of stripping non-ASCII “non-word” chars).
- Add documentation examples for Greek/Cyrillic normalization and wire them into the docs demo script.
- Add Cypress E2E coverage for Greek/Cyrillic normalization and Latin-diacritics regression checks.
Reviewed changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utils/utils.js | Updates the normalization regex used by searchNormalize. |
| package.json | Updates devDependencies (build/test tooling) and adds ts-api-utils. |
| docs/examples.md | Adds a new Greek/Cyrillic “searchNormalize” example section. |
| docs/assets/virtual-select.js | Updates built docs asset to reflect new normalization logic. |
| docs/assets/script.js | Initializes new Greek/Cyrillic example selects in the docs demo page. |
| dist/virtual-select.js | Updates distributed (unminified) build with new normalization logic. |
| dist/virtual-select.min.js | Updates distributed minified build with new normalization logic. |
| cypress/e2e/examples.cy.ts | Adds E2E tests for Greek/Cyrillic searchNormalize and Latin regression. |
| .github/PULL_REQUEST_TEMPLATE.md | Adds a PR template for future contributions. |
| .claude/settings.local.json | Adds Claude tooling permissions config. |
Add two examples to docs/examples.md demonstrating VirtualSelect configured with searchNormalize: false for Greek and Cyrillic option sets. These examples show search enabled with option descriptions while preserving original character forms, complementing the existing normalized-search examples.
Replace the explicit range /[\u0300-\u036f]/g with the Unicode property escape /\p{M}/gu to strip combining marks. This broadens matching to all Unicode combining marks (not just U+0300–U+036F) while preserving NFD normalization. Note: requires RegExp Unicode property escape support (ES2018+).
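A short sketch of why the property escape is broader than the explicit range. The sample string is an illustrative choice, not taken from the test suite: Arabic beh followed by fatha (U+064E), a combining mark that lies outside U+0300–U+036F.

```javascript
const RANGE_REGEX = /[\u0300-\u036f]/g; // earlier revision: one Unicode block only
const PROPERTY_REGEX = /\p{M}/gu;       // this revision: every Mark category character

// Illustrative sample: beh (U+0628) + fatha (U+064E).
const sample = '\u0628\u064E';

const viaRange = sample.normalize('NFD').replace(RANGE_REGEX, '');
const viaProperty = sample.normalize('NFD').replace(PROPERTY_REGEX, '');

console.log(viaRange.length);    // 2, the fatha survives
console.log(viaProperty.length); // 1, only the base letter remains
```

The `u` flag is required for `\p{...}`; without it the pattern does not have the intended meaning.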
Regenerate built/minified bundles for Virtual Select. Updated dist/virtual-select.js, dist/virtual-select.min.js, dist-archive/virtual-select-1.1.5.min.js and the corresponding docs/assets copies so the committed distribution and documentation assets are in sync with the latest build.
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 2 comments.
Move COMBINING_MARKS_REGEX out of Utils.normalizeString and declare it at the top of src/utils/utils.js so the regex isn't recreated on each call. No functional change; normalizeString now uses the shared, precompiled constant for better clarity and minor performance improvement.
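The hoisting described here might look roughly like the sketch below. It is a simplified stand-in for `src/utils/utils.js`, not a copy of it. Sharing a regex with the `g` flag is safe for `.replace()`, which resets `lastIndex` before it starts; the second half shows that the same is not true for `.test()`.

```javascript
// Compiled once when the module loads, then reused by every call.
const COMBINING_MARKS_REGEX = /\p{M}/gu;

// Simplified stand-in for Utils.normalizeString.
function normalizeString(text) {
  // String.prototype.replace resets lastIndex for global regexes,
  // so repeated calls with the shared constant behave identically.
  return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
}

const first = normalizeString('Göteborg');
const second = normalizeString('Göteborg');
console.log(first, second); // Goteborg Goteborg

// By contrast, .test() on a shared global regex is stateful.
const sharedGlobal = /o/g;
const firstTest = sharedGlobal.test('o');  // true
const secondTest = sharedGlobal.test('o'); // false, lastIndex moved past the match
console.log(firstTest, secondTest);
```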
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 5 comments.
Replace focused Greek/Cyrillic test suite with a broader multi-language search normalize suite in cypress/e2e/examples.cy.ts (renamed IDs/sections and added many language cases and negative tests). Rebuild/minify output and documentation assets were updated accordingly: dist/, dist-archive/ and docs/assets/* and docs/examples.md reflect the changes. These updates expand coverage for search normalization behavior and sync compiled artifacts and docs with the new test/content changes.
Add multi-language search demos and end-to-end tests to cover search normalization behavior across different scripts. Changes include:
- cypress/e2e/examples.cy.ts: Add Cypress tests for multi-language variants (tags and popup) with searchNormalize true/false, validating matches for diacritics, Cyrillic, and CJK inputs and tag behavior.
- docs/assets/script.js: Initialize new VirtualSelect instances for the added demo elements (#multi-language-tags-search-select, #multi-language-tags-search-no-normalize-select, #multi-language-popup-search-select, #multi-language-popup-search-no-normalize-select).
- docs/examples.md: Add documentation and example initialization snippets for the new multi-language tags and popup demos, and move/update the note about Thai and Japanese combining marks.
These additions ensure consistent behavior is demonstrated and tested for diacritic-insensitive vs exact matching across multiple scripts.
Modify src/utils/utils.js (utility functions updated) and regenerate distribution and documentation bundles. Updated files include dist/virtual-select.js, dist/virtual-select.min.js, dist-archive/virtual-select-1.1.5.min.js, docs/assets/virtual-select.js, and docs/assets/virtual-select.min.js to incorporate the utils changes and produce updated minified/non-minified artifacts.
Pull request overview
Copilot reviewed 8 out of 13 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 8 out of 13 changed files in this pull request and generated 3 comments.
- Add NON_WORD_CHARS_REGEX (/[^\p{L}\p{N}_]/gu) at module scope and chain
it after combining-mark stripping in Utils.normalizeString. Restores the
punctuation-/whitespace-insensitive matching the original /[^\w]/g
provided for ASCII (e.g. "co-op" matches "coop") in a Unicode-aware way
while keeping the non-Latin script fix.
- Expand JSDoc to document the full contract, including the limitation
that scripts relying on combining marks (Thai vowel signs, Devanagari
matras, hiragana/katakana voicing) are not fully preserved.
- Add Cypress regression specs covering co-op/coop and e-mail/email
matching under searchNormalize: true, plus a negative spec under
searchNormalize: false to lock in the symmetric behavior.
- Add co-op and e-mail entries to the multi-language demo dataset so the
behavior is exercised in the live docs as well.
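A sketch of the chained two-pass strip and of the documented limitation. The function name `normalizeTwoPass` is hypothetical; the two regexes are the ones named in the commit message. The Thai and katakana samples are illustrative choices.

```javascript
const COMBINING_MARKS_REGEX = /\p{M}/gu;
const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu;

// Hypothetical stand-in for Utils.normalizeString at this commit.
function normalizeTwoPass(text) {
  return text
    .normalize('NFD')
    .replace(COMBINING_MARKS_REGEX, '')  // pass 1: diacritics
    .replace(NON_WORD_CHARS_REGEX, '');  // pass 2: punctuation and whitespace
}

console.log(normalizeTwoPass('co-op'));   // coop
console.log(normalizeTwoPass('e-mail'));  // email
console.log(normalizeTwoPass('Foo Bar')); // FooBar

// Limitation: marks that carry meaning are stripped too.
// Thai ko kai + sara i (U+0E34, a combining vowel sign) + no nu.
console.log(normalizeTwoPass('\u0E01\u0E34\u0E19').length); // 2, the vowel sign is gone
// Katakana GA decomposes to KA + voicing mark (U+3099), which is then removed.
console.log(normalizeTwoPass('\u30AC') === '\u30AB'); // true
```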
Pull request overview
Copilot reviewed 8 out of 13 changed files in this pull request and generated 4 comments.
"@babel/core": "^7.29.0",
"@babel/preset-env": "^7.29.2",
"autoprefixer": "^10.5.0",
"babel-loader": "^10.1.1",
"css-loader": "^7.1.4",
"cypress": "^15.14.1",
"cypress-real-events": "^1.15.0",
"docsify-cli": "^4.4.4",
"eslint": "^8.57.1",
"eslint-config-airbnb-base": "^15.0.0",
"eslint-import-resolver-webpack": "^0.13.10",
"eslint-plugin-import": "^2.32.0",
"eslint-plugin-sonarjs": "^3.0.4",
"filemanager-webpack-plugin": "^9.0.1",
"filemanager-webpack-plugin": "^10.0.1",
"mini-css-extract-plugin": "^2.9.4",
"popover-plugin": "^1.0.13",
"postcss-loader": "^8.1.1",
"sass": "^1.90.0",
"sass-loader": "^16.0.5",
"stylelint": "^16.23.1",
"postcss-loader": "^8.2.1",
"sass": "^1.99.0",
"sass-loader": "^16.0.7",
"stylelint": "^16.26.1",
"stylelint-config-sass-guidelines": "^12.1.0",
"typescript": "^5.9.2",
"ts-api-utils": "^2.5.0",
"typescript": "^5.9.3",
"unminified-webpack-plugin": "^3.0.0",
"webpack": "^5.101.3",
"webpack-cli": "^6.0.1"
"webpack": "^5.106.2",
"webpack-cli": "^7.0.2"
Restores docs/assets/external/vue.css to the master version. The single-line blockquote font-weight delta (600 -> 400) was unrelated to the search-normalization work and was flagged in PR review.
The Multi-language search normalize section in docs/examples.md only described diacritic stripping. Updated to call out that punctuation and whitespace are also stripped under searchNormalize: true, with explicit examples (co-op -> coop, Foo Bar -> FooBar) and a note that users who need exact word-boundary or punctuation matching should keep searchNormalize: false.
NON_WORD_CHARS_REGEX (/[^\p{L}\p{N}_]/gu) already removes everything
that COMBINING_MARKS_REGEX (/\p{M}/gu) matched, since combining marks
are not letters, numbers, or underscore. Collapses the two .replace()
passes into one and updates the JSDoc to describe the actual behavior.
No functional change to search results.
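The equivalence claimed in this commit can be checked with a small sketch. Unicode general categories are disjoint, so a character in category M is never in L or N and is therefore already matched by `/[^\p{L}\p{N}_]/gu`. The function names and the sample list are illustrative.

```javascript
const COMBINING_MARKS_REGEX = /\p{M}/gu;
const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu;

// Previous behavior: two chained passes.
function twoPass(text) {
  return text
    .normalize('NFD')
    .replace(COMBINING_MARKS_REGEX, '')
    .replace(NON_WORD_CHARS_REGEX, '');
}

// Collapsed behavior: a single pass.
function onePass(text) {
  return text.normalize('NFD').replace(NON_WORD_CHARS_REGEX, '');
}

// Illustrative samples; the last is Arabic beh + fatha (U+064E).
const samples = ['Crème brûlée', 'Việt Nam', 'Ёжик', 'co-op', 'Mars-2024', '\u0628\u064E'];
const allEqual = samples.every((s) => twoPass(s) === onePass(s));
console.log(allEqual); // true
```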
Without an explicit target, @babel/preset-env transpiled \p{L}, \p{N},
and \p{M} regex property escapes into multi-kilobyte expanded codepoint
ranges, bloating the production bundle. Adding a browserslist that
excludes IE11 and dead browsers tells preset-env to keep these
constructs native, since every targeted browser supports them.
The lockfile is committed (and must be, for npm ci and reproducible installs), so keeping it listed in .gitignore was contradictory and hid lockfile drift from git status. Removing the entry; the file remains tracked.
Picks up the simplified normalizeString and the new browserslist
target. Net effect on the production bundle:
- dist/virtual-select.min.js: 112 KB -> 82 KB (-26%)
- dist/virtual-select.js: 193 KB -> 142 KB (-26%)
Both reductions come from preset-env keeping \p{L}/\p{N} regex
property escapes native instead of expanding them to long
codepoint ranges.
Adds Cypress coverage for behaviors that were documented but not
asserted:
- Whitespace folding: "FooBar" matches "Foo Bar"; "VietNam" matches
"Việt Nam".
- Symmetric punctuation: search containing punctuation matches a label
without it ("walk-through" finds "walkthrough").
- Numbers preserved (\p{N}): "Mars2024" and "Mars 2024" both find
"Mars-2024".
- Leading/trailing whitespace in the search input still resolves to
the right option (" creme " finds "Crème brûlée").
- Pure-punctuation search ("!@#") normalizes to "" and currently
matches every label; documenting this so any future fix is
intentional.
Adds the necessary test data entries to multiLanguageOptions in
docs/assets/script.js.
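The cases listed above can be reproduced outside Cypress with a rough sketch. The `matches` helper is hypothetical: it assumes the search is a case-insensitive substring test on normalized strings, which may differ from how the library compares labels internally.

```javascript
const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu;

// Single-pass normalization as described in the commit above.
// Lowercasing is an assumption made for this sketch.
function normalize(text) {
  return text.normalize('NFD').replace(NON_WORD_CHARS_REGEX, '').toLowerCase();
}

// Hypothetical matcher: substring test on normalized label and query.
function matches(label, query) {
  return normalize(label).includes(normalize(query));
}

console.log(matches('Foo Bar', 'FooBar'));           // true, whitespace folding
console.log(matches('Việt Nam', 'VietNam'));         // true
console.log(matches('walkthrough', 'walk-through')); // true, symmetric punctuation
console.log(matches('Mars-2024', 'Mars 2024'));      // true, digits preserved
console.log(matches('Crème brûlée', ' creme '));     // true, outer whitespace ignored
console.log(normalize('!@#') === '');                // true, pure punctuation vanishes
console.log(matches('Ёжик', '!@#'));                 // true, empty query matches any label
```

The last two lines show why a pure-punctuation query matches every label: every string contains the empty string.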
"not ie 11",
"not op_mini all",
"not dead"
],
Issue number: resolves #279
What is the current behavior?
- The `normalizeString` utility used the regex `/[^\w]/g` to strip non-word characters after NFD decomposition.
- The `\w` character class in JavaScript only matches `[a-zA-Z0-9_]`, so every non-Latin script (Greek, Cyrillic, Vietnamese, Chinese, Japanese, Korean, Arabic, Thai, …) was treated as non-word and stripped entirely during normalization.
- This made `searchNormalize: true` completely broken for non-Latin scripts: labels became empty strings, so nothing could be matched.
What is the new behavior?
`normalizeString` now performs a Unicode-aware two-pass strip after NFD decomposition:
- `/\p{M}/gu`: strips Unicode combining marks (category M). Removes diacritics across all scripts while preserving the underlying letters/ideographs.
- `/[^\p{L}\p{N}_]/gu`: strips characters that are not Letters, Numbers, or underscore. Restores the punctuation- and whitespace-insensitive behavior the original `/[^\w]/g` provided for ASCII content (e.g. `co-op` still matches `coop`), in a script-aware way.
Both regexes are defined at module scope so they are compiled once instead of per call.
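The contrast between the old and new behavior can be sketched as follows. Both function names are hypothetical; `legacyNormalize` uses the regex quoted under the current behavior and `unicodeNormalize` uses the two regexes quoted under the new behavior.

```javascript
// Old behavior: \w only covers [a-zA-Z0-9_], so non-Latin letters are removed.
function legacyNormalize(text) {
  return text.normalize('NFD').replace(/[^\w]/g, '');
}

const COMBINING_MARKS_REGEX = /\p{M}/gu;
const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu;

// New behavior: Unicode-aware two-pass strip.
function unicodeNormalize(text) {
  return text
    .normalize('NFD')
    .replace(COMBINING_MARKS_REGEX, '')
    .replace(NON_WORD_CHARS_REGEX, '');
}

console.log(JSON.stringify(legacyNormalize('Ένα'))); // ""
console.log(JSON.stringify(legacyNormalize('北京'))); // ""
console.log(unicodeNormalize('Ένα'));                 // Ενα
console.log(unicodeNormalize('北京'));                 // 北京

// ASCII behavior is the same in both versions.
console.log(legacyNormalize('co-op'), unicodeNormalize('co-op')); // coop coop
```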
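Language coverage
`searchNormalize: true` now works correctly for a single dropdown containing options across many writing systems:
| Letter | Example labels | Search query | Notes |
|---|---|---|---|
| | `Crème brûlée`, `Niño` | `creme`, `nino` | |
| | `München`, `Mädchen`, `Köln` | `Munchen`, `Madchen`, `Koln` | |
| `ß` | `Größe` | `Grosse` | `ß` is an atomic letter (no NFD decomposition) |
| `å` | `Ålesund` | `Alesund` | |
| `ø`, `æ` | `Bjørn`, `Tromsø` | `Bjorn`, `Tromso` | `ø` and `æ` are atomic letters |
| | `Göteborg`, `Malmö` | `Goteborg`, `Malmo` | |
| | `Jyväskylä`, `Hämeenlinna` | `Jyvaskyla`, `Hameenlinna` | |
| | `Ένα` | `Ενα` | |
| | `Ёжик`, `Йогурт` | `Ежик`, `Иогурт` | |
| | `Việt Nam`, `Hà Nội` | `Viet Nam`, `Ha Noi` | |
| | `مُرَحَّباً` | `مرحبا` | |
| | `서울`, `한국어` | | |
| | `北京`, `你好` | | |
| | `東京`, `カタカナ` | | |
| | `co-op`, `e-mail` | `coop`, `email` | |
Performance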
- `COMBINING_MARKS_REGEX` and `NON_WORD_CHARS_REGEX` are defined at module scope instead of being re-created inside `normalizeString()` on every call.
- […] `/\p{M}/gu` into a ~2 KB character-class regex. Re-compiling that pattern on each keystroke during search was unnecessarily expensive.
- […] `.replace()` pass is O(n) on already-short strings and adds no measurable overhead in real workloads.
Documentation and examples
- […] `searchNormalize: true` and `searchNormalize: false` variants for direct comparison.
- […] (`showValueAsTags`) and popup variant (`popupDropboxBreakpoint`) sub-sections under the same multi-language data set, each with both `searchNormalize: true` and `false` dropdowns.
- […] (`co-op`, `e-mail`) so the regression behavior is visible in the live demo.
- […] `docs/examples.md`, `docs/assets/script.js`, and the table of contents.
- […] `Utils.normalizeString` to call out the limitation that combining-mark-dependent scripts (Thai, Devanagari, hiragana voicing) are NOT fully preserved by the normalization pipeline.
Tests
Added Cypress describe blocks against the unified multi-language dropdowns:
- `Multi-language search with searchNormalize: true` (~25 specs) covers all listed scripts including positive cases (e.g. `Munchen` → `München`, `Goteborg` → `Göteborg`, `Jyvaskyla` → `Jyväskylä`, `Ежик` → `Ёжик`, `Viet Nam` → `Việt Nam`, `مرحبا` → `مُرَحَّباً`) and the documented atomic-letter limitations (`Grosse` does NOT match `Größe`; `Bjorn` does NOT match `Bjørn`).
- `Multi-language search with searchNormalize: false` verifies exact matches succeed (Greek/Cyrillic/Chinese/Japanese/Korean exact text) and that accent-stripped queries correctly find no options across all scripts.
- […] `coop` → `co-op` and `email` → `e-mail` under `searchNormalize: true`; a negative spec verifies `coop` finds nothing under `searchNormalize: false`. These guard the second `.replace()` pass against silent regressions.
- `Multi-language tags variant with searchNormalize: true/false`: exercise multi-select with `showValueAsTags`, including diacritic-insensitive search, tag rendering, and tag removal.
- `Multi-language popup variant with searchNormalize: true/false`: exercise popup mode (`popupDropboxBreakpoint`) with the same multi-language data.
- […] (`brulee` → `brûlée`, `cafe` → `café`, `nino` → `niño`) preserved.
Does this introduce a breaking change?
Behavior for ASCII inputs is preserved vs. the original `/[^\w]/g` implementation (punctuation- and whitespace-insensitive search). The fix expands correctness to non-Latin scripts; it does not narrow any previously working case.
Validations
Ran regression scenarios in the documentation using the branch - ✅
Run automated tests - ✅