Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation of fields in Admin does not support non A-Z characters #1187

Closed
jacobwod opened this issue Sep 26, 2022 · 6 comments
Closed

Validation of fields in Admin does not support non A-Z characters #1187

jacobwod opened this issue Sep 26, 2022 · 6 comments
Assignees
Milestone

Comments

@jacobwod
Copy link
Member

jacobwod commented Sep 26, 2022

I'm extracting this comment into an issue of its own:


Just another small fix needed ( I have got hardcoded the value of these in Cyrillic and it works-I have setup my GeoServer Cyrillic aliases for the real field names):

  "searchFields": [
    "улица"
  ],

  "infobox": "",
  "aliasDict": "",
  "displayFields": [
    "улица"
  ],

  "shortDisplayFields": [
    "улица"
  ]

Otherwise, it is impossible to save Cyrillic values for the field names in Admin Panel:
image
This is a screenshot of the red colored inputs.

Originally posted by @bitmapbulgaria in #1181 (comment)

@jacobwod jacobwod added this to the 3.x milestone Sep 26, 2022
@jacobwod
Copy link
Member Author

The issue lies here:

valid = value.every((val) => /^\w+$/.test(val));

As per RegEx docs:

\w
matches any word character (equivalent to [a-zA-Z0-9_])

There are two solutions now:
a. either remove the check entirely or
b. go for something like [\wа-я]+.

The second option limits the check to latin and cyrillic only though and is not a good fix in my opinion. So I'll just remove the check, for now.

@bitmapbulgaria: Are you aware of a recommended way to check for word characters that would accept all non-latin alphabets too?

@jacobwod jacobwod self-assigned this Sep 26, 2022
@jacobwod jacobwod modified the milestones: 3.x, 3.11 Sep 26, 2022
@bitmapbulgaria
Copy link

After some testing, I suggest this regex will do the work, needed for this job:

/^[\p{L}]+[-\p{L}\p{N}-]+(\s+[-\p{L}\p{N}_-]+)*$/ug

I have made some tests, you can check them on the screenshot:
image

My idea is not to allow some special cases in the field names (as on the screenshot - non colored lines), but allow different languages, numbers, dashes, underscore and whitespaces between the words.
Hope this will help!

@jacobwod
Copy link
Member Author

Thanks, this is a really good solution and I'll go with that. It seems to work for most cases, the only noticable exception I found is Hebrew (I'm not sure why it would not match, as it \p{L} is supposed to match any letter from any language). I'm not sure how common it is to use Hebrew column names in database though, so let's consider it an edge case for now and deal the day it becomes a problem for someone.

Skärmavbild 2022-09-27 kl  08 45 17

@jacobwod
Copy link
Member Author

On a second though: something like this would match Hebrew characters too. But I'm not sure if how much it does in this case, as we allow pretty much any character except a leading space.

^[\p{L}]+[\p{L}\p{N}\p{M}\p{P}\p{C}]+(\s+[-\p{L}\p{N}_-]+)*$

What do you guys think: should we allow pretty much anything except starting/ending whitespace and some punctation marks (such as .?+ and perhaps some more that aren't allowed as DB column names)? I'm not sure how much control is needed vs how much responsibility should be put to the system admin.

@bitmapbulgaria
Copy link

My regex proposal was with "I can fill some wrong here accidently and after that will try to figure out, what's wrong" in mind. Anyway, put ^ * & ( ) | \ symbols in PostgreSQL and Geoserver fields names is not possible (but I'm not sure about other DB/gisservers :) )
What about just adding Hebrew \u0590-\u05fe range?
For me, \p{L}\p{N} approach is working great, allow me to experiment with "I don't understand nothing, except Bulgarian" Hajk version.

@jacobwod
Copy link
Member Author

I think we'll settle with
^[\p{L}\u0590-\u05fe]+[-\p{L}\p{N}\u0590-\u05fe-]+(\s+[-\p{L}\p{N}\u0590-\u05fe_-]+)*$
which I tested with a variety of alphabets (including Latin, Cyrillic, Arabic, Hebrew as well as various asian glyphs from East Asian countries). There are surely edge case but we'll extend to support them when detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants