-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle multi-select list column names that aren't letters or numbers #2
Comments
currently in progress on the branch dev201601 |
As well as the dev branch work, ideas relevant to this are written out in docs/plans/naming_strategy.md. A simple solution recently occurred to me, which is probably very common and obvious: replace invalid identifier characters with their UTF-8 hex character code. This is pretty much what OpenClinica does for translations with non-Latin-1 characters. So for example a hyphen "-" becomes the UTF-8 code "2d", as demonstrated by the following query. Latin-1 range characters translate to 2-character codes, and more elaborate character set codes take up to 5 characters. SELECT
to_hex(ascii('-')) AS hex_hyphen,
chr(('x' || lpad('2d', 8, '0'))::bit(32)::int) AS string_hyphen,
to_hex(ascii('😊')) AS hex_smiling_face_with_smiling_eyes,
chr(('x' || lpad('1F60A', 8, '0'))::bit(32)::int) AS string_smiling_face_with_smiling_eyes This substitution could be indicated with the customary "u" prefix used in many languages, but also wrapped in underscores to help identify it, since the full customary prefix uses a backslash ("\u") which is an invalid identifier character. Using this method, the set of item names for the choices in the original post would be:
Pros:
Cons:
|
Implemented the above hex code escape solution in 4a5f9eb. Will close this issue & bump version if it seems to have fixed the problem following feedback from:
|
Use case
A study has the following kind of multi-select code list which is valid in OpenClinica.
Problem
When trying to build datamart (present in 2015.004), the character cleaning function dm_clean_name_string will remove the minus symbol and produce duplicate column names: [item_1, item_1, item_2]. This causes the database build to fail.
Limitations
Postgres, and most clients (SAS, Stata, Access) have object naming requirements that typically don't allow non alphanumeric characters.
Possible Solution
The relevant code is likely to be in dm_create_study_itemgroup_matviews.sql.
Implement detection of invalid code list characters and fall back to another column naming strategy for that portion of the identifier for the entire study. This would be similar to, but should not conflict with, the item name length to item oid fall back currently in place (this multi-select handling should happen first).
An alternative column naming strategy could be to use the code list order number, since it is:
The text was updated successfully, but these errors were encountered: