Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable names with special characters not recognized during harmo_process #27

Closed
zchenmr opened this issue Oct 23, 2023 · 9 comments
Closed
Labels
enhancement New feature or request to test

Comments

@zchenmr
Copy link

zchenmr commented Oct 23, 2023

R CITF: Rmonize v1.0.0.9004, madshapR 1.0.2.9004, fabR 2.0.0

In the past, it was possible to run harmo_process() using variables with non-standard names (ex: containing spaces, hyphens) as long as the variable name was surrounded by backticks. However, this now leads to an error. Is there any way to run harmo_process() while keeping the original variable names or would it be recommended to change the names prior to harmonization instead?

image

image

image

@zchenmr
Copy link
Author

zchenmr commented Oct 24, 2023

Actually, I'm not sure if this issue is being caused by the backticks - I got the same error when running harmo_process() on R CanPath with variables that have standard naming. With the exact same harmonization inputs (DataSchema, harmo rules, dataset), harmonizR::harmo_process() runs without any errors but Rmonize::harmo_process() leads to errors:

image

@GuiFabre
Copy link
Contributor

Good issue here. The reason was because (originally) the dataset names must be opal-compatible. As time goes by, this restriction is less and less mandatory (the functions have been changed accordingly.

I will try to find a solution for that.

@GuiFabre GuiFabre added the enhancement New feature or request label Oct 25, 2023
GuiFabre added a commit to maelstrom-research/madshapR that referenced this issue Oct 26, 2023
get rid of overwrite parameter in dataset_visualize()

maelstrom-research/Rmonize#27

allow special characters in datasets
@GuiFabre
Copy link
Contributor

@zchenmr
done to test

@zchenmr
Copy link
Author

zchenmr commented Oct 26, 2023

R CITF/CanPath: Rmonize v1.0.0.9005, madshapR v1.0.2.9007, fabR 2.0.0

One of the errors (for dis_covid_sr_otsp) no longer appears, but there are still errors for the other two:
image

image

@GuiFabre GuiFabre pinned this issue Oct 26, 2023
@GuiFabre
Copy link
Contributor

Hi @zchenmr, and thank you for your comment.
unfortunately (or fortunately hehe), it does not come from the package, but from your data processing elements

image

The function harmo_process (to spare some useless steps) creates a subset of your dataset with variables only used in the DPE.
The declaration of them is in input_variables . You forgot to put C3_PM_WEIGHT_CA in the list.

Today there is no other way to understand the error from what is provided in show_harmo_error(), but that'll be in the future with the function data_proc_elem_evaluate. Hopefully...

For the other one, I will keep investingating.

@GuiFabre
Copy link
Contributor

Should be ok now !

for your information : in your DPE, in the column input_variables, each variable is actually the name of the variable, so (technically) you must not add the backtick ` around your variable. (the same if for the column name in the data dictionary).

image

While in the column Mlstr_harmo::algorithm, each variable used is the column itself. That is why you need the backtick ` around them. That is why COVID-self_report generated the error, it was double backticked internally.
It does not through an error anymore, since, to the user point of you, the might know the difference between the name of a variable and a variable itself.

GuiFabre added a commit that referenced this issue Oct 27, 2023
@zchenmr
Copy link
Author

zchenmr commented Oct 27, 2023

When the backtick is removed, recode now works but direct_mapping doesn't - it still requires the backtick around the variable in "input_variables".

image

Also, thanks for the tip about the missing variable names for the other error! That issue has been resolved.

@GuiFabre
Copy link
Contributor

hoping this time it is working ;)

@zchenmr
Copy link
Author

zchenmr commented Oct 27, 2023

Yes, direct_mapping is working now. Thanks a lot!

@GuiFabre GuiFabre unpinned this issue Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request to test
Projects
None yet
Development

No branches or pull requests

2 participants