Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow spaces in id column name #25

Closed
salmasian opened this issue Dec 5, 2019 · 5 comments
Closed

Allow spaces in id column name #25

salmasian opened this issue Dec 5, 2019 · 5 comments

Comments

@salmasian
Copy link
Contributor

The way #20 was fixed does not allow for the id column's name to contain spaces.

dt <- data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390'
)

comorbidity(dt, id = 'Enc ID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)

What you get is an error message:

Error in comorbidity(dt, id = "Enc ID", code = "DxCode", icd = "icd10",  : 
  1 assertions failed:
 * Variable 'id': Must be a subset of {'Enc.ID','DxCode'}, but is {'Enc ID'}.
@ellessenne
Copy link
Owner

Hi, Enc ID is not really a valid name for a data.frame (although it can be used with backticks), hence the error.
I am not sure this should be fixed, might be worth adding a note to the documentation though?

@salmasian
Copy link
Contributor Author

Well, technically, R does allow variable names with spaces :) but I agree with you that this may not be worth "fixing" here as much as it is worth updating the documentation.

@ellessenne
Copy link
Owner

ellessenne commented Dec 13, 2019

The issue with that is that data.frame is called by default with check.names = TRUE, which calls the make.names function to make syntactically valid names:

data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390')
#>   Enc.ID DxCode
#> 1   1234   N390
data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390',
  check.names = FALSE)
#>   Enc ID DxCode
#> 1   1234   N390

Created on 2019-12-13 by the reprex package (v0.3.0)

The error you see is that the name of the input data.frame is not matching the name passed to comorbidity.
However:

dt <- data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390',
  check.names = FALSE)
comorbidity(dt, id = 'Enc ID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)
#> Error in parse(text = x, keep.source = FALSE): <text>:1:5: unexpected symbol
#> 1: Enc ID
#>         ^

Created on 2019-12-13 by the reprex package (v0.3.0)

This still does not work.

I now call the make.names function on both input dataset and column names to force them to be syntactically valid. The result is the following:

dt <- data.frame(
  `Enc ID` = 1234,
  DxCode = 'N390'
)
comorbidity(dt, id = 'Enc ID', code = 'DxCode', icd = 'icd10', score = 'charlson', assign0 = F)
#> Warning: The input 'id' string has been modified by make.names(). See ?
#> make.names() for more details.
#>   Enc.ID ami chf pvd cevd dementia copd rheumd pud mld diab diabwc hp rend canc
#> 1   1234   0   0   0    0        0    0      0   0   0    0      0  0    0    0
#>   msld metacanc aids score index wscore windex
#> 1    0        0    0     0     0      0      0

Created on 2019-12-13 by the reprex package (v0.3.0)

It now works!

As you can see, when names are changed a warning is triggered:

#> Warning: The input 'id' string has been modified by make.names(). See ?
#> make.names() for more details.

What do you think about all of the above @salmasian?

@salmasian
Copy link
Contributor Author

This is similar to the normal R behavior (i.e. in other places, R also cleans up variable names) so I think it is is a worthwhile effort and I support that. The warning should not be suppressed.

@ellessenne
Copy link
Owner

Sounds good to me - I am actually the one triggering the warning 😃

I am closing this, and will submit to CRAN by the end of the year. Thanks again @salmasian!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants