
Formulaic not raising an exception when required fields are missing in the dataset #157

Closed
hguturu opened this issue Sep 29, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@hguturu

hguturu commented Sep 29, 2023

I am trying to make a design matrix from a master matrix of parameters.

import pandas as pd
import formulaic

all_phenotypes = pd.DataFrame({"(AltGrp)": [1, 0, 0, 1, 0, 1], "BinGrp": [0, 0, 0, 1, 1, 1], "ContGrp": [1, 2, 3, 4, 5, 6]})

design = formulaic.model_matrix(["(AltGrp) + BinGrp"], all_phenotypes)

yields

   BinGrp
0       0
1       0
2       0
3       1
4       1
5       1

I assume this is due to the parentheses in (AltGrp). Are there other special characters that should be avoided in column names? This fails silently, and I want to avoid passing in the wrong matrix in the future.
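Until a missing field raises an error, one defensive option is to compare the design matrix's column names against the ones you expect before using it. This is a minimal sketch: `missing_columns` and `check_design_columns` are hypothetical helpers, not part of formulaic. It assumes each expected name is a numeric field that appears in the matrix under its own name (as `BinGrp` does in the output above); categorical fields are encoded under different column names, so the expected list would need to reflect that.

```python
from collections.abc import Iterable


def missing_columns(columns: Iterable[str], expected: Iterable[str]) -> list[str]:
    """Return the expected column names that are absent, in their given order."""
    present = set(columns)
    return [name for name in expected if name not in present]


def check_design_columns(columns: Iterable[str], expected: Iterable[str]) -> None:
    """Raise ValueError if the design matrix lacks any expected column.

    Hypothetical helper: pass `design.columns` (assuming the default
    pandas-backed output shown in this thread) along with the column
    names you expect the formula to produce.
    """
    missing = missing_columns(columns, expected)
    if missing:
        raise ValueError(f"Design matrix is missing expected columns: {missing}")


# Simulate the silently truncated result above, which only contains "BinGrp".
try:
    check_design_columns(["BinGrp"], ["(AltGrp)", "BinGrp"])
except ValueError as exc:
    print(exc)  # Design matrix is missing expected columns: ['(AltGrp)']
```

In practice this would be called as `check_design_columns(design.columns, ["(AltGrp)", "BinGrp"])` right after building the matrix.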

@matthewwardrop
Owner

Hi @hguturu,

Parentheses in formulae have special meaning (they are grouping, i.e. order-of-operations, operators). You can refer to the formula grammar docs for more info. There you'll also find how to quote special characters that should be included in field names (wrap the name in backticks); for example:

In [12]: all_phenotypes = pd.DataFrame({"(AltGrp)": [1, 0, 0, 1, 0, 1], "BinGrp": [0, 0, 0, 1, 1, 1], "ContGrp": [1, 2, 3, 4, 5, 6]})
    ...:
    ...: design = formulaic.model_matrix(["`(AltGrp)` + BinGrp"], all_phenotypes)

In [13]: design
Out[13]:
   (AltGrp)  BinGrp
0         1       0
1         0       0
2         0       0
3         1       1
4         0       1
5         1       1
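When column names come from another source (such as a phenotype table), one option is to add the backtick quoting programmatically while building the formula string. This is a minimal sketch: `quote_field` and `additive_formula` are hypothetical helpers, and the rule for which names count as "plain" (letters, digits and underscores, not starting with a digit) is an assumption, not formulaic's exact grammar. Names containing a backtick are rejected rather than guessing at an escaping rule.

```python
import re

# Assumption: names made only of letters, digits and underscores (not starting
# with a digit) can be used unquoted; anything else is wrapped in backticks,
# as in the `(AltGrp)` example above.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_field(name: str) -> str:
    """Return `name` unchanged if it looks plain, otherwise backtick-quoted."""
    if "`" in name:
        # Escaping a literal backtick is not covered here, so refuse to guess.
        raise ValueError(f"Cannot safely quote a field name containing a backtick: {name!r}")
    if _PLAIN_NAME.fullmatch(name):
        return name
    return f"`{name}`"


def additive_formula(fields: list[str]) -> str:
    """Join field names into an additive formula, quoting where needed."""
    return " + ".join(quote_field(field) for field in fields)


print(additive_formula(["(AltGrp)", "BinGrp"]))  # `(AltGrp)` + BinGrp
```

Quoting only the names that need it keeps generated formulae readable; quoting every name unconditionally would be a simpler, more conservative alternative.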

However, there is a bug here: AltGrp is not found in the dataset, yet no exception is raised. This is a regression, so I'll make sure it gets fixed.

@matthewwardrop matthewwardrop changed the title from "What are allowable column names for phenotype matrix?" to "Formulaic not raising an exception when required fields are missing in the dataset" Oct 4, 2023
@matthewwardrop matthewwardrop added the bug Something isn't working label Oct 4, 2023
@matthewwardrop matthewwardrop self-assigned this Oct 4, 2023
@matthewwardrop
Owner

Ah, I see you opened a separate issue about this anyway (#159). Closing this one in favour of that.

@matthewwardrop matthewwardrop closed this as not planned Oct 4, 2023