Add "purity" filter #19

Closed
gaow opened this issue Jun 8, 2018 · 5 comments
@gaow
Member

gaow commented Jun 8, 2018

Currently I compute the "purity" of sets separately, based on LD, and apply that to susie sets (too bad I coded it in Python). We should implement it as a separate function here. What should it look like? E.g.,

susie_get_sets(susie_in_CS(res), LD_mat = NULL, threshold = 0.2)

Or,

susie_get_sets(susie_in_CS(res), X = NULL, threshold = 0.2)

and we compute the LD matrix ourselves?

When LD_mat is NULL, we just get the positions of the variables in each set and report them as an R list of sets. With the LD_mat filter, we additionally compute the minimal pairwise LD within each set and remove the sets that fail the threshold?
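
To make the proposed filter concrete, here is a minimal Python sketch. It is not the package's implementation: the function names are hypothetical, LD is assumed to be measured as absolute pairwise correlation, a single-variable set is assumed to have purity 1, and indices are 0-based (unlike R).

```python
from itertools import combinations


def min_abs_pairwise(indices, corr):
    """Smallest |corr[i][j]| over all distinct pairs in a set.

    A set with a single variable has no pairs; it is treated as
    perfectly pure (1.0), which is an assumption of this sketch.
    """
    return min((abs(corr[i][j]) for i, j in combinations(indices, 2)),
               default=1.0)


def filter_sets_by_purity(sets, corr=None, threshold=0.2):
    """Return the sets unchanged when no matrix is given; otherwise
    keep only sets whose minimal pairwise |correlation| meets the threshold."""
    if corr is None:
        return [list(s) for s in sets]
    return [list(s) for s in sets
            if min_abs_pairwise(s, corr) >= threshold]


# Toy correlation matrix for three variables (made-up numbers).
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.05],
    [0.1, 0.05, 1.0],
]
sets = [[0, 1], [1, 2], [2]]

unfiltered = filter_sets_by_purity(sets)
kept = filter_sets_by_purity(sets, corr, threshold=0.2)
print(unfiltered)
print(kept)
```

In this toy example the set `[1, 2]` is dropped because its only pair has absolute correlation 0.05, below the 0.2 threshold.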

gaow self-assigned this Jun 8, 2018
@stephens999
Contributor

stephens999 commented Jun 8, 2018 via email

@gaow
Member Author

gaow commented Jun 12, 2018

@stephens999 Sorry, I have mostly cleared the simulation items on my to-do list, so I am getting back to this one now. I think we currently use in_CS to refer to a binary vector of length p, with 1 for a variable in a CS. We do not provide the actual sets of variable positions; that would be more like get_CS.

How about we then make in_CS and n_in_CS non-exported functions, and implement get_CS to return a list of the actual CSs? I guess in_CS is not that useful.

BTW, n_in_CS can be trivially computed from get_CS or in_CS results, so we still decide not to export it, right?
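
As an illustration of the point about n_in_CS, here is a small Python sketch of the relationship between the two representations. The function names are hypothetical, and it assumes the membership is stored as one binary vector of length p per CS, with 0-based indices.

```python
def sets_from_membership(in_cs):
    """Turn binary membership vectors (one per CS) into lists of
    variable positions, i.e. what a get_CS-style function would return."""
    return [[j for j, flag in enumerate(row) if flag == 1] for row in in_cs]


def sizes_from_membership(in_cs):
    """Number of variables in each CS, computed directly from the 0/1 vectors."""
    return [sum(row) for row in in_cs]


# Two credible sets over p = 5 variables (made-up example).
in_cs = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

cs = sets_from_membership(in_cs)
sizes_from_sets = [len(s) for s in cs]
sizes_from_vectors = sizes_from_membership(in_cs)
print(cs, sizes_from_sets, sizes_from_vectors)
```

Both routes give the same set sizes, which is why a dedicated exported function for the count adds little.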

> we could have a separate function purity(sets,LDmat) that does the work?

I agree it is cleaner to have a separate function for developers when we want to explore how "purity" works. But now that we have some idea about it from separate studies, from the user's perspective having fewer functions is better. So I'm voting for:

susie_get_CS(res, X = NULL, R = NULL, threshold = 0.2)
  • By default we just return the multiple CSs as indices of variables.
  • If either X or R is provided, we apply the LD filter. Maybe call the matrix argument correlation or r2; better not to call it LD_mat, given contexts outside genetic fine-mapping. What's your pick? The default threshold of 0.2 is chosen from our experience in genetic mapping studies.
  • If both X and R are provided, throw an error.

Does that sound okay?
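
The argument handling in the three bullets above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `get_cs_sketch` and `column_correlation` are hypothetical names, the sets are passed in directly rather than extracted from a fit, R is assumed to hold plain (not squared) correlations, X is assumed to have no constant columns, and indices are 0-based.

```python
from itertools import combinations
from math import sqrt


def column_correlation(X):
    """Pearson correlation matrix of the columns of X (a list of rows).
    Assumes no column is constant, otherwise a norm would be zero."""
    n, p = len(X), len(X[0])
    centered = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def get_cs_sketch(sets, X=None, R=None, threshold=0.2):
    """Return sets as-is by default; filter by purity if X or R is given;
    raise an error if both are given."""
    if X is not None and R is not None:
        raise ValueError("provide either X or R, not both")
    if X is None and R is None:
        return [list(s) for s in sets]
    corr = R if R is not None else column_correlation(X)

    def purity(s):
        # Minimal absolute pairwise correlation; 1.0 for a singleton set.
        return min((abs(corr[i][j]) for i, j in combinations(s, 2)),
                   default=1.0)

    return [list(s) for s in sets if purity(s) >= threshold]


# Columns 0 and 1 are perfectly correlated; column 2 is uncorrelated with column 0.
X = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
sets = [[0, 1], [0, 2]]

print(get_cs_sketch(sets))        # no filter applied
print(get_cs_sketch(sets, X=X))   # purity filter from X
```

Accepting a precomputed matrix avoids recomputing correlations when the caller already has them, while accepting X keeps the call simple for everyone else.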

@stephens999
Contributor

I think correlation, X'X, is more natural to provide than squared correlation?

@stephens999
Contributor

So maybe Xcor?

@gaow
Member Author

gaow commented Jun 12, 2018

Cool, Xcor it is!

@gaow gaow closed this as completed in 271bc14 Jun 12, 2018