Fix for Issue #148 #220

Closed
wants to merge 3 commits into from

Conversation

@Max-Bladen (Collaborator) commented Jun 21, 2022

Some time ago, users requested a feature within our sparse methods which allows them to pass in a specific set of variables that are guaranteed to be included in the model. The relevant user-facing functions (spca, (mint).(block).spls(da)) all now take the retain.feats parameter to allow this functionality.

Check.retain.feats() handles the checks and pre-processing of the input retain.feats. This parameter influences the regularisation that occurs within soft_threshold_L1(), called by sparsity(). Along with those flagged by select_feature, all features included in retain.feats do NOT have their loadings reduced to 0.
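
A minimal sketch of what that pre-processing might look like (illustrative only; the function name, checks and variable names below are assumptions, not the actual Check.retain.feats() implementation):

# Illustrative sketch only, not the actual mixOmics internals:
# normalise `retain.feats` to column indices of X before the sparsity step.
check_retain_feats <- function(retain.feats, X) {
    if (is.null(retain.feats)) return(integer(0))
    if (is.character(retain.feats)) {
        idx <- match(retain.feats, colnames(X))
        if (anyNA(idx)) stop("some names in 'retain.feats' are not columns of X")
    } else {
        idx <- as.integer(retain.feats)
        if (any(idx < 1L | idx > ncol(X))) stop("'retain.feats' indices are out of range")
    }
    idx
}

# Downstream, the sparsity step would then treat
#   keep_feature <- select_feature | seq_len(ncol(X)) %in% retained_idx
# as the set of features whose loadings are NOT reduced to 0.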

feat: introduces the `retain.feats` parameter to (mint).(block).spls(da). Allows some features to be specifically retained in the model, independent of keepX/Y
@Max-Bladen Max-Bladen added the enhancement-request New feature or request label Jun 21, 2022
@Max-Bladen Max-Bladen self-assigned this Jun 21, 2022
@Max-Bladen Max-Bladen linked an issue Jun 21, 2022 that may be closed by this pull request
feat: introduced `retain.feats` functionality to `spca()`. Additionally, fixed two minor issues causing incorrect AVE and weights in `spls(da)` functions, as well as missing default values
doc: updated documentation for `spca`
@Max-Bladen (Collaborator, Author) commented:

Hi @aljabadi,

I need a bit of advice for this one. I've successfully implemented a way to retain user-specified features in the sparse functions via the new retain.feats parameter. It can take either indices or names of the features to retain and is implemented for spca, spls, splsda, mint.spls, mint.splsda, block.spls and block.splsda.

I'll refer to features selected by the sparse method (via keepX) as "selected" features. Those specified by the user (via retain.feats) will be referred to as "retained".

Here's the problem: the loading values of these retained features are severely overestimated. This is because both the selected and the retained features have their loadings reduced by max(abs.a[!select_feature]) (look here). Using the code below as an example:

data(liver.toxicity)
X <- liver.toxicity$gene[, 1:200]
Y <- liver.toxicity$clinic

colnames(X)[6:10] <- c("A", "B", "C", "D", "E")
colnames(Y)[8:10] <- c("A", "B", "C")

retain.feats <- list(X=6:10, 
                     Y=8:10)

spls.obj <- spls(X, Y, keepX = c(6,1), keepY = c(6,1), retain.feats = retain.feats)

So, the features to retain are named A -> E so they can be easily identified. In a given iteration, the loading values of the selected and retained features are as follows:

           A            B            C            D            E 
    37.56952     13.64893     20.92075      3.42440     20.26891     

    A_42_P649672  A_43_P11568  A_43_P21626 A_42_P454114 A_42_P683537  A_43_P21372 
    63.32374      68.51839     66.69514    66.56536     67.45634      66.31351

Unsurprisingly, the selected features (starting with A_4*) have the maximal absolute loadings, while the retained features have a range of values. Each is then reduced by the maximal absolute loading of all non-selected features (max(abs.a[!select_feature]) = 62.85361). Now the values look like:

           A            B            C            D            E 
 -25.2840898  -49.2046839  -41.9328549  -59.4292085  -42.5846995    

    A_42_P649672  A_43_P11568  A_43_P21626  A_42_P454114 A_42_P683537  A_43_P21372 
    0.4701328     5.6647843    3.8415326    3.7117548    4.6027321     3.4599011

Now, if we look at the final loading values, you can see the overestimation of the retained features (particularly in the X dataframe).

plotLoadings(spls.obj)

Created on 2022-06-22 by the reprex package (v2.0.1)
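
To make the mechanism explicit, the inflation can be reproduced with the raw numbers above (illustrative arithmetic only; the threshold and loading values are copied from the output shown):

# Both selected and retained features have the same threshold subtracted,
# so a retained feature whose raw loading sits below the threshold ends up
# with a large negative value instead of being shrunk towards zero.
threshold <- 62.85361   # max(abs.a[!select_feature])
retained  <- c(A = 37.56952, B = 13.64893, C = 20.92075, D = 3.42440, E = 20.26891)
selected  <- c(A_42_P649672 = 63.32374, A_43_P11568 = 68.51839)

retained - threshold    # -25.28 -49.20 -41.93 -59.43 -42.58 (up to rounding, as above)
selected - threshold    #   0.47   5.66 -- small values, as intended for selected features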

Mathematically, these values are correct, but this subtraction results in their inflation. I played around with removing the - max(abs.a[!select_feature]) term, but this obviously resulted in different values for other components of the method (e.g. weights and AVE).

My primary question is: seeing as the loading values are subsequently scaled down via the L2 norm (sqrt(crossprod(loadings)) - look here), is the - max(abs.a[!select_feature]) subtraction necessary? If not, then removing this subtraction resolves the loading inflation issue, like so:

[plotLoadings output with the subtraction removed.] Created on 2022-06-22 by the reprex package (v2.0.1)
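
As a quick numeric sanity check (arbitrary values, not taken from the example above): subtracting a constant is not a rescaling, so the L2-normalised loading vector, and anything derived from it, does change when the subtraction is dropped:

a <- c(5, 4, 3)          # arbitrary raw loadings
threshold <- 2           # stand-in for max(abs.a[!select_feature])

with_sub    <- a - threshold
without_sub <- a

with_sub    / sqrt(drop(crossprod(with_sub)))      # 0.802 0.535 0.267
without_sub / sqrt(drop(crossprod(without_sub)))   # 0.707 0.566 0.424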

Seeing as the loading values (as well as other components, e.g. AVE) change to differing degrees when we remove this subtraction, I'll assume that it is necessary. Hence, where should I go from here? Is there a better way to implement the retain.feats functionality without this issue? Should I just forget about this feature?

Cheers

@mixOmicsTeam (Owner) commented Jun 22, 2022 via email

@Max-Bladen (Collaborator, Author) commented:

Thanks for the clarification @mixOmicsTeam. The results certainly didn't feel right.

I'll inform users of this and put this PR aside. I'll do a bit of research and thinking as to whether there is a way this could be implemented, but I'll move on for now.

@Max-Bladen Max-Bladen closed this Jun 23, 2022
@Max-Bladen Max-Bladen added feature-request Can be implemented if there's enough interest and removed enhancement-request New feature or request labels Jun 23, 2022
@Max-Bladen Max-Bladen deleted the issue-148 branch December 13, 2022 22:41
Labels: feature-request (Can be implemented if there's enough interest)

Linked issue (may be closed by this pull request): feature request: Possibility to keep some descriptors in sPLS