some clarifications about the output from a quantitative model prediction #10

Closed
pritwish opened this issue Apr 10, 2019 · 2 comments

Comments

@pritwish

Hi! I am doing quantitative prediction and have a few questions:
a. Is there a place where I can get the coefficients and intercepts for models other than the top-ranked model?
b. Does it do scaling and standardization internally?
c. Does it consider the values in an absolute sense internally? I ran two datasets whose values were the same in absolute terms, with the sign (positive or negative) changed in some instances between the two, and the output models were the same. But this could be a special case of the dataset too.
d. I understand that the output model should be put in the form y' = mX + c, where X is the value evaluated from the descriptor, and that this finally gives me the predicted output variable. Is there any way I can change the linear function to a different function, say a second-order polynomial?
e. Are two different descriptors linked in any way with each other (in case they are part of a multi-dimensional descriptor, and also in case they are not)? A really naive question, but it bugs me a lot. :p

@rouyang2017
Owner

rouyang2017 commented Apr 10, 2019

a. No, the coefficients and intercept are given only for the top-ranked model in SISSO.out. However, you can get them for other models this way:

  1. Get the Feature_IDs for any model you want from the folder "models".
  2. Find the feature formulas corresponding to those IDs in the file "Uspace.name" in the folder "feature_space". The ID is the line number in "Uspace.name". E.g., ID:50 means the feature at line 50 of Uspace.name.
  3. Find the feature data corresponding to those IDs in the file "Uspace_pxxx.dat" (here pxxx denotes property xxx) in the folder "feature_space". If you are doing single-task learning (one target property), the first column is the original data of your target property, the second column is feature ID:1, the third column is feature ID:2, ...
  4. Copy these features to create a new train.dat, and set rung=0 (and also the corresponding nsf and subs_sis) in SISSO.in. Then run SISSO to get a new SISSO.out.

So, in short, take the features of the model you want and run SISSO again, without further feature transformation, to get the coefficients.
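
As a rough cross-check outside SISSO, the column selection in step 3 can be sketched in Python, followed by an ordinary least-squares fit with NumPy. This is not the rerun described above: it is a stand-in that only assumes the single-task column layout from step 3 (target in the first column, feature ID:k in column k+1) and a purely numeric, whitespace-separated file. The helper names `select_features` and `fit_linear` and the toy data are hypothetical.

```python
import numpy as np

def select_features(uspace, feature_ids):
    """Return the target column and the columns of the requested feature IDs.

    Assumed layout (single-task case): column 0 is the target property and
    feature ID k (1-based) sits in column k.
    """
    uspace = np.asarray(uspace, dtype=float)
    y = uspace[:, 0]
    X = uspace[:, list(feature_ids)]
    return y, X

def fit_linear(y, X):
    """Ordinary least squares with an intercept: y ~ X @ coef + intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    solution, *_ = np.linalg.lstsq(A, y, rcond=None)
    return solution[:-1], solution[-1]

# Toy stand-in for the feature-space data file. With real data one might load
# the file with np.loadtxt instead, assuming it holds only numeric columns.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 5))            # feature IDs 1..5
target = 2.0 * features[:, 1] - 0.5 * features[:, 3] + 1.0
toy_uspace = np.column_stack([target, features])

# Suppose the model of interest lists Feature_IDs 2 and 4.
y, X = select_features(toy_uspace, [2, 4])
coef, intercept = fit_linear(y, X)
print(coef, intercept)
```

The selected columns `X` are also what would be copied into the new train.dat in step 4. The numbers from the least-squares fit are only a sanity check and may differ from what a SISSO rerun reports.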

@rouyang2017
Owner

b. No scaling is done internally during feature construction, so that the physical meaning of the primary features is preserved.
c. No. Perhaps your models are insensitive to that feature? I would check the models to see why a different sign of that feature's values does not change the results.
d. SISSO has no restriction that forces the models to be polynomial. However, I expect that polynomial models will appear if they are really important (strongly correlated with your data). Or you can try using only the power operators ^2, ^3, ... for feature construction, right?
e. Good question. This also bugs me a lot :), and I think it is a general issue for any data-driven method, not just SISSO. We have some remarks in the long paragraph "General remarks on the descriptor-property relationship identified by SISSO" of the SISSO paper.
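
For point d, one way to see what a second-order model in the descriptor value would look like is to fit it by hand after the descriptor has been evaluated. The sketch below does this with plain NumPy least squares, outside SISSO. The function name `fit_quadratic` and the toy values of X and y are hypothetical.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y ~ a*x**2 + b*x + c for a single descriptor x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with the squared term, the linear term, and a constant.
    A = np.column_stack([x**2, x, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b, c

# Toy data: X plays the role of the evaluated descriptor values.
X = np.linspace(-2.0, 2.0, 9)
y = 3.0 * X**2 - X + 0.5

a, b, c = fit_quadratic(X, y)
print(a, b, c)
```

Adding the squared descriptor as an extra column is equivalent to treating X and X^2 as two features of a linear model, which is the same idea as restricting feature construction to power operators.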
