Is it possible to prevent fields from being used as features but keep them as output fields? #209

Closed
alexnikitchuk opened this issue Jan 16, 2019 · 4 comments


@alexnikitchuk

Problem
More question than problem:
In my case, some fields contain auxiliary information that I need to have in the output, but at the same time these fields can be highly correlated with the target label.
Is it possible to prevent fields from being used as features but keep them as output fields?

Solution
Unknown

Alternatives
Unknown

Additional context
N/A

@tovbinm
Collaborator

tovbinm commented Jan 17, 2019

@alexnikitchuk when you say "have in the output" do you mean as an output of the model.score execution?

@alexnikitchuk
Author

@tovbinm yes, exactly

@snabar
Contributor

snabar commented Jan 26, 2019

Actually here is how you can do it:

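```scala
import com.salesforce.op._
import com.salesforce.op.features.FeatureLike
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelector
import org.apache.spark.sql.SparkSession

implicit val spark: SparkSession = ... // train() and score() need an implicit SparkSession
val reader = ... // the DataReader for your raw data

val features: FeatureLike[OPVector] = ...
val label: FeatureLike[RealNN] = ...
val excluded: FeatureLike[_ <: FeatureType] = ... // say we want to exclude this feature from modeling

// define the model selector with label & features only
val pred = BinaryClassificationModelSelector().setInput(label, features).getOutput()

// set the result features of the workflow, including the excluded feature
val workflow = new OpWorkflow().setReader(reader).setResultFeatures(excluded, pred)

// train & score the model
val workflowModel = workflow.train()
val df = workflowModel.score()
```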

`df` should be a DataFrame with `excluded` and `pred` as columns, where `pred` was fitted using only `features` (and the label) as input, never `excluded`.

@snabar
Contributor

snabar commented Jan 26, 2019

In general, this diagram may help you understand what happens when you call “train” on a workflow:

https://github.com/salesforce/TransmogrifAI/blob/master/resources/workflows.png

Essentially, the part of the underlying DAG needed to materialize the result features gets prepared: any estimators in it (in the example above, the binary classification model selector) are fitted on the data.

When you then call `score` on the resulting `workflowModel`, data is run through the prepared DAG and all the result features get materialized, whether or not they were an input to any estimator in the DAG.

https://github.com/salesforce/TransmogrifAI/blob/master/resources/materializingdata.png
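To make the train/score flow above concrete, here is a self-contained toy model in plain Scala, with no TransmogrifAI or Spark dependency. Everything in it (`Node`, `Raw`, `Estimator`, `train`, `score`, `fitThreshold`, and the field names) is hypothetical and only mimics the idea: `train` fits the estimators that the result features depend on, in DAG order, and `score` materializes every result feature, including one that no estimator consumes. It is a sketch under those assumptions, not how TransmogrifAI is implemented.

```scala
// Toy model of the train/score flow. Hypothetical, NOT the TransmogrifAI API.
type Record = Map[String, Double]
type Predictor = Seq[Double] => Double

sealed trait Node { def name: String; def inputs: Seq[Node] }

// A raw field read straight from the data.
final case class Raw(name: String) extends Node {
  def inputs: Seq[Node] = Seq.empty
}

// An estimator is fitted on (label, features) during train;
// at score time only its features are evaluated.
final case class Estimator(
    name: String,
    label: Node,
    features: Seq[Node],
    fit: (Seq[Double], Seq[Seq[Double]]) => Predictor
) extends Node {
  def inputs: Seq[Node] = label +: features
}

final case class FittedWorkflow(result: Seq[Node], fitted: Map[String, Predictor])

// Post-order walk: every node appears after all of its inputs.
def upstream(n: Node): Seq[Node] = n.inputs.flatMap(upstream) :+ n

// Evaluate a node on one record using already-fitted estimators.
def eval(n: Node, r: Record, fitted: Map[String, Predictor]): Double = n match {
  case Raw(name)    => r(name)
  case e: Estimator => fitted(e.name)(e.features.map(eval(_, r, fitted)))
}

// "train": fit every estimator the result features depend on, in DAG order.
def train(result: Seq[Node], data: Seq[Record]): FittedWorkflow = {
  val order = result.flatMap(upstream).distinct
  val fitted = order.foldLeft(Map.empty[String, Predictor]) {
    case (acc, e: Estimator) =>
      val ys = data.map(eval(e.label, _, acc))
      val xs = data.map(r => e.features.map(eval(_, r, acc)))
      acc + (e.name -> e.fit(ys, xs))
    case (acc, _) => acc
  }
  FittedWorkflow(result, fitted)
}

// "score": materialize every result feature, whether or not an estimator used it.
def score(wf: FittedWorkflow, data: Seq[Record]): Seq[Record] =
  data.map(r => wf.result.map(n => n.name -> eval(n, r, wf.fitted)).toMap)

// Hypothetical one-feature classifier: threshold at the midpoint of the class means.
// Assumes both classes (0.0 and 1.0) occur in the training labels.
def fitThreshold(ys: Seq[Double], xs: Seq[Seq[Double]]): Predictor = {
  def mean(v: Seq[Double]) = v.sum / v.size
  val pos = xs.zip(ys).collect { case (x, y) if y == 1.0 => x.head }
  val neg = xs.zip(ys).collect { case (x, y) if y == 0.0 => x.head }
  val t = (mean(pos) + mean(neg)) / 2
  x => if (x.head >= t) 1.0 else 0.0
}

val label = Raw("label")
val f1 = Raw("f1")
val excluded = Raw("excluded") // kept in the output, never fed to the estimator
val pred = Estimator("pred", label, Seq(f1), fitThreshold)

val data: Seq[Record] = Seq(
  Map("label" -> 0.0, "f1" -> 1.0, "excluded" -> 0.0),
  Map("label" -> 0.0, "f1" -> 2.0, "excluded" -> 0.0),
  Map("label" -> 1.0, "f1" -> 8.0, "excluded" -> 1.0),
  Map("label" -> 1.0, "f1" -> 9.0, "excluded" -> 1.0)
)

val model = train(Seq(excluded, pred), data)
val scored = score(model, data)
```

Even though `excluded` is perfectly correlated with `label` in this toy data, `pred` depends only on `f1`: scoring a record with `f1 = 9.0` and `excluded = 0.0` yields `pred = 1.0` alongside `excluded = 0.0`, and no `label` is needed at scoring time.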

tovbinm closed this as completed Jan 26, 2019
tovbinm mentioned this issue Jul 11, 2019