Is it possible to prevent fields from being used as features but keep them as output fields? #209
Comments
@alexnikitchuk when you say "have in the output" do you mean as an output of the
@tovbinm yes, exactly
Actually, here is how you can do it:
import com.salesforce.op._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._

val features: FeatureLike[OPVector] = ...
val label: FeatureLike[RealNN] = ...
// say we want to exclude this feature from modeling
val excluded: FeatureLike[_ <: FeatureType] = ...
// define the model selector with label & features
val pred = BinaryClassificationModelSelector().setInput(label, features).getOutput()
// set the result features on the workflow, including the excluded feature
val workflow = new OpWorkflow().setReader(reader).setResultFeatures(excluded, pred)
// train & score the model
val workflowModel = workflow.train()
val df = workflowModel.score()

df should be a DataFrame with the excluded feature (f2) and pred as columns, where pred used only features (f1) as the predictor.
In general, this diagram may help you understand what happens when you call "train" on a workflow:
https://github.com/salesforce/TransmogrifAI/blob/master/resources/workflows.png

Essentially, the underlying DAG needed to materialize the ResultFeatures gets prepped, i.e., any estimators (in the above case, the binary classification model selector) get fitted on the data. When you then call score on the resulting workflowModel, data is run through the prepped DAG and all the ResultFeatures get materialized, whether or not they were part of any estimator in the DAG.
https://github.com/salesforce/TransmogrifAI/blob/master/resources/materializingdata.png
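To make the train/score distinction concrete, here is a minimal toy sketch in plain Scala. It is not TransmogrifAI's implementation and uses none of its API: the names `WorkflowSketch`, `Raw`, `Predicted`, and the one-variable least-squares "estimator" are all hypothetical, chosen only to show that a result feature can be materialized at scoring time without ever feeding an estimator.

```scala
// Toy model of the train/score flow described above.
// All names here are illustrative assumptions, not TransmogrifAI classes.
object WorkflowSketch {
  type Row = Map[String, Double]

  // A result feature is either read straight from the data
  // or produced by an estimator fitted on one input column.
  sealed trait Feature { def name: String }
  final case class Raw(name: String) extends Feature
  final case class Predicted(name: String, label: String, input: String) extends Feature

  // "train": fit every estimator found among the result features.
  // The stand-in estimator is a least-squares slope through the origin.
  def train(resultFeatures: Seq[Feature], data: Seq[Row]): Seq[Row] => Seq[Row] = {
    val fitted: Map[String, Double] = resultFeatures.collect {
      case Predicted(name, label, input) =>
        val num = data.map(r => r(input) * r(label)).sum
        val den = data.map(r => r(input) * r(input)).sum
        name -> num / den
    }.toMap

    // "score": materialize every result feature,
    // whether or not any estimator used it as an input.
    rows => rows.map { r =>
      resultFeatures.map {
        case Raw(name)                 => name -> r(name)
        case Predicted(name, _, input) => name -> fitted(name) * r(input)
      }.toMap
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Map("f1" -> 1.0, "f2" -> 10.0, "label" -> 2.0),
      Map("f1" -> 2.0, "f2" -> 20.0, "label" -> 4.0)
    )
    // f2 is requested as a result feature but never used for fitting.
    val score = train(Seq(Raw("f2"), Predicted("pred", "label", "f1")), data)
    score(data).foreach(println)
  }
}
```

In this sketch the scored rows contain only `f2` and `pred`. `pred` is computed from `f1` alone, while `f2` passes through untouched, which mirrors the behaviour described for `setResultFeatures(excluded, pred)`.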
Problem
More of a question than a problem:
In my case, some fields contain auxiliary information that I need in the output, but these fields can also be highly correlated with the target label.
Is it possible to prevent fields from being used as features but keep them as output fields?
Solution
Unknown
Alternatives
Unknown
Additional context
N/A