Input Feature Selection - Does the relevant code exist? #2306
Comments
Hi, thank you for the question. As you've encountered, Ax is not well suited for this problem due to the large input space. For the approach you've outlined, we'll triage to @saitcakmak, who has more context on historical research cases that may be similar. Thanks!
Hi @CCranney. Just so we're on the same page: in Ax, we typically use Bayesian optimization with Gaussian process (GP) surrogates. We do have some sensitivity measures associated with these surrogates, which can help you understand how much each input matters for each output (based on the surrogate predictions). The standard GP models would not scale to ~60k inputs and ~7k outputs, so you'd need a more specialized approach here. There's some recent research from our team on dealing with high-dimensional intermediate outputs, though I am not sure how applicable it is to this use case or whether it can handle ~60k parameters: https://arxiv.org/abs/2311.02213
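To make the idea of GP-based sensitivity concrete, here is a minimal, hypothetical sketch (not Ax's own sensitivity API) that uses the learned lengthscales of an ARD kernel as a per-input relevance measure: inputs that barely affect the output get long lengthscales, influential inputs get short ones. All data and names here are illustrative.

```python
# Hypothetical sketch: ARD lengthscales as a crude per-input sensitivity
# measure. This is NOT Ax's sensitivity API, just the underlying idea.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
# Only inputs 0 and 2 actually matter in this toy function.
y = np.sin(4 * X[:, 0]) + 2.0 * X[:, 2] + 0.01 * rng.standard_normal(200)

# One lengthscale per input dimension (ARD).
kernel = RBF(length_scale=np.ones(5), length_scale_bounds=(1e-2, 1e3))
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
gp.fit(X, y)

lengthscales = gp.kernel_.length_scale
sensitivity = 1.0 / lengthscales  # shorter lengthscale => more influential
print(np.argsort(sensitivity)[::-1])  # most influential inputs first
```

This kind of fit is exactly what stops scaling at ~60k inputs: the kernel has one lengthscale per dimension, and the GP needs far more data than dimensions to identify them.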
Hi @CCranney, interesting problem! If you are interested in performing feature selection or identifying sparse solutions to the optimization problem, Sparsity Exploration Bayesian Optimization (SEBO) might be an option, and it is already available in Ax: https://ax.dev/tutorials/sebo.html. However, handling 7k outputs might not be feasible with the (noisy) expected hypervolume improvement acquisition function if there's no easy way to explicitly reduce them down to a few metrics to optimize.

Additionally, as @saitcakmak pointed out, our recent work should enable the use of a large number of intermediate outcomes to improve optimization performance, but it'd still require some scalar value quantifying how good the outcomes are. If it is hard to define such a quantity explicitly, it can also be estimated through preference learning (https://botorch.org/tutorials/bope). Mixing and matching these methods might be an interesting approach to this problem.

That said, the above modifications might take some work to enable in Ax, and it'd be easier to implement them directly in BoTorch if you are interested in mixing and matching these methods. If you are just lookingking for something already available in Ax, I'd recommend either:
Thank you all for contributing! I should clarify one point: while the original problem has ~60k inputs and ~7k outputs, for this process of eliminating unnecessary inputs I am focusing on just a single output. While I would also be interested in eliminating inputs that have little to no bearing on all of the outputs generally (one day), my request was for doing so while narrowing the focus to a specific output of interest. So things like SEBO and BOPE might be exactly what I was looking for. I'm going to research your responses in greater depth and get back to you, thank you!
Handling the 60k inputs directly will be challenging. That said, I know that @dme65 has had some success with feature selection approaches where the optimization is performed over a lower-dimensional space that describes the parameters of other, more heuristic feature selection methods. It's kind of an ensembling approach, if you will, where you optimize over parameters
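A hypothetical sketch of that ensembling idea: instead of exposing 60k binary parameters to the optimizer, the outer loop (e.g. Ax) tunes only a couple of knobs, such as how many features to keep and which ranking heuristic to use, while a cheap selector picks the actual features. The function and parameter names below are illustrative, not an Ax or BoTorch API.

```python
# Hypothetical sketch: the outer optimizer tunes only (k, score_name);
# a cheap univariate heuristic does the actual feature selection.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def evaluate_selector(X, y, k, score_name):
    """Objective an outer BO loop could call: select the top-k features
    with the chosen heuristic, then score a downstream model via CV."""
    score_fn = {"f_regression": f_regression,
                "mutual_info": mutual_info_regression}[score_name]
    X_sel = SelectKBest(score_fn, k=k).fit_transform(X, y)
    return cross_val_score(Ridge(), X_sel, y, cv=3).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 50))
# Only features 0 and 7 carry signal in this toy data.
y = X[:, 0] - 2.0 * X[:, 7] + 0.1 * rng.standard_normal(150)

# An outer loop would search over (k, score_name); here one evaluation:
print(evaluate_selector(X, y, k=5, score_name="f_regression"))
```

The search space seen by the optimizer is now two-dimensional regardless of how many raw inputs there are, which is what makes the approach tractable at 60k inputs.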
I think my questions have been largely answered. I may re-open this if more ideas/questions come up in the future, but I'll close this for now. Thank you again!
Hi,
I'm playing around with using Ax to perform an NAS. My experiment is a regression analysis that involves ~60k inputs and ~7k outputs. We've had moderate success creating a model that uses all the inputs to predict all the outputs. However, as we are interested in identifying which inputs have the largest impact on which outputs, I've noticed that programs like SHAP don't work as well as they have in my previous projects. It looks like the sheer number of inputs dilutes SHAP values across many inputs, making it challenging to parse out more direct impacts.
I am researching alternatives to SHAP, but in considering a way around this problem, I wondered if I could design an NAS that chose inputs in a binary fashion to predict a single one of the outputs. I would implement this in a multi-objective approach, where both accurate validation prediction and a low number of input features are optimized. My experiments on this front have not gone well. The number of chosen features remains at ~half of all possible features across the NAS (it doesn't learn to "drop" useless inputs), and attempts to add parameter constraints have fared no better. For the latter, I assume parameter constraints were designed with perhaps 2-10 parameters in mind, whereas I am applying a constraint to thousands.
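For concreteness, the binary-mask formulation described above can be sketched as a toy single-objective variant, where the feature count is folded into the objective as an explicit sparsity penalty rather than handled through a parameter constraint. All names and the penalty weight here are illustrative assumptions, not anything from Ax.

```python
# Toy sketch of the binary-mask idea: score = CV accuracy minus a
# penalty on the number of kept inputs (an explicit sparsity term,
# in place of a hard parameter constraint). Purely illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
# Only inputs 4 and 12 actually matter in this toy regression.
y = 3.0 * X[:, 4] + X[:, 12] + 0.1 * rng.standard_normal(200)

def masked_objective(mask, lam=0.01):
    """Higher is better: CV score of a model on the masked inputs,
    minus lam * (number of inputs kept)."""
    if mask.sum() == 0:
        return -np.inf  # an empty mask has nothing to fit
    score = cross_val_score(Ridge(), X[:, mask.astype(bool)], y, cv=3).mean()
    return score - lam * mask.sum()

dense = np.ones(30, dtype=int)                      # keep everything
sparse = np.zeros(30, dtype=int); sparse[[4, 12]] = 1  # only the true inputs
print(masked_objective(dense), masked_objective(sparse))
```

With the penalty in place, the sparse mask scores higher than the dense one, which is the gradient of preference an optimizer would need in order to learn to drop useless inputs; in my runs the multi-objective version never developed that preference.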
Before writing this off as a lost cause, I wondered if you have encountered a similar research case before, and if my experimental attempts are reinventing the wheel (badly). It is clear my experiments have been testing the program's functionality with unexpected and inappropriately extreme examples, and I am holding out some hope that a better solution exists.
Is there any functionality you know of that would cherry-pick specific impactful inputs to a simple neural network to evaluate which ones have the strongest impact in predicting a single output value?