[AIR][Train] Multilabel classification with SklearnTrainer
#32732
Comments
Yes, it does support multilabel classification. Here's an example you can build off of:
import ray
from ray.train.sklearn import SklearnTrainer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestClassifier
train_dataset = ray.data.from_items([{"x": x, "y": x % 3} for x in range(32)])
trainer = SklearnTrainer(
    estimator=OneVsRestClassifier(RandomForestClassifier()),
    datasets={"train": train_dataset},
    label_column="y",
    scaling_config=ray.air.config.ScalingConfig(trainer_resources={"CPU": 4}),
)
result = trainer.fit()
By the way, what was the error you were encountering?
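For reference, the `y = x % 3` column in the example above is a single label column with three classes, which scikit-learn treats as multiclass rather than multilabel. A minimal sketch of the difference, using only scikit-learn and NumPy (the small indicator matrix is made up for illustration and is not from the reporter's dataset):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# One label column with three possible values, as in `x % 3` above.
y_multiclass = np.array([x % 3 for x in range(32)])

# Hypothetical multilabel target: one binary column per label,
# and a single row may have several labels switched on at once.
y_multilabel = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])

multiclass_kind = type_of_target(y_multiclass)
multilabel_kind = type_of_target(y_multilabel)
print(multiclass_kind, multilabel_kind)
```

A multilabel problem therefore needs several label columns per row, which is what a single `label_column` string cannot describe.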
@justinvyu thanks for the clarification. The dataset will look like the one below.
I merged the target column, as I used a similar approach.
I am getting the error below.
Please check the above comment.
Hi @srimantacse,
Got it, I misunderstood the original question. The problem is that [...]
The problem
Multilabel classification algos provided by sklearn (ex: [...] The [...] I was able to get it working by just adding one line here:
I'm not sure if this is the most robust solution. Would you like to open up a PR for this? I can help guide you through it! If not, I can also put this on my backlog.
Workaround
As for a current workaround: instead of using an [...]
import os
from ray import tune
from sklearn.multiclass import OneVsRestClassifier

num_cpus = 4  # assumed value; set this to the CPUs available to the trial

def train_fn(config):
    # see https://scikit-learn.org/stable/computing/parallelism.html
    os.environ["OMP_NUM_THREADS"] = str(num_cpus)
    os.environ["MKL_NUM_THREADS"] = str(num_cpus)
    os.environ["OPENBLAS_NUM_THREADS"] = str(num_cpus)
    os.environ["BLIS_NUM_THREADS"] = str(num_cpus)
    dataset = ...
    estimator = OneVsRestClassifier(...)
    estimator.fit()  # pass the features and labels loaded from the dataset

tuner = tune.Tuner(train_fn)
results = tuner.fit()
Let me know if that makes sense and works for what you're trying to do.
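To make the elided `dataset = ...` and `estimator.fit()` steps concrete, here is a minimal sketch of the fitting step on its own, outside Ray. It assumes the labels are already a 2D binary indicator array with one column per label; the synthetic data and the `fit_multilabel` helper are hypothetical stand-ins, not part of Ray or of the reporter's code.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier


def fit_multilabel(features, labels):
    """Fit one random forest per label column (hypothetical helper)."""
    estimator = OneVsRestClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0)
    )
    # `labels` has shape (n_samples, n_labels) with 0/1 entries.
    estimator.fit(features, labels)
    return estimator


# Synthetic stand-in for the real dataset: 200 rows, 8 features, 3 labels.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=8, n_classes=3, random_state=0
)
model = fit_multilabel(X, Y)
predictions = model.predict(X[:5])
print(predictions.shape)
```

Something along these lines could be placed inside `train_fn`, with the real features and label columns in place of the synthetic `X` and `Y`.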
Thanks a lot, @justinvyu. Just one question: to open a PR, do I need to add any tag?
Description
In Ray 2.x, we have SklearnTrainer support.
Q1: Does it support a multilabel classification approach like OneVsRestClassifier?
The SklearnTrainer class has a constructor argument called label_column. Does it accept a list of labels there? I tried, but it is not working.
Use case
No response