Not able to fit metaclassifier with StackingCVClassifier #605
Comments
I can confirm, there seems to be an issue. For instance, the following self-contained example works fine for the StackingClassifier:
from sklearn import model_selection
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from mlxtend.classifier import StackingClassifier
from mlxtend.classifier import StackingCVClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import fetch_20newsgroups
import numpy as np
categories = ['alt.atheism', 'soc.religion.christian',
              'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train',
                                  categories=categories, shuffle=True, random_state=42)
X_data = twenty_train.data
y = twenty_train.target
lr1 = LogisticRegression()
lr2 = LogisticRegression()
lr3 = LogisticRegression()
words = make_pipeline(
    CountVectorizer(analyzer='word', token_pattern=r'\w{1,}', max_features=5000),
    lr1)
pos = make_pipeline(
    CountVectorizer(binary=True, ngram_range=(2, 3),
                    max_features=5000),
    lr2)
sclf = StackingClassifier(classifiers=[words, pos], meta_classifier=lr3)
sclf.fit(X_data, y)
However, for the StackingCVClassifier, fitting fails:
scvclf = StackingCVClassifier(classifiers=[words, pos], meta_classifier=lr3, cv=5)
scvclf.fit(X_data, y)
Looks like it is passing the |
For the example I posted above, disabling the input checking will solve the issue. The problem was that the inputs were checked before they were passed to the pipeline. Since the input data is text, the check fails before the vectorizer ever runs. So there's currently no good way to check inputs if pipelines are used. Not sure if it will solve the DataFrame issue, though. I will let you know once I have merged it into master. |
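To illustrate the ordering problem described above, here is a minimal pure-Python sketch. It does not use the mlxtend or scikit-learn code paths; `check_numeric` and `count_words` are hypothetical stand-ins for a numeric input validator and a vectorizer, and the documents and vocabulary are made up for the example. It only shows that a numeric check rejects raw text, while the same check passes once the text has gone through the vectorizing step.

```python
def check_numeric(X):
    """Hypothetical stand-in for an input validator that expects numeric data."""
    # float() raises ValueError for anything that is not number-like.
    return [[float(value) for value in row] for row in X]


def count_words(docs, vocabulary):
    """Toy vectorizer: count how often each vocabulary word occurs per document."""
    return [[doc.lower().split().count(word) for word in vocabulary]
            for doc in docs]


# Made-up example data, not taken from the 20 newsgroups dataset.
docs = ["graphics card driver", "church service on sunday"]
vocabulary = ["graphics", "church"]

# Checking the raw text first, as a validator placed in front of the
# pipeline would, fails because the strings are not numeric.
try:
    check_numeric([[doc] for doc in docs])
    raw_text_passes_check = True
except ValueError:
    raw_text_passes_check = False

# Checking after the vectorizing step works, because the data is numeric by then.
features = check_numeric(count_words(docs, vocabulary))

print(raw_text_passes_check)
print(features)
```

In this toy setup the validator only makes sense after the transformation step, which is why an up-front check cannot be applied when the base classifiers are pipelines that take raw text.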
Great, thanks! |
Alright, the changes should be in the master branch now. Can you install the latest dev version via
and give it another try? |
Done, it works now. Thanks! |
awesome, thanks for letting me know. |
Hi there,
I'm trying to perform text classification with stacking. I'm new to ML, so apologies if this is a silly question.
I'm trying to train the same algorithm, LogisticRegression, on different textual features to create different classifiers, and then use a meta-classifier (also LogisticRegression) to combine them. The features I'm using are the words in the text and the corresponding part-of-speech (POS) tags.
The classifier that uses words as a feature is defined with the following pipeline:
lr = LogisticRegression()
The classifier that uses POS as a feature is defined with the following pipeline:
Finally, the metaclassifier is defined this way:
The problem comes when I try to train the classifier:
Words and POS are fitted, but the Stack classifier is not and I get the following error:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
X_train is a DataFrame with a column "text" that contains the raw text and a column "pos" that contains the raw POS tags. That's why I apply the necessary transformations through the pipelines.
When I try the same with the StackingClassifier method, I don't have this problem.
Any idea about what's going wrong?
Thanks!