Error while performing Binary Relevance or Label Powerset #89
Comments
As in #88, I cannot reproduce your problem; it seems to be caused by your data.
Without knowing the data, I assume your dataframe contains an unusable column, which causes the exception. I strongly recommend setting the proper pandas dtypes after loading the data set, to avoid mistakes further down the rabbit hole (such as assuming a Gaussian distribution on the features, as your code does). Without the data I'm afraid I can't help you; I'm pretty sure, though, that it's an error on your side. Is it publicly available?
PS: @souravsingh's issue #88 seems to use the same code as yours. If you two work together, I'd propose closing one issue to keep the open issues at a minimum.
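A minimal sketch of the dtype check suggested above, assuming a toy dataframe; the column names `sentence`, `label_a` and `label_b` and the helper `non_numeric_columns` are hypothetical and do not come from the reporter's data. It lists the columns that are not numeric and repairs one that merely stores numbers as strings:

```python
import pandas as pd

# Hypothetical toy frame: one free-text column and two label columns,
# one of which was accidentally loaded as strings.
df = pd.DataFrame({
    "sentence": ["good movie", "bad plot", "great cast"],
    "label_a": [1, 0, 1],
    "label_b": ["0", "1", "1"],
})

def non_numeric_columns(frame):
    """Return the names of columns whose dtype is not numeric."""
    return [c for c in frame.columns
            if not pd.api.types.is_numeric_dtype(frame[c])]

print(non_numeric_columns(df))  # both 'sentence' and 'label_b' show up here

# Numbers stored as strings can be converted explicitly.
df["label_b"] = pd.to_numeric(df["label_b"])
print(non_numeric_columns(df))  # only the free-text column is left
```

Whatever is still listed after the conversion is real text and needs feature extraction rather than a dtype cast.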
@ChristianSch: Yes, it's similar to his. However, I did try changing the types after loading the dataset. I also tried converting the data into a list and then into an ndarray or a sparse matrix. However, it just throws an error about dtype('O'), which is an object. Also, the documentation states that using dtype=str assigns dtype=object. The values in the column I am using for training (a bunch of sentences) are all of type object.
@souravsingh Could you please post a snippet of how you got it working? I still cannot. I can upload my raw dataset if you need it.
Edit: I have also tried it with DecisionTree and RandomForest!
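As a rough illustration of why changing the container does not help, here is a sketch with two made-up sentences (not the reporter's data): the array keeps dtype object, and casting it to float fails because the elements are still strings.

```python
import numpy as np

# Hypothetical sentences standing in for the text column.
sentences = ["the cat sat", "dogs bark loudly"]

# Converting the list to an ndarray changes the container, not the content.
arr = np.asarray(sentences, dtype=object)
print(arr.dtype)  # object, i.e. dtype('O')

# A numeric cast fails because each element is still a Python string.
try:
    arr.astype(float)
    converted = True
except ValueError as exc:
    converted = False
    print("cast failed:", exc)
```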
My code is something like this:
import pandas as pd
import numpy as np
from sklearn.svm import SVC
#from sklearn.multioutput import ClassifierChain
from sklearn.naive_bayes import GaussianNB
from modlamp.sequences import MixedLibrary
from sklearn.model_selection import train_test_split
from skmultilearn.problem_transform import BinaryRelevance, LabelPowerset, ClassifierChain
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score
import xgboost as xgb
from mlxtend.classifier import StackingClassifier
data = pd.read_csv("full_dataset.csv")
y = data[['antiviral','antifungal','antibacterial']]
to_drop = ['# ID','Sequence','antiviral', 'antibacterial', 'antifungal']
X = data.drop(to_drop,axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
clf = LabelPowerset(xgb.XGBClassifier(n_estimators=500))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred)
# scikit-learn metrics expect the true labels first, then the predictions
print("The macro averaged F1-score is: %.3f" % f1_score(y_test, y_pred, average='macro'))
While the code doesn't work with XGBoost as the base classifier for some reason, it works with all the scikit-learn classifiers.
That is exactly the same, except that I have been trying to do it with scikit-learn classifiers. I am assuming your columns antiviral, antifungal and antibacterial hold binary values, and that X contains the columns with your features (in my case it is only one column, with sentences). Could you please tell me where I might be going wrong? Otherwise, we could inspect my dataset if you wish.
If you provide some data and the relevant preprocessing code, I'll take it for a spin!
Here you go, this is the code:
I have attached the CSV file as well.
Well, you can't just throw your model at textual data. See here for reference: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Also, this is not related to scikit-multilearn, so I'm closing the issue.
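For illustration, a minimal sketch of turning sentences into numeric features before fitting a multi-label model. Assumptions: the texts and the two label columns are invented, and scikit-learn's `OneVsRestClassifier` is swapped in for scikit-multilearn's `BinaryRelevance` (with a binary indicator matrix it likewise fits one binary classifier per label). This is not the reporter's code.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical sentences and a binary indicator matrix with two labels.
texts = [
    "cheap pills online",
    "meeting at noon",
    "cheap meeting room",
    "buy pills now",
    "project meeting notes",
    "online pills sale",
]
Y = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1], [1, 0]])

# The vectorizer maps each sentence to a sparse numeric row; only then
# does the classifier see the data.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
model.fit(texts, Y)

pred = model.predict(["cheap pills"])
print(pred.shape)  # one row per input sentence, one column per label
```

Keeping the vectorizer inside the pipeline means the same vocabulary is applied at prediction time, so raw strings can be passed to both `fit` and `predict`.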
I am trying to perform a simple classification using Binary Relevance or Label Powerset. I consistently encounter the error despite also trying to convert it into a sparse matrix. How do I overcome this?
Here is my code:
However, I always get this: