Label and Cluster evaluation? #105
Comments
You can't use a clustering evaluator on a classification dataset. You washed all the types off, which removes the guarantees about the dataset, and the only reason it didn't throw a
If you want to compare how the clusters line up against classification labels then you should write a new
Looks like k-modes is for categorical data. We've not looked into that algorithm. We're considering adding k-medoids, which guarantees that the cluster centre is a datapoint rather than the mean of the cluster, but we've not implemented it yet.
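To make "how the clusters line up against classification labels" concrete, here is a minimal sketch in plain Java that builds a cluster-by-label contingency table and computes cluster purity. It is not Tribuo code: the class name `ClusterLabelComparison`, its methods, and the sample data are all hypothetical, and it assumes you have already extracted the cluster ids and the true labels into parallel arrays.

```java
import java.util.Map;
import java.util.TreeMap;

/** Compares cluster assignments with known class labels (plain Java, no Tribuo types). */
final class ClusterLabelComparison {

    /** Counts how often each label falls into each cluster: cluster id -> (label -> count). */
    static Map<Integer, Map<String, Integer>> contingency(int[] clusterIds, String[] labels) {
        if (clusterIds.length != labels.length) {
            throw new IllegalArgumentException("clusterIds and labels must have the same length");
        }
        Map<Integer, Map<String, Integer>> table = new TreeMap<>();
        for (int i = 0; i < clusterIds.length; i++) {
            table.computeIfAbsent(clusterIds[i], k -> new TreeMap<>())
                 .merge(labels[i], 1, Integer::sum);
        }
        return table;
    }

    /** Purity: the fraction of points that carry the majority label of their cluster. */
    static double purity(int[] clusterIds, String[] labels) {
        int majorityTotal = 0;
        for (Map<String, Integer> counts : contingency(clusterIds, labels).values()) {
            int best = 0;
            for (int c : counts.values()) {
                best = Math.max(best, c);
            }
            majorityTotal += best;
        }
        return clusterIds.length == 0 ? 0.0 : (double) majorityTotal / clusterIds.length;
    }

    public static void main(String[] args) {
        // Made-up assignments for six points; this is not real k-means output on Iris.
        int[] clusters = {0, 0, 1, 1, 2, 2};
        String[] labels = {"Iris-setosa", "Iris-setosa", "Iris-versicolor",
                           "Iris-virginica", "Iris-virginica", "Iris-virginica"};
        System.out.println(contingency(clusters, labels));
        System.out.println("purity = " + purity(clusters, labels));
    }
}
```

Purity does not depend on which number k-means happens to give each cluster, which matters because cluster ids are arbitrary. It does reward having many small clusters, so it is only a rough check.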
OK, so could you share some code snippets showing how to use ResponseProcessor and ClusteringFactory to get what I want? There are so many classes and interfaces that I'm a bit lost in the docs.
Something like this should be sufficient. I've not tested it, and it's not going to be part of Tribuo.

```java
// Imports assume the Tribuo 4.x and OLCUT package layout.
import com.oracle.labs.mlrg.olcut.config.Config;
import com.oracle.labs.mlrg.olcut.provenance.ConfiguredObjectProvenance;
import com.oracle.labs.mlrg.olcut.provenance.impl.ConfiguredObjectProvenanceImpl;
import org.tribuo.OutputFactory;
import org.tribuo.clustering.ClusterID;
import org.tribuo.clustering.ClusteringFactory;
import org.tribuo.data.columnar.ResponseProcessor;

import java.util.Optional;

public class IrisClusterResponseProcessor implements ResponseProcessor<ClusterID> {

    @Config(mandatory = true, description = "The field name to read.")
    private String fieldName;

    private final OutputFactory<ClusterID> outputFactory = new ClusteringFactory();

    // Private no-args constructor for the configuration system.
    private IrisClusterResponseProcessor() {}

    public IrisClusterResponseProcessor(String fieldName) {
        this.fieldName = fieldName;
    }

    @Override
    public OutputFactory<ClusterID> getOutputFactory() {
        return outputFactory;
    }

    @Override
    public String getFieldName() {
        return fieldName;
    }

    @Deprecated
    @Override
    public void setFieldName(String fieldName) {
        this.fieldName = fieldName;
    }

    // Maps each Iris species name to a fixed cluster id; unknown values yield no output.
    @Override
    public Optional<ClusterID> process(String value) {
        if ("Iris-setosa".equals(value)) {
            return Optional.of(outputFactory.generateOutput("0"));
        } else if ("Iris-versicolor".equals(value)) {
            return Optional.of(outputFactory.generateOutput("1"));
        } else if ("Iris-virginica".equals(value)) {
            return Optional.of(outputFactory.generateOutput("2"));
        } else {
            return Optional.empty();
        }
    }

    @Override
    public ConfiguredObjectProvenance getProvenance() {
        return new ConfiguredObjectProvenanceImpl(this, "ResponseProcessor");
    }
}
```
OK, so I need to do this every time I supply a dataset. Thanks. Are you planning anything to automate this kind of thing? I've just finished writing my own uniqueFeatureEncoder class by implementing FieldProcessor. It was easy, but it would be good if the API offered parameter options for this, e.g. "binary" for a binarised feature and "real" for real-valued data. Anyway, thanks for your help and the quick support.
Well, in general you shouldn't try to feed classification data to a clustering task. Tribuo is set up to prevent users from confusing the prediction tasks like that, so it needs some tricks to make it work. We add new implementations of
OK, one last thing before wrapping up: what should I choose to turn my categorical feature into a unique encoding, as just one feature rather than a feature for every class? Should I use REAL, CATEGORICAL, or something else?
I don't understand the question, could you give an example?
For example, I have a column shirt size = [tiny, medium, large] and I want to encode it numerically, not as one feature per class but as a single feature with the classes labelled as numbers [1=tiny, 2=medium, 3=large]. Since I've written my own RowProcessor, which should I use: GeneratedFeatureType.BINARISED_CATEGORICAL, GeneratedFeatureType.REAL, or GeneratedFeatureType.CATEGORICAL? There's also GeneratedFeatureType.TEXT. Which one should I choose?
It's GeneratedFeatureType.CATEGORICAL. We're working on further changes to Tribuo's internal type system, as at the moment that enum only really interacts with the LIME explanation module, but in the future it will control how the feature statistics are computed.
Yes, I'm waiting for it. Thank you for your great support.
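The single-feature encoding discussed in this exchange (tiny=1, medium=2, large=3) can be sketched in plain Java as below. This is only an illustration of the mapping, not a Tribuo FieldProcessor: the class name `OrdinalEncoder` and its methods are hypothetical. Presumably the same lookup would sit inside your own processor, with the feature tagged GeneratedFeatureType.CATEGORICAL as per the answer in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Maps each category string to one integer code, so the column stays a single feature. */
final class OrdinalEncoder {
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    /** Categories are numbered 1, 2, 3, ... in the order they are given. */
    OrdinalEncoder(String... orderedCategories) {
        for (String category : orderedCategories) {
            if (codes.putIfAbsent(category, codes.size() + 1) != null) {
                throw new IllegalArgumentException("Duplicate category: " + category);
            }
        }
    }

    /** Returns the code for a known category, or empty for a value that was never registered. */
    OptionalInt encode(String value) {
        Integer code = codes.get(value);
        return code == null ? OptionalInt.empty() : OptionalInt.of(code);
    }

    public static void main(String[] args) {
        OrdinalEncoder shirtSize = new OrdinalEncoder("tiny", "medium", "large");
        System.out.println(shirtSize.encode("medium"));
        System.out.println(shirtSize.encode("huge"));
    }
}
```

Integer codes imply an ordering between the categories. That is reasonable for sizes, but for unordered categories a distance-based method such as k-means will treat the codes as if the order were meaningful.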
Hey, I'm playing around with k-means clustering, and at evaluation time I was working with our old friend, the Iris data. I'm new to unsupervised learning, so here's my question:
How do I evaluate clusters against the label column ('species' in the case of Iris)?
Here's what I'm doing, simple stuff to get familiar with clustering in Tribuo:
And I'm getting this:
How do I evaluate and see whether my clusters correctly classified the classes or not?
One more thing: I've heard about k-modes clustering. Does anything like that exist in Tribuo for now?