Method to retrieve predictions from SSC #4

Open
jllavin77 opened this issue Jan 11, 2022 · 5 comments

Comments


jllavin77 commented Jan 11, 2022

Dear developers,

I was looking for a semi-supervised ML method in R and found your excellent package. I tried your example code, adapting it to my input data, and after some reformatting it appears to work well. My problem is how to access the prediction results for each row of my input table.
This may sound naive, but I can't find the code to access the class assigned to each of the "unlabeled" rows in my table by any of the methods used in your vignette's example code.
I can access the summary of how many samples have been assigned to each class, but I'd like to know how to access each row's individual class/label prediction (in data frame format, for instance).
I hope I have explained this request clearly enough.
Thanks in advance, and congratulations on your nice work.

@jllavin77 jllavin77 changed the title How to obtain predictions How to retrieve predictions Jan 12, 2022
@jllavin77 jllavin77 changed the title How to retrieve predictions How to retrieve predictions from SSC Jan 12, 2022
@jllavin77 jllavin77 changed the title How to retrieve predictions from SSC Method to retrieve predictions from SSC Jan 12, 2022
@mabelc
Owner

mabelc commented Jan 13, 2022

Thanks for your interest!
Please use the predict method and supply the instances that were unlabeled during training. That way you use the transductive capabilities of the model, because those instances were also seen during training.
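To make "the instances that were unlabeled during training" concrete: if the unlabeled rows are marked with NA in the label vector, they can be picked out with is.na. This is only a minimal base-R sketch on toy data; the names y.demo, x.demo, unl.idx and x.unlabeled are made up for illustration, and the predict call is left as a comment because no model is fitted here.

```r
# Toy data (hypothetical): NA marks the unlabeled rows
y.demo <- factor(c("a", NA, "b", NA, "a"))
x.demo <- matrix(1:10, nrow = 5, dimnames = list(NULL, c("f1", "f2")))

# Positions of the rows that have no label
unl.idx <- which(is.na(y.demo))

# Feature rows to hand to predict(); drop = FALSE keeps a matrix even for one row
x.unlabeled <- x.demo[unl.idx, , drop = FALSE]

# With a fitted ssc model `m` (not created in this sketch), the call would be:
# pred <- predict(m, x.unlabeled)
```

Keeping unl.idx around is what later lets you line each prediction up with its row in the original table.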
Hope I helped. If you still have questions don't hesitate to ask.

@jllavin77
Author

My question is more about having a function that returns that information in table format. Using predict doesn't provide that info. You suggest using predict on my unlabeled data, but which model should I use for that prediction?
Could you provide a code example? Is it something similar to this snippet?

######################REDUCED CODE######################

m <- selfTraining(x = xtrain, y = ytrain, learner = knn3, learner.pars = list(k = 1))

pred <- predict(m, xitest, interval="confidence")

summary(pred)

Once I carry out this prediction, how do I get the data I'm really looking for? This way I end up with a summary of the predictions, but no clue about which label corresponds to each row. Do you see what I mean?

@mabelc
Owner

mabelc commented Jan 15, 2022

I think I understand what you are looking for. Could you please try this code? If it does not solve your problem, please keep asking!

library(ssc)   # selfTraining
library(caret) # knn3

##Load Iris data set
data(iris)

x <- iris[, -5] # instances without classes
x <- as.matrix(x)
y <- iris$Species

##Prepare data, use 50% of instances for training
set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx] # classes of training instances

##Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

##train selftraining with base classifiers knn3
m <- selfTraining(x = xtrain, y = ytrain, learner = knn3, learner.pars = list(k = 1))

##transductive test
##it's called transductive because we want to predict the instances that were unlabeled during the training
xttest = xtrain[tra.na.idx,]
pred.label <- predict(m, xttest)

##creating a matrix with the training data unlabeled + predicted labels by selftraining-knn3
xttest <- cbind(xttest, pred.label)
xttest
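One caveat about the cbind call above, in case it matters for your table: cbind on a numeric matrix and a factor stores the factor as its internal integer codes, so the label column shows 1, 2, 3 rather than the class names. A data.frame keeps the names. Here is a small base-R sketch with stand-in values (x.demo and pred.demo are hypothetical and only mimic xttest and pred.label):

```r
# Stand-in for a few rows of xttest (hypothetical values)
x.demo <- matrix(c(5.1, 7.0, 6.3,
                   3.5, 3.2, 3.3),
                 nrow = 3,
                 dimnames = list(NULL, c("Sepal.Length", "Sepal.Width")))

# Stand-in for the factor returned by predict()
pred.demo <- factor(c("setosa", "versicolor", "virginica"))

# cbind gives a numeric matrix: the labels are replaced by integer codes
as.mat <- cbind(x.demo, pred.label = pred.demo)

# data.frame keeps the labels as a factor column with class names
as.df <- data.frame(x.demo, pred.label = pred.demo)

# If the matrix was already built, the names can be recovered from the levels
recovered <- levels(pred.demo)[as.mat[, "pred.label"]]
```

If you want the result "in dataframe format", the data.frame form is probably the one to keep.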

@jllavin77
Author

Dear @mabelc,

Thank you very much for your piece of code. It works, and was exactly what I was asking for.

Just one more question: I have read the selfTraining function documentation and cannot figure out how to change the learner parameter from knn3 to random forest, SVM or any other classifier. Is there a list of the available classifiers explained somewhere?

Thanks in advance for your kind help.

@mabelc
Owner

mabelc commented Jan 22, 2022

Hi,

In this paper https://cran.r-project.org/web/packages/ssc/vignettes/ssc.pdf you can find many examples with different learners. I have modified the previous example to use SVM as the learner. Basically, you can use learners from the R ecosystem; the generic functions provided will help you with that. In the example I am using the generic version of selfTraining, named selfTrainingG.

library('ssc')
library('e1071')

##Load Iris data set
data(iris)

x <- iris[, -5] # instances without classes
x <- as.matrix(x)
y <- iris$Species

##Prepare data, use 50% of instances for training
set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx] # classes of training instances

##Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

##wrapper functions to train a SVM
gen.learner <- function(indexes, cls)
  e1071::svm(x = xtrain[indexes, ], y = cls, type='C-classification', probability=TRUE)

gen.pred <- function(model, indexes){
  p <- predict(model, xtrain[indexes, ], probability=TRUE)
  attr(p, "probabilities")
}

##train generic selftraining with SVM as base classifier
m <- selfTrainingG(y = ytrain, gen.learner, gen.pred)

##transductive test
##it's called transductive because we want to predict the instances that were unlabeled during the training
xttest = xtrain[tra.na.idx,]
pred.label <- predict(m$model, xttest)

##creating a matrix with the training data unlabeled + predicted labels by selftraining-SVM
xttest <- cbind(xttest, pred.label)
xttest
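Since random forest was also asked about, the same two-wrapper pattern could probably be reused for it. What follows is only a sketch and has not been run against this thread's data. It assumes the randomForest package is installed, and rf.learner, rf.pred, m.rf and result are illustrative names. It also assumes every class still has at least one labeled instance, because randomForest does not accept empty classes.

```r
library(ssc)           # selfTrainingG
library(randomForest)  # assumption: installed; not used elsewhere in this thread

data(iris)
x <- as.matrix(iris[, -5])
y <- iris$Species

# Same split as in the SVM example: 50% for training, 70% of those unlabeled
set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx, ]
ytrain <- y[tra.idx]
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA

# Wrapper that trains a random forest on the rows selected by ssc
rf.learner <- function(indexes, cls)
  randomForest::randomForest(x = xtrain[indexes, , drop = FALSE], y = cls)

# Wrapper that returns class probabilities, one column per class;
# unclass() strips any extra class attribute so a plain matrix is returned
rf.pred <- function(model, indexes)
  unclass(predict(model, xtrain[indexes, , drop = FALSE], type = "prob"))

m.rf <- selfTrainingG(y = ytrain, rf.learner, rf.pred)

# Transductive predictions for the rows that were unlabeled during training
xttest <- xtrain[tra.na.idx, ]
pred.label <- predict(m.rf$model, xttest)

# data.frame keeps the predicted labels as class names
result <- data.frame(xttest, pred.label)
head(result)
```

The only parts that change from the SVM version are the two wrapper functions, so other classifiers that can return a class-probability matrix should fit the same way.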
