how to predict on new data? #9

Closed
erdnaxel opened this issue Dec 17, 2020 · 5 comments

@erdnaxel

hello:

love the package!!

i’m wondering how to apply the model to new data?

@koheiw
Owner

koheiw commented Dec 23, 2020

Hi @erdnaxel

In the original GibbsLDA++, topics of unseen documents are inferred in another round of Gibbs sampling. I haven't implemented this function because I didn't think many people separate the fitting and prediction steps with LDA.

With the current version, you can still predict topics of unseen documents using the distribution of words over topics (phi). Here, x should be a fitted LDA object and newdata should be a DFM (document-feature matrix).

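predict <- function(x, newdata = NULL) {
    # Use newdata when it is supplied; otherwise fall back to the training data
    if (!is.null(newdata)) {
        data <- newdata
    } else {
        data <- x$data
    }
    # Align the features of the DFM with the columns of phi
    data <- dfm_match(data, colnames(x$phi))
    # Score each document against each topic (documents x topics)
    temp <- data %*% t(x$phi)
    # Pick the highest-scoring topic per document (max.col breaks ties at random by default)
    result <- factor(max.col(temp), labels = rownames(x$phi),
                     levels = seq_len(nrow(x$phi)))
    # Documents with no matching features get no topic
    result[rowSums(data) == 0] <- NA
    return(result)
}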

Please be aware that the result of predict() can differ from that of topics() because the two rely on different algorithms.
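
To make the scoring step in the function above concrete, here is a minimal base R sketch that needs neither quanteda nor a fitted model. The phi matrix, the word names, and the document counts are all made up for illustration; a real phi would come from the fitted LDA object, and a real newdata would be a DFM passed through dfm_match(). Ties are broken with ties.method = "first" only so that the toy result is reproducible, which is an assumption of this sketch and not what the function above does.

```r
# Hypothetical topic-word matrix: 2 topics x 3 words (each row sums to 1)
phi <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.2, 0.7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("topic1", "topic2"),
                              c("apple", "bank", "court")))

# Hypothetical word counts for 3 new documents; d3 has no known words
newdata <- matrix(c(3, 1, 0,
                    0, 1, 4,
                    0, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("d1", "d2", "d3"), colnames(phi)))

# Documents x topics score: word counts weighted by each topic's word probabilities
score <- newdata %*% t(phi)

# Highest-scoring topic per document, as a factor over the topic names
pred <- factor(max.col(score, ties.method = "first"),
               levels = seq_len(nrow(phi)),
               labels = rownames(phi))

# Documents without any known words cannot be assigned a topic
pred[rowSums(newdata) == 0] <- NA

print(score)
print(pred)
```

Here d1 scores 2.3 for topic1 against 0.5 for topic2, d2 scores 0.6 against 3.0, and d3 is set to NA because it contains none of the model's words.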

@tomseinen

Came here for the same question as @erdnaxel.
I think implementing the predict function will be much appreciated.

Great work!

@erdnaxel
Author

thank you, i really appreciate the response! i will try it out as soon as i can.

@koheiw koheiw mentioned this issue Dec 30, 2020
@koheiw
Owner

koheiw commented Dec 30, 2020

Guys, I created predict() in the issue-9 branch. Please give it a try.

@koheiw
Owner

koheiw commented Dec 30, 2020

I am closing this because the branch has been merged, so please open a new issue if there are problems.
