Add information on reference documents to predict.textmodel_wordscores #1229

Closed
stefan-mueller opened this issue Feb 12, 2018 · 2 comments

@stefan-mueller
Collaborator

As far as I can see, the output of predict() for textmodel_wordscores() only contains the document names and the estimated scores. As a result, based on the predict() output we do not know which documents were originally reference texts.

This could be somewhat problematic if a user plots the estimated scores with textplot_scale1d(margin = c("documents")), but only wants the scores for the virgin texts (or wants to highlight which documents were reference texts with a different shape/colour).

To add information on the reference documents, it would probably be sufficient to add the reference scores (or NA for virgin texts) to the output of predict.textmodel_wordscores(). Then we could also adjust textplot_scale1d() and add an option such as include_refscores = FALSE or highlight_refscores = TRUE.

ws <- textmodel_wordscores(data_dfm_lbgexample, c(seq(-1.5, 1.5, .75), NA))

summary(ws)

# information on reference scores
ws$y
# [1] -1.50 -0.75  0.00  0.75  1.50    NA

str(predict(ws))
# Classes 'predict.textmodel_wordscores', 'numeric'  Named num [1:6] -1.32 -7.40e-01 -8.67e-18 7.40e-01 1.32 ...
# ..- attr(*, "names")= chr [1:6] "R1" "R2" "R3" "R4" ...
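
In the meantime this can be reconstructed by hand, since ws$y carries NA for the virgin texts. A minimal sketch, assuming predict() returns the scores in the same document order as the training dfm (which it does when no newdata is supplied):

# split the predictions using ws$y, where NA marks the virgin texts
pred <- predict(ws)
is_ref <- !is.na(ws$y)   # TRUE for the reference documents
pred[!is_ref]            # scores for the virgin texts only
pred[is_ref]             # fitted scores for the reference texts

But this relies on the user keeping the fitted object around, which is exactly the information the predict() output could carry itself.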
@kbenoit
Collaborator

kbenoit commented Feb 12, 2018

I understand the point, but we spent a lot of time trying to make the predict() methods for textmodel objects behave as closely to (e.g.) predict.lm() as possible. That's why they predict whatever you ask them to predict on.

Two solutions: a) add an argument to predict.textmodel_wordscores() to exclude reference texts; or b) use textmodel_affinity(), which is the newer, better alternative to wordscores and will hopefully be published soon.
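
For (a), the predict.lm() analogy already gives a workaround today: pass only the virgin documents as newdata. A sketch, assuming the ws object from the example above and that predict.textmodel_wordscores() accepts a newdata dfm:

# score only the virgin texts by passing them as newdata,
# exactly as one would with predict.lm()
virgin_dfm <- data_dfm_lbgexample[is.na(ws$y), ]  # drop the reference rows
predict(ws, newdata = virgin_dfm)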

@stefan-mueller
Collaborator Author

This should avoid cases like this example from our tutorial where the reference text scores "distort" the plotted results.
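
Something like the following is what I have in mind for a highlight_refscores = TRUE option (the argument name is only a suggestion). A rough sketch done by hand with ggplot2, assuming the document order of predict(ws) matches ws$y:

library(ggplot2)

pred <- predict(ws)
plotdat <- data.frame(
    doc   = names(pred),
    score = as.numeric(pred),
    type  = ifelse(is.na(ws$y), "virgin", "reference")
)

# plot all estimated scores, marking the reference texts with a different
# shape/colour so they can be highlighted or filtered out
ggplot(plotdat, aes(x = score, y = reorder(doc, score),
                    colour = type, shape = type)) +
    geom_point(size = 2) +
    labs(x = "Estimated document position", y = NULL)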
