How to left join docvars with those in an existing corpus #7
str does not work for corpus objects without docvars
Thanks. More generally (and basically):
str(corpus("this is my single document"))
## Error in `[[.corpus`(object, 1L) :
## cannot index docvars this way because none exist
But keep in mind this, from
😉
@conjugateprior Refresh with the latest GitHub version and try it now.
FYI I was In the meantime I'll wait until the innards settle down.
Have you seen the
corpus1 <- corpus_subset(data_corpus_inaugural, President == "Bush")
corpus2 <- corpus_subset(data_corpus_inaugural, President == "Clinton")
docvars(corpus2, "newvar") <- "Added to Clinton"
corpus3 <- corpus_subset(data_corpus_inaugural, President == "Obama")
docvars(corpus3, "newvar") <- "Added to Obama"
docvars(c(corpus1, corpus2, corpus3))
##              Year President FirstName           newvar
## 1989-Bush    1989      Bush    George             <NA>
## 2001-Bush    2001      Bush George W.             <NA>
## 2005-Bush    2005      Bush George W.             <NA>
## 1993-Clinton 1993   Clinton      Bill Added to Clinton
## 1997-Clinton 1997   Clinton      Bill Added to Clinton
## 2009-Obama   2009     Obama    Barack   Added to Obama
## 2013-Obama   2013     Obama    Barack   Added to Obama
docvars(corpus2 + corpus3)
##              Year President FirstName           newvar
## 1993-Clinton 1993   Clinton      Bill Added to Clinton
## 1997-Clinton 1997   Clinton      Bill Added to Clinton
## 2009-Obama   2009     Obama    Barack   Added to Obama
## 2013-Obama   2013     Obama    Barack   Added to Obama
If not, consider a PR that operates using accessor functions (try
Definitely not As in the Currently it seems one must add the external metadata column by column and hope that it lines up with the exact ordering of documents in the corpus. This has bitten me several times already. Hence the desire for a
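One way to avoid relying on row order is to align the external metadata on document names before attaching it. Below is a minimal base R sketch of that left join, assuming the existing docvars can be treated as a data.frame whose row names are the document names. The objects `existing` and `external`, the `doc_id` key column, and the `party` values are hypothetical and used only for illustration.

```r
# Existing document variables, keyed by document name (row names).
existing <- data.frame(
  Year = c(1993, 1997, 2009),
  President = c("Clinton", "Clinton", "Obama"),
  row.names = c("1993-Clinton", "1997-Clinton", "2009-Obama"),
  stringsAsFactors = FALSE
)

# External metadata in a different order, with one document missing
# and one extra row that matches no document (illustrative values).
external <- data.frame(
  doc_id = c("2009-Obama", "1993-Clinton", "1989-Bush"),
  party = c("Democratic", "Democratic", "Republican"),
  stringsAsFactors = FALSE
)

# Left join: for each existing document, look up its row in `external`.
# match() returns NA where no row exists, so unmatched documents get NA
# and unmatched external rows are dropped.
idx <- match(rownames(existing), external$doc_id)
new_cols <- external[idx, setdiff(names(external), "doc_id"), drop = FALSE]
joined <- cbind(existing, new_cols)
rownames(joined) <- rownames(existing)

print(joined)
```

Because `joined` keeps the original document order, each new column could then presumably be assigned back with the accessor form already used in this thread, for example `docvars(corp, "party") <- joined$party`, where `corp` stands for the corpus in question.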
Well, we could modify You want the following:
Yes, that would do it. Two small caveats.
OK, thinking about options for syntax:
How about this:
Using S4 methods with multiple dispatch will allow us to distinguish these two methods (even with S3 objects). Order from chaos.
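To illustrate the dispatch idea only: an S3 class can be registered with `setOldClass()`, after which an S4 generic can dispatch on both the object and the value being attached. This is a rough sketch, not the package's code; the class `corpus_sketch`, the generic `add_meta`, and the fields `doc_ids` and `meta` are all hypothetical.

```r
library(methods)

# Register the S3 class so that S4 dispatch can see it.
# "corpus_sketch" is a hypothetical stand-in, not quanteda's corpus class.
setOldClass("corpus_sketch")

# Hypothetical generic that dispatches on both arguments.
setGeneric("add_meta", function(x, value) standardGeneric("add_meta"))

# Method 1: value is a data.frame keyed by a doc_id column (joined by name).
setMethod("add_meta", signature("corpus_sketch", "data.frame"),
  function(x, value) {
    idx <- match(x$doc_ids, value$doc_id)
    meta <- value[idx, setdiff(names(value), "doc_id"), drop = FALSE]
    rownames(meta) <- x$doc_ids
    x$meta <- meta
    x
  })

# Method 2: value is a character vector, assumed to be in document order.
setMethod("add_meta", signature("corpus_sketch", "character"),
  function(x, value) {
    stopifnot(length(value) == length(x$doc_ids))
    x$meta <- data.frame(value = value, row.names = x$doc_ids,
                         stringsAsFactors = FALSE)
    x
  })

# A toy S3 object standing in for a corpus.
corp <- structure(list(doc_ids = c("doc1", "doc2"), meta = NULL),
                  class = "corpus_sketch")

by_key <- add_meta(corp, data.frame(doc_id = c("doc2", "doc1"),
                                    lang = c("de", "en"),
                                    stringsAsFactors = FALSE))
by_pos <- add_meta(corp, c("first", "second"))
```

The data.frame method reorders its input to match the documents, while the character method trusts positional order; the second argument's class alone decides which one runs.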
Four questions and proposed answers for the semantics of
Proposal:
Some discussion of the semantics of factor conversion would be useful.
Second suggestion: All this goes into an augmented
@kbenoit Thoughts on these semantics or should I assume they're fine and send a PR?
Insofar as I understood it fully, let's implement your answers to the scheme above. I'd say that the docvars class should be the left side, i.e. the existing variable, and if this is not compatible in the ways you list, then complain and stop. You mention a PR - great if you code this!
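The "existing variable wins, otherwise complain and stop" rule could look roughly like the base R sketch below. The compatibility conditions listed in the thread are not reproduced here, so the two checks are assumptions: an existing factor accepts only values among its current levels, and any other variable requires an identical class. The helper name `coerce_to_existing` is hypothetical.

```r
# Hypothetical helper: make an incoming column conform to the class of the
# existing docvar (the "left side"), or stop with an error if it cannot.
coerce_to_existing <- function(existing, incoming, name = "variable") {
  if (is.factor(existing)) {
    # Assumption: a factor accepts only values among its current levels.
    vals <- as.character(incoming)
    bad <- setdiff(vals[!is.na(vals)], levels(existing))
    if (length(bad) > 0) {
      stop(sprintf("%s: values not among existing levels: %s",
                   name, paste(bad, collapse = ", ")))
    }
    return(factor(vals, levels = levels(existing)))
  }
  # Assumption: for non-factors, the classes must be identical.
  if (identical(class(existing), class(incoming))) {
    return(incoming)
  }
  stop(sprintf("%s: incoming class '%s' is incompatible with existing class '%s'",
               name, class(incoming)[1], class(existing)[1]))
}

# Compatible: character values that are all existing factor levels.
ok <- coerce_to_existing(factor(c("a", "b")), c("b", "a"))

# Incompatible: character values offered for an integer variable.
err <- tryCatch(coerce_to_existing(1:3, c("x", "y", "z")),
                error = function(e) conditionMessage(e))
```

Stopping instead of silently coercing keeps the existing variable's type stable, which is the behaviour requested in the comment above.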
Update: The solution to this could be part of quanteda/quanteda#1214. It could also be solved by the idea of creating a quanteda.dplyr extension package as described in quanteda/quanteda#1171, quanteda/quanteda#529.
@conjugateprior with the new package this should be pretty easy to implement now. I'm adding it to the list.
@kbenoit I was wondering if there is a solution to the question in this thread? I have been unsuccessful in trying to add external variables to a corpus object. Thanks!
This works
but this doesn't
apparently because there are no docvars
Seems like it should be possible to make a docvar-free corpus though.