Assigning docnames to dfm or tokens object not possible #987

Closed
stefan-mueller opened this issue Sep 21, 2017 · 2 comments

@stefan-mueller

Just tried to assign new document names to a dfm()/tokens() object. The documentation says that docnames() "gets or sets document names of a corpus, tokens, or dfm object".

However, setting docnames currently does not work for dfm()/tokens() objects. So either we fix this or change the documentation to state that setting docnames is (currently) only possible for corpus() objects.

docnames(data_corpus_inaugural) <- paste("Speech", 1:ndoc(data_corpus_inaugural), sep = "")

head(docnames(data_corpus_inaugural))
# [1] "Speech1" "Speech2" "Speech3" "Speech4" "Speech5" "Speech6"

dfm_inaugural <- dfm(data_corpus_inaugural)
docnames(dfm_inaugural) <- paste("Speech", 1:ndoc(dfm_inaugural), sep = "")
# Error in UseMethod("docnames<-") : 
#   no applicable method for 'docnames<-' applied to an object of class "c('dfmSparse', 'dfm', 
# 'dgCMatrix', 'CsparseMatrix', 'dsparseMatrix', 'generalMatrix', 'dCsparseMatrix', 'dMatrix', 
# 'sparseMatrix', 'compMatrix', 'Matrix', 'xMatrix', 'mMatrix', 'Mnumeric', 'replValueSp')"

tokens_inaugural <- tokens(data_corpus_inaugural)
docnames(tokens_inaugural) <- paste("Speech", 1:ndoc(tokens_inaugural), sep = "")
# Error in UseMethod("docnames<-") : 
#   no applicable method for 'docnames<-' applied to an object of class "c('tokens', 'tokenizedTexts')"
@kbenoit

kbenoit commented Sep 22, 2017

True - we decided not to allow people to set docnames on "downstream" objects such as tokens or dfm objects. The idea is that the docnames would be baked in at the time of construction.

I agree we could produce a better output message than the standard "no applicable method" error.
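For illustration only, here is a minimal base R sketch of how a clearer message could be produced with a default S3 replacement method. The generic `doc_names<-` and the class `toy_downstream` are hypothetical names made up for this example; this is not quanteda's code, and the wording of the message is just a suggestion.

```r
# Hypothetical generic, standing in for a docnames-style setter (assumption)
`doc_names<-` <- function(x, value) UseMethod("doc_names<-")

# Fallback for classes without a setter: fail with an explanatory message
# instead of the generic "no applicable method" error
`doc_names<-.default` <- function(x, value) {
  stop("setting document names is not supported for objects of class '",
       class(x)[1], "'; set them on the corpus before constructing this object",
       call. = FALSE)
}

# Toy object standing in for a "downstream" object (assumption)
x <- structure(list(), class = "toy_downstream")
msg <- tryCatch({
  doc_names(x) <- "Speech1"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The user still gets an error, but one that names the class and points to the corpus as the place to set the names.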

But we could also consider allowing it. @koheiw what do you think?

@koheiw

koheiw commented Sep 22, 2017

I think it is OK to allow assignment of docnames. I sometimes use names() to change docnames of tokens. In the long term, docnames should be stored as a system-level docvar, not as names of a list or row names of a data.frame.
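As a rough sketch of what allowing assignment might look like, the setter could simply delegate to names() for a list-backed object and to rownames() for a matrix-backed one. Everything below uses toy stand-ins (`doc_names<-`, `toy_tokens`, `toy_dfm` are hypothetical names), not the actual quanteda classes or their internals.

```r
# Hypothetical generic, standing in for a docnames-style setter (assumption)
`doc_names<-` <- function(x, value) UseMethod("doc_names<-")

# List-backed toy class: document names are the names of the list
`doc_names<-.toy_tokens` <- function(x, value) {
  stopifnot(length(value) == length(x))
  names(x) <- value
  x
}

# Matrix-backed toy class: document names are the row names
`doc_names<-.toy_dfm` <- function(x, value) {
  stopifnot(length(value) == nrow(x))
  rownames(x) <- value
  x
}

toks <- structure(list(c("a", "b"), "c"), class = "toy_tokens")
doc_names(toks) <- paste0("Speech", seq_along(toks))
names(toks)
# [1] "Speech1" "Speech2"

m <- structure(matrix(0, nrow = 2, ncol = 3), class = "toy_dfm")
doc_names(m) <- paste0("Speech", seq_len(nrow(m)))
rownames(m)
# [1] "Speech1" "Speech2"
```

The length checks guard against assigning the wrong number of names. Storing docnames as a system-level docvar, as suggested above, would replace the names()/rownames() calls with an update to that variable.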
