No is.null() for empty dfmSparse object computed by dfm() #811

An empty dfm looks like NULL when printed, but it is still a dfm. Please use:
my_dictionary = dictionary( list( a = c( "asd", "dsa" ),
                                  b = c( "foo", "jup" ) ) )
# writing a little piece of text
raw_text = c( "Wow I can't believe it's not raining!",
"Today is a beautiful day. The sky is blue and there are burritos" )
my_corpus = corpus( raw_text )
summary( my_corpus )
my_dfm = dfm( my_corpus, dictionary = my_dictionary )
is.null( my_dfm ) # FALSE
is.dfm( my_dfm ) # TRUE
nfeature( my_dfm ) # 0 (because it is still a dfm with zero features)
str( my_dfm )
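A dfm extends the sparse `dgCMatrix` class from the Matrix package, so the same behaviour can be reproduced without quanteda. The sketch below (an illustration, assuming only the Matrix package that ships with R) builds a 0-by-0 sparse matrix as a stand-in for an empty dfm and shows that `is.null()` is the wrong test, while `dim()` and `Matrix::nnzero()` report emptiness directly.

```r
# A 0-by-0 sparse matrix standing in for an empty dfm
# (assumption: no quanteda needed, since dfm builds on Matrix's sparse classes)
empty_m <- Matrix::sparseMatrix(i = integer(0), j = integer(0),
                                x = numeric(0), dims = c(0L, 0L))

is.null(empty_m)         # FALSE: the object exists, it just has no cells
dim(empty_m)             # 0 0: no documents, no features
Matrix::nnzero(empty_m)  # 0 non-zero entries
```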

Cool! Thank you. Thank you again.

Better would be to change the print method to reflect an empty object, as these classes do:
> matrix()
     [,1]
[1,]   NA
> data.frame()
data frame with 0 columns and 0 rows
> tibble::tibble()
# A tibble: 0 x 0
> as(matrix(), "dgCMatrix")
1 x 1 sparse Matrix of class "dgCMatrix"
[1,] NA
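A minimal sketch of that idea, using a hypothetical S3 class `toy_dfm` so it runs without quanteda (quanteda's real dfm is an S4 class, so its print method would be written differently). When either dimension is zero, the print method states the dimensions instead of printing something that looks like NULL.

```r
# Hypothetical wrapper class for illustration only; not quanteda's dfm
new_toy_dfm <- function(counts) {
  structure(list(counts = counts), class = "toy_dfm")
}

# Print method that makes an empty matrix visible instead of looking like NULL
print.toy_dfm <- function(x, ...) {
  d <- dim(x$counts)
  if (any(d == 0L)) {
    cat(sprintf("Document-feature matrix of: %d documents, %d features (empty)\n",
                d[1], d[2]))
  } else {
    print(x$counts)
  }
  invisible(x)
}

print(new_toy_dfm(matrix(0, nrow = 0, ncol = 0)))
```

The empty case prints `Document-feature matrix of: 0 documents, 0 features (empty)`, which is as unambiguous as the `data.frame()` and tibble output above.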

Burritos? 🌯 😄

@kbenoit That would help indeed. My point was that when calling a printing method I expect to be able to check the output of that call. Printing

I just ran into this same problem... But why the special NULL case at all? Wouldn't an i-by-j matrix full of zeros be the more appropriate return value? In my case, I am binding the dfm output to a separate matrix of dependency-based features from the same documents. An i-by-j matrix of zeros perfectly captures the information I need (i.e. none of these dictionary features are present), while the "NULL" 0-by-0 matrix breaks the pipeline.
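Why the 0-by-0 shape breaks binding: in base R, `cbind()` does not ignore zero-extent matrices, so a 0-by-0 block fails to bind to an i-by-j feature matrix, while an all-zero i-by-j block binds cleanly. A base-R sketch; the feature names `nsubj`/`dobj`, the dictionary keys and the counts are made-up placeholders, not the commenter's data.

```r
n_docs <- 2L
dict_keys <- c("a", "b")

# Hypothetical dependency-based features for the same two documents
dep_features <- matrix(c(1L, 0L, 2L, 3L), nrow = n_docs,
                       dimnames = list(NULL, c("nsubj", "dobj")))

# i-by-j block of zeros: "no dictionary key matched in any document"
dict_zero <- matrix(0L, nrow = n_docs, ncol = length(dict_keys),
                    dimnames = list(NULL, dict_keys))
combined <- cbind(dep_features, dict_zero)
dim(combined)  # 2 4

# 0-by-0 block: cbind() raises an error because the row counts differ
dict_empty <- matrix(0L, nrow = 0L, ncol = 0L)
bind_result <- tryCatch(cbind(dep_features, dict_empty),
                        error = function(e) conditionMessage(e))
```

Here `bind_result` holds the error message rather than a matrix, which is the pipeline break described above.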

We also need to change this case:
(tmp <- kwic(data_char_ukimmig2010, "unicorn"))
## NULL
is.null(tmp)
## [1] FALSE
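The same pattern exists in base R: an empty result is still an object, so emptiness has to be tested by size rather than with `is.null()`. Assuming the kwic result behaves like a data frame (an assumption about the quanteda version in this thread), the check would use `nrow()`. A base-R sketch with a hypothetical zero-row table `hits`:

```r
# Hypothetical zero-row result standing in for an empty keyword-in-context table
hits <- data.frame(docname = character(0), keyword = character(0))

is.null(hits)    # FALSE: an empty data frame is still an object
nrow(hits) == 0  # TRUE: test emptiness by size instead

if (nrow(hits) > 0) {
  message("found ", nrow(hits), " matches")
} else {
  message("no matches")
}
```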

Yes, that is correct.

Fix and update for both conditions is imminent.

I found that when I compute a document-feature matrix using dfm() with a custom dictionary and no words are matched, dfm() returns NULL. The problem arises when I check this with the standard is.null() function, which returns FALSE. In the following you will find a minimal example.

The temporary workaround I found is to convert the my_dfm object to either a matrix or a data.table and check its dimensions as follows. So I guess that to execute a code chunk if and only if the document-feature matrix is non-empty, one can run something like the following:
At least this worked for me, but it would be nice to have a direct way to check whether dfm() has computed an empty matrix.
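The workaround of checking the dimensions before running a code chunk can be wrapped in a small guard. A sketch, with `is_empty_dfm()`, `score_features()` and `run_if_nonempty()` as hypothetical names; it relies only on `dim()`, which works on dense matrices and on Matrix sparse matrices (and, assuming a dfm inherits Matrix's `dim()` method, on a dfm without converting it first).

```r
# Hypothetical helper: TRUE when there are no documents or no features
is_empty_dfm <- function(x) {
  any(dim(x) == 0L)
}

# Hypothetical downstream step that should only run on a non-empty matrix
score_features <- function(x) colSums(as.matrix(x))

# Run f(x) only when x has at least one document and one feature
run_if_nonempty <- function(x, f) {
  if (is_empty_dfm(x)) {
    return(NULL)
  }
  f(x)
}

full  <- matrix(c(1, 0, 2, 3), nrow = 2, dimnames = list(NULL, c("a", "b")))
empty <- matrix(0, nrow = 2, ncol = 0)

run_if_nonempty(full, score_features)   # a = 1, b = 5
run_if_nonempty(empty, score_features)  # NULL
```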