Convert to data.frame issue if 'document' is non-unique column name. #1918

@danlewis85

Description

Converting a dfm to a data.frame with `convert()` causes a problem if one of the features in the dfm is also called 'document': the resulting data.frame has two columns named 'document'.

Perhaps rename the document column to something more likely to be unique, such as "doc_id", in line with the rOpenSci Text Interchange Formats.

Reproducible code


library(magrittr)
library(quanteda)

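# convert dfm to data.frame; the feature "document" clashes with the document id column
dfm_df <- dfm(c("this is a fine document")) %>% convert(to = 'data.frame')

# workaround: rename the first (document id) column
names(dfm_df)[1] <- "doc_id"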
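
The clash can also be seen without quanteda. The following is a minimal base R sketch, assuming the converted output has the document id in the first column followed by one count column per feature. The data frame `dfm_like` and the helper `rename_id_column()` are hypothetical names used only for illustration; they are not part of quanteda.

```r
# Stand-in for the convert(to = "data.frame") output (assumed layout):
# an id column named "document" followed by one count column per feature.
dfm_like <- data.frame(
  document = "text1",
  this = 1, is = 1, a = 1, fine = 1, document = 1,
  check.names = FALSE,       # keep the duplicate name as-is
  stringsAsFactors = FALSE
)

# Position of the first repeated column name, or 0 if all names are unique
dup_before <- anyDuplicated(names(dfm_like))

# `$` returns only the first matching column, so the feature count is hidden
first_match <- dfm_like$document

# Hypothetical helper: rename the id column (assumed to be column 1),
# refusing if the new name is itself already used by a feature.
rename_id_column <- function(df, new_name = "doc_id") {
  if (new_name %in% names(df)[-1]) {
    stop("'", new_name, "' is already used as a feature name")
  }
  names(df)[1] <- new_name
  df
}

fixed <- rename_id_column(dfm_like)
dup_after <- anyDuplicated(names(fixed))
```

Checking `anyDuplicated(names(df))` before passing the result on is a cheap guard: any non-zero value means at least one feature name collides with another column.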

Expected behavior

If you create a data.frame with two 'document' columns, R throws an rlang error when you try to use that column. For example:

Call `rlang::last_error()` to see a backtrace.

System information

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] quanteda_2.0.1 magrittr_1.5

loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 rstudioapi_0.10 stopwords_1.0 tidyselect_0.2.5
[5] munsell_0.5.0 colorspace_1.4-1 lattice_0.20-38 R6_2.4.1
[9] rlang_0.4.1 fastmatch_1.1-0 dplyr_0.8.3 tools_3.6.1
[13] grid_3.6.1 data.table_1.12.8 gtable_0.3.0 lazyeval_0.2.2
[17] RcppParallel_5.0.0 assertthat_0.2.1 tibble_2.1.3 lifecycle_0.1.0
[21] crayon_1.3.4 Matrix_1.2-17 purrr_0.3.3 ggplot2_3.2.1
[25] glue_1.3.1 stringi_1.4.3 compiler_3.6.1 pillar_1.4.2
[29] scales_1.1.0 pkgconfig_2.0.3
