Row and column labels are backwards in convert(..., to = "lsa") #526

Closed
BobMuenchen opened this issue Jan 28, 2017 · 1 comment

Comments

@BobMuenchen

The convert function does the right thing with the data when it converts a dfm using to = "lsa": the result is laid out terms-by-documents, as lsa expects. However, the labels on the two dimensions ("docs" and "terms") are backwards and need to be swapped. Here is an example from the lsa help file. Note that what the lsa author calls a "dtm" is what most packages would call a "tdm", since the first letter usually refers to the rows in R:

> library("lsa")
> ldir = tempfile()
> dir.create(ldir)
> write( c("human", "interface", "computer"), file=paste(ldir, "c1", sep="/"))
> write( c("survey", "user", "computer", "system", "response", "time"), file=paste(ldir, "c2", sep="/"))
> write( c("EPS", "user", "interface", "system"), file=paste(ldir, "c3", sep="/"))
> write( c("system", "human", "system", "EPS"), file=paste(ldir, "c4", sep="/"))
> write( c("user", "response", "time"), file=paste(ldir, "c5", sep="/"))
> write( c("trees"), file=paste(ldir, "m1", sep="/"))
> write( c("graph", "trees"), file=paste(ldir, "m2", sep="/"))
> write( c("graph", "minors", "trees"), file=paste(ldir, "m3", sep="/"))
> write( c("graph", "minors", "survey"), file=paste(ldir, "m4", sep="/"))
> 
> # -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
> # generate doc term matrix from landauer files
> 
> dtm = textmatrix(ldir, minWordLength=1)
> dtm
           docs  <===================Note how this refers to documents
terms       c1 c2 c3 c4 c5 m1 m2 m3 m4
  computer   1  1  0  0  0  0  0  0  0
  human      1  0  0  1  0  0  0  0  0
  interface  1  0  1  0  0  0  0  0  0
  response   0  1  0  0  1  0  0  0  0
  survey     0  1  0  0  0  0  0  0  1
  system     0  1  1  2  0  0  0  0  0
  time       0  1  0  0  1  0  0  0  0
  user       0  1  1  0  1  0  0  0  0
  eps        0  0  1  1  0  0  0  0  0
  trees      0  0  0  0  0  1  1  1  0
  graph      0  0  0  0  0  0  1  1  1
  minors     0  0  0  0  0  0  0  1  1

Here's a dfm created by quanteda:

> music_dfm_cleaned[1:10,1:5]
Document-feature matrix of: 10 documents, 5 features (86% sparse).
10 x 5 sparse Matrix of class "dfmSparse"
       light battery power great cost
text1      1       0     0     0    0
text2      0       1     1     1    0
text3      0       0     0     0    1
text4      0       0     0     0    0
text5      0       0     0     0    0
text6      0       1     0     0    0
text7      0       0     0     0    0
text8      0       0     0     0    0
text9      0       0     0     1    0
text10     0       0     0     0    0


...and now converted to lsa:

> library("quanteda")
> library("lsa")
> music_tdm <- convert(music_dfm_cleaned, to = "lsa")  # <===I call it tdm, don't let that confuse you!
> music_tdm[1:10,1:5]
         terms   <==================so "terms" should be replaced with "docs" & vice versa
docs      text1 text2 text3 text4 text5
  light       1     0     0     0     0
  battery     0     1     0     0     0
  power       0     1     0     0     0
  great       0     1     0     0     0
  cost        0     0     1     0     0
  size        0     0     1     0     0
  all_my      0     0     0     1     0
  cds         0     0     0     1     0
  palm        0     0     0     1     0
  hand        0     0     0     1     0
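For reference, here is a minimal base R sketch of how the dimension labels should travel with the data. It is not quanteda's actual fix, and the objects dfm_like, tdm_like and bad are hypothetical toy matrices. A dfm is documents-by-terms, lsa wants terms-by-documents, and t() swaps the dimnames together with their names. The last step shows a possible workaround for a matrix whose data is already oriented correctly but whose labels are swapped, as in the output above.

```r
# Hypothetical toy data: a documents-by-terms matrix with named dimnames,
# laid out the way a dfm is (documents in rows, terms in columns).
dfm_like <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 2,
                   dimnames = list(docs  = c("text1", "text2"),
                                   terms = c("light", "battery", "power")))

# lsa expects terms in rows and documents in columns. t() transposes the
# data and swaps the dimnames, including their names, so each label stays
# attached to the right dimension.
tdm_like <- t(dfm_like)
print(names(dimnames(tdm_like)))
# [1] "terms" "docs"

# Workaround sketch for a matrix whose data is already terms-by-documents
# but whose dimension labels are swapped (the symptom reported here):
# reverse only the names of the dimnames, leaving the data and the
# row/column names untouched.
bad <- tdm_like
names(dimnames(bad)) <- c("docs", "terms")  # reproduce the mislabeling
names(dimnames(bad)) <- rev(names(dimnames(bad)))
print(identical(bad, tdm_like))
# [1] TRUE
```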
kbenoit added a commit that referenced this issue Jan 29, 2017
Add tests for convert(x, to = "lsa")

Solves #526
kbenoit mentioned this issue Jan 29, 2017
@kbenoit
Collaborator

kbenoit commented Jan 29, 2017

Fixed now - thanks for finding this!
