Error getting a directory of texts in to quanteda corpus #37

Closed
adamramey opened this issue May 1, 2015 · 4 comments

@adamramey

I've run into some issues getting a directory of texts into a quanteda corpus. First, I make a VCorpus using the DirSource function in tm. Second, I try to convert that object to a quanteda corpus. However, I get the error "no applicable method for 'corpus' applied to an object of class "list"". But it's not a list; I've checked the files and everything seems sound.

library(quanteda)
library(tm)
Loading required package: NLP

Attaching package: ‘tm’

The following objects are masked from ‘package:quanteda’:

as.DocumentTermMatrix, stopwords

ds <- VCorpus(DirSource("~/Desktop/Speeches/House/2000/"))

## make it a quanteda object

txts <- corpus(ds)
Error in UseMethod("corpus") :
no applicable method for 'corpus' applied to an object of class "list"
class(ds)
[1] "VCorpus" "Corpus"

@kbenoit
Collaborator

kbenoit commented May 1, 2015

Try the dev branch rather than the CRAN version. And why start with tm at all? I will look at this in more detail next week.

Ken

@adamramey
Author

I was using the dev branch. There used to be a function in quanteda for
reading in a directory of texts, but it seems to be gone. Is there a
new way to do that?

@kbenoit
Collaborator

kbenoit commented May 1, 2015

Yes, see ?textfile. It is much better than the old method.

@pnulty
Collaborator

pnulty commented May 1, 2015

Hi Adam, thanks for this feedback. We re-wrote this section substantially a couple of months ago, and the directory import is not properly documented. The best way to do it currently is to use a filepath with a wildcard expression (a glob). For example, this works:

library(quanteda)
myCorp <- corpus(textfile(file='~/Dropbox/QUANTESS/corpora/amicus/balanced/*'))
summary(myCorp)

You should also be able to use the wildcard to select only certain file types, e.g. /*.txt.
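
For anyone unsure what the wildcard does, here is a minimal base R sketch of glob expansion. It does not use quanteda at all, and the temporary directory and file names are made up for illustration; whether textfile() expands the pattern in exactly this way is an assumption, not something confirmed in this thread.

```r
# Base R sketch of glob expansion. The directory and file names are
# hypothetical and created only for this demonstration.
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("First speech.", file.path(demo_dir, "speech1.txt"))
writeLines("Second speech.", file.path(demo_dir, "speech2.txt"))
writeLines("id,text", file.path(demo_dir, "notes.csv"))

# "*" matches every file in the directory; "*.txt" keeps only .txt files.
all_files <- Sys.glob(file.path(demo_dir, "*"))
txt_files <- Sys.glob(file.path(demo_dir, "*.txt"))

# Read each matched file into a single string, giving a character vector
# with one element per document, named after the file.
texts <- vapply(txt_files,
                function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
                character(1))
names(texts) <- basename(txt_files)
```

A named character vector like texts is the kind of plain input that corpus() accepts directly, so this is also a fallback if the file-reading helper misbehaves.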

I was also able to reproduce the first problem you mention, which seems to be a bug in our VCorpus import method: the extracted texts weren't typed as a character vector. I've made a change and pushed it, and this now works on my system:

ds <- VCorpus(DirSource('~/Dropbox/QUANTESS/corpora/amicus/balanced/'))
corpus(ds)

Let me know whether it works for you once you re-install the dev branch from GitHub.
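
To illustrate why the error mentioned class "list" even though class(ds) was "VCorpus", here is a small sketch of S3 dispatch in base R. The generic my_corpus() is a hypothetical stand-in, not quanteda's code, and the actual change that was pushed is not shown in this thread. The assumption is that the texts were extracted as a list and then handed to a generic that only had a method for character vectors.

```r
# Hypothetical generic standing in for a constructor such as corpus().
# This is an illustration of S3 dispatch, not quanteda's implementation.
my_corpus <- function(x, ...) UseMethod("my_corpus")

# Only a character method is defined.
my_corpus.character <- function(x, ...) {
  structure(list(texts = x), class = "my_corpus")
}

# Texts held in a list rather than a character vector.
texts_as_list <- list(doc1 = "First text.", doc2 = "Second text.")

# There is no method for class "list", so dispatch fails; capture the message.
err <- tryCatch(my_corpus(texts_as_list),
                error = function(e) conditionMessage(e))

# Coercing the list to a character vector first lets dispatch succeed.
texts_as_chr <- unlist(texts_as_list)
ok <- my_corpus(texts_as_chr)
```

Here err holds a "no applicable method" message naming class "list", which matches the shape of the error in the original report, while ok is built without complaint.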

@kbenoit closed this as completed May 5, 2015