Error getting a directory of texts in to quanteda corpus #37
Comments
Try the dev branch rather than the CRAN version. And why start with tm at all? Will look at this in more detail next week. Ken
I was using the dev branch... there used to be a directory function in …
Adam Ramey, Ph.D. Saadiyat Island Office: +971 2 628 5036 N.B. Abu Dhabi is EST+8 from April-October and EST+9 from November-March.
Yes, ?textfile is much better than the old method.
Hi Adam, thanks for this feedback. We rewrote this section substantially a couple of months ago, and the directory import is not properly documented. The best way to do it currently is to use a file path with a wildcard expression (a glob); for example, this works:
You should also be able to use the wildcard to select only certain file types, e.g. /*.txt. I was also able to reproduce the first problem you mention, which seems to be a bug in our VCorpus import method: the extracted texts weren't typed as a character vector. I've made a change and pushed it; it now works on my system.
Let me know whether it works for you once you reinstall the dev branch from GitHub.
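If the glob-based import described above is not available in the version you have installed, a base-R sketch can do the same job by hand: read every file in a directory whose name matches a pattern into a named character vector, one element per file. quanteda's `corpus()` has a method for character vectors, so the result can be passed to it directly. The helper name `read_text_dir` is hypothetical (not part of quanteda), and note that `list.files()` takes a regular expression, not a glob.

```r
# Hypothetical helper (not part of quanteda): read every file in `dir` whose
# name matches the regular expression `pattern` into a named character vector.
read_text_dir <- function(dir, pattern = "\\.txt$", encoding = "UTF-8") {
  # list.files() performs tilde expansion and returns paths sorted by name
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    stop("no files matching '", pattern, "' in ", dir)
  }
  texts <- vapply(files, function(f) {
    # readLines() returns one element per line; rejoin them into one string
    paste(readLines(f, encoding = encoding, warn = FALSE), collapse = "\n")
  }, character(1))
  # use bare file names (without the directory) as document names
  names(texts) <- basename(files)
  texts
}

# Usage sketch, with the directory from the question:
# txts <- read_text_dir("~/Desktop/Speeches/House/2000/")
# library(quanteda)
# mycorpus <- corpus(txts)
```

Keeping the file names as vector names means they carry through to the corpus as document names.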
I've tried to get a directory of texts into a quanteda corpus, with some issues. First, I make a VCorpus using the DirSource function in tm. Second, I try to make the object a quanteda corpus. However, I get the error "no applicable method for 'corpus' applied to an object of class "list"". But it's not a list; I've checked the files and everything seems sound.
library(quanteda)
library(tm)
Attaching package: ‘tm’
The following objects are masked from ‘package:quanteda’:
    as.DocumentTermMatrix, stopwords
ds <- VCorpus(DirSource("~/Desktop/Speeches/House/2000/"))
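Until a fixed version is installed, one possible workaround for the tm route is to pull the text out of the VCorpus yourself, so that `corpus()` receives a plain character vector rather than a list. This is a sketch that assumes tm's `[[`, `length()` and `names()` methods for VCorpus and the `content()` accessor for its documents; the helper name `vcorpus_to_character` is hypothetical.

```r
library(tm)

# Hypothetical helper: flatten a tm VCorpus into a named character vector,
# one element per document, so it can be passed to quanteda::corpus().
vcorpus_to_character <- function(vc) {
  n <- length(vc)
  texts <- vapply(seq_len(n), function(i) {
    # content() returns the document's text lines; join them into one string
    paste(content(vc[[i]]), collapse = "\n")
  }, character(1))
  # keep tm's document IDs (the file names, for a DirSource) as names
  names(texts) <- names(vc)
  texts
}

# Usage sketch with `ds` from the question:
# library(quanteda)
# mycorpus <- corpus(vcorpus_to_character(ds))
```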