Hi Ken,
I have a question about quanteda collocations that I'm hoping you might be able to help with.
I'm looking for collocations of terms within a corpus of customer complaints and I've turned again to quanteda to help with this.
I notice that when you calculate the collocations you first tokenise the texts with simplify = TRUE, which returns a single vector of all tokens across all documents in the corpus rather than the tokens within each document.
This means that I'm detecting collocations that span document boundaries, not just those that occur within a document, and I'm not sure if that's correct.
For example, if document 1 contains the text "This is a test" and document 2 contains the text "This is also a test", then the collocation "test this" is returned, even though that sequence doesn't occur within any single document in the corpus.
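To make the example concrete, here is a minimal sketch in plain Python of what I think is happening. This is not quanteda's code: the tokenize and bigrams helpers are my own stand-ins, and I'm assuming simple lowercased whitespace tokenisation. It compares bigrams counted over one pooled token vector with bigrams counted document by document.

```python
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in tokeniser: lowercase and split on whitespace.
    return text.lower().split()

def bigrams(tokens):
    # Adjacent token pairs in order of appearance.
    return list(zip(tokens, tokens[1:]))

docs = ["This is a test", "This is also a test"]

# Pooled: all tokens from all documents flattened into one vector,
# so the last token of one document sits next to the first of the next.
pooled_tokens = [tok for doc in docs for tok in tokenize(doc)]
pooled = Counter(bigrams(pooled_tokens))

# Per document: bigrams never cross a document boundary.
per_doc = Counter()
for doc in docs:
    per_doc.update(bigrams(tokenize(doc)))

# Bigrams that only exist because the documents were concatenated.
spurious = set(pooled) - set(per_doc)
print(spurious)
```

With these two documents, the only pair that appears in the pooled counts but not in the per-document counts is the one that straddles the boundary between document 1 and document 2.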
Are you able to shed any light on why this is the behaviour of collocations, and if this is correct?
Thanks very much and I hope you're well. I'm enjoying the work I'm doing with analysis of text and quanteda makes things a lot easier!
Jim.