Get word/sentence/paragraph count? #251
Did you check remark-retext? Although I've never used it, it seems to be what you need. Remark Retext
Hi Kyle! 👋 Yup, retext does that! You’ll be interested in the links posted by @Rokt33r above, and I also made an example showing a way to use it all together:

    var unified = require('unified');
    var parse = require('remark-parse');
    var stringify = require('remark-stringify');
    var english = require('retext-english');
    var remark2retext = require('remark-retext');
    var visit = require('unist-util-visit');

    unified()
      .use(parse)
      // Bridge from markdown to natural language: run the retext
      // processor (English parser plus our counter) on the prose.
      .use(remark2retext, unified().use(english).use(count))
      .use(stringify)
      .processSync('*This* and _that_. \n> And some more stuff.\n\nAnd another thing.');

    // Plugin that tallies how often each node type occurs in the tree.
    function count() {
      return counter;

      function counter(tree) {
        var counts = {};

        visit(tree, visitor);

        console.log(counts);

        function visitor(node) {
          counts[node.type] = (counts[node.type] || 0) + 1;
        }
      }
    }

Yields:

    { RootNode: 1,
      ParagraphNode: 3,
      SentenceNode: 3,
      WordNode: 10,
      TextNode: 10,
      WhiteSpaceNode: 10,
      PunctuationNode: 3 }
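If you would rather get the counts back as a value than log them, the tallying step can also be written without any dependencies. This is only a minimal sketch, assuming a unist-style tree (nodes with a `type` and optional `children`). The `countTypes` helper and the hand-built `tree` are hypothetical, made up for illustration, and are not part of unified or retext:

```javascript
// Hypothetical helper: tally node types in any unist-style tree.
function countTypes(tree) {
  var counts = {};
  walk(tree);
  return counts;

  // Count this node, then recurse into its children (if any).
  function walk(node) {
    counts[node.type] = (counts[node.type] || 0) + 1;
    (node.children || []).forEach(walk);
  }
}

// Hand-built tree, roughly shaped like a natural-language tree for "Hi there."
var tree = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Hi'}]},
        {type: 'WhiteSpaceNode', value: ' '},
        {type: 'WordNode', children: [{type: 'TextNode', value: 'there'}]},
        {type: 'PunctuationNode', value: '.'}
      ]
    }]
  }]
};

var counts = countTypes(tree);
console.log(counts.WordNode, counts.SentenceNode); // 2 1
```

Returning the object instead of logging it should make it easier to attach the numbers to whatever data structure you expose per file.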
Oh this is perfect! And of course, Retext :-) I'm already using it, so silly me for forgetting it. I'm curious: how hard is it to write the language parsers? I'm planning on adding these counts as data you can get from markdown files in Gatsby, and I'm sure people will want support for languages other than English and Dutch, the two I see you have parsers for.
retext-latin is pretty OK for most Latin-script languages (and Cyrillic); however, sentence count is pretty hard to detect (is […]). I kind of think […]. For other, “non-western” scripts, that’s pretty hard. We’d need other people for that, as I’m not familiar enough with them to build the needed tools.
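To give a feel for why sentence counts are hard, here is a small sketch with a deliberately naive splitter. The `naiveSentenceCount` function and both sample strings are my own illustration, not the example from the comment above and not how retext-latin works internally:

```javascript
// Naive approach: treat every run of ".", "!" or "?" that is followed by
// whitespace (or the end of the text) as a sentence boundary.
function naiveSentenceCount(text) {
  var matches = text.match(/[.!?]+(\s+|$)/g);
  return matches ? matches.length : 0;
}

var plain = 'It rained. We stayed in.';
var tricky = 'Dr. Smith arrived at 5 p.m. sharp. Nobody noticed.';

console.log(naiveSentenceCount(plain)); // 2, which is correct
console.log(naiveSentenceCount(tricky)); // 4, although there are only 2 sentences
```

Abbreviations such as "Dr." and "p.m." end in a full stop without ending the sentence, so any punctuation-based rule needs language-specific knowledge to get the count right.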
Cool! I can add this, plus documentation pointing people who write in non-Latin languages here. I also assume there are other tools that do word/sentence counts for non-Latin languages, so that might be a direction to suggest as well.
Yes, I’d love it if more languages would be connected to retext, and I’m able to help out, but I don’t know enough of those languages to write them myself!
Is there something in the Unist ecosystem for doing this for markdown files?