incorrect word count compared to wc -w #726

Closed
anarcat opened this issue Aug 29, 2022 · 3 comments

Comments

@anarcat
Contributor

anarcat commented Aug 29, 2022

Markdown mode doesn't seem to offer a way to tell Emacs to skip certain characters so that word counts are accurate. For a text I wrote recently, Emacs counts almost a thousand more words than wc does.

Expected Behavior

anarcat@curie:anarc.at$ wc -w blog/2022-08-26-nationalize-internet.md
4475 blog/2022-08-26-nationalize-internet.md
anarcat@curie:anarc.at$ wc  blog/2022-08-26-nationalize-internet.md
  613  4475 34316 blog/2022-08-26-nationalize-internet.md

Actual Behavior

M-x count-words gives me:

Buffer has 613 lines, 5459 words, and 34267 characters.

Kind of fascinating that the byte count is different as well. I suspect that might be a bug in wc, where it would count UTF-8 characters as multiple characters instead of single ones.
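
A possible explanation for the differing third number, offered as an illustration rather than a diagnosis of this file: plain wc reports bytes in its last column (wc -m reports characters), while Emacs reports characters, so any multi-byte UTF-8 character widens the gap. A minimal sketch with a made-up sample string (not taken from the blog post):

```python
# Hypothetical sample text containing three non-ASCII characters:
# e-acute (2 bytes), en dash (3 bytes), i-diaeresis (2 bytes) in UTF-8.
text = "caf\u00e9 \u2013 na\u00efve"

# Number of characters (code points), closer to what Emacs reports.
char_count = len(text)

# Number of bytes once encoded as UTF-8, which is what plain `wc` reports.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)
```

In the report above the two totals differ by 49 (34316 versus 34267), which would be consistent with a few dozen multi-byte characters in the file.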

Steps to Reproduce

  1. Load this file in an Emacs buffer with markdown-mode enabled
  2. M-x count-words

Backtrace

N/A

Software Versions

  • Markdown Mode: 2.4, from Debian stable packages
  • Emacs: GNU Emacs 27.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0) of 2021-03-27, modified by Debian
  • OS: Debian GNU/Linux stable "bullseye"
@syohex
Collaborator

syohex commented Aug 30, 2022

I suppose that wc -w treats only whitespace as a word separator, while Emacs also treats punctuation, quotes, and so on as word separators. For example,

title="How"

wc -w counts this as 1 word, while Emacs counts it as 2 words.
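
The difference between the two counting rules can be sketched roughly as follows. This is only an approximation: the function names are made up for illustration, and the regular expression `\w+` is a stand-in for "runs of word-constituent characters", not the actual syntax table that markdown-mode uses.

```python
import re


def count_whitespace_words(text: str) -> int:
    """Count maximal runs of non-whitespace, roughly what `wc -w` does."""
    return len(text.split())


def count_word_char_runs(text: str) -> int:
    """Count maximal runs of word characters.

    Approximates a counter that treats punctuation and quotes as
    separators; it is not a reimplementation of Emacs's count-words.
    """
    return len(re.findall(r"\w+", text))


sample = 'title="How"'
print(count_whitespace_words(sample))  # 1
print(count_word_char_runs(sample))    # 2
```

Text that is heavy on attributes, URLs, or hyphenated terms would widen the gap between the two counts in the same way.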

You can change Emacs's behavior through the syntax table. If you want Emacs to count the same way as wc, please modify the syntax table for markdown-mode in your configuration.

Reference

@syohex syohex closed this as completed Aug 30, 2022
@anarcat
Contributor Author

anarcat commented Aug 30, 2022

I understand where the bug comes from. What I am saying is that I think markdown-mode should ship with such a syntax table. Is there a table that I missed?

@syohex
Collaborator

syohex commented Aug 30, 2022

markdown-mode should ship with such a syntax table

This is impossible. If markdown-mode used such a syntax table, many markdown commands and highlighting would not work as expected. It would also change the behavior of many other commands, such as word movement commands (M-f, M-b, etc.), symbol movement commands (C-M-f, C-M-b, etc.), and so on.

You can modify the syntax table yourself. Please check markdown-mode-syntax-table and modify it as needed in your configuration file. (You can inspect the current buffer's syntax table with M-x describe-syntax.)
