Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can not resolve the Chinese title! #49

Closed
weihong1028 opened this issue Jan 5, 2017 · 15 comments
Closed

Can not resolve the Chinese title! #49

weihong1028 opened this issue Jan 5, 2017 · 15 comments

Comments

@weihong1028
Copy link

Chinese title will be resolved as "-"

@mixu
Copy link
Owner

mixu commented Jan 5, 2017

I think the issue is here: https://github.com/mixu/markdown-stream-utils/blob/master/lib/annotate-md-headings.js

token.id = token.text.toLowerCase().replace(/[^\w]+/g, '-');

I'm guessing the Chinese title matches /[^\w]+/g and hence becomes -. I think I'd need a piece of code that instead takes any non-word characters and converts them into base64 codes that correspond to the relevant (raw) characters since it looks like that's what the Chinese wikipedia does when it converts non-alphanumerics to a header id. Any chance you could write a PR to the repo I mentioned above with a fix (and ideally a test)?

edit: example from Wikipedia: https://zh.wikipedia.org/wiki/1949%E5%B9%B4%E5%AE%89%E5%B7%B4%E6%89%98%E5%9C%B0%E9%9C%87#.E5.9C.B0.E8.B4.A8

@weihong1028
Copy link
Author

weihong1028 commented Jan 5, 2017

I think you are right.
The Chinese title will becomes - when it matches /[^\w]+/g.
And I just use encodeURIComponent() to convert them.
token.id = encodeURIComponent(token.text.toLowerCase());
Now it works.

Another file should be fix in https://github.com/mixu/markdown-styles/blob/master/lib/convert-md.js.

var id = this.options.headerPrefix + encodeURIComponent (raw.toLowerCase());

And I find another issue, the same name heading in the toc cannot auto add -{num}

@weihong1028
Copy link
Author

weihong1028 commented Jan 5, 2017

Hello, I want to display the link raw like ../../#基本理解,not like ../../#%20%JD%90.
So I don't want to useencodeURIComponent to convert, what is your opinion?
token.id = token.text.toLowerCase();

weihong1028 pushed a commit to weihong1028/markdown-styles that referenced this issue Jan 5, 2017
@mixu
Copy link
Owner

mixu commented Jan 5, 2017

yeah I think token.id = token.text.toLowerCase().replace(/\s+/g, '-'); seems fine - according to http://stackoverflow.com/questions/70579/what-are-valid-values-for-the-id-attribute-in-html and https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/id there may be compatibility issues with non alphanumeric strings, but I think in practice that won't be an issue and the only "explicitly bad" string is whitespace

it seems slightly better than token.id = token.text.trim().toLowerCase(); because String.prototype.trim() only removes spaces at the beginning or end of the string

@mixu
Copy link
Owner

mixu commented Jan 5, 2017

thanks for looking into this! could you also add a test in https://github.com/mixu/markdown-styles/blob/master/test/integration.test.js with chinese headings (preferably with the same heading used multiple times) so that if I end up making any additional changes at some later time, the test will catch any potential problems with Chinese characters?

@weihong1028
Copy link
Author

Nice!token.id = token.text.toLowerCase().replace(/\s+/g, '-'); is better.
And I will try to add a test in integration.test.js.

@weihong1028
Copy link
Author

Hello, I think should use .trim() removes spaces at the beginning or end of the string first, and then try .replace(/\s+/g, '-').
Is it right?

@mixu
Copy link
Owner

mixu commented Jan 5, 2017

sgtm!

@weihong1028
Copy link
Author

weihong1028 commented Jan 5, 2017

I try to add a test in https://github.com/mixu/markdown-styles/blob/master/test/integration.test.js,and it can pass the test.

it('reads and renders metadata stored in a file with Chinese characters and the same heading used multiple times', function(done) {
      var dir = fixture.dir({
        'foo.md': [
          'title: 你好 世界',
          'author: Anonymous',
          '----',
          '# 世界因我而不同',
          '## 世界因我而不同',
          '### 世界因我而不同',
          '# 一个世界,一个梦想',
          '## 一个世界,一个梦想',
          '### 世界因我而不同'
        ].join('\n')
      });

      var out = fixture.dirname();

      mds.render({
        input: dir,
        output: out,
        layout: layoutDir
      }, function() {
        assert.equal(fs.readFileSync(out + '/foo.html', 'utf8'), [
            '"你好 世界" by Anonymous',
            '<h1 id="世界因我而不同">世界因我而不同</h1>',
            '<h2 id="世界因我而不同-1">世界因我而不同</h2>',
            '<h3 id="世界因我而不同-2">世界因我而不同</h3>',
            '<h1 id="一个世界,一个梦想">一个世界,一个梦想</h1>',
            '<h2 id="一个世界,一个梦想-1">一个世界,一个梦想</h2>',
            '<h3 id="世界因我而不同-3">世界因我而不同</h3>\n'
          ].join('\n'));
        done();
      });
    });

But the generate link in nav-list is wrong

@mixu
Copy link
Owner

mixu commented Jan 5, 2017

great, could you update the PRs and I'll see if I can fix that last issue tomorrow?

weihong1028 pushed a commit to weihong1028/markdown-styles that referenced this issue Jan 5, 2017
@weihong1028
Copy link
Author

Thank you, I have already update the PRs.

@weihong1028
Copy link
Author

replace(/\s+/g, '-') is wrong convert character .

"angular.directive".replace(/\s+/g, '-')
"angular.directive"
"angular directive".replace(/\s+/g, '-')
"angular-directive"

@mixu mixu closed this as completed in c3c31ae Jan 5, 2017
@mixu
Copy link
Owner

mixu commented Jan 5, 2017

cool thanks! I did a bunch of cleanup, squashed a few commits and added a test for the TOC generation. It should all be fixed if you install v3.1.9 (e.g. npm install -g markdown-styles@latest).

@mixu
Copy link
Owner

mixu commented Jan 5, 2017

thanks again, let me know if you run into any further issues!

@weihong1028
Copy link
Author

weihong1028 commented Jan 5, 2017

So great! It works! Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants