Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Japanese characters are not grabbed with this regex? #475

Closed
mikeda37 opened this issue Mar 22, 2022 · 11 comments
Closed

Japanese characters are not grabbed with this regex? #475

mikeda37 opened this issue Mar 22, 2022 · 11 comments

Comments

@mikeda37
Copy link
Contributor

const has_word = /[a-z_0-9\]}]/i

Does this regex pattern grab Japanese characters?

@spencermountain
Copy link
Owner

oh, good catch. no it doesn't.
we should probably start using the \p{Letter} syntax - both here and elsewhere!
prs welcomed.
thanks

@mikeda37
Copy link
Contributor Author

@spencermountain
ok, i understand using Unicode property escapes but i just wonder if this filtering is necessary.

sub = sub.filter((a) => a && has_word.test(a))

can you please tell me what this is for? i tried reading code but it's hard for me!
thanks

@spencermountain
Copy link
Owner

hey mikeda37, good question. I don't remember.
If it's killing-off japanese language lists, feel free to remove or change it. My policy is that if the tests pass, it's good!

The number of list formats is completely nuts, if you're digging around in that code, there's a lot of random template bits.

Lastly, if you're making a pr, I've been converting dev branch to use es-modules. I was gonna ship it today, but you're not supposed to do releases on fridays. So i may do that on monday. it may make sense to wait until then.
cheers

@mikeda37
Copy link
Contributor Author

mikeda37 commented Apr 1, 2022

ok, you don't remember! 😄 i understand your policy!
i tested omitting the line but some unit tests fail here.

test('lists - get - return all lists on the page', (t) => {

test('lists - get - if the clue is any other type then return all lists', (t) => {

you don't remember these tests too? do you think i can remove these tests?

if you want to keep those tests, i would think of using Unicode property for filtering and then i need a few weeks until pr.
you can work for es-modules converting soon!
but in case you approve to remove those two tests, i can make a pr in a few days!
thanks

@spencermountain
Copy link
Owner

ok, 10.0.0 is released, you'll need to update before making a PR, let me know if I can help.
cheers

@mikeda37
Copy link
Contributor Author

mikeda37 commented Apr 4, 2022

ok, i got it!
how do you think about deleting those two tests? I did't understand those tests contents..
do you think we should keep them? or you think I have to analyze and judge?
thanks

@spencermountain
Copy link
Owner

hey - in the anarchism page, there are a bunch of lists with citations, with render no text, so they are empty lists.
the tests are correct - we need to filter them out. Feel free to come up with whatever solution you'd like
cheers

image

@mikeda37
Copy link
Contributor Author

mikeda37 commented Apr 6, 2022

@spencermountain, thanks for your explanation!
I see, those filtering is to remove empty lists, right?
So I won't delete the filtering and instead, I will fix this regex pattern /[a-z_0-9\]}]/i as removing empty lines but not to kill the unicode characters. is it fine?

by the way, the test fails with the latest source!
cheers

@spencermountain
Copy link
Owner

yep, that sounds good.
thanks

spencermountain added a commit that referenced this issue Jul 26, 2022
@spencermountain spencermountain mentioned this issue Jul 27, 2022
Merged
@spencermountain
Copy link
Owner

hi @mikeda37 i've added unicode support to the list regex you found in 10.0.2 - please let me know if you find any others
cheers

@mikeda37
Copy link
Contributor Author

hi, @spencermountain sorry for leaving the task and thanks for the fix!
I will try the fixed code when I got time.
thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants