Remove unicode characters from name string #2

Merged: nbudin merged 1 commit into gively:master from mission-met:master on Sep 30, 2019

Conversation

@rickychilcott

Contributor

The ntee categories json dataset has Unicode characters that aren't appropriate. This PR removes any Unicode characters from the name string.

You can see the dataset issues in these two examples, but there are plenty of others:
https://github.com/gively/ntee/blob/master/lib/ntee_categories.json#L1798
https://github.com/gively/ntee/blob/master/lib/ntee_categories.json#L1882
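
For readers who want to see the effect of this kind of cleanup, here is a minimal sketch of stripping every non-ASCII character from a name string. It is not the code from this PR's commit; the function name `strip_non_ascii` and the sample string are hypothetical, chosen only for illustration.

```python
def strip_non_ascii(name: str) -> str:
    """Drop every character outside the ASCII range from a name string."""
    # errors="ignore" silently discards characters that cannot be encoded as ASCII.
    return name.encode("ascii", errors="ignore").decode("ascii")


# Hypothetical input modeled on the affected entries; not copied from the dataset.
sample = "Women\u00e2\u0080\u0099s Rights"
cleaned = strip_non_ascii(sample)
print(cleaned)  # Womens Rights
```

Note that this approach drops the characters outright rather than replacing them, so a name that should contain an apostrophe ends up without one.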

@nbudin

nbudin commented Sep 30, 2019

Member

Interesting. Given where these are in the names of organizations, I'm guessing these are Windows curly quotes that got incorrectly translated to Unicode? (If so, what does this end up looking like for those two organizations after the change?)
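
One way to test this hypothesis is to treat the suspicious code points as raw bytes and decode them again as UTF-8. The sketch below assumes the three code points \u00e2, \u0080, \u0099 appear next to each other in a name; it only checks what those bytes would mean as UTF-8 and does not establish how the dataset was actually produced.

```python
import unicodedata

# Assumption: the three code points sit together, in this order, in a name string.
suspect = "\u00e2\u0080\u0099"

# Latin-1 maps code points 0-255 straight to single bytes, recovering the raw bytes.
raw = suspect.encode("latin-1")

# Read those same bytes as UTF-8 instead.
repaired = raw.decode("utf-8")

print(raw.hex())                   # e28099
print(unicodedata.name(repaired))  # RIGHT SINGLE QUOTATION MARK
```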

@rickychilcott

rickychilcott commented Sep 30, 2019

Contributor Author

I'm not sure they're curly quotes, because when I look up those Unicode characters (\u00e2, \u0080, \u0099), they are either not found or listed as NUL (null, Ctrl-@). See https://unicodelookup.com/#0080/1
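
The same lookup can be reproduced locally. This is a small sketch using Python's standard `unicodedata` module (an assumption of convenience; any Unicode database would do) to print the general category and name of each code point. Code points in the control category have no character name, which may be why a lookup site reports them as not found.

```python
import unicodedata

rows = []
for ch in ("\u00e2", "\u0080", "\u0099"):
    # unicodedata.name() returns the fallback when a code point has no name.
    name = unicodedata.name(ch, "<no name>")
    # The general category tells letters ("Ll") apart from control codes ("Cc").
    category = unicodedata.category(ch)
    rows.append((f"U+{ord(ch):04X}", category, name))

for code, category, name in rows:
    print(code, category, name)
```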

When rendered in a list, they look like this:

[Screenshot: Screen Shot 2019-09-28 at 12 22 54 PM]

This PR will convert that list to:

Disabled Persons Rights (R23)
Womens Rights (R24)
Seniors Rights (R25)
Lesbian and Gay Rights (R26)
Childrens Rights (R28)

At first you'd have thought they were ' characters, but the pattern doesn't match for all names.

@nbudin

nbudin commented Sep 30, 2019

Member

Man, so weird. In that case what you're doing definitely seems like the right solution. I'll get this merged and released. Thank you!

@nbudin nbudin merged commit 3c44068 into gively:master Sep 30, 2019
@nbudin

nbudin commented Sep 30, 2019

Member

1.0.0 is now released with this patch. Thanks again!
