Handle Case Insensitive Identifiers #29

Closed
derenrich opened this issue Aug 31, 2021 · 9 comments

Comments

@derenrich

Twitter usernames (for example) are case insensitive. If I tell the plugin to link a Twitter handle to an item and the same handle is already present (but in a different case), it should not add the redundant variant. You can detect whether a property is case insensitive by checking for "has quality" "case insensitivity".
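
The redundancy check requested here could be sketched roughly as follows. This is only an illustration: `hasEquivalentValue` and its parameters are hypothetical names, not part of the plugin, and it assumes the existing values of the property are already available as an array of strings.

```javascript
// Hypothetical helper (not from the plugin): returns true when `candidate`
// is already present among `existingValues`, ignoring case only if the
// property is known to be case insensitive.
function hasEquivalentValue(existingValues, candidate, caseInsensitive) {
  // Lowercase both sides only for case-insensitive properties.
  const normalize = (value) => (caseInsensitive ? value.toLowerCase() : value);
  const target = normalize(candidate);
  return existingValues.some((value) => normalize(value) === target);
}
```

With a case-insensitive property, `hasEquivalentValue(['derenrich'], 'DerEnrich', true)` reports a match, so the plugin would skip adding the second spelling.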

@fuddl
Owner

fuddl commented Sep 1, 2021

Twitter username → has quality → case insensitivity so
twitter.com/DerenRich should resolve to Q108267155 (actual value derenrich).

Alright, I'll see what I can do.
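
One way to test for that statement is an `ASK` query against the property itself. The sketch below only builds the query string. It assumes `P1552` is the ID of "has quality"; the QID of "case insensitivity" is not given in this thread, so it has to be passed in by the caller. `buildCaseInsensitivityCheck` is a hypothetical name.

```javascript
// Sketch: build an ASK query that tests whether a property carries
// "has quality" -> "case insensitivity".
// Assumptions: P1552 is "has quality"; the QID for "case insensitivity"
// is not stated in the thread and must be supplied by the caller.
function buildCaseInsensitivityCheck(prop, caseInsensitivityItem, hasQualityProp = 'P1552') {
  const idPattern = /^[PQ][1-9]\d*$/;
  // Reject anything that is not a plain entity ID before interpolating it.
  for (const value of [prop, caseInsensitivityItem, hasQualityProp]) {
    if (!idPattern.test(value)) {
      throw new Error(`Not a Wikidata entity ID: ${value}`);
    }
  }
  return `ASK { wd:${prop} wdt:${hasQualityProp} wd:${caseInsensitivityItem}. }`;
}
```

The result of such a check rarely changes, so it could be cached per property instead of being requested for every lookup.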

@fuddl
Owner

fuddl commented Sep 4, 2021

@derenrich do you have any clue about how to make a case insensitive SPARQL query?

`
  SELECT ?item
  WHERE {
    ?item wdt:${ prop } "${ id }".
  }
`

Should I just assume the string in the database is lower case?

`
SELECT ?item
WHERE {
  ?item wdt:${ prop } "${ ci ? id.toLowerCase() : id }".
}
`

@derenrich
Author

I think you have to rewrite the query with a filter that downcases both the search and target strings (or uses a regex). It's probably less efficient, but it's the only way to be sure.

@fuddl
Owner

fuddl commented Sep 5, 2021

I'm starting to wonder why we are doing this. Does any part of the twitter api downcase the usernames? Is there a url scheme that only serves the usernames lowercased? If not, why don't we store usernames how they occur? What converts them to lowercase, and why?

@camelCaseNick
Contributor

camelCaseNick commented Sep 5, 2021

We are not doing that at all. Nothing is converted to lower case. On our side, P2002 is supposed to store the preferred variant and not the lower case one.

For me it would be "camelCaseNick", not "camelcasenick", and if you clicked from within Twitter on my profile you'd get https://twitter.com/camelCaseNick. However, if you manually navigated to https://twitter.com/camelcasenick, Twitter would serve the same profile without a redirect. That is the problem here.

A functioning (though less efficient) query could be:

select ?item where {
  ?item wdt:${ prop } ?id.
  filter(lcase(?id) = "${ id.toLowerCase() }")
}
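
Putting the two query variants from this thread behind one function might look like the sketch below. It is not the code that shipped in the plugin: `buildLookupQuery` and `escapeSparqlString` are hypothetical names, and the escaping step is an addition so that quotes or backslashes in an identifier cannot break out of the string literal.

```javascript
// Escape a value for use inside a double-quoted SPARQL string literal.
function escapeSparqlString(value) {
  return value
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes stay intact
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Hypothetical helper: exact match for case-sensitive properties,
// LCASE filter (as suggested above) for case-insensitive ones.
function buildLookupQuery(prop, id, caseInsensitive) {
  if (!/^P[1-9]\d*$/.test(prop)) {
    throw new Error(`Not a property ID: ${prop}`);
  }
  if (!caseInsensitive) {
    return `SELECT ?item WHERE { ?item wdt:${prop} "${escapeSparqlString(id)}". }`;
  }
  const needle = escapeSparqlString(id.toLowerCase());
  return `SELECT ?item WHERE { ?item wdt:${prop} ?id. FILTER(LCASE(?id) = "${needle}") }`;
}
```

The filtered form has to read every value of the property before comparing, which is why it is slower than the exact match. JavaScript's `toLowerCase()` and SPARQL's `LCASE` may also disagree on some non-ASCII characters, so this is safest for ASCII-only identifiers such as Twitter usernames.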

@fuddl
Owner

fuddl commented Sep 5, 2021

> However if you manually navigated to https://twitter.com/camelcasenick Twitter would serve the same profile without a redirect. That is the problem here.

I understand that. But how likely is it? Why should anybody do that?

I found https://twitter.com/camelCaseNick using Twitter's own search. However, the page was linked with:

<link href="https://twitter.com/camelcasenick" rel="canonical">

So Twitter seems to be uncertain about which username they want to show us. I would prefer to store the one that is not lowercased because it contains more information. 🤷

@derenrich
Author

This is a general problem not limited to Twitter

fuddl added a commit that referenced this issue Sep 7, 2021
@fuddl
Owner

fuddl commented Sep 7, 2021

@derenrich Version 0.189 should now resolve case insensitive identifiers case insensitively:
DerEnrich=derenrich

If you notice any problems, feel free to reopen this issue

@fuddl fuddl closed this as completed Sep 7, 2021
@derenrich
Author

thanks!

Stvad pushed a commit to Stvad/wd that referenced this issue Dec 19, 2021
3 participants