Consider umlaut forms when building tokenized map #15235
Allows for writing broadcast player replacements using the umlaut form (e.g. "Blübaum" instead of "Bluebaum") and having the replacement happen even if the name in the PGN is spelled "Bluebaum".

Previously each player replacement name was mapped into a single token string which identifies the replacement info:

    "Matthias Blübaum" -> Map("blubaum matthias" -> ReplacementInfo)

Now names with umlauts will make an additional mapping:

    "Matthias Blübaum" -> Map("blubaum matthias" -> ReplacementInfo, "bluebaum matthias" -> ReplacementInfo)

Relates: lichess-org#15152
This PR looks great, especially with the scala-cli example.
I think it'd be better to move the umlautify function into the tokenize object, as it is a part of it. It'll also simplify tokenizedPlayers a bit more.
Also added some code golf suggestions because it's fun ⛳
Co-authored-by: Thanh Le <lenguyenthanh@hotmail.com>
Eeek, the code golfing changed the behaviour of the Pull Request 😅

The post-"code golfing" version,

    private lazy val tokenizedPlayers: Map[PlayerToken, RelayPlayer] =
      players.mapKeys(umlautify.andThen(_.value).andThen(tokenize.apply))

is a 1-1 mapping: all keys are "umlautified", so we now only tokenize the umlautified form.

The pre-"code golfing" version,

    private lazy val tokenizedPlayers: Map[PlayerToken, RelayPlayer] =
      players.iterator
        .flatMap((name, player) => Set(name, umlautify(name)).map((_, player)))
        .map((name, player) => (tokenize.apply(name.value), player))
        .toMap

could insert an extra element in the map for each key: it tokenizes the original name and, if the original name has an "umlautified" form, it also tokenizes that form.

pp:ing post-"code golfing"
pp:ing pre-"code golfing"
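A minimal sketch of the difference between the two versions. It assumes simplified stand-ins for `tokenize` and `umlautify` (the real helpers are not shown in this thread), plain `String` keys instead of the PR's types, and the standard-library `map` rather than `mapKeys`; `preGolf` and `postGolf` are hypothetical names.

```scala
import java.text.Normalizer

// Assumed stand-in: strip diacritics, lowercase, sort the words.
def tokenize(name: String): String =
  Normalizer
    .normalize(name.trim, Normalizer.Form.NFD) // "ü" becomes "u" + combining mark
    .replaceAll("\\p{M}", "")                  // drop the combining marks
    .toLowerCase
    .split("\\s+")
    .filter(_.nonEmpty)
    .sorted
    .mkString(" ")

// Assumed stand-in: spell umlauts out as ae / oe / ue.
def umlautify(name: String): String =
  name
    .replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")
    .replace("Ä", "Ae").replace("Ö", "Oe").replace("Ü", "Ue")

// Pre-"code golfing" shape: tokenize the original name AND its umlautified form.
def preGolf[A](players: Map[String, A]): Map[String, A] =
  players.iterator
    .flatMap((name, player) => Set(name, umlautify(name)).map((_, player)))
    .map((name, player) => (tokenize(name), player))
    .toMap

// Post-"code golfing" shape: one key in, one key out (umlautified form only).
def postGolf[A](players: Map[String, A]): Map[String, A] =
  players.map((name, player) => (tokenize(umlautify(name)), player))

val sample = Map("Matthias Blübaum" -> "ReplacementInfo")
val preKeys  = preGolf(sample).keySet  // Set("blubaum matthias", "bluebaum matthias")
val postKeys = postGolf(sample).keySet // Set("bluebaum matthias")
```

With these stand-ins, a PGN that spells the name "Matthias Blubaum" would still match under the pre-golf shape but not under the post-golf one.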
Maybe the post-"code golfing" is fine, |
oh, sorry, I felt weird about |
Here's an implementation proposal for the player replacement part of #15152
Note: this can introduce mismatches, though. I haven't looked up any "real" mismatches, but here's a made-up one:

If a broadcast is created with a player replacement for the fictional player "Joe Sü", there will be two mappings, "joe su" and "joe sue". When "Joe Sü" then plays against the fictional WGM "Sue Joe", "Sue Joe"'s info will be replaced with "Joe Sü"'s info!

(The workaround is to change the player replacement for "Joe Sü" to "Joe Su", which avoids the additional mapping.)
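The made-up collision can be reproduced with a minimal sketch. It is not the PR's code: `tokenize`, `umlautify`, `tokenizedPlayers`, and `replacementFor` below are simplified, hypothetical stand-ins, and the replacement info is a plain string.

```scala
import java.text.Normalizer

// Assumed stand-in: strip diacritics, lowercase, sort the words.
def tokenize(name: String): String =
  Normalizer
    .normalize(name.trim, Normalizer.Form.NFD)
    .replaceAll("\\p{M}", "")
    .toLowerCase
    .split("\\s+")
    .filter(_.nonEmpty)
    .sorted
    .mkString(" ")

// Assumed stand-in: only the lowercase umlauts, enough for this example.
def umlautify(name: String): String =
  name.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")

// Each replacement name is registered under its plain and umlautified tokens.
def tokenizedPlayers(players: Map[String, String]): Map[String, String] =
  players.iterator
    .flatMap((name, info) => Set(name, umlautify(name)).map(n => tokenize(n) -> info))
    .toMap

// Look up the replacement info for a name as spelled in the PGN.
def replacementFor(table: Map[String, String], pgnName: String): Option[String] =
  table.get(tokenize(pgnName))

// "Joe Sü" registers both "joe su" and "joe sue".
val withUmlaut = tokenizedPlayers(Map("Joe Sü" -> "info for Joe Sü"))
val badHit = replacementFor(withUmlaut, "Sue Joe")    // Some("info for Joe Sü")

// Workaround: writing the replacement as "Joe Su" registers only "joe su".
val withoutUmlaut = tokenizedPlayers(Map("Joe Su" -> "info for Joe Sü"))
val noHit = replacementFor(withoutUmlaut, "Sue Joe")  // None
```

The collision comes from the word sort inside the tokenizer: "Joe Sue" and "Sue Joe" both reduce to "joe sue".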
Here's an example scala-cli application which "passes" after this patch - including the bad Sü-case and the good Blübaum-case,
umlaut.scala