Consider umlaut forms when building tokenized map #15235

Merged
merged 5 commits into from
May 6, 2024


@tors42 tors42 commented May 5, 2024

Here's an implementation proposal for the player replacement part of #15152

Allows broadcast player replacements to be written in umlaut form (e.g. "Blübaum" instead of "Bluebaum") and still be applied even if the name in the PGN is spelled "Bluebaum".

Previously each player replacement name was mapped into a single token string which identifies the replacement info:

"Matthias Blübaum" -> Map("blubaum matthias" -> ReplacementInfo)

Now names with umlauts will make an additional mapping:

"Matthias Blübaum" -> Map("blubaum matthias"  -> ReplacementInfo,
                          "bluebaum matthias" -> ReplacementInfo)

Relates: #15152
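
To make the two-token mapping described above concrete, here is a minimal Scala sketch. The `tokenize`, `umlautify`, and `tokensFor` helpers are hypothetical stand-ins inferred from the examples in this description (strip diacritics, lowercase, sort the words); they are not the actual lila implementation.

```scala
import java.text.Normalizer

// Hypothetical tokenizer, inferred from the examples above:
// strip diacritics, lowercase, and sort the words alphabetically.
def tokenize(name: String): String =
  Normalizer
    .normalize(name, Normalizer.Form.NFD) // "ü" becomes "u" + combining diaeresis
    .replaceAll("\\p{M}", "")             // drop the combining marks
    .toLowerCase
    .split("\\s+")
    .filter(_.nonEmpty)
    .sorted
    .mkString(" ")

// Hypothetical: rewrite German umlauts in their two-letter form ("ü" -> "ue").
def umlautify(name: String): String =
  List("ä" -> "ae", "ö" -> "oe", "ü" -> "ue", "Ä" -> "Ae", "Ö" -> "Oe", "Ü" -> "Ue")
    .foldLeft(name) { case (acc, (from, to)) => acc.replace(from, to) }

// One token for names without umlauts, two tokens for names with umlauts.
def tokensFor(name: String): Set[String] =
  Set(name, umlautify(name)).map(tokenize)
```

Under these assumptions, `tokensFor("Matthias Blübaum")` yields both `blubaum matthias` and `bluebaum matthias`, while a name without umlauts such as "Magnus Carlsen" yields a single token.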

Note: this can introduce mismatches, though. I haven't looked up any "real" mismatches, but here's a made-up one:

If a broadcast is created with a player replacement for the fictional player "Joe Sü", there will be two mappings, "joe su" and "joe sue".
When "Joe Sü" then plays against the fictional WGM "Sue Joe", "Sue Joe"'s info will be replaced with "Joe Sü"'s info, because "Sue Joe" also tokenizes to "joe sue"!
(A workaround is to change the player replacement for "Joe Sü" to "Joe Su", which avoids the additional mapping.)
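
The made-up collision can be reproduced with a small sketch. It assumes a simplified tokenizer (diacritics stripped, lowercased, words sorted) and a plain `Map` from token to replacement name; `tokenize`, `umlautify`, and `replacementIndex` are illustrative names, not lila's API.

```scala
import java.text.Normalizer

// Assumed tokenizer: strip diacritics, lowercase, sort the words.
def tokenize(name: String): String =
  Normalizer
    .normalize(name, Normalizer.Form.NFD)
    .replaceAll("\\p{M}", "")
    .toLowerCase
    .split("\\s+")
    .filter(_.nonEmpty)
    .sorted
    .mkString(" ")

// Assumed umlaut expansion; only "ü" is handled to keep the sketch short.
def umlautify(name: String): String = name.replace("ü", "ue")

// Index every replacement name under its plain token and its umlaut-form token.
def replacementIndex(names: List[String]): Map[String, String] =
  names
    .flatMap(name => Set(name, umlautify(name)).map(form => tokenize(form) -> name))
    .toMap

val withUmlaut    = replacementIndex(List("Joe Sü")) // keys: "joe su", "joe sue"
val withoutUmlaut = replacementIndex(List("Joe Su")) // key:  "joe su"

// "Sue Joe" tokenizes to "joe sue", which collides with the umlaut form of "Joe Sü".
val collision  = withUmlaut.get(tokenize("Sue Joe"))    // Some("Joe Sü")
// The workaround spelling produces no "joe sue" key, so nothing matches.
val workaround = withoutUmlaut.get(tokenize("Sue Joe")) // None
```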

Here's an example scala-cli application which "passes" after this patch, including the bad Sü case and the good Blübaum case:

umlaut.scala
//> using scala 3.4.1
//> using dep io.github.tors42:chariot:0.0.87

@main
def main() =

  val lichessApi = "http://localhost:8080"
  val token      = "lip_diego"

  val broadcastReplacements = List(
                               // Tokenized form:
    "Magnus Carlsen / 2863",   // carlsen magnus
    "Senor Ramirez / 1812",    // ramirez senor
    "José Ángel / 2002",       // angel jose
    "Matthias Blübaum / 2649", // blubaum matthias (+ bluebaum matthias)
    "Joe Sü / 1700 / FM"       // joe su           (+ joe sue)
  ).mkString("\n")

  val incomingPGN =
    """
       [White "Matthias Bluebaum"]
       [Black "Magnus Carlsen"]

       1. d4 d5 *

       [White "Jose Angel"]
       [Black "Señor Ramirez"]

       1. d4 d5 *

       [White "Joe Sü"]
       [Black "Sue Joe"]
       [BlackElo "2000"]
       [BlackTitle "WGM"]

       1. d4 d5 *""".stripIndent().linesIterator.drop(1).mkString("\n")

  val expectedPGN =
    """
       [White "Matthias Bluebaum"]
       [Black "Magnus Carlsen"]
       [WhiteElo "2649"]
       [BlackElo "2863"]

       1. d4 d5 *

       [White "Jose Angel"]
       [Black "Señor Ramirez"]
       [WhiteElo "2002"]
       [BlackElo "1812"]

       1. d4 d5 *

       [White "Joe Sü"]
       [Black "Sue Joe"]
       [WhiteElo "1700"]
       [WhiteTitle "FM"]
       [BlackElo "1700"]
       [BlackTitle "FM"]

       1. d4 d5 *""".stripIndent().linesIterator.drop(1).mkString("\n")

  val client = chariot.Client.basic(conf => conf.api(lichessApi))
    .withToken(token)

  val broadcast = client.broadcasts().create(p => p
    .name("Broadcast Name")
    .shortDescription("Short Broadcast Description")
    .longDescription("Looooong Broadcast Description")
    .players(broadcastReplacements)).get()

  val round = client.broadcasts().createRound(broadcast.id(),
    p => p.name("Round Name")).get()

  client.broadcasts().pushPgnByRoundId(round.id(), incomingPGN)

  def filterTags(pgn: chariot.model.Pgn): chariot.model.Pgn =
    chariot.model.Pgn.of(
      pgn.tags().stream()
        .filter(tag => Set(
          "White", "WhiteElo", "WhiteTitle", "Black", "BlackElo", "BlackTitle"
        ).contains(tag.name()))
        .toList(),
        pgn.moves())

  val exportedPGN = String.join("\n\n",
    client.broadcasts().exportPgn(broadcast.id())
      .stream()
      .map(filterTags(_))
      .map(_.toString)
      .toList())

  if exportedPGN == expectedPGN then
    println("Exported PGN matched expected PGN")
  else
    println(s"""\nExpected:\n$expectedPGN\nActual:\n$exportedPGN\nDiffEnd""")

Member

@lenguyenthanh lenguyenthanh left a comment

This PR looks great, especially with the scala-cli example.

I think it'd be better to move the umlautify function into the tokenize object, as it is part of it. That would also simplify tokenizedPlayers a bit more.

Also added some code golf suggestions because it's fun ⛳

modules/relay/src/main/RelayPlayers.scala (review thread, outdated, resolved)
modules/relay/src/main/RelayPlayers.scala (review thread, resolved)
ornicar and others added 4 commits May 6, 2024 10:51
Co-authored-by: Thanh Le <lenguyenthanh@hotmail.com>
* master:
  move app rate limiters to web
  Revert necessary import for pipe
  ensure all rate limiters are configured
  move ctrl limiters to web - WIP
  scala tweaks
  report donation stats every 24h
  New Crowdin updates (lichess-org#15226)
  chessground redrawAll
  scss tweaks
  put is3d in board change event
  prettier
  Don't refetch same css, cached or no
  brightness/opacity on last move & check squares, but no hue rotate
  disable social links on kid profiles
  Hide kid teams on profile
  fix 3d piece z index when toggling 3d after page load
@ornicar ornicar merged commit 0b2a2dd into lichess-org:master May 6, 2024
3 checks passed
@tors42
Contributor Author

tors42 commented May 6, 2024

Eeek, the code golfing changed the behaviour of the Pull Request 😅

The post-"code golfing" version,

private lazy val tokenizedPlayers: Map[PlayerToken, RelayPlayer] =
    players.mapKeys(umlautify.andThen(_.value).andThen(tokenize.apply))

is a 1-1 mapping: all keys are "umlautified", so we now only tokenize the umlautified form.

The pre-"code golfing" version,

private lazy val tokenizedPlayers: Map[PlayerToken, RelayPlayer] =
    players.iterator
      .flatMap((name, player) => Set(name, umlautify(name)).map((_, player)))
      .map((name, player) => (tokenize.apply(name.value), player))
      .toMap

could insert an extra element into the map for each key: it tokenized the original name and, if the name had a distinct "umlautified" form, it also tokenized that form.
(Set(name, umlautify(name)) contains 1 or 2 entries.)

Printing (pp) the post-"code golfing" map:

HashMap(
    bluebaum matthias -> RelayPlayer(None,Some(2649),None,None)
    joe sue -> RelayPlayer(None,Some(1700),Some(FM),None)
    carlsen magnus -> RelayPlayer(None,Some(2863),None,None)
    ramirez senor -> RelayPlayer(None,Some(1812),None,None)
    angel jose -> RelayPlayer(None,Some(2002),None,None)
)

Printing (pp) the pre-"code golfing" map:

HashMap(
    blubaum matthias -> RelayPlayer(None,Some(2649),None,None)
    bluebaum matthias -> RelayPlayer(None,Some(2649),None,None)
    joe su -> RelayPlayer(None,Some(1700),Some(FM),None)
    joe sue -> RelayPlayer(None,Some(1700),Some(FM),None)
    carlsen magnus -> RelayPlayer(None,Some(2863),None,None)
    ramirez senor -> RelayPlayer(None,Some(1812),None,None)
    angel jose -> RelayPlayer(None,Some(2002),None,None)
)

Maybe the post-"code golfing" is fine,
but it "loses support" for the (plausible?) [White "Matthias Blubaum"] spelling...
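
The difference between the two versions can be sketched with plain strings. This is a simplified model, not the lila code: `tokenize` and `umlautify` are assumed stand-ins (diacritics stripped, lowercased, words sorted; "ü" expanded to "ue"), ratings stand in for `RelayPlayer`, and the PGN name is assumed to be looked up by plain tokenization.

```scala
import java.text.Normalizer

// Assumed tokenizer: strip diacritics, lowercase, sort the words.
def tokenize(name: String): String =
  Normalizer
    .normalize(name, Normalizer.Form.NFD)
    .replaceAll("\\p{M}", "")
    .toLowerCase
    .split("\\s+")
    .filter(_.nonEmpty)
    .sorted
    .mkString(" ")

// Assumed umlaut expansion; only "ü" is handled to keep the sketch short.
def umlautify(name: String): String = name.replace("ü", "ue")

// Replacement name -> rating (a stand-in for RelayPlayer).
val players: Map[String, Int] =
  Map("Matthias Blübaum" -> 2649, "Joe Sü" -> 1700, "Magnus Carlsen" -> 2863)

// Post-"code golfing" shape: exactly one key per player, the umlautified token.
val postGolf: Map[String, Int] =
  players.map((name, elo) => tokenize(umlautify(name)) -> elo)

// Pre-"code golfing" shape: the plain token plus, if different, the umlautified token.
val preGolf: Map[String, Int] =
  players.iterator
    .flatMap((name, elo) => Set(name, umlautify(name)).map(_ -> elo))
    .map((name, elo) => tokenize(name) -> elo)
    .toMap

// The "Blubaum" spelling is only found in the pre-"code golfing" map.
val blubaumPost = postGolf.get(tokenize("Matthias Blubaum")) // None
val blubaumPre  = preGolf.get(tokenize("Matthias Blubaum"))  // Some(2649)
```

In this model the post-"code golfing" map has three keys and the pre-"code golfing" map has five, matching the shape of the printed maps above.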

@lenguyenthanh
Member

lenguyenthanh commented May 7, 2024

Oh, sorry, I felt weird about Set(name, umlautify(name)) but then ignored it 😓. Let me fix it.
