lemmatizer should skip `Typo` + `GoesWith` tags #1345

Jemoka · 2024-02-16T21:23:59Z

import stanza

nlp = stanza.Pipeline(lang="en")
nlp("Hi Andrea")

gives

...
    {
      "id": 2,
      "text": "Andrea",
      "lemma": "andreabertone@enron_development",
      "upos": "PROPN",
      "xpos": "NNP",
      "feats": "Number=Sing",
      "head": 1,
      "deprel": "vocative",
      "start_char": 3,
      "end_char": 9,
      "ner": "S-PERSON",
      "multi_ner": [
        "S-PERSON"
      ]
    }
...

causing incorrect lemmas to leak from the training data.
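To make the request concrete, here is a minimal sketch of the kind of filtering the title asks for. It is not the change that was eventually merged. The word-dict layout, the function name `lemma_training_triples`, and the sample sentence are all made up for illustration; only `Typo=Yes` and `goeswith` are the standard Universal Dependencies feature and relation. Dropping the head of a `goeswith` relation as well is an assumption of this sketch, based on my understanding that UD puts the lemma of the whole word on the first fragment.

```python
def lemma_training_triples(sentence):
    """Return (text, upos, lemma) triples that look safe for a lemma dictionary.

    `sentence` is a list of word dicts with the keys "id", "text", "upos",
    "lemma", "feats", "head" and "deprel" (a hypothetical layout for this sketch).
    """
    # ids of words that some other word attaches to with `goeswith`
    goeswith_heads = {w["head"] for w in sentence if w.get("deprel") == "goeswith"}

    triples = []
    for w in sentence:
        feats = (w.get("feats") or "").split("|")
        if "Typo=Yes" in feats:
            continue  # the surface form is a typo, so text -> lemma is misleading
        if w.get("deprel") == "goeswith" or w["id"] in goeswith_heads:
            continue  # fragment of a split word, or the head carrying the full lemma
        triples.append((w["text"], w["upos"], w["lemma"]))
    return triples


# Made-up sentence: "I left with out teh keys"
sentence = [
    {"id": 1, "text": "I", "upos": "PRON", "lemma": "I", "feats": "Case=Nom", "head": 2, "deprel": "nsubj"},
    {"id": 2, "text": "left", "upos": "VERB", "lemma": "leave", "feats": "Tense=Past", "head": 0, "deprel": "root"},
    {"id": 3, "text": "with", "upos": "ADP", "lemma": "without", "feats": None, "head": 6, "deprel": "case"},
    {"id": 4, "text": "out", "upos": "X", "lemma": "_", "feats": None, "head": 3, "deprel": "goeswith"},
    {"id": 5, "text": "teh", "upos": "DET", "lemma": "the", "feats": "Definite=Def|Typo=Yes", "head": 6, "deprel": "det"},
    {"id": 6, "text": "keys", "upos": "NOUN", "lemma": "key", "feats": "Number=Plur", "head": 2, "deprel": "obj"},
]

clean = lemma_training_triples(sentence)
```

In this toy sentence only `I`, `left` and `keys` survive, so neither `with -> without` nor `teh -> the` can end up as a dictionary entry.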


AngledLuffa · 2024-02-17T17:53:34Z

oops, also a problem here:

stanza/stanza/models/lemmatizer.py

Line 168 in 82c6968

trainer.train_dict(train_batch.doc.get([TEXT, UPOS, LEMMA]))
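For readers unfamiliar with why this call matters, here is a rough sketch of how a dictionary built from `(text, upos, lemma)` triples can memorize a single bad row. This is an assumption about the general approach, not a copy of stanza's `train_dict`; the function `build_lemma_dict` and the training rows below are hypothetical.

```python
from collections import Counter


def build_lemma_dict(triples):
    """Map (text, upos) to the lemma seen most often with it in `triples`."""
    counts = Counter(triples)
    lemma_dict = {}
    # most_common() yields triples in descending count order, so the first
    # lemma stored for each (text, upos) key is the most frequent one.
    for (text, upos, lemma), _ in counts.most_common():
        lemma_dict.setdefault((text, upos), lemma)
    return lemma_dict


# Hypothetical rows: the token appears once, with a lemma that belongs to a
# longer split word rather than to the token itself.
bad_rows = [("Andrea", "PROPN", "andreabertone@enron_development")]
leaky = build_lemma_dict(bad_rows)

# If correct rows outnumber the bad one, the lookup recovers.
mixed_rows = bad_rows + [("Andrea", "PROPN", "Andrea")] * 2
recovered = build_lemma_dict(mixed_rows)
```

With only the bad row present, a lookup for `("Andrea", "PROPN")` returns the leaked lemma. That is why rows tagged as typos or `goeswith` fragments are better filtered out before the dictionary is built than left to be outvoted.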

AngledLuffa · 2024-03-03T21:42:18Z

With version 1.8.0 or 1.8.1:

>>> pipe("Hi Andrea")
[
  [
    {
      "id": 1,
      "text": "Hi",
      "lemma": "hi",
      "upos": "INTJ",
      "xpos": "UH",
      "start_char": 0,
      "end_char": 2
    },
    {
      "id": 2,
      "text": "Andrea",
      "lemma": "Andrea",
      "upos": "PROPN",
      "xpos": "NNP",
      "feats": "Number=Sing",
      "start_char": 3,
      "end_char": 9,
      "misc": "SpaceAfter=No"
    }
  ]
]

Jemoka added the bug label Feb 16, 2024

AngledLuffa mentioned this issue Feb 17, 2024

Lemma goeswith #1346

Merged

AngledLuffa closed this as completed Mar 3, 2024
