[Issue]: Various common nouns tagged as proper noun. #1090

Closed
MarketingPip opened this issue Feb 12, 2024 · 6 comments

Comments

@MarketingPip
Contributor

MarketingPip commented Feb 12, 2024

It appears that words carrying the Actor tag are being tagged as NNP when Penn tags are computed.

i.e. "NNP": proper noun, singular.

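import nlp from "https://esm.sh/compromise";

// Return the Penn tag of the first term in the input.
function CompromiseTagger(word) {
    const doc = nlp(word);
    doc.compute('penn');
    const terms = doc.out('json')[0].terms[0];
    return terms.penn;
}
console.log(CompromiseTagger("author"));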

As far as I know, these should be tagged as "Noun" / "NN".

I assume the list needs to be cleaned up and a rule set needs to be implemented for some words. For example, "bishop" is in the list.

"Bishop" could be tagged as a proper noun if referring to a specific person's title, such as "Bishop John." or [#Actor] (#FirstName|#Person+) etc...

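The contextual rule suggested above could be sketched roughly as follows. This is a standalone illustration, not Compromise's implementation: the `ACTORS` list, the `looksLikeName` heuristic, and the `pennForActor` function are all hypothetical names introduced here.

```javascript
// Hypothetical sketch: none of these names come from Compromise.
// A tiny stand-in for the actor word list discussed above.
const ACTORS = new Set(["author", "bishop", "doctor"]);

// Crude assumption: a following capitalized word is a person's name.
function looksLikeName(word) {
  return typeof word === "string" && /^[A-Z][a-z]+/.test(word);
}

// Returns "NNP" only when an actor word is used as a title before a name,
// "NN" for a bare actor word, and null for words outside the list.
function pennForActor(words, i) {
  const word = words[i];
  if (!ACTORS.has(word.toLowerCase())) return null;
  return looksLikeName(words[i + 1]) ? "NNP" : "NN";
}

console.log(pennForActor(["Bishop", "John"], 0)); // NNP
console.log(pennForActor(["the", "bishop", "spoke"], 1)); // NN
```

A real rule would need a better name check than capitalization, for example the tagger's own person tags.
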
@MarketingPip changed the title from "[Issue]: Common nouns tagged as proper." to "[Issue]: Various common nouns tagged as proper noun." on Feb 12, 2024
@spencermountain
Owner

hey Jared, this works for me -

nlp('author').debug() //Noun, Actor, Singular

cheers

@MarketingPip
Contributor Author

@spencermountain - I assume you're using the latest build? And have you applied the Penn tags? (I also noticed some other things that look wrong. I'd have to check some PDFs to confirm, but I think ordinal numbers are supposed to be tagged as JJ.)

PS: I pulled the build from esm, so I'll check whether I somehow didn't get the latest version and update you shortly. 🤷‍♂️

@MarketingPip
Contributor Author

MarketingPip commented Feb 12, 2024

@spencermountain - update: without Penn tags, Compromise tags / chunks it as a Noun. But once the Penn compute is applied, it turns into an NNP tag.

This example should return NNP for you:

import nlp from "https://esm.sh/compromise";

// Return the Penn tag of the first term in the input.
function CompromiseTagger(word) {
    const doc = nlp(word);
    doc.compute('penn');
    const terms = doc.out('json')[0].terms[0];
    return terms.penn;
}
console.log(CompromiseTagger("bishop"));
console.log(CompromiseTagger("doctor"));
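
To check several words at once, the repro above could be wrapped in a small helper. This is only a sketch, assuming the same `compute('penn')` and `out('json')` calls used in the example above. The helper name `pennTags` is made up here, and `nlp` is passed in as a parameter so the function does not depend on how the library is loaded.

```javascript
// Hypothetical helper (not part of Compromise): maps each word to the Penn tag
// of its first term, using the same calls as the repro above.
function pennTags(nlp, words) {
  const result = {};
  for (const word of words) {
    const doc = nlp(word);
    doc.compute("penn");
    const sentence = doc.out("json")[0];
    // Guard against empty input, where there is no sentence or term to read.
    result[word] = sentence && sentence.terms[0] ? sentence.terms[0].penn : undefined;
  }
  return result;
}
```

With the library imported as in the example above, `pennTags(nlp, ["author", "bishop", "doctor"])` would return an object keyed by word. Per the report above, an affected build is expected to give `NNP` for each of them.
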

spencermountain added a commit that referenced this issue Feb 13, 2024
@spencermountain
Owner

agh, my apologies Jared, you're right.
found the errant NNP tag in the mapping. Thank you for your help.
will release a fix for this, this week.
thanks

@MarketingPip
Contributor Author

MarketingPip commented Feb 14, 2024

No worries! I thought I was going crazy (I was trying to set up some demos of Compromise tagging things for the HMM model I was showing you) until I started doing some digging hahahah!

That said, I'm hoping you'll be pumped about the HMM model (and maybe consider taking Compromise in that direction, with some rules on top). I'm seeing weirdly good accuracy on tags without any rules, and even crazier results with some basic rules plus the help of Compromise (and my half-assed brain lol). Applying the Compromise rule set after predicting tags should blow things out of the water.

PS: Ordinal numbers are to be tagged as adjectives (JJ) - see here for more guidelines.
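
As an illustration of the ordinal point, a mapping step could check for an ordinal tag before falling back to the cardinal case. This sketch assumes terms carry Compromise-style `Ordinal` and `Cardinal` tag names. The `pennForNumber` function is hypothetical and is not the library's mapping.

```javascript
// Hypothetical sketch, not Compromise's actual Penn mapping.
// `tags` is assumed to be an array of tag names on a single term.
function pennForNumber(tags) {
  if (tags.includes("Ordinal")) return "JJ"; // e.g. "first", "2nd"
  if (tags.includes("Cardinal")) return "CD"; // e.g. "one", "2"
  return null; // not a number term
}

console.log(pennForNumber(["Value", "Ordinal"])); // JJ
console.log(pennForNumber(["Value", "Cardinal"])); // CD
```
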

@spencermountain
Owner

fixed in 14.12.0 - thank you!


2 participants