"to" is a preposition and not a conjunction #1107

Open
NikhilVerma opened this issue May 2, 2024 · 2 comments
Comments

@NikhilVerma
https://www.dictionary.com/browse/to

I am trying to build a sentence separator which can split a sentence if it has multiple verb or noun conjunctions.

The current approach is something like this:

	const conjunctionSplit = doc
		.splitOn("#Adverb? #Verb (#Conjunction|,)")
		.splitOn("(#Conjunction|,) #Adverb? #Verb");

However, a sentence like "An organisation should make best efforts to protect it's hardware and software." gets parsed as:

[
    "An organisation should make best efforts",
    "to protect",
    "it's",
    "hardware and",
    "software."
]

which should be parsed as

[
    "An organisation should make best efforts to protect it's",
    "hardware and",
    "software."
]

My current workaround is to do this:

world.model.one.lexicon.to = "Preposition";

It's awesome that compromise lets me edit the lexicon so easily, but I think it should be updated in the main library as well.
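
For anyone landing here with the same problem: the workaround above can probably also be done through the documented `nlp.addWords()` call instead of reaching into `world.model` directly. A minimal sketch, assuming compromise v14. `splitOnConjunctions` is a made-up helper name, and `nlp` is passed in as an argument so the snippet does not depend on how the library is loaded. I have not checked that it reproduces the desired split exactly.

```javascript
// Sketch only: `splitOnConjunctions` is a hypothetical helper, not part of compromise.
// `nlp` is the compromise entry point, supplied by the caller.
function splitOnConjunctions(nlp, text) {
  // Re-tag "to" as a Preposition through the public addWords() call.
  // Note: this changes the lexicon for every later nlp() call as well.
  nlp.addWords({ to: "Preposition" });

  // Same two splitOn() patterns as in the original post.
  return nlp(text)
    .splitOn("#Adverb? #Verb (#Conjunction|,)")
    .splitOn("(#Conjunction|,) #Adverb? #Verb")
    .out("array");
}

// Usage (assumes compromise is installed):
//   import nlp from "compromise";
//   const parts = splitOnConjunctions(
//     nlp,
//     "An organisation should make best efforts to protect it's hardware and software."
//   );
//   console.log(parts);
```

Since `addWords()` is global, it is worth calling it once at startup rather than per sentence if this ends up in real code.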

@spencermountain
Owner

hey Nikhil, yep you're right - looks like a mis-tagging by compromise in this case.
I'm happy to check it out for the next release
thanks for the heads-up
cheers

@spencermountain
Owner

hey, longer answer this time:
the Penn Treebank tagset has its own separate part-of-speech tag for TO, which I think is why it became a Conjunction in the test-set I used, and why we call it a conjunction by default in compromise. I changed it just now, and a billion tests failed. This change should probably go in a major release.

Personally, I've never been clear on the difference - 'head and tail' vs 'head to tail'. I'd love to know if you (or anyone!) have any opinions on this, of any strength - they both seem to do the same thing, to me.

gonna punt this for now. Thank you for flagging it to me
cheers
