Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removes exceptions from splitters #72

Closed
wants to merge 1 commit into from
Closed

Conversation

asdqwex
Copy link

@asdqwex asdqwex commented Aug 9, 2017

Not sure why that is in there but it looks like it is unneeded. Feel free to merge or reject as you see fit.

@coveralls
Copy link

coveralls commented Aug 9, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling bf45b5a on asdqwex:master into 4880754 on jsvine:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling bf45b5a on asdqwex:master into 4880754 on jsvine:master.

@jsvine
Copy link
Owner

jsvine commented Aug 10, 2017

Thanks @asdqwex! If I recall correctly — and it's been a while since I wrote those lines — the exceptions are meant to keep certain period-separated abbreviations stuck together, e.g., so "C.I.A." remains "C.I.A." and not three sentences ("C", "I", "A"). Ideally, markovify would have a more complete list of these abbreviations ... or a more sophisticated/generalized approach to detecting non-sentence-ending-periods. What do you think?

@alpha-beta-soup
Copy link

alpha-beta-soup commented May 24, 2018

Shouldn't it just consider any letter/s-separated-by-periods as acronyms, and not break them? U.S.A. and USA are semantically the same thing. I can't think of many cases where any period that is not followed by a whitespace character is sentence-ending. You just have to be careful to note that not all acronyms are \b(?:[a-zA-Z]\.){2,}, but also \b(?:(?:[a-zA-Z])+\.){2,} (the latter matches Ph.D whereas the former does not).

@jsvine
Copy link
Owner

jsvine commented May 24, 2018

That sounds like a promising approach. Thank you for suggesting it.

jsvine added a commit that referenced this pull request Aug 2, 2020
@jsvine
Copy link
Owner

jsvine commented Aug 2, 2020

Very belatedly, thank you for the suggestion. v0.8.3 now checks for a generic pattern, rather than a preformed list.

@jsvine jsvine closed this Aug 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants