This repository has been archived by the owner on Feb 28, 2021. It is now read-only.

Failed to detect number substitutions #5

Closed
priyankagv1 opened this issue Mar 19, 2019 · 9 comments
Labels
bug Something isn't working

Comments

@priyankagv1

priyankagv1 commented Mar 19, 2019

When trying to identify profane words, sh1t is not identified as profane.
The Levenshtein approach should have matched this variation to the original profane word.
Also, I see that sh1t is listed in the profane word dictionary. Could you please check where the problem is?
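
For reference, here is a minimal sketch of the Levenshtein (edit) distance the report refers to. It is the textbook dynamic-programming algorithm, not the library's own code, and the function name `levenshtein` is an assumption made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("sh1t", "shit"))  # 1
```

With a distance of 1, a threshold-based matcher would be expected to treat sh1t as a variation of the original word, provided it sees the whole string.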

rominf added the bug label Mar 20, 2019
@rominf
Owner

rominf commented Mar 20, 2019

Thank you for the report. The problem was that the spaCy tokenizer split sh1t into the tokens sh1 and t. I fixed this by adding all profane words to the tokenizer's special cases. Please use the latest version from PyPI.

@rominf rominf closed this as completed Mar 20, 2019
@priyankagv1
Author

Thank you so much. Will try it and let you know!

@rominf
Owner

rominf commented Mar 20, 2019

Forgot to mention: with my improvements, sh1t is detected fine (because it's in the profane word dictionary), but sh5t is still split into two tokens and is therefore not detected. I don't know how to fix this yet.
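
The difference between the two cases can be illustrated with a toy tokenizer. This is only a sketch, not spaCy's implementation: the suffix rule, the `PROFANE_WORDS` set, and the `tokenize` function are all assumptions made for illustration.

```python
import re

# Hypothetical word list; the real dictionary is much larger.
PROFANE_WORDS = {"shit", "sh1t"}

# Assumed rule: detach a single trailing letter that directly follows a digit,
# similar in spirit to tokenizers that split a unit off a number.
UNIT_SUFFIX = re.compile(r"^(.*\d)([a-z])$")


def tokenize(text, special_cases=frozenset()):
    """Split on whitespace, then apply the suffix rule unless the word is a special case."""
    tokens = []
    for word in text.lower().split():
        match = UNIT_SUFFIX.match(word)
        if word in special_cases or match is None:
            tokens.append(word)            # kept as a single token
        else:
            tokens.extend(match.groups())  # split into stem and trailing letter
    return tokens


print(tokenize("sh1t"))                               # ['sh1', 't']
print(tokenize("sh1t", special_cases=PROFANE_WORDS))  # ['sh1t']
print(tokenize("sh5t", special_cases=PROFANE_WORDS))  # ['sh5', 't']
```

Special cases only protect strings that are listed explicitly, so an unlisted variation still reaches the matcher as two fragments.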

@rominf rominf reopened this Mar 20, 2019
@rominf
Owner

rominf commented Mar 20, 2019

I've got an idea. Will try it tomorrow.

@priyankagv1
Author

Thank you!

@rominf
Owner

rominf commented Mar 22, 2019

I've got a better idea and I need more time.

@rominf
Owner

rominf commented Mar 24, 2019

Blocked by #14.

@rominf
Owner

rominf commented Mar 28, 2019

@priyankagv1, I finally solved it. Please try the latest version from PyPI.

@priyankagv1
Author

Sure, thank you!
