Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplaceWords transform is broken #20

Closed
bartvanandel opened this issue Apr 1, 2020 · 3 comments
Closed

ReplaceWords transform is broken #20

bartvanandel opened this issue Apr 1, 2020 · 3 comments

Comments

@bartvanandel
Copy link
Contributor

There are a few issues with the ReplaceWords transform. Most importantly, due to the way it's implemented, it actually alters any matching substring:

ReplaceWords({'foo': 'bar'})('foobar') returns barbar

This isn't necessarily a problem (it could be useful), but it is not consistent with the function name. I see two possibible fixes:

  • Change internals to re.sub(r'\b{}\b'.format(re.escape(key)), value, s). Note that \b is the word boundary mark.
  • Rename the function to ReplaceRegexes (or keep both). The example should reflect the current behavior. Regex replaces can be nice: you can for instance use ReplaceWords({r'\b(foo|bar)baz\b': r'\1'}) to remove 'baz' from 'foobaz' and 'barbaz' but keep it in all other cases (artificial example, I know).

The above indicates that this function (or these functions, if you go for the second option) can use a unit test. This is currently missing..

Finally, the transform is not included in transforms.__all__, and as a result it isn't directly available on jiwer, but only as part of jiwer.transforms.

@nikvaessen
Copy link
Collaborator

Any thoughts on #21 ?

@nikvaessen
Copy link
Collaborator

Thanks for raising the issue! Should be fixed with #21.

@bartvanandel
Copy link
Contributor Author

That was fast, great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants