Add ngrams transformer #52

Closed
andrewsmartin opened this issue Oct 11, 2017 · 2 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@andrewsmartin
Contributor

See https://github.com/tensorflow/transform/blob/master/tensorflow_transform/mappers.py#L399

andrewsmartin added the enhancement and help wanted labels Oct 11, 2017
@nevillelyh
Contributor

This can be treated as an n-hot to m-hot transformer, i.e. Seq[String] to Seq[String], where the m outputs are the n-gram combinations of the n input words. But the total vocabulary space might explode, so I'm not sure it's worth it.
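A minimal sketch of the Seq[String] to Seq[String] mapping described above, in plain Scala. The object `NGrams`, the method `ngrams`, and its `low`/`high`/`sep` parameters are hypothetical names for illustration, not Featran's API; n-grams are assumed to be contiguous word windows joined by a separator.

```scala
// Hypothetical helper, not part of Featran: expands a token sequence into
// all contiguous n-grams with low <= n <= high, joined by `sep`.
object NGrams {
  def ngrams(tokens: Seq[String], low: Int, high: Int, sep: String = " "): Seq[String] = {
    require(low >= 1 && high >= low, s"invalid n-gram range [$low, $high]")
    for {
      n <- low to high
      // Skip window sizes longer than the input; sliding() would otherwise
      // return a single short window instead of nothing.
      if n <= tokens.length
      gram <- tokens.sliding(n)
    } yield gram.mkString(sep)
  }
}
```

For example, `NGrams.ngrams(Seq("the", "quick", "brown"), 1, 2)` returns the three unigrams followed by "the quick" and "quick brown". The growth concern is real: with a vocabulary of V words there are up to V^2 possible bigrams and up to V^n possible n-grams, so the output space can be far larger than the input one.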

@richwhitjr
Contributor

This is probably one of the things to add if people ask for it. I have used something similar in the past, but I just created the n-grams before the n-hot encoder instead of in the transformer.
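The workaround mentioned above, generating n-grams before n-hot encoding, could look roughly like this. It is a plain-Scala stand-in and does not use Featran's actual n-hot transformer; `NGramNHot`, `ngrams`, `vocabulary`, and `nHot` are hypothetical names, and the vocabulary is assumed to be the sorted set of distinct n-grams seen in the data.

```scala
// Hypothetical plain-Scala sketch, not Featran's API.
object NGramNHot {
  // Preprocessing step: contiguous n-grams of a single size, joined by a space.
  def ngrams(tokens: Seq[String], n: Int): Seq[String] = {
    require(n >= 1, s"n must be >= 1, got $n")
    if (tokens.length < n) Seq.empty
    else tokens.sliding(n).map(_.mkString(" ")).toList
  }

  // "Fit": the sorted distinct n-grams across all rows define the output slots.
  def vocabulary(rows: Seq[Seq[String]]): IndexedSeq[String] =
    rows.flatten.distinct.sorted.toIndexedSeq

  // "Transform": 1.0 in each slot whose n-gram appears in the row, else 0.0.
  // N-grams missing from the vocabulary are silently ignored.
  def nHot(features: Seq[String], vocab: IndexedSeq[String]): Array[Double] = {
    val present = features.toSet
    vocab.map(f => if (present(f)) 1.0 else 0.0).toArray
  }
}
```

Because the vocabulary is built from the n-grams themselves, every distinct n-gram in the data becomes its own output dimension, which is exactly the size concern raised in the previous comment.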

andrewsmartin added a commit that referenced this issue Oct 17, 2017
andrewsmartin added a commit that referenced this issue Oct 17, 2017
Tweak ngram params.

Remove type alias.

Use stream to compute ngrams lazily
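The commit note about computing n-grams lazily can be illustrated with a small sketch, assuming n-grams are contiguous word windows joined by a space. The commit mentions a stream; this sketch swaps in an `Iterator` (a `Stream`/`LazyList` would also be lazy but memoizes what it has produced), and `LazyNGrams` is a hypothetical name, not the code from the commit.

```scala
// Hypothetical lazy variant: returns an Iterator, so each window is only
// built and joined when the caller pulls the next element.
object LazyNGrams {
  def ngrams(tokens: Seq[String], low: Int, high: Int): Iterator[String] = {
    require(low >= 1 && high >= low, s"invalid n-gram range [$low, $high]")
    // Cap the upper bound so sliding() is never asked for windows longer than the input.
    (low to math.min(high, tokens.length)).iterator.flatMap { n =>
      tokens.sliding(n).map(_.mkString(" "))
    }
  }
}
```

For example, `LazyNGrams.ngrams(tokens, 1, 3).take(10)` only builds the first ten n-grams. The trade-off is that an Iterator can be traversed only once, so a caller that needs the n-grams twice has to materialize them first.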
nevillelyh pushed a commit that referenced this issue Oct 18, 2017
Tweak ngram params.

Remove type alias.

Use stream to compute ngrams lazily

3 participants