[OP] Join onto external dataframe, file, dictionary #17

Closed
benfred opened this issue Jun 4, 2020 · 2 comments · Fixed by #154
Labels: enhancement (New feature or request), Outbrains (Outbrains workflow)

Comments

@benfred
Member

benfred commented Jun 4, 2020

Issue by EvenOldridge
Thursday Mar 26, 2020 at 02:22 GMT
Originally opened as https://github.com/rapidsai/recsys/issues/32


Is your operator request related to a problem? Please describe.
Outbrains dataset

Join onto external data supplied in multiple formats (dataframe, file, or dictionary). The same mechanism will also be shared with https://github.com/rapidsai/recsys/issues/31 to apply the groupby operations.
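One way to handle the "multiple formats" part is to normalize every external source into a dataframe before joining. The sketch below is only an illustration of that idea: `load_external` is a hypothetical helper name, pandas is used as a CPU stand-in for cuDF, and the set of supported file extensions is an assumption, not something decided in this issue.

```python
import os
import tempfile

import pandas as pd  # stand-in for cudf in this sketch


def load_external(source):
    """Normalize a dataframe, dict, or file path into a DataFrame (hypothetical helper)."""
    if isinstance(source, pd.DataFrame):
        return source
    if isinstance(source, dict):
        # Assumes a column-name -> list-of-values mapping.
        return pd.DataFrame(source)
    if isinstance(source, (str, os.PathLike)):
        path = os.fspath(source)
        if path.endswith(".csv"):
            return pd.read_csv(path)
        if path.endswith(".parquet"):
            return pd.read_parquet(path)
        raise ValueError(f"unsupported file type: {path}")
    raise TypeError(f"unsupported external source: {type(source).__name__}")


# The same table, supplied as a dict and as a CSV file.
table = {"id": [1, 2], "bid": [0.5, 1.5]}
from_dict = load_external(table)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "ads.csv")
    pd.DataFrame(table).to_csv(csv_path, index=False)
    from_file = load_external(csv_path)
```

With the source normalized up front, the join itself only ever has to deal with one type.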

Describe the solution you'd like

  • Type: Feature Engineering
  • input column type(s): [Categorical]
  • input options:
    • Columns to join on: [Categorical] (in case the column names differ between the two sides?)
    • Categorical column names to be joined
    • Continuous column names to be joined
  • Expected transformation of the data after application: Data is joined
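The options listed above could map onto a single function roughly as follows. This is a minimal sketch under assumptions: `join_external` and its parameter names are hypothetical, pandas stands in for cuDF, and a left join is assumed as the default since the issue does not specify the join type.

```python
import pandas as pd  # stand-in for cudf in this sketch


def _as_list(cols):
    """Accept a single column name or a sequence of names."""
    return [cols] if isinstance(cols, str) else list(cols)


def join_external(df, ext, on, on_ext=None, cat_names=(), cont_names=(), how="left"):
    """Join selected categorical/continuous columns of `ext` onto `df` (hypothetical)."""
    on = _as_list(on)
    # Key names on the external side default to the same names as on `df`.
    on_ext = on if on_ext is None else _as_list(on_ext)
    # Bring over only the keys plus the requested columns.
    wanted = [c for c in list(cat_names) + list(cont_names) if c not in on_ext]
    right = ext[on_ext + wanted]
    out = df.merge(right, how=how, left_on=on, right_on=on_ext)
    # When key names differ, pandas keeps both key columns; drop the external ones.
    extra_keys = [c for c in on_ext if c not in on]
    return out.drop(columns=extra_keys)


events = pd.DataFrame({"ad_id": [1, 2, 3, 2]})
ads = pd.DataFrame(
    {"id": [1, 2], "campaign": ["a", "b"], "bid": [0.5, 1.5], "unused": [0, 0]}
)
joined = join_external(
    events, ads, on="ad_id", on_ext="id", cat_names=["campaign"], cont_names=["bid"]
)
```

Rows with no match on the external side (here `ad_id == 3`) come back with missing values in the joined columns, which is what a left join implies.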

Optional: Describe operation stages in detail
Statistics per chunk: N/A
Statistics combine: N/A
Apply: Join

Context:
Note that this is meant to be an iterator-to-gdf join (each chunk from the iterator is joined against a single in-memory GPU dataframe), not an iterator-to-iterator join, which we'll figure out in the future.
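To make the iterator-to-gdf shape concrete, here is a rough sketch: the external table is held in memory once, and each chunk coming off the iterator is joined against it. `join_chunks` is a hypothetical name, and pandas again stands in for cuDF so the example runs without a GPU.

```python
import pandas as pd  # stand-in for cudf in this sketch


def join_chunks(chunks, ext, on, how="left"):
    """Yield each chunk joined against one in-memory frame (hypothetical helper)."""
    for chunk in chunks:
        # `ext` is reused for every chunk, so it must fit in memory.
        yield chunk.merge(ext, how=how, on=on)


ext = pd.DataFrame({"ad_id": [1, 2], "campaign": ["a", "b"]})
chunks = [
    pd.DataFrame({"ad_id": [1, 2]}),
    pd.DataFrame({"ad_id": [2, 3]}),
]
joined = list(join_chunks(iter(chunks), ext, on="ad_id"))
```

An iterator-to-iterator join would not fit this pattern, since neither side could be assumed to fit in memory at once.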

@benfred added the enhancement and Outbrains labels Jun 4, 2020
@rnyak
Contributor

rnyak commented Jul 6, 2020

Related cudf issue: rapidsai/cudf#5621

@benfred
Member Author

benfred commented Jul 13, 2020

We should pass an nvtabular.Dataset in to the operator, and merge this with the groupby apply.
