[REVIEW] String support for cuDF #1032

Merged
merged 202 commits into rapidsai:branch-0.6 on Mar 15, 2019

Conversation

kkraus14
Collaborator

@kkraus14 kkraus14 commented Feb 22, 2019

Fixes #825

Still lots to do:

  • Requires nvstrings 0.3 which is unreleased
  • Write unit tests for concatenating string columns and finish implementation
  • Write tests for using nvstring columns in relevant libcudf functions (join, groupby)
  • Write unit tests for the various string functions that will be exposed
  • Finish fixing getitem behavior for Series of string dtype
  • Write Cython and other lower level interop hooks for nvstrings objects
  • Expose functions from nvstrings as functions for Series objects
  • More...

felipeblazing and others added 16 commits February 5, 2019 20:14
…_CATEGORY which will be used to store the NVCATEGORY
…ementation for the csv reader since we would not really be using convertStrToValue for nvstring the way it is currently implemented
…n principle, added NVCategory to CMake, added assert(false) to csv type convertor to make sure no one tries to read nvcategory data from csv
…appen in release mode, will be cleaned up after
@kkraus14 kkraus14 added the "2 - In Progress" (currently a work in progress) and "cuDF (Python)" (affects Python cuDF API) labels Feb 22, 2019
@kkraus14 kkraus14 self-assigned this Feb 22, 2019
@randerzander randerzander added this to Next release in Bug Squashing Feb 23, 2019
@kkraus14 kkraus14 added this to PR-WIP in v0.6 Release via automation Feb 25, 2019
@kkraus14
Collaborator Author

@randerzander @beckernick Example of using nvstrings functions via cudf Series:

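import numpy as np
import cudf
import nvstrings
from librmm_cffi import librmm as rmm  # assumed import path for rmm in this era of cuDF

data = ['a', 'b', 'c', 'd', 'e']
nvs = nvstrings.to_device(data)  # same strings as a raw nvstrings object (unused below)
gs = cudf.Series(data)

# Preallocate a device int32 buffer and pass its raw pointer to receive the lengths
output_dev_array = rmm.to_device(np.empty(5, dtype='int32'))
gs.str.len(output_dev_array.device_ctypes_pointer.value)
print(output_dev_array.copy_to_host())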

Note: I'll make these much more Pythonic in this PR, but this works today if you want to start kicking the tires now.
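
The calling convention in the example above (the caller preallocates an int32 output buffer and passes its raw address) can be sketched on the host with only the standard library. This is a rough illustration, not the cuDF or nvstrings implementation: `str_len_into` and `str_len` are hypothetical names, and a `ctypes` array stands in for the device buffer. The second function shows one way a more Pythonic wrapper might hide the buffer handling.

```python
import ctypes


def str_len_into(strings, out_ptr):
    """Hypothetical stand-in for the pointer-based call: write len(s) for each
    string into the int32 buffer that starts at the integer address out_ptr."""
    out = (ctypes.c_int32 * len(strings)).from_address(out_ptr)
    for i, s in enumerate(strings):
        out[i] = len(s)


def str_len(strings):
    """Hypothetical wrapper: allocate the output buffer internally and return
    the lengths, so callers never handle a raw pointer."""
    buf = (ctypes.c_int32 * len(strings))()
    str_len_into(strings, ctypes.addressof(buf))
    return list(buf)


data = ['a', 'b', 'c', 'd', 'e']
lengths = str_len(data)
print(lengths)  # [1, 1, 1, 1, 1]
```

With a wrapper of this shape, the caller's code reduces to a single call that returns the result, which is the usability gap the note above refers to.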

@kkraus14
Collaborator Author

rerun tests

@kkraus14 kkraus14 changed the title from "[WIP] String support for cuDF" to "[REVIEW] String support for cuDF" Mar 15, 2019
@kkraus14 kkraus14 added the "3 - Ready for Review" (ready for review by team) label and removed the "2 - In Progress" (currently a work in progress) label Mar 15, 2019
@kkraus14
Collaborator Author

rerun tests

@kkraus14 kkraus14 added the "! - Release" label and removed the "3 - Ready for Review" (ready for review by team) label Mar 15, 2019
Member

@raydouglass raydouglass left a comment

Looks good from the CI & build side

@kkraus14 kkraus14 merged commit c1debef into rapidsai:branch-0.6 Mar 15, 2019
v0.6 Release automation moved this from PR-Reviewer approved to Done Mar 15, 2019
Labels
cuDF (Python) Affects Python cuDF API.
Projects
No open projects
v0.6 Release
  
Done