[REVIEW] String support for cuDF #1032

Merged
merged 202 commits into rapidsai:branch-0.6 on Mar 15, 2019

Conversation

@kkraus14
Collaborator

@kkraus14 kkraus14 commented Feb 22, 2019

Fixes #825

Still lots to do:

  • Requires nvstrings 0.3 which is unreleased
  • Write unit tests for concatenating string columns and finish implementation
  • Write tests for using nvstring columns in relevant libcudf functions (join, groupby)
  • Write unit tests for the various string functions that will be exposed
  • Finish fixing getitem behavior for Series of string dtype
  • Write Cython and other lower level interop hooks for nvstrings objects
  • Expose functions from nvstrings as functions for Series objects
  • More...
felipeblazing and others added 16 commits Feb 6, 2019
…_CATEGORY which will be used to store the NVCATEGORY
…ementation for the csv reader since we would not really be using convertStrToValue for nvstring the way it is currently implemented
…n principle, added NVCategory to CMake, added assert(false) to csv type convertor to make sure no one tries to read nvcategory data from csv
…appen in release mode, will be cleaned up after
@randerzander randerzander added this to Next release in Bug Squashing Feb 23, 2019
@kkraus14 kkraus14 added this to PR-WIP in v0.6 Release via automation Feb 25, 2019
@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Feb 25, 2019

@randerzander @beckernick Example of using nvstrings functions via cudf Series:

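import numpy as np
import cudf
import nvstrings
from librmm_cffi import librmm as rmm  # assumed import path for rmm at the time of this PR

data = ['a', 'b', 'c', 'd', 'e']
nvs = nvstrings.to_device(data)  # standalone nvstrings object (not used below)
gs = cudf.Series(data)           # cuDF Series of string dtype

# Allocate an int32 device array to receive one length per string
output_dev_array = rmm.to_device(np.empty(5, dtype='int32'))
gs.str.len(output_dev_array.device_ctypes_pointer.value)  # writes into the device pointer
print(output_dev_array.copy_to_host())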

Note: I'll make these much more Pythonic in this PR, but this works today if you want to start kicking the tires now.
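
To illustrate the difference between the pointer-based call above and a more Pythonic interface, here is a minimal CPU-only sketch using the standard library's `ctypes`. It is an analogy, not the cuDF or nvstrings API: `len_into` and `str_len` are hypothetical names, and the buffer lives in host memory rather than on the GPU. `len_into` mimics the current style (the caller allocates the output and passes a raw address), while `str_len` hides the allocation and returns the result directly.

```python
import ctypes


def len_into(strings, out_ptr):
    """Hypothetical low-level call: write one int32 length per string
    into a caller-allocated buffer located at the raw address out_ptr."""
    out = (ctypes.c_int32 * len(strings)).from_address(out_ptr)
    for i, s in enumerate(strings):
        out[i] = len(s)


def str_len(strings):
    """Hypothetical Pythonic wrapper: allocate the output buffer,
    call the low-level function, and return the lengths as a list."""
    buf = (ctypes.c_int32 * len(strings))()
    len_into(strings, ctypes.addressof(buf))
    return list(buf)


data = ['a', 'b', 'c', 'd', 'e']
lengths = str_len(data)
print(lengths)
```

The wrapper keeps the pointer handling in one place, so callers never manage output buffers themselves.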

@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Mar 14, 2019

rerun tests

@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Mar 15, 2019

rerun tests

@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Mar 15, 2019

rerun tests

@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Mar 15, 2019

rerun tests

@kkraus14 kkraus14 changed the title [WIP] String support for cuDF [REVIEW] String support for cuDF Mar 15, 2019
@kkraus14
Collaborator Author

@kkraus14 kkraus14 commented Mar 15, 2019

rerun tests

Member

@raydouglass raydouglass left a comment

Looks good from the CI & build side

@kkraus14 kkraus14 merged commit c1debef into rapidsai:branch-0.6 Mar 15, 2019
8 checks passed
v0.6 Release automation moved this from PR-Reviewer approved to Done Mar 15, 2019