
[REVIEW] String support for cuDF #1032

Merged
kkraus14 merged 202 commits into rapidsai:branch-0.6 on Mar 15, 2019

@kkraus14 (Member) commented Feb 22, 2019

Fixes #825

Still lots to do:

  • Requires nvstrings 0.3, which is unreleased
  • Write unit tests for concatenating string columns and finish implementation
  • Write tests for using nvstring columns in relevant libcudf functions (join, groupby)
  • Write unit tests for the various string functions that will be exposed
  • Finish fixing getitem behavior for Series of string dtype
  • Write Cython and other lower level interop hooks for nvstrings objects
  • Expose functions from nvstrings as functions for Series objects
  • More...

felipeblazing and others added some commits Feb 6, 2019

Added NVCategory to gdf_dtype_extra_info and a type called GDF_STRING_CATEGORY which will be used to store the NVCATEGORY
added nvstring_category type to type dispatcher, created a dummy implementation for the csv reader since we would not really be using convertStrToValue for nvstring the way it is currently implemented
made a quick hacky test to make sure we can build and that it works in principle, added NVCategory to CMake, added assert(false) to csv type convertor to make sure no one tries to read nvcategory data from csv
a very dirty commit with the purposes of debugging some issues that happen in release mode, will be cleaned up after
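The commits above introduce a GDF_STRING_CATEGORY type that stores an NVCategory. As a rough illustration only, assuming NVCategory behaves like a dictionary encoding (a sorted array of unique strings, the "keys", plus one integer code per row that indexes into them), the following CPU-only NumPy sketch shows that representation. `encode_strings` and `decode_strings` are hypothetical helpers for illustration, not part of nvstrings or libcudf.

```python
import numpy as np


def encode_strings(data):
    """Dictionary-encode a list of strings (hypothetical helper, CPU-only).

    Returns the sorted unique strings ("keys") and one int32 code per row
    that indexes into those keys.
    """
    keys, codes = np.unique(np.asarray(data, dtype=str), return_inverse=True)
    return keys, codes.reshape(-1).astype(np.int32)


def decode_strings(keys, codes):
    """Recover the original strings by looking each code up in the keys."""
    return keys[codes].tolist()


data = ['b', 'a', 'c', 'a', 'b']
keys, codes = encode_strings(data)
print(keys.tolist())   # ['a', 'b', 'c']
print(codes.tolist())  # [1, 0, 2, 0, 1]
assert decode_strings(keys, codes) == data
```

One plausible motivation for this kind of encoding is that operations such as join and groupby can then work on fixed-width integer codes rather than variable-length strings.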

randerzander added this to Next release in Bug Squashing Feb 23, 2019

rommelDB and others added some commits Feb 25, 2019

kkraus14 added this to PR-WIP in v0.6 Release via automation Feb 25, 2019

@kkraus14 (Member Author) commented Feb 25, 2019

@randerzander @beckernick Example of using nvstrings functions via cudf Series:

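import numpy as np
import nvstrings
import cudf
from librmm_cffi import librmm as rmm  # RMM Python bindings as packaged for cuDF 0.6 (import path assumed)

data = ['a', 'b', 'c', 'd', 'e']
nvs = nvstrings.to_device(data)  # standalone nvstrings object (not used below)
gs = cudf.Series(data)           # cuDF Series of strings, backed by nvstrings

# Preallocate an int32 device buffer, one slot per string, and pass its raw pointer;
# str.len writes each string's length into that buffer
output_dev_array = rmm.to_device(np.empty(5, dtype='int32'))
gs.str.len(output_dev_array.device_ctypes_pointer.value)
print(output_dev_array.copy_to_host())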

Note: I'll make these much more Pythonic in this PR, but this works if you want to start kicking the tires now.
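The `gs.str.len(...)` call above uses an output-buffer convention: the caller preallocates an int32 array with one slot per string and passes its raw device pointer, and the function fills it in place. The sketch below mimics that convention on the CPU with NumPy so the semantics can be checked without a GPU. `str_len_into` is a hypothetical helper for illustration, not a cuDF or nvstrings function. For comparison, pandas' `Series.str.len()` allocates and returns a new Series instead of writing into a caller-owned buffer.

```python
import numpy as np


def str_len_into(data, out):
    """Write len(s) for each string in `data` into the preallocated `out`.

    Hypothetical helper mirroring the early `gs.str.len(ptr)` style:
    the caller owns the buffer; this function only fills it.
    """
    if out.shape != (len(data),):
        raise ValueError("out must have exactly one slot per string")
    for i, s in enumerate(data):
        out[i] = len(s)
    return out


data = ['a', 'b', 'c', 'd', 'e']
output = np.empty(len(data), dtype='int32')  # host-side stand-in for the device buffer
str_len_into(data, output)
print(output)  # [1 1 1 1 1]
```

Keeping the allocation with the caller avoids a hidden allocation inside the call, at the cost of the less Pythonic interface the note above mentions.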

kkraus14 and others added some commits Feb 26, 2019

@kkraus14 (Member Author) commented Mar 14, 2019

rerun tests

rommelDB and others added some commits Mar 14, 2019

Fix split operator with latest nvstrings changes, fix where we allocate a device array using numba instead of rmm
@kkraus14 (Member Author) commented Mar 15, 2019

rerun tests

@kkraus14 (Member Author) commented Mar 15, 2019

rerun tests

@kkraus14 (Member Author) commented Mar 15, 2019

rerun tests

kkraus14 changed the title from [WIP] String support for cuDF to [REVIEW] String support for cuDF Mar 15, 2019

@kkraus14 (Member Author) commented Mar 15, 2019

rerun tests

@raydouglass (Member) left a comment

Looks good from the CI & build side

kkraus14 merged commit c1debef into rapidsai:branch-0.6 Mar 15, 2019

8 checks passed

gpuCI/cudf-cpu-prb Build finished.
gpuCI/cudf-cpu-prb/changelog Build #2463 succeeded in 1.3 sec
gpuCI/cudf-cpu-prb/cuda10.0-py3.6 Build #14241 succeeded in 30 min
gpuCI/cudf-cpu-prb/cuda10.0-py3.7 Build #14244 succeeded in 30 min
gpuCI/cudf-cpu-prb/cuda9.2-py3.6 Build #14243 succeeded in 31 min
gpuCI/cudf-cpu-prb/cuda9.2-py3.7 Build #14242 succeeded in 29 min
gpuCI/cudf-cpu-prb/style Build #2572 succeeded in 4.6 sec
gpuCI/cudf-gpu-prb Build finished. 9664 tests run, 826 skipped, 0 failed.

v0.6 Release automation moved this from PR-Reviewer approved to Done Mar 15, 2019
