This repository has been archived by the owner on May 19, 2023. It is now read-only.

[REVIEW] Tokenizer with rmm integration #155

Merged: 71 commits merged on Jun 12, 2020

Commits on Apr 28, 2020

  1. Updated cmake file. Updated basicTokenizer to use rmm

    Bianca Rhodes committed Apr 28, 2020
    73a14ad

Commits on May 6, 2020

  1. Revert basicTokenizer changes

    Bianca Rhodes committed May 6, 2020
    74a36bd
  2. Updated device token ids to rmm device_vector

    Bianca Rhodes committed May 6, 2020
    da89fea

Commits on May 7, 2020

  1. code cleanup

    Bianca Rhodes committed May 7, 2020
    304dbdc
  2. Update cmake file

    Bianca Rhodes committed May 7, 2020
    6bebdf5
  3. Update device_word_indices to use rmm device vector

    Bianca Rhodes committed May 7, 2020
    9b86290
  4. Converted device_tokens_per_word to rmm device_vector

    Bianca Rhodes committed May 7, 2020
    4f54001

Commits on May 8, 2020

  1. Updated tensor_tokenIDS to rmm device_vector

    Bianca Rhodes committed May 8, 2020
    fd26d1c
  2. Update attention_mask to rmm device_vector

    Bianca Rhodes committed May 8, 2020
    d3d3174
  3. Update metadata to rmm device_vector

    Bianca Rhodes committed May 8, 2020
    a395674
  4. Update device_row2log to rmm device_vector

    Bianca Rhodes committed May 8, 2020
    c2c01f6
  5. Updated device_row2row_within_log to rmm device_vector

    Bianca Rhodes committed May 8, 2020
    87a72f5

Commits on May 11, 2020

  1. Updated copy_vec_to_device function param to rmm device vector

    cab62e3
  2. Update device_sentence_offsets to rmm device_vector

    Bianca Rhodes committed May 11, 2020
    2de9740

Commits on May 12, 2020

  1. Add device synchronize

    Bianca Rhodes committed May 12, 2020
    d771ab1

Commits on May 13, 2020

  1. Revert "Update device_sentence_offsets to rmm device_vector"

    This reverts commit 2de9740.
    Bianca Rhodes committed May 13, 2020
    d7d81e3
  2. Revert "Updated copy_vec_to_device function param to rmm device vector"

    This reverts commit cab62e3.
    Bianca Rhodes committed May 13, 2020
    c9c42a4
  3. Update device_sentence_offsets to rmm device_vector

    Bianca Rhodes committed May 13, 2020
    8e67c8a

Commits on May 14, 2020

  1. 610c8a4
  2. Updated device_chars_per_thread to rmm device vector

    Bianca Rhodes committed May 14, 2020
    38b7721
  3. Update device_code_points to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    f5b7d02
  4. Update device_num_selected to rmm device vector

    Bianca Rhodes committed May 14, 2020
    b0f8e5b
  5. Updated device_cp_metadata to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    b51f0b7
  6. Update device_aux_data to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    e4e1a4d
  7. Update device_hash_table to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    175e3b6
  8. Update device_bin_coefficients to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    43e4155
  9. Update device_bin_offsets to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    38ebaa0
  10. Update device_num_selected to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    a628753
  11. Update cub_temp_storage to rmm device_vector

    Bianca Rhodes committed May 14, 2020
    e4f3a3c

Commits on May 15, 2020

  1. Updated cub_temp_storage to rmm device_vector

    Bianca Rhodes committed May 15, 2020
    64460a0
  2. Code cleanup - remove device synchronize

    Bianca Rhodes committed May 15, 2020
    b80f814
  3. Code cleanup - replace malloc_and_copy_vec_to_device

    Bianca Rhodes committed May 15, 2020
    a8359d6
  4. Code cleanup - remove comments

    Bianca Rhodes committed May 15, 2020
    07a94cc
  5. Code cleanup. Add rmm to conda env

    Bianca Rhodes committed May 15, 2020
    b8c0594
  6. Add tokenizer test

    Bianca Rhodes committed May 15, 2020
    98c4af6
  7. Update changelog

    Bianca Rhodes committed May 15, 2020
    97f8f38
  8. 42729e0
  9. Fix test_tokenizer styling

    Bianca Rhodes committed May 15, 2020
    f51659f
  10. Merge branch 'feature/tokenizer-rmm' of https://github.com/brhodes10/clx into feature/tokenizer-rmm

    Bianca Rhodes committed May 15, 2020
    96896f1
  11. Fix flake8 error

    Bianca Rhodes committed May 15, 2020
    7cd53ea

Commits on May 19, 2020

  1. gpuCI build update

    Bianca Rhodes committed May 19, 2020
    6406e81
  2. Merge branch 'branch-0.14' into feature/tokenizer-rmm

    Bianca Rhodes committed May 19, 2020
    24e3b1d
  3. a12fc3c

Commits on May 20, 2020

  1. Update tokenizer test

    Bianca Rhodes committed May 20, 2020
    21ff922

Commits on May 21, 2020

  1. Remove tokenizer tests

    Bianca Rhodes committed May 21, 2020
    c0c73ae
  2. update gpu build

    efajardo-nv committed May 21, 2020
    9ce4862

Commits on May 22, 2020

  1. fda3af2

Commits on May 27, 2020

  1. Update conda/environments/clx_dev.yml

    Co-authored-by: Keith Kraus <keith.j.kraus@gmail.com>
    brhodes10 and kkraus14 committed May 27, 2020
    e1df4f0
  2. Update cpp/src/basicTokenizer.cu

    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed May 27, 2020
    3fcd0a0
  3. Update cpp/src/basicTokenizer.cu

    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed May 27, 2020
    0656d34
  4. Update cpp/src/basicTokenizer.cu

    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed May 27, 2020
    60d93f2
  5. Updated basicTokenizer constructor

    Bianca Rhodes committed May 27, 2020
    9123cc9
  6. remove malloc_and_copy_vec_to_device

    Bianca Rhodes committed May 27, 2020
    490a540
  7. remove channels from ci build script

    Bianca Rhodes committed May 27, 2020
    44527e3
  8. Update cpp/src/basicTokenizer.cu

    update to use thrust exclusive scan
    
    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed May 27, 2020
    2f99323
  9. Update cpp/src/basicTokenizer.cu

    update to use thrust exclusive scan
    
    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed May 27, 2020
    457805f

Commits on May 28, 2020

  1. Removed thrust exclusive scan. Revert "Update cpp/src/basicTokenizer.cu"

    This reverts commit 2f99323.
    Bianca Rhodes committed May 28, 2020
    18fb379
    Browse the repository at this point in the history
  2. Removed transfer_cp_data_to_device function

    Bianca Rhodes committed May 28, 2020
    02792f1
  3. Update device_row2log and device_row2row_within_log

    Bianca Rhodes committed May 28, 2020
    b669d81

Commits on Jun 1, 2020

  1. Add tokenizer cpp test and benchmark

    Bianca Rhodes committed Jun 1, 2020
    263a314

Commits on Jun 2, 2020

  1. Fix attention_mask size

    Bianca Rhodes committed Jun 2, 2020
    818824b

Commits on Jun 4, 2020

  1. Added google benchmark and gtest

    Bianca Rhodes committed Jun 4, 2020
    fcfe712

Commits on Jun 8, 2020

  1. Update to tokenizer benchmark build

    Bianca Rhodes committed Jun 8, 2020
    5ccd61e

Commits on Jun 9, 2020

  1. Update cmake benchmark build

    Bianca Rhodes committed Jun 9, 2020
    c5de879
  2. Update cmake benchmark build

    Bianca Rhodes committed Jun 9, 2020
    b08fe9c
  3. b096898
  4. Add google tests to cmake build

    Bianca Rhodes committed Jun 9, 2020
    18ae9cd

Commits on Jun 10, 2020

  1. cmake set warnings as errors

    Bianca Rhodes committed Jun 10, 2020
    d233829
  2. cmake build cleanup

    Bianca Rhodes committed Jun 10, 2020
    e7b6300
  3. Add tests for cuda_tokenizer_cudf

    Bianca Rhodes committed Jun 10, 2020
    83743fd
  4. Update cpp/CMakeLists.txt

    Co-authored-by: Mark Harris <mharris@nvidia.com>
    brhodes10 and harrism committed Jun 10, 2020
    83dd0aa