[BUG] regexp: hanging when attempting to repeat string anchor inside capture group #11311

Closed
anthony-chang opened this issue Jul 20, 2022 · 0 comments · Fixed by #11373
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

anthony-chang commented Jul 20, 2022

Describe the bug
We normally throw a "nothing to repeat" error when we attempt to repeat a string anchor, but if the anchor is in a capture group, cuDF hangs.

Steps/Code to reproduce bug

>>> import cudf
>>> cudf.Series(['a']).str.contains(r'(^)+', regex=True) # hangs

>>> cudf.Series(['a']).str.contains(r'(\A)+', regex=True) # hangs

For end-of-string anchors, split neither hangs nor throws an error, while contains still hangs:

>>> cudf.Series(['a']).str.split(r'($)+', regex=True)
0    [a]
dtype: list
>>> cudf.Series(['a']).str.split(r'(\Z)+', regex=True)
0    [a]
dtype: list

>>> cudf.Series(['a']).str.contains(r'($)+', regex=True) # hangs

>>> cudf.Series(['a']).str.contains(r'(\Z)+', regex=True) # hangs
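
Because the calls above hang, it can help to run each reproducer in a child process with a timeout so the interactive session stays usable. The following is a minimal sketch, not part of cuDF: `run_with_timeout` is a hypothetical helper name, the 30-second limit is an arbitrary assumption, and a process stuck inside a GPU kernel may not always terminate promptly when killed.

```python
import subprocess
import sys


def run_with_timeout(snippet: str, timeout_s: float = 30.0) -> str:
    """Run a Python snippet in a child process.

    Returns "ok" if it exits cleanly, "error" if it exits with a
    non-zero status (for example an uncaught exception), and
    "timeout" if it does not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # One of the reproducers from this issue; requires cudf to be installed.
    repro = "import cudf; cudf.Series(['a']).str.contains(r'(^)+', regex=True)"
    print(run_with_timeout(repro))
```

With the bug present, the reproducer would be expected to report "timeout"; once the pattern is rejected at compile time, the uncaught exception would show up as "error".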

Expected behavior
Throw invalid regex pattern: nothing to repeat at position...
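
For comparison only, the sketch below checks how Python's built-in `re` module treats the same patterns. It says nothing about cuDF internals. Python's `re` rejects a quantifier applied directly to an anchor with "nothing to repeat", but it accepts the grouped forms and matches an empty string, so the error requested here for grouped anchors is a cuDF-specific choice rather than a match for CPU behavior. The helper name `compile_error` is made up for this example.

```python
import re

# Quantifier applied directly to an anchor.
BARE = [r"^+", r"\A+", r"$+", r"\Z+"]
# The same anchors wrapped in a capture group, as in this issue.
GROUPED = [r"(^)+", r"(\A)+", r"($)+", r"(\Z)+"]


def compile_error(pattern: str):
    """Return the re.error message for a pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    for pattern in BARE + GROUPED:
        print(repr(pattern), "->", compile_error(pattern))
```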

Environment overview (please complete the following information)

  • Environment location: bare-metal
  • Method of cuDF install: miniconda

Environment details

 ***git***
 commit b3e524742ee3fc57d1fd057a8b01666114245c22 (HEAD -> branch-22.08, rapidsai/branch-22.08)
 Author: nvdbaranec <56695930+nvdbaranec@users.noreply.github.com>
 Date:   Wed Jul 20 10:26:03 2022 -0500

 Fix invalid allocate_like() and empty_like() tests. (#11268)

 Fixes:  https://github.com/rapidsai/cudf/issues/11247

 These tests were using `cudf::test::expect_column_properties_equal()` which explicitly compares null counts.  However the tests were using uninitialized memory for the null mask, so it experienced random failures depending on what was returned from the memory manager.   The fix is to simply use a custom set of checks specific to these tests rather than the internal cudf functionality.

 Authors:
 - https://github.com/nvdbaranec

 Approvers:
 - Nghia Truong (https://github.com/ttnghia)
 - David Wendt (https://github.com/davidwendt)
 - Mike Wilson (https://github.com/hyperbolic2346)
 - Bradley Dice (https://github.com/bdice)

 URL: https://github.com/rapidsai/cudf/pull/11268
 ***git submodules***

 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
 NAME="Ubuntu"
 VERSION="18.04.6 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.6 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux c240m5-01 5.4.0-109-generic #123~18.04.1-Ubuntu SMP Fri Apr 8 09:48:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Wed Jul 20 09:48:36 2022
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla T4            On   | 00000000:19:00.0 Off |                    0 |
 | N/A   46C    P0    27W /  70W |   2099MiB / 15109MiB |     10%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla T4            On   | 00000000:5E:00.0 Off |                    0 |
 | N/A   53C    P0    38W /  70W |   9291MiB / 15109MiB |     42%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla T4            On   | 00000000:86:00.0 Off |                    0 |
 | N/A   43C    P0    28W /  70W |   6699MiB / 15109MiB |     37%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
 | N/A   51C    P0    42W /  70W |   6371MiB / 15109MiB |     95%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |    0   N/A  N/A      1924      G   /usr/lib/xorg/Xorg                  4MiB |
 |    0   N/A  N/A     70274      C   python                           1037MiB |
 |    0   N/A  N/A     71431      C   /usr/bin/python                  1055MiB |
 |    1   N/A  N/A      1924      G   /usr/lib/xorg/Xorg                  4MiB |
 |    1   N/A  N/A      6227      C   python                            633MiB |
 |    1   N/A  N/A     35419      C   /opt/conda/bin/python             649MiB |
 |    1   N/A  N/A     70274      C   python                           1037MiB |
 |    1   N/A  N/A     71431      C   /usr/bin/python                  6965MiB |
 |    2   N/A  N/A      1924      G   /usr/lib/xorg/Xorg                  4MiB |
 |    2   N/A  N/A     36556      C   python                            935MiB |
 |    2   N/A  N/A     42293      C   /usr/bin/python                  5757MiB |
 |    3   N/A  N/A      1924      G   /usr/lib/xorg/Xorg                  4MiB |
 |    3   N/A  N/A      6227      C   python                            633MiB |
 |    3   N/A  N/A     35419      C   /opt/conda/bin/python            3793MiB |
 |    3   N/A  N/A     36556      C   python                            935MiB |
 |    3   N/A  N/A     42293      C   /usr/bin/python                  1003MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              72
 On-line CPU(s) list: 0-71
 Thread(s) per core:  2
 Core(s) per socket:  18
 Socket(s):           2
 NUMA node(s):        2
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               85
 Model name:          Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
 Stepping:            4
 CPU MHz:             1600.741
 CPU max MHz:         3700.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            6000.00
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            1024K
 L3 cache:            25344K
 NUMA node0 CPU(s):   0-17,36-53
 NUMA node1 CPU(s):   18-35,54-71
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d

 ***CMake***

 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
 Copyright (C) 2019 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***
 /usr/local/cuda-11.5/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2021 NVIDIA Corporation
 Built on Thu_Nov_18_09:45:30_PST_2021
 Cuda compilation tools, release 11.5, V11.5.119
 Build cuda_11.5.r11.5/compiler.30672275_0

 ***Python***
 /home/antchang/miniconda3/envs/cudf/bin/python
 Python 3.9.13

 ***Environment Variables***
 PATH                            : /home/antchang/miniconda3/envs/cudf/bin:/home/antchang/.poetry/bin:/home/antchang/miniconda3/condabin:/home/antchang/.pyenv/shims:/home/antchang/.pyenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/antchang/spark/bin:/home/antchang/spark/sbin:/usr/local/cuda-11.5/bin
 LD_LIBRARY_PATH                 : :/usr/local/cuda-11.5/lib64:/usr/local/cuda-11.5/extras/CUPTI/lib64
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/antchang/miniconda3/envs/cudf
 PYTHON_PATH                     :

 ***conda packages***
 /home/antchang/miniconda3/condabin/conda
 # packages in environment at /home/antchang/miniconda3/envs/cudf:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       2_gnu    conda-forge
 abseil-cpp                20211102.0           h27087fc_1    conda-forge
 arrow-cpp                 7.0.0           py39h0f417f0_8_cuda    conda-forge
 arrow-cpp-proc            3.0.0                      cuda    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.18.1               h7f98852_0    conda-forge
 ca-certificates           2022.6.15            ha878542_0    conda-forge
 cachetools                5.0.0              pyhd8ed1ab_0    conda-forge
 cuda-python               11.7.0           py39h3fd9d12_0    nvidia
 cudatoolkit               11.7.0              hd8887f6_10    conda-forge
 cudf                      22.06.00a220531 cuda_11_py39_gd0b4e3032c_317    rapidsai-nightly
 cupy                      10.6.0           py39hc3c280e_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 fastavro                  1.5.2            py39hb9d737c_0    conda-forge
 fastrlock                 0.8              py39h5a03fae_2    conda-forge
 fsspec                    2022.5.0           pyhd8ed1ab_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 glog                      0.6.0                h6f12383_0    conda-forge
 grpc-cpp                  1.46.3               hc275302_1    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.19.3               h3790be6_0    conda-forge
 ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
 libblas                   3.9.0           15_linux64_openblas    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
 libbrotlidec              1.0.9                h166bdaf_7    conda-forge
 libbrotlienc              1.0.9                h166bdaf_7    conda-forge
 libcblas                  3.9.0           15_linux64_openblas    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   22.06.00a220531 cuda11_gd0b4e3032c_317    rapidsai-nightly
 libcurl                   7.83.1               h7bff187_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               h9b69904_4    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
 libgfortran-ng            12.1.0              h69a702a_16    conda-forge
 libgfortran5              12.1.0              hdcd56e2_16    conda-forge
 libgomp                   12.1.0              h8d9b700_16    conda-forge
 libgoogle-cloud           1.40.2               hefc27d0_0    conda-forge
 liblapack                 3.9.0           15_linux64_openblas    conda-forge
 libllvm11                 11.1.0               hf817b99_3    conda-forge
 libnghttp2                1.47.0               h727a467_0    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libopenblas               0.3.20          pthreads_h78a6416_0    conda-forge
 libprotobuf               3.20.1               h6239696_0    conda-forge
 librmm                    22.06.01             h65003ff_0    conda-forge
 libssh2                   1.10.0               ha56f1ee_2    conda-forge
 libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
 libthrift                 0.16.0               h519c5ea_1    conda-forge
 libutf8proc               2.7.0                h7f98852_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libzlib                   1.2.12               h166bdaf_2    conda-forge
 llvmlite                  0.38.1           py39h7d9a04d_0    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 ncurses                   6.3                  h27087fc_1    conda-forge
 numba                     0.55.2           py39h66db6d7_0    conda-forge
 numpy                     1.22.4           py39hc58783e_0    conda-forge
 nvtx                      0.2.3            py39h3811e60_1    conda-forge
 openssl                   1.1.1q               h166bdaf_0    conda-forge
 orc                       1.7.5                h6c59b99_0    conda-forge
 packaging                 21.3               pyhd8ed1ab_0    conda-forge
 pandas                    1.4.3            py39h1832856_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 pip                       22.1.2             pyhd8ed1ab_0    conda-forge
 protobuf                  3.20.1           py39h5a03fae_0    conda-forge
 ptxcompiler               0.4.0            py39h1eff087_0    conda-forge
 pyarrow                   7.0.0           py39h1ed2e5d_8_cuda    conda-forge
 pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
 python                    3.9.13          h9a8a25e_0_cpython    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python_abi                3.9                      2_cp39    conda-forge
 pytz                      2022.1             pyhd8ed1ab_0    conda-forge
 re2                       2022.04.01           h27087fc_0    conda-forge
 readline                  8.1.2                h0f457ee_0    conda-forge
 rmm                       22.06.00         py39hbcfe0b2_0    conda-forge
 s2n                       1.0.10               h9b69904_0    conda-forge
 setuptools                63.2.0           py39hf3d152e_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.9                hbd366e4_1    conda-forge
 spdlog                    1.10.0               h924138e_0    conda-forge
 sqlite                    3.39.1               h4ff8645_0    conda-forge
 thrust                    1.16.0               h0800d71_1    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 typing_extensions         4.3.0              pyha770c72_0    conda-forge
 tzdata                    2022a                h191b570_0    conda-forge
 wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
 xz                        5.2.5                h516909a_1    conda-forge
 zlib                      1.2.12               h166bdaf_2    conda-forge
 zstd                      1.5.2                h8a70e8d_2    conda-forge


Additional context
Related issue #10006

@anthony-chang anthony-chang added Needs Triage Need team to review and classify bug Something isn't working labels Jul 20, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Jul 20, 2022
@davidwendt davidwendt self-assigned this Jul 25, 2022
@davidwendt davidwendt added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Jul 26, 2022
rapids-bot bot pushed a commit that referenced this issue Aug 4, 2022
Adds regex compile logic to check that a quantifier can be used with the previous item even if it is within a capture group.
This prevents an infinite loop from occurring when evaluating the expression.
Additional gtests are included to check for this condition, which should throw an error.

Closes #11311

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Tobias Ribizel (https://github.com/upsj)
  - Elias Stehle (https://github.com/elstehle)

URL: #11373
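
The kind of compile-time check described in the commit message above can be sketched in Python. This is an illustration under stated assumptions, not the libcudf implementation (which is C++/CUDA and not shown in this issue): `check_quantifiers` is a hypothetical name, the grammar is deliberately tiny (literals, `^`, `$`, `\A`, `\Z`, other escapes, groups, and the `*`, `+`, `?` quantifiers), the rule that a group is repeatable only if it holds a non-anchor item is an assumption, and the reported position is simply the index of the offending quantifier.

```python
ANCHOR_CHARS = {"^", "$"}
ANCHOR_ESCAPES = {"A", "Z"}  # \A and \Z; other zero-width escapes are not modeled
QUANTIFIERS = {"*", "+", "?"}


def check_quantifiers(pattern: str) -> None:
    """Raise ValueError if a quantifier follows an anchor or an anchor-only group."""
    # One frame per open group; frame 0 stands for the whole pattern.
    #   frame[0]: the group holds at least one non-anchor item
    #   frame[1]: the most recent item in the group may be repeated
    stack = [[False, False]]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        frame = stack[-1]
        if ch == "\\" and i + 1 < len(pattern):
            repeatable = pattern[i + 1] not in ANCHOR_ESCAPES
            frame[0] = frame[0] or repeatable
            frame[1] = repeatable
            i += 1  # skip the escaped character
        elif ch == "(":
            stack.append([False, False])
            if pattern.startswith("?:", i + 1):
                i += 2  # skip the non-capturing marker
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(
                    f"invalid regex pattern: unmatched ) at position {i}"
                )
            has_non_anchor, _ = stack.pop()
            parent = stack[-1]
            # The closed group counts as one item of the enclosing group.
            parent[0] = parent[0] or has_non_anchor
            parent[1] = has_non_anchor
        elif ch in QUANTIFIERS:
            if not frame[1]:
                raise ValueError(
                    f"invalid regex pattern: nothing to repeat at position {i}"
                )
            # The flag is left unchanged so lazy forms such as a+? still pass.
        elif ch in ANCHOR_CHARS:
            frame[1] = False
        else:
            frame[0] = True
            frame[1] = True
        i += 1
    # Unclosed groups are ignored in this sketch.
```

Under these assumptions, each of the four grouped patterns from this issue is rejected before any matching is attempted, while patterns such as `(^a)+` still pass because the group contains a consuming item.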
Bug Squashing automation moved this from Needs prioritizing to Closed Aug 4, 2022