Skip to content

Conversation

RockingJavaBean
Copy link
Contributor

@RockingJavaBean RockingJavaBean commented May 20, 2020

Resolve #38207

Below is the description of split function according to Python doc.

If sep is not specified or is None,  a different splitting algorithm is applied:  
runs of consecutive whitespace are regarded as a single separator, 
and the result will contain no empty strings at the start or end 
if the string has leading or trailing whitespace.

The logic to handle both none and empty separators is added in register_string_ops.cpp as fix.

Signed-off-by: Xiong Wei xiongw.fnst@cn.fujitsu.com

@RockingJavaBean RockingJavaBean requested a review from apaszke as a code owner May 20, 2020 11:48
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label May 20, 2020
@dr-ci
Copy link

dr-ci bot commented May 20, 2020

💊 CI failures summary and remediations

As of commit a2b0b17 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jun 17 02:28:07 RuntimeError: Process 2 terminated or timed out after 100.05739402770996 seconds
Jun 17 02:28:07 ====================================================================== 
Jun 17 02:28:07 ERROR [100.076s]: test_backward_node_failure (__main__.TensorPipeAgentDistAutogradTestWithSpawn) 
Jun 17 02:28:07 ---------------------------------------------------------------------- 
Jun 17 02:28:07 Traceback (most recent call last): 
Jun 17 02:28:07   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper 
Jun 17 02:28:07     self._join_processes(fn) 
Jun 17 02:28:07   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 306, in _join_processes 
Jun 17 02:28:07     self._check_return_codes(elapsed_time) 
Jun 17 02:28:07   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 344, in _check_return_codes 
Jun 17 02:28:07     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Jun 17 02:28:07 RuntimeError: Process 2 terminated or timed out after 100.05739402770996 seconds 
Jun 17 02:28:07  
Jun 17 02:28:07 ---------------------------------------------------------------------- 
Jun 17 02:28:07 Ran 65 tests in 310.898s 
Jun 17 02:28:07  
Jun 17 02:28:07 FAILED (errors=1) 
Jun 17 02:28:07  
Jun 17 02:28:07 Generating XML reports... 
Jun 17 02:28:07 Generated XML report: test-reports/dist-gloo/TEST-TensorPipeAgentDistAutogradTestWithSpawn-20200617022256.xml 
Jun 17 02:28:07 Traceback (most recent call last): 
Jun 17 02:28:07   File "test/run_test.py", line 720, in <module> 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 30 times.

@RockingJavaBean
Copy link
Contributor Author

In order to handle the empty separator scenario as Python split function, the default value of the separator is changed from the space to the empty string, leading to the following message.

May 20 12:38:03 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
May 20 12:38:03  
May 20 12:38:03 Broken ops: [ 
May 20 12:38:03 	aten::split.str(str self, str separator=" ", int max=-1) -> (str[]) 
May 20 12:38:03 ] 

@RockingJavaBean RockingJavaBean changed the title [WIP] Fix inconsistent results of string split func on JIT mode Fix inconsistent results of string split func on JIT mode May 20, 2020
@RockingJavaBean
Copy link
Contributor Author

RockingJavaBean commented May 20, 2020

@eellison @suo kindly request for review.

@ezyang ezyang requested review from suo and bwasti May 26, 2020 14:27
@ezyang ezyang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 26, 2020
@ezyang
Copy link
Contributor

ezyang commented May 26, 2020

I am not sure defaulting it to empty string is correct. In Python, an empty separator raises an error, and we should probably follow that behavior.

@RockingJavaBean
Copy link
Contributor Author

@ezyang Thanks for your kind reply.

The current PR fixes the inconsistent behavior for JIT mode when the separator is not specified,
while it treats split('') as same as split() and will not raise an error.

In order to follow the same behavior of Python for empty separators, defaulting the separator parameter to None is required to distinguish between split() and split('').

>>> test = 'a     a'
>>> test.split()
['a', 'a']
>>> test.split('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator

I tried setting the default value of the separator to None via new API.

m.def("split.str(str self, str separator=None, int max=-1) -> str[]");

However, the following error is thrown

RuntimeError: isString() INTERNAL ASSERT FAILED at "../aten/src/ATen/core/ivalue_inl.h":87, 
please report a bug to PyTorch. Expected String but got None

Please kindly share suggestions on how to default the separator to None.

@ezyang
Copy link
Contributor

ezyang commented May 27, 2020

you'd have to make the string optional. I'm not sure this is supported by JIT. (@suo, I'm looking to you to find some JIT side to review this, if you won't review it yourself.)

@suo
Copy link
Member

suo commented May 28, 2020

It should work to make the argument optional. You do it by adding a ? after the type identifier, like str? separator (see the schema reference if you're interested.)

@RockingJavaBean RockingJavaBean force-pushed the jit_string_split branch 2 times, most recently from 5623cc2 to 6a115de Compare May 29, 2020 10:36
@RockingJavaBean
Copy link
Contributor Author

Thanks for your kind and detailed suggestions.

I have managed to implement the optional argument in terms of separator, and follow the same logic as Python when users do not specify or input empty separators.

Please kindly review the pull request. @suo @ezyang

Copy link
Member

@suo suo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes! lgtm

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo
Copy link
Member

suo commented Jun 1, 2020

@houseroad: how should we resolve the BC incompatibility errors?

@eellison
Copy link
Contributor

eellison commented Jun 1, 2020

This isn't BC incompatible, but it is forward incompatible, you might run into issues there.

@RockingJavaBean
Copy link
Contributor Author

Thanks for the kind help and guidance through this pull request.
Please kindly confirm whether this PR is ready to be merged, or do I need to further update the code before merging?
@ezyang @suo @eellison

@suo
Copy link
Member

suo commented Jun 9, 2020

@RockingJavaBean to get the BC check to succeed, please add this schema to the whitelist here: https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/check_backward_compatibility.py#L19. Then we should be good to go. Thanks!

@RockingJavaBean RockingJavaBean force-pushed the jit_string_split branch 2 times, most recently from ae84c84 to 0c19d3f Compare June 10, 2020 05:35
@RockingJavaBean
Copy link
Contributor Author

@suo Thanks for sharing how to get the BC checks to succeed, and the aten::split schema has been added to the whitelist.
Please kindly help land this PR.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
This commit is to fix the following issue

```
{
    path: 'torch/csrc/jit/runtime/register_string_ops.cpp',
    start_line: 577,
    end_line: 577,
    start_column: 22,
    end_column: 22,
    annotation_level: 'failure',
    message: "[performance-unnecessary-value-param] warning: the parameter 'string' is copied for each invocation but only used as a const reference; consider making it a const reference"
}
```

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
Resolve the conflict of check_backward_compatibility.py
from pull request pytorch#39933

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
@RockingJavaBean
Copy link
Contributor Author

@suo thanks for your kind help on landing this PR.
However, the landing failed due to the conflict of check_backward_compatibility.py, as the commit #39933 updated the whitelist three days ago.
I have resolved the conflict and pushed to the pull request, please kindly help re-land this PR.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@RockingJavaBean
Copy link
Contributor Author

@suo thanks you so much for the help on landing this pull request.
Currently, there is a warning for Facebook Internal CI and the status is Action required.
I am not quite familiar with the Internal CI, is there any code updates that I should do or is the PR ready to be merged?

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@suo
Copy link
Member

suo commented Jun 17, 2020

Yep, the PR is ready merge! Each time you update it, we need to sync it with our internal CI system and run tests. Then when they pass, we should be good to land. I'll try to land it tonight :)

@RockingJavaBean
Copy link
Contributor Author

thank you so much throughout this pull request.
BTW: It seems test/backward_compatibility/check_backward_compatibility.py is frequently updated and often leads to the conflict of code.

@facebook-github-bot
Copy link
Contributor

@suo merged this pull request in 55bcb5d.

xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
…38772)

Summary:
Resolve pytorch#38207

Below is the description of split function according to [Python doc](https://docs.python.org/3.8/library/stdtypes.html?highlight=split#str.split).

```
If sep is not specified or is None,  a different splitting algorithm is applied:
runs of consecutive whitespace are regarded as a single separator,
and the result will contain no empty strings at the start or end
if the string has leading or trailing whitespace.
```

The logic to handle both none and empty separators is added in register_string_ops.cpp as fix.

Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
Pull Request resolved: pytorch#38772

Differential Revision: D21789612

Pulled By: suo

fbshipit-source-id: 4dfd74eda71e0bfd757378daedc927a4a63ec0e4
@RockingJavaBean RockingJavaBean deleted the jit_string_split branch October 16, 2020 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged oncall: jit Add this issue/PR to JIT oncall triage queue open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[JIT] Inconsistent results of string split func on JIT mode
7 participants