StringConcat Operator #5350

Merged: 17 commits merged on Jul 14, 2023


adityagoel4512
Contributor

Description

This PR introduces a StringConcat operator, as raised in #5339.

StringConcat takes two string tensors as input and returns the elementwise concatenation of the strings in each tensor, with support for numpy-like broadcasting. In each output element, the string from the first input comes first, followed by the string from the second input.

Examples are as follows:

StringConcat(["abc", "def"], [".com", ".net"])  ==> ["abc.com", "def.net"]
StringConcat(["tiger", "lion", "zebra"], ["s"]) ==> ["tigers", "lions", "zebras"]
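
The broadcasting behaviour in these examples can be sketched with NumPy. This is a minimal illustration, not the operator's actual implementation: the helper name `string_concat` is hypothetical, and it simply delegates to `np.char.add`.

```python
import numpy as np

def string_concat(x, y):
    """Hypothetical helper: elementwise x + y over string arrays, with NumPy broadcasting."""
    x = np.asarray(x, dtype=np.str_)
    y = np.asarray(y, dtype=np.str_)
    return np.char.add(x, y)

# A length-1 second input is broadcast against every element of the first.
suffixes = string_concat(["tiger", "lion", "zebra"], ["s"])

# Shapes (2, 1) and (3,) broadcast to (2, 3), giving every pairing.
grid = string_concat([["a"], ["b"]], ["1", "2", "3"])
```

The order of the arguments matters: swapping them would place the second input's strings first.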

This directly implements numpy.char.add and can be used to implement tf.strings.join. The following script can be used to validate this:

import numpy as np
from functools import reduce
import tensorflow as tf

onnx_concat_op = np.char.add

def tf_concat_op(inputs, separator=None):
    reduce_op = onnx_concat_op if separator is None else lambda x, y: onnx_concat_op(x, onnx_concat_op(separator, y))
    return reduce(reduce_op, inputs)

if __name__ == "__main__":
    data = ['abc', 'def']
    res1 = np.array(tf.strings.join(data).numpy()).astype(np.str_)
    res2 = tf_concat_op(data)
    np.testing.assert_equal(res1, res2)

    data = [['abc','123'],['def','456'],['ghi','789']]
    res1 = tf.strings.join(data).numpy().astype(np.str_)
    res2 = tf_concat_op(data)
    np.testing.assert_equal(res1, res2)

    data = [['abc','123'],['def','456']]
    res1 = tf.strings.join(data, separator=" ").numpy().astype(np.str_)
    res2 = tf_concat_op(data, separator=" ")
    np.testing.assert_equal(res1, res2)
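
For readers without TensorFlow installed, the same reduction can be checked against Python's built-in `str.join`. This is only a sketch under that assumption: the function name `join_with_binary_concat` is hypothetical, and `np.char.add` again stands in for the binary concatenation operator.

```python
from functools import reduce

import numpy as np

def join_with_binary_concat(inputs, separator=""):
    """Hypothetical helper: fold broadcastable string arrays with a binary concat."""
    arrays = [np.asarray(a, dtype=np.str_) for a in inputs]
    sep = np.asarray(separator, dtype=np.str_)
    # Each step computes acc + separator + next using only the binary operation.
    return reduce(lambda acc, nxt: np.char.add(np.char.add(acc, sep), nxt), arrays)

columns = [["abc", "123"], ["def", "456"], ["ghi", "789"]]
joined = join_with_binary_concat(columns, separator=" ")

# Reference result: join the i-th string of every input with the separator.
expected = [" ".join(parts) for parts in zip(*columns)]
```

Because the join is a left fold over a two-input operation, an N-input join needs N-1 concatenations without a separator, or 2(N-1) with one.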

Motivation and Context

Closes #5339.

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
@adityagoel4512 adityagoel4512 requested review from a team as code owners June 23, 2023 12:26
@xadupre
Contributor

xadupre commented Jun 23, 2023

I don't think we should update test cases for acosh and other operators not related to StringConcat.

@gramalingam
Contributor

Looks good to me.

One minor thought/question: the op looks out of place in the nn folder/file. I wonder if we should create a new folder/file for string or text? Other suggestions/inputs welcome.

@gramalingam gramalingam added the operator Issues related to ONNX operators label Jul 5, 2023
@adityagoel4512
Contributor Author

adityagoel4512 commented Jul 5, 2023

> Looks good to me.
>
> One minor thought/question: the op looks out of place in the nn folder/file. I wonder if we should create a new folder/file for string or text? Other suggestions/inputs welcome.

I'd be in favour of a text directory.

adityagoel4512 and others added 2 commits July 9, 2023 20:11
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Bumps [mypy](https://github.com/python/mypy) from 1.3.0 to 1.4.1.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
adityagoel4512 and others added 2 commits July 11, 2023 10:23
Signed-off-by: Aditya Goel <48102515+adityagoel4512@users.noreply.github.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Contributor

@gramalingam gramalingam left a comment

LGTM, thanks!

@cbourjau
Contributor

It would be great to get this merged, since several other PRs are also touching the text folder.

@xadupre xadupre added this pull request to the merge queue Jul 14, 2023
Merged via the queue into onnx:main with commit e9474bc Jul 14, 2023
37 checks passed
@adityagoel4512 adityagoel4512 deleted the string_concat_operator branch July 15, 2023 21:42
adityagoel4512 added a commit to adityagoel4512/onnx that referenced this pull request Jul 28, 2023
This PR introduces a `StringConcat` operator, as raised in
onnx#5339.

`StringConcat` takes two string tensors as input and returns the
elementwise concatenation of the strings in each tensor, with support
for numpy-like broadcasting. The first input prefixes onto the second
input in the output.

Examples are as follows:

```
StringConcat(["abc", "def"], [".com", ".net"])  ==> ["abc.com", "def.net"]
StringConcat(["tiger", "lion", "zebra"], ["s"]) ==> ["tigers", "lions", "zebras"]
```

This directly implements
[numpy.char.add](https://numpy.org/doc/stable/reference/generated/numpy.char.add.html#numpy.char.add)
and can be used to implement
[tf.strings.join](https://www.tensorflow.org/api_docs/python/tf/strings/join).
The following script can be used to validate this:

```
import numpy as np
from functools import reduce
import tensorflow as tf

onnx_concat_op = np.char.add

def tf_concat_op(inputs, separator=None):
    reduce_op = onnx_concat_op if separator is None else lambda x, y: onnx_concat_op(x, onnx_concat_op(separator, y))
    return reduce(reduce_op, inputs)

if __name__ == "__main__":
    data = ['abc', 'def']
    res1 = np.array(tf.strings.join(data).numpy()).astype(np.str_)
    res2 = tf_concat_op(data)
    np.testing.assert_equal(res1, res2)

    data = [['abc','123'],['def','456'],['ghi','789']]
    res1 = tf.strings.join(data).numpy().astype(np.str_)
    res2 = tf_concat_op(data)
    np.testing.assert_equal(res1, res2)

    data = [['abc','123'],['def','456']]
    res1 = tf.strings.join(data, separator=" ").numpy().astype(np.str_)
    res2 = tf_concat_op(data, separator=" ")
    np.testing.assert_equal(res1, res2)

```

Closes onnx#5339.

---------

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Aditya Goel <48102515+adityagoel4512@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
adityagoel4512 added a commit to adityagoel4512/onnx that referenced this pull request Jul 28, 2023
adityagoel4512 added a commit to adityagoel4512/onnx that referenced this pull request Jul 28, 2023
Labels
operator Issues related to ONNX operators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Operator: StringConcat
5 participants