StringConcat Operator #5350

Merged
merged 17 commits into from
Jul 14, 2023
Changes from 15 commits
31 changes: 31 additions & 0 deletions docs/Changelog.md
Original file line number Diff line number Diff line change
@@ -24022,6 +24022,37 @@ This version of the operator has been available since version 20 of the default
<dd>Constrain grid types to float tensors.</dd>
</dl>

### <a name="StringConcat-20"></a>**StringConcat-20**</a>

StringConcat concatenates string tensors elementwise (with NumPy-style broadcasting support)

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>X</tt> (non-differentiable) : T</dt>
<dd>Tensor to prepend in concatenation</dd>
<dt><tt>Y</tt> (non-differentiable) : T</dt>
<dd>Tensor to append in concatenation</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Z</tt> (non-differentiable) : T</dt>
<dd>Concatenated string tensor</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(string)</dt>
<dd>Inputs and outputs must be UTF-8 strings</dd>
</dl>
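
The semantics described above can be sketched in NumPy, assuming string tensors are held as object-dtype arrays (as the test cases in this PR do). The helper name `string_concat_reference` is hypothetical and not part of ONNX; it relies only on NumPy applying Python's `+` per element, with broadcasting, to object arrays.

```python
import numpy as np


def string_concat_reference(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: elementwise x + y with NumPy-style broadcasting."""
    x_obj = np.asarray(x, dtype=object)
    y_obj = np.asarray(y, dtype=object)
    # For object arrays, np.add falls back to Python's `+`, i.e. str concatenation.
    # np.asarray re-wraps the result because 0-d inputs yield a bare scalar.
    return np.asarray(np.add(x_obj, y_obj), dtype=object)


x = np.array(["cat", "dog", "snake"]).astype("object")
y = np.array(["s"]).astype("object")
z = string_concat_reference(x, y)  # y is broadcast against every element of x
```

Note that `X` always supplies the prefix and `Y` the suffix, so swapping the inputs changes the result.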

# ai.onnx.preview.training
## Version 1 of the 'ai.onnx.preview.training' operator set
### <a name="ai.onnx.preview.training.Adagrad-1"></a>**ai.onnx.preview.training.Adagrad-1**</a>
76 changes: 76 additions & 0 deletions docs/Operators.md
@@ -146,6 +146,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#SplitToSequence">SplitToSequence</a>|<a href="Changelog.md#SplitToSequence-11">11</a>|
|<a href="#Sqrt">Sqrt</a>|<a href="Changelog.md#Sqrt-13">13</a>, <a href="Changelog.md#Sqrt-6">6</a>, <a href="Changelog.md#Sqrt-1">1</a>|
|<a href="#Squeeze">Squeeze</a>|<a href="Changelog.md#Squeeze-13">13</a>, <a href="Changelog.md#Squeeze-11">11</a>, <a href="Changelog.md#Squeeze-1">1</a>|
|<a href="#StringConcat">StringConcat</a>|<a href="Changelog.md#StringConcat-20">20</a>|
|<a href="#StringNormalizer">StringNormalizer</a>|<a href="Changelog.md#StringNormalizer-10">10</a>|
|<a href="#Sub">Sub</a>|<a href="Changelog.md#Sub-14">14</a>, <a href="Changelog.md#Sub-13">13</a>, <a href="Changelog.md#Sub-7">7</a>, <a href="Changelog.md#Sub-6">6</a>, <a href="Changelog.md#Sub-1">1</a>|
|<a href="#Sum">Sum</a>|<a href="Changelog.md#Sum-13">13</a>, <a href="Changelog.md#Sum-8">8</a>, <a href="Changelog.md#Sum-6">6</a>, <a href="Changelog.md#Sum-1">1</a>|
@@ -30080,6 +30081,81 @@ expect(node, inputs=[x, axes], outputs=[y], name="test_squeeze_negative_axes")
</details>


### <a name="StringConcat"></a><a name="stringconcat">**StringConcat**</a>

StringConcat concatenates string tensors elementwise (with NumPy-style broadcasting support)

#### Version

This version of the operator has been available since version 20 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>X</tt> (non-differentiable) : T</dt>
<dd>Tensor to prepend in concatenation</dd>
<dt><tt>Y</tt> (non-differentiable) : T</dt>
<dd>Tensor to append in concatenation</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Z</tt> (non-differentiable) : T</dt>
<dd>Concatenated string tensor</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(string)</dt>
<dd>Inputs and outputs must be UTF-8 strings</dd>
</dl>
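
As a rough illustration of what "NumPy-style broadcasting" implies for the shape of `Z`, the sketch below combines a `(2, 1)` tensor with a `(3,)` tensor. The input values are made up for illustration, and object-dtype arrays stand in for `tensor(string)`.

```python
import numpy as np

x = np.array([["a"], ["b"]], dtype=object)   # shape (2, 1)
y = np.array(["1", "2", "3"], dtype=object)  # shape (3,)

# Output shape under NumPy broadcasting rules.
out_shape = np.broadcast_shapes(x.shape, y.shape)

# Object arrays use Python's `+` per element, so this concatenates strings.
z = x + y
```

Each element of `x` is paired with each element of `y`, so the result has shape `(2, 3)`.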


#### Examples

<details>
<summary>stringconcat</summary>

```python
node = onnx.helper.make_node(
    "StringConcat",
    inputs=["x", "y"],
    outputs=["result"],
)
x = np.array(["abc", "def"]).astype("object")
y = np.array([".com", ".net"]).astype("object")
result = np.array(["abc.com", "def.net"]).astype("object")

expect(node, inputs=[x, y], outputs=[result], name="test_string_concat")

x = np.array(["cat", "dog", "snake"]).astype("object")
y = np.array(["s"]).astype("object")
result = np.array(["cats", "dogs", "snakes"]).astype("object")

expect(
    node,
    inputs=[x, y],
    outputs=[result],
    name="test_string_concat_broadcasting",
)

x = np.array("cat").astype("object")
y = np.array("s").astype("object")
result = np.array("cats").astype("object")

expect(
    node,
    inputs=[x, y],
    outputs=[result],
    name="test_string_concat_zero_dimensional",
)
```

</details>
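
The type constraint says inputs and outputs are UTF-8 strings. One consequence, sketched below with made-up sample values, is that a backend storing strings as UTF-8 bytes could concatenate the raw bytes directly: joining two valid UTF-8 sequences gives a valid UTF-8 sequence that decodes to the concatenated text. Whether any particular backend works this way is an assumption.

```python
x = "naïve "
y = "café ☕"

# Concatenate at the byte level, as a bytes-backed implementation might.
joined_bytes = x.encode("utf-8") + y.encode("utf-8")

# Decoding the joined bytes gives the same text as concatenating the strings.
decoded = joined_bytes.decode("utf-8")
```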


### <a name="StringNormalizer"></a><a name="stringnormalizer">**StringNormalizer**</a>

StringNormalization performs string operations for basic cleaning.
45 changes: 44 additions & 1 deletion docs/TestCoverage.md
@@ -6,7 +6,7 @@
* [Overall Test Coverage](#overall-test-coverage)
# Node Test Coverage
## Summary
-Node tests have covered 174/187 (93.05%, 5 generators excluded) common operators.
+Node tests have covered 175/188 (93.09%, 5 generators excluded) common operators.

Node tests have covered 0/0 (N/A) experimental operators.

@@ -20594,6 +20594,49 @@ expect(node, inputs=[x, axes], outputs=[y], name="test_squeeze_negative_axes")
</details>


### StringConcat
There are 1 test cases, listed as following:
<details>
<summary>stringconcat</summary>

```python
node = onnx.helper.make_node(
    "StringConcat",
    inputs=["x", "y"],
    outputs=["result"],
)
x = np.array(["abc", "def"]).astype("object")
y = np.array([".com", ".net"]).astype("object")
result = np.array(["abc.com", "def.net"]).astype("object")

expect(node, inputs=[x, y], outputs=[result], name="test_string_concat")

x = np.array(["cat", "dog", "snake"]).astype("object")
y = np.array(["s"]).astype("object")
result = np.array(["cats", "dogs", "snakes"]).astype("object")

expect(
    node,
    inputs=[x, y],
    outputs=[result],
    name="test_string_concat_broadcasting",
)

x = np.array("cat").astype("object")
y = np.array("s").astype("object")
result = np.array("cats").astype("object")

expect(
    node,
    inputs=[x, y],
    outputs=[result],
    name="test_string_concat_zero_dimensional",
)
```

</details>


### StringNormalizer
There are 6 test cases, listed as following:
<details>
46 changes: 46 additions & 0 deletions onnx/backend/test/case/node/string_concat.py
@@ -0,0 +1,46 @@
# Copyright (c) ONNX Project Contributors
#
# SPDX-License-Identifier: Apache-2.0

import numpy as np

import onnx
from onnx.backend.test.case.base import Base
from onnx.backend.test.case.node import expect


class StringConcat(Base):
    @staticmethod
    def export() -> None:
        node = onnx.helper.make_node(
            "StringConcat",
            inputs=["x", "y"],
            outputs=["result"],
        )
        x = np.array(["abc", "def"]).astype("object")
        y = np.array([".com", ".net"]).astype("object")
        result = np.array(["abc.com", "def.net"]).astype("object")

        expect(node, inputs=[x, y], outputs=[result], name="test_string_concat")

        x = np.array(["cat", "dog", "snake"]).astype("object")
        y = np.array(["s"]).astype("object")
        result = np.array(["cats", "dogs", "snakes"]).astype("object")

        expect(
            node,
            inputs=[x, y],
            outputs=[result],
            name="test_string_concat_broadcasting",
        )

        x = np.array("cat").astype("object")
        y = np.array("s").astype("object")
        result = np.array("cats").astype("object")

        expect(
            node,
            inputs=[x, y],
            outputs=[result],
            name="test_string_concat_zero_dimensional",
        )
Binary file not shown.
@@ -0,0 +1 @@
2abc2defBx
@@ -0,0 +1 @@
2.com2.netBy
@@ -0,0 +1 @@
2abc.com2def.netBresult
Binary file not shown.
@@ -0,0 +1 @@
2cat2dog2snakeBx
@@ -0,0 +1 @@
2sBy
@@ -0,0 +1 @@
2cats2dogs2snakesBresult
Binary file not shown.
@@ -0,0 +1 @@
2catBx
@@ -0,0 +1 @@
2sBy
@@ -0,0 +1 @@
2catsBresult
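
The one-line files above look like serialized `TensorProto` messages whose non-printable bytes were dropped by the diff viewer. The sketch below is a hand-rolled reader for the two protobuf wire types involved, assuming the `TensorProto` field numbers `dims = 1`, `data_type = 2`, `string_data = 6`, and `name = 8`. The `raw` bytes are a reconstruction for illustration (assuming unpacked `dims`), not a copy of the actual file.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes):
    """Return (field_number, value) pairs for varint and length-delimited fields."""
    pos, fields = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (bytes / string)
            length, pos = read_varint(buf, pos)
            value = buf[pos : pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_no, value))
    return fields


# Assumed encoding of x = ["abc", "def"]: dims=[2], data_type=8 (STRING),
# two string_data entries, then the tensor name "x".
raw = (
    bytes([0x08, 2, 0x10, 8])
    + bytes([0x32, 3]) + b"abc"
    + bytes([0x32, 3]) + b"def"
    + bytes([0x42, 1]) + b"x"
)

fields = parse_fields(raw)
# Keep only printable ASCII, mimicking what a text view of the file shows.
printable = "".join(chr(b) for b in raw if 32 <= b < 127)
```

Under these assumptions the tag byte for `string_data` is `0x32` (ASCII `2`) and the tag byte for `name` is `0x42` (ASCII `B`), which would explain the stray `2` and `B` characters around each string.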
69 changes: 0 additions & 69 deletions onnx/defs/nn/defs.cc
@@ -2229,75 +2229,6 @@ ONNX_OPERATOR_SET_SCHEMA(
})
.SetDoc(TfIdfVectorizer_ver9_doc));

static const char* StringNormalizer_ver10_doc = R"DOC(
StringNormalization performs string operations for basic cleaning.
This operator has only one input (denoted by X) and only one output
(denoted by Y). This operator first examines the elements in the X,
and removes elements specified in "stopwords" attribute.
After removing stop words, the intermediate result can be further lowercased,
uppercased, or just returned depending the "case_change_action" attribute.
This operator only accepts [C]- and [1, C]-tensor.
If all elements in X are dropped, the output will be the empty value of string tensor with shape [1]
if input shape is [C] and shape [1, 1] if input shape is [1, C].
)DOC";
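
The behaviour in the doc string above (drop stop words first, then change case) can be sketched for the 1-D `[C]` case as follows. The function name `string_normalizer_sketch` is hypothetical, the `locale` attribute is ignored, and using `str.lower()` for case-insensitive matching is an assumption rather than something the spec states.

```python
import numpy as np


def string_normalizer_sketch(
    x,
    stopwords=(),
    case_change_action="NONE",
    is_case_sensitive=False,
):
    """Hypothetical 1-D sketch: filter stop words, then apply the case action."""
    if is_case_sensitive:
        stop = set(stopwords)
        kept = [s for s in x if s not in stop]
    else:
        # Assumption: case-insensitive matching via lowercase comparison.
        stop = {w.lower() for w in stopwords}
        kept = [s for s in x if s.lower() not in stop]

    if case_change_action == "LOWER":
        kept = [s.lower() for s in kept]
    elif case_change_action == "UPPER":
        kept = [s.upper() for s in kept]
    elif case_change_action != "NONE":
        raise ValueError(f"unknown case_change_action: {case_change_action}")

    # If everything was dropped, return one empty string (shape [1]).
    if not kept:
        kept = [""]
    return np.array(kept, dtype=object)


out = string_normalizer_sketch(
    ["monday", "tuesday"], stopwords=["Monday"], case_change_action="UPPER"
)
```

Because stop words are removed before the case change, the stop-word comparison sees the original casing of each input element.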

ONNX_OPERATOR_SET_SCHEMA(
    StringNormalizer,
    10,
    OpSchema()
        .Input(0, "X", "UTF-8 strings to normalize", "tensor(string)")
        .Output(0, "Y", "UTF-8 Normalized strings", "tensor(string)")
        .Attr(
            std::string("case_change_action"),
            std::string("string enum that cases output to be lowercased/uppercases/unchanged."
                        " Valid values are \"LOWER\", \"UPPER\", \"NONE\". Default is \"NONE\""),
            AttributeProto::STRING,
            std::string("NONE"))
        .Attr(
            std::string("is_case_sensitive"),
            std::string("Boolean. Whether the identification of stop words in X is case-sensitive. Default is false"),
            AttributeProto::INT,
            static_cast<int64_t>(0))
        .Attr(
            "stopwords",
            "List of stop words. If not set, no word would be removed from X.",
            AttributeProto::STRINGS,
            OPTIONAL_VALUE)
        .Attr(
            "locale",
            "Environment dependent string that denotes the locale according to which output strings needs to be upper/lowercased."
            "Default en_US or platform specific equivalent as decided by the implementation.",
            AttributeProto::STRING,
            OPTIONAL_VALUE)
        .SetDoc(StringNormalizer_ver10_doc)
        .TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
          auto output_elem_type = ctx.getOutputType(0)->mutable_tensor_type();
          output_elem_type->set_elem_type(TensorProto::STRING);
          if (!hasInputShape(ctx, 0)) {
            return;
          }
          TensorShapeProto output_shape;
          auto& input_shape = ctx.getInputType(0)->tensor_type().shape();
          auto dim_size = input_shape.dim_size();
          // Last axis dimension is unknown if we have stop-words since we do
          // not know how many stop-words are dropped
          if (dim_size == 1) {
            // Unknown output dimension
            output_shape.add_dim();
          } else if (dim_size == 2) {
            // Copy B-dim
            auto& b_dim = input_shape.dim(0);
            if (!b_dim.has_dim_value() || b_dim.dim_value() != 1) {
              fail_shape_inference("Input shape must have either [C] or [1,C] dimensions where C > 0");
            }
            *output_shape.add_dim() = b_dim;
            output_shape.add_dim();
          } else {
            fail_shape_inference("Input shape must have either [C] or [1,C] dimensions where C > 0");
          }
          updateOutputShape(ctx, 0, output_shape);
        }));
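
The shape-inference lambda above can be paraphrased in Python as a small function over dimension lists. This is a sketch for reading purposes only: the name `infer_string_normalizer_shape` is hypothetical, `None` stands in for an unknown dimension, and `ValueError` stands in for `fail_shape_inference`.

```python
from typing import List, Optional, Sequence

_SHAPE_ERROR = "Input shape must have either [C] or [1,C] dimensions where C > 0"


def infer_string_normalizer_shape(
    input_shape: Optional[Sequence[Optional[int]]],
) -> Optional[List[Optional[int]]]:
    """Hypothetical mirror of the C++ logic; None means 'unknown'."""
    if input_shape is None:
        # No input shape available: nothing can be inferred.
        return None
    if len(input_shape) == 1:
        # [C] -> [?]: the number of dropped stop words is not known statically.
        return [None]
    if len(input_shape) == 2:
        batch = input_shape[0]
        # The leading dimension must be known and equal to 1.
        if batch != 1:
            raise ValueError(_SHAPE_ERROR)
        return [1, None]
    raise ValueError(_SHAPE_ERROR)


shape_1d = infer_string_normalizer_shape([5])
shape_2d = infer_string_normalizer_shape([1, 5])
```

In both accepted cases the last dimension is left unknown, since the number of surviving elements depends on the data.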

static const char* mvn_ver13_doc = R"DOC(
A MeanVarianceNormalization Function: Perform mean variance normalization
on the input tensor X using formula: `(X-EX)/sqrt(E(X-EX)^2)`
2 changes: 2 additions & 0 deletions onnx/defs/operator_sets.h
@@ -1105,6 +1105,7 @@ class OpSet_Onnx_ver19 {
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, GridSample);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, Gelu);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, ConstantOfShape);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, StringConcat);

// Iterate over schema from ai.onnx version 20
class OpSet_Onnx_ver20 {
@@ -1113,6 +1114,7 @@ class OpSet_Onnx_ver20 {
fn(GetOpSchema<ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, GridSample)>());
fn(GetOpSchema<ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, Gelu)>());
fn(GetOpSchema<ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, ConstantOfShape)>());
fn(GetOpSchema<ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 20, StringConcat)>());
}
};
