Add Gemm as an operator #47

Merged: 3 commits, Sep 29, 2017
40 changes: 37 additions & 3 deletions docs/Operators.md
@@ -1,4 +1,5 @@
## Operator Schemas
*This file is automatically generated from the [def files](/onnx/defs)*
* **Abs**

Absolute takes one input data (Tensor<T>) and produces one output data
@@ -280,7 +281,7 @@
<dt>X</dt>
<dd>Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the 2D image. Otherwise the size is (N x D1 x D2 ... x Dn)</dd>
<dt>weights</dt>
-<dd>The weight tensor that will be used in the convolutions; has size (M x C x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (M x C x k1 x k2 x ... x kn), where is the dimenstion of the kernel</dd>
+<dd>The weight tensor that will be used in the convolutions; has size (M x C x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (M x C x k1 x k2 x ... x kn), where is the dimension of the kernel</dd>
</dl>
* **output**:
<dl>
@@ -311,7 +312,7 @@
<dt>X</dt>
<dd>Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the 2D image. Otherwise the size is (N x D1 x D2 ... x Dn)</dd>
<dt>weights</dt>
-<dd>The weight tensor that will be used in the convolutions; has size (C x M x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (C x M x k1 x k2 x ... x kn), where is the dimenstion of the kernel</dd>
+<dd>The weight tensor that will be used in the convolutions; has size (C x M x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the kernel shape will be (C x M x k1 x k2 x ... x kn), where is the dimension of the kernel</dd>
</dl>
* **output**:
<dl>
@@ -459,7 +460,7 @@
* **output**:
<dl>
<dt>output</dt>
-<dd>A tensor of rank 2 with the contents of the input tensor, with first dimension equal first dimension of input, and remaining input dimensions flatenned into the inner dimension of the output.</dd>
+<dd>A tensor of rank 2 with the contents of the input tensor, with first dimension equal first dimension of input, and remaining input dimensions flattened into the inner dimension of the output.</dd>
</dl>


@@ -520,6 +521,39 @@
</dl>


* **Gemm**

General Matrix multiplication: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3
Compute Y = alpha * A * B + beta * C, where input tensor A has dimension (M x K), input tensor B has dimension (K x N), and input tensor C and output tensor Y have dimension (M x N). Input tensor C can be used in place as the output tensor Y. If attribute broadcast is non-zero, input tensor C will be broadcasted to match the dimension requirement. A is transposed before the computation if attribute transA is non-zero, and likewise for B and transB. (A NumPy sketch of these semantics appears after this operator entry.)
* **attribute**:
<dl>
<dt>alpha</dt>
<dd>Scalar multiplier for the product of input tensors A * B</dd>
<dt>beta</dt>
<dd>Scalar multiplier for input tensor C</dd>
<dt>broadcast</dt>
<dd>Whether C should be broadcasted</dd>
<dt>transA</dt>
<dd>Whether A should be transposed</dd>
<dt>transB</dt>
<dd>Whether B should be transposed</dd>
</dl>
* **input**:
<dl>
<dt>A</dt>
<dd>Input tensor A</dd>
<dt>B</dt>
<dd>Input tensor B</dd>
<dt>C</dt>
<dd>Input tensor C; can be used in place as the output tensor Y.</dd>
</dl>
* **output**:
<dl>
<dt>Y</dt>
<dd>Output tensor.</dd>
</dl>
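
As context (this example is not part of the diff), here is a minimal NumPy sketch of the Gemm semantics described above; the name `gemm_reference` is illustrative and 2-D inputs are assumed:

```python
import numpy as np

def gemm_reference(A, B, C, alpha=1.0, beta=1.0,
                   broadcast=0, transA=0, transB=0):
    # Apply the optional transposes so A is (M x K) and B is (K x N).
    A = A.T if transA else A
    B = B.T if transB else B
    if broadcast:
        # Broadcast C up to the (M x N) shape of the product.
        C = np.broadcast_to(C, (A.shape[0], B.shape[1]))
    return alpha * np.dot(A, B) + beta * C

# Example: (2 x 3) * (3 x 4) with a (1 x 4) bias broadcast to (2 x 4).
A = np.ones((2, 3), dtype=np.float32)
B = np.ones((3, 4), dtype=np.float32)
C = np.zeros((1, 4), dtype=np.float32)
Y = gemm_reference(A, B, C, broadcast=1)
assert Y.shape == (2, 4)
```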


* **GlobalAveragePool**

GlobalAveragePool consumes an input tensor X and applies average pooling across the
2 changes: 1 addition & 1 deletion onnx/defs/gen_doc.py
@@ -18,7 +18,7 @@ def display_number(v):

def main(args):
args.output.write('## Operator Schemas\n')
-args.output.write('*This file is automatically generated from the [def files](/onnx/defs)*')
+args.output.write('*This file is automatically generated from the [def files](/onnx/defs)*\n')

for op_type, schema in sorted(defs.get_all_schemas().items()):
# If support level is experimental, then don't generate documentation.
27 changes: 27 additions & 0 deletions onnx/defs/math/defs.cc
@@ -384,3 +384,30 @@ will throw errors.
"as described above.")
.Output(0, "output", "The softmax normalized output values with the same "
"shape as input tensor.");

OPERATOR_SCHEMA(Gemm)
.NumInputs(3)
.NumOutputs(1)
.SetDoc(R"DOC(General Matrix multiplication: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3
Compute Y = alpha * A * B + beta * C, where input tensor A has dimension (M x K), input tensor B has dimension (K x N), and input tensor C and output tensor Y have dimension (M x N). Input tensor C can be used in place as the output tensor Y. If attribute broadcast is non-zero, input tensor C will be broadcasted to match the dimension requirement. A is transposed before the computation if attribute transA is non-zero, and likewise for B and transB.
)DOC")
.Input(0, "A", "Input tensor A")
.Input(1, "B", "Input tensor B")
.Input(2, "C", "Input tensor C, can be inplace.")
.AllowConsumed({{2, 0}})
.Output(0, "Y", "Output tensor.")
.Attr("transA",
"Whether A should be transposed",
AttrType::INT)
.Attr("transB",
"Whether B should be transposed",
AttrType::INT)
.Attr("broadcast",
"Whether C should be broadcasted",
AttrType::INT)
.Attr("alpha",
"Scalar multiplier for the product of input tensors A * B",
AttrType::FLOAT)
.Attr("beta",
"Scalar multiplier for input tensor C",
AttrType::FLOAT);
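
For illustration (not part of the diff), a node using this schema might be constructed with the ONNX Python helpers roughly as follows; this sketch assumes the `onnx.helper.make_node` API, which encodes attributes as keyword arguments:

```python
from onnx import helper

# Y = alpha * A * B + beta * C, with C broadcast to (M x N).
node = helper.make_node(
    "Gemm",
    inputs=["A", "B", "C"],
    outputs=["Y"],
    alpha=1.0,
    beta=1.0,
    broadcast=1,
    transA=0,
    transB=0,
)
```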