##### Copyright 2020 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# TensorFlow Coder (TF-Coder): A program synthesis tool for TensorFlow expressions

**TensorFlow Coder** is a tool that helps you manipulate tensors with TensorFlow! If you provide an example of a tensor manipulation, TF-Coder will search for TensorFlow code that matches the example.

Follow this [**tutorial**](https://github.com/google-research/tensorflow-coder/blob/master/Tutorial.md) to get familiar with TF-Coder.

Make sure to connect to a runtime (click "Connect" in the top right corner).

## Step 0: Data collection request

**Note from the TF-Coder team at Google:**

We are excited to bring you TF-Coder, which we hope will accelerate your TensorFlow development.

We have one quick request first: we would like to log usage data for TF-Coder, so that we can identify scenarios where TF-Coder can be improved. This usage data will help us improve TF-Coder for everyone. We also believe that the usage data will be a valuable resource to the broader program synthesis research community. Please read the text below and then use the following cell to let us know whether we may log or release your usage data. Either way, you may still use the TF-Coder tool.

---

Collecting TF-Coder usage data will help Google improve the TF-Coder tool, and TensorFlow services more generally. This usage data includes (i) the problems you create, (ii) the settings for the TF-Coder tool, (iii) the TF-Coder tool's results for those problems, (iv) metadata relating to your session, problem and device you are using to use the TF-Coder tool, and (v) your location (determined by your IP address). The usage data does not include any other personally identifiable information. Please do not upload or provide any personal or confidential information to the TF-Coder tool.

In addition to Google’s internal use of your usage data, Google would also like to release some of such data in a public dataset to facilitate related research and to promote reproducible research publications. If your usage data is released it will be done in an open source fashion, meaning anyone with access to the data may use it for their purposes.

To opt out of Google collecting your usage data entirely, uncheck the first box in the cell below. To opt out of your usage data being released as part of a public dataset, uncheck the second box. For the avoidance of doubt, if you only uncheck the second box, you are consenting to Google’s internal use of your usage data consistent with this disclosure. Regardless of your choice about sharing your usage data, you may still access and use the TF-Coder tool.

In [None]:
#@title Run this cell after making your choices.

allow_data_collection = True  #@param {type: "boolean"}
include_in_dataset = True  #@param {type: "boolean"}

if allow_data_collection:
  if include_in_dataset:
    print('Usage data may be collected and released in a public dataset.')
  else:
    print('Usage data may be collected but will not be publicly released.')
else:
  print('Usage data will not be collected.')

## Step 1: Installs and imports

In [None]:
#@title Run this cell to install and import TF-Coder.

ready = True
try:
  _ = (allow_data_collection, include_in_dataset)
except NameError:
  print('Please run the cell in Step 0 first.')
  ready = False

if ready:
  # Import TensorFlow and NumPy in case the user wants to create the example
  # programmatically.
  import tensorflow as tf
  import numpy as np
  
  !pip install tensorflow-coder
  from tf_coder.value_search import colab_interface
  from tf_coder.value_search import value_search_settings as settings_module

  if allow_data_collection:
    !pip install tensorflow-coder-colab-logging
    from tf_coder_colab_logging import colab_logging

  from google.colab import output
  output.clear()

  print('Imports successful. Loading models...')
  colab_interface.warm_up()
  print('Done. TF-Coder is now ready to use!')

## Step 2: Describe the problem with an example

Provide an **input-output example**:

* `inputs` is a dictionary containing one or more input tensors with variable names.
* `output` is the corresponding output tensor.

Tensors can be provided as lists (possibly multidimensional) or `tf.Tensor` objects.

You may also specify relevant **scalar constants**. TF-Coder also uses heuristics to guess a few useful constants.

Finally, it often helps to provide an **English description** of the desired tensor manipulation. This description can help the tool decide which TensorFlow operations to prioritize.

_Note: Please do not include confidential or personal information._

In [3]:
# Edit this cell! Follow the format of the example below.

# A dict mapping input variable names to input tensors.
inputs = {
    'rows': [10, 20, 30],
    'cols': [1, 2, 3, 4],
}

# The corresponding output tensor.
output = [[11, 12, 13, 14],
          [21, 22, 23, 24],
          [31, 32, 33, 34]]

# A list of relevant scalar constants, if any.
constants = []

# An English description of the tensor manipulation.
description = 'add two vectors with broadcasting to get a matrix'
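The example can also be built programmatically, since tensors may be given as `tf.Tensor` objects. The following is a minimal sketch that assumes NumPy's `default_rng` as the source of random-looking values; the seed and value ranges are arbitrary choices for illustration, not something TF-Coder requires.

```python
import numpy as np
import tensorflow as tf

# Seeded generator so the example is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)
rows = rng.integers(10, 100, size=3).astype(np.int32)
cols = rng.integers(1, 10, size=4).astype(np.int32)

# A dict mapping input variable names to input tensors.
inputs = {
    'rows': tf.constant(rows),
    'cols': tf.constant(cols),
}

# Compute the intended output with NumPy broadcasting: (3, 1) + (4,) -> (3, 4).
output = tf.constant(rows[:, np.newaxis] + cols)

constants = []
description = 'add two vectors with broadcasting to get a matrix'
```

Computing the output from the inputs in this way keeps the example self-consistent when you change the sizes or values.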

## Step 3: Run the TF-Coder tool

In [4]:
#@title Run this cell to invoke TF-Coder on the problem from Step 2.

ready = True
try:
  _ = colab_interface
except NameError:
  print('Run the cell in Step 1 first.')
  ready = False
try:
  _ = (inputs, output, constants, description)
except NameError:
  print('Define the problem by running the cell in Step 2 first.')
  ready = False

#@markdown &nbsp;
#@markdown #### **Settings for TF-Coder**
#@markdown How long to search for a solution, in seconds.
time_limit = 60  #@param {type: "integer"}
#@markdown How many solutions to find before stopping. If more than 1, the entire search will slow down.
number_of_solutions = 1  #@param{type: "integer"}
#@markdown Whether solutions must use all inputs, at least one input, or no such requirement.
solution_requirement = "all inputs" #@param ["all inputs", "one input", "no restriction"]

if ready:
  # Build the settings only after the checks above pass; otherwise a missing
  # Step 1 would raise a NameError on settings_module here.
  settings = settings_module.from_dict({
      'timeout': time_limit,
      'only_minimal_solutions': False,
      'max_solutions': number_of_solutions,
      'require_all_inputs_used': solution_requirement == 'all inputs',
      'require_one_input_used': solution_requirement == 'one input',
  })

  if allow_data_collection:
    problem_id = colab_logging.get_uuid()
    colab_logging.log_problem(inputs, output, constants, description, settings,
                              include_in_dataset=include_in_dataset,
                              problem_id=problem_id)

  # Results will be printed to the cell's output.
  results = colab_interface.run_value_search_from_colab(
      inputs, output, constants, description, settings)

  if allow_data_collection:
    colab_logging.log_result(results,
                             include_in_dataset=include_in_dataset,
                             problem_id=problem_id)

Input 'rows':
tf.Tensor([10 20 30], shape=(3,), dtype=int32)

Input 'cols':
tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)

Output:
tf.Tensor(
[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]], shape=(3, 4), dtype=int32)

Constants: [0, 1, -1, True, False, 3, 4]

Description: add two vectors with broadcasting to get a matrix

Searching...

Found solution: tf.add(cols, tf.expand_dims(rows, 1))

Solution was found in 0.3 seconds:
tf.add(cols, tf.expand_dims(rows, 1))
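Before using a solution, it can be worth re-running it yourself on the example. A minimal sketch of such a check for the solution printed above, using only standard TensorFlow calls (the variable names `expected`, `result`, and `matches` are introduced here for illustration):

```python
import tensorflow as tf

rows = tf.constant([10, 20, 30])
cols = tf.constant([1, 2, 3, 4])
expected = tf.constant([[11, 12, 13, 14],
                        [21, 22, 23, 24],
                        [31, 32, 33, 34]])

# expand_dims turns rows into shape (3, 1), which broadcasts against
# cols of shape (4,) to give a (3, 4) result.
result = tf.add(cols, tf.expand_dims(rows, 1))

# True only if every element agrees with the example output.
matches = bool(tf.reduce_all(tf.equal(result, expected)))
print(matches)
```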


---


## Usage Tips

#### General

* If TF-Coder finds a solution, it is _guaranteed_ that the solution produces
  the example output when run on the example inputs. However, it is _not
  guaranteed_ that the solution generalizes in the way you intend! Please
  carefully review solutions produced by TF-Coder before using them in your real
  project.

* TF-Coder will often produce a solution that uses hardcoded constants for
  shapes or lengths, e.g., `tf.reshape(to_flatten, (6,))` in order to flatten an
  input tensor with shape `(2, 3)`. You may need to manually change these
  constants to improve the generality of the solution, e.g., replacing `6` with
  `-1` in this case. Use the `shape` attribute to obtain the dimension lengths
  of input tensors, e.g., `to_flatten.shape[0]` would be `2`.

* If you want to play with TensorFlow in Colab (e.g., to understand how a
  TF-Coder solution works or to test your own solution):
  * The TF-Coder Colab already imports TensorFlow 2 and NumPy, for your
    convenience.
  * Use `tf.constant` to create a tensor from the list format:
    ```
    >>> tf.constant([[13, 22], [17, 5]])
    <tf.Tensor: id=1, shape=(2, 2), dtype=int32, numpy=
    array([[13, 22],
           [17,  5]], dtype=int32)>

    >>> tf.constant(12.3)
    <tf.Tensor: id=2, shape=(), dtype=float32, numpy=12.3>
    ```
  * A Colab notebook can only have one cell running at a time. If you want to
    experiment with TensorFlow code while TF-Coder is running, consider doing so
    in a separate Python shell.

* TF-Coder's running time is exponential in the complexity of the solution.
  _Simplifying the problem_, or _breaking it down into multiple steps_, can help
  TF-Coder find solutions quickly. For instance, if you know that a reshape,
  transpose, cast, or other similar operation should be applied to an input or
  as the last operation to produce the output, consider applying that operation
  manually to the input-output example, to help TF-Coder focus on the more
  difficult parts.
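To illustrate the tip about hardcoded constants, here is a small sketch comparing `tf.reshape(to_flatten, (6,))` with the more general `-1` form. The second tensor, `other`, is a hypothetical input added only to show where the hardcoded version stops working.

```python
import tensorflow as tf

to_flatten = tf.constant([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
other = tf.constant([[1, 2], [3, 4], [5, 6], [7, 8]])    # shape (4, 2)

# Both forms agree on the original example.
hardcoded = tf.reshape(to_flatten, (6,))
general = tf.reshape(to_flatten, (-1,))

# Only the -1 form adapts to a tensor with a different number of elements.
general_other = tf.reshape(other, (-1,))
try:
    tf.reshape(other, (6,))
    hardcoded_generalizes = True
except tf.errors.InvalidArgumentError:
    hardcoded_generalizes = False

print(to_flatten.shape[0])  # length of the first dimension
```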

#### Input-Output Example

Creating a good input-output example is crucial for TF-Coder to find the
solution you want. The example should be robust enough to rule out _false
positive solutions_, which are TensorFlow expressions that work on the given
example, but fail to generalize in the desired way.

Here are some techniques that reduce the risk of false positives:

* **Include more numbers** in the input and output tensors. TF-Coder will only
  output a solution if it works on the provided example, so having many numbers
  in the output tensor means it is less likely for incorrect solutions to
  produce all of the correct numbers by chance.

* **Use random-looking numbers** in the input tensors. For example,
  `[18, 73, 34, 51]` would be a better input tensor than `[1, 2, 3, 4]`, since
  the former is not all consecutive and not all increasing. This helps eliminate
  patterns in the input tensors that false positive solutions can take advantage
  of.

* **Remove patterns from the output other than the intended one**. For example,
  if the output tensor is a selection of numbers from input tensors, make sure
  the selected numbers aren't all the maximum element along some axis, unless
  that is the intended pattern.

* **Include edge cases** where relevant. These could include negative numbers,
  zero, or duplicate numbers, when applicable to the problem.

* **Distinguish between indices and non-indices**. If you know a number should
  not be used as an index, consider making it out of range of valid indices
  (negative, too large, or even floating-point).

* **Follow any constraints that exist in your real program**. For example, if an
  input tensor only contains positive numbers, TF-Coder may produce a solution
  that doesn't generalize to negative numbers. Whether this is acceptable
  depends on whether that tensor could possibly contain negative numbers in your
  real program. Of course, depending on the problem, a completely general
  solution may be unnecessarily harder to find.

In general, false positive solutions are more common if the output tensor
contains relatively little information given the inputs. This may
happen if the output is a scalar or boolean tensor, or if the output is
constructed by selecting one or a few elements from an input. When possible, try
to include many numbers in the output so that it contains enough information to
unambiguously identify the intended transformation.
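The effect of a weak example can be sketched with a few hand-picked candidate expressions. These candidates are chosen purely for illustration and are not meant to reflect what TF-Coder actually enumerates; the helper `consistent` is likewise hypothetical. Suppose the intended operation is "take the maximum element":

```python
import tensorflow as tf

# Hand-picked candidate expressions (an assumption for illustration only).
candidates = {
    'tf.reduce_max(x)': lambda x: tf.reduce_max(x),
    'x[-1]': lambda x: x[-1],
    'tf.size(x)': lambda x: tf.size(x),
}

def consistent(x_list, expected):
    """Returns the names of candidates that reproduce the expected output."""
    x = tf.constant(x_list)
    return [name for name, fn in candidates.items()
            if int(fn(x)) == expected]

# With a consecutive, increasing input, the maximum is also the last element
# and the length, so all three candidates fit the example.
weak = consistent([1, 2, 3, 4], 4)

# With random-looking numbers, only the intended expression fits.
strong = consistent([18, 73, 34, 51], 73)

print(weak)
print(strong)
```

Since the output here is a single scalar, it carries little information, which is why the choice of input values matters so much.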

#### Constants

* TF-Coder will print out the list of constants that it is using, including
  constants chosen through heuristics. This list is ordered with
  highest-priority constants at the beginning.
* If the intended solution requires a constant that is not in TF-Coder's printed
  list of constants, then TF-Coder will be _unable_ to find the intended
  solution. So, it is important to provide any necessary constants.
* If you explicitly provide constants, they will be used with the highest
  priority. Thus, even if TF-Coder's heuristics choose your desired constant, it
  may be better to provide the constant explicitly so that TF-Coder is more
  confident about using your constant.
* Providing extraneous constants will slow down the tool.
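As a sketch of why a constant can be necessary, consider the sequence-mask problem from the examples section, which supplies `constants = [5]`. The expression below is hand-written using the standard `tf.sequence_mask` function; it is not claimed to be the solution TF-Coder produces, and whether the heuristics would pick up `5` on their own is not assumed here.

```python
import tensorflow as tf

inputs = {'lengths': [3, 4, 2, 1]}
output = [[1, 1, 1, 0, 0],
          [1, 1, 1, 1, 0],
          [1, 1, 0, 0, 0],
          [1, 0, 0, 0, 0]]
constants = [5]  # the mask width; it does not appear in the input values

lengths = tf.constant(inputs['lengths'])

# With the constant, the mask has the desired width of 5.
candidate = tf.cast(tf.sequence_mask(lengths, constants[0]), tf.int32)
matches = candidate.numpy().tolist() == output

# Without it, the width defaults to the largest length (4 here),
# so the example output cannot be reproduced.
default_width = tuple(tf.sequence_mask(lengths).shape)

print(matches, default_width)
```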

#### Description

* The description is optional. If provided, it is used to prioritize TensorFlow
  operations that fit with the description.
* If you know of a TensorFlow operation (e.g., `tf.reduce_max`) that is
  relevant, include its name (e.g., "tf.reduce_max") anywhere in the
  description. This will lead TF-Coder to prioritize that operation.
* If possible, try to describe how the output should be computed, rather than
  what the output conceptually represents.
* A good description is less important than a good input-output example.
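For instance, a problem definition whose description names a relevant operation might look like the sketch below. The input values and the wording of the description are made up for illustration; the final line simply confirms by hand that the named operation can produce the output.

```python
import tensorflow as tf

inputs = {
    'matrix': [[18, 73, 34],
               [51, 12, 96]],
}
output = [73, 96]
constants = []

# Describes how the output is computed and names the relevant operation.
description = 'use tf.reduce_max to take the maximum of each row'

# Manual sanity check that the described computation matches the output.
check = tf.reduce_max(tf.constant(inputs['matrix']), axis=1).numpy().tolist()
print(check == output)
```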

#### Other Details and Advanced Options

* When running TF-Coder, you can set the time limit, the number of solutions to
  find, and whether solutions are required to use inputs.
  * Time limit: This is the maximum amount of time, in seconds, that TF-Coder
    will spend on the problem before giving up. Note that you can stop the tool
    at any time by pressing the cell's stop button.
  * Number of solutions: TF-Coder can continue searching for more solutions
    after the first solution is found. This can help you examine different ways
    of solving the problem. However, enabling multiple solutions will cause the
    entire search to slow down, even for the first solution.
  * Solution requirement: By default, solutions are required to use every input
    tensor at least once. This constraint can be relaxed to allow solutions that
    use only one input (if there are multiple inputs), or even solutions that
    use no inputs at all.

* By default, integer tensors have a DType of `tf.int32`, and float tensors have
  a DType of `tf.float32`. To specify a different DType, provide a `tf.Tensor`
  object instead of a list. For example:
  * If an input is given as `[3, 1, 7, 4]`, then it will have a DType of
    `tf.int32`.
  * If an input is given as `tf.constant([3, 1, 7, 4], dtype=tf.int64)`, then it
    will have a DType of `tf.int64`.

* A primitive scalar input can be specified with a Python float or int, and a
  scalar tensor can be specified with a `tf.Tensor`:
  * If an input is given as `[123]`, then it will be a 1-dimensional tensor with
    shape `(1,)`, equivalent to `tf.constant([123])`.
  * If an input is given as `123`, then it will remain a Python primitive int,
    not a `tf.Tensor`.
  * If an input is given as `tf.constant(123)`, then it will be a 0-dimensional
    scalar tensor with shape `()`.

* Input and output tensors can have at most 4 dimensions.
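The DType and scalar rules above mirror the defaults of `tf.constant`, which can be inspected directly. This sketch only exercises `tf.constant` itself; it does not call TF-Coder.

```python
import tensorflow as tf

default_int = tf.constant([3, 1, 7, 4])                     # tf.int32 by default
explicit_int64 = tf.constant([3, 1, 7, 4], dtype=tf.int64)  # explicit DType
default_float = tf.constant([0.5, 1.5])                     # tf.float32 by default

one_element = tf.constant([123])  # 1-dimensional, shape (1,)
scalar = tf.constant(123)         # 0-dimensional, shape ()

for t in (default_int, explicit_int64, default_float, one_element, scalar):
    print(t.dtype.name, tuple(t.shape))
```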

## Example problems that TF-Coder can solve

Here are several examples of real-life problems that TF-Coder can solve.

In [None]:
# Real task encountered by a Googler.
inputs = {
    'tensor': [[0, 1, 0, 0],
               [0, 1, 1, 0],
               [1, 1, 1, 1]],
}
output = [[0.0, 1.0, 0.0, 0.0],
          [0.0, 0.5, 0.5, 0.0],
          [0.25, 0.25, 0.25, 0.25]]
constants = []
description = 'normalize the rows of a tensor'
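One hand-written expression that reproduces this example is sketched below, for comparison with whatever TF-Coder finds. It uses `tf.reduce_sum`, which is a standard TensorFlow function but is assumed here rather than taken from the supported-operations list.

```python
import tensorflow as tf

tensor = tf.constant([[0, 1, 0, 0],
                      [0, 1, 1, 0],
                      [1, 1, 1, 1]])
expected = [[0.0, 1.0, 0.0, 0.0],
            [0.0, 0.5, 0.5, 0.0],
            [0.25, 0.25, 0.25, 0.25]]

# Row sums with shape (3, 1), so that the division broadcasts across each row.
row_sums = tf.expand_dims(tf.reduce_sum(tensor, axis=1), 1)

# tf.divide performs true division, so the integer input yields floats.
candidate = tf.divide(tensor, row_sums)
print(candidate.numpy().tolist() == expected)
```

Note that the example contains no all-zero row, so it says nothing about how division by zero should be handled; add such a row if that case matters in your program.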

In [None]:
# Real task encountered by a Googler.
inputs = {
    'elements': [0, 0, 0, 1, 3, 3],
}
output = [[0, 0], [0, 1], [0, 2], [1, 0], [3, 0], [3, 1]]
constants = []
description = 'pair each element with a counter'

In [None]:
# Real task encountered by a Googler.
inputs = {
    'sparse': tf.SparseTensor(
        indices=[[0, 0, 0], [0, 1, 1], [1, 1, 1], [1, 1, 2]],
        values=[1., 1., 1., 1.],
        dense_shape=[2, 2, 800]),
}
output = tf.SparseTensor(
    indices=[[0, 0, 0], [0, 1, 1]],
    values=[1., 1.],
    dense_shape=[1, 2, 800])
constants = []
description = 'slice index 0 of the first dimension of a SparseTensor'

In [None]:
# Real task encountered by a Googler.
inputs = {
    'lengths': [3, 4, 2, 1],
}
output = [[1, 1, 1, 0, 0],
          [1, 1, 1, 1, 0],
          [1, 1, 0, 0, 0],
          [1, 0, 0, 0, 0]]
constants = [5]
description = 'create a mask for sequences of the given lengths'

In [None]:
# Real task encountered by a Googler.
inputs = {
    'segments': [ 1,  1,  1,  0,  0,  2],
    'data':     [10, 20, 30, 14, 15, 26],
}
output = [14, 15, 10, 20, 30, 26]
constants = []
description = 'sort the segments'
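A hand-written expression consistent with this example, given as a sketch rather than as TF-Coder's own output, gathers `data` in the order given by a stable argsort of `segments`:

```python
import tensorflow as tf

segments = tf.constant([1, 1, 1, 0, 0, 2])
data = tf.constant([10, 20, 30, 14, 15, 26])
expected = [14, 15, 10, 20, 30, 26]

# A stable sort keeps elements of the same segment in their original order.
order = tf.argsort(segments, stable=True)
candidate = tf.gather(data, order)
print(candidate.numpy().tolist() == expected)
```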

In [None]:
# Adapted from https://stackoverflow.com/questions/53054668
inputs = {
    'values': [37, 42, 42, 37, 28, 15, 42, 15],
}
output = [0, 1, 1, 0, 2, 3, 1, 3]
constants = []
description = 'group items by value and get the group indices'

In [None]:
# Adapted from https://stackoverflow.com/questions/47816231
inputs = {
    'vector': [3, 5, 0, 2, 3, 3, 0],
}
output = [[1., 0., 0., 0., 1., 1., 0.],
          [0., 1., 0., 0., 0., 0., 0.],
          [0., 0., 1., 0., 0., 0., 1.],
          [0., 0., 0., 1., 0., 0., 0.],
          [1., 0., 0., 0., 1., 1., 0.],
          [1., 0., 0., 0., 1., 1., 0.],
          [0., 0., 1., 0., 0., 0., 1.]]
constants = []
description = 'binary tensor from vector indicating if elements are equal'
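For comparison, one hand-written expression that matches this example (a sketch, not necessarily what TF-Coder returns) compares the vector against a column-shaped copy of itself:

```python
import tensorflow as tf

vector = tf.constant([3, 5, 0, 2, 3, 3, 0])
expected = [[1., 0., 0., 0., 1., 1., 0.],
            [0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 1.],
            [0., 0., 0., 1., 0., 0., 0.],
            [1., 0., 0., 0., 1., 1., 0.],
            [1., 0., 0., 0., 1., 1., 0.],
            [0., 0., 1., 0., 0., 0., 1.]]

# Shape (7,) compared with shape (7, 1) broadcasts to a (7, 7) boolean matrix
# whose entry [i][j] is True when vector[i] == vector[j].
pairwise_equal = tf.equal(vector, tf.expand_dims(vector, 1))
candidate = tf.cast(pairwise_equal, tf.float32)
print(candidate.numpy().tolist() == expected)
```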

In [None]:
# Adapted from https://stackoverflow.com/questions/44834739
inputs = {
    'scores': [[0.7, 0.2, 0.1],
               [0.4, 0.5, 0.1],
               [0.4, 0.4, 0.2],
               [0.3, 0.4, 0.3],
               [0.0, 0.0, 1.0]],
}
output = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
constants = []
description = 'compute argmax in each tensor and set it to 1'

In [None]:
# Adapted from https://stackoverflow.com/questions/33769041
inputs = {
    'first': [-1, 0, -3, 2, 1, 3, 5, -1, -9, 2, 10],
    'second': [12, 3, 45, 6, 7, 8, 9, 87, 65, 4, 32],
}
output = [6, 8, 9, 4, 32]
constants = [1]
description = 'select the values in the second tensor where the first tensor is greater than 1'
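This example can be reproduced by hand with `tf.boolean_mask` and `tf.greater`, both of which appear in the supported-operations list. The sketch below is a manual check and is not claimed to be the exact solution TF-Coder prints.

```python
import tensorflow as tf

first = tf.constant([-1, 0, -3, 2, 1, 3, 5, -1, -9, 2, 10])
second = tf.constant([12, 3, 45, 6, 7, 8, 9, 87, 65, 4, 32])
expected = [6, 8, 9, 4, 32]

# Keep the entries of `second` where `first` is strictly greater than 1.
candidate = tf.boolean_mask(second, tf.greater(first, 1))
print(candidate.numpy().tolist() == expected)
```

The input includes an element of `first` equal to `1`, which distinguishes "greater than" from "greater than or equal to".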

## Supported Operations

In [5]:
# Run this cell to print all supported operations.
colab_interface.print_supported_operations()

TensorFlow functions:
---------------------
tf.abs(x)
tf.add(x, y)
tf.add_n(inputs)
tf.argmax(input, axis)
tf.argmin(input, axis)
tf.argsort(values, axis, stable=True)
tf.argsort(values, axis, direction='DESCENDING', stable=True)
tf.boolean_mask(tensor, mask)
tf.broadcast_to(input, shape)
tf.cast(x, dtype)
tf.clip_by_value(t, clip_value_min, clip_value_max)
tf.concat(values, axis)
tf.constant(value)
tf.constant(value, dtype)
tf.divide(x, y)
tf.equal(x, y)
tf.exp(x)
tf.expand_dims(input, axis)
tf.eye(num_rows)
tf.eye(num_rows, num_columns)
tf.eye(num_rows, dtype)
tf.fill(dims, value)
tf.gather(params, indices)
tf.gather(params, indices, axis, batch_dims)
tf.gather_nd(params, indices)
tf.gather_nd(params, indices, batch_dims)
tf.greater(x, y)
tf.greater_equal(x, y)
tf.math.bincount(arr)
tf.math.ceil(x)
tf.math.count_nonzero(input)
tf.math.count_nonzero(input, axis)
tf.math.cumsum(x, axis)
tf.math.cumsum(x, axis, exclusive=True)
tf.math.divide_no_nan(x, y)
tf.math.floor(x)
tf.math.log(x)


## Feedback? Questions?

More information and resources about TF-Coder can be found at our [GitHub repo](https://github.com/google-research/tensorflow-coder).

To report a bug or make a feature request, please raise a
[GitHub issue](https://github.com/google-research/tensorflow-coder/issues).

If you have accidentally run the TF-Coder tool on *personal or confidential information*, and you have agreed to the *public release* of your TF-Coder usage data, you may reach out to tf-coder-support@google.com to request removal of your data from the public release. Such requests will be handled on a best-effort basis with no guarantee of success. Again, please do not run the TF-Coder tool on personal or confidential information.

This is a research project, not an official Google product.