This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

PyTorch Integration with Tensor Comprehensions #72

Merged

merged 1 commit into master from pth-tc-package on Feb 27, 2018
Conversation

prigoyal
Contributor

@prigoyal prigoyal commented Feb 26, 2018

This PR adds PyTorch integration with TC. Here is what the PyTorch-TC package provides:

  • inputs and outputs to functions are torch.*Tensors
  • Integration with PyTorch autograd: if you specify forward and backward functions, you get an autograd function that takes Variable as input and returns Variable as output.
  • autotuner results can be cached to a file (for reuse)
  • Various tests for ML layers are provided in test_python/layers/

To make it easy to use TC, we provide conda packages for it. We will provide instructions on how to install the conda package soon. Documentation will also be made available in a few hours.

Edited the CodeOwners.md file to properly document the python bindings and conda recipes. Also rearranged the bullet points so that build-related sections are together, docs are together, framework sections are together, etc. If anyone has suggestions for improvements, please let me know :)
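For context, a minimal usage sketch of the workflow this package enables (hedged: only the three entry points tc.define, calling the defined layer, and autotune are confirmed in this thread; the exact autotune signature is an assumption):

import torch
import tensor_comprehensions as tc

lang = """
def add(float(N) A, float(N) B) -> (output) {
    output(i) = A(i) + B(i)
}
"""
add = tc.define(lang, name="add")   # build a TC unit from the TC string
A = torch.randn(100).cuda()
B = torch.randn(100).cuda()
out = add(A, B)                     # inputs and outputs are torch.*Tensors
add.autotune(A, B)                  # assumed signature; tuned options can be cached to a file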

@prigoyal
Contributor Author

cc @soumith, @hmansell


- name: cosine
lang: |
def cosine(float(M) I) -> (O) {
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

I think there are many TCs here on the python side that don't have test cases in C++. I have tried adding them on the C++ side as much as possible. I'll double check and add all the TCs in C++ as well, but that'll happen in a separate PR with proper task tracking. Thanks.

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

Contributor

We don't necessarily need tests for all cases, but we clearly want these things in the "core" part and without copying them. They are not python-specific in any sense. If they are copied, they will get out of sync faster than you can imagine. And having them in TC supports my concern about spurious python-specific changes to the syntax.

Contributor Author
@prigoyal prigoyal Feb 26, 2018

agreed, I have a task open for bringing test case parity between C++ and Python TC. I think what I would like to do is to have a proper library that is maintained and updated.

assigned task #74 to myself

marking this as resolved, thanks @ftynse :)


- name: cosine_similarity
lang: |
def cosine_similarity(float(M, N) I1, float(M, N) I2) -> (O, sumI1, sumI2) {{
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

- name: add
lang: |
def add(float(N) A, float(N) B) -> (output) {
output(i) = A(i) + B(i) + 1
Contributor

this seems a bit misleading? why add an extra 1?

Contributor Author

just a simple example, can be removed or templated.


- name: layernorm
lang: |
def layernorm(float(T, B, C) I) -> (O, mean, centered, var)
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python


- name: batchnorm
lang: |
def batchnorm(float(N,C,H,W) I, float(C) rMeanIn, float(C) rVarIn)
Contributor

this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python


- name: small_mobilenet
lang: |
def small_mobilenet(float(C1, H, W) I, float(C1, KH1, KW1) W1, float(C1) B1, float(C2, C1) W2, float(C2) B2)
Contributor

this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

I think the library referenced in the above link is actually out of date. It has lots of fcrelu, LUT and matmul/convolution, nothing else.
Also, the library is not used anywhere except some C2 tests; we might consider cleaning up this library or start using it more properly and hence maintaining it.

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

vargs.push_back(const_cast<char*>(arg.data()));
}
char** argv = vargs.data();
tc::python::globalDebugGflagsGlogInit(&numArgs, &argv);
Contributor

this function returns an unused bool, what is the rationale here?

Contributor Author

this is how gflags/glog work: if the init fails at any point, we get an exception; otherwise it's important that everything returns True for a successful init. InitGoogleTest, InitGoogleLogging and ParseCommandLineFlags should all pass; if any of them fails, the init was problematic, e.g. due to bad flags.

For reference, this is also how things are done in main in the C++ tests.

Contributor Author

#resolved (please comment if further clarification is needed). Thanks.

}
throw std::runtime_error("Invalid option passed");
}),
"Initiailze the mapping options from one of the following:\n 1. naive\n 2. pointwise\n 3. mlp\n 4. conv\n 5. group_conv\n 6. single_thread")
Contributor

Initialize


if options is None:
options = Options("naive")
logger.warning("No mapping options passed, 'naive' type mapping options will be used and will likely have bad performance. See help(your_layer.__call__) for setting mapping options.")
Contributor

As discussed, builtin options are not meant to be used by users but as a starting point for autotuning because this will give a false sense of the performance of what TC can achieve while only poking at the tip of the iceberg.

In any case, you also want to emit similar warnings for everything that wasn't explicitly autotuned (for now).
None of the builtin options are meant to be used to obtain "good" perf; they only arose from the core devs having played a bit by hand with the mapper, and as relevant strategies for testing.

Contributor Author

yes, for that reason we explicitly mention "bad performance" in the warning message, and users can then adapt. The warning message also gives instructions on how to set the options. The warning is emitted everywhere the explicit options are not passed. The documentation will also provide clear instructions for passing options.

The reason for providing defaults and using defaults is that it allows quick experimentation without a learning curve of "Options". The current "naive" options work fine and don't fail compilation, at least for all the test cases provided here.
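For reference, a minimal sketch of passing options explicitly instead of relying on the default "naive" behavior; tc.Options being exposed at the top level and the options= keyword on the call are assumptions based on the snippet and warning text above:

import torch
import tensor_comprehensions as tc

lang = """
def scale(float(N) I) -> (O) {
    O(n) = I(n) * 2
}
"""
scale = tc.define(lang, name="scale")
inp = torch.randn(100).cuda()
# pick one of the builtin starting points explicitly ("naive", "pointwise", "mlp",
# "conv", "group_conv", "single_thread") rather than the implicit default
out = scale(inp, options=tc.Options("pointwise"))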

Contributor Author

#resolved

Contributor

I already expressed concerns about default options. My motivation is twofold: (1) I want people to know that there is this thing called "polyhedral transformation and mapping" that happens inside; they don't have to understand what it does, but it helps me promote my work (yes, I'm that selfish); (2) we already saw people skeptical about the performance we can achieve, so trying the default thing and failing to perform will most certainly be used against us; I can go as far as to expect a paper that claims to compare against TC while not using the mapping options at all.

I was never convinced by the warning approach. At some point in history, I was writing analyses and warnings for clang and saw how much people read those... I also don't think typing out or copy-pasting "MappingOptions.naive()" has a steep or any learning curve whatsoever. Copy-pasting code from stack overflow without the slightest clue of how it works is like half of software engineering.

The current "naive" options work fine and don't fail compilation

If they eventually do, I decline any responsibility on fixing the backend.

Contributor Author

LOL, I'll add the section about polyhedral/halide to the python docstrings of the functions.

Copy-pasting code from stack overflow without a slightest clue of how it works is like half of software engineering.

yep but our users are also researchers and not engineers :)

Contributor Author

the python readme now has the polyhedral/halide part as well. That was purely an oversight on my part, apologies and correction made now :)

def test_simple_cuda_injection(self):
lang = """
def add(float(N) A, float(N) B) -> (output) {
output(i) = A(i) + B(i) + 1
Contributor

I have the same comment as above, adding an extra 1 is surprising and upon further inspection my_add below does not do it.
So you essentially have 2 implementations that don't do the same thing in the same test.
You inject the code in an unchecked way.

This test only passes because the output is not checked for correctness which is another thing I would like to see fixed.

Thanks!

Contributor Author
@prigoyal prigoyal Feb 26, 2018

@nicolasvasilache adding such checks for all the tests is not trivial, but there are checks in place, please look at test_tc.py

If you want to make a contribution, please send a PR.

Having checks in the C++ backend is also enough assurance that the python side is okay too. On the python front, the calls are simply dispatched to the C++ backend and there are no modifications made to the data.

Contributor
@nicolasvasilache nicolasvasilache left a comment

Please make python tests check against some golden standard implementation at some level of machine precision like we have been doing in the C++ tests for instance.
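A minimal sketch of the kind of golden-reference check being requested, assuming an elementwise add TC (without the extra + 1 discussed above); check_add is a hypothetical helper, not the repo's test harness:

import torch

def check_add(tc_add, N=100):
    # compare the TC output against the PyTorch reference at some machine precision
    A = torch.randn(N).cuda()
    B = torch.randn(N).cuda()
    out = tc_add(A, B)
    max_diff = (out - (A + B)).abs().max()
    assert float(max_diff) < 1e-6, "TC add deviates from the torch reference"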

################################################################################
# Copy the proto files only
################################################################################
copyfile("src/proto/compcache.proto", "tensor_comprehensions/compilation_cache.proto")
Contributor

Could you please explain why this and the .gitignore change above are needed versus say just using the particular file?
If it is because we don't want to ship stuff from src then maybe it is appropriate to have a proto/*.proto outside of src and use that without copying?
What do you think @ftynse @ttheodor ?

Contributor Author

as part of the conda package, we don't ship the src/proto codepath or any of src/

hence, we just pick whatever is relevant and we only need the proto files.

Usage:

look at the tc_unit/decode function. This is functionality people wanted: once they have autotuned their options, they want to see what the options look like. So now they can do

tc.decode(cache_file)

The reason for choosing the name decode is that protoc --decode is the actual command; it was intuitive to use this name, and a user also asked "how do I decode the binary options file?"
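A hedged sketch of the workflow being described; only tc.decode(cache_file) is quoted above, and the cache= keyword on autotune is an assumption:

import torch
import tensor_comprehensions as tc

lang = """
def avgpool(float(B, C, H, W) input) -> (output) {{
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}}
"""
avgpool = tc.define(lang, name="avgpool", constants={"sH": 1, "sW": 1, "kH": 2, "kW": 2})
inp = torch.ones(32, 3, 10, 10).cuda()
avgpool.autotune(inp, cache="avgpool_options")  # assumed keyword: persist the tuned options to a file
tc.decode("avgpool_options")                    # inspect the serialized protobuf options, as described above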

Contributor

I agree with @nicolasvasilache on the location of the .proto files; they don't really belong in src/ and should be promoted. I suppose we can then ship proto/ in conda without looking in src/ and without copying.

@prigoyal has a point about protoc using decode as the name for reading files. Since we say options are stored as protobuf, let's stick with protobuf lingo for them.

Contributor Author

yeah, regarding proto/: we only ship it in conda and don't look at src/

Contributor Author
@prigoyal prigoyal Feb 26, 2018

I'll mark this as resolved, current shipping of proto seems fine and the decode naming is okay too, thanks @nicolasvasilache and @ftynse


- name: fully_connected
lang: |
def fully_connected(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
Contributor

nit: spacing is inconsistent in the parameters of this def and a few others

Contributor Author

thanks for the nits, I'll correct them

5. Now that we have conda set up, we are ready to build the conda package for TC. For this,
since TC has dependencies that are linked dynamically, we will build conda
packages for them separately. In short, as of January 31, 2018 we need to build
packages for `clang+llvm-tapir5.0`, `gflags`, `glog`, `Halide`, the TC version of `isl`, `PET`, the TC version of `ppcg`, `protobuf`, and finally `Tensor Comprehensions`.
Contributor

PET / PPCG are outdated, please also update the date here.

@@ -0,0 +1,68 @@
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
Contributor

FMI, why do we rewrite a full new Dockerfile rather than build on top of what we have been using in CI?

Contributor Author

the one in CI doesn't have GPU support, but for building conda packages I need to run the python tests, which need a GPU in all cases. Hence I am building a docker image that has GPU access. You can read up on how nvidia-docker works; I provided readme instructions here for building and using this image.

Contributor Author

#resolved

Contributor

The docker image can't have GPUs, you can't download them :)

All our trusty docker images used to inherit from this FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04 and all our xenial docker images used to inherit from this FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 but I now realize all that has changed drastically in the last stretch before the release and that we now use this file.

Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back, which got me confused.

Does this mean we should uniformize what is in here and revert back to the old docker files like you did in this PR?

I'll open a task specifically for this.

Contributor Author
@prigoyal prigoyal Feb 26, 2018

The docker files will be moved from here to a separate repo https://github.com/prigoyal/tensorcomp-dockerfiles that I have set up for Jenkins. The question of whether the images should be tiered or not is something I am already discussing with @pietern and @ezyang. Please open the issue in the above repo to centralize the discussion properly.

Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back, which got me confused.

Not quite: the docker images follow a quite different structure now and provide more safeguarding.

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04

why did we inherit from those earlier? did you use nvidia-smi, or did you ever run gpu tests in that docker image? :)

The docker image can't have GPUs, you can't download them :)

yeah, that's why we use nvidia-docker https://github.com/NVIDIA/nvidia-docker; looking at the diagram there will make things clear. Docker is not a VM, it's an image (note, the terminology has been used incorrectly in the past)

revert back to the old docker files like you did in this PR

nope, setting up CircleCI to properly use the gpu images we build is cumbersome, so we are going to use an AWS Jenkins setup soon.

Contributor Author

#resolved

Contributor

why did we inherit from those earlier? did you use nvidia-smi?

No, since there is no cuda driver on the CPU-only CI machines.
However, we have always had cross-compilation tests on the CPU-only CI that were testing simple parts of the CUDA toolkit.

In particular you don't need a physical GPU to generate PTX and see whether compilation passes (e.g. NVRTC + CUB)...

Contributor Author
@prigoyal prigoyal Feb 26, 2018

you can continue doing that; there is no need to start from NVIDIA images, the toolkit is installed in the images as you also pointed out. Nvidia-docker images are useless if you don't run CI gpu tests with them.

@prigoyal prigoyal force-pushed the pth-tc-package branch 3 times, most recently from c7b77e2 to ef932e1 on February 26, 2018 18:37
name, backward_name = get_tc_names_from_kwargs(**kwargs)
kwargs.pop("name", None)
backward = True if backward_name is not None else False
hash_key = get_tc_hash_key(name, *input_tensors)
Contributor

So with this you are actually signing up for trouble as there is no guarantee that what you cache is correct in the long run.

This should work fine for what you are currently doing with it but you should plan to pybind export functions (that we will need to expose) from here.

The reason is that the caching logic will depend on output tensor sizes, input/output tensor striding, allocation alignment and many other fun things as we start digging deeper into optimizations.

There is nothing to change for now but it is just a friendly warning that you should not rely too much on your own framework-specific caching mechanism since this seems quite central to your autotuner wrapper logic.

Contributor Author

there is no support for strided tensors at the moment. When that is added and the task is closed for that, the python API will be modified accordingly.
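For illustration, a minimal sketch of the kind of framework-side key being discussed (a hypothetical helper, not the actual get_tc_hash_key): it only captures the TC name plus per-input dtype and shape, which is exactly why striding, alignment and output sizes would eventually invalidate it:

import hashlib

def naive_tc_hash_key(name, *input_tensors):
    # hypothetical: two inputs with the same dtype/shape but different striding
    # or alignment map to the same key
    parts = [name]
    for t in input_tensors:
        parts.append("{}:{}".format(t.type(), "x".join(str(s) for s in t.size())))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()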

def __call__(self, *inputs, **kwargs):
r"""Runs the define TC language on given inputs.

Args:
Contributor

nit: there seems to be a lot of duplication between the __call__ and the autotune comments, is there a way to compress a bit?

@nicolasvasilache
Contributor

nicolasvasilache commented Feb 26, 2018

Nice and sweet, the python API is a very lightweight wrapper around the core C++ API, so it will be as future-proof as it can be. I like it.

There is too much python-side handling of caching logic for my taste, but we haven't provided you with the proper C++ functions to do better for now; so we should iterate to avoid you building more logic that will some day break.

There are still concerns about:

  1. silent default options, however loud the warning is
  2. python TC string definitions that are different from the C++ string definitions
  3. python tests that do not test correctness

3 is a blocker

1 should be a blocker but I can make my peace with it

2 should disappear completely in a subsequent PR, only risk is that we will disseminate non-best-practice code for a bit, not a blocker in my opinion but it may raise some unhappiness on the python side in the future

Let's see what @ftynse says; I think he said he will make the effort of looking tonight despite the late hour in Europe.

And congrats for driving this through on your own! 👍

@prigoyal
Contributor Author

prigoyal commented Feb 26, 2018

There is too much python side handling of caching logic to my taste

the autotuner caching needs some work; I told @ftynse that I'll port over some functionality to C++, but that's separate work and not for this PR

  1. silent default options, however loud the warning is

NOT A BLOCKER. Users are happy and desire this behavior; they don't want the learning curve of "Options", so it's good if we all make our peace with it.

  2. python TC string definitions that are different from the C++ string definitions

NOT A BLOCKER. We already discussed this in much detail. Users have given feedback and it doesn't matter to them. The correct fix is to allow handling scalars for bounds inference (a task is open for this). In the meantime, people can adopt one of the two ways of formatting below

OPTION 1

import tensor_comprehensions as tc
import torch
lang = """
def avgpool(float(B, C, H, W) input) -> (output) {{
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}}
"""
avgpool = tc.define(lang, name="avgpool", constants={"sH":1, "sW":1, "kH":2, "kW":2})
inp = torch.ones(32, 3, 10, 10).cuda()
out = avgpool(inp)

OPTION 2

import tensor_comprehensions as tc
import torch
import re
LANG="""
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * <sh> + kh, w * <sw> + kw) where kh in 0:<kH>, kw in 0:<kW>
}
"""
sH, sW, kH, kW = 1, 1, 2, 2
LANG = re.sub('<sh>', str(sH), LANG)
LANG = re.sub('<sw>', str(sW), LANG)
LANG = re.sub('<kH>', str(kH), LANG)
LANG = re.sub('<kW>', str(kW), LANG)
avgpool = tc.define(LANG, name="avgpool")
inp = torch.ones(1, 1, 4, 4).cuda()
out = avgpool(inp)

OR any other option people can come up with

  3. python tests that do not test correctness

NOT A BLOCKER. Most of the python tests are backed by the C++ backend. I will be completing the checks on the python front over the next days, since a proper python test harness is needed. However, this is not a release blocker: I have myself verified running the torch functions and comparing the TC output on my command line, and some users have also verified that their outputs are correct. Also, people will want to write operations that probably don't have a reference implementation (we know of one example already), and TC still produced correct outputs. That is not a guarantee of anything, but most of the heavyweight layers have already been tested properly, so I am confident.

NONE of the above concerns should be a blocker. They rather seem to be enhancements to me. The python packages are ready. Things will never be perfect, but we are properly documenting what works and what doesn't work.


## Communication

* **Email**: prigoyal@fb.com
Contributor

tensorcomp@fb.com

Contributor Author

this section is just not needed, I'll remove it, the communication will go on the README page

@prigoyal
Contributor Author

@ftynse I had a quick sync with @nicolasvasilache just now about all the points above. We have agreed that the 3rd point is a nice-to-have and I'll work on it after the release or as I find time over this week. We have also agreed that I'll wait until midnight for your review (midnight = EST midnight); since you are at a conference until Wed and the release has been planned for Wed for a few weeks, it's fine that I merge this PR at midnight. You can still leave your review comments and I'll try to address them in separate PRs.

@prigoyal prigoyal force-pushed the pth-tc-package branch 2 times, most recently from 0e756e6 to e908c24 on February 26, 2018 22:35
Contributor
@ftynse ftynse left a comment

I have expressed my two concerns several times already, let me reiterate one last time.

  1. Mapping options must not be implicit, even with the warning. Bold assumption: nobody reads warnings. Demonstration: we decided not to compile TC with -Werror because of the number of errors it produced (we tried after spending days of time to catch some tricky memory problem). That is, we don't care about warnings, so we should not think users will. There are individual places which make it worse, in particular the matrix multiplication example in the documentation, which is the absolute worst case for TC performance even with options.

  2. Additional "syntax" features in python, in particular constant substitution, are a bad idea. While we indeed want C++-template-like behavior, it should be implemented in the language, not as a hack around it. The arguments are the same as for including templates in the C++ language as a compile-time generalization mechanism instead of the C preprocessor. If this is really a dearly wanted feature, it can easily be implemented as a separate call without making the user believe such substitution is a core feature of the TC language.

I believe the decisions taken on both points go against line 2 of the python philosophy, aka PEP 20: "explicit is better than implicit". They make additional, optional behavior implicit. I also believe that, once public, any changes to this behavior will become significantly more complicated, because people will start to rely on the existing behavior, however illogical and implicit it is.

While these decisions seem to be made based on some user feedback, I strongly believe we should not just satisfy whatever requests we receive. There is a whole methodology on conducting user research, ranging from interviews to focus groups, to participatory design, to controlled experiments...

Finally, I noticed this in the discussion of the original release blog post: many people interpreted TC as a "machine learning DSL with CUDA codegen and autotuner", in the sense that they think the autotuner applies to, say, moving bits of CUDA code around. Even despite the fact that half of the post was about the "transformative" parts of TC, that is Halide and polyhedral compilation. I find it disturbing that the python README has a paragraph selling the TC bindings and not a single mention of either Halide or polyhedra, independently of me working on the polyhedral part. People want to have some magic tool that gives them better kernels, but sorry, TC is not magic.

@@ -174,8 +174,6 @@ endif()

################################################################################
# ATen

# ATen - if someone ships libATen.so, we try to use that if available
Contributor

don't we anymore?

Contributor Author

we do, but this has been more properly highlighted in the next steps, where we search for libaten in the pytorch install

@@ -0,0 +1,35 @@
package:
name: isl-tc
Contributor

Why is this called isl-tc and not just isl? And before somebody says that there is already a conda package for isl, why isn't this a problem for protobuf below?

Contributor Author

to make it clear it is the TC version of isl. I think there was a comment long back, when I was writing these scripts, that we should name things as TC-specific, like ppcg-tc, etc., so that we don't confuse them with the actual upstream repos

setup.py Outdated
['git', 'rev-parse', 'HEAD'], cwd=TOP_DIR
).decode('ascii').strip()
except (OSError, subprocess.CalledProcessError):
git_version = None
Contributor

Is it just allowed to proceed with None as git version?

Contributor Author

yeah, I'll clean this up though; we might prefer to fail instead


if (*pargc == 0) {
return true;
}
// TODO: (prigoyal): we need to do some filtering on flags here,
Contributor

open an issue for tracking?

Contributor Author

will do

Contributor Author
@prigoyal prigoyal Feb 26, 2018

yeah, regarding proto/: we only ship it in conda and don't look at src/, but like you said, this should probably live under TensorComprehensions/proto or somewhere else

[](tc::MappingOptions& instance, const std::string& type) {
instance.outerScheduleFusionStrategy(type);
},
"Require TC to try and execute different TC expressions interleaved (Max), separately (Min)\nor interleaved as long as sufficient parallelism is exploited (Preserve3Coincident) by\nperforming loop fusion and fission. Applies before tiling")
Contributor

"Applies before tiling" is incorrect. It applies to tile loops, i.e. the loops that enumerate tiles.

Contributor Author

will correct this

},
"Set up outerScheduleFusionStrategy and intraTileFusionStrategy to the given value")
.def(
"outerScheduleFusionStrategy",
Contributor

intraTileFusionStrategy seems not exported.

Contributor Author

will do



and the :attr:`lang` should contain both forward and backward TC
strings.

constants (dict, optional):
Contributor

I already complained about this implicit constant insertion that uses python syntax. With the introduction of the yaml file, it only got worse. It will be looked at as an example of TC in isolation of python and its string replacement. Why can't you have two separate calls with clear functionality:

generic_tc = "def whatever() {{ code with {python stuff} }}"
preprocessed_tc = tc.substitute(generic_tc, {"python stuff": 42})
tc.define(preprocessed_tc)

This entire situation is the same choice as C preprocessor vs C++ templates for compile-time behavior. C++ committee still tries to clean up the damage from the preprocessor and pretending it was not a part of the language.

@@ -0,0 +1,14 @@
# Copyright (c) 2017-present, Facebook, Inc.
Contributor

why do we need an empty file?

Contributor Author
@prigoyal prigoyal Feb 26, 2018

this is how python packaging works

@ftynse
Contributor

ftynse commented Feb 26, 2018

Regarding the choice of the options listed above, this is a clear example of a false dichotomy. You don't have to make users write individual substitutions with regular expressions when they are not embedded in the define call. You can trivially extract the code that does this substitution in define into a separate function, substitute, and ask users to call that whenever necessary.

Something like

import re

# inside our code
def substitute(lang, argMap):
    for key in argMap:
        lang = re.sub(re.escape('${{{0}}}'.format(key)), str(argMap[key]), lang)
    return lang

# in user code
macro = """
  def whatever() -> (O) {
    O(0) = ${foo}
  }
"""
lang = substitute(macro, {"foo": 42})
tc.define(lang, ...)

you can also use kwargs for substitute in this case ;)

@prigoyal
Contributor Author

sure, that's just a matter of whether we want to make the extra call for substitute or not. The TC define call can do that substitution and takes the dictionary of arguments; using the constants dictionary came from @soumith's feedback. I think this is largely empirical, so I'll also add an option to do the substitute and expose a test case; users can choose whatever makes the most sense to them.

how about this:

lang = """
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}
"""
avgpool = tc.define(lang, substitute={"sH": 1, ...})

@ftynse
Contributor

ftynse commented Feb 27, 2018

I'm totally willing to pay the price of one extra explicit call to get separation of concerns: TC language is what is defined by the BNF in the documentation, substitution is done separately because TC language does not support it.

@jekbradbury
Contributor

Why can't substitution rely on the native Python template module or <string>.format?

@prigoyal
Contributor Author

prigoyal commented Feb 27, 2018

@jekbradbury the right fix is really to be able to infer the bounds in where clauses when scalars are passed. Then we can write better TC like this:

def avgpool(float(B, C, H, W) input, float kH, float kW) -> (output) {
    output(b, c, h, w) += input(b, c, h + kh, w + kw) where kh in 0:kH, kw in 0:kW
}

but this isn't easy to fix atm due to how our computation model is right now, how we allocate tensors, and the scope we compile things in.

A couple of things we wanted to make sure of in this API:

  1. Keep the API minimal and low-surface. Right now there are basically only three calls:
    tc.define, my_awesome_layer(), my_layer.autotune()

  2. We want to keep parity with the actual TC syntax, i.e. def name() { code }, and that TC syntax should stick in Python as well.

We can use any kind of substitution: the python template module will work just fine, users can do that, or you can do re.sub, or you can use what we provide, i.e. pass constants to the tc.define call. Anything is fine, and we leave it to users to do whatever makes the most sense.

I'll also add a test case using the template module to make things more explicit; a sketch of that approach is below.
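For example, a sketch using the standard library's string.Template, whose $-style placeholders don't clash with TC's braces (assuming the avgpool TC used elsewhere in this thread):

from string import Template

import torch
import tensor_comprehensions as tc

LANG = Template("""
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * $sH + kh, w * $sW + kw) where kh in 0:$kH, kw in 0:$kW
}
""")
lang = LANG.substitute(sH=1, sW=1, kH=2, kW=2)
avgpool = tc.define(lang, name="avgpool")
inp = torch.ones(32, 3, 10, 10).cuda()
out = avgpool(inp)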

@prigoyal
Contributor Author

@jekbradbury you would still have to do the following:

>>> sH = 1
>>> sW = 1
>>> kH = 2
>>> kW = 2
>>> lang = f"""
... def avgpool(float(B, C, H, W) input) -> (output) {{
... output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
... }}
... """

i.e. still use the {{...}} but remove the constants or the substitute call. The reason for still using {{...}} is that the TC string parser doesn't interact well with the python string formatter cc @zdevito

However, this is the thing that will stick around for ~1 month and is a temporary workaround for now.

Although using the f-string is nice, I wouldn't want to put any python3.6-specific features in the API. We officially support py3.6; py2 might just work as well, but we haven't tested it and don't support it officially.

@jekbradbury
Contributor

Ohhh of course, that clash of curly-brace semantics is annoying but seems kind of inevitable.

@prigoyal
Contributor Author

yep, for now, it is :) but we are working on it and will put a giant WARNING / NOTE that this is a temporary solution and will go away in a month or so :)

@nicolasvasilache
Contributor

Re. fake templating by string replacement: I strictly prefer the <...> syntax with proper factory functions exposed via pybind. The main reason is that this is what we have been using for a while in places like this.
However I understand the extra constraints on @prigoyal and we can always fix it later but as @ftynse also expressed: "If they are copied, they will get out of sync faster than you can imagine".
This is the general reason why copy-pasta represents such a big technical debt.

Why can't substitution rely on the native Python template module or .format

@jekbradbury reg. the above, any logic implemented outside of C++ that people risk starting to rely upon is a liability (do I really need Python when I start optimizing for older Android phones). See also @ftynse's comment about what TC is and isn't. Additionally, this is an opportunity to reintroduce all the copy-pasta issues.

This is likely a symptom of not yet having found the proper abstraction to express such things, whether actual templates in the language, a proper TcBuilder API, or something else that we would love others' feedback on. The immediate consequence is that whatever is implemented atm for this particular point is subject to change.

@prigoyal prigoyal force-pushed the pth-tc-package branch 2 times, most recently from a1def65 to 3daec20 on February 27, 2018 02:51
@prigoyal
Contributor Author

prigoyal commented Feb 27, 2018

I strictly prefer the <...> syntax with proper factory

I agree, and I would still say that the proper solution is to have bounds inference from scalars working; then we can avoid all sorts of custom bindings.

However I understand the extra constraints on @prigoyal and we can always fix it later but as @ftynse also expressed: "If they are copied, they will get out of sync faster than you can imagine".

There aren't any constraints that incur big technical debt. @ftynse is saying that we need to build a proper header library in the C++ backend, which doesn't exist right now, rather than the TC defs that have been defined on the python side in the library file.

This is the general reason why copy-pasta represents such a big technical debt.

@nicolasvasilache can you point out where the copy-pasta is in this? I would like to clean that up :)

(do I really need Python when I start optimizing for older Android phones)

no you don't; this will be addressed when the TC defs are moved to the library header (which the TC codebase itself doesn't respect atm)

@prigoyal
Contributor Author

landing this now, CI is green, all comments from @ftynse have been addressed as well.

Acknowledging two things at the time of landing this:

  1. Issue 74 should become a higher priority. The TC codebase doesn't respect the include/library itself, and it's not a fair expectation to block the release/PR to make that nice in the codebase now. That's an effort that requires proper thinking, including about the long-term pytorch JIT plans we have.

  2. Loud and clear warnings about {{...}} syntax. For now, @ftynse has given consent that we can keep it as is. Issue Allow using scalars for bounds inference #73 needs work to make this proper.

@prigoyal prigoyal merged commit 0d34ee3 into master Feb 27, 2018
@prigoyal prigoyal deleted the pth-tc-package branch February 27, 2018 04:49
@ftynse
Contributor

ftynse commented Feb 27, 2018

Why can't substitution rely on the native Python template module or .format?

Anything related to the TC language should be (eventually) done in the TC parser, and absolutely cannot rely on anything python-specific.

But I am positive that internally the substitution in define does rely on <string>.format but does so implicitly. That's why I have been pushing towards making this explicit. TC is specified as a string after all, you can apply whatever string operations to it before you call define.

@ftynse
Contributor

ftynse commented Feb 27, 2018

i.e. still use the {{...}} but remove the constants, or substitute call. The reason for still using {{...}} is that TC string parser doesn't interact well with the python string formatter

that clash of curly-brace semantics is annoying but seems kind of inevitable.

I don't think the TC parser should have anything special to support whatever upstream string manipulation. We should think about and choose a proper syntax for template-style substitutions, that's all.

@prigoyal
Contributor Author

But I am positive that internally the substitution in define does rely on .format but does so implicitly.

yes, that's correct :) it's one line of .format() if you look at the define call. That define basically does what you said: "you can apply whatever string operations to it before you call define".

I don't think TC parser should have anything special to support whatever upstream string manipulation.

I meant that if you use just single braces like this:

def avgpool(float(B, C, H, W) input) -> (output) {
   output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}

and try to format it, it won't work: python's formatter trips over the {...} braces of the TC body itself, so they have to be doubled as {{...}}, which the plain TC parser in turn doesn't understand and would parse incorrectly.
