This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

PyTorch Integration with Tensor Comprehensions #72

Merged

merged 1 commit into master from pth-tc-package on Feb 27, 2018
Conversation

prigoyal
Contributor

@prigoyal prigoyal commented Feb 26, 2018

This PR adds PyTorch integration with TC. Here is what the PyTorch-TC package provides:

  • inputs and outputs to functions are torch.*Tensors
  • Integration with PyTorch autograd: if you specify forward and backward functions, you get an autograd function that takes Variable as input and returns Variable as output.
  • autotuner results can be cached to a file (for reuse)
  • Various tests for ML layers are provided in test_python/layers/

To make it easy to use TC, we provide conda packages for it. We will provide instructions on how to install the conda package soon. Documentation will also be made available in a few hours.

Edited the CodeOwners.md file to properly document the python bindings and conda recipes. Also rearranged the bullet points so that build-related sections are together, docs are together, framework sections are together, etc. If anyone has suggestions for improvements, please let me know :)
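For context, a minimal usage sketch of the workflow this package enables (hedged: only the three entry points tc.define, calling the defined layer, and autotune are confirmed in this thread; the exact autotune signature is an assumption):

import torch
import tensor_comprehensions as tc

lang = """
def add(float(N) A, float(N) B) -> (output) {
    output(i) = A(i) + B(i)
}
"""
add = tc.define(lang, name="add")   # build a TC unit from the TC string
A = torch.randn(100).cuda()
B = torch.randn(100).cuda()
out = add(A, B)                     # inputs and outputs are torch.*Tensors
add.autotune(A, B)                  # assumed signature; tuned options can be cached to a file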

@prigoyal
Contributor Author

cc @soumith, @hmansell


- name: cosine
lang: |
def cosine(float(M) I) -> (O) {
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

I think there are many TCs here on the python side that don't have test cases in C++. I have tried adding them on the C++ side as much as possible. I'll double check and add all the TCs in C++ as well, but that'll happen in a separate PR with proper task tracking. Thanks.

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

Contributor

We don't necessarily need tests for all cases, but we clearly want these things in the "core" part and without copying them. They are not python-specific in any sense. If they are copied, they will get out of sync faster than you can imagine. And having them in TC supports my concern about spurious python-specific changes to the syntax.

Contributor Author
@prigoyal prigoyal Feb 26, 2018

agreed, I have a task open for bringing test case parity between C++ and Python TC. I think what I would like to do is to have a proper library that is maintained and updated.

assigned task #74 to myself

marking this as resolved, thanks @ftynse :)


- name: cosine_similarity
lang: |
def cosine_similarity(float(M, N) I1, float(M, N) I2) -> (O, sumI1, sumI2) {{
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

- name: add
lang: |
def add(float(N) A, float(N) B) -> (output) {
output(i) = A(i) + B(i) + 1
Contributor

this seems a bit misleading? why add an extra 1?

Contributor Author

just a simple example, can be removed or templated.


- name: layernorm
lang: |
def layernorm(float(T, B, C) I) -> (O, mean, centered, var)
Contributor

this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python


- name: batchnorm
lang: |
def batchnorm(float(N,C,H,W) I, float(C) rMeanIn, float(C) rVarIn)
Contributor

this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python


- name: small_mobilenet
lang: |
def small_mobilenet(float(C1, H, W) I, float(C1, KH1, KW1) W1, float(C1) B1, float(C2, C1) W2, float(C2) B2)
Contributor

this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python

Contributor Author

I think the library referenced in the above link is actually out of date. It has lots of fcrelu, LUT and matmul/convolution, nothing else.
Also, the library is not used anywhere except some C2 tests; we might consider cleaning up this library or start using it more properly and hence maintaining it.

Contributor Author

#74 issue filed for it

Contributor Author

#resolved

vargs.push_back(const_cast<char*>(arg.data()));
}
char** argv = vargs.data();
tc::python::globalDebugGflagsGlogInit(&numArgs, &argv);
Contributor

this function returns an unused bool, what is the rationale here?

Contributor Author

this is how gflags/glog work: if the init fails at any point, we get an exception; otherwise it's important that everything returns True for a successful init. InitGoogleTest, InitGoogleLogging and ParseCommandLineFlags should all pass; if any of them fails, the init was problematic, e.g. due to bad flags.

For reference, this is also how things are done in main in the C++ tests.

Contributor Author

#resolved (please comment if further clarification is needed). Thanks.

}
throw std::runtime_error("Invalid option passed");
}),
"Initiailze the mapping options from one of the following:\n 1. naive\n 2. pointwise\n 3. mlp\n 4. conv\n 5. group_conv\n 6. single_thread")
Contributor

Initialize


if options is None:
options = Options("naive")
logger.warning("No mapping options passed, 'naive' type mapping options will be used and will likely have bad performance. See help(your_layer.__call__) for setting mapping options.")
Contributor

As discussed, builtin options are not meant to be used by users but as a starting point for autotuning because this will give a false sense of the performance of what TC can achieve while only poking at the tip of the iceberg.

In any case, you also want to emit similar warnings for everything that wasn't explicitly autotuned (for now).
None of the builtin options are meant to be used to obtain "good" perf; they only arose from the core devs having played a bit by hand with the mapper, and as relevant strategies for testing.

Contributor Author

yes, for that reason we explicitly mention "bad performance" in the warning message, and users can then adapt. The warning message also gives instructions on how to set the options. The warning is emitted everywhere the explicit options are not passed. The documentation will also provide clear instructions for passing options.

The reason for providing defaults and using defaults is that it allows quick experimentation without a learning curve of "Options". The current "naive" options work fine and don't fail compilation, at least for all the test cases provided here.
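For reference, a minimal sketch of passing options explicitly instead of relying on the default "naive" behavior; tc.Options being exposed at the top level and the options= keyword on the call are assumptions based on the snippet and warning text above:

import torch
import tensor_comprehensions as tc

lang = """
def scale(float(N) I) -> (O) {
    O(n) = I(n) * 2
}
"""
scale = tc.define(lang, name="scale")
inp = torch.randn(100).cuda()
# pick one of the builtin starting points explicitly ("naive", "pointwise", "mlp",
# "conv", "group_conv", "single_thread") rather than the implicit default
out = scale(inp, options=tc.Options("pointwise"))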

Contributor Author

#resolved

Contributor

I already expressed concerns about default options. My motivation is twofold: (1) I want people to know that there is this thing called "polyhedral transformation and mapping" that happens inside; they don't have to understand what it does, but it helps me promote my work (yes, I'm that selfish); (2) we already saw people skeptical about the performance we can achieve, so trying the default thing and failing to perform will most certainly be used against us; I can go as far as to expect a paper that claims to compare against TC while not using the mapping options at all.

I was never convinced by the warning approach. At some point in history, I was writing analyses and warnings for clang and saw how much people read those... I also don't think typing out or copy-pasting "MappingOptions.naive()" has a steep or any learning curve whatsoever. Copy-pasting code from stack overflow without the slightest clue of how it works is like half of software engineering.

The current "naive" options work fine and don't fail compilation

If they eventually do, I decline any responsibility on fixing the backend.

Contributor Author

LOL, I'll add the section about polyhedral/halide to the python docstrings of the functions.

Copy-pasting code from stack overflow without a slightest clue of how it works is like half of software engineering.

yep but our users are also researchers and not engineers :)

Contributor Author

the python readme now has the polyhedral/halide part as well. That was purely an oversight on my part, apologies and correction made now :)

def test_simple_cuda_injection(self):
lang = """
def add(float(N) A, float(N) B) -> (output) {
output(i) = A(i) + B(i) + 1
Contributor

I have the same comment as above, adding an extra 1 is surprising and upon further inspection my_add below does not do it.
So you essentially have 2 implementations that don't do the same thing in the same test.
You inject the code in an unchecked way.

This test only passes because the output is not checked for correctness which is another thing I would like to see fixed.

Thanks!

Contributor Author
@prigoyal prigoyal Feb 26, 2018

@nicolasvasilache adding such checks for all the tests is not trivial, but there are checks in place, please look at test_tc.py

If you want to make a contribution, please send a PR.

Having checks in the C++ backend is also enough assurance that the python side is okay too. On the python front, the calls are simply dispatched to the C++ backend and there are no modifications made to the data.

Contributor
@nicolasvasilache nicolasvasilache left a comment

Please make python tests check against some golden standard implementation at some level of machine precision like we have been doing in the C++ tests for instance.
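A minimal sketch of the kind of golden-reference check being requested, assuming an elementwise add TC (without the extra + 1 discussed above); check_add is a hypothetical helper, not the repo's test harness:

import torch

def check_add(tc_add, N=100):
    # compare the TC output against the PyTorch reference at some machine precision
    A = torch.randn(N).cuda()
    B = torch.randn(N).cuda()
    out = tc_add(A, B)
    max_diff = (out - (A + B)).abs().max()
    assert float(max_diff) < 1e-6, "TC add deviates from the torch reference"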

################################################################################
# Copy the proto files only
################################################################################
copyfile("src/proto/compcache.proto", "tensor_comprehensions/compilation_cache.proto")
Contributor

Could you please explain why this and the .gitignore change above are needed versus say just using the particular file?
If it is because we don't want to ship stuff from src then maybe it is appropriate to have a proto/*.proto outside of src and use that without copying?
What do you think @ftynse @ttheodor ?

Contributor Author

as part of the conda package, we don't ship the src/proto codepath or any of src/

hence, we just pick whatever is relevant and we only need the proto files.

Usage:

look at the tc_unit/decode function. This is functionality people wanted: once they have autotuned their options, they want to see what the options look like. So now they can do

tc.decode(cache_file)

The reason for choosing the name decode is that protoc --decode is the actual command; it was intuitive to use this name, and a user also asked "how do I decode the binary options file?"
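A hedged sketch of the workflow being described; only tc.decode(cache_file) is quoted above, and the cache= keyword on autotune is an assumption:

import torch
import tensor_comprehensions as tc

lang = """
def avgpool(float(B, C, H, W) input) -> (output) {{
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}}
"""
avgpool = tc.define(lang, name="avgpool", constants={"sH": 1, "sW": 1, "kH": 2, "kW": 2})
inp = torch.ones(32, 3, 10, 10).cuda()
avgpool.autotune(inp, cache="avgpool_options")  # assumed keyword: persist the tuned options to a file
tc.decode("avgpool_options")                    # inspect the serialized protobuf options, as described above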

Contributor

I agree with @nicolasvasilache on the location of the .proto files; they don't really belong in src/ and should be promoted. I suppose we can then ship proto/ in conda without looking in src/ and without copying.

@prigoyal has a point about protoc using decode as the name for reading files. Since we say options are stored as protobuf, let's stick with protobuf lingo for them.

Contributor Author

yeah, regarding proto/: we only ship it in conda and don't look at src/

Contributor Author
@prigoyal prigoyal Feb 26, 2018

I'll mark this as resolved, current shipping of proto seems fine and the decode naming is okay too, thanks @nicolasvasilache and @ftynse


- name: fully_connected
lang: |
def fully_connected(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
Contributor

nit: spacing is inconsistent in the parameters of this def and a few others

Contributor Author

thanks for the nits, I'll correct them

5. Now that we have conda set up, we are ready to build the conda package for TC. For this,
since TC has dependencies that are linked dynamically, we will build conda
packages for them separately. In short, as of January 31, 2018 we need to build
packages for `clang+llvm-tapir5.0`, `gflags`, `glog`, `Halide`, the TC version of `isl`, `PET`, the TC version of `ppcg`, `protobuf`, and finally `Tensor Comprehensions`.
Contributor

PET / PPCG are outdated, please also update the date here.

@@ -0,0 +1,68 @@
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
Contributor

FMI, why do we rewrite a full new Dockerfile rather than build on top of what we have been using in CI?

Contributor Author

the one in CI doesn't have GPU support, but for building conda packages I need to run the python tests, which need a GPU in all cases. Hence I am building a docker image that has GPU access. You can read up on how nvidia-docker works; I provided readme instructions here for building and using this image.

Contributor Author

#resolved

Contributor

The docker image can't have GPUs, you can't download them :)

All our trusty docker images used to inherit from this FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04 and all our xenial docker images used to inherit from this FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 but I now realize all that has changed drastically in the last stretch before the release and that we now use this file.

Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back, which got me confused.

Does this mean we should uniformize what is in here and revert back to the old docker files like you did in this PR?

I'll open a task specifically for this.

Contributor Author
@prigoyal prigoyal Feb 26, 2018

The docker files will be moved from here to a separate repo https://github.com/prigoyal/tensorcomp-dockerfiles that I have set up for Jenkins. The question of whether the images should be tiered or not is something I am already discussing with @pietern and @ezyang. Please open the issue in the above repo to centralize the discussion properly.

Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back, which got me confused.

Not quite: the docker images follow a quite different structure now and provide more safeguarding.

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04

why did we inherit from those earlier? did you use nvidia-smi, or did you ever run gpu tests in that docker image? :)

The docker image can't have GPUs, you can't download them :)

yeah, that's why we use nvidia-docker https://github.com/NVIDIA/nvidia-docker; looking at the diagram there will make things clear. Docker is not a VM, it's an image (note, the terminology has been used incorrectly in the past)

revert back to the old docker files like you did in this PR

nope, setting up CircleCI to properly use the gpu images we build is cumbersome, so we are going to use an AWS Jenkins setup soon.

Contributor Author

#resolved

Contributor

why did we inherit from those earlier? did you use nvidia-smi?

No, since there is no cuda driver on the CPU-only CI machines.
However, we have always had cross-compilation tests on the CPU-only CI that were testing simple parts of the CUDA toolkit.

In particular you don't need a physical GPU to generate PTX and see whether compilation passes (e.g. NVRTC + CUB)...

Contributor Author
@prigoyal prigoyal Feb 26, 2018

you can continue doing that; there is no need to start from NVIDIA images, the toolkit is installed in the images as you also pointed out. Nvidia-docker images are useless if you don't run CI gpu tests with them.

@prigoyal prigoyal force-pushed the pth-tc-package branch 3 times, most recently from c7b77e2 to ef932e1 on February 26, 2018 18:37
name, backward_name = get_tc_names_from_kwargs(**kwargs)
kwargs.pop("name", None)
backward = True if backward_name is not None else False
hash_key = get_tc_hash_key(name, *input_tensors)
Contributor

So with this you are actually signing up for trouble as there is no guarantee that what you cache is correct in the long run.

This should work fine for what you are currently doing with it but you should plan to pybind export functions (that we will need to expose) from here.

The reason is that the caching logic will depend on output tensor sizes, input/output tensor striding, allocation alignment and many other fun things as we start digging deeper into optimizations.

There is nothing to change for now but it is just a friendly warning that you should not rely too much on your own framework-specific caching mechanism since this seems quite central to your autotuner wrapper logic.

Contributor Author

there is no support for strided tensors at the moment. When that is added and the task is closed for that, the python API will be modified accordingly.
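For illustration, a minimal sketch of the kind of framework-side key being discussed (a hypothetical helper, not the actual get_tc_hash_key): it only captures the TC name plus per-input dtype and shape, which is exactly why striding, alignment and output sizes would eventually invalidate it:

import hashlib

def naive_tc_hash_key(name, *input_tensors):
    # hypothetical: two inputs with the same dtype/shape but different striding
    # or alignment map to the same key
    parts = [name]
    for t in input_tensors:
        parts.append("{}:{}".format(t.type(), "x".join(str(s) for s in t.size())))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()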

def __call__(self, *inputs, **kwargs):
r"""Runs the define TC language on given inputs.

Args:
Contributor

nit: there seems to be a lot of duplication between the __call__ and the autotune comments, is there a way to compress a bit?

@nicolasvasilache
Contributor

nicolasvasilache commented Feb 26, 2018

Nice and sweet, the python API is a very lightweight wrapper around the core C++ API, so it will be as future-proof as it can be. I like it.

There is too much python-side handling of caching logic for my taste, but we haven't provided you with the proper C++ functions to do better for now; so we should iterate to avoid you building more logic that will some day break.

There are still concerns about:

  1. silent default options, however loud the warning is
  2. python TC string definitions that are different from the C++ string definitions
  3. python tests that do not test correctness

3 is a blocker

1 should be a blocker but I can make my peace with it

2 should disappear completely in a subsequent PR, only risk is that we will disseminate non-best-practice code for a bit, not a blocker in my opinion but it may raise some unhappiness on the python side in the future

Let's see what @ftynse says; I think he said he will make the effort of looking tonight despite the late hour in Europe.

And congrats for driving this through on your own! 👍

@prigoyal
Contributor Author

prigoyal commented Feb 26, 2018

There is too much python side handling of caching logic to my taste

the autotuner caching needs some work; I told @ftynse that I'll port over some functionality to C++, but that's separate work and not for this PR

  1. silent default options, however loud the warning is

NOT A BLOCKER. Users are happy and desire this behavior; they don't want the learning curve of "Options", so it's good if we all make our peace with it.

  2. python TC string definitions that are different from the C++ string definitions

NOT A BLOCKER. We already discussed this in much detail. Users have given feedback and it doesn't matter to them. The correct fix is to allow handling scalars for bounds inference (a task is open for this). In the meantime, people can adopt one of the two ways of formatting below

OPTION 1

import tensor_comprehensions as tc
import torch
lang = """
def avgpool(float(B, C, H, W) input) -> (output) {{
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}}
"""
avgpool = tc.define(lang, name="avgpool", constants={"sH":1, "sW":1, "kH":2, "kW":2})
inp = torch.ones(32, 3, 10, 10).cuda()
out = avgpool(inp)

OPTION 2

import tensor_comprehensions as tc
import torch
import re
LANG="""
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * <sh> + kh, w * <sw> + kw) where kh in 0:<kH>, kw in 0:<kW>
}
"""
sH, sW, kH, kW = 1, 1, 2, 2
LANG = re.sub('<sh>', str(sH), LANG)
LANG = re.sub('<sw>', str(sW), LANG)
LANG = re.sub('<kH>', str(kH), LANG)
LANG = re.sub('<kW>', str(kW), LANG)
avgpool = tc.define(LANG, name="avgpool")
inp = torch.ones(1, 1, 4, 4).cuda()
out = avgpool(inp)

OR any other option people can come up with

  3. python tests that do not test correctness

NOT A BLOCKER. Most of the python tests are backed by the C++ backend. I will be completing the checks on the python front over the next days, since a proper python test harness is needed. However, this is not a release blocker: I have myself verified running the torch functions and comparing the TC output on my command line, and some users have also verified that their outputs are correct. Also, people will want to write operations that probably don't have a reference implementation (we know of one example already), and TC still produced correct outputs. That is not a guarantee of anything, but most of the heavyweight layers have already been tested properly, so I am confident.

NONE of the above concerns should be a blocker. They rather seem to be enhancements to me. The python packages are ready. Things will never be perfect, but we are properly documenting what works and what doesn't work.


## Communication

* **Email**: prigoyal@fb.com
Contributor

tensorcomp@fb.com

Contributor Author

this section is just not needed, I'll remove it, the communication will go on the README page

@prigoyal
Contributor Author

@ftynse I had a quick sync with @nicolasvasilache just now about all the points above. We have agreed that the 3rd point is a nice-to-have and I'll work on it after the release or as I find time over this week. We have also agreed that I'll wait until midnight for your review (midnight = EST midnight); since you are at a conference until Wed and the release has been planned for Wed for a few weeks, it's fine that I merge this PR at midnight. You can still leave your review comments and I'll try to address them in separate PRs.

@prigoyal prigoyal force-pushed the pth-tc-package branch 2 times, most recently from 0e756e6 to e908c24 on February 26, 2018 22:35
Contributor
@ftynse ftynse left a comment

I have expressed my two concerns several times already, let me reiterate one last time.

  1. Mapping options must not be implicit, even with the warning. Bold assumption: nobody reads warnings. Demonstration: we decided not to compile TC with -Werror because of the number of errors it produced (we tried after spending days of time to catch some tricky memory problem). That is, we don't care about warnings, so we should not think users will. There are individual places which make it worse, in particular the matrix multiplication example in the documentation, which is the absolute worst case for TC performance even with options.

  2. Additional "syntax" features in python, in particular constant substitution, are a bad idea. While we indeed want C++-template-like behavior, it should be implemented in the language, not as a hack around it. The arguments are the same as for including templates in the C++ language as a compile-time generalization mechanism instead of the C preprocessor. If this is really a dearly wanted feature, it can easily be implemented as a separate call without making the user believe such substitution is a core feature of the TC language.

I believe the decisions taken on both points go against line 2 of the python philosophy, aka PEP 20: "explicit is better than implicit". They make additional, optional behavior implicit. I also believe that, once public, any changes to this behavior will become significantly more complicated, because people will start to rely on the existing behavior, however illogical and implicit it is.

While these decisions seem to be made based on some user feedback, I strongly believe we should not just satisfy whatever requests we receive. There is a whole methodology on conducting user research, ranging from interviews to focus groups, to participatory design, to controlled experiments...

Finally, I noticed this in the discussion of the original release blog post: many people interpreted TC as a "machine learning DSL with CUDA codegen and autotuner", in the sense that they think the autotuner applies to, say, moving bits of CUDA code around. Even despite the fact that half of the post was about the "transformative" parts of TC, that is Halide and polyhedral compilation. I find it disturbing that the python README has a paragraph selling the TC bindings and not a single mention of either Halide or polyhedra, independently of me working on the polyhedral part. People want to have some magic tool that gives them better kernels, but sorry, TC is not magic.

@@ -174,8 +174,6 @@ endif()

################################################################################
# ATen

# ATen - if someone ships libATen.so, we try to use that if available
Contributor

don't we anymore?

Contributor Author

we do, but this has been more properly highlighted in the next steps, where we search for libaten in the pytorch install

@@ -0,0 +1,35 @@
package:
name: isl-tc
Contributor

Why is this called isl-tc and not just isl? And before somebody says that there is already a conda package for isl, why isn't this a problem for protobuf below?

Contributor Author

to make it clear it is the TC version of isl. I think there was a comment long back, when I was writing these scripts, that we should name things as TC-specific, like ppcg-tc, etc., so that we don't confuse them with the actual upstream repos

setup.py Outdated
['git', 'rev-parse', 'HEAD'], cwd=TOP_DIR
).decode('ascii').strip()
except (OSError, subprocess.CalledProcessError):
git_version = None
Contributor

Is it just allowed to proceed with None as git version?

Contributor Author

yeah, I'll clean this up though; we might prefer to fail instead


if (*pargc == 0) {
return true;
}
// TODO: (prigoyal): we need to do some filtering on flags here,
Contributor

open an issue for tracking?

Contributor Author

will do

Contributor Author
@prigoyal prigoyal Feb 26, 2018

yeah, regarding proto/: we only ship it in conda and don't look at src/, but like you said, this should probably live under TensorComprehensions/proto or somewhere else

[](tc::MappingOptions& instance, const std::string& type) {
instance.outerScheduleFusionStrategy(type);
},
"Require TC to try and execute different TC expressions interleaved (Max), separately (Min)\nor interleaved as long as sufficient parallelism is exploited (Preserve3Coincident) by\nperforming loop fusion and fission. Applies before tiling")
Contributor

"Applies before tiling" is incorrect. It applies to tile loops, i.e. the loops that enumerate tiles.

Contributor Author

will correct this

},
"Set up outerScheduleFusionStrategy and intraTileFusionStrategy to the given value")
.def(
"outerScheduleFusionStrategy",
Contributor

intraTileFusionStrategy seems not exported.

Contributor Author

will do



and the :attr:`lang` should contain both forward and backward TC
strings.

constants (dict, optional):
Contributor

I already complained about this implicit constant insertion that uses python syntax. With the introduction of the yaml file, it only got worse. It will be looked at as an example of TC in isolation of python and its string replacement. Why can't you have two separate calls with clear functionality:

generic_tc = "def whatever() {{ code with {python stuff} }}"
preprocessed_tc = tc.substitute(generic_tc, {"python stuff": 42})
tc.define(preprocessed_tc)

This entire situation is the same choice as C preprocessor vs C++ templates for compile-time behavior. C++ committee still tries to clean up the damage from the preprocessor and pretending it was not a part of the language.

@@ -0,0 +1,14 @@
# Copyright (c) 2017-present, Facebook, Inc.
Contributor

why do we need an empty file?

Contributor Author
@prigoyal prigoyal Feb 26, 2018

this is how python packaging works

@ftynse
Contributor

ftynse commented Feb 26, 2018

Regarding the choice of the options listed above, this is a clear example of a false dichotomy. You don't have to make users write individual substitutions with regular expressions when they are not embedded in the define call. You can trivially extract the code that does this substitution in define into a separate function, substitute, and ask users to call that whenever necessary.

Something like

import re

# inside our code
def substitute(lang, argMap):
    for key in argMap:
        lang = re.sub(re.escape('${{{0}}}'.format(key)), str(argMap[key]), lang)
    return lang

# in user code
macro = """
  def whatever() -> (O) {
    O(0) = ${foo}
  }
"""
lang = substitute(macro, {"foo": 42})
tc.define(lang, ...)

you can also use kwargs for substitute in this case ;)

@prigoyal
Contributor Author

sure, that's just a matter of whether we want to make the extra call for substitute or not. The TC define call can do that substitution and takes the dictionary of arguments; using the constants dictionary came from @soumith's feedback. I think this is largely empirical, so I'll also add an option to do the substitute and expose a test case; users can choose whatever makes the most sense to them.

how about this:

lang = """
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}
"""
avgpool = tc.define(lang, substitute={"sH": 1, ...})

@ftynse
Contributor

ftynse commented Feb 27, 2018

I'm totally willing to pay the price of one extra explicit call to get separation of concerns: TC language is what is defined by the BNF in the documentation, substitution is done separately because TC language does not support it.

@jekbradbury
Contributor

Why can't substitution rely on the native Python template module or <string>.format?

@prigoyal
Contributor Author

prigoyal commented Feb 27, 2018

@jekbradbury the right fix is really to be able to infer the bounds in where clauses when scalars are passed. Then we can write better TC like this:

def avgpool(float(B, C, H, W) input, float kH, float kW) -> (output) {
    output(b, c, h, w) += input(b, c, h + kh, w + kw) where kh in 0:kH, kw in 0:kW
}

but this isn't easy to fix atm due to how our computation model is right now, how we allocate tensors, and the scope we compile things in.

A couple of things we wanted to make sure of in this API:

  1. Keep the API minimal and low-surface. Right now there are basically only three calls:
    tc.define, my_awesome_layer(), my_layer.autotune()

  2. We want to keep parity with the actual TC syntax, i.e. def name() { code }, and that TC syntax should stick in Python as well.

We can use any kind of substitution: the python template module will work just fine, users can do that, or you can do re.sub, or you can use what we provide, i.e. pass constants to the tc.define call. Anything is fine, and we leave it to users to do whatever makes the most sense.

I'll also add a test case using the template module to make things more explicit; a sketch of that approach is below.
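For example, a sketch using the standard library's string.Template, whose $-style placeholders don't clash with TC's braces (assuming the avgpool TC used elsewhere in this thread):

from string import Template

import torch
import tensor_comprehensions as tc

LANG = Template("""
def avgpool(float(B, C, H, W) input) -> (output) {
    output(b, c, h, w) += input(b, c, h * $sH + kh, w * $sW + kw) where kh in 0:$kH, kw in 0:$kW
}
""")
lang = LANG.substitute(sH=1, sW=1, kH=2, kW=2)
avgpool = tc.define(lang, name="avgpool")
inp = torch.ones(32, 3, 10, 10).cuda()
out = avgpool(inp)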

@prigoyal
Contributor Author

@jekbradbury you would still have to do the following:

>>> sH = 1
>>> sW = 1
>>> kH = 2
>>> kW = 2
>>> lang = f"""
... def avgpool(float(B, C, H, W) input) -> (output) {{
... output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
... }}
... """

i.e. still use the {{...}} but remove the constants or the substitute call. The reason for still using {{...}} is that the TC string parser doesn't interact well with the python string formatter cc @zdevito

However, this is the thing that will stick around for ~1 month and is a temporary workaround for now.

Although using the f-string is nice, I wouldn't want to put any python3.6-specific features in the API. We officially support py3.6; py2 might just work as well, but we haven't tested it and don't support it officially.

@jekbradbury
Contributor

Ohhh of course, that clash of curly-brace semantics is annoying but seems kind of inevitable.

@prigoyal
Contributor Author

yep, for now, it is :) but we are working on it and will put a giant WARNING / NOTE that this is a temporary solution and will go away in a month or so :)

@nicolasvasilache
Contributor

Re. fake templating by string replacement: I strictly prefer the <...> syntax with proper factory functions exposed via pybind. The main reason is that this is what we have been using for a while in places like this.
However I understand the extra constraints on @prigoyal and we can always fix it later but as @ftynse also expressed: "If they are copied, they will get out of sync faster than you can imagine".
This is the general reason why copy-pasta represents such a big technical debt.

Why can't substitution rely on the native Python template module or .format

@jekbradbury reg. the above, any logic implemented outside of C++ that people risk starting to rely upon is a liability (do I really need Python when I start optimizing for older Android phones). See also @ftynse's comment about what TC is and isn't. Additionally, this is an opportunity to reintroduce all the copy-pasta issues.

This is likely a symptom of not yet having found the proper abstraction to express such things, whether actual templates in the language, a proper TcBuilder API, or something else that we would love others' feedback on. The immediate consequence is that whatever is implemented atm for this particular point is subject to change.

@prigoyal prigoyal force-pushed the pth-tc-package branch 2 times, most recently from a1def65 to 3daec20 on February 27, 2018 02:51
@prigoyal
Contributor Author

prigoyal commented Feb 27, 2018

I strictly prefer the <...> syntax with proper factory

I agree, and I would still say that the proper solution is to have bounds inference from scalars working; then we can avoid all sorts of custom bindings.

However I understand the extra constraints on @prigoyal and we can always fix it later but as @ftynse also expressed: "If they are copied, they will get out of sync faster than you can imagine".

There aren't any constraints that incur big technical debt. @ftynse is saying that we need to build a proper header library in the C++ backend, which doesn't exist right now, rather than the TC defs that have been defined on the python side in the library file.

This is the general reason why copy-pasta represents such a big technical debt.

@nicolasvasilache can you point out where the copy-pasta is in this? I would like to clean that up :)

(do I really need Python when I start optimizing for older Android phones)

no you don't; this will be addressed when the TC defs are moved to the library header (which the TC codebase itself doesn't respect atm)

@prigoyal
Contributor Author

landing this now, CI is green, all comments from @ftynse have been addressed as well.

Acknowledging two things at the time of landing this:

  1. Issue 74 should become a higher priority. The TC codebase doesn't respect the include/library itself, and it's not a fair expectation to block the release/PR to make that nice in the codebase now. That's an effort that requires proper thinking, including about the long-term pytorch JIT plans we have.

  2. Loud and clear warnings about {{...}} syntax. For now, @ftynse has given consent that we can keep it as is. Issue Allow using scalars for bounds inference #73 needs work to make this proper.

@prigoyal prigoyal merged commit 0d34ee3 into master Feb 27, 2018
@prigoyal prigoyal deleted the pth-tc-package branch February 27, 2018 04:49
@ftynse
Contributor

ftynse commented Feb 27, 2018

Why can't substitution rely on the native Python template module or .format?

Anything related to the TC language should be (eventually) done in the TC parser, and absolutely cannot rely on anything python-specific.

But I am positive that internally the substitution in define does rely on <string>.format but does so implicitly. That's why I have been pushing towards making this explicit. TC is specified as a string after all, you can apply whatever string operations to it before you call define.

@ftynse
Contributor

ftynse commented Feb 27, 2018

i.e. still use the {{...}} but remove the constants, or substitute call. The reason for still using {{...}} is that TC string parser doesn't interact well with the python string formatter

that clash of curly-brace semantics is annoying but seems kind of inevitable.

I don't think the TC parser should have anything special to support whatever upstream string manipulation. We should think about and choose a proper syntax for template-style substitutions, that's all.

@prigoyal
Contributor Author

But I am positive that internally the substitution in define does rely on .format but does so implicitly.

yes, that's correct :) it's one line of .format() if you look at the define call. That define basically does what you said: "you can apply whatever string operations to it before you call define".

I don't think TC parser should have anything special to support whatever upstream string manipulation.

I meant that if you use just single braces like this:

def avgpool(float(B, C, H, W) input) -> (output) {
   output(b, c, h, w) += input(b, c, h * {sH} + kh, w * {sW} + kw) where kh in 0:{kH}, kw in 0:{kW}
}

and try to format it, it won't work: python's formatter trips over the {...} braces of the TC body itself, so they have to be doubled as {{...}}, which the plain TC parser in turn doesn't understand and would parse incorrectly.
