PyTorch Integration with Tensor Comprehensions #72
Conversation
Force-pushed 63fd678 to 5dc5fed.
- name: cosine
  lang: |
    def cosine(float(M) I) -> (O) {
this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python
I think there are many TCs here on the python side that don't have test cases in C++. I have tried adding them on the C++ side as much as possible. I'll double-check and add all the TCs in C++ as well, but that'll happen in a separate PR with proper task tracking. Thanks.
#74 issue filed for it
#resolved
We don't necessarily need tests for all cases, but we clearly want these things in the "core" part and without copying them. They are not python-specific in any sense. If they are copied, they will get out of sync faster than you can imagine. And having them in TC supports my concern about spurious python-specific changes to the syntax.
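For reference, the truncated cosine TC quoted above might read in full as something like the sketch below. The elementwise `cos(I(m))` body is an assumption for illustration, not the PR's actual definition:

```python
# Hypothetical completion of the truncated `cosine` TC above; the
# elementwise body `cos(I(m))` is an assumption, not the PR's code.
cosine_lang = """
def cosine(float(M) I) -> (O) {
    O(m) = cos(I(m))
}
"""

# The signature matches the fragment visible in the diff: one 1-D
# float input I of size M, one output O.
assert "def cosine(float(M) I) -> (O) {" in cosine_lang
```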
- name: cosine_similarity
  lang: |
    def cosine_similarity(float(M, N) I1, float(M, N) I2) -> (O, sumI1, sumI2) {{
this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python
#74 issue filed for it
#resolved
- name: add
  lang: |
    def add(float(N) A, float(N) B) -> (output) {
      output(i) = A(i) + B(i) + 1
this seems a bit misleading? why add an extra 1?
just a simple example, can be removed or templated.
- name: layernorm
  lang: |
    def layernorm(float(T, B, C) I) -> (O, mean, centered, var)
this is one of the few that didn't exist in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python
- name: batchnorm
  lang: |
    def batchnorm(float(N,C,H,W) I, float(C) rMeanIn, float(C) rVarIn)
this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python
- name: small_mobilenet
  lang: |
    def small_mobilenet(float(C1, H, W) I, float(C1, KH1, KW1) W1, float(C1) B1, float(C2, C1) W2, float(C2) B2)
this is one that we only had in unit tests but not in https://github.com/facebookresearch/TensorComprehensions/tree/master/include/tc/library
please consider opening a task so that we have parity between C++ and Python
I think the library referenced in the above link is actually out of date. It has lots of fcrelu, LUT and matmul/convolution, nothing else.
Also, the library is not used anywhere except some C2 tests; we might consider cleaning up this library, or start using it more properly and hence maintaining it.
#74 issue filed for it
#resolved
  vargs.push_back(const_cast<char*>(arg.data()));
}
char** argv = vargs.data();
tc::python::globalDebugGflagsGlogInit(&numArgs, &argv);
this function returns an unused bool, what is the rationale here?
This is how gflags/glog work: if the init fails at any point, we get an exception; otherwise, it's important to check for True everywhere for a successful init. There are InitGoogleTest, InitGoogleLogging and ParseCommandLineFlags, which should all pass. If any of them fails, the init was problematic, e.g. due to bad flags.
For reference, this is also how things are in main in C++ tests.
#resolved (please comment if some other clarity is needed). Thanks.
}
throw std::runtime_error("Invalid option passed");
}),
"Initiailze the mapping options from one of the following:\n 1. naive\n 2. pointwise\n 3. mlp\n 4. conv\n 5. group_conv\n 6. single_thread")
Initialize
if options is None:
    options = Options("naive")
    logger.warning("No mapping options passed, 'naive' type mapping options will be used and will likely have bad performance. See help(your_layer.__call__) for setting mapping options.")
As discussed, builtin options are not meant to be used by users, but as a starting point for autotuning, because this will give a false sense of the performance TC can achieve while only poking at the tip of the iceberg.
In any case, you also want to emit similar warnings for everything that wasn't explicitly autotuned (for now).
None of the builtin options are meant to obtain "good" perf; they only arose from the core devs having played a bit by hand with the mapper, and as relevant strategies for testing.
yes, for that reason, we explicitly mention "bad performance" in the warning message and then users can adapt. The warning message also gives instruction for how to set the options. The warning is thrown everywhere when the explicit options are not passed. The documentation will also provide clear instructions for passing options.
The reason for providing defaults and using defaults is that it allows quick experimentation without a learning curve of "Options". The current "naive" options work fine and don't fail compilation, at least for all the test cases provided here.
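A minimal sketch of the fallback behavior under discussion; `resolve_options` and the string stand-in for the real `Options("naive")` object are assumptions for illustration, not the PR's actual code:

```python
import logging

logger = logging.getLogger("tensor_comprehensions")

def resolve_options(options=None):
    # Stand-in for the PR's fallback logic: default to "naive" mapping
    # options and warn loudly, since they will likely perform badly.
    if options is None:
        options = "naive"  # the real code constructs Options("naive")
        logger.warning(
            "No mapping options passed; 'naive' mapping options will be "
            "used and will likely have bad performance."
        )
    return options
```

Explicitly passed options pass through untouched; only the `None` case triggers the warning.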
#resolved
I already expressed concerns about default options. My motivation is twofold: (1) I want people to know that there is this thing called "polyhedral transformation and mapping" happening inside; they don't have to understand what it does, but it helps me promote my work (yes, I'm that selfish); (2) we already saw people skeptical about the performance we can achieve, so trying the default thing and failing to perform will most certainly be used against us; I can go as far as to expect a paper that claims to compare against TC while not using the mapping options at all.
I was never convinced by the warning approach. At some point in history, I was writing analyses and warnings for clang and saw how much people read those... I also don't think typing out or copy-pasting "MappingOptions.naive()" has a steep, or any, learning curve whatsoever. Copy-pasting code from stack overflow without the slightest clue of how it works is like half of software engineering.
The current "naive" options work fine and don't fail compilation
If they eventually do, I decline any responsibility on fixing the backend.
LOL, I'll add the section about polyhedral/halide to the python doc strings to the function.
Copy-pasting code from stack overflow without a slightest clue of how it works is like half of software engineering.
yep but our users are also researchers and not engineers :)
the python readme now has the polyhedral/halide part as well. That was purely an oversight on my part, apologies and correction made now :)
def test_simple_cuda_injection(self):
    lang = """
    def add(float(N) A, float(N) B) -> (output) {
        output(i) = A(i) + B(i) + 1
I have the same comment as above: adding an extra 1 is surprising, and upon further inspection my_add below does not do it. So you essentially have 2 implementations that don't do the same thing in the same test.
You inject the code in an unchecked way. This test only passes because the output is not checked for correctness, which is another thing I would like to see fixed.
Thanks!
@nicolasvasilache adding such checks for all the tests is not trivial, but there are checks in place; please look at test_tc.py. If you want to make a contribution, please send a PR.
Having checks on the C++ backend is also enough assurance that things on the python side are okay; on the python front, the calls are simply dispatched to the C++ backend and no modifications are made to the data.
Please make python tests check against some golden standard implementation at some level of machine precision like we have been doing in the C++ tests for instance.
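The kind of golden-reference check being requested can be sketched as follows. The `allclose` helper and the plain-python stand-in for the kernel's output are assumptions for illustration; a real test would run the compiled `add` TC on CUDA tensors and compare against a torch or numpy reference:

```python
def allclose(xs, ys, rtol=1e-5, atol=1e-8):
    # Elementwise tolerance check in the spirit of torch.allclose:
    # |x - y| <= atol + rtol * |y| for every pair.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))

A = [0.5, 1.25, -2.0]
B = [1.0, -0.25, 3.5]

# Stand-in for the TC kernel's output; in a real test this would come
# from running the compiled `add` TC.
tc_out = [a + b for a, b in zip(A, B)]

# Golden reference implementation.
ref = [a + b for a, b in zip(A, B)]

assert allclose(tc_out, ref)
```

Note how this pattern would have caught the `+ 1` discrepancy discussed above: the kernel's output would differ from the reference by 1 in every element.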
################################################################################
# Copy the proto files only
################################################################################
copyfile("src/proto/compcache.proto", "tensor_comprehensions/compilation_cache.proto")
Could you please explain why this and the .gitignore change above are needed, versus say just using the particular file? If it is because we don't want to ship stuff from src, then maybe it is appropriate to have a proto/*.proto outside of src and use that without copying? What do you think @ftynse @ttheodor?
As part of the conda package, we don't ship the src/proto codepath or any of src/; hence we just pick whatever is relevant, and we only need the proto files.
Usage: look at the tc_unit/decode function. This is functionality people wanted: once they have autotuned their options, they want to see what the options look like, so now they can do tc.decode(cache_file).
The reason for choosing the name decode is that protoc --decode is the actual command, so it was intuitive to use this name; a user also asked "how do I decode the binary options file?"
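The `protoc --decode` invocation that `tc.decode` is named after can be sketched like this. The message type name `tc.OptionsCache` is a placeholder assumption, not necessarily the actual message defined in compcache.proto:

```python
def build_decode_command(message_type, proto_file):
    # Hypothetical sketch of the shell command tc.decode() mirrors:
    # `protoc --decode=<message> <proto>` reads a serialized message
    # on stdin and pretty-prints it as text.
    return ["protoc", "--decode={}".format(message_type), proto_file]
```

The serialized cache file would be fed to this command's stdin, e.g. via `subprocess.run(cmd, stdin=open(cache_file, "rb"))`.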
I agree with @nicolasvasilache on the location of .proto files; they don't really belong to src/ and should be promoted. I suppose we can then ship proto/ in conda without looking in src/ and without copying.
@prigoyal has a point about protoc using decode as the name for reading files. Because we say options are stored as protobuf, let's stick with protobuf lingo for them.
Yeah, regarding proto/: we only ship them in conda and don't look at src/.
I'll mark this as resolved, current shipping of proto seems fine and the decode naming is okay too, thanks @nicolasvasilache and @ftynse
- name: fully_connected
  lang: |
    def fully_connected(float(B,M) I, float(N,M) W1, float(N) B1) -> (O1) {
nit: spacings are inconsistent in the parameters in this def and a few others
thanks for the nits, I'll correct them
conda_recipes/README.md
5. Now, we have conda setup, we are ready to build conda package for TC. For this,
since TC has its dependencies that are linked dynamically, we will build conda
packages for them separately. In short as of January 31, 2018 we need to build
packages for `clang+llvm-tapir5.0`, `gflags`, `glog`, `Halide`, TC version of `isl`, `PET`, TC version of `ppcg`, `protobuf`, and finally `Tensor Comprehensions`.
PET / PPCG are outdated, please also update the date here.
@@ -0,0 +1,68 @@
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
FMI, why do we rewrite a full new Dockerfile rather than build on top of what we have been using in CI?
The one in CI doesn't have a GPU, but for building conda packages I need to run the python tests, which basically need a GPU in all cases. Hence I'm building a docker image that has GPU access. You can read up on how nvidia-docker works; I provided readme instructions here for using and building this image.
#resolved
The docker image can't have GPUs, you can't download them :)
All our trusty docker images used to inherit from this FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
and all our xenial docker images used to inherit from this FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
but I now realize all that has changed drastically in the last stretch before the release and that we now use this file.
Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back, which got me confused.
Does this mean we should uniformize what is in here and revert back to the old docker files like you did in this PR?
I'll open a task specifically for this.
The docker files will be moved from here to a separate repo https://github.com/prigoyal/tensorcomp-dockerfiles that I have set up for the Jenkins setup. The question of whether the images should be tiered or not is something that I am already discussing with @pietern and @ezyang. Please open the issue in the above repo to centralize discussions properly.
Still, when providing the Docker for conda_recipes you are using a very similar Docker base to what I wrote a few months back which got me confused.
Not quite sure; the docker images follow a quite different structure now and provide more safeguarding.
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
Why did we inherit from those earlier? Did you use nvidia-smi, or did you ever run a gpu test in that docker image? :)
The docker image can't have GPUs, you can't download them :)
Yeah, that's why we use nvidia-docker: https://github.com/NVIDIA/nvidia-docker ; looking at the diagram there will make things clear. Docker is not a VM, it's an image (note: the terminology has been used incorrectly in the past).
revert back to the old docker files like you did in this PR
Nope, setting up CircleCI to properly use the gpu images we build is cumbersome, so we are going to use the AWS jenkins setup soon.
#resolved
why did we inherit from those earlier? did you use nvidia-smi?
No, since there is no cuda driver on the CPU-only CI machines.
However we have always had cross-compilation tests on CPU-only CI that were testing simple parts of the CUDA toolkit.
In particular you don't need a physical GPU to generate PTX and see whether compilation passes (e.g. NVRTC + CUB)...
You can continue doing that; no need to start from NVIDIA images, since the toolkit is installed in the images, as you also pointed out. Nvidia-docker images are useless if you don't run CI gpu tests with them.
Force-pushed c7b77e2 to ef932e1.
name, backward_name = get_tc_names_from_kwargs(**kwargs)
kwargs.pop("name", None)
backward = True if backward_name is not None else False
hash_key = get_tc_hash_key(name, *input_tensors)
So with this you are actually signing up for trouble as there is no guarantee that what you cache is correct in the long run.
This should work fine for what you are currently doing with it but you should plan to pybind export functions (that we will need to expose) from here.
The reason is that the caching logic will depend on output tensor sizes, input/output tensor striding, allocation alignment many other fun things as we start digging deeper into optimizations.
There is nothing to change for now but it is just a friendly warning that you should not rely too much on your own framework-specific caching mechanism since this seems quite central to your autotuner wrapper logic.
there is no support for strided tensors at the moment. When that is added and the task is closed for that, the python API will be modified accordingly.
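A hedged sketch of what a framework-side cache key like `get_tc_hash_key` might look like, and of why the reviewers are wary of it. The function body and the dict-based tensor stand-ins are assumptions for illustration, not the PR's implementation:

```python
def get_tc_hash_key(name, *input_tensors):
    # Hypothetical sketch of a framework-side cache key: it captures
    # the TC name plus each input's dtype and shape. As the review
    # notes, strides, output sizes, and allocation alignment are NOT
    # part of the key, so such a cache can silently go stale once the
    # backend starts specializing on those properties.
    parts = [name]
    for t in input_tensors:
        parts.append("{}_{}".format(t["dtype"], "x".join(map(str, t["shape"]))))
    return "-".join(parts)
```

Two tensors that differ only in striding would hash identically here, which is exactly the long-run correctness risk being flagged.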
tensor_comprehensions/tc_unit.py
def __call__(self, *inputs, **kwargs):
    r"""Runs the define TC language on given inputs.

    Args:
nit: there seems to be a lot of duplication between the __call__ and the autotune comments; is there a way to compress a bit?
Nice and sweet: the python API is a very lightweight wrapper around the core C++ API, so it will be as future-proof as it can be. I like it. There is too much python-side handling of caching logic to my taste, but we haven't provided you with the proper C++ functions to do better for now, so we should iterate to avoid you building more logic that will some day break. There are still concerns about:
3 is a blocker. 1 should be a blocker but I can make my peace with it. 2 should disappear completely in a subsequent PR; the only risk is that we will disseminate non-best-practice code for a bit, which is not a blocker in my opinion, but it may raise some unhappiness on the python side in the future. Let's see what @ftynse says; I think he said he will make the effort of looking tonight despite the late hour in Europe. And congrats for driving this through on your own! 👍
the autotuner caching needs some work, I told @ftynse that I'll port over some functionality to C++ but that's a separate work and not for this PR
NOT BLOCKER Users are happy and desire this behavior; they don't want the learning curve of "Options", so it's good if we all make our peace with it.
NOT BLOCKER We already discussed this in much detail. Users have given feedback and it doesn't matter to them. The correct fix is to allow handling scalars for bounds inference (a task is open for this). In the meantime, people can adopt one of the two ways of formatting below. OPTION 1
OPTION 2
OR any other option people can come up with
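One plausible reading of these formatting options is plain python string substitution, which would also explain why some TC bodies in this PR escape literal braces as `{{ }}`. A hedged sketch; the templated `+ {constant}` echoes the earlier "+ 1 ... can be removed or templated" discussion and is not the PR's actual code:

```python
# Illustrative sketch of constant substitution via str.format. The
# doubled braces survive formatting as literal TC braces; {constant}
# is a hypothetical template parameter, not part of the TC language.
lang_template = """
def add(float(N) A, float(N) B) -> (output) {{
    output(i) = A(i) + B(i) + {constant}
}}
"""

lang = lang_template.format(constant=1)
```

This is exactly the kind of "constant substitution outside the language" that @ftynse argues below should not masquerade as a core TC feature.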
NOT BLOCKER Most of the python tests are backed by the C++ backend. I will be completing the checks on the python front over the next days, since a proper python test harness is needed. However, this is not a release blocker: I have myself verified running torch functions and comparing TC output on my command line, and some users have also verified that their outputs are correct. Also, people will want to write operations that probably don't have a reference implementation (we know of one example already), and TC still produced correct outputs. That is not a guarantee of anything; however, most of the heavyweight layers have already been tested properly, so I am confident. NONE of the above concerns should be a blocker. They rather seem to be enhancements to me. The python packages are ready. Things will never be perfect, but we are properly documenting what works and what doesn't.
tensor_comprehensions/README.md

## Communication

* **Email**: prigoyal@fb.com
tensorcomp@fb.com
this section is just not needed, I'll remove it, the communication will go on the README page
@ftynse I had a quick sync with @nicolasvasilache just now about all the points above. We have agreed that the 3rd point is a nice-to-have and I'll work on it after the release, or as I find time over this week. We have also agreed that I'll wait until midnight (EST) for your review, since you are at a conference until Wed and the release has been planned for Wed for a few weeks, so it's fine that I merge this PR at midnight. You can still leave your review comments and I'll try to address them in separate PRs.
Force-pushed 0e756e6 to e908c24.
I have expressed my two concerns several times already, let me reiterate one last time.
- Mapping options must not be implicit, even with the warning. Bold assumption: nobody reads warnings. Demonstration: we decided not to compile TC with -Werror because of the number of errors it produced (we tried after spending days of time to catch some tricky memory problem). That is, we don't care about warnings, so we should not think users will. There are individual places which make it worse, in particular the matrix multiplication example in the documentation, which is the absolute worst case for TC performance even with options.
- Additional "syntax" features in python, in particular constant substitution, are a bad idea. While we indeed want C++-template-like behavior, it should be implemented in the language, not as a hack around it. The arguments are the same as for including templates in the C++ language as a compile-time generalization mechanism instead of the C preprocessor. If this is really a dearly wanted feature, it can easily be implemented as a separate call without making the user believe such substitution is a core feature of the TC language.
I believe the decisions taken on both points go against line 2 of the python philosophy, aka PEP 20: "explicit is better than implicit". They make additional, optional behavior implicit. I also believe that, once public, any changes to this behavior will become significantly more complicated, because people will start to rely on the existing behavior, however illogical and implicit it is.
While these decisions seem to be made based on some user feedback, I strongly believe we should not just satisfy whatever requests we receive. There is a whole methodology on conducting user research, ranging from interviews to focus groups, to participatory design, to controlled experiments...
Finally, I noticed this in the discussion of the original release blog post: many people interpreted TC as a "machine learning DSL with CUDA codegen and autotuner", in the sense that they think the autotuner applies to, say, moving bits of CUDA code around. This despite the fact that half of the post was about the "transformative" parts of TC, that is, Halide and polyhedral compilation. I find it disturbing that the python README has a paragraph selling the TC bindings and not a single mention of either Halide or polyhedra, independently of me working on the polyhedral part. People want some magic tool that gives them better kernels, but sorry, TC is not magic.
@@ -174,8 +174,6 @@ endif()
################################################################################
# ATen
# ATen - if someone ships libATen.so, we try to use that if available
don't we anymore?
We do, but this is now more properly highlighted in the next steps, where we search for libaten in the pytorch install.
@@ -0,0 +1,35 @@
package:
  name: isl-tc
Why is this called isl-tc and not just isl? And before somebody says that there is already a conda package for isl, why isn't this a problem for protobuf below?
To make it specific to the TC version of isl; I think there was a comment long back, when I was writing these scripts, that we should name things tc-style, like ppcg-tc, etc., so that we don't confuse them with the other actual repos.
setup.py
        ['git', 'rev-parse', 'HEAD'], cwd=TOP_DIR
    ).decode('ascii').strip()
except (OSError, subprocess.CalledProcessError):
    git_version = None
Is it just allowed to proceed with None as the git version?
yeah, I'll clean this up though, we might prefer to rather fail
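The "prefer to rather fail" cleanup could look like the sketch below, which wraps the setup.py snippet above and raises instead of silently proceeding with `git_version = None`. The function name and error message are assumptions, not the PR's code:

```python
import subprocess

def get_git_version(top_dir):
    # Hedged sketch of the setup.py snippet, made strict per the
    # review: fail loudly instead of silently proceeding with
    # git_version = None.
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=top_dir
        ).decode("ascii").strip()
    except (OSError, subprocess.CalledProcessError) as exc:
        raise RuntimeError("unable to determine git version") from exc
```

Both failure modes (git missing, or top_dir not being a git checkout) surface as one explicit RuntimeError at build time.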
if (*pargc == 0) {
  return true;
}
// TODO: (prigoyal): we need to do some filtering on flags here,
open an issue for tracking?
will do
Yeah, regarding proto/: we only ship them in conda and don't look at src/. But like you said, this should probably be under TensorComprehensions/proto, maybe, or anywhere else.
[](tc::MappingOptions& instance, const std::string& type) {
  instance.outerScheduleFusionStrategy(type);
},
"Require TC to try and execute different TC expressions interleaved (Max), separately (Min)\nor interleaved as long as sufficient parallelism is exploited (Preserve3Coincident) by\nperforming loop fusion and fission. Applies before tiling")

"Applies before tiling" is incorrect. It applies to the tile loops, i.e. the loops that enumerate tiles.

Will correct this.
},
"Set up outerScheduleFusionStrategy and intraTileFusionStrategy to the given value")
.def(
    "outerScheduleFusionStrategy",

intraTileFusionStrategy seems not to be exported.

Will do.
if options is None:
    options = Options("naive")
    logger.warning("No mapping options passed, 'naive' type mapping options will be used and will likely have bad performance. See help(your_layer.__call__) for setting mapping options.")

I already expressed concerns about default options. My motivation is two-fold: (1) I want people to know that there is a thing called "polyhedral transformation and mapping" happening inside; they don't have to understand what it does, but it helps me promote my work (yes, I'm that selfish); (2) we have already seen people skeptical about the performance we can achieve, so trying the default thing and failing to perform will most certainly be used against us. I can go as far as to expect a paper that claims to compare against TC while not using the mapping options at all.

I was never convinced by the warning approach. At some point in history, I was writing analyses and warnings for clang and saw how much people actually read those... I also don't think typing out or copy-pasting "MappingOptions.naive()" has a steep learning curve, or any learning curve whatsoever. Copy-pasting code from Stack Overflow without the slightest clue of how it works is like half of software engineering.

The current "naive" options work fine and don't fail compilation.

If they eventually do, I decline any responsibility for fixing the backend.
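The fallback pattern under debate can be sketched generically; resolve_options and the string "naive" stand in for the package's real Options API, which is not reproduced here:

```python
import logging

logger = logging.getLogger("tc")

def resolve_options(options=None):
    """Mirror of the snippet above: warn, then fall back to 'naive' options."""
    if options is None:
        logger.warning(
            "No mapping options passed; 'naive' options will be used "
            "and will likely have bad performance."
        )
        options = "naive"
    return options
```

The alternative position in the thread is to drop the default entirely and require the caller to pass options explicitly.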
and the :attr:`lang` should contain both forward and backward TC
strings.

constants (dict, optional):

I already complained about this implicit constant insertion that uses python syntax. With the introduction of the yaml file, it only got worse. People will look at it as an example of TC in isolation from python, string replacement and all. Why can't you have two separate calls with clear functionality:

generic_tc = "def whatever() {{ code with {python stuff} }}"
preprocessed_tc = tc.substitute(generic_tc, {"python stuff": 42})
tc.define(preprocessed_tc)

This entire situation is the same choice as C preprocessor vs C++ templates for compile-time behavior. The C++ committee is still trying to clean up the damage from the preprocessor and from pretending it was not a part of the language.
@@ -0,0 +1,14 @@
# Copyright (c) 2017-present, Facebook, Inc.

Why do we need an empty file?

This is how python packaging works.
Regarding the choice of the options listed above, this is a clear example of a false dichotomy. You don't have to make the users write individual substitutions with regular expressions when they are not embedded in the language. Something like:

# inside our code
def substitute(lang, argMap):
    for key in argMap:
        # escape the pattern so '$', '{' and '}' are matched literally
        lang = re.sub(re.escape('${{{0}}}'.format(key)), str(argMap[key]), lang)
    return lang

# in user code
macro = """
def whatever() -> (O) {
    O(0) = ${foo}
}
"""
lang = substitute(macro, {"foo": 42})
tc.define(lang, ...)

you can also use kwargs for substitute in this case ;)
Sure, that's just a matter of whether we want to make the extra call for substitute or not. The TC define call can do that substitution, since it takes the dictionary of arguments; that follows the feedback from @soumith about using the constants dictionary. I think this is largely empirical, and I'll add an option to also do the substitution explicitly. How about this:
I'm totally willing to pay the price of one extra explicit call to get separation of concerns: the TC language is what is defined by the BNF in the documentation; substitution is done separately because the TC language does not support it.
Why can't substitution rely on the native Python str.format?
@jekbradbury the right fix is really to be able to infer the bounds from scalars,

but this isn't easy to fix at the moment due to how our computation model works right now, how we allocate tensors, and the scope we compile things in. A couple of things we wanted to make sure of in this API:

We can use any kind of substitution, e.g. the python format call. I'll add a test case using the format approach.
@jekbradbury you would still have to do the following:

i.e. still use the escaped braces. However, this is a temporary workaround that will only stick around for ~1 month.

Ohhh of course, that clash of curly-brace semantics is annoying but seems kind of inevitable.

Yep, for now it is :) but we are working on it and will put a giant WARNING / NOTE that this is a temporary solution and will go away in a month or so :)
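The escaping requirement being discussed can be made concrete with a minimal sketch using Python's built-in str.format; the scale TC below is illustrative, not from the PR:

```python
# TC bodies use '{' and '}', so every literal brace must be doubled
# ('{{' / '}}') for str.format; single braces are reserved for the
# values being substituted in.
tc_template = """
def scale(float(M) I) -> (O) {{
    O(i) = {factor} * I(i)
}}
"""
tc_str = tc_template.format(factor=42)
```

After formatting, the doubled braces collapse to single ones and {factor} is replaced, yielding a valid TC string.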
Force-pushed from e908c24 to b3208a9.
Re. fake templating by string replacement: I strictly prefer the explicit substitute call.

@jekbradbury reg. the above, any logic implemented outside of C++ that people risk starting to rely upon is a liability (do I really need Python when I start optimizing for older Android phones?). See also @ftynse's comment about what TC is and isn't. Additionally, this is an opportunity to reintroduce all the copy-pasta issues. This is likely a symptom of not yet having found the proper abstraction to express such things, whether actual templates in the language, a proper TcBuilder API, or something else that we would love others' feedback on. The immediate consequence is that whatever is implemented atm for this particular point is subject to change.
Force-pushed from a1def65 to 3daec20.
I agree, and I would still say that the proper solution is to have bounds inference from scalars working; then we can avoid all sorts of custom bindings.

There aren't any constraints here that incur big technical debt. @ftynse is saying that we need to build a proper header library in the C++ backend, which doesn't exist right now, rather than the TC defs that have been defined on the python side in the library file.

@nicolasvasilache can you point out where the copy-pasta is in this? I would like to clean that up :)

No you don't; this will be addressed when the TC defs are moved to the library header (which the TC codebase itself doesn't respect atm).
Force-pushed from 3daec20 to 77e5211.
Landing this now; CI is green and all comments from @ftynse have been addressed as well. Acknowledging two things at the time of landing this:
Anything related to the TC language should be (eventually) done in the TC parser, and absolutely cannot rely on anything python-specific. But I am positive that internally the substitution in define can remain plain string manipulation outside the parser.
I don't think the TC parser should have anything special to support whatever upstream string manipulation. We should think about and choose a proper syntax for template-style substitutions, that's all.
Yes, that's correct :) it's one line for the substitution.

I meant that if you write the TC with just single braces and try to format the string, it won't work; the reason is the clash between the TC body's own curly braces and format's placeholder syntax.
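A small sketch of that failure mode, assuming a TC written with single braces (the cosine signature matches the one in this PR's library; the body shown is illustrative):

```python
# With single braces, str.format parses the TC body '{ O(i) = cos(I(i)) }'
# as a replacement field and fails to resolve it as an argument name.
template_bad = "def cosine(float(M) I) -> (O) { O(i) = cos(I(i)) }"
try:
    template_bad.format()
    format_failed = False
except (KeyError, ValueError, IndexError):
    format_failed = True
```

This is why the temporary workaround requires doubling every literal brace in the TC string.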
This PR adds PyTorch integration with TC. Here is what the PyTorch-TC package provides:

- torch.*Tensors
- test_python/layers/

To make it easy to use TC, we provide conda packages for it. We will provide instructions on how to install the conda package soon. Documentation will also be made available in a few hours.

Edited the CodeOwners.md file to document the python bindings and conda recipes properly. Also rearranged the bullet points so that build-related sections are together, docs are together, framework sections are together, etc. If anyone has suggestions for improvements, please let me know :)