- Mountain View, CA
- http://jasonansel.com/
- @jansel0
Pinned
- pytorch/torchdynamo (Public): A Python-level JIT compiler designed to make unmodified PyTorch programs faster (a minimal usage sketch follows this list).
- pytorch/pytorch (Public): Tensors and Dynamic neural networks in Python with strong GPU acceleration.
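A minimal sketch of how TorchDynamo is typically used, assuming the torchdynamo.optimize decorator and custom-backend API from the standalone 2022-era repo; the backend name toy_backend and the example function fn are illustrative, not taken from this page:

import torch
import torchdynamo

def toy_backend(gm: torch.fx.GraphModule, example_inputs):
    # A backend receives the captured FX graph plus example inputs and
    # returns a callable; returning gm.forward simply runs the graph eagerly.
    return gm.forward

@torchdynamo.optimize(toy_backend)
def fn(x, y):
    return torch.relu(x) + y

fn(torch.randn(8), torch.randn(8))  # graph captured by TorchDynamo, executed via toy_backend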
1,094 contributions in the last year
Contribution activity
August 2022
Created 14 commits in 1 repository
Created a pull request in pytorch/torchdynamo that received 18 comments
[inductor] Reduce excessive view materialization
Fixes #595 For this example from @ngimel: https://gist.github.com/ngimel/1a7156a98a5bebab31e15f5b5f6b222d --- before 2022-08-03 14:26:38.528766870 …
+49 −23 · 18 comments
Opened 13 other pull requests in 1 repository
pytorch/torchdynamo: 13 merged
- [inductor] Put 3D tiling behind a config
- [inductor] Improve node sort order
- [inductor] Don't update device_types for optimized-away constants
- [inductor] New tiling heuristic
- [inductor] Disable cudagraphs in backwards if it failed in forwards
- [inductor] Support aten.scalar_tensor
- [inductor] Create implicit fallbacks for unhandled ops
- [inductor] Small fixes for huggingface models
- [inductor] Add aten.roll lowering
- [inductor] Support constant_pad_nd for 1D/3D padding
- Support torch.jit.annotate
- [inductor] Deduplicate iteration ranges
- [inductor] Tile after fusion
Reviewed 54 pull requests in 2 repositories
pytorch/torchdynamo: 25 pull requests
- Autotune best layout for conv
- Don't tile broadcasted tensors; those tiles are not needed
- [inductor] Lowering aten.reflection_pad2d_backward
- Set up per-operator input database, per-operator microbenchmarking
- CI run of accuracy-aot-nop on torchbench
- improve splits heuristics to handle outer reductions
- add allow_tf32 option in torchinductor.triton_ops.matmul
- [inductor] New tiling heuristic
- Associate FX nodes as origins on IRNode; ensure all nodes going to scheduling have an associated FX node
- [inductor] Fix a dead buffer removal bug
- Delete python key mode
- [WIP] [RFC] - Gathering feedback on approach - Limited control flow operator
- Autotune framework for conv and mm
- Remove context manager - 2/n; support nn.Modules, run
- [inductor] Add type conversion for div when both operands are integers
- Move aot_cudagraphs backend here
- make precision args float16/float32/amp mutually exclusive
- changed pre-fuse heuristic a bit
- [inductor] Small fixes for huggingface models
- Readme - Remove the context manager and more information about backends
- Add masked_fill.Tensor
- Prefuse (pointwise) nodes before scheduling
- [inductor] Add aten.roll lowering
- Bug fixes in runner for batch size
- Add optional unspecializing int/float and heuristic
- Some pull request reviews not shown.
pytorch/pytorch
12 pull requests
- extend torch.ones to handle tuple inputs
- Added list clearing codegen to AOTAutograd (hidden behind config.aot_clear_list)
- prevent graph mutation in constraint generation
- expand torch.full to reason about integers
- fix bug in a linear constraint
- add the necessary constraints for the next 5 benchmarks
- add the operations needed for electra model
- store parameter values as static shapes during constraint generation
- rule for bypassing scalars
- add constraints for layer_norm function
- linear constraints
- Embedding rule for TorchDynamo
Created an issue in pytorch/torchdynamo that received 10 comments
Investigate "skipping cudagraphs due to multple devices" issue
./benchmarks/torchbench.py -dcuda --float16 --inductor --training -k hf_Bart
cuda train hf_Bart WARNING:root:using triton random, expect differenc…
10 comments
Opened 9 other issues in 2 repositories
pytorch/torchdynamo: 1 closed, 7 open
- hf_Bart assert dst.data_ptr() == src.data_ptr() error
- min_cut_rematerialization_partition error from pytorch_struct
- Improve Triton constant CPU overheads in TorchInductor
- We should rename nopython=True mode
- Inductor training hf_GPT2: illegal memory access was encountered
- PTX error: too much parameter space from legacy_senet154
- Finish implementing inductor dynamic shape guards
- Use opinfo testing in TorchInductor