
Add a PyTorch to Relay parser #63

Merged (1 commit) on Jan 24, 2020

Conversation

@alexwong commented Nov 19, 2019

Thanks for contributing to TVM! Please refer to the guidelines at https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from reviewers by @-mentioning them in the pull request thread.

Support PyTorch natively in TVM by providing a Relay parser. Adapted from #23; it currently only supports traced models (no control flow) and probably only image classification models (the provided test uses torchvision implementations to compare against).

Like other frontends, grab the Relay module and parameters to build via:
mod, params = relay.frontend.from_pytorch(trace, input_shapes)

Tested against torchvision models in the included test_forward.py. Will write a discussion post on discuss.tvm.ai to see whether we want this upstream.
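
For context, a minimal end-to-end sketch of the workflow described above, in the spirit of test_forward.py. The model choice, the input name "input0", and the exact API signatures are illustrative only; both `from_pytorch` and the runtime API have changed across TVM versions.

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_executor  # named graph_runtime in older TVM releases

# Trace a torchvision model; only traced (control-flow-free) models are supported.
model = torchvision.models.resnet18(pretrained=True).eval()
input_shape = (1, 3, 224, 224)
data = np.random.uniform(size=input_shape).astype("float32")
trace = torch.jit.trace(model, torch.from_numpy(data)).eval()

# Convert to Relay. The input name is illustrative; newer TVM releases expect a
# list of (name, shape) tuples here rather than a dict.
mod, params = relay.frontend.from_pytorch(trace, {"input0": input_shape})

# Build and run on CPU.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
runtime = graph_executor.GraphModule(lib["default"](tvm.cpu(0)))
runtime.set_input("input0", data)
runtime.run()
tvm_out = runtime.get_output(0).numpy()

# Compare against the PyTorch reference with loose tolerances; results are not bit-exact.
with torch.no_grad():
    torch_out = model(torch.from_numpy(data)).numpy()
np.testing.assert_allclose(torch_out, tvm_out, rtol=1e-3, atol=1e-3)
```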

@alexwong changed the title from "Initial commit of PyTorch to Relay parser" to "Add a PyTorch to Relay parser" on Nov 19, 2019
@alexwong (Author)

Having issues installing torch and torchvision in CI/CD so that the tests can run. Will continue looking into it.

@alexwong changed the title from "Add a PyTorch to Relay parser" to "[WIP] Add a PyTorch to Relay parser" on Nov 27, 2019
@alexwong (Author)

Marking as WIP for now; it looks like there may be some bugs.

@alexwong force-pushed the pytorch-relay branch 2 times, most recently from d093bf8 to 14cf351 on December 3, 2019 21:05
@alexwong changed the title from "[WIP] Add a PyTorch to Relay parser" to "Add a PyTorch to Relay parser" on Dec 11, 2019
@alexwong (Author) commented Dec 11, 2019

@kevinthesun @zhiics @yongwww Could you guys take a look at this sometime?

Also, please ignore the changes to the CI scripts. I had some issues trying to get CI to pass, but they may have been unrelated (PyTorch is installed here: https://github.com/neo-ai/tvm/blob/dev/docker/install/ubuntu_install_onnx.sh).

Review threads on python/tvm/relay/frontend/pytorch.py (2, resolved)
@alexwong force-pushed the pytorch-relay branch 2 times, most recently from c620491 to be96d1f on December 20, 2019 21:15
@kevinthesun left a comment

LGTM

@zxy844288792 left a comment

LGTM

Review threads (resolved) on:
tests/scripts/task_python_vta.sh
tests/scripts/task_python_unittest.sh
tests/scripts/task_python_integration.sh
tests/python/frontend/pytorch_neo/test_forward.py (2)
python/tvm/relay/frontend/pytorch_neo.py (3)
@yongwww commented Jan 9, 2020

@zhiics please help take a look.

@alexwong force-pushed the pytorch-relay branch 3 times, most recently from 6522294 to c8d690e on January 14, 2020 00:16
@zhiics left a comment

Thanks for the effort. I have a general question about the handling of dynamic models. I think the current structure and op handling would have the same or a similar problem as the TF frontend parser when handling dynamic models -- we need to infer value/shape everywhere. I agree we don't see this in most image classification models, but we probably need to be aware of it when extending to models that involve dynamic shapes.

@yongwww @kevinthesun what do you think?

Review threads on python/tvm/relay/frontend/pytorch_neo.py (4, resolved)
@alexwong force-pushed the pytorch-relay branch 2 times, most recently from 9066acf to d266abf on January 15, 2020 19:42
@alexwong (Author)

Any more feedback? I responded to everything except the question regarding dynamic shape, which I'm not sure about. Anywhere we need a shape, we tend to use the infer_shape pass. The only thing I'm slightly concerned about is that VGG and AlexNet are kind of flaky in terms of accuracy.
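
For reference, a small sketch of the kind of static shape inference referred to here; the helper name below is hypothetical, and the actual frontend uses its own infer_shape utility. With dynamic shapes, the dimensions would not be concrete integers, which is exactly the concern raised above.

```python
import tvm
from tvm import relay

def infer_shape_of(expr):
    """Hypothetical helper: infer a Relay expression's static shape via InferType."""
    mod = tvm.IRModule.from_expr(expr)
    mod = relay.transform.InferType()(mod)
    return [int(dim) for dim in mod["main"].body.checked_type.shape]

# Example: an op converter querying the shape of its (static) input.
x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
y = relay.nn.avg_pool2d(x, pool_size=(2, 2), strides=(2, 2))
print(infer_shape_of(y))  # [1, 3, 112, 112]
```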

@zhiics commented Jan 16, 2020

@alexwong Dynamic shape would be a separate issue that we will work on incrementally. I brought it up just because I wanted to make sure the design would not change much when we handle dynamic information in the future.

Could you please elaborate a little more on the accuracy problem? What is flaky? And can you please dig a bit more into the details, because I think flaky accuracy on a real model might cause problems for the service in the end.

@alexwong (Author)

There have been some issues with VGG and AlexNet failing due to bad accuracy. I can take another look, but it's kind of hard to debug at the moment: it doesn't happen when running locally on CPU and doesn't happen consistently when running on the CI.

@zhiics commented Jan 16, 2020

How frequently does it happen in the CI? We can check out the docker image to reproduce it if it never happens on a local machine. Let's discuss offline.

@yongwww commented Jan 16, 2020

@alexwong it's possible to check out the CI container to reproduce the flakiness on your own EC2 instance. The flaky issue should be fixed if it occurs frequently.

I will take a look at this PR again later today.

@alexwong (Author) commented Jan 16, 2020

Okay, trying to reproduce on an EC2 instance with the docker image. I would say it happens less than half the time (rough guess). It is odd that this only started happening recently, and I don't think there have been many functional changes in the code. Part of the reason could be pulling in the upstream merge here.

@alexwong (Author)

I'm able to reproduce the failures with the ci-gpu container, but I'm having trouble diagnosing the issue. Comparing the initial PT graph and the produced Relay module between passing and failing runs, there is no difference, so my assumption is that the problem is most likely at the operator implementation level. I will compare the output after each operator/layer between PyTorch and TVM to figure out where it goes wrong. This will require some rework on the PT side to return the output at each layer in the forward function.
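
As one possible way to do the per-layer comparison on the PyTorch side without rewriting forward(): register forward hooks on each leaf submodule and record its output, then compare each recorded activation against the corresponding TVM intermediate. This is only an illustrative sketch of the approach, not the code used here.

```python
import torch
import torchvision

model = torchvision.models.vgg11(pretrained=True).eval()
layer_outputs = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Record a detached copy of this layer's output for later comparison.
        layer_outputs[name] = output.detach().numpy()
    return hook

# Register a hook on every leaf submodule (conv, relu, pooling, linear, ...).
for name, module in model.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

# layer_outputs now maps submodule names to activations; TVM's intermediate
# results can be compared against these one layer at a time.
```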

@alexwong (Author)

I'm still looking into the VGG issue, but @zxy844288792 talked to Vin and he said we can merge this without the fix and track it as a separate issue, since it only occurs with the ci-gpu settings and doesn't occur in our service. It may be good to temporarily comment out VGG in the test so it doesn't occasionally cause other PRs to fail CI, though. Can we finalize the review and merge this? @yongwww
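
The comment above suggests commenting VGG out of the test; an equivalent option is to mark the case as skipped so the intent and the tracking issue stay visible in the test file. A minimal hypothetical sketch, where verify_model stands in for the test's actual PyTorch-vs-TVM comparison helper:

```python
import pytest

def verify_model(model_name):
    """Hypothetical stand-in for test_forward.py's PyTorch-vs-TVM comparison helper."""

@pytest.mark.skip(reason="flaky accuracy with the ci-gpu settings; tracked in #80")
def test_forward_vgg():
    verify_model("vgg11")
```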

@alexwong force-pushed the pytorch-relay branch 3 times, most recently from 3d366cd to b83978c on January 23, 2020 04:02
@alexwong requested a review from yongwww on January 23, 2020 20:40
Review threads on python/tvm/relay/frontend/pytorch_neo.py (6, resolved)
@yongwww commented Jan 23, 2020

@alexwong I am okay with commenting out VGG temporarily; filed issue #80 to track it.

I agree with @zhiics about infer_shape for dynamic inputs. Considering we don't currently have dynamic requirements, customer needs, or tests for PT models, it is okay with me to skip that part.

Overall, LGTM. @alexwong please address all the comments.
