Conversation

shewu-quic (Collaborator)

- Get the required spillFillBufferSize from the context binary and set it in compiler_spec (see the sketch after this list).
- Quantize the embedding op in qnn.
- If multi-context is enabled, maxSpillFillBuffer must not be set to zero.
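A minimal sketch of the spill-fill flow described above, assuming stand-in types rather than the real QNN SDK structs (ContextBinaryInfo, BackendCache, and MaxSpillFillBuffer are illustrative names, not the actual ExecuTorch or QNN API):

// Illustrative sketch only: stand-in types, not the real QNN SDK structs.
#include <algorithm>
#include <cstdint>
#include <vector>

struct ContextBinaryInfo {          // stand-in for the context-binary metadata
  uint64_t spill_fill_buffer_size;  // size recorded when the binary was compiled
};

class BackendCache {                // stand-in for HtpBackendCache
 public:
  explicit BackendCache(const ContextBinaryInfo& info)
      : spill_fill_buf_(info.spill_fill_buffer_size) {}
  uint64_t GetSpillFillBufferSize() const { return spill_fill_buf_; }

 private:
  uint64_t spill_fill_buf_;
};

// With multiple contexts sharing one spill-fill buffer, the buffer must be
// sized for the most demanding context: aggregate with max, which must not
// end up as zero.
uint64_t MaxSpillFillBuffer(const std::vector<BackendCache>& caches) {
  uint64_t max_size = 0;
  for (const auto& c : caches) {
    max_size = std::max(max_size, c.GetSpillFillBufferSize());
  }
  return max_size;
}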
pytorch-bot (bot) commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5925

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b974e0a with merge base c06a708:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Oct 7, 2024. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
shewu-quic (Collaborator, Author)

Hi @cccclai,
This PR refines how the spill-fill buffer is set and quantizes the embedding op.
The goal is to run llama3 instruct with a 128 seq_len on a 12GB device.
Could you please take a look? Thank you!

facebook-github-bot (Contributor)

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cccclai (Contributor) commented Oct 8, 2024

Hey, can you apply this patch to the PR? There are some lint failures.

diff --git a/backends/qualcomm/runtime/backends/QnnBackendCache.h b/backends/qualcomm/runtime/backends/QnnBackendCache.h
index c492a3d1d..074092bc1 100644
--- a/backends/qualcomm/runtime/backends/QnnBackendCache.h
+++ b/backends/qualcomm/runtime/backends/QnnBackendCache.h
@@ -24,8 +24,8 @@ class QnnBackendCache {
     ONLINE_PREPARE = 3,
   };
   explicit QnnBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
-      : qnn_context_blob_(qnn_context_blob){};
-  virtual ~QnnBackendCache();
+      : qnn_context_blob_(qnn_context_blob){}
+  virtual ~QnnBackendCache() = 0;
   QnnBackendCache(const QnnBackendCache&) = delete;
   QnnBackendCache(QnnBackendCache&&) = delete;
   QnnBackendCache& operator=(const QnnBackendCache&) = delete;
@@ -57,7 +57,7 @@ class QnnBackendCache {
   virtual Error RetrieveBackendBinaryInfo(
       const QnnSystemContext_BinaryInfo_t* binaryinfo) {
     return Error::Ok;
-  };
+  }

  private:
   Error GetQnnGraphInfoFromBinary();
diff --git a/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h b/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
index 117704f6e..8bf20af37 100644
--- a/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
+++ b/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
@@ -13,9 +13,9 @@ namespace executor {
 namespace qnn {
 class HtpBackendCache : public QnnBackendCache {
  public:
-  HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
+  explicit HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
       : QnnBackendCache(qnn_context_blob), spill_fill_buf_(0) {}
-  ~HtpBackendCache() {}
+  ~HtpBackendCache() = default;

   uint64_t GetSpillFillBufferSize() {
     return spill_fill_buf_;
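
One C++ subtlety behind this patch: declaring virtual ~QnnBackendCache() = 0; makes the base class abstract, but a pure virtual destructor still requires an out-of-line definition, because every derived destructor implicitly calls it. A standalone illustration of just that rule (not the actual ExecuTorch source):

// Standalone illustration, not the actual ExecuTorch source.
#include <cstdio>

class Base {
 public:
  virtual ~Base() = 0;  // pure virtual: Base itself cannot be instantiated
};

// A pure virtual destructor still needs a definition, because derived
// destructors implicitly invoke it during destruction.
Base::~Base() { std::puts("Base destroyed"); }

class Derived : public Base {
 public:
  ~Derived() override = default;  // implicitly calls Base::~Base()
};

int main() {
  Derived d;  // prints "Base destroyed" when d goes out of scope
  return 0;
}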

cccclai mentioned this pull request on Oct 8, 2024
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Oct 8, 2024
Summary: Copy of pytorch#5925 and address the constructor/destructor issues

Differential Revision: D64055108
cccclai (Contributor) commented Oct 8, 2024

I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to resolve some linter breakage in your base commit and filed #5995 to resolve the other lint issue.

backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h:

 public:
  HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
      : QnnBackendCache(qnn_context_blob), spill_fill_buf_(0) {}
  ~HtpBackendCache() {}
Contributor

Do you expect HtpBackendCache to do nothing?

shewu-quic (Collaborator, Author)

Yes, I believe it doesn't need to do anything further.
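
A note on why an empty destructor suffices here: spill_fill_buf_ is a plain uint64_t, so there is nothing to release, and the patch's ~HtpBackendCache() = default; states that intent directly. A stand-in sketch (HtpCacheMembers is an illustrative type showing only the member, without the polymorphic base class, so the trait check holds):

// Stand-in sketch: the same member as HtpBackendCache, minus the virtual
// base class, just to show the member itself needs no cleanup.
#include <cstdint>
#include <type_traits>

struct HtpCacheMembers {
  uint64_t spill_fill_buf_ = 0;  // plain integer: trivially destructible
};

static_assert(std::is_trivially_destructible<HtpCacheMembers>::value,
              "no user-written cleanup is needed for this member");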

cccclai (Contributor) commented Oct 9, 2024

#5989 is merged, and you will be the author of this commit.

shewu-quic (Collaborator, Author) commented Oct 9, 2024

> I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to resolve some linter breakage in your base commit and filed #5995 to resolve the other lint issue.

Thank you very much!
So, can I close this PR?

cccclai (Contributor) commented Oct 9, 2024

> > I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to resolve some linter breakage in your base commit and filed #5995 to resolve the other lint issue.
>
> Thank you very much! So, can I close this PR?

Yeah, this PR can be closed.

shewu-quic closed this on Oct 9, 2024
cccclai mentioned this pull request on Oct 9, 2024

Labels: ciflow/trunk, CLA Signed