[OSDI'23]AdaEmbed #16

Open
Monstertail opened this issue Jul 5, 2023 · 0 comments
Labels: System4ML (System for ML)

Monstertail commented Jul 5, 2023

[OSDI'23] AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models

Notes in Chinese: on Zhihu (知乎)
Notes in English: in my Notion
How to read a paper:
  • Step 1: Keep in mind
    • What problem does this paper try to solve?
    • Why is this an important and hard problem?
    • Why can’t previous work solve this problem?
    • What is novel in this paper?
    • Does it show good results?

  • Step 2: Summarize
    • Summary for high-level ideas
      • Reduce the size of the embeddings needed for the same DLRM accuracy via in-training embedding pruning.
      • Equivalently: for a given embedding size, AdaEmbed scalably identifies and retains the embeddings with larger importance to model accuracy at particular times during training.
    • Problems/Motivations: what problem does this paper solve?
      • While more embedding rows typically enable better model accuracy by considering more feature instances, they lead to large deployment costs and slow model execution.
      • The key insight is that the access patterns and weights of different embeddings are heterogeneous across embedding rows and change dynamically over the training process, implying varying embedding importance with respect to model accuracy.
    • Challenges: why is this problem hard to solve?
      • DLRMs often have stringent throughput and latency requirements for (online) training and inference, but gigantic embeddings make computation, communication, and memory optimizations challenging.
        • To achieve the desired model throughput, practical deployments often have to use hundreds of GPUs to hold embeddings.
      • Designing better embeddings (e.g., the number of per-feature embedding rows and which embedding weights to retain) remains challenging because the exploration space increases with larger embeddings and requires intensive manual effort.
    • Methods: what are the key techniques in the paper?
      • AdaEmbed considers embeddings with higher runtime access frequencies and larger training gradients to be more important, and it dynamically prunes less important embeddings at scale to automatically determine per-feature embeddings (see the sketches after this list).
        • Challenge 1: Identifying important embeddings out of billions is non-trivial.
          • Embedding Monitor: Identify Important Embeddings (by access frequency and the L2-norm of gradients).
        • Challenge 2: Enforcing in-training pruning after identifying important embeddings is not straightforward either.
          • AdaEmbed Coordinator: Prune at the Right Time (trade-off between pruning overhead and pruning quality).
          • Memory Manager: Prune Weights at Scale (Virtually Hashed Physically Indexed, VHPI, is used to reduce memory reallocation).
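
To make the Embedding Monitor's idea concrete, here is a minimal, illustrative Python sketch of the importance heuristic and budgeted pruning. It is not the authors' implementation: the class `EmbeddingMonitor`, its methods, and the decayed average of gradient norms are assumed names and details used only for illustration.

```python
import numpy as np

class EmbeddingMonitor:
    """Illustrative sketch (hypothetical API, not AdaEmbed's code):
    track per-row access frequency and gradient magnitude, then keep
    only the most important rows within a fixed row budget."""

    def __init__(self, num_rows: int):
        self.freq = np.zeros(num_rows)       # runtime access frequency per embedding row
        self.grad_norm = np.zeros(num_rows)  # running average of gradient L2-norms per row

    def record_batch(self, row_ids, grads, decay: float = 0.9):
        # row_ids: unique indices of embedding rows touched in this batch
        # grads:   corresponding gradient vectors, shape (len(row_ids), dim)
        np.add.at(self.freq, row_ids, 1.0)
        norms = np.linalg.norm(grads, axis=1)
        self.grad_norm[row_ids] = decay * self.grad_norm[row_ids] + (1 - decay) * norms

    def importance(self):
        # Importance ~ access frequency x gradient magnitude: frequently accessed
        # rows with large gradients contribute most to model accuracy.
        return self.freq * self.grad_norm

    def rows_to_keep(self, budget: int):
        # Retain the top-`budget` rows by importance; the rest are prunable.
        return np.argsort(self.importance())[::-1][:budget]
```

For the Memory Manager, the paper's Virtually Hashed Physically Indexed layout avoids reallocating the embedding table when rows are pruned. A simplified way to picture it is an indirection table from hashed (virtual) feature IDs to a fixed pool of physical rows, where pruning merely frees a slot for reuse. The mapping below is a toy sketch under that assumption, not the actual data structure.

```python
VIRTUAL_SIZE, PHYSICAL_ROWS = 1_000_000, 200_000   # toy sizes
v2p = {}                                           # virtual id -> physical row
free_slots = list(range(PHYSICAL_ROWS))            # unassigned physical rows

def lookup(feature_id: int):
    """Map a feature ID to a physical row, allocating a slot on first access."""
    vid = hash(feature_id) % VIRTUAL_SIZE
    if vid not in v2p:
        if not free_slots:
            return None        # over budget: treat the row as pruned (e.g., use a zero vector)
        v2p[vid] = free_slots.pop()
    return v2p[vid]

def release(vid: int):
    """At a pruning step, free the physical slot of an unimportant row for reuse."""
    if vid in v2p:
        free_slots.append(v2p.pop(vid))
```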