[OSDI'23]AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models
Notes in Chinese: In Zhihu (知乎)
Notes in English: In my Notion
How to read a paper:
Step 1: Keep in mind
- What problem does this paper try to solve?
- Why is this an important and hard problem?
- Why can’t previous work solve this problem?
- What is novel in this paper?
- Does it show good results?
Step 2: Summarize
Summary for high-level ideas
Reduce the size of embeddings needed for the same DLRM accuracy via in-training embedding pruning
Equivalently: for a given embedding size, AdaEmbed scalably identifies and retains the embeddings that are more important to model accuracy at each point during training.
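A minimal sketch of this budget-based retention idea, assuming one importance score per embedding row and a fixed row budget (function and variable names are illustrative, not AdaEmbed's API):

```python
import numpy as np

def select_rows_to_keep(importance: np.ndarray, row_budget: int) -> np.ndarray:
    """Return a boolean mask keeping only the `row_budget` most important rows."""
    if row_budget >= importance.size:
        return np.ones(importance.size, dtype=bool)
    # Importance of the row_budget-th largest score acts as the pruning threshold.
    kth = importance.size - row_budget
    threshold = np.partition(importance, kth)[kth]
    keep = importance >= threshold
    # Drop ties at the threshold until exactly `row_budget` rows survive.
    extra = int(keep.sum()) - row_budget
    if extra > 0:
        tied = np.flatnonzero((importance == threshold) & keep)[:extra]
        keep[tied] = False
    return keep

# Example: 10 embedding rows, memory budget for only 4 of them.
scores = np.array([0.1, 3.0, 0.2, 5.0, 0.05, 2.5, 0.0, 4.1, 0.3, 0.3])
print(select_rows_to_keep(scores, row_budget=4).nonzero()[0])  # -> [1 3 5 7]
```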
Problems/Motivations: what problem does this paper solve?
While more embedding rows typically enable better model accuracy by covering more feature instances, they lead to large deployment costs and slow model execution.
The key insight is that the access patterns and weights of different embeddings are heterogeneous across embedding rows and change dynamically over the training process, implying varying embedding importance with respect to model accuracy.
Challenges: why is this problem hard to solve?
DLRMs often have stringent throughput and latency requirements for (online) training and inference, but gigantic embeddings make computation, communication, and memory optimizations challenging.
To achieve the desired model throughput, practical deployments often have to use hundreds of GPUs just to hold the embeddings.
Designing better embeddings (e.g., the number of per-feature embedding rows and which embedding weights to retain) remains challenging because the exploration space grows with embedding size and requires intensive manual effort.
Methods: what are the key techniques in the paper?
AdaEmbed considers embeddings with higher runtime access frequencies and larger training gradients to be more important, and it dynamically prunes less important embeddings at scale to automatically determine per-feature embeddings.
Challenge 1: Identifying important embeddings out of billions is non-trivial.
Embedding Monitor: Identify Important Embeddings (by access frequency and the L2-norm of gradients); a minimal sketch follows this list.
Challenge 2: Enforcing in-training pruning after identifying important embeddings is not straightforward either.
AdaEmbed Coordinator: Prune at the Right Time (trading off pruning overhead against pruning quality); see the sketch after this list.
Memory Manager: Prune Weights at Scale (a Virtually Hashed Physically Indexed, VHPI, layout avoids costly memory reallocation); see the sketch after this list.
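A minimal sketch of the Embedding Monitor's importance signal, assuming a decayed access counter and a decayed gradient-norm accumulator are multiplied together (the exact scoring rule and all names are assumptions, not the paper's code):

```python
import numpy as np

class EmbeddingMonitor:
    """Track a per-row importance signal from access frequency and gradient L2-norm."""

    def __init__(self, num_rows: int, decay: float = 0.99):
        self.freq = np.zeros(num_rows)        # decayed access counts
        self.grad_norm = np.zeros(num_rows)   # decayed gradient L2-norms
        self.decay = decay

    def record_batch(self, row_ids: np.ndarray, row_grads: np.ndarray) -> None:
        """row_ids: embedding rows touched in this batch; row_grads: their gradient rows."""
        self.freq *= self.decay
        self.grad_norm *= self.decay
        np.add.at(self.freq, row_ids, 1.0)
        np.add.at(self.grad_norm, row_ids, np.linalg.norm(row_grads, axis=1))

    def importance(self) -> np.ndarray:
        """Rows accessed more often and with larger gradients score higher."""
        return self.freq * self.grad_norm
```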
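A minimal sketch of the Coordinator's prune-timing trade-off, assuming a pruning pass is only re-triggered once the set of important rows has drifted past a threshold (the drift metric and threshold value are assumptions):

```python
def should_prune(prev_important: set, now_important: set,
                 drift_threshold: float = 0.1) -> bool:
    """Trigger a pruning pass only when the set of important rows has drifted enough.

    Pruning too often wastes time (overhead); pruning too rarely keeps stale rows
    in memory (quality).
    """
    if not prev_important:
        return True  # never pruned before
    newly_important = len(now_important - prev_important)
    drift = newly_important / len(prev_important)
    return drift >= drift_threshold

# Example: only 3 of 100 kept rows changed -> 3% drift, below threshold, skip pruning.
prev = set(range(100))
now = set(range(3, 103))
print(should_prune(prev, now))  # -> False
```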
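A minimal sketch of the Virtually Hashed Physically Indexed (VHPI) idea, assuming a hash-based indirection table in front of a fixed-size physical weight buffer (all names are illustrative):

```python
import numpy as np

class VHPIEmbedding:
    """Feature IDs hash into a large virtual slot space; an indirection table maps
    virtual slots to rows of a fixed physical weight buffer. Pruning or admitting
    an embedding only edits the indirection table, so the physical buffer is never
    reallocated.
    """

    FREE = -1

    def __init__(self, num_virtual_slots: int, num_physical_rows: int, dim: int):
        self.v2p = np.full(num_virtual_slots, self.FREE, dtype=np.int64)
        self.weights = np.zeros((num_physical_rows, dim), dtype=np.float32)
        self.free_rows = list(range(num_physical_rows))

    def _slot(self, feature_id: int) -> int:
        return hash(feature_id) % len(self.v2p)

    def lookup(self, feature_id: int):
        """Return the embedding row for this feature, or None if it was pruned."""
        row = self.v2p[self._slot(feature_id)]
        return None if row == self.FREE else self.weights[row]

    def admit(self, feature_id: int) -> bool:
        """Assign a free physical row to an important feature, if any row is free."""
        slot = self._slot(feature_id)
        if self.v2p[slot] != self.FREE:
            return True                  # already has a physical row
        if not self.free_rows:
            return False                 # budget exhausted; prune something first
        row = self.free_rows.pop()
        self.weights[row] = 0.0          # re-initialize the reused row
        self.v2p[slot] = row
        return True

    def evict(self, feature_id: int) -> None:
        """Prune a feature: release its physical row without touching the buffer size."""
        slot = self._slot(feature_id)
        row = self.v2p[slot]
        if row != self.FREE:
            self.v2p[slot] = self.FREE
            self.free_rows.append(row)
```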