adding fused uint4x2_mixed_mm to inductor #106516

Closed
wants to merge 9 commits into from

Commits on Aug 3, 2023

  1. int4x2 WIP

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 3, 2023
    7892259
  2. Update on "int4x2 WIP"

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov
    
    [ghstack-poisoned]
    HDCharles committed Aug 3, 2023
    590d45c
  3. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues. Ideally, full prologue fusion for
    the mm arguments would at some point be able to handle this chain, but
    until then this type of kernel is needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 3, 2023
    59ad453
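
The summary above says that unpacking followed by mm is mathematically equivalent to a fused kernel that unpacks on the fly. A minimal pure-Python sketch of that equivalence follows. It is not the Inductor or Triton code from this PR. The packing layout (low nibble holds row 2k, high nibble holds row 2k+1), the offset of 8 that maps uint4 values to the signed range [-8, 7], and all function names are assumptions made for illustration.

```python
def unpack_uint4x2(packed):
    """Unpack a (K/2 x N) matrix of bytes into a (K x N) matrix of int4 values.

    Assumed layout: the low nibble of packed[k][n] is element (2k, n) and the
    high nibble is element (2k + 1, n); subtracting 8 maps 0..15 to -8..7.
    """
    unpacked = []
    for row in packed:
        unpacked.append([(byte & 0xF) - 8 for byte in row])  # low nibbles
        unpacked.append([(byte >> 4) - 8 for byte in row])   # high nibbles
    return unpacked


def matmul(a, b):
    """Plain (M x K) @ (K x N) matrix multiply on nested lists."""
    return [
        [sum(a_row[k] * b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
        for a_row in a
    ]


def unfused_mixed_mm(a, packed):
    """Materialize the unpacked weight first, then multiply."""
    return matmul(a, unpack_uint4x2(packed))


def fused_mixed_mm(a, packed):
    """Unpack each byte inside the multiply loop; no unpacked matrix is stored."""
    n_cols = len(packed[0])
    out = []
    for a_row in a:
        acc = [0.0] * n_cols
        for k2, packed_row in enumerate(packed):
            a_lo, a_hi = a_row[2 * k2], a_row[2 * k2 + 1]
            for n, byte in enumerate(packed_row):
                acc[n] += a_lo * ((byte & 0xF) - 8) + a_hi * ((byte >> 4) - 8)
        out.append(acc)
    return out


# Small hand-made example: a is 2 x 4, packed is 2 x 2 (that is, 4 x 2 unpacked).
a = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
packed = [[0x1F, 0x80], [0x7A, 0x09]]
assert fused_mixed_mm(a, packed) == unfused_mixed_mm(a, packed)
```

The fused variant reads half as many weight bytes as a kernel that consumes an already unpacked int8 matrix, which is the motivation usually given for fusing the unpack into the mm.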

Commits on Aug 10, 2023

  1. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 10, 2023
    9d43739
  2. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 10, 2023
    546a286

Commits on Aug 11, 2023

  1. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 11, 2023
    55797a6

Commits on Aug 14, 2023

  1. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 14, 2023
    df08d2c
  2. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 14, 2023
    aae1aad
  3. Update on "adding fused uint4x2_mixed_mm to inductor"

    Summary: This is needed for int4 weight-only quantization. We match on
    the specific unpack operation that unpacks the uint4x2 into int4s so
    that we can use a fused kernel for it. Note that even if the user isn't
    specifically doing this, the two operations are mathematically
    equivalent, so it won't cause issues (the only exception is that, for
    some reason, int8 bitwise logic in Triton and PyTorch doesn't match).
    Ideally, full prologue fusion for the mm arguments would at some point
    be able to handle this chain, but until then this type of kernel is
    needed.
    
    Test Plan:
    
    python test/inductor/test_pattern_matcher.py -k "uint4x2"
    python test/inductor/test_torchinductor.py -k "uint4x2"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    [ghstack-poisoned]
    HDCharles committed Aug 14, 2023
    a49ff9d