GPT-QModel v4.2.0

@Qubitium released this 12 Sep 08:02
c0c3569

Notable Changes

  • Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
  • Add Apertus support by @LRL-ModelCloud in #1767
  • Add Kimi k2 support by @LRL-ModelCloud in #1768
  • Add Klear support by @LRL-ModelCloud in #1769
  • Add FastLLM support by @LRL-ModelCloud in #1771
  • Add Nemotron H support by @LRL-ModelCloud in #1773
  • Add fail_safe option by @LRL-ModelCloud in #1775
  • Use threading lock to protect unsafe tensor moves in multi-gpu by @Qubitium in #1778
  • Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763

What's Changed

  • Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
  • Fix Q.to on multi-gpu gptq when proceeding fast with many experts and gpus by @avtc in #1774
  • Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
  • [CI] fix release jobs being skipped by @CSY-ModelCloud in #1759
  • ignore compile warns about var declared but not used by @Qubitium in #1760
  • allow prebuilt wheel path to be customized via env by @Qubitium in #1761
  • add build toggles for all cpp kernels by @Qubitium in #1764
  • fix multi gpu inference by @LRL-ModelCloud in #1762
  • [CI] reduce wheel download size by @CSY-ModelCloud in #1765
  • start 4.2.0-dev cycle by @Qubitium in #1766
  • fix klear by @LRL-ModelCloud in #1770
  • Fix transformers >= 4.56.1 forcibly changing torch.default_dtype by @Qubitium in #1779
  • fix multi gpu fail_safe by @LRL-ModelCloud in #1780
  • fix device instance by @LRL-ModelCloud in #1783
  • prepare for 4.2 release by @Qubitium in #1785

Full Changelog: v4.1.0...v4.2.0