LLM
Fine-tuning LLMs and embedding models
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Repo accompanying PEFT/LoRA article.
FinSight - Financial Insights at Your Fingertip: FinSight is a cutting-edge AI assistant tailored for portfolio managers, investors, and finance enthusiasts. It streamlines the process of gaining c…
DSPy: The framework for programming—not prompting—language models
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
LLM papers I'm reading, mostly on inference and model compression
AirLLM: 70B inference on a single 4GB GPU
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Distribute and run LLMs with a single file.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.
Code for the video on a feed-forward language model
[ECCV 2024] Tokenize Anything via Prompting
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
WhisperPlus: Faster, Smarter, and More Capable 🚀
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Best practices for distilling large language models.
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A lightweight UI for interacting with the Zoo Text-to-CAD API.
This repository contains the code for dataset curation and fine-tuning of the instruct variant of the bilingual OpenHathi model. The resulting model is meant to follow instructions and chat in Hindi and…
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
