
Welcome to the 🐶 InstructLab Project

InstructLab is a model-agnostic open source AI project that facilitates contributions to Large Language Models (LLMs).

We are on a mission to let anyone shape generative AI by enabling contributed updates to existing LLMs in an accessible way.

Our community welcomes all those who would like to help us enable everyone to shape the future of generative AI.

Why InstructLab

There are many projects rapidly embracing and extending permissively licensed AI models, but they are faced with three main challenges:

  • Direct contribution to LLMs is not possible. Contributions show up as forks, which forces consumers to choose a “best-fit” model that isn’t easily extensible. The forks are also expensive for model creators to maintain.
  • The ability to contribute ideas is limited by a lack of AI/ML expertise. Contributors have to learn how to fork, train, and refine models to see their ideas move forward, which is a high barrier to entry.
  • There is no direct community governance or best practice around review, curation, and distribution of forked models.

InstructLab is here to solve these problems.

The project enables community contributors to add "skills" or "knowledge" to a particular model.

InstructLab's model-agnostic technology gives model upstreams with sufficient infrastructure resources the ability to create regular builds of their open source licensed models, not by rebuilding and retraining the entire model, but by composing new skills into it.

Take a look at "lab-enhanced" models on the InstructLab Hugging Face page.

Get Started with InstructLab

  • Check out the Community README to get started with using and contributing to the project.
  • You may wish to read through the project's FAQ to get more familiar with all aspects of InstructLab.
  • If you want to jump right in, head to the ilab documentation to get InstructLab set up and running.
  • Learn more about the skills and knowledge you can add to models.
  • You can find all the ways to collaborate with project maintainers and your fellow users of InstructLab beyond GitHub by visiting our project collaboration page.
  • When you are ready to make a contribution to the project, please take a few minutes to look over our contribution guidelines to ensure your contribution is aligned with the project policies.
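
As a rough illustration of what adding a "skill" can involve, the sketch below models a contribution as a handful of seed question/answer pairs plus some metadata, and runs a basic completeness check on it. The field names (`created_by`, `task_description`, `seed_examples`) and the helper `check_skill_entry` are assumptions made for this example only; the taxonomy repository defines the actual file format and validation rules.

```python
from typing import Any

# Hypothetical, simplified picture of a "skill" contribution: metadata plus a
# few seed question/answer pairs. Field names are illustrative assumptions;
# the taxonomy repository is the authoritative source for the real format.
skill_entry: dict[str, Any] = {
    "created_by": "your-github-username",  # placeholder value
    "task_description": "Write short rhyming couplets on a given topic.",
    "seed_examples": [
        {
            "question": "Write a rhyming couplet about rain.",
            "answer": "The rain comes down in silver lines, / and washes dust from all the pines.",
        },
    ],
}


def check_skill_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means nothing is missing."""
    problems: list[str] = []
    # Every top-level field must be present and non-empty.
    for key in ("created_by", "task_description", "seed_examples"):
        if not entry.get(key):
            problems.append(f"missing or empty field: {key}")
    # Every seed example needs both a question and an answer.
    for i, example in enumerate(entry.get("seed_examples") or []):
        for field in ("question", "answer"):
            if not str(example.get(field, "")).strip():
                problems.append(f"seed example {i} has no {field}")
    return problems


print(check_skill_entry(skill_entry))
```

A check like this is only a local sanity pass before opening a contribution; it does not replace the project's own review and validation process.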

Community Meetings

For folks getting started with all things InstructLab, it may be easiest for you to join one of our community meetings and speak with project maintainers and other InstructLab collaborators live. You can find details on all of our community meetings, including our open office hours each Thursday, in our detailed Project Meetings documentation.

Everyone who would find value in joining is welcome and encouraged to attend. Please note that some meetings are recorded and the recordings are published on our project YouTube channel. The meeting host will advise all attendees if the meeting is being recorded. If you prefer not to be on camera or actively recorded, you are welcome to join with your camera off or dial in by phone.

Code of Conduct

Participation in all aspects of the InstructLab community (including but not limited to community meetings, mailing lists, real-time chat, and the project GitHub repos) is governed by our Code of Conduct.

Quick Links

Governance

See the project governance document for an overview of how the InstructLab project operates.

Security

Security policies and practices, including reporting vulnerabilities, can be found in our security document.

Read the Paper

InstructLab 🐶 uses a novel synthetic data-based alignment tuning method for Large Language Models (LLMs). The "lab" in InstructLab 🥼 stands for Large-Scale Alignment for ChatBots [1].

[1] Shivchander Sudalairaj*, Abhishek Bhandwaldar*, Aldo Pareja*, Kai Xu, David D. Cox, Akash Srivastava*. "LAB: Large-Scale Alignment for ChatBots", arXiv preprint arXiv:2403.01081, 2024. (* denotes equal contributions)

Acknowledgements

The InstructLab project is sponsored by Red Hat.

InstructLab was originally created by engineers from Red Hat and IBM Research.

The infrastructure used to regularly train models based on new contributions from the community is donated and maintained by IBM.

Pinned

  1. instructlab Public

    InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.

    Python 811 303

  2. taxonomy Public

    Taxonomy tree that will allow you to create models tuned with your data

    Python 182 654

  3. community Public

    InstructLab Community wide collaboration space including contributing, security, code of conduct, etc

    Python 71 40

  4. dev-docs Public

    Developer documents for the InstructLab organization

    Makefile 2 28

Repositories

Showing 10 of 18 repositories
  • instructlab Public

    InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.

    Python 811 Apache-2.0 302 225 (20 issues need help) 75 Updated Oct 5, 2024
  • training Public

    InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data

    Python 14 Apache-2.0 37 33 (3 issues need help) 10 Updated Oct 4, 2024
  • website Public
    TypeScript 0 CC-BY-4.0 19 8 7 Updated Oct 4, 2024
  • schema Public

    JSON schema for Taxonomy YAML

    Python 1 Apache-2.0 9 1 2 Updated Oct 4, 2024
  • eval Public

    Python library for Evaluation

    Python 5 Apache-2.0 15 3 2 Updated Oct 4, 2024
  • sdg Public

    Python library for Synthetic Data Generation

    Python 16 Apache-2.0 30 38 (1 issue needs help) 5 Updated Oct 4, 2024
  • community Public

    InstructLab Community wide collaboration space including contributing, security, code of conduct, etc

    Python 71 Apache-2.0 40 13 4 Updated Oct 4, 2024
  • taxonomy Public

    Taxonomy tree that will allow you to create models tuned with your data

    Python 182 Apache-2.0 654 5 13 Updated Oct 3, 2024
  • ui Public

    Place to hack on UI for InstructLab

    TypeScript 12 Apache-2.0 25 39 (4 issues need help) 7 Updated Oct 3, 2024
  • dev-docs Public

    Developer documents for the InstructLab organization

    Makefile 2 Apache-2.0 28 13 21 Updated Oct 2, 2024