Merge pull request #250 from stochasticai/tushar/docs
Documentation revamping
StochasticRomanAgeev committed Sep 6, 2023
2 parents 9b98c68 + 85cf772 commit 4c71825
Showing 56 changed files with 7,785 additions and 3,129 deletions.
1,156 changes: 1,144 additions & 12 deletions .github/stochastic_logo_dark.svg
1,255 changes: 1,243 additions & 12 deletions .github/stochastic_logo_light.svg
30 changes: 30 additions & 0 deletions README.md
@@ -220,6 +220,36 @@ model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")

<br>

## Supported Models
Below is a list of all the models supported via the `BaseModel` class of `xTuring`, along with the keys used to load them.

| Model | Key |
| -- | -- |
|Bloom | bloom|
|Cerebras | cerebras|
|DistilGPT-2 | distilgpt2|
|Falcon-7B | falcon|
|Galactica | galactica|
|GPT-J | gptj|
|GPT-2 | gpt2|
|LLaMA | llama|
|LLaMA 2 | llama2|
|OPT-1.3B | opt|

The keys above load the base variants of the LLMs. The templates below give the keys for their `LoRA`, `INT8` and `INT8 + LoRA` versions; the `INT4 + LoRA` versions are loaded through a dedicated class, as described after the table.

| Version | Template |
| -- | -- |
| LoRA| `<model_key>_lora`|
| INT8| `<model_key>_int8`|
| INT8 + LoRA| `<model_key>_lora_int8`|
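
For example, combining a key from the first table with a template gives the key for a variant. A minimal sketch, assuming the `BaseModel.create` entry point used elsewhere in the xTuring docs:

```python
from xturing.models import BaseModel

# 'gpt2' (model key) + '_lora' (template) -> LoRA version of GPT-2
model = BaseModel.create('gpt2_lora')
```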

To load any model's __`INT4 + LoRA`__ version, you will need to use the `GenericLoraKbitModel` class from `xturing.models`. Below is how to use it:
```python
from xturing.models import GenericLoraKbitModel

model = GenericLoraKbitModel('<model_path>')
```
The `<model_path>` can be replaced with your local directory or any Hugging Face Hub model, such as `facebook/opt-1.3b`.

## 📈 Roadmap
- [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models
- [x] Dataset generation using self-instruction
17 changes: 0 additions & 17 deletions docs/docs/about.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/docs/advanced/_category_.json
@@ -0,0 +1,9 @@
{
"label": "🧗🏻 Advanced Topics",
"position": 3,
"collapsed": true,
"link": {
"type": "doc",
"id": "advanced"
}
}
12 changes: 12 additions & 0 deletions docs/docs/advanced/advanced.md
@@ -0,0 +1,12 @@
---
sidebar_position: 3
title: 🧗🏻 Advanced topics
description: Guide for people who want to customise xTuring even further.
---

import DocCardList from '@theme/DocCardList';


# 🧗🏻 Advanced Topics

<DocCardList />
180 changes: 180 additions & 0 deletions docs/docs/advanced/anymodel.md
@@ -0,0 +1,180 @@
---
title: 🌦️ Work with any model
description: Work with any model via the GenericModel wrapper
sidebar_position: 2
---

<!-- ## class `GenericModel` -->
<!-- ## Load Any Model via `GenericModel` wrapper -->
The `GenericModel` class makes it possible to test and fine-tune models that are not directly available via the `BaseModel` class. Apart from this base class, the classes below can be used to load models for memory-efficient computation:

| Class Name | Description |
| ---------- | ----------- |
| `GenericModel` | Loads the normal version of the model |
| `GenericInt8Model` | Loads the model ready to fine-tune in __INT8__ precision |
| `GenericLoraModel` | Loads the model ready to fine-tune using the __LoRA__ technique |
| `GenericLoraInt8Model` | Loads the model ready to fine-tune using the __LoRA__ technique in __INT8__ precision |
| `GenericLoraKbitModel` | Loads the model ready to fine-tune using the __LoRA__ technique in __INT4__ precision |
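
All of these classes share the same constructor signature, so switching to a memory-efficient variant is a one-line change. A sketch, with the model path as an example only:

```python
from xturing.models import GenericLoraInt8Model

# Same call shape as GenericModel, but ready to fine-tune with LoRA in INT8 precision
model = GenericLoraInt8Model('facebook/opt-1.3b')
```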

<!-- Let us circle back to the above example and see how we can replicate the results of the `BaseModel` class as shown [here](/overview/quickstart/load_save_models). -->

<!-- Start by downloading the Alpaca dataset from [here](https://d33tr4pxdm6e2j.cloudfront.net/public_content/tutorials/datasets/alpaca_data.zip) and extract it to a folder. We will load this dataset using the `InstructionDataset` class. -->

<!-- ```python
from xturing.datasets import InstructionDataset
dataset_path = './alpaca_data'
dataset = InstructionDataset(dataset_path)
``` -->


To initialize the model, run the following lines:
```python
from xturing.models import GenericLoraModel

model_path = 'aleksickx/llama-7b-hf'

model = GenericLoraModel(model_path)
```
The `model_path` can be a locally saved model or any model available on Hugging Face's [Model Hub](https://huggingface.co/models).

To fine-tune the model on a dataset, we will use the default fine-tuning configuration.
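
The `dataset` below is assumed to have been loaded beforehand, mirroring the commented-out example above. A sketch, assuming the Alpaca data has been downloaded and extracted locally:

```python
from xturing.datasets import InstructionDataset

# Path to the extracted Alpaca dataset (assumed to exist locally)
dataset = InstructionDataset('./alpaca_data')
```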

```python
model.finetune(dataset=dataset)
```

To see how to load a pre-defined dataset, go [here](/overview/quickstart/prepare); to see how to generate a dataset, refer to [this](/advanced/generate) page.

Let's test our fine-tuned model by running some inference.

```python
output = model.generate(texts=["Why are LLMs becoming so important?"])
```
We can print the `output` variable to see the results.
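
For example, mirroring the inference snippets later on this page:

```python
# Inspect the generated text
print("Generated output: {}".format(output))
```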

Next, we save our fine-tuned model using the `.save()` method, passing the path of the directory in which to store it.

```python
model.save('/path/to/a/directory/')
```
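
The model can later be reloaded from that directory. A sketch, mirroring the fine-tuned-model loading example further down this page:

```python
from xturing.models import GenericModel

# Load the fine-tuned weights written by model.save()
model = GenericModel('/path/to/a/directory/')
```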

We can also see our model(s) in action with a beautiful UI by launching the playground locally.

```python
from xturing.ui.playground import Playground

Playground().launch()
```

<!-- ## GenericModel classes
The `GenericModel` classes consists of:
1. `GenericModel`
2. `GenericInt8Model`
3. `GenericLoraModel`
4. `GenericLoraInt8Model`
5. `GenericLoraKbitModel`
The code snippets below work for all of the above classes; simply replace `GenericModel` with the class of your choice. They are very similar to the snippets above, with only slight differences.
### 1. Load a pre-trained and/or fine-tuned model
To load a pre-trained (or fine-tuned) model, run the following line of code. This will load the model with the default weights in the case of a pre-trained model, and the weights which were saved in the case of a fine-tuned one.
```python
from xturing.models import GenericModel
model = GenericModel("<model_path>")
'''
The <model_path> can be a path to a local model, for example, "./saved_model", or a model from the HuggingFace Hub, for example, "facebook/opt-1.3b"
For example,
model = GenericModel('./saved_model')
OR
model = GenericModel('facebook/opt-1.3b')
'''
```
### 2. Save a fine-tuned model
After fine-tuning your model, you can save it as simply as:
```python
model.save("/path/to/a/directory")
```
Remember that the path you specify should be a directory. If the directory doesn't exist, it will be created.
The model weights are saved in two files: the whole model weights, including base model parameters and LoRA parameters, are stored in the `pytorch_model.bin` file, while the LoRA parameters alone are stored in the `adapter_model.bin` file.
<details>
<summary> <h3> Examples to load fine-tuned and pre-trained models</h3> </summary>
1. To load a pre-trained model
```python
## Make the necessary imports
from xturing.models import GenericModel
## Loading the model
model = GenericModel("facebook/opt-1.3b")
## Saving the model
model.save("/path/to/a/directory")
```
2. To load a fine-tuned model
```python
## Make the necessary imports
from xturing.models import GenericModel
## Loading the model
model = GenericModel("./saved_model")
```
</details>
## Inference via `GenericModel`
Once you have fine-tuned your model, you can run inference as simply as follows.
### Using a local model
Start with loading your model from a checkpoint after fine-tuning it.
```python
# Make the necessary imports
from xturing.models import GenericModel
# Load the desired model
model = GenericModel("/path/to/local/model")
```
Next, we can run inference on our model using the `.generate()` method.
```python
# Make inference
output = model.generate(texts=["Why are the LLMs so important?"])
# Print the generated outputs
print("Generated output: {}".format(output))
```
### Using a pretrained model
Start with loading your model with the default weights.
```python
# Make the necessary imports
from xturing.models import GenericModel
# Load the desired model
model = GenericModel("llama_lora")
```
Next, we can run inference on our model using the `.generate()` method.
```python
# Make inference
output = model.generate(texts=["Why are the LLMs so important?"])
# Print the generated outputs
print("Generated output: {}".format(output))
``` -->
@@ -1,19 +1,24 @@
---
title: FastAPI server
title: ⚡️ FastAPI server
description: FastAPI inference server
sidebar_position: 3
---

Once you have fine-tuned your model, you can run the inference using a FastAPI server.
# ⚡️ Running model inference with FastAPI Server

### 1. Launch API server from CLI
<!-- Once you have fine-tuned your model, you can run the inference using a FastAPI server. -->
After successfully fine-tuning your model, you can perform inference using a FastAPI server. The following steps guide you through launching and utilizing the API server for your fine-tuned model.

### 1. Launch API server from Command Line Interface (CLI)

To initiate the API server, execute the following command in your command line interface:

```sh
xturing api -m "/path/to/the/model"
$ xturing api -m "/path/to/the/model"
```

:::info
Model path should be a directory containing a valid `xturing.json` config file.
Ensure that the model path you provide is a directory containing a valid xturing.json configuration file.
:::

### 2. Health check API
@@ -69,3 +74,5 @@ Model path should be a directory containing a valid `xturing.json` config file.
"response": ["JP Morgan is multinational investment bank and financial service headquartered in New York city."]
}
```
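
As an illustration, the generation endpoint can be called from Python. This is a hypothetical sketch: the endpoint path, port, and request payload shape are assumptions based on the response shown above, not confirmed by this diff:

```python
import requests

# Hypothetical endpoint and payload; adjust to the server's actual API
resp = requests.post(
    "http://localhost:5000/api/generate",
    json={"prompt": ["What is JP Morgan?"]},
)
print(resp.json()["response"])
```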

By following these steps, you can effectively run your fine-tuned model for text generation through the FastAPI server, facilitating seamless inference with structured requests and responses.