feat: [LLM] Added support for Large Language Models
# Features:

* Generate text
* Chat
* Get text embedding vectors for a list of texts
* Tune a language model using a training dataset

The following models are available:

* `text-bison@001`
* `chat-bison@001`
* `textembedding-gecko@001`

# Example usage

## Text Generation

```python
from vertexai.preview.language_models import TextGenerationModel

model = TextGenerationModel.from_pretrained("text-bison@001")

print(model.predict(
    "What is the best recipe for banana bread? Recipe:",
    # Optional:
    #max_output_tokens=128,
    #temperature=0,
    #top_p=1,
    #top_k=5,
))
```

## Chat

```python
from vertexai.preview.language_models import ChatModel, InputOutputTextPair

chat_model = ChatModel.from_pretrained("chat-bison@001")

chat = chat_model.start_chat(
    # Optional:
    context="My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.",
    examples=[
        InputOutputTextPair(
            input_text="Who do you work for?",
            output_text="I work for Ned.",
        ),
        InputOutputTextPair(
            input_text="What do I like?",
            output_text="Ned likes watching movies.",
        ),
    ],
)

print(chat.send_message("Are my favorite movies based on a book series?"))

print(chat.send_message("When were these books published?"))
```

## Text embedding

```python
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
embeddings = model.get_embeddings(["What is life?"])
for embedding in embeddings:
    vector = embedding.values
    print(len(vector))
```
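
The vectors returned by `get_embeddings` are typically compared with a similarity measure. The following is a minimal sketch, not part of this commit: `cosine_similarity` is a hypothetical helper, and the short placeholder lists stand in for real `embedding.values` so the snippet runs without calling the service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors (assumption): in practice these would be the
# `embedding.values` lists from the example above.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # identical direction, close to 1
print(cosine_similarity(vec_a, vec_c))  # orthogonal vectors
```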

## Tuning

```python
import pandas

from vertexai.preview.language_models import TextGenerationModel

model = TextGenerationModel.from_pretrained("text-bison@001")

# training_data accepts any ONE of the three forms below;
# each assignment here replaces the previous one.

# Option 1: Dataset URI
training_data = "gs://<bucket>/<path>.jsonl"

# Option 2: Pandas dataset
training_data = pandas.DataFrame(data=[
    {"input_text": "Input 1", "output_text": "Output 1"},
    {"input_text": "Input 2", "output_text": "Output 2"},
])

# Option 3: Prompt dataset resource name
training_data = "projects/.../locations/.../datasets/..."

model.tune_model(
    training_data=training_data,
    # Optional:
    train_steps=10,
    tuning_job_location="europe-west4",
    model_deployment_location="us-central1",
)

model.predict("What is life?")
```
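
For the dataset URI option, the `.jsonl` file has to be produced somehow. The sketch below shows one way to serialize examples as JSON Lines using only the standard library. It assumes the file uses the same `input_text` and `output_text` field names as the DataFrame columns above; `to_jsonl` is a hypothetical helper, not part of the SDK, and uploading the result to the bucket is left out.

```python
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    lines = []
    for record in records:
        # Keep only the two fields used in the tuning example
        # (assumed to match what the tuning job expects).
        row = {
            "input_text": record["input_text"],
            "output_text": record["output_text"],
        }
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines) + "\n"


records = [
    {"input_text": "Input 1", "output_text": "Output 1"},
    {"input_text": "Input 2", "output_text": "Output 2"},
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

The resulting text can be written to a local file and copied to Cloud Storage with whatever tooling is already in use.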

PiperOrigin-RevId: 529799173
Ark-kun authored and Copybara-Service committed May 5, 2023
1 parent 9e2c216 commit 866c6aa
Showing 2 changed files with 871 additions and 0 deletions.