1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
tags
*.cache
*.pt
*.pkl
4 changes: 2 additions & 2 deletions clients/python/llmengine/fine_tuning.py
@@ -283,15 +283,15 @@ def get_events(cls, fine_tune_id: str) -> GetFineTuneEventsResponse:
Returns:
GetFineTuneEventsResponse: an object that contains the list of events for the fine-tuning job

Example:
=== "Getting events for fine-tuning jobs in Python"

```python
from llmengine import FineTune

response = FineTune.get_events(fine_tune_id="ft-cir3eevt71r003ks6il0")
print(response.json())
```

JSON Response:
=== "Response in JSON"
```json
{
"events":
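For reference, a minimal sketch of consuming this endpoint end to end. It reuses the job ID from the example above; iterating over `response.events` is an assumption based on the docstring's description of the response as an object containing the list of events:

```python
from llmengine import FineTune

# Fetch the event log for a fine-tuning job (job ID from the example above)
response = FineTune.get_events(fine_tune_id="ft-cir3eevt71r003ks6il0")

# Assumption: the events list is exposed as an attribute on the response object
for event in response.events:
    print(event)
```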
4 changes: 2 additions & 2 deletions docs/getting_started.md
@@ -48,7 +48,7 @@ With your API key set, you can now send LLM Engine requests using the Python cli
from llmengine import Completion

response = Completion.create(
model="falcon-7b-instruct",
model="llama-2-7b",
prompt="I'm opening a pancake restaurant that specializes in unique pancake shapes, colors, and flavors. List 3 quirky names I could name my restaurant.",
max_new_tokens=100,
temperature=0.2,
@@ -66,7 +66,7 @@ import sys
from llmengine import Completion

stream = Completion.create(
model="falcon-7b-instruct",
model="llama-2-7b",
prompt="Give me a 200 word summary on the current economic events in the US.",
max_new_tokens=1000,
temperature=0.2,
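The snippet above is cut off at the `create` call; a minimal sketch of the consuming loop that typically follows, assuming each streamed chunk mirrors the completion response shape with an `output.text` field:

```python
# Iterate over the stream, printing tokens as they arrive.
# Assumption: each chunk carries an `output` whose `text` field holds
# the newly generated tokens, mirroring the non-streaming response.
for chunk in stream:
    if chunk.output:
        print(chunk.output.text, end="")
        sys.stdout.flush()  # flush so partial output appears immediately
```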
35 changes: 15 additions & 20 deletions docs/guides/completions.md
@@ -42,26 +42,21 @@ See the full [Completion API reference documentation](../../api/python_client/#l
An example Completion API response looks as follows:

=== "Response in JSON"
`python
>>> print(response.json())
`
Example output:
`json
{
"request_id": "c4bf0732-08e0-48a8-8b44-dfe8d4702fb0",
"output": {
"text": "_______ and I am a _______",
"num_completion_tokens": 10
}
}
`
```python
>>> print(response.json())
{
"request_id": "c4bf0732-08e0-48a8-8b44-dfe8d4702fb0",
"output": {
"text": "_______ and I am a _______",
"num_completion_tokens": 10
}
}
```
=== "Response in Python"
`python
>>> print(response.output.text)
`
Example output:
` _______ and I am a _______
`
```python
>>> print(response.output.text)
_______ and I am a _______
```

## Token streaming

Expand All @@ -81,7 +76,7 @@ import sys
from llmengine import Completion

stream = Completion.create(
model="falcon-7b-instruct",
model="llama-2-7b",
prompt="Give me a 200 word summary on the current economic events in the US.",
max_new_tokens=1000,
temperature=0.2,
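To make the two response tabs shown earlier in this file concrete, a short sketch that reads the same completion both ways; the attribute names follow the JSON schema shown in the example response:

```python
from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",
    prompt="Hello, my name is",
    max_new_tokens=10,
)

# Raw JSON form, as in the "Response in JSON" tab
print(response.json())

# Attribute access, as in the "Response in Python" tab
print(response.output.text)
print(response.output.num_completion_tokens)
```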
6 changes: 3 additions & 3 deletions docs/index.md
@@ -30,11 +30,11 @@ Kubernetes.
### Key Features

**Ready-to-use APIs for your favorite models**: Deploy and serve
open source foundation models - including LLaMA, MPT, and Falcon.
open source foundation models - including Llama-2, MPT, and Falcon.
Use Scale-hosted models or deploy to your own infrastructure.

**Fine-tune your favorite models**: Fine-tune open-source foundation
models like LLaMA, MPT, etc. with your own data for optimized performance.
**Fine-tune the best open-source models**: Fine-tune open-source foundation
models like Llama-2, MPT, etc. with your own data for optimized performance.

**Optimized Inference**: LLM Engine provides inference APIs
for streaming responses and dynamically batching inputs for higher throughput