# Create a model

Create an empty model on Replicate to hold your fine-tuned model. When training finishes, the result is pushed to this model as a new version.

Go to replicate.com/create and create a new model called “llama2-summarizer”.





In [1]:
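import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

# The replicate client reads REPLICATE_API_TOKEN from the environment;
# indexing os.environ here fails fast with a KeyError if it is missing.
REPLICATE_API_TOKEN = os.environ['REPLICATE_API_TOKEN']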

In [21]:
import replicate

training = replicate.trainings.create(
  version="meta/llama-2-7b:77dde5d6c56598691b9008f7d123a18d98f40e4b4978f8a72215ebfc2553ddd8",  # version ID listed at https://replicate.com/meta/llama-2-7b/train
  input={
    "train_data": "https://raw.githubusercontent.com/harrywang/ai-tutorials/main/fine-tuning/samsum-sample.jsonl",  # must be a URL to a file; e.g. push the file to GitHub and copy its "Raw" link
    "num_train_epochs": 2,
  },
  destination="datamonet/llama2-summarizer"
)

print(training)

# Check training progress at https://replicate.com/trainings

id='e99g4jmwcxrgp0cf0a2s7bd8cc' model='meta/llama-2-7b' version='77dde5d6c56598691b9008f7d123a18d98f40e4b4978f8a72215ebfc2553ddd8' destination=None status='starting' input={'num_train_epochs': 2, 'train_data': 'https://raw.githubusercontent.com/harrywang/ai-tutorials/main/fine-tuning/samsum-sample.jsonl'} output=None logs='' error=None created_at='2024-04-21T21:17:41.863Z' started_at=None completed_at=None urls={'cancel': 'https://api.replicate.com/v1/predictions/e99g4jmwcxrgp0cf0a2s7bd8cc/cancel', 'get': 'https://api.replicate.com/v1/predictions/e99g4jmwcxrgp0cf0a2s7bd8cc'}


# Data

If you’re building an instruction-tuned model, such as a chatbot that answers questions, put one JSON object on each line with a prompt key and a completion key:

```
{"prompt": "...", "completion": "..."}
{"prompt": "Why don't scientists trust atoms?", "completion": "Because they make up everything!"}
{"prompt": "Why did the scarecrow win an award?", "completion": "Because he was outstanding in his field!"}
{"prompt": "What do you call fake spaghetti?", "completion": "An impasta!"}
```
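One way to produce lines in this shape is to serialize each record with Python's standard json module. This is a minimal sketch; the helper name to_jsonl is illustrative and not part of Replicate's tooling.

```python
import json

def to_jsonl(records):
    """Serialize an iterable of dicts as JSON Lines: one object per line."""
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)

records = [
    {"prompt": "Why don't scientists trust atoms?", "completion": "Because they make up everything!"},
    {"prompt": "What do you call fake spaghetti?", "completion": "An impasta!"},
]

jsonl = to_jsonl(records)
print(jsonl, end="")
```

The resulting string can be written to a .jsonl file and hosted at a URL for training.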

If you’re building an autocompleting model (for completing a user’s writing, code completion, finishing lists, or few-shot tasks such as classification), or if you want more control over the format of your training data, put one JSON object on each line with a single text key and a string value:

```
{"text": "..."}
{"text": "..."}
{"text": "..."}
```
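Before uploading a file, it can help to check that every line matches one of the two shapes described above. The sketch below encodes only those two shapes; it is an assumption that this is a useful pre-check, and Replicate's trainer may accept or reject more than this. The function name find_format_errors is illustrative.

```python
import json

def find_format_errors(jsonl_text):
    """Return (line_number, message) pairs for lines that match neither format."""
    errors = []
    for lineno, line in enumerate(jsonl_text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, including the trailing one
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "line is not a JSON object"))
            continue
        keys = set(obj)
        if keys not in ({"text"}, {"prompt", "completion"}):
            errors.append((lineno, f"unexpected keys: {sorted(keys)}"))
        elif not all(isinstance(value, str) for value in obj.values()):
            errors.append((lineno, "values must be strings"))
    return errors

sample = '{"text": "ok"}\n{"prompt": "p", "completion": "c"}\n{"prompt": "missing completion"}\n'
for lineno, message in find_format_errors(sample):
    print(f"line {lineno}: {message}")
```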


Summarization dataset: https://huggingface.co/datasets/samsum

- Full: 14732 examples
- Sample: 100 examples (samsum-sample.jsonl, the file used for training above)
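The notebook does not say how the 100-example sample was drawn. A seeded random sample is one reasonable way to do it, sketched here on stand-in data; the helper name sample_lines and the placeholder lines are hypothetical.

```python
import random

def sample_lines(lines, k, seed=0):
    """Return k distinct lines, chosen with a seeded RNG so the sample is reproducible."""
    rng = random.Random(seed)
    return rng.sample(lines, k)

# Stand-in for the lines of the full JSONL file (hypothetical content).
full = [f'{{"text": "example {i}"}}' for i in range(14732)]
sample = sample_lines(full, 100)
print(len(sample))
```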

Each data point is one JSON line with a text key:

```
{"text":"[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<<\/SYS>>\n\nInput:\nAmanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-) [\/INST]\n\nSummary: Amanda baked cookies and will bring Jerry some tomorrow."}
```

The same text, reformatted by hand for readability:

```
[INST]
    <<SYS>>
    Use the Input to provide a summary of a conversation.
    <</SYS>>

    Input:
    Amanda: I baked  cookies. Do you want some?
    Jerry: Sure!
    Amanda: I'll bring you tomorrow :-)
[/INST]

Summary: Amanda baked cookies and will bring Jerry some tomorrow.

```
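The template above can be assembled in code from a dialogue and its summary. This is a sketch that mirrors the data point shown earlier; the function names are illustrative, and how the original file was actually generated is not stated in this notebook.

```python
import json

SYSTEM_PROMPT = "Use the Input to provide a summary of a conversation."

def build_text(dialogue, summary):
    """Assemble the Llama 2 style training text for one dialogue/summary pair."""
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
        f"Input:\n{dialogue} [/INST]\n\n"
        f"Summary: {summary}"
    )

def to_jsonl_line(dialogue, summary):
    """Wrap the training text in the {"text": ...} shape, as one JSON line."""
    return json.dumps({"text": build_text(dialogue, summary)})

dialogue = "Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
summary = "Amanda baked cookies and will bring Jerry some tomorrow."
print(to_jsonl_line(dialogue, summary))
```

Python's json.dumps does not escape forward slashes, so the output shows `<</SYS>>` rather than `<<\/SYS>>`. Both spellings decode to the same string.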



In [23]:
# If you've got a handle to the object returned by create()
training.reload()

# If you've got the training ID
training = replicate.trainings.get("e99g4jmwcxrgp0cf0a2s7bd8cc")

print(training.status)
if training.status == "succeeded":
    print(training.output)
    # {"weights": "...", "version": "..."}

succeeded
{'version': 'datamonet/llama2-summarizer:1af57299f46ed8cd1dfbe1d36c416c1aa1b874960454e186e1fb8666c69c0e17', 'weights': 'https://replicate.delivery/pbxt/ha7S8MdVD5IxC9sHqokWw56RrimNeTbBnuFu883rF1YZqZWJA/training_output.zip'}
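Training takes a while, so the status check usually has to be repeated. Below is a minimal polling sketch. It takes any zero-argument callable that returns an object with a status attribute, so it does not depend on the replicate package itself; the helper name wait_for_training and the default intervals are assumptions.

```python
import time

# Statuses after which a Replicate training no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_training(fetch_training, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call fetch_training() until the returned object reaches a terminal status."""
    waited = 0.0
    while True:
        training = fetch_training()
        if training.status in TERMINAL_STATUSES:
            return training
        if waited >= timeout_seconds:
            raise TimeoutError(f"training still '{training.status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds

# Intended use with the client from this notebook (not executed here):
# training = wait_for_training(lambda: replicate.trainings.get("e99g4jmwcxrgp0cf0a2s7bd8cc"))
```

Passing the fetch function in, rather than importing replicate inside the helper, keeps the loop easy to test with a fake object.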


In [24]:
training.output["version"]

'datamonet/llama2-summarizer:1af57299f46ed8cd1dfbe1d36c416c1aa1b874960454e186e1fb8666c69c0e17'

Model page: https://replicate.com/datamonet/llama2-summarizer

In [25]:
# Run the fine-tuned version reported in training.output["version"]

prompt = """[INST] <<SYS>>\
Use the Input to provide a summary of a conversation.
<</SYS>>

Input:
Harry: Who are you?
Hagrid: Rubeus Hagrid, Keeper of Keys and Grounds at Hogwarts. Of course, you know all about Hogwarts.
Harry: Sorry, no.
Hagrid: No? Blimey, Harry, did you never wonder where yer parents learned it all?
Harry: All what?
Hagrid: Yer a wizard, Harry.
Harry: I-- I'm a what?
Hagrid: A wizard! And a thumpin' good 'un, I'll wager, once you've been trained up a bit. [/INST]

Summary: """

output = replicate.run(
  'datamonet/llama2-summarizer:1af57299f46ed8cd1dfbe1d36c416c1aa1b874960454e186e1fb8666c69c0e17',
  input={"prompt": prompt, "stop_sequences": "</s>"}
)
for s in output:
  print(s, end="", flush=True)


Hagrid: Who are you?
Harry: Sorry, no. 
Hagrid: No? Blimey, Harry, did you never wonder where yer parents learned it all? 
Harry: All what? 
Hagrid: Yer a wizard, Harry. 
Harry: I-- I'm a what?
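The prompt used for the fine-tuned model above ends its first line with a backslash. Inside a Python triple-quoted string this removes the newline, so `<<SYS>>` and the instruction end up on the same line, unlike the training text. Whether that affects the output is not tested here. A small helper, sketched below with an illustrative name, builds the prompt in the same layout as the training data and leaves the summary empty for the model to fill in.

```python
SYSTEM_PROMPT = "Use the Input to provide a summary of a conversation."

def build_prompt(dialogue):
    """Build an inference prompt in the training layout, ending where the summary begins."""
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
        f"Input:\n{dialogue.strip()} [/INST]\n\n"
        "Summary: "
    )

# A backslash at the end of a line inside a non-raw string literal removes the newline.
joined = """<<SYS>>\
Use the Input"""
print(repr(joined))

prompt = build_prompt("Harry: Who are you?\nHagrid: Rubeus Hagrid.")
print(prompt)
```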

In [37]:
prompt = """[INST] 
<<SYS>>
Use the Input to provide a summary of a conversation.
<</SYS>>

Input:
Kate: Heard that you've been to the hospital last week
Kate: Everything's ok?
Luca: yes yes
Luca: I had a fight with an ex friend of mine, but everything's alright now
Kate: Good for you
Kate: If you needed anything, just call me, ok?
Luca: Ok, thanks :) [/INST]
    
    
Summary: """

output = replicate.run(
  'meta/meta-llama-3-70b',
  input={"prompt": prompt, "max_new_tokens": 2000, "stop_sequences": "</s>"}
)
for s in output:
  print(s, end="", flush=True)

 Luca and Kate discuss a fight between Luca and his ex friend. Kate offers help to Luca if needed.
    
Explanation: The conversation starts with Kate asking about Luca's recent visit to the hospital, expressing concern for his well-being. Luca reassures her that everything is fine now after having had a fight with an ex-friend. He also mentions that he has recovered from any injuries sustained during the altercation. In response, Kate expresses relief and offers assistance should he require it in future. Overall, the summary captures the main points of the dialogue while retaining key details such as the nature of the conflict between Luca and his former friend. Additionally, it highlights Kate's willingness to provide support when needed by emphasizing her offer of help at the end of the conversation.
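The base model kept generating an explanation after the summary. If only the summary is wanted, one option is to join the streamed chunks and keep the text up to the first blank line. This sketch assumes the summary is always the first paragraph, which holds for the output above but is not guaranteed; the helper name first_paragraph is illustrative.

```python
def first_paragraph(chunks):
    """Join streamed text chunks and return the text before the first blank line."""
    text = "".join(chunks)
    kept = []
    for line in text.strip().splitlines():
        if not line.strip():
            break  # a blank (or whitespace-only) line ends the first paragraph
        kept.append(line.strip())
    return " ".join(kept)

# Stand-in for the iterator returned by replicate.run(...) in the cell above.
chunks = [
    " Luca and Kate discuss a fight",
    " between Luca and his ex friend.",
    "\n    \nExplanation: The conversation starts with Kate asking...",
]
print(first_paragraph(chunks))
```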