## Describe your model -> fine-tuned GPT-3.5
By Matt Shumer (https://twitter.com/mattshumer_)

The goal of this notebook is to experiment with a new way to make it very easy to build a task-specific model for your use-case.

First, use the best GPU available (go to Runtime -> change runtime type)

To create your model, just go to the first code cell, and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high=creative, low=precise), and the number of training examples to generate to train the model. From there, just run all the cells.

You can change the model you want to fine-tune by changing the `model` argument passed to `openai.FineTuningJob.create` in the training cell.

# Data generation step

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.
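
To make the cost trade-off concrete, here is a rough sketch that computes an upper bound on token usage for the generation loop used later in this notebook. It assumes each call sends the system prompt plus up to 8 previous examples, and that every example uses the full `max_tokens=1000`. The function name and the `system_prompt_tokens` value are hypothetical. Real usage is usually lower, and pricing is deliberately not modeled.

```python
def token_upper_bound(number_of_examples, system_prompt_tokens,
                      max_tokens=1000, max_prev=8):
    """Return (prompt_tokens, completion_tokens) upper bounds for the loop."""
    prompt_total = 0
    for i in range(number_of_examples):
        # Call i can see at most `max_prev` of the i examples generated so far
        prev_in_context = min(i, max_prev)
        prompt_total += system_prompt_tokens + prev_in_context * max_tokens
    completion_total = number_of_examples * max_tokens
    return prompt_total, completion_total


# Hypothetical numbers: 100 examples, a system prompt of about 900 tokens
prompt_tokens, completion_tokens = token_upper_bound(100, 900)
print(prompt_tokens, completion_tokens)
```

Because earlier examples are fed back as context, prompt tokens grow faster than completion tokens until the cap of 8 previous examples is reached.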

In [3]:
prompt = '''A model that takes an existing prior art patent and a new patent application as input. Both patents consist of a title, an abstract (about 150 words), a summary (about 500 words), and patent claims (about 700 words on average). The model assesses the novelty of the new application by comparing their technical specifications, ensuring both pertain to the same domain. If they originate from different domains, identify this and avoid novelty evaluation. Limit the analysis to a comparison between one patent and one prior art. Should there be more, indicate: "Currently, I can only analyze the novelty between one patent and one piece of prior art. Future updates may expand this capability. Please submit one patent and one prior art for evaluation."

Upon verifying domain consistency, proceed to:

Identify and summarize the key similarities between the prior art and the patent application.
Highlight the principal points of comparison.
Use bullet points to detail the differences, unique features, and distinctions between the two, focusing on what sets the patent application apart.
Assess if the patent application introduces novel elements and satisfies the patentability criteria within its field when compared to the prior art.
If the concepts are too similar or if there's evidence of plagiarism, explain why the patent should not be granted and the risk of rejection.
Conclude with a definitive yes or no on whether the patent application merits approval.
'''

temperature = .7
number_of_examples = 50

Run this to generate the dataset.

In [4]:
!pip install openai==0.28 tenacity

In [225]:
import os
import openai
import random
from tenacity import retry, stop_after_attempt, wait_exponential

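# Prefer the OPENAI_API_KEY environment variable; fall back to a placeholder
openai.api_key = os.environ.get("OPENAI_API_KEY", "OPEN-AI-API-KEY")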

N_RETRIES = 3

@retry(stop=stop_after_attempt(N_RETRIES), wait=wait_exponential(multiplier=1, min=4, max=70))
def generate_example(prompt, prev_examples, temperature=.5):
    messages=[
        {
            "role": "system",
            "content": f"You are generating data which will be used to train a machine learning model.\n\nYou will be given a high-level description of the model we want to train, and from that, you will generate data samples, each with a prompt/response pair.\n\nYou will do so in this format:\n```\nprompt\n-----------\n$prompt_goes_here\n-----------\n\nresponse\n-----------\n$response_goes_here\n-----------\n```\n\nOnly one prompt/response pair should be generated per turn.\n\nFor each turn, make the example slightly more complex than the last, while ensuring diversity.\n\nMake sure your samples are unique and diverse, yet high-quality and complex enough to train a well-performing model. Include examples of patent applications that were rejected due to not being novel, distinct, or innovative relative to prior patents, and describe how this obviousness contributes to the patent's rejection.\n\nHere is an example of the data that needs to be generated:\n\nprompt: Prior Art Title: \"Wireless Communication Device\"\nPrior Art Abstract: A wireless communication device includes a wireless communication unit that performs wireless communication with an exterior, and a control unit that controls the wireless communication unit. The wireless communication unit includes a first wireless communication element and a second wireless communication element that is different from the first wireless communication element.\n\nNew Patent Title: \"Advanced Wireless Communication Device\"\nNew Patent Abstract: An advanced wireless communication device comprises a wireless communication module capable of communicating wirelessly with an external entity, and a control module that governs the wireless communication module. The wireless communication module includes a primary wireless communication element and an auxiliary wireless communication element, distinct from the primary one.\n\nPlease assess the novelty of the new patent application.\n\nresponse: The prior art and the new patent application both pertain to the same domain of wireless communication devices. Here's the analysis:\n\nPrincipal Points of Comparison:\n1. Both devices include a wireless communication unit and a control unit.\n2. Both devices have two distinct wireless communication elements.\n\nKey Similarities:\n1. The control unit governs the wireless communication module in both.\n2. Both have a primary and a secondary wireless communication element.\n\nDifferences and Unique Features:\n1. The new patent application uses the terms \"primary\" and \"auxiliary\" to describe the wireless communication elements, which could imply a difference in hierarchy or function, but this is not clearly specified.\n\nBased on this comparison, it appears that the new patent application does not introduce significantly novel elements compared to the prior art. The concepts seem very similar, and without further clarification on the difference between \"primary\" and \"auxiliary\" elements, the patent risks being rejected due to lack of uniqueness. Therefore, the patent application might not merit approval.\n\nHere is the type of model we want to train:\n`{prompt}`"}
    ]

    if len(prev_examples) > 0:
        if len(prev_examples) > 8:
            prev_examples = random.sample(prev_examples, 8)
        for example in prev_examples:
            messages.append({
                "role": "assistant",
                "content": example
            })

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=temperature,
        max_tokens=1000,
    )

    return response.choices[0].message['content']

# Generate examples
prev_examples = []
for i in range(number_of_examples):
    print(f'Generating example {i}')
    example = generate_example(prompt, prev_examples, temperature)
    prev_examples.append(example)

print(prev_examples)

Generating example 0
Generating example 1
Generating example 2
Generating example 3
Generating example 4
Generating example 5
Generating example 6
Generating example 7
Generating example 8
Generating example 9
Generating example 10
Generating example 11
Generating example 12
Generating example 13
Generating example 14
Generating example 15
Generating example 16
Generating example 17
Generating example 18
Generating example 19
Generating example 20
Generating example 21
Generating example 22
Generating example 23
Generating example 24
Generating example 25
Generating example 26
Generating example 27
Generating example 28
Generating example 29
Generating example 30
Generating example 31
Generating example 32
Generating example 33
Generating example 34
Generating example 35
Generating example 36
Generating example 37
Generating example 38
Generating example 39
Generating example 40
Generating example 41
Generating example 42
Generating example 43
Generating example 44
Generating example 4

In [228]:
print(prev_examples[49])

prompt
-----------
Prior Art Title: "Health Monitoring Wearable Device"
Prior Art Abstract: A health monitoring wearable device tracks biometric data like heart rate, steps taken, and sleep patterns to help users monitor their health and fitness levels. The device syncs data to a mobile app for analysis.

New Patent Title: "AI-Powered Personal Health Coach Wearable with Medical Insights" 
New Patent Abstract: The AI-powered personal health coach wearable with medical insights utilizes machine learning to provide personalized health recommendations, monitor vital signs, and offer insights into potential health issues. It integrates with healthcare providers for proactive health management.

Assess the novelty of the new patent application.
-----------

response
-----------
The prior art and the new patent application both center around health monitoring wearable devices. Here's the analysis:

Principal Points of Comparison:
1. Both devices track biometric data to help users monitor thei

In [229]:
system_message = "Given an existing prior art patent and a new patent application, both consisting of a title, abstract, summary, and patent claims, assess the novelty of the new patent by comparing their technical specifications. Ensure they pertain to the same domain. If they don't, indicate this and avoid novelty evaluation. Identify key similarities, highlight principal points of comparison, detail the differences, and assess if the new patent introduces novel elements. If too similar, explain why the patent should not be granted. Conclude with a definitive answer on whether the patent application merits approval."

Now let's put our examples into a dataframe and turn them into the final training dataset.

In [230]:
import json
import pandas as pd

# Initialize lists to store prompts and responses
prompts = []
responses = []

# Parse out prompts and responses from examples
for example in prev_examples:
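  try:
    split_example = example.split('-----------')
    # Extract both parts first so the two lists always stay the same length
    prompt_text = split_example[1].strip()
    response_text = split_example[3].strip()
  except IndexError:
    # Skip examples that don't follow the expected delimiter format
    continue
  prompts.append(prompt_text)
  responses.append(response_text)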

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples.')

# Initialize list to store training examples
training_examples = []

# Create training examples in the format required for GPT-3.5 fine-tuning
for index, row in df.iterrows():
    training_example = {
        "messages": [
            {"role": "system", "content": system_message.strip()},
            {"role": "user", "content": row['prompt']},
            {"role": "assistant", "content": row['response']}
        ]
    }
    training_examples.append(training_example)

# Save training examples to a .jsonl file
with open('C_CPC_training_examples.jsonl', 'w') as f:
    for example in training_examples:
        f.write(json.dumps(example) + '\n')

There are 50 successfully-generated examples.
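
Before uploading, it can help to sanity-check the file. The sketch below is an assumption about what "well-formed" means here, based only on the structure written by the cell above: one JSON object per line with a `messages` list in system, user, assistant order. The helper name `find_invalid_lines` is hypothetical and is not part of the OpenAI library.

```python
import json

REQUIRED_ROLES = ["system", "user", "assistant"]


def find_invalid_lines(path):
    """Return (line_number, reason) pairs for lines that break the expected format."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_number, "not valid JSON"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list):
                problems.append((line_number, "missing 'messages' list"))
                continue
            roles = [m.get("role") for m in messages if isinstance(m, dict)]
            if roles != REQUIRED_ROLES:
                problems.append((line_number, f"unexpected roles: {roles}"))
            elif any(not str(m.get("content", "")).strip() for m in messages):
                problems.append((line_number, "empty content"))
    return problems


# Usage against the file written above:
# print(find_invalid_lines('C_CPC_training_examples.jsonl'))
```

An empty list means every line matched the expected shape. It does not guarantee that the fine-tuning service will accept the file.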


In [30]:
# Optional (Google Colab only): download the JSONL file
# from google.colab import files
# files.download('C_CPC_training_examples.jsonl')

In [None]:
print(prev_examples[0])

prompt
-----------
Prior Art Title: "Wireless Communication Device"
Prior Art Abstract: A wireless communication device includes a wireless communication unit that performs wireless communication with an exterior, and a control unit that controls the wireless communication unit. The wireless communication unit includes a first wireless communication element and a second wireless communication element that is different from the first wireless communication element.

New Patent Title: "Advanced Wireless Communication Device"
New Patent Abstract: An advanced wireless communication device comprises a wireless communication module capable of communicating wirelessly with an external entity, and a control module that governs the wireless communication module. The wireless communication module includes a primary wireless communication element and an auxiliary wireless communication element, distinct from the primary one.

Please assess the novelty of the new patent application.
-----------
re

In [None]:
print(prev_examples[1])

prompt
-----------
Prior Art Title: "Solar-powered Water Purification System"
Prior Art Abstract: A solar-powered water purification system comprising a solar panel, a battery, a pump, and a purification module. The solar panel converts sunlight into electricity, stored in the battery. The pump, powered by the battery, drives impure water through the purification module which filters out impurities, delivering clean water.

New Patent Title: "Portable Solar-Powered Water Purification Device"
New Patent Abstract: A portable solar-powered water purification device includes a solar panel, a rechargeable battery, a motor, and a purification chamber. The solar panel absorbs sunlight, converting it to electricity which charges the battery. The motor, energized by the battery, propels contaminated water into the purification chamber where impurities are removed, yielding potable water.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior ar

In [None]:
print(prev_examples[2])

prompt
-----------
Prior Art Title: "Automated Retail Checkout System"
Prior Art Abstract: An automated retail checkout system that includes a conveyor belt, a scanner, and a payment terminal. Items are placed on the conveyor belt, scanned by the scanner to identify and price them, and then the total is calculated for payment at the payment terminal.

New Patent Title: "Smart Retail Checkout Solution"
New Patent Abstract: A smart retail checkout solution comprises a conveyor mechanism, an intelligent scanner, and a digital payment portal. Customers place their items on the conveyor mechanism, which are identified and priced by the intelligent scanner using advanced image recognition technology. The total cost is then computed for payment at the digital payment portal.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application are in the domain of automated retail checkout systems. Here's the analysis:

Pr

In [None]:
print(prev_examples[3])

prompt
-----------
Prior Art Title: "Electric Vehicle Charging Station"
Prior Art Abstract: An electric vehicle charging station includes a power supply module, a control module, and a charging connector. The power supply module provides the necessary power. The control module regulates the power supply and communicates with the electric vehicle to manage the charging process. The charging connector connects the charging station to the vehicle.

New Patent Title: "Smart Electric Vehicle Charging Station"
New Patent Abstract: A smart electric vehicle charging station comprises a power source, a smart control unit, and a charging plug. The power source provides the required power. The smart control unit, equipped with IoT technology, controls the power source and communicates with the electric vehicle to efficiently manage the charging. The charging plug connects the station to the vehicle.

Please assess the novelty of the new patent application.
-----------
response
-----------
The pri

In [None]:
print(prev_examples[4])

prompt
-----------
Prior Art Title: "Biometric Security System"
Prior Art Abstract: A biometric security system that includes a sensor for capturing biometric data, a processor for verifying the captured data against stored data, and an access control mechanism that grants or denies access based on the verification result. 

New Patent Title: "Advanced Biometric Security System with Live Detection"
New Patent Abstract: An advanced biometric security system that incorporates a biometric sensor for acquiring biometric data, a detection unit for live detection, a verification processor for comparing the acquired data with pre-stored data, and an access control mechanism that grants or denies access contingent on the verification outcome. 

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application fall under the domain of biometric security systems. Here's the analysis:

Principal Points of Comparison:
1. Bot

In [None]:
print(prev_examples[5])

prompt
-----------
Prior Art Title: "Virtual Reality Gaming System"
Prior Art Abstract: A virtual reality gaming system includes a headset for displaying virtual reality content, controllers for user interaction, and a processing unit for running the game. The system provides an immersive gaming experience by integrating visual, auditory, and haptic feedback.

New Patent Title: "Enhanced Virtual Reality Gaming System with AI"
New Patent Abstract: An enhanced virtual reality gaming system comprises a VR headset for rendering virtual reality content, AI-powered controllers for advanced user interaction, and a game processing unit. The system offers an immersive gaming experience by integrating visual, auditory, and haptic feedback and leveraging AI to adapt the game dynamics based on user behavior.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application are in the domain of virtual reality gaming systems

In [None]:
print(prev_examples[6])

prompt
-----------
Prior Art Title: "Wireless Earbuds with Noise Cancellation"
Prior Art Abstract: Wireless earbuds include a speaker, a microphone, and a noise cancellation unit. The speaker produces the audio output, the microphone captures ambient noise, and the noise cancellation unit generates an inverse sound wave to cancel out the ambient noise.

New Patent Title: "Wireless Earbuds with Adaptive Noise Cancellation"
New Patent Abstract: Wireless earbuds comprise a speaker, a microphone, and an adaptive noise cancellation module. The speaker delivers audio output, the microphone picks up the surrounding noise, and the adaptive noise cancellation module dynamically generates an inverse sound wave to neutralize the ambient noise based on the user's environment and preferences.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application fall into the domain of wireless earbuds with noise cancellation. He

In [None]:
print(prev_examples[7])

prompt
-----------
Prior Art Title: "Automated Drone Delivery System"
Prior Art Abstract: An automated drone delivery system includes a drone with a cargo compartment, a navigation system for guiding the drone to the delivery location, and a control system for managing the drone's flight and delivery operations. 

New Patent Title: "Intelligent Drone Delivery System with Real-time Tracking"
New Patent Abstract: An intelligent drone delivery system comprises a drone with a delivery compartment, a smart navigation system for directing the drone to the delivery site, a control unit for overseeing the drone's flight and delivery tasks, and a real-time tracking feature that allows customers to monitor the delivery in progress. 

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application are in the domain of drone delivery systems. Here's the analysis:

Principal Points of Comparison:
1. Both systems include a d

In [None]:
print(prev_examples[8])

prompt
-----------
Prior Art Title: "Smart Home Security System"
Prior Art Abstract: A smart home security system includes sensors for detecting intrusions, a control panel for managing the system, and a communication module for sending alerts. The system provides security by detecting unauthorized entries and alerting the homeowner.

New Patent Title: "Advanced Smart Home Security System with AI Intrusion Detection"
New Patent Abstract: An advanced smart home security system comprises sensors for intrusion detection, a control hub for system management, an alerting module, and an AI-powered intrusion detection unit. The system ensures security by identifying unauthorized entries using AI technology and alerting the homeowner via the alerting module.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application are in the domain of smart home security systems. Here's the analysis:

Principal Points of Compar

In [None]:
print(prev_examples[9])

prompt
-----------
Prior Art Title: "Automated Irrigation System"
Prior Art Abstract: An automated irrigation system includes a water source, a network of pipes, sprinklers, and a control unit. The control unit manages the operation of the sprinklers based on pre-set schedules, ensuring effective water distribution for plant growth.

New Patent Title: "Smart Irrigation System with Soil Moisture Sensors"
New Patent Abstract: A smart irrigation system comprises a water source, a pipe network, sprinkler heads, a control module, and soil moisture sensors. The control module operates the sprinkler heads based on data from the soil moisture sensors, ensuring optimal watering for plant growth.

Please assess the novelty of the new patent application.
-----------
response
-----------
Both the prior art and the new patent application are in the domain of automated irrigation systems. Here's the analysis:

Principal Points of Comparison:
1. Both systems include a water source, a network of pipes

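Alternatively, generate a system message automatically. Running this cell overwrites the hand-written `system_message` defined earlier, so run it before building the training file if you want the generated message used in training.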

In [None]:
def generate_system_message(prompt):

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
          {
            "role": "system",
            "content": "You will be given a high-level description of the model we are training, and from that, you will generate a simple system prompt for that model to use. Remember, you are not generating the system message for data generation -- you are generating the system message to use for inference. A good format to follow is `Given $INPUT_DATA, you will $WHAT_THE_MODEL_SHOULD_DO.`.\n\nMake it as concise as possible. Include nothing but the system prompt in your response.\n\nFor example, never write: `\"$SYSTEM_PROMPT_HERE\"`.\n\nIt should be like: `$SYSTEM_PROMPT_HERE`."
          },
          {
              "role": "user",
              "content": prompt.strip(),
          }
        ],
        temperature=temperature,
        max_tokens=500,
    )

    return response.choices[0].message['content']

system_message = generate_system_message(prompt)

print(f'The system message is: `{system_message}`. Feel free to re-run this cell if you want a better result.')

The system message is: `Given a new patent application and an existing prior art patent, assess the novelty of the patents by comparing their technical specifications to determine if they pertain to the same domain.`. Feel free to re-run this cell if you want a better result.
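
The generation prompt asks the model not to wrap the system prompt in quotes, but it may still do so occasionally. As a minimal sketch, assuming you want to strip one layer of wrapping quotes or backticks before using the message, a hypothetical helper could look like this:

```python
def clean_system_message(raw):
    """Trim whitespace and one layer of matching wrapping quotes or backticks."""
    text = raw.strip()
    for mark in ('"', "'", "`"):
        if len(text) >= 2 and text.startswith(mark) and text.endswith(mark):
            text = text[1:-1].strip()
            break
    return text


# Usage with the value generated above:
# system_message = clean_system_message(system_message)
```

This is deliberately simple: a message that legitimately begins and ends with a quoted phrase would also be trimmed, so inspect the result before training.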


# Upload the file to OpenAI

In [None]:
# Upload the same file that was written in the dataset cell above
with open("C_CPC_training_examples.jsonl", "rb") as f:
    file_id = openai.File.create(file=f, purpose='fine-tune').id

# Train the model! You may need to wait a few minutes before running the next cell to allow for the file to process on OpenAI's servers.

In [None]:
job = openai.FineTuningJob.create(training_file=file_id, model="gpt-3.5-turbo")

job_id = job.id

# Now, just wait until the fine-tuning run is done, and you'll have a ready-to-use model!

Run this cell every 20 minutes or so -- eventually, you'll see a message "New fine-tuned model created: ft:gpt-3.5-turbo-0613:xxxxxxxxxxxx"

Once you see that message, you can go to the OpenAI Playground (or keep going to the next cells and use the API) to try the model!
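
If you would rather not re-run a cell by hand, a small polling loop is one option. The sketch below assumes the job object exposes a `status` field that ends in `succeeded`, `failed`, or `cancelled`. The helper `wait_for_job` and its parameters are hypothetical, and the status lookup is passed in as a function so the loop itself makes no API calls.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or max_polls is hit."""
    status = None
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    # Gave up waiting; return the last status seen
    return status


# Hypothetical usage with the job created above (openai==0.28):
# final = wait_for_job(lambda: openai.FineTuningJob.retrieve(job_id).status)
# print(final)
```

With the defaults this waits up to about two hours. Adjust `poll_seconds` and `max_polls` to fit your dataset size.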

In [None]:
openai.FineTuningJob.list_events(id=job_id, limit=10)

# Once your model is trained, run the next cell to grab the fine-tuned model name.

In [None]:
model_name_pre_object = openai.FineTuningJob.retrieve(job_id)
model_name = model_name_pre_object.fine_tuned_model
print(model_name)

# Let's try it out!

In [None]:
response = openai.ChatCompletion.create(
    model=model_name,
    messages=[
      {
        "role": "system",
        "content": system_message,
      },
      {
          "role": "user",
          "content": df['prompt'].sample().values[0],
      }
    ],
)

response.choices[0].message['content']
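
To try the model on your own patent pair instead of a sampled training prompt, the input should probably mirror the layout of the generated examples. The sketch below assumes that layout (title and abstract only, as in the printed examples). The function `build_user_prompt` and its arguments are hypothetical.

```python
def build_user_prompt(prior_title, prior_abstract, new_title, new_abstract):
    """Format a prior-art / new-application pair like the generated training prompts."""
    return (
        f'Prior Art Title: "{prior_title}"\n'
        f"Prior Art Abstract: {prior_abstract}\n\n"
        f'New Patent Title: "{new_title}"\n'
        f"New Patent Abstract: {new_abstract}\n\n"
        "Please assess the novelty of the new patent application."
    )


user_prompt = build_user_prompt(
    "Automated Irrigation System",
    "An automated irrigation system includes a water source and sprinklers.",
    "Smart Irrigation System with Soil Moisture Sensors",
    "A smart irrigation system adds soil moisture sensors to control watering.",
)
print(user_prompt)
```

The resulting string can be passed as the `user` message content in the `openai.ChatCompletion.create` call above.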