In [1]:
import ollama

import numpy as np

from datasets import load_dataset

In [2]:
ds = load_dataset("ccdv/patent-classification", "abstract")

In [3]:
train_data = ds['train']

In [4]:
test_data = ds['test']

In [5]:
modelfile='''
FROM llama3.1
PARAMETER num_ctx 4096
PARAMETER temperature 0.2
PARAMETER num_predict 2
PARAMETER top_k 50
PARAMETER top_p 0.9
PARAMETER min_p 0.05
PARAMETER mirostat 2
'''

In [6]:
ollama.create(model='llama3.1', modelfile=modelfile)

{'status': 'success'}

In [7]:
system_message = """
You are an expert patent attorney tasked with classifying patent abstracts into a fixed set of categories.
Each abstract can belong to only ONE of the following categories:
0: "Human Necessities", 
1: "Performing Operations; Transporting",
2: "Chemistry; Metallurgy",
3: "Textiles; Paper",
4: "Fixed Constructions",
5: "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
6: "Physics",
7: "Electricity",
8: "General tagging of new or cross-sectional technology"

A detailed description of each category is presented below.

Human Necessities: This category pertains to inventions that are related to fulfilling basic human needs. It includes patents for medical devices, pharmaceutical compositions, personal care products, food and beverages, clothing, and other inventions that directly impact human health, well-being, and daily living.
Physics: This category encompasses inventions that are related to the study of matter and energy. It includes patents for devices or processes that involve principles of mechanics, optics, acoustics, thermodynamics, quantum mechanics, and other areas of physics. Examples of patents in this category may include inventions related to lasers, semiconductors, optics, nuclear technology, and quantum computing.
Electricity: This category includes inventions that are related to the generation, transmission, distribution, and utilization of electrical energy. It covers patents for electrical circuits, power systems, electrical machinery and apparatus, electric vehicles, renewable energy technologies, and other electrical inventions.
General tagging of new or cross-sectional technology: This category is a catch-all for inventions that do not fit into any specific category but are related to new or emerging technologies. It includes patents for innovative technologies that may not have a specific field or application yet, but have the potential to disrupt various industries. Examples may include patents related to artificial intelligence, blockchain, virtual reality, augmented reality, and other cutting-edge technologies.
Performing Operations; Transporting: This category covers inventions related to processes, methods, and devices used in performing various operations or transporting goods, people, or information. It includes patents for manufacturing processes, industrial machinery, transportation vehicles, logistics systems, communication systems, and other inventions related to performing operations and transporting goods or information.
Chemistry; Metallurgy: This category includes inventions related to chemical processes, compositions, and materials. It covers patents for chemical reactions, pharmaceutical compositions, chemical catalysts, polymers, materials science, metallurgy, and other inventions related to the field of chemistry.
Mechanical Engineering; Lighting; Heating; Weapons; Blasting: This category encompasses inventions related to mechanical engineering, lighting, heating systems, weapons, and blasting technologies. It includes patents for machines, mechanical devices, lighting and heating systems, firearms, explosives, and other inventions related to mechanical engineering and related fields.
Fixed Constructions: This category includes inventions related to the construction industry and built infrastructure. It covers patents for building materials, construction methods, architectural designs, civil engineering projects, and other inventions related to fixed constructions such as buildings, bridges, roads, dams, and other infrastructure projects.

User input will contain a patent abstract.
Classify this abstract to only ONE of the above mentioned categories.
While you assign a category think through carefully and look for a clear conceptual match between the abstract and the label.
Your answer should contain only the numeric label for the category as described above.
Do NOT output anything else except the label.
"""

In [8]:
# Despite the name, this list holds only the system message (no in-context examples yet)
few_shot_prompt = [{'role':'system', 'content': system_message}]
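
The prompt list above contains no worked examples, so the run is effectively zero-shot. A minimal sketch of how in-context examples could be added as user/assistant message pairs is shown here. The helper name `build_few_shot_messages` and the two demo abstracts with their labels are illustrative placeholders, not taken from the dataset; in practice the examples would presumably be drawn from `train_data`.

```python
def build_few_shot_messages(system_message, examples):
    """Return chat messages: the system prompt, then one user/assistant pair per example."""
    messages = [{'role': 'system', 'content': system_message}]
    for abstract, label in examples:
        messages.append({'role': 'user', 'content': abstract})
        # The assistant turn shows the exact output format expected: a bare numeric label
        messages.append({'role': 'assistant', 'content': str(label)})
    return messages

# Made-up placeholder examples (hypothetical, for illustration only)
demo_examples = [
    ("A battery charging circuit with overvoltage protection.", 7),
    ("A reinforced concrete beam for bridge decks.", 4),
]
demo_messages = build_few_shot_messages("Classify the abstract.", demo_examples)
print(len(demo_messages))
```

The resulting list could be used in place of `few_shot_prompt`, at the cost of a longer context for every request.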

In [9]:
predictions, ground_truths = [], []

In [10]:
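# Evaluate on a random sample of 500 test abstracts (unseeded, so the sample varies between runs)
for index, row in test_data.to_pandas().sample(500).iterrows():
    gold_abstract = row.iloc[0]  # first column: abstract text
    gold_label = row.iloc[1]     # second column: integer category label

    user_input = [{'role':'user', 'content': gold_abstract}]

    try:
        response = ollama.chat(
            model='llama3.1',
            messages=few_shot_prompt + user_input
        )

        # int() raises ValueError if the reply is not a bare integer
        prediction = int(response['message']['content'].strip())

        predictions.append(prediction)
        ground_truths.append(gold_label)
    except Exception as e:
        # Log the error and skip this sample; skipped samples are excluded from the accuracy below
        print(e)
        continue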
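
The loop above drops every sample whose reply cannot be passed directly to `int()`. One possible way to recover more replies is a more tolerant parser, sketched below. The function name `parse_label` and its `num_classes` parameter are hypothetical, and taking the first digit in the reply is an assumption about how the model tends to answer, not something the run above verifies.

```python
import re

def parse_label(reply, num_classes=9):
    """Return the first digit in the reply as a label, or None if no valid label is found."""
    match = re.search(r'\d', reply)
    if match is None:
        return None
    label = int(match.group())
    # Reject digits outside the label range 0..num_classes-1
    return label if label < num_classes else None

print(parse_label(" 5\n"))
print(parse_label("Label: 3"))
print(parse_label("n/a"))
```

Returning `None` instead of raising makes it easy to count unparseable replies separately from wrong predictions.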

In [11]:
(np.array(predictions) == np.array(ground_truths)).mean()

0.318
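
A single overall accuracy hides which categories the model gets wrong. A rough sketch of a per-class breakdown follows, assuming `predictions` and `ground_truths` are equal-length lists of integer labels as built above. The helper name `per_class_accuracy` and the toy lists are illustrative only and are not results from this run.

```python
import numpy as np

def per_class_accuracy(predictions, ground_truths, num_classes=9):
    """Map each gold class that occurs to the fraction of its samples predicted correctly."""
    preds = np.asarray(predictions)
    golds = np.asarray(ground_truths)
    result = {}
    for c in range(num_classes):
        mask = golds == c
        if mask.any():  # skip classes with no gold samples
            result[c] = float((preds[mask] == c).mean())
    return result

# Toy inputs (made up) to show the shape of the output
toy_preds = [0, 1, 1, 6, 7, 7]
toy_golds = [0, 1, 2, 6, 6, 7]
toy_result = per_class_accuracy(toy_preds, toy_golds)
print(toy_result)
```

Calling `per_class_accuracy(predictions, ground_truths)` after the evaluation loop would give the same breakdown for the real sample.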