Download the sample of research results that we received and, based on the reference data (correct and incorrect), decide which results we can trust. Send to headquarters, in the standard way, only the two-digit identifiers (found at the start of the line for each sample) of the correct results. Skip those you detect as falsified. We believe that in 2024 a technique called fine-tuning of language models can help you detect such anomalies. However, you may complete this task in any way that leads you to the solution. The task name for the report is 'research'. https://centrala.ag3nts.org/dane/lab_data.zip

Expected report format in the 'answer' field (two-digit values only!).

[
  'identifier-01',
  'identifier-02',
  'identifier-03',
  'identifier-0N',
]
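
The format above asks for bare two-digit values. A minimal sketch of a pre-submission check follows; `check_answer` and `TWO_DIGIT_ID` are hypothetical names that are not part of the original notebook, and the rule "exactly two ASCII digits" is an assumption read from the task text.

```python
import re

# Assumption: a valid identifier is exactly two ASCII digits, e.g. "01".
TWO_DIGIT_ID = re.compile(r"[0-9]{2}")


def check_answer(ids):
    """Return a copy of ids if every entry is a two-digit string, else raise."""
    bad = [i for i in ids if not isinstance(i, str) or not TWO_DIGIT_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"Not two-digit identifiers: {bad}")
    return list(ids)
```

Calling it on the list just before building the report payload would catch values such as `'identifier-01'` or integers before they are sent.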

In [1]:
import os
import requests
import json
from dotenv import load_dotenv

In [2]:
load_dotenv()

personal_api_key = os.getenv("PERSONAL_API_KEY")
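
The cells below read files from a local `lab_data/` directory, but the notebook does not show how it was obtained. One possible way to fetch and unpack the archive from the task text is sketched here. The helper names `download` and `extract_archive` are hypothetical, the standard library is used instead of `requests` to keep the sketch dependency-free, and it is an assumption that the archive holds the text files at its top level.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "https://centrala.ag3nts.org/dane/lab_data.zip"  # URL from the task text


def download(url):
    """Fetch the URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def extract_archive(zip_bytes, dest="lab_data"):
    """Unpack an in-memory zip into dest and return the sorted member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest_path)
        return sorted(archive.namelist())


# Usage (performs a network request, so it is left commented out):
# print(extract_archive(download(DATA_URL)))
```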

In [3]:
with open('lab_data/correct.txt', 'r') as file:
    correct_data = file.readlines()

with open('lab_data/incorrect.txt', 'r') as file:
    incorrect_data = file.readlines()

# Create list of training examples in JSONL format
training_data = []

# Add correct examples
for line in correct_data:
    values = line.strip().split(',')
    training_data.append({
        "messages": [
            {"role": "system", "content": "You are a data validator that checks if numeric data points are valid or invalid."},
            {"role": "user", "content": f"Are these data points valid? {values[0]},{values[1]},{values[2]},{values[3]}"},
            {"role": "assistant", "content": "These data points are valid."}
        ]
    })

# Add incorrect examples
for line in incorrect_data:
    values = line.strip().split(',')
    training_data.append({
        "messages": [
            {"role": "system", "content": "You are a data validator that checks if numeric data points are valid or invalid."},
            {"role": "user", "content": f"Are these data points valid? {values[0]},{values[1]},{values[2]},{values[3]}"},
            {"role": "assistant", "content": "These data points are invalid."}
        ]
    })

# Save to JSONL file
with open('training_data.jsonl', 'w') as f:
    for item in training_data:
        f.write(json.dumps(item) + '\n')

print(f"Saved {len(training_data)} training examples to training_data.jsonl")
print(json.dumps(training_data, indent=4))

Saved 478 training examples to training_data.jsonl
[
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a data validator that checks if numeric data points are valid or invalid."
            },
            {
                "role": "user",
                "content": "Are these data points valid? 75,-16,-42,84"
            },
            {
                "role": "assistant",
                "content": "These data points are valid."
            }
        ]
    },
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a data validator that checks if numeric data points are valid or invalid."
            },
            {
                "role": "user",
                "content": "Are these data points valid? -30,38,-22,64"
            },
            {
                "role": "assistant",
                "content": "These data points are valid."
            }
        ]
    

In [4]:
with open('./lab_data/verify.txt', 'r') as file:
    verify_data = file.readlines()
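
The notebook does not show how the fine-tuned model used for validation was created from `training_data.jsonl`. A minimal sketch of that step with the OpenAI Python client might look like the following. `start_fine_tune` is a hypothetical helper, and the base model name is an assumption inferred from the fine-tuned model identifier that appears later in the notebook.

```python
def start_fine_tune(client, jsonl_path="training_data.jsonl",
                    base_model="gpt-4o-mini-2024-07-18"):
    """Upload the training file and start a fine-tuning job; return the job id."""
    # Upload the JSONL file so the fine-tuning job can reference it by id
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    # Start the job on the chosen base model (assumed, see the note above)
    job = client.fine_tuning.jobs.create(training_file=uploaded.id, model=base_model)
    return job.id


# Usage (needs the openai package and a valid key, so it is left commented out):
# import os, openai
# client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# job_id = start_fine_tune(client)
# job = client.fine_tuning.jobs.retrieve(job_id)
# print(job.status, job.fine_tuned_model)  # model name is set once the job succeeds
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, without touching the network.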

In [5]:
import openai

openai_api_key = os.getenv("OPENAI_API_KEY")
client = openai.OpenAI(api_key=openai_api_key)


def validate_data(data):
    """Ask the fine-tuned model whether one comma-separated sample is valid."""
    # Use the same system and user prompts as in the training examples
    response = client.chat.completions.create(
        model="ft:gpt-4o-mini-2024-07-18:personal:ai-devs-s04e02:AYKpykD6",
        messages=[
            {"role": "system", "content": "You are a data validator that checks if numeric data points are valid or invalid."},
            {"role": "user", "content": f"Are these data points valid? {data}"},
        ]
    )
    # Return the text of the model's reply
    return response.choices[0].message.content

In [28]:
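data_to_validate = []  # collects the IDs of samples the model judged valid
for line in verify_data:
    # Each line looks like "<id>=<v1>,<v2>,<v3>,<v4>"; split on the first '=' only
    sample_id, data = line.split('=', 1)
    print(data)  # data keeps its trailing newline, hence the blank lines in the output
    validated_data = validate_data(data)
    print(validated_data)
    # Keep only samples whose reply matches the "valid" training sentence exactly
    if validated_data == 'These data points are valid.':
        data_to_validate.append(sample_id)
print(data_to_validate)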

12,100,3,39

These data points are valid.
-41,75,67,-25

These data points are valid.
78,38,65,2

These data points are invalid.
5,64,67,30

These data points are invalid.
33,-21,16,-72

These data points are invalid.
99,17,69,61

These data points are invalid.
17,-42,-65,-43

These data points are invalid.
57,-83,-54,-43

These data points are invalid.
67,-55,-6,-32

These data points are invalid.
-20,-23,-2,44

These data points are valid.
['01', '02', '10']
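
The loop above passes each value string to the model with its trailing newline and relies on an exact string match on the reply. A slightly more tolerant variant is sketched below, assuming the same `<id>=<values>` line layout; `parse_sample` and `is_valid_reply` are hypothetical helpers that are not part of the original notebook.

```python
def parse_sample(line):
    """Split '<id>=<values>' into (id, values) with surrounding whitespace removed."""
    sample_id, _, values = line.strip().partition("=")
    return sample_id, values


def is_valid_reply(reply):
    """Return True only when the reply says 'valid' and does not say 'invalid'."""
    text = reply.strip().lower()
    return "valid" in text and "invalid" not in text
```

Stripping the line first keeps the prompt identical in shape to the training prompts, which were built from stripped lines. The check for "invalid" is needed because that word contains "valid" as a substring.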


In [29]:
answer = data_to_validate
answer_json = {
    "task": "research",
    "apikey": personal_api_key,
    "answer": answer
}
answer_url = "https://centrala.ag3nts.org/report"
answer_response = requests.post(answer_url, json=answer_json)
print(answer_response.request.body.decode('unicode_escape'))  # Displays the decoded request body (note: it includes the API key)

# `answer_response` is the response object returned by the report request
print("Status Code:", answer_response.status_code)  # Displays the status code (e.g., 200)
print("Response Body:", answer_response.text.encode('utf-8').decode('unicode_escape'))  # Displays the response body with escape sequences decoded

{"task": "research", "apikey": "1400cbf0-b7dd-49ab-9342-6ad8fd26ba69", "answer": ["01", "02", "10"]}
Status Code: 200
Response Body: {
    "code": 0,
    "message": "{{FLG:ITSVALID}}"
}
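
The response body shown above can also be checked programmatically instead of being read by eye. The following is a sketch only: `extract_flag` is a hypothetical helper, and treating `code == 0` as success is an assumption based on this single response.

```python
import json
import re


def extract_flag(response_text):
    """Parse the report response and return the flag name, or None if absent."""
    body = json.loads(response_text)
    # Assumption: code 0 means the report was accepted
    if body.get("code") != 0:
        raise RuntimeError(f"Report rejected: {body}")
    match = re.search(r"\{\{FLG:([^}]+)\}\}", str(body.get("message", "")))
    return match.group(1) if match else None


# Usage with the response object from the cell above:
# print(extract_flag(answer_response.text))
```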
