## Retrieval Document

This notebook shows how to turn a user's question into a Cypher query that is ready to run against a Knowledge Graph, using a Large Language Model.

Steps:
1. Load the Knowledge Graph from the Neo4j demo server.
2. Generate Cypher queries with `GraphCypherQAChain` using a base LLM.
3. Fine-tune the model with Unsloth.
4. Run inference with the model.
5. Evaluate.

## Preparation

- Unsloth: https://unsloth.ai/
- Unsloth Github repo: https://github.com/unslothai/unsloth

- References
  - LoRA paper: https://arxiv.org/pdf/2106.09685

Note:
A prompting technique in `key_promt` alone, such as few-shot prompting, is not good enough: the model still hallucinates. So we need to fine-tune the model.

In [None]:
# prompt: mount my gdrive

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
!pip install langchain_community neo4j langchain langchain_groq

# Initial Set Up

In [None]:
import os
from langchain_community.graphs import Neo4jGraph
from langchain_groq import ChatGroq
from langchain.chains import GraphCypherQAChain
from google.colab import userdata

os.environ["groq_api_key"] = ""
os.environ["hf_api"] = ""

#https://demo.neo4jlabs.com:7473/browser/
DEMO_URL = "neo4j+s://demo.neo4jlabs.com"
database = "recommendations"

We use the demo Knowledge Graph provided by Neo4j.

In [None]:
graph = Neo4jGraph(url=DEMO_URL,database=database,username=database,password=database,sanitize=True)
print(graph.schema)

Node properties:
Movie {posterEmbedding: LIST, url: STRING, runtime: INTEGER, revenue: INTEGER, budget: INTEGER, plotEmbedding: LIST, imdbRating: FLOAT, released: STRING, countries: LIST, languages: LIST, plot: STRING, imdbVotes: INTEGER, imdbId: STRING, year: INTEGER, poster: STRING, movieId: STRING, tmdbId: STRING, title: STRING}
Genre {name: STRING}
User {userId: STRING, name: STRING}
Actor {url: STRING, died: DATE, born: DATE, imdbId: STRING, name: STRING, tmdbId: STRING, bornIn: STRING, bio: STRING, poster: STRING}
Director {url: STRING, bornIn: STRING, born: DATE, died: DATE, tmdbId: STRING, imdbId: STRING, name: STRING, poster: STRING, bio: STRING}
Person {url: STRING, died: DATE, bornIn: STRING, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING, bio: STRING}
Relationship properties:
RATED {rating: FLOAT, timestamp: INTEGER}
ACTED_IN {role: STRING}
DIRECTED {role: STRING}
The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:User)-[:RATED]->(:Movie)
(:Actor)

New update from LangChain (09/05/24): the `enhanced_schema` parameter samples values from the database and returns them to the LLM, so that it can generate more accurate Cypher statements.

https://python.langchain.com/v0.1/docs/integrations/graphs/neo4j_cypher/#enhanced-schema-information

In [None]:
graph = Neo4jGraph(url=DEMO_URL,database=database,username=database,password=database,sanitize=True,enhanced_schema=True)
print(graph.schema)



Node properties:
- **Movie**
  - `url`: STRING Example: "https://themoviedb.org/movie/862"
  - `runtime`: INTEGER Min: 2, Max: 910
  - `revenue`: INTEGER Min: 1, Max: 2787965087
  - `imdbRating`: FLOAT Min: 1.6, Max: 9.6
  - `released`: STRING Example: "1995-11-22"
  - `countries`: LIST Min Size: 1, Max Size: 16
  - `languages`: LIST Min Size: 1, Max Size: 19
  - `plot`: STRING Example: "A cowboy doll is profoundly threatened and jealous"
  - `imdbVotes`: INTEGER Min: 13, Max: 1626900
  - `imdbId`: STRING Example: "0114709"
  - `year`: INTEGER Min: 1902, Max: 2016
  - `poster`: STRING Example: "https://image.tmdb.org/t/p/w440_and_h660_face/uXDf"
  - `movieId`: STRING Example: "1"
  - `tmdbId`: STRING Example: "862"
  - `title`: STRING Example: "Toy Story"
  - `budget`: INTEGER Min: 1, Max: 380000000
- **Genre**
  - `name`: STRING Example: "Adventure"
- **User**
  - `userId`: STRING Example: "1"
  - `name`: STRING Example: "Omar Huffman"
- **Actor**
  - `url`: STRING Example: "https://t

# Testing

Using the base model `llama3-8b-8192`.

In [None]:
groq_api_key = ""

In [None]:
model = ChatGroq(temperature=0, model_name="llama3-8b-8192", groq_api_key = groq_api_key)
chain = GraphCypherQAChain.from_llm(graph=graph, llm=model, verbose=True, allow_dangerous_requests=True)

In [None]:
questions = ["Who is the oldest director?",
             "Find all directors who have directed a movie in Spanish language.",
             "Give me 5 movies where a director has also acted?",
             "List all movies with an IMDb rating greater than 5 that have been directed by a director born in China."
             ]

# POSSIBLE CORRECT CYPHER QUERY
# 1. MATCH (d:Director) WHERE d.born IS NOT NULL RETURN d ORDER BY d.born ASC LIMIT 1
# 2. MATCH (d:Director)-[:DIRECTED]->(m:Movie) WHERE 'Spanish' IN m.languages RETURN d.name
# 3. MATCH (d:Director)-[:ACTED_IN]->(m:Movie) WHERE exists{ (d)-[:DIRECTED]->(m) } RETURN m.title AS MovieTitle, m.movieId AS MovieID LIMIT 5
# 4. MATCH (m:Movie)<-[:DIRECTED]-(d:Director) WHERE m.imdbRating > 5 AND d.bornIn = 'China' RETURN m

for q in questions:
    print("\n", q)
    try:
        result = chain.invoke(q)['result']
        print(result)
    except Exception as e:
        # Report the failure instead of silently skipping the question
        print(f"Failed: {e}")


 Who is the oldest director?


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Director)-[:ACTED_IN|DIRECTED]->(m:Movie)
RETURN d.name AS director, d.born AS born ORDER BY d.born ASC LIMIT 1;
Full Context:
[{'director': 'Georges Méliès', 'born': neo4j.time.Date(1861, 12, 8)}]

> Finished chain.
Georges Méliès is the oldest director.

 Find all directors who have directed a movie in Spanish language.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Director)-[:DIRECTED]->(m:Movie)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(lang:Genre {name: "Spanish"}) RETURN d;
Full Context:
[]

> Finished chain.
I don't know the answer.

 Give me 5 movies where a director has also acted?


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Director)-[:ACTED_IN]->(m:Movie) RETURN m;
Full Context:
[{'m': {'lan

New update from LangChain (09/05/2024): use the `validate_cypher` parameter together with the enhanced schema to get the best results.

In [None]:
model = ChatGroq(temperature=0, model_name="llama3-8b-8192", groq_api_key = groq_api_key)
chain = GraphCypherQAChain.from_llm(graph=graph, llm=model, verbose=True, validate_cypher = True, allow_dangerous_requests=True)

In [None]:
questions = ["Who is the oldest director?",
             "Find all directors who have directed a movie in Spanish language.",
             "Give me 5 movies where a director has also acted?",
             "List all movies with an IMDb rating greater than 5 that have been directed by a director born in China."
             ]

for q in questions:
    print("\n", q)
    try:
        result = chain.invoke(q)['result']
        print(result)
    except Exception as e:
        # Report the failure instead of silently skipping the question
        print(f"Failed: {e}")


 Who is the oldest director?


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Director)-[:ACTED_IN|DIRECTED]->(m:Movie)
RETURN d.name AS director, d.born AS born ORDER BY d.born ASC LIMIT 1;
Full Context:
[{'director': 'Georges Méliès', 'born': neo4j.time.Date(1861, 12, 8)}]

> Finished chain.
Georges Méliès is the oldest director.

 Find all directors who have directed a movie in Spanish language.


> Entering new GraphCypherQAChain chain...
Generated Cypher:

Full Context:
[]

> Finished chain.
I don't know the answer.

 Give me 5 movies where a director has also acted?


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (d:Director)-[:ACTED_IN]->(m:Movie) RETURN m;
Full Context:
[{'m': {'languages': ['English'], 'year': 1919, 'imdbId': '0009932', 'runtime': 12, 'imdbRating': 6.1, 'movieId': '72626', 'countr

The Cypher generated by the base model appears to improve, because it can now produce a Full Context. When the model cannot generate correct Cypher, the query is simply left empty. Without `validate_cypher = True`, by contrast, the model still generates Cypher even when it is wrong.
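Since an empty Cypher string is the signal that no valid query was produced, downstream code can check for it before touching the graph. The following is only a minimal sketch: `run_if_nonempty` and `fake_query` are hypothetical names introduced for illustration (they are not part of LangChain), and in this notebook the `run_query` argument could be `graph.query`.

```python
from typing import Any, Callable, List, Optional


def run_if_nonempty(
    cypher: str,
    run_query: Callable[[str], List[Any]],
) -> Optional[List[Any]]:
    """Run the query only when the generated Cypher is not blank.

    Returns None when there is nothing to run, so callers can tell
    "no query was generated" apart from "the query returned no rows".
    """
    if not cypher or not cypher.strip():
        return None
    return run_query(cypher)


# Hypothetical stand-in for graph.query so the sketch runs without a database.
def fake_query(cypher: str) -> List[Any]:
    return [{"echo": cypher}]


print(run_if_nonempty("", fake_query))
print(run_if_nonempty("MATCH (d:Director) RETURN d.name LIMIT 1", fake_query))
```

Returning `None` rather than an empty list is a design choice of this sketch: it keeps "nothing to run" distinguishable from a valid query with no matches.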



# Fine Tuning using Unsloth

The code below is from the Unsloth repository: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing

In [None]:
%%capture
!pip install triton

In [None]:
%%capture
# (Optional) Makes the build much faster
!pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
!pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.9.post4: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29+4a9dd7e.d20241008. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Adding LoRA

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.9.post4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
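To see where the adapter size comes from, the number of LoRA parameters can be worked out by hand: each adapted linear layer with `in` inputs and `out` outputs gains `r * (in + out)` trainable weights. The sketch below does that arithmetic for `r = 16` and 32 layers. The layer widths (`KV_DIM`, `INTERMEDIATE`) are assumptions taken from the published Llama 3 8B configuration, not values printed in this notebook.

```python
# Layer shapes as (in_features, out_features). Assumed from the published
# Llama 3 8B configuration; only the 4096-wide q_proj is shown in this notebook.
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336   # inner width of the MLP
NUM_LAYERS = 32        # "patched 32 layers" in the Unsloth output
RANK = 16              # r passed to get_peft_model

target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
# → 1,310,720 per layer, 41,943,040 in total
```

Under these assumptions the total agrees with the "Number of trainable parameters" figure that the trainer prints when training starts.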


## Create a dataset

In [None]:
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = f"Convert text to cypher query based on this schema: {graph.schema}"
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for input, output in zip(inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(instructions, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

Load data from HuggingFace

In [None]:
from datasets import load_dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset


Dataset({
    features: ['output', 'input', 'instruction', 'text'],
    num_rows: 51760
})

Load our own data

We're going to use: https://github.com/tomasonjo/text2cypher/blob/main/datasets/synthetic_gpt4turbo_demodbs/text2cypher_gpt4turbo.csv

In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Finetune-llama3/text2cypher_gpt4turbo.csv')
df = df[(df['database'] == 'recommendations') & (df['syntax_error'] == False) & (df['timeout'] == False)]
df

Unnamed: 0,question,cypher,type,database,syntax_error,timeout,returns_results,false_schema
7275,What are the top 5 movies with a runtime great...,MATCH (m:Movie)\nWHERE m.runtime > 120\nRETURN...,Simple Retrieval Queries,recommendations,False,False,True,
7276,List the first 3 genres with movies having an ...,MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)\nWHERE ...,Verbose query,recommendations,False,False,True,
7277,List the first 5 directors who have a biograph...,MATCH (d:Director)\nWHERE d.bio IS NOT NULL\nR...,Simple Retrieval Queries,recommendations,False,False,True,
7278,Which 3 movies have the most detailed plot des...,"MATCH (m:Movie)\nRETURN m.title, m.plot\nORDER...",Simple Retrieval Queries,recommendations,False,False,True,
7279,Show the top 5 actors who have acted in movies...,MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRE...,Simple Retrieval Queries,recommendations,False,False,True,
...,...,...,...,...,...,...,...,...
8067,Which movies have been acted in by more than 1...,MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\nWITH m...,Complex Retrieval Queries,recommendations,False,False,True,
8068,Find all movies where the director has directe...,MATCH (d:Director)-[:DIRECTED]->(m:Movie)\nWIT...,Complex Retrieval Queries,recommendations,False,False,False,
8069,Find all movies that have a plot mentioning 'h...,MATCH (m:Movie)\nWHERE m.plot CONTAINS 'hero'\...,Complex Retrieval Queries,recommendations,False,False,True,
8070,Which movies have been rated the highest by us...,"MATCH (u:User)-[r:RATED]->(m:Movie)\nWITH u, c...",Complex Retrieval Queries,recommendations,False,False,True,


In [None]:
# Select and rename in one step; returning a new DataFrame avoids SettingWithCopyWarning
df = df[['question','cypher']].rename(columns={'question': 'input','cypher':'output'})
df = df.reset_index(drop=True)
df


Unnamed: 0,input,output
0,What are the top 5 movies with a runtime great...,MATCH (m:Movie)\nWHERE m.runtime > 120\nRETURN...
1,List the first 3 genres with movies having an ...,MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)\nWHERE ...
2,List the first 5 directors who have a biograph...,MATCH (d:Director)\nWHERE d.bio IS NOT NULL\nR...
3,Which 3 movies have the most detailed plot des...,"MATCH (m:Movie)\nRETURN m.title, m.plot\nORDER..."
4,Show the top 5 actors who have acted in movies...,MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRE...
...,...,...
757,Which movies have been acted in by more than 1...,MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\nWITH m...
758,Find all movies where the director has directe...,MATCH (d:Director)-[:DIRECTED]->(m:Movie)\nWIT...
759,Find all movies that have a plot mentioning 'h...,MATCH (m:Movie)\nWHERE m.plot CONTAINS 'hero'\...
760,Which movies have been rated the highest by us...,"MATCH (u:User)-[r:RATED]->(m:Movie)\nWITH u, c..."


In [None]:
from datasets import Dataset
dataset = Dataset.from_pandas(df)
dataset = dataset.map(formatting_prompts_func, batched = True)
dataset

Map:   0%|          | 0/762 [00:00<?, ? examples/s]

Dataset({
    features: ['input', 'output', 'text'],
    num_rows: 762
})

In [None]:
dataset[0]

{'input': 'What are the top 5 movies with a runtime greater than 120 minutes?',
 'output': 'MATCH (m:Movie)\nWHERE m.runtime > 120\nRETURN m\nORDER BY m.runtime DESC\nLIMIT 5',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nConvert text to cypher query based on this schema: Node properties:\n- **Movie**\n  - `url`: STRING Example: "https://themoviedb.org/movie/862"\n  - `runtime`: INTEGER Min: 2, Max: 910\n  - `revenue`: INTEGER Min: 1, Max: 2787965087\n  - `imdbRating`: FLOAT Min: 1.6, Max: 9.6\n  - `released`: STRING Example: "1995-11-22"\n  - `countries`: LIST Min Size: 1, Max Size: 16\n  - `languages`: LIST Min Size: 1, Max Size: 19\n  - `plot`: STRING Example: "A cowboy doll is profoundly threatened and jealous"\n  - `imdbVotes`: INTEGER Min: 13, Max: 1626900\n  - `imdbId`: STRING Example: "0114709"\n  - `year`: INTEGER Min: 1902, Max: 2016\

# Train the model

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # max_steps = 60,
        num_train_epochs=1,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/762 [00:00<?, ? examples/s]

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.613 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 762 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 95
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,1.15
2,1.1438
3,1.1304
4,1.0632
5,0.9756
6,0.8696
7,0.7522
8,0.6517
9,0.5544
10,0.4418
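The "Total batch size = 8" and "Total steps = 95" figures in the banner above follow from the training arguments. The sketch below reproduces them with plain arithmetic. It assumes a single GPU and that a trailing, incomplete gradient-accumulation group is not counted as a step; that rounding rule is an assumption about the trainer, chosen because it matches the reported number.

```python
import math

num_examples = 762
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 1

# Examples consumed per optimizer update on a single GPU.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Micro-batches per epoch; the last one may be smaller than the rest.
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)

# Assumption: an incomplete accumulation group at the end is not a step.
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_epochs

print(effective_batch_size, batches_per_epoch, total_steps)
# → 8 381 95
```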


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

2269.0072 seconds used for training.
37.82 minutes used for training.
Peak reserved memory = 8.301 GB.
Peak reserved memory for training = 2.688 GB.
Peak reserved memory % of max memory = 56.286 %.
Peak reserved memory for training % of max memory = 18.226 %.


# Inference

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt.format(
        f"Convert text to cypher query based on this schema: {graph.schema}", # instruction
        "Who is the oldest director?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Convert text to cypher query based on this schema: Node properties:
- **Movie**
  - `url`: STRING Example: "https://themoviedb.org/movie/862"
  - `runtime`: INTEGER Min: 2, Max: 910
  - `revenue`: INTEGER Min: 1, Max: 2787965087
  - `imdbRating`: FLOAT Min: 1.6, Max: 9.6
  - `released`: STRING Example: "1995-11-22"
  - `countries`: LIST Min Size: 1, Max Size: 16
  - `languages`: LIST Min Size: 1, Max Size: 19
  - `plot`: STRING Example: "A cowboy doll is profoundly threatened and jealous"
  - `imdbVotes`: INTEGER Min: 13, Max: 1626900
  - `imdbId`: STRING Example: "0114709"
  - `year`: INTEGER Min: 1902, Max: 2016
  - `poster`: STRING Example: "https://image.tmdb.org/t/p/w440_and_h660_face/uXDf"
  - `movieId`: STRING Example: "1"
  - `tmdbId`: STRING Example: "862"
  - `title`: STRING Example: "T

# Save the Fine-tuned Model

Local Saving

In [None]:
# model.save_pretrained("lora_model") # Local saving
# tokenizer.save_pretrained("lora_model")

Online Saving to HuggingFace

In [None]:
# The token must have write access.
# Read it from the environment (see Initial Set Up) instead of hard-coding it.
hf_api = os.environ["hf_api"]

model.push_to_hub("ridopandi/llama3_text2cypher_recommendations", token = hf_api)
tokenizer.push_to_hub("ridopandi/llama3_text2cypher_recommendations", token = hf_api)


Saved model to https://huggingface.co/ridopandi/llama3_text2cypher_recommendations


No files have been modified since last commit. Skipping to prevent empty commit.


# Load Finetuned Model from HuggingFace

In [None]:
from unsloth import FastLanguageModel

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ridopandi/llama3_text2cypher_recommendations", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

==((====))==  Unsloth 2024.9.post4: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29+4a9dd7e.d20241008. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear4bit(
      

In [None]:
inputs = tokenizer(
[
    prompt.format(
        f"Convert text to cypher query based on this schema: {graph.schema}", # instruction
        "Who is the oldest director?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
result = tokenizer.batch_decode(outputs)
response = result[0].split("### Response:")[1].split("###")[0].strip().replace("<|end_of_text|>", "")
print(response)

MATCH (d:Director)
WHERE d.born IS NOT NULL
RETURN d.name, d.born, d.died
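The one-line parsing used above (split on `### Response:`, cut at the next `###`, drop the end-of-text token) can be wrapped in a small helper that fails loudly when the marker is missing. This is only a sketch: `extract_response` is a hypothetical name, and the sample string is made up to mimic the prompt layout used in this notebook.

```python
def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Return the text after '### Response:' up to the next '###' marker."""
    marker = "### Response:"
    if marker not in decoded:
        raise ValueError("no '### Response:' marker in decoded text")
    after = decoded.split(marker, 1)[1]
    answer = after.split("###", 1)[0]
    # Remove the end-of-text token first, then trim surrounding whitespace.
    return answer.replace(eos_token, "").strip()


# Made-up decoded text in the same layout as the training prompt.
sample = (
    "### Instruction:\nConvert text to cypher query\n\n"
    "### Input:\nWho is the oldest director?\n\n"
    "### Response:\nMATCH (d:Director)\nRETURN d.name<|end_of_text|>"
)
print(extract_response(sample))
```

Stripping after removing the token, rather than before, also removes any whitespace that sat between the query and the token.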


# Evaluating

Unsloth is not integrated with LangChain yet, so a small adjustment is needed.

In [None]:
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from google.colab import userdata

# Read the key from Colab secrets instead of hard-coding it
groq_api_key = userdata.get('GROQ_API')

CYPHER_QA_TEMPLATE = """You convert context to a final answer. Understand the question, the context, then generate result.
Here is an example:

Question: Who is the director of Harry Potter 1 and 8?
Context: [{{d.name: Chris Columbus, d.born: 10 September 1958}},{{d.name: David Yates, d.born: 8 October 1963}}]
Helpful Answer: Chris Columbus and David Yates are the directors of Harry Potter 1 and 8

Follow this example when generating answers.
Answer in short, don't hallucinate!
Question: {question}
Information: {context}
Helpful Answer:
"""

qa_prompt = ChatPromptTemplate.from_template(CYPHER_QA_TEMPLATE)
output_parser = StrOutputParser()
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192", groq_api_key = groq_api_key)
chain = qa_prompt | llm | output_parser

context = graph.query(response)
question = 'Who is the oldest director?'

chain.invoke({"context":context , "question":question})

RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01j538pvwqfv1td5n1ey6wdvz4` on tokens per minute (TPM): Limit 6000, Used 0, Requested 65214. Please try again in 9m52.14s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
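The error reports a request of 65,214 tokens against a limit of 6,000 tokens per minute, which suggests that the whole query result was placed into the prompt. One way to keep the request small is to cap the number of rows before calling the chain. The sketch below is a minimal illustration: `cap_context` is a hypothetical helper, and the rows are made-up placeholders for the output of `graph.query(response)`.

```python
from typing import Any, Dict, List


def cap_context(rows: List[Dict[str, Any]], max_rows: int = 10) -> List[Dict[str, Any]]:
    """Keep only the first max_rows records before they go into a prompt."""
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    return rows[:max_rows]


# Made-up rows standing in for a large query result.
rows = [{"d.name": f"Director {i}"} for i in range(4000)]
small = cap_context(rows, max_rows=10)
print(len(rows), "->", len(small))
```

A cap like this only bounds the prompt size; it does not make the answer correct. For a question such as "Who is the oldest director?", the generated Cypher itself would still need an `ORDER BY` and a `LIMIT` for the first rows to be the relevant ones.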

In [None]:
questions = ["Who is the oldest director?",
             "Find all directors who have directed a movie in Spanish language.",
             "Give me 5 movies where a director has also acted?",
             "List all movies with an IMDb rating greater than 5 that have been directed by a director born in China."
             ]

def generate_cypher_query(question):
  inputs = tokenizer(
  [
      prompt.format(
          f"Convert text to cypher query based on this schema: {graph.schema}", # instruction
          question, # input
          "", # output - leave this blank for generation!
      )
  ], return_tensors = "pt").to("cuda")

  outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
  result = tokenizer.batch_decode(outputs)
  cypher_query = result[0].split("### Response:")[1].split("###")[0].strip().replace("<|end_of_text|>", "")
  return cypher_query

for q in questions:
    print("\n",q)
    cypher_query = generate_cypher_query(q)
    print(cypher_query)
    context = graph.query(cypher_query)
    print('context: ', context)
    result = chain.invoke({"context":context , "question":q})
    print(result)
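The loop above prints the answers for manual inspection. For a numeric score, one possible approach is execution accuracy: run the generated query and a reference query, then check whether they return the same values. The sketch below is an assumption about how this could be done, not the method used in this notebook. All names are hypothetical, and `fake_query` stands in for `graph.query` so the example runs without a database.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Rows = List[Dict[str, Any]]


def same_result_values(generated: Rows, reference: Rows) -> bool:
    """Compare result sets by value, ignoring row order and column aliases."""
    def normalise(rows: Rows) -> List[str]:
        return sorted(repr(sorted(map(repr, row.values()))) for row in rows)
    return normalise(generated) == normalise(reference)


def execution_accuracy(
    pairs: Sequence[Tuple[str, str]],
    run_query: Callable[[str], Rows],
) -> float:
    """Fraction of (generated, reference) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = 0
    for generated, reference in pairs:
        try:
            if same_result_values(run_query(generated), run_query(reference)):
                hits += 1
        except Exception:
            # A query that fails to run counts as a miss.
            pass
    return hits / len(pairs)


# Made-up query results keyed by placeholder query strings.
fake_db: Dict[str, Rows] = {
    "GENERATED_1": [{"name": "Director A"}],
    "REFERENCE_1": [{"d.name": "Director A"}],
    "GENERATED_2": [],
    "REFERENCE_2": [{"d.name": "Director B"}],
}


def fake_query(cypher: str) -> Rows:
    return fake_db[cypher]


score = execution_accuracy(
    [("GENERATED_1", "REFERENCE_1"), ("GENERATED_2", "REFERENCE_2")],
    fake_query,
)
print(score)
# → 0.5
```

Comparing values rather than whole rows tolerates different column aliases (`name` versus `d.name`), at the cost of treating two queries as equal whenever they happen to return the same values.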