In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

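device = "cuda"  # assumes a CUDA-capable GPU is available

# Load the instruction-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")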

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

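# Render the conversation with the model's chat template and tokenize it
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Move the inputs and the model to the GPU
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens; the decoded text contains the prompt followed by the reply
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])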


Downloading shards: 100%|██████████| 3/3 [00:45<00:00, 15.20s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [01:52<00:00, 37.44s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] What is your favourite condiment? [/INST]Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> [INST] Do you have mayonnaise recipes? [/INST]Yes, I sure do! Here's a classic recipe for homemade mayonnaise that you can easily whip up in a blender or a food processor.

Ingredients:
- 1 egg yolk
- 1 teaspoon Dijon mustard
- 1 cup vegetable oil (safflower or canola oil works best)
- 1-2 tablespoons white wine vinegar or fresh lemon juice
- 1/2 teaspoon salt
- 1/4 teaspoon freshly ground black pepper

Instructions:
1. In a blender or food processor, combine the egg yolk and mustard. Blend for a few seconds to mix.
2. With the blender or food processor running on low speed, slowly drizzle in the oil, drop by drop, in a steady stream. Once all the oil has been incorporated, you'll have a thick emulsion.
3. Add the vinegar or lemon juice, salt, and pepper, and blend until well com
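
The run above logs an attention-mask warning, and the printed text repeats the whole prompt before the reply. The following is a minimal sketch of the two ideas that address this, assuming a single prompt with no padding (so every position is a real token). The helper names `full_attention_mask` and `completion_only` are hypothetical, and the token ids are toy values rather than real tokenizer output.

```python
def full_attention_mask(input_ids):
    """Return a mask of 1s shaped like a batch of unpadded prompts."""
    return [[1] * len(row) for row in input_ids]


def completion_only(generated_ids, prompt_length):
    """Drop the first `prompt_length` tokens of every generated sequence."""
    return [row[prompt_length:] for row in generated_ids]


# Toy ids standing in for tokenizer output (illustrative values only).
prompt_ids = [[1, 10, 11, 12]]
generated = [[1, 10, 11, 12, 20, 21, 2]]  # prompt followed by three new tokens

mask = full_attention_mask(prompt_ids)
new_tokens = completion_only(generated, prompt_length=len(prompt_ids[0]))

print(mask)
print(new_tokens)
```

With the tensors from the cell above, the equivalent would be passing `attention_mask=torch.ones_like(model_inputs)` and `pad_token_id=tokenizer.eos_token_id` to `model.generate`, then decoding `generated_ids[:, model_inputs.shape[1]:]`. An all-ones mask is only appropriate here because the batch holds one unpadded sequence; padded batches need the mask produced by the tokenizer.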

In [2]:
from datasets import load_dataset

train_dataset = load_dataset("json", data_files="train_CRM_data.json", split="train")
train_dataset[0]

Generating train split: 393 examples [00:00, 4415.60 examples/s]


{'messages': [{'content': '\nYou are a Python function generator. Users will ask you questions in English, \nand you will produce a Python function as answer to the question based on the provided CONTEXT.\n\nCONTEXT:\nPandas DataFrame `df` containing transaction data with columns `order_id`, `user_id`, `item_id`, `timestamp`, `score`.\n`order_id` takes string datatype and identifies the order.\n`user_id` takes string datatype and identifies the customer.\n`item_id` takes string datatype that identifies the product.\n`timestamp` takes timestamp datatype and represents the datetimestamp of transaction.\n`score` takes float datatype and represents the score of the transaction.\nNote that a customer can make multiple transactions for a product but the customer product pair will be just one entry for each order.\n',
   'role': 'system'},
  {'content': 'Is there a particular day of the week with higher transaction volume?',
   'role': 'user'},
  {'content': "def highest_transaction_day(df):\

In [7]:
train_dataset[0]['messages'][:2]

[{'content': '\nYou are a Python function generator. Users will ask you questions in English, \nand you will produce a Python function as answer to the question based on the provided CONTEXT.\n\nCONTEXT:\nPandas DataFrame `df` containing transaction data with columns `order_id`, `user_id`, `item_id`, `timestamp`, `score`.\n`order_id` takes string datatype and identifies the order.\n`user_id` takes string datatype and identifies the customer.\n`item_id` takes string datatype that identifies the product.\n`timestamp` takes timestamp datatype and represents the datetimestamp of transaction.\n`score` takes float datatype and represents the score of the transaction.\nNote that a customer can make multiple transactions for a product but the customer product pair will be just one entry for each order.\n',
  'role': 'system'},
 {'content': 'Is there a particular day of the week with higher transaction volume?',
  'role': 'user'}]
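
Record 0 follows a system / user / assistant layout. Before fine-tuning, it may be worth confirming that the other records do too. The sketch below counts the role layouts across a collection of records; it assumes every record has a `messages` list shaped like the one shown above, and it runs on toy records so it is self-contained. `role_sequence` and `count_role_layouts` are hypothetical helper names.

```python
from collections import Counter


def role_sequence(record):
    """Return the ordered roles of one record, e.g. ('system', 'user', 'assistant')."""
    return tuple(message["role"] for message in record["messages"])


def count_role_layouts(records):
    """Count how many records share each role layout."""
    return Counter(role_sequence(record) for record in records)


# Toy records with placeholder content, mirroring the structure of train_dataset[0].
toy_records = [
    {"messages": [
        {"role": "system", "content": "instructions"},
        {"role": "user", "content": "question 1"},
        {"role": "assistant", "content": "answer 1"},
    ]},
    {"messages": [
        {"role": "system", "content": "instructions"},
        {"role": "user", "content": "question 2"},
        {"role": "assistant", "content": "answer 2"},
    ]},
]

layouts = count_role_layouts(toy_records)
print(layouts)
```

Calling `count_role_layouts(train_dataset)` on the real data should work the same way, since iterating a `datasets.Dataset` yields one dictionary per row. More than one key in the result would indicate records that need separate handling.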

In [11]:
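# The dataset's system message is split into two turns here: the instructions
# become a user turn and the CONTEXT becomes an assistant turn, followed by the question.
original_data = [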
    {"role": "user", "content": "You are a Python function generator. Users will ask you questions in English, and you will produce a Python function as answer to the question based on the provided CONTEXT."},
    {"role": "assistant", "content": "CONTEXT: Pandas DataFrame `df` containing transaction data with columns `order_id`, `user_id`, `item_id`, `timestamp`, `score`. `order_id` takes string datatype and identifies the order. `user_id` takes string datatype and identifies the customer. `item_id` takes string datatype that identifies the product. `timestamp` takes timestamp datatype and represents the datetimestamp of transaction. `score` takes float datatype and represents the score of the transaction. Note that a customer can make multiple transactions for a product but the customer product pair will be just one entry for each order."},
    {"role": "user", "content": "Is there a particular day of the week with higher transaction volume?"}
]
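
The dataset stores the instructions and CONTEXT in a `system` message, while the cell above re-expresses them as a user turn and an assistant turn. An alternative sometimes used with chat templates that lack a system role is to prepend the system text to the first user message, so the model sees instructions, context, and question in a single turn. Whether this model's template accepts a `system` role is not shown in this notebook, so treat that as an assumption to verify. `fold_system_into_user` is a hypothetical helper, and the example messages are shortened.

```python
def fold_system_into_user(messages):
    """Merge a leading system message into the first user message.

    Messages without a leading system message are returned as copies, unchanged.
    """
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]

    system, rest = messages[0], messages[1:]
    if not rest or rest[0]["role"] != "user":
        raise ValueError("expected a user message right after the system message")

    merged = {
        "role": "user",
        "content": system["content"].strip() + "\n\n" + rest[0]["content"],
    }
    return [merged] + [dict(m) for m in rest[1:]]


# Shortened stand-in for train_dataset[0]['messages'][:2].
example = [
    {"role": "system", "content": "\nYou are a Python function generator.\n"},
    {"role": "user", "content": "Is there a particular day of the week with higher transaction volume?"},
]

folded = fold_system_into_user(example)
print(folded[0]["content"])
```

The result could then be passed to `tokenizer.apply_chat_template` in place of `original_data`. This keeps the CONTEXT on the user side of the conversation instead of presenting it as something the assistant said.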

In [12]:
# Same pipeline as the first cell, applied to the CRM prompt
encodeds_0 = tokenizer.apply_chat_template(original_data, return_tensors="pt")

model_inputs_0 = encodeds_0.to(device)

generated_ids_0 = model.generate(model_inputs_0, max_new_tokens=1000, do_sample=True)
decoded_0 = tokenizer.batch_decode(generated_ids_0)
print(decoded_0[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] You are a Python function generator. Users will ask you questions in English, and you will produce a Python function as answer to the question based on the provided CONTEXT. [/INST]CONTEXT: Pandas DataFrame `df` containing transaction data with columns `order_id`, `user_id`, `item_id`, `timestamp`, `score`. `order_id` takes string datatype and identifies the order. `user_id` takes string datatype and identifies the customer. `item_id` takes string datatype that identifies the product. `timestamp` takes timestamp datatype and represents the datetimestamp of transaction. `score` takes float datatype and represents the score of the transaction. Note that a customer can make multiple transactions for a product but the customer product pair will be just one entry for each order.</s> [INST] Is there a particular day of the week with higher transaction volume? [/INST]To find out if there is a particular day of the week with higher transaction volume, you can use the `dt.weekday` ac

In [None]:
# Mistral-7B-Instruct-v0.2 produces an elaborate answer, but it opens with explanatory
# prose rather than the bare Python function seen in the training data, so the response
# is not suitable to feed into the ranking system for further data retrieval.
# Hence fine-tuning is required.
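
One way to make "not suitable" measurable is to check whether a reply parses as Python and consists only of function definitions. The sketch below does this with the standard `ast` module. It assumes the downstream system needs a bare function, which is an inference from the system prompt and the training target, not something the notebook states. `is_bare_function` is a hypothetical helper, and the body of `function_reply` is a placeholder because the dataset's real function body is truncated in the output above.

```python
import ast


def is_bare_function(text):
    """Return True if `text` parses as Python and contains only function definitions."""
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    return len(tree.body) > 0 and all(
        isinstance(node, ast.FunctionDef) for node in tree.body
    )


# Start of the base model's reply, copied from the output above (truncated there).
prose_reply = (
    "To find out if there is a particular day of the week with higher "
    "transaction volume, you can use the `dt.weekday` ac"
)

# Signature from the training record; the body is a placeholder, not the dataset's code.
function_reply = "def highest_transaction_day(df):\n    pass\n"

print(is_bare_function(prose_reply))
print(is_bare_function(function_reply))
```

Running the same check on replies before and after fine-tuning would give a simple pass rate to compare, although it says nothing about whether the generated function is correct.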