<a href="https://colab.research.google.com/github/smv-manovihar/recipe-realm/blob/main/Llama_Test_(SMV).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install the Modules

In [None]:
!pip install pymongo ctransformers sentence_transformers

Collecting pymongo
  Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting ctransformers
  Downloading ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
Collecting sentence_transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl (227 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence_transformers)
  

# Run this to initialize the embedding model

In [None]:
# Define the query
import pymongo
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Generate embedding for a single sentence
def generate_embedding(sentence):
    # Load model from HuggingFace Hub
    tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
    model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

    # Tokenize sentence
    encoded_input = tokenizer(sentence, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embedding = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embedding
    sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)

    # Convert the tensor to a 1D list of floats
    return sentence_embedding.squeeze().tolist()


# Connect to MongoDB
client = pymongo.MongoClient("mongodb+srv://RecipeRisers:reciperisers%40top@recipes.jl22qv1.mongodb.net/")
db = client["recipe_realm"]
collection = db["recipes"]

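query ="butter, cheese, eggs, fork, maida, rice, , onion, parsley leaves, salt, sunflower oil, sweet corn, virgin olive oil, wheat bread crumbs"
# Split on commas, trim whitespace, drop empty entries (note the stray ", , " above), and sort
query_list = sorted(item.strip() for item in query.split(",") if item.strip())
# str.join returns a new string, so the result must be assigned back to query
query = ", ".join(query_list)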
try:
    query_embedding = generate_embedding(query)
    # print("Query embedding generated:", query_embedding)
    if query_embedding is not None:
        # Perform vector search
        results = collection.aggregate([
            {
                "$vectorSearch": {
                    "queryVector": query_embedding,
                    "path": "ingredient_embedding",
                    "numCandidates": 5796,
                    "limit": 5000,
                    "index": "recipefinder"
                }
            },
            {
                "$addFields": {
                    "ingredientsArray": { "$split": ["$IngredientList", ", "] }
                }
            },
            {
                "$addFields": {
                    "matchingIngredientsCount": {
                        "$size": {
                            "$filter": {
                                "input": "$ingredientsArray",
                                "as": "ingredient",
                                "cond": { "$in": ["$$ingredient", query_list] }
                            }
                        }
                    },
                    "extraIngredientsCount": {
                        "$size": {
                            "$filter": {
                                "input": "$ingredientsArray",
                                "as": "ingredient",
                                "cond": { "$not": { "$in": ["$$ingredient", query_list] } }
                            }
                        }
                    }
                }
            },
            {
                "$sort": {
                    "matchingIngredientsCount": -1,
                    "extraIngredientsCount": 1
                }
            },
            {
                "$limit": 3
            }
        ])
        recipes=[]
        context=[]
        found_results = False
        for document in results:
            recipes.append(document)

        text=""
        i=1
        for document in recipes:
            found_results = True
            text+=f"{i}. RecipeName: {document['RecipeName']}\nIngredients: {document['Ingredients']}\nDiet: {document['Diet']}\nCourse: {document['Course']}\nServings: {document['Servings']}\nPreparation Time (Min): {document['PrepTimeInMins']}\nCooking Time (Min): {document['CookTimeInMins']}\nTotal Time (Min): {document['TotalTimeInMins']}\n\n"
            i+=1
        context.append(text)
        i=1
        for document in recipes:
            text=""
            text=f"{i}. RecipeName: {document['RecipeName']}\nInstructions: {document['Instructions']}\n\n"
            i+=1
            context.append(text)
            # print('Matching Ingredients:', document['matchingIngredientsCount'], '---> Extra Ingredients Needed:', document['extraIngredientsCount'])
        if not found_results:
            print("No documents matched the query.")
except Exception as e:
    print(f"Failed to generate query embedding: {e}")
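
The query preparation and the `$addFields` / `$sort` stages above can be mirrored in plain Python, which makes the ranking easy to check without a database. This is a minimal sketch, not the notebook's own code: `normalize_ingredients`, `rank_recipes`, and the sample documents are hypothetical, and it assumes each recipe stores its ingredients as a comma-separated `IngredientList` string, as the pipeline's `$split` suggests.

```python
def normalize_ingredients(raw):
    """Split a comma-separated string, trim whitespace, drop empties, and sort."""
    return sorted(item.strip() for item in raw.split(",") if item.strip())


def rank_recipes(recipes, query_list, top_k=3):
    """Order recipes by most matching ingredients, then by fewest extra ones."""
    wanted = set(query_list)

    def sort_key(recipe):
        have = [i.strip() for i in recipe["IngredientList"].split(",") if i.strip()]
        matching = sum(1 for i in have if i in wanted)
        extra = len(have) - matching
        # Negative matching count sorts descending; extra count sorts ascending
        return (-matching, extra)

    return sorted(recipes, key=sort_key)[:top_k]


query_list = normalize_ingredients("butter, rice, , onion,salt")
query = ", ".join(query_list)

sample_recipes = [  # made-up documents, for illustration only
    {"RecipeName": "A", "IngredientList": "butter, rice, saffron, cream"},
    {"RecipeName": "B", "IngredientList": "butter, onion, rice"},
    {"RecipeName": "C", "IngredientList": "onion, salt, rice, butter, egg"},
]
top = rank_recipes(sample_recipes, query_list, top_k=2)
print(query)
print([r["RecipeName"] for r in top])
```

Recipe C uses all four query ingredients and B uses three with nothing extra, so both rank ahead of A, which needs two ingredients that were not supplied.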

In [None]:
import pymongo
from sentence_transformers import SentenceTransformer
model=SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
def get_recipes(query):
  # Connect to MongoDB
  client = pymongo.MongoClient("mongodb+srv://RecipeRisers:reciperisers%40top@recipes.jl22qv1.mongodb.net/")
  db = client["recipe_realm"]
  collection = db["recipes"]

  try:
      query_embedding = model.encode(query).tolist()
      # print("Query embedding generated:", query_embedding)
      if query_embedding is not None:
          # Perform vector search
          results = collection.aggregate([
              {
                  "$vectorSearch": {
                      "queryVector": query_embedding,
                      "path": "ingredient_embedding",
                      "numCandidates": 6865,
                      "limit": 5,
                      "index": "recipefinder"
                  }
              }
          ])
          context=""
          found_results = False
          for document in results:
              found_results = True
              # print(f"Found document ID: {document['_id']}")
              context+=f"Recipe Name:{document['RecipeName']}\nIngredients: {document['Ingredients']}\nInstructions: {document['Instructions']}\nDiet: {document['Diet']}\nServings: {document['Servings']}\nTotal Time: {document['TotalTimeInMins']}\n"
          if not found_results:
              print("No documents matched the query.")
              return "No documents found"
          return context
  except Exception as e:
      print(f"Failed to generate query embedding: {e}")
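
The connection string in `get_recipes` is written directly into the cell. One possible alternative, sketched here under the assumption that the URI is exported in an environment variable, is to read it at call time. The variable name `MONGODB_URI` and the helper `get_mongo_uri` are hypothetical and not part of the notebook.

```python
import os


def get_mongo_uri(env_var="MONGODB_URI"):
    """Return the MongoDB connection string stored in the given environment variable."""
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Set the {env_var} environment variable to your connection string.")
    return uri

# Intended use inside get_recipes (requires pymongo and a reachable cluster):
#   client = pymongo.MongoClient(get_mongo_uri())
```

Creating the client once at module level and reusing it inside `get_recipes` would also avoid opening a new connection on every query.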



In [None]:
# Define the query
import pymongo
import requests
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Generate embedding for a single sentence
def generate_embedding(sentence):
    # Load model from HuggingFace Hub
    tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
    model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

    # Tokenize sentence
    encoded_input = tokenizer(sentence, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embedding = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embedding
    sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)

    # Convert the tensor to a 1D list of floats
    return sentence_embedding.squeeze().tolist()

def get_recipes(query):
  # Connect to MongoDB
  client = pymongo.MongoClient("mongodb+srv://RecipeRisers:reciperisers%40top@recipes.jl22qv1.mongodb.net/")
  db = client["recipe_realm"]
  collection = db["recipes"]

  try:
      query_embedding = generate_embedding(query)
      # print("Query embedding generated:", query_embedding)
      if query_embedding is not None:
          # Perform vector search
          results = collection.aggregate([
              {
                  "$vectorSearch": {
                      "queryVector": query_embedding,
                      "path": "ingredient_embedding",
                      "numCandidates": 6865,
                      "limit": 10,
                      "index": "recipefinder"
                  }
              }
          ])
          context=[]
          found_results = False
          for document in results:
              found_results = True
              # print(f"Found document ID: {document['_id']}")
              # Make sure the instructions end with exactly one newline before the "Diet" line
              instructions = document['Instructions']
              if not instructions.endswith("\n"):
                instructions += "\n"
              context.append(f"Recipe Name:{document['RecipeName']}\nIngredients: {document['Ingredients']}\nInstructions: {instructions}Diet: {document['Diet']}\nServings: {document['Servings']}\nTotal Time: {document['TotalTimeInMins']}\n")
          if not found_results:
              print("No documents matched the query.")
              return "No documents found"
          return context
  except Exception as e:
      print(f"Failed to generate query embedding: {e}")

NameError: name '_C' is not defined
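
To see what `mean_pooling` followed by `F.normalize` computes, the same arithmetic can be written with plain Python lists. This is an illustrative sketch only: `masked_mean_pool` and `l2_normalize` are hypothetical names, the vectors are toy values, and the real cell operates on batched torch tensors rather than lists.

```python
import math


def masked_mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    # Same guard as torch.clamp(..., min=1e-9): avoid dividing by zero
    count = max(count, 1e-9)
    return [t / count for t in total]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # third row stands in for padding
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
unit = l2_normalize(pooled)
print(pooled)
```

The padding row is excluded from the average, so `pooled` is the mean of the first two rows only.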

# Run this to load the LLM and start chatting

In [None]:
from ctransformers import AutoModelForCausalLM

# Initialize the model
llm = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7B-Chat-GGUF', model_file="llama-2-7b-chat.Q5_K_M.gguf",gpu_layers=50, max_new_tokens=4096, context_length=4096)

# Define the system prompt
system_prompt = """<<SYS>> You are a cooking assistant. Answer questions with clear, concise, step-by-step instructions. Provide ingredient substitutions and variations when asked. Prioritize safety and proper food handling. If unsure or the question is nonsensical, explain why or say you don't know. Only answer cooking-related questions. Use the provided context to inform your answers. <</SYS>>"""

# Get user input for ingredients
userlist = input("Enter your ingredients: ")
# Query the database once and join the first three recipe chunks
context = "".join(get_recipes(userlist)[:3])

print(f"Context:\n{context}\n\n")

# Main loop to get user prompts and generate answers
while True:
    user_input = input("Enter Your Prompt: ")
    if user_input == "exit":
        break
    if user_input == "change":
      userlist = input("Enter your ingredients: ")
      # Keep context a string, consistent with how it is built before the loop
      context = "".join(get_recipes(userlist)[:3])
      print(f"Context:\n{context}\n\n")
      continue
    prompt_template = f"[INST]{system_prompt}\n\nContext: {context}\n\nQuestion: {user_input}\n\nAnswer: [/INST]"

    # Print the generated output
    # A repetition penalty below 1.0 rewards repeated tokens; values above 1.0 discourage them
    answer=llm(prompt_template, stream=False,temperature=0.5, repetition_penalty=1.1)
    print(answer)
    # for word in llm(prompt_template, stream=True):
    #     print(word, end='')
    # print()
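
The prompt string in the loop follows the Llama 2 chat layout, in which the system text sits between `<<SYS>>` and `<</SYS>>` inside the first `[INST]` block. A small helper can keep that layout in one place. This is a sketch under assumptions: `build_llama2_prompt` is a hypothetical name, and it expects the system text without the `<<SYS>>` tags, unlike the `system_prompt` variable above, which already contains them.

```python
def build_llama2_prompt(system_text, context, question):
    """Assemble a single-turn Llama 2 chat prompt from its three parts."""
    user_turn = f"Context: {context}\n\nQuestion: {question}"
    return f"[INST] <<SYS>>\n{system_text}\n<</SYS>>\n\n{user_turn} [/INST]"


prompt = build_llama2_prompt(
    "You are a cooking assistant.",
    "1. RecipeName: Example Recipe",  # placeholder context, not real data
    "How long does it take?",
)
print(prompt)
```

Because the helper adds the tags itself, each tag appears exactly once in the final prompt.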

In [None]:
from ctransformers import AutoModelForCausalLM

# Initialize the model
llm = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7B-Chat-GGUF', model_file="llama-2-7b-chat.Q5_K_M.gguf",gpu_layers=50, max_new_tokens=4096, context_length=4096)

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

config.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

llama-2-7b-chat.Q5_K_M.gguf:   0%|          | 0.00/4.78G [00:00<?, ?B/s]

In [None]:
system_prompt="<<SYS>>You are given an input text. Extract the Ingredients from it. The ingredients may be in wrong spelling correct them and return a text in [ingredient1,ingredient2,...] format. If no ingredient is found just return [] Do not explain your result<</SYS>>"
user_input="Provide recipes on paneer, brijal, chiken , virat kholi, rohit shaerma, mengo, ladyes figer"
prompt_template = f"[INST]{system_prompt}\n\nInput:{user_input} Answer:[/INST]"
answer=llm(prompt_template, stream=False,temperature=0)
print(answer)


  Sure! Here are the ingredients found in the input text:
[ingredient1: paneer, ingredient2: brijal, ingredient3: chicken, ingredient4: virat kholi, ingredient5: rohit shaerma, ingredient6: mengo, ingredient7: ladyes figer]
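
The reply above ignores the requested format: it adds a preamble and labels each entry as `ingredientN:`. A tolerant parser can still recover the names before they are passed to the retrieval step. This is a minimal sketch assuming replies shaped like the one shown; `parse_ingredient_reply` is a hypothetical helper, and it does not correct spelling or filter out non-ingredients such as names of people.

```python
import re


def parse_ingredient_reply(reply):
    """Extract ingredient names from the first [...] group in a model reply."""
    match = re.search(r"\[(.*?)\]", reply, flags=re.DOTALL)
    if not match:
        return []
    items = []
    for part in match.group(1).split(","):
        # Drop an optional "ingredientN:" label and surrounding quotes or spaces
        name = re.sub(r"^\s*ingredient\d+\s*:\s*", "", part, flags=re.IGNORECASE)
        name = name.strip().strip("'\"")
        if name:
            items.append(name)
    return items


reply = (
    "Sure! Here are the ingredients found in the input text:\n"
    "[ingredient1: paneer, ingredient2: brijal, ingredient3: chicken]"
)
parsed = parse_ingredient_reply(reply)
print(parsed)
```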


# New Version of Generating response

In [None]:
!pip install sentence_transformers pymongo ctransformers

Collecting sentence_transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl (227 kB)
Collecting pymongo
  Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting ctransformers
  Downloading ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence_transformers)
  Using 

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pymongo
from sentence_transformers import SentenceTransformer

model=SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Generate embedding for a single sentence
def generate_embedding(sentence):
    return model.encode(sentence).tolist()

def get_recipes(query):
  # Connect to MongoDB
  client = pymongo.MongoClient("mongodb+srv://RecipeRisers:reciperisers%40top@recipes.jl22qv1.mongodb.net/")
  db = client["recipe_realm"]
  collection = db["recipes"]
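  # Retrieval of recipes
  # Split on commas, trim whitespace, drop empty entries, and sort
  query_list = sorted(item.strip() for item in query.split(",") if item.strip())
  # str.join returns a new string, so the result must be assigned back to query
  query = ", ".join(query_list)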
  query_embedding = generate_embedding(query)
  recipes=[]
  # print("Query embedding generated:", query_embedding)
  if query_embedding is not None:
      # Perform vector search
      results = collection.aggregate([
          {
              "$vectorSearch": {
                  "queryVector": query_embedding,
                  "path": "ingredient_embedding",
                  "numCandidates": 5796,
                  "limit": 5000,
                  "index": "recipefinder"
              }
          },
          {
              "$addFields": {
                  "ingredientsArray": { "$split": ["$IngredientList", ", "] }
              }
          },
          {
              "$addFields": {
                  "matchingIngredientsCount": {
                      "$size": {
                          "$filter": {
                              "input": "$ingredientsArray",
                              "as": "ingredient",
                              "cond": { "$in": ["$$ingredient", query_list] }
                          }
                      }
                  },
                  "extraIngredientsCount": {
                      "$size": {
                          "$filter": {
                              "input": "$ingredientsArray",
                              "as": "ingredient",
                              "cond": { "$not": { "$in": ["$$ingredient", query_list] } }
                          }
                      }
                  }
              }
          },
          {
              "$sort": {
                  "matchingIngredientsCount": -1,
                  "extraIngredientsCount": 1
              }
          },
          {
              "$limit": 3
          }
      ])
      for doc in results:
        recipes.append(doc)
  return recipes

def get_context_list(user_input,recipes):
  #Making chunks of context
  context_list=[]
  text=f"Memory ingredients: {user_input}\n\n"
  i=1
  for document in recipes:
      text+=f"{i}. RecipeName: {document['RecipeName']}\nIngredients: {document['Ingredients']}\nDiet: {document['Diet']}\nCourse: {document['Course']}\nServings: {document['Servings']}\nPreparation Time (Min): {document['PrepTimeInMins']}\nCooking Time (Min): {document['CookTimeInMins']}\nTotal Time (Min): {document['TotalTimeInMins']}\n\n"
      i+=1
  context_list.append(text)
  i=1
  for document in recipes:
      text=""
      text=f"{i}. RecipeName: {document['RecipeName']}\nInstructions: {document['Instructions']}\n\n"
      i+=1
      context_list.append(text)
  return context_list

def get_context_embeddings(context_list): #List of embeddings
    context_embeddings = []
    for context in context_list:
        context_embeddings.append(generate_embedding(context))
    return context_embeddings

def get_context(user_input,context_list,context_embeddings):
  userinput_embedding=generate_embedding(user_input)
  similarity=[]
  for i in context_embeddings:
    similarity.append(cosine_similarity([userinput_embedding],[i])[0][0])
  return context_list[similarity.index(max(similarity))]
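
`get_context` picks the single chunk whose embedding is closest to the question. The selection step can be tried in isolation with toy vectors, as in the sketch below. It is an assumption-laden illustration: `cosine` and `pick_best_chunk` are hypothetical stand-ins for `cosine_similarity` and `get_context`, and the 2-D vectors are not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best_chunk(query_vec, chunks, chunk_vecs):
    """Return the chunk with the highest cosine similarity, or None if there are none."""
    if not chunks:
        return None
    scores = [cosine(query_vec, vec) for vec in chunk_vecs]
    best = max(range(len(chunks)), key=scores.__getitem__)
    return chunks[best]


chunks = ["summary of recipes", "instructions for recipe 1"]
chunk_vecs = [[1.0, 0.0], [0.6, 0.8]]  # toy vectors, for illustration only
best = pick_best_chunk([0.0, 1.0], chunks, chunk_vecs)
print(best)
```

Returning `None` for an empty list avoids the `ValueError` that `max` raises on an empty sequence, a case `get_context` does not guard against.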

In [None]:
from ctransformers import AutoModelForCausalLM

# Initialize the model
llm = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7B-Chat-GGUF', model_file="llama-2-7b-chat.Q5_K_M.gguf",gpu_layers=50, max_new_tokens=4096, context_length=4096)

# Define the system prompt
system_prompt = """<<SYS>> You are a cooking assistant. Answer questions with clear, concise, step-by-step instructions. Provide ingredient substitutions and variations when asked. Prioritize safety and proper food handling. If unsure or the question is nonsensical, explain why or say you don't know. Only answer cooking-related questions. Use the provided context to inform your answers. <</SYS>>"""

# Get user input for ingredients
userlist = input("Enter your ingredients: ")
recipes = get_recipes(userlist)
context_list=get_context_list(userlist,recipes)
context_embeddings=get_context_embeddings(context_list)

# Main loop to get user prompts and generate answers
while True:
    user_input = input("Enter Your Prompt: ")
    if user_input == "exit":
        break
    if user_input == "change":
      userlist = input("Enter your ingredients: ")
      recipes = get_recipes(userlist)
      context_list=get_context_list(userlist,recipes)
      context_embeddings=get_context_embeddings(context_list)
      continue
    context=get_context(user_input,context_list,context_embeddings)
    print(f"Context:\n{context}\n\n")
    prompt_template = f"[INST]{system_prompt}\n\nContext: {context}\n\nQuestion: {user_input}\n\nAnswer: [/INST]"

    # Print the generated output
    answer=llm(prompt_template, stream=False)
    print(answer)

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Enter your ingredients: butter, cheese, eggs, fork, maida, rice, onion, parsley leaves, salt, sunflower oil, sweet corn, virgin olive oil, wheat bread crumbs
Enter Your Prompt: What ingredient I just gave?
Context:
Memory ingredients: butter, cheese, eggs, fork, maida, rice, onion, parsley leaves, salt, sunflower oil, sweet corn, virgin olive oil, wheat bread crumbs

1. RecipeName: Pan Fried Rice Balls Filled With Cheese Recipe
Ingredients: 2 tablespoon Extra Virgin Olive Oil, 2 tablespoon Butter (unsalted), 1 Onion - finely chopped, 2 cups Jasmine Rice, 1 teaspoon Salt, 1 cup Sweet corn, 1/4 cup Parsley leaves - finely chopped, 1 cup Parmesan cheese - grated, 1 cup Mozzarella cheese - cut into small cubes, 1 cup All Purpose Flour (Maida), 3 Whole Eggs - beaten with a fork, 1 cup Whole Wheat Bread crumbs, Sunflower Oil - for deep frying
Diet: Vegetarian
Course: Appetizer
Servings: 4
Preparation Time (Min): 20
Cooking Time (Min): 45
Total Time (Min): 65

Memory ingredients: butter, chee

KeyboardInterrupt: Interrupted by user

# ***DON'T MIND THE BELOW JUNK***

In [None]:
!pip install huggingface_hub



In [None]:
!pip install accelerate

Collecting accelerate
  Downloading accelerate-0.31.0-py3-none-any.whl (309 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from 

In [None]:
!pip install transformers



In [None]:
from huggingface_hub import login
login(token="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")

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
import transformers
import torch

model_id = "meta-llama/Llama-2-7b-chat-hf"

pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto",
)
while True:
  user_input= input("Enter Prompt: ")
  print(pipeline(f"Context:{context}\nQuestion:{user_input} Answer:\n")[0]['generated_text'])

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'system_prompt'

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "daryl149/llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Set dtype and device placement at load time; passing them to pipeline() has no effect on an already loaded model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

system_prompt = "You are a helpful assistant."
user_input = "Can you tell me a joke?"
input_text = f"{system_prompt}\n{user_input}"

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator(input_text, max_length=100)

print(result[0]['generated_text'])


tokenizer_config.json:   0%|          | 0.00/727 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


config.json:   0%|          | 0.00/507 [00:00<?, ?B/s]

pytorch_model.bin.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
from ctransformers import AutoModelForCausalLM

# Load the model
# llm = AutoModelForCausalLM.from_pretrained('QuantFactory/dolphin-2.9-llama3-8b-GGUF', model_file='dolphin-2.9-llama3-8b.Q4_K_M.gguf', gpu_layers=60)
llm = AutoModelForCausalLM.from_pretrained('QuantFactory/dolphin-2.9-llama3-8b-GGUF', model_file='dolphin-2.9-llama3-8b.Q4_K_M.gguf')
user_input="Give recipes from these without egg "

# Print the generated output
for word in llm(user_input+context, stream=True):
        print(word, end='')
print()

Fetching 0 files: 0it [00:00, ?it/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

In [None]:
!pip install transformers langchain torch accelerate

Collecting langchain
  Downloading langchain-0.2.5-py3-none-any.whl (974 kB)
Collecting accelerate
  Downloading accelerate-0.31.0-py3-none-any.whl (309 kB)
Collecting langchain-core<0.3.0,>=0.2.7 (from langchain)
  Downloading langchain_core-0.2.9-py3-none-any.whl (321 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain)
  Downloading langchain_text_splitters-0.2.1-py3-none-any.whl (23 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.81-py3-none-any.whl (127 kB)

In [None]:
from typing import Any

from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain.llms.base import LLM
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Custom LLM class for integrating Hugging Face models with LangChain
class HuggingFaceLLM(LLM):
    # LLM is a pydantic model, so attributes must be declared as fields before they can be assigned
    model_name: str
    tokenizer: Any = None
    model: Any = None

    def __init__(self, model_name: str):
        super().__init__(model_name=model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def _call(self, prompt: str, stop: list = None) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt")
        outputs = self.model.generate(**inputs, max_length=512)
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if stop:
            for token in stop:
                response = response.split(token)[0]
        return response
    @property
    def _identifying_params(self) -> dict:
        return {"model_name": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom_huggingface"

# Initialize the custom HuggingFaceLLM with the model
model_name = "daryl149/llama-2-7b-chat-hf"
huggingface_llm = HuggingFaceLLM(model_name=model_name)

# Define the prompt template with context
prompt_template = """Context: {context}

Question: {question}
Answer:"""

# Create a LangChain prompt template
template = PromptTemplate(input_variables=["context", "question"], template=prompt_template)

# Create the LLMChain with the custom HuggingFaceLLM and the prompt template
llm_chain = LLMChain(llm=huggingface_llm, prompt=template)
context='''Recipe Name:
Cauliflower Tortilla Recipe
Ingredients:
1 Cauliflower (gobi),1 Whole Egg,Salt - to taste,1 teaspoon Black pepper powder
Instructions:
To begin making the Cauliflower Tortilla Recipe, wash the cauliflower head and cut them into large florets.Grate the florets and place the grated cauliflower in perforated plate and steam in a steamer for about 15 minutes. Once the grated cauliflower florets are steamed, allow them to cool.While it is cooking, preheat the oven to 180 degree centigrade.Once cauliflower has cooled spoon the cooked cauliflower in center of a cloth and wring out the water. You need to remove as much moisture as possible from the cauliflower.Transfer the dry cauliflower into a clean dry bowl. Add egg, salt and pepper powder, whisk well to combine.Transfer one portion of the mixture, onto a baking sheet lined with parchment paper (a good quality one). Spread the mixture with your fingers into tortilla shape. You can make the cauliflower tortillas thick or thin as per your preference.Bake the cauliflower tortillas in the preheated oven for about 10 minutes. Then flip the tortillas and continue to bake for another 7 to 8 minutes.Once they're done place them on a wire rack to cool.Serve the Cauliflower Tortillas topped with your favorite filling and serve immediately for a wholesome gluten free weeknight dinner.Here is the recipe to make a Soy Bhurji Filling: To make soya bhurji filling, first heat oil in a pan, add onions and saute until they turn translucent. Next add ginger garlic paste and cook till the saw smell of the paste is gone.Add green chilies and tomatoes and cook till tomatoes are soft.Add red chili powder, coriander powder, turmeric powder, roasted cumin powder and salt mix well. Finally add soya granules (pre soaked in water). Cook for 5 minutes on low heat. That’s it, Soy Bhurji is ready!These Cauliflower tortillas can be used to make Soft Taco With Mushroom and Roasted Bell pepper Sauce Recipe or Tacozza - Tacos with Refried Beans and Oregano
Recipe Name:
Paleo Breakfast Casserole Recipe In A Mug
Ingredients:
4 Whole Eggs,2 Onions - finely chopped,1 Green Chilli - finely chopped,1 Cup Cauliflower (gobi) - grated,1 cup Red Bell pepper (Capsicum) - finely chopped,1 cup Yellow Bell Pepper (Capsicum) - finely chopped,1/2 teaspoon Red Chilli flakes,1/2 teaspoon Black pepper powder - freshly pounded,Salt - to taste
Instructions:
To make Paleo Breakfast Casserole Recipe In A Mug, In a bowl, add eggs, grated cauliflower, onions, green chilli, and chopped bell peppers. Whisk them together until they blend well.Season it with salt, red chili flakes and pepper powder.Preheat oven to 350 degree Fahrenheit.In an oven safe ramekins/mug, add egg mixture to three-fourth of the mug.Bake in oven for about 35 minutes until the eggs are set in the middle of the mug. If you don’t prefer watery eggs or undercooked ones, bake them for about additional 10 minutes until the top layer raises up and becomes little brownish.Remove the mug from the oven and leave them as it is for few minutes.Serve Paleo Breakfast Casserole Recipe In A Mug for a healthy and scrumptious breakfast with fresh fruit juice so that you just love your mornings.
Recipe Name:
Zucchini Oatmeal Omelette Recipe
Ingredients:
4 Whole Eggs,1 Green zucchini - grated,1 Onion - finely chopped,1/2 cup Coriander (Dhania) Leaves - finely chopped,1 Green Chilli - finely chopped,2 tablespoons Instant Oats (Oatmeal),1/2 teaspoon Black pepper powder - (or to taste),Salt - to taste,Sunflower Oil - for brushing
Instructions:
To begin making Zucchini Oatmeal Omelette Recipe, grate the zucchini, transfer to a mixing bowl, and add a little salt and mix. Let it stand for a few minutes so it releases most of water in zucchini.Squeeze all the water out from the zucchini and keep aside.Now, take a bowl, whisk in the eggs till they are light and fluffy.Add the chopped onion, zucchini, green chilli, oats flour, salt and pepper and mix well to get a fluffy omelette mixture.Now, heat the skillet on medium flame, brush a little oil over the skillet.Take a small laddle full of the egg mixture and pour it over the skillet to form small 3 inch diameter sized disc.Pour a little oil on the sides and let it cook for 2 minutes.Flip over, let the other side cook for another 2 minutes.Once done, remove it in a plate and proceed to prepare the similar mini omelettes from the remaining egg mixture.Serve the Zucchini Oatmeal Omelette Recipe along with a glass of freshly made Chocolate Banana Smoothie Recipe or Apple Almond Date Smoothie Recipe and with a slice of bread toast.

Recipe Name:
Cabbage Carrot Onion Pudina Thepla Recipe
Ingredients:
1 cup Cabbage (Patta Gobi/ Muttaikose),1 Carrot (Gajjar),1 Onion,2 cups Whole Wheat Flour,1 Green Chilli - finely chopped,1/4 cup Mint Leaves (Pudina) - chopped,1 teaspoon Cumin powder (Jeera),Salt - to taste,Water - to knead,1 tablespoon Sunflower Oil - to knead
Instructions:
To begin making the Cabbage Carrot Onion Pudina Thepla Recipe, we will first attach the chopping blade into the food processor and then add the cabbage and onion in the processor. Close the lid and turn on the food processor. Finely chop the onion and cabbage. Remove the chopping blade and attach the grating blade. Add the carrot through the spout of the food processor and grate the carrots. Once done remove the grating blade and attach the kneading blade. Add the flour, close the lid and turn on the food processor to mix everything well. Next add in the green chillies, mint leaves, cumin powder, salt to taste and start the blender. Slowly, add little water at a time to bring the dough together. Drizzle oil through the spout and continue to blend until the dough comes away from the side of the bowl.If your dough looks too sticky, then add some more flour to the processor and blend. Once done, take out the dough into a bowl.Divide the Cabbage Carrot Onion Pudina dough into lemon sized portions and keep aside.Dust a worktop with flour and roll the dough in flour. With the help of the rolling pin flatten the balls into a 5 inch diameter circle. The thepla should be nice and thin to cook evenly and quickly.Preheat a tawa on high heat and place the Cabbage Carrot Onion Pudina Thepla on the hot tawa. Flip after a few seconds once you notice small bubbles forming on the top. Drizzle oil over the Cabbage Carrot Onion Pudina Thepla and flip again after a few seconds. Drizzle oil on the other side again. Continue to flip and cook the thepla on high heat until golden brown spots appear.The continuous flipping and cooking motion helps to cook the theplas properly and also ensures that they remain soft.Continue the similar process with the remaining rolled portions and serve hot.Serve the Cabbage Carrot Onion Pudina Thepla Recipe along with Gujarati Kadhi Recipe and Khatti Meethi Tindora Sabzi for a delicious everyday meal.

Recipe Name:
Napa Cabbage Coleslaw with Greek Yogurt Recipe
Ingredients:
1 cup Napa Cabbage - cut into thin stripes,2 Carrot (Gajjar) - cut into juliennes,1 Cucumber - core removed and cut into juliennes,1 teaspoon Sunflower seeds,2 teaspoons Cranberries - dried,1 cup Hung Curd (Greek Yogurt),1 tablespoon Honey,Salt - to taste
Instructions:
To begin making the Napa Cabbage Coleslaw with Greek Yogurt Recipe, we will first prep the vegetables and add it in one bowl.Add the Greek yogurt, honey and salt into the same bowl and mix it evenly with all the vegetables. Make sure all the vegetables are coated well.Top Napa Cabbage Coleslaw  with sunflower seeds and dried cranberries and serve fresh.Serve the Napa Cabbage Coleslaw with Greek Yogurt Recipe along with a Burger and glass of Homemade Pomegranate Juice with Ginger Recipe by the side to enrich your Sunday meal.'''
question = "Which recipe is healthiest?"

# Use the LLMChain to generate an answer
response = llm_chain.run(context=context, question=question)

print(response)


ValueError: "HuggingFaceLLM" object has no field "model_name"
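
The `context` string passed to `llm_chain.run` in the cell above is a hand-pasted concatenation of recipes, each laid out as `Recipe Name:`, `Ingredients:`, `Instructions:`. A minimal sketch of building that string from retrieved records instead is shown here. It assumes each record is a dict with the hypothetical keys `name`, `ingredients`, and `instructions`; the actual field names in the MongoDB collection are not shown in this section, so adjust them to match. The helper names `format_recipe` and `build_context` are also illustrative, and the sample instructions are shortened placeholders.

```python
def format_recipe(recipe):
    """Render one recipe dict in the Recipe Name / Ingredients / Instructions layout."""
    return (
        f"Recipe Name:\n{recipe['name']}\n"
        f"Ingredients:\n{recipe['ingredients']}\n"
        f"Instructions:\n{recipe['instructions']}"
    )


def build_context(recipes):
    """Join the formatted recipes, separated by one blank line."""
    return "\n\n".join(format_recipe(r) for r in recipes)


# Hypothetical sample records; key names and the shortened instruction
# text are assumptions made for illustration only.
sample_recipes = [
    {
        "name": "Cauliflower Tortilla Recipe",
        "ingredients": "1 Cauliflower (gobi),1 Whole Egg,Salt - to taste",
        "instructions": "Steam the grated cauliflower, mix with egg, and bake.",
    },
    {
        "name": "Zucchini Oatmeal Omelette Recipe",
        "ingredients": "4 Whole Eggs,1 Green zucchini - grated",
        "instructions": "Whisk the eggs, add the zucchini, and cook on a skillet.",
    },
]

sample_context = build_context(sample_recipes)
print(sample_context)
```

Building the string this way keeps the separator between recipes consistent and makes it easy to cap how many recipes are sent to the model.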