## This is the model file and the Task 3 result in files "Task3result team33"
**Install Dependencies**

Use !pip install -qU langchain==0.1.1 langchain-community==0.0.13 to resolve the problem of "multiple augment of top_k" when calling the RAG pipeline.

# Clean my JSON files

--------------

# VECTOR STORE

In [None]:
%pip install langchain_pinecone

Note: you may need to restart the kernel to use updated packages.


In [None]:
%pip install langchain langchain_community sentence-transformers

Note: you may need to restart the kernel to use updated packages.


**Using Colab T4 GPU to run this code**

In [3]:
import torch
torch.cuda.is_available()

True

In [4]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [None]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L12-v1'#384 vector space dimension,256 max_seq_length

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32} # can we fine tune?
)

  from .autonotebook import tqdm as notebook_tqdm
  return self.fget.__get__(instance, owner)()


We can use the embedding model to create document embeddings like so:

## Building the Vector Index

We now need to use the embedding pipeline to build our embeddings and store them in a Pinecone vector index. To begin we'll initialize our index, for this we'll need a [free Pinecone API key](https://app.pinecone.io/).

In [8]:
!pip install pinecone-client



In [None]:
import os
from pinecone import Pinecone, PodSpec
# get API key from app.pinecone.io and environment from console
pc=Pinecone(api_key=<api-key>)#Use your own Pinecone api_key.
pc.create_index(
  name="llama",
  dimension=384,
  metric="cosine",
  spec=PodSpec(
    environment="gcp-starter"
  )
)

Now we initialize the index.

In [10]:
index_name = 'llama'# Your Pinecone index for vectorstore
index = pc.Index(index_name)
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.17591,
 'namespaces': {'': {'vector_count': 17591}},
 'total_vector_count': 17591}

## Chunkation or Textsplitter

In [None]:
ctv_df = pd.read_json("C:\\Users\\Bliss\\Desktop\\UofT\\Ai_competition\\ctv_modified.json")
just_df= pd.read_json("C:\\Users\\Bliss\\Desktop\\UofT\\Ai_competition\\just_modified.json")
star_df = pd.read_json("C:\\Users\\Bliss\\Desktop\\UofT\\Ai_competition\\star_modified.json")
all_df=pd.concat([ctv_df,just_df,star_df])
all_df.fillna(value="N/A", inplace=True)
all_data=json.loads(all_df.to_json(orient="records", force_ascii=False))

In [None]:
len(all_data)

In [None]:
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter

def json2df(data):
  """Converts JSON-like data to a DataFrame with document and chunk indexing.

  Args:
      data: A list of dictionaries, where each dictionary represents a document
          and has keys like 'title', 'url', and 'text'.

  Returns:
      A Pandas DataFrame with columns 'title', 'url', 'text', 'chunk', and 'index'.
          The 'index' column provides a unique identifier for each chunk in the format 'docN_chunkM'.
  """

  data = pd.DataFrame(data)
  data['time'] = data['time'].str[:10]
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
  columns = ['title', 'url', 'text', 'chunk', 'index']
  rows = []

  for i in range(len(data)):
    chunk_counter = 0  # Initialize counter for each document
    for j, chunk in enumerate(text_splitter.split_text(str(data['text'][i]))):
      chunk_counter += 1
      index = f"doc{i+1}_chunk{chunk_counter}"  # Update index within the loop
      rows.append({
          'title': data['title'][i],
          'url': data['url'][i],
          'text': data['text'][i],
          'chunk': chunk,
          'index': index  # Include index in the rows
      })

  df2 = pd.DataFrame(rows, columns=columns)
  df2.set_index('index', inplace=True)  # Set 'index' as DataFrame index
  return df2


In [None]:
df2=json2df(all_data)

In [None]:
# Load data to Pinecone
batch_size = 32
def load_to_pinecone(df2,ns):
    for i in range(0, len(df2), batch_size):
        i_end = min(len(df2), i + batch_size)
        batch = df2.iloc[i:i_end]
        id=[str(j) for j, x in batch.iterrows()]
        texts = [x['chunk'] for j, x in batch.iterrows()]
        url = [x['url'] for j, x in batch.iterrows()]
        title = [x['title'] for j, x in batch.iterrows()]
        embeds = embed_model.embed_documents(texts)  # Embed all texts in the batch
        combine=[]
        metadata = []

        for j in range(len(id)):
            metadata.append({
                "title": title[j],
                "text": texts[j],
                "source": url[j],
            })
            combine.append({
                "id": id[j],
                "metadata": metadata[j],
                "values": embeds[j],
            })

        # metadata = [
        #     {'text': x['chunk'],
        #     'source': x['url'],
        #     'title': x['title']} for j, x in batch.iterrows()
        # ]
        # add to Pinecone vectorstore
        # print(len(metadata))
        # print(len(ids)
        index.upsert(vectors=combine,namespaces=ns)
load_to_pinecone(df2,'ns317')

## Initializing the Hugging Face Pipeline

The first thing we need to do is initialize a `text-generation` pipeline with Hugging Face transformers. The Pipeline requires three things that we must initialize first, those are:

* A LLM, in this case it will be `meta-llama/Llama-2-13b-chat-hf`.

* The respective tokenizer for the model.


**loading llama2 model** 
Put your own hf authority key

In [None]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
hf_auth = <hf_auth>
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)





In [12]:
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

Loading checkpoint shards: 100%|██████████| 3/3 [00:50<00:00, 16.75s/it]


Model loaded on cuda:0


**Tokenizer initialization**

In [13]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)



# Generator Transformer pipeline (Setup LLM pipeline)

In [14]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.0001,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    max_new_tokens=512,  # mex number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

Confirm this is working:

In [15]:
# res = generate_text("Tell me a summary of your news.")
# print(res[0]["generated_text"])

Now to implement this in LangChain Chain

In [16]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

## Initializing a RetrievalQA Chain

### Example

In [17]:
from langchain.vectorstores import Pinecone
text_field='text'
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)

  warn_deprecated(


In [18]:
query = 'News from Canada?'

vectorstore.similarity_search(
    query,k=5
)

[Document(page_content="/ Local Journalism Initiative / Canada’s National Observer', 'None']", metadata={'source': 'https://www.thestar.com/news/canada/wave-of-pollution-from-cruise-ships-expected-regardless-of-new-federal-wastewater-rules/article_e87da217-69ac-5c2a-9cb2-dbc5a738a066.html', 'title': 'Wave of pollution from cruise ships expected regardless of new federal wastewater rules'}),
 Document(page_content='significant issue south of the border as it is in Canada and I was able to learn a great deal about some of the initiatives that are going on here."\', \'\', \'esterday, Baird announced that Canada was joining the coalition against wildlife trafficking.\']', metadata={'source': 'https://www.ctvnews.ca/baird-in-u-s-seeking-global-effort-beyond-kyoto-1.237230', 'title': "Baird in U.S. seeking 'global effort beyond Kyoto'"}),
 Document(page_content='["n the third of CTV News Chief Anchor and Senior Editor Lisa LaFlamme\'s interviews with the major federal party leaders, Conserva

### Few_shot Prompt

In [19]:
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
    {
        "question": "Is Tony Guan, an antique dealer a animal trafficker?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Is there any news indicating that this figure is involved in animal trafficking？
Intermediate answer: Xiao Ju Guan, aka Tony Guan, a Canadian antiques dealer, pleaded guilty today in Manhattan federal court to attempting to smuggle rhinoceros horns from New York to Canada.
So the final answer is: Yes and the news title is Canadian Antiques Dealer Pleads Guilty in Manhattan Federal Court to Attempted Wildlife Smuggling.
""",
    },
]

In [20]:
example_prompt = PromptTemplate(
    input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))

Question: Is Tony Guan, an antique dealer a animal trafficker?

Are follow up questions needed here: Yes.
Follow up: Is there any news indicating that this figure is involved in animal trafficking？
Intermediate answer: Xiao Ju Guan, aka Tony Guan, a Canadian antiques dealer, pleaded guilty today in Manhattan federal court to attempting to smuggle rhinoceros horns from New York to Canada.
So the final answer is: Yes and the news title is Canadian Antiques Dealer Pleads Guilty in Manhattan Federal Court to Attempted Wildlife Smuggling.



In [21]:
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)

### RAG pipeline

In [22]:
from langchain.chains import RetrievalQA

rag_pipeline = RetrievalQA.from_chain_type(
    llm, chain_type='stuff',retriever=vectorstore.as_retriever())

In [23]:
rag_pipeline('Do you know if Tony Guan is involved in wildlife trafficking?')

  warn_deprecated(


{'query': 'Do you know if Tony Guan is involved in wildlife trafficking?',
 'result': ' Based on the information provided, there is no direct evidence that Tony Guan is involved in wildlife trafficking. However, he has been investigated for his wealth and connections to wildlife trafficking organizations. Additionally, he has been associated with the Freeland Foundation, which monitors wildlife trafficking.'}

In [293]:
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="humarin/chatgpt_paraphraser_on_T5_base",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "repetition_penalty": 1.03,
            },
)
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)
from langchain_community.chat_models.huggingface import ChatHuggingFace

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

llm = ChatHuggingFace(llm=llm,format='json',temperature=0)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


### prompting

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("Is Tony Guan involved in wildlife trafficking?")

### Workflow self-corrective RAG

In [130]:
from typing import Annotated, Dict, TypedDict

from langchain_core.messages import BaseMessage


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        keys: A dictionary where each key is a string.
    """

    keys: Dict[str, any]

In [300]:
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Pinecone
embedding = GPT4AllEmbeddings()

text_field='text'
vectorstore = Pinecone(
    index, embedding.embed_query, text_field
)
retriever = vectorstore.as_retriever()
llm = ChatOllama(model="mistral:instruct") # Keep using mistral


from langchain_core.output_parsers import JsonOutputParser
prompt = PromptTemplate(
    template="""You are a grader assessing relevance of a retrieved document to a user question. \n 
    Here is the retrieved document: \n\n {context} \n\n
    Here is the user question: {question} \n
    If the document contains keywords related to the user question, grade it as relevant. \n
    Grade a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'score' and no premable or explaination.""",
    input_variables=["question", "context"],
)

chain = prompt | llm | JsonOutputParser()
question="Is the sun bigger than moon?"
docs=retriever.get_relevant_documents(question)
score=chain.invoke({"question": question,"context":docs[0].page_content}) # From retrievers, the format is like .page_content()



In [326]:
import json
import operator
from typing import Annotated, Sequence, TypedDict

from langchain import hub
from langchain_core.output_parsers import JsonOutputParser
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.tools import DuckDuckGoSearchResults
### Nodes ###
df=pd.read_csv('C:\Users\Bliss\Desktop\UofT\Ai_competition\kyc.csv')

def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    # local = state_dict["local"]
    documents = retriever.get_relevant_documents(question)
    # return {"keys": {"documents": documents, "local": local, "question": question}}
    return {"keys": {"documents": documents, "question": question}}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains generation
    """
    print("---GENERATE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    # Prompt
    prompt = hub.pull("rlm/rag-prompt")


    # Chain
    rag_chain = prompt | llm | StrOutputParser()

    # Run
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {
        "keys": {"documents": documents, "question": question, "generation": generation}
    }


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with relevant documents
    """

    print("---CHECK RELEVANCE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    prompt = PromptTemplate(
        template="""You are a grader assessing relevance of a retrieved document to a user question. \n 
        Here is the retrieved document: \n\n {context} \n\n
        Here is the user question: {question} \n
        If the document contains keywords related to the user question, grade it as relevant. \n
        Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question. \n
        Provide the binary score in string with a single key 'score' and no premable or explaination.""",
        input_variables=["question", "context"],
    )
    parser=StrOutputParser()

    chain = prompt | llm | parser

    # Score
    filtered_docs = []
    search = "No"  # Default do not opt for web search to supplement retrieval
    for d in documents:
        score = chain.invoke(
            {
                "question": question,
                "context": d.page_content,
            }
        )
        print(score)
        grade = 'yes' if 'yes' in score else 'no'
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            search = "Yes"  # Perform web search
            continue

    return {
        "keys": {
            "documents": filtered_docs,
            "question": question,
            # "local": local,
            "run_web_search": search,
        }
    }


def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """

    print("---TRANSFORM QUERY---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]

    # Create a prompt template with format instructions and the query
    prompt = PromptTemplate(
        template="""You are generating questions that is well optimized for retrieval. \n 
        Look at the input and try to reason about the underlying sematic intent / meaning. \n 
        Here is the initial question:
        \n ------- \n
        {question} 
        \n ------- \n
        Provide an improved question without any premable, only respond with the updated question: """,
        input_variables=["question"],
    )

    # Prompt
    chain = prompt | llm | StrOutputParser()
    better_question = chain.invoke({"question": question})

    return {
        "keys": {"documents": documents, "question": better_question}
    }


def web_search(state):
    """
    Web search based on the re-phrased question using DuckduckGo API.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Web results appended to documents.
    """

    print("---WEB SEARCH---")
    state_dict = state["keys"]
    question = state_dict["question"]
    documents = state_dict["documents"]


    tool =DuckDuckGoSearchResults(backend="news")
    docs = tool.invoke({"query": question})
    web_results = Document(page_content=docs)
    documents.append(web_results)

    return {"keys": {"documents": documents, "question": question}}


### Edges


def decide_to_generate(state):
    """
    Determines whether to generate an answer or re-generate a question for web search.

    Args:
        state (dict): The current state of the agent, including all keys.

    Returns:
        str: Next node to call
    """

    print("---DECIDE TO GENERATE---")
    state_dict = state["keys"]
    question = state_dict["question"]
    filtered_documents = state_dict["documents"]
    search = state_dict["run_web_search"]

    if search == "Yes":
        # All documents have been filtered check_relevance
        # We will re-generate a new query
        print("---DECISION: TRANSFORM QUERY and RUN WEB SEARCH---")
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"

In [327]:
import pprint

from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query
workflow.add_node("web_search", web_search)  # web search

# Build graph
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "web_search")
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)

# Compile
app = workflow.compile()

In [331]:
# Run
inputs = {
    "keys": {
        "question": "Is the person named TONY GUAN as an antique dealer highly suspected being involved in animal trafficking."
    }
}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        # pprint.pprint(f"Node '{key}':")
        # Optional: print full state at each node
        pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint.pprint("\n---\n")

# Final generation
pprint.pprint(value["keys"]["generation"])

---RETRIEVE---
{ 'documents': [ Document(page_content='a key player in a web of wildlife traffickers who used his role as an antique dealer to illicitly smuggle wildlife items, including rhino horn and elephant ivory, from the United States to China,” said Assistant Attorney General Cruden.\\xa0 “We will continue to investigate and prosecute those who', metadata={'source': 'https://www.justice.gov/opa/pr/texas-antiques-appraiser-sentenced-25-months-prison-rhino-and-ivory-smuggling-conspiracy', 'title': 'Texas Antiques Appraiser Sentenced to 25 Months in Prison for Rhino and Ivory Smuggling Conspiracy '}),
                 Document(page_content="Priest Johnson for the Eastern District of Texas, to a one count information charging him with wildlife trafficking in violation of the Lacey Act. \\xa0 \\xa0', 'In papers filed in federal court in April 2016, Malasukum admitted to purchasing a tiger skull from undercover agents who were working for", metadata={'source': 'https://www.justice.gov

 score: 'yes'
---GRADE: DOCUMENT RELEVANT---
 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
 score: "yes"
---GRADE: DOCUMENT RELEVANT---
 score: "yes"
---GRADE: DOCUMENT RELEVANT---
{ 'documents': [ Document(page_content='a key player in a web of wildlife traffickers who used his role as an antique dealer to illicitly smuggle wildlife items, including rhino horn and elephant ivory, from the United States to China,” said Assistant Attorney General Cruden.\\xa0 “We will continue to investigate and prosecute those who', metadata={'source': 'https://www.justice.gov/opa/pr/texas-antiques-appraiser-sentenced-25-months-prison-rhino-and-ivory-smuggling-conspiracy', 'title': 'Texas Antiques Appraiser Sentenced to 25 Months in Prison for Rhino and Ivory Smuggling Conspiracy '}),
                 Document(page_content="turtle. The Arakan forest turtle is critically endangered, having once been presumed extinct.\\xa0 The illegal trafficking spanned approximately four years.\\xa0 More information 

In [348]:
import pandas as pd
from sklearn.model_selection import train_test_split
df=pd.read_csv('C:\\Users\\Bliss\\Desktop\\UofT\\Ai_competition\\kyc.csv')
train_val, test = train_test_split(df, test_size=0.3, random_state=42)
train, val = train_test_split(train_val, test_size=0.3, random_state=42)


In [368]:
def write_json(data, filename):
    with open(filename, 'w') as f:
        json.dump(data, f)

In [383]:
result=[]
doc=[]
with open('results.txt', 'w',encoding='utf-8') as file:
    for i in range(len(train)):
        inputs = {
            "keys": {
                "question": f"Is the person named {train.iloc[i]['Name']} as an {train.iloc[i]['Occupation']} highly suspected being involved in animal trafficking?"
            }
        }
        for output in app.stream(inputs):
            for key, value in output.items():
                # Node
                # pprint.pprint(f"Node '{key}':")
                # Optional: print full state at each node
                pprint.pprint(value["keys"], indent=2, width=80, depth=None)
            pprint.pprint("\n---\n")

            # Final generation
        pprint.pprint(value["keys"]["generation"])
        
        # Write results to the file
        file.write(f"Result for the {i} row and {train.iloc[i]['cust_id']}:\n")
        file.write(f"Generation: {value['keys']['generation']}\n")
        file.write(f"Documents: {value['keys']['documents']}\n")
        file.write("---\n")


---RETRIEVE---
{ 'documents': [ Document(page_content='state enemies and a cabal of Satan-worshipping cannibals operating a child sex trafficking ring. The individual sharing the posts used an image of Hyten and claimed to be him, even writing, “The account is maintained by me. -genhyten.” However, Guillebeau confirmed to the AP that an impersonator', metadata={'source': 'https://www.thestar.com/news/world/not-real-news-a-look-at-what-didn-t-happen-last-week/article_f1ef1813-390a-54c7-9497-4fbfce7f0afc.html', 'title': 'NOT REAL NEWS: A look at what didn’t happen last week'}),
                 Document(page_content='\'onobos were sent to Armenia in 2011 using fraudulent CITES permits signed by Doumbouya, the alliance said.\', \'"We are very pleased by the strong message of the Guinean government," said Charlotte Houpline, head of an activist project to enforce trafficking laws. She called the arrest a "landmark', metadata={'source': 'https://www.ctvnews.ca/sci-tech/guinea-arrests-ex-wil

 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
{ 'documents': [],
  'question': 'Is the person named LORRAINE BEAULIEU-POIRIER as an Farm '
              'Laborer highly suspected being involved in animal trafficking?',
  'run_web_search': 'Yes'}
'\n---\n'
---DECIDE TO GENERATE---
---DECISION: TRANSFORM QUERY and RUN WEB SEARCH---
---TRANSFORM QUERY---
{ 'documents': [],
  'question': ' Has Lorraine Beaulieu-Poirier been implicated in animal '
              'trafficking, specifically in relation to her role as a farm '
              'laborer?'}
'\n---\n'
---WEB SEARCH---
{ 'documents': [ Document(page_content='[snippet: A suspected trafficking victim who had been missing ... according to cops. The victim has also been reunited with her family. Police did not offer further details about her disappearance and ..., title: Woman missing for 7 year

KeyboardInterrupt: 

In [390]:
known_df=pd.concat([df[df['cust_id']=="CUST32365345"],
df[df['cust_id']=="CUST14701697"],
df[df['cust_id']=="CUST72930228"],
df[df['cust_id']=="CUST12136191"],
df[df['cust_id']== "CUST62788134"],
df[df['cust_id']=="CUST30572335"]])

In [392]:
with open('resultss.txt', 'w',encoding='utf-8') as file:
    for i in range(len(known_df)):
        inputs = {
            "keys": {
                "question": f"Is the person named {known_df.iloc[i]['Name']} as an {known_df.iloc[i]['Occupation']} highly suspected being involved in animal trafficking?"
            }
        }
        for output in app.stream(inputs):
            for key, value in output.items():
                # Node
                # pprint.pprint(f"Node '{key}':")
                # Optional: print full state at each node
                pprint.pprint(value["keys"], indent=2, width=80, depth=None)
            pprint.pprint("\n---\n")

            # Final generation
        pprint.pprint(value["keys"]["generation"])
        
        # Write results to the file
        file.write(f"Result for the {i} row and {known_df.iloc[i]['cust_id']}:\n")
        file.write(f"Generation: {value['keys']['generation']}\n")
        file.write(f"Documents: {value['keys']['documents']}\n")
        file.write("---\n")

---RETRIEVE---
{ 'documents': [ Document(page_content='a key player in a web of wildlife traffickers who used his role as an antique dealer to illicitly smuggle wildlife items, including rhino horn and elephant ivory, from the United States to China,” said Assistant Attorney General Cruden.\\xa0 “We will continue to investigate and prosecute those who', metadata={'source': 'https://www.justice.gov/opa/pr/texas-antiques-appraiser-sentenced-25-months-prison-rhino-and-ivory-smuggling-conspiracy', 'title': 'Texas Antiques Appraiser Sentenced to 25 Months in Prison for Rhino and Ivory Smuggling Conspiracy '}),
                 Document(page_content="Priest Johnson for the Eastern District of Texas, to a one count information charging him with wildlife trafficking in violation of the Lacey Act. \\xa0 \\xa0', 'In papers filed in federal court in April 2016, Malasukum admitted to purchasing a tiger skull from undercover agents who were working for", metadata={'source': 'https://www.justice.gov

 score: 'yes'
---GRADE: DOCUMENT RELEVANT---
 score: "no"
---GRADE: DOCUMENT NOT RELEVANT---
 score: 'yes'
---GRADE: DOCUMENT RELEVANT---
 score: "yes"
---GRADE: DOCUMENT RELEVANT---
{ 'documents': [ Document(page_content='a key player in a web of wildlife traffickers who used his role as an antique dealer to illicitly smuggle wildlife items, including rhino horn and elephant ivory, from the United States to China,” said Assistant Attorney General Cruden.\\xa0 “We will continue to investigate and prosecute those who', metadata={'source': 'https://www.justice.gov/opa/pr/texas-antiques-appraiser-sentenced-25-months-prison-rhino-and-ivory-smuggling-conspiracy', 'title': 'Texas Antiques Appraiser Sentenced to 25 Months in Prison for Rhino and Ivory Smuggling Conspiracy '}),
                 Document(page_content="['-- An antiques dealer from British Columbia who pleaded guilty to smuggling rhinoceros horns, elephant ivory and coral has been sentenced to two and a half years in a U.S. priso