<a href="https://colab.research.google.com/github/fabiomatricardi/How-I-Built-a-Chatbot-that-Crushed-ChatGPT/blob/main/Only_CPU_MEDIUM_RAG_openVsChatGPT_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **S**ort **o**f **M**ixture **o**f **E**xperts


### Project Orchestration Outline

This notebook uses a stack of small models, each handling one part of the document analysis:


*   LaMiniFlan-T5-77M for **Summarization**
*   BAAI/bge-base-en-v1.5 for **Embeddings** (English only, max length 512 tokens)
*   StableLM-Zephyr-3B GGUF quantized for **QnA** and complex Reasoning

Main points: running only on CPU, StableLM has a fairly long inference time. LaMini-783M takes around 2 minutes to generate the questions. LaMini-77M creates a summary in 9 seconds.

I used the reranking step to inject the summary into the context at position -2, that is, second to last.
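
Why position -2: Python's `list.insert(-1, item)` places the item just before the current last element, so the summary ends up second to last in the context. A minimal sketch with placeholder strings (the chunk labels are illustrative, not notebook data):

```python
# Placeholder labels standing in for retrieved documents (illustrative only).
reordered_docs = ["chunk_a", "chunk_b", "chunk_c"]
summary = "SUMMARY"

# insert(-1, x) puts x before the current last element,
# so x lands at index -2 of the resulting list.
reordered_docs.insert(-1, summary)

print(reordered_docs)
```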

### Install dependencies

In [None]:
%%capture
!pip install transformers -U --no-cache-dir
!pip install llama-cpp-python==0.2.34
!pip install langchain
!pip install rich
!pip install tiktoken
!pip install chromadb
!pip install sentence-transformers
!huggingface-cli download TheBloke/stablelm-zephyr-3b-GGUF stablelm-zephyr-3b.Q5_K_S.gguf --local-dir . --local-dir-use-symlinks False
!wget https://github.com/fabiomatricardi/How-I-Built-a-Chatbot-that-Crushed-ChatGPT/raw/main/Article-edited.txt
!wget https://github.com/fabiomatricardi/How-I-Built-a-Chatbot-that-Crushed-ChatGPT/raw/main/Article-original.txt

<img src="https://github.com/fabiomatricardi/How-I-Built-a-Chatbot-that-Crushed-ChatGPT/raw/main/restartRuntime.png" width=800>

In [None]:
# @title Main imports and models load
from tqdm.rich import trange, tqdm
from rich import console
from rich.panel import Panel
from rich.markdown import Markdown
from rich.text import Text
import warnings
warnings.filterwarnings(action='ignore')
import datetime
from rich.console import Console
console = Console(width=110)
from transformers import pipeline
import os
## Logger file
tstamp = datetime.datetime.now()
tstamp = str(tstamp).replace(' ','_')
logfile = f'{tstamp}_log.txt'
def writehistory(text):
    """Append one line of text to the session log file."""
    with open(logfile, 'a', encoding='utf-8') as f:
        f.write(text)
        f.write('\n')

## Load  MBZUAI/LaMini-Flan-T5-77M for summarization
with console.status("Loading ✅ LaMini77M...",spinner="dots12"):
    model77 = pipeline('text2text-generation',model="MBZUAI/LaMini-Flan-T5-77M")
writehistory(f"{str(datetime.datetime.now())} Loaded 🦙 MBZUAI/LaMini-Flan-T5-77M for summarization")
## Load a llama-cpp-python quantized model
from llama_cpp import Llama
with console.status("Loading ✅✅✅✅ stablelm-zephyr-3b with LLAMA.CPP...",spinner="dots12"):
  llm = Llama(
    model_path="/content/stablelm-zephyr-3b.Q5_K_S.gguf",  # Download the model file first
    n_ctx=4096,  # The max sequence length to use - note that longer sequence lengths require much more resources
    n_threads=2,            # The number of CPU threads to use, tailor to your system and the resulting performance
  )
writehistory(f"{str(datetime.datetime.now())} Loaded 🧠 stablelm-zephyr-3b.Q5_K_S.gguf for heavy QnA")
# Load the embeddings model (English only)
from langchain.embeddings import HuggingFaceEmbeddings
hf_embeddings = HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5')
writehistory(f"{str(datetime.datetime.now())} Loaded 🧞‍♂️ BAAI/bge-base-en-v1.5 Embeddings")

# Load text files (the with-block closes each file automatically)
with open("/content/Article-original.txt", encoding='utf-8') as f:
  originaltext = f.read()
with open("/content/Article-edited.txt", encoding='utf-8') as f:
  editedtext = f.read()
writehistory(f"{str(datetime.datetime.now())} [red bold]Original scraped text saved into [bright_green on black]originaltext[/bright_green on black] variable")
writehistory(f"{str(datetime.datetime.now())} [blue bold]Edited scraped text saved into [bright_yellow on black]editedtext[/bright_yellow on black] variable")
writehistory(f"{str(datetime.datetime.now())} Original text length: [bright_green on black]{len(originaltext)}")
writehistory(f"{str(datetime.datetime.now())} Edited text length: [bright_yellow on black]{len(editedtext)}")

### FlanT5 and StableLM-Zephyr QnA functions
- qgen77  --> `qg77 = qgen77(splitted_text_qg, model77,'🎴🎴LaMiniFlanT5-77M')`
- summary77 --> `sum77 = summary77(splitted_text_qg, model77,'🎴LaMiniFlanT5-77M') `
- stableQA --> `q,o = stableQA("What is Science?",250,llm)`
- stableQnA --> `q1,o1 = stableQnA("What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?",editedtext,450,llm)`

- For summarization and question generation we can use two different chunk sizes
- For QnA and embedding retrieval we use a smaller chunk size
- All chunk sizes are counted in tokens
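
To make the chunk size and overlap parameters concrete, here is a simplified sliding-window sketch. It is not the `TokenTextSplitter` implementation: the helper name `split_with_overlap` is hypothetical, and the tokens are made-up labels rather than real tokenizer output.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# Stand-in tokens: simple labels, NOT real tokenizer output.
tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a larger overlap, neighbouring chunks share more text, which helps retrieval at chunk borders but increases the number of chunks to process.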

In [None]:
# @title
# Different chunks for the functions (Summary/Question Generation)
from langchain.document_loaders import TextLoader
from langchain.text_splitter import TokenTextSplitter
TOKENtext_splitter = TokenTextSplitter(chunk_size=430, chunk_overlap=20)
splitted_text_sum = TOKENtext_splitter.split_text(editedtext) #create a list
TOKENtext_splitter = TokenTextSplitter(chunk_size=280, chunk_overlap=20)
splitted_text_qg = TOKENtext_splitter.split_text(editedtext) #create a list
TOKENtext_splitter = TokenTextSplitter(chunk_size=150, chunk_overlap=20)
splitted_text_qna = TOKENtext_splitter.split_text(editedtext) #create a list


# Summarize each text chunk with the T5 model and concatenate the partial summaries
def summary77(textplitted, model, modelname):
  writehistory(f"{str(datetime.datetime.now())} Using {modelname} for summarization task...")
  start = datetime.datetime.now()
  with console.status("Generating summary...",spinner="dots12"):
    summary ="SUMMARY:\n"
    for item in textplitted:
      text = item
      template_summary = f'''Text: {text}

Write a complete summary of the above text.
'''
      res = model(template_summary, temperature=0.3, repetition_penalty=1.3, max_length=300, do_sample=True)[0]['generated_text']
      #console.print(res)
      #console.print('---')
      summary = summary + res + '\n'
  delta = datetime.datetime.now() - start
  writehistory(f"{str(datetime.datetime.now())} [bold]----------------------------")
  writehistory(f"{str(datetime.datetime.now())} {summary}")
  writehistory(f"{str(datetime.datetime.now())} [red1 bold]Full SUMMARY Completed in {delta}")
  return summary

# Ask the T5 model for 2 questions per text chunk and collect them in one list
def qgen77(textplitted, model, modelname):
  writehistory(f"{str(datetime.datetime.now())} Using {modelname} for Question generation task -  2 PER CHUNK...")
  start = datetime.datetime.now()
  quest2 = []
  with console.status("Generating Questions...",spinner="dots12"):
    for item in textplitted:
      text = item
      template_qg = f'''{text}\n\n
write two important questions about the above text.
Questions:
1.
2.
'''
      res = model(template_qg, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
      # Put a '#' marker after each question, then split on it
      ed_res = res.replace('? ','?#')
      list_res = ed_res.split('#')
      for i in list_res:
        # Drop the first 3 characters (the leading list numbering generated by LaMini)
        quest2.append(i[3:])

  writehistory(f"{str(datetime.datetime.now())} [bold]----------------------------")
  writehistory(f"{str(datetime.datetime.now())} {quest2}")
  return quest2




#Function for general inference of the model, no context required
def stableQA(question,maxtokens,model):
  """
  basic generation with StableLM / any llama.cpp loaded model
  question -> string
  maxtokens -> int, number of max tokens to generate
  model -> llama-cpp-python instance // here is StableLM-Zephyr-3B
  RETURNS question, output -> str
  """
  import datetime
  start = datetime.datetime.now()
  prompt = question
  template = f"<|user|>\n{prompt}<|endoftext|>\n<|assistant|>"
  with console.status("StableLM-Zephyr-3B AI is working ✅✅✅ ...",spinner="dots12"):
    output = model(
      template, # Prompt
      temperature=0.3,
      max_tokens=maxtokens,  # Generate up to maxtokens tokens
      stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
      echo=False        # Whether to echo the prompt
    )
    delta = datetime.datetime.now() - start
    console.print(f"Question: [bright_green on black]{prompt}")
    console.print(output['choices'][0]['text'])
    console.print(f"Completed in: [bold red]{delta}")
    return question, output['choices'][0]['text']

#Function for QnA over a Context - the context is pure string text
def stableQnA(question,contesto,maxtokens,model):
  """
  basic generation with StableLM / any llama.cpp loaded model
  question -> string
  contesto -> string, parsed page_content from document objects
  maxtokens -> int, number of max tokens to generate
  model -> llama-cpp-python instance // here is StableLM-Zephyr-3B
  RETURNS question, output -> str
  """
  context = contesto
  query = question
  import datetime
  start = datetime.datetime.now()
  template = f"""<|user|>\nGiven these text extracts:\n-----\n{context}\n-----\nPlease answer the question. Your answer must be informative and organized into bullet points. If the question is unanswerable, say \"unanswerable\".\nQuestion: {query}<|endoftext|>\n<|assistant|>"""
  with console.status("StableLM-Zephyr-3B AI is working ✅✅✅ ...",spinner="dots12"):
    output = model(
      template, # Prompt
      temperature=0.3,
      max_tokens=maxtokens,  # Generate up to maxtokens tokens
      stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
      echo=False        # Whether to echo the prompt
    )
    delta = datetime.datetime.now() - start
    console.print(f"[bright_green bold on black]Question: {query}")
    console.print(output['choices'][0]['text'])
    console.print(f"Completed in: [bold red]{delta}")
    return question, output['choices'][0]['text']
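
Both functions above build the same single-turn prompt layout by hand. A minimal sketch of a shared helper is shown below; `build_prompt` and `EOS_TOKEN` are hypothetical names introduced for illustration, and the instruction text is shortened compared with the notebook's templates.

```python
EOS_TOKEN = "<|endoftext|>"  # end-of-turn marker as used in the notebook's templates

def build_prompt(question, context=None):
    """Return a single-turn prompt in the <|user|> ... <|assistant|> layout."""
    if context is None:
        user_turn = question
    else:
        # Shortened instruction text; the notebook's templates are more detailed.
        user_turn = (
            "Given these text extracts:\n-----\n"
            f"{context}\n-----\n"
            "Please answer the question.\n"
            f"Question: {question}"
        )
    return f"<|user|>\n{user_turn}{EOS_TOKEN}\n<|assistant|>"

print(build_prompt("What is Science?"))
```

Keeping the layout in one place makes it easier to change the template for a different model without touching each QnA function.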

In [None]:
sum77 = summary77(splitted_text_qg, model77,'🎴LaMiniFlanT5-77M')
console.print(sum77)

Output()

In [None]:
qg77 = qgen77(splitted_text_qg, model77,'🎴🎴LaMiniFlanT5-77M')
console.print(Markdown(f"## Suggested Questions"))
for items in qg77:
  console.print(Markdown(f"- {items}"))

Output()
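
`qgen77` splits the generated text on question marks and strips a fixed three-character prefix. A regex-based variant of the same idea is sketched here. The helper name `parse_questions` is hypothetical and the sample string is made up for illustration, not actual model output.

```python
import re

def parse_questions(generated):
    """Split model output such as '1. ...? 2. ...?' into clean question strings."""
    # Split after each question mark that is followed by whitespace.
    parts = re.split(r"(?<=\?)\s+", generated.strip())
    questions = []
    for part in parts:
        # Drop a leading list number like "1." or "2)" if present.
        cleaned = re.sub(r"^\s*\d+[.)]\s*", "", part).strip()
        if cleaned:
            questions.append(cleaned)
    return questions

# Illustrative sample, not real LaMini output.
sample = "1. What changed in the EU App Store? 2. Why did Apple make the change?"
print(parse_questions(sample))
```

Unlike slicing with `i[3:]`, this only removes a prefix when it actually looks like list numbering, so unnumbered questions are left intact.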

In [None]:
console.print(Markdown(f"# Reply to the Question without Langchain"))
q1,o1 = stableQnA("What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?",editedtext,450,llm)

Output()

### Create a VectorDB with HuggingFace embeddings and Reranking QnA with StableLM-Zephyr

- createDB --> `db3 = createDB(docdocs, hf_embeddings, 'db-128-0-BGEBase')`
- QnA_Rerank_Stable --> `r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,3, docsum, question,llm)`

In [None]:
# @title
# Splitting into 128 tokens chunks
TOKENtext_splitter = TokenTextSplitter(chunk_size=128, chunk_overlap=0)
splitted_text_qna = TOKENtext_splitter.split_text(editedtext) #create a list

def createDB(docs, embeddings, dbname):
  """
  Create a Chroma vector store from the split documents
  with the provided embeddings, and save it locally.
  docs: list of LangChain Document objects
  embeddings: HuggingFace embeddings object
  dbname: string, the name of the persistent db
  RETURN  the Chroma db object
  """
  from langchain.vectorstores import Chroma
  import datetime
  start = datetime.datetime.now()
  db = Chroma.from_documents(docs, embeddings,persist_directory=f"./{dbname}")
  stop = datetime.datetime.now()
  delta = stop-start
  writehistory(f"{str(datetime.datetime.now())} Vector db generated in {delta}")
  return db

#Function for QnA from ChromaDB with Reranking and Summary Injection
def QnA_Rerank_Stable(db,k, summarization, query,model):
  """
  Return the generated answer to a similarity search query,
  after re-ranking the k retrieved elements and injecting the summary
  into the context of the prompt template.
  inputs:
  db -> ChromaDB object instance
  k -> number of hits for the similarity search (be aware of Max Context Length!)
       must be >= 3
  summarization -> LangChain Document object with the Summarization
  query -> string with the question for the similarity search
  model -> llama-cpp-python model instance
  return result: str generated answer by the model
         delta: timedelta, duration of retrieval plus generation
         query: the question string
         reordered_docs: list of LangChain Documents re-ranked, with the Summarization inserted
  """
  import datetime
  start = datetime.datetime.now()
  # Create a retriever
  retriever = db.as_retriever(search_kwargs={"k": k})
  from langchain.document_transformers import LongContextReorder
  # Get relevant documents ordered by relevance score
  context_set = retriever.get_relevant_documents(query)
  #print(str(context_set))
  # Reorder the documents:
  # Less relevant document will be at the middle of the list and more
  # relevant elements at beginning / end.
  reordering = LongContextReorder()
  reordered_docs = reordering.transform_documents(context_set)
  reordered_docs.insert(-1,summarization)
  #print(str(reordered_docs))
  # Concatenate the reordered docs (summary included) into a single context string
  context = ''
  for i in reordered_docs:
    context += i.page_content
  template = f"""<|user|>\nGiven these text extracts:\n-----\n{context}\n-----\nPlease answer the question. Your answer must be precise, informative and organized into bullet points. If the question is unanswerable, say \"unanswerable\".\nQuestion: {query}<|endoftext|>\n<|assistant|>"""
  with console.status("StableLM-Zephyr-3B AI is working ✅✅✅ ...",spinner="dots12"):
    output = model(
      template, # Prompt
      temperature=0.3,
      max_tokens=450,  # Generate up to 450 tokens
      stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
      echo=False        # Whether to echo the prompt
    )
  result = output['choices'][0]['text']
  delta = datetime.datetime.now() - start
  return result, delta, query, reordered_docs

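def evidenzia_res(keys, fulltext,summary):
  """
  Print fulltext + summary with the retrieved documents in keys highlighted.
  keys -> list of LangChain Documents to highlight
  RETURNS the rich Text object
  """
  from rich.text import Text
  mytest = fulltext + summary.page_content
  text = Text(mytest)
  for item in keys:
    l = len(item.page_content)
    x = mytest.find(item.page_content)
    if x == -1:
      continue  # chunk not found verbatim, nothing to highlight
    text.stylize("black on bright_yellow", x, (x+l))
  console.print(text)
  return text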

#Create the LangChain documents: one per text chunk, plus one for the summary
from langchain.schema.document import Document
docdocs = []
for i in range(0,len(splitted_text_qna)):
  docdocs.append(Document(page_content = splitted_text_qna[i],
                          metadata = {'source': '/content/2024-01-27 11.23.05 Apple s iOS App Store announces_edited.txt',
                              'title': "Apple's iOS App Store announces sweeping changes in the EU",
                              'author': 'Ashley Gold, author of Axios Pro',
                              'url' : 'https://www.axios.com/2024/01/25/apple-app-store-eu-changes',
                              }))
docsum = Document(page_content = sum77, metadata = {
                  'source': '/content/2024-01-27 11.23.05 Apple s iOS App Store announces_edited.txt',
                  'title': "Apple's iOS App Store announces sweeping changes in the EU",
                  'author': 'Ashley Gold, author of Axios Pro',
                  'url' : 'https://www.axios.com/2024/01/25/apple-app-store-eu-changes',
                  'type': 'summary'})
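
The comments in `QnA_Rerank_Stable` describe the reordering goal: the least relevant documents go to the middle of the list and the most relevant ones to the beginning and end. The pure-Python sketch below illustrates that idea. It is written for illustration and is not taken from LangChain's source; `reorder_for_long_context` is a hypothetical name and the rank labels are placeholders.

```python
def reorder_for_long_context(docs_by_relevance):
    """Place the most relevant items at both ends and the least relevant in the middle."""
    reordered = []
    # Walk from least to most relevant, alternating between front and back.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered

# Labels are illustrative: rank1 is the most relevant hit, rank5 the least.
hits = ["rank1", "rank2", "rank3", "rank4", "rank5"]
print(reorder_for_long_context(hits))
```

The motivation is that language models tend to use information at the start and end of a long context more reliably than information buried in the middle.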

In [None]:
db3 = createDB(docdocs, hf_embeddings, 'db-128-0-BGEBase')

In [None]:
from rich.markdown import Markdown
console.print(Markdown("## QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,3, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

Output()
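
The returned `rd1` list can be passed to `evidenzia_res` to highlight the retrieved chunks inside the full text. The span lookup that helper relies on (`str.find` plus the chunk length) can be sketched without rich, including the case where a chunk is not found and `find` returns -1. The name `find_chunk_spans` and the sample strings are hypothetical.

```python
def find_chunk_spans(fulltext, chunks):
    """Return (start, end) character offsets for each chunk found in fulltext."""
    spans = []
    for chunk in chunks:
        start = fulltext.find(chunk)
        if start == -1:
            continue  # chunk not present verbatim, nothing to highlight
        spans.append((start, start + len(chunk)))
    return spans

# Illustrative strings, not taken from the article.
text = "alpha beta gamma delta"
print(find_chunk_spans(text, ["beta", "delta", "omega"]))
```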

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,5, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

Output()
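
Raising `k` adds more chunks to the prompt, and the docstring of `QnA_Rerank_Stable` warns about the maximum context length. A rough budget check can be sketched as follows. The chunk size (128), `max_tokens` (450) and `n_ctx` (4096) mirror the notebook, while `summary_tokens` and `prompt_overhead` are assumed values and `fits_context` is a hypothetical helper. The splitter and the model use different tokenizers, so treat the result as an estimate.

```python
def fits_context(k, chunk_tokens, summary_tokens, prompt_overhead, max_new_tokens, n_ctx):
    """Check whether k chunks plus summary, prompt text and the answer fit in n_ctx tokens."""
    needed = k * chunk_tokens + summary_tokens + prompt_overhead + max_new_tokens
    return needed <= n_ctx, needed

# chunk_tokens, max_new_tokens and n_ctx mirror the notebook (128, 450, 4096);
# summary_tokens and prompt_overhead are assumed values for illustration.
for k in (3, 5, 9):
    ok, needed = fits_context(k, 128, summary_tokens=600, prompt_overhead=80,
                              max_new_tokens=450, n_ctx=4096)
    print(k, needed, ok)
```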

QnA without a Vector DB

In [None]:
q1,o1 = stableQnA("What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?",editedtext,450,llm)

Output()

---



# Further studies

### Full functions for QnA and Reranking with summary

```
# load from disk
db3 = Chroma(persist_directory="./chroma_db480tok-20", embedding_function=hf_embeddings)
```



In [None]:
#from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
#llm = HuggingFacePipeline(pipeline=model783)

## QnA_Rerank_Plus with 🧠💎 StableLM-Zephyr-3B using Reranking and Summary

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,9, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")



Output()

## Further studies section

### Details on loading a llama.cpp model

In [None]:
!huggingface-cli download TheBloke/stablelm-zephyr-3b-GGUF stablelm-zephyr-3b.Q5_K_S.gguf --local-dir . --local-dir-use-symlinks False

In [None]:
!wget -O stablelm-zephyr-3b.Q5_K_S.gguf "https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF/resolve/main/stablelm-zephyr-3b.Q5_K_S.gguf?download=true"



```
# Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="/content/stablelm-zephyr-3b.Q5_K_S.gguf",  # Download the model file first
  n_ctx=4096,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Simple inference example
output = llm(
  "<|user|>\n{prompt}<|endoftext|>\n<|assistant|>", # Prompt
  max_tokens=512,  # Generate up to 512 tokens
  stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True        # Whether to echo the prompt
)
```

Prompt template: `<|user|>\n{prompt}<|endoftext|>\n<|assistant|>`

In [None]:
#from langchain.callbacks.manager import CallbackManager
#from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
#from langchain_community.llms import LlamaCpp
from llama_cpp import Llama

llm = Llama(
  model_path="/content/stablelm-zephyr-3b.Q5_K_S.gguf",  # Download the model file first
  n_ctx=4096,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=2,            # The number of CPU threads to use, tailor to your system and the resulting performance
)

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Model metadata: {'general.file_type': '16', 'tokenizer.chat_template': "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}", 'tokenizer.ggml.unknown_token_id': '0', 'general.architecture': 'stablelm', 'general.name': 'source', 'stablelm.embedding_length': '2560', 'stablelm.context_length': '4096', 'stablelm.block_count': '32', 'stablelm.feed_forward_length': '6912', 'stablelm.use_parallel_residual': 'true', 'tokenize

- Usage example

In [None]:
q,o = stableQA("What is Science?",250,llm)

Output()

In [None]:
q1,o1 = stableQnA("What are the changes made by Apple to its iOS App Store in order to comply with Europe's Digital Markets Act?",editedtext,450,llm)

Output()

In [None]:
# "1. What are the iOS changes that will be implemented to ensure compliance with Europe's Digital Markets Act law?"
q1,o1 = stableQnA("1. What are the iOS changes that will be implemented to ensure compliance with Europe's Digital Markets Act law?",editedtext,450,llm)

Output()

In [None]:
# @title
def summary(textplitted, model, modelname):
  console.print(f'Using {modelname} for summarization task...')
  start = datetime.datetime.now()
  with console.status("Generating summary...",spinner="dots12"):
    summary =""
    for item in textplitted:
      text = item
      template_summary = f'''ARTICLE: {text}

      What is a one-paragraph summary of the above article?

      '''
      res = model(template_summary, temperature=0.3, repetition_penalty=1.3, max_length=300, do_sample=True)[0]['generated_text']
      #console.print(res)
      #console.print('---')
      summary = summary + res + '\n'
  delta = datetime.datetime.now() - start
  console.print('[bold]----------------------------')
  console.print(summary)
  console.print(f"[red1 bold]Full SUMMARY Completed in {delta}")
  return summary

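# model783 is not loaded earlier in this notebook, so load the 783M LaMini checkpoint first
# (assumed checkpoint name: MBZUAI/LaMini-Flan-T5-783M)
model783 = pipeline('text2text-generation', model="MBZUAI/LaMini-Flan-T5-783M")
sum783 = summary(splitted_text_sum, model783,'🎴🎴🎴LaMiniFlanT5-783M')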

In [None]:
def summary77(textplitted, model, modelname):
  writehistory(f"{str(datetime.datetime.now())} Using {modelname} for summarization task...")
  start = datetime.datetime.now()
  with console.status("Generating summary...",spinner="dots12"):
    summary ="SUMMARY:\n"
    for item in textplitted:
      text = item
      template_summary = f'''Text: {text}

Write a complete summary of the above text.
'''
      res = model(template_summary, temperature=0.3, repetition_penalty=1.3, max_length=300, do_sample=True)[0]['generated_text']
      #console.print(res)
      #console.print('---')
      summary = summary + res + '\n'
  delta = datetime.datetime.now() - start
  writehistory(f"{str(datetime.datetime.now())} [bold]----------------------------")
  writehistory(f"{str(datetime.datetime.now())} {summary}")
  writehistory(f"{str(datetime.datetime.now())} [red1 bold]Full SUMMARY Completed in {delta}")
  return summary
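
Both summary functions follow the same pattern: summarize each chunk on its own, then join the partial summaries. A minimal sketch of that pattern, decoupled from the model, could look like this. The names `summarize_chunks` and `fake_summarizer` are hypothetical, and the stand-in summarizer only keeps the first sentence instead of calling a LaMini pipeline.

```python
import datetime

def summarize_chunks(chunks, summarize_fn, header="SUMMARY:\n"):
    """Summarize each chunk on its own and join the partial summaries."""
    start = datetime.datetime.now()
    parts = [summarize_fn(chunk) for chunk in chunks]
    summary = header + "".join(part + "\n" for part in parts)
    return summary, datetime.datetime.now() - start

def fake_summarizer(chunk):
    # Stand-in for the model call (assumption): keep the first sentence only.
    return chunk.split(".")[0].strip() + "."

demo_summary, demo_elapsed = summarize_chunks(
    ["Apple changes iOS. More text.", "Spotify reacts. Details follow."],
    fake_summarizer,
)
print(demo_summary)
```

Passing the summarizer as a function makes it easy to swap the 77M and 783M models without duplicating the loop.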

In [None]:
# @title
sum77 = summary77(splitted_text_sum, model77,'🎴LaMiniFlanT5-77M')

Output()

In [None]:
sum77 = summary77(splitted_text_qg, model77,'🎴LaMiniFlanT5-77M')

Output()



---



---



In [None]:
def qgen77(textplitted, model, modelname):
  writehistory(f"{str(datetime.datetime.now())} Using {modelname} for Question generation task -  2 PER CHUNK...")
  start = datetime.datetime.now()
  quest2 = []
  with console.status("Generating Questions...",spinner="dots12"):
    for item in textplitted:
      text = item
      template_qg = f'''{text}\n\n
write two important questions about the above text.
Questions:
1.
2.
'''
      res = model(template_qg, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
      ed_res = res.replace('? ','?#')
      list_res = ed_res.split('#')
      for i in list_res:
        #a = i[3:]  REMOVED TRUNK on the list, generated by LaMini
        quest2.append(i[3:])

  writehistory(f"{str(datetime.datetime.now())} [bold]----------------------------")
  writehistory(f"{str(datetime.datetime.now())} {quest2}")
  return quest2
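
The parsing above splits on `'? '` and always drops the first three characters, which only works when the model really prefixes every question with `1. ` or `2. `. A more defensive variant, as a sketch only: `parse_questions` is a hypothetical helper, and the sample reply is made up.

```python
import re

def parse_questions(generated):
    """Return the questions found in a model reply, without list numbering."""
    # Split on whitespace that directly follows a question mark.
    parts = re.split(r"(?<=\?)\s+", generated.strip())
    questions = []
    for part in parts:
        # Drop a leading "1." / "2)" marker only when it is really there.
        cleaned = re.sub(r"^\s*\d+\s*[.)]\s*", "", part).strip()
        if cleaned.endswith("?"):
            questions.append(cleaned)
    return questions

sample = "1. What changes is Apple making? 2. Who is affected by them?"
print(parse_questions(sample))
```

Unnumbered replies are kept intact, and fragments that are not questions are discarded instead of being appended to the list.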

In [None]:
qg77 = qgen77(splitted_text_qg, model77,'🎴🎴LaMiniFlanT5-77M')

Output()



---



---



---



### not used

In [None]:
# @title
def qgen(textplitted, model, modelname):
  console.print(f'Using {modelname} for Question generation task -  2 PER CHUNK...')
  start = datetime.datetime.now()
  quest2 = []
  with console.status("Generating Questions...",spinner="dots12"):
    for item in textplitted:
      text = item
      template_qg = f'''{text}.\nAsk two relevant question about this text.
      '''
      res = model(template_qg, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
      ed_res = res.replace('? ','?#')
      list_res = ed_res.split('#')
      for i in list_res:
        #a = i[3:]  REMOVED TRUNK on the list, generated by LaMini
        quest2.append(i)

  console.print('[bold]----------------------------')
  console.print(quest2)
  return quest2

qg783 = qgen(splitted_text_qg, model783,'🎴🎴🎴LaMiniFlanT5-783M')

In [None]:
for question in qg77:
  context = editedtext
  query = question
  q1,o1 = stableQnA(query,context,450,llm)
  console.print('[bold]----------------------------')

Output()

Output()

Output()

Output()

Output()

Output()



---



---



---



In [None]:
def qnAgen(textplitted, summary, quest2, model, modelname):
  console.print(f"[bold blue]Generating replies with {modelname}...")
  console.print('[bold]----------------------------')
  i = 0
  for items in textplitted:
    context = items + summary
    question = quest2[i]
    template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
    res = model(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
    console.print(f"[bold blue]{question}")
    console.print(res)
    console.print('---')
    i += 1
    question = quest2[i]
    template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
    res = model(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
    console.print(f"[bold blue]{question}")
    console.print(res)
    console.print('---')
    i += 1


In [None]:
qnAgen(splitted_text_qg, sum77, qg783, model77,'🎴LaMiniFlanT5-77M')
## got an error since the 77M model cannot follow the instruction to generate 2 questions per chunk
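
`qnAgen` assumes exactly two questions per chunk, so the index runs past the end of the list as soon as a chunk yields fewer. One way around it, sketched under the assumption that the questions are kept grouped per chunk (a list of lists, which the question generation code does not currently produce), is to pair them explicitly. `pair_chunks_with_questions` is a hypothetical helper.

```python
def pair_chunks_with_questions(chunks, questions_per_chunk):
    """Build (chunk, question) pairs; questions_per_chunk[i] belongs to chunks[i]."""
    pairs = []
    for chunk, questions in zip(chunks, questions_per_chunk):
        for question in questions:
            # Skip empty strings left over from parsing the model reply.
            if question.strip():
                pairs.append((chunk, question.strip()))
    return pairs

chunks_demo = ["chunk A", "chunk B", "chunk C"]
questions_demo = [["Q1?", "Q2?"], ["Q3?"], []]
pairs_demo = pair_chunks_with_questions(chunks_demo, questions_demo)
for chunk, question in pairs_demo:
    print(chunk, "->", question)
```

A chunk with one question, or none at all, no longer breaks the loop.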

### Highlighter Function


In [None]:
# key = "notarization for iOS apps"
# mytest = "The iOS changes will include notarization for iOS apps, authorization for marketplace developers and disclosures on alternative payments."
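def evidenzia(key, fulltext):
  # Highlight the first occurrence of `key` inside `fulltext` and print it
  from rich.text import Text
  mytest = fulltext
  text = Text(mytest)
  l = len(key)
  x = mytest.find(key)
  # str.find returns -1 when the key is missing: skip the styling in that case
  if x >= 0:
    text.stylize("black on bright_yellow", x, (x+l))
  console.print(text)
  return text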

In [None]:
a = """Apple's sweeping changes to its iOS App Store in Europe have significant implications for smaller companies,
as Spotify recently previewed what it will look like after March 7."""
b = """for smaller companies,
as Spotify"""
s =evidenzia(b,a)
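
`str.find` only reports the first match, so a key that appears several times is highlighted once. A small sketch of collecting every match as `(start, end)` offsets, which could then be passed to `Text.stylize` one by one; `find_spans` is a hypothetical helper and the demo text is made up.

```python
def find_spans(key, fulltext):
    """Return (start, end) offsets of every occurrence of key in fulltext."""
    spans = []
    if not key:
        return spans
    start = fulltext.find(key)
    while start != -1:
        spans.append((start, start + len(key)))
        # Continue the search right after the match just found.
        start = fulltext.find(key, start + len(key))
    return spans

demo_text = "notarization for iOS apps; notarization again"
demo_spans = find_spans("notarization", demo_text)
print(demo_spans)
print(find_spans("sideloading", demo_text))
```

An empty list means the key is not in the text, so no styling needs to be applied.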

### Create a VectorDB with HuggingFace embeddings

In [None]:
TOKENtext_splitter = TokenTextSplitter(chunk_size=128, chunk_overlap=0)
splitted_text_qna = TOKENtext_splitter.split_text(editedtext) #create a list

In [None]:
def createDB(docs, embeddings, dbname):
  """
  Create a Chroma vector store from the split documents
  with the provided embeddings, and save it locally.
  docs: list of LangChain Document objects
  embeddings: HuggingFace embeddings object
  dbname: string, the name of the persistent db
  RETURN  the Chroma db object
  """
  from langchain.vectorstores import Chroma
  import datetime
  start = datetime.datetime.now()
  db = Chroma.from_documents(docs, embeddings,persist_directory=f"./{dbname}")
  stop = datetime.datetime.now()
  delta = stop-start
  writehistory(f"{str(datetime.datetime.now())} Vector db generated in {delta}")
  return db


In [None]:
from langchain.schema.document import Document
docdocs = []
for i in range(0,len(splitted_text_qna)):
  docdocs.append(Document(page_content = splitted_text_qna[i],
                          metadata = {'source': '/content/2024-01-27 11.23.05 Apple s iOS App Store announces_edited.txt',
                              'title': "Apple's iOS App Store announces sweeping changes in the EU",
                              'author': 'Ashley Gold, author of Axios Pro',
                              'url' : 'https://www.axios.com/2024/01/25/apple-app-store-eu-changes',
                              }))
docsum = Document(page_content = sum77, metadata = {
                  'source': '/content/2024-01-27 11.23.05 Apple s iOS App Store announces_edited.txt',
                  'title': "Apple's iOS App Store announces sweeping changes in the EU",
                  'author': 'Ashley Gold, author of Axios Pro',
                  'url' : 'https://www.axios.com/2024/01/25/apple-app-store-eu-changes',
                  'type': 'summary'})

In [None]:
db3 = createDB(docdocs, hf_embeddings, 'db-128-0-BGEBase')

Vector db generated in 0:00:05.348339


```
# load from disk
db3 = Chroma(persist_directory="./db-128-0-BGEBase", embedding_function=hf_embeddings)
```



### Full functions for QnA and Reranking with summary

In [None]:
#from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
#llm = HuggingFacePipeline(pipeline=model783)

In [None]:
def QnA_Rerank_Stable(db,k, summarization, query,model):
  """
  Return the generated answer to a similarity search query,
  re-ranking the k retrieved elements and injecting the summary
  into the context before running the QnA prompt.
  inputs:
  db -> ChromaDB object instance
  k -> number of hits for the similarity search (be aware of Max Context Length!)
       must be >= 3
  summarization -> LangChain Document object with the Summarization
  query -> string with the question for the similarity search
  model -> llama-cpp-python model instance
  return result: str generated answer by the model
         delta: timedelta, duration of retrieval plus generation
         query: the question string
         reordered_docs: list of LangChain Documents re-ranked with Summarization
  """
  import datetime
  start = datetime.datetime.now()
  # Create a retriever
  retriever = db.as_retriever(search_kwargs={"k": k})
  from langchain.document_transformers import LongContextReorder
  from langchain.chains import StuffDocumentsChain, LLMChain
  from langchain.prompts import PromptTemplate
  # Get relevant documents ordered by relevance score
  context_set = retriever.get_relevant_documents(query)
  #print(str(context_set))
  # Reorder the documents:
  # Less relevant document will be at the middle of the list and more
  # relevant elements at beginning / end.
  reordering = LongContextReorder()
  reordered_docs = reordering.transform_documents(context_set)
  reordered_docs.insert(-1,summarization)
  #print(str(reordered_docs))
  # We prepare and run a custom Stuff chain with reordered docs as context.
  # Override prompts
  context = ''
  for i in reordered_docs:
    context += i.page_content
  template = f"""<|user|>\nGiven this text extracts:\n-----\n{context}\n-----\nPlease answer the question. Your answer must be precise, informative and organized into bullet points. If the question is unanswerable, ""say \"unanswerable\".\nQuestion: {query}<|endoftext|>\n<|assistant|>"""
  with console.status("StableLM-Zephyr-3B AI is working ✅✅✅ ...",spinner="dots12"):
    output = model(
      template, # Prompt
      temperature=0.3,
      max_tokens=450,  # Generate up to 450 tokens
      stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
      echo=False        # Whether to echo the prompt
    )
  result = output['choices'][0]['text']
  delta = datetime.datetime.now() - start
  return result, delta, query, reordered_docs

def evidenzia_res(keys, fulltext,summary):
  # Highlight every retrieved chunk (and the summary) inside the full text
  from rich.text import Text
  mytest = fulltext + summary.page_content
  text = Text(mytest)
  for item in keys:
    l = len(item.page_content)
    x = mytest.find(item.page_content)
    # str.find returns -1 when the chunk is not found: skip it
    if x >= 0:
      text.stylize("black on bright_yellow", x, (x+l))
  console.print(text)
  return text
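
The reranking step can be pictured with plain lists. The sketch below reproduces a "lost in the middle" ordering (best hits at both ends, weakest in the middle) and then inserts the summary with `insert(-1, ...)`, so that it ends up at position -2. This is my understanding of what `LongContextReorder` does, so treat the ordering function as an assumption rather than the library's code; both helper names are hypothetical.

```python
def lost_in_the_middle_order(docs_by_relevance):
    """Place the most relevant items at both ends and the weakest in the middle."""
    reordered = []
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered

def add_summary(reordered, summary):
    """Insert the summary just before the last item (position -2 afterwards)."""
    with_summary = list(reordered)
    with_summary.insert(-1, summary)
    return with_summary

hits = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the best match
context_order = add_summary(lost_in_the_middle_order(hits), "SUMMARY")
print(context_order)
```

The summary therefore sits near the end of the prompt, right before the second most relevant chunk.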

## QnA_Rerank_Stable with 🧠💎 StableLM-Zephyr-3B using Reranking and Summary

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,3, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

Output()

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,5, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

Output()

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
question = "What are the major changes Apple is implementing in the EU App Store in response to Europe's Digital Markets Act?"
r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,9, docsum, question,llm)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")



Output()
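
The three runs above differ only in `k` (3, 5 and 9 chunks of 128 tokens each). Since the docstring warns about the maximum context length, a rough budget check can help when picking `k`. The sketch below is arithmetic only: `context_budget` is a hypothetical helper, and the summary size, prompt overhead and `n_ctx` values are placeholders, not measurements from this notebook.

```python
def context_budget(k, chunk_tokens, summary_tokens, prompt_overhead,
                   max_new_tokens, n_ctx):
    """Return how many tokens remain in the window after prompt and generation."""
    prompt_tokens = k * chunk_tokens + summary_tokens + prompt_overhead
    return n_ctx - (prompt_tokens + max_new_tokens)

for k in (3, 5, 9):
    # summary_tokens, prompt_overhead and n_ctx are assumed values.
    left = context_budget(k, chunk_tokens=128, summary_tokens=300,
                          prompt_overhead=80, max_new_tokens=450, n_ctx=4096)
    print(f"k={k}: {left} tokens left")
```

A negative result would mean the prompt plus the requested generation no longer fits in the model window.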



---



---



---



In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🧠💎 StableLM-Zephyr-3B with RERANKING and Summary"))
for question in qg77:
  r1,t1,q1,rd1 = QnA_Rerank_Stable(db3,2, docsum, question,llm)
  console.print(f"[bold red1]Question: {q1}")
  console.print(r1)
  console.print("---")
  console.print(f"generated in {t1}")

Output()

Output()

Output()

Output()

Output()

Output()

In [None]:
a = evidenzia_res(rd1,editedtext,docsum)



---



---



---



In [None]:
r1,t1,q1,rd1 = QnA_Rerank_Plus(db3,2, docsum, qg783[-1][3:],model783)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

In [None]:
r1,t1,q1,rd1 = QnA_Rerank_Plus(db3,2, docsum, qg783[-1][3:],model77)
console.print(f"[bold red1]Question: {q1}")
console.print(r1)
console.print("---")
console.print(f"generated in {t1}")

## QnA_Rerank_Plus with LaMini Models using Reranking and Summary

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🦙 **783M LaMini**"))
for question in qg783:
  r1,t1,q1,rd1 = QnA_Rerank_Plus(db3,2, docsum, question,model783)
  console.print(f"[bold red1]Question: {q1}")
  console.print(r1)
  console.print("---")
  console.print(f"generated in {t1}")

In [None]:
from rich.markdown import Markdown
console.print(Markdown("# QnA GENERATED BY 🦙 77M LaMini"))
for question in qg783:
  r1,t1,q1,rd1 = QnA_Rerank_Plus(db3,2, docsum, question,model77)
  console.print(f"[bold red1]Question: {q1}")
  console.print(r1)
  console.print("---")
  console.print(f"generated in {t1}")

### Highlight the chunks from the entire text

In [None]:
def evidenzia_res(keys, fulltext,summary):
  # Same highlighter, with the summary appended under its own header
  from rich.text import Text
  mytest = fulltext + '\n\nSUMMARY:\n' + summary.page_content
  text = Text(mytest)
  for item in keys:
    l = len(item.page_content)
    x = mytest.find(item.page_content)
    # str.find returns -1 when the chunk is not found: skip it
    if x >= 0:
      text.stylize("black on bright_yellow", x, (x+l))
  console.print(text)
  return text

In [None]:
a = evidenzia_res(rd1,editedtext,docsum)

## Old stuff

### Single test done by me... the LLMChain pipeline sucks

In [None]:
context = ''
for i in rd1:
  context += i.page_content
query = qg77[0]
mypronmp = f"""Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".

{context}

{query}"""

In [None]:
res = model783(mypronmp, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
console.print(f"[bold red1]Question: {query}")
console.print(res)

In [None]:
context = ''
for i in rd1:
  context += i.page_content

In [None]:
console.print(rd1)

### Old tests

In [None]:
## Load  MBZUAI/LaMini-Flan-T5-77M for summarization
with console.status("Loading ✅ LaMini77M...",spinner="dots12"):
    model77 = pipeline('text2text-generation',model="MBZUAI/LaMini-Flan-T5-77M")
console.print('Loaded  MBZUAI/LaMini-Flan-T5-77M for summarization')

Output()


In [None]:
console.print(f"[bold blue]Generating replies with chatT5...")
console.print('[bold]----------------------------')
i = 0
for items in splitted_text:
  context = items + summary
  question = quest2[i]
  template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
  res = model77(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
  console.print(f"[bold blue]{question}")
  console.print(res)
  console.print('---')
  i += 1
  question = quest2[i]
  template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
  res = model77(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
  console.print(f"[bold blue]{question}")
  console.print(res)
  console.print('---')
  i += 1

Token indices sequence length is longer than the specified maximum sequence length for this model (525 > 512). Running this sequence through the model will result in indexing errors


In [None]:
context = splitted_text[-1] + summary
question = quest2[-1] #"What is Artificial Intelligence?"
template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
res = chatT5(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
console.print(f"[bold blue]{question}")
console.print(res)
console.print('---')

I cannot find anything about notarization. The replies of the 2 models are completely different...

In [None]:
#HIGHLIGHTING A CHUNK WORKS!!!!!
s =evidenzia(splitted_text[1],editedtext)

In [None]:
s =evidenzia("notarization",editedtext)

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import TokenTextSplitter
TOKENtext_splitter = TokenTextSplitter(chunk_size=280, chunk_overlap=20)
splitted_text = TOKENtext_splitter.split_text(editedtext) #create a list

In [None]:
console.print(len(splitted_text))
console.print("---")
console.print(splitted_text[0])

In [None]:
console.print('Using model77 for summarization task...')
start = datetime.datetime.now()
with console.status("Generating summary...",spinner="dots12"):
  summary =""
  for item in splitted_text:
    text = item
    template_summary = f'''ARTICLE: {text}

    What is a one-paragraph summary of the above article?

    '''
    res = model77(template_summary, temperature=0.3, repetition_penalty=1.3, max_length=300, do_sample=True)[0]['generated_text']
    #console.print(res)
    #console.print('---')
    summary = summary + res + '\n'
delta = datetime.datetime.now() - start
console.print('[bold]----------------------------')
console.print(summary)
console.print(f"[red1 bold]Full SUMMARY Completed in {delta}")

Output()

### 3️⃣❓🎰 Generation

In [None]:
console.print('Using model783 for Question generation task...')
start = datetime.datetime.now()
quest3 = []
with console.status("Generating Questions...",spinner="dots12"):
  for item in splitted_text:
    text = item
    template_qg = f'''{text}.\nAsk three relevant question about this text.
    '''
    res = model783(template_qg, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
    ed_res = res.replace('? ','?#')
    list_res = ed_res.split('#')
    for i in list_res:
      a = i[3:]
      quest3.append(a)

console.print('[bold]----------------------------')
console.print(quest3)

Output()

### 2️⃣❓🎰 Generation

In [None]:
console.print('Using model783 for Question generation task...')
start = datetime.datetime.now()
quest2 = []
with console.status("Generating Question...",spinner="dots12"):
  for item in splitted_text:
    text = item
    template_qg = f'''{text}.\nAsk two relevant question about this text.
    '''
    res = model783(template_qg, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
    ed_res = res.replace('? ','?#')
    list_res = ed_res.split('#')
    for i in list_res:
      a = i[3:]
      quest2.append(a)

console.print('[bold]----------------------------')
console.print(quest2)

Output()

### 🧠🗣️❓ Question Answering

In [None]:
context = splitted_text[0]
question = quest2[0]
template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
res = model783(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
console.print(f"[bold blue]{question}")
console.print(res)

In [None]:
context = splitted_text[0] + summary
question = quest2[1]
template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
res = model783(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
console.print(f"[bold blue]{question}")
console.print(res)

In [None]:
context = splitted_text[0] + summary
question = quest2[1]
template_qna = f'''Read this and answer the question. Your answer must be informative. If the question is unanswerable, ""say \"unanswerable\".\n\n{context}\n\n{question}'''
res = model783(template_qna, temperature=0.4, repetition_penalty=1.3, max_length=250, do_sample=True)[0]['generated_text']
console.print(f"[bold blue]{question}")
console.print(res)

In [None]:
from rich.text import Text

text = Text(splitted_text[0])  # Text expects a string, not the whole list of chunks
#text.stylize("bright_yellow on black", 0, 6)
#console.print(text)




### section too deep with keyword store

In [None]:
keys = []
for i in trange(0,len(splitted_text)):
  text = splitted_text[i]
  keys.append({'document' : filename,
              'title' : title,
              'author' : author,
              'url' : url,
              'doc': text,
              'keywords' : extract_keys(text, 1, 0.34)
  })

In [None]:
console.print(keys[1])

### Create LangChain Documents with full metadata

In [None]:
def SMS(fulltext, llmpipeline, chunks, overlap, documentfilename, title, author,docurl):
  """
  SMS aka Summarize Metadata Suggest
  documentfilename -> str,  'extracted from fileupload',
  title -> str,  title of the document 'input in the GUI',
  author -> str,  author of the document 'input from the GUI',
  docurl -> str,  url of the document 'if any from GUI anyway',
  ---
  Description
  Function that takes a long text string and summarizes it, also returning
  suggested QnA from the process.
  The fulltext is split into tokens according to the chunks and overlap specified
  inputs:
  llmpipeline -> a transformers pipeline instance for the text generation
  fulltext -> string
  chunks, overlap -> integers
  ---
  returns:
  # sum_db contains all original file metadata and the ones from GUI user input
  # qna is a list of dicts with EDA in the format {'question':item,'answer': res}
  # keys is a list of dicts ready for LangChain.Document creation with metadata
  """
  from langchain.document_loaders import TextLoader
  from langchain.text_splitter import TokenTextSplitter
  TOKENtext_splitter = TokenTextSplitter(chunk_size=chunks, chunk_overlap=overlap)
  sum_context = TOKENtext_splitter.split_text(fulltext) #create a list
  model77 = llmpipeline
  final = ''
  keys = []
  strt = datetime.datetime.now()
  for i in trange(0,len(sum_context)):
    text = sum_context[i]
    template_bullets = f'''ARTICLE: {text}

    What is a one-paragraph summary of the above article?

    '''
    res = model77(template_bullets, temperature=0.3, repetition_penalty=1.3, max_length=400, do_sample=True)[0]['generated_text']
    final = final + ' '+ res+'\n'
    keys.append({'document' : documentfilename,
                'title' : title,
                'author' : author,
                'url' : docurl,
                'doc': text,
                'keywords' : extract_keys(text, 2, 0.34)
    })
  ## REMOVED REWRITING FUNCTION ####
  tags_summary = extract_keys(final, 1, 0.34)
  elaps = datetime.datetime.now() - strt
  console.print(Markdown("## SUMMARY"))
  console.print(Markdown(final))
  console.print(Markdown("### METATAGS"))
  console.print(Markdown(str(tags_summary)))
  console.print("[green2 bold]---")
  console.print(f"[red1 bold]Full RAG PREPROCESSING Completed in {elaps}")
  console.print(Markdown("---"))
  logger = "# SUMMARY and METATAGS\n---\n#SUMMARY\n" + final + "#MetaTAGS: "+ str(tags_summary) +"\n---\n\n"
  writehistory(logger)

  # Generate Suggested questions from the text
  # Then Reply to the questions
  console.print(f"[green2 bold]Generating Qna...")
  finalqna = ''
  strt = datetime.datetime.now()
  for i in trange(0,len(sum_context)):
    text = sum_context[i]
    template_final = f'''{text}.\nAsk few question about this article.
  '''
    res = model77(template_final, temperature=0.3, repetition_penalty=1.3, max_length=400, do_sample=True)[0]['generated_text']
    finalqna = finalqna + '\n '+ res

  delt = datetime.datetime.now()-strt
  console.print(Markdown("---"))
  console.print(f"[red1 bold]Questions generated in {delt}")
  lst = finalqna.split('\n')
  final_lst = []
  for items in lst:
    if items == '':
      pass
    else:
      final_lst.append(items)

  qna = []
  for item in final_lst:
    question = item

    template_qna = f'''Read this and answer the question. If the question is unanswerable, ""say \"unanswerable\".\n\n{final}\n\n{question}
    '''

    start = datetime.datetime.now()
    res = model77(template_qna, temperature=0.3, repetition_penalty=1.3, max_length=400, do_sample=True)[0]['generated_text']
    elaps = datetime.datetime.now() - start
    """
    console.print(f"[bold deep_sky_blue1]{question}")
    console.print(Markdown(res))
    console.print(f"[red1 bold]Qna Completed in {elaps}")
    console.print(Markdown("---"))
    """
    qna.append({'question':item,
                'answer': res})
    logger = "QUESTION: " + question + "\nANSWER: "+ res + "\n---\n\n"
    writehistory(logger)
  sum_db = {'document' : documentfilename,
            'title' : title,
            'author' : author,
            'url' : docurl,
            'summary' : final,
            'keywords' : tags_summary,
             'qna' : qna}
  # sum_db contains all original file metadata and the ones from GUI user input
  # qna is a list of dicts with EDA questions and answers
  # keys is a list of dicts ready for LangChain.Document creation with metadata
  return sum_db, qna, keys

#Function to club the ones above

def start_SMS(fulltext, documentfilename, title, author,url):
  console.print("Starting TXT Summarization...")
  global SUMfinal
  global QNASet
  global DOCSkeys
  SUMfinal,QNASet, DOCSkeys = SMS(fulltext, model77, 450, 10, documentfilename, title, author,url)
  console.print("Process Completed!")
  return SUMfinal

def printQNA(produced_qna):
  head = """### Generated Qna<br><br>"""
  qnaset = " "
  for item in produced_qna:
    temp = f"""
> **Question: {item['question']}**<br>
> *Answer: {item['answer']}*<br>
"""
    qnaset = qnaset + temp
  finalQnA = head + qnaset
  console.print(Markdown(finalQnA))
  return finalQnA

def start_search(b):
  global fulltext
  with open(b, encoding="utf8") as f:
    fulltext = f.read()
  console.print("[bold deep_sky_blue1]text has been captured")
  console.print(Markdown("---"))
  console.print("Saved in variable `fulltext`")
  statmessage = "Operation Successful\nClick on Start Summarization ↗️"
  return fulltext

def generate(instruction):
  prompt = instruction
  start = datetime.datetime.now()
  with console.status("AI is thinking...",spinner="dots12"):
    output = model77(prompt, temperature=0.3, repetition_penalty=1.3, max_length=400, do_sample=True)[0]['generated_text']
  delta = datetime.datetime.now()-start
  console.print(f"[green1 italic]:two_o’clock: generated in {delta}")
  console.print(f"[bold bright_yellow]🦙 LaMini-Flan-77M: {output}")
  return output, delta

In [None]:
fulltext = start_search('/content/2023-12-03 18.41.12 Governing societies with Artificial Intelligence .txt')

In [None]:
SUMfinal,QNASet, DOCSkeys = SMS(fulltext, model783, 350, 10,
                                '2023-12-01 09.00.47 Governing societies with Artificial Intelligence .txt',
                                'Governing societies with Artificial Intelligence',
                                'Giles Crouch','https://gilescrouch.medium.com/governing-society-and-artificial-intelligence-23882b9ce473')
console.print("Process Completed!")

Output()

Output()

In [None]:
questans = printQNA(QNASet)

In [None]:
console.print(SUMfinal)

In [None]:
console.print(DOCSkeys)

In [None]:
############### CREATE SUMMARY DOC DATABASE ##################
"""
FROM SUMfinal
    'document': '2023-12-01 09.00.47 Governing societies with Artificial Intelligence .txt',
    'title': 'Governing societies with Artificial Intelligence',
    'author': 'Giles Crouch',
    'url': 'https://gilescrouch.medium.com/governing-society-and-artificial-intelligence-23882b9ce473',
    'summary': " ...",
    'keywords': ['ai', 'article', 'society', 'enlightenment', 'importance'],
"""
from langchain.schema.document import Document
docsum = Document(page_content = SUMfinal['summary'],
                  metadata = {'source': SUMfinal['document'],
                              'type': 'summary',
                              'title': SUMfinal['title'],
                              'author': SUMfinal['author'],
                              'url' : SUMfinal['url'],
                              'keywords' : SUMfinal['keywords']
                              })

In [None]:
############### CREATE cHUnKS DOC DATABASE ##################
"""
FROM DOCSkeys
{
        'document': '2023-12-01 09.00.47 Governing societies with Artificial Intelligence .txt',
        'title': 'Governing societies with Artificial Intelligence',
        'author': 'Giles Crouch',
        'url': 'https://gilescrouch.medium.com/governing-society-and-artificial-intelligence-23882b9ce473',
        'doc': ' as inherently true, ..ought that through beyond the narrow',
        'keywords': ['ai', 'innovation', 'societal', 'industry', 'slowing']
    }

"""
from langchain.schema.document import Document
docdocs = []
for i in range(0,len(DOCSkeys)):
  docdocs.append(Document(page_content = DOCSkeys[i]['doc'],
                          metadata = {'source': DOCSkeys[i]['document'],
                              'type': 'chunk',
                              'title': DOCSkeys[i]['title'],
                              'author': DOCSkeys[i]['author'],
                              'url' : DOCSkeys[i]['url'],
                              'keywords' : DOCSkeys[i]['keywords']
                              }))

In [None]:
console.print(docdocs)

### Save and Load DATASET to and from PICKLE file

In [None]:
import pickle
## SAVE IN PICKLE THE DOCUMENTS SET WITH METADATA
with open('DocumentData.pkl', 'wb') as output:
  pickle.dump(docdocs, output)
console.print(Markdown("> Documents Data saved..."))
console.print(" - ")
## SAVE IN PICKLE THE SUMMARY WITH METADATA
with open('summaryData.pkl', 'wb') as output:
  pickle.dump(docsum, output)
console.print(Markdown("> Summary Data saved..."))

In [None]:
def SaveFirstData(datafilename,datadocs, summaryfilename, sumdocs):
  import pickle
  ## SAVE IN PICKLE THE DOCUMENTS SET WITH METADATA
  output = open(datafilename, 'wb')
  pickle.dump(datadocs, output)
  output.close()
  console.print(Markdown(f"> Documents Data saved in {datafilename}..."))
  console.print(" - ")
  ## SAVE IN PICKLE THE SUMMARY WITH METADATA
  output = open(summaryfilename, 'wb')
  pickle.dump(sumdocs, output)
  output.close()
  console.print(Markdown(f"> Summary Data saved in {summaryfilename}..."))

In [None]:
import pickle
## LOAD DOCS PICKLE
with open('/content/DocumentData.pkl', 'rb') as pkl_file:
  data_docs = pickle.load(pkl_file)
console.print("Documents Loaded from Pickle")
console.print(Markdown("---"))
console.print(data_docs)
console.print(Markdown("---"))



In [None]:
import pickle
## LOAD SUMS PICKLE
with open('/content/summaryData.pkl', 'rb') as pkl_file:
  sums_docs = pickle.load(pkl_file)
console.print("Summaries Loaded from Pickle")
console.print(Markdown("---"))
console.print(sums_docs)
console.print(Markdown("---"))

### TODO: a new function to stack older data into new data
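
This function is not written yet. As a starting point, here is a minimal sketch of what it might look like, assuming the pickle files hold plain Python lists (as `docdocs` and `docsum` do above). The name `stack_pickled_docs` is hypothetical.

```python
import os
import pickle

def stack_pickled_docs(filename, new_docs):
    """Append `new_docs` to the list already stored in `filename` (if any) and save the result."""
    old_docs = []
    if os.path.exists(filename):
        # Load the previously saved list, if there is one
        with open(filename, 'rb') as f:
            old_docs = pickle.load(f)
    merged = list(old_docs) + list(new_docs)
    # Overwrite the file with the stacked list
    with open(filename, 'wb') as f:
        pickle.dump(merged, f)
    return merged

# Possible usage (assumes docdocs exists as in the cells above):
# all_docs = stack_pickled_docs('DocumentData.pkl', docdocs)
```

The sketch does no de-duplication, so running it twice with the same documents stores them twice.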

### TODO: a new function to join more than one Chroma DB
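
This one is also still to be written. A possible sketch follows, assuming each store exposes `get()` (returning a dict with `ids`, `documents` and `metadatas`) and `add_texts(texts, metadatas, ids)`, as the LangChain Chroma wrapper does. The name `join_vector_stores` is hypothetical, and the texts are re-embedded by the target store rather than copied with their vectors.

```python
def join_vector_stores(target, sources):
    """Copy every text, metadata and id from each store in `sources` into `target`.

    Returns the number of records copied.
    """
    added = 0
    for source in sources:
        data = source.get()  # expected keys: 'ids', 'documents', 'metadatas'
        ids = data.get("ids") or []
        if not ids:
            continue  # nothing stored in this source
        target.add_texts(texts=data["documents"],
                         metadatas=data["metadatas"],
                         ids=ids)
        added += len(ids)
    return added

# Possible usage (db_a and db_b are assumed to be existing Chroma objects):
# join_vector_stores(db3, [db_a, db_b])
```

All stores should use the same embedding model, otherwise similarity scores across the joined data are not comparable.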

### Create LangChain Documents and Vector Store index

In [None]:
# Chunking at 200 tokens, no overlap
from langchain.text_splitter import TokenTextSplitter
TOKENtext_splitter = TokenTextSplitter(chunk_size=200, chunk_overlap=0)
sum_context = TOKENtext_splitter.split_text(fulltext) #create a list
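
To make `chunk_size` and `chunk_overlap` concrete, here is a small pure-Python sketch of fixed-size chunking. It uses whitespace-separated words as stand-in tokens, whereas `TokenTextSplitter` works on tokenizer tokens, so the boundaries will differ on real text. `chunk_tokens` is a hypothetical helper for illustration only.

```python
def chunk_tokens(tokens, chunk_size=200, chunk_overlap=0):
    """Split a list of tokens into windows of `chunk_size`, sharing `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the text
        start += chunk_size - chunk_overlap
    return chunks

words = "one two three four five six seven".split()
print(chunk_tokens(words, chunk_size=3, chunk_overlap=0))
# [['one', 'two', 'three'], ['four', 'five', 'six'], ['seven']]
```

With no overlap every token lands in exactly one chunk, which keeps the index small but can cut a sentence in two at a chunk boundary.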

In [None]:
def createDB(docs, embeddings, dbname):
  """
  Create a Chroma vector store from the split documents
  with the provided embeddings, and save it locally.
  docs: list of LangChain Document objects
  embeddings: HuggingFace embeddings object
  dbname: string, the name of the persistent db (stored in ./dbname)
  RETURN  the Chroma db object
  """
  from langchain.vectorstores import Chroma
  import datetime
  start = datetime.datetime.now()
  db = Chroma.from_documents(docs, embeddings,persist_directory=f"./{dbname}")
  stop = datetime.datetime.now()
  delta = stop-start
  print(f"Vector db generated in {delta}")
  return db


In [None]:
db3 = createDB(docdocs, hf_embeddings, 'db-200-0MULTI')

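```
# load from disk
# persist_directory must be the folder written by createDB, i.e. ./<dbname>
from langchain.vectorstores import Chroma
db3 = Chroma(persist_directory="./chroma_db480tok-20", embedding_function=hf_embeddings)
```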



In [None]:
dataset = CreateRAGAS_DF(QNASet,CONTEXTSet,SUMfinal,llm)

In [None]:
console.print(dataset)

### Save and Load DATASET to and from PICKLE file

In [None]:
import pickle

# Save the RAGAS dataset; the with-block closes the file automatically
with open('dataset.pkl', 'wb') as output:
  pickle.dump(dataset, output)

In [None]:
import pickle

# Load the RAGAS dataset back and display it
with open('dataset.pkl', 'rb') as pkl_file:
  data2 = pickle.load(pkl_file)
console.print(data2)



---






In [None]:
!pip install transformers -U --no-cache-dir

Collecting transformers
  Downloading transformers-4.37.1-py3-none-any.whl (8.4 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.4/8.4 MB[0m [31m21.2 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.35.2
    Uninstalling transformers-4.35.2:
      Successfully uninstalled transformers-4.35.2
Successfully installed transformers-4.37.1


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-1_6b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablelm-2-1_6b",
  trust_remote_code=True,
  torch_dtype="auto",
)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)


In [None]:
prompt = "what is artificial intelligence?"
o = pipe(prompt, temperature=0.3,
         repetition_penalty=1.3,
         max_new_tokens=200,
         do_sample=True)[0]['generated_text']


Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.


In [None]:
print(o)

what is artificial intelligence? Top Answer: Artificial Intelligence (AI) refers to the study of machines that can perform tasks such as learning, reasoning and problem solving. AI systems are designed with a goal in mind - for example they may be used by businesses or governments to make decisions about how best to use resources.
Is this text about Society & Culture? Yes or No?
No
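
The generated text above starts with an echo of the prompt. The text-generation pipeline accepts `return_full_text=False` to return only the continuation. Alternatively, the echo can be removed afterwards, as in the small sketch below; `strip_prompt` is a hypothetical helper, not part of transformers.

```python
def strip_prompt(prompt, generated_text):
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

sample = "what is artificial intelligence? Top Answer: Artificial Intelligence (AI) refers to the study of machines"
print(strip_prompt("what is artificial intelligence?", sample))
# Top Answer: Artificial Intelligence (AI) refers to the study of machines
```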
