### Install Necessary Packages

In [1]:
!pip -q install langchain
!pip -q install torch
!pip -q install sentence_transformers
!pip -q install faiss-cpu
!pip -q install huggingface-hub
!pip -q install pypdf
!pip -q install accelerate
!pip -q install llama-cpp-python
!pip -q install git+https://github.com/huggingface/transformers


  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


**LangChain** is a framework designed to simplify the creation of applications using large language models. As a language model integration framework, its use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

**PyTorch** is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It was originally developed by Meta AI and is now part of the Linux Foundation umbrella.

**SentenceTransformers** is a Python framework for state-of-the-art sentence, text and image embeddings.

**Faiss** is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. It is developed by Facebook AI Research.

The **Hugging Face Hub** is an online platform with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source and publicly available, where people can easily collaborate and build ML together. The Hub works as a central place where anyone can explore, experiment, collaborate, and build technology with machine learning.

**pypdf** is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

**Accelerate** is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code! In short, training and inference at scale made simple, efficient and adaptable.

**llama-cpp-python** is a Python binding for llama.cpp.
It supports inference for many LLMs, which can be downloaded from Hugging Face.

**Transformers** provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied to:

- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.

Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on the Hugging Face model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.

### Download Mistral 7B Model
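Get the download link for the GGUF model file from its source page and substitute it for `LINK HERE` below. The model-loading cell further down expects the file at `/content/mistral-7b-instruct-v0.1.Q4_K_M.gguf`.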

In [None]:
!wget 'LINK HERE'

### Import Necessary LangChain Classes


In [5]:
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFDirectoryLoader

### Load PDF Files


In [26]:
# Load every PDF found in the Data/ directory (one Document per page)
loader = PyPDFDirectoryLoader("Data/")
data = loader.load()

In [27]:
print(data)

[Document(page_content=" \n   Research Project : Transformer -based approach for bug identification, summarization  \nand classification  \n Software bug reports are crucial in software maintenance and evolution, with concise summaries considerably enhancing the efficacy of bug triggers and ultimately contributing to developing high- quality software products  [1, 3]. Due to the emergence of social media, there \nis an exponential increase in the demand for developing various software applications targeting different application domains. Moreover, all the popular software applications, such as Bugzilla, Facebook, etc., constantly release new versions of the applications, making the software management and evolution process challenging. Recently, there has been an increase in the bugs reported by the end- user across various social media platforms for a particular \napplication. For example, based on a study published in 2013, Bugzilla receives 135 bug reports daily, which might be incr
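Printing the whole list is hard to read. A minimal sketch of a more compact summary follows, assuming each loaded item exposes a `metadata` dict with a `source` key (the `Document` output further down shows that layout). The helper name `pages_per_source` is hypothetical, and the stand-in objects only mimic the metadata shape so that the snippet runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_source(documents):
    """Count how many loaded pages came from each source file."""
    return Counter(doc.metadata["source"] for doc in documents)


# Stand-in objects (made up) that mimic the metadata layout of loaded pages.
sample_docs = [
    SimpleNamespace(metadata={"source": "Data/a.pdf", "page": 0}),
    SimpleNamespace(metadata={"source": "Data/a.pdf", "page": 1}),
    SimpleNamespace(metadata={"source": "Data/b.pdf", "page": 0}),
]

for source, count in pages_per_source(sample_docs).items():
    print(f"{source}: {count} page(s)")
```

In the notebook itself, calling `pages_per_source(data)` should give the same kind of summary for the loaded PDFs.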

### Split Files into Chunks

In [29]:
# chunk_size and chunk_overlap are measured in characters by default
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=20)

text_chunks = text_splitter.split_documents(data)


In [30]:
len(text_chunks)

7

In [31]:
text_chunks[1]

Document(page_content='Supervised by: Javed Ali Khan an d Muham mad Yaqoob  \n \n References  \n \n1. Khan JA, Liu L,Wen L. Requirements knowledge acquisition from online user forums. IET Softw. 2020;14(3):242- 253. \n2. Ullah T, Khan JA, Khan ND, Yasin A, Arshad H. Exploring and mining rationale information for low -rating software applications. Soft Comput. 2023;27:1- 26. \n3. Khan JA,Yasin A, Fatima R,VasanD,Khan AA, KhanAW.Valuating requirements arguments in the online user’s forum for requirements decision- making:  the CrowdRE -\nVArg framework. Software: Practice and Experience. John Wiley & Sons Ltd; 2022. \n4. Khan JA, Xie Y, Liu L, Wen L. Analysis of requirements -related arguments in user \nforums. Proceeding of 2019 IEEE 27th International Requirements Engineering Conference (RE). IEEE; 2019:63- 74.', metadata={'source': 'Data/Software_Explainability_Khan.pdf', 'page': 1})
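To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one also tries to break on separators such as paragraphs and line breaks); it only illustrates how the two parameters interact. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last two characters of the previous one, so a sentence that straddles a boundary is less likely to be lost entirely. With `chunk_overlap=20` on 5000-character chunks, as in this notebook, that shared region is small relative to the chunk.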

### Initialise Embeddings

In [32]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


### Create Embeddings for Each Text Chunk

In [33]:
vector_store = FAISS.from_documents(text_chunks, embedding=embeddings)
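Conceptually, the vector store holds one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below does this by brute force with squared Euclidean distance, one common choice of metric, on toy 3-dimensional vectors. It illustrates the idea only: the vectors, labels, and the `nearest_chunks` helper are made up, and Faiss's own index structures are far more efficient.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(query_vec, stored, k):
    """Return the labels of the k stored vectors closest to query_vec."""
    ranked = sorted(stored, key=lambda item: squared_l2(query_vec, item[1]))
    return [label for label, _ in ranked[:k]]


# Toy (label, embedding) pairs; real embeddings have hundreds of dimensions.
stored_vectors = [
    ("chunk about bug reports", [0.9, 0.1, 0.0]),
    ("chunk about references", [0.0, 0.2, 0.9]),
    ("chunk about transfer learning", [0.7, 0.6, 0.1]),
]
query_vector = [0.8, 0.3, 0.0]

print(nearest_chunks(query_vector, stored_vectors, k=2))
```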

### Load Model

In [None]:

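llm = LlamaCpp(
    streaming=True,  # emit tokens as they are generated
    model_path="/content/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # local GGUF file
    temperature=0.75,  # sampling temperature
    top_p=1,  # nucleus sampling threshold; 1 keeps the full distribution
    verbose=True,  # print llama.cpp logs and timings
    n_ctx=4096,  # context window size in tokens
)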
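Because the download link above is a placeholder, it can help to confirm that the model file is actually present before loading it. This is a small sketch using only the standard library; the helper name `describe_model_file` is hypothetical, and the path is the one passed to `model_path` in this notebook.

```python
from pathlib import Path


def describe_model_file(path):
    """Return a short status line for the model file at path."""
    model_file = Path(path)
    if not model_file.is_file():
        return f"missing: {model_file}"
    size_gib = model_file.stat().st_size / 1024 ** 3
    return f"found: {model_file} ({size_gib:.2f} GiB)"


print(describe_model_file("/content/mistral-7b-instruct-v0.1.Q4_K_M.gguf"))
```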

### Initialise QA Chain

In [36]:
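# "stuff" places all retrieved chunks into a single prompt; k=2 retrieves the two closest chunks
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
)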
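With `chain_type="stuff"`, the retrieved chunks are all placed ("stuffed") into a single prompt together with the question. A rough sketch of that idea follows, using a made-up template rather than LangChain's actual prompt; `build_stuff_prompt` and the sample chunks are hypothetical.

```python
def build_stuff_prompt(question, chunks):
    """Join every retrieved chunk into one context block followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for what the retriever would return.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = build_stuff_prompt("Which approach is proposed?", retrieved)
print(prompt)
```

Because every chunk shares one prompt, the retrieved chunks plus the question have to fit within the model's context window (`n_ctx=4096` in this notebook). The run recorded below evaluated a 1448-token prompt with `k=2`.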

### Read User Query

In [38]:
query = "bug identification done by which approch"

### Run User Query

In [39]:
qa.run(query)


llama_print_timings:        load time =    2845.63 ms
llama_print_timings:      sample time =      81.64 ms /   143 runs   (    0.57 ms per token,  1751.53 tokens per second)
llama_print_timings: prompt eval time =  658619.58 ms /  1448 tokens (  454.85 ms per token,     2.20 tokens per second)
llama_print_timings:        eval time =   98718.94 ms /   143 runs   (  690.34 ms per token,     1.45 tokens per second)
llama_print_timings:       total time =  758438.50 ms /  1591 tokens
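The throughput figures in the log follow directly from the raw timings. A small check, using only the numbers printed above (the helper name `tokens_per_second` is made up for this sketch):

```python
# Values copied from the llama_print_timings output above.
prompt_eval_ms, prompt_tokens = 658619.58, 1448
eval_ms, generated_tokens = 98718.94, 143
total_ms = 758438.50


def tokens_per_second(tokens, elapsed_ms):
    """Convert a token count and a duration in milliseconds to tokens per second."""
    return tokens / (elapsed_ms / 1000)


print(f"prompt eval: {tokens_per_second(prompt_tokens, prompt_eval_ms):.2f} tokens/s")
print(f"generation:  {tokens_per_second(generated_tokens, eval_ms):.2f} tokens/s")
print(f"total:       {total_ms / 60000:.1f} minutes")
print(f"share spent on the prompt: {prompt_eval_ms / total_ms:.0%}")
```

On this run, most of the roughly 12.6 minutes went to evaluating the 1448-token prompt rather than generating the 143-token answer, so the amount of retrieved context has a large effect on response time.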


" The proposed approach is not specified in the given context, but it mentions an interest in extending this research in multiple directions. Some of these directions are 1) Improve the performance of bug identification and classification by introducing customized transfer learning algorithms, 2) Identify the performance of the proposed transfer learning algorithm on multiple data sets aiming at the generalization of the proposed approach, 3) Propose a transfer learning approach to summarize the bug reports efficiently and simplify the job for developers and software vendors by restoring the software quality and user satisfaction, and 4) An automated approach that identifies the bug's severity in the bug- tracking system using a transfer learning approach."

In [None]:
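# Simple interactive loop: type a question, or 'exit' to stop
while True:
    user_input = input("Input Prompt: ").strip()
    if user_input == "exit":
        print("Exiting")
        break  # end the loop without raising SystemExit in the notebook
    if not user_input:
        continue  # ignore empty input
    result = qa({"query": user_input})
    print(f"Answer: {result['result']}")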