#### **Importing Libraries**

In [15]:
# import required libraries
import os

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader, WebBaseLoader

# load environment variables (e.g. API keys) from a local .env file
load_dotenv()

#### **Loading the data from a text file using TextLoader**

In [16]:
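# creating a text loader; reading explicitly as UTF-8 avoids garbled
# characters on platforms whose default encoding differs
loader = TextLoader("speech.txt", encoding="utf-8")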

text_document = loader.load()
text_document

[Document(metadata={'source': 'speech.txt'}, page_content='We are in a generation, where technology has surrounded us from all sides. Our everyday life runs on the use of technology, be it in the form of an alarm clock or a table lamp. Technology has been an important part of our daily lives. Therefore, it is important for the students to be familiar with the term technology. Therefore, we have provided a long speech on technology for students of all age groups. There is also a short speech and a 10 lines speech given in this article.\n\nA warm welcome to everyone gathered here today. I am here to deliver a speech on technology which has taken a tremendous role in our day to day life. We all are in a generation where everything is dependent on technology. Let’s understand what technology is through the lens of Science.\n\n\nTechnology comes in the form of tangible and intangible properties by exerting physical and mental force to achieve something that adds value. For example, a mobi

#### **How to use WebBaseLoader to load data from a web page**

In [17]:
import bs4

# create a web-based loader to fetch news data from Yahoo Finance
news_loader = WebBaseLoader(web_path=("https://finance.yahoo.com/quote/AAPL/latest-news/",),
                            bs_kwargs=dict(parse_only=bs4.SoupStrainer(
                                class_=("yf-xxbei9", "mainContent yf-tnbau3")
                            )))

news_text = news_loader.load()
news_text

[Document(metadata={'source': 'https://finance.yahoo.com/quote/AAPL/latest-news/'}, page_content='')]
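
The empty `page_content` above is the kind of result a class filter gives when no element in the fetched HTML carries the requested classes. Whether that is the cause here has not been verified; the page may also serve different markup to scripts. The following is a minimal sketch of class-based text filtering using only the standard library's `html.parser`, not the `bs4.SoupStrainer` implementation. The sample HTML, the class name `article-body`, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect text only from elements that carry one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0  # > 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Enter a matching element, or go one level deeper inside one.
        if self.depth or self.wanted & classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html, wanted_classes):
    """Return the text found inside elements matching any wanted class."""
    parser = ClassTextExtractor(wanted_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical page fragment for illustration only.
SAMPLE_HTML = (
    '<div class="nav">Menu</div>'
    '<div class="article-body"><p>Apple shares rose.</p></div>'
)

matched = extract_text(SAMPLE_HTML, ["article-body"])  # class exists in the page
missed = extract_text(SAMPLE_HTML, ["yf-xxbei9"])      # class absent from the page
print(repr(matched))
print(repr(missed))
```

When the filter finds nothing, inspecting the raw HTML for the class names actually present is a reasonable first check.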

#### **How to read from documents such as PDF, Word, etc.**

In [43]:
from langchain_community.document_loaders import PyPDFLoader

pdf_loader = PyPDFLoader("Automated_Personalized_Mood-Based_Song_Selector.pdf")

pdf_text = pdf_loader.load()
pdf_text

[Document(metadata={'producer': 'Microsoft® Word 2013; modified using iTextSharp 5.4.1 ©2000-2012 1T3XT BVBA (AGPL-version); modified using iText® Core 7.2.4 (AGPL version) ©2000-2022 iText Group NV', 'creator': 'Microsoft® Word 2013', 'creationdate': '2024-11-14T14:47:55+05:30', 'meeting starting date': '24 Oct. 2024', 'moddate': '2025-01-09T07:26:51-05:00', 'ieee article id': '10830371', 'ieee issue id': '10829993', 'subject': '2024 International Conference on Computing, Sciences and Communications (ICCSC);2024; ; ;10.1109/ICCSC62048.2024.10830371', 'ieee publication id': '10829992', 'title': 'Automated Personalized Mood-Based Song Selector', 'meeting ending date': '25 Oct. 2024', 'source': 'Automated_Personalized_Mood-Based_Song_Selector.pdf', 'total_pages': 9, 'page': 0, 'page_label': '1'}, page_content="2024 International Conference on Computing, Sciences and Communications (ICCSC) \n \n979-8-3503-5364-8/24/$31.00 ©2024 IEEE \n \nAutomated Personalized Mood-Based Song Selector \n 

#### **Convert PDF document to chunks**

In [44]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50
)

text_document_chunks = text_splitter.split_documents(pdf_text)
text_document_chunks[:5]

[Document(metadata={'producer': 'Microsoft® Word 2013; modified using iTextSharp 5.4.1 ©2000-2012 1T3XT BVBA (AGPL-version); modified using iText® Core 7.2.4 (AGPL version) ©2000-2022 iText Group NV', 'creator': 'Microsoft® Word 2013', 'creationdate': '2024-11-14T14:47:55+05:30', 'meeting starting date': '24 Oct. 2024', 'moddate': '2025-01-09T07:26:51-05:00', 'ieee article id': '10830371', 'ieee issue id': '10829993', 'subject': '2024 International Conference on Computing, Sciences and Communications (ICCSC);2024; ; ;10.1109/ICCSC62048.2024.10830371', 'ieee publication id': '10829992', 'title': 'Automated Personalized Mood-Based Song Selector', 'meeting ending date': '25 Oct. 2024', 'source': 'Automated_Personalized_Mood-Based_Song_Selector.pdf', 'total_pages': 9, 'page': 0, 'page_label': '1'}, page_content='2024 International Conference on Computing, Sciences and Communications (ICCSC) \n \n979-8-3503-5364-8/24/$31.00 ©2024 IEEE \n \nAutomated Personalized Mood-Based Song Selector \n 
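
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that cuts a string into fixed-size character windows where each window repeats the tail of the previous one. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs, lines, and spaces; the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of storing some text twice.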

#### **Converting these chunks into vectors and storing them in ChromaDB**

In [45]:
import chromadb
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Initialize a persistent ChromaDB client (data is stored under ./chroma_db)
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Define the Ollama embedding model
ollama_embedding = OllamaEmbeddings(model="all-minilm")

# Create a vector store backed by the persistent client and add the chunks
vector_db = Chroma.from_documents(
    text_document_chunks, ollama_embedding, client=chroma_client
)

In [46]:
query = "Facial Emotion Recognition and Music Recommendation System"
result = vector_db.similarity_search(query)
result[0].page_content

'of advanced machine learning algorithms with practical \napplications in real-time facial recognition systems. \n \n• Proposed Solution: The proposed solution in the research \npaper by Bakariya et al. [1] is a comprehensive system \ndesigned for real -time facial emotion recognition and \nmusic recommendation. This system is divided into \nseveral key components, including Face Detection, Face \nEmotion Prediction, Music Recommendation, and Face \nRecognition, with additional functionalities such as a \nsearch and removal button for previously uploaded faces. \nThe system’s architecture is designed to operate in real -\ntime, making it suitable for a wide ra nge of applications \nwhere immediate emotion recognition is crucial. The use \nof the Pygame Python package for building the music \nrecommendation component further underscores the \nsystem’s versatility and applicability in enhancing user  \nexperiences through personalized music \nrecommendations.'
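
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with hand-written three-dimensional toy vectors and cosine similarity. The vectors, texts, and function names are illustrative assumptions, and the actual store may use a different distance metric (for example Euclidean distance) and an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "vector store": each entry pairs a chunk with a made-up embedding.
toy_store = [
    ("facial emotion recognition", [0.9, 0.1, 0.0]),
    ("music recommendation", [0.1, 0.9, 0.0]),
    ("weather data", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.3, 0.0]  # made-up embedding of the query
results = top_k(query_vec, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```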

#### **Storing the vectors in FAISS (Facebook AI Similarity Search)**

In [47]:
from langchain_community.vectorstores import FAISS

faiss_db = FAISS.from_documents(text_document_chunks, ollama_embedding)

In [48]:
query = "Facial Emotion Recognition and Music Recommendation System"
result = faiss_db.similarity_search(query)
result[0].page_content

'involves an AI -integrated application utilizing facial emotion \nrecognition to gauge the driver’s mood and weather data to infer \nthe appropriate mood for the music. This mood detection system \naims to select songs matching the combined mood, enhancing the \noverall driving experience. We employ machine learning models \nfor emotion detection from facial expressions and weather data, \nemploying binary and  multiclass classification techniques. The \nstudy contributes to personalized music recommendation systems \nby introducing a novel approach that considers external \nenvironmental factors, demonstrating the potential for further \nadvancements in mood-based music recommendation systems.  \nKeywords—Artificial Neural Network, Face Emotion \nDetection, Convolutional Neural Network, Image Processing, \nComputer Vision, AI-Integrated Music Recommendation \n \nI.  INTRODUCTION  \nIn today’s rapidly evolving technological landscape and the \nincessant rhythm of modern life, the dema

#### **Using the TinyLlama model with prompts, chains, and retrieval to generate output**

In [50]:
from langchain_community.llms import Ollama

# declaring the tinyllama model
tinyllm_model = Ollama(model="tinyllama")
tinyllm_model

Ollama(model='tinyllama')

In [51]:
# designing the chat prompt template
from langchain_core.prompts import ChatPromptTemplate

# defining the prompt template
prompt = ChatPromptTemplate.from_template(
    """
You are a deep research assistant. You answer the questions based on the provided context.
Please answer the question in a clear and concise manner.
Think before you answer.
<context>
{context}
</context>
Question: {input}
"""
)

In [52]:
# creating the chains
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(tinyllm_model, prompt)
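
A "stuff" chain places the text of all supplied documents into the `{context}` slot of the prompt and sends the filled prompt to the model in one call. The sketch below imitates that step with plain strings. The shortened template, the blank-line separator, and the function name `stuff_documents` are assumptions for illustration, not the library's code.

```python
# Shortened, hypothetical version of the prompt template used above.
PROMPT_TEMPLATE = (
    "Answer the question based on the provided context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def stuff_documents(chunks, question, separator="\n\n"):
    """Join all chunks into one context string and fill the prompt template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, input=question)


filled = stuff_documents(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is the paper about?",
)
print(filled)
```

Because everything goes into a single prompt, the combined chunks must fit within the model's context window.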

In [53]:
# declaring the retriever for the FAISS database
retriever = faiss_db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001B39A9CC2D0>, search_kwargs={})

In [54]:
# creating the retrieval chain
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [57]:
# invoking the retrieval chain
response = retrieval_chain.invoke({"input": "Who are the main authors of the paper?"})

In [58]:
response['answer']

'The main authors of the paper mentioned in the context are not specified in the given text, as they are not explicitly stated as well as their affiliations and other details. The authorized license used in IEEE Xplore for accessing the paper is "IEEE" (Institute of Electrical and Electronics Engineers).'
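
The flow of the retrieval chain can be sketched end to end with stand-ins: fetch the most relevant chunks, place them in the prompt, call the model, and return a dictionary holding the input, the retrieved context, and the answer. The word-overlap retriever, the stub model, and the toy chunks below are all hypothetical and only mirror the shape of the real chain.

```python
def retrieve(question, chunks, k=2):
    """Hypothetical retriever: rank chunks by words shared with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fake_llm(prompt):
    """Stand-in for the model: only reports how much prompt text it received."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


def run_retrieval_chain(question, chunks):
    """Retrieve context, build the prompt, call the model, return all three parts."""
    context = retrieve(question, chunks)
    prompt = (
        "<context>\n" + "\n\n".join(context) + "\n</context>\n"
        "Question: " + question
    )
    return {"input": question, "context": context, "answer": fake_llm(prompt)}


toy_chunks = [
    "the authors propose a mood based song selector",
    "weather data is combined with facial emotion",
    "pygame plays the selected song",
]

response = run_retrieval_chain("who are the authors", toy_chunks)
print(response["context"])
print(response["answer"])
```

The model only sees the chunks the retriever returns. If the chunk containing the author names is not among them, the model cannot report them, which is one possible explanation for the answer above.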