In [1]:
!pip install -U langchain-upstage
!pip install -U langchain-community faiss-cpu

Collecting langchain-upstage
  Downloading langchain_upstage-0.1.5-py3-none-any.whl (14 kB)
Collecting langchain-core<0.3,>=0.1.52 (from langchain-upstage)
  Downloading langchain_core-0.2.0-py3-none-any.whl (307 kB)
Collecting langchain-openai<0.2.0,>=0.1.3 (from langchain-upstage)
  Downloading langchain_openai-0.1.7-py3-none-any.whl (34 kB)
Collecting pymupdf<2.0.0,>=1.24.1 (from langchain-upstage)
  Downloading PyMuPDF-1.24.4-cp310-none-manylinux2014_x86_64.whl (3.5 MB)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3,>=0.1.52->langchain-upstage)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.2.0,>=0.1.0 (from langchain-core<0.3,>=0.1.52->langchain-upstage)
  Downloading langsmith-0.1.60-py3-none-a

In [2]:
import os

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_upstage import ChatUpstage, UpstageLayoutAnalysisLoader

In [25]:
import glob

# Collect every file in the dataset folder
files = glob.glob("/content/pdf_datasets/*")

def read_dataset(files):
    """Load each file with Upstage layout analysis and return one document per page."""
    all_docs = []
    for file_name in files:
        # Parse the file into per-page documents
        loader = UpstageLayoutAnalysisLoader(file_name, split="page", api_key="UPSTAGE-API-KEY")
        all_docs.extend(loader.load())
    return all_docs
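
The loader above is given a literal placeholder string as its key. A minimal sketch of reading the key from the environment instead, assuming it is stored in a variable named `UPSTAGE_API_KEY`; the helper `get_api_key` is hypothetical and not part of langchain-upstage:

```python
import os


def get_api_key(env_var: str = "UPSTAGE_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset.

    Hypothetical helper for illustration; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The result could then be passed as `api_key=get_api_key()` wherever the notebook writes "UPSTAGE-API-KEY", which keeps the real key out of the saved notebook.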

In [None]:
files

['/content/pdf_datasets/paper2.pdf',
 '/content/pdf_datasets/paper1.pdf',
 '/content/pdf_datasets/Automatic_Gender_Detection.pdf',
 '/content/pdf_datasets/gender_b_social_media.pdf']

In [26]:
from langchain_upstage import UpstageEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter

embeddings = UpstageEmbeddings(
    upstage_api_key="UPSTAGE-API-KEY",
    model="solar-embedding-1-large",
)

# Load every page of every PDF
docs = read_dataset(files)

# Split the pages into chunks of roughly 500 characters with no overlap
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs_split = text_splitter.split_documents(docs)

# Embed the chunks and store them in a FAISS index
db = FAISS.from_documents(docs_split, embeddings)
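
To give a feel for what the splitting step does, here is a simplified pure-Python illustration. It is not the library's implementation: it assumes the text is cut on blank lines and that neighbouring pieces are merged while the result stays within `chunk_size` characters. The function name `split_paragraphs` is hypothetical.

```python
def split_paragraphs(text: str, chunk_size: int = 500, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 300 + "\n\n" + "b" * 300 + "\n\n" + "c" * 100
print([len(chunk) for chunk in split_paragraphs(sample)])
```

Smaller chunks give the retriever more precise matches but less surrounding context per hit, so `chunk_size` is worth tuning against the kind of questions asked.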

In [16]:
# A direct similarity search against the index would look like this:
# query = "Tell me about gender bias. Use the embedded documents as context"
# answer = db.similarity_search(query)

# Build the retriever; pass search_kwargs={"k": 3} to limit the number of chunks returned
retriever = db.as_retriever()

In [18]:

template = """
  You are a PDF information retrieval AI chat assistant. Format the retrieved information as text.

  Use only the context for your answers; do not make up information.

  query: {query}

  {context}
"""

# Build the chain: retrieve context, fill the prompt, call the model, parse the reply to a string
prompt = ChatPromptTemplate.from_template(template)
# With no key passed, ChatUpstage falls back to the UPSTAGE_API_KEY environment variable
model = ChatUpstage()
chain = (
    {
        "context": retriever,
        "query": RunnablePassthrough(),
    }
    | prompt
    | model
    | StrOutputParser()
)
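
The chain above hands the retriever's list of documents straight to `{context}`, so the prompt appears to receive the list's string representation, metadata included. A common alternative is to join only the page text first. The sketch below shows such a helper on stand-in data; `Doc` is a stand-in for LangChain's document class and `format_docs` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


sample = [Doc("Page one text."), Doc("Page two text.")]
print(format_docs(sample))
```

In the chain, this would be wired in as `"context": retriever | format_docs`, which keeps the prompt shorter and free of metadata noise.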


In [19]:
chain.invoke("What are stereotypical beliefs about women? Use all the embedded documents for context.")

'Stereotypical beliefs about women can vary widely depending on the culture and context. However, some common stereotypes include:\n\n1. Women are more emotional and less rational than men.\n2. Women are more nurturing and caring than men.\n3. Women are more passive and submissive than men.\n4. Women are more focused on relationships and less focused on career than men.\n5. Women are more likely to be victims of crime or abuse than men.\n6. Women are less capable in certain fields, such as science, technology, engineering, and mathematics (STEM).\n\nIt is important to note that these stereotypes are not true for all women and can be harmful and limiting. It is also important to recognize and challenge these stereotypes in order to promote gender equality and empower women.'

In [20]:
chain.invoke("What can you tell me about gender bias? Use all the embedded documents for context.")

'Gender bias refers to an unfair difference in the way women and men are treated. It has a long history dating back to ancient times, with stereotypes and beliefs that have dominated our minds. Women were often seen as inferior to men due to the belief that they were intellectually and physically inferior. This belief was fueled by the fact that men often fought in wars and competed for women, leading to the idea that males were "more evolved" than women. These ideas have been passed down through generations, creating the stereotypes we have today.\n\nGender bias can have harmful impacts in real life and can also subconsciously dictate biased viewpoints for large online groups, causing them to further weaponize the bias and spread it. Artificial Intelligence (AI) can potentially mitigate the risk of biases in digital content, as AI algorithms do not have the unconscious assumptions from humans, resulting in less discrimination. However, there is a dilemma as AI can also introduce bias 

In [21]:
chain.invoke("How is women's representation in STEM? Use all the embedded documents for context.")

'The representation of women in STEM fields is a complex issue that has been the subject of much research and debate. The underrepresentation of women in STEM careers is often referred to as a "leaky pipeline," which describes the gender imbalance that is believed to be affected by a combination of institutional and cultural factors. These factors include biological differences between men and women, girls\' lack of academic preparation for a science major/career, poor attitudes toward science, the absence of female scientists/engineers as role models, and cultural pressures on girls/women to conform to traditional gender roles.\n\nDespite these challenges, there are efforts to alleviate the underrepresentation of women in STEM fields, including diversity campaigns and sponsoring girls and women to attend courses and conferences. Proposed holistic solutions in the literature include increasing the appointment of women in powerful positions, redefining success in academia, and addressin

References:
1. https://python.langchain.com/v0.1/docs/integrations/document_loaders/upstage/
2. https://medium.com/firebird-technologies/chat-with-your-pdfs-using-langchain-e57866b7926d