In [9]:
# !pip install unstructured


In [47]:
# !pip install faiss-cpu


In [1]:
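import os

# Never hardcode secrets in a notebook; read the key from the environment instead.
# OpenAIEmbeddings, OpenAI and ChatOpenAI pick up OPENAI_API_KEY automatically.
API_KEY = os.environ["OPENAI_API_KEY"]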

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredFileLoader
from langchain.embeddings import OpenAIEmbeddings

# Steps

> Loading the data

> Dividing it into chunks

> Create embeddings

> Store embeddings in a vectorstore

> Information Retrieval

> Chat functionality

## Loading the Data

In [3]:
loader = UnstructuredFileLoader('content.txt')
raw_documents = loader.load()

In [4]:
print(raw_documents[0].page_content[:20])

Menstrual Health and


In [5]:
print(raw_documents[0].metadata)

{'source': 'content.txt'}


## Dividing it into chunks

In [6]:
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=20,
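    # Raw string avoids an invalid-escape warning for the regex lookbehind.
    # Newer LangChain releases treat separators literally unless is_separator_regex=True is passed.
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""]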
)

In [7]:
docs = r_splitter.split_documents(raw_documents)

In [8]:
len(docs)

7

In [9]:
docs[0].metadata

{'source': 'content.txt'}
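
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below is not LangChain's algorithm (which also tries the separators in order); `split_with_overlap` is a hypothetical helper that cuts fixed-size character windows so that neighbouring chunks share `chunk_overlap` characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, which is the role the 20-character overlap plays in the splitter configured above.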

## Create Embeddings

In [10]:
import numpy as np

embedding = OpenAIEmbeddings()
sentence1 = "husky is a dog"
sentence2 = "beagle is a dog"
sentence3 = "mitochondria is the power house of cell"

embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [11]:
np.dot(embedding1, embedding2)

0.8650353183722416

In [12]:
np.dot(embedding2, embedding3)

0.7277717576916353

In [13]:
np.dot(embedding1, embedding3)

0.7413251060162334
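
The dot products above can be read as cosine similarities because OpenAI documents its embeddings as normalized to length 1. For vectors that are not unit length, the dot product has to be divided by the norms. A minimal sketch with toy vectors (the vectors and the `cosine_similarity` helper are illustrative, not part of the notebook):

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors, independent of their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


u = np.array([3.0, 4.0])
v = np.array([4.0, 3.0])

# After normalising to unit length, the plain dot product gives the same value.
u_unit = u / np.linalg.norm(u)
v_unit = v / np.linalg.norm(v)

print(cosine_similarity(u, v))
print(float(np.dot(u_unit, v_unit)))
```

Both printed values agree, which is why `np.dot` alone is enough for the unit-length embeddings compared above.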

## Create Vectorstore

In [14]:
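import shutil

# Cross-platform replacement for `rm -rf`, which is not available in the Windows shell.
# Clears a leftover Chroma directory from earlier runs; the FAISS index below is built in memory.
shutil.rmtree("./docs/chroma", ignore_errors=True)
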

In [15]:
from langchain.vectorstores.faiss import FAISS

vectorstore = FAISS.from_documents(docs, embedding)
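
Conceptually, the vector store keeps one embedding per chunk and returns the chunks whose vectors lie closest to the query vector. The sketch below does this by brute force with NumPy on toy 2-D vectors. It assumes L2 distance, which to my knowledge is what LangChain's FAISS wrapper uses by default; `top_k_l2` and the toy data are hypothetical.

```python
import numpy as np


def top_k_l2(query_vec, doc_vecs, k=3):
    """Return the indices of the k rows of doc_vecs closest to query_vec (L2 distance)."""
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    distances = np.linalg.norm(doc_vecs - query_vec, axis=1)
    return np.argsort(distances)[:k].tolist()


toy_vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
toy_query = np.array([1.0, 0.05])

nearest = top_k_l2(toy_query, toy_vectors, k=2)
print(nearest)
```

A real index avoids scanning every vector on large collections, but for seven chunks the result is the same as this exhaustive search.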

## Information Retrieval

In [16]:
question = "what is menstrual health?"

In [17]:
resp = vectorstore.similarity_search(question, k=3)
print(resp[2])

page_content='The multi-dimensional issues that menstruators face require multi-sectoral interventions. WASH professionals alone cannot come up with all of the solutions to tackle the intersecting issues of inadequate sanitary facilities, lack of information and knowledge, lack of access to affordable and quality menstrual hygiene products, and the stigma and social norms associated with menstruation. Research has shown that approaches that can effectively combine information and education with appropriate infrastructure and menstrual products, in a conducive policy environment, are more successful in avoiding the negative effects of poor MHH – in short, a holistic approach requiring collaborative and multi-dimensional responses.\n\nPriority Areas\n\nEducation\n\nIn low-income countries, half of the schools lack adequate water, sanitation, and hygiene services crucial to enable girls and female teachers to manage menstruation (UNICEF 2015). Many studies argue that inadequate sanitary f

In [18]:
resp_mmr = vectorstore.max_marginal_relevance_search(question, k=3)
print(resp_mmr[2])

page_content='A survey in Bangladesh found that only 6 percent of schools provide education on health and hygiene, and only 36 percent of girls had prior knowledge about menstruation before their first period (World Bank 2017c).\n\nA sanitary pad intervention in Ghana found that after six months of free sanitary pad provision and puberty education programming, girls missed significantly less school (Montgomery et al. 2012).\n\nHealth\n\nWhen girls and women have access to safe and affordable sanitary materials to manage their menstruation, they decrease their risk of infections. This can have cascading effects on overall sexual and reproductive health, including reducing teen pregnancy, maternal outcomes, and fertility. Poor menstrual hygiene, however, can pose serious health risks, like reproductive and urinary tract infections which can result in future infertility and birth complications. Neglecting to wash hands after changing menstrual products can spread infections, such as hepat
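
Maximal marginal relevance (MMR) differs from plain similarity search in that each new pick is penalised for resembling documents already selected. The following is a simplified greedy sketch on toy vectors, not LangChain's implementation; `mmr_select` is a hypothetical helper, and `lambda_mult` mirrors the parameter of the same name on `max_marginal_relevance_search` (1.0 means pure relevance, lower values favour diversity).

```python
import numpy as np


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: trade off relevance to the query against redundancy with earlier picks."""
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Normalise so that dot products are cosine similarities.
    doc_unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    relevance = doc_unit @ query_unit

    selected = [int(np.argmax(relevance))]  # start with the most relevant document
    while len(selected) < min(k, len(doc_vecs)):
        best_idx, best_score = -1, -np.inf
        for i in range(len(doc_vecs)):
            if i in selected:
                continue
            # Redundancy: similarity to the closest document picked so far.
            redundancy = max(float(doc_unit[i] @ doc_unit[j]) for j in selected)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


toy_docs = np.array([
    [1.0, 0.0],    # relevant
    [0.99, 0.1],   # relevant, near-duplicate of the first
    [0.6, 0.8],    # less relevant but different
])
toy_query = np.array([1.0, 0.2])

print(mmr_select(toy_query, toy_docs, k=2, lambda_mult=0.5))
print(mmr_select(toy_query, toy_docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favour of the different document, while `lambda_mult=1.0` falls back to ranking by relevance alone.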

In [19]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [20]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))


In [21]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

In [22]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)

In [23]:
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)



Document 1:

Menstrual Health and Hygiene (MHH) is essential to the well-being and empowerment of women and adolescent girls. On any given day, more than 300 million women worldwide are menstruating. In total, an estimated 500 million lack access to menstrual products and adequate facilities for menstrual hygiene management (MHM). To effectively manage their menstruation, girls and women require access to water, sanitation and hygiene (WASH) facilities, affordable and appropriate menstrual hygiene materials, information on good practices, and a supportive environment where they can manage menstruation without embarrassment or stigma. According to the WHO/UNICEF Joint Monitoring Programme 2012, menstrual hygiene management is defined as: “Women and adolescent girls are using a clean menstrual management material to absorb or collect menstrual blood, that can be changed in privacy as often as necessary, using soap and water for washing the body as required, and having access to safe and 

## Question Answering

In [24]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [25]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever()
)

result = qa_chain({"query": question})
result["result"]

'Menstrual health refers to the overall well-being and management of menstruation for women and adolescent girls. It encompasses various aspects, including access to clean and safe menstrual hygiene materials, proper sanitation facilities, knowledge about menstrual hygiene practices, and a supportive environment that promotes dignity and eliminates stigma related to menstruation. Menstrual health is crucial for the physical, mental, and social well-being of individuals, and it plays a significant role in promoting gender equality, education, and overall reproductive health.'

In [26]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
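
`RetrievalQA.from_chain_type` uses the "stuff" chain type by default, which pastes the retrieved chunks into the `{context}` slot of the prompt. The sketch below imitates that step with plain string formatting; `build_stuff_prompt`, the shortened template, and the toy chunks are illustrative assumptions, and the real chain may format documents differently.

```python
toy_template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_stuff_prompt(template: str, chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


toy_chunks = ["First retrieved chunk.", "Second retrieved chunk."]
filled = build_stuff_prompt(toy_template, toy_chunks, "what is menstrual health?")
print(filled)
```

Because every chunk lands in a single prompt, the number and size of retrieved chunks must fit within the model's context window.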

In [27]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [28]:
result = qa_chain({"query": question})
result["result"]

'Menstrual health refers to the physical, mental, and social well-being of women and adolescent girls in relation to their menstrual cycle. It includes access to clean and safe menstrual hygiene materials, proper sanitation facilities, education and information about menstruation, and the elimination of stigma and discrimination surrounding menstruation. Thanks for asking!'

In [29]:
result["source_documents"][0]

Document(page_content='Menstrual Health and Hygiene (MHH) is essential to the well-being and empowerment of women and adolescent girls. On any given day, more than 300 million women worldwide are menstruating. In total, an estimated 500 million lack access to menstrual products and adequate facilities for menstrual hygiene management (MHM). To effectively manage their menstruation, girls and women require access to water, sanitation and hygiene (WASH) facilities, affordable and appropriate menstrual hygiene materials, information on good practices, and a supportive environment where they can manage menstruation without embarrassment or stigma.\n\nAccording to the WHO/UNICEF Joint Monitoring Programme 2012, menstrual hygiene management is defined as:\n\n“Women and adolescent girls are using a clean menstrual management material to absorb or collect menstrual blood, that can be changed in privacy as often as necessary, using soap and water for washing the body as required, and having acc

## Chat

In [30]:
# Build prompt
from langchain.prompts import PromptTemplate
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Elaborate the answer. Make sure you provide a response of 15 to 20 lines. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

# Run chain (RetrievalQA was already imported above)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)


result = qa_chain({"query": question})
result["result"]

'Menstrual health refers to the overall well-being and management of menstruation for women and adolescent girls. It encompasses various aspects such as access to clean and safe menstrual management materials, proper hygiene practices, knowledge about the menstrual cycle, and the availability of supportive environments and facilities. Menstrual health is crucial for the physical, mental, and emotional well-being of menstruators.\n\nHaving good menstrual health means that women and girls have access to affordable and appropriate menstrual hygiene products, such as sanitary pads or menstrual cups, that can effectively absorb or collect menstrual blood. They are able to change these materials as often as necessary in privacy, using soap and water for washing their bodies. They also have access to safe and convenient facilities for disposing of used menstrual products.\n\nIn addition to the physical aspects, menstrual health also includes understanding the basic facts about the menstrual c

In [31]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

In [32]:
from langchain.chains import ConversationalRetrievalChain
retriever=vectorstore.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)

In [33]:
result = qa({"question": question})
result['answer']

'Menstrual health refers to the overall well-being and management of menstruation for women and adolescent girls. It encompasses various aspects, including access to clean and safe menstrual hygiene materials, proper sanitation facilities, knowledge about menstrual hygiene practices, and a supportive environment that promotes dignity and eliminates stigma related to menstruation. Menstrual health is crucial for the physical, mental, and social well-being of individuals, and it plays a significant role in promoting gender equality, education, and overall reproductive health.'
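
The memory matters once a follow-up question depends on earlier turns. A conversational retrieval chain typically rewrites the follow-up into a standalone question using the stored history before it queries the retriever. The sketch below shows only that bookkeeping in plain Python; `ToyBufferMemory`, `build_condense_prompt`, and the prompt wording are hypothetical and simplified, not LangChain's actual classes or prompt text.

```python
class ToyBufferMemory:
    """Keeps every (question, answer) pair, like a plain conversation buffer."""

    def __init__(self):
        self.turns = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_condense_prompt(memory: ToyBufferMemory, follow_up: str) -> str:
    """Build the text an LLM would receive to turn a follow-up into a standalone question."""
    return (
        "Rephrase the follow-up question as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


toy_memory = ToyBufferMemory()
toy_memory.save("what is menstrual health?", "(first answer)")
condense_prompt = build_condense_prompt(toy_memory, "how does it promote gender equality?")
print(condense_prompt)
```

Without the stored first turn, the pronoun "it" in the follow-up would have nothing to refer to, so the retriever would search on an ambiguous query.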

In [34]:
question2 = "how does menstrual health promote gender equality?"
result2 = qa({"question": question2})
result2['answer']

"Menstrual health contributes to promoting gender equality in several ways:\n\n1. Dignity and Privacy: Promoting menstrual health and hygiene safeguards women's dignity, privacy, and bodily integrity. It recognizes that menstruation is a natural process and ensures that women and girls can manage their menstruation without shame or stigma.\n\n2. Empowerment: By addressing menstrual health, women and girls gain knowledge and access to resources that enable them to make informed choices about their bodies and reproductive health. This empowerment allows them to have control over their own lives and futures.\n\n3. Education: Inadequate sanitary facilities in schools can negatively impact girls' education. By providing female-friendly facilities and incorporating information on menstruation into the curriculum, stigma is reduced, and girls are more likely to attend school regularly, leading to better education outcomes and increased opportunities for their future.\n\n4. Economic Opportunit

In [43]:
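# Build the memory and chain once so the chat history persists across calls;
# creating them inside the function would reset the history on every question.
chat_memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
chat_chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectorstore.as_retriever(),
    memory=chat_memory
)

def ask_question(question):
    result = chat_chain({"question": question})
    return result['answer']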


ask_question("What is menstrual health?")

'Menstrual health refers to the overall well-being and management of menstruation for women and adolescent girls. It encompasses various aspects, including access to clean and safe menstrual hygiene materials, proper sanitation facilities, knowledge about menstrual hygiene practices, and a supportive environment that promotes dignity and eliminates stigma related to menstruation. Menstrual health is crucial for the physical, mental, and social well-being of individuals, and it plays a significant role in promoting gender equality, education, and overall reproductive health.'

In [44]:
ask_question("how does menstrual health promote gender equality?")

"Menstrual health promotion plays a crucial role in promoting gender equality in several ways:\n\n1. Dignity and Empowerment: Menstrual health and hygiene (MHH) initiatives help safeguard women's dignity, privacy, and bodily integrity. By addressing the challenges menstruators face, such as lack of access to menstrual products and inadequate facilities, MHH initiatives empower women and girls to manage their menstruation with confidence and without shame or stigma.\n\n2. Education: Inadequate sanitary facilities in schools often lead to girls missing classes during menstruation or even dropping out of school altogether. MHH interventions that provide female-friendly facilities and incorporate menstrual education into the curriculum can reduce stigma and contribute to better education outcomes for girls. By ensuring that girls can attend school regularly, MHH initiatives promote gender equality in education.\n\n3. Economic Opportunities: Improving menstrual hygiene and providing access 

In [45]:
ask_question("how do MHH initiatives contribute to a more inclusive and gender-equal society?")

"Menstrual Health and Hygiene (MHH) initiatives contribute to a more inclusive and gender-equal society in several ways:\n\n1. Education: MHH initiatives aim to improve access to education for girls by addressing the barriers they face during menstruation. By providing female-friendly facilities in schools, such as separate toilets and changing rooms, girls are more likely to attend school regularly and participate fully in their education. MHH initiatives also incorporate information on menstruation into the curriculum for both girls and boys, reducing stigma and promoting gender equality in education.\n\n2. Health and Well-being: MHH initiatives ensure that girls and women have access to clean and safe facilities for managing their menstruation. This promotes their physical and mental well-being, reducing the risk of infections and other health issues. By addressing the specific needs of menstruators, MHH initiatives contribute to a more inclusive healthcare system that recognizes an