<a href="https://colab.research.google.com/github/lapatradaa/shap/blob/main/PROJECT_CHAT_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

###Install packages

In [None]:
!pip install --upgrade --quiet langchain
!pip install --upgrade --quiet langchain-core
!pip install --upgrade --quiet langchain-community
!pip install --upgrade --quiet langchain-chroma
!pip install --upgrade --quiet langchain-openai
!pip install --upgrade --quiet langchain-google-genai
!pip install --upgrade --quiet langchain-anthropic==0.1.15
!pip install --upgrade --quiet pymupdf

Collecting pymupdf
  Downloading PyMuPDF-1.24.9-cp310-none-manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting PyMuPDFb==1.24.9 (from pymupdf)
  Downloading PyMuPDFb-1.24.9-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.4 kB)
Downloading PyMuPDF-1.24.9-cp310-none-manylinux2014_x86_64.whl (3.5 MB)
Downloading PyMuPDFb-1.24.9-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (15.9 MB)
Installing collected packages: PyMuPDFb, pymupdf
Successfully installed PyMuPDFb-1.24.9 pymupdf-1.24.9


###Import packages

In [None]:
import os
import sys

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_anthropic import ChatAnthropic #v 0.1.15
#v0.2
from langchain_community.document_loaders import DirectoryLoader, TextLoader, PyMuPDFLoader, PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain.indexes import VectorstoreIndexCreator

from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.chains.llm import LLMChain

###Set API key

In [None]:
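import getpass

# Never hard-code a real key in a shared notebook; prompt for it at runtime instead.
if 'GOOGLE_API_KEY' not in os.environ:
    os.environ['GOOGLE_API_KEY'] = getpass.getpass('Google API key: ')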

###LOAD DATA

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os

path = '/content/drive/MyDrive/data-phone-num/ref_research'
path_comment = '/content/comment.txt'

In [None]:
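# Load every PDF under `path` (PyMuPDFLoader yields one Document per page),
# then split the pages into chunks of roughly 1000 characters with no overlap.
loader = DirectoryLoader(path, glob="**/*.pdf", loader_cls=PyMuPDFLoader)
raw_documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)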
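
To make the effect of `chunk_size` concrete, here is a rough, library-free sketch of separator-based chunking with zero overlap. It is a simplified approximation, not the `CharacterTextSplitter` implementation; the helper name `split_text` and the sample string are made up for illustration.

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size.

    Simplified illustration only: a single piece longer than chunk_size
    is kept whole rather than being cut in the middle.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"  # three short "paragraphs"
chunks = split_text(sample, chunk_size=10)
print(chunks)
```

With `chunk_size=10`, the first two paragraphs fit together in one chunk and the third starts a new one. With `chunk_overlap=0`, no text is shared between neighbouring chunks, so a sentence cut at a chunk boundary loses its surrounding context.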

###TRANSFORM TEXT DATA TO VECTORS

In [None]:
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

db = Chroma.from_documents(documents, embedding=gemini_embeddings, persist_directory="./langchain/chroma_db")

# Load from disk
vectorstore_disk = Chroma(
    persist_directory="./langchain/chroma_db",
    embedding_function=gemini_embeddings,  # embedding model
)

retriever = vectorstore_disk.as_retriever(search_kwargs={"k": 1})
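
As a rough picture of what `search_kwargs={"k": 1}` controls, the sketch below ranks stored chunks by cosine similarity to a query vector and keeps the top `k`. The vectors, chunk texts, and the helpers `cosine` and `top_k` are invented for illustration; Chroma's actual index and distance metric may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=1):
    """Return the texts of the k stored chunks most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" (made up; real embeddings have far more dimensions).
store = [
    ("chunk about ramus measurements", [0.9, 0.1, 0.0]),
    ("chunk about imaging", [0.1, 0.9, 0.0]),
    ("chunk about population", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]
print(top_k(query_vec, store, k=1))
```

With `k=1`, only a single chunk reaches the prompt. A larger `k` passes more context to the model, at the cost of a longer prompt.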

###SELECT CHAT MODEL

In [None]:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest",
                 temperature=0.7, top_p=0.85)

###PROMPT TEMPLATE

In [None]:
llm_prompt_template = """
Answer the question based only on the following context:
{context}

Question: {question}

Respond in the same language as the question.
"""

llm_prompt = PromptTemplate.from_template(llm_prompt_template)

# Join the text of the retrieved documents into one string, separated by blank lines.
def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_prompt
    | llm
    | StrOutputParser()
)
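
The data flow through the chain can be traced without any API calls. The sketch below mirrors the same steps (retrieve, format, fill the prompt, call the model) using stand-ins: `Doc`, `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical names, not LangChain objects.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


TEMPLATE = """
Answer the question based only on the following context:
{context}

Question: {question}

Respond in the same language as the question.
"""


def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)


def fake_retriever(question):
    # A real retriever would run a similarity search; this returns fixed chunks.
    return [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]


def fake_llm(prompt):
    # A real chat model would generate an answer; this only reports the prompt size.
    return f"(model output for a prompt of {len(prompt)} characters)"


def run_chain(question):
    # Same shape as the dict at the head of the chain: the question is both
    # sent to the retriever and passed through unchanged.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    prompt = TEMPLATE.format(**inputs)
    return prompt, fake_llm(prompt)


prompt, answer = run_chain("What is the gonial angle?")
print(prompt)
```

Printing the filled prompt this way is a cheap check that the retrieved text and the question land in the intended slots before spending tokens on a real model call.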

###QUESTION

In [None]:
query = "summarize relationship between mandible and sex determination with references?"
result = rag_chain.invoke(query)
print(result)

Numerous studies have established the mandible as a reliable indicator for sex determination, particularly in scenarios where complete skeletal remains are unavailable.  Here's a breakdown based on the provided references:

* **Mandibular Measurements:**  Studies have successfully used various mandibular measurements to predict sex. These include:
    * Ramus measurements (Tripathi et al., 2011; Abu-Taleb & El Beshlawy, 2015; Maloth et al., 2017)
    * Gonial angle (Upadhyay et al., 2012)
    * General metric analysis (Ongkana & Sudwan, 2009)
    * Mandibular angle (Rai et al., 2007) 

* **Population Specificity:** Research highlights that mandibular dimorphism can vary across populations.  For instance, studies have investigated this in Indian (Rai et al., 2007), Thai (Ongkana & Sudwan, 2009), and Egyptian populations (Kharoshah et al., 2010; Abu-Taleb & El Beshlawy, 2015).

* **Imaging Techniques:** Traditional radiography and advanced imaging like cone beam computed tomography (CBCT