In [2]:
# !pip install langchain-google-vertexai langchain-google-community[vertexaisearch]

In [2]:
from langchain.chains import LLMChain
from langchain.prompts.prompt import PromptTemplate
from langchain_google_vertexai import VertexAI


llm = VertexAI(model_name="gemini-1.0-pro",
               temperature=0.8, max_output_tokens=128)
template = """Describe {plant}.
    First, think whether {plant} exist.
    If {plant} don't exist, answer "I don't have enough information about {plant}".
    Otherwise, give their title, a short summary and then talk about origin and cultivation.
    After that, describe their physical characteristics.
"""

In [8]:
prompt_template = PromptTemplate(
    input_variables=["plant"],
    template=template
)
chain = LLMChain(llm=llm,
                 prompt=prompt_template)
print(chain.run(plant="black cucumbers"))

## Black Cucumbers

### Do they exist?

Black cucumbers are not widely recognized as a distinct variety. While there are cucumber cultivars with dark green or nearly black skin, true black cucumbers are not commonly found. 

Therefore, I don't have enough information about black cucumbers to provide a comprehensive description. However, I can share some insights about cucumber varieties with dark skin that might be mistaken for black cucumbers.

### Dark-skinned Cucumber Varieties

Several cucumber cultivars exhibit dark green skin that appears nearly black under certain lighting conditions. These include:

* **Armenian Cucumber:** This heirloom variety has long, slender fruits with dark


In [11]:
print(chain.invoke(input="green cucumbers"))

{'plant': 'green cucumbers', 'text': '## Green Cucumbers\n\nGreen cucumbers are a type of cucumber that, as the name suggests, are green in color. They are the most common type of cucumber and are widely available in grocery stores and farmers markets.\n\n### Origin and Cultivation\n\nGreen cucumbers originated in India and have been cultivated for centuries. They are now grown in many parts of the world, including the United States, Europe, and Asia. Green cucumbers are typically grown in warm climates and require a lot of water. They are usually harvested when they are about 6-8 inches long.\n\n### Physical Characteristics\n\nGreen cucumbers have a smooth, thin skin that is dark'}
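
The `{plant}` placeholder is filled in before the text is sent to the model. As a rough illustration only, plain `str.format` performs the same kind of substitution. This sketch does not call LangChain or Vertex AI, and `demo_template` and `render` are hypothetical names introduced for the example.

```python
# Shortened stand-in for the notebook's template (an assumption for this sketch).
demo_template = (
    "Describe {plant}.\n"
    "If {plant} don't exist, answer "
    "\"I don't have enough information about {plant}\".\n"
)


def render(template: str, **variables: str) -> str:
    """Fill every {name} placeholder with the matching keyword argument."""
    return template.format(**variables)


rendered = render(demo_template, plant="black cucumbers")
print(rendered)
```

Printing the rendered text is a quick way to check what the model actually receives before spending a call on it.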


In [4]:
project_id = "gcp-langchain-450916"
location_id = "global"
data_store_id = "professional-documents_1739744440951"

In [5]:
from langchain_google_community import VertexAISearchSummaryTool
vertex_search = VertexAISearchSummaryTool(
    project_id=project_id, location_id=location_id,
    data_store_id=data_store_id,
    name="Vertex AI Agent Builder",
    description="RAG app on professional documents"
)
query = "What were the file owner's open source contributions?"
print(vertex_search.invoke(query))



The file owner contributed to open source community development, tutorials, and software releases [1]. Their contributions include a package developed to help digital geoscientists with well log interpretation using Python [1]. Subsequent releases included machine learning support for well logs [1]. They also made technical contributions to OpendTect, dGB Earth Sciences [2]. A webinar showcasing these contributions received over a thousand views, becoming one of the highest viewed on the YouTube channel [2].



### Custom RAG Application

In [3]:
from langchain_core.documents import Document

doc = Document(page_content="my page",
               metadata={"source_id": "example.pdf", "page": 1})
print(doc.page_content)

my page


In [33]:
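# Import from langchain_google_community (installed above); the
# langchain.retrievers import path for this retriever is deprecated.
from langchain_google_community import VertexAISearchRetriever

vertex_search_retriever = VertexAISearchRetriever(
    project_id=project_id,
    location_id=location_id,
    data_store_id=data_store_id,
    max_documents=3,
)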
query: str = """What were the results and findings for the unsupervised thesis project?"""
result = vertex_search_retriever.invoke(query)
for doc in result:
    print(doc.page_content, doc.metadata["source"])

I. Extract 2D image patches in various sizes from the seismic survey (32x32, 64x64
(crosslines by depth slide) and run it through the unsupervised image classification
algorithms.

II. Determine how the images cluster and if consistent facies clusters can be obtained from

the unsupervised image classification algorithms.

III.

Investigate if preprocessing steps (structural filter, denoising) improve the clustering

results

IV. Analyze dominant facies classes, map facies back to seismic. Construct an interactive
dashboard for the manual definition of the facies clusters in the embedded space (e.g.
t-sne umap etc).
V. Port the 2D unsupervised and weakly supervised workflows into the 3D domain to
investigate if 3D facies cubes can be clustered and investigated.

1.3. Research Methodology
The choice of method for this study is to use unsupervised machine learning methods to classify
facies from 3D seismic surveys. State of the art models are used for this. Unlabelled 2D seismic
image pa

In [34]:
doc.metadata

{'id': '6b97359832fe439d1ed9cf53891f4162',
 'source': 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage.pdf'}

In [35]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_vertexai import VertexAI

template = """
    Answer the question based on the following context: {context}
    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)
llm = VertexAI(model_name="gemini-pro")
chain = (
    {"context": vertex_search_retriever,
     "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(query)

'## Results and Findings of the Unsupervised Thesis Project\n\nBased on the provided context, the thesis project aimed to utilize unsupervised machine learning methods to classify facies from 3D seismic surveys. The project focused on extracting 2D image patches from the seismic survey in various sizes and running them through unsupervised image classification algorithms. The objectives were to:\n\n1. **Extract 2D image patches** in various sizes from the seismic survey (32x32, 64x64) and run them through the unsupervised image classification algorithms.\n2. **Determine how the images cluster** and if consistent facies clusters can be obtained from the unsupervised image classification algorithms.\n3. **Investigate if preprocessing steps** (structural filter, denoising) improve the clustering results.\n4. **Analyze dominant facies classes**, map facies back to seismic, and construct an interactive dashboard for the manual definition of the facies clusters in the embedded space (e.g., t
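
In the chain above, the retriever's list of documents is formatted into `{context}` using its default string form. A common variation is to join the page contents into a single string first. The sketch below shows the idea with a small stand-in class; `Doc`, `format_docs`, and the sample file names are hypothetical and not part of LangChain or the data store.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing the two attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list) -> str:
    """Join page contents, labelling each chunk with its source."""
    return "\n\n".join(
        f"[{d.metadata.get('source', 'unknown')}]\n{d.page_content}"
        for d in docs
    )


demo_docs = [
    Doc("Objectives of the study.", {"source": "a.pdf"}),
    Doc("Research methodology.", {"source": "b.pdf"}),
]
demo_context = format_docs(demo_docs)
print(demo_context)
```

With real retrieved documents, a function like this could be piped after the retriever (for example `vertex_search_retriever | format_docs`), assuming the documents expose `page_content` and `metadata` as shown earlier.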

### Query Expansion

In [36]:
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever_with_expansion = MultiQueryRetriever.from_llm(retriever=vertex_search_retriever,
                                                        llm=llm)
result = vertex_search_retriever.invoke(query)
print(len(result))
result_expanded = retriever_with_expansion.invoke(query)
print(len(result_expanded))

3
7


In [37]:
result[0].metadata["source"]

'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis.pdf'

In [38]:
[doc.metadata["source"] for doc in result_expanded]

['gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage-converted.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis - Google Docs.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage.pdf',
 'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis - Google Docs.pdf']
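
The expanded result lists some sources more than once. If one entry per source is enough, the list can be reduced while keeping the original order. This is a minimal sketch: `Doc` and `unique_by_source` are hypothetical names, and the file names are made up. Note that it drops additional chunks from the same file, which may not always be desirable.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_by_source(docs):
    """Keep the first document seen for each metadata['source'], in order."""
    seen = set()
    unique = []
    for d in docs:
        source = d.metadata.get("source")
        if source not in seen:
            seen.add(source)
            unique.append(d)
    return unique


demo_expanded = [
    Doc("chunk 1", {"source": "thesis.pdf"}),
    Doc("chunk 2", {"source": "frontpage.pdf"}),
    Doc("chunk 3", {"source": "thesis.pdf"}),
]
demo_deduped = unique_by_source(demo_expanded)
print([d.metadata["source"] for d in demo_deduped])
```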

### Filtering documents

In [None]:
from langchain.retrievers.document_compressors import LLMChainFilter
from langchain_google_community import VertexAISearchRetriever

vertex_search_retriever_many = VertexAISearchRetriever(
    project_id=project_id,
    location_id=location_id,
    data_store_id=data_store_id,
    max_documents=30,
)
results_many = vertex_search_retriever_many.invoke(query)

llm_compression = VertexAI(temperature=0,
                           model_name="gemini-pro")
chain_filter = LLMChainFilter.from_llm(llm=llm_compression)
# results_filtered = chain_filter.compress_documents(result_expanded, query)
results_filtered_many = chain_filter.compress_documents(results_many, query)
print(len(results_many), len(results_filtered_many))



10 2


In [57]:
[doc.metadata["source"] for doc in results_many]

['gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis.pdf2',
 'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis - Google Docs.pdf3',
 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage.pdf4',
 'gs://langchain-book-test-bucket/data_source1/data/Thesis-Frontpage-converted.pdf3',
 'gs://langchain-book-test-bucket/data_source1/data/MY IT REPORT.docx',
 'gs://langchain-book-test-bucket/data_source1/data/MY_IT_REPORT[2]-converted.pdf58',
 'gs://langchain-book-test-bucket/data_source1/data/IT HEADINGS-converted.pdf5',
 'gs://langchain-book-test-bucket/data_source1/data/REPORT_IGARRA[1]-converted.pdf36',
 'gs://langchain-book-test-bucket/data_source1/data/Field_Trip_FrontPage[1]-converted.pdf4',
 'gs://langchain-book-test-bucket/data_source1/data/Seminar-Frontpage-Updated-converted.pdf3']

In [58]:
[doc.metadata["source"] for doc in results_filtered_many]

['gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis.pdf2',
 'gs://langchain-book-test-bucket/data_source1/data/Full Chapters Thesis - Google Docs.pdf3']
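
Conceptually, the filter keeps a document only when a relevance judgment for that document and the query comes back positive. The sketch below mimics that flow with a keyword check in place of the LLM call. `keywords`, `is_relevant`, and `filter_documents` are hypothetical names, the sample texts are made up, and this is not how `LLMChainFilter` is implemented internally.

```python
def keywords(text: str) -> set:
    """Lower-cased words longer than three letters, punctuation stripped."""
    cleaned = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def is_relevant(text: str, query: str) -> bool:
    """Toy judge standing in for the LLM: any shared keyword counts."""
    return bool(keywords(text) & keywords(query))


def filter_documents(texts, query, judge=is_relevant):
    """Keep only the texts the judge marks as relevant to the query."""
    return [t for t in texts if judge(t, query)]


demo_texts = [
    "The thesis applies unsupervised clustering to seismic images.",
    "Field trip report on local geology.",
]
demo_query = "What were the findings of the unsupervised thesis project?"
demo_kept = filter_documents(demo_texts, demo_query)
print(len(demo_texts), len(demo_kept))
```

Passing the judge as a parameter keeps the filtering loop separate from the relevance decision, so a model-backed judge could be swapped in without touching the loop.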

In [52]:
from langchain_core.runnables import RunnableLambda


def filter_context(inputs: dict) -> dict:
    """Keep only the retrieved documents the LLM filter judges relevant."""
    return {
        "context": chain_filter.compress_documents(
            inputs["context"], inputs["question"]
        ),
        "question": inputs["question"],
    }


chain = (
    {"context": vertex_search_retriever, "question": RunnablePassthrough()}
    | RunnableLambda(filter_context)
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke(query))

## Results and Findings

While the provided documents outline the objectives and methodology of the thesis project, they do not present the actual results and findings. 

Here's what we can glean from the documents:

**Objectives:**

* Extract 2D image patches from seismic surveys and run them through unsupervised image classification algorithms.
* Determine how the images cluster and if consistent facies clusters can be obtained.
* Investigate if preprocessing steps improve clustering results.
* Analyze dominant facies classes and map them back to seismic data.
* Develop an interactive dashboard for manual definition of facies clusters.
* Port the 2D unsupervised workflow to the 3D domain to investigate 3D facies cubes.

**Methodology:**

* Unsupervised machine learning methods are used to classify facies from 3D seismic surveys.
* Unlabelled 2D seismic image patches are run through unsupervised clustering algorithms.
* Hyperparameters are tuned to optimize classification performance.