
##### Copyright 2025 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Code analysis using LangChain and DeepLake

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/langchain/Code_analysis_using_Gemini_LangChain_and_DeepLake.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

<!-- Pricing warning Badge -->
<table>
  <tr>
    <!-- Emoji -->
    <td bgcolor="#f5949e">
      <font size=30>⚠️</font>
    </td>
    <!-- Text Content Cell -->
    <td bgcolor="#f5949e">
      <h3><font color=black>This notebook requires paid tier rate limits to run properly.<br>  
(cf. <a href="https://ai.google.dev/pricing#veo2"><font color='#217bfe'>pricing</font></a> for more details).</font></h3>
    </td>
  </tr>
</table>

This notebook shows how to use the Gemini API with [LangChain](https://python.langchain.com/v0.2/docs/introduction/) and [DeepLake](https://www.deeplake.ai/) for code analysis. It covers:
- loading and splitting files
- creating a DeepLake database with embedding information
- setting up a modern LCEL (LangChain Expression Language) chain

### Load dependencies

In [None]:
# Required installations
%pip install -q -U langchain-google-genai langchain-deeplake langchain langchain-text-splitters langchain-community deeplake

In [None]:
from glob import glob
from IPython.display import Markdown, display

# Loaders & Splitters
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# Core Components (Modern LCEL)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# DeepLake
from langchain_community.vectorstores import deeplake

### Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../../quickstarts/Authentication.ipynb) for an example.


In [None]:
# Load the API key from Colab Secrets and report a readable error if it is missing.

import os
from google.colab import userdata

try:
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
    print("API Key loaded successfully.")
except Exception as e:
    # Include the underlying error so the cause is visible
    print(f"Error loading API Key. Please check your Secrets tab. ({e})")
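
Outside Colab, `google.colab` cannot be imported, so the cell above would fail at the import. A minimal sketch of a fallback, assuming the key may also be exported as an environment variable named `GOOGLE_API_KEY`; the helper `load_api_key` and its parameters are hypothetical and not part of any library:

```python
import os


def load_api_key(env=None, secret_name="GOOGLE_API_KEY"):
    """Return the key from Colab Secrets if possible, otherwise from the environment."""
    env = os.environ if env is None else env
    try:
        # Only importable inside Colab; any failure falls through to the environment.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        key = None
    key = key or env.get(secret_name)
    if not key:
        raise RuntimeError(
            f"{secret_name} not found in Colab Secrets or environment variables."
        )
    return key
```

Calling `os.environ["GOOGLE_API_KEY"] = load_api_key()` would then behave the same way in Colab and in a local Jupyter session.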

## Prepare the files

First, clone the [langchain-google](https://github.com/langchain-ai/langchain-google) repository. It is the repository you will analyze in this example.

It contains code integrating the Gemini API, Vertex AI, and other Google products with LangChain.

In [None]:
# Knowledge Base
!git clone https://github.com/langchain-ai/langchain-google

This example will focus only on the integration of the Gemini API with LangChain and ignore the rest of the codebase.

In [None]:
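# Glob pattern for the Python files of the Gemini API integration in the cloned repo.
# `**` must be its own path component to match subdirectories recursively.
repo_match = "langchain-google/libs/genai/langchain_google_genai/**/*.py"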
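
In Python's `glob`, `**` recurses into subdirectories only when it forms a whole path component and `recursive=True` is passed. Fused with a name, it behaves like a plain `*`. A small sketch on a throwaway directory; the `pkg` layout is made up for illustration:

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one module at the package top level, one in a subpackage.
    pkg = os.path.join(root, "pkg")
    os.makedirs(os.path.join(pkg, "sub"))
    for rel in ("a.py", os.path.join("sub", "b.py")):
        open(os.path.join(pkg, rel), "w").close()

    # `**` fused with a name acts like `*`: only files directly inside `pkg` match.
    fused = glob(os.path.join(root, "pkg**", "*.py"), recursive=True)
    # `**` as its own component matches zero or more directory levels.
    separate = glob(os.path.join(root, "pkg", "**", "*.py"), recursive=True)

    fused_names = sorted(os.path.basename(p) for p in fused)
    separate_names = sorted(os.path.basename(p) for p in separate)

print(fused_names)
print(separate_names)
```

Checking the number of matched files before loading them is a cheap way to confirm that the pattern covers the modules you expect.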

Each file with a matching path will be loaded and split by `RecursiveCharacterTextSplitter`.
The splitter is told that the files are written in Python, which helps it split at natural boundaries so that the resulting chunks keep their context.

In [None]:
# Load and split the documents
# The language-aware splitter is created once and reused for every file.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=2000, chunk_overlap=100
)

docs = []
for file in glob(repo_match, recursive=True):
    loader = TextLoader(file, encoding='utf-8')
    docs.extend(loader.load_and_split(splitter))

The `Language` enum provides common separators for the most popular programming languages, which lowers the chances of classes or functions being split in the middle.

In [None]:
# common separators used for Python files
RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)
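
To see why an ordered separator list keeps classes and functions together, here is a simplified sketch of recursive splitting. It is not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores `chunk_overlap`, and the separator list is a shortened one modeled on the Python separators.

```python
def recursive_split(text, separators, chunk_size):
    """Try coarse separators first and fall back to finer ones only when needed."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, finer, chunk_size)

    # Split, then re-attach the separator to the start of each following piece.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) <= chunk_size:
            current += piece  # still fits: keep merging neighbours
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too large on its own: split it with finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return chunks


source = (
    "import os\n"
    "\nclass Greeter:\n    def hello(self):\n        return 'hi'\n"
    "\ndef main():\n    print(Greeter().hello())\n"
)
python_like_separators = ["\nclass ", "\ndef ", "\n\n", "\n", " "]
chunks = recursive_split(source, python_like_separators, chunk_size=80)
for chunk in chunks:
    print(repr(chunk))
```

Because `"\nclass "` and `"\ndef "` are tried before blank lines and spaces, cuts land on definition boundaries whenever the resulting pieces fit into `chunk_size`.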

## Create the database
The data will be loaded into memory, since the database doesn't need to be permanent in this case and is small enough to fit.

The type of storage is specified by the prefix of the path, in this case `mem://`.

Check out other types of storage [here](https://docs.activeloop.ai/setup/storage-and-creds/storage-options).

In [None]:
# define path to database
dataset_path = 'mem://deeplake/langchain_google'

In [None]:
# define the embedding model
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

In [None]:
# Pin DeepLake below 4.0.0 for the langchain_community DeepLake vector store used below
%pip install -q "deeplake<4.0.0"

Everything needed is ready, and now you can create the database. It should not take longer than a few seconds.

In [None]:
# Keep only the first 8 chunks so the example issues few embedding requests
docs = docs[:8]
# Store the docs in the DeepLake database
db = deeplake.DeepLake.from_documents(
    dataset_path=dataset_path,
    embedding=embeddings,
    documents=docs,
    read_only=False
)

## Question Answering

Set up the document retriever.

In [None]:
retriever = db.as_retriever()
retriever.search_kwargs['distance_metric'] = 'cos'
retriever.search_kwargs['k'] = 4 # number of documents to return
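
Conceptually, the retriever embeds the query and returns the `k` stored chunks whose embeddings are closest under the chosen metric. A brute-force sketch of cosine-similarity ranking on toy vectors; this only illustrates the idea and is not how DeepLake performs the search, and the vectors are made up:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D "embeddings" standing in for real embedding vectors.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]
query_vec = [1.0, 0.2]
best = top_k(query_vec, doc_vecs, k=4)
print(best)
```

Cosine similarity compares direction only, so the length of an embedding vector does not affect the ranking.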

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
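
To preview what the model will receive, the placeholders can be filled by hand. In this sketch plain `str.format` stands in for the prompt template (the real class produces chat messages rather than a bare string), and the snippets and question are made up:

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Hypothetical retrieved snippets standing in for document chunks.
snippets = [
    "class ChatGoogleGenerativeAI: ...",
    "class GoogleGenerativeAIEmbeddings: ...",
]
context = "\n".join(snippets)

filled = template.format(context=context, question="Which classes are defined?")
print(filled)
```

The instruction "based only on the following context" is what ties the answer to the retrieved chunks instead of the model's general knowledge.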

In [None]:
# Join the retrieved documents into a single context string
def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

# define the chat model
llm = ChatGoogleGenerativeAI(model="gemini-3-flash-preview")

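Now, you can create a chain for Question Answering. In this case, an LCEL chain that pipes the retriever, the prompt, the chat model, and a string output parser together will be used.

If you want a chat experience instead, the chain would also need to carry the conversation history (the legacy `ConversationalRetrievalChain` served this purpose).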

In [None]:
# LCEL chain
# 1.) Retrieve docs -> format them as a string
# 2.) Pass the question through unchanged
# 3.) Combine in prompt -> LLM -> output parser
final_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
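
The `|` composition can be pictured with a few lines of plain Python. This is a toy model, not LangChain's `Runnable` implementation: `Step` and `as_step` are hypothetical names, and the retriever and model are fakes. It shows the two ideas the chain relies on: a dict fans the same input out to every branch, and `|` feeds one step's output into the next.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Lets a plain dict or function appear on the left of `|`.
        return as_step(other) | self


def as_step(obj):
    """Coerce dicts and callables into steps."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches = {key: as_step(val) for key, val in obj.items()}
        # Every branch receives the same input; results are collected in a dict.
        return Step(lambda value: {k: s.invoke(value) for k, s in branches.items()})
    return Step(obj)


fake_retriever = Step(lambda question: [f"doc about {question}"])
join_docs = Step(lambda docs: "\n".join(docs))
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: text.upper())

chain = {"context": fake_retriever | join_docs, "question": passthrough} | fill_prompt | fake_llm
result = chain.invoke("splitters")
print(result)
```

Here the question reaches both the `context` branch and the `question` branch, which mirrors how the real chain sends the query to the retriever and passes it through to the prompt at the same time.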

The chain is ready to answer your questions.

NOTE: `Markdown` is used for improved formatting of the output.

In [None]:
query = "what classes are available in Google-Gen-AI Library"

In [None]:
# Run the chain and render the answer as Markdown
answer = final_chain.invoke(query)
display(Markdown(answer))

## Summary

The Gemini API works well with LangChain. The integration provides an easy interface for:
- loading and splitting files
- creating a DeepLake database with embeddings
- answering questions based on context from files

## What's next?

This notebook showed only one possible use case for LangChain with the Gemini API. You can find many more [here](../../examples/langchain).