# Generative AI Application Development 

## Overview

- Start Serverless Warehouse

- Run initial setups

- Explore some LangChain components

- Create a tool function that searches YouTube videos and integrate it into a chain

- Use LangChain to combine several chains and run them in parallel: a chain to answer general questions, a chain to answer questions about our paper reviews database (Vector Search database) and a chain to search YouTube videos (using a non-native tool)

## Setups

#### Auto

In [0]:
# %run ./setup_env/agent_desing

In [0]:
%sql
-- check current catalog and schema
select current_catalog(),  current_schema();

current_catalog(),current_schema()
workspace,default


In [0]:
# checks some libraries
%pip list | grep -E 'langchain|youtube-search-python'

# expected results:

# langchain                          1.2.7
# langchain-classic                  1.0.1
# langchain-community                0.4.1
# langchain-core                     1.2.7
# langchain-text-splitters           1.1.0
# youtube-search-python              1.6.6

langchain                          1.2.7
langchain-classic                  1.0.1
langchain-community                0.4.1
langchain-core                     1.2.7
langchain-text-splitters           1.1.0
youtube-search-python              1.6.6


#### Manual

In [0]:
# if something is wrong, install manually:

# %pip install langchain==1.2.7 langchain-classic==1.0.1 langchain-community==0.4.1 langchain-text-splitters==1.1.0 langsmith==0.6.7
# %pip install langchain-core==1.2.7 youtube-search-python==1.6.6

In [0]:
# %sql
# -- setup catalog and schema

# use catalog `studies`;
# use schema `databricks-dev`;

# select current_catalog() as actual_catalog,  current_schema() as actual_schema;

### Define constants

In [0]:
# set constants

CURRENT_CATALOG = 'studies'
CURRENT_SCHEMA = 'databricks-dev'

## LangChain components

### Prompt

*PromptTemplate*

In [0]:
from langchain_core.prompts import PromptTemplate

# use langchain prompt template
prompt = PromptTemplate.from_template(
    template='Who was the greatest scorer in the history of the {interest_sport}?'
)

prompt.format(interest_sport='soccer')

'Who was the greatest scorer in the history of the soccer?'
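
By default, `PromptTemplate.from_template` treats `{name}` as a placeholder in the same way Python's `str.format` does, so a literal brace in a template (for example, a JSON sample shown to the model) has to be doubled. A minimal sketch that uses plain `str.format` as a stand-in for the template engine; the `json_hint` template is a made-up example, not part of this notebook:

```python
# Stand-in for the default "f-string" template format: plain str.format.
template = 'Who was the greatest scorer in the history of the {interest_sport}?'
question = template.format(interest_sport='soccer')
print(question)

# Hypothetical template that shows the model a JSON object.
# Literal braces are written as {{ and }} so they survive formatting.
json_hint = 'Answer about {topic} as JSON, e.g. {{"paper_id": 1}}'
rendered = json_hint.format(topic='chess')
print(rendered)
```

With single braces around the JSON sample, formatting would fail because `"paper_id"` would be read as a placeholder name.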

### LLMs

*chat_models, ChatDatabricks*

In [0]:
from langchain_community.chat_models import ChatDatabricks

# define a langchain chat model
endpoint_llm = 'databricks-llama-4-maverick'
llm = ChatDatabricks(
    endpoint=endpoint_llm,
    max_tokens=300
)

# run the model
res = llm.stream(prompt.format(interest_sport='soccer'))

for chunk in res:
    print(chunk.content, end='', flush=True)

The greatest scorer in the history of soccer is a matter of some debate, as there are different opinions on how to measure this achievement. However, according to various sources, including Guinness World Records, the all-time leading scorer in men's soccer is Josef Bican, an Austrian-Czech footballer who scored over 805 goals in more than 530 games between 1931 and 1956.

Bican's record is recognized by the International Federation of Football History & Statistics (IFFHS) and other reputable organizations. He played for several clubs, including Slavia Prague, and scored an incredible 535 goals in 272 league games.

Other notable scorers in soccer history include:

1. Cristiano Ronaldo: Over 819 goals in more than 1136 games (and counting).
2. Lionel Messi: Over 792 goals in more than 1022 games (and counting).
3. Ferenc Hirzer: 746 goals in 585 games.
4. Imre Schlosser: 746 goals in 463 games.
5. Pelé: 541 goals in 656 games.

It's worth noting that the accuracy of historical records 

### Retriever

*retrievers, WikipediaRetriever*

In [0]:
from langchain_community.retrievers import WikipediaRetriever

# instantiate a retriever
retriever = WikipediaRetriever()

# get relevant documents from the retriever
docs = retriever.invoke(input='Pelé')
docs[0]

Document(metadata={'title': 'Pelé', 'summary': 'Edson Arantes do Nascimento (Brazilian Portuguese: [ˈɛd(ʒi)sõ(w) aˈɾɐ̃tʃiz du nasiˈmẽtu]; 23 October 1940 – 29 December 2022), better known by his nickname Pelé (Brazilian Portuguese: [peˈlɛ]), was a Brazilian professional footballer who played as a forward. Widely regarded as one of the greatest players in history, he was among the most successful and popular sports figures of the 20th century. His 1,279 goals in 1,363 games, which includes friendlies, is recognised as a Guinness World Record. In 1999, he was named Athlete of the Century by the International Olympic Committee and was included in the Time list of the 100 most important people of the 20th century. In 2000, Pelé was voted World Player of the Century by the International Federation of Football History & Statistics (IFFHS) and was one of the two joint winners of the FIFA Player of the Century, alongside Diego Maradona.\nPelé began playing for Santos at age 15 and for the Braz

In [0]:
# docs[0].metadata['summary']

## Tools

*tool, ToolRuntime, create_agent, dataclass, youtubesearchpython(VideosSearch)*


- https://docs.langchain.com/oss/python/langchain/tools

In [0]:
# utility used as a tool in langchain

# from youtubesearchpython import VideosSearch
# res_video_search = VideosSearch('Goals of Pelé', limit=1)
# res_search = res_video_search.result()
# res_search

In [0]:
# create langchain tool
from youtubesearchpython import VideosSearch
from dataclasses import dataclass
from langchain.tools import tool, ToolRuntime
from langchain.agents import create_agent


@dataclass
class UserContext:
    search_terms: str

@tool
def get_video_info(runtime: ToolRuntime[UserContext]) -> dict:
    '''Search for videos on YouTube.'''
    # the search terms come from the runtime context, not from the model
    search_terms = runtime.context.search_terms
    videos_search = VideosSearch(search_terms, limit=1)
    return videos_search.result()


agent = create_agent(
    llm,
    tools=[get_video_info],
    context_schema=UserContext,
    system_prompt='You are an assistant that can search videos in Youtube.'
)

In [0]:
# run the agent, which calls the tool
search_terms = 'Goals of Pelé'
result = agent.invoke(
    {
        'messages': [{
            'role': 'user', 'content': 'Extract the link, title and description of the video for the following search terms'
        }]
    },
    context=UserContext(search_terms=search_terms)
)

result

{'messages': [HumanMessage(content='Extract the link, title and description of the video for the following search terms', additional_kwargs={}, response_metadata={}, id='af1914de-65a5-4ad2-b715-0aa3b06abacc'),
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_792888ea-03b2-482a-ad8f-51f967fa4f69', 'type': 'function', 'function': {'name': 'get_video_info', 'arguments': '{}'}}]}, response_metadata={'prompt_tokens': 347, 'completion_tokens': 54, 'total_tokens': 401}, id='lc_run--019c2463-abf9-7451-8016-2427372c6978-0', tool_calls=[{'name': 'get_video_info', 'args': {}, 'id': 'call_792888ea-03b2-482a-ad8f-51f967fa4f69', 'type': 'tool_call'}], invalid_tool_calls=[]),
  ToolMessage(content='{"result": [{"type": "video", "id": "WXg8P0u9W9I", "title": "Pele -Top 10 Impossible Goals Ever", "publishedTime": "7 years ago", "duration": "3:37", "viewCount": {"text": "21,213,099 views", "short": "21M views"}, "thumbnails": [{"url": "https://i.ytimg.com/vi/WXg8P0u9W9I/hqdefault.j

#### Parse the response

In [0]:
import json

res_search = json.loads(result['messages'][2].content)['result'][0]

print(f" . {res_search['title']} #")
print(f" . Channel: {res_search['channel']['name']}")
print(f" . Duration: {res_search['duration']}")
print(f" . Views: {res_search['viewCount']['short']}")
print(f" . URL: {res_search['link']}")

 . Pele -Top 10 Impossible Goals Ever #
 . Channel: Sports 360
 . Duration: 3:37
 . Views: 21M views
 . URL: https://www.youtube.com/watch?v=WXg8P0u9W9I
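
The `duration` field is a clock-style string (`3:37` for this video), not a number of seconds. If a numeric value is needed, it can be converted. A small sketch, where `duration_to_seconds` is a hypothetical helper and the `h:mm:ss` form for longer videos is an assumption:

```python
def duration_to_seconds(duration: str) -> int:
    '''Convert "m:ss" or "h:mm:ss" into a number of seconds.'''
    seconds = 0
    for part in duration.split(':'):
        # each further field is 60 times finer than the previous one
        seconds = seconds * 60 + int(part)
    return seconds

print(duration_to_seconds('3:37'))  # 217
```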


## Chains

*StrOutputParser, LCEL*

In [0]:
prompt

PromptTemplate(input_variables=['interest_sport'], input_types={}, partial_variables={}, template='Who was the greatest scorer in the history of the {interest_sport}?')

In [0]:
from langchain_core.output_parsers import StrOutputParser

# define a sequential chain using LCEL (LangChain Expression Language)
chain = prompt | llm | StrOutputParser()

res_chain = chain.invoke({'interest_sport':'chess'})
print(res_chain)

I think you meant to ask "Who is the greatest scorer in the history of chess?" or more accurately, "Who is the highest-scoring chess player of all time?"

The answer to this question can be subjective, as it depends on how one measures "greatest scorer." However, if we consider the chess players with the highest career scoring percentages or the most tournament victories, some top contenders include:

1. **Garry Kasparov**: Regarded by many as the greatest chess player of all time, Kasparov was the World Chess Champion from 1985 to 2000. He has an impressive tournament record and a high scoring percentage against top opponents.

2. **Bobby Fischer**: An American chess prodigy, Fischer became a grandmaster at 15 years and 6 months old and later became the 11th World Chess Champion. His match against Boris Spassky in 1972 was a historic event.

3. **Viswanathan Anand**: A five-time World Chess Champion, Anand has been one of the top players in the world for over three decades. He has a r

## Multi-stage chains

  - Create a Vector Store
  - Create a Vector Search endpoint (format: `vs_endpoint_name`)
  - Check whether the endpoints are assigned by username
  - To configure Vector Search, first run the notebook `prepare_data.ipynb`

In [0]:
# list vector search endpoints and tables

from databricks.sdk import WorkspaceClient
from databricks.vector_search.client import VectorSearchClient


workspace_client = WorkspaceClient()
vs_client = VectorSearchClient(disable_notice=True)

endpoints_list = vs_client.list_endpoints()
tables = workspace_client.tables.list(
    catalog_name=CURRENT_CATALOG,
    schema_name=CURRENT_SCHEMA
)

print('=== Infos about Vector Search ===')
for endpoint in endpoints_list.get('endpoints', []):
    print(f"Endpoint name: {endpoint.get('name')}")
    print(f" - Status: {endpoint.get('endpoint_status', {}).get('state')}")

print('\n=== Infos about Delta tables ===')
for table in tables:
    print(f'Table name: {table.name}')
    print(f' - Schema name: {table.schema_name}')
    print(f' - Type: {table.table_type}')
    print(f' - Full name: {table.full_name}')
    print(f' - Description: {table.comment}\n')


=== Infos about Vector Search ===
Endpoint name: endpoint_vector_search_reviews
 - Status: ONLINE

=== Infos about Delta tables ===
Table name: db_paper_reviews
 - Schema name: databricks-dev
 - Type: TableType.MANAGED
 - Full name: studies.databricks-dev.db_paper_reviews
 - Description: The table contains reviews of academic papers, detailing evaluations and remarks associated with each paper. It includes information on the paper's identification, the language of the review, and the timespan relevant to the review. This data can be used to analyze trends in academic feedback, assess the confidence levels of reviews, and link specific evaluations to their respective papers.

Table name: paper_reviews
 - Schema name: databricks-dev
 - Type: TableType.MANAGED
 - Full name: studies.databricks-dev.paper_reviews
 - Description: The table contains records of reviews for academic papers. It includes details such as the paper identifier, the review text, and the timespan related to the review.
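
Before building on an endpoint, it can help to check that it is ready. A minimal sketch that filters a `list_endpoints()` response by state; `online_endpoints` is a hypothetical helper, and the sample dictionary only mirrors the fields read in the cell above (`name` and `endpoint_status.state`), with an invented second entry:

```python
def online_endpoints(response: dict) -> list[str]:
    '''Return the names of endpoints whose state is ONLINE.'''
    names = []
    for endpoint in response.get('endpoints') or []:
        state = (endpoint.get('endpoint_status') or {}).get('state')
        if state == 'ONLINE':
            names.append(endpoint.get('name'))
    return names

# Sample shaped like the fields printed above; the second entry is invented.
sample = {
    'endpoints': [
        {'name': 'endpoint_vector_search_reviews', 'endpoint_status': {'state': 'ONLINE'}},
        {'name': 'example_endpoint', 'endpoint_status': {'state': 'PROVISIONING'}},
    ]
}
print(online_endpoints(sample))
```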

#### Chain 1

In [0]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chat_models import ChatDatabricks


prompt1 = PromptTemplate.from_template(
    template='''You are a research paper review specialist. Try to give a concise answer to the user's question.

    Question: {question_about_reviews}

    Answer:
    '''
)

# choose chat model
endpoint_llm1 = 'databricks-llama-4-maverick' 
llm_model1 = ChatDatabricks(
    endpoint=endpoint_llm1,
    max_tokens=200
)

# define a chain
# map the prompt variable to its field in the input dict
question_runnable1 = {
    'question_about_reviews': lambda x: x['question_about_reviews'],
}
chain1 = question_runnable1 | prompt1 | llm_model1 | StrOutputParser()

# run the chain
question_about_reviews = 'Can you explain about good practices for the implementation of management systems?'
res_chain1 = chain1.invoke({
    'question_about_reviews': question_about_reviews,
})
print(res_chain1)

Effective implementation of management systems is crucial for organizations to achieve their objectives. Good practices for implementation include: 

1. **Clear objectives and scope**: Define the system's purpose, goals, and boundaries.
2. **Top-management commitment**: Demonstrate leadership and commitment to the management system.
3. **Stakeholder engagement**: Involve relevant stakeholders in the implementation process.
4. **Gap analysis and risk assessment**: Identify gaps and risks to determine necessary actions.
5. **Training and awareness**: Educate employees on the system's requirements and their roles.
6. **Phased implementation**: Roll out the system in stages to ensure a smooth transition.
7. **Monitoring and review**: Regularly assess the system's performance and make improvements.
8. **Continuous improvement**: Foster a culture of ongoing evaluation and enhancement.

By following these good practices, organizations can ensure a successful implementation of their management

#### Chain 2

In [0]:
import json
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_community.chat_models import ChatDatabricks


def str_to_dict(text: str) -> dict:
    '''Parse output.'''
    text = text.strip().strip('"')
    return json.loads(text)


prompt2 = PromptTemplate.from_template(
    template='''You are a research paper review specialist. Try to give a concise answer. Your answer must be a single valid JSON object.
        You have access to both review databases: 'db_paper_reviews' and 'paper_reviews_index'.
        You have to find the papers that talk about {selected_topic} by searching in 'paper_reviews_index'.
        First, search for {selected_topic} in the column 'chunked_text' of 'paper_reviews_index'. Then merge the results with 'db_paper_reviews' using the column 'paper_id'. Note: Use the tool 'search_papers_reviews()' to search in 'paper_reviews_index'.
        If you don't find any result, return an empty JSON object.
        Return only the first result that you encounter.

        Example:
            Inputs example:
                - question: 'What review information do you have about the selected topic?'
                - selected_topic: 'recomendaciones prácticas para el desarrollo de software seguro'
            Answer example:
            {{
                "text_encountered": "- El artículo presenta recomendaciones prácticas para el desarrollo de software seguro. Se describen l [...]",
                "confidence_level": 4,
                "evaluation": 1,
                "language": "es",
                "review_id": 2,
                "paper_id": 1,
                "timespan": "2010-07-05"
            }}

        Question: {question_about_topic}

        Answer:
    '''
)

# choose chat model
endpoint_llm2 = 'databricks-qwen3-next-80b-a3b-instruct'
llm_model2 = ChatDatabricks(
    endpoint=endpoint_llm2,
    max_tokens=300
)

# define a chain
# map each prompt variable to its field in the input dict
question_runnable2 = {
    'question_about_topic': lambda x: x['question_about_topic'],
    'selected_topic': lambda x: x['selected_topic'],
}
chain2 = question_runnable2 | prompt2 | llm_model2 | StrOutputParser() | RunnableLambda(str_to_dict)

# run the chain
question_about_topic = 'What review information do you have about the selected topic?'
selected_topic = 'prácticas concretas para la implementación de sistemas de gestión'
res_chain2 = chain2.invoke({
    'question_about_topic': question_about_topic,
    'selected_topic': selected_topic
})
print(res_chain2)

{'text_encountered': '- El artículo presenta prácticas concretas para la implementación de sistemas de gestión, incluyendo la estandarización de procesos, capacitación del personal y monitoreo continuo. [...]', 'confidence_level': 4, 'evaluation': 1, 'language': 'es', 'review_id': 2, 'paper_id': 1, 'timespan': '2010-07-05'}
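
Chat models often wrap JSON in a Markdown code fence or add a sentence around it, which makes a bare `json.loads` fail. A more forgiving parser could look like the sketch below. `extract_json_object` is a hypothetical alternative to `str_to_dict`, and the "first `{` to last `}`" heuristic is an assumption that only holds when the reply contains a single top-level object:

```python
import json

def extract_json_object(text: str) -> dict:
    '''Parse the JSON object embedded in a model reply, or return {} if none.'''
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end < start:
        return {}  # no object found: treat it as "no result"
    return json.loads(text[start:end + 1])

reply = 'Here is the result:\n```json\n{"paper_id": 1, "language": "es"}\n```'
print(extract_json_object(reply))
```

It can be dropped into the chain in the same position, for example `... | StrOutputParser() | RunnableLambda(extract_json_object)`.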


#### Chain 3

In [0]:
from youtubesearchpython import VideosSearch
from dataclasses import dataclass
from langchain.tools import tool, ToolRuntime
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain.agents import create_agent
from langchain_community.chat_models import ChatDatabricks
import json


# choose chat model
endpoint_llm3 = 'databricks-gemma-3-12b' 
llm_model3 = ChatDatabricks(
    endpoint=endpoint_llm3,
    max_tokens=200
)

def parse_videos_search_result(res: dict) -> dict:
    '''Parse the result of the videos search.'''
    # messages[2] is the ToolMessage that holds the search result as JSON
    return json.loads(res['messages'][2].content)['result'][0]

@dataclass
class UserContext:
    search_terms: str

@tool
def get_video_info(runtime: ToolRuntime[UserContext], search_terms: str) -> dict:
    '''Search for videos on YouTube.'''
    # `runtime` is injected by the agent; `search_terms` is supplied by the model
    videos_search = VideosSearch(search_terms, limit=1)
    return videos_search.result()

agent3 = create_agent(
    llm_model3,
    tools=[get_video_info],
    context_schema=UserContext,
    system_prompt='You are an assistant that can search videos in Youtube.'
)


# define a chain
chain3 = agent3 | RunnableLambda(parse_videos_search_result)

# run the chain
search_terms = 'Estandarización de procesos, capacitación del personal y auditorías periódicas'
result_search = chain3.invoke(
    {
        'messages': [{
            'role': 'user', 
            # f-string, so the search terms are actually interpolated into the message
            'content': f'Extract the link, title and description of the video for the following search terms: {search_terms}.'
        }]
    },
    context=UserContext(search_terms=search_terms)
)

print(f" . Video title: {result_search['title']}")
print(f" . Channel: {result_search['channel']['name']}")
print(f" . Duration: {result_search['duration']}")
print(f" . Views: {result_search['viewCount']['short']}")
print(f" . URL: {result_search['link']}")

 . Video title: My OAV (Oxalic Acid Vaporization) Setup
 . Channel: Jimmy's Neighborhood Bees
 . Duration: 13:15
 . Views: 224 views
 . URL: https://www.youtube.com/watch?v=iSqXScdrNGo


In [0]:
result_search

{'type': 'video',
 'id': 'iSqXScdrNGo',
 'title': 'My OAV (Oxalic Acid Vaporization) Setup',
 'publishedTime': '4 months ago',
 'duration': '13:15',
 'viewCount': {'text': '224 views', 'short': '224 views'},
 'thumbnails': [{'url': 'https://i.ytimg.com/vi/iSqXScdrNGo/hq720.jpg?sqp=-oaymwE2COgCEMoBSFXyq4qpAygIARUAAIhCGAFwAcABBvABAfgB_gmAAtAFigIMCAAQARh_IBMoLzAP&rs=AOn4CLAKrJJQ4h_0lW9QIa6mVPdpxzQlQQ',
   'width': 360,
   'height': 202},
  {'url': 'https://i.ytimg.com/vi/iSqXScdrNGo/hq720.jpg?sqp=-oaymwE2CNAFEJQDSFXyq4qpAygIARUAAIhCGAFwAcABBvABAfgB_gmAAtAFigIMCAAQARh_IBMoLzAP&rs=AOn4CLA5vuH7Ng5alcys1Zm0HSu3isyr8g',
   'width': 720,
   'height': 404}],
 'richThumbnail': {'url': 'https://i.ytimg.com/an_webp/iSqXScdrNGo/mqdefault_6s.webp?du=3000&sqp=CPaxiMwG&rs=AOn4CLBiN4ELflkHJJPS76Uij-C4Q5u3Dw',
  'width': 320,
  'height': 180},
 'descriptionSnippet': [{'text': "Today I talk about how I store and use my Oxalic Acid System. I'm trying to be more organized so I've set up a storage\xa0..."}],
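
`parse_videos_search_result` reads `messages[2]`, which assumes the tool reply is always the third message. A sketch that looks the message up by its `type` instead. `first_tool_payload` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for LangChain message objects (`'tool'` is the type value LangChain uses for `ToolMessage`; the sample contents are invented):

```python
import json
from types import SimpleNamespace

def first_tool_payload(messages: list) -> dict:
    '''Return the first search hit from the first tool message, or {} if none.'''
    for message in messages:
        if getattr(message, 'type', None) == 'tool':
            hits = json.loads(message.content).get('result') or []
            return hits[0] if hits else {}
    return {}

# Stand-ins for HumanMessage / AIMessage / ToolMessage.
messages = [
    SimpleNamespace(type='human', content='Find a video'),
    SimpleNamespace(type='ai', content=''),
    SimpleNamespace(type='tool', content='{"result": [{"title": "Example video"}]}'),
]
print(first_tool_payload(messages)['title'])
```

Returning `{}` when the model never calls the tool avoids an `IndexError`, at the cost of the caller having to handle an empty result.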

## Chaining chains

In [0]:
from langchain_core.runnables import RunnableParallel, RunnableLambda

# different inputs for the chains
question_about_reviews = 'Can you explain about good practices for the implementation of management systems?'
question_about_topic = 'What review information do you have about the selected topic?'
selected_topic = 'prácticas concretas para la implementación de sistemas de gestión'
search_terms = 'Estandarización de procesos, capacitación del personal y auditorías periódicas'

# chaining chains with RunnableParallel
multi_stage_chain = RunnableParallel({
        'chain1': RunnableLambda(lambda x: x['input_chain1']) | chain1,
        'chain2': RunnableLambda(lambda x: x['input_chain2']) | chain2,
        'chain3': RunnableLambda(lambda x: x['input_chain3']) | chain3,
    })

# run chains in parallel
res_multi_stage = multi_stage_chain.invoke({
    'input_chain1': {'question_about_reviews': question_about_reviews},
    'input_chain2': {
        'question_about_topic': question_about_topic,
        'selected_topic': selected_topic,
    },
    'input_chain3':
        {
            'messages': [{
                'role': 'user', 
                'content': f'Extract the link, title and description of the video for the following search terms: {search_terms}.',
            }],
        },
})

print(res_multi_stage)

{'chain1': "Effective implementation of management systems is crucial for organizations to achieve their objectives. Good practices for implementation include:\n\n1. **Clear objectives and scope**: Define the system's purpose, goals, and boundaries.\n2. **Top-management commitment**: Demonstrate leadership's dedication to the system's success.\n3. **Stakeholder engagement**: Involve relevant parties in the implementation process.\n4. **Risk-based approach**: Identify and mitigate potential risks associated with the system.\n5. **Process-oriented design**: Structure the system around the organization's processes.\n6. **Training and awareness**: Educate employees on the system's requirements and their roles.\n7. **Monitoring and review**: Regularly assess the system's performance and make improvements.\n8. **Continuous improvement**: Foster a culture of ongoing enhancement and adaptation.\n\nBy following these good practices, organizations can ensure a successful management system implem

<img src="./imgs/chaining_chains.png">
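
Conceptually, the parallel step above routes one input dictionary to several independent branches and collects their results under the branch names. A plain-Python sketch of that fan-out with a thread pool; `run_parallel` and the toy branches are hypothetical stand-ins for `RunnableParallel` and the three chains, not LangChain's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(branches: dict, inputs: dict) -> dict:
    '''Run every branch on the same inputs concurrently and collect results by name.'''
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn, inputs) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}

# Toy branches: each picks its own slice of the shared input, like the lambdas above.
branches = {
    'chain1': lambda x: x['input_chain1']['question'].upper(),
    'chain2': lambda x: len(x['input_chain2']['topic']),
}
results = run_parallel(branches, {
    'input_chain1': {'question': 'why?'},
    'input_chain2': {'topic': 'chess'},
})
print(results)
```

Because the branches do not depend on each other, the total latency is roughly that of the slowest branch rather than the sum of all three.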