In [2]:
%load_ext dotenv
%dotenv

In [127]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser

from langchain_community.utilities import WikipediaAPIWrapper
from langchain_community.tools import WikipediaQueryRun

from langchain_chroma import Chroma
from langchain_core.tools import tool
from langchain_core.tools import create_retriever_tool

from langchain import hub
from langchain.agents import AgentExecutor, create_tool_calling_agent

from platform import python_version

In [6]:
wikipedia_api = WikipediaAPIWrapper()

In [26]:
print(wikipedia_api.run('python'))

Page: Python (programming language)
Summary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
Python consistently ranks as one of the most popular programming languages, and has gained widespread use in the machine learning community.



Pa

In [34]:
wikipedia_tool = WikipediaQueryRun(api_wrapper=wikipedia_api)

In [36]:
wikipedia_tool.invoke('python')



  lis = BeautifulSoup(html).find_all('li')


'Page: Python (programming language)\nSummary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.\nPython is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.\nGuido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.\nPython consistently ranks as one of the most popular programming languages, and has gained widespread use in the machine learning community.\

In [58]:
TEMPLATE = '''
Convert the question from the user into a wikipedia search query. DO NOT answer the question.

{question}
'''

prompt_template = PromptTemplate.from_template(template=TEMPLATE)

In [60]:
llm = ChatOpenAI(
    model='gpt-4o-mini',
    temperature=0,
    max_completion_tokens=200
)

In [68]:
chain = prompt_template | llm | StrOutputParser() | wikipedia_tool

In [72]:
chain.invoke({'question':'Who was the creator of python?'})

'Page: Guido van Rossum\nSummary: Guido van Rossum (Dutch: [ˈxidoː vɑn ˈrɔsʏm]; born 31 January 1956) is a Dutch programmer. He is the creator of the Python programming language, for which he was the "benevolent dictator for life" (BDFL) until he stepped down from the position on 12 July 2018. He remained a member of the Python Steering Council through 2019, and withdrew from nominations for the 2020 election.\n\n\n\nPage: Python (programming language)\nSummary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.\nPython is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.\nGuido van Rossum began working on Python in the late 1980s as a successor to the ABC

In [76]:
embedding = OpenAIEmbeddings(model='text-embedding-ada-002')

In [78]:
vectorstore = Chroma(persist_directory='./vector_db',embedding_function=embedding)

In [82]:
retriver = vectorstore.as_retriever(search_type = 'mmr', search_kwargs={'k':3,'lambda_multi':0.7})
retriver

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x117c597c0>, search_type='mmr', search_kwargs={'k': 3, 'lambda_multi': 0.7})

In [86]:
retriver.invoke('What software do data scientists use ?')

[Document(id='fee6d2f3-7028-4ba6-8112-31cb32abc434', metadata={'course title': 'Data Analytics Course', 'module title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'}, page_content='As you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated within multiple data and data science software platforms. They are not just suitable for mathematical and statistical computations. In other words, R, and Python are adaptable. They can solve a wide variety of business and data-related problems from beginning to the end'),
 Document(id='d7780371-ba9d-412d-941c-473e01924b40', metadata={'course title': 'Data Analytics Course', 'module title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'}, page_content='Among the many applications we have plotted, we can say there is an increasing amount of software designed

In [145]:
retriver_tool = create_retriever_tool(retriever=retriver, 
                                      name='Data-Science-and-Data-Analytics', 
                                      description='''
                                      Use this tool in retrive information related to Data Science and Data Analytics
                                      ''')

In [147]:
retriver_tool

Tool(name='Data-Science-and-Data-Analytics', description='\n                                      Use this tool in retrive information related to Data Science and Data Analytics\n                                      ', args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x1128a9080>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x117c597c0>, search_type='mmr', search_kwargs={'k': 3, 'lambda_multi': 0.7}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n', response_format='content'), coroutine=functools.partial(<function _aget_relevant_documents at 0x1128a91c0>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x117c597c0>, search_type='mmr', sea

In [149]:
@tool('get-python-version')
def get_python_version() -> str:
    '''Use this tool to get the current python version being used'''
    return python_version()

In [151]:
get_python_version(tool_input='')

'3.12.9'

In [153]:
tools = [wikipedia_tool, retriver_tool, get_python_version]

In [155]:
chat_prompt_template = hub.pull('hwchase17/openai-tools-agent')



In [157]:
agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=chat_prompt_template)

In [159]:
agent_executor = AgentExecutor(tools=tools, agent=agent, verbose=True, return_intermediate_steps=True)

In [161]:
agent_executor.invoke({'input':"Could you tell me what's my current python version?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `get-python-version` with `{}`


[0m[38;5;200m[1;3m3.12.9[0m[32;1m[1;3mYour current Python version is 3.12.9.[0m

[1m> Finished chain.[0m


{'input': "Could you tell me what's my current python version?",
 'output': 'Your current Python version is 3.12.9.',
 'intermediate_steps': [(ToolAgentAction(tool='get-python-version', tool_input={}, log='\nInvoking: `get-python-version` with `{}`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_o7nfCJnfPwwlJ9OC3VZCUBs8', 'function': {'arguments': '{}', 'name': 'get-python-version'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0392822090'}, id='run-4c7af004-04de-4964-bf33-ce24cf37bc15', tool_calls=[{'name': 'get-python-version', 'args': {}, 'id': 'call_o7nfCJnfPwwlJ9OC3VZCUBs8', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'get-python-version', 'args': '{}', 'id': 'call_o7nfCJnfPwwlJ9OC3VZCUBs8', 'index': 0, 'type': 'tool_call_chunk'}])], tool_call_id='call_o7nfCJnfPwwlJ9OC3VZCUBs8'),
   '3.12.9')]}

In [163]:
agent_executor.invoke({'input':"What software do data scientists use?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `Data-Science-and-Data-Analytics` with `{'query': 'software used by data scientists'}`


[0m[33;1m[1;3mIt’s actually a software framework which was designed to address the complexity of big data and its computational intensity. Most notably, Hadoop distributes the computational tasks on multiple computers which is basically the way to handle big data nowadays. Power BI, SaS, Qlik, and especially Tableau are top-notch examples of software designed for business intelligence visualizations

As you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated within multiple data and data science software platforms. They are not just suitable for mathematical and statistical computations. In other words, R, and Python are adaptable. They can solve a wide variety of business and data-related problems from begi

{'input': 'What software do data scientists use?',
 'output': 'Data scientists use a variety of software tools to analyze data, build models, and visualize results. Here are some of the most commonly used software:\n\n1. **Programming Languages**:\n   - **Python**: Widely used for data manipulation, analysis, and machine learning due to its extensive libraries (like Pandas, NumPy, and Scikit-learn).\n   - **R**: Popular for statistical analysis and data visualization, with a rich ecosystem of packages.\n\n2. **Data Visualization Tools**:\n   - **Tableau**: A leading tool for creating interactive and shareable dashboards.\n   - **Power BI**: A Microsoft tool for business analytics that provides interactive visualizations.\n   - **Qlik**: Offers business intelligence solutions for data visualization and reporting.\n\n3. **Big Data Technologies**:\n   - **Hadoop**: A framework that allows for distributed processing of large data sets across clusters of computers.\n\n4. **Statistical Softw

In [171]:
agent_executor.invoke({'input':'''
I am following a data science coure -
Could you tell me what are the most used software used by data scientists that I need to learn ?
Also, who the creators of those tools were ?
'''})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `Data-Science-and-Data-Analytics` with `{'query': 'most used software by data scientists'}`


[0m[33;1m[1;3mAs you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated within multiple data and data science software platforms. They are not just suitable for mathematical and statistical computations. In other words, R, and Python are adaptable. They can solve a wide variety of business and data-related problems from beginning to the end

Among the many applications we have plotted, we can say there is an increasing amount of software designed for working with big data such as Apache Hadoop, Apache Hbase, and Mongo DB. In terms of big data, Hadoop is the name that must stick with you. Hadoop is listed as a software in the sense that it is a collection of programs, but don’t imagine it as a nice-look

{'input': '\nI am following a data science coure -\nCould you tell me what are the most used software used by data scientists that I need to learn ?\nAlso, who the creators of those tools were ?\n',
 'output': "In the field of data science, some of the most commonly used software and programming languages include:\n\n1. **Python**: \n   - **Creator**: Guido van Rossum began working on Python in the late 1980s and first released it in 1991. Python is known for its readability and versatility, making it a favorite among data scientists for data manipulation, analysis, and machine learning.\n\n2. **R**: \n   - R is another popular programming language specifically designed for statistical computing and graphics. It is widely used among statisticians and data miners.\n\n3. **Apache Hadoop**: \n   - Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers. It is essential for big data applications.\n\n4. **Apache HBase**: \n   - HBase 