In [10]:
from langchain_chroma.vectorstores import Chroma

from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_core.tools import tool, create_retriever_tool

from platform import python_version



In [17]:
embedding = OpenAIEmbeddings()

In [18]:
vectorstore = Chroma(
    persist_directory='.chromadb_intro_to_data_science/', 
    embedding_function=embedding
)

In [19]:
retriever = vectorstore.as_retriever(
    search_type='mmr', 
    search_kwargs={'k':3, 'lambda_mult':0.7}
)
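
The `'mmr'` search type stands for maximal marginal relevance: each selected chunk is scored by its similarity to the query minus a penalty for its similarity to chunks already chosen, with `lambda_mult` weighting the two terms (values near 1 favor relevance, values near 0 favor diversity). The block below is a simplified, dependency-free sketch of that idea, not LangChain's actual implementation; the helper names `cosine` and `mmr_select` and the toy vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.7):
    """Return the indices of up to k documents chosen greedily by MMR.

    Hypothetical helper for illustration only.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # How well the candidate matches the query.
            relevance = cosine(query_vec, doc_vecs[i])
            # How much it overlaps with what was already picked (0 if nothing yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy 2-D "embeddings": document 1 is a near-duplicate of document 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only -> [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # favors diversity -> [0, 2]
```

With `lambda_mult=0.7`, as in the cell above, the balance leans toward relevance while still discouraging near-identical chunks.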

In [20]:
retriever.invoke("What tools do data scientists use?")

[Document(metadata={'Lesson': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'Title': 'Introduction to Data Science'}, page_content='Their smaller scope does not make them less useful, in fact, just the opposite—they are a lot easier to learn and be adopted by others. You have already heard of several of those. Because of its ability to do relatively complex computations and good visualizations quickly, Excel is a tool applicable to more than one category—traditional data, BI, and Data Science. Similarly, SPSS is a very famous tool for working with traditional data and applying statistical analysis.'),
 Document(metadata={'Lesson': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'Title': 'Introduction to Data Science'}, page_content='As you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated w

In [26]:
retriever_tool = create_retriever_tool(
    retriever=retriever, 
    name='Introduction to Data and Data Science Course Lectures',
    description="""For any questions regarding the 
    Introduction to Data and Data Science course you must use this tool.
    """
    )
print(retriever_tool)

name='Introduction to Data and Data Science Course Lectures' description='For any questions regarding the \n    Introduction to Data and Data Science course you must use this tool.\n    ' args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'> func=functools.partial(<function _get_relevant_documents at 0x000001CD139D0CC0>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001CD2641C750>, search_type='mmr', search_kwargs={'k': 3, 'lambda_mult': 0.7}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n') coroutine=functools.partial(<function _aget_relevant_documents at 0x000001CD139D1580>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001CD2641C750>, search_type='mmr', search_kwargs={'k': 3, 'lambda_mult':

In [27]:
retriever_tool.args

{'query': {'description': 'query to look up in retriever',
  'title': 'Query',
  'type': 'string'}}

In [29]:
print(retriever_tool.invoke("Could you list the programming languages a data scientist should know?"))

Thus, we need a lot of computational power, and we can expect people to use the languages similar to those in the big data column. Apart from R, Python, and MATLAB, other, faster languages are used like Java, JavaScript, C, C++, and Scala. Cool. What we said may be wonderful, but that’s not all! By using one or more programming languages, people create application software or, as they are sometimes called, software solutions, that are adjusted for specific business needs.

What about big data? Apart from R and Python, people working in this area are often proficient in other languages like Java or Scala. These two have not been developed specifically for doing statistical analyses, however they turn out to be very useful when combining data from multiple sources. All right! Let’s finish off with machine learning. When it comes to machine learning, we often deal with big data.

More importantly, it will be sufficient for your need to create quick and accurate analyses. However, if your 
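
The tool returns one string rather than a list of `Document` objects. Judging by the printed tool definition above (`template='{page_content}'` and `document_separator='\n\n'`), each retrieved chunk is rendered through the prompt template and the results are joined with a blank line. A rough sketch of that formatting step, using a hypothetical `Doc` stand-in and a made-up `format_docs` helper rather than the library's own code:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, template="{page_content}", separator="\n\n"):
    """Render each document with the template and join the pieces."""
    return separator.join(
        template.format(page_content=d.page_content) for d in docs
    )


docs = [
    Doc("First chunk."),
    Doc("Second chunk.", {"Lesson": "Example lesson"}),
]
print(format_docs(docs))
```

Note that with this default template the metadata (`Lesson`, `Title`) is dropped, so the model only sees the chunk text.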

## Custom tool

In [31]:
@tool("Another name")
def get_python_version() -> str:
    '''Useful for questions regarding the version of Python currently used.'''
    return python_version()

In [35]:
get_python_version.invoke({})

'3.11.9'