# Generation: Generating a Response

In [None]:
# Run the cell below to check which version of langchain is installed in the current environment.
# Substitute "langchain" with any other package name to check its version.

In [1]:
# Confirm langchain version
!pip show langchain

Name: langchain
Version: 0.3.26
Summary: Building applications with LLMs through composability
Home-page: 
Author: 
Author-email: 
License: MIT
Location: C:\Users\Marcus\anaconda3\Lib\site-packages
Requires: langchain-core, langchain-text-splitters, langsmith, pydantic, PyYAML, requests, SQLAlchemy
Required-by: langchain-community


In [2]:
# Load environment variables (such as OPENAI_API_KEY) from a .env file
%load_ext dotenv
%dotenv

In [5]:
# Import langchain classes
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser

In [6]:
# Create vectorstore object
vectorstore = Chroma(persist_directory = "./intro-to-ds-lectures", 
                     embedding_function = OpenAIEmbeddings(model='text-embedding-ada-002'))

In [7]:
# Display the number of documents stored in the vector store
len(vectorstore.get()['documents'])

61

In [8]:
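# Create a vector-store-backed retriever with the as_retriever method.
# 'mmr' (maximal marginal relevance) returns k documents, balancing relevance against diversity;
# lambda_mult closer to 1 favors relevance, closer to 0 favors diversity.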
retriever = vectorstore.as_retriever(search_type = 'mmr', 
                                     search_kwargs = {'k':3, 
                                                      'lambda_mult':0.7})
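
To make the effect of `lambda_mult` concrete, here is a minimal pure-Python sketch of greedy MMR selection. It is an illustration under assumptions, not the retriever's actual code: the 2-D vectors, the `cosine` helper, and `mmr_select` are hypothetical stand-ins for real embeddings and for the library's internals, and the sketch skips the candidate pre-fetching step that a real vector store performs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.7):
    """Greedily pick k document indices, trading relevance for diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Toy data (assumed for illustration): documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8], [0.0, 1.0]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # diversity weighted more heavily
```

With `lambda_mult` at 1.0 the redundancy term vanishes and the selection reduces to plain similarity ranking; lowering it penalizes candidates that resemble documents already chosen.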

In [9]:
# Define prompt template from string template
TEMPLATE = '''
Answer the following question:
{question}

To answer the question, use only the following context:
{context}

At the end of the response, specify the name of the lecture this context is taken from in the format:
Resources: *Lecture Title*
where *Lecture Title* should be substituted with the title of all resource lectures.
'''
# Create a PromptTemplate instance by passing the template string to the from_template class method
prompt_template = PromptTemplate.from_template(TEMPLATE)
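
As a rough picture of what the template does with `{question}` and `{context}`, the sketch below uses only Python's standard library. It assumes the default f-string style of placeholder, which behaves like `str.format`; the shortened `TEMPLATE` and the sample context string are stand-ins, and this is not the library's implementation.

```python
from string import Formatter

# Shortened stand-in for the notebook's TEMPLATE.
TEMPLATE = (
    "Answer the following question:\n"
    "{question}\n\n"
    "To answer the question, use only the following context:\n"
    "{context}\n"
)

# Collect the placeholder names found in the template string.
fields = sorted({name for _, name, _, _ in Formatter().parse(TEMPLATE) if name})
print(fields)

# Fill the placeholders with concrete values.
filled = TEMPLATE.format(
    question="What software do data scientists use?",
    context="(retrieved lecture excerpts would go here)",
)
print(filled)
```

The placeholder names are why the chain must supply exactly two keys, `question` and `context`, to the prompt step.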

In [10]:
# Define chat object
chat = ChatOpenAI(model_name = 'gpt-4', 
                  seed = 365,
                  max_tokens = 250)

In [11]:
# Create question variable to store the input
question = "What software do data scientists use?"

In [12]:
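# Build the chain: retrieve context and pass the question through, fill the prompt,
# call the chat model, then parse the model's message into a plain string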
chain = ({'context': retriever, 
         'question': RunnablePassthrough()} 
         | prompt_template 
         | chat 
         | StrOutputParser())
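
The dict at the head of the chain is coerced into a parallel step: every value receives the same input, and the results are collected under the dict's keys. The following sketch imitates that data flow with plain functions. All names here (`fake_retriever`, `run_parallel`, `fake_chat`, and so on) are hypothetical stand-ins, and no model or vector store is called.

```python
# Shortened stand-in for the notebook's TEMPLATE.
TEMPLATE = "Question: {question}\nContext: {context}"

def fake_retriever(query):
    """Hypothetical retriever: returns canned excerpts instead of searching."""
    return ["R and Python are widely used.", "Tableau is a BI tool."]

def passthrough(value):
    """Return the input unchanged, like a passthrough step."""
    return value

def run_parallel(steps, value):
    """Call every step with the same input and collect results by key."""
    return {key: step(value) for key, step in steps.items()}

def fill_prompt(inputs):
    """Insert the collected values into the template."""
    return TEMPLATE.format(**inputs)

def fake_chat(prompt_text):
    """Hypothetical chat model: returns a message-like dict."""
    return {"content": "stub reply for a prompt of %d characters" % len(prompt_text)}

def parse_str(message):
    """Extract the text content, like a string output parser."""
    return message["content"]

question = "What software do data scientists use?"
inputs = run_parallel({"context": fake_retriever, "question": passthrough}, question)
answer = parse_str(fake_chat(fill_prompt(inputs)))

print(inputs)
print(answer)
```

Note that the list under `context` is inserted into the prompt via its string representation in this sketch, which appears to mirror what happens to the retrieved documents in the chain above.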

In [13]:
# Invoke chain
chain.invoke(question)

"Data scientists primarily use tools such as R and Python due to their adaptability and capability to manipulate data. They are not only used for mathematical and statistical computations but can solve various business and data-related issues. Other software used in data science includes Hadoop, a software framework designed for handling big data's complexity and computational intensity. Business intelligence visualization software like Power BI, SaS, Qlik, and Tableau are also used.\n\nResources: Programming Languages & Software Employed in Data Science - All the Tools You Need"

In [14]:
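# Print a response so its newline characters render as line breaks
# (text pasted from a separate run, so the wording differs from the output above)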
print('Data scientists use a variety of software tools. R and Python are the two most popular tools as they can manipulate data and are integrated within multiple data and data science software platforms. They are adaptable and can solve a wide range of business and data-related problems. Hadoop is a software framework designed to handle the complexity and computational intensity of big data by distributing computational tasks on multiple computers. Additionally, Power BI, SaS, Qlik, and Tableau are top-notch examples of software designed for business intelligence visualizations.\n\nResources: Programming Languages & Software Employed in Data Science - All the Tools You Need')

Data scientists use a variety of software tools. R and Python are the two most popular tools as they can manipulate data and are integrated within multiple data and data science software platforms. They are adaptable and can solve a wide range of business and data-related problems. Hadoop is a software framework designed to handle the complexity and computational intensity of big data by distributing computational tasks on multiple computers. Additionally, Power BI, SaS, Qlik, and Tableau are top-notch examples of software designed for business intelligence visualizations.

Resources: Programming Languages & Software Employed in Data Science - All the Tools You Need
