[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)

# Building RAG Chatbots with LangChain

We will use LangChain, OpenAI, and the Pinecone vector database to build a chatbot capable of learning from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).

The dataset we will be using is our own Notion Active Learning Repository.

### Prerequisites

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

- **langchain**: This is a framework for building applications on top of LLMs. We'll use it to chain together different language models and components for our chatbot.
- **openai**: This is the official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.
- **datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.
- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.

You can install these libraries using pip like so:

In [None]:
!pip install -qU \
    langchain==0.0.354 \
    openai==1.6.1 \
    datasets==2.10.1 \
    pinecone-client==3.1.0 \
    tiktoken==0.5.2

### Building a Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a `ChatOpenAI` object. For this we do need an [OpenAI API key](https://platform.openai.com/account/api-keys).

In [None]:
import os
from langchain.chat_models import ChatOpenAI

# read the key from the environment; if it is not set, replace the placeholder with your own key
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or "YOUR_API_KEY"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: Tell me about active learning

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "Tell me about active learning"}
]
```
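To make the link between the two representations concrete, here is a minimal sketch that renders a list of role/content dictionaries as the plain-text transcript shown earlier. The helper `render_transcript` and the `ROLE_LABELS` mapping are hypothetical names introduced for illustration; they are not part of the OpenAI or LangChain APIs.

```python
# Hypothetical mapping from API role names to the labels used in the transcript.
ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_transcript(messages):
    """Render messages as 'Role: content' blocks, ending with an open 'Assistant:' turn."""
    blocks = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    blocks.append("Assistant:")  # the empty turn that the model is asked to complete
    return "\n\n".join(blocks)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "Tell me about active learning"},
]

transcript = render_transcript(example_messages)
print(transcript)
```

The structured list carries the same information as the transcript; the role labels simply become explicit fields.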

In LangChain there is a slightly different format. We use three types of _message_ objects like so:

In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="Tell me about active learning")
]

The format is very similar. We've just swapped the role of `"user"` for `HumanMessage`, and the role of `"assistant"` for `AIMessage`.

We generate the next response from the AI by passing these messages to the `ChatOpenAI` object.

In [None]:
res = chat(messages)
res


AIMessage(content='Active learning is an approach to learning that involves students engaging in activities that require them to process information, make connections, and apply their knowledge actively. This can include discussions, group work, problem-solving tasks, hands-on experiments, and other interactive activities. Active learning helps students develop critical thinking skills, improve retention of information, and enhance their overall understanding of the subject matter. It is often seen as more effective than passive learning methods like traditional lectures because it encourages students to be actively involved in the learning process.')

In response we get another AI message object. We can print it more clearly like so:

In [None]:
print(res.content)

Active learning is an approach to learning that involves students engaging in activities that require them to process information, make connections, and apply their knowledge actively. This can include discussions, group work, problem-solving tasks, hands-on experiments, and other interactive activities. Active learning helps students develop critical thinking skills, improve retention of information, and enhance their overall understanding of the subject matter. It is often seen as more effective than passive learning methods like traditional lectures because it encourages students to be actively involved in the learning process.


Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="How can I implement Active Learning for Biomedical Engineering, I want to teach Thermodynamics"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Implementing active learning in a Biomedical Engineering course on Thermodynamics can be a great way to engage students and help them understand the concepts more effectively. Here are some strategies you can use:

1. Problem-based learning: Present students with real-world biomedical engineering problems that require the application of thermodynamics principles. Have them work in small groups to analyze and solve these problems, encouraging discussion and collaboration.

2. Case studies: Incorporate case studies related to biomedical engineering applications of thermodynamics. Ask students to analyze the cases, identify key concepts, and propose solutions based on their understanding of thermodynamics principles.

3. Hands-on experiments: Conduct hands-on experiments or demonstrations that illustrate thermodynamics concepts in the context of biomedical engineering. This can help students visualize and understand the principles in action.

4. Interactive simulations: Use interactive si

### Dealing with Hallucinations

We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask an LLM for specific information that it cannot look up, such as a research paper on a particular topic.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you give me an example (research paper) of Active Learning strategy used for biomedical engineering?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I'm unable to provide specific research papers as I don't have access to external sources. However, I can offer you a general example based on the context provided earlier:

**Research Paper Title:** "Enhancing Student Engagement and Problem-Solving Skills in Biomedical Engineering Through Problem-Based Learning"

**Authors:** Dr. Jessica Lee, Dr. Michael Patel, Dr. Sarah Wang

**Abstract:** This research paper explores the implementation of Problem-Based Learning (PBL) as an active learning strategy in a Biomedical Engineering curriculum focusing on medical device innovation. The study investigates the impact of PBL on student engagement, critical thinking, and practical application of engineering principles in biomedical contexts. Data was collected through student surveys, assessments, and classroom observations. Results demonstrate that PBL effectively enhances student problem-solving skills, fosters collaboration, and promotes a deeper understanding of biomedical engineering conce

Our chatbot can no longer help us because it doesn't contain the information we need to answer the question. It is very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it _does_ know the answer, and this can be very hard to detect.

For example:

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you give me an example (research paper) of Active Learning strategy used for biomedical engineering?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I don't have the ability to provide direct links to external content. However, you can easily search for research papers on active learning strategies in Biomedical Engineering by using academic databases like PubMed, IEEE Xplore, or Google Scholar. Simply enter keywords such as "active learning," "biomedical engineering," and specific topics of interest to find relevant research papers on the subject.


There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with our active learning questions. Below, we store descriptions of active learning methods as a list of JSON strings.

In [None]:
llmchain_information = [
    '{"method_name": "Problem-based learning (PBL)", "description": "Students work on real-world problems to learn concepts.", "target_learners": "Suitable for all levels, particularly motivates advanced learners.", "group_size": "Small groups (3-5 students)", "time_commitment": "High preparation time, moderate class time.", "materials_resources": ["Project materials and resources relevant to the chosen problem.", "Access to technology and research tools."], "advantages": ["Increases engagement and motivation.", "Develops critical thinking and problem-solving skills.", "Promotes collaboration and communication skills.", "Encourages deep understanding and application of knowledge."], "disadvantages": ["Requires careful planning and preparation.", "May not be suitable for all topics or learning objectives.", "Can be challenging for students who are not used to independent learning.", "Assessment can be complex and time-consuming."], "resources": ["https://www.edutopia.org/article/problem-based-learning-guide", "https://www.pblworks.org/"]}',
    '{"method_name": "Peer learning", "description": "Students collaborate and teach each other.", "target_learners": "All levels, especially benefits diverse learning styles.", "group_size": "Pairs or small groups", "time_commitment": "Moderate preparation time, variable class time.", "materials_resources": ["Clear learning objectives and instructions.", "Materials for individual and collaborative work."], "advantages": ["Promotes active participation and engagement.", "Develops communication and collaboration skills.", "Encourages different perspectives and deeper understanding.", "Provides opportunities for peer feedback and support."], "disadvantages": ["Requires careful planning and structure to ensure all students participate.", "May be less effective for complex topics or individual learning needs.", "Can be challenging to manage large groups or students with diverse learning styles."], "resources": ["https://www.edutopia.org/article/peer-learning-strategies", "https://www.teachervision.com/teaching-methods/active-learning/effective-peer-learning-activities"]}',
    '{"method_name": "3-2-1 Survey", "description": "Students conduct a structured exit survey to gather learning and engagement feedback. Ask three prompts: 3 concepts learned, 2 applications of concepts, and 1 question.", "target_learners": "All levels", "group_size": "Individual", "time_commitment": "3-10 minutes", "materials_resources": ["Survey prompts (3 concepts, 2 applications, 1 question)"], "advantages": ["Evaluates student learning", "Identifies areas of understanding and confusion", "Encourages reflection on learning experience"], "disadvantages": ["Requires preparation", "May not capture all student perspectives"], "resources": ["Northern Illinois University - 3-2-1 format: https://soediped2019.weebly.com/uploads/1/0/9/5/109514741/formative_and_summative_assessment.pdf"]}',
    '{"method_name": "Affective Response", "description": "Students indicate their emotional response to a learning experience using a visual scale or other method.", "target_learners": "All levels", "group_size": "Individual", "time_commitment": "1-2 minutes", "materials_resources": ["Visual scales or emoji prompts"], "advantages": ["Gauges student engagement and emotions", "Identifies aspects of the learning experience that resonate or cause discomfort"], "disadvantages": ["Limited feedback", "May not capture the full range of student experiences"], "resources": null}',
    '{"method_name": "Backchannel Discussion", "description": "Students use a digital tool to share their thoughts and questions about a topic in real time.", "target_learners": "All levels", "group_size": "Whole class or small groups", "time_commitment": "5-20 minutes", "materials_resources": ["Digital tool like Slido or Poll Everywhere"], "advantages": ["Encourages active participation and dialogue", "Promotes deeper understanding", "Gauges student understanding"], "disadvantages": ["Requires technology and digital literacy", "May be difficult to manage large groups"], "resources": null}',
    '{"method_name": "Background Knowledge Probe", "description": "Students answer questions or complete a task to assess their prior knowledge of a topic before instruction begins.", "target_learners": "All levels", "group_size": "Individual or small groups", "time_commitment": "5-10 minutes", "materials_resources": ["Assessment questions or tasks"], "advantages": ["Identifies student learning gaps and areas of prior understanding", "Tailors instruction to individual student needs", "Activates prior knowledge and builds upon existing foundation"], "disadvantages": ["Requires preparation", "May not be engaging for all students"], "resources": null}',
    '{"method_name": "Brainstorm", "description": "Students generate a list of ideas or solutions to a problem or question in a collaborative setting.", "target_learners": "All levels", "group_size": "Small groups or whole class", "time_commitment": "10-15 minutes", "materials_resources": ["Whiteboard or flipchart", "Markers or pens", "Problem or question to be addressed"], "advantages": ["Encourages creativity and critical thinking", "Explores different perspectives and possibilities", "Generates a wide range of ideas to consider"], "disadvantages": ["Can be dominated by a few students", "May not be structured enough for some learners"], "resources": null}',
    '{"method_name": "Case Study", "description": "Students analyze a real-world case study to apply their knowledge and skills to a practical scenario.", "target_learners": "Advanced learners", "group_size": "Small groups or individual", "time_commitment": "20-45 minutes", "materials_resources": ["Case study document", "Guiding questions", "Presentation tools"], "advantages": ["Develops critical thinking and problem-solving skills", "Applies theoretical knowledge to real-world situations", "Enhances understanding of complex concepts"], "disadvantages": ["Requires preparation and time", "May be challenging for some learners"], "resources": null}',
    '{"method_name": "Concept Map", "description": "Students visually represent the relationships between key concepts and ideas.", "target_learners": "All levels", "group_size": "Individual or small groups", "time_commitment": "15-30 minutes", "materials_resources": ["Chart paper, markers, or online tools"], "advantages": ["Promotes deeper understanding and organization of knowledge", "Identifies relationships between concepts", "Provides a visual representation for review and discussion"], "disadvantages": ["Can be time-consuming to create", "May be challenging for some learners to visualize complex relationships"], "resources": null}',
    '{"method_name": "Debate", "description": "Students explore opposing viewpoints on a topic through formal or informal debate.", "target_learners": "Advanced learners", "group_size": "Whole class or small groups", "time_commitment": "20-45 minutes", "materials_resources": ["Debate topic", "Research materials", "Timekeeper"], "advantages": ["Develops critical thinking and argumentation skills", "Encourages research and analysis", "Promotes respectful communication and collaboration"], "disadvantages": ["Requires preparation and research", "Can be intimidating for some students", "May not be appropriate for all topics"], "resources": null}',
    '{"method_name": "Exit Ticket", "description": "Students answer a brief prompt or question to assess their understanding at the end of a lesson.", "target_learners": "All levels", "group_size": "Individual", "time_commitment": "1-2 minutes", "materials_resources": ["Exit ticket prompts"], "advantages": ["Provides immediate feedback to teachers", "Identifies areas where students need additional support", "Allows for quick adjustments to instruction"], "disadvantages": ["May not capture the full range of student understanding", "Can be overwhelming for some students"], "resources": null}',
    '{"method_name": "Flipped Classroom", "description": "Students learn new material outside of class through lectures, readings, or online resources, and then use class time for active learning and application.", "target_learners": "All levels", "group_size": "Variable", "time_commitment": "Variable", "materials_resources": ["Online resources, textbooks, videos, lecture notes"], "advantages": ["Allows for personalized learning", "Provides more class time for active learning", "Empowers students to take ownership of their learning"], "disadvantages": ["Requires access to technology and reliable internet", "May require significant self-motivation and discipline", "May not be suitable for all topics or learning styles"], "resources": null}',
    '{"method_name": "Gallery Walk", "description": "Students rotate around stations displaying information on a topic, answering questions, and engaging in discussions.", "target_learners": "All levels", "group_size": "Small groups", "time_commitment": "20-30 minutes", "materials_resources": ["Station materials (e.g., posters, pictures, artifacts)", "Response sheets or prompts", "Station markers"], "advantages": ["Promotes active engagement and movement", "Encourages collaboration and discussion", "Provides multiple perspectives on a topic"], "disadvantages": ["Requires preparation and set-up", "May not be suitable for large groups", "Can be noisy and disruptive"], "resources": null}',
    '{"method_name": "Jigsaw", "description": "Students become experts on a specific part of a topic and then teach others in their group.", "target_learners": "All levels", "group_size": "Small groups", "time_commitment": "30-45 minutes", "materials_resources": ["Topic divided into sections", "Expert group materials", "Sharing materials (e.g., flipcharts, markers)"], "advantages": ["Promotes deep understanding and expertise", "Develops communication and collaboration skills", "Provides all students with an opportunity to teach"], "disadvantages": ["Requires careful planning and organization", "Can be challenging for some students to independently learn a topic", "May not be suitable for all topics"], "resources": null}',
    '{"method_name": "Muddiest Point", "description": "Students anonymously identify the aspect of the lesson they found most confusing.", "target_learners": "All levels", "group_size": "Individual or whole class", "time_commitment": "5-10 minutes", "materials_resources": ["Sticky notes or cards", "Pen or markers"], "advantages": ["Identifies areas where students need additional instruction", "Provides immediate feedback to teachers", "Allows for quick clarification of confusing concepts"], "disadvantages": ["May not capture the full range of student confusion", "Can be overwhelming if students are unsure about multiple concepts"], "resources": null}',
    '{"method_name": "One Minute Paper", "description": "Students briefly reflect on the key points of a lesson and write down one question they still have.", "target_learners": "All levels", "group_size": "Individual", "time_commitment": "2-3 minutes", "materials_resources": ["Paper or notecards", "Pens or pencils"], "advantages": ["Provides quick and easy feedback to teachers", "Encourages reflection and critical thinking", "Identifies areas where students need additional support"], "disadvantages": ["May not capture the full range of student learning", "Can be challenging for some students to articulate their thoughts"], "resources": null}',
    '{"method_name": "Think-Pair-Share", "description": "Students individually think about a question or prompt, then discuss their ideas with a partner, and finally share with the whole class.", "target_learners": "All levels", "group_size": "Pairs", "time_commitment": "5-10 minutes", "materials_resources": ["Question or prompt"], "advantages": ["Promotes active thinking and discussion", "Provides opportunities for peer collaboration and support", "Allows for diverse perspectives to be shared"], "disadvantages": ["May not be suitable for complex topics", "Can be dominated by a few students in each pair"], "resources": null}',
    '{"method_name": "Two-Stage Quiz", "description": "Students first answer questions individually, then discuss and revise their answers in groups before submitting a final answer.", "target_learners": "All levels", "group_size": "Small groups", "time_commitment": "15-20 minutes", "materials_resources": ["Quiz questions", "Paper or answer sheets"], "advantages": ["Promotes individual accountability and peer learning", "Encourages critical thinking and discussion", "Provides opportunity for students to learn from each other"], "disadvantages": ["Requires preparation and coordination", "May be time-consuming", "May not be suitable for all types of assessments"], "resources": null}',
    '{"method_name": "Concept Tests", "description": "Students answer short, frequent quizzes to gauge their understanding of key concepts and identify areas needing clarification.", "target_learners": "All levels", "group_size": "Individual", "time_commitment": "5-10 minutes", "materials_resources": ["Concept test questions", "Online platforms or paper and pens"], "advantages": ["Provides immediate feedback to both students and instructors", "Identifies knowledge gaps and misconceptions", "Promotes active engagement and focus"], "disadvantages": ["Requires careful question design", "May induce test anxiety in some students", "May not capture the full range of student understanding"], "resources": null}',
    '{"method_name": "Course Web Pages and Web-Based Course Evaluations", "description": "Interactive online platforms that offer resources, activities, and feedback to enhance learning and student engagement.", "target_learners": "All levels", "group_size": "Individual or variable", "time_commitment": "Variable", "materials_resources": ["Course website or learning management system"], "advantages": ["Provides readily accessible learning resources and activities", "Allows for personalized learning and feedback", "Facilitates communication and collaboration"], "disadvantages": ["Requires technical infrastructure and maintenance", "Access to technology and internet may be a barrier for some students", "May require additional training for instructors and students"], "resources": null}',
    '{"method_name": "Electronic Response Systems", "description": "Students use handheld devices or smartphones to respond to questions and participate in polls, creating an interactive learning environment.", "target_learners": "All levels", "group_size": "Whole class", "time_commitment": "Variable", "materials_resources": ["Electronic response system devices or software"], "advantages": ["Encourages active participation and engagement", "Provides immediate feedback and data for analysis", "Promotes anonymity and diverse perspectives"], "disadvantages": ["Requires additional technology and financial investment", "May not be suitable for all learning activities", "Technical issues can disrupt the learning process"], "resources": null}',
    '{"method_name": "Game-Based Learning", "description": "Students learn through interactive games and simulations, promoting engagement, motivation, and application of knowledge.", "target_learners": "All levels", "group_size": "Individual or small groups", "time_commitment": "Variable", "materials_resources": ["Educational games or simulations", "Computers or other devices"], "advantages": ["Enhances motivation and engagement", "Promotes problem-solving and critical thinking skills", "Provides a safe environment for experimentation and learning from mistakes"], "disadvantages": ["May require significant development time and resources", "Game mechanics may distract from learning objectives", "Not all topics are suitable for game-based learning"], "resources": null}',
    '{"method_name": "Just-in-Time Teaching", "description": "Students answer pre-class questions or complete short activities to assess their prior knowledge and guide the instructor\'s teaching.", "target_learners": "All levels", "group_size": "Individual or whole class", "time_commitment": "5-10 minutes before class", "materials_resources": ["Pre-class questions or activities", "Online platform or paper and pens"], "advantages": ["Tailors instruction to students\' individual needs and prior knowledge", "Activates prior knowledge and builds upon existing foundation", "Provides early insights into student understanding"], "disadvantages": ["Requires additional instructor preparation", "May be challenging for some students to complete pre-class activities"], "resources": null}',
    '{"method_name": "Pair Programming", "description": "Students work in pairs to solve problems or complete tasks, collaboratively applying their knowledge and skills.", "target_learners": "All levels", "group_size": "Pairs", "time_commitment": "Variable", "materials_resources": ["Problem or task description", "Computer or other tools"], "advantages": ["Promotes collaboration and communication skills", "Provides opportunities for peer learning and feedback", "Encourages different perspectives and deeper understanding"], "disadvantages": ["May be challenging for some students to work in pairs", "May not be suitable for all topics or learning objectives", "Can be difficult to manage large groups"], "resources": null}',
    '{"method_name": "POGIL (Process Oriented Guided Inquiry Learning)", "description": "Students work in small groups to investigate phenomena, analyze data, and construct explanations through guided inquiry activities.", "target_learners": "All levels", "group_size": "Small groups", "time_commitment": "Variable", "materials_resources": ["POGIL activities and handouts", "Laboratory equipment or other resources"], "advantages": ["Develops critical thinking and problem-solving skills", "Promotes active learning and student collaboration", "Encourages scientific inquiry and reasoning"], "disadvantages": ["Requires carefully designed activities and materials", "May be time-consuming to implement", "May not be suitable for all topics"], "resources": null}',
    '{"method_name": "Project-Based Learning", "description": "Students work on extended projects that require research, design, implementation, and presentation of their findings.", "target_learners": "All levels", "group_size": "Individual or small groups", "time_commitment": "Variable", "materials_resources": ["Project guidelines and resources", "Computer or other tools"], "advantages": ["Promotes self-directed learning and problem-solving skills", "Encourages collaboration, communication, and presentation skills", "Provides opportunities for applying knowledge to real-world problems"], "disadvantages": ["Requires significant time commitment and resources", "May be challenging to manage large projects and diverse student groups", "May not be suitable for all topics or learning styles"], "resources": null}',
    '{"method_name": "Simulations and Role Playing", "description": "Students participate in simulated scenarios or role-playing activities that apply their knowledge and skills to real-world situations.", "target_learners": "All levels", "group_size": "Variable", "time_commitment": "Variable", "materials_resources": ["Simulation scenarios or role-play instructions", "Props or costumes (optional)"], "advantages": ["Provides opportunities for hands-on learning and application of knowledge", "Encourages critical thinking, decision-making, and communication skills", "Promotes empathy and understanding of different perspectives"], "disadvantages": ["May require significant preparation and resources", "Can be overwhelming for some students", "May not be suitable for all topics or learning styles"], "resources": null}',
]

source_knowledge = "\n".join(llmchain_information)
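Each entry in the list above is a JSON string, so it can be parsed with the standard library to inspect individual fields. The sketch below uses a single abbreviated record (a shortened copy of the "Exit Ticket" entry, kept short for illustration) rather than the full list.

```python
import json

# Abbreviated copy of one record from the list above (fields trimmed for brevity).
example_record = (
    '{"method_name": "Exit Ticket", "group_size": "Individual", '
    '"time_commitment": "1-2 minutes", "resources": null}'
)

parsed = json.loads(example_record)

# JSON null becomes Python None, so records without resources are easy to detect.
print(parsed["method_name"], "|", parsed["time_commitment"])
print("has resources:", parsed["resources"] is not None)
```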

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [None]:
query = "Can you tell me about the Flipped Classroom?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""
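The same template can be wrapped in a small function so that different contexts and queries can be combined. This is a sketch only: `build_augmented_prompt` is a hypothetical helper, and the two example contexts are abbreviated copies of records from the list above.

```python
def build_augmented_prompt(contexts, query):
    """Place the joined contexts above the query, mirroring the template above."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


# Abbreviated example records (trimmed for illustration).
example_contexts = [
    '{"method_name": "Exit Ticket", "time_commitment": "1-2 minutes"}',
    '{"method_name": "Jigsaw", "time_commitment": "30-45 minutes"}',
]

example_prompt = build_augmented_prompt(example_contexts, "How long does a Jigsaw take?")
print(example_prompt)
```

Keeping the template in one place makes it easy to swap in a smaller, more relevant set of contexts later.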

Now we feed this into our chatbot as we were before.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

**Flipped Classroom**

**Description:** In a Flipped Classroom model, students learn new material outside of class through lectures, readings, or online resources, and then use class time for active learning and application. This approach shifts traditional instruction to a student-centered model where students engage with content independently before coming to class, allowing for more interactive and collaborative activities during class time.

**Target Learners:** The Flipped Classroom model is suitable for all levels of learners, offering personalized learning opportunities and empowering students to take ownership of their learning process.

**Group Size:** The group size in a Flipped Classroom can vary based on individual learning needs and activities.

**Time Commitment:** The time commitment for a Flipped Classroom model is variable, as students engage with pre-class materials at their own pace and participate in active learning activities during class time.

**Materials/Resourc

The quality of this answer is phenomenal. This is made possible by augmenting our query with external knowledge (source knowledge). There's just one problem: how do we get this information in the first place?

We learned in the previous chapters about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset.
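To illustrate the idea of retrieval before bringing in a vector database, here is a toy sketch that ranks records by how many words they share with the query. Word overlap is a deliberately crude stand-in for embedding similarity and is not how Pinecone scores results. The `tokenize` and `retrieve` helpers are hypothetical, and the example records are abbreviated.

```python
import json
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(records, query, top_k=1):
    """Return the top_k JSON records sharing the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for record in records:
        data = json.loads(record)
        text = f"{data['method_name']} {data['description']}"
        scored.append((len(query_words & tokenize(text)), record))
    # Highest overlap first; the stable sort keeps the original order for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


# Abbreviated example records (descriptions shortened for illustration).
example_records = [
    '{"method_name": "Flipped Classroom", "description": "Students learn new material outside of class."}',
    '{"method_name": "Debate", "description": "Students explore opposing viewpoints on a topic."}',
]

top = retrieve(example_records, "Can you tell me about the Flipped Classroom?")
print(json.loads(top[0])["method_name"])
```

Only the retrieved records would then be placed in the prompt, instead of the entire knowledge base.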

### Importing the Data

In [None]:
import pandas as pd

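# load the dataset from an Excel file (adjust the path to where your copy is stored);
# reading .xlsx files with pandas requires the openpyxl package
df = pd.read_excel("/content/Active Learning Repo.xlsx")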
df.head()

Unnamed: 0,Major,Pedagogy Technique,Definition,Objective of the teaching technique,Resources,Citation,Course Structure,Method,Class size,Findings,S-C,S-S,S-T,ICAP,"A,S",Steps to try in Sandbox
0,Automation & Control Engineering,Muddiest Point Technique,Students write down the muddiest point they ha...,1. You can quickly check for understanding. Th...,https://www.niu.edu/citl/resources/guides/exam...,"Agavekar, R., Bhore, P., Kadam, H., & Moharir,...","Due to Covid 19 pandemic situations, the teach...",The active learning strategies viz. Muddiest p...,, All the students are of the opinion that the...,2.0,1.0,2.0,IC,S,1. Determine what feedback you want. Do you wa...
1,Systems Engineering,Project Based Learning,Students work on long-term projects that requi...,The goal is to build students' creative capaci...,Buck Institute for Education. (n.d.).Project b...,"E. Mills, J., & F. Treagust, D. (n.d.). ENGINE...",Course Structure: Aalborg University: Students...,Aalborg University: Students work in groups of...,*(the students work in groups of 5 but with no...,Aalborg's project-based program produced gradu...,2.0,1.0,2.0,IC,"A,S",
2,Computer Science & Software Engineering,Flipped Classroom,Students watch lectures or other instructional...,The main idea is to have students view and/or ...,"Braseby, A. M. (2014). The flipped classroom. ...","Lin, Y. (2021). Effects of Flipped Learning Ap...",,All students from the three classes were divid...,The participants were 54 students (ranging in ...,- According to the results of the students’ le...,,,,,,1. Define your goals \n2. Choose a...
3,Biomedical Engineering,Problem Based Learning,Students are presented with a real-world probl...,Have students identify and explore knowledge g...,https://citl.illinois.edu/citl-101/teaching-le...,Taylor & Francis Group. (n.d.). The Suitabilit...,Four years coursework. First two years - Probl...,First Two Years: - Student-groups work on case...,variable* (As long as it facilitates group wor...,PBL was more motivating to students because of...,2.0,1.0,2.0,IC,S,1. Team Formation: - Students are divided into...
4,Computer Science & Software Engineering,Problem Based Learning & Project Based Learning,Problem based- Students are presented with a r...,Have students identify and explore knowledge g...,,"Bédard, D. (n.d.). Problem-based and Project-b...","Over the four-year curricula, there are eight ...",both programs have decided that the tutor shou...,First Year: There were 34 students in the firs...,,,,,,,


In [None]:
# Define a lightweight wrapper that exposes each row of `df` as a dictionary
class CustomDataset:
    def __init__(self, dataframe):
        self.data = dataframe

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "Major": self.data.iloc[idx]['Major'],
            "Pedagogy Technique": self.data.iloc[idx]['Pedagogy Technique'],
            "Definition": self.data.iloc[idx]['Definition'],
            "Objective": self.data.iloc[idx]['Objective of the teaching technique'],
            "Resources": self.data.iloc[idx]['Resources'],
            "Citation": self.data.iloc[idx]['Citation'],
            "Course Structure": self.data.iloc[idx]['Course Structure'],
            "Method": self.data.iloc[idx]['Method'],
            "Class size": self.data.iloc[idx]['Class size'],
            "Findings": self.data.iloc[idx]['Findings'],
            "S-C": self.data.iloc[idx]['S-C'],
            "S-S": self.data.iloc[idx]['S-S'],
            "S-T": self.data.iloc[idx]['S-T'],
            "ICAP": self.data.iloc[idx]['ICAP'],
            "A,S": self.data.iloc[idx]['A,S'],
            "Steps to try in Sandbox": self.data.iloc[idx]['Steps to try in Sandbox']
        }

# Create an instance of the custom dataset
dataset = CustomDataset(df)

# Example usage
print(len(dataset))  # Print number of rows in the dataset
print(dataset[0])    # Print the first row of the dataset


39
{'Major': 'Automation & Control Engineering', 'Pedagogy Technique': 'Muddiest Point Technique', 'Definition': "Students write down the muddiest point they had in a lecture on a card. The instructor collects the cards and addresses the students' questions at the end of the lecture. This helps students identify and clarify their understanding of the material.", 'Objective': nan, 'Resources': nan, 'Citation': 'Agavekar, R., Bhore, P., Kadam, H., & Moharir, M. (2023). Effective Application of One Minute Paper and Muddiest Point Technique to Enhance Students’ Active Engagement: A Case Study. Journal of Engineering Education Transformations, 36(3), 8–17.', 'Course Structure': 'Due to Covid 19 pandemic situations, the teaching learning process was conducted in hybrid mode. The lecture sessions were conducted using the online google meet platform. A padlet wall, Google form and paper slips were created for the collection of Muddiest Points and responses to One Minute Paper', 'Method': 'The 

In [None]:
dataset[0]

{'Major': 'Automation & Control Engineering',
 'Pedagogy Technique': 'Muddiest Point Technique',
 'Definition': "Students write down the muddiest point they had in a lecture on a card. The instructor collects the cards and addresses the students' questions at the end of the lecture. This helps students identify and clarify their understanding of the material.",
 'Objective': nan,
 'Resources': nan,
 'Citation': 'Agavekar, R., Bhore, P., Kadam, H., & Moharir, M. (2023). Effective Application of One Minute Paper and Muddiest Point Technique to Enhance Students’ Active Engagement: A Case Study. Journal of Engineering Education Transformations, 36(3), 8–17.',
 'Course Structure': 'Due to Covid 19 pandemic situations, the teaching learning process was conducted in hybrid mode. The lecture sessions were conducted using the online google meet platform. A padlet wall, Google form and paper slips were created for the collection of Muddiest Points and responses to One Minute Paper',
 'Method': '

#### Dataset Overview

The dataset we are using is a repository of active learning methods that faculty around the world have used in their courses. Each entry describes one pedagogy technique as applied in a particular engineering major, with fields such as its definition, objective, citation, method, class size, and findings.
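Some fields in an entry can be missing (they show up as `nan` in the output above). As a minimal sketch, and not what this notebook does later, one way to turn an entry into a single text string while skipping missing values is shown below. The helper names `is_missing` and `entry_to_text` and the example entry are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Return True for None or a float NaN (how pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def entry_to_text(entry: dict) -> str:
    """Join the non-missing fields of one entry as 'Field: value' lines."""
    return "\n".join(
        f"{field}: {value}"
        for field, value in entry.items()
        if not is_missing(value)
    )


# hypothetical entry, shaped like the dictionaries returned by `dataset[idx]`
example = {
    "Major": "Automation & Control Engineering",
    "Pedagogy Technique": "Muddiest Point Technique",
    "Objective": float("nan"),
    "S-C": 2.0,
}
print(entry_to_text(example))
```

Labelling each value with its field name is a design choice here; it keeps the column meaning visible in the text that gets embedded.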

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Active Learning techniques specific to an Engineering Major with Proof of Work — at least not without this data.

### Task 4: Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We begin by initializing our connection to Pinecone, which requires a [free API key](https://app.pinecone.io).

In [None]:
import os
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
# read the key from the environment; never commit a real key to a notebook
api_key = os.getenv("PINECONE_API_KEY") or "YOUR_API_KEY"

# configure client
pc = Pinecone(api_key=api_key)

Now we set up our index specification, which lets us define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [None]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-west-2"
)

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

In [None]:
import time

index_name = 'llama-2-rag'
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada 002
        metric='dotproduct',
        spec=spec
    )

    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0003,
 'namespaces': {'': {'vector_count': 30}},
 'total_vector_count': 30}
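The wait loop in the cell above keeps polling until the index reports ready, so it would never return if the index failed to initialize. As a hedged sketch, a bounded version of the same polling idea could look like this. `wait_until` is a hypothetical helper, and `fake_ready` is a toy stand-in for the real readiness check.

```python
import time


def wait_until(is_ready, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll `is_ready()` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval_s)
    return is_ready()  # one last check at the deadline


# toy stand-in for the readiness check: becomes ready on the third poll
polls = {"count": 0}


def fake_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(fake_ready, timeout_s=5.0, interval_s=0.01))
```

With the client used in this notebook, the predicate could be `lambda: pc.describe_index(index_name).status['ready']`, which is the same check the original loop performs.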

Our index is now ready. On a fresh run it will be empty; the stats shown above come from an index that had already been populated. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain like so:

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")


Using this model we can create embeddings like so:

In [None]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.
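The index was created with `metric='dotproduct'`, so stored vectors are scored against a query vector by their dot product. The sketch below illustrates that scoring with a brute-force search over three toy 3-dimensional vectors. It is only an illustration: the names `dot`, `top_k`, and `toy_vectors` are hypothetical, and a real vector index searches far more efficiently than this loop.

```python
def dot(a: list, b: list) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list, vectors: dict, k: int = 2) -> list:
    """Return the k (id, score) pairs with the highest dot-product score."""
    scored = [(vec_id, dot(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# toy stand-ins for embeddings (real ones here have 1536 dimensions)
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.6, 0.8, 0.0],
    "c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.0, 0.0], toy_vectors, k=2))
```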

We're now ready to embed and index all of our data! We do this by looping through our dataset, then embedding and inserting everything in batches.

In [None]:
from tqdm.auto import tqdm  # for progress bar

# Assuming you have already loaded your custom dataset 'dataset'

batch_size = 100

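for i in tqdm(range(0, len(dataset), batch_size)):
    i_end = min(len(dataset), i + batch_size)
    # get batch of data; slicing returns each field as a pandas Series, so we
    # convert to plain lists to make the positional lookups below work for every batch
    batch = {field: list(values) for field, values in dataset[i:i_end].items()}
    # generate ids from 'Major' and 'Pedagogy Technique'; rows that share both
    # values get the same id and overwrite one another on upsert
    ids = [f"{batch['Major'][i]}-{batch['Pedagogy Technique'][i]}" for i in range(len(batch['Major']))]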

    # Concatenate relevant fields into a single string
    texts = [f"{batch['Major'][i]} {batch['Pedagogy Technique'][i]} {batch['Definition'][i]} \
             {batch['Objective'][i]} {batch['Resources'][i]} {batch['Citation'][i]} \
             {batch['Course Structure'][i]} {batch['Method'][i]} {batch['Class size'][i]} \
             {batch['Findings'][i]} {batch['S-C'][i]} {batch['S-S'][i]} \
             {batch['S-T'][i]} {batch['ICAP'][i]} {batch['A,S'][i]} \
             {batch['Steps to try in Sandbox'][i]}" for i in range(len(batch['Major']))]

    # embed text
    embeds = embed_model.embed_documents(texts)

    # get metadata to store in Pinecone
    metadata = [
        {'Major': str(batch['Major'][i]),  # Convert to string
        'Pedagogy Technique': str(batch['Pedagogy Technique'][i]),  # Convert to string
        'Definition': str(batch['Definition'][i]),  # Convert to string
        'Objective': str(batch['Objective'][i]),  # Convert to string
        'Resources': str(batch['Resources'][i]),  # Convert to string
        'Citation': str(batch['Citation'][i]),  # Convert to string
        'Course Structure': str(batch['Course Structure'][i]),  # Convert to string
        'Method': str(batch['Method'][i]),  # Convert to string
        'Class size': str(batch['Class size'][i]),  # Convert to string
        'Findings': str(batch['Findings'][i]),  # Convert to string
        'S-C': str(batch['S-C'][i]),  # Convert to string
        'S-S': str(batch['S-S'][i]),  # Convert to string
        'S-T': str(batch['S-T'][i]),  # Convert to string
        'ICAP': str(batch['ICAP'][i]),  # Convert to string
        'A,S': str(batch['A,S'][i]),  # Convert to string
        'Steps to try in Sandbox': str(batch['Steps to try in Sandbox'][i]),  # Convert to string
        'text': texts[i]  # Include the concatenated text
        } for i in range(len(batch['Major']))
    ]

    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))



We can check that the vector index has been populated using `describe_index_stats` like before:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0003,
 'namespaces': {'': {'vector_count': 30}},
 'total_vector_count': 30}
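The dataset printed 39 rows earlier, while the stats above report 30 vectors. One possible cause, assuming some rows share the same 'Major' and 'Pedagogy Technique', is that their ids collide and later upserts overwrite earlier ones. A minimal sketch of ids that stay unique per row follows. `make_id` and the sample `rows` are hypothetical and are not part of the notebook.

```python
import re


def make_id(row_number: int, major: str, technique: str) -> str:
    """Build an ASCII id that is unique per row by prefixing the row number."""
    slug = re.sub(r"[^a-z0-9]+", "-", f"{major} {technique}".lower()).strip("-")
    return f"{row_number}-{slug}"


# two hypothetical rows with identical 'Major' and 'Pedagogy Technique'
rows = [
    ("Biomedical Engineering", "Problem Based Learning"),
    ("Biomedical Engineering", "Problem Based Learning"),
]
ids = [make_id(n, major, technique) for n, (major, technique) in enumerate(rows)]
print(ids)
```

Prefixing the row number keeps ids deterministic across reruns, so re-indexing the same file updates existing vectors rather than adding new ones.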

#### Retrieval Augmented Generation

We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [None]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)


Using this `vectorstore` we can already query the index and see if we have any relevant information for a question about teaching Mechanical Engineering with Active Learning.

In [None]:
query = "How can I teach Mechanical Engineering using Active Learning"

vectorstore.similarity_search(query, k=3)

[Document(page_content="Mechanical Engineering Flipped Classroom Students watch lectures or other instructional materials outside of class and then come to class to work on problems and discuss the material. This helps students learn at their own pace and get more practice solving problems.              The primary objective of the flipped classroom active learning technique is to shift the traditional education paradigm by reimagining the roles of in-class and out-of-class activities. The approach aims to enhance student engagement, critical thinking, and comprehension by reorganizing the learning process. By assigning instructional content for self-paced learning outside of class, such as through videos or readings, valuable in-class time can be dedicated to interactive and collaborative activities. This promotes deeper understanding through discussions, problem-solving, peer interactions, and immediate feedback from instructors. The flipped classroom seeks to create a more dynamic a

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [None]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

Using this we produce an augmented prompt:

In [None]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Mechanical Engineering Flipped Classroom Students watch lectures or other instructional materials outside of class and then come to class to work on problems and discuss the material. This helps students learn at their own pace and get more practice solving problems.              The primary objective of the flipped classroom active learning technique is to shift the traditional education paradigm by reimagining the roles of in-class and out-of-class activities. The approach aims to enhance student engagement, critical thinking, and comprehension by reorganizing the learning process. By assigning instructional content for self-paced learning outside of class, such as through videos or readings, valuable in-class time can be dedicated to interactive and collaborative activities. This promotes deeper understanding through discussions, problem-solving, peer interactions, and immediate feedback from instructors. The flipped cla
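The printed prompt shows four leading spaces before `Contexts:` because the triple-quoted f-string inside `augment_prompt` is indented along with the function body. As a sketch under that observation, the same prompt can be assembled from explicit pieces so that no indentation leaks in. `build_prompt` is a hypothetical variant that takes the retrieved texts as a plain list instead of calling the vector store.

```python
def build_prompt(query: str, contexts: list) -> str:
    """Assemble the augmented prompt without carrying over code indentation."""
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )


# placeholder contexts stand in for the retrieved page contents
demo = build_prompt("What is a Flipped Classroom?", ["context one", "context two"])
print(demo)
```

Separating retrieval from prompt assembly also makes the prompt format easy to test without an index or an API key.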

There is still a lot of text here, so let's pass it onto our chat model to see how it performs.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

To teach Mechanical Engineering using Active Learning, one effective strategy you can consider is implementing a Flipped Classroom approach. In a Flipped Classroom setting, students engage with instructional materials outside of class, such as watching pre-recorded lectures or reading materials, and then come to class to work on problems, discuss concepts, and engage in collaborative activities. This approach can enhance student engagement, critical thinking, and comprehension by reorganizing the traditional learning process.

Here is a suggested plan for implementing a Flipped Classroom approach in teaching Mechanical Engineering using Active Learning:

1. **Pre-Lecture Videos:** Record and upload pre-lecture videos covering key concepts and topics to an online platform like Blackboard. Encourage students to watch these videos before coming to class to familiarize themselves with the material.

2. **Pre-Quiz:** Provide a pre-quiz related to the pre-lecture videos to assess students' u

We can continue with more Active Learning questions. Let's try _without_ RAG first:

In [None]:
prompt = HumanMessage(
    content="Can I apply Problem Based Learning to Biomedical Engineering? Can you give examples where it has been done?"
)

res = chat(messages + [prompt])
print(res.content)

Yes, Problem-Based Learning (PBL) can be effectively applied to Biomedical Engineering education. PBL allows students to work on real-world problems that are relevant to their field of study, promoting critical thinking, problem-solving skills, and application of knowledge. Here are a few examples where PBL has been used in Biomedical Engineering:

1. **Case-Based Learning in Biomedical Engineering**: In a study published in the International Journal of Engineering Education, researchers implemented a case-based learning approach in a Biomedical Engineering course where students worked on real-world cases related to medical device design and development. The case studies required students to apply their knowledge of engineering principles to solve complex problems in the biomedical field.

2. **Biomedical Device Design Challenge**: In another example, a university integrated a biomedical device design challenge into their Biomedical Engineering curriculum using a PBL approach. Students

The chatbot can respond about Problem Based Learning using its general knowledge and the conversational history stored in `messages`. However, it has not seen the relevant entries from our knowledge base, as we have not provided that information via the RAG pipeline. Let's try again but with RAG.

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "Can I apply Problem Based Learning to Biomedical Engineering? Can you give examples where it has been done?"
    )
)

res = chat(messages + [prompt])
print(res.content)

Yes, Problem-Based Learning (PBL) can be effectively applied to Biomedical Engineering to promote critical thinking, problem-solving skills, and collaboration among students. Here are some examples where PBL has been implemented in the field of Biomedical Engineering:

1. **Research Paper Title:** "Active learning through problem-based learning in Biomedical Engineering: Enhancing student engagement and performance"
   
   **Authors:** John Smith, Emily Johnson, Sarah Lee
   
   **Description:** This study implemented a problem-based learning approach in a Biomedical Engineering course focusing on biomechanics. Students were presented with real-world biomechanical problems related to prosthetic designs and medical device development. They worked in small groups to research, analyze, and propose solutions to these problems. The study found that students in the problem-based learning group demonstrated improved critical thinking skills, problem-solving abilities, and engagement compared 

This time the response is organized around specific papers, with titles, authors, and descriptions that the previous non-RAG response did not include. It is still worth checking such details against the retrieved contexts, as an LLM can produce plausible-looking citations that are not in the knowledge base.
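Both comparison calls above pass `messages + [prompt]` rather than appending to `messages`, so neither question is stored in the conversation history. The toy sketch below shows the difference using plain strings as stand-ins for the message objects.

```python
# plain strings stand in for the chat message objects
messages = ["system: You are a helpful assistant.", "user: Hi"]
prompt = "user: Can I apply Problem Based Learning to Biomedical Engineering?"

one_off = messages + [prompt]  # builds a new list; `messages` is unchanged
print(len(messages), len(one_off))

messages.append(prompt)  # mutates `messages`, so later calls will see the prompt
print(len(messages))
```

Using the one-off form keeps the two comparison runs independent, since the non-RAG answer never becomes context for the RAG call.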

Delete the index to save resources:

In [None]:
# pc.delete_index(index_name)

---