https://medium.com/@glenpatzlaff/raw-json-to-measurable-rag-insights-in-a-matter-of-minutes-with-langchain-and-trulens-f36e4415b079

## Read JSON Data

In [1]:
import json

def read_json_file(file_path):
    """
    Reads JSON data from a file and returns the parsed result.

    :param file_path: Path to the JSON file.
    :return: Parsed JSON data (dictionary or list), or None if the file
             could not be read or parsed.
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as json_file:
            return json.load(json_file)
    except FileNotFoundError:
        print(f"Error: The file {file_path} was not found.")
    except json.JSONDecodeError:
        print(f"Error: The file {file_path} contains invalid JSON.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    # Reached only after one of the errors above has been reported.
    return None

In [5]:
data = read_json_file("notebooks_summary.json")
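
When loading fails, `read_json_file` prints a message and hands back `None`, and a record with a missing field would only surface later as a `KeyError`. The following is a minimal sketch of a sanity check that could run right after loading. `validate_records` and `REQUIRED_KEYS` are hypothetical names, the key set is inferred from the fields this notebook reads from each record, and the sample data is made up.

```python
# Hypothetical helper: checks the shape of the loaded data before it is used.
# REQUIRED_KEYS is an assumption based on the fields this notebook accesses.
REQUIRED_KEYS = {
    "local_path",
    "name",
    "table_of_content",
    "practice_exercises",
    "github_link",
}


def validate_records(records):
    """Return the indices of records that are missing any required key."""
    if not isinstance(records, list):
        raise TypeError(
            f"Expected a list of notebook records, got {type(records).__name__}"
        )
    return [
        index
        for index, record in enumerate(records)
        if not isinstance(record, dict) or not REQUIRED_KEYS.issubset(record)
    ]


# Made-up sample data for illustration only.
sample = [
    {
        "local_path": "lectures/example.ipynb",
        "name": "example.ipynb",
        "table_of_content": ["Intro"],
        "practice_exercises": ["Exercise 1"],
        "github_link": "https://example.com/example.ipynb",
    },
    {"name": "incomplete.ipynb"},
]
bad_indices = validate_records(sample)
print(bad_indices)
```

Calling `validate_records(data)` on the real file would either raise immediately (if loading returned `None`) or list the records that need attention.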

## Convert List of Dict to List of Text

In [8]:
texts = []

for github_notebook in data:
    local_path = github_notebook["local_path"]
    name = github_notebook["name"]
    # The two list fields are flattened into space-separated strings.
    table_of_content = " ".join(github_notebook["table_of_content"])
    practice_exercises = " ".join(github_notebook["practice_exercises"])
    github_link = github_notebook["github_link"]

    # Prepare text for embedding
    text_to_embed = (
        "Python Programming Pytopia Course Repository: "
        f"local_path is: {local_path} file name is: {name}, "
        f"table_of_content is {table_of_content} "
        f"and practice_exercises: {practice_exercises} "
        f"and github_link: {github_link}"
    )
    texts.append(text_to_embed)
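
To make the resulting string concrete, here is the same flattening applied to a single made-up record. This is only a sketch: `notebook_to_text` is a hypothetical wrapper around the loop body above, and every value in `example_record` is invented for illustration.

```python
def notebook_to_text(record):
    """Flatten one notebook record into a single string for embedding."""
    table_of_content = " ".join(record["table_of_content"])
    practice_exercises = " ".join(record["practice_exercises"])
    return (
        "Python Programming Pytopia Course Repository: "
        f"local_path is: {record['local_path']} file name is: {record['name']}, "
        f"table_of_content is {table_of_content} "
        f"and practice_exercises: {practice_exercises} "
        f"and github_link: {record['github_link']}"
    )


# Invented record; the real file may contain different values.
example_record = {
    "local_path": "lectures/unpacking.ipynb",
    "name": "unpacking.ipynb",
    "table_of_content": ["Packing", "Unpacking"],
    "practice_exercises": ["Exercise A", "Exercise B"],
    "github_link": "https://example.com/unpacking.ipynb",
}

example_text = notebook_to_text(example_record)
print(example_text)
```

Because the list fields are joined with single spaces, item boundaries are not preserved in the embedded text.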

## Create RAG on JSON Data

In [32]:
import json
from pathlib import Path
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

In [31]:
# from trulens_eval import TruChain, Tru
# from trulens_eval.feedback.provider import OpenAI
# from trulens_eval import Feedback
# import numpy as np

# tru = Tru()
# tru.reset_database()

### Create system_prompt

In [44]:
system_prompt = (
    "You are an assistant for question-answering tasks related to Python programming lectures from the Pytopia Python Programming GitHub repository. "
    "Your task is to use the provided context to answer questions about these lectures succinctly and accurately. "
    "If you don't know the answer, state that you don't know."
    "\n\n"
    "Use the following JSON structure to extract relevant information:"
    "\n- Main topics covered in the lecture, found in the 'table_of_content' field."
    "\n- Practice exercises included in the lecture, found in the 'practice_exercises' field, along with their expected output."
    "\n- GitHub link to the lecture notebook, found in the 'github_link' field."
    "\n\n"
    "To assist users efficiently, structure your response as follows:"
    "\n1. Main topics covered in the lecture."
    "\n2. Practice exercises and their expected output."
    "\n3. GitHub link to the lecture notebook."
    "\n\n"
    "Include the relevant context provided below:"
    "\n{context}"
    "\n\n"
    "Use this structure to help knowledge seekers access specific lecture content and related exercises effectively."
)

In [45]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [35]:
# ## https://python.langchain.com/docs/tutorials/rag/

# question_answer_chain = create_stuff_documents_chain(llm, prompt)
# rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# response = rag_chain.invoke({"input": "What is Task Decomposition?"})
# print(response["answer"])

In [46]:
documents = [Document(page_content=text) for text in texts]

# text_splitter = RecursiveCharacterTextSplitter(
#     chunk_size=1000,
#     chunk_overlap=200
# )

# splits = text_splitter.split_documents(documents)

vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings()
)

retriever = vectorstore.as_retriever()

In [47]:
# prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [48]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [49]:
from langchain_core.prompts import PromptTemplate

# The chain supplies both `context` and `question`, so the template has to
# reference both; otherwise the user's question never reaches the model.
template = """
You are an assistant for question-answering tasks related to Python programming lectures from the Pytopia Python Programming GitHub repository.
Your task is to use the provided context to answer questions about these lectures succinctly and accurately.
If you don't know the answer, state that you don't know.

Use the following JSON structure to extract relevant information:
- Main topics covered in the lecture, found in the 'table_of_content' field.
- Practice exercises included in the lecture, found in the 'practice_exercises' field, along with their expected output.
- GitHub link to the lecture notebook, found in the 'github_link' field.

To assist users efficiently, structure your response as follows:
1. Main topics covered in the lecture.
2. Practice exercises and their expected output.
3. GitHub link to the lecture notebook.

Include the relevant context provided below:
{context}

Use this structure to help knowledge seekers access specific lecture content and related exercises effectively.

Question: {question}
"""


custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)
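
The dictionary at the head of the chain can be hard to read at first: the same input string is sent to the retriever (and then to `format_docs`) to build `context`, and is passed through unchanged as `question`. The following plain-Python analogy sketches that fan-out without LangChain. `fake_retriever`, `format_chunks`, and `build_prompt_inputs` are hypothetical stand-ins, and the returned chunks are made up.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns chunk texts instead of Document objects."""
    return ["first retrieved chunk", "second retrieved chunk"]


def format_chunks(chunks):
    """Same idea as format_docs: join chunk texts with blank lines."""
    return "\n\n".join(chunks)


def build_prompt_inputs(question):
    # Mirrors {"context": retriever | format_docs, "question": RunnablePassthrough()}:
    # one input string, two derived values keyed by the prompt's variable names.
    return {
        "context": format_chunks(fake_retriever(question)),
        "question": question,
    }


inputs = build_prompt_inputs("What is dictionary unpacking?")
print(inputs["context"])
print(inputs["question"])
```

The keys of this mapping are what the prompt template receives, so they need to match the placeholder names used in the template.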



In [51]:
print(rag_chain.invoke("i want Learn Argument Dictionary Unpacking?"))

1. Main topics covered in the lecture:
   - Packing into tuples
   - Unpacking from tuples
   - Unpacking from lists
   - Unpacking dictionaries
   - Extended unpacking with lists
   - Bonus: Swapping values

2. Practice exercises and their expected output:
   - Packing into Tuples:
     - Create a tuple called `student_info` by packing the variables `name`, `age`, `grade`, and `subject`.
   - Unpacking from Tuples:
     - Given a tuple `coordinates`, unpack the values into variables `x`, `y`, and `z`.
   - Unpacking from Lists:
     - Unpack a list of scores into variables `math_score`, `science_score`, and `english_score`.
   - Unpacking Dictionaries:
     - Unpack values from a dictionary `student` into variables `student_name`, `student_age`, and `student_email`.
   - Extended Unpacking with Lists:
     - Use extended unpacking to separate numbers into variables and a list.

3. GitHub link to the lecture notebook:
   - GitHub link: [Variable-Length Argument Lecture](https://github.

In [52]:
retriever.invoke("i want Learn Argument Dictionary Unpacking?")

[Document(metadata={}, page_content='Development Basics". 4. Use argument dictionary unpacking to call `session_details` with details stored in a dictionary for the session "Web Development Basics". 5. Combine both packing and unpacking techniques to organize a session named "Advanced Python", which includes unpacking a list of topics and unpacking session details from a dictionary. *Expected Output:** ```sh Topics planned for this session: [\'Python Basics\', \'Data Types Fundamentals\', \'Functions\'] Session Details: Name: Python Development Basics Attendees: 40 Room: 105 Topics planned for this session: [\'Object Oriented Programming\', \'Modules\'] Session Details: Name: Advanced Python Attendees: 25 Room: 203 ``` This exercise will help you practice the use of variable-length argument handling in Python, including both packing and unpacking techniques, in the context of a practical scenario. By completing these tasks, you will gain a deeper understanding of how to utilize these f

In [38]:
def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

result = rag_chain.invoke("i want Learn Argument Dictionary Unpacking?")
print(result)

KeyError: "Input to ChatPromptTemplate is missing variables {'input'}.  Expected: ['context', 'input'] Received: ['context', 'question']\nNote: if you intended {input} to be part of the string and not a variable, please escape it with double curly braces like: '{{input}}'."
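
The error above comes from a name mismatch: the chat prompt was built with a `{input}` placeholder in its human message, while the chain supplies the keys `context` and `question`. The following standard-library sketch shows one way to spot such a mismatch before wiring a chain together. It is an analogy rather than LangChain's own validation, `template_variables` is a hypothetical helper, and the two message strings are shortened stand-ins for the real prompt.

```python
from string import Formatter


def template_variables(template):
    """Return the set of placeholder names used in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Shortened stand-ins for the system and human messages of the chat prompt.
system_message = "Include the relevant context provided below:\n{context}"
human_message = "{input}"

expected = template_variables(system_message) | template_variables(human_message)
supplied = {"context", "question"}  # keys of the mapping at the head of the chain

missing = expected - supplied
print(sorted(missing))
```

Renaming the mapping key from `"question"` to `"input"`, or changing the human message placeholder from `{input}` to `{question}`, makes the two sets agree, which is what the error message asks for.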

In [27]:
retriever.invoke("Argument Dictionary Unpacking?")

[Document(metadata={}, page_content='Development Basics". 4. Use argument dictionary unpacking to call `session_details` with details stored in a dictionary for the session "Web Development Basics". 5. Combine both packing and unpacking techniques to organize a session named "Advanced Python", which includes unpacking a list of topics and unpacking session details from a dictionary. *Expected Output:** ```sh Topics planned for this session: [\'Python Basics\', \'Data Types Fundamentals\', \'Functions\'] Session Details: Name: Python Development Basics Attendees: 40 Room: 105 Topics planned for this session: [\'Object Oriented Programming\', \'Modules\'] Session Details: Name: Advanced Python Attendees: 25 Room: 203 ``` This exercise will help you practice the use of variable-length argument handling in Python, including both packing and unpacking techniques, in the context of a practical scenario. By completing these tasks, you will gain a deeper understanding of how to utilize these f

Pytopia is a data science community that helps students get internships and jobs in the field of data science.
Pytopia has a GitHub organization with its repositories:
📍 [@pytopia](https://github.com/orgs/pytopia/repositories)

List of Pytopia Machine Learning Bootcamp courses:
- [Python-Programming](https://github.com/pytopia/Python-Programming)



Prompt:
"Given the JSON data about a lecture on variable-length arguments in Python, please extract and summarize the following information:
The name of the lecture.
The key topics covered in the table of contents.
The practice exercise details, including the task and expected output.
The link to the GitHub repository for further reference.
Please format the response clearly and concisely."


Give me the detailed instructions for the "Practice Exercise: Organizing a Coding Workshop" from the provided JSON data. Include the task description and expected output.  If there are multiple instances of "Practice Exercise: Organizing a Coding Workshop", combine the instructions from all instances.




Prompt:
"Provide information about the lecture on Variable-Length Argument in Python programming. What are the main topics covered in the lecture? What practice exercises are included, and what is the expected output of the exercise? Also, provide the GitHub link to the lecture notebook."
Expected Response:
The response should include the following information:
Main topics covered in the lecture (from the "table_of_content" field)
Practice exercises included in the lecture (from the "practice_exercises" field)
Expected output of the exercise (from the "practice_exercises" field)
GitHub link to the lecture notebook (from the "github_link" field)

You are provided with details of a Jupyter Notebook from the Pytopia Python Programming GitHub repository. The task is to extract relevant data based on the following JSON structure:

Provide information about the lecture on {User Query} in Python programming. What are the main topics covered in the lecture? What practice exercises are included, and what is the expected output of the exercise? Also, provide the GitHub link to the lecture notebook.

Expected Response:
The response should include the following information:
Main topics covered in the lecture (from the "table_of_content" field)
Practice exercises included in the lecture (from the "practice_exercises" field)
Expected output of the exercise (from the "practice_exercises" field)
GitHub link to the lecture notebook (from the "github_link" field)

Use this structure to assist knowledge seekers in accessing specific lecture content and related exercises efficiently.