In [40]:
# RAG Pipeline

In [41]:
text = """
Dear Sir/Ma’am,
Thank you for agreeing to provide a technical review of our book “Ultimate Python coding”.
This letter of understanding summarizes our agreement.
As a thank you, we will recognize your contribution by
1. Crediting you as a reviewer in relevant published materials.
2. Sending you a complimentary print copy of the work
We request that you help us promote the book by writing an honest review, and
acknowledging your involvement in the project, on Amazon.com.
In exchange, we request that, for each chapter, you review the materials for accuracy,
relevance, and clarity by inserting comments directly into the documents. There may be
multiple iterations of certain chapters, based on your feedback.
While performing the review, we ask that you:
1. Adhere to the deadlines agreed upon with us
2. Insert detailed, self-explanatory comments (adopt an evidence-based approach)
3. Check for content accuracy, relevance, flow, gaps
4. Provide constructive, practical solutions, where possible
5. Test all instructions and code snippets to ensure they work as described and that the
instructions themselves are clear and direct. Provide screenshots of the final output,
and if the codes don’t work expected, please explain the issue and, if possible, help to
diagnose the problem and recommend a solution.
6. Check that the Q&A questions are accurate and can be answered using the
information in the chapter.
7. Review and sign off all content shared for review within 3 working days, to ensure
that the technical accuracy of the final product meets industry standards and best
practices.
8. Write a comprehensive Summary Comment for each chapter.
9. You need to also ensure that the unpublished manuscript is not accidentally/or
otherwise shared with anyone online or outside the book project team.
10. The Reviewer agrees to enter into a Non-Disclosure agreement till the time of
completing the project as mentioned in the contract or as decided by the publisher.
Reviewer will not share the work, draft manuscript, project idea or details of the
associated members on the project till the work is published with any member who is
not part of the publishing company Green Education Private Limited.
Please sign and date this letter to indicate that you’ve read and understood our terms and
conditions. 
"""

In [42]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [43]:
# Split the letter into chunks of at most 100 characters, allowing up to 2 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=2)
chunks = splitter.split_text(text)


In [44]:
print(chunks)

['Dear Sir/Ma’am,', 'Thank you for agreeing to provide a technical review of our book “Ultimate Python coding”.', 'This letter of understanding summarizes our agreement.', 'As a thank you, we will recognize your contribution by', '1. Crediting you as a reviewer in relevant published materials.', '2. Sending you a complimentary print copy of the work', 'We request that you help us promote the book by writing an honest review, and', 'acknowledging your involvement in the project, on Amazon.com.', 'In exchange, we request that, for each chapter, you review the materials for accuracy,', 'relevance, and clarity by inserting comments directly into the documents. There may be', 'multiple iterations of certain chapters, based on your feedback.', 'While performing the review, we ask that you:\n1. Adhere to the deadlines agreed upon with us', '2. Insert detailed, self-explanatory comments (adopt an evidence-based approach)', '3. Check for content accuracy, relevance, flow, gaps', '4. Provide con

In [45]:
from langchain_huggingface import HuggingFaceEmbeddings

# Generate embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
embeddings = embedding_model.embed_documents(chunks)
print(embeddings)

[[-0.05540827661752701, 0.07559618353843689, 0.0975860059261322, 0.04188172146677971, -0.039447810500860214, -0.02143116295337677, -0.00760697852820158, -0.02748766914010048, -0.011333233676850796, 0.02190558612346649, -0.013582264073193073, 0.016349826008081436, -0.014411470852792263, -0.006676916498690844, -0.017486609518527985, 0.07693850249052048, 0.00856776349246502, 0.0008562536677345634, 0.05641438439488411, 0.05826399847865105, 0.05196540430188179, 0.08878729492425919, 0.045098140835762024, 0.00898262020200491, -0.0063131581991910934, 0.00017850003496278077, 0.022415386512875557, 0.06809976696968079, -0.04818058758974075, -0.07291928678750992, -0.011996938847005367, 0.028995972126722336, 0.04332311823964119, 0.027536163106560707, -0.01835469901561737, 0.019984710961580276, 0.03117861971259117, 0.01845264434814453, -0.019602905958890915, -0.05860014632344246, 0.024575671181082726, -0.04852823168039322, 0.08012628555297852, 0.024184852838516235, 0.06514876335859299, 0.02059236541

In [46]:
query = "what is timeline for Review?"

query_embedding = embedding_model.embed_query(query)
print(query_embedding)

[-0.05564823001623154, 0.03869331628084183, -0.09034311771392822, 0.025693880394101143, 0.04392337054014206, 0.07541516423225403, -0.09817449748516083, -0.03626177832484245, 0.010127649642527103, 0.03992212936282158, 0.04512123018503189, 0.049422893673181534, -0.043377459049224854, 0.002584231784567237, -0.12785254418849945, -0.019978443160653114, 0.015846362337470055, -0.09075449407100677, 0.006056888960301876, -0.04701652750372887, 0.02296643704175949, -0.02482428401708603, 0.09329786896705627, -0.0031301002018153667, -0.008696470409631729, -0.034397151321172714, -0.07525302469730377, -0.011912048794329166, 0.009524442255496979, -0.024052729830145836, 0.03453387692570686, 0.010828956961631775, 0.04761278256773949, 0.026876555755734444, -0.0588180273771286, 0.01039301510900259, 0.045074619352817535, -0.011713349260389805, 0.03481553867459297, -0.007158050313591957, -0.058670371770858765, -0.0031027239747345448, 0.02332548052072525, 0.008413518778979778, 0.06542202085256577, -0.0361348

In [47]:
import numpy as np

# Reshape query embedding to 2D array
query_embedding = np.array(query_embedding).reshape(1, -1)

In [48]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between the query embedding and every chunk embedding
similarity_scores = cosine_similarity(query_embedding, embeddings)
print(similarity_scores)

[[ 0.03266935  0.13393293  0.15201281  0.1086706   0.40118347  0.22066552
   0.37674001  0.22177393  0.37278278  0.20967309  0.30508238  0.57256416
   0.17492386  0.31051151  0.10461181  0.09613794  0.07856338 -0.00109648
   0.10392228  0.20007482  0.24017078  0.42282005  0.23736496  0.26912292
   0.2545294   0.09558153  0.48765342  0.25978767  0.33585961  0.14699468
   0.07840793  0.22755124  0.19748078]]
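To make the scores above less of a black box, here is a minimal sketch of the cosine similarity formula in plain Python. The function name `cosine` and the three-dimensional toy vectors are made up for illustration; the embeddings in this notebook have 384 dimensions, and `sklearn`'s `cosine_similarity` computes the same quantity for whole matrices at once.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not real embeddings
toy_query = [1.0, 0.0, 1.0]
toy_chunks = [
    [1.0, 0.0, 1.0],    # same direction as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # opposite direction
]

toy_scores = [cosine(toy_query, c) for c in toy_chunks]
print([round(s, 4) for s in toy_scores])
```

A score near 1 means the two vectors point the same way, a score near 0 means they are unrelated, and a negative score means they point in opposite directions. That is why sorting from highest to lowest ranks the chunks by relevance.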


In [49]:
# Rank chunk indices by similarity, highest score first
top_k_indices = similarity_scores[0].argsort()[::-1]

# Print the top 3 matching chunks and collect them into the context
top_k = 3
context = ""

for i in top_k_indices[:top_k]:
    print(f"Score: {similarity_scores[0][i]:.4f} -> {chunks[i]}")

    # Append each retrieved chunk to the context, separated by a space
    context += chunks[i] + " "

print(context)

Score: 0.5726 -> While performing the review, we ask that you:
1. Adhere to the deadlines agreed upon with us
Score: 0.4877 -> 10. The Reviewer agrees to enter into a Non-Disclosure agreement till the time of
Score: 0.4228 -> 7. Review and sign off all content shared for review within 3 working days, to ensure
While performing the review, we ask that you:
1. Adhere to the deadlines agreed upon with us 10. The Reviewer agrees to enter into a Non-Disclosure agreement till the time of 7. Review and sign off all content shared for review within 3 working days, to ensure 
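The context printed above joins the chunks in score order, so item 10 of the letter appears before item 7. One possible variation, sketched below, keeps the same top-k selection but restores document order and separates chunks with newlines. The helper `select_context` and the demo data are hypothetical and not part of the notebook.

```python
def select_context(chunks, scores, top_k=3):
    """Pick the top_k highest-scoring chunks, then return them in document order."""
    # Indices sorted by score, highest first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the best top_k, then sort the indices so chunks read in their original order
    chosen = sorted(ranked[:top_k])
    return "\n".join(chunks[i] for i in chosen)


# Hypothetical stand-ins for the notebook's chunks and similarity scores
demo_chunks = ["intro", "1. deadlines", "7. sign off within 3 working days", "10. NDA"]
demo_scores = [0.03, 0.57, 0.42, 0.49]

demo_context = select_context(demo_chunks, demo_scores)
print(demo_context)
```

With the notebook's own variables, the call would look like `select_context(chunks, similarity_scores[0])`. Whether document order helps the model is an assumption worth testing against the score-ordered version.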


In [50]:

final_prompt = f"Context: {context} \n\nQuery: {query} \n\nAnswer:"
print(final_prompt)

Context: While performing the review, we ask that you:
1. Adhere to the deadlines agreed upon with us 10. The Reviewer agrees to enter into a Non-Disclosure agreement till the time of 7. Review and sign off all content shared for review within 3 working days, to ensure  

Query: what is timeline for Review? 

Answer:
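The prompt above gives the model the context and the query but no instruction about how to use them. A slightly more explicit template is sketched below; the function name `build_prompt` and the instruction wording are assumptions, not something the notebook prescribes.

```python
def build_prompt(context, query):
    """Assemble a prompt that tells the model to rely only on the retrieved context."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {query.strip()}\n\n"
        "Answer:"
    )


# Hypothetical single-chunk context for demonstration
demo_prompt = build_prompt(
    "7. Review and sign off all content shared for review within 3 working days",
    "what is timeline for Review?",
)
print(demo_prompt)
```

Small models in particular may follow an explicit instruction more reliably than an implicit one, though this should be checked empirically for the model in use.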


In [73]:
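import json

import requests

# Send the prompt to the local chat endpoint; stream=True reads tokens as they arrive
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "messages": [
            {"role": "user", "content": final_prompt}
        ],
    },
    stream=True,
)
response.raise_for_status()  # fail early if the server returned an error

# Each non-empty line of the streamed response is one JSON object
for chunk in response.iter_lines():
    if chunk:
        data = json.loads(chunk.decode('utf-8'))
        message = data.get('message', {}).get('content', '')
        if message:
            print(message, end='', flush=True)  # print tokens as one continuous reply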

<think>
Okay, so I need to figure out the timeline for the Review based on the context provided. Let me read through it again carefully.

The query says, "what is timeline for Review?" and there's a previous answer that mentions several deadlines. It starts with Adherence to agreed deadlines (10), then 7 days for the review itself. 

Looking at the structure: "While performing the review, we ask that you: 1. Adhere to the deadlines agreed upon with us 10. The Reviewer agrees to enter into a Non-Disclosure agreement till the time of 7." So it's saying the deadline is for the entire reviewing process.

In the previous answer, they break down each step:

- Adherence to deadlines: 10
- Review itself: 7 days

So maybe the overall timeline is 3 days? Or perhaps there are additional steps beyond just adhering to those deadlines. It's possible that after meeting all the necessary deadlines, the reviewer has 3 working days to sign off all content.

I should make sure I'm interpreting this corre
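The reply above opens with a `<think>` block containing the model's reasoning before any final answer. If only the answer is wanted, the reasoning can be separated once the full reply has been collected into a string. The sketch below assumes the reasoning is wrapped in a single `<think>...</think>` pair; `split_reasoning` and the demo reply are hypothetical.

```python
import re


def split_reasoning(reply):
    """Separate a <think>...</think> block from the rest of a model reply."""
    match = re.search(r"<think>(.*?)</think>", reply, flags=re.DOTALL)
    if match is None:
        # No complete think block found: treat the whole reply as the answer
        return "", reply.strip()
    reasoning = match.group(1).strip()
    answer = (reply[:match.start()] + reply[match.end():]).strip()
    return reasoning, answer


# Hypothetical reply, shortened for illustration
demo_reply = "<think>\nThe context mentions 3 working days.\n</think>\n\nWithin 3 working days."
reasoning, answer = split_reasoning(demo_reply)
print(answer)
```

To use this with the streaming loop, the `message` pieces would need to be accumulated into a list and joined after the loop ends. A reply that is cut off before `</think>` falls into the "no complete block" branch and is returned unchanged.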
