### Rag Using Langchain

In [82]:
import json
import pandas as pd
import re
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.llms import Ollama
from langchain.vectorstores import FAISS
from langchain.schema import Document
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import numpy as np
import faiss
from langchain.docstore import InMemoryDocstore
import json
from langchain_community.vectorstores import FAISS

In [83]:
top_k = 5
LLM_Model = Ollama(model='qwen3:8b', temperature=0.3, num_ctx=4096)

In [84]:
def clean_text(text):
    """Clean text data"""
    text = str(text)
    text = re.sub(r'\n{3,}', '\n\n', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()



In [85]:
def load_and_preprocess_data(file_paths):
    """Load and preprocess data from multiple JSON files."""
    all_clean_texts = []
    for file_path in file_paths:
        with open(file_path, 'r') as f:
            raw_data = json.load(f)
        clean_texts = [clean_text(entry) for entry in raw_data if isinstance(entry, str)]
        all_clean_texts.extend(clean_texts) 

    combined_text = "\n".join(all_clean_texts)
    return Document(page_content=combined_text, metadata={"user_id": '001', "file_id": '01'})


document = load_and_preprocess_data(["Market Research Report_extracted_text.json", 'PMS Market Research_extracted_text.json'])


text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=20, 
    length_function=lambda x: len(x.split()),
    separators=["\n\n\n", "\n\n", "\n", ". ", "! ", "? ", "; ", ", ", " ", ""],
    keep_separator=False,
    add_start_index=True,
    strip_whitespace=True
)
chunks = text_splitter.split_text(document.page_content)
print(f"Number of chunks: {len(chunks)}")

Number of chunks: 26


In [86]:
chunks[-2]

'Renewable and Solar -Focused Platforms During our research, we also explored platforms that are specifically designed for the renewable energy sector and others that are solar PV -focused . At first, they seemed promising because they’re built with energy systems in mind. However, after testing and reviewing them, we found that they don’t meet the type of project management needs we’re aiming for , although they market them selves as they have project management tools. Platforms like Ra Power Management (RaPM) , SenseHawk, and Payac a are examples of solar - focused systems. These tools are mainly designed for monitoring the performance of solar plants — such as tracking electricity production, system health, faults, and maintenance alerts. While they’re excellent for operations and post -installation monitoring , they do not support document approvals, workflows, submittals, or collaboration between stakeholde rs like contractors, consultants, and clients. In short, these are more li

In [87]:
document

Document(metadata={'user_id': '001', 'file_id': '01'}, page_content='MARKET RESEARCH REPORT: ANALYSIS OF DOCUMENT TRANSLATION TOOLS Evaluating Leading Solutions for Multilingual Document Translation Mah inour Mohammad\nIntroduction This market research report analyzes competitors offering document translation tools that support PDF, Word, Excel, and scanned images while preserving layout and formatting. The focus is on tools that handle Arabic, French, and English languages, catering to both B2B and B2C markets. The key features evaluated include layout preservation, Arabic support and quality, translation accuracy and speed, pricing model, and Optical Character Recognition (OCR) support. To assess these tools, a series of test cases were conducted for each language, including: 1. Text -based documents: Evaluating basic translation accuracy, layout preservation, handling of number lists, bullet points, and right -to-left (RTL) and left -to-right (LTR) conversions. 2. Scanned documents:

In [88]:
chunks[:2]

["MARKET RESEARCH REPORT: ANALYSIS OF DOCUMENT TRANSLATION TOOLS Evaluating Leading Solutions for Multilingual Document Translation Mah inour Mohammad\nIntroduction This market research report analyzes competitors offering document translation tools that support PDF, Word, Excel, and scanned images while preserving layout and formatting. The focus is on tools that handle Arabic, French, and English languages, catering to both B2B and B2C markets. The key features evaluated include layout preservation, Arabic support and quality, translation accuracy and speed, pricing model, and Optical Character Recognition (OCR) support. To assess these tools, a series of test cases were conducted for each language, including: 1. Text -based documents: Evaluating basic translation accuracy, layout preservation, handling of number lists, bullet points, and right -to-left (RTL) and left -to-right (LTR) conversions. 2. Scanned documents: Testing OCR performance, particularly for Arabic, and preservation

In [89]:
device = 'mps' if torch.mps.is_available() else 'cpu'
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", model_kwargs={'device':device})


def batch_embed(texts, batch_size=64):
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        batch_embeddings = embedding_model.embed_documents(batch)
        embeddings.extend(batch_embeddings)
    return np.array(embeddings, dtype=np.float32)

def create_vectorstore( chunks):
    """Create vector store from chunks."""
    docs = [
            Document(page_content=chunk, metadata={"user_id": '001', "file_id": '01'})
            for chunk in chunks
        ]
        
    texts = [doc.page_content for doc in docs]
    embeddings = batch_embed(texts)
    dimension = embeddings.shape[1]
    index = faiss.IndexFlatIP(dimension)  
    faiss.normalize_L2(embeddings)
    index.add(embeddings)
        
    docstore = InMemoryDocstore({str(i): doc for i, doc in enumerate(docs)})
    index_to_docstore_id = {i: str(i) for i in range(len(docs))}
        
    vectorstore = FAISS(
            embedding_function=embedding_model.embed_query,
            index=index,
            docstore=docstore,
            index_to_docstore_id=index_to_docstore_id
        )
        
  
    return vectorstore

In [90]:
vector_store  = create_vectorstore(chunks)

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


In [91]:
def save_vectorstore(vector_store, directory_path):
    """Save vector store to a directory."""
    vector_store.save_local(directory_path)

def load_vectorstore(directory_path, embeddings):
    """Load vector store from a directory."""
    return FAISS.load_local(directory_path, embeddings)


In [92]:
def format_docs(docs):
    """Format documents for context"""
    return "\n\n".join(f"[Source {i}]: {doc.page_content}" for i, doc in enumerate(docs, 1))


def qa_chain():
    """Setup QA chain with custom prompts"""
    retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": top_k}
    )

    print("RETRIEVER is :", retriever)
    prompt_template = """You are a helpful assistant. Analyze the context and provide a structured response.

    Context:
    {context}

    Question: {question}

    Please provide your response in exactly this format:

    RESPONSE:
    [Your direct, concise answer to the question]

    REASONING:
    [Brief explanation of how you arrived at this answer using the sources]

    SOURCES:
    [List the source numbers that support your answer, e.g., 1, 2, 3]

    Important: Do not include any <think> tags or internal reasoning. Be direct and concise."""

    PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
    )

    qa_chain = (
    {
    "context": retriever | format_docs,
    "question": RunnablePassthrough()
    }
    | PROMPT
    | LLM_Model
    | StrOutputParser()
    )

    return qa_chain

In [93]:
def parse_structured_response(response_text):
    """Parse the structured response"""
    cleaned_response = re.sub(r'<think>.*?</think>', '', response_text, flags=re.DOTALL)
    cleaned_response = re.sub(r'<[^>]+>', '', cleaned_response)
    cleaned_response = re.sub(r'\n\s*\n', '\n\n', cleaned_response.strip())
        
    sections = {'response': '', 'reasoning': '', 'sources': ''}
    current_section = None
    current_content = []
        
    lines = cleaned_response.split('\n')
        
    for line in lines:
        line = line.strip()
            
        if line.upper().startswith('RESPONSE:'):
            if current_section:
                sections[current_section] = '\n'.join(current_content).strip()
            current_section = 'response'
            current_content = [line[9:].strip()]
                
        elif line.upper().startswith('REASONING:'):
            if current_section:
                sections[current_section] = '\n'.join(current_content).strip()
            current_section = 'reasoning'
            current_content = [line[10:].strip()]
                
        elif line.upper().startswith('SOURCES:'):
            if current_section:
                sections[current_section] = '\n'.join(current_content).strip()
            current_section = 'sources'
            current_content = [line[8:].strip()]
                
        elif current_section and line:
            current_content.append(line)
        
    if current_section:
        sections[current_section] = '\n'.join(current_content).strip()
        
    source_ids = []
    if sections['sources']:
        source_text = sections['sources']
        source_ids = [int(x) for x in re.findall(r'\d+', source_text)]
        
    return {
            'answer': sections['response'],
            'reasoning': sections['reasoning'],
            'sources': source_ids,
            'raw_response': cleaned_response
        }
    

In [94]:
def ask_question(question, return_sources=True):
    chain = qa_chain() 
    response = chain.invoke(question)
    parsed_response = parse_structured_response(response)

    if return_sources:
        retriever = vector_store.as_retriever(
            search_type="similarity",
            search_kwargs={"k": top_k}
        )
        source_docs = retriever.get_relevant_documents(question)
        parsed_response['answer'] = parsed_response['answer'].strip()
        parsed_response['reasoning'] = parsed_response['reasoning'].strip()
        parsed_response['sources'] = [int(x) for x in parsed_response['sources']]
        parsed_response['source_documents'] = source_docs
        parsed_response['source_texts'] = [doc.page_content for doc in source_docs]

    return parsed_response





In [95]:
vector_store = create_vectorstore(chunks)
response = ask_question("What are the key findings from the market research report?")
print(response['answer'])
print("\nReasoning:")
print(response['reasoning'])
print("\nSources:")
print(response['sources'])

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


RETRIEVER is : tags=['FAISS'] vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x3761e3700> search_kwargs={'k': 5}
The key findings include no single tool fully meets all requirements for accurate, efficient, and cost-effective document translation across Arabic, French, and English, particularly for layout preservation and OCR in scanned documents. Recommended features include PDF editing, annotations, split PDF, and improved OCR/RTL handling.

Reasoning:
Source 2 highlights that no tool satisfies all translation needs for layout, OCR, and multilingual support. Source 4 outlines the evaluation criteria and test cases, emphasizing gaps in OCR, layout preservation, and RTL handling. Source 5 adds context on required features like PDF editing and annotations.

Sources:
[2, 4, 5]


In [96]:
ask_question(" If you had to build a hybrid solution using two platforms from the research, which combination would you choose for a $50M solar project, \
                      and how would you handle the integration challenges, particularly around \
                      the 12 core features identified in the research?")

RETRIEVER is : tags=['FAISS'] vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x3761e3700> search_kwargs={'k': 5}


{'answer': 'PMWeb and Aconex would be the optimal combination for a $50M solar project.',
 'reasoning': 'PMWeb excels in cost control, scheduling, contract management, and workflow automation (Sources 1, 3, 5), while Aconex provides robust document management, collaboration, and structured workflows (Sources 1, 4). Together, they cover all 12 core features: PMWeb handles financial oversight, bids, and role-based workflows, while Aconex manages document approval, versioning, and cross-organization collaboration. Integration would require APIs/middleware for real-time data sync, ensuring seamless version control, submittal tracking, and financial reporting.',
 'sources': [1, 2, 3, 4, 5],
 'raw_response': 'RESPONSE:  \nPMWeb and Aconex would be the optimal combination for a $50M solar project.  \n\nREASONING:  \nPMWeb excels in cost control, scheduling, contract management, and workflow automation (Sources 1, 3, 5), while Aconex provides robust document management, collaboration, and stru

### Summerize using map reduce

In [100]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# ENHANCED MAP PROMPT - More detailed and specific
map_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert analyst tasked with creating comprehensive summaries. Your goal is to extract and preserve all important information from document sections.

Create a detailed summary of the following text that includes:

1. **Key Points & Findings**: All main ideas, conclusions, and important statements
2. **Specific Data**: Numbers, statistics, percentages, dates, and quantitative information
3. **Technical Details**: Terminology, processes, methods, and specifications
4. **Comparisons & Analysis**: Any comparisons, evaluations, or analytical insights
5. **Actionable Information**: Recommendations, next steps, or practical implications
6. **Context**: Background information and relationships between concepts

Guidelines:
- Maintain the logical structure and flow of information
- Preserve technical accuracy and specific terminology
- Include concrete examples and case studies mentioned
- Keep important names, tools, platforms, and entities
- Don't lose nuanced details or qualifications
- Aim for completeness while being well-organized

Text to summarize:
{context}

Comprehensive Summary:""")
])

# ENHANCED REDUCE PROMPT - Better synthesis instructions
reduce_template = """You are tasked with creating a masterful final summary by synthesizing multiple detailed section summaries from a comprehensive document.

Section Summaries to Synthesize:
{docs}

Create a cohesive, comprehensive final summary that:

**Structure & Organization:**
- Organize information into logical themes and categories
- Create clear sections with descriptive headings where appropriate
- Maintain a natural flow from general concepts to specific details

**Content Integration:**
- Merge related information from different sections seamlessly
- Eliminate redundancy while preserving all unique insights
- Synthesize complementary information to create deeper understanding
- Resolve any apparent contradictions by providing context

**Preservation of Detail:**
- Retain all important data, statistics, and quantitative information
- Keep specific examples, case studies, and concrete details
- Maintain technical accuracy and specialized terminology
- Preserve comparisons, evaluations, and analytical insights

**Completeness & Clarity:**
- Ensure the summary stands alone and provides complete understanding
- Include all major themes, findings, and conclusions
- Highlight key recommendations and actionable insights
- Maintain the depth and richness of the original content

**Final Requirements:**
- The summary should be comprehensive yet well-organized
- Focus on creating maximum value for readers
- Ensure no critical information is lost in the synthesis process

Final Comprehensive Summary:"""

reduce_prompt = ChatPromptTemplate.from_messages([("human", reduce_template)])

# Initialize LLM_Model and output parser
output_parser = StrOutputParser()

map_chain = map_prompt | LLM_Model | output_parser
reduce_chain = reduce_prompt | LLM_Model | output_parser

# ENHANCED MAP STEP with better tracking
def enhanced_map_step(chunks):
    """Generate individual summaries for each chunk with enhanced tracking"""
    summaries = []
    total_chunks = len(chunks)
    
    print(f"🔄 Starting MAP phase: Processing {total_chunks} chunks...")
    
    for i, chunk in enumerate(chunks):
        print(f"   📝 Processing chunk {i+1}/{total_chunks} ({len(chunk)} chars)")
        
        try:
            summary = map_chain.invoke({"context": chunk})
            summaries.append(summary)
            print(f"   ✅ Chunk {i+1} summarized ({len(summary)} chars)")
            
        except Exception as e:
            print(f"   ❌ Error processing chunk {i+1}: {e}")
            # Add a fallback summary
            fallback = f"Error processing chunk {i+1}. Original content (truncated): {chunk[:500]}..."
            summaries.append(fallback)
    
    print(f"✅ MAP phase complete: {len(summaries)} summaries generated")
    return summaries

# ENHANCED REDUCE STEP with better formatting
def enhanced_reduce_step(summaries):
    """Combine all summaries into final consolidated summary with better formatting"""
    
    print(f"🔄 Starting REDUCE phase: Combining {len(summaries)} summaries...")
    
    # Create well-formatted combined summaries
    combined_summaries = "\n\n".join([
        f"=== SECTION {i+1} SUMMARY ===\n{summary}" 
        for i, summary in enumerate(summaries)
    ])
    
    print(f"   📊 Combined summaries: {len(combined_summaries)} characters")
    
    try:
        # Generate final summary
        final_summary = reduce_chain.invoke({"docs": combined_summaries})
        print(f"✅ REDUCE phase complete: Final summary generated ({len(final_summary)} chars)")
        return final_summary
        
    except Exception as e:
        print(f"❌ Error in REDUCE phase: {e}")
        # Return combined summaries as fallback
        return f"Error generating final summary. Individual summaries:\n\n{combined_summaries}"

# COMPLETE ENHANCED WORKFLOW
def enhanced_map_reduce_summarization(chunks, verbose=True):
    """Complete enhanced map-reduce workflow"""
    
    if verbose:
        print(f"🚀 Starting Enhanced Map-Reduce Summarization")
        print(f"📊 Input: {len(chunks)} chunks")
        print(f"📊 Total characters: {sum(len(chunk) for chunk in chunks):,}")
        print("-" * 60)
    
    # MAP PHASE
    chunk_summaries = enhanced_map_step(chunks)
    
    if verbose:
        print("-" * 60)
    
    # REDUCE PHASE  
    final_summary = enhanced_reduce_step(chunk_summaries)
    
    if verbose:
        print("-" * 60)
        print("🎉 Enhanced Map-Reduce Summarization Complete!")
    
    return {
        "final_summary": final_summary,
        "chunk_summaries": chunk_summaries,
        "chunk_count": len(chunks),
        "method": "enhanced_map_reduce"
    }

# OPTIONAL: Domain-specific prompt variants
def create_domain_specific_map_prompt(domain="general"):
    """Create domain-specific map prompts for better results"""
    
    domain_instructions = {
        "technical": """Focus particularly on:
- Technical specifications, methodologies, and processes
- Performance metrics, benchmarks, and measurements
- System architectures, tools, and technologies
- Implementation details and technical requirements""",
        
        "business": """Focus particularly on:
- Business strategies, market analysis, and competitive insights
- Financial data, costs, pricing, and ROI information
- Operational processes, workflows, and efficiency metrics
- Strategic recommendations and business implications""",
        
        "research": """Focus particularly on:
- Research methodologies, study designs, and experimental procedures
- Key findings, results, and statistical significance
- Literature references, citations, and theoretical frameworks
- Implications for future research and practical applications""",
        
        "general": """Focus on capturing all important information comprehensively"""
    }
    
    base_prompt = """You are an expert analyst tasked with creating comprehensive summaries. Your goal is to extract and preserve all important information from document sections.

Create a detailed summary of the following text that includes:

1. **Key Points & Findings**: All main ideas, conclusions, and important statements
2. **Specific Data**: Numbers, statistics, percentages, dates, and quantitative information
3. **Technical Details**: Terminology, processes, methods, and specifications
4. **Comparisons & Analysis**: Any comparisons, evaluations, or analytical insights
5. **Actionable Information**: Recommendations, next steps, or practical implications
6. **Context**: Background information and relationships between concepts

{domain_focus}

Guidelines:
- Maintain the logical structure and flow of information
- Preserve technical accuracy and specific terminology
- Include concrete examples and case studies mentioned
- Keep important names, tools, platforms, and entities
- Don't lose nuanced details or qualifications
- Aim for completeness while being well-organized

Text to summarize:
{{context}}

Comprehensive Summary:"""

    prompt_text = base_prompt.format(domain_focus=domain_instructions.get(domain, domain_instructions["general"]))
    
    return ChatPromptTemplate.from_messages([("system", prompt_text)])

# USAGE EXAMPLES:

# 1. Basic enhanced usage (drop-in replacement for your existing code)
def run_basic_enhanced():
    # Your existing chunks
    summaries = enhanced_map_step(chunks)
    final_summary = enhanced_reduce_step(summaries)
    return final_summary

# 2. Complete workflow with tracking
def run_complete_workflow():
    result = enhanced_map_reduce_summarization(chunks, verbose=True)
    print("\nFINAL SUMMARY:")
    print("=" * 80)
    print(result["final_summary"])
    return result

# 3. Domain-specific version
def run_domain_specific(domain="technical"):
    # Create domain-specific map chain
    domain_map_prompt = create_domain_specific_map_prompt(domain)
    domain_map_chain = domain_map_prompt | LLM_Model | output_parser
    
    # Use domain-specific map step
    summaries = []
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)} with {domain} focus")
        summary = domain_map_chain.invoke({"context": chunk})
        summaries.append(summary)
    
    # Use enhanced reduce step
    final_summary = enhanced_reduce_step(summaries)
    return final_summary



In [101]:
# Replace your existing functions with:
summaries = enhanced_map_step(chunks)
final_summary = enhanced_reduce_step(summaries)

# Or use the complete workflow:
result = enhanced_map_reduce_summarization(chunks)
print(result["final_summary"])

🔄 Starting MAP phase: Processing 26 chunks...
   📝 Processing chunk 1/26 (1850 chars)
   ✅ Chunk 1 summarized (9709 chars)
   📝 Processing chunk 2/26 (2118 chars)
   ✅ Chunk 2 summarized (7759 chars)
   📝 Processing chunk 3/26 (3450 chars)
   ✅ Chunk 3 summarized (8654 chars)
   📝 Processing chunk 4/26 (647 chars)
   ✅ Chunk 4 summarized (7223 chars)
   📝 Processing chunk 5/26 (2265 chars)
   ✅ Chunk 5 summarized (6831 chars)
   📝 Processing chunk 6/26 (2356 chars)
   ✅ Chunk 6 summarized (9543 chars)
   📝 Processing chunk 7/26 (2312 chars)
   ✅ Chunk 7 summarized (7744 chars)
   📝 Processing chunk 8/26 (1852 chars)
   ✅ Chunk 8 summarized (7197 chars)
   📝 Processing chunk 9/26 (2518 chars)
   ✅ Chunk 9 summarized (7883 chars)
   📝 Processing chunk 10/26 (2641 chars)
   ✅ Chunk 10 summarized (7931 chars)
   📝 Processing chunk 11/26 (2282 chars)
   ✅ Chunk 11 summarized (8765 chars)
   📝 Processing chunk 12/26 (2576 chars)
   ✅ Chunk 12 summarized (6733 chars)
   📝 Processing chunk 13/

Traceback (most recent call last):
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/c2/f9lh6rmd4q1648_pfl1636zw0000gn/T/ipykernel_35989/1953650938.py", line 2, in <module>
    summaries = enhanced_map_step(chunks)
  File "/var/folders/c2/f9lh6rmd4q1648_pfl1636zw0000gn/T/ipykernel_35989/3980674352.py", line 89, in enhanced_map_step
    summary = map_chain.invoke({"context": chunk})
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain_core/runnables/base.py", line 3047, in invoke
    input_ = context.run(step.invoke, input_, config)
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain_core/language_models/llms.py", line 389, in invoke
    self.generate_prompt(
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain_core/language_models/llms.py", line 766, in g

In [None]:
# from langchain_core.prompts import ChatPromptTemplate
# from langchain_core.output_parsers import StrOutputParser

# map_prompt = ChatPromptTemplate.from_messages(
#     [("system", "Write a concise summary of the following:\\n\\n{context}")]
# )

# reduce_template = """
# The following is a set of summaries:
# {docs}
# Take these and distill it into a final, consolidated summary
# of the main themes.
# """

# reduce_prompt = ChatPromptTemplate.from_messages([("human", reduce_template)])

# # Initialize LLM_Model and output parser

# output_parser = StrOutputParser()


# map_chain = map_prompt | LLM_Model | output_parser
# reduce_chain = reduce_prompt | LLM_Model | output_parser

# # MAP STEP: Generate summaries for each chunk
# def map_step(chunks):
#     """Generate individual summaries for each chunk"""
#     summaries = []
#     for i, chunk in enumerate(chunks):
#         print(f"Processing chunk {i+1}/{len(chunks)}")
#         summary = map_chain.invoke({"context": chunk})
#         summaries.append(summary)
#     return summaries

# # REDUCE STEP: Combine summaries into final summary
# def reduce_step(summaries):
#     """Combine all summaries into final consolidated summary"""
#     # Join summaries with separators
#     combined_summaries = "\n\n".join([f"Summary {i+1}: {summary}" for i, summary in enumerate(summaries)])
    
#     # Generate final summary
#     final_summary = reduce_chain.invoke({"docs": combined_summaries})
#     return final_summary

### Stuff method

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.llm import LLMChain
from langchain_core.prompts import ChatPromptTemplate

# Define prompt
prompt = ChatPromptTemplate.from_messages(
    [("system", "Write a concise summary of the following:\\n\\n{context}")]
)

# Instantiate chain
chain = create_stuff_documents_chain(LLM_Model, prompt)

# Invoke chain with a list of Document objects
result = chain.invoke({"context": [document]})
print(result)

<think>
Okay, so I need to analyze this detailed comparison of project management platforms for the construction industry. Let me start by breaking down what each platform offers based on the information provided. 

First, the user mentioned platforms like Monday.com, Wrike, PMWeb, Aconex, and Procore. They also touched on renewable energy-focused tools like RaPM, SenseHawk, and Payaca, but those are more monitoring tools rather than project management systems. The main focus is on construction-specific needs.

Looking at the feature matrix, each platform is rated on 12 features. Let me go through each one:

1. **Assigning roles**: Monday.com is at 10%, which is low. Wrike is 85%, PMWeb 80%, Aconex 80%, Procore 80%. So Wrike and the others are better here. Maybe Monday.com lacks role-based task assignment.

2. **Document approval workflow**: All except Monday.com are 100% or 80% or 80%+. Wait, the matrix shows Monday.com at 60%, Wrike 100%, PMWeb 100%, Aconex 100%, Procore 100%. Wait, 

### LangChain chains

In [99]:
from langchain.chains.summarize import load_summarize_chain


chunk_docs = [Document(page_content=chunk) for chunk in chunks]

chain = load_summarize_chain(LLM_Model, chain_type="map_reduce", verbose=True)
map_reduce_summary = chain.run(chunk_docs)



[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"MARKET RESEARCH REPORT: ANALYSIS OF DOCUMENT TRANSLATION TOOLS Evaluating Leading Solutions for Multilingual Document Translation Mah inour Mohammad
Introduction This market research report analyzes competitors offering document translation tools that support PDF, Word, Excel, and scanned images while preserving layout and formatting. The focus is on tools that handle Arabic, French, and English languages, catering to both B2B and B2C markets. The key features evaluated include layout preservation, Arabic support and quality, translation accuracy and speed, pricing model, and Optical Character Recognition (OCR) support. To assess these tools, a series of test cases were conducted for each language, including: 1. Text -based documents: Evaluating basic translation accuracy, layout preservation, handling of

Traceback (most recent call last):
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/c2/f9lh6rmd4q1648_pfl1636zw0000gn/T/ipykernel_35989/2302064687.py", line 7, in <module>
    map_reduce_summary = chain.run(chunk_docs)
    return wrapped(*args, **kwargs)
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 603, in run
    return self(args[0], callbacks=callbacks, tags=tags, metadata=metadata)[
    return wrapped(*args, **kwargs)
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 386, in __call__
    return self.invoke(
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.py", line 167, in invoke
    raise e
  File "/Users/maryamsaad/Library/Python/3.9/lib/python/site-packages/langchain/chains/base.

In [None]:
print(map_reduce_summary)

<think>
Okay, the user wants a concise summary of the provided text. Let me start by reading through the original content carefully.

The main points seem to be about comparing project management platforms for the construction and renewable energy sectors. The text mentions that some platforms are focused on solar PV monitoring but lack project management features like workflow approvals and collaboration. Then there's a comparison of several platforms: Monday.com, Wrike, PMWeb, Aconex, and Pro, each with different strengths and weaknesses. The feature matrix is also part of the summary, showing how each platform scores on various functionalities.

I need to make sure the summary is concise, so I should highlight the key differences between the platforms and the main findings. The user might be looking for a quick overview to decide which platform suits their needs. They might be in the construction or renewable energy industry, needing project management tools. The summary should ment

In [None]:
from langchain.chains.summarize import load_summarize_chain


chunk_docs = [Document(page_content=chunk) for chunk in chunks]

chain = load_summarize_chain(LLM_Model, chain_type="stuff", verbose=True)
stuff_summary = chain.run(chunk_docs)

print(stuff_summary)



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"MARKET RESEARCH REPORT: ANALYSIS OF DOCUMENT TRANSLATION TOOLS Evaluating Leading Solutions for Multilingual Document Translation Mah inour Mohammad
Introduction This market research report analyzes competitors offering document translation tools that support PDF, Word, Excel, and scanned images while preserving layout and formatting. The focus is on tools that handle Arabic, French, and English languages, catering to both B2B and B2C markets. The key features evaluated include layout preservation, Arabic support and quality, translation accuracy and speed, pricing model, and Optical Character Recognition (OCR) support. To assess these tools, a series of test cases were conducted for each language, including: 1. Text -based documents: Evaluating basic translation accuracy, layout preservation, handling of num