To optimise a construction schedule using an Agentic Retrieval-Augmented Generation (RAG) system with T5-small and incorporate document retrieval, we can extend our previous approach to include document-based information retrieval. This involves combining knowledge graph representation, a question-answering language model, and document retrieval for enhanced insights.
Step-by-Step Implementation

    Data Collection and Preparation: Collect and prepare data, including task details, dependencies, resource allocations, and relevant documents.
    Define the Knowledge Graph: Create a knowledge graph using networkx to represent the construction schedule.
    Integrate Document Retrieval: Use a document retrieval system to fetch relevant documents based on queries.
    Integrate the T5 Model: Use the T5 model for question-answering, leveraging the retrieved documents for better context.
    Implement the Agentic RAG System: Combine the knowledge graph, document retrieval, and T5 model for schedule optimisation.

In [1]:
import networkx as nx
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Initialise the T5 model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
qa_pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer)



In [2]:
# Initialize the graph
G = nx.DiGraph()

# Add nodes for Tasks, PredTasks, TasksRSRC, and RSRC
G.add_node("Task1", type="Task", duration=5, description="Excavate foundation")
G.add_node("Task2", type="Task", duration=3, description="Pour concrete")
G.add_node("Resource1", type="RSRC", description="Excavator")
G.add_node("Resource2", type="RSRC", description="Concrete Mixer")
G.add_node("Task1-Resource1", type="TasksRSRC")
G.add_node("Task2-Resource2", type="TasksRSRC")

# Add edges representing relationships
G.add_edge("Task1", "Task2", type="HAS_PREDTASK")
G.add_edge("Task1", "Resource1", type="REQUIRES_RSRC")
G.add_edge("Task2", "Resource2", type="REQUIRES_RSRC")
G.add_edge("Resource1", "Task1-Resource1", type="ALLOCATED_TO")
G.add_edge("Resource2", "Task2-Resource2", type="ALLOCATED_TO")


### Define a Retrieval System

In [6]:
# Sample documents for retrieval
documents = [
    "Excavation best practices and safety measures.",
    "Concrete pouring techniques and timelines.",
    "Resource management in construction projects.",
    "Optimising task schedules for construction efficiency.",
    "Handling dependencies and predecessor tasks effectively."
]

# Initialise the TF-IDF vectorizer
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve_documents(query, vectorizer, doc_vectors, documents, top_k=2):
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, doc_vectors).flatten()
    top_indices = np.argsort(similarities)[-top_k:]
    return [documents[i] for i in top_indices[::-1]]


### Define Helper System

In [9]:
def graph_to_text(graph):
    text = "Construction schedule details:\n"
    for node, data in graph.nodes(data=True):
        text += f"Task: {node}, Duration: {data.get('duration', 'N/A')}, Description: {data.get('description', 'N/A')}\n"
    for u, v, data in graph.edges(data=True):
        text += f"{u} -> {v} ({data['type']})\n"
    return text

def query_graph(graph, query, qa_pipeline, vectorizer, doc_vectors, documents):
    text_input = f"Query: {query}\n{graph_to_text(graph)}"
    retrieved_docs = retrieve_documents(query, vectorizer, doc_vectors, documents)
    text_input += "\nRelevant documents:\n" + "\n".join(retrieved_docs)
    
    output = qa_pipeline(text_input, max_length=512)
    return output[0]['generated_text']


In [11]:
def optimise_schedule(graph, qa_pipeline, vectorizer, doc_vectors, documents):
    queries = [
        "What is the optimal order of tasks?",
        "How can we minimize the duration of the construction project?",
        "Which resources are critical for task completion?",
        "Identify potential bottlenecks in the schedule."
    ]
    
    optimisation_results = {}
    for query in queries:
        answer = query_graph(graph, query, qa_pipeline, vectorizer, doc_vectors, documents)
        optimisation_results[query] = answer
    
    return optimisation_results


In [13]:
optimisation_results = optimise_schedule(G, qa_pipeline, vectorizer, doc_vectors, documents)
for query, result in optimisation_results.items():
    print(f"Query: {query}")
    print(f"Answer: {result}\n")


Query: What is the optimal order of tasks?
Answer: : What is optimal order of tasks? Query: What is optimal order of tasks? Construction schedule details: Task: Task1, Duration: 5, Description: Excavate foundation Task: Task2, Duration: 3, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour concrete Task: Resource2, Duration: N/A

Query: How can we minimize the duration of the construction project?
Answer: : How can we minimize the duration of the construction project? Query: How can we minimize the duration of the construction project? Construction schedule details: Task: Task1, Duration: 5, Description: Excavate foundation Task: Task2, Duration: 3, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour concrete Task: Resource1, Duration: N/A, Description: Pour con

Explanation

    Data Collection and Preparation: Collect and prepare task and resource data, including task descriptions, durations, relationships, and relevant documents.
    Knowledge Graph: Create a knowledge graph using networkx to represent tasks, resources, and their relationships.
    Document Retrieval: Implement a simple document retrieval system using TF-IDF vectorization and cosine similarity to fetch relevant documents based on queries.
    T5 Model Integration: Use the T5 model for question-answering and text generation, initializing the model and tokenizer, and creating a pipeline for text generation.
    Helper Functions: Define functions to convert the graph to text, retrieve relevant documents, and query the graph using the T5 model.
    Optimization Logic: Develop a function to optimize the schedule by querying the graph with relevant questions, retrieving documents for context, and analyzing the model's responses.
    Implementation: The example implementation includes defining the nodes and edges of the graph, creating helper functions, running the optimization, and printing the results.