In [274]:
pip install ollama


Note: you may need to restart the kernel to use updated packages.


# Load model and configure model parameters

In [275]:
import ollama
from ollama import Client

#model_version = "llama3.1:8b-instruct-fp16"
model_version = "llama3.1:70b"
num_ctx = 40960 #context window higher because of the refinement process
temperature = 0 # should be kept low for more deterministic results (Ollama's documented default is 0.8)
top_p = 1.0 # defined for reference only: not passed to the modelfile or to the chat options below
modelfile='''
FROM {}
PARAMETER num_ctx {}
PARAMETER temperature {}
'''.format(model_version, num_ctx, temperature)

#model_name = "emir_" + model_version
model_name = model_version
#ollama.create(model=model_name, modelfile=modelfile)

one_shot = False
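
The modelfile string above is assembled with positional `format` arguments. As a minimal sketch, the same text can be rendered from a dictionary of parameters, which makes it easier to add or remove a `PARAMETER` line. The helper name `build_modelfile` is hypothetical and not part of the `ollama` package.

```python
def build_modelfile(base_model: str, params: dict) -> str:
    """Render a Modelfile: one FROM line, then one PARAMETER line per entry."""
    lines = [f"FROM {base_model}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


# Same values as the cell above; build_modelfile itself is an illustrative helper.
example_modelfile = build_modelfile(
    "llama3.1:70b", {"num_ctx": 40960, "temperature": 0}
)
print(example_modelfile)
```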



# Load one-shot example if set True

In [276]:
import json
if one_shot:
    with open('one_shot_example.txt', 'r') as file:
        example_run_text = file.read()
        example_run_prompt = f"""
        I will give you an example run for another software project just to show you the reasoning process and the output format. 

        - EXAMPLE RUN START - NOTE: "DO NOT COPY AND PASTE! This Example is for Reference Only, Come Up With Your Own Reasoning for The User Input!"

        {example_run_text}
        
        - EXAMPLE RUN END -

        """
else:
    example_run_prompt = ""
example_run_prompt


''

# Load input file

In [277]:
project_title_map = {
    "archives_space": "Archives Space Project",
    "bad_camp": "BADCamp Project",
    "neurohub": "NeuroHub Project",
    "open_spending": "Open Spending Project",
    "planning_poker": "Planning Poker Project",
    "recycling": "Recycling Project"
}
projects = ["archives_space", "bad_camp", "neurohub", "open_spending", "planning_poker", "recycling"]

selected_project = projects[0] # select which project you want to test with

project_path = "./input_files/user_stories_{}.json".format(selected_project)
with open(project_path, 'r') as file:
    project_content = file.read()
project_title = project_title_map[selected_project]
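
The input file is read as raw text and pasted into the prompt, so a malformed file would only show up as a confusing model answer. A minimal sketch of an optional sanity check follows, assuming only that the file is meant to be valid JSON. The helper name `load_user_stories` and the sample structure are hypothetical; the real layout of the input files is not shown in this notebook.

```python
import json


def load_user_stories(raw_text: str):
    """Parse raw_text as JSON and return the result; raise ValueError with position info if invalid."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Input is not valid JSON (line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err


# Hypothetical sample; replace with project_content when running in the notebook.
sample = '{"Authentication": ["As a user, I want to log in."]}'
parsed = load_user_stories(sample)
print(type(parsed).__name__, len(parsed))
```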

# Get first assessment from model

In [278]:
client = Client(
  headers={}
)
messages = []


generator_system_message = {'role': 'system', 'content': f"""
You are a software architect. Your job is to get a list of categorized user stories with a description, analyze them in detail and
assign a score for each architecture pattern depending on their relevance to the project and 
if it would prove useful in the implementation of the described project.

When the user gives you a list of user stories with a description, analyze the given user stories, give the reasoning for each architecture pattern, and then print out the assigned scores.
Be as objective as possible during your scoring. The decisions need to be deterministic and reproducible, regardless of the order of elements presented or any random factors.

Here are the architecture patterns you will score:
*Layered Architecture 
*Event-Driven Architecture (Pub-Sub Architecture) 
*Microkernel Architecture 
*Microservices Architecture 
*Space-Based Architecture
*Pipeline Architecture (Pipe-Filter Architecture) 
*Client-Server Architecture

Here are the descriptions for each pattern and usage examples:

*Layered Architecture
Layered Architecture organizes software into horizontal layers, each with distinct responsibilities such as presentation, business logic, and data persistence. The core idea is that each layer only interacts with the layer directly beneath it, which makes the system more manageable and easier to maintain. Changes within a single layer can often be made without affecting other layers, enhancing modifiability. It also facilitates the separation of concerns, allowing teams to focus on specific layers without impacting the entire system. However, it can introduce performance overhead if there are many layers and each request must pass through them.
Typical Usage Example:
A common example is an enterprise web application with a front-end/UI layer, a service or business logic layer, and a database access layer. For instance, many Java EE or .NET applications follow a Layered Architecture to clearly separate presentation, business, and data access concerns.

*Event-Driven Architecture (Pub-Sub Architecture)
In an Event-Driven Architecture, components communicate primarily through the production and consumption of events. Publishers emit events without knowing which subscribers will be interested in them, and subscribers react when they receive relevant events. This decouples event producers from consumers, making the system more flexible and scalable. It also allows for asynchronous processing, where components can respond to events at different times. However, debugging can be more complex due to the non-linear flow of events and the indirect communication paths.
Typical Usage Example:
A common example is a messaging system with a message broker like Apache Kafka or RabbitMQ. For instance, e-commerce platforms use an event-driven approach to process inventory updates, orders, and notifications asynchronously.

*Microkernel Architecture
Microkernel Architecture (also known as Plugin Architecture) consists of a core system, or “kernel,” that provides the fundamental services, and additional features are implemented as plug-in modules around it. The core is kept minimal and stable, with most business logic or specialized features added through extensions or plugins. This allows for easy customization and extensibility, since new modules can be added without modifying the kernel. It also improves system resilience because a faulty plugin can be disabled or replaced without bringing down the entire system. However, designing a robust microkernel can be complex as it must accommodate a wide range of future extensions.
Typical Usage Example:
Examples include operating systems like the early versions of Mac OS X (with its Mach microkernel) and Eclipse IDE, where the core platform loads various plugin modules for functionality such as language support, testing tools, or version control integrations.

*Microservices Architecture
Microservices Architecture breaks down an application into smaller, independently deployable services, each responsible for a specific business capability. Each service can be built, tested, deployed, and scaled independently, which promotes agility and faster development cycles. Teams can choose different technologies or databases for each service if needed, providing flexibility. However, this approach introduces complexity in inter-service communication, monitoring, and data consistency. DevOps, CI/CD pipelines, and robust service discovery mechanisms are usually critical for successful microservices deployments.
Typical Usage Example:
Netflix is a well-known practitioner of microservices, splitting its platform into small, specialized services for managing user data, recommendation algorithms, and video streaming, among others. Likewise, many modern e-commerce companies adopt microservices to handle catalog services, shopping cart, and payment services separately.

*Space-Based Architecture
Space-Based Architecture (also called Cloud-Based or Tuple Space Architecture) uses the concept of a “data grid” or “processing grid,” where the application components share a common distributed memory or “space.” It is designed to handle high scalability and peak loads by distributing both the data and the processing across multiple nodes. Components can write data or tasks into the space, and other components pick up the tasks or data for processing. This approach reduces bottlenecks by avoiding central databases at high transaction volumes. However, it can be more complex to implement and manage due to the distributed nature of both data and processing.
Typical Usage Example:
Retail websites dealing with heavy seasonal traffic (like Black Friday sales) might use space-based architecture to handle massive spikes in transactions. GigaSpaces is a known platform that implements this pattern to allow for in-memory data sharing and fast scaling.

*Pipeline Architecture (Pipe-Filter Architecture)
Pipeline Architecture, also known as Pipe-Filter, organizes processing steps into a sequence of filters, where the output of one filter is the input to the next. Each filter performs a distinct operation, and data flows continuously through the pipeline. This makes the system modular, as filters can be swapped or reordered without changing the surrounding components. It is particularly suitable for data processing or transformation workflows. However, it might not be optimal for interactive applications needing bidirectional communication.
Typical Usage Example:
A classic example is a data processing workflow in ETL (Extract, Transform, Load) operations, where you have steps for extracting raw data, cleaning and transforming it, and finally loading it into a target system or database.

*Client-Server Architecture
Client-Server Architecture divides the system into two main parts: clients that request services or resources, and a server that processes these requests and returns a response. This structure simplifies the way users interact with shared data or centralized processes. The server can be scaled up independently, and security or permissions can be centrally managed. However, if the server fails or becomes overloaded, clients will be unable to access the service. It remains a fundamental design pattern for many networked applications.
Typical Usage Example:
Any typical web application employs a client-server setup: a web browser (client) connects to a remote web server to fetch and display web pages. Another example is email clients communicating with a mail server over protocols like IMAP/POP3 for receiving mail and SMTP for sending mail.


Here are the score options and their corresponding meaning:
1: "Completely unsuitable"
2: "Partially suitable"
3: "Sufficient for requirements"
4: "Well-suited"
5: "Perfectly aligned"

The final scores must be given in this format after the detailed reasoning for each architecture pattern:

**Layered Architecture**: -integer score value-
**Event-Driven Architecture (Pub-Sub Architecture)**: -integer score value-
**Microkernel Architecture**: -integer score value-
**Microservices Architecture**: -integer score value-
**Space-Based Architecture**: -integer score value-
**Pipeline Architecture (Pipe-Filter Architecture)**: -integer score value-
**Client-Server Architecture**: -integer score value-

{example_run_prompt}

"""}
messages.append(generator_system_message)


messages.append({'role': 'user', 'content': f"""
I will give you a list of categorized user stories and a description created for a software project titled {project_title}. Please analyze them in detail and give me the scoring for each architecture pattern:

Project Title: {project_title}

Categorized User Stories:

{project_content}

"""})
response = client.chat(model=model_name, messages=messages, options={"num_ctx": num_ctx, "temperature": temperature})
message = response['message']
print(message['content'])
messages.append(message)


After analyzing the provided user stories for the Archives Space Project, I will provide scores for each architecture pattern based on their relevance and suitability for the project.

**Layered Architecture**: 4 (Well-suited)
The project involves various layers of functionality, such as accessioning, describing, and arranging materials, managing authorities and rights, and providing reference services. A layered architecture can help organize these different layers of functionality, making it easier to manage and maintain the system.

**Event-Driven Architecture (Pub-Sub Architecture)**: 2 (Partially suitable)
While there are some events mentioned in the user stories, such as importing EAD files or uploading an EAD for import, the project does not seem to be heavily event-driven. However, using an event-driven architecture could help with tasks like processing imports or updates in the background.

**Microkernel Architecture**: 3 (Sufficient for requirements)
The project has a core se
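
The system prompt asks for scores in the form `**Pattern Name**: <integer>`, which makes them easy to extract without another model call. The following is a minimal sketch, assuming the model keeps that format at the start of a line. The names `SCORE_LINE` and `parse_scores` are hypothetical; if a pattern appears more than once in the answer, the last occurrence wins.

```python
import re

# Matches lines such as "**Layered Architecture**: 4 (Well-suited)".
SCORE_LINE = re.compile(
    r"^\*\*(?P<pattern>[^*]+)\*\*:\s*(?P<score>[1-5])\b", re.MULTILINE
)


def parse_scores(answer: str) -> dict:
    """Map each architecture pattern name to its integer score (1-5)."""
    return {
        m.group("pattern").strip(): int(m.group("score"))
        for m in SCORE_LINE.finditer(answer)
    }


sample_answer = """**Layered Architecture**: 4 (Well-suited)
Some reasoning text.

**Event-Driven Architecture (Pub-Sub Architecture)**: 2 (Partially suitable)
"""
scores = parse_scores(sample_answer)
print(scores)
```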

# Self-refinement Process

In [279]:

# Let's capture the model's initial answer in a variable:
current_answer = message['content']  # e.g. the text with reasoning + final scores
evaluator_context = messages.copy() # this context will be used in evaluator messages
eval_system_message = {
        "role": "system",
        "content": """Your task is changed. You are now the Evaluator. 
Your job: By taking the previously given user stories into account, evaluate the given reasoning and scores for each architecture pattern.

Here are the score options and their corresponding meaning:
1: "Completely unsuitable"
2: "Partially suitable"
3: "Sufficient for requirements"
4: "Well-suited"
5: "Perfectly aligned"

- Do not assume any new information about the project. The description and the user stories for the project remain the same.
- Only suggest a refinement for an assigned score when you are sure its score is wrong and it must change. IMPORTANT: If the current score is already reasonable DO NOT make a refinement suggestion.
- Also, go through your previous evaluations to ensure you don’t repeat the same refinement suggestion for an architecture pattern more than once for the same reason.

After your refinement suggestions, on a NEW line, end your response with exactly one of the two markers:
- REFINE (if refinements are needed)
- NO_REFINEMENT (if no refinements are needed)

Do not include any additional text after that marker.
Do not wrap it in quotes.

Important:
- Score can only take the integer values that are given as the score options, no float values.
- If you provide refinement for any of the pattern scores, do not use NO_REFINEMENT.
Because this will stop the whole process and the current scores will be left unrefined.
- Use NO_REFINEMENT only if you have zero refinements to suggest.
"""
}
evaluator_context.append(eval_system_message)

refiner_context = messages.copy() # this context will be used in refiner messages

refine_system_message = {
        "role": "system",
        "content": """Your task is changed. You are now the Refiner. 
You will be given your latest reasoning/scores and the evaluator's feedback for your latest reasoning/scores. 
Use it to refine the current scores. Keep the same format in the updated response (reasoning and then the updated scores).
Be as objective as possible during your refinement. The decisions need to be deterministic and reproducible, regardless of the order of elements presented or any random factors.

"""
    }
refiner_context.append(refine_system_message)
# We'll define how many refinement loops to allow:
MAX_REFINEMENT_ITERATIONS = 3

refinement_iterations = MAX_REFINEMENT_ITERATIONS # if not set again in the loop, it means the model went through max iterations
for i in range(MAX_REFINEMENT_ITERATIONS):
    # 1) Evaluate the current answer
    
    evaluate_prompt = {
        "role": "user",
        "content": f"""
            Evaluate the current reasoning and scores by taking the original categorized user stories into account:

            Current Answer:
            {current_answer}

            Refinement Suggestions:
            """
    }
    evaluator_context.append(evaluate_prompt)
    eval_response = client.chat(model=model_name, messages=evaluator_context, options={"num_ctx": num_ctx, "temperature": temperature})
    eval_message = eval_response['message']
    eval_feedback = eval_message['content'].strip()
    evaluator_context.append(eval_message)
    
    print(f"\n=== Evaluator Feedback (Iteration {i+1}) ===\n{eval_feedback}\n")
    
    # 2) If the evaluator indicates no changes are needed, break out
    if "NO_REFINEMENT" in eval_feedback:
        print("Evaluator says no changes are needed. Stopping refinement.")
        refinement_iterations = i
        break
    
    # 3) Otherwise, refine
    refine_prompt = {
        "role": "user",
        "content": f"""
            Refine your previous answer by taking the feedback of the evaluator into account. Keep the same format (reasoning and then the final scores).

            Previous Answer:
            {current_answer}

            Evaluator Feedback:
            {eval_feedback}

            Revised Answer:
            """
    }
    refiner_context.append(refine_prompt)
    
    refine_response = client.chat(model=model_name, messages=refiner_context, options={"num_ctx": num_ctx, "temperature": temperature})
    refine_message = refine_response['message']
    revised_answer = refine_message['content'].strip()
    refiner_context.append(refine_message)
    
    print(f"=== Refined Answer (Iteration {i+1}) ===\n{revised_answer}\n")
    
    # Update current_answer for potential next iteration
    current_answer = revised_answer

# After the loop, current_answer holds the final refined output:
print("=== Final Refined Scores & Reasoning ===\n", current_answer)


=== Evaluator Feedback (Iteration 1) ===
After re-evaluating the current reasoning and scores, I suggest the following refinements:

**Event-Driven Architecture (Pub-Sub Architecture)**: The score should be refined to 1 (Completely unsuitable). Upon closer inspection of the user stories, there is no clear indication that event-driven architecture would provide significant benefits for this project. Most of the events mentioned are related to importing EAD files or uploading an EAD for import, which can be handled through other architectural patterns.

**Microkernel Architecture**: The score should remain 3 (Sufficient for requirements). While microkernel architecture might not be necessary for this specific project, it still provides some benefits in terms of customization and extensibility. However, the user stories do not provide enough evidence to justify a higher score.

**Pipeline Architecture (Pipe-Filter Architecture)**: The score should remain 3 (Sufficient for requirements). 
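
The loop above stops when the substring `NO_REFINEMENT` appears anywhere in the evaluator's feedback, so a sentence that merely mentions the marker would also end the process. A stricter alternative, sketched here under the assumption that the evaluator follows the instruction to put the marker on the last line, is to inspect only the final non-empty line. The helper name `extract_marker` is hypothetical.

```python
def extract_marker(feedback: str):
    """Return 'REFINE', 'NO_REFINEMENT', or None, based on the last non-empty line."""
    lines = [line.strip() for line in feedback.splitlines() if line.strip()]
    if not lines:
        return None
    # Tolerate light decoration such as bold markers, list dashes, or quotes.
    last = lines[-1].strip("*`\"'- .")
    return last if last in {"REFINE", "NO_REFINEMENT"} else None


print(extract_marker("All scores look reasonable.\nNO_REFINEMENT"))
print(extract_marker("Lower the Event-Driven score.\n**REFINE**\n"))
```

A return value of `None` means the marker was missing or malformed; whether to treat that as "refine again" or "stop" is a design choice the notebook would have to make.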

# Summarize the evaluation result

In [280]:
refiner_context.append({'role': 'system', 'content': """
The evaluation process is now finished. Take the first assessment and the last assessment from the user,
and return only the final scores before and after the evaluation-refinement process,
following this display format below:

**Layered Architecture**: 1 -> 2
**Event-Driven Architecture (Pub-Sub Architecture)**: 2 -> 2
**Microkernel Architecture**: 3 -> 3
**Microservices Architecture**: 4 -> 5
**Space-Based Architecture**: 1 -> 3
**Pipeline Architecture (Pipe-Filter Architecture)**: 2 -> 1
**Client-Server Architecture**: 3 -> 3
"""})

refiner_context.append({'role': 'user', 'content': f"""
read the first assessment and the last assessment, and return the final scores before and after the evaluation-refinement process
following this display format below:

**Layered Architecture**: 1 -> 2
**Event-Driven Architecture (Pub-Sub Architecture)**: 2 -> 2
**Microkernel Architecture**: 3 -> 3
**Microservices Architecture**: 4 -> 5
**Space-Based Architecture**: 1 -> 3
**Pipeline Architecture (Pipe-Filter Architecture)**: 2 -> 1
**Client-Server Architecture**: 3 -> 3


first assessment:
{refiner_context[2]["content"].strip()}

last assessment:
{current_answer.strip()}
"""})
print(refiner_context[-1]["content"])
response = client.chat(model=model_name, messages=refiner_context, options={"num_ctx": num_ctx, "temperature": temperature})
message = response['message']
refiner_context.append(message)
print(message["content"])



read the first assessment and the last assessment, and return the final scores before and after the evaluation-refinement process
following this display format below:

**Layered Architecture**: 1 -> 2
**Event-Driven Architecture (Pub-Sub Architecture)**: 2 -> 2
**Microkernel Architecture**: 3 -> 3
**Microservices Architecture**: 4 -> 5
**Space-Based Architecture**: 1 -> 3
**Pipeline Architecture (Pipe-Filter Architecture)**: 2 -> 1
**Client-Server Architecture**: 3 -> 3


first assessment:
After analyzing the provided user stories for the Archives Space Project, I will provide scores for each architecture pattern based on their relevance and suitability for the project.

**Layered Architecture**: 4 (Well-suited)
The project involves various layers of functionality, such as accessioning, describing, and arranging materials, managing authorities and rights, and providing reference services. A layered architecture can help organize these different layers of functionality, making it easie
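
The before/after summary is produced here by another model call. If the scores of the first and last assessments are already available as dictionaries, the same display format can be built deterministically. This is a minimal sketch; the helper name `format_score_changes` is hypothetical and the score values below are illustrative only, not results from this run.

```python
def format_score_changes(before: dict, after: dict) -> str:
    """Render one '**Pattern**: old -> new' line per pattern found in `before`."""
    lines = []
    for pattern, old in before.items():
        new = after.get(pattern, "?")  # "?" flags a pattern missing from the last assessment
        lines.append(f"**{pattern}**: {old} -> {new}")
    return "\n".join(lines)


# Illustrative values only.
before_scores = {"Layered Architecture": 4, "Microkernel Architecture": 3}
after_scores = {"Layered Architecture": 5, "Microkernel Architecture": 3}
summary = format_score_changes(before_scores, after_scores)
print(summary)
```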

# Print log to output file

In [281]:
from datetime import datetime
import json
from pathlib import Path
Path("./logs/self-refinement-one-shot").mkdir(parents=True, exist_ok=True)
Path("./logs/self-refinement-zero-shot").mkdir(parents=True, exist_ok=True)

final_data = {
        "modelName": model_name,
        "modelParameters": modelfile,
        "projectTitle": project_title,
        "selfRefinement": True,
        "oneShot": one_shot,
        "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
        "lastMessage": refiner_context[-1]["content"],
        "numberOfIterations": refinement_iterations,
        "messages": [message["content"] for message in refiner_context],
    }
    
# Generate a filename based on the model name and the current timestamp
if one_shot:
    filename = f"./logs/self-refinement-one-shot/log_{model_version}_{final_data['timestamp']}.json"
else:
    filename = f"./logs/self-refinement-zero-shot/log_{model_version}_{final_data['timestamp']}.json"
# Write the conversation to a JSON file
with open(filename, "w", encoding="utf-8") as f:
    json.dump(final_data, f, indent=2, ensure_ascii=False)

print(f"Assessment complete. The whole conversation is saved to {filename}")

Assessment complete. The whole conversation is saved to ./logs/self-refinement-zero-shot/log_llama3.1:70b_20250129_093408.json
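
The log filename embeds the model tag, which contains a colon (`llama3.1:70b`). That works on the system used for this run, but colons are not allowed in filenames on some platforms such as Windows. A minimal sketch of a sanitizing step follows; the helper name `safe_filename_part` is hypothetical and the set of allowed characters is an assumption.

```python
import re


def safe_filename_part(name: str) -> str:
    """Replace every run of characters outside [A-Za-z0-9._-] with a single underscore."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


safe_model = safe_filename_part("llama3.1:70b")
print(f"log_{safe_model}_20250129_093408.json")
```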
