In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
from pathlib import Path

# Add the parent directory to the system path
sys.path.append(str(Path.cwd().parent))

In [3]:
# Configure the log file path
LOG_FILE_PATH = Path("/Users/suyog/personal/sreAgent/output/logs.log")

In [4]:
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

# Import custom tools
from tools.oncall_employees import GetOncallEmployeesTool
from tools.jira import CreateJiraTicketTool
from tools.file import FilteredLogReaderTool
from tools.stack_exchange import StackExchangeTool


def setup_agent():
    """Set up the ReAct agent with the necessary tools."""
    # Load OPENAI_API_KEY and any tool credentials from a local .env file, if one exists
    load_dotenv()
    llm = ChatOpenAI(model="gpt-4o-mini")

    # Load ReAct prompt
    prompt = hub.pull("hwchase17/react")

    # Initialize tools
    tools = [
        FilteredLogReaderTool(log_file_path=LOG_FILE_PATH),
        # StackExchangeTool(),
        GetOncallEmployeesTool(),
        CreateJiraTicketTool(),
    ]

    # Create the agent
    agent = create_react_agent(llm, tools, prompt)

    # Create the agent executor
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        handle_parsing_errors=True,
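        # Too low for this workflow: the recorded run spends all three
        # iterations on tool calls and stops before creating every ticket.
        max_iterations=3,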
    )

    return agent_executor


agent_executor = setup_agent()

2025-05-02 16:19:58.778 | DEBUG    | tools.file:__init__:40 - FilteredLogReaderTool initialized with log path: /Users/suyog/personal/sreAgent/output/logs.log
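
The implementation of FilteredLogReaderTool is not shown in this notebook. As a rough illustration of the kind of time-window filtering it is asked to perform, here is a minimal sketch. It assumes each log entry starts with a bracketed timestamp such as `[2025-05-01 17:44:43.055]` and that lines without a timestamp (tracebacks) belong to the entry above them. The names `filter_log_lines`, `ENTRY_START`, and `TIME_FORMAT` are hypothetical and are not taken from `tools.file`.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "[2025-05-01 17:44:43.055] [ERROR] ..."
ENTRY_START = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\]")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_log_lines(lines, from_time, to_time):
    """Return the lines of every entry stamped within [from_time, to_time]."""
    start = datetime.strptime(from_time, TIME_FORMAT)
    end = datetime.strptime(to_time, TIME_FORMAT)
    kept = []
    keeping = False
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: compare at whole-second resolution.
            stamp = datetime.strptime(match.group(1), TIME_FORMAT)
            keeping = start <= stamp <= end
        # Lines without a timestamp inherit the decision of the entry above.
        if keeping:
            kept.append(line)
    return kept
```

Comparing at whole-second resolution means an entry stamped 17:44:44.900 still counts as inside a window that ends at 17:44:44. That is a design choice for this sketch, not a statement about how the real tool behaves.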


In [5]:
prompt = (
    "Read logs from 2025-05-01 17:44:43 to 2025-05-01 17:44:44 and check for any errors or critical issues. "
    "Use the filtered_log_reader tool with these exact timestamps to only look at new logs since the last check. "
    "Then identify the potential cause and a possible solution for each error you saw. "
    "After identifying the potential cause and possible solution, use the get_oncall_employees tool to find the on-call employees best suited to handle each error. "
    "Finally, use create_jira_ticket to create appropriate tickets and assign each one to the right employee."
)
response = agent_executor.invoke({"input": prompt})



> Entering new AgentExecutor chain...


2025-05-02 16:20:00.050 | INFO     | tools.base:_input_parser:19 - Parser recived input of type -> <class 'str'> 
 value -> {'from_time': '2025-05-01 17:44:43', 'to_time': '2025-05-01 17:44:44'}
2025-05-02 16:20:00.051 | INFO     | tools.base:_input_parser:28 - Checking for pattern from_time='([^']+)',\s*to_time='([^']+)'
2025-05-02 16:20:00.051 | INFO     | tools.file:_run:68 - Reading logs with filters - from_time: 2025-05-01 17:44:43, to_time: 2025-05-01 17:44:44
2025-05-02 16:20:00.052 | DEBUG    | tools.file:_run:84 - Parsed from_time to 2025-05-01 17:44:43
2025-05-02 16:20:00.053 | DEBUG    | tools.file:_run:92 - Parsed to_time to 2025-05-01 17:44:44
2025-05-02 16:20:00.053 | DEBUG

[32;1m[1;3mI will first read the logs from the specified timestamps to check for errors or critical issues.

Action: filtered_log_reader  
Action Input: {'from_time': '2025-05-01 17:44:43', 'to_time': '2025-05-01 17:44:44'}  [0m[36;1m[1;3m[2025-05-01 17:44:43.055] [ERROR] [RequestID: 141e496c-6f90-4e69-9cd7-10fd536a631c] [host: suyogs-MacBook-Air.local] Unhandled exception during /api/auth request processing: ConnectionResetError: Connection reset by peer while fetching data from cache server
Traceback (most recent call last):
  File "/app/banking/api/controllers.py", line 83, in process_request
    response = await service.execute_transaction(transaction_data)
  File "/app/banking/service/transaction_service.py", line 191, in execute_transaction
    result = await self.repository.commit(validated_data)
  File "/app/banking/repository/transaction_repo.py", line 133, in commit
    async with self.db_pool.acquire() as connection:
        return await self._execute_transaction(connec

2025-05-02 16:20:05.359 | INFO     | tools.base:_input_parser:19 - Parser recived input of type -> <class 'str'> 
 value -> {}
2025-05-02 16:20:05.360 | INFO     | tools.base:_input_parser:28 - Checking for pattern team_name='([^']+)'


[32;1m[1;3mI have identified two error logs during the specified timestamps:

1. **Error 1**: `ConnectionResetError: Connection reset by peer while fetching data from cache server`
   - **Potential Cause**: This error often occurs when the connection between the application and the cache server is interrupted. This could be due to network issues or the cache server being overloaded or down.
   - **Possible Solution**: Verify the health and status of the cache server. Check network connectivity and consider increasing the timeout settings or handling reconnection logic in the application.

2. **Error 2**: `ConnectionRefusedError: Failed to connect to database primary at 10.45.2.8:5432`
   - **Potential Cause**: This error indicates that the application is unable to connect to the primary database. Possible reasons include the database server being down, incorrect connection parameters, or network issues.
   - **Possible Solution**: Check if the database server is running and accepting

2025-05-02 16:20:09.020 | INFO     | tools.base:_input_parser:19 - Parser recived input of type -> <class 'str'> 
 value -> {
  "summary": "ConnectionResetError during /api/auth request processing",
  "description": "An unhandled exception occurred: ConnectionResetError: Connection reset by peer while fetching data from cache server. Verify and resolve cache server health and connectivity issues.",
  "assignee": "frank.lee@example.com"
}
2025-05-02 16:20:09.022 | INFO     | tools.base:_input_parser:28 - Checking for pattern summary='([^']+)',\s*description='([^']+)',\s*issue_type='([^']+)',\s*assignee='([^']+)'
2025-05-02 16:20:09.022 | DEBUG    | tools.jira:_run:66 - Connecting to JIRA at https://suyoginusa.atlassian.net with user suyoginusa@gmail.com


[32;1m[1;3mFor the identified errors, I will assign the appropriate on-call employees based on their roles and teams.

1. **Error 1** (ConnectionResetError):
   - The best-suited employee for this error is likely to be **Frank Lee**, the Network Engineer. He can handle issues related to network connections and configurations.

2. **Error 2** (ConnectionRefusedError):
   - The best-suited employee for this error is **Eva Martinez**, the Database Engineer. She specializes in database-related issues and would be ideal for resolving connection problems with the database.

Next, I will create Jira tickets for both errors and assign them to the respective employees.

### Creating Jira ticket for Error 1
Action: create_jira_ticket  
Action Input: {  
  "summary": "ConnectionResetError during /api/auth request processing",  
  "description": "An unhandled exception occurred: ConnectionResetError: Connection reset by peer while fetching data from cache server. Verify and resolve cache server 

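The `tools.base` log lines in this run show `_input_parser` receiving a string and then checking it against `key='value'` patterns. The agent, however, sent a Python-style dict (`{'from_time': ...}`) for the log reader and a JSON object for the Jira tool. The source of `_input_parser` is not included here, so the following is only a sketch of one tolerant approach: try JSON first, then a Python literal, and fall back to `key='value'` pairs last. The function name `parse_action_input` and the `KEY_VALUE` pattern are assumptions made for illustration.

```python
import ast
import json
import re

# Fallback pattern for inputs such as: team_name='database'
KEY_VALUE = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def parse_action_input(raw):
    """Best-effort conversion of an agent's Action Input string to a dict."""
    text = raw.strip()

    # 1) Strict JSON object, e.g. {"summary": "...", "assignee": "..."}
    try:
        value = json.loads(text)
        if isinstance(value, dict):
            return value
    except json.JSONDecodeError:
        pass

    # 2) Python dict literal, e.g. {'from_time': '...', 'to_time': '...'}
    try:
        value = ast.literal_eval(text)
        if isinstance(value, dict):
            return value
    except (ValueError, SyntaxError, TypeError):
        pass

    # 3) Loose key='value' pairs; returns {} when nothing matches
    return dict(KEY_VALUE.findall(text))
```

A tool's `_run` could then check the returned dict for the keys it requires (for example `issue_type`, which the Jira input above omits) and return a short error message as the observation, so the agent can retry with the missing field.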
In [6]:
response

{'input': 'Read logs from 2025-05-01 17:44:43 to 2025-05-01 17:44:44 and check for any errors or critical issues. Use the filtered_log_reader tool with these exact timestamps to only look at new logs since the last check.You need to then idenify the potential cause and the possible solution for each of the error you saw.After idenifying the potential cause and possible solution use get_oncall_employees tool to find filter out on-call employees best suited to handle each errorFinally use create_jira_ticket to create appropriate tickets and assign it to the right employee',
 'output': 'Agent stopped due to iteration limit or time limit.'}
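
The run stopped at the iteration limit because the workflow in the prompt needs more steps than `max_iterations=3` allows. The recorded output shows one log read, one on-call lookup, and the first of two ticket creations before the limit was reached. A small sketch of the arithmetic follows. It assumes one ticket per error and that the step in which the agent emits its final answer also counts against the limit. The helper name `required_iterations` is hypothetical.

```python
def required_iterations(num_errors, shared_tool_calls=2, per_error_tool_calls=1):
    """Lower bound on max_iterations for the workflow described in the prompt.

    shared_tool_calls: one log read plus one on-call lookup (assumed).
    per_error_tool_calls: one Jira ticket per error (assumed).
    One extra step is reserved for the agent's final answer.
    """
    tool_calls = shared_tool_calls + per_error_tool_calls * num_errors
    return tool_calls + 1


# Two errors were found in this run, so at least five steps are needed.
print(required_iterations(2))
```

In practice it is probably worth adding some headroom on top of this bound, since a step that fails to parse and is retried through `handle_parsing_errors=True` uses up an iteration as well.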