


**Core Idea:**
The LLM first explains the SAS code. A human verifies this explanation, providing feedback if necessary. Then, based on the (potentially corrected) understanding, the LLM proposes a step-by-step translation plan. The human verifies and can modify this plan. Finally, the system translates the SAS code component by component according to the approved plan, with human checks for each component.

**Define States (using Pydantic BaseModel):**

In [1]:
from typing import List, Optional, Dict
from pydantic import BaseModel

class OverallState(BaseModel):
    raw_sas_code: str
    # Explanation Phase
    sas_explanation_llm: Optional[str] = None # LLM's initial explanation
    human_explanation_feedback: Optional[str] = None # Human's feedback on the explanation
    final_sas_explanation: Optional[str] = None # Explanation after incorporating feedback (could be same as llm_explanation if no feedback)
    # Planning Phase
    translation_plan_llm: Optional[List[Dict[str, str]]] = None # LLM's step-by-step plan, e.g., [{"step_description": "Convert PROC SORT on dataset X", "sas_block_id": "block1"}, ...]
    human_plan_modification_feedback: Optional[str] = None # Human's feedback on the plan
    final_translation_plan: Optional[List[Dict[str, str]]] = None # Plan after incorporating feedback
    # Component Translation Phase
    current_plan_step_index: int = 0
    current_sas_component_to_translate: Optional[str] = None # The actual SAS code for the current step
    current_component_python_translation_llm: Optional[str] = None # LLM's translation of the current component
    human_component_feedback: Optional[str] = None # Human's feedback on the component translation
    # Aggregation
    translated_python_components: List[Dict[str, str]] = [] # List of {"plan_step_description": "...", "python_code": "..."}
    final_python_code: Optional[str] = None
    # Utility
    error_message: Optional[str] = None # For capturing issues

In [3]:
import os
from typing import Optional, Dict, Any
from pydantic import BaseModel, Field

from langgraph.graph import StateGraph, END, START
from langgraph.checkpoint.memory import MemorySaver

from langchain_core.messages import HumanMessage, SystemMessage
try:
    from langchain_openai import AzureChatOpenAI
except ImportError:
    print("AzureChatOpenAI could not be imported. Please install langchain-openai.")
    print("For now, LLM calls will be simulated.")
    AzureChatOpenAI = None # Fallback to simulation if not available

# --- 1. Define the State (Pydantic BaseModel) ---
class BasicTranslatorState(BaseModel):
    raw_sas_code: str = Field(..., description="The raw SAS code input by the user.")
    llm_initial_explanation: Optional[str] = Field(None, description="The LLM's first attempt at explaining the SAS code.")
    human_feedback_on_explanation: Optional[str] = Field(None, description="Feedback provided by the human reviewer on the LLM's explanation.")
    final_explanation: Optional[str] = Field(None, description="The refined explanation after incorporating human feedback.")
    requires_human_intervention: bool = Field(default=False, description="Flag to indicate if the process is paused for human input.")
    current_human_intervention_point: Optional[str] = Field(None, description="Describes where human intervention is needed.")

    class Config:
        extra = "ignore"

# --- LLM Initialization ---
llm = None
if AzureChatOpenAI:
    try:
        # Ensure all necessary environment variables for AzureChatOpenAI are set by the user
        # e.g., AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME, AZURE_OPENAI_API_VERSION
        # The notebook used AZURE_OPENAI_DEPLOYMENT, so we'll use that for deployment_name
        llm = AzureChatOpenAI(
            azure_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT", "your-deployment-name"), # Replace with your actual deployment name if not in env
            api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2023-05-15"), # Or your specific API version
        )
        print("AzureChatOpenAI initialized successfully.")
    except Exception as e:
        print(f"Error initializing AzureChatOpenAI: {e}")
        print("LLM calls will be simulated.")
        llm = None # Fallback to simulation
else:
    print("AzureChatOpenAI not available. LLM calls will be simulated.")


# --- 2. Define Node Functions ---
def generate_initial_explanation_node(state: BasicTranslatorState) -> Dict[str, Any]:
    print("\n--- Node: Generating Initial SAS Explanation ---")
    sas_code = state.raw_sas_code
    explanation: Optional[str] = None

    if llm:
        try:
            system_prompt = "You are an expert SAS programmer. Your task is to explain the provided SAS code in simple terms, outlining its main purpose and key steps."
            human_prompt = f"Please explain the following SAS code:\n\n```sas\n{sas_code}\n```"
            messages = [
                SystemMessage(content=system_prompt),
                HumanMessage(content=human_prompt)
            ]
            response = llm.invoke(messages)
            explanation = response.content
            print(f"LLM's Initial Explanation (real): '{explanation}'")
        except Exception as e:
            print(f"LLM call failed: {e}. Falling back to simulation.")
            explanation = f"Simulated LLM Explanation due to error for SAS: {sas_code[:100]}..."
    else: # Fallback to simulation
        explanation = (
            f"Simulated LLM Explanation for SAS code snippet:\n'''\n{sas_code[:150]}...\n'''\n"
            f"This SAS code appears to perform a data step to create a new dataset, "
            f"followed by a procedure to print the data."
        )
        print(f"LLM's Initial Explanation (simulated): '{explanation}'")
    
    return {
        "llm_initial_explanation": explanation,
        "requires_human_intervention": True,
        "current_human_intervention_point": "verify_explanation"
    }

def human_verification_breakpoint_node(state: BasicTranslatorState) -> Dict[str, Any]:
    print("\n--- Node: Human Verification Breakpoint ---")
    if state.human_feedback_on_explanation:
        print(f"Human feedback has been received: '{state.human_feedback_on_explanation}'")
    else:
        print("No human feedback was explicitly provided for this step.")
    return {
        "requires_human_intervention": False,
        "current_human_intervention_point": None
    }

def refine_explanation_node(state: BasicTranslatorState) -> Dict[str, Any]:
    print("\n--- Node: Refining Explanation ---")
    initial_explanation = state.llm_initial_explanation
    feedback = state.human_feedback_on_explanation
    final_explanation_to_set = initial_explanation

    if feedback and feedback.strip().lower() not in ["ok", "looks good", "yes", "proceed", ""]:
        print(f"Human feedback received: '{feedback}'")
        if llm:
            try:
                system_prompt = "You are an expert SAS programmer. You previously provided an explanation for a piece of SAS code. A human has reviewed it and provided feedback. Your task is to refine your original explanation based on this feedback."
                human_prompt = (
                    f"Original SAS Code:\n```sas\n{state.raw_sas_code}\n```\n\n"
                    f"Your initial explanation was:\n'''\n{initial_explanation}\n'''\n\n"
                    f"The human feedback is:\n'''\n{feedback}\n'''\n\n"
                    f"Please provide a refined explanation incorporating this feedback."
                )
                messages = [
                    SystemMessage(content=system_prompt),
                    HumanMessage(content=human_prompt)
                ]
                response = llm.invoke(messages)
                final_explanation_to_set = response.content
                print(f"Refined Explanation (real LLM): '{final_explanation_to_set}'")
            except Exception as e:
                print(f"LLM call for refinement failed: {e}. Falling back to simulation.")
                final_explanation_to_set = f"Simulated Refined Explanation (error) based on feedback: '{feedback}'. Original: '{initial_explanation}'"
        else: # Fallback to simulation
            final_explanation_to_set = (
                f"Simulated Refined Explanation based on feedback: '{feedback}'.\n"
                f"Original was: '{initial_explanation}'\n"
                f"The refined understanding is that the SAS code primarily focuses on data transformation and reporting, taking into account the user's point."
            )
            print(f"Refined Explanation (simulated): '{final_explanation_to_set}'")
    else:
        print("No significant feedback for refinement, or feedback was positive. Using the initial explanation as final.")
    
    return {
        "final_explanation": final_explanation_to_set,
        "human_feedback_on_explanation": None # Clear feedback
    }

# --- 3. Define the Graph ---
memory = MemorySaver()
builder = StateGraph(BasicTranslatorState)

builder.add_node("generate_initial_explanation", generate_initial_explanation_node)
builder.add_node("human_verification_breakpoint", human_verification_breakpoint_node)
builder.add_node("refine_explanation", refine_explanation_node)

builder.add_edge(START, "generate_initial_explanation")
builder.add_edge("generate_initial_explanation", "human_verification_breakpoint")
builder.add_edge("human_verification_breakpoint", "refine_explanation")
builder.add_edge("refine_explanation", END)

graph = builder.compile(
    checkpointer=memory,
    interrupt_before=["human_verification_breakpoint"]
)

# --- 4. Test the Graph ---
if __name__ == "__main__":
    print("\n--- Starting Basic SAS Explainer Graph Test with LLM Integration ---")
    
    # Simulated SAS code input (as requested)
    sas_code_input = (
        "DATA WORK.SALES_SUMMARY;\n"
        "  SET WORK.MONTHLY_SALES;\n"
        "  BY PRODUCT_CATEGORY REGION;\n"
        "  RETAIN TOTAL_SALES_CATEGORY 0;\n"
        "  IF FIRST.REGION THEN TOTAL_SALES_CATEGORY = 0;\n"
        "  TOTAL_SALES_CATEGORY = TOTAL_SALES_CATEGORY + SALES_AMOUNT;\n"
        "  IF LAST.REGION THEN OUTPUT WORK.SALES_SUMMARY;\n"
        "RUN;\n\n"
        "PROC PRINT DATA=WORK.SALES_SUMMARY NOOBS;\n"
        "  TITLE 'Sales Summary by Product Category and Region';\n"
        "  VAR PRODUCT_CATEGORY REGION TOTAL_SALES_CATEGORY;\n"
        "RUN;\n"
        "TITLE;"
    )
    initial_graph_input = {"raw_sas_code": sas_code_input}
    thread_id = "sas_explainer_llm_thread_2" # Use a new thread_id for a fresh run
    thread_config = {"configurable": {"thread_id": thread_id}}
    current_graph_state: Optional[Dict[str, Any]] = None

    print(f"\n--- Running graph for thread: {thread_id} ---")
    print(f"Simulated SAS Code Input:\n'''\n{sas_code_input}\n'''")

    # --- First part of the stream (runs until interruption) ---
    for event_chunk in graph.stream(initial_graph_input, config=thread_config, stream_mode="values"):
        current_graph_state = event_chunk
        print("\n--- Graph Event Received (Before Human Input) ---")
        for key, value in current_graph_state.items():
            if key not in ['raw_sas_code']: # Avoid re-printing long SAS code
                print(f"  {key}: {value}")
        
        if current_graph_state.get("requires_human_intervention"):
            intervention_point = current_graph_state.get("current_human_intervention_point")
            print(f"\n>>> HUMAN INTERVENTION REQUIRED at: {intervention_point} <<<")

            if intervention_point == "verify_explanation":
                print(f"\nPlease review the LLM's initial explanation (State ID: {thread_config['configurable']['thread_id']}):")
                # The explanation is already printed by the node and in current_graph_state
                
                feedback = input("Your feedback on the explanation (e.g., 'It missed X', or 'ok' to accept): ")
                
                print(f"\n--- Updating graph state with feedback: '{feedback}' ---")
                update_payload = {"human_feedback_on_explanation": feedback}
                graph.update_state(thread_config, update_payload)
                print("--- State updated. Graph will resume from human_verification_breakpoint. ---")
                break # Exit this stream loop to start a new one for resuming
            else:
                print(f"Unexpected intervention point: {intervention_point}. Stopping.")
                break 
    
    # --- Second part of the stream (resuming after human input) ---
    if current_graph_state and current_graph_state.get("human_feedback_on_explanation") is not None: # Check if feedback was given
        print("\n--- Resuming graph execution after human feedback ---")
        final_event_chunk = None
        for event_chunk_after_resume in graph.stream(None, config=thread_config, stream_mode="values"): # Pass None to resume
            final_event_chunk = event_chunk_after_resume
            print("\n--- Graph Event Received (After Human Input/Resume) ---")
            for key, value in final_event_chunk.items():
                if key not in ['raw_sas_code']:
                    print(f"  {key}: {value}")
        current_graph_state = final_event_chunk

    print("\n--- Graph Execution Finished ---")
    if current_graph_state:
        print("\nFinal State of the Graph:")
        # print(f"  Raw SAS Code (Simulated):\n'''\n{current_graph_state.get('raw_sas_code')}\n'''")
        print(f"  LLM's Initial Explanation: {current_graph_state.get('llm_initial_explanation')}")
        # human_feedback_on_explanation should be None now if processed by refine_explanation_node
        print(f"  Human Feedback (should be processed): {current_graph_state.get('human_feedback_on_explanation')}") 
        print(f"  Final Explanation: {current_graph_state.get('final_explanation')}")
    else:
        print("Graph did not reach a final state after potential resume.")

    print("\n--- Test Complete ---")

    # You can inspect the saved states using the checkpointer:
    # print("\n--- All states for the thread from checkpointer: ---")
    # for state_snapshot in memory.get_all(thread_config):
    #     print(state_snapshot.values)


AzureChatOpenAI initialized successfully.

--- Starting Basic SAS Explainer Graph Test with LLM Integration ---

--- Running graph for thread: sas_explainer_llm_thread_2 ---
Simulated SAS Code Input:
'''
DATA WORK.SALES_SUMMARY;
  SET WORK.MONTHLY_SALES;
  BY PRODUCT_CATEGORY REGION;
  RETAIN TOTAL_SALES_CATEGORY 0;
  IF FIRST.REGION THEN TOTAL_SALES_CATEGORY = 0;
  TOTAL_SALES_CATEGORY = TOTAL_SALES_CATEGORY + SALES_AMOUNT;
  IF LAST.REGION THEN OUTPUT WORK.SALES_SUMMARY;
RUN;

PROC PRINT DATA=WORK.SALES_SUMMARY NOOBS;
  TITLE 'Sales Summary by Product Category and Region';
  VAR PRODUCT_CATEGORY REGION TOTAL_SALES_CATEGORY;
RUN;
TITLE;
'''

--- Graph Event Received (Before Human Input) ---

--- Node: Generating Initial SAS Explanation ---
LLM's Initial Explanation (real): 'This code creates a summarized dataset that aggregates sales amounts by product category and region, then prints the results. Here’s a step-by-step explanation:

1. DATA Step (Creating the Summary Dataset):
   - Th

Your feedback on the explanation (e.g., 'It missed X', or 'ok' to accept):  ok



--- Updating graph state with feedback: 'ok' ---
--- State updated. Graph will resume from human_verification_breakpoint. ---

--- Graph Execution Finished ---

Final State of the Graph:
  LLM's Initial Explanation: This code creates a summarized dataset that aggregates sales amounts by product category and region, then prints the results. Here’s a step-by-step explanation:

1. DATA Step (Creating the Summary Dataset):
   - The DATA step is named WORK.SALES_SUMMARY, meaning the output dataset will reside in the SAS WORK library.
   - The SET statement reads records from an existing dataset named WORK.MONTHLY_SALES.
   - The BY statement specifies that the input data is already sorted (or indexed) by PRODUCT_CATEGORY and REGION. This enables the use of FIRST. and LAST. temporary variables for each group defined by these variables.
   - The RETAIN statement initializes and keeps the value of the variable TOTAL_SALES_CATEGORY across data step iterations. It is set to 0 initially.
   - Th