# This notebook demos the following:
1. Codebase -> Dynamics Linespan
2. LLM-assisted-Codebase -> Petrinet AMR
3. AMR enrichment with Parameter extraction

Date created: 11/22/23

In [None]:
import requests
import json
import os
from pathlib import Path


SKEMA_ADDRESS = os.environ.get("SKEMA_ADDRESS", "https://api.askem.lum.ai")

### Codebase -> Dynamics Linespan
- Overview: 
    - This endpoint takes in a zip file containing code with dynamics somewhere in it (ideally), and returns a linespan entry for each file in it. 
- Current Details:
    - This endpoint prompts an LLM, so it can be quite slow for large repos, and response times can vary unpredictably.
    - Currently, each Python file in the repo gets a linespan entry. If the model does not suspect any dynamics in a file, it outputs a linespan of [L0-L0] with the description "Failed to parse dynamics".
    - Each linespan entry has a name which corresponds to the file it refers to. 
- Future Work:
    - Adding support for languages beyond Python, to match the coverage of our code2fn functionality.
    - We will also be developing our own model for this functionality. It will likely run in parallel with this one unless it is significantly superior. The goals for our own model are easier inference over an entire codebase, instead of file by file as in this initial support, and faster responses.
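
Since the endpoint expects a zip archive, it can be handy to build one in memory from a few source strings rather than from a file on disk. The following is a minimal sketch using only the standard library; the helper name `build_zip_bytes` and the file name `sir.py` are hypothetical and chosen purely for illustration.

```python
import io
import zipfile


def build_zip_bytes(files):
    """Pack a {filename: source_text} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            # writestr stores the text under the given archive member name.
            archive.writestr(name, text)
    return buffer.getvalue()


# Hypothetical one-file codebase containing simple SIR dynamics.
example_zip = build_zip_bytes(
    {"sir.py": "def sir(S, I, R, beta, gamma):\n    return -beta * S * I\n"}
)
```

The resulting bytes can be passed as the `zip_file` upload in the same way the CHIME example in this notebook passes `response.content`.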


In [None]:
zipfile_path = "./data/code/code_sir.zip"

In [None]:
# Path.read_text() opens and closes the file for us.
print(Path("./data/code/code1/code.py").read_text())

In [None]:
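URL = f"{SKEMA_ADDRESS}/morae/linespan-given-filepaths-zip"
# Open the zip in a context manager so the file handle is closed after the upload.
with open(zipfile_path, "rb") as zip_file:
    response_zip = requests.post(URL, files={"zip_file": zip_file})
response_zip.json()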

The next example contains 4 code files, each a different version of CHIME. One contains just the dynamics, one is the complete CHIME model code, and the other two are partial modifications of the code. Note that we get 4 responses back and that dynamics were found in each file.

In [None]:
CHIME_SIR_URL = (
    "https://artifacts.askem.lum.ai/askem/data/models/zip-archives/CHIME-SIR-model.zip"
)
response = requests.get(CHIME_SIR_URL)

In [None]:
URL = f"{SKEMA_ADDRESS}/morae/linespan-given-filepaths-zip"
response_zip = requests.post(URL, files={"zip_file": response.content})
response_zip.json()

### LLM-assisted-codebase -> Petrinet AMR
- Overview: 
    - This endpoint takes in a zip file containing code with dynamics somewhere in it (ideally), and returns a Petrinet AMR. 
- Current Details:
    - This endpoint has the same input and output as our /workflows/code/codebase-to-pn-amr endpoint, which makes it easier to integrate.
    - This endpoint takes in the codebase, uses the linespan functionality described above to slice out the relevant code, and sends that slice to our code-snippets endpoint. Because only a subset of the code is ingested, this reduces the chance of errors from our code ingestion pipeline and simplifies our extraction process, with the goal of greatly increasing the robustness of this workflow.
    - When multiple files contain dynamics, we currently return only one AMR, to match the input and output of the original unassisted endpoint. To do this, we return the AMR with the most "states", using the state count as a proxy for completeness.
- Future Work:
    - Multiple AMR outputs could be an option
    - Expanded coverage of coding idioms
    - Once our own linespan model is up, the original endpoint will be replaced with a version assisted by that model, and the two endpoints will run in parallel unless one is significantly better than the other.
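
The "most states" selection rule described above can be sketched as follows. This is an illustration, not the endpoint's actual code: the helper names are hypothetical, and it assumes each candidate AMR keeps its states in a list under `model.states`.

```python
def count_states(amr):
    """Number of states in an AMR dict (assumes a `model.states` list)."""
    return len(amr.get("model", {}).get("states", []))


def pick_most_complete(amrs):
    """Return the candidate AMR with the most states; ties keep the earliest."""
    if not amrs:
        raise ValueError("no candidate AMRs to choose from")
    return max(amrs, key=count_states)


# Hypothetical candidates, one per file in which dynamics were found.
candidates = [
    {"header": {"name": "partial"}, "model": {"states": [{"id": "S"}, {"id": "I"}]}},
    {"header": {"name": "full"}, "model": {"states": [{"id": "S"}, {"id": "I"}, {"id": "R"}]}},
    {"header": {"name": "empty"}, "model": {"states": []}},
]
best = pick_most_complete(candidates)
print(best["header"]["name"])
```

State count is only a rough proxy: a file with more states is not necessarily the more faithful model, which is one reason returning multiple AMRs is listed as future work.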


In [None]:
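# Simple SIR model
URL = f"{SKEMA_ADDRESS}/workflows/code/llm-assisted-codebase-to-pn-amr"
# Open the zip in a context manager so the file handle is closed after the upload.
with open(zipfile_path, "rb") as zip_file:
    response_zip = requests.post(URL, files={"zip_file": zip_file})
print(json.dumps(response_zip.json(), indent=2))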

In [None]:
# 4 CHIME models at once
URL = f"{SKEMA_ADDRESS}/workflows/code/llm-assisted-codebase-to-pn-amr"
response_zip = requests.post(URL, files={"zip_file": response.content})
print(json.dumps(response_zip.json(), indent=2))

In [None]:
# Penn CHIME repo, trimmed to only the Python files (~49 files).
# NOTE: Takes 3-5 minutes
zipfile_path = "./data/code/chime_trimmed.zip"
# Reuses URL from the previous cell (llm-assisted-codebase-to-pn-amr).
with open(zipfile_path, "rb") as zip_file:
    response_zip = requests.post(URL, files={"zip_file": zip_file})
print(json.dumps(response_zip.json(), indent=2))

### AMR enrichment with parameter extraction
- Overview: 
    - This endpoint takes an AMR and the code it was derived from and enriches the AMR with the relevant parameters in the code. 
- Current Details:
    - This feature checks the AMR for parameters and finds their entries in the code. It then creates a dataflow trace of their assignment and executes it to extract their value. 
    - This execution framework also gives us access to all the values a parameter can take on. However, these are not currently output into the AMR.
- Future Work:
    - Expanded coverage of types of assignments/executions to be extracted
    - Handling for cases where the parameter names in the AMR do not match the variable names in the code but should still be matched.
    - Support for outputting parameter ranges, based on parameters that take on multiple values during execution. 
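
To make the idea of tracing and executing assignments concrete, here is a deliberately simplified sketch. It is not the execution engine behind the endpoint: it only handles top-level assignments to plain names, and the function name `trace_parameter_values` is hypothetical. It does show how executing assignments in order yields every value a parameter takes on.

```python
import ast


def trace_parameter_values(source, parameter_names):
    """Run top-level assignments in order and record each value that the
    requested names take on. Only use this on trusted code: clearing the
    builtins is not a security sandbox."""
    namespace = {"__builtins__": {}}
    history = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue  # skip imports, function definitions, calls, etc.
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if not targets:
            continue  # skip tuple, attribute, and subscript targets
        statement = ast.Module(body=[node], type_ignores=[])
        try:
            exec(compile(statement, "<trace>", "exec"), namespace)
        except Exception:
            continue  # the assignment depends on something we did not trace
        for name in targets:
            if name in parameter_names:
                history.setdefault(name, []).append(namespace[name])
    return history


# Hypothetical source in which `beta` is reassigned once.
example_source = "N = 1000\nbeta = 2\ngamma = N // 100\nbeta = beta * 3\n"
example_history = trace_parameter_values(example_source, {"beta", "gamma"})
print(example_history)
```

In this sketch the last entry of each list plays the role of the extracted value, while the full list corresponds to the set of values that could later feed parameter ranges.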

First we show enriching a simple epidemiology model.

In [None]:
amr_path = Path("./data/execution_engine/epi_model_amr.json")
source_path = Path("./data/execution_engine/epi_model_source.py")

print(json.dumps(json.loads(amr_path.read_text()), indent=2))

In [None]:
request = {
    "amr": json.loads(amr_path.read_text()),
    "source": source_path.read_text(),
    "filename": "epi_model_source.py"
}
URL = f"{SKEMA_ADDRESS}/execution_engine/amr-enrichment"
response = requests.post(URL, json=request)
enriched_amr = response.json()
print(json.dumps(enriched_amr, indent=2))

Now we show a test file that exercises some of the coverage of our parameter extraction capabilities.

In [None]:
amr_path = Path("./data/execution_engine/complex_amr.json")
source_path = Path("./data/execution_engine/complex_source.py")

print(source_path.read_text())

In [None]:
request = {
    "amr": json.loads(amr_path.read_text()),
    "source": source_path.read_text(),
    "filename": "complex_source.py"
}
URL = f"{SKEMA_ADDRESS}/execution_engine/amr-enrichment"
response = requests.post(URL, json=request)
enriched_amr = response.json()
print(json.dumps(enriched_amr, indent=2))

The cell below computes the percent coverage over the given list of coding idioms we currently support.

In [None]:
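parameters = enriched_amr["semantics"]["ode"]["parameters"]
total_params = len(parameters)
# A parameter counts as enriched when the endpoint attached a "value" to it.
enriched_params = sum(1 for entry in parameters if "value" in entry)

# Guard against division by zero when the AMR has no parameters.
percent_coverage = (enriched_params / total_params) * 100 if total_params else 0.0
print(f"Percent Coverage: {percent_coverage:.2f}%")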