# Validation 

In [43]:
# Make sure you have the latest version of the SDK available to use the Batch API
%pip install openai --upgrade

Note: you may need to restart the kernel to use updated packages.


In [45]:
import pandas as pd
from openai import OpenAI
import time
import re
import warnings
import json
warnings.filterwarnings('ignore')
from IPython.display import Image, display

In [47]:
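import os

# Read the key from the environment rather than hardcoding it in the notebook
API_KEY = os.environ.get("OPENAI_API_KEY")
client = OpenAI(
    api_key=API_KEY,
)
MAX_RETRIES = 3
SLEEP_DURATION = 5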

In [49]:
evaluate_terraform_prompt="""You are an expert Terraform validator specializing in validating infrastructure as code for cloud environments.
Nine evaluation axes for the quality of generated Terraform code are defined below.

Correctness: The generated code must be syntactically correct and adhere to Terraform’s language standards.
All necessary resources must be correctly defined and referenced, outputs and variables should be correctly declared and used, enabling reusability and modularity.
Running terraform plan and terraform apply should not produce errors.

Idempotency: Applying the code multiple times should not produce different results. Terraform code should be idempotent, meaning it only makes changes when needed.

Security: The generated code should ensure sensitive data (like secrets or private keys) is handled securely, avoiding hardcoding and using appropriate mechanisms like environment variables or secure storage solutions.
IAM roles and policies should follow the principle of least privilege, granting only necessary permissions.

Performance and Efficiency: The generated code should optimize resource usage, avoiding unnecessary resources and over-provisioning.
The code should be modular and reusable, with logical separation of components using Terraform modules.

Scalability: The code should be scalable and capable of handling increased load without significant modifications. The use of variables should allow easy scaling of the infrastructure.

Documentation and Readability: The code should be well-documented with comments explaining complex sections. The code should
use consistent and descriptive naming conventions for resources, variables, and outputs.

Compliance: The generated code should comply with organizational policies and industry standards. The code should be easily auditable, with clear logging and versioning.

Modularity and Reusability: The generated code should make use of Terraform modules for reusability. Variables should control the behavior of the code, making it adaptable to different environments.

Overall Quality: This axis answers the question “How good is the overall Terraform code quality?” This can
encompass all of the above axes of quality, as well as others you feel are important. If it’s hard to find ways to make
the Terraform configuration better, the overall quality is good. If there are lots of different ways the Terraform configuration can be made better,
the overall quality is bad.

Given a Cloud configuration User query and the generated Terraform Code for the User query 

User query :{}
Microsoft one_shot Generated Terraform Code : {}


Considering the Correctness, Idempotency, Security, Performance and Efficiency, Scalability, Documentation and Readability, Compliance, Modularity and Reusability,
and Overall Quality of the Terraform Code as defined above, provide the Terraform code with an overall score and a Rationale for the score
Score:
Rationale:
Please provide the results in a JSON format like example:
{{
  "Score": ,
  "Rationale": {{
    "Correctness": {{
      "score": ,
      "comments": ""
    }},
    "Idempotency": {{
      "score": ,
      "comments": ""
    }},
    "Security": {{
      "score": ,
      "comments": ""
    }},
    "Performance and Efficiency": {{
      "score": ,
      "comments": ""
    }},
    "Scalability": {{
      "score": ,
      "comments": ""
    }},
    "Documentation and Readability": {{
      "score": ,
      "comments": ""
    }},
    "Compliance": {{
      "score": ,
      "comments": ""
    }},
    "Modularity and Reusability": {{
      "score": ,
      "comments": ""
    }},
    "Overall Quality": {{
      "score": ,
      "comments": ""
    }}
  }}
}}
"""

In [51]:
df=pd.read_excel('../Fine-Tuning/Research_Analysis_Base_data_filtered_new_microsoft_Phi-3-mini-4k-instruct_fine_tuned_code_generation2.xlsx')
# List of columns to delete
#columns_to_delete = ['GPT4_data_generation', 'Similiar User Query2', 'Terraform Configuration2', 'Similiar User Query3', 'Terraform Configuration3' ]
# Delete specified columns from the DataFrame
#df.drop(columns=columns_to_delete,inplace=True)
del df['Unnamed: 0']
print(len(df))
df

61


Unnamed: 0,Category,Cloud_provider,user_query,Terraform_Code,microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation
0,,Azure,Please generate a Terraform configuration for ...,# Main Terraform\nterraform {\n required_vers...,#Provider.tf\n\nterraform {\n required_versio...
1,,AWS,Please generate a Terraform configuration for ...,"locals {\n bucket = ""yeonghoey-media""\n}\n\nr...",Here's a Terraform configuration that meets th...
2,,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source = ""github.co...","resource ""aws_eip"" ""dns_ip"" {\n vpc = true\n}..."
3,,AWS,Please generate a Terraform configuration for ...,"# nubis-consul release 10\n\nvariable ""nubis_v...","provider ""aws"" {\n region = ""us-east-1""\n}\n\..."
4,,AWS,Please generate a Terraform configuration for ...,variable aws_region {}\n\nvariable enabled {}\...,Here is a sample Terraform configuration that ...
...,...,...,...,...,...
56,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2018 Google LLC\n *\n * Lice...,"#variables.tf\n\n\nresource ""google_compute_su..."
57,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2018 Google LLC\n *\n * Lice...,"api_version: ""2""\n\n# Terraform configuration ..."
58,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2021 Google LLC\n *\n * Lice...,"**main.tf**\n\nprovider ""google"" {\n project ..."
59,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2021 Google LLC\n *\n * Lice...,"provider ""google"" {\n project = ""my-project-i..."


In [53]:
short_df = df.iloc[:10]
short_df

Unnamed: 0,Category,Cloud_provider,user_query,Terraform_Code,microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation
0,,Azure,Please generate a Terraform configuration for ...,# Main Terraform\nterraform {\n required_vers...,#Provider.tf\n\nterraform {\n required_versio...
1,,AWS,Please generate a Terraform configuration for ...,"locals {\n bucket = ""yeonghoey-media""\n}\n\nr...",Here's a Terraform configuration that meets th...
2,,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source = ""github.co...","resource ""aws_eip"" ""dns_ip"" {\n vpc = true\n}..."
3,,AWS,Please generate a Terraform configuration for ...,"# nubis-consul release 10\n\nvariable ""nubis_v...","provider ""aws"" {\n region = ""us-east-1""\n}\n\..."
4,,AWS,Please generate a Terraform configuration for ...,variable aws_region {}\n\nvariable enabled {}\...,Here is a sample Terraform configuration that ...
5,,AWS,Please generate a Terraform configuration for ...,variable account_name {}\n\nvariable admin_use...,"# *terraform/main.tf*\n\nprovider ""aws"" {\n r..."
6,,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source ...","**main.tf**\n\n```hcl\nvariable ""env"" {\n def..."
7,,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source = ""gith...","# Main.tf\n\nprovider ""aws"" {\n region = ""us-..."
8,,AWS,Please generate a Terraform configuration for ...,"module ""info"" {\n source = ""../info""\n ...",terraform {\n required_providers {\n aws =...
9,,AWS,Please generate a Terraform configuration for ...,"module ""info"" {\n source = ""../info""\n ...",Here is a Terraform configuration that meets t...


In [55]:
# Creating an array of json tasks

tasks = []

for index, row in df.iterrows():
    
    user_query=row['user_query']
    generated=row['microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation']
    input_query=evaluate_terraform_prompt.format(user_query,generated)
    
    task = {
        "custom_id": f"task-{index}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            # This is what you would have in your Chat Completions API call
            "model": "gpt-4o",
            "temperature": 0,
            "response_format": { 
                "type": "json_object"
            },
            "messages": [
                {
                    "role": "user",
                    "content":input_query
                }
            ],
        }
    }
    
    tasks.append(task)
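
The loop above fills the prompt with `str.format`: the two bare `{}` placeholders receive the user query and the generated code in order, while the doubled braces `{{` and `}}` in the JSON example come out as single literal braces. A minimal sketch with a shortened stand-in template (`mini_prompt`, `query`, and `code` are hypothetical and not part of the notebook) shows this, and also that braces inside the substituted Terraform code are left untouched:

```python
# Shortened stand-in for evaluate_terraform_prompt (hypothetical, for illustration only)
mini_prompt = """User query :{}
Generated Terraform Code : {}
Return JSON like:
{{
  "Score": ,
  "Rationale": {{}}
}}
"""

# Hypothetical inputs; the real ones come from the DataFrame rows
query = "Create an S3 bucket"
code = 'resource "aws_s3_bucket" "b" {\n  bucket = "demo"\n}'

# Positional substitution: first {} gets query, second {} gets code
filled = mini_prompt.format(query, code)
print(filled)
```

Only the template is scanned for placeholders, so Terraform blocks full of `{` and `}` can be passed as arguments without escaping.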

In [57]:
# Creating the file

file_name = "Terraform_code_validation_Generated_Terraform_code_Microsoft_fine-tuned_batch.jsonl"

with open(file_name, 'w') as file:
    for obj in tasks:
        file.write(json.dumps(obj) + '\n')
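
Before uploading, it can be worth checking that every line of the file is a standalone JSON object with the four keys used above, and that no `custom_id` repeats, since `custom_id` is what later ties each result back to its row. The following is a rough sketch, not part of the original notebook; `check_batch_file` and the demo tasks are hypothetical names introduced for illustration:

```python
import json
import os
import tempfile

def check_batch_file(path):
    """Return the number of requests; raise ValueError on malformed or duplicate entries."""
    seen = set()
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            obj = json.loads(line)  # json.JSONDecodeError is a subclass of ValueError
            missing = {"custom_id", "method", "url", "body"} - obj.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            if obj["custom_id"] in seen:
                raise ValueError(f"line {line_no}: duplicate custom_id {obj['custom_id']}")
            seen.add(obj["custom_id"])
    return len(seen)

# Demo on a throwaway file with two hypothetical tasks
demo_tasks = [
    {"custom_id": f"task-{i}", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": f"query {i}"}]}}
    for i in range(2)
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for task in demo_tasks:
        tmp.write(json.dumps(task) + "\n")
demo_path = tmp.name

n_requests = check_batch_file(demo_path)
os.remove(demo_path)
print(n_requests)
```

In the notebook itself, the equivalent call would be `check_batch_file(file_name)`, and the count should match `len(df)`.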

In [59]:
# Uploading the file
with open(file_name, "rb") as f:
    batch_file = client.files.create(
        file=f,
        purpose="batch"
    )

In [61]:
print(batch_file)

FileObject(id='file-s8hTQ5jLXpPbBA4CTbuDEKWG', bytes=593580, created_at=1731809215, filename='Terraform_code_validation_Generated_Terraform_code_Microsoft_fine-tuned_batch.jsonl', object='file', purpose='batch', status='processed', status_details=None)


In [63]:
batch_job = client.batches.create(
  input_file_id=batch_file.id,
  endpoint="/v1/chat/completions",
  completion_window="24h"
)

# Checking results

In [67]:
batch_job = client.batches.retrieve(batch_job.id)
print(batch_job)



Batch(id='batch_67394fc801fc8190ae26b154f937a585', completion_window='24h', created_at=1731809224, endpoint='/v1/chat/completions', input_file_id='file-s8hTQ5jLXpPbBA4CTbuDEKWG', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1731809242, error_file_id=None, errors=None, expired_at=None, expires_at=1731895624, failed_at=None, finalizing_at=1731809238, in_progress_at=1731809224, metadata=None, output_file_id='file-5b9NvpBj5m9SimIJvr5rB5l8', request_counts=BatchRequestCounts(completed=61, failed=0, total=61))
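
The job above had already completed when it was retrieved, but a batch can stay in `validating`, `in_progress`, or `finalizing` for a while. One way to avoid re-running the cell by hand is a small polling loop. This is a sketch under assumptions: `wait_for_batch` is a hypothetical helper, and it takes the retrieve function as an argument so that the demo below can run against a fake instead of the real API.

```python
import time
from types import SimpleNamespace

# Statuses after which the batch will not change any further
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(retrieve, batch_id, sleep_seconds=5, max_polls=3):
    """Poll retrieve(batch_id) until a terminal status or until max_polls calls were made."""
    job = retrieve(batch_id)
    for _ in range(max_polls - 1):
        if job.status in TERMINAL_STATUSES:
            break
        time.sleep(sleep_seconds)
        job = retrieve(batch_id)
    return job

# Fake retrieve function that walks through three statuses (illustration only)
_statuses = iter(["validating", "in_progress", "completed"])

def fake_retrieve(batch_id):
    return SimpleNamespace(id=batch_id, status=next(_statuses))

job = wait_for_batch(fake_retrieve, "batch_demo", sleep_seconds=0, max_polls=5)
print(job.status)
```

With the real client, the call would look like `wait_for_batch(client.batches.retrieve, batch_job.id, SLEEP_DURATION, MAX_RETRIES)`, although a poll budget of 3 is probably too small for larger batches. The caller should still check `job.status` afterwards, because the helper also returns when the poll budget runs out.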


# Retrieving results

In [69]:
result_file_id = batch_job.output_file_id
result = client.files.content(result_file_id).content

In [71]:
result_file_name = "Terraform_code_validation_Generated_Terraform_code_Microsoft_fine-tuned_batch_results.jsonl"

with open(result_file_name, 'wb') as file:
    file.write(result)

In [None]:
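error_file_id = batch_job.error_file_id
# error_file_id is None when no request failed, so only download the file if it exists
error = client.files.content(error_file_id).content if error_file_id else None

In [None]:
error_file_name = "Terraform_code_validation_Generated_Terraform_code_Microsoft_fine-tuned_batch_error.jsonl"

# Nothing to write when the batch produced no error file
if error is not None:
    with open(error_file_name, 'wb') as file:
        file.write(error)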

In [73]:
# Loading data from saved file
results = []
with open(result_file_name, 'r') as file:
    for line in file:
        # Parsing the JSON string into a dict and appending to the list of results
        json_object = json.loads(line.strip())
        results.append(json_object)
#print(results)
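
All 61 requests succeeded in this run, so every line holds a usable response. For runs where some requests fail, a defensive reader may help. The sketch below assumes each output line carries `custom_id`, `response` (with `status_code` and `body`), and `error` fields. `extract_content` and the two sample lines are hypothetical.

```python
import json

def extract_content(result_line):
    """Return (custom_id, content), or (custom_id, None) if the request did not succeed."""
    record = json.loads(result_line)
    response = record.get("response") or {}
    if record.get("error") or response.get("status_code") != 200:
        return record.get("custom_id"), None
    return record["custom_id"], response["body"]["choices"][0]["message"]["content"]

# Two hand-written sample lines: one success, one failure (illustration only)
ok_line = json.dumps({
    "custom_id": "task-0",
    "response": {"status_code": 200,
                 "body": {"choices": [{"message": {"content": "{\"Score\": 3}"}}]}},
    "error": None,
})
bad_line = json.dumps({
    "custom_id": "task-1",
    "response": None,
    "error": {"code": "demo_error", "message": "hypothetical failure"},
})

parsed = [extract_content(line) for line in (ok_line, bad_line)]
print(parsed)
```

Rows that come back with `None` can then be skipped or resubmitted rather than raising a `KeyError` or `TypeError` when the nested fields are accessed.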

# Reading results 

In [75]:
df

Unnamed: 0,Category,Cloud_provider,user_query,Terraform_Code,microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation
0,,Azure,Please generate a Terraform configuration for ...,# Main Terraform\nterraform {\n required_vers...,#Provider.tf\n\nterraform {\n required_versio...
1,,AWS,Please generate a Terraform configuration for ...,"locals {\n bucket = ""yeonghoey-media""\n}\n\nr...",Here's a Terraform configuration that meets th...
2,,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source = ""github.co...","resource ""aws_eip"" ""dns_ip"" {\n vpc = true\n}..."
3,,AWS,Please generate a Terraform configuration for ...,"# nubis-consul release 10\n\nvariable ""nubis_v...","provider ""aws"" {\n region = ""us-east-1""\n}\n\..."
4,,AWS,Please generate a Terraform configuration for ...,variable aws_region {}\n\nvariable enabled {}\...,Here is a sample Terraform configuration that ...
...,...,...,...,...,...
56,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2018 Google LLC\n *\n * Lice...,"#variables.tf\n\n\nresource ""google_compute_su..."
57,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2018 Google LLC\n *\n * Lice...,"api_version: ""2""\n\n# Terraform configuration ..."
58,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2021 Google LLC\n *\n * Lice...,"**main.tf**\n\nprovider ""google"" {\n project ..."
59,,GCP,Please generate a Terraform configuration for ...,/**\n * Copyright 2021 Google LLC\n *\n * Lice...,"provider ""google"" {\n project = ""my-project-i..."


In [77]:
# Joining each batch result back to its source row via the index encoded in custom_id
rows = []
for res in results:
    # custom_id has the form "task-<index>"
    index = int(res['custom_id'].split('-')[-1])
    validation = res['response']['body']['choices'][0]['message']['content']
    source_row = df.iloc[index]
    rows.append({
        "Cloud_provider": source_row['Cloud_provider'],
        "user_query": source_row['user_query'],
        "Terraform_Code": source_row['Terraform_Code'],
        "microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation": source_row['microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation'],
        "GPT4_Validation_Result": validation,
    })
# Building the DataFrame once avoids repeated pd.concat calls inside the loop
validation_result = pd.DataFrame(rows, columns=['Cloud_provider','user_query','Terraform_Code','microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation','GPT4_Validation_Result'])
validation_result.head()

Unnamed: 0,Cloud_provider,user_query,Terraform_Code,microsoft:Phi-3-mini-4k-instruct_fine-tuned_code_generation,GPT4_Validation_Result
0,Azure,Please generate a Terraform configuration for ...,# Main Terraform\nterraform {\n required_vers...,#Provider.tf\n\nterraform {\n required_versio...,"{\n ""Score"": 3,\n ""Rationale"": {\n ""Corre..."
1,AWS,Please generate a Terraform configuration for ...,"locals {\n bucket = ""yeonghoey-media""\n}\n\nr...",Here's a Terraform configuration that meets th...,"{\n ""Score"": 5,\n ""Rationale"": {\n ""Corre..."
2,AWS,Please generate a Terraform configuration for ...,"module ""worker"" {\n source = ""github.co...","resource ""aws_eip"" ""dns_ip"" {\n vpc = true\n}...","{\n ""Score"": 4,\n ""Rationale"": {\n ""Corre..."
3,AWS,Please generate a Terraform configuration for ...,"# nubis-consul release 10\n\nvariable ""nubis_v...","provider ""aws"" {\n region = ""us-east-1""\n}\n\...","{\n ""Score"": 6.5,\n ""Rationale"": {\n ""Cor..."
4,AWS,Please generate a Terraform configuration for ...,variable aws_region {}\n\nvariable enabled {}\...,Here is a sample Terraform configuration that ...,"{\n ""Score"": 4,\n ""Rationale"": {\n ""Corre..."


In [79]:
validation_result.to_excel('Research_Analysis_Base_data_Microsoft_fine_tuned_generated_gpt_4o_validated2.xlsx', index=False)
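
The `GPT4_Validation_Result` column stores each verdict as a raw JSON string, so the scores are not yet usable for analysis. A possible next step is to flatten each string into one overall score plus one score per axis. This is a sketch that assumes the model followed the JSON layout requested in the prompt. `flatten_validation` and the sample string are hypothetical.

```python
import json

def flatten_validation(raw):
    """Turn one validation JSON string into a flat dict of scores."""
    parsed = json.loads(raw)
    flat = {"Score": parsed.get("Score")}
    # One entry per axis found under "Rationale"; comments are left out
    for axis, detail in parsed.get("Rationale", {}).items():
        flat[f"{axis} score"] = detail.get("score")
    return flat

# Hand-written sample in the requested layout (illustration only)
sample_result = json.dumps({
    "Score": 4,
    "Rationale": {
        "Correctness": {"score": 5, "comments": "hypothetical comment"},
        "Security": {"score": 3, "comments": "hypothetical comment"},
    },
})

flat_scores = flatten_validation(sample_result)
print(flat_scores)
```

Applied to the notebook's data, something like `pd.DataFrame([flatten_validation(r) for r in validation_result['GPT4_Validation_Result']])` would give one row of scores per validated configuration. The `json_object` response format guarantees valid JSON but not this exact schema, so missing keys simply come out as `None`.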