# SAS programming language analyser

In [1]:
import datetime
import os
import openai
import sys

from dotenv import load_dotenv

In [2]:
sys.version

'3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]'

In [3]:
print('Today is:', datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S'))

Today is: 21-Sep-2023 08:12:44


In [4]:
load_dotenv("azure.env")

# Azure OpenAI settings, read from azure.env
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")

print("Open AI version:", openai.__version__)

Open AI version: 0.28.0
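
If `azure.env` is missing or incomplete, `os.getenv` returns `None` and the failure only shows up at the first API call. A minimal sketch of an early check follows; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, not part of the `openai` package, and the variable names are the ones this notebook reads.

```python
import os

# Names taken from the os.getenv calls in the cell above
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")


def missing_settings(env=os.environ):
    """Return the required setting names that are unset or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# An empty list means every setting was found
print("Missing settings:", missing_settings())
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.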


In [5]:
model = "text-davinci-003"

In [6]:
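def azure_openai(prompt, temperature=0.8):
    """
    Send the prompt, followed by the global `code` string, to Azure OpenAI
    and return the completion text.
    """
    # `code` is a module-level variable set in the example cells below
    prompt = prompt + "\n" + code

    # With api_type "azure", `engine` is the deployment name
    results = openai.Completion.create(
        engine=model,
        prompt=prompt,
        temperature=temperature,
        max_tokens=800,
    )

    # Keep the first completion and strip leading and trailing newlines
    answer = results["choices"][0]["text"].strip("\n")

    return answer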
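
The function above reads `code` from a global variable, so each answer depends on whichever example cell ran last. One possible alternative, sketched here under assumptions, passes the snippet explicitly. `build_prompt`, `ask_about_code`, and `echo_completion` are hypothetical names, and the completion call is replaced by a stand-in so the sketch runs offline.

```python
def build_prompt(question, code):
    """Join a question and a code snippet the same way azure_openai does."""
    return question + "\n" + code


def ask_about_code(question, code, complete):
    """Build the prompt and pass it to `complete`, any callable that returns text."""
    return complete(build_prompt(question, code)).strip("\n")


# Stand-in for the real completion call (hypothetical): it upper-cases the prompt
def echo_completion(prompt):
    return "\n" + prompt.upper() + "\n"


print(ask_about_code("What is the language?", "proc print; run;", echo_completion))
```

In the notebook, `complete` could wrap `openai.Completion.create` with the same arguments as above.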

## Examples

In [7]:
code = """
data work.table2;
    input Name $ Height;
    label Height='Height (in cm)';
    datalines;
Bob 185
Melissa 168
Susan 164
John 178
Patrick 191
;

run;
"""

In [8]:
print(code)


data work.table2;
    input Name $ Height;
    label Height='Height (in cm)';
    datalines;
Bob 185
Melissa 168
Susan 164
John 178
Patrick 191
;

run;



In [9]:
answer = azure_openai("Can you describe this file in one line?")
print(answer)

This is a SAS program that creates a data set called "work.table2" with two variables, "Name" and "Height", and includes five observations.


In [10]:
answer = azure_openai("Can you explain this file?")
print(answer)

This file is a SAS program that creates a data set called 'work.table2'. The data set contains two variables: Name (character) and Height (numeric). The label statement sets the label of Height to 'Height (in cm)' so that the units are included. The datalines statement is the data that will be entered into the data set; it includes five observations of two variables, Name and Height. Finally, the run statement executes the program and creates the data set.
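
For comparison, the data step explained above might be mirrored in pandas roughly as follows. This is a sketch, not model output. pandas has no built-in variable labels, so storing the label in `DataFrame.attrs` is an assumption made here for illustration.

```python
import pandas as pd

# The same five observations as the datalines block
table2 = pd.DataFrame(
    {
        "Name": ["Bob", "Melissa", "Susan", "John", "Patrick"],
        "Height": [185, 168, 164, 178, 191],
    }
)

# Assumed stand-in for the SAS LABEL statement
table2.attrs["labels"] = {"Height": "Height (in cm)"}

print(table2)
print(table2.attrs["labels"])
```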


In [11]:
code = """
proc sql;
    create table work.left_join_proc_sql as
    select t1.name,
        t1.weight,
        t2.height
    from work.table1 t1
        left join work.table2 t2
        on t1.name = t2.name;
quit;
"""

In [12]:
answer = azure_openai("Can you describe this code in one line?")
print(answer)

This code creates a table called "left_join_proc_sql" that combines data from two tables ("table1" and "table2") on the common column "name".


In [13]:
answer = azure_openai("What is the programming language?")
print(answer)

Answer: SAS


In [14]:
answer = azure_openai("Can you convert it into Python?")
print(answer)

import pandas as pd

df1 = pd.read_csv('table1.csv')
df2 = pd.read_csv('table2.csv')

df_left_join = df1.merge(df2, how='left', left_on='name', right_on='name')
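
The generated code assumes CSV files that this notebook does not provide. A self-contained sketch of the same left join, using small in-memory frames, could look like this. The rows of `table1` are hypothetical. The final column selection mirrors the `select` list of the PROC SQL step.

```python
import pandas as pd

# table1 rows are hypothetical; table2 reuses three heights from the earlier example
table1 = pd.DataFrame({"name": ["Bob", "Susan", "Alice"], "weight": [82, 58, 61]})
table2 = pd.DataFrame({"name": ["Bob", "Susan", "John"], "height": [185, 164, 178]})

# how="left" keeps every row of table1; names without a match get NaN for height
left_join = table1.merge(table2, how="left", on="name")[["name", "weight", "height"]]

print(left_join)
```

Because "Alice" has no row in `table2`, her `height` is missing, and "John" is dropped because he does not appear in `table1`.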


In [15]:
answer = azure_openai("Can you convert it into R?")
print(answer)

# The above code in R would be:

left_join_R <- left_join(x = table1, y = table2, by = "name")


In [16]:
answer = azure_openai("Can you convert it into SPSS?")
print(answer)

DATASET DECLARE left_join_proc_sql.

* Specifying the names of the variables.
VARIABLE NAMES 
name weight height.

* Specifying the data types of the variables.
VARIABLE LEVEL
name (A20)
weight (F8)
height (F8).

* Assigning labels to the variables.
VARIABLE LABELS 
name 'Name'
weight 'Weight (kg)'
height 'Height (m)'.

* Reading the data from the original tables.
GET DATA 
/TYPE=TXT
/FILE='table1.txt'
/DELIMITERS=" "
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/VARIABLES=
name (A20)
weight (F8).
DATASET NAME table1.

GET DATA 
/TYPE=TXT
/FILE='table2.txt'
/DELIMITERS=" "
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/VARIABLES=
name (A20)
height (F8).
DATASET NAME table2.

* Merging the two tables.
MATCH FILES
/FILE= *
/BY name.
EXECUTE.

* Saving the merged data in an SPSS dataset.
SAVE OUTFILE='left_join_proc_sql.sav'
/KEEP name weight height.


In [17]:
code = """
proc print data=work.left_join_proc_sql noobs label;
run;
"""

In [18]:
answer = azure_openai("What is the programming language?")
print(answer)

This is SAS syntax.


In [19]:
answer = azure_openai("Can you describe this code in one line?")
print(answer)

This code prints a data set from the "work.left_join_proc_sql" table with no observations and labels.


In [20]:
answer = azure_openai("Can you convert it into Python?")
print(answer)

import pandas as pd
data=pd.read_csv('work.left_join_proc_sql')
data.info()
data.head()
data.columns.values
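
In PROC PRINT, `noobs` suppresses the observation-number column and `label` uses variable labels as column headings. A closer pandas sketch than the generated one might therefore print the table without its index and with labelled headings. The data and the labels below are hypothetical stand-ins for `work.left_join_proc_sql`.

```python
import pandas as pd

# Hypothetical stand-in for work.left_join_proc_sql
df = pd.DataFrame({"name": ["Bob", "Susan"], "weight": [82, 58], "height": [185, 164]})

# Hypothetical labels; in SAS these would come from LABEL statements
labels = {"weight": "Weight (in kg)", "height": "Height (in cm)"}

# rename() swaps in the labels as headings (label);
# index=False drops the row-number column (noobs)
listing = df.rename(columns=labels).to_string(index=False)

print(listing)
```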


In [21]:
code = """
proc ttest data=sashelp.fish (where=(Species = "Bream"))
    alpha=0.05;
    var weight;
run;
"""

In [22]:
answer = azure_openai("What is the programming language?")
print(answer)

This code is written in SAS (Statistical Analysis System) programming language.


In [23]:
answer = azure_openai("What is the SAS Procedure used?")
print(answer)

The SAS procedure used in this example is PROC TTEST.


In [24]:
answer = azure_openai("What is the TTEST procedure?")
print(answer)

The TTEST procedure is a statistical procedure available in SAS that can be used to compare the means of two populations. It can also be used to compare the means of one population against a given constant value. In the example above, the TTEST procedure is used to compare the mean weight of Bream fish from the SAS Help Fish dataset. The argument "alpha=0.05" indicates that the significance level for the test is 0.05.


In [25]:
answer = azure_openai("Can you describe this code in one line?")
print(answer)

This code runs a t-test on the variable "weight" in the dataset "sashelp.fish" using an alpha of 0.05, with the subset being fish of species "Bream".


In [26]:
answer = azure_openai("Can you convert it into Python?")
print(answer)

import pandas as pd
import scipy.stats as stats

# Read in the data
df = pd.read_sas('sashelp.fish')

# Filter for only Bream
bream_df = df.loc[df['Species'] == 'Bream']

# Run the t-test
stats.ttest_1samp(bream_df['Weight'], 0, alpha=0.05)
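
Note that `scipy.stats.ttest_1samp` does not accept an `alpha` argument, so the generated call would fail as written. In PROC TTEST, `alpha=0.05` sets the confidence level of the reported limits to 95%, and the default null value for the mean is 0. A hedged sketch of the same analysis follows, with hypothetical weights in place of the `sashelp.fish` values.

```python
from scipy import stats

# Hypothetical weights, not values from sashelp.fish
weights = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0]

# One-sample t-test of H0: mean == 0
result = stats.ttest_1samp(weights, popmean=0)

# alpha = 0.05 corresponds to a 95% confidence interval for the mean
alpha = 0.05
n = len(weights)
mean = sum(weights) / n
sem = stats.sem(weights)  # standard error of the mean (ddof=1)
half_width = stats.t.ppf(1 - alpha / 2, df=n - 1) * sem

print("t statistic:", result.statistic)
print("p-value:", result.pvalue)
print("95% CI for the mean:", (mean - half_width, mean + half_width))
```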


In [27]:
code = """
proc sort data=sashelp.fish
    out=work.fish (where=(Species in ("Bream" "Parkki"))
    keep=Species Weight);
    by Species;
run;
"""

In [28]:
answer = azure_openai("What is the SAS Procedure used?")
print(answer)

The SAS procedure used is PROC SORT.


In [29]:
answer = azure_openai("What is the SORT procedure?")
print(answer)

The SORT procedure is a SAS procedure that is used to sort data from a SAS dataset into an output dataset. It rearranges the order of observations in a SAS dataset and sorts them by one or more variables. The SORT procedure can be used to sort data into an existing dataset or to create a new dataset. The syntax for the SORT procedure includes the "data" statement, which specifies the dataset to be sorted; the "out" statement, which specifies the output dataset; the "where" statement, which specifies which observations should be selected; and the "by" statement, which specifies the variable or variables by which the data should be sorted.


In [30]:
answer = azure_openai("Can you describe this code in one line?")
print(answer)

This code sorts the SAS dataset 'sashelp.fish' and creates a new dataset 'work.fish' with only the 'Species' and 'Weight' variables for the 'Bream' and 'Parkki' species.


In [31]:
answer = azure_openai("Can you convert it into Python?")
print(answer)

import pandas as pd 

fish_data = pd.read_sas('sashelp.fish')

filtered_fish_data = fish_data[fish_data['Species'].isin(["Bream", "Parkki"])][["Species", "Weight"]]

sorted_fish_data = filtered_fish_data.sort_values('Species')

sorted_fish_data.to_csv('work.fish')
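
`pd.read_sas` expects a SAS data file on disk, which this notebook does not include. A self-contained sketch of the same sort, filter, and keep logic on an in-memory frame could look like this. The rows are hypothetical. `kind="stable"` is chosen because PROC SORT preserves the original order within each BY group by default.

```python
import pandas as pd

# Hypothetical rows using sashelp.fish-style column names
fish = pd.DataFrame(
    {
        "Species": ["Parkki", "Bream", "Perch", "Bream", "Parkki"],
        "Weight": [55.0, 242.0, 5.9, 290.0, 60.0],
        "Height": [6.8, 11.5, 2.1, 12.5, 7.0],
    }
)

work_fish = (
    fish.sort_values("Species", kind="stable")  # by Species
    # where= keeps two species, keep= keeps two columns
    .loc[lambda d: d["Species"].isin(["Bream", "Parkki"]), ["Species", "Weight"]]
    .reset_index(drop=True)
)

print(work_fish)
```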
