# Open-SAS Comprehensive Test Notebook

This notebook exercises the main features of Open-SAS:

## Features Tested:
- ✅ **DATA Steps**: Creating and manipulating datasets
- ✅ **PROC Procedures**: Statistical analysis and reporting
- ✅ **Data Input**: DATALINES, external files, library management
- ✅ **Data Manipulation**: Variables, conditions, loops
- ✅ **Output**: PROC PRINT, PROC MEANS, PROC FREQ
- ✅ **Dataset Display**: Automatic HTML rendering of results

**Kernel**: Make sure to select "osas" as the kernel!


In [1]:
/* Test 1: Basic DATA Step with DATALINES */
data work.employees;
    input employee_id name $ department $ salary;
    datalines;
1 Alice Engineering 75000
2 Bob Marketing 55000
3 Carol Engineering 80000
4 David Sales 45000
5 Eve Engineering 70000
6 Frank Marketing 60000
7 Grace Sales 50000
8 Henry Engineering 85000
;
run;


Saved dataset work.employees to library work


employee_id,name,department,salary
1.0,Alice,Engineering,75000.0
2.0,Bob,Marketing,55000.0
3.0,Carol,Engineering,80000.0
4.0,David,Sales,45000.0
5.0,Eve,Engineering,70000.0
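
The list-style `input` statement above can be read as a small parsing rule: whitespace-separated tokens, with a trailing `$` marking the preceding variable as character and everything else treated as numeric. The sketch below is a plain-Python illustration of that rule, not Open-SAS's actual parser; `parse_datalines` is a hypothetical helper name. It also shows why `employee_id` prints as `1.0`: numeric values are stored as floating point.

```python
# Hypothetical helper: mimics list input for an INPUT statement.
# A "$" token marks the preceding variable as character; others are numeric.
def parse_datalines(input_spec, lines):
    columns = []  # list of (name, is_character) pairs
    for token in input_spec.split():
        if token == "$":
            name, _ = columns[-1]
            columns[-1] = (name, True)
        else:
            columns.append((token, False))

    rows = []
    for line in lines:
        row = {}
        for (name, is_char), raw in zip(columns, line.split()):
            # Character values stay as text; numeric values become floats.
            row[name] = raw if is_char else float(raw)
        rows.append(row)
    return rows


rows = parse_datalines(
    "employee_id name $ department $ salary",
    ["1 Alice Engineering 75000", "2 Bob Marketing 55000"],
)
print(rows[0])
```

This sketch assumes values contain no embedded spaces, which holds for the eight records used here.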


In [1]:
/* Test 2: PROC PRINT - Display Dataset */
proc print data=work.employees;
run;


PROC PRINT - Dataset Contents
Observations: 8
Variables: 4

employee_id | name     | department  | salary  
-----------------------------------------------
1.0         | Alice    | Engineering | 75000.0 
2.0         | Bob      | Marketing   | 55000.0 
3.0         | Carol    | Engineering | 80000.0 
4.0         | David    | Sales       | 45000.0 
5.0         | Eve      | Engineering | 70000.0 
6.0         | Frank    | Marketing   | 60000.0 
7.0         | Grace    | Sales       | 50000.0 
8.0         | Henry    | Engineering | 85000.0 


In [1]:
/* Test 3: PROC MEANS - Descriptive Statistics */
proc means data=work.employees;
    var salary;
    class department;
run;


PROC MEANS - Grouped Analysis
Analysis Variables: salary
Grouping Variables: department

department  | salary_count | salary_mean | salary_std         | salary_min | salary_max
---------------------------------------------------------------------------------------
Engineering | 4            | 77500.0     | 6454.972243679028  | 70000.0    | 85000.0   
Marketing   | 2            | 57500.0     | 3535.5339059327375 | 55000.0    | 60000.0   
Sales       | 2            | 47500.0     | 3535.5339059327375 | 45000.0    | 50000.0   
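
As a cross-check on the grouped statistics above, the following sketch recomputes count, mean, standard deviation, minimum, and maximum per department using only Python's standard library. It is an independent recomputation under the assumption that `salary_std` is the sample standard deviation (divisor n - 1), which is what the printed values are consistent with; it is not meant to describe how Open-SAS computes them internally.

```python
from collections import defaultdict
from statistics import mean, stdev

# (department, salary) pairs taken from work.employees above.
employees = [
    ("Engineering", 75000.0), ("Marketing", 55000.0),
    ("Engineering", 80000.0), ("Sales", 45000.0),
    ("Engineering", 70000.0), ("Marketing", 60000.0),
    ("Sales", 50000.0), ("Engineering", 85000.0),
]

# Group salaries by department (the CLASS variable).
by_dept = defaultdict(list)
for dept, salary in employees:
    by_dept[dept].append(salary)

# stdev() is the sample standard deviation (n - 1 in the denominator).
summary = {
    dept: {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }
    for dept, values in sorted(by_dept.items())
}

for dept, stats in summary.items():
    print(dept, stats)
```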


In [1]:
/* Test 4: PROC FREQ - Frequency Analysis */
proc freq data=work.employees;
    tables department;
run;


PROC FREQ - Frequency Table for department

Value                Frequency    Percent    Cumulative Percent
------------------------------------------------------------
Engineering          4            50.0       50.0              
Marketing            2            25.0       75.0              
Sales                2            25.0       100.0             
------------------------------------------------------------
Total                8            100.0      100.0             
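
The frequency table above has three derived columns: a count per value, that count as a percentage of all observations, and a running total of the percentages. A minimal standard-library sketch of the same arithmetic follows; the variable names are illustrative only.

```python
from collections import Counter
from itertools import accumulate

# The department column of work.employees, in input order.
departments = [
    "Engineering", "Marketing", "Engineering", "Sales",
    "Engineering", "Marketing", "Sales", "Engineering",
]

counts = Counter(departments)
total = sum(counts.values())

values = sorted(counts)  # table rows in ascending order of value
percents = [100.0 * counts[v] / total for v in values]
cumulative = list(accumulate(percents))  # running sum of the percent column

freq_table = [
    (v, counts[v], p, c) for v, p, c in zip(values, percents, cumulative)
]
for row in freq_table:
    print(row)
```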


In [1]:
/* Test 5: Data Manipulation - Create New Variables */
data work.enhanced_employees;
    set work.employees;
    salary_category = ifn(salary > 70000, 'High', ifn(salary > 55000, 'Medium', 'Low'));
    annual_bonus = salary * 0.1;
    total_compensation = salary + annual_bonus;
run;


Evaluating IFN expression: ifn(salary > 70000, 'High', ifn(salary > 55000, 'Medium', 'Low'))
IFN parsed - condition: salary > 70000, true: High, false: ifn(salary > 55000, 'Medium', 'Low'
Condition result: 0     True
1    False
2     True
3    False
4    False
5    False
6    False
7     True
Name: salary, dtype: bool
Handling nested IFN
Nested IFN - cond2: salary > 55000, val2: Medium, val3: Low
Final IFN result: 0      High
1       Low
2      High
3       Low
4    Medium
5    Medium
6       Low
7      High
dtype: object
Saved dataset work.enhanced_employees to library work


employee_id,name,department,salary,salary_category,annual_bonus,total_compensation
1.0,Alice,Engineering,75000.0,High,7500.0,82500.0
2.0,Bob,Marketing,55000.0,Low,5500.0,60500.0
3.0,Carol,Engineering,80000.0,High,8000.0,88000.0
4.0,David,Sales,45000.0,Low,4500.0,49500.0
5.0,Eve,Engineering,70000.0,Medium,7000.0,77000.0
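
The nested expression above amounts to a three-way threshold test with strict `>` comparisons, so a salary of exactly 70000 is `Medium` and exactly 55000 is `Low`. The sketch below restates that logic in plain Python to make the boundary cases explicit; `salary_category` is an illustrative function name. For reference, in standard SAS `IFN` returns a numeric value and `IFC` is its character counterpart, so the character results here reflect how Open-SAS evaluated the expression in this run.

```python
def salary_category(salary):
    """Classify a salary using the same strict thresholds as the DATA step."""
    if salary > 70000:
        return "High"
    if salary > 55000:
        return "Medium"
    return "Low"


salaries = {
    "Alice": 75000, "Bob": 55000, "Carol": 80000, "David": 45000,
    "Eve": 70000, "Frank": 60000, "Grace": 50000, "Henry": 85000,
}
categories = {name: salary_category(s) for name, s in salaries.items()}
print(categories)
```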


In [1]:
/* Test 6: Display Enhanced Dataset */
proc print data=work.enhanced_employees;
run;


PROC PRINT - Dataset Contents
Observations: 8
Variables: 7

employee_id | name     | department  | salary   | salary_category | annual_bonus | total_compensation
-----------------------------------------------------------------------------------------------------
1.0         | Alice    | Engineering | 75000.0  | High            | 7500.0       | 82500.0           
2.0         | Bob      | Marketing   | 55000.0  | Low             | 5500.0       | 60500.0           
3.0         | Carol    | Engineering | 80000.0  | High            | 8000.0       | 88000.0           
4.0         | David    | Sales       | 45000.0  | Low             | 4500.0       | 49500.0           
5.0         | Eve      | Engineering | 70000.0  | Medium          | 7000.0       | 77000.0           
6.0         | Frank    | Marketing   | 60000.0  | Medium          | 6000.0       | 66000.0           
7.0         | Grace    | Sales       | 50000.0  | Low             | 5000.0       | 55000.0           
8.0         | Henry    | Engineering | 85000.0  | High            | 8500.0       | 93500.0           

In [1]:
/* Test 7: Conditional Processing - WHERE Clause */
data work.high_earners;
    set work.enhanced_employees;
    where salary > 70000;
run;


Saved dataset work.high_earners to library work


employee_id,name,department,salary,salary_category,annual_bonus,total_compensation
1.0,Alice,Engineering,75000.0,High,7500.0,82500.0
3.0,Carol,Engineering,80000.0,High,8000.0,88000.0
8.0,Henry,Engineering,85000.0,High,8500.0,93500.0
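
The `where` statement above keeps only the observations that satisfy the condition. Because the comparison is strict, Eve (salary exactly 70000) is excluded. A minimal Python sketch of the same filter, using illustrative tuples rather than the full dataset:

```python
# (employee_id, name, salary) triples from work.enhanced_employees.
employees = [
    (1, "Alice", 75000), (2, "Bob", 55000), (3, "Carol", 80000),
    (4, "David", 45000), (5, "Eve", 70000), (6, "Frank", 60000),
    (7, "Grace", 50000), (8, "Henry", 85000),
]

# Keep rows whose salary is strictly greater than 70000.
high_earners = [row for row in employees if row[2] > 70000]
print([name for _, name, _ in high_earners])
```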


In [1]:
/* Test 8: Display Filtered Results */
proc print data=work.high_earners;
run;


PROC PRINT - Dataset Contents
Observations: 3
Variables: 7

employee_id | name     | department  | salary   | salary_category | annual_bonus | total_compensation
-----------------------------------------------------------------------------------------------------
1.0         | Alice    | Engineering | 75000.0  | High            | 7500.0       | 82500.0           
3.0         | Carol    | Engineering | 80000.0  | High            | 8000.0       | 88000.0           
8.0         | Henry    | Engineering | 85000.0  | High            | 8500.0       | 93500.0           


In [1]:
/* Test 9: PROC SORT - Sort Data */
proc sort data=work.enhanced_employees;
    by department descending salary;
run;


PROC SORT - Dataset Sorted
BY Variables: department (ASC), salary (DESC)
Observations: 8
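
The `by department descending salary` ordering is a two-key sort: ascending on the first key, descending on the second. One way to sketch it in Python is a composite sort key that negates the numeric column; this is an illustration of the ordering, not Open-SAS's implementation.

```python
# (employee_id, name, department, salary) rows in original input order.
rows = [
    (1, "Alice", "Engineering", 75000), (2, "Bob", "Marketing", 55000),
    (3, "Carol", "Engineering", 80000), (4, "David", "Sales", 45000),
    (5, "Eve", "Engineering", 70000), (6, "Frank", "Marketing", 60000),
    (7, "Grace", "Sales", 50000), (8, "Henry", "Engineering", 85000),
]

# Ascending department, then descending salary (negated numeric key).
sorted_rows = sorted(rows, key=lambda r: (r[2], -r[3]))
print([r[0] for r in sorted_rows])
```

Negating the key only works for numeric columns; a descending character key would need a different approach, such as two stable sorts applied in reverse key order.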



In [1]:
/* Test 10: Display Sorted Data */
proc print data=work.enhanced_employees;
run;


PROC PRINT - Dataset Contents
Observations: 8
Variables: 7

employee_id | name     | department  | salary   | salary_category | annual_bonus | total_compensation
-----------------------------------------------------------------------------------------------------
8.0         | Henry    | Engineering | 85000.0  | High            | 8500.0       | 93500.0           
3.0         | Carol    | Engineering | 80000.0  | High            | 8000.0       | 88000.0           
1.0         | Alice    | Engineering | 75000.0  | High            | 7500.0       | 82500.0           
5.0         | Eve      | Engineering | 70000.0  | Medium          | 7000.0       | 77000.0           
6.0         | Frank    | Marketing   | 60000.0  | Medium          | 6000.0       | 66000.0           
2.0         | Bob      | Marketing   | 55000.0  | Low             | 5500.0       | 60500.0           
7.0         | Grace    | Sales       | 50000.0  | Low             | 5000.0       | 55000.0           
4.0         | David    | Sales       | 45000.0  | Low             | 4500.0       | 49500.0           

In [1]:
/* Test 11: Advanced Statistics - PROC MEANS with printed output and no OUT= dataset */
proc means data=work.enhanced_employees;
    class department;
    var salary total_compensation;
    output 
           mean=avg_salary avg_compensation 
           min=min_salary min_compensation 
           max=max_salary max_compensation;
run;


PROC MEANS - Grouped Analysis
Analysis Variables: salary, total_compensation
Grouping Variables: department

department  | salary_count | salary_mean | salary_std         | salary_min | salary_max | total_compensation_count | total_compensation_mean | total_compensation_std | total_compensation_min | total_compensation_max
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Engineering | 4            | 77500.0     | 6454.972243679028  | 70000.0    | 85000.0    | 4                        | 85250.0                 | 7100.469468046931      | 77000.0                | 93500.0               
Marketing   | 2            | 57500.0     | 3535.5339059327375 | 55000.0    | 60000.0    | 2                        | 63250.0                 | 3889.0872965260114     | 60500.0                | 66000.0               
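Sales       | 2            | 47500.0     | 3535.5339059327375 | 45000.0    | 50000.0    | 2                        | 52250.0                 | 3889.0872965260114     | 49500.0                | 55000.0               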

In [1]:
/* Test 12: Advanced Statistics - PROC MEANS with NOPRINT and an OUT= dataset */
proc means data=work.enhanced_employees noprint;
    class department;
    var salary total_compensation;
    output out=work.dept_summary 
           mean=avg_salary avg_compensation 
           min=min_salary min_compensation 
           max=max_salary max_compensation;
run;


In [1]:
/* Test 12 (continued): Display Summary Statistics */
proc print data=work.dept_summary;
run;


PROC PRINT - Dataset Contents
Observations: 3
Variables: 7

avg_salary | avg_compensation | max_salary | max_compensation | department  | salary_min | total_compensation_min
-----------------------------------------------------------------------------------------------------------------
77500.0    | 85250.0          | 85000.0    | 93500.0          | Engineering | 70000.0    | 77000.0               
57500.0    | 63250.0          | 60000.0    | 66000.0          | Marketing   | 55000.0    | 60500.0               
47500.0    | 52250.0          | 50000.0    | 55000.0          | Sales       | 45000.0    | 49500.0               


In [1]:
/* Test 13: Cross-tabulation Analysis */
proc freq data=work.enhanced_employees;
    tables department * salary_category / nocol nopercent;
run;


PROC FREQ - Cross-tabulation: department * salary_category
Options: nocol nopercent

department      | High | Low | Medium | Total
---------------------------------------------
Engineering     | 3    | 0   | 1      | 4    
Marketing       | 0    | 1   | 1      | 2    
Sales           | 0    | 2   | 0      | 2    
Total           | 3    | 3   | 2      | 8    
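
The cross-tabulation above counts observations for each (department, salary_category) pair and adds row and column totals. The sketch below rebuilds those counts with a `Counter` keyed on the pair; the variable names are illustrative.

```python
from collections import Counter

# (department, salary_category) for each of the eight employees.
records = [
    ("Engineering", "High"), ("Marketing", "Low"),
    ("Engineering", "High"), ("Sales", "Low"),
    ("Engineering", "Medium"), ("Marketing", "Medium"),
    ("Sales", "Low"), ("Engineering", "High"),
]

cells = Counter(records)  # missing pairs count as 0
row_labels = sorted({dept for dept, _ in records})
col_labels = sorted({cat for _, cat in records})

# One list of cell counts per department, in column-label order.
table = {d: [cells[(d, c)] for c in col_labels] for d in row_labels}
row_totals = {d: sum(counts) for d, counts in table.items()}
col_totals = [
    sum(table[d][i] for d in row_labels) for i in range(len(col_labels))
]

print(col_labels)
for dept in row_labels:
    print(dept, table[dept], row_totals[dept])
print("Total", col_totals, sum(col_totals))
```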


In [1]:
/* Test 14: PROC CONTENTS - Dataset Information */
proc contents data=work.enhanced_employees;
run;


PROC CONTENTS - Dataset Information

Dataset Information:
  Observations: 8
  Variables: 7

Variable Information:
--------------------------------------------------------------------------------
#   Variable             Type       Length   Non-Null   Null    
--------------------------------------------------------------------------------
1   employee_id          Numeric    8        8          0       
2   name                 Character  Variable 8          0       
3   department           Character  Variable 8          0       
4   salary               Numeric    8        8          0       
5   salary_category      Character  Variable 8          0       
6   annual_bonus         Numeric    8        8          0       
7   total_compensation   Numeric    8        8          0       
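
The variable listing above reports, for each column, its position, a Numeric or Character type, and counts of non-null and null values. A rough sketch of how such a summary could be derived from row dictionaries is shown below; `describe` is a hypothetical helper, and the type rule (numeric if every non-null value is an `int` or `float`) is an assumption made for illustration.

```python
def describe(rows):
    """Return (position, variable, type, non_null, null) for each column."""
    info = []
    for position, var in enumerate(rows[0], start=1):
        values = [row[var] for row in rows]
        non_null = sum(v is not None for v in values)
        # Assumed rule: a column is Numeric if all non-null values are numbers.
        is_numeric = all(
            isinstance(v, (int, float)) for v in values if v is not None
        )
        var_type = "Numeric" if is_numeric else "Character"
        info.append((position, var, var_type, non_null, len(values) - non_null))
    return info


sample = [
    {"employee_id": 1.0, "name": "Alice", "salary": 75000.0},
    {"employee_id": 2.0, "name": "Bob", "salary": 55000.0},
]
for line in describe(sample):
    print(line)
```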
