## GWU STAT 4197/STAT 6197

### Week 3 SAS Code Examples (Part 2): Transforming Data 
#### (Some of the syntax descriptions were obtained from SAS Documentation)

* Creating a New Variable by Using
    * Assignment Statements 
    * IFC/IFN Functions
    * SELECT-WHEN-OTHERWISE Statements
    * CASE Expression in PROC SQL
    * RETAIN and SUM Statements
* Running R code in PROC IML


### Creating a New Variable by Recoding Distinct Values
[Setting the Length of Character Variables](https://documentation.sas.com/?docsetId=basess&docsetTarget=n1cruyh1wg40v9n1ddf1lkrcs2j0.htm&docsetVersion=9.4&locale=en)

In [20]:
data work.demographics;
 set sashelp.demographics;
 length region_name $ 22;
 if region = 'AFR' then region_name = 'Africa';
 else if region = 'AMR' then region_name = 'Americas';
 else if region = 'EUR'  then region_name= 'Europe';
 else if region = 'EMR' then region_name ='Eastern Mediterranean';
 else if region = 'SEAR' then region_name= 'South-East Asia';
 else if region = 'WPR' then region_name= 'Western Pacific';
 run;
 
 proc freq data=work.demographics; 
  tables region_name; 
run;

region_name,Frequency,Percent,Cumulative Frequency,Cumulative Percent
Africa,46,23.35,46,23.35
Americas,35,17.77,81,41.12
Eastern Mediterranean,21,10.66,102,51.78
Europe,55,27.92,157,79.7
South-East Asia,11,5.58,168,85.28
Western Pacific,29,14.72,197,100.0
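
The LENGTH statement in the step above matters because SAS fixes the length of a character variable at compile time, from the first place the variable appears. The following is a minimal sketch of what would likely happen without it (the data set name work.no_length is made up for illustration): the first assignment, 'Africa', sets the length to 6, so longer labels are truncated.

```sas
/* Illustration only: work.no_length is a hypothetical data set name */
data work.no_length;
  set sashelp.demographics(keep=region);
  /* No LENGTH statement: the first assignment ('Africa') fixes the length at 6 */
  if region = 'AFR' then region_name = 'Africa';
  else if region = 'EMR' then region_name = 'Eastern Mediterranean';
run;

/* PROC CONTENTS reports the stored length of region_name */
proc contents data=work.no_length;
run;

/* The longer label should appear truncated to six characters */
proc freq data=work.no_length;
  tables region_name;
run;
```

Placing the LENGTH statement before the first assignment, as in the original step, avoids the truncation.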


### Creating a New Variable by Recoding Ranges of Values

In [21]:
options nocenter nodate nonumber;
data Heart;
 length AgeAtDeath_group $25;
  set sashelp.heart;

  /*Character-type categorical variables using an assignment statement;
  IF-THEN/Else Statements are used to conditionally assign values to variables.*/

  if 36<=AgeAtDeath<=49 then AgeAtDeath_group='36-49 Years';
  else if 50<=AgeAtDeath<=64 then AgeAtDeath_group= '50-64 Years';
  else if 65<=AgeAtDeath<=79 then AgeAtDeath_group= '65-79 Years';
  else if 80<=AgeAtDeath<=94 then AgeAtDeath_group= '80-94 Years';
  else AgeAtDeath_group= ' ';

 title "Age at Death Grouping Created with an Assignment Statement";
 title2 '(IF-THEN/ELSE-IF-THEN in DATA Step)';

proc freq data=Heart; 
 table AgeAtDeath_group; 
run;


AgeAtDeath_group,Frequency,Percent,Cumulative Frequency,Cumulative Percent
36-49 Years,49,2.46,49,2.46
50-64 Years,522,26.22,571,28.68
65-79 Years,970,48.72,1541,77.40
80-94 Years,450,22.60,1991,100.00
Frequency Missing = 3218


### Creating a Formatted Character Variable by Using the PUT Function

In [22]:
 proc format ;
 value agefmt low-49 = '36-49 Years'
              50-64 = '50-64 Years'
              65-79 = '65-79 Years'
              80-94 = '80-94 Years' ;
  data Heart;
  length AgeAtDeath_group $25;
  set sashelp.heart; 
  *Character-type categorical variables using a PUT function;
  if ageatdeath ne . then ageatdeath_group=put(ageatdeath, agefmt.);
 
 title 'Age at Death Grouping Created with an Assignment Statement';
 title2 'and the PUT Function in DATA Step';
 proc freq data=Heart; 
 table  ageatdeath_group ; 
run;
title;


AgeAtDeath_group,Frequency,Percent,Cumulative Frequency,Cumulative Percent
36-49 Years,49,2.46,49,2.46
50-64 Years,522,26.22,571,28.68
65-79 Years,970,48.72,1541,77.40
80-94 Years,450,22.60,1991,100.00
Frequency Missing = 3218


### Creating Multiple Variables with the DO Statement

In [23]:
 data Heart;
  length AgeAtDeath_group AgeAtDeath_group_x $25;
  set sashelp.heart;
   if 36<=ageatdeath <=49 then 
       DO;
        AgeAtDeath_group ='36-49 Years';
        AgeAtDeath_group_x= 'Adults';
       END;
    else if 50<=ageatdeath<=64 then 
       DO;
         AgeAtDeath_group = '50-64 Years';
         AgeAtDeath_group_x = 'Middle-Aged Adults'; 
       END;

    else if ageatdeath>=65 then 
       DO;
         AgeAtDeath_group = '65+ Years';
        AgeAtDeath_group_x ='Older Adults';
       END;
    
 title 'Frequencies of Multiple Variables Created with DO Group';
 proc freq data=Heart;
  tables AgeAtDeath_group AgeAtDeath_group_x; 
run;
title;


AgeAtDeath_group,Frequency,Percent,Cumulative Frequency,Cumulative Percent
36-49 Years,49,2.46,49,2.46
50-64 Years,522,26.22,571,28.68
65+ Years,1420,71.32,1991,100.00
Frequency Missing = 3218

AgeAtDeath_group_x,Frequency,Percent,Cumulative Frequency,Cumulative Percent
Adults,49,2.46,49,2.46
Middle-Aged Adults,522,26.22,571,28.68
Older Adults,1420,71.32,1991,100.00
Frequency Missing = 3218


### Creating Dichotomous Variables Using the IFC/IFN Functions

#### The IFC Function
* applies IF-THEN/ELSE logic in a single function call

* creates a new character variable and normally takes three arguments

    • 1st argument – a logical expression, that is, a condition evaluated as true or false

    • 2nd argument – the character value returned when the condition is true

    • 3rd argument – the character value returned when the condition is false

#### The IFN Function

The IFN function works like the IFC function except that it returns a numeric value. An optional 4th argument specifies the value that SAS returns when the logical expression itself evaluates to missing. Note that a comparison such as 36<=ageatdeath<=64 evaluates to 0 (false), not to missing, when the variable is missing, so in the next example the observations with a missing AgeAtDeath fall into the '65+ Years' group.


In [24]:
proc format;
    value agefmt
       . = 'Unknown'
       1 = '36-64 Years'
       0 = '65+ Years';
 data IFC1_IFN1;
  length agedth_group_IFC1 $11; /* 11 characters so that '36-64 Years' is not truncated */
  set sashelp.heart; 
    agedth_group_IFC1 = IFC(36<=ageatdeath<=64, '36-64 Years', '65+ Years'); 
    agedth_LE64_IFN1 = IFN(36<=ageatdeath<=64, 1, 0);
    agedth_LE64_IFN1_formatted = put(agedth_LE64_IFN1, agefmt.);
   title1 'Ex12_IFC_IFN_Function.sas';
 proc freq; 
 tables agedth_group_IFC1 agedth_LE64_IFN1 agedth_LE64_IFN1_formatted;
 run;
title;



agedth_group_IFC1,Frequency,Percent,Cumulative Frequency,Cumulative Percent
36-64 Years,571,10.96,571,10.96
65+ Years,4638,89.04,5209,100.0

agedth_LE64_IFN1,Frequency,Percent,Cumulative Frequency,Cumulative Percent
0,4638,89.04,4638,89.04
1,571,10.96,5209,100.0

agedth_LE64_IFN1_formatted,Frequency,Percent,Cumulative Frequency,Cumulative Percent
36-64 Years,571,10.96,571,10.96
65+ Years,4638,89.04,5209,100.0


## The IFN Function with a Fourth Argument - Creating a New Variable

In [25]:
*Ex13_IFN_Fourth_Argument.sas;
DATA Work.Ifn_Func;
INPUT property_value;
property_tax = ifn(property_value GE 150000,
               property_value*.02,
             property_value*.015, .);
format property_value property_tax dollar8.;
datalines;
150000
. 
250000
100000
;
title1 'Ex13_IFN_Fourth_Argument.sas';
proc print data=Work.Ifn_Func; 
run;
title1;

Obs,property_value,property_tax
1,"$150,000","$3,000"
2,.,.
3,"$250,000","$5,000"
4,"$100,000","$1,500"
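
In the example above, the comparison property_value GE 150000 evaluates to 0 (false) when property_value is missing, so the missing tax in row 2 most likely comes from multiplying a missing value by .015 rather than from the 4th argument. The following is a minimal sketch of a case where the 4th argument should actually be returned: the first argument is a numeric variable that can itself be missing. The data set name work.ifn_missing, the variable flag, and the code -9 are hypothetical choices for illustration.

```sas
/* Illustration only: work.ifn_missing, flag, and -9 are hypothetical */
data work.ifn_missing;
  input flag;
  /* flag is used directly as the logical expression:       */
  /* nonzero -> 2nd argument, 0 -> 3rd argument,            */
  /* missing -> 4th argument                                */
  result = ifn(flag, 1, 0, -9);
  datalines;
1
0
.
;

proc print data=work.ifn_missing noobs;
run;
```

If the 4th argument were omitted, a missing flag would be treated as false and the 3rd argument would be returned instead.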


### [Using WHEN Statements in a SELECT Group](https://documentation.sas.com/?docsetId=ds2ref&docsetTarget=n01kskkbawu6isn1vee8uw50jzvu.htm&docsetVersion=9.4&locale=en)
 
* Evaluating the when-expression When a select-expression Is Included
* Evaluating the when-expression When a select-expression Is Not Included
* Evaluating the when-expression When a statement-list Is Not Included
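
The forms listed above can be sketched in the DATA step as follows. This is an illustrative sketch rather than code from the course files; the output data set names work.demo_select and work.heart_select are made up, and the recodes simply mirror the IF-THEN/ELSE examples earlier in this notebook.

```sas
/* Form 1: a select-expression is included;                  */
/* each WHEN value is compared with the value of region      */
data work.demo_select;
  set sashelp.demographics(keep=region);
  length region_name $ 22;
  select (region);
    when ('AFR')  region_name = 'Africa';
    when ('AMR')  region_name = 'Americas';
    when ('EUR')  region_name = 'Europe';
    when ('EMR')  region_name = 'Eastern Mediterranean';
    when ('SEAR') region_name = 'South-East Asia';
    when ('WPR')  region_name = 'Western Pacific';
    otherwise     region_name = ' ';
  end;
run;

/* Form 2: no select-expression;                             */
/* each WHEN holds a complete condition                      */
data work.heart_select;
  set sashelp.heart(keep=AgeAtDeath);
  length AgeAtDeath_group $ 25;
  select;
    when (36 <= AgeAtDeath <= 49) AgeAtDeath_group = '36-49 Years';
    when (50 <= AgeAtDeath <= 64) AgeAtDeath_group = '50-64 Years';
    when (AgeAtDeath >= 65)       AgeAtDeath_group = '65+ Years';
    otherwise;  /* no statement list: nothing is assigned for missing ages */
  end;
run;

proc freq data=work.heart_select;
  tables AgeAtDeath_group;
run;
```

Including an OTHERWISE statement, even an empty one, is a common safeguard: without it, SAS stops with an error when no WHEN condition is met.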


### Creating a New Column Using a CASE Expression in PROC SQL

In [26]:
*Ex21_create_variable_in_SQL.sas (Part 1);
options nocenter nodate nonumber;
title1 'Ex21_create_variable_in_SQL.sas (Part 1)';
PROC SQL;
SELECT 
  CASE
  WHEN weight <100 THEN '<100 lbs'
  WHEN weight GE 100 AND weight LT 120 THEN '100-<120 lbs'
  WHEN weight GE 120 AND weight LE 150 THEN '120-150 lbs'
  ELSE '>150 lbs'
  END AS Weight_Cat label= 'Weight Category',
  count(*) as freq_count
FROM sashelp.class
group by Weight_Cat
order by  freq_count desc;
quit;

Weight Category,freq_count
<100 lbs,10
100-<120 lbs,6
120-150 lbs,3


### Creating an Accumulator Variable Using the RETAIN Statement 
#### The RETAIN statement
* retains the value of the variable in the PDV across iterations of the DATA step
* initializes the retained variable to missing or to a specified value before the first iteration of the DATA step
* is a compile-time statement

#### The RETAIN statement has no effect on variables that are read with SET, MERGE, or UPDATE statements. Variables read from SAS data sets are retained automatically.


In [27]:
*Ex22_Retain_Sum_Statement.sas (Part 2);
DATA temp1 ;
   RETAIN Total_sales 0;
   FORMAT Sales Total_sales dollar8.;
   INPUT month sales;
    Total_sales= sum(Total_sales, sales);
   DATALINES;
   1 4000
   2 5000
   3 . 
   4 5500 
   5 5000 
   ;
title1 'Ex22_Retain_Sum_Statement.sas (Part 2)';
title2 'RETAIN Statement';
PROC PRINT data=temp1; 
  VAR month sales Total_sales;
run;

Obs,month,Sales,Total_sales
1,1,"$4,000","$4,000"
2,2,"$5,000","$9,000"
3,3,.,"$9,000"
4,4,"$5,500","$14,500"
5,5,"$5,000","$19,500"
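
To see why the RETAIN statement is needed here, the following sketch runs the same step without it (the data set name work.no_retain is made up for illustration). Because Total_sales is created by an assignment statement, SAS resets it to missing at the start of every iteration, so the "total" should simply equal the current month's sales.

```sas
/* Illustration only: same data as above, but without RETAIN */
data work.no_retain;
  input month sales;
  /* Total_sales starts each iteration as missing,           */
  /* so sum(., sales) returns just the current sales value   */
  Total_sales = sum(Total_sales, sales);
  format sales Total_sales dollar8.;
  datalines;
1 4000
2 5000
3 .
4 5500
5 5000
;

proc print data=work.no_retain;
  var month sales Total_sales;
run;
```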


### Creating an Accumulator Variable Using the SUM Statement (as an alternative to the RETAIN statement)
#### The SUM Statement
* creates (by default) an accumulator variable that is automatically set to 0 before the first observation is read. The variable's value is retained from one iteration to the next, as if it had appeared in a RETAIN statement. The sum statement is equivalent to combining the SUM function with a RETAIN statement that initializes the variable to 0.
* has an expression that is evaluated, and the result is added to the accumulator variable 
* automatically retains the variable across iterations of the DATA step
* ignores missing values in the expression


In [28]:
*Ex22_Retain_Sum_Statement.sas (Part 1);
DATA temp ;
  INPUT month sales;
      Total_sales+sales;
 FORMAT Sales Total_sales dollar8.;
  DATALINES;
    1 4000 
    2 5000
    3 . 
    4 5500 
    5 5000 
    ;
title1 'Ex22_Retain_Sum_Statement.sas (Part 1)';
title2 'SUM Statement';
PROC PRINT noobs; run;

month,sales,Total_sales
1,"$4,000","$4,000"
2,"$5,000","$9,000"
3,.,"$9,000"
4,"$5,500","$14,500"
5,"$5,000","$19,500"
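
The point that the sum statement ignores missing values can be contrasted with the + operator. In the sketch below (the data set name work.plus_operator is hypothetical), the accumulator is retained, but the + operator propagates the missing sales value in month 3, so the total should be missing from month 3 onward.

```sas
/* Illustration only: RETAIN with the + operator instead of a sum statement */
data work.plus_operator;
  retain Total_sales 0;
  input month sales;
  /* The + operator returns missing when either operand is missing, */
  /* and the retained missing value carries into later iterations   */
  Total_sales = Total_sales + sales;
  format sales Total_sales dollar8.;
  datalines;
1 4000
2 5000
3 .
4 5500
5 5000
;

proc print data=work.plus_operator noobs;
run;
```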


### Creating an Accumulator Variable Using the SUM Statement Coupled with the RETAIN Statement

* To initialize the sum variable to a value other than zero, include the accumulator variable in a RETAIN statement with an initial value.  

In [29]:
*Ex22_Retain_Sum_Statement.sas (Part 3);
options nocenter nodate nonumber;
DATA temp;
   RETAIN Total_sales 1000;
   INPUT month sales ;
    Total_sales+sales;
   FORMAT Sales Total_sales dollar8.;
   DATALINES;
   1 4000 
   2 5000 
   3 . 
   4 5500
   5 5000 
   ;
title 'Ex22_Retain_Sum_Statement.sas (Part 3)';
title2 'RETAIN and SUM Statements';
PROC PRINT noobs; 
 var month sales Total_sales;
RUN;
title;


month,sales,Total_sales
1,"$4,000","$5,000"
2,"$5,000","$10,000"
3,.,"$10,000"
4,"$5,500","$15,500"
5,"$5,000","$20,500"


### Creating an Accumulator Variable Using the SUM Statement in DATA Step with BY-Group Processing
#### The SASHELP.CARS data set has 428 rows categorized into 38 makes. How can you count the number of cars for each of these 38 makes?
#### Code Explanation (obtained from "Programming for SAS Viya") 
* The PROC SORT step and the DATA step with BY-group processing calculate the number of cars by MAKE.
* The data set must first be sorted by MAKE to take advantage of BY-group processing in the DATA step. 
* The program uses FIRST.MAKE to reset COUNT to 1 at the first observation of each MAKE.
* The program uses LAST.MAKE in a subsetting IF statement to write only the last observation of each MAKE, which contains the final accumulated COUNT.

[How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS by Rick Wicklin](https://blogs.sas.com/content/iml/2018/02/26/how-to-use-first-variable-and-last-variable-in-a-by-group-analysis-in-sas.html)

[Select a specified number of observations from the top of each BY-Group -SAS Documentation](http://support.sas.com/kb/24/778.html)

In [30]:
*Ex22_Retain_Sum_Statement.sas  (Part 4);
proc sort data = sashelp.cars out=cars; by make; run;
data cars_x;
  set cars;
  count + 1;
  by make;
  if first.make then count = 1;
  if last.make;
run;
title 'Ex22_Retain_Sum_Statement.sas (Part 4)';
title2 'SUM Statement';
proc print data=cars_x;
var make count;
sum count;
run;
title;



Obs,Make,count
1,Acura,7
2,Audi,19
3,BMW,20
4,Buick,9
5,Cadillac,8
6,Chevrolet,27
7,Chrysler,15
8,Dodge,13
9,Ford,23
10,GMC,8

In [None]:
proc format;
value $regionfmt
    'AFR' = 'Africa'
    'AMR' = 'Americas'
    'EUR' = 'Europe'
    'EMR'  ='Eastern Mediterranean'
    'SEAR' = 'South-East Asia'
    'WPR' = 'Western Pacific';
run;

In [None]:
proc sort data=sashelp.demographics out=demographics; 
  by region; run;
data want1(keep= region countries sum_pop);
  set demographics;
  by region;
  if first.region then do;
    sum_pop=pop;
    countries=1;
  end;
  else do;
    sum_pop+pop;
    countries+1;
  end;
  if last.region then output;
 run;
 proc print data=want1; 
 format region $regionfmt.;
 run;

## Creating an Accumulator Variable Containing a Single Value
### (Using the conditional SUM Statement and END = Data Set Option)

#### Task: Count the observations in the SASHELP.CARS data set whose make is Volvo.
(You do this by accumulating the count while reading every observation and writing the result only when the last observation is reached.)

*  Many applications require that you determine when the DATA step processes the last observation in the input data set. For example, you might want to perform calculations only on the last observation in a data set, or you might want to write an observation only after the last observation has been processed. For this purpose, you can use the END= option for the SET, MERGE, MODIFY, or UPDATE statement. 

* The END= option defines a temporary variable whose value is 1 when the DATA step is processing the last observation. At all other times, the value of the variable is 0. Although the DATA step can use the END= variable, SAS does not add it to the resulting data set.

In [31]:
options nocenter nodate nonumber;
DATA Volvo_cars;
set sashelp.cars end=eof;
if make="Volvo" then Volvo_cars+1;
if eof then output;
keep  Volvo_cars;
run;
proc print data=Volvo_cars noobs;
run;


Volvo_cars
12
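
As a cross-check on the DATA step above, the same count can be obtained with PROC SQL. This is only a sketch of an alternative route, not part of the original exercise.

```sas
/* Cross-check: count the Volvo rows directly */
proc sql;
  select count(*) as Volvo_cars
  from sashelp.cars
  where make = 'Volvo';
quit;
```

The DATA step version remains useful when the count has to be produced alongside other row-by-row processing in the same pass through the data.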


## The SUM Statement, END= Data Set Option, and PUT Statement
#### Display the value of the accumulator variable in the Log window

In [32]:
options nocenter nodate nonotes nonumber nosource;
ods html close;
DATA _NULL_;
set sashelp.cars end=eof;
if make="Volvo" then Volvo_cars+1;
if eof then  put Volvo_cars=;
run;



The SAS System

Volvo_cars=12



### The SUM Statement - Printing and Counting Invalid Dates in the Log window
#### Code Explanation

* In the INPUT statement below, the ?? format modifier for the S_DATE variable suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read. [See SAS® Documentation for details]

* The same field has been read twice, once read into a numeric variable and the second time as a character variable.

##### Use a sum statement to accumulate the count of bad dates (i.e., S_date=.) during DATA step execution.

* During the first iteration, write the heading text to the LOG window at the specified column position.

* Write the bad dates from the "character date variable" (i.e., S_date_ch) to the LOG window at the specified column position whenever the "numeric date variable" (i.e., S_date) has a missing value.

* In a PUT statement, a variable name followed by an equal sign (named output) writes the variable name and an equal sign before the value; text within quotes is written as is.

* The END= option defines a temporary variable whose value is 1 when the DATA step is processing the last observation. At all other times, the value of the variable is 0. Although the DATA step can use the END= variable, SAS does not add it to the resulting data set.

In [1]:
options nodate nonumber nonotes nosource;
ods html close;
DATA _NULL_;
infile 'C:/users/pmuhuri/SASCourse/Week3/Week3Data/Sample_data.txt' END=lastobs;
input Name $ 1-6 
      @8 s_date ?? yymmdd8.
      @8 s_date_ch $8.;
if s_date = . then invalid_dates+1; 
/* Write the title and the column headers once, before any data line */
if _n_=1 then do;
  put @1 'List of records with invalid dates'; 
  put @1 'Name' @8 'S_DATE' @16 'S_DATE_CH';
end;
if s_date = . then put name @8 s_date @16 s_date_ch;

if lastobs then put @1  'Number of invalid dates = ' invalid_dates;
run;


List of records with invalid dates
Name   S_DATE  S_DATE_CH
James  .       20090229
Rose   .       20100229
Stuart .       20110229
Liton  .       20130229
Lan    .       20110300
Number of invalid dates = 5
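
Because the step above reads an external file, here is a self-contained sketch of the same idea using in-stream data. It uses the ?? modifier inside the INPUT function rather than in the INPUT statement, and it counts in a second step so that END= can be used on a SET statement. The data set name work.dates and the row for "Anna" are hypothetical; the other two rows are taken from the log output above.

```sas
/* Illustration only: work.dates and the Anna row are hypothetical */
data work.dates;
  input name $ 1-6 @8 s_date_ch $8.;
  /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0 */
  s_date = input(s_date_ch, ?? yymmdd8.);
  format s_date date9.;
  datalines;
Anna   20090228
James  20090229
Lan    20110300
;

data _null_;
  set work.dates end=lastobs;
  if s_date = . then do;
    invalid_dates + 1;           /* sum statement: retained, starts at 0 */
    put name @8 s_date_ch;       /* list the offending record in the log */
  end;
  if lastobs then put 'Number of invalid dates = ' invalid_dates;
run;
```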



### Creating a New Variable Using the "In Operator" in the IF/IF-ELSE Statement

In [34]:
/*Creating a New Variable Using the In Operator vs. In: Operator*/
*Ex24_In_Operator_Two_Parens.sas (Part 1);
options nocenter nodate nonumber;
data work.have1;
length diag $ 12;
input icd9code $ @@ ;
if icd9code in ("250", "3572", "3620",
    "648", "36641", "4280") then diag= 'Diabetes';
else if ("29620" <=:icd9code <="29625") |
        ("29630" <=:icd9code <="29635") |
        icd9code in ("2980", "3004", "3091", "311") 
      then diag = 'Depression';
else if icd9code in ("4912", "4932", "496", "5064")
        then diag = 'COPD';
else if icd9code = "493" then diag= 'Asthma'; 
datalines;
250 3572 3620 648 36641 4280 
29620 29621 29623 29624 29625
29630 29631 29632 29633 29634 29635
2980 3004 3091 311
4912 4932 496 5064 
493
;
title1 'Frequency of variable created using the IN operator';
proc freq data=work.Have1;
 tables diag /nopercent;
run;

diag,Frequency,Cumulative Frequency
Asthma,1,1
COPD,4,5
Depression,15,20
Diabetes,6,26
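
The colon modifier used in the range comparisons above makes SAS compare only as many characters as the shorter operand has. The sketch below shows the modifier with the = operator and with the IN operator; the data set name work.prefix_demo and the variables starts_296 and in_prefix are made up for illustration.

```sas
/* Illustration only: prefix comparisons with the colon modifier */
data work.prefix_demo;
  input icd9code $ @@;
  /* =: compares only the first 3 characters of icd9code with '296' */
  starts_296 = (icd9code =: '296');
  /* IN: applies the same prefix comparison to each value in the list */
  in_prefix  = (icd9code in: ('2962', '2963'));
  datalines;
29620 2980 29635 311 296
;

proc print data=work.prefix_demo noobs;
run;
```

With a prefix list, ranges of codes such as 29620 through 29625 can often be written more compactly than with a pair of inequality comparisons, at the cost of also matching any other code that starts with the same characters.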


### Creating a Dummy Variable Using the "In Operator"

In [35]:
/*Creating a 1/0 Dummy Variable Using the In Operator and 
  Outer and Inner Parentheses */

*Ex24_In_Operator_Two_Parens.sas (Part 2);
data work.have2;
input icd9code @@ ;
Diag = (icd9code in (250, 3572, 3620, 648, 36641, 4280));
datalines;
250 3572 3620 648 36641 4280 
29620 29621 29623 29624 29625
29630 29631 29632 29633 29634 29635
2980 3004 3091 311
4912 4932 496 5064 
493
;
title1 'Frequency of variable created using the In Operator and Outer and Inner Parentheses ';
proc freq data=work.Have2;
 tables diag/nopercent;
run;
title1; 


Diag,Frequency,Cumulative Frequency
0,20,20
1,6,26


### Check whether you have permission to call R from the SAS system 

In [36]:
options nocenter nodate nonumber nosource;
ods _ALL_ close;
ods listing close;
proc options option=RLANG;
run;
ods listing;


The SAS System

    SAS (r) Proprietary Software Release 9.4  TS1M7

 RLANG             Enables SAS to execute R language statements.



In [5]:
*Ex46_Create_Newcars_SAS_R.sas;
data class_in_SAS;
  set sashelp.class;
  bmi = (weight / (height*height) ) * 703;
  run;
proc print data=class_in_SAS (obs=3) noobs; run;


Name,Sex,Age,Height,Weight,bmi
Alfred,M,14,69.0,112.5,16.6115
Alice,F,13,56.5,84.0,18.4986
Barbara,F,13,65.3,98.0,16.1568


In [6]:
proc iml;
call ExportDataSetToR("work.class_in_SAS", "class_r"); * work.class_in_SAS created earlier;
submit / R;
names(class_r) <- tolower(names(class_r))
str(class_r)
setwd("C:/Users/pmuhuri/SASCourse/Week3/Week3Data")
save(class_r, file = 'class_r.Rdata')
endsubmit;
quit;


### Manipulating data in R within PROC IML
* Use load() to load the R data into memory
* Use mutate() in R tidyverse-dplyr

In [8]:
PROC IML;
SUBMIT / R;
library("tidyverse")
setwd("C:/Users/pmuhuri/SASCourse/Week3/Week3Data")
load("class_r.Rdata")
class <- class_r

class$sex <- factor(class$sex, levels=c('M', 'F'),
                                 labels=c('male', 'female') 
                   )
# Assign the result: mutate() returns a new data frame and does not modify class in place
class <- class %>%
  mutate(
         bmi = (weight / (height*height) ) * 703
        ) 
head(class)
ENDSUBMIT;
QUIT;
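
To bring a data frame that was modified in R back into SAS, SAS/IML provides the ImportDataSetFromR subroutine. The following is a minimal sketch that assumes the RLANG option is enabled and R is installed; the output data set name work.class_from_r is made up for illustration.

```sas
/* Illustration only: round trip SAS -> R -> SAS within one PROC IML session */
proc iml;
  /* Send the SAS data set to R as the data frame class_r */
  call ExportDataSetToR("sashelp.class", "class_r");

  submit / R;
    # Add a BMI column in base R (column names keep their SAS spelling)
    class_r$bmi <- (class_r$Weight / class_r$Height^2) * 703
  endsubmit;

  /* Copy the modified R data frame back into a SAS data set */
  call ImportDataSetFromR("work.class_from_r", "class_r");
quit;

proc print data=work.class_from_r(obs=3) noobs;
run;
```

Keeping the export, the R code, and the import in one PROC IML session avoids the intermediate .Rdata file, although saving to disk as in the cells above is still useful when the R object is needed in a later session.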


In [1]:
PROC IML;
SUBMIT / R;
setwd ("C:/Users/pmuhuri/SASCourse/Week3/SAS_Codes")
list.files(pattern="SAS$", 
           full.names = TRUE, 
           ignore.case = TRUE)
ENDSUBMIT;
QUIT;

In [1]:
PROC IML;
SUBMIT / R;
setwd ("C:/Users/pmuhuri/SASCourse/Week3/SAS_Codes")
Sys.glob("*.sas")  # returns a sorted list of files
ENDSUBMIT;
QUIT;