### GWU STAT 4197/STAT 6197
#### Week 4, Part 2: Do Loops and Arrays (Code Examples)

###### (Some of the descriptions of the code were obtained from Online SAS Documentation)

### Do Loops in SAS

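* Do i = 1 to 10 by 1 - iterates from the start value to the stop value by the increment
* Do i = 1 to 10 Until (Condition) - repeats until the condition becomes true; the condition is tested at the bottom of the loop, so the body runs at least once
* Do i = 1 to 10 While (Condition) - repeats while the condition is true; the condition is tested at the top of the loop, so the body may never run

#### All three types of loop require an END statement at the end of the block of code.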
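The combined forms listed above (an index variable together with an UNTIL or WHILE clause) can be sketched as follows. This is a minimal illustration rather than part of the course examples; the variable name `total` and the threshold of 6 are assumptions chosen for the demonstration.

```sas
data _null_;
  /* Iterative DO with UNTIL: the condition is checked after each pass */
  total = 0;
  do i = 1 to 10 until (total >= 6);
    total = total + i;
    put 'UNTIL: ' i= total=;
  end;

  /* Iterative DO with WHILE: the condition is checked before each pass */
  total = 0;
  do i = 1 to 10 while (total < 6);
    total = total + i;
    put 'WHILE: ' i= total=;
  end;
run;
```

Each loop makes three passes (total reaches 1, 3, then 6) and stops well before the index reaches 10, because the extra condition ends the loop early.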

[Beginner’s Guide to Using ARRAYs and DO Loops - Jennifer L. Waller (SAS Global Forum, 2020)](https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4419-2020.pdf)

[Wright, Wendi. Loop-Do-Loop Around Arrays, NESUG 18.](https://www.lexjansen.com/nesug/nesug05/pm/pm8.pdf)

[DeFoor, Jimmy. Using Do Statements, Links, and Arrays. 2008. SAS Global Forum](https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/179-2008.pdf)

[Make Your DO Loop More Efficient - G. Liu](https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/5001-2020.pdf)

In [5]:
*Ex27_Do_Loops_Different_Flavors.sas (Part 2);
* Assess the Effect of having  an OUTPUT statement before the END statement;
options nocenter nodate nonumber;
data have2;
 do i=1 to 5;
    x = i + _N_;
 output;
end;
run;
title 'Iterative Do Group with an Index Variable -  an OUTPUT statement before the END statement';
proc print data=have2 noobs;
run;
title;

i,x
1,2
2,3
3,4
4,5
5,6


In [1]:
*https://communities.sas.com;
* By mikezdeb;

data ci (keep=period principal);
retain initial 10000 rate 0.10;
do period=1 to 10;
  principal = compound(initial, . , rate, period);
  *output;
end;
format principal dollar10.;
run;
proc print data=ci; run;

Obs,period,principal
1,11,"$25,937"


#### Do Until statements evaluate the condition at the bottom of the loop.

In [12]:
options nocenter nodate nonumber;
data have;
x = 0;
i = 1;
 do until (i ge 6);
    x = x + _N_;
    i = i + _N_;
 output;
end;
run;
title 'Do Until -  an OUTPUT statement before the END statement';
proc print data=have noobs;
run;
title;

x,i
1,2
2,3
3,4
4,5
5,6


#### Do While statements evaluate the condition at the top of the loop.

In [11]:
options nocenter nodate nonumber;
data have;
x = 0;
i = 1;
 do while (i le 5);
    x = x + _N_;
    i = i + _N_;
 output;
end;
run;
title 'Do While -  an OUTPUT statement before the END statement';
proc print data=have noobs;
run;
title;

x,i
1,2
2,3
3,4
4,5
5,6
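
The two examples above produce identical output because both conditions allow five passes. The difference in where the condition is tested only shows up when the stopping condition already holds before the loop starts. A minimal sketch, with the hypothetical counters `n_until` and `n_while` added purely for illustration:

```sas
data _null_;
  i = 10;                 /* the stopping condition already holds */

  n_until = 0;
  do until (i ge 6);      /* tested after the body: the body runs once */
    n_until = n_until + 1;
  end;

  n_while = 0;
  do while (i le 5);      /* tested before the body: the body never runs */
    n_while = n_while + 1;
  end;

  put n_until= n_while=;  /* writes n_until=1 n_while=0 to the log */
run;
```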


## Simulating Data Using Do Loop

In [3]:
*Ex25_date_group.sas (Part 1);
data work.Have (drop=rownum date_a date_b);
  call streaminit(123);
  date_a="01JAN2016"d;
  date_b="31DEC2016"d;
  do rownum = 1 to 100;
    do gender = "Female", "Male";
      Date_p=date_a + floorz((date_b-date_a) * rand("uniform"));
      expense = floorz(1000*rand("Uniform"));
      format date_p date9.;
      output;
    end;
  end;
run;
proc print data=work.Have (obs=5); run;

Obs,gender,Date_p,expense
1,Female,31JUL2016,35
2,Male,29JAN2016,387
3,Female,30APR2016,361
4,Male,03MAY2016,169
5,Female,21JAN2016,79


### The Array Statement

#### An array references a group of variables that are processed in the same way in a DATA step. Arrays are useful for many tasks, including the following:
* changing values of a set of variables
* creating a set of new variables based on the existing set of variables
* creating new variables
* creating multiple observations from a single observation
* creating a single observation from multiple observations
* reading variable length data records
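
One of the tasks listed above, creating multiple observations from a single observation, can be sketched as follows. The data set names `work.wide` and `work.long`, the variables `score1-score3`, and the data values are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical input: one observation per student, three scores each */
data work.wide;
  input id $ score1-score3;
  datalines;
A 90 85 77
B 60 72 88
;
run;

/* Write one observation per student per test */
data work.long (keep=id test score);
  set work.wide;
  array s{*} score1-score3;
  do test = 1 to dim(s);
    score = s{test};   /* copy the current array element */
    output;            /* one output row per element */
  end;
run;

proc print data=work.long noobs;
run;
```

The two input observations become six output observations, because the OUTPUT statement sits inside the loop over the array elements.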

The ARRAY statement can include the following components:
* array name
* (n) - the number of array elements (optional in some situations)
* $ - indicates that the array elements are character variables
* length of each array element (optional in some situations)
* array elements - \_temporary_, SAS variable names, or a variable list
* initial values for the array elements

SAS arrays must contain either numeric or character variables but not both.

Reference: [Droogendyk, Harry. Arrays – Data Step Efficiency. 2013. SAS Global Forum Paper](http://support.sas.com/resources/papers/proceedings13/519-2013.pdf)
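
The ARRAY statement components described above can be seen together in a short sketch. All array and variable names here (`q`, `nm`, `w`, `cut`, `all`, `fname`, `lname`) are hypothetical and serve only to illustrate the syntax.

```sas
data _null_;
  /* name, explicit size, and a variable list */
  array q{3} q1-q3;

  /* character array: $ followed by the length of each element */
  array nm{2} $ 8 fname lname;

  /* initial values in parentheses */
  array w{3} w1-w3 (0.2 0.3 0.5);

  /* temporary elements: no variables are created for them */
  array cut{2} _temporary_ (50 80);

  /* {*}: SAS counts the listed variables to set the size */
  array all{*} q1-q3 w1-w3;

  n = dim(all);
  put n=;   /* writes n=6 to the log */
run;
```

Note that `all` holds only numeric variables and `nm` only character variables, in line with the rule that an array cannot mix the two types.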


### Implicit Do Loop Using Do Over

In [7]:
*Ex21_Arrays3.sas (Part 1);
*Convert inches to millimeters;
*Method 1 (Implicit Array);
data work.Implicit_array;
 infile datalines;
 input city  $10. inch_T1-inch_T4;
array  yinch  inch_T1-inch_T4;
array ymm ymm_T1-ymm_T4;
do over  yinch;
  ymm = round((yinch/0.03937007874),.1);
end;
datalines;
Sacramento   3.73 2.87 2.57 1.16
Miami        2.01 2.08 2.39 2.85
Albany       2.36 2.27 2.93 2.99
;
Title1 'Method 1 (Implicit Array)';
proc print data=Implicit_array noobs; 
 var city  ymm_T1-ymm_T4;
run;

city,ymm_T1,ymm_T2,ymm_T3,ymm_T4
Sacramento,94.7,72.9,65.3,29.5
Miami,51.1,52.8,60.7,72.4
Albany,59.9,57.7,74.4,75.9


### Explicit Do Loop (Using Index Variable)

In [8]:
*Ex21_Arrays3.sas (Part 2);
*Method 2 (Explicit Array);
data work.Explicit_array;
 infile datalines;
 input city  $10. inch_T1-inch_T4;
array inch {4} inch_T1-inch_T4;
array mm {4} mm_T1-mm_T4;
do i = 1 to 4;
  mm{i} = round((inch{i}/0.03937007874),.1);
end;
datalines;
Sacramento   3.73 2.87 2.57 1.16
Miami      2.01 2.08 2.39 2.85
Albany        2.36 2.27 2.93 2.99
;
title1 'Method 2 (Explicit Array)';
proc print data=work.Explicit_array noobs; 
 var  city mm_T1-mm_T4; 
run;

city,mm_T1,mm_T2,mm_T3,mm_T4
Sacramento,94.7,72.9,65.3,29.5
Miami,51.1,52.8,60.7,72.4
Albany,59.9,57.7,74.4,75.9


#### The DIM function in the DO statement returns the number of elements in the array, so the loop below does not need a hard-coded upper bound.

In [9]:
*Ex21_Arrays3.sas (Part 3);
*Method 3 - DIM FUNCTION;
data work.Dim_function;
 infile datalines;
 input city  $10. inch_T1-inch_T4;
array zinch {*} inch_T1-inch_T4;
array zmm  {*} zmm_T1-zmm_T4;
    do i = 1 to dim(zinch);
  zmm{i} = round((zinch{i}/0.03937007874),.1);
  end;
datalines;
Sacramento  3.73 2.87 2.57 1.16
Miami       2.01 2.08 2.39 2.85
Albany        2.36 2.27 2.93 2.99
;
title1 'Method 3 - Explicit Arrays with the DIM Function';
proc print data=work.Dim_function noobs; 
 var  city zmm_T1-zmm_T4;
run;

city,zmm_T1,zmm_T2,zmm_T3,zmm_T4
Sacramento,94.7,72.9,65.3,29.5
Miami,51.1,52.8,60.7,72.4
Albany,59.9,57.7,74.4,75.9


In [10]:
*Ex21_Arrays3.sas (Part 4);
data work.score_to_grade;
 infile datalines;
 input id :$2. mtm1 mtm2 final ;
array exam {3} mtm1 mtm2 final ;
array c_exam {3} $2 c_mtm1 c_mtm2 c_final ;
do i = 1 to 3;
   /* Bounds are contiguous so that no score falls between two grade bands */
   if exam{i} >= 3.80 then c_exam{i} = 'A';
   else if 3.50 <= exam{i} < 3.80 then c_exam{i} = 'B+';
   else if 3.00 <= exam{i} < 3.50 then c_exam{i} = 'B';
   else if exam{i} < 3.00 then c_exam{i} = 'C';
end;
datalines;
G1 3.73 2.87 4.00
G2 3.80 3.53 2.73
G3 3.31 3.93 3.83
G4 2.92 3.08 2.39
G5 2.36 2.27 1.93
;
title1 'Creating multiple new variables with ARRAY statements';
proc print data=work.score_to_grade noobs;
 var  mtm1 mtm2 final c_mtm1 c_mtm2 c_final; 
run;

mtm1,mtm2,final,c_mtm1,c_mtm2,c_final
3.73,2.87,4.0,B+,C,A
3.8,3.53,2.73,A,B+,C
3.31,3.93,3.83,B,A,A
2.92,3.08,2.39,C,B,C
2.36,2.27,1.93,C,C,C


## Simulating data using the following:
* CALL STREAMINIT
* Array statement
* CATS, SUBSTR, COMPRESS, UUIDGEN and RAND Functions

In [12]:
*** Ex23_array_call_sortn.sas (Part 1);
*** Simulating Students' Scores;
options nocenter nonumber nodate;
data have(drop= i j);
retain id;
call streaminit(123);
array x[*] TEST1-TEST5 ASSIGNMENT1 ASSIGNMENT2
               MIDTERM FINAL;
do i = 1 to 12;
   /* Generate the ID once per observation rather than once per array element */
   id = cats('GW',substr(compress(uuidgen(),'-'),1,6));
   do j = 1 to dim(x);
      x[j] = rand("Integer", 40, 100);
   end;
  output;
 end;
run;
title1 'Listing from a simulated Data Set';
proc print data=HAVE noobs; run;


id,TEST1,TEST2,TEST3,TEST4,TEST5,ASSIGNMENT1,ASSIGNMENT2,MIDTERM,FINAL
GW8ce87b,75,42,44,63,60,62,60,50,43
GWe43e71,44,96,53,69,99,85,91,60,89
GWe99a41,41,66,95,46,44,48,83,84,96
GW31a3ff,85,71,96,68,96,66,100,78,58
GW9f6bc8,75,96,96,78,73,98,86,70,94
GWff654b,49,84,87,98,61,88,95,95,98
GW16f987,85,43,83,90,88,45,45,44,88
GW18ec72,73,46,58,59,98,74,80,82,98
GW3efeec,80,79,88,51,84,62,48,96,95
GWf29415,97,57,99,72,59,96,97,93,49


## Creating Variable Names Using an Array

In [11]:
*Ex29_Create_Varnames_from_array.sas;
options nocenter nodate nonumber;
data have;
 array var[10] (10*5);
 do i=1 to 3;
   output;
 end;
 drop i;
run;
title 'Create variable names from arrays';
proc print data=have noobs; run;
title;

var1,var2,var3,var4,var5,var6,var7,var8,var9,var10
5,5,5,5,5,5,5,5,5,5
5,5,5,5,5,5,5,5,5,5
5,5,5,5,5,5,5,5,5,5


## Scoring multiple choice questions using arrays

In [20]:
*Ex34_Temporary_Array.sas;
/*Loop-Do-Loop Around Arrays Wendi L. Wright*/
options nocenter nodate nonumber nosource nonotes;
data temp;
 item1 = 'A';
 item2='C';
 item3 ='A';
 item4 = 'B';
 item5 = 'A';

Array Raw {*} item1-item5;
Array Key {5} $ _temporary_ ('B' 'C' 'A' 'B' 'D');
Array Score {5} ;
Do i = 1 to 5;
if raw{i} eq key{i} then score{i}=1;
 else score{i}=0;
End;
TotalCorrect = sum( of score1-score5 ); 
run;
title1 'Scoring multiple-choice questions using ARRAY statements';
proc print data=temp noobs; run;
title1;


item1,item2,item3,item4,item5,Score1,Score2,Score3,Score4,Score5,i,TotalCorrect
A,C,A,B,A,0,1,1,1,0,6,3


#### Iterative Do group with index variable

In [2]:
options nonotes nodate nonumber nosource;
ods html close;
Data _Null_;
array Test (5) x1-x5 (1, 3, 5, 6,7);
Do i = 1 to 5;
 put test(i)=;
end;
run;

#### DO OVER is another form of DO group; it processes every element of an implicitly subscripted array

In [22]:
options nonotes nodate nonumber nosource;
ods html close;
Data _Null_;
array Test x1-x5 (1, 3 ,5 , 6,7);
Do over test;
 put test (_I_)=;
end;
run;

In [24]:
*Ex35_Arrays_to_assign_values (Part 4);
*Contributed by Rick Wicklin to SAS-L - 1/5/2016;
options nocenter nodate nonumber nosource nonotes;
data _null_;
array x[3] (1, 2, 3);
putlog _ALL_;
run;

## Count the Instances of a Certain Value from a Group of Variables

In [3]:
*Ex22_array_count_specific_value.sas;
options nocenter nonumber nodate;
data work.Have(drop=i);
  input var1 var2 var3 var4;
  array test(*) var1--var4;
  /* initialize the counters to zero for each observation */
  counter1=0;
  counter2=0;
  /* count the values of 1 and 2 */
  do i = 1 to dim(test);
    if test(i) = 1 then counter1+1;
    else if test(i) = 2 then counter2+1;
  end;
  datalines;
1  0  1  2
2  2  2  1
0  2  1  0
1  1  1  1
2  2  2  2
;
run;
title1 'Count the instances of a certain value'; 
proc print data=work.Have noobs;
run;
title1;

var1,var2,var3,var4,counter1,counter2
1,0,1,2,2,1
2,2,2,1,1,3
0,2,1,0,1,1
1,1,1,1,4,0
2,2,2,2,0,4
