# Paper details

Title: A Beginner’s Guide to Using ARRAYs and DO Loops

Author: Jennifer L. Waller, Augusta University, Augusta, GA

number: 4419-2020

SAS® Global Forum 2020, Paper 4419-2020

# Abstract

If you are copying and pasting code over and over to perform the same operation on multiple variables in a SAS® DATA step, you need to learn about arrays and DO loops. Arrays and DO loops are efficient and powerful data manipulation tools that you should have in your programmer’s toolbox. An array lists the variables on which you want to perform the same operation and can be specified with or without the number of elements (variables) in the array. DO loops are used to apply the operation across the elements of the array. This workshop will show you how to create an array and use DO loops to perform operations on the elements of an array, create new variables, and change a short, wide data structure into a long, skinny one.


# Introduction

Data preparation can take up the majority of the time dedicated to a statistical analysis for a consulting project. Rather than checking that statistical assumptions are met, running the procedures that actually analyze the data, and examining the results, much of the time on a project is spent preparing the data for analysis. Often, when preparing a data set for analysis, the raw data needs to be manipulated in some way; for example, new variables need to be created, specific questionnaire items need to be reversed, and/or scores need to be calculated. The list can go on and on. What makes preparing a data set tedious is that the same operation often needs to be performed on a long list of variables (e.g., questionnaire items). A beginning SAS® programmer will most likely write the necessary SAS code by copying and pasting the same code over and over for each variable and then changing the variable name. For example, if there is a 100-item questionnaire and 10 items need to be reversed, reversing them takes a minimum of 10 lines of code, one line for each item. If more items need manipulation, copying, pasting, and changing variable names becomes a time sink for the programmer/analyst and results in a less efficient program. One way to overcome this inefficient use of time, manpower, and computer processing is to use SAS ARRAYs and DO loops.


# SAS Arrays

A SAS ARRAY is a set of variables of the same type, called the “elements” of the array, that
you want to perform the same operation on. An array name is assigned to the set of
variables and then the array name is referenced in later DATA step programming, usually a
DO loop, to do an operation on the entire set of variables in the array.

Arrays can be used to do all sorts of things. To list just a few, an array can be used to:

1. Set up a list of items of a questionnaire that need to be reversed.
2. Change values of several variables, e.g. change a value of “Not Applicable” to
    missing for score calculation purposes.
3. Create a set of new variables from an existing set of variables, e.g. dichotomizing
    ordinal or continuous variables.


For example, assume we have collected data on the Center for Epidemiologic Studies
Depression (CES-D) scale, which is a 20-item questionnaire used to assess depressive
symptomatology. Each questionnaire item is measured on an ordinal 0 to 3 scale. An
overall CES-D score needs to be calculated and consists of the sum of the 20 questionnaire
items. However, 4 questionnaire items were asked such that the responses to the items
need to be reversed; that is, 0 needs to become a 3, 1 needs to become a 2, 2 needs to
become a 1, and 3 needs to become a 0. The four items that need to be reversed are items
cesd4, cesd8, cesd12, and cesd16. An example of the data is given in Figure 1.

```
Obs    ID  CESD1  CESD2  CESD3  CESD4  CESD5  CESD6  CESD7  CESD8  CESD9  CESD10
  1  1101      2      3      2      .      3      2      2      3      3       2
  2  1102      0      2      3      0      2      2      2      1      0       0
  3  1103      3      0      2      3      2      1      2      3      1       2
  4  1104      1      0      0      2      3      3      2      3      3       2
  5  1105      3      2      2      .      3      .      3      3      .       2

Obs  CESD11  CESD12  CESD13  CESD14  CESD15  CESD16  CESD17  CESD18  CESD19  CESD20
  1       1       3       3       2       3       3       0       1       3       0
  2       2       2       2       3       2       3       3       2       1       1
  3       1       3       2       2       3       3       1       1       0       2
  4       1       2       2       2       0       3       2       2       2       2
  5       2       3       3       3       3       3       0       0       2       0
```
Figure 1: Raw CES-D Data

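You might use the following SAS code to reverse the four items, resulting in the output in
Figure 2.

```
data cesd;
    set in.cesd1;
    /* Reverse each 0-3 item: 0 becomes 3, 1 becomes 2, 2 becomes 1, 3 becomes 0 */
    cesd4 = 3 - cesd4;
    cesd8 = 3 - cesd8;
    cesd12 = 3 - cesd12;
    cesd16 = 3 - cesd16;
run;
```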
```
Obs    ID  CESD1  CESD2  CESD3  CESD4  CESD5  CESD6  CESD7  CESD8  CESD9  CESD10  CESD11
  1  1101      2      3      2      .      3      2      2      0      3       2       1
  2  1102      0      2      3      3      2      2      2      2      0       0       2
  3  1103      3      0      2      0      2      1      2      0      1       2       1
  4  1104      1      0      0      1      3      3      2      0      3       2       1
  5  1105      3      2      2      .      3      .      3      0      .       2       2

Obs  CESD12  CESD13  CESD14  CESD15  CESD16  CESD17  CESD18  CESD19  CESD20
  1       0       3       2       3       0       0       1       3       0
  2       1       2       3       2       0       3       2       1       1
  3       0       2       2       3       0       1       1       0       2
  4       1       2       2       0       0       2       2       2       2
  5       0       3       3       3       0       0       0       2       0
```
Figure 2: CES-D Data with Items 4, 8, 12, and 16 Reversed


Notice that the code to reverse each of the four items is essentially the same with the only
difference being the variable name of the item needing to be reversed. Copying code that
performs the same operation for a small number of variables, in this instance four, is not
that big of a problem. However, what if the same operation had to be performed on 100
variables? It would be very inefficient to copy the code 100 times and change the variable
name in each line of code. There would be an increased likelihood of coding errors.

The solution to overcome the inefficiency is to use a SAS ARRAY with a subsequent DO loop.
We will first define two different types of arrays, the indexed array and a non-indexed array.
Then, we will move on to how to reference these types of arrays with a DO loop to perform
the operation on all the elements of the array.

### INDEXED ARRAY SYNTAX

There are two types of arrays that can be specified in SAS. The first is what I call an
indexed array and the second is a non-indexed array. All arrays are set up and utilized
within a DATA step. The syntax for an indexed array is as follows:

```
ARRAY arrayname {n} [$] [length] list_of_array_elements;
```
where

- `ARRAY` is a SAS keyword that specifies that an array is being defined
- `arrayname` is a valid SAS name that is not a variable name in the data set
- `{n}` specifies the number of elements, or variables, in the array (optional)
- `[$]` specifies that all elements in the array are character variables; the default type is numeric
- `[length]` defines the length of new variables being created in the array (optional)
- `list_of_array_elements` is a list of variables of the same type (all numeric or all character) to be included in the array

One thing to note is that all variables in an array must be of the same type, either all
numeric or all character. You cannot mix variable types within an array; if you do, an
error will occur.
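
As a minimal sketch of the `$` and `length` options, the following DATA step defines a character array that creates three new variables. The data set name `flags`, the array name `aflag`, the variable names `flag1` to `flag3`, and the assigned values are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical example: names and values are illustrative assumptions */
data flags;
    /* $ makes the elements character; 4 is the length given to the new variables */
    array aflag {3} $ 4 flag1 flag2 flag3;
    aflag{1} = 'yes';
    aflag{2} = 'no';
    aflag{3} = 'none';
run;

/* List the variables so their type and length can be inspected */
proc contents data=flags;
run;
```

Because `flag1` to `flag3` do not already exist, the ARRAY statement creates them, and the `$ 4` specification determines that they are character variables of length 4.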

An indexed array is one in which the number of elements, {n}, is specified when the array
is defined. A non-indexed array is one in which the number of elements is not specified and
SAS determines the number of elements based on the number of variables listed in the
array. You can always use an indexed array; a non-indexed array, however, can be used
only in some situations.


Remember that the arrayname must be a valid SAS name that is not a variable name in
the data set. One tip I can give you to help distinguish an array name from a variable name
is to start the arrayname with the letter “a”.

### Example of An Indexed ARRAY

Going back to the example of reversing the CES-D items, the SAS code that would be used
to define an indexed array containing the 4 CES-D items that need to be reversed is:

```
data cesd;
    set in.cesd1;
    /* Items 4, 8, 12, and 16 are the ones that need to be reversed */
    array aireverse {4} cesd4 cesd8 cesd12 cesd16;
run;
```
In defining this array we specify the array with the SAS keyword ARRAY and then define the
additional parameters needed in the ARRAY statement with

- `aireverse`, the arrayname used to reference the array in future SAS code
- `{4}`, indicating that there are 4 elements in the array
- `[$]`, which is not needed because all variables in the array are numeric
- `[length]`, which is not needed
- `cesd4 cesd8 cesd12 cesd16`, the list of the variables that specify the 4 array elements
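
To give a sense of where this is heading, here is a minimal sketch of how an indexed array like this one might be referenced in a DO loop to reverse the items. The small `cesd1` data set built with DATALINES is a hypothetical stand-in for `in.cesd1` that holds only the four items for the first three subjects in Figure 1, and the index variable `i` is an assumption made for illustration.

```sas
/* Hypothetical stand-in for in.cesd1: only the four items to be reversed */
data cesd1;
    input id cesd4 cesd8 cesd12 cesd16;
    datalines;
1101 . 3 3 3
1102 0 1 2 3
1103 3 3 3 3
;
run;

data cesd;
    set cesd1;
    array aireverse {4} cesd4 cesd8 cesd12 cesd16;
    /* Apply the same reversal to each element of the array in turn */
    do i = 1 to 4;
        aireverse{i} = 3 - aireverse{i};
    end;
    drop i;   /* the loop index is not needed in the output data set */
run;

proc print data=cesd;
run;
```

The single assignment inside the loop replaces the four separate assignment statements, and a missing item stays missing after the subtraction.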

### NON-INDEXED ARRAY SYNTAX

In addition to the indexed array, SAS also provides the option of using a non-indexed array.
Here you don’t specify the number of elements in the array, {n}. Rather, during the
creation of the array, SAS determines the number of elements of the array based on the set
of variables listed. The syntax for a non-indexed array is as follows:

```
ARRAY arrayname [$] [length] list_of_array_elements;
```
where

- `ARRAY` is a SAS keyword that specifies that an array is being defined
- `arrayname` is a valid SAS name that is not a variable name in the data set
- `[$]` specifies that the elements in the array are character variables; the default type is numeric
- `[length]` defines the length of new variables being created in the array (optional)
- `list_of_array_elements` is a list of variables of the same type (all numeric or all character) to be included in the array

### Example of A Non-Indexed ARRAY

Again, using the CES-D item reversal example, the SAS code that would be used to define a
non-indexed array containing the 4 CES-D items that need to be reversed is:

```
data cesd;
    set in.cesd1;
    /* No {n}: SAS counts the four variables listed */
    array areverse cesd4 cesd8 cesd12 cesd16;
run;
```
In defining this array we specify the array with the SAS keyword ARRAY and then define the
additional parameters for the ARRAY statement with

- `areverse`, the arrayname used to reference the array in future SAS code
- `cesd4 cesd8 cesd12 cesd16`, the list of the variables that specify the 4 array elements

One great thing about non-indexed arrays is that they allow for less typing, but give the
same functionality.
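
One way a non-indexed array like this might be used is with a DO OVER loop, sketched below. DO OVER is legacy syntax that SAS still accepts for arrays defined without a subscript. As an assumption for illustration, the `cesd1` data set built with DATALINES is a hypothetical stand-in for `in.cesd1` that holds only the four items for the first three subjects in Figure 1.

```sas
/* Hypothetical stand-in for in.cesd1: only the four items to be reversed */
data cesd1;
    input id cesd4 cesd8 cesd12 cesd16;
    datalines;
1101 . 3 3 3
1102 0 1 2 3
1103 3 3 3 3
;
run;

data cesd;
    set cesd1;
    array areverse cesd4 cesd8 cesd12 cesd16;
    /* DO OVER visits every element; the array name alone refers to the current one */
    do over areverse;
        areverse = 3 - areverse;
    end;
run;

proc print data=cesd;
run;
```

Note that no element count and no explicit index variable appear in this version, which is where the reduced typing comes from.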

