# Overview 2: Managing SAS Tables  

In the first chapter, we covered the **DATA step** for creating SAS tables from external files or datasets.  

This second chapter focuses on **modifying SAS tables** and **creating new tables** from existing ones.  



## **I. Programming Syntax Rules**  

Here are **four key rules** to remember:  

1. **The DATA step (also applies to PROC)** → A command consists of statements and options. An option is always attached to a statement, but a statement does not always require an option.

In [None]:
Infile "file_path" DLM=",";

   Here, `Infile` is the statement and `DLM=` is an option (the comma delimiter is only an illustration).  

2. **A statement always ends with a semicolon (`;`).**  
   - Forgetting `;` is a common error, which can be easily identified in the **Log window**.  
   - **Two consecutive semicolons (`;;`) have no effect** on program execution (the second one creates an empty statement).  

3. **Statements, variable names, and table names** can be written in **lowercase, uppercase, or mixed case**.  

4. **A command can be written across multiple lines** (usually one line per statement).  
   - A statement can start at any position in a line, which makes it possible to indent the body of **nested loops** for readability.  
   - Since a statement ends with a `;`, it can span multiple lines, especially when using multiple options.  
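
Rules 2 to 4 can be illustrated with a minimal sketch. The table `work.demo` and the variables `id`, `score`, and `bonus` are hypothetical and are built inline only so that the example runs on its own:

```sas
/* Hypothetical inline data: table and variable names are illustrative */
DATA work.demo;
    INPUT id score;
    DATALINES;
1 10
2 20
;
RUN;

/* Rule 3: keywords and names are case-insensitive */
data WORK.Demo2;
    set work.DEMO;
    /* Rule 4: one statement spread over several lines, closed by a single ; */
    bonus = score
            * 2
            + 1;
    ;   /* Rule 2: a stray semicolon is an empty statement and does nothing */
run;
```

Both steps should run without errors or warnings in the Log window, which is a quick way to check the three rules.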

### **Two fundamental principles:**  
1. Write a program that is **as readable and well-documented as possible** → Makes it easier to review.  
2. Pay attention to **small programming errors**, as they are time-consuming to track down (always read the documentation carefully!).
     
## **II. Modifying SAS Tables**  

SAS can handle very **large datasets** (over **1 million observations** with dozens of variables).  

- SAS processes **one observation at a time**.  
- **Caution:** When using transformations like **cumulative sums** or **lag functions**, ensure they are correctly applied.  
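
As a sketch of the caution about lag functions, the example below computes a change from the previous observation. The table `work.sales` and its variables are hypothetical. The point is that `LAG` is called on every observation, outside any condition:

```sas
/* Hypothetical inline data */
data work.sales;
    input month amount;
    datalines;
1 100
2 150
3 120
;
run;

data work.sales_lag;
    set work.sales;
    /* Call LAG unconditionally so its internal queue is updated at every observation */
    prev_amount = lag(amount);
    /* prev_amount is missing on the first observation, so change is only computed afterwards */
    if _n_ > 1 then change = amount - prev_amount;
run;
```

`LAG` returns the value stored at its previous call, not the value of the previous observation. Placing the call inside an `IF ... THEN` would update the queue only when the condition is true, which is a common source of wrong results.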

### **The `DATA` step as a loop over a table**  

The `DATA` step executes **observation by observation** using the **Program Data Vector (PDV)**. 

In [None]:
Data temp;  
    Set in.brevet;  
    Instruction_1;  
    Instruction_k;  
Run;

- **`SET`** specifies the input table (`brevet`) stored in the `in` library.  
- **`DATA`** creates the output table (`temp`).

#### **Important:**  
You can **read and write to the same table**: 

In [None]:
Data in.brevet;  
    Set in.brevet;  
Run;

### **How SAS processes data step by step:**  

1. Takes the **first observation** from the input table.  
2. Applies the instructions (`Instruction_1`, ..., `Instruction_k`).  
3. Stores the transformed observation in the **output table**.  
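4. Resets the variables created in the step to missing, then repeats the process **for each observation** until the last one.  

**Challenge:** Cumulative calculations need extra care. Because SAS processes one observation at a time and resets new variables at each iteration, a running total must be carried over explicitly (for example with `RETAIN` or a sum statement).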
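
The challenge above can be sketched with a running total. The table `work.sales` and the variable `cumul` are hypothetical; `RETAIN` is the standard statement for keeping a value from one iteration to the next:

```sas
/* Hypothetical inline data */
data work.sales;
    input month amount;
    datalines;
1 100
2 150
3 120
;
run;

data work.sales_cum;
    set work.sales;
    retain cumul 0;            /* keep cumul across iterations instead of resetting it to missing */
    cumul = cumul + amount;    /* running total: 100, 250, 370 */
run;
```

The sum statement `cumul + amount;` is a shorter alternative: it retains `cumul` implicitly, starts it at 0, and treats a missing `amount` as 0, whereas the assignment above would turn the total into a missing value.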
## **A. Creating Variables**  

When creating a **SAS table from an external file**, new variables are also created. However, new variables can also be derived from **existing variables**.  

### **Rules for new variables:**  
- Every new variable must have a **name** and a **type** (numeric or character).  
- Labels and formats can also be assigned (see SAS documentation).  
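
A minimal sketch of these rules, assuming a hypothetical table `work.people` (all names below are illustrative). A numeric and a character variable are derived from existing variables, then given a label and a format:

```sas
/* Hypothetical inline data */
data work.people;
    input name $ height_cm weight_kg;
    datalines;
Ana 165 60
Luc 180 81
;
run;

data work.people2;
    set work.people;
    length size_class $ 5;                       /* character variable, 5 bytes */
    bmi = weight_kg / (height_cm / 100) ** 2;    /* numeric variable derived from existing ones */
    if height_cm >= 175 then size_class = "tall";
    else size_class = "short";
    label bmi = "Body mass index";               /* optional label */
    format bmi 5.1;                              /* optional display format */
run;
```

Declaring the length of a character variable before its first assignment avoids truncation when later values are longer than the first one.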

### **Character & Numeric Functions in SAS**  

#### **Character Functions**  

| Function | Description |
|----------|------------|
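| `Length(x)` | Returns the length of `x`, excluding trailing blanks. |
| `Compress(x, 'c')` | Removes characters `'c'` from `x`. |
| `Repeat(x, n)` | Returns `x` followed by `n` additional copies (`n + 1` occurrences in total). |
| `Index(x, y)` | Returns the position of the first occurrence of `y` in `x` (`0` if not found). |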
| `Upcase(x)` | Converts `x` to uppercase. |
| `Lowcase(x)` | Converts `x` to lowercase. |
| `Substr(x, n, l)` | Extracts `l` characters from `x`, starting at position `n`. |
| `Scan(x, n, 'sp')` | Extracts the `n`-th word from `x`, using `sp` as a separator. |
| `Tranwrd(x, y, z)` | Replaces all occurrences of `y` with `z` in `x`. |
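
The character functions above can be tried in a `DATA _NULL_` step, which creates no table and writes the results to the Log window with `PUT`. The string and variable names are illustrative:

```sas
data _null_;
    length x $ 20;
    x = "Hello SAS world";
    len   = length(x);                      /* length without trailing blanks */
    nosp  = compress(x, " ");               /* remove all blanks */
    pos   = index(x, "SAS");                /* position of the first occurrence of "SAS" */
    up    = upcase(x);                      /* uppercase copy */
    low   = lowcase(x);                     /* lowercase copy */
    sub   = substr(x, 7, 3);                /* 3 characters starting at position 7 */
    word3 = scan(x, 3, " ");                /* third blank-separated word */
    swap  = tranwrd(x, "world", "table");   /* replace every "world" with "table" */
    rep   = repeat("ab", 2);                /* "ab" followed by 2 more copies */
    put len= nosp= pos= up= low= sub= word3= swap= rep=;
run;
```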

#### **Numeric Functions**  

| Function | Description |
|----------|------------|
| `Floor(x)` | Returns the largest integer less than or equal to `x` (for negative values this differs from the integer part). |
| `Abs(x)` | Returns the absolute value of `x`. |
| `Sign(x)` | Returns `1` if `x>0`, `-1` if `x<0`, `0` otherwise. |
| `Round(x, a)` | Rounds `x` to the nearest multiple of `a` (e.g. `a = 0.01` for two decimals). |
| `Max(x1, ..., xn)` | Returns the maximum value. |
| `Min(x1, ..., xn)` | Returns the minimum value. |
| `Mod(x, y)` | Returns the remainder of `x ÷ y`. |
| `Sqrt(x)` | Returns the square root of `x`. |
| `Exp(x)` | Returns `e^x`. |
| `Log(x)` | Returns the natural logarithm of `x`. |
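
A short sketch of the numeric functions, again in a `DATA _NULL_` step that prints to the Log window. The values are arbitrary, and `INT` is not in the table above: it is included only to contrast with `FLOOR` on a negative number:

```sas
data _null_;
    x = -2.7;
    f = floor(x);               /* largest integer <= x, here -3 */
    i = int(x);                 /* integer part, here -2 */
    a = abs(x);                 /* absolute value */
    s = sign(x);                /* -1 because x is negative */
    r = round(3.14159, 0.01);   /* nearest multiple of 0.01 */
    m = max(1, ., 5);           /* missing arguments are ignored */
    n = min(1, ., 5);           /* missing arguments are ignored */
    q = mod(17, 5);             /* remainder of 17 / 5 */
    t = sqrt(16);               /* square root */
    e = exp(log(10));           /* log is the natural logarithm, so exp undoes it */
    put f= i= a= s= r= m= n= q= t= e=;
run;
```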

---

## **B. Filtering Observations**   
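Filtering is done with the `IF ... THEN` statement combined with `OUTPUT` (write the observation) or `DELETE` (drop it). The two statements below are independent examples: 

In [None]:
/* Write only the observations where brevet_2004 exceeds 250 */
If brevet_2004 > 250 then output;
/* Drop the observations where brevet_2005 is below 300 (missing values included) */
If brevet_2005 < 300 then delete;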
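
As a self-contained sketch of filtering, the example below uses a hypothetical table `work.scores`. It also shows the subsetting form `IF condition;`, a standard shorthand that keeps only the observations satisfying the condition:

```sas
/* Hypothetical inline data, including one missing value */
data work.scores;
    input id score;
    datalines;
1 120
2 300
3 .
4 260
;
run;

/* Subsetting IF: only observations that satisfy the condition are written */
data work.high;
    set work.scores;
    if score > 250;
run;

/* DELETE: a missing value is smaller than any number, so id 3 is dropped too */
data work.not_low;
    set work.scores;
    if score < 250 then delete;
run;
```

Both output tables contain ids 2 and 4. An explicit test such as `score ne .` is needed if missing values should be handled differently.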

### **Using `SELECT / WHEN` (Alternative to `IF ... THEN`)** 

In [None]:
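Data temp;  
    Set in.brevet;  
    /* Declare the length first: otherwise the first assignment ("ind") sets it to 3 and longer values are truncated */
    Length efficacite $ 7;  
    Select;  
        When (brevet_2005 = .) efficacite = "ind";  
        When (0 < brevet_2005 <= 50) efficacite = "low";  
        When (50 < brevet_2005 <= 200) efficacite = "medium";  
        When (brevet_2005 > 200) efficacite = "high";  
        Otherwise efficacite = "unknown";  
    End;  
Run;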
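
`SELECT` also accepts an expression in parentheses, in which case each `WHEN` lists the values to compare against. A minimal sketch, assuming a hypothetical table `work.grades` with illustrative variables `code` and `level`:

```sas
data work.grades;
    input code $;
    length level $ 6;                     /* long enough for "medium" */
    select (code);
        when ("A")      level = "high";
        when ("B", "C") level = "medium"; /* several values can share one WHEN */
        otherwise       level = "low";
    end;
    datalines;
A
B
D
;
run;
```

Without an `OTHERWISE` clause, SAS stops with an error when no `WHEN` matches, so it is safer to include one even if it does nothing (`otherwise;`).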

## **III. Selecting Variables (DROP, KEEP, RENAME)**

In [None]:
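/* Keep only Brevet_2005 in the output table */
Data temp; Set in.brevet; Keep Brevet_2005; Run;
/* Drop Brevet_2005 and Brevet_2004, keep all other variables */
Data temp; Set in.brevet; Drop Brevet_2005 Brevet_2004; Run;
/* Rename Brevet_2004 to Innovation_2004 in the output table */
Data temp; Set in.brevet; Rename Brevet_2004 = Innovation_2004; Run;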
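
These statements can be combined in a single step. The sketch below uses a hypothetical table `work.wide` with illustrative variable names. Note that `KEEP` and the computations still refer to the old name, because `RENAME` only takes effect when the output table is written:

```sas
/* Hypothetical inline data */
data work.wide;
    input id v2004 v2005 v2006;
    datalines;
1 10 12 15
2 20 18 25
;
run;

data work.narrow;
    set work.wide;
    growth = v2005 - v2004;      /* inside the step, still use the old name v2004 */
    keep id v2004 growth;        /* KEEP also uses the old name */
    rename v2004 = base_2004;    /* the new name applies to the output table only */
run;
```

The resulting table `work.narrow` contains `id`, `base_2004`, and `growth`.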

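🔹 Regarding `SELECT / WHEN`: evaluation stops at the first matching `WHEN`, so it is typically **more efficient than a series of independent `IF ... THEN`** statements (an `IF ... THEN ... ELSE IF` chain behaves similarly).  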

---