# Overview 04: The different types of Proc Steps

## The proc step complements the Data step.

### Three main functions:

1) Statistical processing  
2) Econometric processing  
3) Data formatting (graphs, table presentation, etc.)

This chapter focuses on **point 1: Statistical processing**.

---


## I – General Structure of a PROC

```sas
Proc XXX <procedure_options> ;
    Instruction1 </instruction1_options> ;
    ...
    ...
    Instructionp </instructionp_options> ;
Run ;
```

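Every Proc follows the same pattern:
- It starts with `Proc` followed by the name of the procedure and, optionally, procedure-level options.
- Most Procs offer several instructions; which ones you use depends on the problem to solve.
- Instructions may have their own options.
- **Important:** In most instructions, options are separated from the rest of the instruction with `/`. Options on the `Proc` line itself take no `/`.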

**Tip:** When several Procs follow one another in a program, each new `Proc` line implicitly ends the previous step, so a single `run` at the end is sufficient to execute them all. It is still recommended to place a `run` after each Proc, because it makes debugging easier.

**Reminder:** Some Procs require `quit` instead of `run` (e.g., `proc sql`, `proc model`).
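
As a minimal sketch of these two rules, the program below closes an ordinary Proc with `run` and an interactive one with `quit`. It assumes the `temp` table used in the examples of this chapter, with its `conso` and `budget` variables; the column alias `n_obs` is only illustrative.

```sas
/* Ordinary Proc: the step is executed when RUN is reached. */
proc means data=temp;
    var conso budget;
run;

/* PROC SQL executes each statement as soon as it is submitted */
/* and stays active until QUIT ends the procedure.             */
proc sql;
    select count(*) as n_obs   /* n_obs: hypothetical alias */
    from temp;
quit;
```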


### Essential Instructions

- **Var** → Specifies the variable(s) to analyze.
- **By** → Splits the analysis into sub-populations.
    - If `BY var` is specified in a Proc, the procedure is run once for each distinct value of `var`. The input table must be sorted by `var` beforehand.
- **Where** → Restricts the Proc to the observations that satisfy a condition.

#### Output Storage

- **Output** → Stores all or part of the results of a Proc in an output table.

```sas
Output out=output_table Keywords;
```

**Warning:** `If…then…else` is a Data step instruction and **cannot** be used in the Procs presented here. Use `Where` to select observations.
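
The sketch below combines `By` and `Where` in a single Proc. It assumes the `temp` table and its `sexe`, `conso` and `budget` variables from the examples of this chapter; the threshold `20` and the table name `temp_sorted` are hypothetical and chosen only for illustration. Because `By` processing expects a sorted table, a `proc sort` comes first.

```sas
/* Sort by the BY variable; temp_sorted is a hypothetical name. */
proc sort data=temp out=temp_sorted;
    by sexe;
run;

/* One set of statistics per value of sexe, computed only on   */
/* observations with budget above 20 (illustrative threshold). */
/* Missing budgets compare as smaller than any number, so they */
/* are excluded by this condition.                             */
proc means data=temp_sorted;
    where budget > 20;
    var conso;
    by sexe;
run;
```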

## II – PROC MEANS

`PROC MEANS` generates **descriptive statistics** for continuous quantitative variables (mean, standard deviation, quartiles, etc.).   

### Syntax:

```sas
PROC MEANS data=input_table options ;
    VAR list_of_quantitative_variables ;
    BY variable(s) ;
    CLASS variable(s) ;
    TYPES variable1*variablep ;
    WEIGHT variable ;
    ID variable ;
    OUTPUT OUT=output_table
        statistic1=name1_in_output_table
        statisticp=namep_in_output_table ;
RUN ;
```

### Important Instructions:

- **Var** → Variables to compute statistics on.
- **By** → Produces statistics grouped by variable values.  
  - **Note:** The table must be sorted beforehand using `PROC SORT`.
- **Class** → Computes statistics based on classification variables.  
  - Unlike `By`, sorting is not required.
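- **Types** → Selects which combinations (crossings) of the `Class` variables the statistics are computed for.
- **Weight** → Introduces a weighting variable.
- **ID** → If an output table is created, adds the `ID` variable to it; by default, the **maximum value** of the `ID` variable in each group is stored.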
- **Output** → Saves selected statistics in a table.
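
To illustrate how these instructions fit together, here is a minimal sketch. It assumes the `temp` table from the examples of this chapter, with `sexe` and `csp` as classification variables and `conso` and `budget` as analysis variables. The output table name `stats_by_group` and the names given to the stored statistics are hypothetical.

```sas
/* Statistics requested on the PROC line; MAXDEC= limits the */
/* number of decimals displayed.                             */
proc means data=temp n nmiss mean std min max maxdec=2;
    class sexe csp;            /* no prior sorting needed      */
    types sexe sexe*csp;       /* by sexe alone, then sexe*csp */
    var conso budget;
    /* Names after each keyword map to the VAR variables in order. */
    output out=stats_by_group
        mean=mean_conso mean_budget
        std=std_conso std_budget;
run;
```

The output table also contains the automatic variables `_TYPE_` and `_FREQ_`, which identify the crossing each row belongs to and the number of observations behind it.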

### Common Statistics in PROC MEANS

| Statistic | Description |
|-----------|------------|
| **N** | Number of non-missing values |
| **NMISS** | Number of missing values |
| **MIN** | Minimum value |
| **MAX** | Maximum value |
| **RANGE** | Max - Min |
| **SUM** | Sum |
| **MEAN** | Mean |
| **STD** | Standard deviation |
| **KURTOSIS** | Kurtosis coefficient |
| **SKEWNESS** | Skewness coefficient |
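| **USS** | Uncorrected sum of squares (sum of squared values) |
| **CSS** | Corrected sum of squares (sum of squared deviations from the mean) |
| **CV** | Coefficient of variation (standard deviation as a percentage of the mean) |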


### Example Application

```sas
Proc means data=temp; run;
```

Example output:

| Variable | N | Mean | Std Dev | Min | Max |
|----------|----|------|---------|-----|-----|
| date_nais | 20 | 1957.80 | 17.34 | 1929.00 | 1980.00 |
| conso | 18 | 22.99 | 11.20 | 6.10 | 40.70 |
| budget | 19 | 24.57 | 12.47 | 6.20 | 44.70 |


## III – PROC UNIVARIATE

`PROC UNIVARIATE` produces **descriptive statistics** and **visualizations** for numeric variables.

It is similar to `PROC MEANS` but has additional features:

1) **Plots distributions of statistical variables**  
2) **Performs statistical tests** (confidence intervals, normality tests)

### Syntax:

```sas
Proc univariate data=input_table options ;
    Var list_of_quantitative_variables ;
    By variable(s) ;
    Class variable(s) ;
    Weight variable ;
    ID variable ;
    Histogram variable(s) / options ;
    QQplot variable(s) / options ;
    Probplot variable(s) / options ;
    Inset keyword(s) / option(s) ;
    Output out=output_table
        statistic1=name1_in_output_table
        statisticp=namep_in_output_table ;
Run ;
```

### Histogram Example:

```sas
Proc univariate data=temp noprint;
    Histogram conso budget ;
run;
```

**Adding options:**

```sas
Proc univariate data=temp noprint;
    Histogram conso / midpoints=10 15 20 25 30 35 40 vaxislabel="Frequencies";
    Histogram budget / Normal vaxislabel="Frequencies";
run;
```
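
The statistical tests mentioned at the start of this section can be requested on the `Proc` line. The following is a minimal sketch, assuming the `temp` table and its `conso` variable; the output table name `univ_stats` and the names of the stored statistics are hypothetical.

```sas
/* NORMAL requests tests for normality, CIBASIC requests basic */
/* confidence limits for the mean, std deviation and variance. */
proc univariate data=temp normal cibasic;
    var conso;
    histogram conso / normal;        /* overlay a fitted normal curve */
    inset n mean std / position=ne;  /* applies to the plot just above */
    output out=univ_stats            /* hypothetical table name */
        mean=mean_conso
        median=median_conso
        q1=q1_conso
        q3=q3_conso;
run;
```

Note that `Inset` must follow the plot instruction it annotates, and that `noprint` is left out here so that the test results are displayed.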

## IV – PROC FREQ

`PROC FREQ` generates **frequency statistics**, either **univariate** (like `PROC MEANS` or `PROC UNIVARIATE`) or **bivariate** (contingency tables).

- Variables used to define categories (max **32767** categories) can be **character** or **numeric**.

### Syntax:

```sas
Proc Freq data=input_table options ;
    Tables variable_list / options ;
    BY variable(s) ;
    Weight variable ;
    Output out=output_table statistic_keywords ;
Run;
```

**Important:** `PROC FREQ` has no `Var` instruction; the `Tables` instruction plays that role.

### Example:

```sas
Proc Freq data=temp;
    Tables sexe csp;
Run;
```

### Contingency Table Example:

```sas
Proc Freq data=temp order=freq;
    Tables sexe*csp;
Run;
```

### Useful `Tables` Options:
1) `Nofreq` → Hides cell frequencies in a contingency table.
2) `Nopercent` → Hides percentages.
3) `Norow` → Hides row percentages.
4) `Nocol` → Hides column percentages.
5) `Nocum` → Hides cumulative frequencies and cumulative percentages.
6) `Missing` → Treats missing values as a category of their own and includes them in calculations.
7) `Missprint` → Displays missing values in the table but does not count them in percentages.
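
As a short sketch of how these options are written, each one goes after the `/` of the `Tables` instruction it applies to. The example assumes the `temp` table with its `sexe` and `csp` variables; the output table name `freq_sexe_csp` is hypothetical.

```sas
proc freq data=temp;
    /* One-way table: no cumulative columns, missing values counted. */
    tables sexe / nocum missing;
    /* Contingency table showing cell frequencies only; OUT= saves */
    /* the counts in a table (hypothetical name).                  */
    tables sexe*csp / norow nocol nopercent out=freq_sexe_csp;
run;
```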

