```sas
libname certadv '/folders/myfolders/certadv';
```




# CHAPTER 3: JOIN

Every time a join happens in PROC SQL, SAS conceptually generates a <font color = red>Cartesian</font> product first, then applies the join conditions to it.

## Inner join
   <font color=red>Note:</font>
   
   - Table aliases are required for a self-join, or when joining tables that have the same name in different libraries
   - Use the <font color=blUE>COALESCE</font> function to overlay columns that have the same name
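Here is a minimal sketch of an inner join with table aliases. The two small tables `work.emp` and `work.pay` are hypothetical and are created inline so the example runs on its own. Only rows whose `EmpID` appears in both tables (2 and 3) should be returned.

```sas
/* Two small hypothetical tables */
data work.emp;
   input EmpID Name $;
   datalines;
1 Ann
2 Bob
3 Cal
;
run;

data work.pay;
   input EmpID Salary;
   datalines;
2 50000
3 62000
4 48000
;
run;

proc sql;
   /* e and p are table aliases; the ON clause keeps only matching rows */
   select e.EmpID, e.Name, p.Salary
      from work.emp as e
           inner join
           work.pay as p
      on e.EmpID = p.EmpID;
quit;
```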
   
## Natural Join
The <font color=red>ON clause is implied</font>: tables are joined automatically on all columns with the same name, so <font color=red>don't use</font> an ON clause.
A natural join is similar to a join with a USING clause: each same-named column appears only once in the result.

## Outer Join
    Table1 [Left, right, full] outer join table2 on ...
  
## SQL join vs DATA step match-merge
   A DATA step <font color=red>merge</font> requires the data to be <font color=red>sorted</font> by the BY variable(s).
   - If all key values match, both produce the same result (although a SQL query only displays it and does <font color=red>not create</font> a table unless you use CREATE TABLE)
   - If only some key values match, MERGE shows the correct value for the join column (the like-named column in both tables), while SQL shows only the column named in the SELECT clause. Use the <font color=blue>COALESCE</font> function to get the same result
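The sketch below compares a full outer join that uses COALESCE with the equivalent DATA step match-merge. The tables `work.emp` and `work.pay` are hypothetical and are rebuilt here so the block is self-contained. Both results should show EmpID 1 to 4 with no blank key values.

```sas
data work.emp;
   input EmpID Name $;
   datalines;
1 Ann
2 Bob
3 Cal
;
run;

data work.pay;
   input EmpID Salary;
   datalines;
2 50000
3 62000
4 48000
;
run;

/* SQL: COALESCE returns the first nonmissing EmpID, so IDs 1 and 4 are not blank */
proc sql;
   select coalesce(e.EmpID, p.EmpID) as EmpID, e.Name, p.Salary
      from work.emp as e
           full join
           work.pay as p
      on e.EmpID = p.EmpID;
quit;

/* DATA step: both inputs must be sorted by EmpID (these already are) */
data work.merged;
   merge work.emp work.pay;
   by EmpID;
run;
```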



# CHAPTER 4: JOIN USING SET OPERATOR

## Set operator
4 types:
* Except: <font color=red>unique</font> rows from 1st table that are not in 2nd table
* Intersect: <font color=red>unique</font> rows **common** to both tables. INTERSECT takes <font color=red>**precedence**</font> (is evaluated first) when a query contains multiple set operators
* Union: <font color=red>unique</font> rows from both tables
* Outer Union: **all** rows from both tables

Set operation contains:
- 2 queries (2 SELECT clauses)
- A SET operator
- optionally, **one or both** of the keywords *ALL* and *CORR*
    - ALL: does **not** suppress duplicate rows (not allowed with <font color=BLUE>OUTER UNION</font>) -> faster
    - CORR: overlays columns that have the same name in both tables.
        - When used with <font color=blue>EXCEPT, INTERSECT OR UNION</font>, CORR suppresses columns that are **not in** both tables.
        - When used with <font color=blue>OUTER UNION</font>, CORR overlays same-named columns and displays non-matching columns without overlaying.

Place a semicolon after the **last** SELECT statement only.
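The following sketch shows the basic set-operator syntax. It uses the SASHELP.CLASS sample table that ships with SAS, which is assumed to be available. Each set operation is one statement, so only the last SELECT of each pair ends with a semicolon.

```sas
proc sql;
   /* UNION: unique rows returned by either query */
   title 'UNION';
   select Name, Age from sashelp.class where Age <= 11
   union
   select Name, Age from sashelp.class where Sex = 'F' and Age <= 12;

   /* INTERSECT: rows returned by both queries */
   title 'INTERSECT';
   select Name, Age from sashelp.class where Age <= 12
   intersect
   select Name, Age from sashelp.class where Sex = 'F';

   /* UNION ALL CORR: keep duplicates, overlay by column name (not position),
      and drop Height/Weight because they are not in both queries */
   title 'UNION ALL CORR';
   select Name, Age, Height from sashelp.class where Age = 11
   union all corr
   select Age, Name, Weight from sashelp.class where Age = 11;
quit;
title;
```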

### Combine and Overlay Column

By default, EXCEPT, INTERSECT and UNION overlay columns by their **position** in the SELECT clause (overlaid columns must have the **same data type**) and ignore column names. Result column names come from the first table; if a column there has no name, the name of the corresponding column in the next table is used.

### Processing
For set operations that display only unique rows (EXCEPT, INTERSECT or UNION), PROC SQL makes 2 passes:
1. eliminate duplicate rows in the tables
2. select the rows that meet the criteria (and overlay columns if requested)

For OUTER UNION -> only step 2





## EXCEPT
**Select unique rows from 1st table that are not in 2nd table.**

Example Using ALL and CORR together
* CORR: examines and displays only the column common to both tables: **X**
* ALL: duplicates in table Col1 are not removed (the first pass is skipped), so rows 2 and 3 appear in the output. Rows 1, 4 and 5 are eliminated because each of these three rows matches a value of X in table Col2.

**Note:** When ALL is used with the EXCEPT operator, each row in table1 needs its own matching row in table2 to be removed. If 3 rows in table1 match 1 row in table2, only one of them is removed.
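This sketch illustrates the one-for-one removal rule with EXCEPT ALL CORR. The tables `work.one` and `work.two` are hypothetical. Only `X` is common to both tables, so CORR keeps just `X`. With ALL, the three 1s in `work.one` lose only one copy, which should leave X = 1, 1, 2. Without ALL, duplicates are removed first, which should leave only X = 2.

```sas
data work.one;
   input X A $;
   datalines;
1 a
1 b
1 c
2 d
3 e
;
run;

data work.two;
   input X B $;
   datalines;
1 x
3 y
;
run;

proc sql;
   title 'EXCEPT ALL CORR';
   select * from work.one
   except all corr
   select * from work.two;

   title 'EXCEPT CORR';
   select * from work.one
   except corr
   select * from work.two;
quit;
title;
```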

## INTERSECT
**Selects unique rows that are common to both tables and overlays columns.**

With a single INTERSECT operator, the order of the SELECT clauses doesn't matter. With multiple stacked INTERSECT operators, the order is important.
## UNION
**Selects unique rows from both tables and overlays columns**
## OUTER UNION
**Selects all rows and doesn't overlay columns**
- Can't use the ALL keyword
- Returns a *stair-like* table
- With CORR, same-named columns are overlaid and the other columns stay stair-like (lots of null cells)
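The stair-like result is easiest to see in a small sketch on the SASHELP.CLASS sample table (assumed available). `Name` and `Age` are overlaid by CORR. `Height` is filled only for the first query's rows and `Weight` only for the second's, and the remaining cells are missing.

```sas
proc sql;
   select Name, Age, Height from sashelp.class where Age = 11
   outer union corr
   select Name, Age, Weight from sashelp.class where Age = 16;
quit;
```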

# CHAPTER 5: SUBQUERIES

## Subsetting data using subqueries

**Subqueries** (AKA *nested queries* or *inner queries*): 
- Query that is nested as part of another query 
- Depending on the clause that contains it, a subquery can return a single value or a set of values
    - If it returns a set of values, use one of these operators: conditional **IN**, a comparison operator modified by **ANY or ALL**, or conditional **EXISTS**
    - Example: WHERE dob < ANY (SELECT ...)
    - Summary-function alternative: `< (SELECT MAX(...))` gives the same result as `< ANY (...)` and is more efficient.
- Often used in a <font color=blue>HAVING</font> or <font color=blue>WHERE</font> clause

2 types of subquery:
- Noncorrelated: self-contained, **independent** of the outer query
- Correlated: dependent on the outer query. It is evaluated for each row of the outer query and requires more processing time than a noncorrelated subquery -> a join is often a more efficient alternative
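Here is a sketch of both subquery types against the SASHELP.CLASS sample table (assumed available). The first inner query runs once and returns a single value. The second refers to the outer row's `Sex` through the alias `o`, so it is re-evaluated for every outer row.

```sas
proc sql;
   /* Noncorrelated: the inner query runs once */
   title 'Students older than the average age';
   select Name, Age
      from sashelp.class
      where Age > (select avg(Age) from sashelp.class);

   /* Correlated: the inner query depends on o.Sex from the outer row */
   title 'Students taller than the average for their sex';
   select o.Name, o.Sex, o.Height
      from sashelp.class as o
      where o.Height > (select avg(i.Height)
                           from sashelp.class as i
                           where i.Sex = o.Sex);
quit;
title;
```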

## Create and Manage Views

**View**: a stored query expression that reads data values from an *underlying* file, table or DBMS. It contains only the logic for accessing the data, not the data itself -> a **virtual table**

* Create a view: <font color=blue>CREATE VIEW</font> view_name AS SELECT column1, col2,... FROM table;
    - a one-level table name in the FROM clause is assumed to be in the same library as the view
    - when submitted, the SELECT statement is not executed but compiled and stored
    - stored as a `.sas7bvew` file
* Describe a view: <font color=blue>DESCRIBE VIEW </font> view_name;
    - if a view is based on another view, use the **FEEDBACK** option to display how the underlying view is defined
* Manage views
    - Avoid <font color=blue>ORDER BY</font> in a view definition to reduce processing time. Users can add ORDER BY in their own queries
    - If the same data is retrieved many times, creating a table is better than a view
    - Create views on stable tables (whose structure doesn't change)
    - If the view resides in the same library as the contributing table, use a one-level table name in the FROM clause
    - Use a USING LIBNAME clause to embed the libref in the view: CREATE VIEW lib1.view1 AS SELECT * FROM lib2.table1 <font color=blue>USING libname lib2 'sas-lib-2'</font>
* Update data through a view: INSERT, DELETE and UPDATE statements can be used
    - You can update only a single table through a view (the view can't be joined, linked or contain a subquery)
    - You can update a column by its alias; you can't update a derived (calculated) column
    - You can update a view that has a WHERE clause; you can't update a view that contains ORDER BY or HAVING
    - **This doesn't change the definition of the view**; only the data in the underlying table changes
* Drop view: <font color=blue>DROP VIEW</font> view-name;
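Here is a sketch of a view's life cycle. The view name `work.teens` is hypothetical and SASHELP.CLASS is assumed available. Only the query is stored, so no rows are copied. The ORDER BY goes in the query that uses the view, as recommended above.

```sas
proc sql;
   /* Stores only the query (work.teens.sas7bvew); no data is copied */
   create view work.teens as
      select Name, Age, Height
         from sashelp.class
         where Age >= 13;

   describe view work.teens;   /* writes the view definition to the log */

   /* Sort in the query that reads the view, not in the view itself */
   select * from work.teens
      order by Age;

   drop view work.teens;
quit;
```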

QUIZ:
*When PROC SQL remerges data, it displays a related message in the SAS log.*




# CHAPTER 6: ADVANCED SQL TECHNIQUES

## Data-driven macro variables with Proc SQL

* Create Macro variables with <font color=blue>INTO</font> clause
    - **SYNTAX** <font color=blue>SELECT col1, col2,... INTO :var1, :var2,... FROM table</font>
    - INTO can be used only in the outer query, not in a subquery
    - can create multiple macro variables from the query results
    - numeric values are converted to text with the BEST8. format (unless a format is specified)
    - without SEPARATED BY, each macro variable stores only the value from the **first row** returned by the SELECT
    - <font color=blue>%PUT varname=&varName</font> writes the value to the log, e.g. <font color=blue>%put avgsal=&avgSal;</font> or <font color=blue>%put &=AvgSal;</font>
* Remove leading and trailing blanks: <font color=blue>INTO :varname TRIMMED</font>
* Concatenate values into one macro variable
    - use <font color=blue>SEPARATED BY "delimiter"</font> after the macro variable name in the INTO clause to concatenate all values of a column into one macro variable; **leading and trailing blanks are removed** from each value
* Apply a format: SELECT var1 format=$upcase23. INTO :var1macro
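The sketch below puts the INTO options together on SASHELP.CLASS (assumed available). The macro variable names `AvgAge` and `FemaleNames` are illustrative. NOPRINT suppresses the report, since only the macro variables are wanted.

```sas
proc sql noprint;
   /* One value; FORMAT= is applied before the value is stored */
   select avg(Age) format=5.2
      into :AvgAge trimmed
      from sashelp.class;

   /* Many rows -> one macro variable, values separated by ", " */
   select Name
      into :FemaleNames separated by ', '
      from sashelp.class
      where Sex = 'F';
quit;

%put &=AvgAge;
%put &=FemaleNames;
```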

## Accessing DBMS with SAS/ACCESS

* 2 ways to connect to a database:
     - SQL pass-through facility: sends native SQL code to the DBMS to run there, then retrieves the data for use in PROC SQL
         + SYNTAX:
           `PROC SQL; CONNECT TO DBMS-name <AS alias> (connection-options); ... QUIT;`
         + DISCONNECT FROM DBMS-name | alias: ends the connection. SAS performs an implicit disconnect when PROC SQL terminates
         + return codes and messages generated by the DBMS are stored in the macro variables SQLXRC and SQLXMSG
         + pass the native SQL code inside parentheses, in a `FROM CONNECTION TO DBMS-name | alias (...)` clause for queries or an `EXECUTE (...) BY DBMS-name | alias` statement for non-query statements
     - SAS/ACCESS LIBNAME statement
         + PROC SQL syntax is converted to native DBMS code, then processed
         + `LIBNAME libref engine <engine-options>;`
         + close the DBMS connection: `LIBNAME libref CLEAR;`
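The two connection styles look like this in outline. This is a sketch only: it assumes SAS/ACCESS Interface to ODBC is licensed, and the data source `MyDSN`, the credentials and the DBMS table `sales` with columns `region` and `amount` are all hypothetical placeholders.

```sas
/* Explicit pass-through: the query inside the parentheses runs in the DBMS */
proc sql;
   connect to odbc as mydb (datasrc="MyDSN" user="myuser" password="mypwd");
   select *
      from connection to mydb
         (select region, sum(amount) as total
             from sales
             group by region);
   disconnect from mydb;
quit;
%put &=sqlxrc &=sqlxmsg;   /* DBMS return code and message */

/* SAS/ACCESS LIBNAME: write PROC SQL; SAS translates it for the DBMS */
libname dblib odbc datasrc="MyDSN" user="myuser" password="mypwd";
proc sql;
   select region, sum(amount) as total
      from dblib.sales
      group by region;
quit;
libname dblib clear;
```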
       
## FedSQL procedure
- Is a SAS implementation of ANSI SQL:1999
- Provides common ANSI SQL syntax across all data sources: a **vendor-neutral SQL dialect**
- Compared with PROC SQL:
    + 17 ANSI data types, including BIGINT, BINARY(n), CHAR(n) and VARCHAR(n); can work with numbers larger than 15-16 digits. PROC SQL is limited to the SAS types: numeric or character
    + PROC FedSQL provides a scalable, threaded, high-performance way to access, manage and share data in multiple sources. PROC SQL is mostly single-threaded
    + FedSQL: ANSI SQL3 with few SAS enhancements; PROC SQL: ANSI SQL2 with many non-ANSI SAS enhancements
- PROC FedSQL requires a connection string in the LIBNAME statement -> DBMS tables are referenced like a SAS library
- LIMIT clause: placed at the end of the query (after FROM, WHERE and ORDER BY) to limit the number of output rows, similar to OUTOBS= in PROC SQL
- PUT function: FedSQL can't apply a format in the SELECT clause with FORMAT= the way PROC SQL does; use PUT instead: <font color=blue>select SalesRep, put(Sales1, dollar10.2) as Sales1</font>
- FedSQL doesn't provide PROC SQL options or support SAS system options, except a few options such as NOPRINT and NUMBER.
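This short FedSQL sketch assumes SASHELP.CLASS is reachable through its libref, as it is in a default SAS session. PUT takes the place of PROC SQL's FORMAT= column modifier, and LIMIT takes the place of OUTOBS=.

```sas
proc fedsql;
   select Name, Age, put(Weight, 6.1) as WeightText
      from sashelp.class
      order by Age desc
      limit 5;
quit;
```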
    
    
    


# CHAPTER 7: MACRO VARIABLES
## Intro to macro variables
Macro variables enable you to substitute text in a SAS program and provide information about the operating system, the SAS session and text strings => dynamic programming.

2 types:
- Automatic macro variables: provided by SAS
    + fixed values set when SAS starts: SYSDATE, SYSDATE9, SYSDAY (weekday), SYSTIME, SYSENV, SYSSCP (operating system abbreviation, e.g. WIN), SYSVER, SYSJOBID
    + values that change based on submitted SAS statements: SYSLAST (most recently created SAS data set, in the form `LIBREF.NAME`), SYSPARM, SYSERR
- User-defined macro variables

A macro variable contains one text string and is independent of any SAS data set. It can be referenced **anywhere** in a SAS program except within *datalines*.

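*Example* (`work.employees` is a hypothetical input table):
```sas
%let year=2012;
data emp&year;
   set work.employees;   /* hypothetical input table */
   if year(hiredate)=&year;
run;
```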

<font color=red>Note:</font> Quotation marks aren't needed around text values. If you include them, they are stored as part of the value. Leading and trailing blanks are removed from the assigned value.
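The %LET rules above are easy to check in the log. In this sketch the variable names are illustrative and SASHELP.CARS is the sample table that ships with SAS. The blanks around `Sedan` are dropped, while the quotation marks in `Quoted` are kept as part of the value.

```sas
%let CarType=   Sedan   ;    /* leading/trailing blanks are removed */
%let Quoted="Sedan";         /* the quotation marks become part of the value */

%put [&CarType] [&Quoted];   /* [Sedan] ["Sedan"] */

/* A few automatic macro variables */
%put &=sysdate9 &=sysday &=systime &=sysscp &=sysver;

/* Double quotation marks so the reference resolves */
title "Cars of type &CarType";
proc print data=sashelp.cars(obs=5);
   where Type = "&CarType";
run;
title;
```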

## SAS processing macro

Submitted code goes to the input stack -> the word scanner reads it and breaks the text into components called *tokens*.
4 types of **token**:
- Name: max 32 characters; begins with a letter or underscore; contains letters, digits and underscores
- Number: a SAS floating-point numeric value (can include a leading sign and an exponent indicator). Date, time and datetime constants are number tokens.
- Special: any character that has a special meaning in SAS (e.g. `*`, `/`, `;`)
- Literal: a string enclosed in single or double quotation marks
The compiler checks syntax at the end of each statement; global statements are executed immediately.

**Macro triggers**
% or & followed by a name token (%LET or &Amt).

<font color=red>Note:</font> the word scanner doesn't recognize macro triggers inside single quotation marks -> to use a macro variable in a string literal, enclose the string in double quotation marks.

## Debug SAS macro variables

You can't see the value of a macro variable as the compiler receives it -> use the SYMBOLGEN option or the <font color=blue>%PUT</font> statement. Use %SYMDEL to remove macro variables from the global symbol table.

`OPTIONS NOSYMBOLGEN | SYMBOLGEN;`
With SYMBOLGEN on, each macro variable reference is written to the log with its name and resolved value.

`%put CARTYPE=&CarType` or `%put &=cartype` return in log `CARTYPE=SEDAN`

%PUT optional arguments:
* list all macro vars: `_ALL_`
* list auto vars : `_AUTOMATIC_`
* User-define global var: `_GLOBAL_`
* `_LOCAL_`
* `_USER_` (GLOBAL + LOCAL)

**%SYMDEL Statement**

`%SYMDEL macro-var-1 <...macro-var-n> </NOWARN>;`

You can't use a SAS variable list, or a macro expression that generates a SAS variable list, in %SYMDEL.

By default, a warning is written if the variable doesn't exist -> use the <font color=blue>NOWARN</font> option to suppress it.
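The debugging tools above can be tried together in a short sketch. The variable name `CarType` is illustrative. The second %SYMDEL targets a variable that no longer exists, so NOWARN keeps the log clean.

```sas
%let CarType=SUV;

options symbolgen;              /* log shows how each reference resolves */
title "Listing of &CarType models";
options nosymbolgen;
title;

%put &=CarType;                 /* CARTYPE=SUV */
%put _user_;                    /* all user-defined macro variables */

%symdel CarType;
%symdel CarType / nowarn;       /* already deleted: no warning */
```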

## Delimiting macro variable references

Use a **period** to delimit the reference -> it marks the end of the reference explicitly instead of relying on a space or a special token.

```sas
%let CarType=Wagon;
%let lib=sashelp;
title "&CarType.s from the &lib..CARS Table";
```
resolves to the title "Wagons from the sashelp.CARS Table"



# CHAPTER 8: Storing and Processing Text
Macro functions are used to manipulate text in a SAS program.
## Macro character function
Similar to DATA step functions, but part of the macro language. SAS data set variables can't be used in macro functions.
* %UPCASE, %LOWCASE or %QLOWCASE(text)
* %SUBSTR(str, start_pos `<, length>`): if length is omitted, returns the text from start_pos to the end. If the string contains **special characters** or **mnemonic operators**, use %QSUBSTR.
    - if start_pos > length of str -> returns null and writes a warning message to the log
    - if start_pos + length goes past the end of str -> returns the text from start_pos to the end and writes a warning message
* Search and extract:
    - %INDEX(source, str): returns the position of the first occurrence of str in source; returns 0 when there is no match
    - %SCAN(arg, n `<, charlist <, modifiers>>`): returns the n*th* word of arg, using all characters in charlist as delimiters. With the K modifier, all characters *not* in charlist are delimiters
* %LENGTH(str): returns the length of str (0 if str is null)
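The character functions above can be exercised on one illustrative value. The comments show the text each %PUT should write. No spaces follow the commas in %INDEX, so the search string is exactly `Macro`.

```sas
%let str=SAS Macro Language;

%put %upcase(&str);         /* SAS MACRO LANGUAGE */
%put %substr(&str,5,5);     /* Macro */
%put %substr(&str,11);      /* Language (length omitted: to the end) */
%put %index(&str,Macro);    /* 5 */
%put %scan(&str,2);         /* Macro */
%put %length(&str);         /* 18 */
```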
## The %SYSFUNC function
* Executes SAS functions or user-defined functions in the macro facility.
* All arguments of a DATA step function used within %SYSFUNC must be separated by commas. You can't use an argument list preceded by the word <font color=blue>OF</font>.
* Can take floating-point numbers as arguments.
* Instead of INPUT and PUT, use INPUTN, INPUTC, PUTN and PUTC.
* <font color=red>Caution</font>: values returned by SAS functions might be truncated.
* Syntax: `%SYSFUNC(function(arg1 <...argn>) <, format>)`
    - function cannot be a macro function
    - arg can be a macro variable reference or a text expression
    - format: applied to the result of the function
* Eg: `%let string=william SMITH; %put %sysfunc(propcase(&string));` return `William Smith`
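A few more %SYSFUNC calls in the same style follow. The first uses the optional format argument. The others show PUTN and INPUTN standing in for PUT and INPUT. The date value is illustrative.

```sas
%put Today is %sysfunc(today(), weekdate.);      /* format applied to the result */
%put %sysfunc(putn(1234.5, dollar10.2));         /* PUTN instead of PUT */
%let dt=%sysfunc(inputn(01JAN2024, date9.));     /* INPUTN instead of INPUT */
%put %sysfunc(putn(&dt, yymmdd10.));             /* 2024-01-01 */
%put %sysfunc(propcase(william SMITH));          /* William Smith */
```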

## The %EVAL function

* evaluates arithmetic and logical expressions using **integer arithmetic**
* if a division results in a fraction -> the result is truncated to an integer
* if at least one operand can't be interpreted as a number -> the expression is treated as logical (a text comparison); otherwise it is treated as arithmetic
* 0 is false, any other number is true

* **%SYSEVALF** performs floating-point evaluation
    - returns the value as text, in BEST32. format
    - the only macro function that can evaluate logical expressions that contain floating-point or missing values
    - %SYSEVALF(expression <,conversion-type>): conversion-type converts the result to another type
        + BOOLEAN: 0 if the expression is 0 or missing, 1 otherwise
        + CEIL, FLOOR, INTEGER: for a positive value, INTEGER returns the same value as FLOOR
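The integer-versus-floating-point difference shows up clearly in the log. For a negative value, INTEGER truncates toward zero while FLOOR rounds down, which is why the two differ only for negatives.

```sas
%put %eval(10/3);                 /* 3: integer arithmetic truncates */
%put %eval(2+3 > 4);              /* 1: true */
%put %sysevalf(10/3);             /* floating-point result */
%put %sysevalf(10/3, ceil);       /* 4 */
%put %sysevalf(-2.5, integer);    /* -2 */
%put %sysevalf(-2.5, floor);      /* -3 */
%put %sysevalf(0.5, boolean);     /* 1 */
```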

## Macro function to Mask special characters

* Using single or double quotation marks
* %STR function:
    - masks tokens during compilation
    - **macro triggers still work normally** (& and % are not masked)
    - preserves leading and trailing blanks
    - to mask an unmatched token that normally appears in pairs (`'`, `"`, `(`, `)`), precede it with a percent sign, e.g. `%STR(%')`
* %NRSTR:
    - NR means "no resolution"
    - hides the meaning of & and %
    - works like %STR, except that it also masks the macro triggers (& and %)
* %SUPERQ
    - locates the macro variable named in its argument and retrieves its value as masked text without permitting any resolution -> ensures the macro processor never attempts to resolve macro triggers in the value
    - the only quoting function that prevents the resolution of macro references in a macro variable's value
    - accepts only the name of a macro variable as its argument, without the &
* %BQUOTE: masks at execution time; unlike %STR, it doesn't require a % before unmatched quotation marks or parentheses
* Macro Q functions: process text and return the value as **quoted** text -> the output is never mistaken for macro code
    - %QUPCASE
    - %QSUBSTR
    - %QSCAN
    - %QSYSFUNC

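```sas
%let a=one;
%let b=two;
%let c=%nrstr(&a &b);                   /* &a and &b are masked, not resolved */

%put C: &c;                             /* C: &a &b */
%put With SUBSTR: %substr(&c,1,2);      /* result is unquoted, so &a resolves: one */
%put With QSUBSTR: %qsubstr(&c,1,2);    /* result stays masked: &a */
```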

## Create macro variables during PROC SQL step execution
* Use the INTO clause (with the NOPRINT option to suppress the report) to create or update macro variables.
Modifiers: `<TRIMMED> <SEPARATED BY delimiter>`.
* Macro variable names are preceded by a colon (:)
* <font color=blue>&SQLOBS</font> holds the number of rows processed by the most recent PROC SQL statement
* If you don't know the number of rows in advance, use numbered macro variables with an open-ended range: `...into :courseid1- , :place1- FROM ...` creates courseid1, courseid2, ...
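This sketch shows open-ended numbered macro variables together with &SQLOBS. It uses SASHELP.CLASS (assumed available), and the variable names `name1`, `age1`, ... are illustrative.

```sas
proc sql noprint;
   select Name, Age
      into :name1- , :age1-
      from sashelp.class
      where Sex = 'F'
      order by Age;
quit;

/* SQLOBS tells you how many numbered variables were created */
%put &=sqlobs;
%put &=name1 &=age1;
```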

## Creating Macro Variables during DATA Step Execution

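* CALL SYMPUTX routine:
    - assigns a DATA step value to a macro variable (creating it if needed)
    - removes leading and trailing blanks from both arguments
    - CALL SYMPUTX(macro-var, value <,symbol-table>): symbol-table specifies where to store the macro variable: G (global), L (local), F (the most local table in which it already exists; otherwise the most local table)
    - converts numbers to character with a field width of up to 32 characters (CALL SYMPUT uses up to 12, i.e. BEST12.)
* PUT(source, format) function:
    - controls the number-to-character conversion explicitly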
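Here is a DATA step sketch on SASHELP.CLASS (assumed available). The names `NStudents` and `TotWeight` are illustrative. PUT controls the number-to-text conversion for the total, and the macro variables can only be referenced after the step boundary.

```sas
data _null_;
   set sashelp.class end=last;
   TotalWeight + Weight;                     /* sum statement: retained */
   if last then do;
      call symputx('NStudents', _n_);        /* number -> text, blanks removed */
      call symputx('TotWeight', put(TotalWeight, comma10.1));
   end;
run;

/* Available only after the step boundary (RUN) */
%put &=NStudents &=TotWeight;
```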
    
## Referencing macro variables Indirectly

Rule:
* When multiple ampersands or percent signs precede a name token, the macro processor resolves && to &
* To reference a macro variable indirectly, the macro processor scans and resolves from left to right, then rescans the result until all references are resolved.
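The two rules above can be traced pass by pass. All variable names here are illustrative, and the comments show what each scan produces.

```sas
%let name1=Alice;
%let name2=Barbara;
%let i=2;
%put &&name&i;     /* pass 1: && -> &, &i -> 2 gives &name2; pass 2: Barbara */

%let pointer=name1;
%put &&&pointer;   /* pass 1: && -> &, &pointer -> name1 gives &name1; pass 2: Alice */
```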

# CHAPTER 9: MACRO PROGRAM

## Define and call macro

```sas
%MACRO macro-name;
   body
%MEND <macro-name>;
```

### Compile macro
Submit the macro definition to compile it -> the word scanner divides the macro into tokens and sends them to the macro processor, which:
- checks macro language syntax (non-macro language syntax is checked when the macro **executes**) and writes syntax error messages to the SAS log
- stores the compiled macro in a SAS catalog entry if there are no syntax errors. By default, the catalog WORK.SASMACR is opened and an entry named macro-name.MACRO is created.

### OPTIONS MCOMPILENOTE = *NONE | NOAUTOCALL | ALL*;

By default, no note is written to the log if a macro compiles successfully. Using MCOMPILENOTE option to change that behavior:
- NONE: default value, no note in log
- NOAUTOCALL: log note for macro compilation except autocall macros
- ALL: log for all macro compilations

### Calling macro
Requirements:
- percent sign (%) before the name of macro
- can be called anywhere in a program except within DATALINES
- **doesn't require a semicolon** because a macro call is not a SAS statement

example: `%printit`
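The following sketch defines and calls `%printit`. It assumes a print of SASHELP.CLASS as the macro body, since the notes don't show the original body. MCOMPILENOTE=ALL makes the successful compilation visible in the log.

```sas
options mcompilenote=all;     /* note in the log when the macro compiles */

%macro printit;
   proc print data=sashelp.class(obs=5);
      title 'First five rows of SASHELP.CLASS';
   run;
   title;
%mend printit;

%printit                      /* no semicolon needed */
```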

### Macro execution
- the macro processor receives the macro name with % from the word scanner
- searches the catalog (WORK.SASMACR by default) for the entry macro-name.MACRO
- executes macro language statements
- sends the remaining text to the input stack for word scanning
- suspends macro execution when the compiler receives a **global SAS statement** or a **step boundary**
- resumes execution after the SAS code executes

A macro call is processed before the SAS language statements (e.g. DATA steps) that it generates are executed. During macro execution, the macro processor can communicate with:
- global and local symbol table
- input stack


## Macro parameter

* **Position parameters:**
    - When defining the macro, list the parameter names separated by commas.
    
    `%macro printdsn(dsn,vars);`
    - To call the macro, supply the values in the same order. Values can be null, text, macro variable references or macro calls.
* **Keyword parameters:**
    - parameter names followed by an equal sign:
    
    `%macro printdsn(dsn=certadv.courses,vars=course_code course_title days);`
    - default values are set in the macro definition and can be null
    - the order of keyword parameters in the macro call is not important
* **Mixed parameters:**   
    - Positional parameters **must be listed before** any keyword parameters.

    `%macro printdsn(dsn, vars=course_title course_code days);`
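Here is a mixed-parameter sketch using SASHELP.CLASS (assumed available) in place of the certadv tables. The default for `vars=` is illustrative. The positional `dsn` comes first, and the keyword default is overridden only in the second call.

```sas
%macro printdsn(dsn, vars=Name Age);     /* positional first, then keyword */
   proc print data=&dsn(obs=5);
      var &vars;
   run;
%mend printdsn;

%printdsn(sashelp.class)                          /* vars= uses its default */
%printdsn(sashelp.class, vars=Name Height Weight) /* default overridden */
```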

## Variable scope
- **Global** macro variables:
    - can be created at any time
    - persist for the duration of the SAS session
    - can be referenced anywhere (except within DATALINES)
    - must be deleted explicitly with the %SYMDEL statement
- **Local** macro variables:
    - are created only during the execution of a macro program
    - persist only during macro execution -> can be referenced only while the macro is executing
    - when the macro finishes executing, the local symbol table is deleted automatically
    
*If a variable with the same name exists in both the local and the global symbol table:*
- inside the macro, the global one can't be referenced because the macro processor uses the local value first
- if you create a macro variable in a DATA step with the SYMPUT routine, you can't reference it until the step boundary is reached.

- Global macro variables can be created with:
    - %LET outside of a macro definition
    - a DATA step with the SYMPUT or SYMPUTX routine
    - SELECT INTO in PROC SQL
    - the %GLOBAL statement: `%GLOBAL var1 var2 var3;`
        - creates one or more macro variables in the global symbol table and assigns them null values
        - can be used inside or outside of a macro definition
        - can create a READONLY macro variable with an initial value
        - has no effect on variables that are already in the global symbol table

- Local symbol table: created when a macro that includes a parameter list is called, or when a local variable is created. It contains macro variables that:
    - can be created and initialized at macro invocation (by parameters)
    - can be created or updated during macro execution
    - can be referenced within the macro

- Local macro variables can be created with:
    - %LET within a macro definition
    - a DATA step with SYMPUT or SYMPUTX
    - SELECT INTO in PROC SQL **within a macro definition**
    - the %LOCAL statement: `%LOCAL var1 var2;`
        - can appear only inside a macro definition
        - creates one or more variables in the local table and assigns them null values
        - has no effect on variables that are already in the local symbol table
    - <font color=red>Note</font>
        - SYMPUT creates a local variable only if a local symbol table exists and no variable with the same name exists in the global table. If no local symbol table exists, the value is assigned to a global variable.
        - CALL SYMPUTX provides a 3rd argument: symbol-table. If the local table doesn't exist yet but L is specified in SYMPUTX -> the local table and the macro variable are created
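This sketch shows two same-named variables living in different symbol tables. The names `x`, `y` and `%scope_demo` are illustrative. The %PUT after RUN sees `y` because macro execution pauses at the step boundary until the DATA step has run.

```sas
%let x=global value;

%macro scope_demo;
   %local x;                          /* new x in the local symbol table */
   %let x=local value;                /* updates the local x only */
   %put Inside macro: &=x;            /* X=local value */

   data _null_;
      call symputx('y', 'created locally', 'L');   /* L: most local table */
   run;
   %put Inside macro: &=y;            /* Y=created locally */
%mend scope_demo;

%scope_demo
%put Outside macro: &=x;              /* X=global value */
```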
     
## Debug macro

* OPTIONS MPRINT | NOMPRINT:
    - displays the SAS code generated by macro execution, with the prefix MPRINT(macro-name)
    - the generated code can be written to a file with the MFILE option
* Comments in a macro program: `/* this is a comment */`

## Conditional Processing

`%IF expression %THEN text; <%ELSE text;>`
The expression can be any valid expression that resolves to an integer: 0 is false, any other number is true.

* %DO - %END with %IF - %THEN: a block of code is processed depending on the %IF condition

* OPTIONS MLOGIC | NOMLOGIC: used to debug macros; writes log messages for the beginning and end of macro execution, parameter values, macro statements and %IF conditions
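In this sketch, %IF / %THEN %DO blocks choose which step to generate. The macro `%report` and its `type` parameter are illustrative, and SASHELP.CLASS is assumed available. With MLOGIC and MPRINT on, the log shows the condition results and the generated code.

```sas
options mlogic mprint;

%macro report(dsn, type);
   %if %upcase(&type)=PRINT %then %do;
      proc print data=&dsn(obs=5);
      run;
   %end;
   %else %do;
      proc means data=&dsn;
      run;
   %end;
%mend report;

%report(sashelp.class,print)
%report(sashelp.class,means)

options nomlogic nomprint;
```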

## Iterative Processing

Syntax:
```sas
%DO index-variable=start %TO stop <%BY increment>;
    text
%END;
```
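Here is a %DO loop sketch that generates one DATA step per iteration. The macro name and output tables `work.age11` to `work.age13` are illustrative, and SASHELP.CLASS is assumed available.

```sas
%macro split_by_age(from, to);
   %do age=&from %to &to;
      data work.age&age;           /* work.age11, work.age12, ... */
         set sashelp.class;
         where Age = &age;
      run;
   %end;
%mend split_by_age;

%split_by_age(11,13)
```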


# CHAPTER 10: ADVANCED MACRO TECHNIQUES

## Store macro in external files

Save macros to external files to store them permanently. Use %INCLUDE to insert code from an external file. If the file contains a macro definition, the macro is compiled when %INCLUDE is submitted and becomes available for calling. %INCLUDE is a global SAS statement (not a macro statement).

`%INCLUDE file-specification </SOURCE2>;` SOURCE2 OPTION to write included code in log

## Session compiled Macro - AUTOCALL library

A submitted macro definition is compiled and stored in the temporary SAS catalog WORK.SASMACR (entry macro-name.MACRO) -> known as a *session compiled macro*: it is only available during that session and is **deleted** at the end of the session.
To use a macro in a different session -> use the AUTOCALL facility

* you can store macro definitions in predefined source libraries -> autocall libraries. If a macro definition is in an autocall library, you **don't need** to submit or include the definition before calling it
* filename: should be all lowercase, with the extension .sas and the same name as the macro
* autocall library: a directory or a SAS catalog
* To access an autocall library, use 2 options: MAUTOSOURCE (the default; NOMAUTOSOURCE turns it off), and SASAUTOS=(lib1, lib2, ...), which must be set to identify the location of the library(ies)
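The two ways of reusing a stored macro are sketched below as alternatives. The directory `/folders/myfolders/macros` and the file `printit.sas` (which is assumed to define `%printit`) are hypothetical. Adding the `sasautos` fileref keeps the SAS-supplied autocall macros available.

```sas
/* Option 1: compile the stored definition for this session (hypothetical path) */
%include '/folders/myfolders/macros/printit.sas' / source2;
%printit

/* Option 2 (e.g. in a new session): autocall finds and compiles printit.sas on first call */
options mautosource sasautos=('/folders/myfolders/macros' sasautos);
%printit
```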

## Data-driven macro calls: using<font color=blue> DOSUBL DATA</font> step function

The DOSUBL function imports macro variables from the calling environment and exports macro variables back to it. It returns 0 if the code was able to execute, and a nonzero value if not.

```sas
data _null_;
   set certadv.FlightCrewNew;
   rc=dosubl(cats('%DelayReport(',empid,')'));
run;
```





# CHAPTER 11: ARRAYS
Array provides a way to reference a group of columns for processing in DATA step.
## 1-D array
* Define: `ARRAY array-name [number-of-elements] <array-elements>;` All elements must have the same data type (all numeric or all character). <font color=red>Avoid</font> using a SAS function name as the array name: the array works, but you can't use that function in the same DATA step.
    * Number of elements
        * The brackets (), {} and [] are equivalent: health(5) ~ health{5} ~ health[5]
        * use `*` to have SAS count the elements automatically; the elements must then be listed
        * you can use a range when defining: `array health[2:4] Temp Pulse Resp;`
    * Array elements:
        - optional unless `*` is used for the number of elements
        - the order of the elements is not related to their order in the PDV
        - can use column lists:
            - Numbered range list: x1-xn
            - Name range list (in PDV order): `x--b`, `x-numeric-b`, `x-character-b`
            - Name prefix list: REV: (all columns that begin with REV, such as REVJAN, REVFEB, REVMAR)
            - Special name lists: `_NUMERIC_`, `_CHARACTER_`, `_ALL_` (all columns must be the same type)
* DO loop:
```sas
DO i = 1 TO number-of-elements;
   array-name[i] ...
END;
```
<font color=red>Unknown number of elements</font>: use DIM(array-name) as the stop value.
* Compilation and processing:
    - syntax errors in the <font color=blue>ARRAY</font> statement are detected in the compilation phase -> the ARRAY statement is a **compile-time statement** and is ignored in the execution phase
    - the array name is not included in the PDV because it is not a variable, just a reference to variables
    - at the beginning, `_N_` is set to 1 and the other variables are set to missing
    - the 1st row is loaded into the PDV and the loop is processed
    - the 1st row, with the new variables, is written to the output data set
    - the 2nd row is processed, and so on
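This sketch converts SASHELP.CLASS (assumed available) to metric units with parallel arrays. The new column names and the `_TEMPORARY_` factor array are illustrative. DIM supplies the loop's stop value, so the count isn't hard-coded.

```sas
data work.class_metric;
   set sashelp.class(keep=Name Height Weight);
   array imperial[*] Height Weight;              /* inches, pounds */
   array metric[2] Height_cm Weight_kg;          /* created as new numeric columns */
   array factor[2] _temporary_ (2.54 0.4536);    /* approx. factors, not output */
   do i = 1 to dim(imperial);
      metric[i] = round(imperial[i] * factor[i], 0.1);
   end;
   drop i;
run;
```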
    
## Expanding use of 1-D array

* Array column type: numeric with length 8 by default
     - Use $ and a length to create character elements:
    
    `array States[50] $12 State1-State50;`  
    `array States[50] $ State1-State50;` (default length 8)
* Lower bound: **1** by default
    - Can be changed, e.g. to years:
    
    `array years[2011:2016] yr2011-yr2016;` and then loop with: do i=2011 to 2016;...
* Initial values: 

    `array goal[4] g1 g2 g3 g4 (initial values);` values separated by commas or blanks, in **parentheses**
    
**Note**: You can assign initial values without listing the element names: `array Var[4] (1 2 3 4);` creates Var1-Var4.

* `_TEMPORARY_`: temporary elements are not written to the output table and are automatically retained between iterations of the DATA step
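This sketch combines a custom lower bound, initial values and a character array in one step. All names and values are illustrative. With no SET statement, the step runs once and writes one row, and `Growth` should be 25.

```sas
data work.array_options;
   array yr[2011:2013] yr2011-yr2013 (100 110 125);  /* custom bounds + initial values */
   array txt[3] $ 12 txt1-txt3;                     /* character elements, length 12 */
   do i = 2011 to 2013;
      txt[i - 2010] = catx(' ', 'Year', i);
   end;
   Growth = yr[2013] - yr[2011];                    /* 25 */
   drop i;
run;
```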

## 2-D array

* array[a,b]: a is the number of rows and b the number of columns. **However**, in the PDV the values are stored as columns (one variable per element).
* Good for table lookup
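Here is a table-lookup sketch with a 2-D temporary array. The rate values and age bands are hypothetical, and SASHELP.CLASS (ages 11 to 16) is assumed available. Initial values fill the array row by row, with the rightmost index varying fastest.

```sas
data work.class_rate;
   set sashelp.class(keep=Name Sex Age);
   /* hypothetical rates: rows = sex (1=F, 2=M), columns = age band (11-12, 13-14, 15-16) */
   array rtab[2,3] _temporary_ (1.0 1.5 2.0
                                2.5 3.0 3.5);
   row = ifn(Sex = 'F', 1, 2);
   col = min(3, floor((Age - 11) / 2) + 1);
   Rate = rtab[row, col];
   drop row col;
run;
```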



# CHAPTER 12: HASH OBJECTS

## Declare hash objects
* A hash object is an in-memory table that contains key and data components. When a key value is found, the data values are copied to the PDV
* Can only be used in a DATA step; it is deleted when the DATA step ends.
* Created at execution time (arrays are created at compile time) -> its size is dynamic
* **KEY** component: 
    - 1 or more columns
    - *can be numeric, character or both*
    - *key components must be defined as PDV columns*; they **don't** need to be **sorted**
    - unique (unless MULTIDATA is used)
* **DATA** component:
    - can have multiple data components for each key
    - *can be numeric, character or both*
    - *must be defined as PDV columns*
* Declare a hash object: the DECLARE statement creates an instance of the object and initializes its data.

    `DECLARE object object-name(<argument_tag-1:value-1, …>);`
    - object: hash or hiter (iterator)
    - argument_tag-1: DATASET: 'data-set-name <(data-set-option)>'. Some options: WHERE=, DROP=, KEEP=, RENAME= or OBS=.
    
    `declare hash States(dataset:'work.population_usstates(where=(StatePop2017>20000000))');`
    - If there are duplicate key values -> use the MULTIDATA argument:
    
    `declare hash ContName(MULTIDATA:'YES');`

## Defining and using hash object

* **Methods**: 
    - add(<key:key-value, data:data-value,...>): adds the data in the PDV to the corresponding key and data components of the hash object
    - 3 definition methods:
        - defineKey(), defineData(): each argument names a column, numeric or character (given as a character literal in quotation marks, a character column or a character expression). A hash object can have multiple defineKey or defineData calls.
        - defineDone(): completes the definition and loads the hash object from the data table
    - find(): searches the hash object for the current key value; returns 0 if found and copies the data values to the PDV
    
         - Syntax: the number of key:value pairs depends on the defineKey method
         
         `obj.find(<key:value,Key2:value2>);` 
         - if a PDV column has the same name as the key -> you don't need to include the argument:
         
         `rc = contName.find();` 
         rc = 0 if found, non-zero if not
    - obj.output(): creates one or more data sets containing the data in the hash object.
        - use the ORDERED argument in the <font color=blue>DECLARE</font> statement to control the order of the output (ORDERED sets the key order in which data is returned by OUTPUT or by a hash iterator)
        
        `declare hash ContName (ordered:'descending');`
        - END= in the SET statement flags the last row (a temporary PDV column, not written to the output table); its value is 1 on the last row
        
        `set certadv.country(keep=ContinentID Country CountryName) end=lastrow;`
        
        `if lastrow=1 then ContName.output(dataset: 'work.contname');`
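Here is an end-to-end hash lookup sketch. It assumes SASHELP.CLASS is available and builds a small hypothetical lookup table `work.avg_ht` first. `if 0 then set` defines the key and data columns in the PDV without reading a row. The hash is loaded once, and OUTPUT writes its contents to `work.avg_copy` on the last row.

```sas
proc sql;
   create table work.avg_ht as
      select Sex, avg(Height) as AvgHeight
         from sashelp.class
         group by Sex;
quit;

data work.class_vs_avg;
   if 0 then set work.avg_ht;              /* puts Sex and AvgHeight in the PDV */
   if _n_ = 1 then do;
      declare hash AvgH(dataset: 'work.avg_ht');
      AvgH.defineKey('Sex');
      AvgH.defineData('AvgHeight');
      AvgH.defineDone();                    /* loads work.avg_ht into memory */
   end;
   set sashelp.class(keep=Name Sex Height) end=lastrow;
   rc = AvgH.find();                        /* key taken from Sex in the PDV */
   if rc = 0 then Diff = Height - AvgHeight;
   if lastrow then AvgH.output(dataset: 'work.avg_copy');
   drop rc;
run;
```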

## Hash iterator object (HITER)
An ordered view of a hash object that lets you retrieve its items in either forward or reverse key order -> the hash object must be declared before the hiter

`declare hash Customer (dataset: 'certadv.orderfact', ordered: 'descending');`

`...statements..`

`declare hiter C('Customer');`

* Methods on a hash iterator object (similar to an SQL cursor)
    - FIRST
    - LAST
    - NEXT
    - PREV
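The iterator methods can be sketched on SASHELP.CLASS (assumed available). FIRST and NEXT walk the keys in ascending order, much like fetching from a cursor. Because no SET statement executes, STOP ends the step explicitly.

```sas
data _null_;
   if 0 then set sashelp.class(keep=Name Age);   /* define Name and Age in the PDV */
   declare hash H(dataset: 'sashelp.class(keep=Name Age)', ordered: 'ascending');
   H.defineKey('Age', 'Name');
   H.defineData('Name', 'Age');
   H.defineDone();

   declare hiter HI('H');
   rc = HI.first();                 /* LAST and PREV walk backward instead */
   do while (rc = 0);
      put Age= Name=;
      rc = HI.next();
   end;
   stop;
run;
```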


    
    

# CHAPTER 13: SAS UTILITY PROC

## Picture formats with FORMAT proc
Creates a custom format for **numeric** values

`PICTURE phonepix OTHER= '(999)999-9999';`

* Using value ranges:
```sas
proc format;
   picture rainamt
      0-2='9.99 slight'
      2<-4='9.99 moderate'
      4<-<10='9.99 heavy'
      other='999 check value';
run;
```
* Create custom date, time and datetime formats using directives; specify the data type:
```sas
PICTURE format-name (DEFAULT=length)
   value-or-range-1='directives'
   (DATATYPE=DATE | TIME | DATETIME);
RUN;
```
  default length is 8
- date directives:

| Directive | Result |
|-----------|--------|
| %A | Wednesday |
| %a | Wed |
| %d | 2 or 11 (day of month) |
| %0d | 02 |
| %B | January |
| %3B | Jan |
| %m | 4 or 12 (month number) |
| %0m | 04 (month number) |
| %Y | 2019 |
| %0y | 19 |

- time directives:

| Directive | Result |
|-----------|--------|
| %H | 21 (24-hour clock) |
| %0H | 09 or 21 (24-hour clock with leading zero) |
| %I | 9 (12-hour clock) |
| %0I | 09 |
| %M | minutes in 1 or 2 digits (13) |
| %0M | 13 (minutes with leading zero) |
| %S, %0S | 4 or 04 (seconds) |
| %p | AM or PM |
    
- example:
```sas
proc format;
   picture mydate (default=10)
      low-high='%0d-%3b%Y' (datatype=date);
run;
```
    
* Custom numeric format
    - Syntax:
```sas
PICTURE format-name (ROUND DEFAULT=length)
   value-or-range-1='digit selectors with non-numeric characters'
   (MULT | MULTIPLIER=n PREFIX='prefix');
```
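Here is a numeric picture sketch. The format name `millions` and the sample values are illustrative. MULT=0.00001 scales the value so that the one decimal digit in the picture shows tenths of a million, and ROUND rounds instead of truncating. The intent is that 1,234,567 displays roughly as $1.2M, so check the log output.

```sas
proc format;
   picture phonepix other='(999)999-9999';
   picture millions (round)
      low-high='0,009.9M' (prefix='$' mult=0.00001);
run;

data _null_;
   Phone = 8005551234;
   Sales = 1234567;
   put Phone= phonepix. Sales= millions.;
run;
```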

## create function with PROC FCMP

* Create a function with DATA step syntax:
```sas
PROC FCMP OUTLIB=libref.table.package;
   FUNCTION function-name(arguments) <$> <length>;
      … programming statements …
      RETURN(expression);
   ENDSUB;
QUIT;
```

* OPTIONS CMPLIB=libref.table: specifies the table where SAS looks for functions (a global system option)
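Here is a minimal FCMP sketch. The function `c_to_f` and the location `work.functions.demo` are illustrative. CMPLIB points at the table (the first two levels of OUTLIB) so that the DATA step can find the function.

```sas
proc fcmp outlib=work.functions.demo;
   function c_to_f(celsius);
      return (celsius * 9 / 5 + 32);
   endsub;
quit;

options cmplib=work.functions;

data _null_;
   f = c_to_f(100);
   put f=;      /* f=212 */
run;
```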


# CHAPTER 14: ADVANCED FUNCTION

## LAG, COUNT, FIND
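* LAG function: used in a DATA step. Each LAGn call keeps its own queue of the previous n values; a BY statement does not reset the queue.
```sas
data work.stockprev;
   set certadv.Stock6Mon(drop=Close);
   by date;                    /* requires data sorted by Date */
   FirstPrevDay=lag1(Open);    /* Open from the previous row */
   SecondPrevDay=lag2(Open);   /* Open from 2 rows back */
   ThirdPrevDay=lag3(Open);    /* Open from 3 rows back */
run;
```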
* 3 COUNT functions
    - COUNT: `COUNT(string, substring <,modifiers>)` counts the number of times a substring appears within a string.
        - modifiers:
            - t: trims trailing blanks
            - i: ignores case
    - COUNTC: `COUNTC(string, character-list <,modifiers>)` counts the number of characters in string that appear in character-list; use the v modifier to count the characters that do *not* appear
    - COUNTW: `COUNTW(string <,delimiters><,modifiers>)` counts the number of words in string, using the delimiter list
* 3 FIND functions:
    - FIND(string, substring <, modifiers> <, start-position>)
    - FINDC(string, character-list <, modifiers> <, start-position>)
    - FINDW(string, word <, delimiters> <, modifiers> <, start-position>): use the e modifier to return the word number instead of the character position
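Here is one illustrative string run through each function. The comments give the values the PUT statement should write. COUNTC is case-sensitive, and only lowercase vowels appear in this string anyway.

```sas
data _null_;
   text = 'The cat sat on the mat';
   n_the   = count(text, 'the', 'i');       /* 2 (The, the) */
   n_vowel = countc(text, 'aeiou');         /* 6 */
   n_words = countw(text);                  /* 6 */
   pos_cat = find(text, 'cat');             /* 5 */
   word_no = findw(text, 'on', ' ', 'e');   /* 4: word number, not position */
   put n_the= n_vowel= n_words= pos_cat= word_no=;
run;
```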

## Perl RegEx

* Metacharacter

| Metacharacter | Description |
|---------------|-------------|
| (...) | grouping |
| \d | a digit |
| \D | a non-digit |
| [...] | any one character in the group |
| ^ | beginning of string |
| $ | end of string |
| * | previous character 0 or more times |
| + | previous character 1 or more times |
| ? | previous character 0 or 1 time |
| {n} | previous character exactly n times |

* **PRXMATCH**(Perl-regular-expression, source): returns the position of the first match in source (0 if there is no match)
    - the expression can be **a constant**, **a column** or a **pattern identifier**
    - **a column**: add `o` or `O` at the end of the expression so that it is compiled only once instead of for every row
    - PRXPARSE: compiles the expression and returns a pattern identifier:
    
    `Exp='/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/o';`
    `Pid=prxparse(Exp);`    
    `Loc=prxmatch(Pid,PhoneNumber);`
* **PRXCHANGE**(Perl-regular-expression, times, source):
    - the expression starts with s and can end with modifiers (e.g. i to ignore case): `s/regex/new-text/i`
    - times: the number of replacements to make (use -1 to replace every match)
    - example: 
    
    `prxchange('s/(FB)/Facebook/i',-1,Social);`
    
    - **capture buffers**: each regex group is numbered; use the number with $ to refer to it in the replacement text. Example:
    
    `LatLong=prxchange('s/(-?\d+\.\d*)(@)(-?\d+\.\d*)/$3$2$1/', -1, LongLat);`
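Here is a self-contained sketch that combines PRXPARSE, PRXMATCH and a PRXCHANGE with capture buffers. The input table `work.phones` is hypothetical. The pattern is parsed once and retained, so it isn't recompiled on every row, and `Loc` should come out as 1, 6 and 0.

```sas
data work.phones;
   infile datalines truncover;
   input PhoneNumber $20.;
   datalines;
919-555-1234
call 212-555-0199
no number here
;
run;

data work.phone_check;
   set work.phones;
   retain Pid;                                   /* keep the pattern ID across rows */
   if _n_ = 1 then Pid = prxparse('/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/');
   Loc = prxmatch(Pid, PhoneNumber);             /* 1, 6, 0 */
   /* capture buffers $1-$3 rebuild the number with dots */
   Dotted = prxchange('s/(\d{3})-(\d{3})-(\d{4})/$1.$2.$3/', -1, PhoneNumber);
   drop Pid;
run;
```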