## GWU STAT 4197/STAT 6197
## Week 3 SAS Code Examples (Part 1): Working with Formats and Informats
#### (Source: SAS Documentation for Code Explanation)

* User-Defined Formats

    * Creating, Storing, Accessing, and Maintaining Formats 
    * Grouping Data Values Using Formats
    * Removing Formats and Labels from SAS Data Sets
    

* User-Defined Informats



### Defining character formats for discrete character values
#### The FORMAT procedure enables you to define your own formats for variable values. Formats determine how variable values are printed in the PROC FREQ output below. Note the following:
* Character format name (user-defined)
* Character data values
* labels
* Format applied to the PROC FREQ step

In [None]:
* Ex1_Numeric_Character_Formats.sas (Part 1);
Title 'Format for character values';
options nocenter nodate nosource;
proc format;
value $regionfmt
    'AFR' = 'Africa'
    'AMR' = 'Americas'
    'EUR' = 'Europe'
    'EMR'  ='Eastern Mediterranean'
    'SEAR' = 'South-East Asia'
    'WPR' = 'Western Pacific';
 run;     
proc freq data=sashelp.demographics; 
  tables region; 
  format region $regionfmt.;
run;


### Defining numeric formats for ranges of numeric data values
#### The FORMAT procedure enables you to define your own formats for ranges of numeric data values. Formats determine how variable values are printed in PROC FREQ output below. Note the following:
* Numeric format name (user-defined)
* Ranges of numeric data values with the keywords LOW and HIGH
* labels
* Format applied to the PROC FREQ step

The special keyword LOW stands for the lowest data value in a range. 
In a numeric format, LOW means the lowest nonmissing value, so a range that starts with LOW does not format missing values. 
In a character format, by contrast, LOW includes missing (blank) values.


In [3]:
*Ex1_Numeric_Character_Formats.sas (Part 2);
options nocenter nodate nosource;
proc format;
  value numfmt
           Low - <0  = "Nonresponse (excluding missing values)"
           0         ="Never"
           1-5       = "Within past 5 years"
           6-High    = "More than 5 years ago"
           .         ="Missing" ;
   value $charfmt
           Low-<'0'  = "Nonresponse (including missing values)"
          '0'        = "Never"
          '1'-'5'    = "Within past 5 years"
          '6'-High   = "More than 5 years ago" ; 
 run;
data work.have;
input id $ 1 Colonoscopy 3-4 c_Colonoscopy $6-7;
datalines;
A -1 -1 
B   
C 3   3
D -9 -9
F 3  3
G 5  5
H 6  6 
I    
J 7  7
;
proc freq data=work.have;
tables colonoscopy c_colonoscopy /nopercent nocum missing;
Format colonoscopy numfmt. c_colonoscopy $charfmt.;
run;

Colonoscopy,Frequency
Missing,2
Nonresponse (excluding missing values),2
Within past 5 years,3
More than 5 years ago,2

c_Colonoscopy,Frequency
Nonresponse (including missing values),4
Within past 5 years,3
More than 5 years ago,2


### Creating Formats for Overlapping Ranges 
#### NOTSORTED and MULTILABEL Options in the VALUE Statement with PROC FORMAT
Overlapping ranges arise, for example, when a report needs month-to-date, quarter-to-date, and year-to-date totals.


* In the VALUE statement of PROC FORMAT, the MULTILABEL option allows a value to map to more than one label.

* The NOTSORTED option in the VALUE statement keeps the ranges in the order in which they are defined.

"The MLF option is used in the reporting procedure to allow for processing of formats with multiple labels. 
When we want to use this format in a procedure we need to specify the appropriate options for both the 
MULTILABEL and NOTSORTED options." 

Using SAS® Formats:  So Much More than “M” = “Male” by Pete Lund - 2011 SAS Global Forum 
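
As a minimal sketch of the same idea outside PROC TABULATE, the step below defines a multilabel format for the ages in SASHELP.CLASS and requests it in PROC MEANS with the MLF option. The format name AGEGRP and the age cutoffs are illustrative assumptions and are not part of the course files.

```sas
* Hypothetical multilabel format: each age falls into a two-year band and into the overall band;
proc format;
  value agegrp (multilabel notsorted)
    11-12 = '11-12'
    13-14 = '13-14'
    15-16 = '15-16'
    11-16 = 'All ages';
run;

* MLF tells PROC MEANS to use every label a value maps to;
* PRELOADFMT with ORDER=DATA keeps the order in which the ranges were defined;
proc means data=sashelp.class n mean maxdec=1;
  class age / mlf preloadfmt order=data;
  var height;
  format age agegrp.;
run;
```

Because every age maps to two labels, each student is counted once in a two-year band and once in the overall band.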




### Create an Example Data Set
#### for male and female individuals between Jan 1-Dec 31, 2016 (200 rows)
 

In [1]:
*Ex25_date_group.sas (Part 1);
data work.Have (drop=rownum date_a date_b);
call streaminit(123);
  date_a="01JAN2016"d;
  date_b="31DEC2016"d;
    do rownum = 1 to 100;
    do gender = "Female", "Male"; 
    Date_p=date_a + floorz((date_b-date_a) * rand("uniform"));
    expense = floorz(1000*rand("Uniform"));
    format date_p date9.;
    output;
  end;
 end;
run;
proc print data=work.Have (obs=5); run;

Obs,gender,Date_p,expense
1,Female,31JUL2016,35
2,Male,29JAN2016,387
3,Female,30APR2016,361
4,Male,03MAY2016,169
5,Female,21JAN2016,79


### Calculate expenses for overlapping date ranges

In [3]:
*Ex25_date_group.sas (Part 2);
proc format;
value MultiDates (notsorted multilabel)
'1sep2016'd - '30sep2016'd = 'MTD'
'1jul2016'd - '30sep2016'd = 'QTD'  
'1jan2016'd - '31Dec2016'd = 'YTD' 
; 
run;
proc tabulate data=work.Have; 
  class date_p gender / mlf preloadfmt order=data;
  var expense;
  format date_p MultiDates.;
  tables (gender all),  date_p*expense*sum*f=dollar10.;
run; 


gender,Date_p MTD (expense Sum),Date_p QTD (expense Sum),Date_p YTD (expense Sum)
Female,"$6,649","$14,019","$48,301"
Male,"$3,221","$9,849","$51,323"
All,"$9,870","$23,868","$99,624"


#### Nested Format
A “nested” format is simply one where one or more of the labels is another SAS format. It can be either a SAS-supplied or user-defined format. 
One use for nested formats is to subset values out of larger ranges.

In [6]:
*Ex2_Nested_Formats.sas;
proc format;
value date_grp_fmt
  low-'03jul1995'd          = 'Pre July 4th 1995'
  '04jul1995'd-'31jul1995'd = [mmddyy8.]
  '01aug1995'd-high         = 'Aug 1-Dec 31, 1995';
  
value sales_fmt
  low-<5000 = 'Less than $5,000'
  5000-9999 = '$5,000-<$10,000'
  10000-high = [dollar12.2];
  run;
  title 'Nested formats';
  proc freq data=sashelp.mdv;
  tables shipdate sales93;
  format shipdate date_grp_fmt.
    sales93 sales_fmt.;
  run;


SHIPDATE,Frequency,Percent,Cumulative Frequency,Cumulative Percent
Pre July 4th 1995,66,51.56,66,51.56
07/09/95,2,1.56,68,53.13
07/11/95,1,0.78,69,53.91
07/17/95,1,0.78,70,54.69
07/18/95,2,1.56,72,56.25
07/23/95,1,0.78,73,57.03
07/26/95,1,0.78,74,57.81
07/29/95,1,0.78,75,58.59
"Aug 1-Dec 31, 1995",53,41.41,128,100.0

SALES93,Frequency,Percent,Cumulative Frequency,Cumulative Percent
"Less than $5,000",123,96.09,123,96.09
"$5,000-<$10,000",3,2.34,126,98.44
"$12,063.00",1,0.78,127,99.22
"$15,611.00",1,0.78,128,100.0


### PROC FORMAT CNTLIN= Option

Creating a User-Defined Format from a SAS Data Set

If there is a long list of variable values and if the values and 
their labels are available in an electronic file 
(ASCII, EXCEL or data base mode), the file can be read into SAS to 
create a SAS data set. There is no need to type this long list under the PROC FORMAT VALUE statement!  

Requirements: The data set must contain three columns: 
FMTNAME, START, and LABEL. It can also contain the optional 
TYPE column, with the value ‘C’ for a character format 
and ‘N’ for a numeric format.

The CNTLIN=input-control-SAS-data-set option (used in the PROC FORMAT step of Part 2 below) 
specifies a SAS data set from which PROC FORMAT builds formats or informats. 
Note that CNTLIN= builds formats and informats without using a VALUE, 
PICTURE, or INVALUE statement. 


In [8]:
*Ex4_Value_cntlin_compared.sas (Part 1);
proc format ;
value $xcausesfmt A00 = "Cholera "
                  A00.0 = "Cholera due to Vibrio cholerae 01, biovar cholerae" 
                  A00.1= "Cholera due to Vibrio cholerae 01, biovar eltor" 
                  A01.1= "Paratyphoid fever A" 
                  A01.2= "Paratyphoid fever B" 
                  A01.3= "Paratyphoid fever C" 
                  A01.4= "Paratyphoid fever, unspecified" 
                  A02= "Other salmonella infections" 
                  A02.0= "Salmonella enteritis" 
                  A02.1= "Salmonella septicaemia" ;
data have1; 
input id $ cause_dth_code $ @@;
format cause_dth_code $xcausesfmt.; 
datalines; 
12345 A01.4 23456 A01.3 34567 A02.0
; 
title "Format created using the PROC FORMAT VALUE statement";
proc print data=have1 noobs; run;
title;

id,cause_dth_code
12345,"Paratyphoid fever, unspecified"
23456,Paratyphoid fever C
34567,Salmonella enteritis


In [9]:
*Ex4_Value_cntlin_compared.sas (Part 2);
data causes_of_death;
 retain FMTNAME '$causesfmt' type 'C';
input START $ LABEL & $50.;
datalines;
A00    Cholera 
A00.0  Cholera due to Vibrio cholerae 01, biovar cholerae 
A00.1  Cholera due to Vibrio cholerae 01, biovar eltor 
A01.1  Paratyphoid fever A 
A01.2  Paratyphoid fever B 
A01.3  Paratyphoid fever C 
A01.4  Paratyphoid fever, unspecified 
A02    Other salmonella infections 
A02.0  Salmonella enteritis 
A02.1  Salmonella septicaemia
;

proc sort data=causes_of_death
  out=causes_of_death nodupkey;
  by START;
run;
proc format cntlin=causes_of_death;
run;

data have2; 
input id $ cause_dth_code $ @@;
format cause_dth_code $causesfmt.; 
datalines; 
12345 A01.4 23456 A01.3 34567 A02.0
; 
title "Format created using the PROC FORMAT cntlin= option";
proc print data=have2; run;
title;

Obs,id,cause_dth_code
1,12345,"Paratyphoid fever, unspecified"
2,23456,Paratyphoid fever C
3,34567,Salmonella enteritis
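
The control data set can also describe a numeric format and a catch-all range. The sketch below is a hedged illustration: the data set name REGION_CTL, the format name REGFMT, and the label 'Unknown' are hypothetical. It uses the optional HLO column, where the value 'O' marks the OTHER range.

```sas
* Hypothetical control data set for a numeric format named REGFMT;
data work.region_ctl;
  length fmtname $32 type $1 start $8 label $20 hlo $1;
  fmtname = 'regfmt';   /* name of the format to build */
  type    = 'N';        /* N = numeric format */
  start = '1'; label = 'Northeast'; hlo = ' '; output;
  start = '2'; label = 'Midwest';   hlo = ' '; output;
  start = '3'; label = 'South';     hlo = ' '; output;
  start = '4'; label = 'West';      hlo = ' '; output;
  /* Catch-all row: HLO='O' plays the role of OTHER= in a VALUE statement */
  start = ' '; label = 'Unknown';   hlo = 'O'; output;
run;

proc format cntlin=work.region_ctl;
run;

* Write a few formatted values to the log to check the mapping;
data _null_;
  do region = 1, 4, 9;
    put region= region regfmt.;
  end;
run;
```

The value 9 is not listed in the control data set, so it should pick up the label from the HLO='O' row.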


### PROC FORMAT - to count invalid dates in a SAS data set

* Creating a user-defined format (DATE_FMT) can be viewed as a table look-up that uses 1-to-1 or many-to-1 mappings of values. 

* In the INPUT statement below, the ?? format modifier for the S_DATE variable suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read.


* A temporary format (DATE_FMT) is applied to the variable S_DATE in the PROC FREQ step.
(This is an alternative to IF-THEN-ELSE code in a DATA step.)



[See SAS® Documentation for details]
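
For comparison, the IF-THEN-ELSE alternative mentioned above might look like the following sketch. The data set name DATES_DEMO and the variable DATE_STATUS are hypothetical; the two data lines reuse dates from the example in this section.

```sas
* Flag invalid dates with IF-THEN-ELSE instead of a user-defined format;
data work.dates_demo;
  input @1 s_date ?? mmddyy10.;   /* ?? suppresses invalid-data messages */
  length date_status $12;
  if missing(s_date) then date_status = 'invalid date';
  else date_status = 'valid date';
  format s_date mmddyy10.;
  datalines;
04/22/2005
02/31/2007
;

proc freq data=work.dates_demo;
  tables date_status;
run;
```

The format-based approach avoids creating the extra variable, which is why the section presents it as a table lookup.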


In [10]:
*Ex6_Finding_Invalids.sas (Part 1);
options nodate nonumber;
PROC FORMAT;
VALUE date_fmt 
   LOW-HIGH = 'valid date'
    other='invalid date';  
run;
DATA work.HAVE;
infile datalines firstobs=2;
input Name $ 1-7 
      @8 s_date ?? mmddyy10.
      @8 s_date_ch $10.;
format s_date mmddyy10. ;
datalines;
12345678901234567890
Alfred 04/22/2005
Alice  01/15/2005
Barbara12/20/2004
Carol  10/29/1999
Henry  02/31/2007
Philip 02/31/2005
Ronald 02/29/2006
;
 Title 'Table lookup using a user-defined format';
proc freq DATA=work.Have;
  table s_date /missing ;
  format S_date date_fmt.;
RUN;

Title "Listing of S_DATE_CH values (invalid dates)"; 
proc print DATA=HAVE noobs;
  var Name s_date_ch;
  where S_date= .;
RUN;
title;

s_date,Frequency,Percent,Cumulative Frequency,Cumulative Percent
invalid date,3,42.86,3,42.86
valid date,4,57.14,7,100.0

Name,s_date_ch
Henry,02/31/2007
Philip,02/31/2005
Ronald,02/29/2006


### The INVALUE Statement in PROC FORMAT
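* The INVALUE statement creates an informat that reads and converts raw data values.  

* It is used here to create an informat that converts a character string (a letter grade) into a numeric value while the data are read into SAS.

* The UPCASE option converts input values to uppercase, and the JUST option left-justifies them, before they are compared with the informat ranges.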

In [11]:
*Ex8_Invalue_statement.sas (Part 1);
options nocenter nonumber nodate nosource;
proc format ;
    invalue scorefmt (upcase just)
            'A'=95  'B'=84  
            'C'=79  'D'=60;
run;
data Grade_data1;
  input @1 id $4.  @6 grade scorefmt2. ;
datalines;
S001 A 
S002 D 
S003 B 
S004 B
S005  D 
S006 C 
S007 c 
;
title 'PROC FORMAT INVALUE Statement with UPCASE and JUST Options (Part 1)';
proc print data=Grade_data1 noobs ; run;
title;

id,grade
S001,95
S002,60
S003,84
S004,84
S005,60
S006,79
S007,79


### INVALUE Statement in PROC FORMAT
You can use the INVALUE statement to define an informat, which you can then use in the INPUT statement. This example uses the following:
* \_SAME_  and OTHER keywords

In [12]:
*Ex8_Invalue_statement.sas (Part 2);
options nocenter nonumber nodate nosource;
proc format ;
    invalue gpafmt (upcase)
            2.5-4.0 = _SAME_
            'B' = 3.0
            OTHER=. 
;
data GPA_data;
  input id $  GPA :gpafmt3. @@;
datalines;
S001 2.8 S002 3.7 S003 4.0 S004 B 
S005 2.7 S006 3.2 S007 . 
;
title 'PROC FORMAT INVALUE Statement with the UPCASE Option';
title2 'and _SAME_ and OTHER Keywords (Part 2)';
proc print data=GPA_data noobs ;
run;
title;

id,GPA
S001,2.8
S002,3.7
S003,4.0
S004,3.0
S005,2.7
S006,3.2
S007,.


### The VALUE Statement in PROC FORMAT - Another Example

In [13]:
*Ex8_Invalue_statement.sas (Part 3);
options nocenter nonumber nodate nosource;
proc format library=work;                                    
    value respfmt_1x                            
    0="TOTAL"                                              
    1="YES"                                                
    2="NO"   
    -8="DON'T KNOW" 
     -7="REFUSED"
    -9="NOT ASCERTAINED"                                        
    -1="INAPPLICABLE" ;

  value respfmt_2x                            
    0="TOTAL"                                              
    1="YES"                                                
    2="NO"   
    LOW- <0="INVALID"                   
    ;
  run;  
data Fictitious_Data; 
   infile datalines firstobs=2; 
   input @1 Did_you_ever_use_SAS 2. @5 freq 9.;                                                    
datalines;  
0123456789 
-9     251
-8     722
-7       6
-1   28323
 1    4134 
 2    3504 
 0   36940
;       
title 'PROC FORMAT VALUE Statement (Part 3)'; 
title2;
 proc print data=Fictitious_Data noobs; 
 format Did_you_ever_use_SAS respfmt_1x. freq comma6.;
run;                                              

title;

Did_you_ever_use_SAS,freq
NOT ASCERTAINED,251
DON'T KNOW,722
REFUSED,6
INAPPLICABLE,28323
YES,4134
NO,3504
TOTAL,36940


### Grouping data values using formats in the PROC FREQ step

In the code snippet below, the format values are ranges. The special keyword LOW is used 
to define the lowest value, and the special keyword HIGH is used to define the highest value. 

Note that we are grouping data values using formats in the PROC FREQ step.


In [14]:
*Ex9_Create_vars_Different_Ways.sas (part 2);
proc format ;
 value agefmt low-17 = '0-17 Years'
              18-49 = '18-49 Years'
              50-64 = '50-64 Years'
              65-High = '65+ Years' ;
data Have2;
 input age @@ ;
 datalines;
  0 5 10 17 40 48 50 59 62 81 99 100
  ; 
title 'Frequency Table by Grouping Data Values Using Formats in PROC FREQ (Part 2)';
title2;
proc freq data=Have2; 
 table age; format age agefmt.;
run;
title;

age,Frequency,Percent,Cumulative Frequency,Cumulative Percent
0-17 Years,4,33.33,4,33.33
18-49 Years,2,16.67,6,50.0
50-64 Years,3,25.0,9,75.0
65+ Years,3,25.0,12,100.0
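
Applying the format in PROC FREQ leaves the stored values unchanged. When the grouped value is needed as a variable in its own right, one common alternative is the PUT function in a DATA step. The sketch below assumes a hypothetical format name AGEDEMO (same ranges as AGEFMT) and a hypothetical data set AGES_GROUPED.

```sas
proc format;
  value agedemo low-17  = '0-17 Years'
                18-49   = '18-49 Years'
                50-64   = '50-64 Years'
                65-high = '65+ Years';
run;

data work.ages_grouped;
  input age @@;
  length age_group $11;             /* longest label has 11 characters */
  age_group = put(age, agedemo.);   /* store the label as a character variable */
  datalines;
0 17 18 49 50 64 65 100
;

proc freq data=work.ages_grouped;
  tables age_group;
run;
```

The trade-off is an extra stored column in exchange for a grouping variable that can be used in procedures and joins without attaching the format.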


## Where Formats Are Stored
#### Without the LIBRARY= option in PROC FORMAT (as in the following code snippet), formats are stored in the WORK.FORMATS catalog and exist only for the duration of the SAS session.

In [None]:
*Ex17_Temporary_Permanent_Catalogs.sas;
options nodate nonumber nonotes nosource;
PROC FORMAT; 
     value regionfmt
        1='Northeast' 2='Midwest' 
        3='South'  4='West';
run;

#### With the LIBRARY=library option specified, the format REGIONFMT is permanently stored in a catalog called FORMATS in the folder referenced by the libref library. 

In [8]:
options nodate nonumber nonotes nosource;
LIBNAME library 'C:\SASCourse\Week3\SAS_codes';
PROC FORMAT LIBRARY=library; 
     value regionfmt
        1='Northeast' 2='Midwest' 
        3='South'  4='West';
run;




#### If the LIBRARY= option specifies libref, formats are stored permanently in libref.formats.

#### In the code below, with the LIBRARY=sds option specified, the format X_REGIONFMT is permanently stored in a catalog called FORMATS (named by default) in the folder referenced by the libref sds. 

In [5]:
options nocenter nodate nonumber;
ods html close;
LIBNAME sds 'C:\SASCourse\Week3\SAS_Codes';
PROC FORMAT LIBRARY=sds; 
     value x_regionfmt
       1='Northeast' 2='Midwest'
       3='South' 4='West';
run;


93         options nocenter nodate nonumber;
94         ods html close;
95         LIBNAME sds 'C:\SASCourse\Week3\SAS_Codes';
NOTE: Libref SDS was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: C:\SASCourse\Week3\SAS_Codes
96         PROC FORMAT LIBRARY=sds;
97              value x_regionfmt
98                1='Northeast' 2='Midwest'
99                3='South' 4='West';
NOTE: Format X_REGIONFMT is already on the library SDS.FORMATS.
NOTE: Format X_REGIONFMT has been written to SDS.FORMATS.
100        run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

#### If the LIBRARY= option specifies libref.catalog, formats are stored permanently in that catalog.

#### In the code below, with LIBRARY=XSDS.CATALOGPOP specified, formats are permanently stored in the catalog called CATALOGPOP (rather than the default catalog name FORMATS) in the folder referenced by the libref XSDS.  

In [9]:
options nocenter nodate nonumber;
ods html close;
LIBNAME xsds 'C:\SASCourse\Week3\SAS_Codes';
PROC FORMAT LIBRARY=xsds.catalogpop; 
     value regionfmt
        1='Northeast' 2='Midwest' 
        3='South'  4='West';
run;




### How to Use formats that were created and stored permanently earlier

#### In the code below, the LIBNAME statement associates a libref (named library) with a SAS library (the storage location that contains the format catalog named FORMATS). For users’ convenience, SAS includes the catalog LIBRARY.FORMATS in the default format search path, so there is no need to use the OPTIONS FMTSEARCH= statement to search the format catalog FORMATS.  



In [17]:
*Ex18_LIBRARY_library.sas;
OPTIONS nodate nonumber nocenter;
%LET path=C:\SASCourse\Week3\SAS_Codes;
LIBNAME sds "&path";
LIBNAME library "&path";
title 'Ex18_LIBRARY_library.sas';
PROC FREQ data=sds.pop; 
 TABLES region;
 format region regionfmt.;
RUN;
title;


Region
region,Frequency,Percent,Cumulative Frequency,Cumulative Percent
Northeast,9,17.65,9,17.65
Midwest,12,23.53,21,41.18
South,17,33.33,38,74.51
West,13,25.49,51,100.0


### When to Use the FMTSEARCH= System Option

#### The format catalog is permanently stored in a folder that is referenced by a libref other than library (sds is the libref in this example) in a LIBNAME statement.

#### In the code below, the FMTSEARCH= option (required here) in the OPTIONS statement tells SAS to search the format catalog CATALOGPOP in the folder referenced by the libref sds. This catalog contains the format REGIONFMT for the REGION variable. The format was saved earlier to the SAS library referenced by the libref sds and then applied to REGION in the DATA step that created the SAS data set pop.


In [27]:
*Ex19_options_FMTSEARCH.sas;
OPTIONS nodate nonumber nocenter;
%LET path=C:\SASCourse\Week3\SAS_Codes;
LIBNAME sds "&path";
Options FMTSEARCH = (sds.catalogpop);
title 'Ex19_options_FMTSEARCH.sas';
PROC FREQ data=sds.pop; 
 TABLES region;
 format region regionfmt.;
RUN;
title;

Region
region,Frequency,Percent,Cumulative Frequency,Cumulative Percent
Northeast,9,17.65,9,17.65
Midwest,12,23.53,21,41.18
South,17,33.33,38,74.51
West,13,25.49,51,100.0
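
FMTSEARCH= accepts more than one entry, and catalogs are searched in the order listed. The sketch below is an illustration that reuses the libref and path from the example above; the particular search order is an assumption chosen for demonstration. An entry that names only a libref (such as sds) refers to the catalog FORMATS in that library.

```sas
libname sds 'C:\SASCourse\Week3\SAS_Codes';

* Search SDS.CATALOGPOP first, then SDS.FORMATS, then WORK.FORMATS;
options fmtsearch=(sds.catalogpop sds work);

* Write the current FMTSEARCH= setting to the log;
proc options option=fmtsearch;
run;
```

WORK.FORMATS and LIBRARY.FORMATS are searched first by default unless they appear explicitly in the list, so listing WORK last changes its priority.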


### FMTLIB Option with PROC FORMAT
How to Display the Contents of a User-Defined Format

#### The FMTLIB option of PROC FORMAT displays the start and end values of the format range as well as the label. 


In [25]:
options nocenter nonumber nodate formchar = "|----|+|---+=|-/\<>*";
Libname sds 'C:\SASCourse\Week3\SAS_Codes';
proc format library = sds.catalogpop fmtlib;
select REGIONFMT;
title 'Formats for Pop Data Set';
run;
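
FMTLIB prints a report. When the format definition is needed as data, the CNTLOUT= option writes it to a SAS data set instead. The following is a minimal sketch that uses a temporary format so that it does not depend on the permanent catalog; the names REGDEMO and REGDEMO_CTL are hypothetical.

```sas
* Define a temporary format in WORK.FORMATS;
proc format;
  value regdemo 1='Northeast' 2='Midwest' 3='South' 4='West';
run;

* Write the definition of REGDEMO to an output control data set;
proc format library=work cntlout=work.regdemo_ctl;
  select regdemo;
run;

proc print data=work.regdemo_ctl noobs;
  var fmtname start end label type;
run;
```

The output control data set has the same layout that CNTLIN= expects, so it can be edited and read back in to rebuild a format.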

### How to List the Member(s) of a Format Catalog

#### PROC CATALOG is used to list the members of a format catalog (e.g., SDS.CATALOGPOP as shown below).  This catalog includes only one member.



In [21]:
*Ex5_print_catalog_fmtlib.sas;
options nocenter nonumber nodate nosource;
options FORMCHAR = '1----1+1---+=1-1\<>*';
libname sds 'C:\SASCourse\Week3\SAS_Codes';
proc catalog catalog = sds.catalogpop;
contents;
run;



Contents of Catalog SDS.CATALOGPOP
#,Name,Type,Create Date,Modified Date,Description
1,REGIONFMT,FORMAT,02/01/2023 22:42:26,02/01/2023 22:42:26,


## Removing Attributes (e.g., FORMAT and LABEL, etc.) of the Data Set
* Create a temporary SAS data set containing attributes based on SASHELP.MDV
* Remove Attributes from the data set using the MODIFY statement in PROC DATASETS

In [21]:
options nocenter nodate nonumber nosource;
title1 'Ex24_remove_labels_formats_informats.sas (Part 1)';
title2 'Metadata in sashelp.mdv';
proc contents data=sashelp.mdv varnum;
ods select position;
run;

data mdv;
  set sashelp.mdv;
 run;
title 'Ex24_remove_labels_formats_informats.sas (Part 2)';
title2 'Label, format, and informat removed from work.mdv';
proc datasets lib=work memtype=data nolist;
     modify mdv;
     attrib _all_ label=' ';
     attrib _all_ format=;
     attrib _all_ informat=;
run;
quit;
proc contents data=mdv varnum;
ods select position;
run;

title;


Variables in Creation Order
#,Variable,Type,Len,Format,Informat
1,CODE,Char,10,,
2,ORIGCITY,Char,8,,
3,COUNTRY,Char,14,,
4,TYPE,Char,14,,
5,CITY,Char,14,,
6,COMPANY,Char,50,$30.,
7,SHIPDATE,Num,8,DATE7.,DATE.
8,SALES94,Num,8,COMMA10.2,
9,MONTH,Num,8,,
10,YEAR,Num,8,,

Variables in Creation Order
#,Variable,Type,Len
1,CODE,Char,10
2,ORIGCITY,Char,8
3,COUNTRY,Char,14
4,TYPE,Char,14
5,CITY,Char,14
6,COMPANY,Char,50
7,SHIPDATE,Num,8
8,SALES94,Num,8
9,MONTH,Num,8
10,YEAR,Num,8
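
The same attributes can also be cleared while copying the data in a DATA step. This sketch is an alternative to the PROC DATASETS approach above, not a replacement for it; the output data set name MDV_PLAIN is hypothetical.

```sas
* Copy SASHELP.MDV and drop formats, informats, and labels along the way;
data work.mdv_plain;
  set sashelp.mdv;
  format _all_;              /* remove all formats */
  informat _all_;            /* remove all informats */
  attrib _all_ label=' ';    /* blank out all labels */
run;

proc contents data=work.mdv_plain varnum;
  ods select position;
run;
```

The DATA step rewrites every observation, whereas the MODIFY statement in PROC DATASETS changes only the descriptor information, so PROC DATASETS is usually the better choice for large data sets.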



#### Picture Format
Use the PICTURE statement under PROC FORMAT to specify a template 
 (up to 40 characters enclosed in quotation marks) for labeling numbers.

There are three types of characters in the template.
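* Digit selectors (the digits 0 through 9), which mark the positions for numeric digits
* Message characters (e.g., M for Million, B for Billion), which are printed as they appear in the template 
* Directives (special character sequences, e.g., %A %B %d %Y) that format date, time, and datetime values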


In [21]:
*Ex3_Picture_Statement.sas (Part 1);
Proc format;
 picture week_x 1-52='99'; /*Non-zero digit selector*/
 picture week_y 1-52 ='00';  /*Zero digit selector*/
 run;
 data have;
   input week @@;
     * Create two new variables based on the original variable;
     week_x = week;
     week_y = week;
 datalines; 
 1 3  6 8 9 14 15 34 52
 ;
options nocenter nodate nosource;
title 'Ex3_Picture_Statement.sas (Part 1)';
title2 'Non-zero digit selectors in the PICTURE format add zeros to the formatted value for WEEK_X as needed';
title3 'Zero digit selectors in the PICTURE format do not add zeros to the formatted value for WEEK_Y';
proc print data=have noobs;
 var week week_x week_y;
 format week_x week_x. week_y week_y.;
run;
title;

week,week_x,week_y
1,1,1
3,3,3
6,6,6
8,8,8
9,9,9
14,14,14
15,15,15
34,34,34
52,52,52


### Picture format for "Billion" or "Million" Figures
Create picture formats to display:
* "billion" figures in millions (template showing digit selectors)
* "million" figures in thousands (template showing digit selectors)
* "million" figures with a template that includes message characters

Code explanation (PROC FORMAT features):

* The keywords LOW-HIGH represent the range of nonmissing values to which the format is applied.

* The MULT= option specifies the number by which the variable's value is multiplied before it is formatted.

* The ROUND option in the PICTURE statement rounds the multiplied value to the nearest integer before formatting. Without ROUND, the decimal portion is truncated.

* The message character (e.g., M) is inserted into the picture after the numeric digits are formatted.


In [4]:
*Ex3_Picture_Statement.sas (Part 2);
proc format;
picture thou (round)
      low-high='000,000,000' (mult=.001);
picture mil (round)
      low-high='0,000,000,000' (mult=.000001);
picture m (round)
      low-high='0,000.9 M' (mult=.00001);
run;

  data work.Pop2005;
  input name $1-14 pop: comma.;
  pop_x = pop;
  pop_y=pop;
  pop_z=pop;
  datalines;
CHINA              1,323,344,591
INDIA              1,103,370,802
UNITED STATES        298,212,895
INDONESIA            222,781,487
BRAZIL               186,404,913
PAKISTAN             157,935,075
RUSSIA               143,201,572
;
title 'Ex3_Picture_Statement.sas (Part 2)';
title2 'User-defined format that expresses the numbers in thousands and millions';
proc print data=work.pop2005 noobs split='*';
label pop='Population*Size'
      pop_x= 'Population*Size*(in millions)'
      pop_y= 'Population*Size*(in thousands)'
      pop_z= 'Population*Size*(in M)';
Format pop comma14. pop_x mil. pop_y thou. pop_z m.;
run;
title;

name,Population Size,Population Size (in millions),Population Size (in thousands),Population Size (in M)
CHINA,1323344591,1323,1323345,"1,323.3 M"
INDIA,1103370802,1103,1103371,"1,103.4 M"
UNITED STATES,298212895,298,298213,298.2 M
INDONESIA,222781487,223,222781,222.8 M
BRAZIL,186404913,186,186405,186.4 M
PAKISTAN,157935075,158,157935,157.9 M
RUSSIA,143201572,143,143202,143.2 M
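
To see how MULT= and ROUND combine, the arithmetic for the first row can be traced step by step. The sketch below is an illustration that uses the population value for China from the data above; the format name MDEMO is a hypothetical copy of the M picture.

```sas
proc format;
  picture mdemo (round)
    low-high = '0,000.9 M' (mult=.00001);
run;

data _null_;
  pop     = 1323344591;
  scaled  = pop * 0.00001;    /* 13233.44591: the value after MULT= is applied */
  rounded = round(scaled);    /* 13233: the integer that fills the digit selectors */
  put pop= comma16. scaled= rounded=;
  put 'Formatted: ' pop mdemo.;
run;
```

The five digits of the rounded value fill the five digit selectors in the template from right to left, which is consistent with the 1,323.3 M shown for China in the output above.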


#### The PICTURE statement in PROC FORMAT that Expresses the Decimal Values in Percentages
Code explanation (PROC FORMAT features):

* The Round option with the PICTURE statement rounds the data to the 
  nearest integer before formatting.

* PREFIX= specifies a character prefix for the formatted value.

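* Zero digit selectors (0) do not print leading zeros; those positions are left blank.

* Nonzero digit selectors (such as 9) always print a digit, including leading zeros.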


In [5]:
*Ex3_Picture_Statement.sas (Part 3);
proc format;
  picture test (round)
         low-<0='09.99' (prefix='-')
         0-<10 ='09.99'
        10-<100='99.9'
       100-999 ='999';
run;

DATA temp;
INPUT  Some_value @@;
 datalines;
  457.677 7.219 0.303 -0.027 95.307 752.789 
  ; 
title 'Ex3_Picture_Statement.sas (Part 3)';
title2 'User-defined format that expresses the decimal values in percentages';

proc print noobs; 
   var Some_value ;
   format Some_value test.;
run;
title;  

Some_value
458.0
7.22
0.3
-0.03
95.3
753.0


### The PICTURE statement in PROC FORMAT that Reproduces the SAS-defined Formats for Percentages

In [6]:
*Ex3_picture_statement.sas (Part 4);
PROC FORMAT;
 PICTURE p_fmt (ROUND)
     LOW-<0 = "009.99%" (PREFIX="-" MULT=10000)
         0-HIGH = "009.99%" (MULT=10000);
PICTURE p_fmt_x (ROUND)
     LOW-<0 = "009.99" (PREFIX="-" MULT=10000)
         0-HIGH = "009.99" (MULT=10000);
RUN;
DATA work.have;
 INPUT Value1 @@;
 Value2 = Value1; Value3 = Value1;
 Value4 = Value1; Value5 = Value1;
DATALINES;
0.0345678  -0.00123456  -0.456789 .120
;
options nodate nonumber;
title 'Ex3_Picture_Statement.sas (Part 4)';
title2 'SAS Formats and User-Defined Formats Applied';

PROC PRINT DATA=work.have SPLIT="*" NOOBS;
 VAR Value: ;
 FORMAT Value2 PERCENT8.2  Value3 PERCENTN8.2  
        Value4 p_fmt. Value5 p_fmt_x.;
 LABEL Value1="No*Format*Applied"
       Value2="SAS*Percent*Format*PERCENT8.2*Applied"
       Value3="SAS*Percent*Format*PERCENTN8.2*Applied"
       Value4="User*Picture*Format 1*Applied"
       Value5="User*Picture*Format 2*Applied";
RUN;
title;

No Format Applied,SAS Percent Format PERCENT8.2 Applied,SAS Percent Format PERCENTN8.2 Applied,User Picture Format 1 Applied,User Picture Format 2 Applied
0.03457,3.46%,3.46%,3.46%,3.46
-0.00123,( 0.12%),-0.12%,-0.12%,-0.12
-0.45679,(45.68%),-45.68%,-45.68%,-45.68
0.12,12.00%,12.00%,12.00%,12.0


### The PICTURE statement in PROC FORMAT that Displays Dates

   The % followed by a letter indicates a directive.
   
   * %A - full weekday name
   * %B - full month name
   * %D - day of the month 
   * %Y - four-digit year
   
   The DATATYPE= option in the PICTURE statement specifies that the picture 
   is applied to a SAS date, SAS time, or SAS datetime value. 


In [7]:
*Ex3_picture_statement.sas (Part 5);
proc format; 
      picture date_fmt(default = 45)
      other='%A, %B %D,%Y' (datatype=date); 
   run;
 
 data have; 
    some_date1=today(); 
    some_date2='01Jul2019'd;
  run;
options nodate nonumber;
title 'Ex3_Picture_Statement.sas (Part 5)';
title2 'User-Defined Picture Formats Applied';
PROC PRINT DATA=have NOOBS;
 VAR Some: ; 
format Some:  date_fmt.;
run;
title;


some_date1,some_date2
"Wednesday, September 11,2019","Monday, July 01,2019"


In [None]:
* Simulate data for applying different SAS-defined formats to the date variable;
data test;
  call streaminit(5);
  do Date='01jan2006'd to '31dec2013'd;
    j=RAND('Normal', 5000,1000);
       output;
    end;
run;
proc print data=test noobs; 
format date date9.;
where year(date)= 2006;
run;

In [None]:
* Code obtained from SAS Documentation;
proc format;
   value MYfmt
        /* Format dates through 31DEC2011 using only the year. */
        low-'31DEC2011'd=[year4.]

        /* Format 2012 dates using the month and year. */
        '01jan2012'd-'31DEC2012'd=[monyy7.]

        /* Format dates 01JAN2013 and beyond using the year and quarter. */
        '01JAN2013'd-high=[yyq6.]

        /* Catch missing values. */
        other='n/a';
run;
proc means data=test sum  maxdec=1;
      var j;
      class date;
      format date myfmt.;
run;