## The George Washington University  
## STAT 4197/6197 
### Week 2 - DATA Step: Reading Data, and Creating Reports 
### SAS Code Examples - Part 2

* Reading Microsoft Excel Data into a SAS Data Set by Using the XLSX Engine vs. a PROC Step (PROC IMPORT)
    
* Using the \_NULL_ Data Set for:
    * Simple calculations
    * Displaying characteristics of SAS Data Sets
    * Debugging code
    * Creating macro variables from a value in a Data Set
    
    * Creating customized tables and reports by using
        * PUT or PUTLOG statement
        * PUT and FILE statement
        
* Getting a SAS Data Set into Excel Spreadsheets by Using PROC EXPORT

[6 ways to use the \_NULL_ data set in SAS by Rick Wicklin](https://blogs.sas.com/content/iml/2018/06/11/6-ways-_null_-data-set-sas.html)

* Handling Missing Values
* Specifying the LENGTH Statement for Numeric Variables in the DATA Step

\_INFILE_ 
* specifies a character variable that references the contents of the current input buffer for this INFILE statement. 


### Reading Excel Spreadsheets into SAS Data Sets
#### Method 1: Specifying the XLSX Engine in the LIBNAME Statement

Setting the VALIDVARNAME=ANY system option allows the use of column names that contain embedded spaces and special characters.

The LIBNAME statement references the whole Excel file, which is viewed as a SAS library; the members inside it (spreadsheets or named ranges) are viewed as data files.
The XLSX engine accesses the XLSX file directly when reading the Excel data into SAS.
Bitness (32-bit versus 64-bit) does not matter.

The SET statement uses the Excel sheet as the input data file for this DATA step.

The last LIBNAME statement specifies the libref and the CLEAR option to disassociate the libref from the SAS library. Below is the SAS code.

In [None]:
*Ex20_Import_Excel_x.sas (Part 1);
options validvarname=any nonotes nosource;
libname XL XLSX 'C:\SASCourse\Week2\SAS_Codes\Class.xlsx';
data work.class;
  set XL.Sheet1;
run;
libname XL CLEAR; 
proc print data=work.class (obs=5); 
run;
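
Before naming a sheet in the SET statement, it can help to see which members the XLSX library exposes. The following is a minimal sketch, assuming the same workbook path as in the cell above and a SAS installation that supports the XLSX engine. PROC CONTENTS with the NODS option lists the members of the library without printing each member's descriptor.

```sas
/* Assumes the Class.xlsx workbook used in the cell above */
libname XL XLSX 'C:\SASCourse\Week2\SAS_Codes\Class.xlsx';

/* List every sheet or named range that SAS sees as a member of XL */
proc contents data=XL._all_ nods;
run;

libname XL clear;   /* release the workbook when done */
```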


### Reading Excel Spreadsheets into SAS Data Sets
#### Method 2: PROC IMPORT 

In [None]:
*Ex20_Import_Excel_x.sas (Part 2);
 options nodate nonumber;
  PROC IMPORT DATAFILE= "C:\SASCourse\Week2\SAS_Codes\class.xlsx"
     dbms=xlsx REPLACE OUT= work.class_x;
     SHEET="Sheet1";
     GETNAMES=YES;
 RUN;
Title;
 proc print data=work.class_x (obs=5); 
 run;


### Use DATA \_NULL_ as a calculator

In [None]:
options nocenter  nonotes  nonumber nodate nosource;
ods html close;
data _Null_;
* You have been given the following two pieces of information;
height = 69.0;
weight = 112.5;

* Calculate the body mass index (BMI) based on height (in inches) and weight (in pounds);
BMI = round((weight / (height*height)) * 703, .1);

* Convert height (in inches) to height (in meters);
height_in_meters = round(height * 0.0254, .01);
put height=F5.1 weight= bmi= height_in_meters=;
 
RUN;
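
As a cross-check on the arithmetic above, the same BMI can be computed from metric units (kilograms divided by meters squared). This is a minimal sketch; the variable names are chosen for illustration, and the pound-to-kilogram factor 0.45359237 is the standard conversion rather than something taken from the course code. The two results should agree to one decimal place.

```sas
data _null_;
  height_in = 69.0;                        /* inches, as in the cell above */
  weight_lb = 112.5;                       /* pounds, as in the cell above */

  height_m  = height_in * 0.0254;          /* inches to meters */
  weight_kg = weight_lb * 0.45359237;      /* pounds to kilograms */

  bmi_imperial = round((weight_lb / height_in**2) * 703, .1);
  bmi_metric   = round(weight_kg / height_m**2, .1);

  put bmi_imperial= bmi_metric=;           /* written to the log */
run;
```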



### Displaying Characteristics of SAS Data Sets by Using DATA \_NULL_

#### The following code is obtained from [Rick Wicklin](https://blogs.sas.com/content/iml/2018/06/11/6-ways-_null_-data-set-sas.html)

In [None]:
data _NULL_;
set Sashelp.Class;
array char[*] $ _CHAR_;      /* all character variables */
array num[*] _NUMERIC_;      /* all numeric variables */
nCharVar  = dim(char);
nNumerVar = dim(num);
put "Sashelp.Class: " nCharVar= nNumerVar= ;
stop;   /* stop processing after first observation */
run;

### Creating macro variables from a value in a Data Set

In [None]:
*Ex51_Motivation_for_macro_variables (Part 1);
options nocenter nodate nonumber nosource;
ods html close;
proc means data=sashelp.class mean maxdec=1 noprint;
 var weight;
 output out=stats mean=average_wgt;
run;

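The following code does not work as intended: OVERALL_MEAN is not a variable in SASHELP.CLASS, so SAS treats it as an uninitialized variable and WEIGHT_RATIO is missing for every observation. (The cell type was changed to Markdown to prevent execution.)

    data test;
      set SASHELP.class;
      * This line of code does not work;
      weight_ratio=weight/OVERALL_MEAN;
    run;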

### Note for the SAS code in the next cell
#### The CALL SYMPUTX routine has two arguments, separated by a comma, inside parentheses.
* The first argument, a constant enclosed in quotation marks ('OVERALL_MEAN'), is the name of the macro variable being created.

* The second argument is the value of the DATA step variable AVERAGE_WGT, which is assigned to the macro variable.

* The code creates one macro variable, and its value is a character string. The macro variable resides in the GLOBAL symbol table.

#### The %PUT statement 
* writes text strings and values of the macro variables to the SAS log, starting in column one
* writes a blank line if text is not specified
* does not require quotation marks around text
* is valid in open code
* can appear
    * before the DATA step
    * after the DATA step
    * in the middle of the DATA step
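
The points in the list above can be tried out with a short sketch. The macro variable DEMO is hypothetical and exists only for illustration; the `&=` form assumes SAS 9.3 or later.

```sas
%let demo = Week 2;    /* hypothetical macro variable, for illustration only */

%put The value of DEMO is &demo;   /* text needs no quotation marks */
%put &=demo;                        /* writes the name and the value: DEMO=Week 2 */

data _null_;
  x = 1;
  %put This line is written while the DATA step is being compiled;
  put x=;
run;
```

Because %PUT is handled by the macro processor, the line inside the DATA step is written once, before the step executes, regardless of how many observations the step processes.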


In [None]:
*Ex51_Motivation_for_macro_variables (Part 3);
 * Create a macro variable using CALL SYMPUTX;
options nocenter nodate nonumber nosource;
ods html close;
 data _null_;
  set stats;
  call symputx('OVERALL_MEAN', average_wgt);
 run; 
 %put _user_;

### Note for the SAS Code in the next cell

* The macro variable is referenced within double quotation marks in the TITLE statement.

* You must use double quotation marks to enable macro variable resolution. Single quotation marks prevent macro variable resolution.

* SYSDATE9 is an automatic macro variable, set at SAS invocation, and always available.
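
The difference between the two kinds of quotation marks can be seen in a small sketch. The macro variable COURSE is hypothetical and is defined here only for illustration.

```sas
%let course = STAT 4197;   /* hypothetical macro variable, for illustration only */

data _null_;
  put "Double quotation marks: &course";   /* &course is resolved */
  put 'Single quotation marks: &course';   /* &course is written literally */
run;
```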

In [None]:
*Ex_Motivation_for_macro_variables (Part 4);
  *The macro variable value can be retrieved in a DATA step; 
options nocenter nodate nonumber nosource;
 data test2;
  set SASHELP.class;
  * Reference the macro variable without quotation marks so that it resolves to a numeric constant;
  weight_ratio=weight/&OVERALL_MEAN;
 run;
 title "Macro variable retrieved in the DATA step - executed on &sysdate9";
 proc print data=test2 (obs=5); run;
 title;

### The following is in reference to the SAS code below.

* The PUT statement writes to the LOG by default, or to an external file when a FILE statement is in effect; the PUTLOG statement always writes to the LOG.

* The keyword \_NULL_ in the DATA statement executes the DATA step without creating a data set.

* The PUT statement writes output records to the LOG window. The special SAS name list \_ALL_ refers to all variables in the program data vector, including \_N_ and \_ERROR_.
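
The contrast between PUT and PUTLOG shows up only when a FILE statement redirects PUT. A minimal sketch, using SASHELP.CLASS as elsewhere in these notes:

```sas
data _null_;
  set sashelp.class(obs=1);
  file print;                                  /* PUT now writes to the output destination */
  put 'Written by PUT: ' name=;
  putlog 'Written by PUTLOG: ' name=;          /* PUTLOG still writes to the log */
run;
```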


In [None]:
*Ex21_put.sas;
* List all DATA step variables and their values;
options nosource nodate nonumber nonotes;
ODS EXCLUDE ALL;
data _null_;
  set sashelp.class(obs=2);
  put _all_;
run;
ODS EXCLUDE NONE;

### PUT Statement
* In the PUT statement, the variable list argument is \_ALL_; the format-list argument is the equal sign, so that each variable name precedes its value. Placing \_ALL_ in parentheses excludes the automatic variables \_N_ and \_ERROR_.

In [None]:
* Exclude automatic variables;
options nonotes;
data _null_;
  set sashelp.class(obs=2);
  put (_all_)(=);
run;

### PUT Statement

* The PUT statement writes each variable in the program data vector (except \_N_ and \_ERROR_), with the variable name preceding its value.

* An additional format-list argument, the slash (/), moves to a new line, so that each variable and its value, separated by an equal sign, is written on a separate line.

In [None]:
* Put each value on a new line;
data _null_;
  set sashelp.class(obs=2);
  put (_all_)(=/);
run;

In [None]:
/* Put each value on a new line and apply 
a common format to all numeric variables*/
data _null_;
  set sashelp.class(obs=2);
  put (_all_)(=/12.2);
run;

## \_INFILE_
#### The statement PUT \_INFILE_; copies the contents of the most recently filled input buffer to the SAS log or to whatever output destination is in effect.

In [2]:
*Ex34_put_putlog.sas (Part 2);
/*Use the PUTLOG or PUT statement to write to the SAS log*/
options nocenter nodate nonumber nonotes nosource; 
ods exclude all;
data _null_;
  input;
  if _N_ =1 then putlog 'Address of the Stat Department:';
   putlog _INFILE_ ;
 datalines4;
Department of Statistics
Columbian College of Arts & Sciences
Rome Hall
801 22nd St NW, 7th Floor
Washington, DC, 20052
Phone: 202-994-6356 | Fax: 202-994-6917
;;;;
ods exclude none;


The SAS System

Address of the Stat Department:
Department of Statistics                                                        
Columbian College of Arts & Sciences                                            
Rome Hall                                                                       
801 22nd St NW, 7th Floor                                                       
Washington, DC, 20052                                                           
Phone: 202-994-6356 | Fax: 202-994-6917                                         



### Note for the SAS code in the next cell

* During the first iteration, the column headings are written, each starting at the specified column position, to the LOG window by default.

* In the formatted PUT statement, formats are specified as format-list arguments. For example, the character format $20. is applied to the variable NAME.

* The character format $2. is applied to the variable SEX.

* The numeric format 3. is applied to the variable AGE. Another numeric format, 8.2, is applied to the variables HEIGHT and WEIGHT.


In [7]:
/* List values as a table and apply formats 
to groups of variables*/
options nonotes;
data _null_;
  set sashelp.class(obs=2);
  if _n_=1 then put @1 'NAME' @19 'SEX' @23 'AGE' 
                    @30 'HEIGHT' @38 'WEIGHT';
  put (_all_)(1*$20.,1*$2.,1*3.,2*8.2);
run;


The SAS System

NAME              SEX AGE    HEIGHT  WEIGHT
Alfred              M  14   69.00  112.50
Alice               F  13   56.50   84.00



### Note for the SAS code in the next cell

* List values as a table.

* Apply formats to groups of variables. 

* Route output to the standard SAS output window. 

Because the FILE PRINT statement is in effect, the PUT statement writes the tabular output to the OUTPUT window, not the LOG window.

In [8]:

options nodate nonumber;
title;
data _null_;
  set sashelp.class(obs=2);
  file print;
  if _n_=1 then put @1 'NAME' @19 'SEX' @23 'AGE' 
                    @30 'HEIGHT' @38 'WEIGHT';
  put (_all_)(1*$20.,1*$2.,1*3.,2*8.2);
run;

### Note for the SAS code in the next cell

* List values as a table.

* The PUT statement creates the tabular output to a file that is specified in the FILE statement, not to the LOG or OUTPUT window.

In [3]:
Data _Null_;
 file "C:\sascourse\week2\SAS_Codes\class2.csv";
 set sashelp.class;
 put (_all_) (',');
run;




### Note for the SAS code in the next cell

* List values as a table.

* Create a header for the tabular data file.

* Apply formats to groups of variables. 

* Because the FILE PRINT statement is in effect, the PUT statement writes the tabular output to the OUTPUT window.


In [9]:
options nodate nonumber;
data _null_;
  set sashelp.class(obs=2);
  file print;
  if _n_=1 then put @1 'NAME' @19 'SEX' @23 'AGE' 
                    @30 'HEIGHT' @38 'WEIGHT';
  put (_all_)(1*$20.,1*$2.,1*3.,2*8.2);
run;


### Note for the SAS code in the next cell

* List values as a table and apply formats to groups of variables. 

* In the SET statement, the END= option defines a temporary variable whose value is 1 when the DATA step is processing the last observation and 0 at all other times. Although the DATA step can use the END= variable, SAS does not add it to the resulting data set. [SAS Documentation]

* The FILE statement creates a regular raw data file.

* During the first iteration, the column headings are written, each starting at the specified column position, to the file named in the FILE statement.

* In the formatted PUT statement, formats are specified as format-list arguments. For example, the character format $20. is applied to the variable NAME.

* The character format $2. is applied to the variable SEX.

* The numeric format 3. is applied to the variable AGE. Another numeric format, 8.2, is applied to the variables HEIGHT and WEIGHT.

* The PUTLOG statement writes an informational message to the LOG. Note that the message text is preceded by User's NOTE to make it easier to identify in the log.
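
The END= variable is also handy for end-of-data summaries. The following is a minimal sketch, separate from the course example: it accumulates WEIGHT with a sum statement and reports once, on the last observation. The names LAST, TOTAL_WEIGHT, and MEAN_WEIGHT are illustrative, and dividing by \_N_ assumes that WEIGHT has no missing values.

```sas
data _null_;
  set sashelp.class end=last;     /* LAST is 1 only on the final observation */
  total_weight + weight;          /* sum statement: retained across iterations */
  if last then do;
    mean_weight = total_weight / _n_;   /* assumes no missing WEIGHT values */
    putlog "User's NOTE: " _n_= mean_weight= 6.1;
  end;
run;
```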

In [6]:
options nodate nonumber;
title;
data _null_;
  set sashelp.class(obs=2) END=last;
  file "C:\sascourse\week2\SAS_Codes\class_data2.txt";
  if _n_=1 then put @1 'NAME' @19 'SEX' @23 'AGE' 
                    @30 'HEIGHT' @38 'WEIGHT';
  put (_all_)(1*$20.,1*$2.,1*3.,2*8.2);
  if last then putlog "User's NOTE: Writing to the File is completed";
run;


The SAS System

User's NOTE: Writing to the File is completed



In [6]:
%showLog


                                                           The SAS System

User's NOTE: Writing to the File is completed



In [7]:
*Ex22_DM_CSV_report.sas (Part 3);
 * Create a report using DATA _NULL_ , and file print, put statements; 
  data _null_;
    set sashelp.class (obs=5) end=eof;
    file print notitles; 
   *File 'C:\SASCourse\Week2\SAS_Codes\class_22_3.csv'; 
    If _n_=1 then put @5 "Children's Demographic Characteristics";
    if _n_=1 then put @5 38*'-';
    If _n_=1 then put 
         @5 'Name' +6 'Sex' +3 'Age' +1 'Height' +2 'Weight';
    if _n_=1 then put @5 38*'-';
    put  @5 name $8.  -r
         +3 sex $1.
         +3 age 3.
         +3 height 4.1
         +3 weight  6.1;
  if eof then do;
          put @5 38*'-'/;
          put @5 'Data Source: SASHELP.CLASS;' _N_ : z2. 'cases.'; 
          put @5 "Date Prepared: %sysfunc(today(), worddate).";
  end;
  run;

In [8]:
*Ex22_DM_CSV_report.sas (Part 2);
options notes;
 * Create a csv file using DATA _NULL_ , and file, put statements; 
  data _null_;
    set sashelp.class (obs=5);
    file 'C:\SASCourse\Week2\SAS_Codes\class_22_2.csv' dlm=',';
    If _n_=1 then put 'Name, Sex, Age, Height, Weight';
    put Name Sex Age Height Weight;
  run;


The SAS System

NOTE: The file 'C:\SASCourse\Week2\SAS_Codes\class_22_2.csv' is:
      Filename=C:\SASCourse\Week2\SAS_Codes\class_22_2.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=01Feb2023:21:53:00,
      Create Time=23Jun2022:02:32:40

NOTE: 6 records were written to the file 'C:\SASCourse\Week2\SAS_Codes\class_22_2.csv'.
      The minimum record length was 18.
      The maximum record length was 30.
NOTE: There were 5 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


In [None]:
%showLog
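
To check what a FILE and PUT combination actually writes, the file can be read straight back with INFILE and \_INFILE_. The following is a minimal sketch that avoids touching the course folder by using a temporary fileref; the fileref name TMPCSV is hypothetical, and the DSD option (which makes the comma the default delimiter and quotes values that contain it) is used here in place of DLM=','.

```sas
filename tmpcsv temp;                 /* hypothetical fileref; SAS picks a temporary location */

data _null_;
  set sashelp.class(obs=5);
  file tmpcsv dsd;                    /* comma-delimited output */
  if _n_ = 1 then put 'Name,Sex,Age,Height,Weight';
  put Name Sex Age Height Weight;
run;

data _null_;
  infile tmpcsv;
  input;                              /* load each record into the input buffer */
  putlog _infile_;                    /* echo the record to the log */
run;

filename tmpcsv clear;
```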

In [11]:
filename csv 'C:\Sascourse\Week2\SAS_Codes\class4.csv';
data _null_;
set sashelp.class;
file csv dlm=',';
put ( _all_ ) (+0);
run;


The SAS System

NOTE: The file CSV is:
      Filename=C:\Sascourse\Week2\SAS_Codes\class4.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=01Feb2023:21:54:19,
      Create Time=23Jun2022:02:32:40

NOTE: 19 records were written to the file CSV.
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


#### Getting a SAS Data Set into Excel Spreadsheets by Using PROC EXPORT

In [12]:
proc export data=sashelp.class
    outfile='c:\sascourse\week2\SAS_Codes\sashelp_class1.csv'
    dbms=csv
    replace;
run;


The SAS System

NOTE: The file 'c:\sascourse\week2\SAS_Codes\sashelp_class1.csv' is:
      Filename=c:\sascourse\week2\SAS_Codes\sashelp_class1.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=01Feb2023:21:54:39,
      Create Time=23Jun2022:02:32:40

NOTE: 20 records were written to the file 'c:\sascourse\week2\SAS_Codes\sashelp_class1.csv'.
      The minimum record length was 17.
      The maximum record length was 26.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

19 records created in c:\sascourse\week2\SAS_Codes\sashelp_class1.csv from SASHELP.CLASS.

NOTE: "c:\sascourse\week2\SAS_Codes\sashelp_class1.csv

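The PROC EXPORT cell above writes a CSV file, which Excel can open. To write a native Excel workbook instead, DBMS=XLSX can be specified. This is a minimal sketch: the output file name sashelp_class1.xlsx and the sheet name Class are hypothetical, and it assumes a SAS installation that supports the XLSX engine.

```sas
proc export data=sashelp.class
    outfile='c:\sascourse\week2\SAS_Codes\sashelp_class1.xlsx'   /* hypothetical file name */
    dbms=xlsx
    replace;
    sheet='Class';                                               /* hypothetical sheet name */
run;
```
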
### OPTIONS MISSING=
The MISSING= system option specifies a single character to display for missing numeric values in place of the default period.

In [None]:
*Ex19_Missing.sas (Part 1);
*Instream data include dot(.) as a missing value;
options nocenter nodate nonumber MISSING='M' ;
data Example_M_Equal;
  input x y z c name $;
  format x y z c percent12.2;
datalines;
.38 .0324  1.0 .345  John
.12 .      .    .606 Carl
.15 .7476  .    .049 Choi
 .  .22    .    .    Rubi
.35  .    .     .    Beth
;
title 'Missing values (.) Assigned a Character Value M';
title2 "by adding Missing= 'M' to the OPTIONS statement";
proc print data=Example_M_Equal;
  var name;
  sum x y z c;
run;
title;

### Use the MISSING statement
#### to tell SAS that the value M in the input data lines is to be treated as a special missing value rather than as an invalid numeric data value.

In [None]:
*Ex19_Missing.sas ;
* Instream data include a character value in a numeric field; 
options nocenter nodate nonumber  ;
data Example_M_C;
  length state $20;
  infile datalines  FIRSTOBS=2;
  input state  N_Var1-N_Var4 ;
  missing M;
  datalines;
state     N_Var1   N_Var2  N_Var3 N_Var4
Alabama     13.2   236     58     21.2
Alaska      10     263     48      M
Arizona      8.1   294     80     31
Arkansas     8.8   190     50     19.5
California   9     276     91      M
Colorado     7.9    M       78    38.7
Connecticut  3.3   110     77     11.1
;
title 'Missing values (.) Assigned a Character Value M';
title2 "by specifying M in the MISSING statement";
proc print data=Example_M_C noobs; 
var state  N_Var:;
run;
title;
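
Once a value has been read as a special missing value, it can be told apart from an ordinary missing value in later logic. The following is a minimal sketch with made-up data lines (the three states and values are for illustration only): comparing against .M identifies the special missing value, while the MISSING function is true for any missing value.

```sas
missing M;   /* global statement: treat the letter M in numeric fields as .M */

data _null_;
  input state :$20. value;
  if value = .M then putlog state= 'has the special missing value .M';
  else if missing(value) then putlog state= 'has an ordinary missing value';
  else putlog state= value=;
datalines;
Alaska M
Ohio .
Utah 7.5
;
```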

### Assigning special missing values (._, .a, and .z) to numeric variables

In [None]:
*Ex19_Missing.sas;
options nocenter nodate nonumber nosource;
data Example_M_S;
  input x y z c name $;
  format x y z c percent12.2;
datalines;
.38 .0324  1.0 .345   John
.12 .a     .z    .606 Carl
.15 .7476  .z    .049 Choi
._  .22    .z     .   Rubi
.35  .a    .z     .   Beth
;
title 'Special Types of Missing Values Printed';
proc print data=Example_M_S noobs;
  var name;
  sum x y z c;
run;
title;


### Specifying the LENGTH Statement for Numeric Variables in the DATA Step
A variable's length (the number of bytes used to store it) is 
related to its type.

* Character variables can be up to 32,767 bytes long.
* All numeric variables have a default length of 8 bytes.
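* By default, numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes; a LENGTH statement can reduce the storage length, at the risk of losing precision for large values.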

In [None]:
*Ex23_Length.sas;
data temp;
length x 4 y 3 ;
     do x=9006 to 9010;
        y=x;
       output;
     end;
run;
proc print data=temp noobs; run;
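
The effect of a short numeric length is visible only after the values have been written to a data set, because values are held in full precision while the DATA step runs. The following is a minimal sketch, assuming a Windows or UNIX host, where 8,192 is the largest integer that a 3-byte numeric can store exactly. The data set name SHORT and the variable DIFF are illustrative.

```sas
data short;
  length y 3;              /* 3 bytes of storage for Y; X keeps the default 8 */
  do x = 8190 to 8195;
    y = x;
    output;
  end;
run;

data _null_;
  set short;
  diff = x - y;            /* nonzero when Y could not be stored exactly */
  putlog x= y= diff=;
run;
```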

In [2]:
options nodate nonumber nonotes;
ods html close;
data _null_;
  infile 'C:\SASCourse\Week2\SAS_Codes\Ex25_read_from_web.sas';
  input;
  put _infile_;
run;


7                                                          The SAS System                           16:24 Saturday, February 4, 2023

*Ex25_read_from_web.sas;
Filename raw url 
    'http://data.princeton.edu/wws509/datasets/effort.dat';
data have1;
   infile raw  firstobs=2 truncover ;
   input record $80. ;
   put _all_;
   if _n_=5 then stop;
run;
proc print data=have1; run;

data have2;
   infile raw  firstobs=2 obs=5 truncover ;
   input country $ setting  effort  change ;
   put _all_;
 run;
proc pri

In [1]:
** Ex36_Week_2_List_of_Files.sas;
PROC IML;
SUBMIT / R;
setwd ("C:/SASCourse/Week2/SAS_Codes")
list.files(pattern="SAS", 
           full.names = TRUE, 
           ignore.case = TRUE)
ENDSUBMIT;
QUIT;
