### GWU STAT 4197/STAT 6197

#### Week 5, Part 1: Controlling and Managing SAS Data Sets
#### (Source: SAS Documentation for Code Explanation)

#### The KEEP= data set option on the SET statement 
* is used to keep the specified variables when SAS reads the input data set.

#### The RENAME= data set option on an input data set 
* is used to change the name of the variable when SAS reads the input data set.

#### The RENAME= and WHERE= data set options on the SET statement

If you use RENAME= with WHERE processing such as a WHERE statement or a WHERE= data set option, the new name is applied before the data is processed. 

You must use the new name in the WHERE expression.
(SAS(R) Documentation)


In [1]:
*Ex1_data_set_options_statements.sas (Part 1);
** KEEP=, RENAME=, and WHERE= Data Set Options;
options nocenter nodate;
data work.class1;
 set sashelp.class (keep=name age sex
                    rename=(name=Student_Name sex=Gender)
                    where=(age > 13)
                   );
run;
title "KEEP=, RENAME=, and WHERE= data Set Options";
proc print data=work.class1;
run;
title;

Obs,Student_Name,Gender,Age
1,Alfred,M,14
2,Carol,F,14
3,Henry,M,14
4,Janet,F,15
5,Judy,F,14
6,Mary,F,15
7,Philip,M,16
8,Ronald,M,15
9,William,M,15
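
The example above filters on `age`, which is not renamed, so it does not exercise the rule about using the new name in the WHERE expression. A minimal sketch of that rule, assuming `SASHELP.CLASS` is available (the output name `work.girls` is illustrative only):

```sas
/* RENAME= is applied before WHERE= is evaluated,      */
/* so the WHERE= expression must use the new name.     */
data work.girls;
   set sashelp.class (keep=name age sex
                      rename=(sex=Gender)
                      where=(Gender = 'F'));
run;

proc print data=work.girls noobs;
run;
```

Writing `where=(sex = 'F')` here would be expected to fail, because `sex` no longer exists under that name once RENAME= has been applied.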


#### DROP, KEEP, RENAME, WHERE Statements

* The DROP statement specifies the names of the variables to drop from the output data set.
* The KEEP statement specifies the names of the variables to keep in the output data set.
* The RENAME statement specifies new names for variables in the output data set.
* The WHERE statement selects observations before they are read into the program data vector.


In [1]:
*Ex1_data_set_options_statements.sas (Part 2);
** KEEP, RENAME, and WHERE Statements;
options nocenter nodate;
data work.class2;
  set sashelp.class ;
  keep name age sex;
  rename name=Student_Name sex=Gender;
  where age >13;
run;
title 'KEEP, RENAME, and WHERE Statements';
proc print data=work.class2;
run;
title;

Obs,Student_Name,Gender,Age
1,Alfred,M,14
2,Carol,F,14
3,Henry,M,14
4,Janet,F,15
5,Judy,F,14
6,Mary,F,15
7,Philip,M,16
8,Ronald,M,15
9,William,M,15
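
With statements, the naming rule is the reverse of the data set option case: the RENAME statement takes effect only in the output data set, so the KEEP and WHERE statements refer to the original names. A minimal sketch, assuming `SASHELP.CLASS` is available (the output name `work.girls2` is illustrative only):

```sas
data work.girls2;
   set sashelp.class;
   where sex = 'F';      /* original name: WHERE filters the input data set   */
   keep name age sex;    /* original name: KEEP is applied before RENAME      */
   rename sex=Gender;    /* the new name appears only in the output data set  */
run;

proc print data=work.girls2 noobs;
run;
```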


#### INDSNAME= Option with the SET Statement

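* The INDSNAME= option on the SET statement creates a temporary variable that contains the name of the data set from which the current observation is read. Because the variable is temporary, copy it to another variable (value_x in the code below) to keep it in the output data set.

* The LENGTH function returns an integer that represents the position of the rightmost non-blank character in a string.

* Your turn: Why is the stored length of value_x 41 in the PROC CONTENTS output?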

[Read more about the INDSNAME= option here.](https://blogs.sas.com/content/iml/2015/08/03/indsname-option.html)



In [4]:
*Ex1_data_set_options_statements.sas (Part 3);
options nocenter nodate;
data dsn2014 dsn2015 dsn2016 dsn2017 dsn2018;
   course='Stat 4197/6197';
run;
data want;
   retain course value_x year;     /* fixes the variable order in WANT */
   length year $ 8;
   set dsn: indsname=value;        /* VALUE is temporary and is not written to WANT */
   value_x = value;                /* copy it to a variable that is kept */
   year = substr(value, length(value)-3);   /* last four characters of the name */
run;
title 'INDSNAME= Option in the SET Statement';
proc print data=want noobs;
run;
proc contents data=want;
   ods select variables;
run;
title;

course,value_x,year
Stat 4197/6197,WORK.DSN2014,2014
Stat 4197/6197,WORK.DSN2015,2015
Stat 4197/6197,WORK.DSN2016,2016
Stat 4197/6197,WORK.DSN2017,2017
Stat 4197/6197,WORK.DSN2018,2018

Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes
#,Variable,Type,Len
1,course,Char,14
2,value_x,Char,41
3,year,Char,8
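
The INDSNAME= value is a two-level name (libref, period, member name), so the SCAN function offers another way to pull it apart. A minimal sketch under assumed names: the `yr_demo` prefix and the variables `source`, `member`, and `year` are illustrative and not part of the course files.

```sas
/* Create two small data sets that share a name prefix. */
data work.yr_demo2014 work.yr_demo2015;
   course = 'Stat 4197/6197';
run;

data work.want2;
   length member $ 32 year $ 4;
   set work.yr_demo: indsname=source;           /* SOURCE is temporary                */
   member = scan(source, 2, '.');               /* text after the period: member name */
   year   = substr(member, length(member) - 3); /* last four characters               */
run;

proc print data=work.want2 noobs;
run;
```

Declaring `year` as `$ 4` rather than `$ 8` keeps the stored length equal to the number of characters actually extracted.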


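#### FIRSTOBS= and OBS= Data Set Options
* The FIRSTOBS= data set option specifies the first observation to process in an input data set.
* The OBS= data set option specifies the last observation to process in an input data set.
* Used together, FIRSTOBS=7 and OBS=10 read observations 7 through 10, that is, 10 - 7 + 1 = 4 observations.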


In [2]:
*Ex1_data_set_options_statements.sas (Part 4);
 options nocenter nonumber nodate;
Data work.Class;
  set sashelp.class (FIRSTOBS=7 OBS=10);
run;
title 'FIRSTOBS= and OBS= Options';
proc print data=work.class;
run;
title;

Obs,Name,Sex,Age,Height,Weight
1,Jane,F,12,59.8,84.5
2,Janet,F,15,62.5,112.5
3,Jeffrey,M,13,62.5,84.0
4,John,M,12,59.0,99.5


#### The SORT procedure does at least the following
* orders the data set by the values of the variable(s) listed in the BY statement, in ascending order by default
* either replaces the original data set or creates a new data set
* produces only an output data set, but no report

#### OUT= Option
* The output data set has been named using the OUT= option so that the input data set is not overwritten.


In [3]:
*Ex2A_SORT_nodupkey_noduprecs.sas (Part 1);
options nocenter nonumber nodate;
data work.HAVE;
  input ID $ visit_date :mmddyy. 
        visit_type :& $25.;
  format visit_date mmddyy10. ;
datalines;
A01 01152015 Emergency Room Visit
A01 07252015 Physician Office Visit
A01 07252015 Physician Office Visit
A02 02202015 Physician Office Visit
A02 02202015 Emergency Room Visit
A05 01122015 Outpatient Visit
;
Title "Sorting the data in ascending order by ID";
proc sort data=work.Have
     out=work.visit_date_A; 
by id; 
run;
proc print data=work.visit_date_A noobs; 
run;
title;

ID,visit_date,visit_type
A01,01/15/2015,Emergency Room Visit
A01,07/25/2015,Physician Office Visit
A01,07/25/2015,Physician Office Visit
A02,02/20/2015,Physician Office Visit
A02,02/20/2015,Emergency Room Visit
A05,01/12/2015,Outpatient Visit


#### PROC SORT's NODUPKEY option

* With the NODUPKEY option, PROC SORT keeps the first observation in each BY group and deletes all subsequent observations that have the same *BY variable* values (here, duplicate values of ID, the variable specified in the BY statement).

#### OUT= Option
* The output data set has been named using the OUT= option so that the input data set is not overwritten.


In [15]:
title "NODUPKEY Option with PROC SORT BY ID variable";
proc sort data = work.HAVE nodupkey 
   out=work.nodupkey_id; 
by id;
proc print data=work.nodupkey_id noobs; 
run;
title;

ID,visit_date,visit_type
A01,01/15/2015,Emergency Room Visit
A02,02/20/2015,Physician Office Visit
A05,01/12/2015,Outpatient Visit


#### PROC SORT's NODUPRECS (or NODUPREC or NODUP) option

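* With the NODUPRECS (or NODUPREC or NODUP) option, PROC SORT compares the values of all the variables (not just the variable specified in the BY statement) in each observation with those in the previous observation written to the output data set, and deletes the observation if they all match. Because only adjacent observations are compared, duplicates that are not next to each other after sorting can remain in the output data set.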

In [4]:
title "NODUPRECS Option with PROC SORT";
proc sort data = work.HAVE noduprecs 
   out=work.noduprecs_id; 
by id;
proc print data=work.noduprecs_id noobs; 
run;
title;

ID,visit_date,visit_type
A01,01/15/2015,Emergency Room Visit
A01,07/25/2015,Physician Office Visit
A02,02/20/2015,Physician Office Visit
A02,02/20/2015,Emergency Room Visit
A05,01/12/2015,Outpatient Visit
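
NODUPRECS removed the duplicate above because the two identical A01 records happened to sit next to each other. A minimal sketch of what can happen when they do not, using a made-up data set (`work.visits` and its values are hypothetical, not from the course data). Sorting `BY _ALL_` is one common way to make identical records adjacent:

```sas
data work.visits;
   input id $ visit_type :$10.;
   datalines;
A01 ER
A01 Office
A01 ER
;

/* Sorted by ID only: the two ER records stay separated by the Office record, */
/* so no observation matches the one just before it.                          */
proc sort data=work.visits out=work.by_id noduprecs;
   by id;
run;

/* Sorted by every variable: identical records become adjacent. */
proc sort data=work.visits out=work.by_all noduprecs;
   by _all_;
run;

proc print data=work.by_id noobs;  run;
proc print data=work.by_all noobs; run;
```

Under the default EQUALS behavior, `work.by_id` should keep all three observations while `work.by_all` should keep two.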


#### DUPOUT= Option

The DUPOUT= option names the data set (here, work.dupoutobs) in which PROC SORT stores the observations that it deletes. With the NODUPRECS option, these are the observations that duplicate the preceding observation on all the variables (not just the one specified in the BY statement).


In [20]:
proc sort data = work.HAVE noduprecs
  out=work.noduprecs_id 
  DUPOUT=work.dupoutobs ;
 BY ID;
run;
Title "Listing of duplicate observations";
proc print data=work.dupoutobs noobs; 
run;
title;

ID,visit_date,visit_type
A01,07/25/2015,Physician Office Visit


In [23]:
Title "Listing of nonduplicate observations due to NODUPRECS option";
proc print data=work.noduprecs_id noobs; 
run;

ID,visit_date,visit_type
A01,01/15/2015,Emergency Room Visit
A01,07/25/2015,Physician Office Visit
A02,02/20/2015,Physician Office Visit
A02,02/20/2015,Emergency Room Visit
A05,01/12/2015,Outpatient Visit


#### NOUNIQUEKEYS Option and OUT= Keyword with PROC SORT (Starting with SAS® 9.3)

* The NOUNIQUEKEYS option deletes observations from the output SAS data set where the value of the BY variable(s) is unique.

* The OUT= option stores the observations with non-unique values of the BY variables in an output SAS data set (here, duplicates).

* A BY group is a group that is formed by one or more observations with the same values of the BY variables.

* The UNIQUEOUT= option specifies an output SAS data set (here, singles) containing the observations eliminated by the NOUNIQUEKEYS option.

[SAS® Documentation]


In [5]:
** New options with PROC SORT;
proc sort data = have nouniquekeys
          out = duplicates
          uniqueout = singles;
by ID Visit_date visit_type;
Title "List of Exact Duplicates - NOUNIQUEKEYS and UNIQUEOUT Options with PROC SORT";
proc print data=duplicates noobs; run;

ID,visit_date,visit_type
A01,07/25/2015,Physician Office Visit
A01,07/25/2015,Physician Office Visit


In [25]:
Title "List of Singles - NOUNIQUEKEYS and UNIQUEOUT Options with PROC SORT";
proc print data=singles noobs; run;
title;

ID,visit_date,visit_type
A01,01/15/2015,Emergency Room Visit
A02,02/20/2015,Emergency Room Visit
A02,02/20/2015,Physician Office Visit
A05,01/12/2015,Outpatient Visit


In [6]:
*Ex3_Direct_Access.sas (Data creation);
options nocenter nodate nonumber;
DATA TEST (drop=seed);
  seed=123;
  do Student = 1 to 25;
     /* ceil(a*ranuni(seed) + b) yields integers from b+1 to b+a */
     Tuition = ceil(ranuni(seed)*201+12000);
     Food = ceil(ranuni(seed)*312+4000);
     Books = ceil(ranuni(seed)*512+2000);
     output;
  end;
  FORMAT Tuition Food Books dollar8.;
run;
title1 'Creation of Example Data';
proc print data = TEST (obs=5) noobs; RUN;

Student,Tuition,Food,Books
1,"$12,151","$4,101","$2,092"
2,"$12,183","$4,112","$2,114"
3,"$12,159","$4,125","$2,064"
4,"$12,038","$4,243","$2,224"
5,"$12,195","$4,083","$2,366"


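#### POINT= Option in the SET Statement
* This option names a temporary variable whose value determines which observation is read.

#### STOP Statement
* The STOP statement ends the DATA step. It is needed with POINT= because direct access never reaches the end of the file, so without STOP the DATA step would loop continuously.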

In [4]:
*Ex3_Direct_Access.sas (Part 1);
options nocenter nonumber nodate;
 data try1;
  obsnum= 5;
   set TEST point=obsnum;
   output; 
  stop;
run;
title1 'Accessing any single observation' ;
proc print data=try1 noobs; run;

Student,Tuition,Food,Books
5,"$12,195","$4,083","$2,366"


In [31]:
*Ex3_Direct_Access.sas (Part 2);
options nocenter nodate nonumber;
data try2;
  do obsnum = 3,5,8;
   set TEST point=obsnum;
   output; 
  end;
  stop;
run;
title1 'Accessing any particular observations' ;
proc print data=try2 noobs; run;


Student,Tuition,Food,Books
3,"$12,159","$4,125","$2,064"
5,"$12,195","$4,083","$2,366"
8,"$12,155","$4,220","$2,169"


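#### NOBS= Option in the SET statement
* This option assigns the number of observations in the SAS data set to a temporary variable. SAS assigns the value when the DATA step is compiled, so the variable can be used before the SET statement executes, as in the DO loop bounds below.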

In [5]:
*Ex3_Direct_Access.sas (Part 3);
options nocenter nonumber nodate;
data try3;
  do obsnum = 1 to num_of_obs by 5;
   set TEST point=obsnum nobs=num_of_obs;
   output; 
  end;
  stop;
run;
title1 'Accessing any Nth observation' ;
proc print data=try3 noobs; run;
title1;

Student,Tuition,Food,Books
1,"$12,151","$4,101","$2,092"
6,"$12,112","$4,166","$2,442"
11,"$12,119","$4,098","$2,277"
16,"$12,194","$4,030","$2,347"
21,"$12,051","$4,073","$2,407"
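
Because the NOBS= value is available before the SET statement executes, it can also feed POINT= directly, for example to read only the last observation. A minimal sketch, assuming `SASHELP.CLASS` is available; the names `work.last_row`, `obsnum`, and `total` are illustrative only:

```sas
data work.last_row;
   obsnum = total;                 /* TOTAL already holds the observation count */
   set sashelp.class point=obsnum nobs=total;
   output;
   stop;                           /* required: POINT= never reaches end of file */
run;

proc print data=work.last_row noobs;
run;
```

Neither `obsnum` nor `total` is written to `work.last_row`, since POINT= and NOBS= variables are temporary.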


In [33]:
*Ex4_sample_select.sas (Part 1);
options nocenter nodate nonumber;
proc surveyselect data=SASHELP.HEART
  method=srs n=100 out=WORK.HEART;
run;

Selection Method,Simple Random Sampling

Input Data Set,HEART
Random Number Seed,568376001
Sample Size,100
Selection Probability,0.019198
Sampling Weight,52.09
Output Data Set,HEART
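
The call above does not set a seed, so the random number seed reported in the output was chosen by SAS and the sample will generally differ from run to run. A minimal sketch of a reproducible version using the SEED= option; the seed value 4197 and the output name `work.heart_srs` are arbitrary choices for illustration:

```sas
proc surveyselect data=sashelp.heart
                  method=srs n=100
                  seed=4197              /* arbitrary illustrative seed */
                  out=work.heart_srs;
run;
```

Rerunning this step with the same seed and the same input data set should select the same 100 observations.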


In [7]:
*Ex5_how_many_obs.sas (Part 1);
options nonotes nosource nodate nonumber leftmargin=1cm;
DATA  _NULL_;
 SET sashelp.heart NOBS=numobs;
 if numobs then PUT @7 "Number of cases =" numobs comma7.;
 stop;
run;

In [38]:
*Ex5_how_many_obs.sas (Part 2);
DATA   _NULL_;
  SET sashelp.heart END=last;
  count+1;
  if last then PUT @7 "Number of cases =" count comma7.;
run;

In [7]:
*Ex5_how_many_obs.sas (Part 3);
options nocenter nonumber nodate;
ods html close;
DATA  _NULL_;
 if 0 then SET sashelp.heart NOBS=N;
   CALL SYMPUTX('total', N);
 stop;
run;
/* Below are 3 ways to display the value of the macro variable (&total) */
%PUT &total;
%PUT Number of cases = %SYSFUNC(left(&total));
%PUT Number of cases = %SYSFUNC(left(%qsysfunc(putn(&total, comma7.))));



The SAS System

152        *Ex5_how_many_obs.sas (Part 3);
153        options nocenter nonumber nodate;
154        ods html close;
155        DATA  _NULL_;
156         if 0 then SET sashelp.heart NOBS=N;
157           CALL SYMPUTX('total', N);
158         stop;
159        run;

NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds

160        /* Below are 3 ways to display the value of the macro variable (&total) */
161        %PUT &total;
5209
162        %PUT Number of cases = %SYSFUNC(left(&total));
Number of cases = 5209
163        %PUT Number of cases = %SYSFUNC(left(%qsysfunc(putn(&total, comma7.))));
Number of cases = 5,209


In [8]:
*Ex5_how_many_obs.sas (Part 4);
PROC SQL noprint;
select count(*) into :OBSCOUNT trimmed
 from sashelp.heart;
quit;
%PUT Number of cases = %SYSFUNC(left(%qsysfunc(putn(&OBSCOUNT, comma7.))));


174        *Ex5_how_many_obs.sas (Part 4);
175        PROC SQL noprint;
176        select count(*) into :OBSCOUNT trimmed
177         from sashelp.heart;
178        quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.28 seconds
      cpu time            0.03 seconds

179        %PUT Number of cases = %SYSFUNC(left(%qsysfunc(putn(&OBSCOUNT, comma7.))));
Number of cases = 5,209