### GWU STAT 4197/STAT 6197
##### Week 6 SAS Code Examples:
* By-Group Processing 
* Combining summary and detail data

[SAS Documentation: SET Statement and BY-Group Processing](https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lestmtsref&docsetTarget=p00hxg3x8lwivcn1f0e9axziw57y.htm&locale=en#p00fwyxqcqptpcn10ivxt1r470q1%5C)

#### Objectives
* Define First. and Last. processing
* Calculate an accumulating total for groups of data
* Use a subsetting IF statement to output selected observations

#### Scenario
* You have data on calorie intake at breakfast, lunch, and dinner for 4 people (12 data points).

* You need to create a new data set and a listing report with the total calorie intake for each person.

##### Steps

* Step 1: Sort the data by person ID after reading the raw data.
* Step 2: Summarize the observations by person
* Step 3: Write only the last observation for each BY group


In [1]:
*Ex10_first_var_last_var.sas (Part 1);
options nodate nocenter nonumber nosource; 
DATA work.Have;
INPUT ID $ calorie_intake;
 DATALINES;
 A 200
 A 800
 A 500
 C 250
 C 850
 C 550
 B 300
 B 900
 B 600
 D 260
 D 900
 D 800
;






#### By-Group Processing
The BY statement in the DATA step enables SAS to process data in groups.

A BY statement in the DATA step below creates two temporary variables, FIRST.variable and LAST.variable, for each variable listed in the BY statement. The step writes their values to the log with PUTLOG.

FIRST.variable is set to 1 when an observation is the first in a BY group; otherwise, it equals 0.

LAST.variable is set to 1 when an observation is the last in a BY group; otherwise, it equals 0.


In [2]:
*Ex10_first_var_last_var.sas (Part 2);

PROC SORT data=work.Have 
   out=work.Sorted_have; 
 BY ID; 
run;

DATA _NULL_;
 SET work.sorted_have; BY ID; 
 PUTLOG ID= First.ID=  LAST.ID= calorie_intake=;
run;


#### Summarizing Data by Groups
In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable.

These temporary variables are available for DATA step programming but are not added to
the output data set.
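
Because FIRST.ID and LAST.ID are dropped automatically, one way to inspect them is to copy them into ordinary variables. The following is a minimal sketch, not part of the course files: the data sets work.demo and work.flags and the variables flag_first and flag_last are hypothetical names chosen for illustration.

```sas
/* Hypothetical demo data, already sorted by ID */
data work.demo;
  input ID $ calorie_intake;
  datalines;
A 200
A 800
B 300
;

data work.flags;
  set work.demo;
  by ID;
  flag_first = first.ID;  /* 1 on the first observation of each ID, else 0 */
  flag_last  = last.ID;   /* 1 on the last observation of each ID, else 0 */
run;

proc print data=work.flags noobs;
run;
```

In this sketch the single observation for ID B is both the first and the last in its group, so both flags equal 1 on that row.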



In [3]:
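*Ex10_first_var_last_var.sas (Part 3);
options nodate nocenter nonumber; 
Data work.person (drop=calorie_intake);
  SET work.sorted_have ; 
  BY ID;
  if first.id then total_intake=0;   /* reset the total at the start of each BY group */
  total_intake+calorie_intake;       /* sum statement: accumulate across observations */
  if last.id;                        /* subsetting IF: keep only the last observation per group */
run;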
proc print data=work.person;
run;

Obs,ID,total_intake
1,A,1500
2,B,1800
3,C,1650
4,D,1960
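
The sum statement used in Part 3 (total_intake+calorie_intake) implicitly retains total_intake across iterations and treats a missing calorie_intake as 0. A rough explicit equivalent, sketched here with a RETAIN statement and the SUM function, may make that behavior easier to see. The data sets work.demo_sorted and work.person_explicit are hypothetical names, and the inline data is a shortened copy of the example values.

```sas
/* Hypothetical demo data, already sorted by ID */
data work.demo_sorted;
  input ID $ calorie_intake;
  datalines;
A 200
A 800
A 500
B 300
B 900
B 600
;

data work.person_explicit (drop=calorie_intake);
  set work.demo_sorted;
  by ID;
  retain total_intake;                               /* keep the value across iterations */
  if first.ID then total_intake = 0;                 /* reset at the start of each group */
  total_intake = sum(total_intake, calorie_intake);  /* SUM ignores missing values */
  if last.ID;                                        /* output one row per ID */
run;

proc print data=work.person_explicit noobs;
run;
```

The sum statement is shorter, which is likely why the course example prefers it; the explicit form only spells out what it does.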


##### Combine a summary value (the average) with the detail data set, then calculate the deviation of each individual weight from the mean

* PROC MEANS or PROC SUMMARY, followed by a DATA step

or just

* PROC SQL

##### Creating Summary Data Set (Method 1)
PROC MEANS generates descriptive statistics. The OUTPUT statement with the OUT= option creates a SAS data set that contains the summary statistics.

In [22]:
proc means data=sashelp.class noprint;
var weight;
output out=summary_data_m (drop=_TYPE_ _FREQ_) mean=avg_weight;
run;

title1 'Summarized values from PROC MEANS output data set';
proc print data=summary_data_m noobs; run;

avg_weight
100.026


##### Creating Summary Data Set (Method 2)
PROC SUMMARY generates descriptive statistics. The OUTPUT statement with the OUT= option creates a SAS data set that contains the summary statistics.

In [10]:
*Ex11_summary_detail.sas (Part 2);
options nocenter nodate nonumber;
*** Summary value using PROC SUMMARY;
proc summary data=sashelp.class;
     var weight;
      output out=summary_data_s (drop=_TYPE_ _FREQ_)
      mean(weight)=s_avg_weight;
run;
title1 'Summarized values from PROC SUMMARY output data set';
proc print data=summary_data_s noobs; 
run;

s_avg_weight
100.026


##### Combining the Summary and Detail Data
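Use two SET statements in the DATA step to combine the summary and detail data. The first SET statement executes only on the first iteration of the DATA step; because variables read with a SET statement are retained automatically, avg_weight stays available for every detail observation read by the second SET statement.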

In [23]:
*Ex11_summary_detail.sas (Part 3);
options nocenter nodate nonumber nosource;
data class;
        if _n_=1 then  set summary_data_m;
        set sashelp.class (keep=name weight);
        weight_deviation=1-(weight/avg_weight);
run;
title1 'Combine the summary data with the detail data using two SET statements';
proc print data=class;
var name weight avg_weight weight_deviation;
format avg_weight weight  5.1  weight_deviation percent8.2;
run;



Obs,Name,Weight,avg_weight,weight_deviation
1,Alfred,112.5,100.0,(12.47%)
2,Alice,84.0,100.0,16.02%
3,Barbara,98.0,100.0,2.03%
4,Carol,102.5,100.0,( 2.47%)
5,Henry,102.5,100.0,( 2.47%)
6,James,83.0,100.0,17.02%
7,Jane,84.5,100.0,15.52%
8,Janet,112.5,100.0,(12.47%)
9,Jeffrey,84.0,100.0,16.02%
10,John,99.5,100.0,0.53%
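
The condition on the first SET statement in Part 3 matters. As a sketch of what happens without it, the step below reads the one-row summary data set unconditionally; on the second iteration that SET statement reaches end of file and the DATA step stops, so only one observation is written. The data set names work.avg_only, work.unguarded, and work.guarded are hypothetical.

```sas
proc means data=sashelp.class noprint;
  var weight;
  output out=work.avg_only (drop=_TYPE_ _FREQ_) mean=avg_weight;
run;

/* Without the guard: the step stops on iteration 2, */
/* when the first SET runs out of observations       */
data work.unguarded;
  set work.avg_only;
  set sashelp.class (keep=name weight);
run;

/* With the guard: the summary row is read once and retained */
data work.guarded;
  if _n_ = 1 then set work.avg_only;
  set sashelp.class (keep=name weight);
run;

/* Compare the observation counts of the two results */
proc sql;
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK' and memname in ('UNGUARDED', 'GUARDED');
quit;
```

Under these assumptions work.unguarded should contain a single observation, while work.guarded should contain one observation for every row of sashelp.class.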


##### Combining the Summary and Detail Data Using the SQL Procedure
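Calculate the summary data and merge it with the detail data in a single PROC SQL step. Because the query mixes a summary function with detail columns and has no GROUP BY clause, PROC SQL remerges the mean with every detail row and writes a note about the remerge to the log.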

In [22]:
*Ex11_summary_detail.sas (Part 4);
title1 'Combine the summary data with the detail data using PROC SQL';
PROC SQL;
select  name 
       ,weight format=5.1
       ,mean(weight) as avg_weight format=5.1
       ,1-(weight/calculated avg_weight) 
          as weight_deviation format=percent8.2
  from sashelp.class;
 quit;


Name,Weight,avg_weight,weight_deviation
Alfred,112.5,100.0,(12.47%)
Alice,84.0,100.0,16.02%
Barbara,98.0,100.0,2.03%
Carol,102.5,100.0,( 2.47%)
Henry,102.5,100.0,( 2.47%)
James,83.0,100.0,17.02%
Jane,84.5,100.0,15.52%
Janet,112.5,100.0,(12.47%)
Jeffrey,84.0,100.0,16.02%
John,99.5,100.0,0.53%
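
The Part 4 query relies on PROC SQL remerging the summary statistic automatically. For comparison, the same result can be sketched with an explicit join against a one-row inline view, which makes the "summary plus detail" structure visible in the query itself. This is an illustrative variant, not the course's method; the table name work.sql_class and the aliases d and s are hypothetical.

```sas
proc sql;
  create table work.sql_class as
  select d.name,
         d.weight format=5.1,
         s.avg_weight format=5.1,
         1 - (d.weight / s.avg_weight)
           as weight_deviation format=percent8.2
    from sashelp.class as d,
         /* inline view: a single row holding the overall mean */
         (select mean(weight) as avg_weight from sashelp.class) as s;
quit;

proc print data=work.sql_class noobs;
run;
```

Because the inline view returns exactly one row, joining it to the detail table without a WHERE clause attaches that row to every student. SAS may write a note about a Cartesian product join to the log, which is expected here.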


In [24]:
*Ex11_summary_detail.sas (Part 5);
* DATA Step Approach: add the same constant key to both data sets, then merge on it;
data detail_class;
  length new_var $1;
  set sashelp.class;
  new_var='C';    /* constant key on every detail row */
run;

data x_summary_data_m;
  length new_var $1;
  set summary_data_m;
  new_var='C';    /* same constant key on the single summary row */
run;

/* One-to-many merge: the summary row is matched to every detail row */
data mclass (drop=new_var);
  merge detail_class 
        x_summary_data_m;
  by new_var;
  weight_deviation=1-(weight/avg_weight);
run;

title1 'DATA step using the MERGE statement';
proc print data=mclass;
  format weight avg_weight 5.1 
         weight_deviation percent8.2;
run;




Obs,Name,Sex,Age,Height,Weight,avg_weight,weight_deviation
1,Alfred,M,14,69.0,112.5,100.0,(12.47%)
2,Alice,F,13,56.5,84.0,100.0,16.02%
3,Barbara,F,13,65.3,98.0,100.0,2.03%
4,Carol,F,14,62.8,102.5,100.0,( 2.47%)
5,Henry,M,14,63.5,102.5,100.0,( 2.47%)
6,James,M,12,57.3,83.0,100.0,17.02%
7,Jane,F,12,59.8,84.5,100.0,15.52%
8,Janet,F,15,62.5,112.5,100.0,(12.47%)
9,Jeffrey,M,13,62.5,84.0,100.0,16.02%
10,John,M,12,59.0,99.5,100.0,0.53%


In [25]:
*Ex11_summary_detail.sas (Part 6);
*** PROC Step, CALL SYMPUTX, DATA Step;
Options nocenter nodate nonumber;
proc means data=sashelp.class noprint;
var weight;
output out=mystats mean=ave_weight;
run;

data _null_;
  set mystats;
  call symputx('AverageWeight',ave_weight);
run;

data x_class;
        set sashelp.class (keep=name weight);
        weight_deviation=1-(weight/&AverageWeight);
run;
title1 "PROC Step, CALL SymputX, and DATA Step";
title2 "Mean weight: %sysfunc(putn(&AverageWeight, 5.1)) lbs";
proc print data=x_class;
var name weight weight_deviation;
format weight 5.1  weight_deviation   percent8.2;
run;

Obs,Name,Weight,weight_deviation
1,Alfred,112.5,(12.47%)
2,Alice,84.0,16.02%
3,Barbara,98.0,2.03%
4,Carol,102.5,( 2.47%)
5,Henry,102.5,( 2.47%)
6,James,83.0,17.02%
7,Jane,84.5,15.52%
8,Janet,112.5,(12.47%)
9,Jeffrey,84.0,16.02%
10,John,99.5,0.53%
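
Part 6 stores the mean in a macro variable with CALL SYMPUTX. As a hedged alternative sketch, PROC SQL can create a macro variable directly with the INTO clause. The macro variable name AvgWeightSql and the data set work.x_class_sql are hypothetical, and the TRIMMED option assumes SAS 9.3 or later. The FORMAT=BEST32. modifier is included because INTO otherwise converts the number with a default format that may round it.

```sas
proc sql noprint;
  select mean(weight) format=best32.
    into :AvgWeightSql trimmed   /* TRIMMED removes leading and trailing blanks */
    from sashelp.class;
quit;

%put AvgWeightSql=&AvgWeightSql;  /* write the stored value to the log */

data work.x_class_sql;
  set sashelp.class (keep=name weight);
  weight_deviation = 1 - (weight / &AvgWeightSql);
run;

proc print data=work.x_class_sql;
  format weight 5.1 weight_deviation percent8.2;
run;
```

Either route ends with the mean available as text in a macro variable; the PROC SQL form simply saves the intermediate PROC MEANS output data set and the DATA _NULL_ step.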


In [27]:
*Ex12_SUM_Statement_vs_2_SETs.sas (Part 1);
options nocenter nodate nonumber;
DATA sale_by_mon ;
  INPUT mon $  sale @@;
  cum_sale+sale;
  DATALINES;
  Jan 164083 Feb 164260 Mar 163747 Apr 164759 
  May 165617 Jun 166098 Jul 167305 Aug 167797 
  Sep 169407 Oct 170681 Nov 171025 Dec 172995
  ;
run;
title1 'Example Data Set';
PROC PRINT DATA=sale_by_mon ; 
  FORMAT sale cum_sale dollar12.;
   SUM sale ;
  run;

Obs,mon,sale,cum_sale
1,Jan,"$164,083","$164,083"
2,Feb,"$164,260","$328,343"
3,Mar,"$163,747","$492,090"
4,Apr,"$164,759","$656,849"
5,May,"$165,617","$822,466"
6,Jun,"$166,098","$988,564"
7,Jul,"$167,305","$1,155,869"
8,Aug,"$167,797","$1,323,666"
9,Sep,"$169,407","$1,493,073"
10,Oct,"$170,681","$1,663,754"


In [30]:
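*Ex12_SUM_Statement_vs_2_SETs.sas (Part 2);
DATA xsale;
  /* NOBS= supplies the number of observations; POINT= reads that one directly. */
  /* The last observation holds the grand total in cum_sale.                   */
  SET sale_by_mon(keep=cum_sale)
       POINT=last nobs=last;
  /* Read the detail rows sequentially; the step ends when this SET hits end of file */
  SET sale_by_mon (drop=cum_sale);
  Percent_sale = sale/cum_sale;   /* each month's share of the yearly total */
run;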
title1 'Combining the summary data with the detail data';
PROC PRINT DATA=xsale; 
  VAR mon sale Percent_sale;
  SUM sale Percent_sale;
  FORMAT sale dollar10. Percent_sale percent12.2;
run;


Obs,mon,sale,Percent_sale
1,Jan,"$164,083",8.17%
2,Feb,"$164,260",8.18%
3,Mar,"$163,747",8.16%
4,Apr,"$164,759",8.21%
5,May,"$165,617",8.25%
6,Jun,"$166,098",8.27%
7,Jul,"$167,305",8.33%
8,Aug,"$167,797",8.36%
9,Sep,"$169,407",8.44%
10,Oct,"$170,681",8.50%


[Norman, Rod. 2012. A Mean Way to Count, Enumerating the Values of Multiple Variables Using
Formats with the Means Procedure. PharmaSUG 2012 – CC30](https://www.pharmasug.org/proceedings/2012/CC/PharmaSUG-2012-CC30.pdf)