### GWU STAT 4197/6197

#### Week 5, Part 4: Restructuring or Transposing Data


#### The TRANSPOSE procedure
* transposes selected variables into observations
* transposes numeric variables by default
* transposes character variables only if explicitly listed in the VAR statement
* creates a new SAS data set, not a report

(Source: SAS(R) Documentation/Slides)
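
The default behavior described above can be checked with a small sketch. It assumes the `sashelp.class` sample data set is available. The output data set names `class_num` and `class_mix` are illustrative only. When a character variable is listed together with numeric variables in the VAR statement, the transposed COLn variables come out as character.

```sas
* Sketch only. class_num and class_mix are illustrative names;
* No VAR statement, so only the numeric variables Age Height Weight are transposed;
proc transpose data=sashelp.class(obs=3) out=class_num;
run;
proc print data=class_num noobs; run;

* Name is character, so it is transposed only because it is listed in VAR;
proc transpose data=sashelp.class(obs=3) out=class_mix;
   var Name Age;
run;
proc print data=class_mix noobs; run;
```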

#### Create an example data set 
* six observations (rows)
* three variables (columns)
    * 1 character and 2 numeric variables

In [1]:
*Ex6_long_to_wide.sas (Part 1);
options nocenter nodate nonotes nosource;
ods html close;
data Have;
 input year $ gdp cpi ;
 datalines;
2010 101.226 218.056
2011 103.315 224.939
2012 105.220 229.594
2013 106.935 232.957
2014 108.694 236.736
2015 109.782 237.017
;
proc print data=Have; run;



Obs,year,gdp,cpi
1,2010,101.226,218.056
2,2011,103.315,224.939
3,2012,105.22,229.594
4,2013,106.935,232.957
5,2014,108.694,236.736
6,2015,109.782,237.017


#### Use PROC TRANSPOSE to restructure the above example data set.
* When transposed, the resulting data set has two rows, one for each numeric variable (gdp and cpi).
* By default, the character variable (Year) is not transposed.
* The variables COL1-COL6 represent the six observations in the input data set.


In [2]:
*Ex6_long_to_wide.sas (Part 2);
proc transpose data=Have out=wide1;run;
Title1 'Transposed data with DEFAULT names of the new variables';
proc print data=WIDE1 noobs; run;

_NAME_,COL1,COL2,COL3,COL4,COL5,COL6
gdp,101.226,103.315,105.22,106.935,108.694,109.782
cpi,218.056,224.939,229.594,232.957,236.736,237.017


#### ID Statement
* The ID statement identifies the character variable (Year) whose values become the names of the new columns.
* If you specify a numeric version of the variable YEAR in the ID statement, its values are printed with a leading underscore (\_): \_2010, \_2011, \_2012, \_2013, \_2014, \_2015.
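
A minimal sketch of the numeric case, assuming the session runs with `VALIDVARNAME=V7` (set explicitly here and restored afterwards). The data set names `have_num` and `wide_num` and the macro variable `vvn_saved` are illustrative, not part of the course files. Under that setting the new columns should be named with a leading underscore.

```sas
* Sketch only. have_num, wide_num and vvn_saved are illustrative names;
%let vvn_saved = %sysfunc(getoption(validvarname));
options validvarname=v7;
data have_num;
   input year gdp cpi;
   datalines;
2010 101.226 218.056
2011 103.315 224.939
2012 105.220 229.594
;
proc transpose data=have_num out=wide_num;
   id year;
   var gdp cpi;
run;
proc print data=wide_num noobs; run;
* Restore the setting that was active before this example;
options validvarname=&vvn_saved;
```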

#### \_NAME_ 

\_NAME_ is an automatic variable in the output data set that holds the variable name(s) in the input data set from which the values originate.

In [5]:
*Ex6_long_to_wide.sas (Part 2);
proc transpose data=Have out=wide1;
id year;
run;
Title 'Transposed data with new variable names taken from the ID variable';
proc print data=WIDE1 noobs;
run;
title;

_NAME_,2010,2011,2012,2013,2014,2015
gdp,101.226,103.315,105.22,106.935,108.694,109.782
cpi,218.056,224.939,229.594,232.957,236.736,237.017


#### PREFIX= option 

* The PREFIX= option in the PROC TRANSPOSE statement attaches a prefix (Year) to each value of the ID variable (YEAR) to form the new variable names.

#### NAME= option

* The NAME= option renames the \_NAME_ variable. Here, \_NAME_ becomes “Indicator” in the output data set.


In [23]:
*Ex6_long_to_wide.sas (Part 3);
proc transpose data=HAVE out=wide2 prefix=Year 
                         name=Indicator;
id year;
var GDP CPI ; run;
Title 'Transposed data with DEFAULT names changed';
title2 'ID statement, PREFIX=, NAME= options added';
proc print data=WIDE2 noobs; run;
title;

Indicator,Year2010,Year2011,Year2012,Year2013,Year2014,Year2015
gdp,101.226,103.315,105.22,106.935,108.694,109.782
cpi,218.056,224.939,229.594,232.957,236.736,237.017


#### Transposing "Wide" to "Long" Format

In [2]:
proc means data=sashelp.class mean std Q1 Median Q3 noprint;
   var weight;
    output out=work.summary 
    Mean = Mean	std =std
    Q1=Q1 Median=Median Q3=Q3;
run;
options validvarname=ANY;   
data work.x_summary;
   set work.summary;
    IQR = Q3 - Q1;
    'Upper Limit'n = Q3 + 1.5*IQR;
   'Lower Limit'n = Q1 - 1.5*IQR;
drop _TYPE_ _FREQ_ IQR;
run;
title 'Original Data Set (Wide Format)';
proc print; 
run; 


Obs,Mean,std,Q1,Median,Q3,Upper Limit,Lower Limit
1,100.026,22.7739,84,99.5,112.5,155.25,41.25


In [3]:
proc transpose data=x_summary 
 out=y_summary(rename=(COL1=Estimates)) name=Stat;
run;
title 'Transposed Data Set (Long Format)';
proc print data=y_summary noobs label;
label stat= 'Descriptive Statistics';
run;
title;

Descriptive Statistics,Estimates
Mean,100.026
std,22.774
Q1,84.0
Median,99.5
Q3,112.5
Upper Limit,155.25
Lower Limit,41.25


#### Reshaping the Long Data to Wide Data Using an ARRAY Statement

In [7]:
*Ex6_long_to_wide.sas (Part 5);
** Long to wide format;
Data Long;
input (Name Test) ($) Score ;
datalines;
John Test1 75
John Test2 85
John Test3 76
John Test4 72
John Test5 78
John HW1   82
John HW2   85
John Midterm 68
John Final   75
Hena Test1 75
Hena Test2 80
;
proc sort data=Long; by Name; run;
data want(drop=i Test Score);
   set Long;
   by Name;
   array TestScore (9) Test1 Test2 Test3 Test4 Test5 HW1 HW2 Midterm Final;
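   * Restart at the first array slot and clear values retained from the previous Name;
   if first.Name then do;
      i=1;
      call missing(of TestScore(*));
   end;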
   TestScore(i)=score;
   if last.Name then output;
   i+1;
   retain Test1 Test2 Test3 Test4 Test5 HW1 HW2 Midterm Final;
run;
Title1 'Reshaping data in long format to wide format using a DATA step and an ARRAY statement';
proc print data=Want noobs;
run;

Name,Test1,Test2,Test3,Test4,Test5,HW1,HW2,Midterm,Final
Hena,75,80,.,.,.,.,.,.,.
John,75,85,76,72,78,82,85,68,75
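
The DATA step above fills the array by position, so it relies on each student's rows arriving in the same order as the array elements. The following is a sketch of a variant that matches on the value of Test instead, using the VNAME function. The names `want_by_name` and `j` are illustrative. It assumes the sorted `Long` data set created above.

```sas
* Sketch only. want_by_name and j are illustrative names;
data want_by_name(drop=j Test Score);
   set Long;
   by Name;
   array TestScore (9) Test1 Test2 Test3 Test4 Test5 HW1 HW2 Midterm Final;
   retain Test1 Test2 Test3 Test4 Test5 HW1 HW2 Midterm Final;
   * Clear values carried over from the previous student;
   if first.Name then call missing(of TestScore(*));
   * Store the score in the element whose variable name matches Test;
   do j = 1 to dim(TestScore);
      if upcase(vname(TestScore(j))) = upcase(Test) then TestScore(j) = Score;
   end;
   if last.Name then output;
run;
proc print data=want_by_name noobs; run;
```

With this approach the order of the rows within a student no longer matters, at the cost of one loop over the array per input row.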


In [8]:
*Ex6_long_to_wide.sas (Part 6);
proc transpose data=Long out=t_wide (drop=_NAME_);
   by Name;
   var score;
   id Test;
run;
Title1 'Reshaping data in long format to wide format using PROC TRANSPOSE';
proc print data=t_wide  noobs;
var name Test1 Test2 Test3 Test4 Test5 HW1 HW2 Midterm Final;
run;


Name,Test1,Test2,Test3,Test4,Test5,HW1,HW2,Midterm,Final
Hena,75,80,.,.,.,.,.,.,.
John,75,85,76,72,78,82,85,68,75


#### Reshaping data in wide format to long format using PROC TRANSPOSE

In [9]:
*Ex7_wide_to_long.sas;
DATA wide;
 INPUT Student_id $ Test1-Test5;
DATALINES;
A001  80 80 80 82 75
B002  85 72 85 89 81
C003  87 88 89 91 79
D004  87 88 89 90 82
;
proc sort data=wide; by Student_ID; run;
proc transpose data=wide 
   out=long (rename=(_NAME_=Test col1=Score));
by Student_id;
run;
title1 'Reshaping data in wide format to long format';
title2 'BY statement added to PROC TRANSPOSE step';
proc print data=long noobs; 
run;




Student_id,Test,Score
A001,Test1,80
A001,Test2,80
A001,Test3,80
A001,Test4,82
A001,Test5,75
B002,Test1,85
B002,Test2,72
B002,Test3,85
B002,Test4,89
B002,Test5,81


#### Reshaping data in wide format to long format using a DATA step and an ARRAY statement

In [15]:
data Long_x;
set wide;
array Tests[*] Test1-Test5;
do _t = 1 to dim(Tests);
  Test = tests[_t];
  output;
end;
keep Student_id Test;
run;
title1 'Wide data transposed to long';
title2 'Using the DATA step and ARRAY statement';
proc print data=long_x noobs; 
run;

Student_id,Test
A001,80
A001,80
A001,80
A001,82
A001,75
B002,85
B002,72
B002,85
B002,89
B002,81
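
The listing above keeps the scores but not the name of the test each score came from. The following sketch also records the source variable name with the VNAME function, which gives the same layout as the PROC TRANSPOSE result. The names `long_named` and `Score` are illustrative. It assumes the `wide` data set created earlier.

```sas
* Sketch only. long_named and Score are illustrative names;
data long_named;
   set wide;
   array Tests[*] Test1-Test5;
   length Test $ 32;
   do _t = 1 to dim(Tests);
      * Test holds the name of the array element and Score holds its value;
      Test  = vname(Tests[_t]);
      Score = Tests[_t];
      output;
   end;
   keep Student_id Test Score;
run;
proc print data=long_named noobs; run;
```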


In [4]:
*Ex8_multi_transpose_x.sas (Part 1);
options nocenter nonumber nodate;
data have;
   input family_id $ month Ins_paid copay @@;
datalines;
F002 1 350 60 F002 2 100  30 F002 3 88 20 F002 4 20  0
F002 5 450 90 F002 6 70 30
F001 1 245 60 F001 2 100  0 F001 3 0 0 F001 4 120  30
F001 5 345 60 F001 6 95 30 
;
title1 'Data in LONG format';
proc print data=HAVE noobs; run;

family_id,month,Ins_paid,copay
F002,1,350,60
F002,2,100,30
F002,3,88,20
F002,4,20,0
F002,5,450,90
F002,6,70,30
F001,1,245,60
F001,2,100,0
F001,3,0,0
F001,4,120,30


In [12]:
*Ex8_multi_transpose_x.sas (Part 2);
proc sort data=have;  by family_id month; run;
proc transpose data= have 
          out=have_t name=stat;
   by family_id month;
   var Ins_paid copay;
run;

title1 'Listing from the first transposition of the original data set';
proc print data=have_t noobs; run;


family_id,month,stat,COL1
F001,1,Ins_paid,245
F001,1,copay,60
F001,2,Ins_paid,100
F001,2,copay,0
F001,3,Ins_paid,0
F001,3,copay,0
F001,4,Ins_paid,120
F001,4,copay,30
F001,5,Ins_paid,345
F001,5,copay,60


In [14]:
title 'Descriptor portion of the first-transposed data';
proc contents data=have_t position; 
ods select variables;
run;

Alphabetic List of Variables and Attributes
#,Variable,Type,Len,Label
4,COL1,Num,8,
1,family_id,Char,8,
2,month,Num,8,
3,stat,Char,8,NAME OF FORMER VARIABLE


In [55]:
*Ex8_multi_transpose_x.sas (Part 3);
proc transpose data=have_t out=have_tt(drop=_NAME_);
by family_id;
var col1;
id stat month;
run;
title1 'Listing from the second transposition of the data set';
proc print data=have_tt noobs;
run;

family_id,Ins_paid1,copay1,Ins_paid2,copay2,Ins_paid3,copay3,Ins_paid4,copay4,Ins_paid5,copay5,Ins_paid6,copay6
F001,245,60,100,0,0,0,120,30,345,60,95,30
F002,350,60,100,30,88,20,20,0,450,90,70,30
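
With two variables in the ID statement, the new names are formed by joining the values of stat and month directly (Ins_paid1, copay1, and so on). As a sketch, the DELIMITER= option of PROC TRANSPOSE places a separator between the two values, which should give names such as Ins_paid_1 and copay_1. The output data set name `have_tt_d` is illustrative. It assumes `have_t` from the first transposition is still available.

```sas
* Sketch only. have_tt_d is an illustrative name;
proc transpose data=have_t out=have_tt_d(drop=_NAME_) delimiter=_;
   by family_id;
   var col1;
   id stat month;
run;
proc print data=have_tt_d noobs; run;
```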


In [1]:
*Ex9_transpose_by.sas (Part 1);
 options nodate nonumber ;
 proc format;
 value cat_fmt 1 = 'Water'
               2 = 'Phone'
               3 = 'Electricity';
 data have;
  length year $4;
  input year Bill_type mean @@;
  format Bill_type cat_fmt.;
  label bill_type = 'Type of Bills'
        mean = 'Average Monthly Bill ($)';
  datalines;
  2010 1 256.3 2011 1 235.4 2012 1 215.5
  2013 1 210.7 2014 1 209.3 2010 2 145.5
  2011 2 150.8 2012 2 147.1 2013 2 180.8
  2014 2 142.9 2010 3 219.5 2011 3 245.8
  2012 3 242.0 2013 3 239.8 2014 3 223.8
  ;
  run;
 proc sort data=have; by Bill_type; run;
 title1 'Original data table from a family';
 proc print data=have label noobs; run;



year,Type of Bills,Average Monthly Bill ($)
2010,Water,256.3
2011,Water,235.4
2012,Water,215.5
2013,Water,210.7
2014,Water,209.3
2010,Phone,145.5
2011,Phone,150.8
2012,Phone,147.1
2013,Phone,180.8
2014,Phone,142.9


In [3]:
 *Ex9_transpose_by.sas (Part 2);
 proc transpose data= have out=have_t (drop=_NAME_);
           by Bill_type; 
           id year;
           *idlabel year;
           var mean;
run;
title1 'Transposed data table (Average Monthly Bill of a family)';
proc print data=have_t (drop=_label_) noobs label; 
run;

Type of Bills,2010,2011,2012,2013,2014
Water,256.3,235.4,215.5,210.7,209.3
Phone,145.5,150.8,147.1,180.8,142.9
Electricity,219.5,245.8,242.0,239.8,223.8
