# Combining Data in SAS

We are going to talk about:
- concatenating
- interleaving
- merging
- updating
- modifying

## Concatenating

Concatenating in SAS means combining two or more SAS data sets, one after another, into a single data set. The number of observations in the new data set is the sum of the numbers of observations in the original data sets. There are two ways to do that:
- the SET statement in a DATA step
- the APPEND procedure

In [97]:
DATA Sales;
INPUT EmployeeID $ 1-9 Name $ 11-29 @30 HireDate DATE9. Salary HomePhone $;
FORMAT HireDate DATE9.;
Department = 'Sales           ';
DATALINES ;
429685482 Martin, Virginia   09aug2002 45000 493-0824
244967839 Singleton, MaryAnn 24Apr2004 34000 929-2623
996740216 Leighton, Maurice  16dec2001 57000 933-6908
675443925 Freuler, Carl      15feb2010 54500 493-3993
845729308 Cage, Merce        19oct2009 64000 286-0519
;

PROC PRINT DATA = Sales;
TITLE 'Sales department employees';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales


In [98]:
DATA Customer_support;
INPUT EmployeeID $ 1-9 Name $ 11-29 @30 HireDate DATE9. Salary HomePhone $;
FORMAT HireDate DATE9. ;
Department = 'Customer support';
DATALINES ;
324897451 Sayre, Jay         15nov2005 66000 933-2998
598723234 Tolson, Andrew     18mar2000 54000 929-9984
432842452 Jensen, Helga      01feb2004 70300 289-2135
893421341 Kulenic, Marie     24jun2004 54800 872-1342
988431421 Zweerink, Anna     07Jul2011 59000 929-3726
;
RUN;

PROC PRINT DATA = Customer_support;
TITLE 'Customer support department employees';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department
1,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support
2,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support
3,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support
4,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support
5,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support


### Concatenating the two tables

In [99]:
DATA Dept1_2;
  SET Sales Customer_support;
RUN;

PROC PRINT DATA = Dept1_2;
TITLE 'Employees in the Sales and Customer support departments';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support
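To check the claim that the concatenated data set holds the sum of the input observation counts, the counts can be written to the log. This is only a sketch: the variable names `n_sales`, `n_support`, and `n_total` are hypothetical, and it assumes the three data sets from the cells above exist in the WORK library.

```sas
DATA _NULL_;
   * NOBS= is filled in when the step is compiled, so no observation is actually read;
   IF 0 THEN SET Sales NOBS = n_sales;
   IF 0 THEN SET Customer_support NOBS = n_support;
   IF 0 THEN SET Dept1_2 NOBS = n_total;
   * Write the three counts to the log in name=value form;
   PUT n_sales= n_support= n_total=;
   STOP;
RUN;
```

With the data shown above, `n_total` should equal `n_sales` plus `n_support`.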


### Creating a new data table

In [100]:
DATA Security;
INPUT EmployeeID $ 1-9 Name $ 11-30 Gender $ 31 @33 HireDate DATE9. Salary;
FORMAT HireDate DATE9. ;
Department = 'Security';
DATALINES ;
453356433 Saparilas, Thearesa F 09may2005 45000
832113412 Brosnihan, Dylan    M 04jan2009 49000
243753981 Chao, Daeyong       M 28sep2004 48500
544213416 Slifkin, Leah       F 24jul2011 54000
933145671 Perry, Marguerite   F 19apr2010 49500
;

PROC PRINT DATA = Security;
TITLE 'Security department employees';
RUN;

Obs,EmployeeID,Name,Gender,HireDate,Salary,Department
1,453356433,"Saparilas, Thearesa",F,09MAY2005,45000,Security
2,832113412,"Brosnihan, Dylan",M,04JAN2009,49000,Security
3,243753981,"Chao, Daeyong",M,28SEP2004,48500,Security
4,544213416,"Slifkin, Leah",F,24JUL2011,54000,Security
5,933145671,"Perry, Marguerite",F,19APR2010,49500,Security


### Concatenating departments

In [101]:
DATA Dept1_3;
  SET Sales Customer_support Security;
RUN;

PROC PRINT DATA = Dept1_3;
TITLE1 'Employees in Sales, Customer support, ';
TITLE2 'and Security departments';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department,Gender
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales,
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales,
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales,
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales,
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales,
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support,
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support,
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support,
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support,
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support,
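When the input data sets do not contain the same variables, the concatenated data set holds every variable that appears in any of them, and observations coming from a data set that lacks a variable get a missing value for it. That is why `Gender` is blank for the Sales and Customer support rows. A small sketch, assuming `Dept1_3` was created as above, prints only the Security rows to show the other side of this: `Gender` is filled in and `HomePhone` is missing.

```sas
PROC PRINT DATA = Dept1_3;
   * Keep only the observations that came from the Security data set;
   WHERE Department = 'Security';
   VAR EmployeeID Name Gender HomePhone Department;
   TITLE 'Security rows in the concatenated data set';
RUN;
```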


### Concatenating when variables have different types

In [102]:
DATA Accounting;
INPUT EmployeeID 1-9 Name $ 11-29 Gender $ 30 @32 HireDate DATE9. Salary;
FORMAT HireDate DATE9. ;
Department = 'Accounting';
DATALINES ;
652453421 Gardinski, Barbara F 29may2001 59000
235312326 Robertson, Hannah  F 14mar2004 65000
234523214 Sopheak, Leng      M 03apr2011 62000
326574341 Chentha, Sok       F 09feb2014 51000
456343223 Vibol, Soung       M 12oct2012 45000
;
RUN;

PROC PRINT DATA = Accounting;
TITLE 'Accounting department employees';
RUN;


Obs,EmployeeID,Name,Gender,HireDate,Salary,Department
1,652453421,"Gardinski, Barbara",F,29MAY2001,59000,Accounting
2,235312326,"Robertson, Hannah",F,14MAR2004,65000,Accounting
3,234523214,"Sopheak, Leng",M,03APR2011,62000,Accounting
4,326574341,"Chentha, Sok",F,09FEB2014,51000,Accounting
5,456343223,"Vibol, Soung",M,12OCT2012,45000,Accounting


In [103]:
* Convert the numeric EmployeeID to character so that it matches the other departments;
DATA New_Accounting (DROP = EmployeeID RENAME = (TempVar = EmployeeID));
   SET Accounting;
   TempVar = PUT(EmployeeID, 9.);
RUN;

PROC PRINT DATA = Accounting;
RUN;

PROC DATASETS LIBRARY = WORK;
    CONTENTS DATA = New_Accounting;
RUN;
QUIT;

Obs,EmployeeID,Name,Gender,HireDate,Salary,Department
1,652453421,"Gardinski, Barbara",F,29MAY2001,59000,Accounting
2,235312326,"Robertson, Hannah",F,14MAR2004,65000,Accounting
3,234523214,"Sopheak, Leng",M,03APR2011,62000,Accounting
4,326574341,"Chentha, Sok",F,09FEB2014,51000,Accounting
5,456343223,"Vibol, Soung",M,12OCT2012,45000,Accounting

Directory,Directory.1
Libref,WORK
Engine,V9
Physical Name,/tmp/SAS_work293500003A59_localhost.localdomain
Filename,/tmp/SAS_work293500003A59_localhost.localdomain
Inode Number,671572
Access Permission,rwx------
Owner Name,sasdemo
File Size,4KB
File Size (bytes),4096

#,Name,Member Type,File Size,Last Modified
1,ACCOUNTING,DATA,128KB,03/29/2020 16:04:57
2,CUSTOMER_SUPPORT,DATA,128KB,03/29/2020 16:04:50
3,DEP1_5,DATA,128KB,03/29/2020 15:59:14
4,DEPT1_2,DATA,128KB,03/29/2020 16:04:51
5,DEPT1_3,DATA,128KB,03/29/2020 16:04:55
6,DEPT1_4,DATA,128KB,03/29/2020 16:04:01
7,DEPT1_5,DATA,128KB,03/29/2020 16:01:00
8,DROP,DATA,128KB,03/29/2020 15:19:53
9,NEW_ACCOUNTING,DATA,128KB,03/29/2020 16:04:58
10,REGSTRY,ITEMSTOR,32KB,03/29/2020 14:40:04

0,1,2,3
Data Set Name,WORK.NEW_ACCOUNTING,Observations,5
Member Type,DATA,Variables,6
Engine,V9,Indexes,0
Created,03/29/2020 16:04:59,Observation Length,56
Last Modified,03/29/2020 16:04:59,Deleted Observations,0
Protection,,Compressed,NO
Data Set Type,,Sorted,NO
Label,,,
Data Representation,"SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64",,
Encoding,utf-8 Unicode (UTF-8),,

Engine/Host Dependent Information,Engine/Host Dependent Information.1
Data Set Page Size,65536
Number of Data Set Pages,1
First Data Page,1
Max Obs per Page,1166
Obs in First Data Page,5
Number of Data Set Repairs,0
Filename,/tmp/SAS_work293500003A59_localhost.localdomain/new_accounting.sas7bdat
Release Created,9.0401M6
Host Created,Linux
Inode Number,671619

Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes,Alphabetic List of Variables and Attributes
#,Variable,Type,Len,Format
5,Department,Char,10,
6,EmployeeID,Char,9,
2,Gender,Char,1,
3,HireDate,Num,8,DATE9.
1,Name,Char,19,
4,Salary,Num,8,
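The conversion is needed because a SET statement cannot concatenate data sets in which the same variable is numeric in one and character in another; SAS stops with an error. The `PUT` call with the `9.` format works here because every ID has nine digits. As a hedged variation, the sketch below uses the `Z9.` format, which pads with leading zeros, so an ID with fewer than nine digits would still produce a nine-character value. The data set name `New_Accounting_z` is hypothetical.

```sas
DATA New_Accounting_z (DROP = EmployeeID RENAME = (TempVar = EmployeeID));
   SET Accounting;
   * Z9. writes the number in nine positions and fills unused positions with zeros;
   TempVar = PUT(EmployeeID, Z9.);
RUN;

* Confirm that EmployeeID is now a character variable of length 9;
PROC CONTENTS DATA = New_Accounting_z;
RUN;
```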


In [104]:
DATA Dept1_4;
  SET Sales Customer_support Security New_Accounting;
RUN;

PROC PRINT DATA = Dept1_4;
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department,Gender
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales,
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales,
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales,
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales,
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales,
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support,
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support,
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support,
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support,
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support,


### Using SET statement when variables have different formats

In [115]:
DATA Shipping;
INPUT EmployeeID $ 1-9 Name $ 11-29 Gender $ 30 @32 HireDate DATE11. @42 Salary ;
FORMAT HireDate DATE11.;
FORMAT Salary COMMA6.;
Department = 'Shipping           ';
DATALINES ;
456323452 Carlton, Susan     F 28Jan2012 41000
234342125 Hoffman, Gerald    M 23oct2012 40500
234586429 DePuis, David      M 23aug2011 45000
234621390 Landau, Jennifer   F 30apr2012 43500
324612563 Mekongsok, Sao     M 15oct2013 45000
;
RUN;

PROC PRINT DATA = Shipping;
TITLE 'Shipping department employees';
RUN;


Obs,EmployeeID,Name,Gender,HireDate,Salary,Department
1,456323452,"Carlton, Susan",F,28-JAN-2012,41000,Shipping
2,234342125,"Hoffman, Gerald",M,23-OCT-2012,40500,Shipping
3,234586429,"DePuis, David",M,23-AUG-2011,45000,Shipping
4,234621390,"Landau, Jennifer",F,30-APR-2012,43500,Shipping
5,324612563,"Mekongsok, Sao",M,15-OCT-2013,45000,Shipping


In [116]:
DATA Dept1_5; 
   SET Sales Customer_support Security New_Accounting Shipping;
RUN;

PROC PRINT DATA = Dept1_5;
TITLE1 'Employees in Sales, Customer support, Security, ';
TITLE2 'Accounting, and Shipping departments';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department,Gender
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales,
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales,
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales,
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales,
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales,
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support,
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support,
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support,
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support,
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support,


In [117]:
DATA Dept1_5;
   SET Shipping Sales Customer_support Security New_Accounting;
RUN;

PROC PRINT DATA = Dept1_5;
TITLE1 'Employees in Sales, Customer support, Security, ';
TITLE2 'Accounting, and Shipping departments';
RUN;

Obs,EmployeeID,Name,Gender,HireDate,Salary,Department,HomePhone
1,456323452,"Carlton, Susan",F,28-JAN-2012,41000,Shipping,
2,234342125,"Hoffman, Gerald",M,23-OCT-2012,40500,Shipping,
3,234586429,"DePuis, David",M,23-AUG-2011,45000,Shipping,
4,234621390,"Landau, Jennifer",F,30-APR-2012,43500,Shipping,
5,324612563,"Mekongsok, Sao",M,15-OCT-2013,45000,Shipping,
6,429685482,"Martin, Virginia",,09-AUG-2002,45000,Sales,493-0824
7,244967839,"Singleton, MaryAnn",,24-APR-2004,34000,Sales,929-2623
8,996740216,"Leighton, Maurice",,16-DEC-2001,57000,Sales,933-6908
9,675443925,"Freuler, Carl",,15-FEB-2010,54500,Sales,493-3993
10,845729308,"Cage, Merce",,19-OCT-2009,64000,Sales,286-0519
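The two runs above differ only in the order of the data sets in the SET statement. A variable takes its format from the first data set in the list that defines one, which is why `HireDate` prints as `DATE9.` when `Sales` comes first and as `DATE11.` when `Shipping` comes first. To avoid depending on the order, a FORMAT statement in the DATA step can set the format explicitly. The sketch below assumes the data sets created above; the name `Dept1_5_fmt` is hypothetical.

```sas
DATA Dept1_5_fmt;
   SET Shipping Sales Customer_support Security New_Accounting;
   * An explicit FORMAT statement overrides the format inherited from the input data sets;
   FORMAT HireDate DATE9.;
RUN;

PROC PRINT DATA = Dept1_5_fmt;
   TITLE 'HireDate with an explicitly chosen format';
RUN;
```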


In [112]:
DATA Research;
INPUT EmployeeID $ 1-9 Name $ 11-37 Gender $ 38 @40 HireDate DATE9. Salary;
FORMAT HireDate DATE9. ;
Department = 'Research';
DATALINES ;
823443453 Schoenberg, Margerite      F 19nov2004 60500
324632423 Addison-Hardy, Jonathan    M 23feb2011 63500
213462346 McNaughton, Elizabeth      F 24jul2001 65000
234652321 Tharrington, Catherine     F 28sep2004 60000
324568634 Prangipani, Christopher    M 12aug2008 63000
;
RUN;

PROC PRINT DATA = Research;
TITLE 'Research department employees';
RUN;


Obs,EmployeeID,Name,Gender,HireDate,Salary,Department
1,823443453,"Schoenberg, Margerite",F,19NOV2004,60500,Research
2,324632423,"Addison-Hardy, Jonathan",M,23FEB2011,63500,Research
3,213462346,"McNaughton, Elizabeth",F,24JUL2001,65000,Research
4,234652321,"Tharrington, Catherine",F,28SEP2004,60000,Research
5,324568634,"Prangipani, Christopher",M,12AUG2008,63000,Research


In [114]:
DATA Dept1_6;
   SET Sales Customer_support Security Shipping Research;
RUN;

PROC PRINT DATA = Dept1_6;
TITLE 'Employees in all departments';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department,Gender
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales,
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales,
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales,
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales,
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales,
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support,
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support,
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support,
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support,
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support,
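`Research` reads `Name` from columns 11-37, so its `Name` variable is longer than the one in `Sales` (columns 11-29). In a concatenation, the length of a variable comes from the first data set in the SET statement that contains it, so the longer Research names can be truncated. One way to guard against this is to declare the length before the SET statement. This is a sketch under the assumption that 27 characters is enough for every name; the data set name `Dept1_6_len` is hypothetical.

```sas
DATA Dept1_6_len;
   * Declaring the length before SET fixes it at 27 for the output data set;
   LENGTH Name $ 27;
   SET Sales Customer_support Security Shipping Research;
RUN;

PROC PRINT DATA = Dept1_6_len;
   WHERE Department = 'Research';
   TITLE 'Research rows with full-length names';
RUN;
```

Because `Name` is now defined first, it also becomes the first variable in the output data set.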


## Using the APPEND Procedure

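The APPEND procedure adds the observations from one SAS data set (named in `DATA=`) to the end of another SAS data set (named in `BASE=`). Unlike a DATA step with a SET statement, PROC APPEND does not read or process the observations in the base data set. It writes the observations from the `DATA=` data set directly to the end of the base data set, so the base data set itself is changed.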
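Because PROC APPEND changes the base data set in place, it can be convenient to append to a copy when the original should stay as it is. The following is a minimal sketch, assuming the `Sales` and `Customer_support` data sets from the earlier cells; the name `Sales_copy` is hypothetical.

```sas
* Make a working copy so that Sales itself is not modified;
DATA Sales_copy;
   SET Sales;
RUN;

* Add the Customer_support observations to the end of the copy;
PROC APPEND BASE = Sales_copy DATA = Customer_support;
RUN;
```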

In [120]:
PROC APPEND BASE = Sales DATA=Customer_support;
RUN;

PROC PRINT DATA = Sales;
TITLE 'Employees in the sales and customer support departments';
RUN;


Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support


In [123]:
* We must use the FORCE option when the DATA= data set contains a variable that is not in the BASE= data set;
PROC APPEND BASE = Sales DATA = Security FORCE;  
RUN;

PROC PRINT DATA = Sales;
TITLE 'Employees in the sales, customer support, and security departments';
RUN;

Obs,EmployeeID,Name,HireDate,Salary,HomePhone,Department
1,429685482,"Martin, Virginia",09AUG2002,45000,493-0824,Sales
2,244967839,"Singleton, MaryAnn",24APR2004,34000,929-2623,Sales
3,996740216,"Leighton, Maurice",16DEC2001,57000,933-6908,Sales
4,675443925,"Freuler, Carl",15FEB2010,54500,493-3993,Sales
5,845729308,"Cage, Merce",19OCT2009,64000,286-0519,Sales
6,324897451,"Sayre, Jay",15NOV2005,66000,933-2998,Customer support
7,598723234,"Tolson, Andrew",18MAR2000,54000,929-9984,Customer support
8,432842452,"Jensen, Helga",01FEB2004,70300,289-2135,Customer support
9,893421341,"Kulenic, Marie",24JUN2004,54800,872-1342,Customer support
10,988431421,"Zweerink, Anna",07JUL2011,59000,929-3726,Customer support
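With FORCE, variables that exist only in the `DATA=` data set are dropped, and values longer than the corresponding base variable are truncated. Here that means `Gender` from `Security` is not added to the base data set, in contrast to the SET statement, which keeps it. The sketch below illustrates this on a copy; the name `Sales_forced` is hypothetical.

```sas
* Work on a copy so that the original Sales data set is left alone;
DATA Sales_forced;
   SET Sales;
RUN;

* FORCE lets the append proceed even though Security has the extra variable Gender;
PROC APPEND BASE = Sales_forced DATA = Security FORCE;
RUN;

* The variable list should not contain Gender;
PROC CONTENTS DATA = Sales_forced;
RUN;
```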

