### Analyzing MEPS-HC Data with SAS® 9.4 M6 
#### By Pradip K. Muhuri

## Exercise 1

#### Objective
* Generate the following estimates:
     * mean health care expenses per person
     * mean health care expenses per person with an expense (overall and by age group)

#### Data and Analysis
* Use 2017 MEPS Full-Year Consolidated File
* Run PROC FREQ for data checks
* Run PROC SURVEYMEANS for complex survey estimates

### MEPS Full-Year Consolidated File, 2017

This is a person-level data file that includes annual variables such as:
* total annual health care expenditures by type of care, payment source, and type of provider seen
* annual and monthly health insurance type indicators
* health conditions, healthcare access and utilization
* quality of care, patient satisfaction, and demographics

[Read more about the 2017 Full-Year Consolidated File (HC-201).](https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-201)

This file contains 31,880 persons, each belonging to one of the two MEPS panels for which data were collected in 2017:

* 2017 portion of Round 3, Rounds 4 and 5 for Panel 21
* Rounds 1, 2 and the 2017 portion of Round 3 for Panel 22


## Why Use PROC SURVEYMEANS

The MEPS-HC sample design includes stratification, clustering, and oversampling of selected population groups. [Because of these complexities in the MEPS-HC sample design](https://meps.ahrq.gov/data_files/publications/mr33/mr33.shtml), we must specify the survey weight and the design variables (strata and clusters) in the PROC SURVEYMEANS step to obtain estimates, with correct standard errors, for the U.S. civilian noninstitutionalized population.

[See SAS/STAT® 15.1 User’s Guide Introduction to Survey Sampling and Analysis Procedures](https://support.sas.com/documentation/onlinedoc/stat/151/introsamp.pdf)
```sas
PROC SURVEYMEANS DATA=WORK.PUF201;
   VAR TOTEXP17;
   STRATUM VARSTR;
   CLUSTER VARPSU;
   WEIGHT PERWT17F;
RUN;
```

In the example code above:
* The VAR statement identifies the variable to be analyzed (TOTEXP17).
* The STRATUM statement names the variable that forms the strata (VARSTR).
* The CLUSTER statement names the cluster (PSU) identification variable (VARPSU).
* The WEIGHT statement names the sampling weight variable (PERWT17F).
Notes: If you do not specify any statistic-keywords in the PROC SURVEYMEANS statement, the procedure computes NOBS, MEAN, STDERR, and CLM by default. If you do specify statistic-keywords, the procedure computes the statistics you request; when SUM (the estimated population total, when the appropriate sampling weights are used) is among them, it also computes STD, the standard deviation of the estimated total, by default.
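For reference, with $w_i$ the person weight (PERWT17F) and $y_i$ the analysis variable (for example, TOTEXP17), summed over all persons in the file, the point estimates behind the SUMWGT, SUM, and MEAN keywords are shown below. The STRATUM and CLUSTER statements do not change these point estimates; they affect only the standard errors.

$$
\text{SUMWGT} = \sum_i w_i, \qquad
\text{SUM} = \hat{Y} = \sum_i w_i \, y_i, \qquad
\text{MEAN} = \hat{\bar{y}} = \frac{\sum_i w_i \, y_i}{\sum_i w_i}
$$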


The following SAS program, split into four code cells, generates the estimates listed in the objective for national health care expenses of the U.S. civilian noninstitutionalized population.

### Code snippets for DATA Step
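* Keep only the variables needed for the analysis
* Create a new variable, TOTEXP17_X, a copy of TOTEXP17 used to define the subpopulation of persons with an expense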

```sas
options nocenter nodate nonumber;
ods html close;
proc datasets lib=work nolist kill; quit; /* delete all files in the WORK library */
libname CDATA "C:\DATA"; /* folder that contains the SAS data set H201; adjust as needed */
/* READ IN DATA FROM 2017 CONSOLIDATED DATA FILE (HC-201) */
DATA WORK.PUF201;
  SET CDATA.H201 (KEEP = TOTEXP17 AGELAST VARSTR VARPSU PERWT17F);
       /* Copy TOTEXP17 to TOTEXP17_X; a format applied later groups it */
       /* into 'Some Expense' and 'None' for the DOMAIN statement       */
       TOTEXP17_x = TOTEXP17;
run;
```
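Before formatting TOTEXP17_X, it can help to confirm the range of the analysis variables, because the expense format defined in the next cell assigns any value outside its positive range (zero, negative, or missing) to 'None'. A minimal data-check sketch, assuming WORK.PUF201 was created by the DATA step above:

```sas
/* Data check (sketch): unweighted counts of nonmissing and missing values */
/* and the minimum and maximum of each analysis variable. Values of        */
/* TOTEXP17 that are not positive fall into the OTHER range of the         */
/* TOTEXP17_X format defined in the next cell.                             */
TITLE 'DATA CHECK: RANGE OF ANALYSIS VARIABLES (UNWEIGHTED)';
PROC MEANS DATA=WORK.PUF201 N NMISS MIN MAX MAXDEC=0;
   VAR TOTEXP17 AGELAST PERWT17F;
RUN;
```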

### Code snippets for PROC FORMAT

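```sas
options nocenter nodate nonumber nosource;
ods html close;
PROC FORMAT;
   /* Age groups for AGELAST */
   VALUE AGECAT
         0-64    = '0-64'
         65-high = '65+';

   /* Expense status: 0<-high excludes zero, so persons with */
   /* no expense are labeled 'None', not 'Some Expense'      */
   VALUE totexp17_x
         0<-high = 'Some Expense'
         other   = 'None';
RUN;
```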
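The objective lists PROC FREQ for data checks. A minimal sketch of such a check, assuming WORK.PUF201 and the AGECAT and TOTEXP17_X formats from the cells above: applying the formats inside PROC FREQ shows how many sample persons fall into each expense and age category, so a mis-specified range (for example, zero expenses landing in 'Some Expense') is visible before any estimation. These are unweighted sample counts, not population estimates.

```sas
/* Data check (sketch): unweighted sample counts by formatted category.  */
/* The FORMAT statement applies only within this step; stored values of */
/* TOTEXP17_X and AGELAST are not changed.                               */
TITLE 'DATA CHECK: UNWEIGHTED COUNTS BY EXPENSE STATUS AND AGE GROUP';
PROC FREQ DATA=WORK.PUF201;
   TABLES TOTEXP17_X TOTEXP17_X*AGELAST / LIST MISSING;
   FORMAT TOTEXP17_X TOTEXP17_X. AGELAST AGECAT.;
RUN;
```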

### Code snippets for PROC SURVEYMEANS

```sas
options nocenter nodate nonumber;
ods html close;
ods graphics off; /* suppress graphics */
TITLE 'OVERALL EXPENSES';
PROC SURVEYMEANS DATA=WORK.PUF201 NOBS SUMWGT MEAN STDERR SUM;
    VAR TOTEXP17;
    STRATUM VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT17F;
RUN;
```
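For comparison, a weighted PROC MEANS step reproduces the same point estimate of mean expense, because both procedures compute the weighted mean, but its standard error does not account for the stratification and clustering identified by VARSTR and VARPSU. The sketch below assumes WORK.PUF201 from the DATA step above and is meant only to contrast with the PROC SURVEYMEANS output; only the PROC SURVEYMEANS standard error reflects the MEPS-HC design.

```sas
/* For comparison only (sketch): PROC MEANS uses the weight but ignores */
/* the strata and clusters, so its MEAN matches PROC SURVEYMEANS while  */
/* its STDERR is not a valid design-based standard error for MEPS-HC.   */
TITLE 'FOR COMPARISON ONLY: PROC MEANS WITH WEIGHT (DESIGN IGNORED)';
PROC MEANS DATA=WORK.PUF201 MEAN STDERR;
   VAR TOTEXP17;
   WEIGHT PERWT17F;
RUN;
```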

#### Code explanation for the DOMAIN statement
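In the code cell below, the DOMAIN statement requests estimates for subpopulations (domains). TOTEXP17_X('Some Expense') limits the results to persons whose TOTEXP17_X value is formatted as 'Some Expense' (persons with an expense), and TOTEXP17_X('Some Expense')*AGELAST reports the same subpopulation separately for each age group defined by the AGECAT format applied to AGELAST.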
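In formula terms, a domain estimate keeps every record in the computation and gives zero weight in the ratio to persons outside the subpopulation. Writing $w_i$ for PERWT17F, $y_i$ for TOTEXP17, and $\delta_i = 1$ when person $i$ is in the domain (has an expense, and for the age-specific domains also falls in the given AGECAT group) and $\delta_i = 0$ otherwise, the domain mean is:

$$
\hat{\bar{y}}_D = \frac{\sum_i w_i \, \delta_i \, y_i}{\sum_i w_i \, \delta_i}
$$

Because persons outside the domain remain in the data, their strata and PSUs still enter the variance computation.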

```sas
options nocenter nodate nonumber ls=132;
TITLE 'MEAN EXPENSE PER PERSON WITH AN EXPENSE, FOR OVERALL, AGE 0-64, AND AGE 65+';
ods graphics off; /* suppress graphics */
ODS EXCLUDE STATISTICS; /* suppress the full-sample Statistics table; show domain results only */
PROC SURVEYMEANS DATA=WORK.PUF201 NOBS SUMWGT MEAN STDERR SUM;
    VAR TOTEXP17;
    STRATUM VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT17F;
    DOMAIN TOTEXP17_X('Some Expense') TOTEXP17_X('Some Expense')*AGELAST;
    FORMAT TOTEXP17_X TOTEXP17_X. AGELAST AGECAT.;
RUN;
```
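A common alternative, subsetting the data with a WHERE statement, is sketched below only for contrast. SAS documentation recommends the DOMAIN statement for subpopulation analysis, because a WHERE subset can remove entire strata or PSUs from the variance computation. In a file as large as this one every PSU may still contain persons with an expense, in which case the two sets of results may agree closely or even exactly; the DOMAIN approach is the one that remains valid in general. The sketch assumes WORK.PUF201 from the DATA step above.

```sas
/* For contrast only (not recommended): WHERE drops persons with zero     */
/* expense before estimation. If that removes every record in a stratum   */
/* or PSU, the variance computation no longer reflects the full sample    */
/* design. Compare the output with the DOMAIN-based results above.        */
TITLE 'FOR CONTRAST ONLY: SUBSET WITH WHERE INSTEAD OF DOMAIN';
PROC SURVEYMEANS DATA=WORK.PUF201 MEAN STDERR;
    WHERE TOTEXP17 > 0;
    VAR TOTEXP17;
    STRATUM VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT17F;
RUN;
```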

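### Extending the analysis: expenses by type of service and three age groups

The next three cells rerun the analysis with five additional expense variables (office-based visits OBVEXP17, outpatient visits OPTEXP17, emergency room visits ERTEXP17, inpatient hospital stays IPTEXP17, and prescribed medicines RXEXP17) and with three age groups (0-34, 35-64, and 65+). The domain is still defined by total expense, so each mean is per person with any health care expense, not per person with an expense for that type of service.

```sas
options nocenter nodate nonumber;
ods html close;
proc datasets lib=work nolist kill; quit; /* delete all files in the WORK library */
libname CDATA "C:\DATA"; /* folder that contains the SAS data set H201; adjust as needed */
/* READ IN DATA FROM 2017 CONSOLIDATED DATA FILE (HC-201) */
DATA WORK.PUF201;
  SET CDATA.H201 (KEEP = TOTEXP17 OBVEXP17 OPTEXP17 ERTEXP17 IPTEXP17 RXEXP17
                         AGELAST VARSTR VARPSU PERWT17F);
       /* Copy TOTEXP17 to TOTEXP17_X for use as the domain variable */
       TOTEXP17_x = TOTEXP17;
run;
```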

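```sas
options nocenter nodate nonumber nosource;
ods html close;
PROC FORMAT;
   /* Three age groups for AGELAST */
   VALUE AGECAT
         0-34    = '0-34'
         35-64   = '35-64'
         65-high = '65+';

   /* Expense status: 0<-high excludes zero, so persons with */
   /* no expense are labeled 'None', not 'Some Expense'      */
   VALUE totexp17_x
         0<-high = 'Some Expense'
         other   = 'None';
RUN;
```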

```sas
options nocenter nodate nonumber ls=132;
libname CDATA "C:\DATA";
TITLE 'MEAN EXPENSE PER PERSON WITH AN EXPENSE, BY TYPE OF SERVICE: OVERALL, AGE 0-34, 35-64, AND 65+';
ods graphics off; /* suppress graphics */
ODS SELECT DOMAIN; /* display only the domain results */
PROC SURVEYMEANS DATA=WORK.PUF201;
    VAR TOTEXP17 OBVEXP17 OPTEXP17 ERTEXP17 IPTEXP17 RXEXP17;
    STRATUM VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT17F;
    DOMAIN TOTEXP17_X('Some Expense') TOTEXP17_X('Some Expense')*AGELAST;
    FORMAT TOTEXP17_X TOTEXP17_X. AGELAST AGECAT.;
RUN;
```
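To keep the domain estimates for further processing (for example, exporting a table), the ODS table named Domain, the same table displayed by the ODS SELECT statement above, can be written to a data set with an ODS OUTPUT statement. A minimal sketch, assuming WORK.PUF201 and the formats from the preceding cells; WORK.DOMAIN_EST is a hypothetical data set name, and only the age-specific domain request is kept so that the step produces a single Domain table.

```sas
/* Sketch: save the Domain table to a data set instead of only printing it. */
/* WORK.DOMAIN_EST is a hypothetical output data set name.                  */
ods graphics off;
ods output Domain=WORK.DOMAIN_EST;
PROC SURVEYMEANS DATA=WORK.PUF201 MEAN STDERR;
    VAR TOTEXP17 OBVEXP17 OPTEXP17 ERTEXP17 IPTEXP17 RXEXP17;
    STRATUM VARSTR;
    CLUSTER VARPSU;
    WEIGHT PERWT17F;
    DOMAIN TOTEXP17_X('Some Expense')*AGELAST;
    FORMAT TOTEXP17_X TOTEXP17_X. AGELAST AGECAT.;
RUN;

/* List the saved estimates */
PROC PRINT DATA=WORK.DOMAIN_EST NOOBS;
RUN;
```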