### MEPS Workshop, April 14-15, 2020
### Analyzing MEPS-HC Data with SAS® 9.4M6 
#### By Pradip K. Muhuri, PhD

## Exercise 4

### Objective

* Estimate the percentage distribution of second-year insurance status among individuals who, in the first year, were aged 26-30, had high income, and were uninsured for the whole year

### Data and Analysis
* Combine data from MEPS Longitudinal Files (Panels 19, 20, and 21)
* Run PROC FREQ and PROC MEANS for data checks
* Run PROC SURVEYMEANS for complex survey estimates


### MEPS Longitudinal File (Panel 21 as an Example) - Exercise 4

* This file is a two-year longitudinal file derived from the respondents to the MEPS Panel 21 sample. The persons in this data set represent those who were in the MEPS population (U.S. civilian noninstitutionalized) for all or part of the 2016-2017 period. 

* The file contains a longitudinal weight variable (LONGWT) and all variables from the 2016 and 2017 full-year consolidated data files (HC-192 and HC-201, respectively). 

[Read more about the variables in the Panel 21 Longitudinal File.](https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-202)
 
* The weight variable (LONGWT), when applied to the persons who participated in both 2016 and 2017, will enable the user to make national estimates of person-level changes in selected variables (e.g., health insurance, health status, utilization and expenditures).



### Concatenate Panels 19, 20, and 21 data sets into one SAS data set
* KEEP only needed variables when loading the original SAS data set
* Create new variables including the subpopulation variable

In [1]:
options nodate nonumber nosource;
LIBNAME CDATA 'C:\DATA';
DATA WORK.POOL;
       SET CDATA.H183 (KEEP=DUPERSID INSCOVY1 INSCOVY2 LONGWT VARSTR VARPSU POVCATY1 AGEY1X PANEL)
           CDATA.H193 (KEEP=DUPERSID INSCOVY1 INSCOVY2 LONGWT VARSTR VARPSU POVCATY1 AGEY1X PANEL)
           CDATA.H202 (KEEP=DUPERSID INSCOVY1 INSCOVY2 LONGWT VARSTR VARPSU POVCATY1 AGEY1X PANEL);
    
    POOLWT = LONGWT/3 ; /* Pooled survey weight */

     /*Create a dichotomous SUBPOP variable 
     (POPULATION WITH AGE=26-30, UNINSURED WHOLE FIRST YEAR, AND HIGH INCOME)
     */
   
     IF INSCOVY1=3 AND 26 LE AGEY1X LE 30 AND POVCATY1=5 THEN SUBPOP=1;
     ELSE SUBPOP=2;
  RUN;

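The analysis plan lists PROC MEANS as a data check, so here is a minimal sketch of one such check on the pooled file. It assumes `WORK.POOL` was created by the DATA step above. The requested statistics (`N`, `NMISS`, `SUM`), the `MAXDEC=0` option, and the title text are illustrative choices, not part of the original exercise.

```sas
/* Data check (sketch): record counts and weight totals by panel.   */
/* Assumes WORK.POOL from the DATA step above.                       */
TITLE "WEIGHT TOTALS BY PANEL (DATA CHECK)";
PROC MEANS DATA=WORK.POOL N NMISS SUM MAXDEC=0;
    CLASS PANEL;           /* one row per panel (19, 20, 21)        */
    VAR LONGWT POOLWT;     /* original and pooled weights           */
RUN;
TITLE;                     /* clear the title after the check       */
```

Because `POOLWT` is `LONGWT` divided by 3, each panel's `POOLWT` total should be one third of its `LONGWT` total, and the overall `POOLWT` total is the average of the three panels' weight totals.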



### Code snippet for PROC FORMAT

In [2]:
options nodate nonumber nosource;
PROC FORMAT;
 VALUE POVCAT 
    1 = '1 POOR/NEGATIVE'
    2 = '2 NEAR POOR'
    3 = '3 LOW INCOME'
    4 = '4 MIDDLE INCOME'
    5 = '5 HIGH INCOME' ;

 VALUE INSF
    -1= '-1 INAPPLICABLE'
     1 = '1 ANY PRIVATE'
     2 = '2 PUBLIC ONLY'
     3 = '3 UNINSURED';  

VALUE  SUBPOP (max= 50)
    1 = 'AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME'
    2 ='OTHERS';
run;
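The `POVCAT` format is defined above but not applied in the cells that follow. As a hedged illustration, the sketch below applies it in one-way frequency tables of the variables used to build `SUBPOP`. It assumes `WORK.POOL` and the formats above already exist in the session; the choice of variables to tabulate is an assumption made for illustration.

```sas
/* Data check (sketch): one-way tables of the SUBPOP inputs.        */
/* Assumes WORK.POOL and the POVCAT. and INSF. formats above.       */
PROC FREQ DATA=WORK.POOL;
    TABLES PANEL POVCATY1 INSCOVY1 / MISSING;
    FORMAT POVCATY1 POVCAT. INSCOVY1 INSF.;
RUN;
```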


### Use PROC FREQ for data checks

In [3]:
TITLE "COMBINED MEPS DATA FROM PANELS 19, 20, and 21";
options nolabel;
PROC FREQ DATA=POOL ;
tables subpop*INSCOVY2 /list missing;
FORMAT INSCOVY2 INSF. SUBPOP SUBPOP.;
RUN;


| SUBPOP | INSCOVY2 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| --- | --- | --- | --- | --- | --- |
| AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME | -1 INAPPLICABLE | 1 | 0.0 | 1 | 0.0 |
| AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME | 1 ANY PRIVATE | 21 | 0.04 | 22 | 0.05 |
| AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME | 2 PUBLIC ONLY | 3 | 0.01 | 25 | 0.05 |
| AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME | 3 UNINSURED | 43 | 0.09 | 68 | 0.14 |
| OTHERS | -1 INAPPLICABLE | 549 | 1.13 | 617 | 1.27 |
| OTHERS | 1 ANY PRIVATE | 26027 | 53.63 | 26644 | 54.9 |
| OTHERS | 2 PUBLIC ONLY | 17010 | 35.05 | 43654 | 89.95 |
| OTHERS | 3 UNINSURED | 4878 | 10.05 | 48532 | 100.0 |
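The percentages in the list output above are taken over all 48,532 records, so the subpopulation rows look tiny. A quick way to see the unweighted distribution within the 68 subpopulation records is to subset with a `WHERE` statement, as in the sketch below. It assumes `WORK.POOL` and the `INSF.` format from the earlier cells.

```sas
/* Data check (sketch): unweighted distribution of INSCOVY2         */
/* within the subpopulation only (SUBPOP=1, 68 records).            */
PROC FREQ DATA=WORK.POOL;
    WHERE SUBPOP = 1;
    TABLES INSCOVY2 / MISSING;
    FORMAT INSCOVY2 INSF.;
RUN;
```

From the frequencies shown above, the unweighted shares are 1/68 = 1.47%, 21/68 = 30.88%, 3/68 = 4.41%, and 43/68 = 63.24%. These are sample shares, not national estimates. Subsetting with `WHERE` is reasonable for an unweighted check like this one, but survey estimates are generally produced from the full file with a DOMAIN statement so that the design-based standard errors reflect the whole sample design.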


#### Code explanation for the next cell
* With no statistic keywords specified in the PROC SURVEYMEANS statement, the procedure computes the NOBS, MEAN, STDERR, and CLM statistics by default.

* ODS GRAPHICS OFF;  - suppresses the graphics 
* ODS EXCLUDE STATISTICS; - tells SAS not to generate output for the overall population 

* PROC SURVEYMEANS estimates the distribution (as proportions) of second-year insurance status (INSCOVY2) for individuals who, in the first year, were aged 26-30, had high income, and were uninsured for the whole year

* For PROC SURVEYMEANS to generate the percentage distribution of INSCOVY2, we include the INSCOVY2 variable in both the VAR and CLASS statements.
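To make the VAR-plus-CLASS point concrete: when a VAR variable also appears in CLASS, each of its levels is treated as a 0/1 indicator, and the "mean" reported for a level is a weighted proportion. The notation below is introduced only for illustration: $D$ is the set of persons with SUBPOP=1, $w_i$ is POOLWT, $y_i$ is INSCOVY2, and $k$ is one insurance category.

```latex
\hat{p}_k \;=\; \frac{\sum_{i \in D} w_i \, I(y_i = k)}{\sum_{i \in D} w_i}
```

Because POOLWT is LONGWT divided by the constant 3, that constant cancels in the ratio. The division matters for estimated totals (such as the sum of weights), not for the point estimates of these proportions.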



In [6]:
TITLE 'INSURANCE STATUS IN THE SECOND YEAR FOR THOSE W/ AGE=26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME IN THE FIRST YEAR';
TITLE2 'AVERAGE ESTIMATES OVER 3 PANELS (19, 20, AND 21)';
ODS GRAPHICS OFF;
ODS EXCLUDE STATISTICS;
PROC SURVEYMEANS DATA=POOL; 
    VAR  INSCOVY2;
    STRATUM VARSTR ;
    CLUSTER VARPSU ;
    WEIGHT  POOLWT;
    CLASS INSCOVY2;
    DOMAIN  SUBPOP("AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME");
    FORMAT INSCOVY2 INSF. SUBPOP SUBPOP.;
RUN;

| Data Summary | |
| --- | --- |
| Number of Strata | 165 |
| Number of Clusters | 371 |
| Number of Observations | 48532 |
| Sum of Weights | 326752883 |

Class Level Information

| Variable | Levels | Values |
| --- | --- | --- |
| INSCOVY2 | 4 | -1 INAPPLICABLE 1 ANY PRIVATE 2 PUBLIC ONLY 3 UNINSURED |

Statistics for SUBPOP Domains

| SUBPOP | Variable | Level | N | Mean | Std Error of Mean | 95% CL for Mean (lower) | 95% CL for Mean (upper) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AGE 26-30, UNINSURED WHOLE YEAR, AND HIGH INCOME | INSCOVY2 | -1 INAPPLICABLE | 1 | 0.00585 | 0.00588 | 0.0 | 0.01744199 |
| | | 1 ANY PRIVATE | 21 | 0.328143 | 0.072554 | 0.18510075 | 0.47118619 |
| | | 2 PUBLIC ONLY | 3 | 0.037996 | 0.026175 | 0.0 | 0.08960166 |
| | | 3 UNINSURED | 43 | 0.628011 | 0.075076 | 0.47999594 | 0.77602608 |
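As a possible cross-check on the domain estimates above, the same distribution can be requested from PROC SURVEYFREQ by crossing `SUBPOP` with `INSCOVY2` and asking for row percentages. This is a sketch, not part of the original exercise. It assumes `WORK.POOL` and the `INSF.` and `SUBPOP.` formats from the earlier cells.

```sas
/* Cross-check (sketch): row percentages of INSCOVY2 within SUBPOP. */
/* Uses the same design variables and pooled weight as above.       */
ODS GRAPHICS OFF;
PROC SURVEYFREQ DATA=WORK.POOL;
    STRATA  VARSTR;
    CLUSTER VARPSU;
    WEIGHT  POOLWT;
    TABLES  SUBPOP*INSCOVY2 / ROW;   /* ROW = percent within SUBPOP */
    FORMAT  INSCOVY2 INSF. SUBPOP SUBPOP.;
RUN;
```

The row percentages for the first `SUBPOP` level are on a 0-100 scale, so they should correspond to the PROC SURVEYMEANS proportions multiplied by 100. Unlike the DOMAIN statement above, this table also reports the "OTHERS" group.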
