
Stata vs. Matlab Code #8

Open
pmgblz opened this issue Apr 2, 2019 · 1 comment
pmgblz commented Apr 2, 2019

I have a question regarding the Stata vs. the Matlab code. The two different programs seem to give different results for me. I am using your example code and example data and the same variables for both commands (the difference in means is the same). Discrepancies also seem to persist when using a larger number of bootstrap iterations in both programs (e.g. 10,000).

To make this reproducible, the code I ran for your first example is below.

Here is the code I ran in Stata (from mhtexp_examples.do):

```stata
clear all
insheet using data.csv, comma names
// Creating outcome variable
gen amountmat = amount * (1 + ratio)
gen groupid = (redcty==1 & red0==1) + (redcty==0 & red0==1)*2 + (redcty==0 & red0==0)*3 + (redcty==1 & red0==0)*4
replace groupid = . if groupid == 0

mata: mata mlib index

// help mhtexp

// Example 1: Hypothesis testing with multiple outcomes:
//   We consider four outcome variables: response rate, dollars given not including
//   match, dollars given including match, and amount change.
mhtexp gave amount amountmat amountchange, treatment(treatment)
// results are the same when explicitly using bootstrap(3000), as it is the default
mhtexp gave amount amountmat amountchange, treatment(treatment) bootstrap(3000)
```

This is the code I ran in Matlab (from mhtexp_examples.m):

```matlab
data = importdata('data.csv');
data = data.data;               % read the dataset
B = 3000;                       % the number of simulated samples

%% Hypothesis testing with multiple outcomes:
% We consider four outcome variables: response rate, dollars given not including
% match, dollars given including match, and amount change.
amountmat = data(:,1).*(data(:,10)+ones(size(data,1),1));  % dollars raised per letter including match
Y = [data(:,[12,1]) amountmat data(:,35)];     % the matrix of outcomes
D = data(:,8);                                 % the vector of treatment statuses
sub = ones(size(D,1),1);                       % the subgroup IDs
numoc = size(Y,2);                             % the number of outcomes
numsub = size(unique(sub),1);                  % the number of subgroups
numg = size(unique(D),1)-1;                    % the number of treatment groups (not including the control group)
combo = [zeros(numg,1) (1:numg)'];             % We compare each treatment to the control.
numpc = size(combo,1);                         % the number of pairs of treatment (control) groups of interest
select = ones(numoc,numsub,numpc);             % We are interested in all numoc*numsub*numpc hypotheses.
[example1] = mhtexp(B,Y,sub,D,combo,select)
```

I have also attached screenshots of the results I am getting. Could you please clarify?

Thank you!

Stata Output:
[screenshot: stata_results]

Matlab Output:
[screenshot: matlab_results]

seidelj (Owner) commented Apr 2, 2019

Hi, I wrote this code for Yang Xu, who is the corresponding author on the paper detailing this method. Others have raised similar questions. Here is quoted text from Yang on the issue:

It seems like even if you set the same seed, Stata and MATLAB have different initial states. So it would be a very involved process to generate the same simulated samples. The real discrepancy lies in Table 4. I think the 4th and the 6th hypotheses have almost the same p-value for single testing. So a little bit of discrepancy between the single p-values due to different “seeds” may lead to different orderings in multiple testing. But this issue will only show up when the single testing p-values are very large, so that it won’t qualitatively change our results.
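Yang's point can be sketched in Python (neither Stata nor MATLAB can share an RNG state with the other program): with the same seed value but different generator algorithms, two bootstrap runs on identical data yield slightly different single-test p-values, and near-ties can therefore order differently in a step-down multiple-testing procedure. Everything below is a hypothetical illustration — the data, seeds, and the simple pooled bootstrap are stand-ins, not the paper's data or the mhtexp implementation.

```python
import numpy as np

def boot_pvalue(x, y, B, rng):
    # Two-sided bootstrap p-value for a difference in means,
    # resampling from the pooled data under the null of no difference.
    obs = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(B):
        s = rng.choice(pooled, size=pooled.size, replace=True)
        if abs(s[:x.size].mean() - s[x.size:].mean()) >= obs:
            exceed += 1
    return exceed / B

# Fixed hypothetical dataset: 200 "treatment" and 200 "control" outcomes.
data_rng = np.random.default_rng(0)
x = data_rng.normal(0.2, 1.0, 200)
y = data_rng.normal(0.0, 1.0, 200)

# Same seed value (42), two different generator algorithms -> different
# draw streams, analogous to Stata and MATLAB not sharing an RNG state.
p_pcg = boot_pvalue(x, y, 3000, np.random.Generator(np.random.PCG64(42)))
p_mt  = boot_pvalue(x, y, 3000, np.random.Generator(np.random.MT19937(42)))
print(p_pcg, p_mt)   # close, but typically not identical
```

Both runs estimate the same true bootstrap p-value, so the gap between them is only Monte Carlo noise — yet when two hypotheses have single-test p-values within that noise band, their relative ranking (and hence the step-down adjusted p-values) can flip between programs, exactly as described above.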
