Dichotomous/polychotomous dependent variable #14

Closed
saharnazb opened this issue Apr 13, 2023 · 9 comments

@saharnazb

Can this method (either in stata or R) be applied when the dependent variable is a factor variable (dichotomous/polychotomous)?

@kylebutts
Owner

The method "works" whenever the first-stage model is correctly specified for the outcome variable. For example, it works if you have an indicator variable as the outcome and you think you've correctly specified the (linear) propensity score model. In general, though, that first-stage model is unlikely to hold.

@saharnazb
Author

Right. So the first stage is estimated using OLS and not GMM, correct? I had assumed the first stage was also estimated using GMM, in which case no distributional assumption would be imposed on the errors. But that is not the case. Thanks for your response.

@kylebutts
Owner

I'm not sure I understand the question. OLS is a form of a GMM estimator with moments given by:
[Image: screenshot of the OLS moment conditions (CleanShot 2023-04-13 at 10 37 51@2x)]

Additionally, OLS never requires a distributional assumption on the errors. It only needs the conditional mean of the error to be zero. A distributional assumption is an additional assumption used to prove efficiency of the OLS estimator.
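The attached screenshot is not preserved in this text. As a sketch in standard textbook notation (an assumption, not necessarily the notation used in the image), for the linear model $y_i = x_i'\beta + \varepsilon_i$ the population moment condition, its sample analog, and the resulting estimator can be written as:

```latex
\mathbb{E}\left[ x_i \left( y_i - x_i'\beta \right) \right] = 0
\quad\Longrightarrow\quad
\frac{1}{n}\sum_{i=1}^{n} x_i \left( y_i - x_i'\hat\beta \right) = 0
\quad\Longrightarrow\quad
\hat\beta = \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \sum_{i=1}^{n} x_i y_i .
```

The system is exactly identified (as many moments as parameters), so the GMM weighting matrix plays no role and the solution is the usual OLS formula. No distribution for $\varepsilon_i$ appears anywhere in these conditions.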

@saharnazb
Author

saharnazb commented Apr 13, 2023

Well, OLS imposes the assumption of a normal distribution on the error terms. It helps us make statistical inference. In OLS, if the errors are normally distributed with mean zero and constant variance, then the OLS estimator is consistent (and efficient).
Estimating a specification with a binary dependent variable can lead to predicted values less than 0 and more than 1. OLS assumes that the outcome variable is continuous and normally distributed. Binary variables are inherently dichotomous and take only two values. Also, with a binary outcome, the variance of the errors depends on the values of the independent variables, which violates the constant-variance assumption. So logistic regression is suggested. But GMM does not impose a normal distribution assumption.
I am sorry if my question was confusing. I am looking for a way to test pretrends and run an event study for my case, where the outcome is polychotomous, the data are repeated cross-sections, and treatment is staggered. That is why I am searching the DID literature for something consistent with my scenario. I am more of an applied economist and have not yet managed to master the DID literature. After looking at the csdid, jwdid, and did2s commands in Stata, I am trying to find out which would be best for me. csdid is not suitable for a binary outcome.
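To illustrate the claim about out-of-range predictions, here is a minimal pure-Python sketch that fits a linear probability model (OLS on a 0/1 outcome) with one regressor. The data and the helper name `ols_simple` are made up for illustration and are not part of any of the packages discussed in this thread.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: a binary outcome that tends to switch from 0 to 1 as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = ols_simple(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" that fall outside the unit interval.
out_of_range = [f for f in fitted if f < 0 or f > 1]

print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
print("min fitted:", round(min(fitted), 4), "max fitted:", round(max(fitted), 4))
print("number of fitted values outside [0, 1]:", len(out_of_range))
```

With these particular numbers the fitted values at the smallest and largest `x` fall below 0 and above 1. That is a statement about the fitted line as a probability model, not about whether the OLS coefficients themselves can be computed or interpreted as average partial effects.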

@friosavila

friosavila commented Apr 13, 2023 via email

@saharnazb
Author

Thank you F for your time.
Maybe I am making a mistake. I will go back to my textbooks to check how a violation of the normality assumption relates to inference, hypothesis testing, and the consistency and efficiency of the estimators. I am probably confused. Thank you for your explanations; I will check the details.
But a binary variable does not have a continuous distribution. We can only treat it as continuous for the LPM.
jwdid is not converging for some reason, and I have not yet found the cause of the error (possibly something in the way I set it up). While working on the error, I tried to check whether other options are available.

@friosavila

friosavila commented Apr 13, 2023 via email

@saharnazb
Author

Thank you. I am not sure whether this (an issue thread for a different package) is the right place to send details. Can I email you?
I believe the problem is the number of controls. I am controlling for 35-40 variables with 180,000 observations.
However, even without controls, I get a lot of dots in the results table instead of t-stats, variances, and CIs.

@friosavila

(via email)

Hi Saharnaz,

I think you have some concepts confused here.

1. OLS consistency does not depend on normality of the errors. It does depend on the zero conditional mean.
2. OLS does not assume variables are continuous. That is why we can use it for almost all kinds of models. Standard errors can always be corrected if one believes the errors are not homoskedastic (robust standard errors, for one).
3. MLE does impose distributional assumptions; otherwise you cannot use it. GMM, like OLS, does not impose distributional assumptions, just conditions on the moments.
4. All the commands you mention can actually handle binary dependent variables, but you need to acknowledge that they use the LPM (or something similar to it).
5. If you want something that handles binary variables explicitly, you can use jwdid:
   jwdid y x1 x2 x3, .... method(logit)
   Here, however, the parallel trends assumption is not on the observed probability but on the latent variable.

F

@kylebutts
Owner

Agreed on all fronts with @friosavila!
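Points 1 and 2 of the numbered reply can be checked numerically. The sketch below is a pure-Python illustration on made-up data; the function name `lpm_slope_with_ses` is hypothetical, and the robust variance uses the plain HC0 (White) form, which is an assumption since statistical packages often apply an extra finite-sample scaling. It fits OLS to a binary outcome, confirms that the sample moment conditions hold, and compares the classical and heteroskedasticity-robust standard errors of the slope.

```python
import math


def lpm_slope_with_ses(x, y):
    """OLS of y on a constant and x; returns moments and two slope SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sample analogs of E[e] = 0 and E[x * e] = 0 (the OLS moment conditions).
    moment_const = sum(resid) / n
    moment_x = sum(xi * e for xi, e in zip(x, resid)) / n

    # Classical SE: assumes a constant error variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # HC0 robust SE: lets the error variance differ across observations.
    se_robust = math.sqrt(sum(d * d * e * e for d, e in zip(dev, resid))) / sxx

    return {
        "slope": slope,
        "moment_const": moment_const,
        "moment_x": moment_x,
        "se_classical": se_classical,
        "se_robust": se_robust,
    }


# Hypothetical binary-outcome data, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

result = lpm_slope_with_ses(x, y)
for name, value in result.items():
    print(name, "=", value)
```

The two moments are zero up to floating-point error regardless of the outcome's distribution, which is the sense in which OLS only places conditions on moments. Only the standard error formula changes when homoskedasticity is dropped.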
