Dichotomous/polychotomous dependent variable #14

saharnazb · 2023-04-13T15:26:28Z

Can this method (either in stata or R) be applied when the dependent variable is a factor variable (dichotomous/polychotomous)?

kylebutts · 2023-04-13T15:30:39Z

The method "works" whenever the first-stage model is correctly specified for the outcome variable. For example, it works with an indicator variable as the outcome if you think you've correctly specified the (linear) propensity score model. In general, though, the first-stage model is unlikely to hold.

saharnazb · 2023-04-13T15:44:10Z

Right. So the first stage is estimated using OLS and not GMM, correct? I assumed the first stage was also estimated using GMM. Then no distributional assumption would be imposed on the errors. But that is not the case. Thanks for your response.

kylebutts · 2023-04-13T16:39:57Z

I'm not sure I understand the question. OLS is a GMM estimator with moment conditions given by:

$$\mathbb{E}\left[x_i \, (y_i - x_i'\beta)\right] = 0$$

Additionally, OLS never requires a distributional assumption on the errors. It only needs the conditional mean of the error to be zero. The distributional assumption is an additional assumption used to prove efficiency of the OLS estimator.
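The claim that OLS only imposes moment conditions can be checked numerically. The following is a minimal sketch, assuming simulated data with a binary (clearly non-normal) outcome; the sample size, seed, and data-generating process are hypothetical and not taken from this thread. It fits a one-regressor OLS in closed form and confirms that the residuals satisfy the sample analogues of E[e] = 0 and E[x e] = 0.

```python
import random

random.seed(0)

# Hypothetical simulated data: binary outcome y, one continuous regressor x.
n = 1000
x = [random.uniform(0.0, 1.0) for _ in range(n)]
# The conditional mean is linear in x: P(y = 1 | x) = 0.2 + 0.5 * x.
y = [1.0 if random.random() < 0.2 + 0.5 * xi else 0.0 for xi in x]


def ols_simple(x, y):
    """Closed-form OLS for y = a + b * x + e."""
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


a_hat, b_hat = ols_simple(x, y)
resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

# Sample analogues of the moment conditions E[e] = 0 and E[x * e] = 0.
moment_const = sum(resid) / n
moment_x = sum(xi * ei for xi, ei in zip(x, resid)) / n

print("intercept:", a_hat, "slope:", b_hat)
print("sample moments:", moment_const, moment_x)
```

Both sample moments are zero up to floating-point error, even though the errors of a binary outcome cannot be normal. Nothing in the fit used a distributional assumption.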

saharnazb · 2023-04-13T18:43:04Z

Well, OLS imposes the assumption of a normal distribution on the error terms, which helps us make statistical inference. In OLS, if the errors are normally distributed with mean zero and constant variance, then the OLS estimator is consistent (and efficient).
Estimating a specification with a binary dependent variable can lead to predicted values less than 0 and more than 1. OLS assumes that the outcome variable is continuous and normally distributed. Binary variables are inherently dichotomous and take only two values. Also, with a binary outcome, the variance of the errors depends on the values of the independent variables, which violates the constant-variance assumption. So logistic regression is suggested. But GMM does not impose a normal distribution assumption.
I am sorry if my question was confusing. I am looking for a way to test pre-trends and run an event study in my case, where the outcome is polychotomous, the data are repeated cross-sections, and treatment is staggered. That is why I am searching the DID literature for a method consistent with my scenario. I am more of an applied economist and have not yet mastered the DID literature. After looking at the csdid, jwdid, and did2s commands in Stata, I am trying to find out which would be best for me. csdid is not suitable for a binary outcome.
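On the constant-variance point specifically: for a binary outcome, Var(y | x) = p(x)(1 - p(x)), so the error variance does change with x. This affects the standard errors rather than the coefficient estimates. The sketch below assumes simulated data (the sample size, seed, and probabilities are hypothetical) and compares the classical standard error of an OLS slope with the heteroskedasticity-robust HC0 standard error, both computed by hand for a one-regressor model.

```python
import math
import random

random.seed(1)

# Hypothetical data: the success probability is linear in x, so
# Var(y | x) = p(x) * (1 - p(x)) varies with x (heteroskedasticity).
n = 2000
x = [random.uniform(0.0, 1.0) for _ in range(n)]
y = [1.0 if random.random() < 0.05 + 0.9 * xi else 0.0 for xi in x]

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Classical standard error: assumes one common error variance.
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_classical = math.sqrt(s2 / sxx)

# HC0 robust standard error: weights each squared residual by its own
# squared deviation of x from the mean (the sandwich formula for the slope).
meat = sum(((xi - xbar) ** 2) * (e ** 2) for xi, e in zip(x, resid))
se_robust = math.sqrt(meat / sxx ** 2)

print("slope:", slope)
print("classical SE:", se_classical, "robust SE:", se_robust)
```

The slope estimate is the same in both cases; only the estimated uncertainty differs.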

friosavila · 2023-04-13T19:08:43Z

Hi Saharnaz,

I think you have some concepts confused here.

1. OLS consistency does not depend on normality of the errors. It does depend on the zero conditional mean assumption.
2. OLS does not assume the variables are continuous. That is why we can use it for almost all kinds of models. Standard errors can always be corrected if one believes the errors are not homoskedastic (robust standard errors, for one).
3. MLE does impose distributional assumptions. Otherwise you cannot use it. GMM, like OLS, does not impose distributional assumptions, just conditions on the moments.
4. All the commands you mention can actually handle binary dependent variables, but you need to acknowledge that they use an LPM (or something similar to it).
5. If you want something that handles binary variables explicitly, you can use jwdid:
   jwdid y x1 x2 x3, .... method(logit)
   Here, however, the parallel trends assumption is not on the observed probability, but on the latent variable.

F
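The distinction in the last point can be made concrete with a small calculation. This is a sketch with made-up probabilities (none of the numbers come from this thread): if the untreated trend is parallel on the log-odds (latent index) scale, the implied trends on the probability scale are generally not parallel when the groups start from different baseline levels.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical untreated-outcome probabilities.
p_control_pre, p_control_post = 0.10, 0.20
p_treated_pre = 0.50

# Parallel trends on the latent scale: both groups get the same shift
# in log-odds between the pre and post periods.
shift = logit(p_control_post) - logit(p_control_pre)
p_treated_post_cf = expit(logit(p_treated_pre) + shift)

# Implied changes on the probability scale.
trend_control = p_control_post - p_control_pre
trend_treated = p_treated_post_cf - p_treated_pre

print("control change in probability:", trend_control)
print("treated counterfactual change in probability:", trend_treated)
```

With these numbers the control group's probability rises by 0.10, while the treated group's counterfactual probability rises by about 0.19 (from 0.50 to 9/13). An LPM-based estimator and a logit-based estimator therefore rely on different identifying assumptions.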


saharnazb · 2023-04-13T19:38:04Z

Thank you, F, for your time.
Maybe I am making a mistake. I will go back to my textbooks on how violating the normality assumption relates to inference, hypothesis testing, and the consistency and efficiency of estimators. I am probably confused. Thank you for your explanations. I will check the details.
But a binary variable does not have a continuous distribution. We can only treat it as continuous for the LPM.
jwdid is not converging for some reason, and I have not found the cause of the error yet (possibly something in the way I set it up). While working on the error, I wanted to check whether other options are available.

friosavila · 2023-04-13T19:54:50Z

If jwdid is not converging, it is probably because of one of the following:

a) Not enough data: you have very few observations per cohort per year.
b) Too many controls, which indirectly worsens a).

I would definitely need more information to say why jwdid is not working for you.
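One way to check the first possibility before estimating anything is to tabulate observations per cohort-by-year cell. The sketch below is a generic diagnostic, not part of jwdid; the rows, the `cell_diagnostics` helper, and the `min_obs` threshold are all hypothetical. It also flags cells where the binary outcome shows no variation, since nonlinear models such as logit can run into perfect prediction in such cells.

```python
from collections import Counter

# Hypothetical repeated cross-section rows: (cohort, year, binary outcome).
rows = [
    (2010, 2009, 0), (2010, 2009, 1), (2010, 2010, 1), (2010, 2011, 1),
    (2012, 2009, 0), (2012, 2010, 0), (2012, 2011, 0), (2012, 2012, 1),
]


def cell_diagnostics(rows, min_obs=2):
    """Count observations per (cohort, year) cell and flag problem cells."""
    counts = Counter()
    ones = Counter()
    for cohort, year, outcome in rows:
        counts[(cohort, year)] += 1
        ones[(cohort, year)] += outcome
    # Cells with fewer than min_obs observations.
    small = sorted(cell for cell, k in counts.items() if k < min_obs)
    # Cells where the outcome is all 0 or all 1.
    no_variation = sorted(
        cell for cell, k in counts.items() if ones[cell] in (0, k)
    )
    return counts, small, no_variation


counts, small, no_variation = cell_diagnostics(rows)
print("small cells:", small)
print("cells without outcome variation:", no_variation)
```

In real data the threshold would be much larger than 2, and each added control further reduces the effective variation available within a cell.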


saharnazb · 2023-04-13T20:08:41Z

Thank you. I am not sure whether this (the issue tracker of another package) is the right place to send details. Can I email you?
I believe the problem is the number of controls. I am controlling for 35-40 variables with 180,000 observations.
However, even without controls, I get a lot of dots in the results table instead of t-stats, variances, and CIs.

kylebutts · 2023-04-13T20:33:12Z


Agreed on all fronts with @friosavila!

kylebutts closed this as completed Apr 13, 2023
