[Feature Request]: Reintroduce stepwise/forward/backward regression with factors #2597

Closed
PeterKlaren opened this issue Feb 17, 2024 · 10 comments · Fixed by jasp-stats/jaspRegression#288

Comments

@PeterKlaren

PeterKlaren commented Feb 17, 2024

Description

Reintroduce stepwise/forward/backward regression with factors

Purpose

To use dummy variables in multiple regression

Use-case

No response

Is your feature request related to a problem?

No

Is your feature request related to a JASP module?

Regression

Describe the solution you would like

The use of factors (dummy variables representing a categorical variable) in linear regression

Describe alternatives that you have considered

None

Additional context

If I am not mistaken, in earlier JASP versions multiple linear regression was possible with a combination of numeric variables and (categorical) factors. In v 0.18.3 (and some earlier versions as well), including a factor in a backward elimination, forward selection, or stepwise procedure now results in the error message "Stepwise procedures are not supported for models containing factors."

I am curious as to the rationale for this.

Peter Klaren

@tomtomme
Member

@PeterKlaren
Confirmed for JASP 0.19 beta as well.
Let's hope that Johnny finds an answer.
In the meantime, can you update the title of your feature request?

@JohnnyDoorn JohnnyDoorn changed the title from "[Feature Request]:" to "[Feature Request]: Reintroduce stepwise/forward/backward regression with factors" on Feb 20, 2024
@JohnnyDoorn

Hi @PeterKlaren ,

I think we removed this as a quick solution because it was fairly buggy. I also have to say I am not a fan of these automatic procedures, since they are based solely on an arbitrary p-value cut-off rather than on theory and substantive reflection. Hierarchical regression is still possible by adding variables to the null model, and I am currently working on blocks of regression predictors, to enable comparing more than 2 models at the same time. Would such a feature already help you out, or is there still a solid use case for these automatic procedures? (I'm genuinely curious, because I am also aware that I'm operating inside my little methodology bubble.)

Kind regards,
Johnny
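For readers who have not met the procedures being debated, here is a minimal sketch of forward selection with a fixed entry cut-off. It is not JASP's code. The names `solve`, `rss`, and `forward_select`, the toy data, and the threshold of 4.0 are all hypothetical, and the sketch uses an F-to-enter threshold instead of a p-value so that it needs nothing beyond the Python standard library. The predictor `treated` is a 0/1 dummy variable of the kind the request is about.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


def forward_select(predictors, y, f_to_enter=4.0):
    """Repeatedly add the predictor with the largest partial F until none exceeds the cut-off."""
    selected, remaining = [], list(predictors)
    current_rss = rss([], y)
    while remaining:
        best = None
        for name in remaining:
            cols = [predictors[s] for s in selected + [name]]
            df_resid = len(y) - len(cols) - 1
            if df_resid <= 0:
                continue
            new_rss = rss(cols, y)
            # Partial F for adding one term; the floor avoids division by zero on a perfect fit.
            f_stat = (current_rss - new_rss) / max(new_rss / df_resid, 1e-12)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, new_rss)
        if best is None or best[1] < f_to_enter:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        current_rss = best[2]
    return selected


# Toy data (made up): y depends on x1 and on the dummy `treated`; x2 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "treated": [0, 1, 0, 1, 0, 1, 0, 1],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
}
y = [2.1, 8.9, 5.9, 13.1, 9.9, 17.1, 14.1, 20.9]
selected = forward_select(predictors, y)
print(selected)
```

The whole decision hinges on `f_to_enter`: changing that one number changes which model comes out, which is the arbitrariness the comment above objects to.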

@tomtomme
Member

@JohnnyDoorn @PeterKlaren
See https://quantpsych.net/wp-content/uploads/2022/08/fife_donofrio_2022.pdf
In there @dustinfife (and many before him) argue that hierarchical linear regression is no good for EDA in comparison to random forests. From the abstract:

"As some utilize more exploratory tools, it may be tempting to
employ multiple linear regression models. In this paper, we advocate for the
use of Random Forest (RF) models. RF is able to obtain better predictive
performance than traditional regression, while also inherently protecting
against overfitting as well as detecting nonlinear effects and interactions
among predictors."

Also, it is common to criticize the misuse of stepwise procedures etc. for CDA.

So yeah, there are already better alternatives out there to tackle the needs of EDA. But many researchers stick to what they know (stepwise, blocks etc.) and want all the features that SPSS provides.
I certainly would not know how to deal with this dilemma.

@EJWagenmakers
Collaborator

If SPSS provides this, so should we (imo). We also provide p-values, after all.

@PeterKlaren
Author

Hi Johnny and others,

I am aware of the drawbacks of stepwise procedures because of their use of subjective p- and F-values (see also for example: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-018-0143-6). In my statistics courses for biology undergrads I advocate manual entry of predictors as the hypothesis/theory-driven, royal way of analysis. (And I will surely read up on RF models!) But if the rationale for removing stepwise from JASP is the use of subjective p-value thresholds, then shouldn't you consider removing all NHST from JASP as well?!

Still, students will encounter stepwise procedures when using other software or reading books (other than my lecture notes, of course). They then should have an understanding of how different regression algorithms work to let them make an informed choice.

Also, and from a teaching and very practical point of view: I have been using JASP for six years now in my courses. I treat multiple regression (forward, backward, stepwise, forced entry) with factors (dummy variables), and only found out now that stepwise with factors is not supported anymore in recent JASP versions. Of course I should have read the "what's new in this version" info, but it makes my lecture notes a bit obsolete. It would be a hassle to review JASP's new and abandoned features every year in order to update teaching materials.

PK

@tomtomme
Member

tomtomme commented Feb 23, 2024

You and EJ made solid points. So let's hope we can reintroduce the feature in a less buggy state.

@JohnnyDoorn

JohnnyDoorn commented Feb 23, 2024

Hi @PeterKlaren ,

Thanks for elaborating - I see your point and will look into a proper fix for this.
Edit: I think that long ago (more than 3 years ago) only factors with 2 levels were allowed in linear regression. When support for all factors was introduced, it broke the stepwise functionality, and this measure was taken to keep the procedure from breaking down.

@JohnnyDoorn

@PeterKlaren
I see the issue in the code now, and have fixed it so that you can do stepwise regression with factors that have 2 levels (so, dummy variables). When a factor has more than 2 levels, JASP gives an error message advising you to redo the analysis with dummy variables. Does that work for your use case? Making it work with factors that have more than 2 levels requires a bigger rewrite, and I think this never worked in the first place.
Similarly, I see that interaction effects are also not allowed in stepwise procedures. Is this something that is required as well?
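As an illustration of the workaround described above, a factor with more than 2 levels can be expanded by hand into 0/1 dummy variables, one per level except a reference level. The sketch below is a generic example and not what JASP does internally; `dummy_code`, the `habitat` data, and the column naming are hypothetical.

```python
def dummy_code(name, values, reference=None):
    """Expand a categorical factor into 0/1 indicator columns, one per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: the first level alphabetically is the baseline
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    return {
        f"{name}_{level}": [1 if value == level else 0 for value in values]
        for level in levels
        if level != reference
    }


# A made-up three-level factor; "forest" becomes the reference level.
habitat = ["forest", "meadow", "wetland", "forest", "wetland", "meadow"]
dummies = dummy_code("habitat", habitat)
for column, indicator in dummies.items():
    print(column, indicator)
```

A factor with k levels yields k - 1 columns, each of which is a 2-level variable, so each could be entered as a separate predictor. Whether a selection procedure should be allowed to keep some of these columns and drop others is a modelling choice this sketch does not address.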

@PeterKlaren
Author

Dummy variables will work well for my purposes.
Interaction effects often are more interesting than main effects, so this would be a nice feature as well.

@JohnnyDoorn

Alright, that also seemed fairly straightforward to include - thanks for providing this little push!
