WIP/ENH: adding support for categorical factors #527

Closed
wants to merge 2 commits into
from

3 participants
Contributor

dengemann commented Oct 8, 2012

This relates to our recent discussion on the mailing list.

  • added a private _recode function that internally recodes the x factor
  • added an additional positional argument, a dict that lets the user specify the remapping done by _recode
  • I would have preferred a kwarg, but that messes up the ax.plot call below. The options I see within this approach are a) allowing users to optionally pass a tuple like (x, x_levels), so that no additional positional argument is required, and b) explicitly passing a dict with plotting parameters instead of **kwargs

Wdyt?

dengemann added some commits Oct 8, 2012

dengemann WIP/ENH: adding support for categorical factors
8c83c71
dengemann ENH/WIP: adding a simple example demonstrating categorical factor plots 7c61c31

jseabold commented on the diff Oct 24, 2012

statsmodels/graphics/factorplots.py
@@ -4,7 +4,7 @@
import utils
-def interaction_plot(x, trace, response, func=np.mean, ax=None, plottype='b',
+def interaction_plot(x, trace, response, x_levels, func=np.mean, ax=None, plottype='b',

jseabold Oct 24, 2012

Owner

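Are we okay with adding positional args like this? I don't much mind, but inserting x_levels ahead of func breaks backwards compatibility for any existing call that passes func (or a later argument) positionally.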
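To illustrate the compatibility concern, here are two hypothetical stubs that only mirror the old and proposed signatures from the diff (they do no plotting and just return the aggregation function that ends up bound to func). statistics.median stands in for any user-supplied func.

```python
import statistics


def interaction_plot_old(x, trace, response, func=statistics.mean, ax=None):
    # Hypothetical stub mirroring the existing signature; it only reports
    # which aggregation function ended up bound to func.
    return func


def interaction_plot_new(x, trace, response, x_levels, func=statistics.mean, ax=None):
    # Hypothetical stub mirroring the proposed signature, with x_levels
    # inserted ahead of func.
    return func


x, trace, response = ["a", "b"], ["u", "v"], [1.0, 2.0]

# An existing caller that passes func positionally gets median under the old signature...
old = interaction_plot_old(x, trace, response, statistics.median)
# ...but under the new one median is silently bound to x_levels and func stays mean.
new = interaction_plot_new(x, trace, response, statistics.median)
print(old.__name__, new.__name__)  # median mean

# A caller that never passed a fourth argument now fails outright.
try:
    interaction_plot_new(x, trace, response)
except TypeError as exc:
    print("TypeError:", exc)
```

Appending the new parameter at the end of the signature with a default (e.g. x_levels=None) would keep both kinds of existing calls working.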

Owner

jseabold commented Oct 24, 2012

Do you think we really need the x_levels argument? Couldn't we just check the dtype in the plot and call _recode with some default levels e.g., range(n_unique)? Thoughts?
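A sketch of the dtype-based alternative, assuming a hypothetical helper that leaves numeric input untouched and otherwise assigns default levels 0..n_unique-1. Here numpy.unique with return_inverse=True supplies both the sorted categories and each element's code in one call.

```python
import numpy as np


def recode_if_categorical(x):
    """Return (numeric_x, labels); labels is None when x is already numeric.

    Hypothetical helper: inspect the dtype and, for non-numeric input,
    assign default levels range(n_unique).
    """
    x = np.asarray(x)
    if np.issubdtype(x.dtype, np.number):
        return x, None
    # labels are the sorted distinct values; codes index into labels,
    # so they run from 0 to n_unique - 1.
    labels, codes = np.unique(x, return_inverse=True)
    return codes, labels


codes, labels = recode_if_categorical(["b", "a", "b"])
print(codes.tolist(), labels.tolist())  # [1, 0, 1] ['a', 'b']
```

Keeping the labels around is what allows them to be shown on the axis later instead of the raw codes.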

Contributor

dengemann commented Oct 24, 2012

On 24.10.2012, at 23:21, Skipper Seabold notifications@github.com wrote:

Hi,

> Do you think we really need the x_levels argument?

I would be happy to drop it; it feels unnatural to me. On the other hand, it's really explicit: less magic.

> Couldn't we just check the dtype in the plot and call _recode with some default levels e.g., range(n_unique)? Thoughts?

Yes, makes sense. What made me hesitate about something like this was that users might associate certain levels with a 'hierarchical' meaning, which is not always obvious from looking at the data. For instance, you might want to put your control condition at zero (left) and your treatment condition at one (right).
Makes sense?

D
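To make the ordering concern concrete: if default levels come from the sorted unique values, the x positions follow alphabetical order rather than experimental meaning. The condition names and the order list below are made up for illustration.

```python
import numpy as np

conditions = np.array(["placebo", "drug", "placebo", "drug"])

# Default recoding: np.unique sorts the categories, so "drug" lands at
# position 0 (left) purely because it sorts first.
default_labels, default_codes = np.unique(conditions, return_inverse=True)
print(default_labels.tolist())  # ['drug', 'placebo']
print(default_codes.tolist())   # [1, 0, 1, 0]

# Explicit ordering (hypothetical): the caller lists categories left to right,
# putting the control condition at zero.
order = ["placebo", "drug"]
position = {label: i for i, label in enumerate(order)}
explicit_codes = np.array([position[c] for c in conditions])
print(explicit_codes.tolist())  # [0, 1, 0, 1]
```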



Owner

jseabold commented Oct 24, 2012

Sure. We can set the tick labels to the category names, though, which may alleviate some of this: users will never see the numeric levels. Now, if you really want to control the placement (treatment on the left, etc.), you might be better off rolling your own plot?
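A sketch of the tick-label idea: plot against integer codes, then label the ticks with the original categories so the codes never appear. Both helper names are hypothetical; the axis argument is only assumed to provide Matplotlib's Axes.set_xticks and Axes.set_xticklabels methods, so the sketch itself does not import Matplotlib.

```python
import numpy as np


def category_ticks(x):
    """Return (codes, tick_positions, tick_labels) for a categorical x."""
    labels, codes = np.unique(np.asarray(x), return_inverse=True)
    positions = np.arange(len(labels))
    # Plain strings for the tick labels, one per distinct category.
    return codes, positions, [str(label) for label in labels]


def label_categorical_axis(ax, positions, labels):
    """Put the category names on the x ticks of a Matplotlib-style Axes."""
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)


codes, positions, labels = category_ticks(["low", "high", "low"])
print(codes.tolist(), positions.tolist(), labels)  # [1, 0, 1] [0, 1] ['high', 'low']
```

In use, one would plot the responses against codes (e.g. ax.plot(codes, response, "o")) and then call label_categorical_axis(ax, positions, labels).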

Contributor

dengemann commented Oct 24, 2012

On 25.10.2012, at 00:02, Skipper Seabold notifications@github.com wrote:

> Sure. We can update the ticklabels with the categories though, so this may alleviate some of this - they'll never see the levels.

Yes, sure; setting tick labels from categoricals would rock.

> Now if you really want to control treatment on left, etc. you might be better off rolling your own plot?

Point taken.


Owner

josef-pkt commented Oct 25, 2012

Just a generic comment:

It takes me 5 minutes to understand what the argument names mean, even after reading the docstring.

Contributor

dengemann commented Oct 25, 2012

Indeed, something got messed up in the docstring.

I'll update the commit over the next few days to reflect the current state of the discussion.
Thanks!


Contributor

dengemann commented Oct 25, 2012

... or did you mean the argument names in general (before this commit)?


Owner

josef-pkt commented Oct 25, 2012

In general; I think the names were already like this before your changes.
Mainly, I didn't understand what "trace" means, or why one argument is just the letter x while y is called "response".

(factor1, factor2, response)
(x1, x2_levels, response)
(endog, exog, groups)

In general, x1 could be continuous if we have a continuous-categorical interaction.

I'm reading the function completely out of context and have never tried it, so beyond the basic docstring example it's not obvious to me what the arguments mean.

I don't have a comment on the pull request itself, since I haven't figured out the levels and labels yet (I'm busy with other things).
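Since the argument roles are the confusing part, here is a minimal sketch of the aggregation an interaction plot is generally understood to perform: x gives the horizontal positions, trace splits the data into separate lines, and func summarizes response within each (trace, x) cell. The helper name and the dict return type are assumptions for illustration, not the statsmodels implementation.

```python
import numpy as np


def interaction_means(x, trace, response, func=np.mean):
    """Aggregate response for every (trace level, x level) cell.

    Hypothetical sketch: in the naming floated above, x and trace would be
    factor1 and factor2; each trace level becomes one line on the plot.
    """
    x, trace, response = map(np.asarray, (x, trace, response))
    table = {}
    for t in np.unique(trace):
        for level in np.unique(x):
            mask = (trace == t) & (x == level)
            if mask.any():
                # One plotted point: func applied to the matching responses.
                table[(t.item(), level.item())] = func(response[mask])
    return table


table = interaction_means(
    x=["low", "low", "high", "high", "low", "high"],
    trace=["a", "a", "a", "b", "b", "b"],
    response=[1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
)
print(table[("a", "low")], table[("b", "high")])  # 2.0 4.0
```

With a continuous x, every distinct value would become its own cell in this sketch, which is one reason the categorical versus continuous distinction matters for the argument design.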

Contributor

dengemann commented Nov 14, 2012

Closing this one; continued in a clean PR.

dengemann closed this Nov 14, 2012
