
LabeledArray and CategoricalArray #4

@nalimilan

I've just discovered this package, and it's really cool! I was actually planning to implement something like this myself.

Based on my experience in R, something I've been wondering about for some time is the relationship between LabeledArray and CategoricalArray (or labelled and factor in R). As I see it, value labels are just the way Stata, SAS and SPSS represent categorical variables. Unfortunately, that's a leaky abstraction: it doesn't really tell you whether a variable is supposed to be categorical or continuous (see e.g. this discussion). In R, I find the divide between labelled vectors and factors annoying, as it creates a schism that makes data handling even more complex than it needs to be. Maybe we can do better in Julia, thanks to its powerful type system and by learning from previous experience (since the fundamental design isn't set in stone here as it is in R and Stata)?

Of course, CategoricalArray currently can't store variables with value labels, since it only allows consecutive reference codes starting from 1, so the underlying values are lost if they don't fit that scheme. But it could store an additional field giving the mapping from each level to a custom code. Do you think this would allow merging LabeledArray and CategoricalArray?
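
A minimal Julia sketch of the "additional field" idea, for illustration only. `LabeledCategorical` and `underlying` are hypothetical names, not part of CategoricalArrays.jl or this package. The sketch assumes that reference codes stay consecutive from 1 (as in CategoricalArray) and that a separate `codes` field maps each level back to its original value, so nothing is lost when the source codes are sparse (e.g. 1, 5, 9).

```julia
# Hypothetical type (sketch only, not an existing API): a categorical vector
# that also remembers the original code of each level.
struct LabeledCategorical{T} <: AbstractVector{String}
    refs::Vector{UInt32}    # consecutive reference codes 1..length(levels)
    levels::Vector{String}  # level labels, indexed by reference code
    codes::Vector{T}        # extra field: original (custom) code of each level
end

Base.size(x::LabeledCategorical) = size(x.refs)
Base.IndexStyle(::Type{<:LabeledCategorical}) = IndexLinear()
Base.getindex(x::LabeledCategorical, i::Int) = x.levels[x.refs[i]]

# Recover the underlying value (e.g. the Stata/SPSS code) at position i.
underlying(x::LabeledCategorical, i::Int) = x.codes[x.refs[i]]

# Build from raw values plus a value => label mapping; assumes every value is labeled.
function LabeledCategorical(values::AbstractVector{T}, labels::AbstractDict{T,String}) where {T}
    codes = sort!(collect(keys(labels)))                   # original codes, sorted
    index = Dict(c => UInt32(i) for (i, c) in enumerate(codes))
    refs = [index[v] for v in values]                      # map each value to 1..n
    return LabeledCategorical{T}(refs, [labels[c] for c in codes], codes)
end

x = LabeledCategorical([1, 5, 1, 9], Dict(1 => "Agree", 5 => "Disagree", 9 => "Don't know"))
x[2]              # "Disagree"
underlying(x, 2)  # 5
x.refs            # UInt32[1, 2, 1, 3]
```

Keeping `refs` consecutive means existing code that relies on CategoricalArray's compact reference codes would still work; the `codes` field is only consulted when the original values are needed (e.g. when writing back to Stata).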

I know that Stata allows assigning value labels to only some values. This doesn't seem too problematic, since a level could be generated automatically by calling string on an unlabeled value. I've also read that some variables may have labels attached to some values even though they are truly continuous, but I couldn't find any examples of this. Probably, for continuous variables, value labels are only used to label missing values, which makes more sense than labeling arbitrary numeric values. Likert scales may be an intermediate case that can be treated as continuous under some assumptions, but treating them as categorical by default sounds safer.
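
A small Julia sketch of the fallback described above, assuming that every distinct value (plus every labeled value, even if unobserved) becomes a level, and that unlabeled values get `string(value)` as their level name. `levels_with_fallback` is a hypothetical helper, not an existing function in either package.

```julia
# Hypothetical helper (sketch only): derive levels when only some values are labeled.
function levels_with_fallback(values::AbstractVector{T}, labels::AbstractDict{T,String}) where {T}
    # Every observed value and every labeled value becomes a level, in sorted order.
    codes = sort!(unique(vcat(values, collect(keys(labels)))))
    # Use the label when one exists, otherwise fall back to string(value).
    lvls = [get(labels, c, string(c)) for c in codes]
    return codes, lvls
end

# A mostly continuous variable where only the missing-value code 99 is labeled.
codes, lvls = levels_with_fallback([1, 2, 3, 99], Dict(99 => "Refused"))
# codes == [1, 2, 3, 99]
# lvls  == ["1", "2", "3", "Refused"]
```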

Thanks in advance for your feedback! From my experience designing CategoricalArrays, it's super hard to get right (and issues keep being spotted from time to time), so I figure we'd better join forces if possible.

Cc: @bkamins, @pdeffebach
