# Creating Factor Variables

In the previous warm-up we explored how factor variables could
be used to split a dataset.  Such splits are usually performed in
order to apply a calculation to each split and perhaps even
combine the results in a later step.  This scenario is so
common that it has its own name: **split-apply-combine**.

In the last warm-up we used factor variables that came with the
original dataset for the split.  It's great when such factors
are readily available.  But sometimes we need to split according
to criteria that no existing factor variable captures.
In that case we create one or more factor variables whose
values capture the desired criteria and then perform the
split with these new factor variables.
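
As a minimal sketch of this idea, the cell below builds a factor from a
numeric criterion and then runs a split-apply-combine with it.  The `scores`
vector, the cutoff of `20`, and the names `size` and `group_means` are all
made up for illustration; they do not come from the warm-up dataset.

```r
# Hypothetical measurements; the values are invented for this sketch.
scores <- c(12, 47, 33, 8, 25, 41)

# Create a factor that captures the criterion "below 20 or not".
size <- factor(ifelse(scores < 20, "low", "high"), levels = c("low", "high"))

# Split by the new factor, apply mean() to each piece,
# and combine the results into one named vector.
group_means <- sapply(split(scores, size), mean)
group_means
```

Here `split` does the splitting and `sapply` both applies `mean` and combines
the per-group results, so the whole pattern fits in one line once the factor
exists.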


## Regular Patterns

Sometimes the data in your dataset is structured in regular
patterns.  A useful function for generating factor variables
in regular patterns is **gl** (for Generate Levels).  A few
examples will help.

In [1]:
gl(2, 4, labels=c('this', 'that'))

In [2]:
gl(2, 1, 8, labels=c('this', 'that'))

The parameters to `gl` are described below.

* `n` - the number of levels to generate
* `k` - the number of consecutive times each level is repeated
* `length` - (optional) the total length, `n * k` by default
* `labels` - (optional) names assigned to the factor levels, defaults to the integers `1` to `n`

We can see from the outputs above that the result is a regular
pattern of two constants; so the first parameter is `2` in both
cases.  The difference is in the number of times each constant
is repeated.  In the first case, each constant is repeated `4`
times.  This results in groups of four adjacent elements.

The second example alternates every element; so the second
parameter is `1`.  The default length of such a pattern is
`n * k = 2 * 1 = 2`.

In [4]:
gl(2, 1, labels=c('this', 'that'))

To get eight elements, as in the first example, we need
to pass `8` as the optional third parameter, as was done in `In [2]`.
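
To tie the two patterns back to splitting, here is a small sketch that uses
both of them on the same vector.  The `values` vector and the label names are
assumptions chosen for illustration, and the third argument is written out by
name as `length`.

```r
# Hypothetical data: the integers 1 to 8.
values <- 1:8

# Two levels, each repeated four times: groups of adjacent elements.
blocks <- gl(n = 2, k = 4, labels = c("first", "second"))

# Two levels, alternating, extended to eight elements with `length`.
parity <- gl(n = 2, k = 1, length = 8, labels = c("odd", "even"))

# Sum the values within each group of each pattern.
block_sums  <- sapply(split(values, blocks), sum)
parity_sums <- sapply(split(values, parity), sum)
block_sums
parity_sums
```

The first split groups positions 1 to 4 and 5 to 8, while the second groups
the odd and even positions.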


## Level Interactions

We can create a factor from two existing factors through their
**interaction**, that is, through the cross product of their
levels.

In [5]:
f1 <- gl(2, 2, labels=c('this', 'that'))
f1
f2 <- gl(2, 1, labels=c('one', 'other'))
f2
interaction(f1, f2)

Note that `f2` has length `2`, while `f1` has length `4`.
`interaction` pairs the factors element by element, so they must
effectively be the same length.  Since the length of `f1` is a
multiple of the length of `f2`, *recycling* was used to extend `f2`
to length `4` for the interaction.
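
The recycling can also be made explicit.  The sketch below extends `f2`
by hand with `rep` before interacting; the names `f2_full` and `combined`
are introduced only for this illustration.

```r
f1 <- gl(2, 2, labels = c("this", "that"))
f2 <- gl(2, 1, labels = c("one", "other"))

# Extend f2 to the length of f1 explicitly instead of relying on recycling.
f2_full <- rep(f2, length.out = length(f1))

# Each element pairs the f1 value with the f2 value at the same position.
combined <- interaction(f1, f2_full)
as.character(combined)

# All four level combinations exist, whether or not they occur in the data.
levels(combined)
```

Writing the extension out this way makes it easier to see which value of
`f2` is paired with each value of `f1`.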