# Factor in R
> Categorical & Continuous Variables

The notebook is based on [this tutorial](https://www.guru99.com/r-factor-categorical-continuous.html)

## What is a factor in R?

Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.

In a dataset, we can distinguish two types of variables: **categorical** and **continuous**.

* A **categorical** variable takes values from a limited, usually fixed set of groups. For example, country, year, gender, and occupation are categorical variables.
* A **continuous** variable, however, can take any numeric value, whole or decimal. Examples include revenue and the price of a share.

## Categorical variables

R stores categorical variables as factors. The code below converts a character vector into a factor. Many machine learning algorithms cannot work with character strings directly, so the strings must first be encoded as numbers; a factor does this by storing each value as an integer code that points to a level label.

Syntax
```r
factor(x = character(), levels, labels = levels, ordered = is.ordered(x))
```
* `x`: A vector of data, usually character or integer values rather than continuous decimals.
* `levels`: An optional vector of the possible values taken by `x`. The default is the sorted unique values of `x`.
* `labels`: Optional labels for the levels, given in the same order as `levels`. For example, 1 can take the label `male` and 0 the label `female`.
* `ordered`: A logical flag that determines whether the levels are treated as ordered.
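
The `levels` and `labels` arguments described above can be sketched as follows. The vector `gender_codes` and its 0/1 coding are hypothetical values chosen for illustration; they are not part of the tutorial's data.

```r
# Hypothetical survey codes (assumption): 1 = male, 0 = female
gender_codes <- c(1, 0, 0, 1, 1)

# levels lists the raw values; labels renames them in the same order
gender_factor <- factor(gender_codes,
                        levels = c(0, 1),
                        labels = c("female", "male"))

print(gender_factor)
levels(gender_factor)      # "female" "male"
as.integer(gender_factor)  # 2 1 1 2 2
```

Because `labels` follows the order of `levels`, swapping the two labels without swapping the levels would silently relabel every observation.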

Let's create a character vector and then convert it to a factor.

In [3]:
# Create a gender vector
gender_vector <- c("Male","Female","Female","Male","Male")
class(gender_vector)

In [4]:
gender_vector

Convert the vector to a factor:

In [5]:
factor_gender_vector <- factor(gender_vector)
class(factor_gender_vector)

In [6]:
factor_gender_vector

It is important to convert **strings** into factors when we perform machine learning tasks.

Categorical variables are either **nominal** or **ordinal**.
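
To illustrate why factors suit modeling, here is a minimal sketch of the integer codes behind a factor and of how base R's `model.matrix()` can expand a factor into an indicator column. The data frame column name `gender` and the variable `design` are introduced here for illustration only.

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# A factor is stored as integer codes plus a vector of level labels
typeof(factor_gender_vector)      # "integer"
levels(factor_gender_vector)      # "Female" "Male"
as.integer(factor_gender_vector)  # 2 1 1 2 2

# model.matrix() turns the factor into a 0/1 indicator column;
# the first level ("Female") acts as the reference category
design <- model.matrix(~ gender, data = data.frame(gender = factor_gender_vector))
colnames(design)                  # "(Intercept)" "genderMale"
design[, "genderMale"]
```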

### Nominal Categorical Variable
A nominal categorical variable has several values, but their order does not matter. For instance, a male/female variable has no ordering.

In [7]:
# Create a color vector
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
# Convert the vector to factor
factor_color <- factor(color_vector)
factor_color # the levels are returned in alphabetical order
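
As a small sketch of what "the order does not matter" means in practice, the snippet below checks that the factor is unordered and shows that equality tests work while ordering comparisons do not.

```r
color_vector <- c("blue", "red", "green", "white", "black", "yellow")
factor_color <- factor(color_vector)

levels(factor_color)       # "black" "blue" "green" "red" "white" "yellow"
is.ordered(factor_color)   # FALSE

# Equality is meaningful for a nominal factor
factor_color[1] == "blue"  # TRUE

# Ordering is not: R returns NA and issues a warning
suppressWarnings(factor_color[1] < factor_color[2])  # NA
```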

### Ordinal Categorical Variable
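Ordinal categorical variables do have a natural ordering. The order itself, from lowest to highest, is set by the `levels` argument. Passing `ordered = TRUE` (the abbreviation `order = TRUE` also works through R's partial argument matching) tells R to treat the factor as ordinal. With `ordered = FALSE`, the factor stays nominal even though its levels are still stored in the given sequence.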

In [10]:
# Create an ordinal categorical vector
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')

# Enforce a meaningful order through the levels argument;
# ordered = TRUE marks the factor as ordinal
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
factor_day
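
Once a factor is ordered, comparisons between its values become meaningful. The following sketch rebuilds the same factor and tries a few of them; the variable `late` is introduced here for illustration.

```r
day_vector <- c("evening", "morning", "afternoon", "midday", "midnight", "evening")
factor_day <- factor(day_vector,
                     ordered = TRUE,
                     levels = c("morning", "midday", "afternoon", "evening", "midnight"))

is.ordered(factor_day)         # TRUE
factor_day[1] > factor_day[2]  # TRUE: evening comes after morning
as.character(min(factor_day))  # "morning"
as.character(max(factor_day))  # "midnight"

# Keep only the observations from 'afternoon' onwards
late <- factor_day[factor_day >= "afternoon"]
as.character(late)             # "evening" "afternoon" "midnight" "evening"
```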

### Occurrence stats

In [11]:
# Count the number of occurrences of each level

summary(factor_day)

R lists the counts in the order given by the `levels` argument, from 'morning' to 'midnight'.
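
As an alternative sketch, base R's `table()` gives the same per-level counts and can be indexed or converted to proportions. The variable `counts` is introduced here for illustration.

```r
day_vector <- c("evening", "morning", "afternoon", "midday", "midnight", "evening")
factor_day <- factor(day_vector,
                     ordered = TRUE,
                     levels = c("morning", "midday", "afternoon", "evening", "midnight"))

counts <- table(factor_day)
counts                    # one count per level, in level order
counts[["evening"]]       # 2
prop.table(counts)        # relative frequencies
names(which.max(counts))  # "evening", the most frequent level
```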

## Continuous variable

Continuous variables need no special type in R: they are stored as numeric (double) or integer vectors. We can see this in `mtcars`, a built-in dataset with information on different car models. It is available under the name `mtcars`, and we can inspect the variable `mpg` (miles per gallon). Its values are decimal numbers of class numeric, which indicates a continuous variable.

In [12]:
dataset <- mtcars

In [14]:
dataset$mpg
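
To confirm the storage type directly, a short check along these lines can be run on the same column:

```r
dataset <- mtcars

class(dataset$mpg)      # "numeric"
typeof(dataset$mpg)     # "double"
is.factor(dataset$mpg)  # FALSE

# For a numeric vector, summary() reports quantiles and the mean
# rather than per-level counts
summary(dataset$mpg)
```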

In [15]:
dataset

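| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 | 1 | 0 | 4 | 4 |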
