# R factors

### Factors in R
Factors are data objects used to categorize data and store it as levels. They can
store both strings and integers. A factor is created by passing a vector to the
`factor()` function.

In [1]:
#Example
x=c('A','B','C','D','A','B','0','1','2')
is.factor(x) #checking if x is a factor

In [2]:
y=factor(x)
y

In [3]:
#checking if y is a factor
is.factor(y)
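
To make the idea of "storing data as levels" more concrete, the following minimal sketch inspects a factor built from the same vector as above. It only uses base R functions (`levels()`, `nlevels()`, `as.integer()`, `table()`); the variable name `codes` is introduced here purely for illustration.

```r
# Same vector as in the example above
x <- c('A', 'B', 'C', 'D', 'A', 'B', '0', '1', '2')
y <- factor(x)

levels(y)                 # the distinct categories, sorted by default
nlevels(y)                # number of distinct categories
codes <- as.integer(y)    # integer code of each element (an index into levels(y))
codes
levels(y)[codes]          # mapping the codes back recovers the original values
table(y)                  # number of occurrences of each level
```

Internally, a factor is an integer vector of codes plus a character vector of levels, which is why `levels(y)[codes]` reproduces `x`.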

### Changing the order of the levels in a factor  
The order of the levels in a factor can be changed by applying the `factor()` function
again and passing the new order through the `levels` argument.

In [4]:
factor(y,levels = c('A','B','C','D','0','1','2'))
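
As a rough check of what reordering does, the sketch below compares the factor before and after. The names `z` and `w` are hypothetical and used only here. It also illustrates one pitfall worth knowing: any value that is left out of `levels` is turned into `NA`.

```r
y <- factor(c('A', 'B', 'C', 'D', 'A', 'B', '0', '1', '2'))

# Reorder the levels: letters first, then digits
z <- factor(y, levels = c('A', 'B', 'C', 'D', '0', '1', '2'))

levels(z)                                     # levels now appear in the new order
identical(as.character(y), as.character(z))   # the stored values are unchanged
as.integer(z)                                 # the integer codes follow the new order

# Values missing from `levels` become NA
w <- factor(y, levels = c('A', 'B'))
sum(is.na(w))                                 # counts the elements that were dropped
```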

# R Data Frame

###  Data frame in R  
A data frame is a table, that is, a two-dimensional array-like structure in which each
column contains the values of one variable and each row contains one set of values,
one from each column. It is created using the `data.frame()` function.

A data frame has the following characteristics:

1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or character type.
4. Each column should contain the same number of data items.
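
Two of the characteristics above can be checked directly. This is a small sketch with a made-up data frame (`df`, `col_types` and `length_error` are illustrative names): it shows that every column keeps its own type, and that `data.frame()` raises an error when the columns have lengths that cannot be matched.

```r
df <- data.frame(id = 1:3,
                 name = c('a', 'b', 'c'),
                 score = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# Each column has its own type
col_types <- sapply(df, class)
col_types

# Every column has the same number of items
nrow(df)

# Columns of length 3 and 2 cannot be combined; capture the error message
length_error <- tryCatch(
  data.frame(id = 1:3, name = c('a', 'b')),
  error = function(e) conditionMessage(e)
)
length_error
```

Note that R recycles shorter columns when the longer length is an exact multiple of the shorter one, so the error only appears when the lengths are not compatible.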

In [5]:
#Create a dataframe of employee data
employee.data=data.frame(emp_id=c(1:3),
                    emp_name=c('Amar','Akbar','Anthony'),
                    salary=c(10000,200000,300000),
                    start_date=as.Date(c('2012-01-01','2013-09-23','2014-11-15')),
                    stringsAsFactors=FALSE)

In [6]:
#Print the data frame
employee.data

emp_id,emp_name,salary,start_date
1,Amar,10000.0,2012-01-01
2,Akbar,200000.0,2013-09-23
3,Anthony,300000.0,2014-11-15


In [7]:
#Create a dataframe of employee data, setting stringsAsFactors=TRUE
employee.data=data.frame(emp_id=c(1:3),
                    emp_name=c('Amar','Akbar','Anthony'),
                    salary=c(10000,200000,300000),
                    start_date=as.Date(c('2012-01-01','2013-09-23','2014-11-15')),
                    stringsAsFactors=TRUE)

In [8]:
#Print the data frame
employee.data

emp_id,emp_name,salary,start_date
1,Amar,10000.0,2012-01-01
2,Akbar,200000.0,2013-09-23
3,Anthony,300000.0,2014-11-15


**Note:** `stringsAsFactors` is an argument to the `data.frame()` function in
R. It is a logical value that indicates whether strings in a data frame should be treated as
factor variables or as plain strings. In R versions before 4.0.0, `stringsAsFactors` defaults
to TRUE; from R 4.0.0 onwards the default is FALSE.
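
The printed data frames look identical under both settings, so the difference is easier to see by checking the class of the column. A minimal sketch, setting the argument explicitly each time (`names_vec`, `as_strings` and `as_factors` are names chosen here for illustration):

```r
names_vec <- c('Amar', 'Akbar', 'Anthony')

as_strings <- data.frame(emp_name = names_vec, stringsAsFactors = FALSE)
as_factors <- data.frame(emp_name = names_vec, stringsAsFactors = TRUE)

class(as_strings$emp_name)    # "character"
class(as_factors$emp_name)    # "factor"
levels(as_factors$emp_name)   # the distinct names, stored as levels
```

Passing `stringsAsFactors` explicitly keeps the behavior the same regardless of the R version in use.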

### Structure of the data frame
The structure of the data frame can be seen by using the `str()` function.

In [9]:
str(employee.data)

'data.frame':	3 obs. of  4 variables:
 $ emp_id    : int  1 2 3
 $ emp_name  : Factor w/ 3 levels "Akbar","Amar",..: 2 1 3
 $ salary    : num  1e+04 2e+05 3e+05
 $ start_date: Date, format: "2012-01-01" "2013-09-23" ...


### Summary of the data frame
The statistical summary and nature of the data can be obtained by applying
the `summary()` function.

In [10]:
summary(employee.data)

     emp_id       emp_name     salary         start_date        
 Min.   :1.0   Akbar  :1   Min.   : 10000   Min.   :2012-01-01  
 1st Qu.:1.5   Amar   :1   1st Qu.:105000   1st Qu.:2012-11-11  
 Median :2.0   Anthony:1   Median :200000   Median :2013-09-23  
 Mean   :2.0               Mean   :170000   Mean   :2013-07-14  
 3rd Qu.:2.5               3rd Qu.:250000   3rd Qu.:2014-04-20  
 Max.   :3.0               Max.   :300000   Max.   :2014-11-15  

### Extracting columns from the dataframe
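To extract a column, write the data frame name, then `$`, then the column name, for
example `employee.data$salary`. Passing the result to `data.frame()` returns it as a
one-column data frame instead of a plain vector. To extract more than one column this
way, pass each column to `data.frame()`, separated by commas.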

In [11]:
#Extracting the salaries of the employees
data.frame(salary = employee.data$salary)

salary
10000.0
200000.0
300000.0
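
Besides `$`, base R offers a few other ways to pick columns. The sketch below rebuilds a reduced version of the employee data (without `start_date`, to keep it short) and shows them side by side; `salary_vec`, `one_col` and `two_cols` are illustrative names.

```r
employee.data <- data.frame(emp_id = 1:3,
                            emp_name = c('Amar', 'Akbar', 'Anthony'),
                            salary = c(10000, 200000, 300000),
                            stringsAsFactors = FALSE)

salary_vec <- employee.data$salary            # a plain numeric vector
employee.data[["salary"]]                     # same vector, column named by a string
one_col  <- employee.data["salary"]           # a one-column data frame
two_cols <- employee.data[, c("emp_name", "salary")]  # several columns at once
two_cols
```

Indexing with a character vector of column names is usually the most direct way to select several columns, since it keeps the original column names.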


### Extracting rows from the data frame  
Write data-frame-name `[i:k, ]` to extract rows i to k. The empty position after the comma means that all columns are kept.

In [12]:
employee.data[2:3,] #don't miss the comma

,emp_id,emp_name,salary,start_date
2,2,Akbar,200000.0,2013-09-23
3,3,Anthony,300000.0,2014-11-15
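
The same bracket notation accepts other kinds of row selectors. The following sketch uses a reduced copy of the employee data and shows a few variations; the salary threshold and the variable names (`range_rows`, `picked_rows`, `high_paid`) are arbitrary choices for illustration.

```r
employee.data <- data.frame(emp_id = 1:3,
                            emp_name = c('Amar', 'Akbar', 'Anthony'),
                            salary = c(10000, 200000, 300000),
                            stringsAsFactors = FALSE)

range_rows  <- employee.data[2:3, ]        # rows 2 to 3, all columns
picked_rows <- employee.data[c(1, 3), ]    # rows 1 and 3 only
high_paid   <- employee.data[employee.data$salary > 100000, ]  # rows meeting a condition
high_paid

employee.data[2:3, "emp_name"]             # rows 2 to 3 of a single column
```

Selected rows keep their original row names, which is why the output of `employee.data[2:3, ]` is labeled 2 and 3 rather than 1 and 2.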

