A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column.

A data frame has the following characteristics:

1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor, or character type, or another vector type such as logical or Date.
4. Each column should contain the same number of data items.
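
As a rough illustration of characteristics 2 and 4, the sketch below checks that `data.frame()` refuses columns of unequal length and duplicate row names. The helper `fails()` is a hypothetical name introduced only for this example; it is not part of base R.

```r
# Hypothetical helper: returns TRUE if evaluating the expression raises an error.
fails <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

# Columns of unequal length (3 values vs. 2 values) cannot be combined.
unequal_fails <- fails(data.frame(id = 1:3, name = c("a", "b")))

# Duplicate row names are rejected.
dup_rows_fail <- fails(data.frame(id = 1:2, row.names = c("r1", "r1")))

unequal_fails
dup_rows_fail
```

Both calls are wrapped in `try()` so the example runs to completion instead of stopping at the first error.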

In [2]:
# Create the data frame.
emp.data <- data.frame(
    emp_id = 1:5,
    emp_name = c("Rani", "Divya", "Raju", "Ramya", "Ram"),
    salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
    date_of_Join = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                             "2014-05-11", "2015-03-27")),
    stringsAsFactors = FALSE
)
# Print the data frame.
emp.data 

emp_id,emp_name,salary,date_of_Join
1,Rani,623.3,2012-01-01
2,Divya,515.2,2013-09-23
3,Raju,611.0,2014-11-15
4,Ramya,729.0,2014-05-11
5,Ram,843.25,2015-03-27


### Get the Structure of the Data Frame
The structure of a data frame can be inspected with the `str()` function.

In [3]:
str(emp.data)

'data.frame':	5 obs. of  4 variables:
 $ emp_id      : int  1 2 3 4 5
 $ emp_name    : chr  "Rani" "Divya" "Raju" "Ramya" ...
 $ salary      : num  623 515 611 729 843
 $ date_of_Join: Date, format: "2012-01-01" "2013-09-23" ...
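
When only the column types or the dimensions are needed, `sapply()` with `class` and `dim()` give a more compact view than `str()`. The sketch below uses a small stand-in data frame named `emp` (an assumption for this example, not the `emp.data` object above).

```r
# Small stand-in data frame with three column types.
emp <- data.frame(
  emp_id = 1:3,
  salary = c(623.3, 515.2, 611.0),
  joined = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15"))
)

# Class of each column, returned as a named character vector.
col_classes <- sapply(emp, class)
col_classes

# Number of rows and columns.
dim(emp)
```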


In [4]:
print(summary(emp.data)) 

     emp_id    emp_name             salary       date_of_Join       
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27  


### Extract Data from Data Frame
Extract specific columns from a data frame using their column names.

In [5]:
result <- data.frame(emp.data$emp_name,emp.data$salary)
result

emp.data.emp_name,emp.data.salary
Rani,623.3
Divya,515.2
Raju,611.0
Ramya,729.0
Ram,843.25
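
Note that the header in the output above carries the prefix `emp.data.`, because `data.frame()` derives the new column names from the expressions passed to it. A possible alternative, sketched here on a small stand-in data frame `emp` (an assumed name for this example), is to index by column name, which keeps the original names.

```r
# Small stand-in data frame.
emp <- data.frame(
  emp_id = 1:3,
  emp_name = c("Rani", "Divya", "Raju"),
  salary = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

# Select columns by name; the original column names are preserved.
by_name <- emp[, c("emp_name", "salary")]
names(by_name)
```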


In [13]:
# Extract the first row.
result <- emp.data[1,]
result

emp_id,emp_name,salary,date_of_Join
1,Rani,623.3,2012-01-01


In [14]:
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
result

Unnamed: 0,emp_name,date_of_Join
3,Raju,2014-11-15
5,Ram,2015-03-27


In [15]:
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
v

emp_id,emp_name,salary,date_of_Join,dept
1,Rani,623.3,2012-01-01,IT
2,Divya,515.2,2013-09-23,Operations
3,Raju,611.0,2014-11-15,IT
4,Ramya,729.0,2014-05-11,HR
5,Ram,843.25,2015-03-27,Finance


In [16]:
emp.data

emp_id,emp_name,salary,date_of_Join,dept
1,Rani,623.3,2012-01-01,IT
2,Divya,515.2,2013-09-23,Operations
3,Raju,611.0,2014-11-15,IT
4,Ramya,729.0,2014-05-11,HR
5,Ram,843.25,2015-03-27,Finance


In [17]:

# Create the second data frame
emp.newdata <- data.frame(
   emp_id = 6:8,
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8), 
   date_of_Join = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Fianance"),
   stringsAsFactors = FALSE
)
emp.newdata

emp_id,emp_name,salary,date_of_Join,dept
6,Rasmi,578.0,2013-05-21,IT
7,Pranab,722.5,2013-07-30,Operations
8,Tusar,632.8,2014-06-17,Fianance


In [18]:
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
emp.finaldata

emp_id,emp_name,salary,date_of_Join,dept
1,Rani,623.3,2012-01-01,IT
2,Divya,515.2,2013-09-23,Operations
3,Raju,611.0,2014-11-15,IT
4,Ramya,729.0,2014-05-11,HR
5,Ram,843.25,2015-03-27,Finance
6,Rasmi,578.0,2013-05-21,IT
7,Pranab,722.5,2013-07-30,Operations
8,Tusar,632.8,2014-06-17,Fianance
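
`rbind()` on data frames matches columns by name, so both inputs need the same set of column names, although the order may differ. The following sketch illustrates this with two small stand-in data frames; `first`, `second`, and `fails()` are hypothetical names used only here.

```r
# Hypothetical helper: returns TRUE if evaluating the expression raises an error.
fails <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

first <- data.frame(id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)

# Same column names as `first`, but in a different order.
second <- data.frame(dept = "Finance", id = 3L, stringsAsFactors = FALSE)

# Columns are aligned by name, following the order of the first data frame.
combined <- rbind(first, second)
combined

# A data frame with a different column name cannot be bound.
mismatch_fails <- fails(rbind(first, data.frame(id = 4L, team = "Ops")))
mismatch_fails
```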


In [19]:
x <- emp.finaldata
x

emp_id,emp_name,salary,date_of_Join,dept
1,Rani,623.3,2012-01-01,IT
2,Divya,515.2,2013-09-23,Operations
3,Raju,611.0,2014-11-15,IT
4,Ramya,729.0,2014-05-11,HR
5,Ram,843.25,2015-03-27,Finance
6,Rasmi,578.0,2013-05-21,IT
7,Pranab,722.5,2013-07-30,Operations
8,Tusar,632.8,2014-06-17,Fianance


In [20]:
x['salary']<600

salary
False
True
False
False
False
True
False
False
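
A comparison like the one above yields one logical value per row. Such a result is typically used as a row filter; the sketch below shows this on a small stand-in data frame `staff` (an assumed name), using `$` so that the comparison returns a plain logical vector.

```r
# Small stand-in data frame.
staff <- data.frame(
  emp_name = c("Rani", "Divya", "Raju", "Rasmi"),
  salary = c(623.3, 515.2, 611.0, 578.0),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE value per row.
low_paid <- staff$salary < 600

# Keep only the rows where the condition holds.
staff[low_paid, ]

# Count the matching rows (TRUE counts as 1).
sum(low_paid)
```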


In [21]:
# Create a, b, c, d variables
a <- c(10,20,30,40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5, 8, 10, 7)
# Join the variables to create a data frame
df <- data.frame(a,b,c,d)
df

a,b,c,d
10,book,True,2.5
20,pen,False,8.0
30,textbook,True,10.0
40,pencil_case,False,7.0


In [22]:
# Rename the columns of the data frame
names(df) <- c('ID', 'items', 'store', 'price')
df

ID,items,store,price
10,book,True,2.5
20,pen,False,8.0
30,textbook,True,10.0
40,pencil_case,False,7.0


In [23]:
# Print the structure
str(df)

'data.frame':	4 obs. of  4 variables:
 $ ID   : num  10 20 30 40
 $ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
 $ store: logi  TRUE FALSE TRUE FALSE
 $ price: num  2.5 8 10 7


### Slice Data Frame

It is possible to slice the values of a data frame. We select the rows and columns to return inside square brackets preceded by the name of the data frame.

A data frame is composed of rows and columns, df[A, B], where A represents the rows and B the columns. We can slice by specifying the rows, the columns, or both.

In the first picture, the left part represents the rows and the right part the columns. Note that the symbol : means "to". For instance, 1:3 selects the values from 1 to 3.

![](https://lh3.googleusercontent.com/-w0N3CuICLJ4/XXnz9odzUjI/AAAAAAAAhmw/6e6vlEBDdh8vHqTbtKwhxg4V2UXh_Cx7gCK8BGAsYHg/s0/2019-09-12.png)

The diagram below shows how to access different selections of the data frame:

* The yellow arrow selects row 1 in column 2
* The green arrow selects rows 1 to 2
* The red arrow selects column 1
* The blue arrow selects rows 1 to 3 and columns 3 to 4

Note that if we leave the left part blank, R selects all the rows. Likewise, if we leave the right part blank, R selects all the columns.
![](https://lh3.googleusercontent.com/-WZzs8CzMc98/XXnz-4HuvoI/AAAAAAAAhm0/Pl5zLDPm_HMgOLRhYQvN9HdQoK6VMo9YQCK8BGAsYHg/s0/2019-09-12.png)


In [25]:
## Select row 1 in column 2
df[1,2]

In [26]:
## Select Rows 1 to 2
df[1:2,]

ID,items,store,price
10,book,True,2.5
20,pen,False,8.0


In [27]:
## Select column 1
df[,1]
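
Selecting a single column with `df[, 1]` returns a plain vector rather than a data frame. If a one-column data frame is wanted instead, the `drop = FALSE` argument can be used, as in this sketch on a small stand-in data frame (redefined here so the example is self-contained).

```r
# Small stand-in data frame.
df <- data.frame(ID = c(10, 20, 30), price = c(2.5, 8, 10))

# Default behaviour: the result is simplified to a vector.
as_vector <- df[, 1]
is.data.frame(as_vector)

# drop = FALSE keeps the data frame structure and the column name.
as_frame <- df[, 1, drop = FALSE]
is.data.frame(as_frame)
```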


In [28]:
## Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]

store,price
True,2.5
False,8.0
True,10.0


It is also possible to select the columns with their names. For instance, the code below extracts two columns: ID and store.

In [29]:
# Slice by column names
df[, c('ID', 'store')]

ID,store
10,True
20,False
30,True
40,False


### Append a Column to Data Frame
You can also append a column to a data frame. Use the `$` operator to add a new variable.

In [30]:
# Create a new vector
quantity <- c(10, 35, 40, 5)

# Add `quantity` to the `df` data frame
df$quantity <- quantity
df

ID,items,store,price,quantity
10,book,True,2.5,10
20,pen,False,8.0,35
30,textbook,True,10.0,40
40,pencil_case,False,7.0,5
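
Besides `$`, a column can also be added with `[[` and a name given as a string, and a new column can be computed from existing ones. The following is a minimal sketch; the column name `total` is an assumption introduced for this example.

```r
# Small stand-in data frame.
df <- data.frame(price = c(2.5, 8, 10, 7))

# Add a column using [[ ]] with the name as a string.
df[["quantity"]] <- c(10, 35, 40, 5)

# Derive a new column from two existing columns.
df$total <- df$price * df$quantity
df
```

The `[[` form is convenient when the column name is stored in a variable.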


In [26]:
quantity <- c(10, 35, 40,50)


In [27]:
# Overwrite the `quantity` column of `df` with the updated vector
df$quantity <- quantity

In [29]:
# Select the column ID
df$ID

In [31]:
# Select the rows where price is greater than 5
subset(df, subset = price > 5)

Unnamed: 0,ID,items,store,price,quantity
2,20,pen,False,8,35
3,30,textbook,True,10,40
4,40,pencil_case,False,7,50
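
`subset()` also accepts a `select` argument to keep only some columns, and the same filter can be written with bracket indexing. The sketch below compares the two forms on a stand-in data frame redefined here so the example runs on its own.

```r
# Small stand-in data frame.
df <- data.frame(
  ID = c(10, 20, 30, 40),
  price = c(2.5, 8, 10, 7),
  quantity = c(10, 35, 40, 50)
)

# Filter rows and keep two columns with subset().
via_subset <- subset(df, subset = price > 5, select = c(ID, price))

# The same selection with bracket indexing.
via_index <- df[df$price > 5, c("ID", "price")]

via_subset
all.equal(via_subset, via_index)
```

One difference worth keeping in mind: `subset()` drops rows where the condition is `NA`, whereas bracket indexing with a logical vector containing `NA` returns rows filled with `NA`.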
