# R Dataframe
> Create, Append, Select, Subset

This note is based on [this tutorial](https://www.guru99.com/r-data-frames.html).

## What is a Data Frame?

A **data frame** is a list of vectors which are of equal length. A matrix contains only one type of data, while a data frame accepts different data types (numeric, character, factor, etc.).

## Create a dataframe
We can create a data frame by passing the variables a, b, c and d to the `data.frame()` function. The columns take the names of the variables by default, and we can rename them afterwards with `names()`.
```
data.frame(df, stringsAsFactors = TRUE)
```

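Arguments:

* df: a matrix to convert to a data frame, or a collection of variables to join
* stringsAsFactors: whether to convert character vectors to factors. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward; the outputs in this note show the conversion switched on.

We can create our first data set by combining four variables of the same length.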

In [1]:
# Create a, b, c, d variables
a <- c(10,20,30,40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5, 8, 10, 7)

# Join the variables to create a data frame
# (each argument's variable name becomes the column name)
df <- data.frame(a, b, c, d)
df

a,b,c,d
<dbl>,<fct>,<lgl>,<dbl>
10,book,True,2.5
20,pen,False,8.0
30,textbook,True,10.0
40,pencil_case,False,7.0


The column headers have the same names as the variables. We can change the column names with the function `names()`.

Check the example below:

In [3]:
# Rename the columns of the data frame
names(df) <- c('ID','items','store','price')
df

ID,items,store,price
<dbl>,<fct>,<lgl>,<dbl>
10,book,True,2.5
20,pen,False,8.0
30,textbook,True,10.0
40,pencil_case,False,7.0
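
As an alternative to renaming afterwards, the columns can be named at creation time by passing `name = vector` pairs to `data.frame()`. A minimal sketch reusing the same values; the name `df_named` is only for illustration and does not appear in the original tutorial:

```r
# Same values as above, with the column names set at creation
df_named <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c('book', 'pen', 'textbook', 'pencil_case'),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)
names(df_named)
```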


In [4]:
# Print the structure
str(df)

'data.frame':	4 obs. of  4 variables:
 $ ID   : num  10 20 30 40
 $ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
 $ store: logi  TRUE FALSE TRUE FALSE
 $ price: num  2.5 8 10 7


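In R versions before 4.0.0, `data.frame()` converts character vectors to factors by default, which is why `items` is a factor here. From R 4.0.0 onward the default is `stringsAsFactors = FALSE`, so `items` stays a character column unless `stringsAsFactors = TRUE` is passed.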
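
Because the default depends on the R version, passing `stringsAsFactors` explicitly makes the result reproducible. A small sketch; the names `df_fct` and `df_chr` are illustrative only:

```r
items <- c('book', 'pen', 'textbook', 'pencil_case')

# Force conversion to factor (the default before R 4.0.0)
df_fct <- data.frame(items, stringsAsFactors = TRUE)
# Keep the strings as character (the default since R 4.0.0)
df_chr <- data.frame(items, stringsAsFactors = FALSE)

class(df_fct$items)
class(df_chr$items)
levels(df_fct$items)  # the distinct values, stored as factor levels
```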

## Slice dataframe

> ```[ROWS,COLUMNS ]```

It is possible to slice a data frame. We put the rows and columns to return inside brackets, preceded by the name of the data frame.

A data frame is composed of rows and columns, `df[A, B]`: A selects the rows and B the columns. We can specify rows, columns, or both; leaving one side empty returns all rows or all columns.

In [9]:
df[2:3,]

,ID,items,store,price
,<dbl>,<fct>,<lgl>,<dbl>
2,20,pen,False,8
3,30,textbook,True,10


In [8]:
df[2:4,c("ID","price")]

,ID,price
,<dbl>,<dbl>
2,20,8
3,30,10
4,40,7
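
A few more slicing patterns follow from the same `df[ROWS, COLUMNS]` rule. The sketch below rebuilds the data frame so that it runs on its own. Note that selecting a single column this way returns a plain vector unless `drop = FALSE` is given.

```r
df <- data.frame(ID    = c(10, 20, 30, 40),
                 items = c('book', 'pen', 'textbook', 'pencil_case'),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7))

df[, 1:2]                    # all rows, first two columns
df[1, ]                      # first row, all columns
df[-1, ]                     # every row except the first
df[, "price"]                # single column: returns a numeric vector
df[, "price", drop = FALSE]  # single column kept as a data frame
```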


## Append a column to data frame  

You can also append a column to a data frame by assigning a vector to a new column name with the `$` operator.

In [10]:
# Create a new vector
quantity <- c(10, 35, 40, 5)

In [11]:
df$quantity <- quantity

In [12]:
df

ID,items,store,price,quantity
<dbl>,<fct>,<lgl>,<dbl>,<dbl>
10,book,True,2.5,10
20,pen,False,8.0,35
30,textbook,True,10.0,40
40,pencil_case,False,7.0,5


In [15]:
df$remark<-c('good','good','medium','not good')

In [16]:
df

ID,items,store,price,quantity,remark
<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
10,book,True,2.5,10,good
20,pen,False,8.0,35,good
30,textbook,True,10.0,40,medium
40,pencil_case,False,7.0,5,not good
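
A new column can also be computed from existing ones, since arithmetic on columns works element by element. A minimal sketch on a reduced copy of the data; the column name `total` is an illustrative addition, not part of the original tutorial:

```r
df <- data.frame(ID       = c(10, 20, 30, 40),
                 price    = c(2.5, 8, 10, 7),
                 quantity = c(10, 35, 40, 5))

# Row-wise product of two existing columns
df$total <- df$price * df$quantity
df
```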


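The new vector must have one value per row of the data frame. The cell below fails for a different reason, though: the trailing comma leaves the fourth argument of `c()` empty, so the error is raised before the assignment is attempted.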

In [17]:
df$remark<-c('good','good','medium',)

ERROR: Error in c("good", "good", "medium", ): argument 4 is empty
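
To see the row-count check itself, the sketch below assigns a three-element vector to a four-row data frame and captures the error with `tryCatch()`. The exact wording of the message may differ between R versions, so it is printed rather than assumed. The `currency` column is a hypothetical example showing that a single value is recycled to every row.

```r
df <- data.frame(ID = c(10, 20, 30, 40), price = c(2.5, 8, 10, 7))

# Three values for four rows: the assignment is rejected
msg <- tryCatch({
  df$remark <- c('good', 'good', 'medium')
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# A single value is recycled to every row
df$currency <- 'USD'
df
```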


## Select a column of data frame

In [18]:
 # Select the column ID
df$ID
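
`$` returns the column as a vector. Two related forms are worth knowing: `[[ ]]` also returns a vector and accepts the column name as a string, while single brackets return a one-column data frame. A short sketch on a reduced copy of the data; the variable `col` is illustrative:

```r
df <- data.frame(ID = c(10, 20, 30, 40), price = c(2.5, 8, 10, 7))

df$ID        # numeric vector
df[["ID"]]   # same vector, with the name given as a string
df["ID"]     # one-column data frame

col <- "price"  # [[ ]] also works with a name stored in a variable
df[[col]]
```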

## Subset a dataframe
In the previous section, we selected an entire column without any condition. It is also possible to keep only the rows for which **a certain condition is true**.

We use the `subset()` function.

```r
subset(x, condition)
```

Arguments:

* x: the data frame to subset
* condition: a logical expression evaluated on the columns of x; rows where it is TRUE are kept. The argument is named `subset`, so `subset = price > 5` and a bare `price > 5` are equivalent.
In [19]:
subset(df, subset = price >5)

,ID,items,store,price,quantity,remark
,<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
2,20,pen,False,8,35,good
3,30,textbook,True,10,40,medium
4,40,pencil_case,False,7,5,not good


In [20]:
subset(df, subset = store)

,ID,items,store,price,quantity,remark
,<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
1,10,book,True,2.5,10,good
3,30,textbook,True,10.0,40,medium


In [21]:
subset(df, quantity>30)

,ID,items,store,price,quantity,remark
,<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
2,20,pen,False,8,35,good
3,30,textbook,True,10,40,medium


Conditions can be combined with `&` (and) and `|` (or):

In [24]:
subset(df, (quantity >30)&(store))

,ID,items,store,price,quantity,remark
,<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
3,30,textbook,True,10,40,medium


In [25]:
subset(df, (quantity >40)|(store))

,ID,items,store,price,quantity,remark
,<dbl>,<fct>,<lgl>,<dbl>,<dbl>,<chr>
1,10,book,True,2.5,10,good
3,30,textbook,True,10.0,40,medium
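
`subset()` also takes a `select` argument to keep only some columns, and the same row filter can be written with bracket indexing and a logical vector. A minimal sketch on a rebuilt copy of the data; the names `high_qty` and `in_store` are illustrative:

```r
df <- data.frame(ID       = c(10, 20, 30, 40),
                 items    = c('book', 'pen', 'textbook', 'pencil_case'),
                 store    = c(TRUE, FALSE, TRUE, FALSE),
                 price    = c(2.5, 8, 10, 7),
                 quantity = c(10, 35, 40, 5))

# Rows with quantity > 30, keeping only two columns
high_qty <- subset(df, quantity > 30, select = c(items, price))
high_qty

# Same kind of row filter written with logical indexing
in_store <- df[df$quantity > 30 & df$store, ]
in_store
```

Inside `subset()` the column names can be used directly, while bracket indexing needs the `df$` prefix.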
