This repository goes through the basics of R, exploring the key strengths of the language while analyzing several datasets.
- Command-line Tools
- R-basics
- Objects, Vectors and Classes
- Flow of Control
- Sort, Order and Rank
- Dataset tools
- Graph functions
- Data Wrangling tools
Run an R script from the console
Rscript filename.r
Open the R shell in the console
R
Open help for a function (1)
help(functionName)
Open help for a function (2)
?functionName
Show all the objects present in the workspace
ls()
Input a string from user
inp = readline(prompt = "optional prompt text")
Cast into integer
num = as.integer(inp)
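A minimal sketch of the cast, assuming the user typed the text "42". The literal string stands in for the result of readline, which only waits for input in an interactive session:

```r
inp <- "42"                 # stand-in for the value readline() would return
num <- as.integer(inp)      # 42 as an integer
class(inp)                  # "character"
class(num)                  # "integer"

# Text that is not a number becomes NA (R also emits a warning, silenced here)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                  # TRUE
```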
Download and install a package
install.packages("package_name")
Load a package into your script
library(package_name)
Return the class of an object
class(objectName)
Return the length of an object
length(objectName)
Return TRUE if two objects are identical; FALSE otherwise
identical(object1, object2)
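As a quick illustration of these three functions, here is a sketch on a made-up numeric vector (the variable name v is only for this example):

```r
v <- c(1.5, 2, 3)
v_class <- class(v)     # "numeric"
v_len <- length(v)      # 3

same <- identical(c(1, 2), c(1, 2))  # TRUE: same type and same values
diff_type <- identical(1L, 1)        # FALSE: integer 1L is not identical to double 1
```

Note that identical compares types as well as values, which is why the last call is FALSE even though 1L == 1 is TRUE.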
Create a vector with 3 entries (replace the entries with values or variables; works with any number of entries)
vec = c(entry1, entry2, entry3)
Create a vector with integers from n to m
n:m # one way to do it
seq(n, m) # another way to do it
seq(n, m, jump) # go from n towards m in steps of jump
seq(n, m, length.out = l) # l evenly spaced elements from n to m
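The four forms above can be compared side by side. This is a small sketch with arbitrary endpoints chosen for illustration:

```r
a <- 3:7                          # 3 4 5 6 7
b <- seq(3, 7)                    # 3 4 5 6 7
stepped <- seq(1, 10, 3)          # 1 4 7 10
spread <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```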
Access elements of a vector
vec[n] # access element number n
vec[c(n, m)] # access elements number n and m; works with any number of indices, including index vectors built with n:m or seq()
vec["name"] # access the element with the specified name (in a named vector or table)
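A short sketch of the three access styles, assuming a hypothetical named vector of exam scores:

```r
scores <- c(math = 90, physics = 75, art = 82)

second <- scores[2]            # physics: 75
first_and_third <- scores[c(1, 3)]  # math: 90, art: 82
last_two <- scores[2:3]        # physics: 75, art: 82
by_name <- scores["art"]       # art: 82
```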
Return a table with frequency of each entry in a vector
table(vec)
Return the index of the (first) minimum of a numeric vector
which.min(vector_name)
Return the index of the (first) maximum of a numeric vector
which.max(vector_name)
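For example, with two made-up vectors (the names votes and temps are only illustrative):

```r
votes <- c("a", "b", "a", "c", "a", "b")
counts <- table(votes)      # a: 3, b: 2, c: 1

temps <- c(21.5, 18.0, 25.3, 18.0)
coldest <- which.min(temps) # 2 (the first of the two tied minima)
hottest <- which.max(temps) # 3
```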
Arithmetic operations with two vectors (extendible to more than two)
vector1 + vector2 # adds element-wise and returns the sum vector
vector1 - vector2 # subtracts element-wise and returns the difference vector
vector1 * vector2 # multiplies element-wise and returns the product vector
vector1 / vector2 # divides vectors element-wise and returns it
vector1 %% vector2 # divides vectors element-wise and computes the remainder; returns it
vector1 ^ n # returns vector1 with all its elements raised to power n
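The element-wise behaviour is easiest to see on concrete numbers. A sketch with two arbitrary vectors of equal length:

```r
v1 <- c(10, 20, 30)
v2 <- c(3, 4, 5)

total <- v1 + v2      # 13 24 35
gap <- v1 - v2        # 7 16 25
product <- v1 * v2    # 30 80 150
ratio <- v1 / v2      # 3.333333 5 6
remainder <- v1 %% v2 # 1 0 0
squared <- v1 ^ 2     # 100 400 900
```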
Logical operations on numbers and vectors
num1 & num2 # TRUE if both != 0; FALSE otherwise
num1 | num2 # TRUE if any one != 0; FALSE otherwise
num1 == num2 # TRUE if num1 = num2; FALSE otherwise
vector1 < n # returns a vector with value TRUE for those elements of vector1 < n and FALSE everywhere else
Other logical operations: <=, >, >=, !=, !
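A brief sketch of these operators on an arbitrary vector and a pair of numbers:

```r
x <- c(2, 7, 4, 9)

below_five <- x < 5       # TRUE FALSE TRUE FALSE
not_four <- x != 4        # TRUE TRUE FALSE TRUE
negated <- !(x < 5)       # FALSE TRUE FALSE TRUE

both <- 3 & 0             # FALSE: 0 counts as FALSE
either <- 3 | 0           # TRUE: 3 is non-zero
```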
Conditionals
if (condition) {
# statements
} else {
# alternate statements
}
Conditional functions
ifelse(condition, value_if_true, value_if_false) # works on vectors as well, choosing a value element by element
any(logical_vector) # returns TRUE if any element of logical_vector is TRUE; returns FALSE otherwise
all(logical_vector) # returns TRUE if all elements of logical_vector are TRUE; returns FALSE otherwise
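As a sketch of the three conditional functions, using an arbitrary numeric vector:

```r
x <- c(-2, 5, 0, 8)

labels <- ifelse(x > 0, "positive", "non-positive")
# "non-positive" "positive" "non-positive" "positive"

has_large <- any(x > 6)   # TRUE: 8 is greater than 6
all_pos <- all(x > 0)     # FALSE: -2 and 0 are not greater than 0
```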
Define a function
funcName = function(parameters) {
# statements
}
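A small sketch that combines a function definition with a conditional. The function name describe_number is hypothetical and exists only for this example:

```r
# Returns "even" or "odd" for a single whole number.
# The value of the last evaluated expression is the return value.
describe_number <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

four <- describe_number(4)   # "even"
seven <- describe_number(7)  # "odd"
```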
For loop
for (var in vector) {
# statements
}
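For instance, a loop that sums the squares of 1 to 5 (the numbers are arbitrary):

```r
total <- 0
for (i in 1:5) {
  total <- total + i^2   # add 1, 4, 9, 16, 25 in turn
}
total  # 55
```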
Return sorted vector
sort(vector_name)
Return the vector containing indices required to sort the vector
order(vector_name)
Return the vector containing indices required to sort the vector, in descending order
order(vector_name, decreasing=TRUE)
Return the rank of each element, i.e. the position it would occupy in the sorted vector
rank(vector_name)
To illustrate these concepts, consider the following example.
items | sort(items) | order(items) | rank(items) | order(items, decreasing=TRUE) |
---|---|---|---|---|
31 | 4 | 2 | 3 | 4 |
4 | 15 | 3 | 1 | 5 |
15 | 31 | 1 | 2 | 1 |
92 | 65 | 5 | 5 | 3 |
65 | 92 | 4 | 4 | 2 |
Note. Indexing in R begins from 1.
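The columns of the table can be reproduced directly in R. This sketch uses the same five values as the items column:

```r
items <- c(31, 4, 15, 92, 65)

sorted <- sort(items)                       # 4 15 31 65 92
ord <- order(items)                         # 2 3 1 5 4
rnk <- rank(items)                          # 3 1 2 5 4
ord_desc <- order(items, decreasing = TRUE) # 4 5 1 3 2

# Indexing with order() gives the same result as sort()
via_order <- items[ord]                     # 4 15 31 65 92
```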
Return indices of elements of first vector present in the second vector
match(c("a", "i", "f"), c("a", "e", "i", "o", "u")) # 1 3 NA
Return indices of those elements of a logical vector that have the value TRUE
which(c(TRUE, FALSE, FALSE, TRUE)) # 1 4
Return a logical vector that is TRUE where an element of the first vector is present in the second vector, and FALSE otherwise
c("a", "i", "f") %in% c("a", "e", "i", "o", "u") # TRUE TRUE FALSE
Display all datasets available
data()
Load a dataset
data(dataset_name)
Return structure of a data-frame
str(dataset_name)
Return the column names of a data-frame
names(dataset_name)
View the first 6 entries of a data-frame
head(dataset_name)
View the last 6 entries of a data-frame
tail(dataset_name)
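A sketch of these inspection tools, assuming mtcars, a dataset that ships with base R, as a stand-in for whatever dataset you are working with:

```r
data(mtcars)                  # load the built-in dataset

str(mtcars)                   # prints the type and first values of every column
cols <- names(mtcars)         # "mpg" "cyl" "disp" ...

first_rows <- head(mtcars)    # first 6 rows
last_rows <- tail(mtcars)     # last 6 rows
```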
Access a column from a data-frame (1)
dataset_name$column_name
Access a column from a data-frame (2)
dataset_name[["column_name"]]
Return a new data-frame (table) with columns col1, col2, col3 (extendible to more or fewer columns)
data.frame(col_name_1 = col1, col_name_2 = col2, col_name_3 = col3)
Find the NA's in a dataset; returns a logical vector (or a logical matrix, for a data-frame) with TRUE at positions holding an NA and FALSE otherwise
is.na(dataset_name)
Return number of rows in a dataset (excluding the header)
nrow(dataset_name)
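A sketch that builds a small data-frame and applies the functions above. The students table and its values are made up for illustration:

```r
students <- data.frame(
  name = c("Ann", "Bo", "Cy"),
  score = c(90, NA, 75)
)

cols <- names(students)                # "name" "score"
by_dollar <- students$score            # 90 NA 75
by_brackets <- students[["score"]]     # same column as above
score_is_na <- is.na(students$score)   # FALSE TRUE FALSE
n <- nrow(students)                    # 3
```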
Draw a scatter-plot between two quantities
plot(x_quantity, y_quantity)
Plot a histogram showing the frequencies of a quantity
hist(quantity)
Draw a boxplot of the murder rates for each region in the murders dataset
boxplot(rate ~ region, data = murders)
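The three plotting calls can be tried without any extra data. This sketch assumes the built-in mtcars dataset as a stand-in and draws to a null device, so it also runs in a script with no display:

```r
pdf(NULL)   # null graphics device: nothing is written to disk

plot(mtcars$wt, mtcars$mpg)              # scatter-plot of weight against mileage
h <- hist(mtcars$mpg)                    # histogram of mileage
b <- boxplot(mpg ~ cyl, data = mtcars)   # one box of mileage per cylinder count

invisible(dev.off())                     # close the device
```

In an interactive session, leave out the pdf(NULL) and dev.off() lines to see the plots in a window.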
Data wrangling uses the dplyr package.
Open the R console and enter:
install.packages("dplyr")
Now, in your R script file, load dplyr into your script:
library(dplyr)
Return a new table with an added column
mutate(tableName, newColumnName = newColumn)
Return a subtable containing columns col1, col2, col3 from table tab (extendible to more or fewer columns)
select(tab, col1, col2, col3)
Return those tuples from a table whose attributes satisfy a certain condition
filter(tableName, condition)
Use of the pipe operator (%>%): feed the result of one data-wrangling operation as the input to another
shapes %>% filter(type == 'quad') %>% select(len, breadth) # choose the rows of the shapes dataset that are quadrilaterals, and select their len and breadth columns
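A runnable sketch of a pipeline that chains filter, mutate and select. The shapes table and its values are made up here, and the area column is an illustrative addition:

```r
library(dplyr)

# Hypothetical data: two quadrilaterals and one triangle
shapes <- data.frame(
  type = c("quad", "tri", "quad"),
  len = c(4, 3, 6),
  breadth = c(2, 5, 3)
)

quads <- shapes %>%
  filter(type == "quad") %>%        # keep only the quadrilateral rows
  mutate(area = len * breadth) %>%  # add a computed column
  select(len, breadth, area)        # keep three columns, in this order
```

Note that filter needs the comparison operator ==; a single = would be read as a named argument rather than a condition.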