My personal notes to keep track of my progress as I learn Data Analysis in R online for the first time.

This repository goes through the basics of R and explores the key strengths of the language while analyzing several datasets.

Table of Contents

  1. Command-line Tools
  2. R-basics
    • Objects, Vectors and Classes
    • Flow of Control
    • Sort, Order and Rank
    • Indexing tools
  3. Dataset tools
  4. Graph functions
  5. Data Wrangling tools

Command-line Tools

Run R-script from the console
Rscript filename.r
Open R shell in the console
R
Open help for a function (1)
?function_name
Open help for a function (2)
help(function_name)
Show all the objects present in the workspace
ls()
Read a string typed by the user
inp = readline(prompt = "optional prompt text")
Convert the string into an integer
num = as.integer(inp)
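A small sketch of turning user input into a number. `to_int` is a hypothetical helper name; it relies on `as.integer()` returning `NA` (with a coercion warning) for text that is not a number. Also keep in mind that `readline()` only waits for input in an interactive session; under `Rscript` it returns an empty string straight away, which is why the prompt below is guarded by `interactive()`.

```r
# Hypothetical helper: convert text typed by the user into an integer.
# as.integer() gives NA (plus a coercion warning) for non-numeric text;
# the warning is suppressed so the caller can simply test is.na().
to_int <- function(txt) {
  suppressWarnings(as.integer(txt))
}

to_int("42")   # 42
to_int("3.9")  # 3 (the decimal part is dropped)
to_int("abc")  # NA

# Only prompt when a person is actually at the console
if (interactive()) {
  num <- to_int(readline(prompt = "Enter a whole number: "))
}
```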


Objects, Vectors and Classes

Download and install a package (needed only once per machine)
install.packages ("package_name")
Load a package into your script
library (package_name)
Return the class of an object
class(object)
Return the length of an object
length(object)
Return TRUE if two objects are identical; FALSE otherwise
identical (object1, object2)
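A quick illustration of `class()`, `length()` and `identical()`. The instructive case is the last pair: `c(1, 2, 3)` is stored as doubles while `1:3` is an integer vector, so they print the same and compare equal with `==`, but `identical()` reports FALSE because the types differ.

```r
x <- c(1, 2, 3)

class(x)             # "numeric" (stored as double)
class(1:3)           # "integer"
class("hello")       # "character"
length(x)            # 3

identical(x, c(1, 2, 3))  # TRUE
identical(x, 1:3)         # FALSE: same values, different types
x == 1:3                  # TRUE TRUE TRUE (== compares values only)
```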
Create a vector with 3 entries (replace the entries with values or variables; c() accepts any number of entries)
vec = c(entry1, entry2, entry3)
Create a vector with integers from n to m
n:m # one way to do it
seq(n,m) # another way to do it
seq(n,m,jump) # start at n and step towards m in increments of jump
seq(n,m,length.out=l) # l evenly spaced elements from n to m (both ends included)
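A few concrete calls, with the value each one returns in the comment. With a `jump`, the sequence stops at the last value that does not pass `m`; the final line shows that a negative step counts down.

```r
1:5                          # 1 2 3 4 5
seq(2, 10)                   # 2 3 4 5 6 7 8 9 10
seq(1, 10, 3)                # 1 4 7 10
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, -3)               # 10 7 4 1
```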
Access elements of a vector
vec[n] # access element number n
vec[c(n,m)] # access elements n and m; extends to more indices, and any of the sequences above (e.g. vec[n:m]) also works as an index
vec["name"] # access the element named "name" (works for named vectors and tables)
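The indexing forms above on a small example. The values and the names in `ages` are made up for illustration; note that indexing a named vector by name keeps the name attached to the result.

```r
vec <- c(10, 20, 30, 40, 50)
vec[2]          # 20
vec[c(1, 3)]    # 10 30
vec[2:4]        # 20 30 40 (a sequence used as an index)

# A named vector can also be indexed by name (hypothetical data)
ages <- c(alice = 31, bob = 25, carol = 47)
ages["bob"]                 # bob 25
ages[c("alice", "carol")]   # alice 31, carol 47
```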
Return a table with the frequency of each entry in a vector
table(vec)
Return the index of the (first) minimum of a vector
which.min(vec)
Return the index of the (first) maximum of a vector
which.max(vec)
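A short example of `table()`, `which.min()` and `which.max()`. When the minimum or maximum occurs more than once, these functions return the position of the first occurrence, and the returned index can be used to fetch the value itself.

```r
x <- c(3, 1, 4, 1, 5, 9, 2)

table(x)          # how many times each value occurs (1 appears twice)
which.min(x)      # 2: the first position holding the minimum
which.max(x)      # 6
x[which.max(x)]   # 9: use the index to fetch the value
```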
Arithmetic operations with two vectors (extendible to more than two)
vector1 + vector2 # adds element-wise and returns the sum vector
vector1 - vector2 # subtracts element-wise and returns the difference vector
vector1 * vector2 # multiplies element-wise and returns the product vector
vector1 / vector2 # divides element-wise and returns the quotient vector
vector1 %% vector2 # divides element-wise and returns the vector of remainders
vector1 ^ n # returns vector1 with all its elements raised to power n
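The element-wise operators on two concrete vectors. The last line shows R's recycling rule: when one vector is shorter, it is repeated to match the longer one (R warns if the longer length is not a multiple of the shorter).

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

a + b    # 11 22 33 44
b - a    # 9 18 27 36
a * b    # 10 40 90 160
b / a    # 10 10 10 10
b %% 3   # 1 2 0 1
a ^ 2    # 1 4 9 16

# The shorter vector is recycled: c(100, 200, 100, 200)
a + c(100, 200)  # 101 202 103 204
```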
Logical operations on numbers and vectors
num1 & num2 # TRUE if both != 0; FALSE otherwise
num1 | num2 # TRUE if any one != 0; FALSE otherwise
num1 == num2 # TRUE if num1 = num2; FALSE otherwise
vector1 < n # returns a logical vector: TRUE for elements of vector1 that are < n, FALSE everywhere else
Other logical operations: <=, >, >=, !=, !
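Comparisons on a vector return logical vectors, which can be combined with `&`, `|` and `!`. The last two lines show two common uses: a logical vector as an index (keeps the TRUE positions) and `sum()` of a logical vector (counts the TRUEs, since TRUE is treated as 1).

```r
v <- c(2, 7, 5, 10)

v < 6              # TRUE FALSE TRUE FALSE
v >= 5 & v != 10   # FALSE TRUE TRUE FALSE
!(v < 6)           # FALSE TRUE FALSE TRUE

v[v < 6]           # 2 5
sum(v < 6)         # 2
```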

Flow of Control

If-else statement
if (condition) {
  # statements
} else {
  # alternate statements
}
Conditional functions
ifelse(condition, value_if_true, value_if_false) # vectorized: checks the condition element by element and returns a vector of results
any(logical_vector) # returns TRUE if any element of logical_vector is TRUE; returns FALSE otherwise
all(logical_vector) # returns TRUE if all elements of logical_vector are TRUE; returns FALSE otherwise
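The difference between `ifelse()` and `if` in practice: `ifelse()` works element by element and returns a vector, whereas `if` expects a single TRUE/FALSE (recent versions of R raise an error for a longer condition). `any()` and `all()` are the usual way to collapse a vector condition for `if`.

```r
x <- c(-2, 0, 3)

ifelse(x > 0, "positive", "not positive")  # "not positive" "not positive" "positive"
any(x > 0)   # TRUE
all(x > 0)   # FALSE

# Collapse the vector condition to a single TRUE/FALSE before using if
if (any(x < 0)) {
  print("x contains a negative number")
} else {
  print("x has no negative numbers")
}
```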
Defining a function
funcName = function(parameters) {
  # statements
}
For loop
for (var in vector) {
  # statements
}
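A function and a for loop together. `sum_squares` is a hypothetical example; it uses `seq_len(n)` rather than `1:n` because `1:0` evaluates to `c(1, 0)`, which would make the loop run twice when `n` is 0. The last line shows that the same result can come from vector arithmetic without a loop.

```r
# Hypothetical example: 1^2 + 2^2 + ... + n^2
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {   # seq_len(0) is empty, so the loop is skipped
    total <- total + i^2
  }
  total                     # the last evaluated value is returned
}

sum_squares(3)     # 14
sum_squares(0)     # 0
sum((1:3)^2)       # 14, computed without a loop
```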

Sort, Order, and Rank

Return the sorted vector
sort(vector_name)
Return the vector of indices that would sort the vector
order(vector_name)
Return the vector of indices that would sort the vector in descending order
order(vector_name, decreasing=TRUE)
Return the rank of each element, i.e. the position it takes in the sorted vector
rank(vector_name)

To illustrate these concepts, consider the following example.

items   sort(items)   order(items)   rank(items)   order(items, decreasing=TRUE)
31      4             2              3             4
4       15            3              1             5
15      31            1              2             1
92      65            5              5             3
65      92            4              4             2

Note: indexing in R starts at 1, so the first element of items is items[1].
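The table can be reproduced directly. Indexing a vector with its own `order()` gives the same result as `sort()`, which is a handy way to reorder one vector (or a data-frame's rows) by another.

```r
items <- c(31, 4, 15, 92, 65)

sort(items)                       # 4 15 31 65 92
order(items)                      # 2 3 1 5 4
rank(items)                       # 3 1 2 5 4
order(items, decreasing = TRUE)   # 4 5 1 3 2

items[order(items)]               # 4 15 31 65 92, same as sort(items)
```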

Indexing tools

For each element of the first vector, return its position in the second vector (NA if it is absent)
match(c("a", "i", "f"), c("a", "e", "i", "o", "u")) # 1 3 NA
Return indices of those elements of a logical vector that have the value TRUE
which(c(TRUE, FALSE, FALSE, TRUE)) # 1 4
Return a logical vector: TRUE where an element of the first vector is present in the second vector, FALSE otherwise
c("a", "i", "f") %in% c("a", "e", "i", "o", "u") # TRUE TRUE FALSE
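The three indexing tools combined on the vowel example above. Wrapping `%in%` in `which()` turns the logical vector into positions, and negating it with `!` selects the elements that are not found.

```r
vowels <- c("a", "e", "i", "o", "u")
chars  <- c("a", "i", "f")

match(chars, vowels)          # 1 3 NA
chars %in% vowels             # TRUE TRUE FALSE
which(chars %in% vowels)      # 1 2: positions in chars that hold vowels
chars[!(chars %in% vowels)]   # "f": keep only the non-vowels
```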

Dataset tools

Display all available datasets
data()
Load a dataset
data(dataset_name)
Return the structure of a data-frame
str(df)
Return the column names of a data-frame
names(df)
View the first 6 rows of a data-frame
head(df)
View the last 6 rows of a data-frame
tail(df)
Access a column of a data-frame (1)
df$column_name
Access a column of a data-frame (2)
df[["column_name"]]
Return a new data-frame (table) with columns col1, col2, col3 (extends to more or fewer columns)
data.frame (col_name_1=col1, col_name_2=col2, col_name_3=col3)
Find the NA entries in a dataset; returns TRUE for positions holding an NA and FALSE otherwise
is.na(df)
Return the number of rows in a dataset (excluding the header)
nrow(df)
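The dataset tools above, tried on R's built-in `mtcars` dataset (used here in place of the course datasets, since it ships with base R) and on a small made-up data-frame with one missing value. The `na.rm = TRUE` call at the end is an extra, showing how to ignore NAs when summarizing.

```r
data(mtcars)            # built-in dataset from the datasets package

str(mtcars)             # 32 obs. of 11 variables, all numeric
names(mtcars)[1:3]      # "mpg" "cyl" "disp"
head(mtcars, 3)         # first 3 rows instead of the default 6
mtcars$mpg[1:3]         # 21.0 21.0 22.8
mtcars[["cyl"]][1:3]    # 6 6 4
nrow(mtcars)            # 32

# Hypothetical data-frame with one missing score
scores <- data.frame(name = c("ann", "ben", "cal"), score = c(90, NA, 75))
is.na(scores$score)               # FALSE TRUE FALSE
sum(is.na(scores))                # 1: total number of NAs in the data-frame
mean(scores$score, na.rm = TRUE)  # 82.5, ignoring the NA
```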

Graph functions

Draw a scatter-plot between two quantities
plot (x_quantity, y_quantity)
Plot a histogram showing the frequencies of a quantity
hist (quantity)
Plot boxplots of the murder rate for each region in the murders dataset (assumes a rate column has already been added to murders)
boxplot (rate~region, data = murders)
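The three plot types on the built-in `mtcars` dataset, chosen here only because it needs no extra package. The boxplot uses the same `y ~ group` formula as the murders example; under `Rscript` the plots are written to `Rplots.pdf` instead of a window.

```r
# Scatter plot: car weight against fuel economy
plot(mtcars$wt, mtcars$mpg, xlab = "weight (1000 lbs)", ylab = "miles per gallon")

# Histogram of fuel economy
hist(mtcars$mpg, main = "Fuel economy", xlab = "miles per gallon")

# One box per cylinder count, same formula style as rate~region
boxplot(mpg ~ cyl, data = mtcars, xlab = "cylinders", ylab = "miles per gallon")
```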

Data Wrangling tools

This section uses the dplyr package.

Setting up dplyr onto your system

Open the R console and enter:
install.packages("dplyr")
Then load dplyr in your R script:
library(dplyr)

Using dplyr tools

Return a new table with an added column
mutate(tableName, newColumnName = newColumn)
Return a subtable containing columns col1, col2, col3 from table tab (extends to more or fewer columns)
select(tab, col1, col2, col3)
Return the rows of a table that satisfy a given condition on their attributes
filter(tableName, condition)
Use of the pipe operator (%>%): feed the result of one data-wrangling operation in as the input to the next
shapes %>% filter(type == 'quad') %>% select(len, breadth) # choose the rows of the shapes dataset that are quadrilaterals, and select their len and breadth columns
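A sketch of the pipeline on a small, made-up `shapes` data-frame (its columns `type`, `len` and `breadth` are hypothetical, chosen to match the line above); it assumes dplyr is installed and also adds a `mutate()` step. Note that `filter()` needs `==` for comparison: with a single `=`, dplyr treats the condition as a named argument and stops with an error.

```r
library(dplyr)

# Made-up data for illustration only
shapes <- data.frame(
  type    = c("quad", "triangle", "quad"),
  len     = c(4, 3, 5),
  breadth = c(2, 3, 5)
)

result <- shapes %>%
  filter(type == "quad") %>%          # keep only the quadrilateral rows
  mutate(area = len * breadth) %>%    # add a computed column
  select(len, breadth, area)          # keep just these three columns

result   # two rows: len 4, breadth 2, area 8 and len 5, breadth 5, area 25

# The same steps without the pipe, written as nested calls
select(mutate(filter(shapes, type == "quad"), area = len * breadth), len, breadth, area)
```

The piped form reads top to bottom in the order the operations happen, which is why it is usually preferred over the nested version.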


