My personal notes to stay on track while I learn Data Analysis in R for the first time online.

rafi007akhtar/R-lab

R-lab

This repository goes through the basics of R, exploring the key strengths of the language while analyzing several datasets.

Table of Contents

  1. Command-line Tools
  2. R-basics
    • Objects, Vectors and Classes
    • Flow of Control
    • Sort, Order and Rank
  3. Dataset tools
  4. Graph functions
  5. Data Wrangling tools

Command-line Tools

Run R-script from the console
Rscript filename.r
Open R shell in the console
R
Open help for a function (1)
help(functionName)
Open help for a function (2)
?functionName
Show all the objects present in the workspace
ls()
Input a string from user
inp = readline(prompt = "optional prompt text")
Cast into integer
num = as.integer(inp)
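
A small sketch of how the conversion behaves. Fixed strings stand in for typed input here, since readline() only waits for the user in an interactive session; the variable names are illustrative.

```r
# Stand-ins for text a user might type at a readline() prompt
inp_ok  = "42"
inp_bad = "forty-two"

num_ok  = as.integer(inp_ok)                     # integer 42
num_bad = suppressWarnings(as.integer(inp_bad))  # NA, since the text is not a number

class(num_ok)    # "integer"
is.na(num_bad)   # TRUE
```

Checking the result with is.na() before using it is one way to guard against input that is not a number.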

R-basics

Objects, Vectors and Classes

Download a package
install.packages ("package_name")
Load a package into your script
library (package_name)
Return the class of an object
class(objectName)
Return the length of an object
length(objectName)
Return TRUE if two objects are identical; FALSE otherwise
identical (object1, object2)
Create a vector with 3 entries (replace the entries with values or variables; extendable to fewer or more entries)
vec = c(entry1, entry2, entry3)
Create a vector with integers from n to m
n:m # one way to do it
seq(n,m) # another way to do it
seq(n,m,jump) # go from n to m in steps of jump
seq(n,m,length.out=l) # l evenly spaced elements from n to m
Access elements of a vector
vec[n] # access element number n
vec[c(n,m)] # access elements number n and m; extendable to more elements; the index vector can also be built with n:m or seq()
vec["name"] # access the element with the given name (for a named vector or a table)
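
A short sketch putting the vector-building and indexing forms above together. The values and names are made up for illustration.

```r
vec = c(10, 20, 30, 40, 50)

1:5                          # 1 2 3 4 5
seq(2, 10, 2)                # 2 4 6 8 10
seq(1, 10, length.out = 4)   # 1 4 7 10

vec[2]         # 20
vec[c(1, 3)]   # 10 30
vec[2:4]       # 20 30 40

# Named vector: elements can be looked up by name (hypothetical names)
ages = c(alice = 30, bob = 25)
ages["bob"]    # 25, with the name "bob" attached
```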
Return a table with the frequency of each distinct entry in a vector
table(vector_name)
Return the index of the first minimum of a numeric vector
which.min(vector_name)
Return the index of the first maximum of a numeric vector
which.max(vector_name)
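
A minimal example of the three functions above, on made-up vectors. Note that which.min() and which.max() report only the first position when the extreme value is repeated.

```r
grades = c("b", "a", "c", "a", "b", "a")
counts = table(grades)   # a: 3, b: 2, c: 1
counts["a"]              # 3

temps = c(21, 17, 25, 17, 25)
which.min(temps)   # 2 (first occurrence of the minimum, 17)
which.max(temps)   # 3 (first occurrence of the maximum, 25)
```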
Arithmetic operations with two vectors (extendible to more than two)
vector1 + vector2 # adds element-wise and returns the sum vector
vector1 - vector2 # subtracts element-wise and returns the difference vector
vector1 * vector2 # multiplies element-wise and returns the product vector
vector1 / vector2 # divides element-wise and returns the quotient vector
vector1 %% vector2 # divides element-wise and returns the vector of remainders
vector1 ^ n # returns vector1 with all its elements raised to power n
Logical operations on numbers and vectors
num1 & num2 # TRUE if both != 0; FALSE otherwise
num1 | num2 # TRUE if any one != 0; FALSE otherwise
num1 == num2 # TRUE if num1 = num2; FALSE otherwise
vector1 < n # returns a vector with value TRUE for those elements of vector1 < n and FALSE everywhere else
Other logical operations: <=, >, >=, !=, !
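
The element-wise behaviour is easiest to see on two small vectors. The values below are arbitrary.

```r
v1 = c(10, 20, 30)
v2 = c(3, 4, 5)

v1 + v2    # 13 24 35
v1 * v2    # 30 80 150
v1 %% v2   # 1 0 0
v1 ^ 2     # 100 400 900

v1 < 25           # TRUE TRUE FALSE
v1 > 5 & v2 > 3   # FALSE TRUE TRUE
!(v1 == 20)       # TRUE FALSE TRUE
```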

Flow of Control

Conditionals
if (condition) {
  # statements
} else {
  # alternate statements
}
Conditional functions
ifelse(condition, value_if_true, value_if_false) # returns one of the two values depending on the condition; works element-wise on vectors as well
any(logical_vector) # returns TRUE if any element of logical_vector is TRUE; returns FALSE otherwise
all(logical_vector) # returns TRUE if all elements of logical_vector are TRUE; returns FALSE otherwise
Defining a function
funcName = function(parameters) {
  # statements
}
For loop
for (var in vector) {
  # statements
}
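
The pieces of this section can be combined in one small sketch. sum_evens is a hypothetical helper written for this example, not a built-in function.

```r
# Hypothetical helper: sum of the even numbers in a vector
sum_evens = function(nums) {
  total = 0
  for (n in nums) {
    if (n %% 2 == 0) {
      total = total + n
    }
  }
  total  # the last evaluated expression is the return value
}

nums = 1:6
sum_evens(nums)                         # 12
ifelse(nums %% 2 == 0, "even", "odd")   # "odd" "even" "odd" "even" "odd" "even"
any(nums > 5)                           # TRUE
all(nums > 5)                           # FALSE
```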

Sort, Order, and Rank

Return sorted vector
sort(vector_name)
Return the vector containing indices required to sort the vector
order(vector_name)
Return the vector containing indices required to sort the vector, in descending order
order(vector_name, decreasing=TRUE)
Return the rank of each item, that is, the position it would occupy in the sorted vector
rank(vector_name)

To illustrate these concepts, consider the following example.

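| items | sort(items) | order(items) | rank(items) | order(items, decreasing=TRUE) |
|-------|-------------|--------------|-------------|-------------------------------|
| 31    | 4           | 2            | 3           | 4                             |
| 4     | 15          | 3            | 1           | 5                             |
| 15    | 31          | 1            | 2           | 1                             |
| 92    | 65          | 5            | 5           | 3                             |
| 65    | 92          | 4            | 4           | 2                             |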

Note. Indexing in R begins from 1.
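
The example values can be reproduced directly in R. The last line is an addition to the example: it shows that indexing a vector by its own order() gives the same result as sort().

```r
items = c(31, 4, 15, 92, 65)

sort(items)                       # 4 15 31 65 92
order(items)                      # 2 3 1 5 4
rank(items)                       # 3 1 2 5 4
order(items, decreasing = TRUE)   # 4 5 1 3 2

# Indexing a vector by its own order() gives the sorted vector
items[order(items)]               # 4 15 31 65 92
```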

Indexing tools

Return, for each element of the first vector, its index in the second vector (NA if it is not present)
match(c("a", "i", "f"), c("a", "e", "i", "o", "u")) # 1 3 NA
Return indices of those elements of a logical vector that have the value TRUE
which(c(TRUE, FALSE, FALSE, TRUE)) # 1 4
Return a vector containing TRUE if element of first vector is present in second vector, and FALSE otherwise
c("a", "i", "f") %in% c("a", "e", "i", "o", "u") # TRUE TRUE FALSE
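
One common use of these tools is looking up values across two parallel vectors. A minimal sketch, with made-up fruit names and prices:

```r
# Hypothetical parallel vectors: fruits[i] costs prices[i]
fruits = c("apple", "banana", "cherry")
prices = c(1.2, 0.5, 3.0)

idx = match(c("cherry", "apple"), fruits)   # 3 1
prices[idx]                                  # 3.0 1.2

which(prices > 1)           # 1 3
fruits[which(prices > 1)]   # "apple" "cherry"
"kiwi" %in% fruits          # FALSE
```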

Dataset tools

Display all datasets available
data()
Load a dataset
data(dataset_name)
Return structure of a data-frame
str(dataset_name)
Return the column names of a data-frame
names(dataset_name)
View the first 6 entries of a data-frame
head(dataset_name)
View the last 6 entries of a data-frame
tail(dataset_name)
Access a column from a data-frame (1)
dataset_name$column_name
Access a column from a data-frame (2)
dataset_name[["column_name"]]
Return a new data-frame (table) with columns col1, col2, col3 (extendible to more or less columns)
data.frame (col_name_1=col1, col_name_2=col2, col_name_3=col3)
Check a dataset for NA's; returns a logical vector (or a logical matrix, for a data-frame) with TRUE for positions having an NA and FALSE otherwise
is.na(dataset_name)
Return number of rows in a dataset (excluding the header)
nrow(dataset_name)
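
A sketch of these tools on a tiny data-frame built by hand, so the results are easy to check. The column names and values are hypothetical.

```r
# Hypothetical data-frame with one missing score
df = data.frame(
  name  = c("a", "b", "c", "d"),
  score = c(90, NA, 75, 60)
)

names(df)      # "name" "score"
nrow(df)       # 4
head(df, 2)    # first two rows only
df$score       # 90 NA 75 60
df[["name"]]   # the name column

is.na(df$score)        # FALSE TRUE FALSE FALSE
sum(is.na(df$score))   # 1, the number of missing scores
```

Summing the result of is.na() works because TRUE counts as 1 and FALSE as 0.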

Graph functions

Draw a scatter-plot between two quantities
plot (x_quantity, y_quantity)
Plot a histogram showing the frequencies of a quantity
hist (quantity)
Draw a boxplot of the murder rates for each region in the murders dataset
boxplot (rate~region, data = murders)
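
A self-contained sketch of the three plotting calls. It uses a small made-up data-frame rather than the murders dataset, so the column names and values are assumptions. The pdf(NULL) line discards the drawings so the script runs without opening a window or writing a file.

```r
# Hypothetical data standing in for a real dataset
df = data.frame(
  region = c("north", "north", "north", "south", "south", "south"),
  rate   = c(1.2, 2.5, 1.8, 3.1, 4.0, 2.7),
  pop    = c(5, 12, 8, 10, 20, 9)
)

pdf(NULL)  # discard graphics output; remove this line to see the plots

plot(df$pop, df$rate)                   # scatter-plot of rate against pop
h = hist(df$rate)                       # histogram; also returns the bin counts
b = boxplot(rate ~ region, data = df)   # one box per region

invisible(dev.off())  # close the graphics device

sum(h$counts)   # 6, every observation falls in some bin
b$names         # "north" "south"
```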

Data Wrangling tools

Data wrangling here relies on the dplyr package.

Setting up dplyr onto your system

Open up the R console, and enter:
install.packages("dplyr")
Now, in your R script file, load dplyr
library(dplyr)

Using dplyr tools

Return a new table with added column
mutate(tableName, newColumnName = newColumn)
Return a subtable containing columns col1, col2, col3 from table tab (extendable to more or fewer columns)
select(tab, col1, col2, col3)
Return those rows of a table that satisfy a certain condition
filter(tableName, condition)
Use of pipe operator (%>%): feed in the result of one data-wrangle operation as an input to another data-wrangle operation
shapes %>% filter(type == 'quad') %>% select(len, breadth) # choose those rows from the shapes dataset that are quadrilaterals, and select the len and breadth columns from them
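
A runnable sketch of a pipeline chaining filter, select and mutate. It assumes dplyr is installed; the shapes table and its values are made up for illustration. Note that the comparison inside filter() needs ==, since a single = would be read as a named argument.

```r
library(dplyr)

# Hypothetical shapes table for illustration
shapes = data.frame(
  type    = c("quad", "tri", "quad"),
  len     = c(4, 3, 6),
  breadth = c(2, 5, 3)
)

quads = shapes %>%
  filter(type == "quad") %>%       # keep only the quadrilateral rows
  select(len, breadth) %>%         # keep only these two columns
  mutate(area = len * breadth)     # add a derived column

quads$area   # 8 18
```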
