# Exploring the Basics of R Programming

This chapter explores the basics of R programming using a series of examples of increasing complexity.

## Hello, World!

This is the canonical first program in any language. It's fairly straightforward.

In [211]:
print("Hello, World!")

[1] "Hello, World!"


If you just want the text without extras like the '\[1\]' and the quotes around the phrase, use the <code>cat()</code> function.

In [212]:
cat("Hello, World!")

Hello, World!

Use <code>print()</code> when debugging or displaying values in scripts (it shows indexing, structure, etc.), and <code>cat()</code> when you want plain output.
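As a small sketch of the difference (the variable `name` is illustrative and not part of the chapter's examples), note that `cat()` accepts several values, joins them with spaces, and does not end the line unless you add `"\n"` yourself:

```r
# cat() joins its arguments with a space and prints them without quotes.
# It does not end the line for you, so add "\n" explicitly.
name <- "World"
cat("Hello,", name, "\n")

# print() shows the value with its index prefix and quotes.
print(paste("Hello,", name))
```

Here `paste()` is used only to build a single string for `print()` to display.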

The '\[1\]' that you see printed before the output is the index of the first item on that line of the data structure. For "Hello, World!" it is not useful, but it can be very useful when printing more complex data structures like vectors.

In [213]:
# Assign numbers from 0 to 50 (inclusive) to y.
y <- 0:50
print(y) # print the value

 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
[51] 50


In the output above, the numbers in square brackets indicate the index of the first number that follows them on the line. For example, '\[1\]' indicates that the '0' that follows it is the element at index 1 in the vector y, i.e. <code>y[1] == 0</code>. Similarly, on the next two lines the index of the first element printed is 26 and 51 respectively, i.e. <code>y[26] == 25</code> and <code>y[51] == 50</code>. You can confirm this by inspecting the values of the vector y at those indices below.

In [214]:
y[1]
y[26]
y[51]

Why does <code>print()</code> do this? Because it makes it easy for a reader of the output to find the index of any element and to see how many elements were printed on each line.

## Comments
Note the text following the # on the first line of the code snippet that creates y above. Also note the text following the print(y) function call. These are comments. In R, a comment starts with the '<code>#</code>' (hash) sign and ends at the end of the line. One cool trick that programmers often use is to 'comment out' lines of code they don't want to execute but don't want to delete either.

In [215]:
# The following lines are commented out. Nothing is run.
# v <- 10
# print(v)

## The Assignment Operator
Notice that we've also introduced some new notation when creating the vector y. The first symbol, <code><-</code>, is the assignment operator. It assigns the value on the right to a variable on the left.

You can also use the <code>=</code> sign, which is very common in other programming languages. But if you come from a math background you will find <code><-</code> very intuitive.

In [216]:
y = 1:10
y

You can also assign values in the other direction, left to right, using <code>-></code>. This is uncommon and not recommended.

In [217]:
1:10 -> y
y

There are subtle footguns that arise when you use either '<code>=</code>' or '<code>-></code>' in R. The best policy is to stick to the standard '<code><-</code>' assignment operator.

## Vectors
The other new notation that we encountered when we created the vector y was <code>0:50</code>. This creates a vector of the whole numbers from 0 to 50 inclusive, i.e. all numbers from 0 to 50 including both 0 and 50 (a total of 51 numbers).

The '<code>:</code>' operator can also be used to create a descending sequence, for example from 50 down to 1.

In [218]:
y <- 50:1
print(y)

 [1] 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26
[26] 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


Notice that the first number in the vector is 50 and it goes down from there.
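One thing to watch with the '<code>:</code>' operator is that it is evaluated before arithmetic operators such as `-`. The sketch below uses an illustrative variable `n` (not from the chapter) to show how the two forms differ:

```r
n <- 5

# ':' is evaluated before '-', so this is (1:n) - 1.
a <- 1:n - 1
print(a)

# Parentheses give the sequence from 1 to n - 1.
b <- 1:(n - 1)
print(b)
```

When in doubt, adding parentheses around the end point makes the intent explicit.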

### Note on Immediate Mode Evaluation

You will notice that at times the code above assigns a value to a variable like y and then, instead of calling print, simply places the variable name on a line of its own. This is a simple way to make the R interpreter display the value of the variable.

It works in situations where you are doing something interactive and just want to quickly see the value of a variable or expression. But for more structured situations, like self-contained programs that run on their own, it's not recommended.

Here's an example. Without the <code>y</code>, the assignment operation produces no output.

In [219]:
y <- c(1, 2, 3)

Note that there is no output when you run the code. But putting the variable by itself on a line triggers an evaluation of the variable and displays its value.

In [220]:
y

Here <code>y</code> is treated as an expression. Any expression on a line by itself will be evaluated and its value displayed. For example:

In [221]:
7 + 5/2

In general, only rely on immediate mode evaluation when working interactively with code. For standalone scripts, use <code>print()</code> or <code>cat()</code> to display the values of variables.
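One reason for this advice is that automatic display only applies to expressions at the top level. A minimal sketch, using an illustrative `for` loop that is not part of the chapter's examples:

```r
# Inside a loop, a bare expression is evaluated but not displayed.
for (i in 1:3) {
  i * 2
}

# An explicit print() displays each value.
for (i in 1:3) {
  print(i * 2)
}
```

Only the second loop produces output, which is why scripts should call `print()` or `cat()` explicitly.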

### The c() function

Getting back to vectors, there are other ways to create a vector. The canonical way to do so is the <code>c()</code> function.

In [222]:
u <- c(1, 2, 3)
print(u)

[1] 1 2 3


In R, the c() function is one of the most fundamental functions.

### What it does

It combines values into a vector (the most basic data structure in R). The name c comes from "combine" (or sometimes described as "concatenate"). It reflects the function’s purpose: to combine multiple values into a single vector. This naming convention goes back to the S language (R’s predecessor), where <code>c()</code> was also used for this purpose.

The <code>c()</code> function can combine values of different kinds: numbers, strings, booleans, etc.

In [223]:
bv <- c(TRUE, FALSE, FALSE, FALSE, TRUE) # bv is a vector of boolean values.
print(bv)

# This is a vector of strings
sv <- c("New Delhi", "Islamabad", "Dhaka", "Kathmandu", "Beijing",
        "Kabul", "Colombo", "Thimphu", "Malé")
print(sv)

# Vector of floating point numbers
fv <- c(1.75, 3.2, -5.6, 0.33)
print(fv)

[1]  TRUE FALSE FALSE FALSE  TRUE
[1] "New Delhi" "Islamabad" "Dhaka"     "Kathmandu" "Beijing"   "Kabul"    
[7] "Colombo"   "Thimphu"   "Malé"     
[1]  1.75  3.20 -5.60  0.33


The <code>c()</code> function will only create vectors where all the elements are of the same 'type'. If you give it a mix of types, it will convert all of them to a common type. This might occasionally be surprising.

In [224]:
mv <- c(1, 2.5, TRUE, "Hello")
print(mv)

[1] "1"     "2.5"   "TRUE"  "Hello"


Notice that when printed, mv is just a vector of strings. You know that because all the values are enclosed in double quotes (which are used to denote a string in R).
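The conversion follows a fixed order, from the least to the most general type: logical, integer, double, character. The sketch below checks this with `typeof()`. The literal `2L` is R's notation for an integer, and the values themselves are illustrative:

```r
# When types are mixed, c() picks the most general type present:
# logical -> integer -> double -> character.
print(typeof(c(TRUE, 2L)))      # logical + integer
print(typeof(c(2L, 2.5)))       # integer + double
print(typeof(c(2.5, "Hello")))  # double + character

# TRUE becomes 1 and FALSE becomes 0 when combined with numbers.
print(c(TRUE, FALSE, 10))
```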

In [225]:
fv <- c(1, 2.5, -3/2)
print(fv)

[1]  1.0  2.5 -1.5


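In this case fv is a vector of floating point numbers, even though the first element is written as a whole number. (In R a bare literal such as <code>1</code> is already stored as a floating point number; an integer literal is written <code>1L</code>.) This is normally fine, but it may have unforeseen performance implications when you manipulate the vector: a floating point value takes 8 bytes while an integer takes 4, so a floating point vector needs twice the memory. For large vectors (100 million or more numbers on computers of 2025) this can be a significant cost.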
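To check how R actually stores a value, `typeof()` and `object.size()` can help. The following is a rough sketch rather than a benchmark. The vector length `n` is an arbitrary choice, and the exact sizes reported may vary between R versions:

```r
print(typeof(1))    # a bare number literal is a double
print(typeof(1L))   # the L suffix makes an integer
print(typeof(1:3))  # the ':' operator produces integers here

# Compare the memory used by one million integers and one million doubles.
n <- 1e6
int_size <- object.size(integer(n))
dbl_size <- object.size(numeric(n))
print(int_size)
print(dbl_size)
```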

### The seq function

Another way to create vectors is to use the <code>seq()</code> function.
This is the basic syntax:
<code>seq(from, to, by, length.out, along.with)</code>

* <code>from</code> → starting value of the sequence.
* <code>to</code> → ending value of the sequence.
* <code>by</code> → increment (or decrement if negative).
* <code>length.out</code> → desired length of the sequence (R will adjust the step size automatically).
* <code>along.with</code> → makes a sequence of the same length as another object.

In [226]:
# Unlike 1:10, which always increments or decrements in steps of 1,
# seq allows you to change the step size
v <- seq(from=1, to=10, by=2)
print(v)
v <- try(seq(from=10, to=1, by=2)) # Note the error. Must use a negative increment
print(v)

[1] 1 3 5 7 9
Error in seq.default(from = 10, to = 1, by = 2) : 
  wrong sign in 'by' argument
[1] "Error in seq.default(from = 10, to = 1, by = 2) : \n  wrong sign in 'by' argument\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in seq.default(from = 10, to = 1, by = 2): wrong sign in 'by' argument>


Note the error. When the R interpreter encounters an error, it stops executing the program at that point and prints an error message (unless the error is handled by the program itself). See the discussion of try/catch later.

When the code is included in a Jupyter notebook, any unhandled error like this will stop execution of all code further down the page. To avoid this, we wrap the erroneous call to <code>seq()</code> in the <code>try()</code> function. This is advanced stuff. Don't worry about it for now.

In [227]:
# Set by to -2. Now everything works as expected.
v <- seq(from=10, to=1, by=-2 )
print(v)

[1] 10  8  6  4  2


You can do some interesting things with <code>seq()</code>.

In [228]:
v <- seq(from=0, to=1, length.out=5) # 5 equally spaced floating point numbers between 0 and 1.
print(v)
x <- c(10, 20, 30, 40)
v <- seq(along.with = x) # Generates numeric labels (i.e. indices) for the values in x
print(x)
print(v)


[1] 0.00 0.25 0.50 0.75 1.00
[1] 10 20 30 40
[1] 1 2 3 4


Note that there are two ways to call a function like <code>seq()</code>. We can use argument labels like <code>from=0</code>, or call the function without any labels. In the latter case the values are assigned to the function's parameters in the order in which they are declared in the function's signature. In general, you should prefer using argument labels.
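A minimal sketch of the two calling styles, using illustrative variable names `a` and `b`. With labels the order of the arguments no longer matters:

```r
# Positional: values are matched to from, to, by in that order.
a <- seq(1, 10, 2)

# Named: the order no longer matters.
b <- seq(by = 2, to = 10, from = 1)

print(a)
print(b)
print(identical(a, b))
```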

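Also note that you SHOULD NOT use the assignment operator <code><-</code> when passing values to argument labels. Subtle errors arise: <code><-</code> inside a call creates a variable in your workspace and passes the value by position, ignoring the label you intended.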
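The following sketch illustrates one such subtle error. The variable names `good` and `bad` are illustrative. The second call looks like it sets `to` and `from`, but it actually passes 10 and 1 by position and leaves two stray variables behind:

```r
# Intended call: named arguments, so the order does not matter.
good <- seq(to = 10, from = 1)
print(good)

# With '<-' the names are NOT matched to the parameters. Instead, R creates
# variables called 'to' and 'from' in the workspace and passes the values
# by position, so 10 becomes the start and 1 becomes the end.
bad <- seq(to <- 10, from <- 1)
print(bad)
print(to)    # a stray variable now exists
print(from)
```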

You can get the length of a vector using the <code>length()</code> function.

In [229]:
v <- 1:50
lv <- length(v)
print(lv)

[1] 50


### The subscript operator
You can also manipulate individual elements in a vector using the subscript operator <code>[]</code>. You can assign values to a given position in a vector using the subscript operator as well.

In [230]:
third_value <- v[3]
print(third_value)
v[3] <- 21
print(v) # Notice the third element in line [1] of print's output is now 21.

[1] 3
 [1]  1  2 21  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50


### 1-based indexing
Note that R uses 1-based indexing, i.e. the first element in a collection is at index 1. This makes intuitive sense to most non-programmers, the audience R was originally targeted at. But you need to remember this, particularly if you are also going to be programming in other languages like Python, C, Java, etc., all of which use 0-based indexing (i.e. the first element of a collection in these languages is at index 0).
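A short sketch of how the subscript operator behaves with different index values. The vector here is illustrative. Note in particular that index 0 is not an error in R, and that a negative index means "everything except":

```r
v <- c(10, 20, 30, 40, 50)

print(v[1])          # first element
print(v[length(v)])  # last element
print(v[2:4])        # elements 2 through 4
print(v[c(1, 5)])    # first and last together
print(v[-1])         # a negative index drops that element
print(v[0])          # index 0 is not an error; it returns an empty vector
```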

### Inserting and Appending Values into Vectors
You can insert and append elements to a vector. When you do this, R actually creates a new vector with the elements you want. To append elements you can just use the <code>c()</code> function, or you can use the aptly named <code>append()</code> function. The latter is more flexible, as you can also use it to insert elements into a vector or even prepend elements to a vector.

In [231]:
v <- c(1, 2, 3)
v <- c(v, 4) # Append 4 to the vector.
print(v)

# You can also use the append function.
# The first argument is the vector to operate on, the second
# argument is the value to insert, and the after argument specifies
# the index after which to insert the value.
v <- append(v, 99, after=length(v))
print(v)

# You can use append to insert an element at a specific point
# in the vector.
v <- append(v, 21, after=2)
print(v)

# You can 'prepend' an element to the beginning of a vector
# using append.
v <- append(v, 11, after=0) # Here 0 means before the first element
print(v)

# You can insert multiple values in one go
v <- append(v, c(5, 7), after=3)
print(v)

# And you can splice one vector into another
u <- c(1, 1, 1)
v <- append(v, u, after=4)
print(v)

[1] 1 2 3 4
[1]  1  2  3  4 99
[1]  1  2 21  3  4 99
[1] 11  1  2 21  3  4 99
[1] 11  1  2  5  7 21  3  4 99
 [1] 11  1  2  5  1  1  1  7 21  3  4 99


### Vector Assignment and Copy-on-modify Semantics

Vectors can be assigned to other variables. R behaves as if the two variables have completely separate copies of the values. In reality, for performance reasons, R keeps only a single copy of the actual vector until one of the variables modifies it. At that point R creates a copy of the entire vector. You may want to keep this in mind if you are dealing with very large vectors.

In the code below u is called an *alias* of v, because it refers to the same block of memory under a different name. When you modify v, you trigger a copy operation that allows v to be safely modified without affecting u. If v is a very large vector, the seemingly innocuous modification that follows an assignment can prove to be very expensive. It's best to keep variable aliasing to a minimum.

In [232]:
v <- c(1, 2, 3)
u <- v
print(v)
print(u)

# You can modify u and v independently now.
v[2] <- 20
print(v)
print(u)


[1] 1 2 3
[1] 1 2 3
[1]  1 20  3
[1] 1 2 3


**[ADVANCED]** Sometimes, particularly in a complex program, you want to understand what is going on within the R interpreter in terms of copy-on-modify behaviour. To do this, use the <code>tracemem()</code> function.

In [233]:
v <- 1:10
tracemem(v) # Keep track of, and report changes to memory used by v.
v[3] <- 21
u <- v
v[4] <- 10
print(u) # R is smart: if you comment this line out, there will be no copy-on-modify.


tracemem[0x5917c6e5d510 -> 0x5917c7c61e18]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x5917c7c61e18 -> 0x5917c870ef88]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x5917c870ef88 -> 0x5917c983cdb8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatc

Vectors, as stated before, are a foundational data structure in R. This is because R is primarily used for analysis of large data sets, and having vectors as the primary data structure makes such analysis easier.

## Lists

Vectors are ordered collections of *homogeneous* items. All items in a vector must be of the same type. If you want a collection that holds items of different types, you need lists.

In [234]:
l <- list("cabbage", 23.5, 10)
print(l)

[[1]]
[1] "cabbage"

[[2]]
[1] 23.5

[[3]]
[1] 10
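To see the contrast with `c()`, the sketch below builds the same three values both ways and inspects the types. The variable names are illustrative, and `sapply()` is used here simply to apply `typeof()` to each element of the list:

```r
# c() coerces everything to a common type (here, character).
as_vector <- c("cabbage", 23.5, 10)
print(typeof(as_vector))

# list() keeps each item's own type.
as_list <- list("cabbage", 23.5, 10)
print(sapply(as_list, typeof))
```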



List elements can be labelled.

In [235]:
l <- list(item="cabbage", price=23.5, quantity=10)
print(l)

$item
[1] "cabbage"

$price
[1] 23.5

$quantity
[1] 10



You can of course have lists of vectors with more than one element.

In [236]:
l <- list(items=c("cabbage", "cauliflower", "broccoli"), prices=c(23.5, 15.1, 32.8), quantities=c(10, 30, 5))
print(l)

$items
[1] "cabbage"     "cauliflower" "broccoli"   

$prices
[1] 23.5 15.1 32.8

$quantities
[1] 10 30  5



Notice that <code>print()</code> prefixes each element of the list with a double square bracket label for lists without labels, and with $label_name for lists with labels. This tells the reader which element in the list they are looking at.

But the <code>print()</code> function's output format also hints at the underlying implementation of lists. Lists are built on top of vectors. In these examples they are collections of vectors: each element in the list is itself a vector. In the first two examples above each element is a vector of length 1; in the third, each element is a vector of length 3.

In [237]:
v <- l[1]
print(v)
e <- v[1]
print(e) # Still a one-element sub-list: single brackets on a list always return a list.

$items
[1] "cabbage"     "cauliflower" "broccoli"   

$items
[1] "cabbage"     "cauliflower" "broccoli"   



You can access an individual element in a list using the double-bracket subscripting operator.

In [238]:
v <- l[[1]]
print(v)
print(typeof(v)) # character means it's a vector of strings.

# If you want the first element of the element in the list
e <- l[[1]][1]
print(e)

[1] "cabbage"     "cauliflower" "broccoli"   
[1] "character"
[1] "cabbage"


You can also use the single-bracket operator. But this returns a sub-list, not the vector element of the list. You can see that by looking at the type of the object returned with the <code>typeof()</code> function, which prints 'list' in this case.

In [239]:
sl <- l[1]
print(sl)
print(typeof(sl)) # sl is a list not a vector.

$items
[1] "cabbage"     "cauliflower" "broccoli"   

[1] "list"


You can also access elements of a list by name either with the double-bracket operator or the $ operator. The latter is a shorthand for the double-bracket notation.

In [240]:
# Using the name of the element and the double-bracket notation
v <- l[["items"]]
print(v)

# Using the $ operator.
v <- l$items
print(v)
# Get the second element
e <- v[2]
print(e)
# Or more directly
e <- l$items[2]
print(e)

# Or with the double-bracket operator
e <- l[["items"]][2]
print(e)

[1] "cabbage"     "cauliflower" "broccoli"   
[1] "cabbage"     "cauliflower" "broccoli"   
[1] "cauliflower"
[1] "cauliflower"
[1] "cauliflower"
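One subtlety with the `$` operator on lists is that it accepts a unique prefix of an element's name, whereas the double-bracket operator requires the exact name by default. The sketch below uses a shortened version of the list from above to show the difference:

```r
l <- list(items = c("cabbage", "cauliflower", "broccoli"),
          prices = c(23.5, 15.1, 32.8))

print(l$item)        # partial match: returns the items vector
print(l[["item"]])   # exact match required: returns NULL
print(l[["items"]])  # the full name works with both operators
```

Spelling names out in full avoids surprises if another element with a similar name is added later.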


## Data Frames

Data frames are essentially lists in which all the elements are of equal length.

In [241]:
df <- data.frame(
  id=c(1, 2, 3),
  name=c('Akbar', 'Bharat', 'Chetna'),
  passed=c(TRUE, FALSE, TRUE)
)
print(typeof(df)) # It's just a list.

# The print method is smart about formatting data frames.
# It does it differently than it does lists.
print(df)

[1] "list"
  id   name passed
1  1  Akbar   TRUE
2  2 Bharat  FALSE
3  3 Chetna   TRUE


First, the name. The dot in the name does not mean anything; it's just part of the name of the function 'data.frame'. There isn't anything named data that frame is a part of. This can be confusing at first if you come from another programming language like Python, where the dot is the attribute operator.

Notice that unlike lists, the <code>print()</code> function formats the elements of a data frame as columns. So <code>df[["name"]]</code> will give you the second element, or column, of the list. This is just formatting though. There is no change to the memory layout.

In [242]:
# Get the name column of the frame:
names <- df[["name"]]
print(names)
# Or alternatively use the $ operator
names <- df$name
print(names)


[1] "Akbar"  "Bharat" "Chetna"
[1] "Akbar"  "Bharat" "Chetna"
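Because a data frame is a list of equal-length columns, the usual list functions work on it. The sketch below rebuilds the original three-column data frame so that it is self-contained, and uses `class()`, which reports the label R uses to decide how to print an object:

```r
df <- data.frame(
  id = c(1, 2, 3),
  name = c("Akbar", "Bharat", "Chetna"),
  passed = c(TRUE, FALSE, TRUE)
)

print(class(df))    # the class is what print() uses to pick the table layout
print(is.list(df))  # underneath, it is still a list
print(length(df))   # length of the list = number of columns
print(names(df))    # element names = column names
print(nrow(df))     # every column has this many elements
```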


R enhances the capabilities of the single-bracket operator <code>[]</code> to treat a data frame as a two-dimensional object. You can retrieve elements by row and column. You can get an entire row or column by omitting the column value or the row value respectively.

In [243]:
akbar_passed <- df[1, 3] # The first row, third column
print(akbar_passed)

akbar <- df[1,]
print(akbar)
print(typeof(akbar)) # Note that akbar is also a data frame (hence "list"), but with only one row.

names <- df[, 2]
print(names)
print(typeof(names)) # But this time, names is NOT a data frame, it's a vector of strings

[1] TRUE
  id  name passed
1  1 Akbar   TRUE
[1] "list"
[1] "Akbar"  "Bharat" "Chetna"
[1] "character"
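A few further variations on two-dimensional indexing, sketched under the assumption that you are working with the original three-column data frame (rebuilt here so the block is self-contained). The names `names_df` and `passed_only` are illustrative:

```r
df <- data.frame(
  id = c(1, 2, 3),
  name = c("Akbar", "Bharat", "Chetna"),
  passed = c(TRUE, FALSE, TRUE)
)

# drop = FALSE keeps the result as a one-column data frame.
names_df <- df[, 2, drop = FALSE]
print(class(names_df))

# Columns can also be selected by name instead of position.
print(df[1, "passed"])

# A logical vector in the row position keeps only the matching rows.
passed_only <- df[df$passed, ]
print(passed_only)
```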


You can add entire columns to a data frame, and you can replace columns in a data frame with other values.

In [244]:
df[["marks"]] <- c(80, 60, 90)
print(df) # df has a new column.

# You can replace the values in a column with new values.
df[["passed"]] <- c(TRUE, TRUE, TRUE)
print(df) # The passed column now holds the new values.

  id   name passed marks
1  1  Akbar   TRUE    80
2  2 Bharat  FALSE    60
3  3 Chetna   TRUE    90
  id   name passed marks
1  1  Akbar   TRUE    80
2  2 Bharat   TRUE    60
3  3 Chetna   TRUE    90


You can add rows using the <code>rbind()</code> function. rbind is short for 'row bind'.

In [245]:
# Add a row as a list. Note that the list *must* have
# all the columns of the data frame to which it is being added.
# You cannot have missing columns.
dharam <- list(id=4, name="Dharam", passed=FALSE, marks=25)
df <- rbind(df, dharam)
print(df)

# You can also add a row as a data frame.
# Note that the data frame being added *must* have
# all the columns of the data frame to which it is being added.
eknath <- data.frame(id=5, name="Eknath", passed=TRUE, marks=50)
df <- rbind(df, eknath)
print(df)

  id   name passed marks
1  1  Akbar   TRUE    80
2  2 Bharat   TRUE    60
3  3 Chetna   TRUE    90
4  4 Dharam  FALSE    25
  id   name passed marks
1  1  Akbar   TRUE    80
2  2 Bharat   TRUE    60
3  3 Chetna   TRUE    90
4  4 Dharam  FALSE    25
5  5 Eknath   TRUE    50


You can also use the dplyr library's <code>bind_rows()</code> function to bind rows. This is much more flexible, as you can add rows where some column values are missing. Notice that missing values are given the value NA, which is roughly R's equivalent of Python's None or Java's null.

In [246]:
library("dplyr")
feroz <- data.frame(name="Feroz", marks=75)
df <- bind_rows(df, feroz)
print(df)

  id   name passed marks
1  1  Akbar   TRUE    80
2  2 Bharat   TRUE    60
3  3 Chetna   TRUE    90
4  4 Dharam  FALSE    25
5  5 Eknath   TRUE    50
6 NA  Feroz     NA    75


Notice how we imported a library to use a function from it.
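Missing values such as the NA entries in Feroz's row propagate through calculations, so it helps to know how to detect and skip them. A minimal sketch with an illustrative `marks` vector (not the data frame column above):

```r
marks <- c(80, NA, 90)

print(is.na(marks))               # which values are missing
print(mean(marks))                # any NA makes the result NA
print(mean(marks, na.rm = TRUE))  # drop missing values first
```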