In [176]:
!pip install -q rpy2

In [177]:
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


# Regular expressions

A regular expression is a pattern used to search text, typically a string or a vector of strings. R provides two basic functions for this:

* grepl(), which returns a logical value (TRUE or FALSE) indicating whether the pattern was found

* grep(), which returns a vector of the index locations of the elements that match the pattern

For both functions, you pass in the pattern first and then the object you want to search.

In [178]:
%%R

text <- "Hi there, do you know who you are voting for?"

In [179]:
%%R

grepl('voting',text)

[1] TRUE


In [180]:
%%R

grepl('Hi',text)


[1] TRUE


In [181]:
%%R

grepl('Sammy',text)


[1] FALSE


In [182]:
%%R

vector <- c('a','b','c','d')


In [183]:
%%R

grep('a',vector)


[1] 1


In [184]:
%%R

grep('c',vector)


[1] 3
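
The cells above only search for literal text. As a minimal sketch of actual pattern matching, the block below uses a few common metacharacters (`^` for start of string, `[0-9]` for any digit). The `words` vector is hypothetical sample data, not part of the notebook above.

```r
# Hypothetical sample data for illustration
words <- c("apple", "banana", "cherry", "avocado", "1st place")

# TRUE for elements that start with "a"
starts_with_a <- grepl("^a", words)

# Index of every element containing "an"
an_index <- grep("an", words)

# value = TRUE returns the matching elements instead of their indices
an_value <- grep("an", words, value = TRUE)

# TRUE for elements that contain at least one digit
has_digit <- grepl("[0-9]", words)

# "." matches any character in a regular expression;
# fixed = TRUE treats the pattern as literal text instead
dot_regex <- grepl(".", "abc")
dot_fixed <- grepl(".", "abc", fixed = TRUE)

print(starts_with_a)
print(an_index)
print(an_value)
print(has_digit)
print(c(dot_regex, dot_fixed))
```

The `fixed = TRUE` argument is worth remembering whenever the text you are searching for contains characters such as `.` or `(` that have a special meaning in a pattern.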


# Built-in R Features - Math
*   abs(): computes the absolute value.
*   sum(): returns the sum of all the values present in the input.
*   mean(): computes the arithmetic mean.
*   round(): rounds values (an optional second argument sets the number of decimal places)

In [185]:
%%R

v <- c(-1,0,1,2,3,4,5)


In [186]:
%%R

abs(-2)


[1] 2


In [187]:
%%R

abs(v)

[1] 1 0 1 2 3 4 5


In [188]:
%%R

sum(v)


[1] 14


In [189]:
%%R

mean(v)


[1] 2


In [190]:
%%R

round(23.1231)


[1] 23


In [191]:
%%R

round(23.1231234,2)


[1] 23.12


In [192]:
# Function                  Description
# ------------------------  ---------------------------------------
# abs(x)                    absolute value
# sqrt(x)                   square root
# ceiling(x)                ceiling(3.475) is 4
# floor(x)                  floor(3.475) is 3
# trunc(x)                  trunc(5.99) is 5
# round(x, digits=n)        round(3.475, digits=2) is 3.48
# signif(x, digits=n)       signif(3.475, digits=2) is 3.5
# cos(x), sin(x), tan(x)    also acos(x), cosh(x), acosh(x), etc.
# log(x)                    natural logarithm
# log10(x)                  common logarithm
# exp(x)                    e^x
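
Several functions in the table are not run anywhere in this notebook. The short sketch below tries a few of them; the input values are taken from the table or chosen for illustration.

```r
x <- 3.475

up      <- ceiling(x)            # smallest integer not less than x
down    <- floor(x)              # largest integer not greater than x
cut_pos <- trunc(5.99)           # drops the fractional part
cut_neg <- trunc(-5.99)          # truncation moves toward zero ...
flr_neg <- floor(-5.99)          # ... while floor moves toward -Inf
sig     <- signif(x, digits = 2) # keeps two significant digits

root    <- sqrt(16)
nat_log <- log(exp(1))           # log() is the natural logarithm
com_log <- log10(1000)           # log10() is the base-10 logarithm

print(c(up, down, cut_pos, cut_neg, flr_neg, sig))
print(c(root, nat_log, com_log))
```

Note the difference between trunc() and floor() on negative numbers: trunc(-5.99) gives -5, while floor(-5.99) gives -6.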

# Built-in R Features - Data Structures
R contains quite a few useful built-in functions to work with data structures.

*   seq(): Create sequences
*   sort(): Sort a vector
*   rev(): Reverse elements in object
*   str(): Show the structure of an object
*   append(): Merge objects together (works on vectors and lists)









In [193]:
%%R

# seq(start,end,step size)
seq(0, 100, by = 3)

 [1]  0  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72
[26] 75 78 81 84 87 90 93 96 99


In [194]:
%%R

v <- c(1,4,6,7,2,13,2)
v

[1]  1  4  6  7  2 13  2


In [195]:
%%R

sort(v)

[1]  1  2  2  4  6  7 13


In [196]:
%%R

sort(v,decreasing = TRUE)

[1] 13  7  6  4  2  2  1


In [197]:
%%R

v2 <- c(1,2,3,4,5)
rev(v2)

[1] 5 4 3 2 1


In [198]:
%%R

str(v)

 num [1:7] 1 4 6 7 2 13 2


In [199]:
%%R

append(v,v2)

 [1]  1  4  6  7  2 13  2  1  2  3  4  5


In [200]:
%%R

sort(append(v,v2))


 [1]  1  1  2  2  2  3  4  4  5  6  7 13
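
A few variations on the functions above may be useful. This is a small sketch, reusing the vector `v` from the earlier cells, that shows seq() with a fixed output length, order() as a companion to sort(), and the `after` argument of append().

```r
v <- c(1, 4, 6, 7, 2, 13, 2)

# Ask for a number of evenly spaced values instead of a step size
five_steps <- seq(0, 1, length.out = 5)

# order() returns the indices that would sort the vector
idx <- order(v)

# Indexing with those positions gives the same result as sort(v)
sorted_v <- v[idx]

# append() can insert at a position instead of at the end
inserted <- append(v, 99, after = 2)

print(five_steps)
print(idx)
print(sorted_v)
print(inserted)
```

order() is handy when you need to sort one object by the values of another, for example the rows of a data frame by one of its columns.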


# Data Types

*   is.*(): Check the class of an R object
*   as.*(): Convert R objects

In [201]:
%%R

v <- c(1,2,3)
is.vector(v)

[1] TRUE


In [202]:
%%R

is.list(v)

[1] FALSE


In [203]:
%%R

as.list(v)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3



In [204]:
%%R

as.matrix(v)


     [,1]
[1,]    1
[2,]    2
[3,]    3
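
The same is.*() and as.*() pattern applies to basic types such as character, numeric and logical. The sketch below uses made-up values to show a few common conversions, including what happens when a conversion is not possible.

```r
x <- "3.14"

chr_check <- is.character(x)   # TRUE: x is a string
num_check <- is.numeric(x)     # FALSE: it only looks like a number

y <- as.numeric(x)             # convert the string to a number
y_check <- is.numeric(y)

as_int <- as.integer(3.9)      # drops the fractional part
as_chr <- as.character(42)

# Strings that R does not recognise as logical values become NA
as_lgl <- as.logical(c("TRUE", "false", "yes"))

# A failed numeric conversion gives NA (and normally a warning)
bad_num <- suppressWarnings(as.numeric("abc"))

print(c(chr_check, num_check, y_check))
print(y)
print(as_int)
print(as_chr)
print(as_lgl)
print(bad_num)
```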


# Timestamps

R gives us a variety of tools for working with timestamp information. You can use the as.Date() function to convert a character string to a Date object, which lets R treat it as a calendar date rather than plain text. By default the string needs to be in a standard format such as YYYY-MM-DD; for any other layout, pass a format argument.

## Date

In [205]:
%%R

Sys.Date()

[1] "2025-05-05"


In [206]:
%%R

# Set as a variable
today <- Sys.Date()
today

[1] "2025-05-05"


In [207]:
# Code	Value
# %d	Day of the month (decimal number)
# %m	Month (decimal number)
# %b	Month (abbreviated)
# %B	Month (full name)
# %y	Year (2 digit)
# %Y	Year (4 digit)

In [208]:
%%R

# YYYY-MM-DD
as.Date('1990-11-03')

[1] "1990-11-03"


In [209]:
%%R

# Using Format
as.Date("Nov-03-90",format="%b-%d-%y")

[1] "1990-11-03"


In [210]:
%%R

# Using Format
as.Date("November-03-1990",format="%B-%d-%Y")

[1] "1990-11-03"
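
The format codes also work in the other direction: format() turns a Date object back into a string. The sketch below, using the same date as the cells above, shows that along with simple date arithmetic. Only numeric codes are used here because the month names produced by %b and %B depend on the system locale.

```r
d <- as.Date("1990-11-03")

# Date -> string, using the codes from the table above
day_first <- format(d, "%d/%m/%Y")
short_year <- format(d, "%y")

# Adding a number to a Date moves it forward by that many days
later <- d + 30

# Subtracting two dates gives a difference in days
gap <- as.numeric(as.Date("1990-11-10") - d)

print(day_first)
print(short_year)
print(later)
print(gap)
```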


## Time

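R stores time information in the **POSIXct** and **POSIXlt** object types. The as.POSIXct() function returns a POSIXct object, while **strptime()** parses a string and returns a POSIXlt object.

Most of the time, we'll actually be using the strptime() function instead of as.POSIXct().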


In [211]:
%%R

as.POSIXct("11:02:03",format="%H:%M:%S")

[1] "2025-05-05 11:02:03 UTC"


In [212]:
%%R

as.POSIXct("November-03-1990 11:02:03",format="%B-%d-%Y %H:%M:%S")

[1] "1990-11-03 11:02:03 UTC"


In [213]:
%%R
# strptime() returns a POSIXlt object; it skips the conversion to POSIXct, so it is usually a little faster
strptime("11:02:03",format="%H:%M:%S")

[1] "2025-05-05 11:02:03 UTC"
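
To see how the two functions differ, the sketch below parses the same timestamp both ways and inspects the results. The `tz = "UTC"` argument is an assumption added here so the output does not depend on the machine's time zone.

```r
stamp <- "1990-11-03 11:02:03"
fmt   <- "%Y-%m-%d %H:%M:%S"

ct <- as.POSIXct(stamp, format = fmt, tz = "UTC")
lt <- strptime(stamp, format = fmt, tz = "UTC")

# The two functions return different classes
ct_class <- class(ct)
lt_class <- class(lt)

# A POSIXlt object exposes its components by name
parts <- c(lt$hour, lt$min, lt$sec)

# A POSIXct object is a count of seconds, so arithmetic is in seconds
one_hour_later <- format(ct + 3600, "%H:%M:%S")

print(ct_class)
print(lt_class)
print(parts)
print(one_hour_later)
```

POSIXlt is convenient when you need individual fields such as the hour, while POSIXct is more compact and is the usual choice for a column in a data frame.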


# Apply

The **apply** command in R allows you to apply a function across the rows or columns of an array, matrix or data frame.

How the function is applied depends on the value you pass to the **MARGIN** argument, which is usually set to 1, 2 or c(1, 2) (1: rows, 2: columns, c(1, 2): rows and columns).

apply(X, MARGIN, FUN, ...), where X is an array, matrix or data frame and FUN is the function to apply


In [214]:
%%R

df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
df

  x y  z
1 1 5 10
2 2 6 11
3 3 7 12
4 4 8 13


In [215]:
%%R

apply(X = df, MARGIN = 1, FUN = sum)


[1] 16 19 22 25


In [216]:
%%R

# Set the MARGIN argument to c(1, 2) to apply the function to each individual value of the data frame.
apply(df, c(1, 2), sum)


     x y  z
[1,] 1 5 10
[2,] 2 6 11
[3,] 3 7 12
[4,] 4 8 13


In [217]:
%%R
# If you set MARGIN = c(2, 1) instead of c(1, 2) the output will be the same matrix but transposed.


apply(df, c(2, 1), sum)


  [,1] [,2] [,3] [,4]
x    1    2    3    4
y    5    6    7    8
z   10   11   12   13


In [218]:
%%R

# Row sums for a subset of the data (first two rows)
apply(df[c(1, 2), ], 1, sum)


 1  2 
16 19 


In [219]:
%%R

# Column sums for a subset of the data (columns x and z)
apply(df[, c(1, 3)], 2, sum)

 x  z 
10 46 


In [220]:
%%R

# Apply the mean by rows removing NA values
apply(df, 1, mean, na.rm = TRUE)


[1] 5.333333 6.333333 7.333333 8.333333
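
Because `df` contains no missing values, `na.rm = TRUE` makes no visible difference in the cell above. The sketch below uses a small hypothetical data frame, `df_na`, to show what the argument actually changes.

```r
# Hypothetical data frame with one missing value
df_na <- data.frame(x = c(1, NA, 3), y = c(4, 5, 6))

# Without na.rm, any row containing NA produces NA
with_na <- apply(df_na, 1, mean)

# Extra arguments after FUN are passed on to the function,
# so na.rm = TRUE reaches mean() and the NA is skipped
without_na <- apply(df_na, 1, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```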


In [221]:
%%R

# Minimum value of each column
apply(df, 2, min)


 x  y  z 
 1  5 10 


In [222]:
%%R

# Range (min and max values) by column
apply(df, 2, range)


     x y  z
[1,] 1 5 10
[2,] 4 8 13


In [223]:
%%R

# Summary for each row
apply(df, 1, summary)

             [,1]      [,2]      [,3]      [,4]
Min.     1.000000  2.000000  3.000000  4.000000
1st Qu.  3.000000  4.000000  5.000000  6.000000
Median   5.000000  6.000000  7.000000  8.000000
Mean     5.333333  6.333333  7.333333  8.333333
3rd Qu.  7.500000  8.500000  9.500000 10.500000
Max.    10.000000 11.000000 12.000000 13.000000


In [224]:
%%R

# Summary for each column
apply(df, 2, summary)

           x    y     z
Min.    1.00 5.00 10.00
1st Qu. 1.75 5.75 10.75
Median  2.50 6.50 11.50
Mean    2.50 6.50 11.50
3rd Qu. 3.25 7.25 12.25
Max.    4.00 8.00 13.00
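
FUN does not have to be a built-in function. As a minimal sketch, the block below passes an anonymous function to apply() and also compares apply() with rowSums() and colMeans(), which give the same numbers for sums and means. It rebuilds the same `df` used above so that it runs on its own.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# Anonymous function: spread (max minus min) of each row
row_spread <- apply(df, 1, function(r) max(r) - min(r))

# Dedicated helpers for the most common cases
row_totals <- rowSums(df)
col_means  <- colMeans(df)

# They agree with the equivalent apply() calls
same_sums  <- all(row_totals == apply(df, 1, sum))
same_means <- all(col_means == apply(df, 2, mean))

print(row_spread)
print(row_totals)
print(col_means)
print(c(same_sums, same_means))
```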

