### ***Environment using VS Code***

***Step1 - Set up your conda environment with R***

<code>
conda create -n r_env python=3.11 r-base=4.3 r-irkernel -c conda-forge<br>
conda activate r_env
</code>

***This installs Python, R, and IRkernel.***

***`Step2` - Install Jupyter and client tools***

<code>
conda install notebook jupyterlab jupyter_client -c conda-forge
</code>

***`Step3` - Register the R Kernel***

Launch R inside your environment
<code>
R
</code>

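Then run the following. The `install.packages` call is only needed if IRkernel is missing; Step1 already installs it through conda (`r-irkernel`). <br>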
<code>
install.packages("IRkernel")<br>
IRkernel::installspec()
</code>

### ***R BASICS***

#### ***Getting Help***

In [1]:
# help('data.frame')
# help('vector')
# ?data.frame
# ?vector

#### ***Variables, Remainder Operation, Variable Assignment***

In [2]:
# variable assignment
dividend <- 11
divisor <- 3
remainder <- dividend %% divisor # %% is the remainder (modulo) operator
print(remainder)

[1] 2


#### ***R Data Types***

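- ***`Numeric Class`: Integer, Floating Point, Decimal (stored as `numeric` unless created as an integer, e.g. `1L` or `1:12`)***
- ***`Logical Class`: `TRUE` or `FALSE`***
- ***`Character Strings`: text in quotes, e.g. `"Hello R"`***
- ***`class(x)`: returns the class (data type) of `x`***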

In [3]:
print(class(remainder))
print(class(TRUE))
print(class("Hello R"))

[1] "numeric"
[1] "logical"
[1] "character"
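The difference between `numeric` and `integer` can be checked directly. The snippet below is a minimal sketch using only base R; the variable names `x_double` and `x_integer` are made up for illustration.

```r
# Numbers typed without a suffix are stored as doubles ("numeric")
x_double <- 11
# The L suffix creates a true integer
x_integer <- 11L

print(class(x_double))
print(class(x_integer))

# Both count as numeric for arithmetic purposes
print(is.numeric(x_double))
print(is.numeric(x_integer))

# The remainder of two integers stays an integer
print(class(x_integer %% 3L))
```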


#### ***Vector***

- ***`c()` (combine) function: creates a vector***
- ***A vector holds a single data type; mixed values are coerced to a common type***

In [4]:
n.vec <- c(1, 2, 3, 4, 5)
char.vec <- c("a", "b", "c")
bool.vec <- c(TRUE, FALSE, TRUE)
print(n.vec)
print(class(n.vec))
print(length(n.vec))
cat("\n")
print(char.vec)
print(class(char.vec))
cat("\n")
print(bool.vec)
print(class(bool.vec))

[1] 1 2 3 4 5
[1] "numeric"
[1] 5

[1] "a" "b" "c"
[1] "character"

[1]  TRUE FALSE  TRUE
[1] "logical"
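Mixing data types in `c()` does not fail; R silently converts every element to one common type. A small sketch of that coercion, with illustrative variable names:

```r
# Mixing types does not raise an error; R coerces to the most general type
mixed_num_chr <- c(1, "a", TRUE)
print(mixed_num_chr)         # all elements become character strings
print(class(mixed_num_chr))

mixed_num_lgl <- c(1, TRUE, FALSE)
print(mixed_num_lgl)         # logicals become 1 and 0
print(class(mixed_num_lgl))
```

This is why a column of numbers with one stray text value ends up as character.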


- ***`names` attribute/metadata which can be attached to a vector***

In [5]:
temp <- c(98, 87, 101)
names(temp) <- c("Dallas", "Chicago", "New York")
print(temp)
print(names(temp))
print(attributes(temp))

  Dallas  Chicago New York 
      98       87      101 
[1] "Dallas"   "Chicago"  "New York"
$names
[1] "Dallas"   "Chicago"  "New York"



In [6]:
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
temp1 <- c(88, 90, 85, 87, 92, 95, 89)
names(temp1) <- days
print(temp1)
print(temp1["Wed"])
print(temp1[c("Mon", "Fri")])
print(temp1[temp1 > 90])

Mon Tue Wed Thu Fri Sat Sun 
 88  90  85  87  92  95  89 
Wed 
 85 
Mon Fri 
 88  92 
Fri Sat 
 92  95 


- ***Element by Element Operation on Vectors***

In [7]:
v1 <- c(1, 2, 3)
v2 <- c(10, 20, 30)
print(v1 + v2)
print(v1 * v2)
print(v2 / v1)
print(v2 - v1)

[1] 11 22 33
[1] 10 40 90
[1] 10 10 10
[1]  9 18 27


- ***`sum` function - to perform numeric sum***
- ***`mean` function - to perform numeric mean***
- ***`sd` function - to perform standard deviation***
- ***`max` function - to find maximum***
- ***`min` function - to find minimum***
- ***`prod` function - to perform numeric product***

In [8]:
x1 <- c(1, 2, 3, 4, 5)
print(sum(x1))
print(mean(x1))
print(sd(x1))
print(max(x1))
print(min(x1))
print(prod(x1))

[1] 15
[1] 3
[1] 1.581139
[1] 5
[1] 1
[1] 120
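`sd` computes the sample standard deviation, which divides by `n - 1` rather than `n`. The sketch below reproduces the value printed above by hand; `manual_sd` is an illustrative name.

```r
x1 <- c(1, 2, 3, 4, 5)
n <- length(x1)

# Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )
manual_sd <- sqrt(sum((x1 - mean(x1))^2) / (n - 1))

print(manual_sd)
print(all.equal(manual_sd, sd(x1)))
```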


- ***Operations on Vectors***

In [9]:
v1 <- c(1, 2, 3, 4, 5)
print(v1 < 2) # element by element comparison
print(v1 == 2) # element by element comparison
print(v1 >= 2) # element by element comparison
print(v1 != 2) # element by element comparison

cat("\n")
v2 <- c(10, 20, 30, 40, 50)
print(v1 < v2) # element by element comparison

[1]  TRUE FALSE FALSE FALSE FALSE
[1] FALSE  TRUE FALSE FALSE FALSE
[1] FALSE  TRUE  TRUE  TRUE  TRUE
[1]  TRUE FALSE  TRUE  TRUE  TRUE

[1] TRUE TRUE TRUE TRUE TRUE


- ***Vector Indexing (starts at 1) and Slicing***

In [10]:
v2 <- c(10, 20, 30, 40, 50)
print(v2[3]) # single indexing
print(v2[c(2, 5)]) # multiple indexing
print(v2[v2 > 25]) # conditional indexing
print(v2[2:4]) # slicing from index 2 to 4

[1] 30
[1] 20 50
[1] 30 40 50
[1] 20 30 40


#### ***Matrices***

***Matrix - Same Data Type***

In [11]:
# sequence with seq()
seq1 <- seq(1, 10, 2) # from 1 to 10 in steps of 2 (the default step is 1)
print(seq1)
print(class(seq1))
cat("\n")

# colon sequence
seq2 <- 1:12
print(seq2)
print(class(seq2))
cat("\n")
# matrix creation by row
mat1 <- matrix(seq2, nrow = 3, ncol = 4, byrow = TRUE)
print(mat1)
cat("\n")

# matrix creation by column
mat2 <- matrix(seq2, nrow = 3, ncol = 4, byrow = FALSE)
print(mat2)
cat("\n")

[1] 1 3 5 7 9
[1] "numeric"

 [1]  1  2  3  4  5  6  7  8  9 10 11 12
[1] "integer"

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12



In [12]:
goog <- c(1345, 1360, 1375, 1380, 1390, 1400)
aapl <- c(265, 270, 275, 280, 290, 300)
msft <- c(180, 185, 190, 195, 200, 205)
stocks <- c(goog, aapl, msft)
print(stocks)
cat("\n")

stocks.matrix <- matrix(stocks, nrow = 3, byrow = TRUE)
print(stocks.matrix)
cat("\n")

rownames(stocks.matrix) <- c("GOOG", "AAPL", "MSFT") # assign row names
colnames(stocks.matrix) <- c("Mon", "Tues", "Wed", "Thu", "Fri", "Sat") # assign column names
print(stocks.matrix)
cat("\n")

 [1] 1345 1360 1375 1380 1390 1400  265  270  275  280  290  300  180  185  190
[16]  195  200  205

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1345 1360 1375 1380 1390 1400
[2,]  265  270  275  280  290  300
[3,]  180  185  190  195  200  205

      Mon Tues  Wed  Thu  Fri  Sat
GOOG 1345 1360 1375 1380 1390 1400
AAPL  265  270  275  280  290  300
MSFT  180  185  190  195  200  205



***Matrix Arithmetic***

In [13]:
mat1 <- matrix(1:25, nrow = 5, byrow = TRUE)
print(mat1)
cat("\n")

print("Scalar Multiplication: mat1 * 5")
print(mat1 * 5) # scalar multiplication
cat("\n")

print("Matrix Multiplication: mat1 * mat1 - element wise")
print(mat1 * mat1) # element-wise multiplication

cat("\n")
print("Matrix multiplication: mat1 %*% t(mat1)")
print(mat1 %*% t(mat1)) # matrix multiplication

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25

[1] "Scalar Multiplication: mat1 * 5"
     [,1] [,2] [,3] [,4] [,5]
[1,]    5   10   15   20   25
[2,]   30   35   40   45   50
[3,]   55   60   65   70   75
[4,]   80   85   90   95  100
[5,]  105  110  115  120  125

[1] "Matrix Multiplication: mat1 * mat1 - element wise"
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    9   16   25
[2,]   36   49   64   81  100
[3,]  121  144  169  196  225
[4,]  256  289  324  361  400
[5,]  441  484  529  576  625

[1] "Matrix multiplication: mat1 %*% t(mat1)"
     [,1] [,2] [,3] [,4] [,5]
[1,]   55  130  205  280  355
[2,]  130  330  530  730  930
[3,]  205  530  855 1180 1505
[4,]  280  730 1180 1630 2080
[5,]  355  930 1505 2080 2655
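The difference between `*` and `%*%` is easier to see on a 2 x 2 matrix. This is a minimal sketch with made-up values:

```r
a <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows: (1, 2) and (3, 4)

elementwise <- a * a      # each entry multiplied by itself
matprod     <- a %*% a    # rows of a times columns of a

print(elementwise)
print(matprod)
```

For example, the top-left entry of `matprod` is `1*1 + 2*3 = 7`, while the top-left entry of `elementwise` is `1*1 = 1`.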


***Matrix Logical***

In [14]:
print(mat1 < 15)
cat("\n")

print(mat1[mat1 > 15]) # conditional indexing

      [,1]  [,2]  [,3]  [,4]  [,5]
[1,]  TRUE  TRUE  TRUE  TRUE  TRUE
[2,]  TRUE  TRUE  TRUE  TRUE  TRUE
[3,]  TRUE  TRUE  TRUE  TRUE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE FALSE

 [1] 16 21 17 22 18 23 19 24 20 25
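The order of the values returned by conditional indexing (16, 21, 17, 22, ...) follows from how R stores matrices: column by column. A small sketch with an illustrative matrix `m`:

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)  # rows: (1, 2, 3) and (4, 5, 6)

# Logical indexing walks the matrix column by column
picked <- m[m > 2]
print(picked)

# as.vector() exposes the same column-major storage order
print(as.vector(m))
```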


***Operations on Matrix***

- ***`colSums`: sum of each column***
- ***`rowSums`: sum of each row***
- ***`rowMeans`: mean of each row***
- ***`colMeans`: mean of each column***
- ***`t`: transpose***
- ***`dim`: dimensions of the matrix***
- ***`rbind`: add a row***

In [15]:
print(stocks.matrix)

cat("\n")
print("Column Sums:")
print(colSums(stocks.matrix)) # sum along columns

cat("\n")
print("Row Sums:")
print(rowSums(stocks.matrix)) # sum along rows

cat("\n")
print("Row Means:")
print(rowMeans(stocks.matrix)) # mean along rows

cat("\n")
print("Column Means:")
print(colMeans(stocks.matrix)) # mean along columns

cat("\n")
print("Transpose of Matrix:")
print(t(stocks.matrix)) # transpose of matrix

cat("\n")
print("Dimensions of Matrix:")
print(dim(stocks.matrix)) # dimensions of matrix

cat("\n")
t(stocks.matrix)
print("add facebook stock prices as new row")
FB <- c(265, 270, 275, 280, 290, 300)
stocks.matrix <- rbind(stocks.matrix, FB)
print(stocks.matrix)

      Mon Tues  Wed  Thu  Fri  Sat
GOOG 1345 1360 1375 1380 1390 1400
AAPL  265  270  275  280  290  300
MSFT  180  185  190  195  200  205

[1] "Column Sums:"
 Mon Tues  Wed  Thu  Fri  Sat 
1790 1815 1840 1855 1880 1905 

[1] "Row Sums:"
GOOG AAPL MSFT 
8250 1680 1155 

[1] "Row Means:"
  GOOG   AAPL   MSFT 
1375.0  280.0  192.5 

[1] "Column Means:"
     Mon     Tues      Wed      Thu      Fri      Sat 
596.6667 605.0000 613.3333 618.3333 626.6667 635.0000 

[1] "Transpose of Matrix:"
     GOOG AAPL MSFT
Mon  1345  265  180
Tues 1360  270  185
Wed  1375  275  190
Thu  1380  280  195
Fri  1390  290  200
Sat  1400  300  205

[1] "Dimensions of Matrix:"
[1] 3 6



     GOOG AAPL MSFT
Mon  1345  265  180
Tues 1360  270  185
Wed  1375  275  190
Thu  1380  280  195
Fri  1390  290  200
Sat  1400  300  205


[1] "add facebook stock prices as new row"
      Mon Tues  Wed  Thu  Fri  Sat
GOOG 1345 1360 1375 1380 1390 1400
AAPL  265  270  275  280  290  300
MSFT  180  185  190  195  200  205
FB    265  270  275  280  290  300


- ***`cbind`: to add column to an existing matrix***

In [16]:
avg <- rowMeans(stocks.matrix)
print("Average stock prices for the week:")
print(avg)

cat("\n")
print("add average as new column")
stocks.matrix <- cbind(stocks.matrix, avg)
print(stocks.matrix)

[1] "Average stock prices for the week:"
  GOOG   AAPL   MSFT     FB 
1375.0  280.0  192.5  280.0 

[1] "add average as new column"
      Mon Tues  Wed  Thu  Fri  Sat    avg
GOOG 1345 1360 1375 1380 1390 1400 1375.0
AAPL  265  270  275  280  290  300  280.0
MSFT  180  185  190  195  200  205  192.5
FB    265  270  275  280  290  300  280.0


***Matrix Indexing***

In [17]:
mat <- matrix(1:50, nrow = 5, byrow = TRUE)
print(mat)
cat("\n")

print("get first row")
print(mat[1, ]) # first row

cat("\n")
print("all rows and all columns")
print(mat[, ]) # all rows and all columns

cat("\n")
print("get first 3 rows and first 4 columns")
print(mat[1:3, 1:4]) # first 3 rows and first 4 columns

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]   11   12   13   14   15   16   17   18   19    20
[3,]   21   22   23   24   25   26   27   28   29    30
[4,]   31   32   33   34   35   36   37   38   39    40
[5,]   41   42   43   44   45   46   47   48   49    50

[1] "get first row"
 [1]  1  2  3  4  5  6  7  8  9 10

[1] "all rows and all columns"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]   11   12   13   14   15   16   17   18   19    20
[3,]   21   22   23   24   25   26   27   28   29    30
[4,]   31   32   33   34   35   36   37   38   39    40
[5,]   41   42   43   44   45   46   47   48   49    50

[1] "get first 3 rows and first 4 columns"
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]   11   12   13   14
[3,]   21   22   23   24


***Factors and Categorical Variables***

- ***`factor`: Converts a character or numerical vector into a categorical variable: `Nominal` (no order), `Ordinal` (order)***
- ***`summary`: For a factor, gives the count of each level***

In [18]:
animal <- factor(c("cat", "dog", "rabbit", "dog", "cat", "cat"))
print(animal)

cat("\n")
print("Ordinal Factor")
temp <- c("low", "medium", "high", "medium", "low", "low")
ord.factor <- factor(temp, levels = c("low", "medium", "high"), ordered = TRUE)
print(ord.factor)

cat("\n")
print("Statistics Summary of animal factor and ordinal factor")
print(summary(animal))
print(summary(ord.factor))

[1] cat    dog    rabbit dog    cat    cat   
Levels: cat dog rabbit

[1] "Ordinal Factor"
[1] low    medium high   medium low    low   
Levels: low < medium < high

[1] "Statistics Summary of animal factor and ordinal factor"
   cat    dog rabbit 
     3      2      1 
   low medium   high 
     3      2      1 
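What makes an ordinal factor useful is that its values can be compared in level order. The sketch below, using the same low/medium/high values under an illustrative name `sizes`, shows the integer codes behind a factor and an ordered comparison.

```r
sizes <- factor(c("low", "medium", "high", "medium", "low", "low"),
                levels = c("low", "medium", "high"), ordered = TRUE)

# Each value is stored as an integer code pointing into levels()
print(levels(sizes))
print(as.integer(sizes))

# Ordered factors support comparisons that follow the level order
print(sizes >= "medium")

# table() returns the same counts that summary() shows for a factor
print(table(sizes))
```

The same comparison on a nominal (unordered) factor is not meaningful and returns `NA` with a warning.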


#### ***Dataframes***

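***`Built-in Datasets in R`***

- ***`state`: datasets about the 50 US states, such as `state.x77` (stored as a matrix)***
- ***`USPersonalExpenditure`: US personal expenditure by category (stored as a matrix)***
- ***`women`: average heights and weights of American women (a dataframe)***
- ***`data()`: lists all the datasets available in the loaded packages***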

In [19]:
# state.x77
# USPersonalExpenditure
# women
data()

Data sets in package ‘datasets’:

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)
                        Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
CO2                     Carbon Dioxide Uptake in Grass Plants
ChickWeight             Weight versus age of chicks on different diets
DNase                   Elisa assay of DNase
EuStockMarkets          Daily Closing Prices of Major European Stock
                        Indices, 1991-1998
Formaldehyde            Determination of Formaldehyde
HairEyeColor            Hair and Eye Color of Statistics Students
Harman23.cor            Harman Example 2.3
Harman74.cor            Harman Example 7.4
Indometh                Pharmacokinetics of Indomethacin
InsectSprays            Effectiveness of Insect Sprays
JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
LakeHuron               Level 

***`Functions for Dataframes`***

- ***`head`: Returns the first 6 rows by default***
- ***`tail`: Returns the last 6 rows by default***
- ***`str`: Compact structure of an object***
- ***`summary`: Statistical summary of each column***

In [20]:
head(state.x77)

str(state.x77)

summary(state.x77)

           Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
California      21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766


 num [1:50, 1:8] 3615 365 2212 2110 21198 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
  ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...


   Population        Income       Illiteracy       Life Exp    
 Min.   :  365   Min.   :3098   Min.   :0.500   Min.   :67.96  
 1st Qu.: 1080   1st Qu.:3993   1st Qu.:0.625   1st Qu.:70.12  
 Median : 2838   Median :4519   Median :0.950   Median :70.67  
 Mean   : 4246   Mean   :4436   Mean   :1.170   Mean   :70.88  
 3rd Qu.: 4968   3rd Qu.:4814   3rd Qu.:1.575   3rd Qu.:71.89  
 Max.   :21198   Max.   :6315   Max.   :2.800   Max.   :73.60  
     Murder          HS Grad          Frost             Area       
 Min.   : 1.400   Min.   :37.80   Min.   :  0.00   Min.   :  1049  
 1st Qu.: 4.350   1st Qu.:48.05   1st Qu.: 66.25   1st Qu.: 36985  
 Median : 6.850   Median :53.25   Median :114.50   Median : 54277  
 Mean   : 7.378   Mean   :53.11   Mean   :104.46   Mean   : 70736  
 3rd Qu.:10.675   3rd Qu.:59.15   3rd Qu.:139.75   3rd Qu.: 81162  
 Max.   :15.100   Max.   :67.30   Max.   :188.00   Max.   :566432  
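The `str` output above shows that `state.x77` is a numeric matrix rather than a dataframe. If dataframe features such as `$` column access are needed, it can be converted first. A minimal sketch, where `states_df` is an illustrative name:

```r
print(class(state.x77))   # a matrix, as the str() output indicates

# as.data.frame() keeps the state names as row names
states_df <- as.data.frame(state.x77)
print(class(states_df))

# Column names with spaces need backticks when used with $
print(head(states_df$`Life Exp`))
print(dim(states_df))
```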

***`Custom Dataframes - Our own Dataframes`***

- ***`data.frame()`: Creates dataframe***
- ***`read.csv(filename)`: reads csv file into a dataframe***
- ***`write.csv(df,file=filename)`: writes dataframes into csv***
- ***`nrow(df)`: number of rows***
- ***`ncol(df)`: number of columns***
- ***`colnames(df)`: names of the columns***
- ***`rbind(df1,df2)`: binds 2 dataframes by row***

In [21]:
print("Stock Matrix:")
print(stocks.matrix)

cat("\n")
print("Custom Dataframe - stocks.df")
stocks.df <- data.frame(stocks.matrix)
print(stocks.df)

cat("\n")
print("Structure of stocks.df:")
str(stocks.df)

cat("\n")
print("Write into a stocks.csv file")
write.csv(stocks.df, file = "stocks.csv", row.names = FALSE)

cat("\n")
print("Read from stocks.csv file into stocks2.df")
stocks2.df <- read.csv("stocks.csv")
print(stocks2.df)

cat("\n")
print("Number of rows and columns in stocks2.df")
print(nrow(stocks2.df))
print(ncol(stocks2.df))

cat("\n")
print("Column names of stocks2.df")
print(colnames(stocks2.df))

[1] "Stock Matrix:"
      Mon Tues  Wed  Thu  Fri  Sat    avg
GOOG 1345 1360 1375 1380 1390 1400 1375.0
AAPL  265  270  275  280  290  300  280.0
MSFT  180  185  190  195  200  205  192.5
FB    265  270  275  280  290  300  280.0

[1] "Custom Dataframe - stocks.df"
      Mon Tues  Wed  Thu  Fri  Sat    avg
GOOG 1345 1360 1375 1380 1390 1400 1375.0
AAPL  265  270  275  280  290  300  280.0
MSFT  180  185  190  195  200  205  192.5
FB    265  270  275  280  290  300  280.0

[1] "Structure of stocks.df:"
'data.frame':	4 obs. of  7 variables:
 $ Mon : num  1345 265 180 265
 $ Tues: num  1360 270 185 270
 $ Wed : num  1375 275 190 275
 $ Thu : num  1380 280 195 280
 $ Fri : num  1390 290 200 290
 $ Sat : num  1400 300 205 300
 $ avg : num  1375 280 192 280

[1] "Write into a stocks.csv file"

[1] "Read from stocks.csv file into stocks2.df"
   Mon Tues  Wed  Thu  Fri  Sat    avg
1 1345 1360 1375 1380 1390 1400 1375.0
2  265  270  275  280  290  300  280.0
3  180  185  190  195  200  205  192
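The `data.frame()` and `rbind()` entries from the list of functions can also be tried on a dataframe built from scratch. The sketch below uses made-up tickers and prices; `df1`, `df2` and `combined` are illustrative names.

```r
# Build a data frame directly from column vectors (values are illustrative)
df1 <- data.frame(ticker = c("GOOG", "AAPL"),
                  price  = c(1345, 265))
df2 <- data.frame(ticker = "MSFT", price = 180)

# rbind() stacks rows; both frames must have the same column names
combined <- rbind(df1, df2)
print(combined)

print(nrow(combined))
print(ncol(combined))
print(colnames(combined))
```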

- ***`Accessing Dataframe Rows and Columns`***

In [22]:
print("print the first row of the dataframe")
print(stocks.df[1, ])

cat("\n")
print("print the first column of the dataframe")
print(stocks.df[, 1])

cat("\n")
print("print the 'avg' column of the dataframe")
print(stocks.df[,'avg'])

cat("\n")
print(stocks.df$avg)

cat("\n")
print(stocks.df[1:3,c('Mon', 'Wed', 'avg')])

cat("\n")
print(stocks.df['avg'])

[1] "print the first row of the dataframe"
      Mon Tues  Wed  Thu  Fri  Sat  avg
GOOG 1345 1360 1375 1380 1390 1400 1375

[1] "print the first column of the dataframe"
[1] 1345  265  180  265

[1] "print the 'avg' column of the dataframe"
[1] 1375.0  280.0  192.5  280.0

[1] 1375.0  280.0  192.5  280.0

      Mon  Wed    avg
GOOG 1345 1375 1375.0
AAPL  265  275  280.0
MSFT  180  190  192.5

        avg
GOOG 1375.0
AAPL  280.0
MSFT  192.5
FB    280.0


- ***`subset(df, subset = condition)`: returns the rows that satisfy the condition***

In [23]:
subset(stocks.df, subset = avg > 500)

       Mon  Tues   Wed   Thu   Fri   Sat   avg
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
GOOG  1345  1360  1375  1380  1390  1400  1375


***`mtcars dataframe`***

In [24]:
head(mtcars)

                    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
                  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Mazda RX4          21.0     6   160   110   3.9  2.62 16.46     0     1     4     4
Mazda RX4 Wag      21.0     6   160   110   3.9 2.875 17.02     0     1     4     4
Datsun 710         22.8     4   108    93  3.85  2.32 18.61     1     1     4     1
Hornet 4 Drive     21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
Hornet Sportabout  18.7     8   360   175  3.15  3.44 17.02     0     0     3     2
Valiant            18.1     6   225   105  2.76  3.46 20.22     1     0     3     1


In [25]:
summary(mtcars)

      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000  

In [26]:
str(mtcars)

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


In [27]:
mtcars$mpg

In [28]:
mtcars[['mpg']] # returns vector values
head(mtcars['mpg']) # returns a dataframe

head(mtcars[c('mpg', 'cyl')])

                    mpg
                  <dbl>
Mazda RX4          21.0
Mazda RX4 Wag      21.0
Datsun 710         22.8
Hornet 4 Drive     21.4
Hornet Sportabout  18.7
Valiant            18.1

                    mpg   cyl
                  <dbl> <dbl>
Mazda RX4          21.0     6
Mazda RX4 Wag      21.0     6
Datsun 710         22.8     4
Hornet 4 Drive     21.4     6
Hornet Sportabout  18.7     8
Valiant            18.1     6


In [29]:
mtcars[mtcars$mpg > 20,] # returns all rows where mpg > 20

                 mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Mazda RX4       21.0     6 160.0   110   3.9  2.62 16.46     0     1     4     4
Mazda RX4 Wag   21.0     6 160.0   110   3.9 2.875 17.02     0     1     4     4
Datsun 710      22.8     4 108.0    93  3.85  2.32 18.61     1     1     4     1
Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
Merc 240D       24.4     4 146.7    62  3.69  3.19  20.0     1     0     4     2
Merc 230        22.8     4 140.8    95  3.92  3.15  22.9     1     0     4     2
Fiat 128        32.4     4  78.7    66  4.08   2.2 19.47     1     1     4     1
Honda Civic     30.4     4  75.7    52  4.93 1.615 18.52     1     1     4     2
Toyota Corolla  33.9     4  71.1    65  4.22 1.835  19.9     1     1     4     1
Toyota Corona   21.5     4 120.1    97   3.7 2.465 20.01     1     0     3     1


In [30]:
mtcars[mtcars$mpg > 20 & mtcars$cyl == 6, ]

cat("\n")
mtcars[mtcars$mpg > 20 & mtcars$cyl == 6, c('cyl', 'mpg', 'hp')]

cat("\n")
subset(mtcars, subset = (mpg > 20 & cyl == 6), select = c(cyl, mpg, hp))

                 mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Mazda RX4       21.0     6   160   110   3.9  2.62 16.46     0     1     4     4
Mazda RX4 Wag   21.0     6   160   110   3.9 2.875 17.02     0     1     4     4
Hornet 4 Drive  21.4     6   258   110  3.08 3.215 19.44     1     0     3     1

                 cyl   mpg    hp
               <dbl> <dbl> <dbl>
Mazda RX4          6  21.0   110
Mazda RX4 Wag      6  21.0   110
Hornet 4 Drive     6  21.4   110

                 cyl   mpg    hp
               <dbl> <dbl> <dbl>
Mazda RX4          6  21.0   110
Mazda RX4 Wag      6  21.0   110
Hornet 4 Drive     6  21.4   110


***`Missing Data`***

- ***`is.na(df)`: returns `TRUE` for every missing (`NA`) value, element by element***
- ***`any(is.na(df))`: returns a single `TRUE`/`FALSE`, a quicker way to check whether anything is missing***
- ***`df[is.na(df)] <- 0`: replaces all missing (`NA`) values with 0***

In [31]:
# is.na(mtcars) # element-wise check for missing values

any(is.na(mtcars)) # is any value in the whole dataframe missing?

any(is.na(mtcars$mpg)) # is any value in the mpg column missing?
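`mtcars` has no missing values, so the checks above all return `FALSE`. The sketch below uses a small made-up dataframe `scores` with one `NA` to show the three operations from the list in action.

```r
scores <- data.frame(name  = c("a", "b", "c"),
                     score = c(90, NA, 70))

print(is.na(scores))          # element-wise TRUE/FALSE matrix
print(any(is.na(scores)))     # TRUE if at least one value is missing
print(colSums(is.na(scores))) # missing count per column

# Most summary functions return NA unless told to skip missing values
print(mean(scores$score))
print(mean(scores$score, na.rm = TRUE))

# Replace every NA with 0
scores[is.na(scores)] <- 0
print(scores)
```

Whether 0 is a sensible replacement depends on the data; `na.rm = TRUE` is often the safer choice for summaries.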

#### ***R Lists***

***Lists allow us to store different data structures in a single variable. We normally don't perform operations on lists directly; we use them as an organizational tool.***

In [32]:
lista <- list(name = "John Doe",
          age = 30,
          married = TRUE,
          scores = c(90, 85, 88),
          address = data.frame(street = c("123 Main St", "456 Maple Ave"),
                               city = c("Dallas", "Austin"),
                               zip = c(75001, 73301))
)

print(lista)

$name
[1] "John Doe"

$age
[1] 30

$married
[1] TRUE

$scores
[1] 90 85 88

$address
         street   city   zip
1   123 Main St Dallas 75001
2 456 Maple Ave Austin 73301



In [33]:
lista$name
lista$age
lista$married
lista$scores
lista$address

street        city   zip
<chr>         <chr>  <dbl>
123 Main St   Dallas 75001
456 Maple Ave Austin 73301


In [34]:
lista[1]
lista[2]
lista[3]
lista[4]
lista[5]

street        city   zip
<chr>         <chr>  <dbl>
123 Main St   Dallas 75001
456 Maple Ave Austin 73301
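Single brackets and `$` behave differently on lists: `lista[1]` returns a one-element list, while `lista$name` returns the element itself. A minimal sketch with a smaller illustrative list `person`:

```r
person <- list(name = "John Doe", scores = c(90, 85, 88))

# Single brackets return a smaller list
print(class(person["scores"]))

# Double brackets (or $) return the element itself
print(class(person[["scores"]]))
print(mean(person[["scores"]]))
print(identical(person$scores, person[["scores"]]))
```

Use `[[ ]]` or `$` whenever the element is passed on to a function such as `mean`.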


### ***Data Input Output with R***

***`CSV Files`***

- ***`write.csv(df, file = filename)`: writes df into a csv file***

In [35]:
write.csv(mtcars, file = "mtcars_output.csv", row.names = FALSE) # row.names = FALSE drops the car names

- ***`read.csv(filename)`: reads csv file into a dataframe***

In [36]:
my.mtcars <- read.csv("mtcars_output.csv")
head(my.mtcars)

    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
1  21.0     6   160   110   3.9  2.62 16.46     0     1     4     4
2  21.0     6   160   110   3.9 2.875 17.02     0     1     4     4
3  22.8     4   108    93  3.85  2.32 18.61     1     1     4     1
4  21.4     6   258   110  3.08 3.215 19.44     1     0     3     1
5  18.7     8   360   175  3.15  3.44 17.02     0     0     3     2
6  18.1     6   225   105  2.76  3.46 20.22     1     0     3     1
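Because the file was written with `row.names = FALSE`, the car names are gone and the rows come back numbered 1, 2, 3. The sketch below shows one way to keep them; it writes to a temporary file (`csv_path` and `cars_back` are illustrative names) so nothing in the working folder is overwritten.

```r
csv_path <- tempfile(fileext = ".csv")  # temporary file, assumed for illustration

# row.names = TRUE (the default) writes the car names as the first column
write.csv(mtcars, file = csv_path, row.names = TRUE)

# row.names = 1 tells read.csv() to use that first column as row names again
cars_back <- read.csv(csv_path, row.names = 1)

print(head(rownames(cars_back), 3))
print(dim(cars_back))
```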


***`Excel Files`***

- ***`install.packages("tidyverse")`: installs the tidyverse collection, which includes the following packages***
    - ***`ggplot2`: data visualization***
    - ***`dplyr`: data manipulation***
    - ***`tidyr`: data tidying***
    - ***`readr`: reading data***
    - ***`purrr`: functional programming***
    - ***`tibble`: modern data frames***
    - ***`stringr`: string manipulation***
    - ***`forcats`: working with factors***
- ***`library(tidyverse)`: after installing the `tidyverse` package, load its core packages into the session***
- ***`library(readxl)`: `readxl` is installed with tidyverse but is not loaded by `library(tidyverse)`, so it has to be loaded explicitly***

In [37]:
library(readxl)

- ***`excel_sheets(path)`: lists the sheet names in an Excel workbook. In the editor, typing a quote inside the call triggers path completion, which lists the files in the current folder***

In [38]:
# excel_sheets('B2E Staffing Allocation Informational - All Reports 10.22.25.xlsx')

In [39]:
# my_data <- read_excel("B2E Staffing Allocation Informational - All Reports 10.22.25.xlsx", sheet = "CombineReports(6)", col_types = "text")
# my_data <- read_excel("B2E Staffing Allocation Informational - All Reports 10.22.25.xlsx", sheet = "CombineReports(6)")
# head(my_data)

In [40]:
# my_data[["2025.04"]] <- as.numeric(my_data[["2025.04"]])

# sum(my_data[["2025.04"]])

In [41]:
library(dplyr)

# my_data <- my_data %>%
#   mutate(across(starts_with("2025"), ~ as.numeric(.)))
# head(my_data)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




In [42]:
# summary(my_data)

In [43]:
# str(my_data)

- ***`install.packages('openxlsx')`: installs `openxlsx`, which provides `write.xlsx()` for writing to an Excel file***

In [44]:
# install.packages('openxlsx')
library(openxlsx)

In [45]:
# write.xlsx(my_data, file = "staffing_allocation_output.xlsx")

***`SQL`***

***...Coming Soon...***

***`Web Scraping with R`***

- ***`install.packages("rvest")`: for scraping static HTML pages***
- ***`install.packages("RSelenium")`: for scraping JavaScript-heavy or dynamic sites***

In [46]:
# install.packages("rvest")
library(rvest)

In [47]:
library(rvest)

url <- "https://www.cnbc.com/world/?region=world"
page <- read_html(url)

headlines <- page %>%
  html_nodes("a.Card-title") %>%
  html_text()

print(headlines)


 [1] "Nasdaq falls again on Friday, putting it on track for worst week since April: Live updates"              
 [2] "Treasury yields are little changed as investors continue to face economic data blackout"                 
 [3] "Fed's Miran says stablecoin surge could help push interest rates lower"                                  
 [4] "Stocks making the biggest moves midday: Block, Archer Aviation, Akamai, Globus Medical and more "        
 [5] "Watch CNBC's full interview with Bank of England Governor Andrew Bailey"                                 
 [6] "From cleanrooms to chip power: How GlobalFoundries is steering the chip race"                            
 [7] "Watch CNBC's full interview with Novo Nordisk CEO Mike Doustdar"                                         
 [8] "Watch CNBC’s full interview with OPEC's secretary general at ADIPEC"                                     
 [9] "New ‘Executive Decisions’ podcast explores the choices that shape global leaders"                 

### ***Programming in R***

***`If Else Block`***

<pre>
if (condition1){
    # runs when condition1 is TRUE
}else if (condition2){
    # runs when condition1 is FALSE and condition2 is TRUE
}else{
    # runs when no condition is TRUE
}
</pre>
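A runnable version of the template might look like the sketch below. The temperature value and the labels are made up for illustration.

```r
temperature <- 87  # illustrative value

if (temperature > 90) {
  label <- "hot"
} else if (temperature > 70) {
  label <- "warm"
} else {
  label <- "cool"
}
print(label)

# if() tests a single value; ifelse() is the vectorized counterpart
temps <- c(98, 87, 65)
print(ifelse(temps > 90, "hot", "not hot"))
```

Note that `else` must start on the same line as the closing brace of the previous block when the code runs at the top level.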

***`While Loop`***

<pre>
while (condition is true){
    # code executed here
    # while condition is true
    if (condition){
        break # forcing the loop to exit
    }
}
</pre>
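As a concrete instance of the template, the sketch below adds up the numbers 1, 2, 3, ... and leaves the loop early once the running total passes 20. The limits are arbitrary and chosen only for illustration.

```r
count <- 0
total <- 0

while (count < 10) {
  count <- count + 1
  total <- total + count
  if (total > 20) {
    break  # exit early once the running total passes 20
  }
}

print(count)
print(total)
```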

***`for Loop`***

<pre>
for (var in iterable){  # iterable: a vector, list, or sequence such as 1:10
    # execute the code
}
</pre>

In [48]:
mat <- matrix(1:25, nrow=5, byrow=TRUE)

for (row in 1:nrow(mat)) {
  for (col in 1:ncol(mat)) {
    print(paste('The element at row', row, 'and column', col, 'is', mat[row, col]))
  }
  cat("\n")
}

[1] "The element at row 1 and column 1 is 1"
[1] "The element at row 1 and column 2 is 2"
[1] "The element at row 1 and column 3 is 3"
[1] "The element at row 1 and column 4 is 4"
[1] "The element at row 1 and column 5 is 5"

[1] "The element at row 2 and column 1 is 6"
[1] "The element at row 2 and column 2 is 7"
[1] "The element at row 2 and column 3 is 8"
[1] "The element at row 2 and column 4 is 9"
[1] "The element at row 2 and column 5 is 10"

[1] "The element at row 3 and column 1 is 11"
[1] "The element at row 3 and column 2 is 12"
[1] "The element at row 3 and column 3 is 13"
[1] "The element at row 3 and column 4 is 14"
[1] "The element at row 3 and column 5 is 15"

[1] "The element at row 4 and column 1 is 16"
[1] "The element at row 4 and column 2 is 17"
[1] "The element at row 4 and column 3 is 18"
[1] "The element at row 4 and column 4 is 19"
[1] "The element at row 4 and column 5 is 20"

[1] "The element at row 5 and column 1 is 21"
[1] "The element at row 5 and column 2 

***`Functions`***

<pre>
name_of_func <- function(input1, input2, input3 = 33){
    result <- input1 + input2
    return(result)
}
</pre>
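The template can be turned into a working function as sketched below. `add_scaled` and its `scale` argument are hypothetical names chosen for this example; the point is how a default argument behaves when it is omitted or overridden.

```r
# add_scaled is a hypothetical name used only for this illustration
add_scaled <- function(input1, input2, scale = 1) {
  result <- (input1 + input2) * scale
  return(result)
}

print(add_scaled(2, 3))                 # uses the default scale = 1
print(add_scaled(2, 3, scale = 10))     # overrides the default by name
print(add_scaled(c(1, 2), c(10, 20)))   # vector inputs work element by element
```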

***`Built-in R Functions`***

- ***`seq()`: create a sequence***
- ***`sort()`: sort a vector***
- ***`rev()`: reverse elements in object***
- ***`str()`: show the structure of an object***
- ***`append()`: Merge objects together (works on vectors and lists)***
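The functions listed above can be tried on a small vector. This is a minimal sketch with made-up values:

```r
v <- c(30, 10, 20)

print(seq(0, 10, by = 5))            # sequence from 0 to 10 in steps of 5
print(sort(v))                       # ascending order
print(sort(v, decreasing = TRUE))    # descending order
print(rev(v))                        # elements in reverse order
str(v)                               # compact structure summary
print(append(v, c(40, 50)))          # add elements at the end
print(append(v, 99, after = 1))      # insert after the first element
```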