# Code written for learning some rudimentary R through Jupyter notebooks

# In notebooks, we have two different types of cells
    This one is a Markdown cell

In [1]:
# This one is an R cell,
cat("Hello World")

# all of these code blocks should run in any R integrated development environment (IDE), such as RStudio or the R GUI

# comments in R use the # symbol; they can be used to annotate code or to disable a line of code
# print("Hello World")

Hello World

# Two benefits of a notebook over a regular R file are that you can intersperse Markdown cells with your code for better commenting, and that your program is split into cells, so you can work on one chunk of code at a time.
You can see some of the basic syntax for markdown [here](https://rmarkdown.rstudio.com/authoring_basics.html)

I also recommend taking a moment to learn the keyboard shortcuts of Jupyter Notebook, for example those listed under the `Insert` and `Cell` menus above. At the least, keep in mind that they exist for when you want to save time later.
    
One I use often is `ctrl+/` which comments and uncomments lines in both the markdown and code cells

In [2]:
# You will likely see code that loads packages for more complex functionality, 
# but R does have many things built in
1+4

# There are several ways to display outputs

In [3]:
#cat() is concatenate and display on the standard output stream
#it also recognizes escape characters like \n for new line
cat("concatenate", "multiple", "things", TRUE, 3, "\n")

concatenate multiple things TRUE 3 


In [4]:
# print function
print("print does not recognize \n, but default ends with a new line")
print("and can only take one input")

[1] "print does not recognize \n, but default ends with a new line"
[1] "and can only take one input"


In [5]:
# message() writes diagnostic messages (not only errors) to the standard error stream
message("error"," ","stream")

#read more here: https://en.wikipedia.org/wiki/Standard_streams

error stream
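
The difference between the three output functions above can be sketched as follows. This is only an illustration: `capture.output()` is used here to show what reaches standard output, and the variable name `captured` is made up for the example.

```r
# cat() joins its arguments with a space by default; sep changes the separator
cat("a", "b", "c", sep = "-")
cat("\n")

# print() shows one object at a time, with an index prefix and quotes around strings
print("a")

# message() writes to standard error, so capture.output()
# (which captures standard output by default) does not see it
captured <- capture.output(message("sent to stderr"))
length(captured)   # 0, nothing reached standard output
```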



In [6]:
# other escape characters
cat("\\ back slash \n newline, \" quotes \' \t horizontal tab")
# cat() does not add a newline by itself, so each output continues where the last one ended; \r and \b are rarely useful
cat("\n carriage return \r OVERWRITE")

cat("\n can\b to backspace")

\ back slash 
 newline, " quotes ' 	 horizontal tab
 OVERWRITEreturn 
 ca to backspace

# A variable is a name that holds a value; in math it is used to store numbers, in code it can store many other things

In [7]:
# storing things in variables
variableA <- 3
variableB <- "this"
variableC <- 1:5
variableD <- list(variableA, variableB, variableC)

In [8]:
# displaying things in variables to output
variableA

In [9]:
variableB

In [10]:
variableC

In [11]:
variableD

## Variables can be used to represent the same value repeatedly

In [12]:
a<-42
a
a+1
a+2
a+3

## Their value will only change when something changes it

In [13]:
a <- 42
a

a <- 5 + 6
a

## When storing in a variable, the right-hand side is calculated first, so a variable can refer to its own current value once it has been made

In [14]:
a = 42
a

a = a*2
a

## side note: generally R does not care about spacing, so it's mainly whatever is legible for you

In [15]:
a  <-                    5
a

b<-'this'
b

print(                         42                         )

[1] 42


# Note that `=`, `<-`, and `->` are all assignment operators in R, but they do have some nuances
There's also `<<-` and `->>`, read more [here](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/assignOps)
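
As a rough sketch of what `<<-` does differently from `<-`, the example below uses two made-up functions, `add_local` and `add_global`, that both try to increase a variable called `counter`. The names are illustrative only.

```r
counter <- 0

# <- inside a function creates or changes a local variable only
add_local <- function() {
    counter <- counter + 1   # local copy, discarded when the function returns
    invisible(NULL)
}
add_local()
counter   # still 0

# <<- looks for the variable in the enclosing environments and changes it there
add_global <- function() {
    counter <<- counter + 1
    invisible(NULL)
}
add_global()
counter   # now 1
```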

In [16]:
# strictly assigns right value to left variable
a = 5

# Are visibly directional:
b <- 7
9 -> c

# typing the name of a variable displays it when run at the top level (console or notebook cell),
    # but inside loops, functions, or a file run with source(), nothing is shown unless print() is used
a
b
print(c)


[1] 9


# You can use ls() to see variables in the environment

In [17]:
# list objects
ls()

# You can use rm() to remove specific items

In [18]:
rm(a)
rm(b, c) # can put multiple things to remove
ls()

# You can use rm() to remove all items by giving it the list of all objects in the environment

In [19]:
rm(list = ls())
ls()

# Common Data Types in R, which can be seen by using the class() function

## logical

In [20]:
bool <- TRUE
bool
class(bool)

## numeric

In [21]:
float <- 3.14
float
class(float)

value <- 42
value
class(value)

## integer, whole numbers take up less space in memory if that is a concern

In [22]:
i <- as.integer(3.14)
i
class(i)
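
A few more details on integers, as a minimal sketch. The `L` suffix and `object.size()` are standard R, while the variable name `n` is just for illustration.

```r
# as.integer() drops the fractional part rather than rounding
as.integer(3.99)      # 3

# the L suffix creates an integer directly
n <- 3L
class(n)              # "integer"
class(3)              # "numeric", a plain number is stored as a double

# integers use 4 bytes per element and doubles use 8, which shows up in large vectors
smaller <- object.size(integer(1000)) < object.size(numeric(1000))
smaller               # TRUE
```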

## complex, can store imaginary values

In [23]:
complex <- 3+2i
complex
class(complex)

## character, includes strings (note R does not distinguish ' and " )

In [24]:
char <- "a"
char
class(char)

str <- 'string'
str
class(str)

# Data storage / structures:  just vector and list for now

## Vector
c() combines items and returns a vector whose type matches the input items. If the input is a mix of data types, every item is converted to the most general type present (logical, then integer, then numeric, then complex, then character), so any mix that includes a string becomes a character vector.

A 'vector' is not really a type so much as a series of items of one data type, so class() will return the type of item stored, rather than 'vector', whereas 'list' does refer to a specific type of object.
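
The way `c()` settles on one type for mixed input can be checked with `class()`. The sketch below only uses base R; the variable name `mixed` is illustrative.

```r
class(c(TRUE, 2L))       # "integer",   logical is converted to integer
class(c(2L, 3.5))        # "numeric",   integer is converted to double
class(c(3.5, 1+2i))      # "complex"
class(c(TRUE, 3, "a"))   # "character", anything mixed with a string becomes text

mixed <- c(TRUE, 3, "a")
mixed                    # "TRUE" "3" "a"
```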

In [25]:
# a vector can only contain one type of data, but it can be any data type
c(1,2,3)
c("a","short","string")

# they can be stored in variables
vector<-c(TRUE,FALSE)

vector # access it by variable name
class(vector) # matches the type of data stored in it

In [26]:
#initialize a vector with default values where mode = data type, and length is how many entries it can contain
vector(mode="numeric",length=3)

vector("logical",length=4)

vector("list",length=2)

In [27]:
# create a numeric vector of length 3; every entry starts as 0
vecN<-vector("numeric",3)
vecN

# fill vector by numeric index
vecN[3] = 42
vecN[1] = 6
vecN[2] = 7

vecN

## List

In [28]:
# a list can contain different types of items, including more complex items and other lists
list('a', TRUE, 3)

In [29]:
# list
usingList<-list('a', TRUE, 3)
usingList

In [30]:
class(usingList)

# item 3 is a numeric type, 3, so can be treated as a number
usingList[[3]] + 4

In [31]:
# concatenate
usingC<-c('a', TRUE, 3)
usingC

In [32]:
class(usingC)

# item 3 is a character type, "3", so cannot be treated as a number
usingC[[3]] + 4

ERROR: Error in usingC[[3]] + 4: non-numeric argument to binary operator
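
One way around the error above, sketched under the assumption that the text really does hold a number, is to convert the item back with `as.numeric()` before doing arithmetic. The variable names `fixed` and `not_a_number` are illustrative.

```r
usingC <- c('a', TRUE, 3)

# convert the character "3" back into a number before adding
fixed <- as.numeric(usingC[[3]]) + 4
fixed          # 7

# text that is not a number becomes NA (R also gives a warning, silenced here)
not_a_number <- suppressWarnings(as.numeric(usingC[[1]]))
not_a_number   # NA
```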


In [None]:
# a list can be stored in a variable
aList<-list('a', TRUE, 3)

# items can be accessed by numerical index
aList[[3]]
aList[[1]]

# str() can be used to see the structure of the data
str(aList)

## Named index

In [None]:
# a list can also have a named index
aList<-list("this"="a",
            "that"=TRUE,
            "another thing"=3)

# show structure and contents
str(aList)

# which can then be accessed by those names
aList["that"] # a list of length 1 that still contains the item named "that"
aList[["that"]] # the item itself, here the logical value TRUE

In [None]:
# those names can also be added/modified after the fact
names(aList)<-c("new","names","for all!")

str(aList)

## Different ways to access elements in a list
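
As a quick overview before the cells below, here is a small sketch of the three common access forms on a named list. The list `fruit` and its names are made up for the example; `$` is a shorthand for `[[ ]]` with a name.

```r
fruit <- list(name = "apple", count = 3)   # hypothetical example list

class(fruit["count"])      # "list",    [ ] returns a smaller list
class(fruit[["count"]])    # "numeric", [[ ]] returns the item itself
fruit$count + 1            # 4,         $ also returns the item itself

# the same forms can be used to change an item
fruit[["count"]] <- 10
fruit$count                # 10
```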

In [None]:
# creating a variable with a list in it using the list() function
nList<-list(1,2,3,4,5)

In [None]:
# to see the class
class( nList )

# call to see the list
nList

In [None]:
# str() can be used to see the structure of the list
str(nList)

In [None]:
# one set of square brackets [ ] returns a smaller list (here a list of length 1), not the item itself
class(nList[1])

nList[1]

In [None]:
# [[ ]] is used to return the actual item stored in the list
class( nList[[1]] )

nList[[1]]

In [None]:
# accessing them differently changes what can be done with it, for example:

# this returns an error because it cannot arithmetically add 1 to a list
nList[1] + 1

In [None]:
# However, this works, because now it's accessing the number 1 as a number
nList[[1]] + 1

In [None]:
# combining lists
aList<-list('a', TRUE, 3)
bList<-list('b', FALSE, 4)

combined<-list(aList , bList)

combined

In [None]:
# access one list by numerical index
combined[[2]]

In [None]:
# access one item in a list
    # list 1, item 1
combined[[1]][[1]]

    # list 1, item 2
combined[[1]][[2]]

    # list 1, item 3
combined[[1]][[3]]

# How to define a function and call it using function()

In [None]:
# definition
functionName<-function(){
    #code to execute
    print("This is the function's output")
}

In [None]:
# call
functionName()

# You can do more complex things when defining a function such as declaring parameters which receive arguments when called

In [None]:
# definition
add<-function(param1,param2){
    param1+param2
}

# call
add(3,5)

# You can even declare that those parameters have default values

In [None]:
# definition
myNameIs<-function(name = "n/a"){
    # sprintf() lets us insert specific variable types into character strings, %s means string
    sprintf("My name is %s", name)
}

In [None]:
# Call without argument
myNameIs()

In [None]:
# Call with argument
myNameIs("Inigo Montoya")
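
Arguments can also be matched by name, which lets you override only the defaults you care about and pass arguments in any order. A minimal sketch, using a made-up function `greet`:

```r
# hypothetical function with one required parameter and one default
greet <- function(name, greeting = "Hello") {
    sprintf("%s, %s!", greeting, name)
}

greet("Inigo")                           # "Hello, Inigo!"
greet("Inigo", greeting = "Hola")        # "Hola, Inigo!"
greet(greeting = "Hi", name = "Inigo")   # "Hi, Inigo!", named arguments can come in any order
```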

# You can type the name of a function to see its definition, print it to see the same information, or use the help() function to see the built-in documentation

In [None]:
median

# the output is a short generic function, roughly like the line below; it does not show how the median is calculated,
# because that code lives in the method median.default
# function (x, na.rm = FALSE, ...) UseMethod("median")

In [None]:
print(median)

In [None]:
help(median)
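
If you want to look at the code that does the actual calculation, one option is to fetch the default method with `getS3method()` from the built-in utils package. The sketch below also contrasts `median()` with `mean()`; the variable names are illustrative.

```r
# median is a generic function; the default method holds the calculation
median_default <- utils::getS3method("median", "default")
is.function(median_default)    # TRUE, printing median_default would show its code

values <- c(1, 2, 3, 100)
median(values)   # 2.5,  the middle of the sorted values
mean(values)     # 26.5, the sum divided by the count
```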

## Looking at some of the nuance between `=` and `<-`

In [None]:
# results in a syntax error: inside a function call, = is read as naming an argument, so it cannot also be used to assign y
median(x= y=1:10)

In [None]:
# <- assigns 1:10 to a variable called y and also passes that value as the argument x
median(x= y<-1:10)

# note: this is not recommended, as it's kind of messy to read, I would put it on two lines.

In [None]:
y

# Logic expressions which can be combined to create more complex expressions
These are often used as conditions for tests and as triggers to do something

In [None]:
# equality, written with a double = because a single = is an assignment operator
1 == 2

1 == 1

In [None]:
# works for strings as well
"this" == "that"

"this" == "this"

In [None]:
# greater than or less than
1 > 2
1 < 2

In [None]:
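# strings are compared alphabetically, character by character, not by length
"abc" > "d" # FALSE, because "a" sorts before "d"
"ab" > "a" # TRUE, because "a" is only the start of "ab"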

In [None]:
# less than or equal to
1 <= 1

# greater than or equal to
1 >= 2
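
The comparison operators also work on strings, where `>` and `<` compare alphabetically rather than by length. To compare lengths, the number of characters can be taken with `nchar()` first, as in this small sketch (variable names are illustrative):

```r
by_alphabet <- "abc" > "d"
by_alphabet    # FALSE, "a" sorts before "d"

by_length <- nchar("abc") > nchar("d")
by_length      # TRUE, 3 characters versus 1
```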

In [None]:
# not equivalent
1 != 2

1 != 1

In [None]:
# returns the opposite logical value
!TRUE
!FALSE

# works with variables
x=FALSE

!x

In [None]:
# & is logical AND which compares two values and returns TRUE if both are true
TRUE & TRUE
TRUE & FALSE
FALSE & FALSE

In [None]:
# | is logical OR which compares two values and returns TRUE if at least one of them is TRUE
TRUE | TRUE
TRUE | FALSE
FALSE | FALSE
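
`&` and `|` work element by element on whole vectors. R also has `&&` and `||`, which expect single values and stop evaluating as soon as the answer is known; these are the forms usually seen in `if` conditions. A short sketch, with illustrative variable names:

```r
# element-wise: one result per position
both   <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
both     # TRUE FALSE FALSE
either <- c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, TRUE)
either   # TRUE FALSE TRUE

# single values: && and || return one TRUE or FALSE
x <- 5
x > 1 && x < 10    # TRUE
x < 1 || x == 5    # TRUE

# the right-hand side is skipped when the left side already decides the result
skipped <- FALSE && stop("never evaluated")
skipped            # FALSE
```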

# Logic expressions as triggers for other events
`if`, `else`, and `else if` structure

In [None]:
x = TRUE
if(x==TRUE){
    print("x is true")
}

# if x were not true, then nothing happens, the code simply continues on

In [None]:
x = FALSE
if(x==TRUE){
    print("x is true")
} else {
    print("x is false")
}

In [None]:
# you can play around with the value of x here and rerun the block
x = 3

if(x==1){
    print("x equals 1")
} else if(x==2){
    print("x equals 2")
} else if(x==3){
    print("x equals 3")
} else {
    print("out of conditions")
}
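
To try several values without editing and rerunning the cell, the same chain can be wrapped in a function. This is only a sketch; `describe` is a made-up name, and the function returns the text instead of printing it.

```r
# hypothetical helper: returns a description of x
describe <- function(x) {
    if (x == 1) {
        "x equals 1"
    } else if (x == 2) {
        "x equals 2"
    } else {
        "out of conditions"
    }
}

describe(2)   # "x equals 2"
describe(7)   # "out of conditions"
```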

# Loops are how code can repeat certain actions
`while` loop vs `for` loop

## while() is for logical tests
`while(logical test is true){
    do a thing
    update whatever the logical test depends on, else you end up in an infinite loop
}`

In [None]:
x = 1
while(x < 5){
    print(x)
    x=x+1 # if you don't increment your variable/counter, you will end up in an endless loop
}
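
A `while` loop can also build up a result as it goes. The sketch below adds the numbers 1 through 4; the variable names `total` and `n` are illustrative.

```r
total <- 0
n <- 1
while (n <= 4) {
    total <- total + n   # add the current number to the running total
    n <- n + 1           # move on, so the test eventually becomes FALSE
}
total   # 10
n       # 5, the first value for which n <= 4 is FALSE
```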

## for() iterates through the items of a vector or list, one item at a time
`for(item in a list){
    do thing to / with item
}`

In [None]:
# prints items in a vector, 1:5 is an integer vector meaning 1, 2, 3, 4, 5
for(x in 1:5){
    print(x)
}

In [None]:
# c() combines items into a vector; because the types here are mixed, every item is converted to character
vector = c("apple", "orange", "banana", 5, 10)
for(item in vector){
    print(item)
}

## A way to get the numerical index instead of the actual vector item, which can then be used to access elements of vectors and lists

In [None]:
vec2 = c("one","two", 3, TRUE, FALSE)

# length() returns the number of items; R indices start at 1, so the last index equals the length
length(vec2)

# recall that this would just indicate a vector of 1, 2, 3, 4, 5
1:5

# so then this just serially prints each number
for(i in 1:length(vec2)){
    print(i)
}

## This method can be used to access the same elements in order

In [None]:
for(i in 1:length(vec2)){
    print( vec2[i])
}
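
One caveat with `1:length(x)`: if the vector is empty, `1:0` counts down and gives `1 0`, so the loop body would still run. `seq_along()` is a base R function that avoids this. A small sketch with illustrative variable names:

```r
empty <- character(0)

risky <- 1:length(empty)
risky               # 1 0, a loop over this would run twice

safe <- seq_along(empty)
length(safe)        # 0, a loop over this would not run at all

count <- 0
for (i in seq_along(empty)) {
    count <- count + 1
}
count               # 0

# for a non-empty vector both forms give the same indices
seq_along(c("one", "two", "three"))   # 1 2 3
```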

## Or starting from a different element if we change the 1

In [None]:
# elements 3 to the end
for(i in 3:length(vec2)){
    print( vec2[i])
}

## You could also specify both numbers

In [None]:
# elements 2 through 4 
for(i in 2:4){
    print( vec2[i])
}

## The same indices can also be used to access another vector of the same length, for example the first vector

In [None]:
for(i in 1:length(vec2)){
    print( vector[i] )
}

## Nested `for` loops can be used to iterate through two different vectors or multiple dimensions of one object (will show alongside dataframes)

In [None]:
x = 1:5
y = c("a", "b", "c")

for(i in x){
    for(j in y){
        # cat is concatenate and print
        # %i is for integer
        # \n is the new line character
        cat(sprintf("(%i, %s)\n", i, j))
    }
}

# Working with the file directory

## Get the current working directory; in a notebook, the default is usually the folder the notebook is in

In [None]:
wd <- getwd()
wd

## Get the basename of the input directory path, using the working directory as an example

In [None]:
basename(wd)

## List the files of the input directory path, using the working directory as an example

In [None]:
list.files(wd)

## Join strings together using `paste()`; the default separator is one space

In [None]:
# default behavior is to separate each string with a space, " "
paste("this","that","and another thing",".")

# but you can change that
paste("this","that","and another thing",".",sep=", ")

### `paste0()` uses no separator by default

In [None]:
paste0("No","Space")
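
These functions come up often when building file names and paths. The sketch below uses made-up folder and file names; `file.path()` and the `collapse` argument of `paste()` are part of base R.

```r
# file.path() joins path pieces with "/" (hypothetical folder and file names)
csv_path <- file.path("data", "raw", "results.csv")
csv_path      # "data/raw/results.csv"

# paste0() works on whole vectors, which is handy for numbered file names
file_names <- paste0("sample_", 1:3, ".csv")
file_names    # "sample_1.csv" "sample_2.csv" "sample_3.csv"

# collapse joins the items of one vector into a single string
joined <- paste(c("a", "b", "c"), collapse = "+")
joined        # "a+b+c"
```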