<img src="http://cognitiveclass.ai/wp-content/uploads/2017/11/cc-logo-square.png" width="150">




<h1 align=center>Writing your own CUSTOM FUNCTIONS in R</h1> 



## Table of Contents

<ul>
<li><a href="#About-the-Dataset">About the Dataset</a></li>
<li><a href="#What-is-a-Function?">What is a Function?</a></li>
<li><a href="#Explicitly-returning-outputs-in-user-defined-functions">Explicitly returning outputs in user-defined functions</a></li>
<li><a href="#Using-IF/ELSE-statements-in-functions">Using IF/ELSE statements in functions</a></li>
<li><a href="#Setting-default-argument-values-in-your-custom-functions">Setting default argument values in your custom functions</a></li>
<li><a href="#Using-functions-within-functions">Using functions within functions</a></li>
<li><a href="#Global-and-local-variables">Global and local variables</a></li>
</ul>
<p></p>
Estimated Time Needed: <strong>25 min</strong>


<hr>

<a id="ref0"></a>
<h2 align=center>About the Dataset</h2>

Imagine you got many movie recommendations from your friends and compiled all of them in a table, with specific information about each movie.

The table has one row for each movie and several columns:

- **name** - The name of the movie
- **year** - The year the movie was released
- **length_min** - The length of the movie in minutes
- **genre** - The genre of the movie
- **average_rating** - Average rating on IMDb
- **cost_millions** - The movie's production cost in millions
- **sequences** - The amount of sequences
- **foreign** - Indicative of whether the movie is foreign (1) or domestic (0)
- **age_restriction** - The age restriction for the movie
<br>

You can see part of the dataset below:

<img src="https://ibm.box.com/shared/static/6kr8sg0n6pc40zd1xn6hjhtvy3k7cmeq.png">

Let's first download the dataset that we will use in this notebook:

In [1]:
# code to download the dataset
download.file("https://ibm.box.com/shared/static/n5ay5qadfe7e1nnsv5s01oe1x62mq51j.csv", destfile="movies-db.csv")

<hr>

<a id='ref1'></a>
<center><h2>What is a Function?</h2></center>

A function is a reusable block of code that performs a specific set of operations each time it is called.

There are two types of functions :

- **Pre-defined functions**
- **User-defined functions**

<b>Pre-defined</b> functions are those that are already defined for you, whether it's in R or within a package. For example, **`sum()`** is a pre-defined function that returns the sum of its numeric inputs.

<b>User-defined</b> functions are custom functions created and defined by the user. For example, you can create a custom function to print **Hello World**.

<h3><b>Pre-defined functions</b></h3>

There are many pre-defined functions, so let's start with the simple ones.

Using the **`mean()`** function, let's get the average of these three movie ratings:
- **Star Wars (1977)** - rating of 8.7 
- **Jumanji** - rating of 6.9
- **Back to the Future** - rating of 8.5

In [2]:
ratings <- c(8.7, 6.9, 8.5)
mean(ratings)

We can use the **`sort()`** function to sort the movie ratings in _ascending order_.

In [3]:
sort(ratings)

You can also sort in _decreasing_ order by adding the argument **`decreasing = TRUE`**.

In [4]:
sort(ratings, decreasing = TRUE)
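
As a further illustration (not part of the original walkthrough), a few other base R pre-defined functions can be applied to the same `ratings` vector. The sketch below redefines `ratings` so that it runs on its own:

```r
# Same three ratings as above, redefined so this snippet is self-contained
ratings <- c(8.7, 6.9, 8.5)

length(ratings)           # number of ratings in the vector
max(ratings)              # highest rating
min(ratings)              # lowest rating
round(mean(ratings), 1)   # average rating, rounded to one decimal place
```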

<div class="alert alert-success alertsuccess" style="margin-top: 20px">
<h4> [Tip] How do I learn more about the pre-defined functions in R? </h4>
<p></p>
We will be introducing a variety of **pre-defined functions** to you as you learn more about R. There are just too many functions, so there's no way we can teach them all in one sitting. But if you'd like to take a quick peek, here's a short reference card for some of the commonly-used pre-defined functions:   
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
</div>

<h3>User-defined functions</h3>

Functions are very easy to create in R:

In [5]:
printHelloWorld <- function(){
    print("Hello World")
}
printHelloWorld()

[1] "Hello World"


To use it, simply run the function with **`()`** at the end:

In [6]:
printHelloWorld()    # NOTE: we just ran this command above

[1] "Hello World"


But what if you want the function to provide some **output** based on some **inputs**?

In [7]:
# A simple function that adds its two arguments
add <- function(x, y) { 
    x + y
}
add(3, 4)

As you can see above, you can create functions with the following syntax to take in inputs (as arguments), then provide some output.

**`f <- function(<arguments>) {  `    
  `  Do something`  
  `  Do something`  
  `  return(some_output)`  
`}  `**
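
As one possible instance of this template, here is a minimal sketch of a function that converts a movie length from minutes to hours. The name `minutesToHours` is hypothetical and used only for illustration:

```r
# Hypothetical helper: converts a length in minutes to hours
minutesToHours <- function(minutes) {
    hours <- minutes / 60    # do something with the input
    return(hours)            # return the output
}

minutesToHours(120)    # 120 minutes expressed in hours
```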


<hr>

<a id='ref2'></a>
<center><h2>Explicitly returning outputs in user-defined functions</h2></center>

In R, the value of the last expression evaluated in a function is automatically returned as the output of the function.

#### You can also explicitly tell the function to return an output.

In [8]:
add <- function(x, y){
    return(x + y)
}
add(3, 4)

It's good practice to use the `return()` function to explicitly tell the function to return the output.
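
The sketch below puts the two styles side by side; both versions give the same result. The names `addImplicit` and `addExplicit` are made up for this illustration:

```r
addImplicit <- function(x, y) {
    x + y            # the value of the last expression is returned automatically
}

addExplicit <- function(x, y) {
    return(x + y)    # the same value, returned explicitly
}

identical(addImplicit(3, 4), addExplicit(3, 4))    # checks that both outputs match
```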

<hr>

<a id='ref3'></a>
<center><h2>Using IF/ELSE statements in functions</h2></center>

The **`return()`** function is particularly useful if you have any IF statements in the function, when you want your output to be dependent on some condition:

In [9]:
isGoodRating <- function(rating){
    #This function returns "NO" if the input value is less than 7. Otherwise it returns "YES".
    
    if(rating < 7){
        return("NO") # return NO if the movie rating is less than 7
    
    }else{
        return("YES") # otherwise return YES
    }
}

isGoodRating(6)    # runs the function
isGoodRating(9.5)  # runs the function
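
One caveat worth knowing: `if` expects a single TRUE or FALSE, so `isGoodRating()` is meant for one rating at a time. To label a whole vector of ratings at once, base R's vectorized `ifelse()` is one option. A minimal sketch, using the three ratings from earlier (the variable name `labels` is an assumption made for this example):

```r
ratings <- c(8.7, 6.9, 8.5)

# ifelse() checks the condition element by element
labels <- ifelse(ratings < 7, "NO", "YES")
labels
```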

<hr>

<a id='ref4'></a>
<center><h2>Setting default argument values in your custom functions</h2></center>

You can set a default value for arguments in your function. For example, in the **`isGoodRating()`** function, what if we wanted to create a threshold for what we consider to be a good rating?  
  
Perhaps by default, we should set the threshold to 7:

In [31]:
# NOTE: This is a deliberately simple use of a threshold.
#  In a more complex function that references the same constant
#    more than once, defining it once as an argument would
#    make the code easier to maintain and update.

isGoodRating <- function(rating, threshold = 7){
    if(rating < threshold){
        return("NO") # return NO if the movie rating is less than the threshold
    }else{
        return("YES") # otherwise return YES
    }
}

isGoodRating(6)
isGoodRating(10)

Notice how we did not have to explicitly specify the second argument (threshold), but we can if we want to. Let's say we have a higher standard for movie ratings, so let's bring our threshold up to 8.5:

In [11]:
# Unlike using a constant, using a threshold gives the user 
#   the option to override the threshold and specify their own
#   value for threshold.

isGoodRating(8, threshold = 8.5)

Great! Now you know how to create default values. **Note that** if you know the order of the arguments, you do not need to write out the argument names, as in:

In [12]:
# This is the same as the above.

isGoodRating(8, 8.5) #rating = 8, threshold = 8.5
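
As a small additional sketch: when arguments are named, they can be given in any order. The function is redefined here so that the snippet runs on its own:

```r
isGoodRating <- function(rating, threshold = 7){
    if(rating < threshold){
        return("NO")
    }else{
        return("YES")
    }
}

isGoodRating(threshold = 8.5, rating = 8)    # named arguments, order swapped
isGoodRating(8)                              # threshold falls back to the default of 7
```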

<hr>

<a id='ref5'></a>
<center><h2>Using functions within functions</h2></center>

Using functions within functions is no big deal. In fact, you've already used the **`print()`** and **`return()`** functions. So let's try making our **`isGoodRating()`** more interesting.

Let's create a function that can help us decide on which movie to watch, based on its rating. We should be able to provide the name of the movie, and it should return **NO** if the movie rating is below 7, and **YES** otherwise.

First, let's read in our movies data:

In [13]:
my_data <- read.csv("movies-db.csv")
head(my_data)

name,year,length_min,genre,average_rating,cost_millions,foreign,age_restriction
Toy Story,1995,81,Animation,8.3,30.0,0,0
Akira,1998,125,Animation,8.1,10.4,1,14
The Breakfast Club,1985,97,Drama,7.9,1.0,0,14
The Artist,2011,100,Romance,8.0,15.0,1,12
Modern Times,1936,87,Comedy,8.6,1.5,0,10
Fight Club,1999,139,Drama,8.9,63.0,0,18


Next, do you remember how to check the value of the **average_rating** column if we specify a movie name?  
Here's how:

In [24]:
# This is a review of conditional execution based on a Boolean condition
# Returns the value(s) of cost_2014 when: cost_2014 > 8.3 equals TRUE
cost_2014 <- c(8.6, 8.5, 8.1)
cost_2014[cost_2014 > 8.3]    # return only the elements where this logical condition is TRUE


In [26]:
# This is a sandbox area to try out different things

# my_data <- read.csv("movies-db.csv")    # reads in the dataset
# my_data             # prints out the entire dataset
# head(my_data)         # prints out the header and first six rows

# prints the average rating for all cases
# my_data["average_rating"]

# prints the average rating if: my_data$name == "Fight Club"
# my_data[my_data$name == "Fight Club", "average_rating"]

# cond_test1 <- my_data[my_data$name == "The Artist", "average_rating"]
# cond_test1


In [35]:
# NOTE: The default threshold value in the "isGoodRating" function is 7
# Creates a new variable named "akira" holding the "average_rating" of the row where name == "Akira" is TRUE

# Within my_data, the row should be where the "name" column equals "Akira"
# AND the column should be "average_rating"

akira <- my_data[my_data$name == "Akira", "average_rating"]
akira

isGoodRating(akira)    # uses the "isGoodRating" function defined above
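
If the CSV file is not at hand, the same row-and-column selection can be tried on a small data frame built by hand. The sketch below uses three rows copied from the output above; the name `small_data` is hypothetical:

```r
# Small stand-in for the movies data (values copied from the output above)
small_data <- data.frame(
    name = c("Toy Story", "Akira", "Fight Club"),
    average_rating = c(8.3, 8.1, 8.9)
)

# Logical vector: TRUE only for the row whose name is "Akira"
small_data$name == "Akira"

# Use it to pick the row, and pick the column by name
small_data[small_data$name == "Akira", "average_rating"]
```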

Now, let's put this all together into a function that can take any **moviename** and return a **YES** or **NO** for whether or not we should watch it.

In [34]:
# NOTE: The default threshold value in the "isGoodRating" function is 7
# NOTE: "my_data" is a dataframe that was created above
# Creates a new function named "watchMovie", which uses the "isGoodRating" function defined above.

# NOTE: this function requires us to specify a data source (which is a data frame named my_data)
watchMovie <- function(data, moviename){
    rating <- data[data["name"] == moviename,"average_rating"]
    return(isGoodRating(rating))
}

watchMovie(my_data, "Akira")        # movie rating is greater than 7
# watchMovie(my_data, "High School Musical")  # movie rating is less than 7

**Make sure you take the time to understand the function above.** Notice how the function expects two inputs: `data` and `moviename`, and so when we use the function, we must also input two arguments.

*But what if we only want to watch really good movies? How do we set the rating threshold that we created earlier?*
<br>
Here's how:

In [36]:
# NOTE: this function requires us to specify a data source
watchMovie <- function(data, moviename, my_threshold){
    rating <- data[data$name == moviename,"average_rating"]  # "average_rating" of the row(s) where data$name == moviename is TRUE
    return(isGoodRating(rating, threshold = my_threshold))
}

Now our **watchMovie** function takes three inputs: **data**, **moviename** and **my_threshold**.

In [37]:
# NOTE: this function requires us to specify a data source (which is a data frame named my_data)
watchMovie(my_data, "Akira", 7)

*What if we still want the default threshold to be 7?*
<br>
Here's how we can do it:

In [48]:
watchMovie <- function(data, moviename, my_threshold = 7){
    rating <- data[data[,1] == moviename,"average_rating"]
    return(isGoodRating(rating, threshold = my_threshold))
}

watchMovie(my_data,"Akira")    # uses the default threshold of 7
# watchMovie(my_data, "Akira", 9)  # changes the threshold to 9
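
One thing the function above does not handle is a movie name that is missing from the data: the row selection then returns an empty vector, and the `if` inside `isGoodRating()` fails. A minimal sketch of one way to guard against this follows. `watchMovieSafe` and the tiny `small_data` frame are hypothetical names used only here, and returning `NA` is just one possible choice:

```r
isGoodRating <- function(rating, threshold = 7){
    if(rating < threshold){
        return("NO")
    }else{
        return("YES")
    }
}

# Small stand-in for the movies data (rows copied from the output above)
small_data <- data.frame(
    name = c("Toy Story", "Akira", "Fight Club"),
    average_rating = c(8.3, 8.1, 8.9)
)

watchMovieSafe <- function(data, moviename, my_threshold = 7){
    rating <- data[data$name == moviename, "average_rating"]
    if(length(rating) == 0){
        return(NA)    # no matching movie, so there is nothing to rate
    }
    return(isGoodRating(rating, threshold = my_threshold))
}

watchMovieSafe(small_data, "Akira")            # found in the data
watchMovieSafe(small_data, "Unknown Movie")    # not found, returns NA
```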

As you can imagine, if we assign the output to a variable, the variable will hold the value **YES**:

In [49]:
a <- watchMovie(my_data, "Akira")
a

While **watchMovie** is easy to use, it doesn't tell us what the movie rating actually is. How do we make it *print* the actual movie rating before giving a response? To do so, we can simply add a **print** statement before the final line of the function.  

We can also use the built-in **`paste()`** function to concatenate a sequence of character strings together into a single string.

In [51]:
# NOTE: this function does NOT take a data source as an argument; it reads the global data frame "my_data" directly
watchMovie <- function(moviename, my_threshold = 7){
    rating <- my_data[my_data[,1] == moviename,"average_rating"] # "rating" is the "average_rating" of the movie whose name is passed in

    memo <- paste("The movie rating for", moviename, "is", rating)
    print(memo)
    
    return(isGoodRating(rating, threshold = my_threshold))
}

watchMovie("Akira")    # the default threshold value is 7
# watchMovie("Fight Club")
# watchMovie("Akira", 9)    # changes the threshold value to 9


[1] "The movie rating for Akira is 8.1"
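
To look at `paste()` on its own: by default it joins its arguments with a single space, and the `sep` argument changes the separator. The closely related `paste0()` uses no separator at all. A short sketch with made-up strings:

```r
paste("The movie rating for", "Akira", "is", 8.1)    # pieces joined with spaces
paste("Akira", "1998", sep = " - ")                  # custom separator
paste0("Rating: ", 8.1, "/10")                       # no separator
```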


Note that the printed message is only a side effect. The value the function actually returns is still **YES** or **NO**:

In [52]:
x <- watchMovie("Akira")

[1] "The movie rating for Akira is 8.1"


In [53]:
print(x)

[1] "YES"


<hr>

<a id='ref6'></a>
<center><h2>Global and local variables</h2></center>

So far, we've been creating variables within functions, but did you notice what happens to those variables outside of the function?  

Let's see what happens when we try to access **memo** outside the function:

In [54]:
watchMovie <- function(moviename, my_threshold = 7){
    rating <- my_data[my_data[,1] == moviename,"average_rating"]
    
    memo <- paste("The movie rating for", moviename, "is", rating)
    print(memo)
    
    isGoodRating(rating, threshold = my_threshold)
}

watchMovie("Akira")

[1] "The movie rating for Akira is 8.1"


In [55]:
memo    # memo is a local variable within the function

ERROR: Error in eval(expr, envir, enclos): object 'memo' not found


**We got an error:**  ` object 'memo' not found`. **Why?**  

It's because all the variables we create in the function remain within the function. In technical terms, this is a **local variable**, meaning that the variable assignment does not persist outside the function. The `memo` variable only exists within the function.    
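
The same idea can be seen in a smaller sketch: assigning to a name inside a function does not change a variable with the same name outside it. The names `rating` and `changeRating` are chosen just for this illustration:

```r
rating <- 5    # variable in the global environment

changeRating <- function(){
    rating <- 9    # creates a separate, local variable
    return(rating)
}

changeRating()    # the local value
rating            # the global value is unchanged
```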

But there is a way to create **global variables** from within a function -- where you can use the global variable outside of the function. It is typically _not_ recommended that you use global variables, since it may become harder to manage your code, so this is just for your information.  

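To create a **global variable** from inside a function, we can use the `<<-` operator. Strictly speaking, `<<-` assigns to an existing variable of that name in an enclosing environment, and creates one in the global environment if none is found: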
> **`x <<- 1`**


Here's an example of a global variable assignment:

In [56]:
myFunction <- function(){
    y <<- 3.14           # <<- assigns y outside the function (here, in the global environment)
    return("Hello World")
}
myFunction()

In [57]:
y    # created inside myFunction(), but still available here
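
Since global assignment is usually discouraged, a common alternative is to return the value and let the caller decide where to store it. A minimal sketch, with the hypothetical name `getPi`:

```r
getPi <- function(){
    return(3.14)    # hand the value back instead of assigning it globally
}

y <- getPi()    # the caller creates y explicitly
y
```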

<hr>

#### Scaling R with big data

As you learn more about R, if you are interested in exploring platforms that can help you run analyses at scale, you might want to sign up for a free account on [IBM Watson Studio](http://cocl.us/dsx_rp0101en), which allows you to run analyses in R with two Spark executors for free.

<hr>

### About the Author:  
Hi! It's [Aditya Walia](https://ca.linkedin.com/in/aditya-walia-14b678bb), the author of this notebook. I hope you found R easy to learn! There's lots more to learn about R but you're well on your way. Feel free to connect with me if you have any questions.
<hr>


Copyright &copy; [IBM Cognitive Class](https://cognitiveclass.ai). This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license/).
