# Create Tests

The next step in creating a DataCamp project in R is to write a few tests with the [`testthat` package](https://github.com/r-lib/testthat), which is how instructors deliver feedback on the code students write in a project. The [testing chapter](http://r-pkgs.had.co.nz/tests.html) in Hadley Wickham's _R packages_ book covers the details of testing with `testthat`, along with workflow advice and concrete examples.

After installing the necessary libraries (described below), please create tests for the project tasks below, which were taken from a live DataCamp project.

When complete, please email the link to your forked repo to projects@datacamp.com with the email subject line _DataCamp project tests_. If you have any questions, please reach out to projects@datacamp.com.

In [1]:
# To be able to run tests locally in the notebook, install the following:
# install.packages("devtools")
# install.packages("testthat")
# devtools::install_github('datacamp/IRkernel.testthat')
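
Before moving on, it can help to confirm that the three packages named above are actually available. The following is a minimal sketch using base R's `requireNamespace()`. The variable names `needed` and `installed` are illustrative and not part of the original notebook.

```r
# Packages named in the install cell above
needed <- c("devtools", "testthat", "IRkernel.testthat")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Named logical vector: one entry per package
print(installed)

# Names of any packages that still need to be installed
print(names(installed)[!installed])
```

Any package reported as missing can be installed with the corresponding commented-out line in the cell above.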

## An example of a `testthat` test

Instructions to the student in the project:

- Load the `readr` and `dplyr` packages.
- Load the dataset, "by_tag_year.csv", into a variable named `by_tag_year` using the `read_csv()` function (**not** `read.csv()`).
- Print `by_tag_year`.

A potential **incorrect** submission is as follows. Please process the cell below. *Note: `by_tag_year.csv` exists in a directory named `datasets` in the same directory as this `create_tests.ipynb` notebook.*

In [2]:
# Load packages
library(readr)

# Load dataset
by_tag_year <- read.csv("datasets/by_tag_year.csv")

# Inspect the dataset
head(by_tag_year)

year,tag,number,year_total
2008,.htaccess,54,58390
2008,.net,5910,58390
2008,.net-2.0,289,58390
2008,.net-3.5,319,58390
2008,.net-4.0,6,58390
2008,.net-assembly,3,58390


Inspect the `testthat` test below, then run the test locally. (Processing the cell above followed by the cell below will run the test locally.)

In [3]:
# These packages need to be loaded in the first `@tests` cell. 
library(testthat) 
library(IRkernel.testthat)

# The purpose of the tests is to catch common errors and to
# give the student a hint on how to resolve them.
# The solution should pass the tests.
run_tests({
    test_that("packages are loaded", {
        expect_true("readr" %in% .packages(), info = "Did you load the readr package?")
        expect_true("dplyr" %in% .packages(), info = "Did you load the dplyr package?")
    })

    test_that("by_tag_year is correct", {
        expect_is(by_tag_year, "tbl_df",
            info = "Did you read in by_tag_year with read_csv (not read.csv)?")
        expect_equal(nrow(by_tag_year), 40518,
            info = "Did you read in by_tag_year with read_csv?")
    })
})

0/2 tests passed
> fail :: packages are loaded
success
"dplyr" %in% .packages() isn't true.
Did you load the dplyr package?
 
---
> fail :: by_tag_year is correct
`by_tag_year` inherits from `data.frame` not `tbl_df`.
Did you read in by_tag_year with read_csv (not read.csv)?
 
---
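
The second failure above comes from the class of the object each reader returns: `read.csv()` gives a plain `data.frame`, while `read_csv()` gives a tibble. The sketch below illustrates the difference on a small temporary file whose two data rows are copied from the head of `by_tag_year.csv`. The names `tmp`, `base_df`, and `readr_df` are made up for this illustration.

```r
library(readr)

# Write a tiny CSV to a temporary file (rows copied from the output shown earlier)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "year,tag,number,year_total",
  "2008,.htaccess,54,58390",
  "2008,.net,5910,58390"
), tmp)

# Base R reader: returns a plain data.frame
base_df <- read.csv(tmp)

# readr reader: returns a tibble (column-spec message suppressed for brevity)
readr_df <- suppressMessages(read_csv(tmp))

# Only the readr result inherits from "tbl_df", which is what expect_is() checks
print(inherits(base_df, "tbl_df"))
print(inherits(readr_df, "tbl_df"))

unlink(tmp)
```

Both objects still inherit from `data.frame`, so testing for `"tbl_df"` rather than `"data.frame"` is what distinguishes the two readers.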

A potential **correct** submission is as follows. Please process the cell below to overwrite the previous incorrect solution.

In [4]:
# Load packages
library(readr)
library(dplyr)

# Load dataset
by_tag_year <- read_csv("datasets/by_tag_year.csv")

# Inspect the dataset
print(by_tag_year)


Attaching package: ‘dplyr’

The following object is masked from ‘package:testthat’:

    matches

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Parsed with column specification:
cols(
  year = col_double(),
  tag = col_character(),
  number = col_double(),
  year_total = col_double()
)


# A tibble: 40,518 x 4
    year tag           number year_total
   <dbl> <chr>          <dbl>      <dbl>
 1  2008 .htaccess         54      58390
 2  2008 .net            5910      58390
 3  2008 .net-2.0         289      58390
 4  2008 .net-3.5         319      58390
 5  2008 .net-4.0           6      58390
 6  2008 .net-assembly      3      58390
 7  2008 .net-core          1      58390
 8  2008 2d                42      58390
 9  2008 32-bit            19      58390
10  2008 32bit-64bit        4      58390
# ... with 40,508 more rows


In [5]:
run_tests({
    test_that("packages are loaded", {
        expect_true("readr" %in% .packages(), info = "Did you load the readr package?")
        expect_true("dplyr" %in% .packages(), info = "Did you load the dplyr package?")
    })

    test_that("by_tag_year is correct", {
        expect_is(by_tag_year, "tbl_df",
            info = "Did you read in by_tag_year with read_csv (not read.csv)?")
        expect_equal(nrow(by_tag_year), 40518,
            info = "Did you read in by_tag_year with read_csv?")
    })
})

2/2 tests passed
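
The package check in the test above relies on `.packages()`, which returns the names of the packages currently attached to the search path. As a rough base R illustration (the package name `notARealPackage` is invented for the example):

```r
# .packages() returns its value invisibly, so capture it in a variable
attached <- .packages()

# The base package is always attached
print("base" %in% attached)

# A package that has not been attached with library() is not listed
print("notARealPackage" %in% attached)
```

This is why the earlier submission, which called `library(readr)` but never `library(dplyr)`, failed the "packages are loaded" test.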

## Create a test
It's your turn to create a test now!

Instructions to the student in the project:
* Use `mutate()` to add a column called `fraction` to `by_tag_year`, representing `number` divided by `year_total`. Name the new table `by_tag_year_fraction`.
* Print `by_tag_year_fraction`.

A potential **incorrect** submission is as follows. Please process the cell below.

In [6]:
# Add fraction column
by_tag_year_fraction <- by_tag_year %>%
  mutate(frac = number / year_total^2)

# Print the new table
print(by_tag_year_fraction)

# A tibble: 40,518 x 5
    year tag           number year_total     frac
   <dbl> <chr>          <dbl>      <dbl>    <dbl>
 1  2008 .htaccess         54      58390 1.58e- 8
 2  2008 .net            5910      58390 1.73e- 6
 3  2008 .net-2.0         289      58390 8.48e- 8
 4  2008 .net-3.5         319      58390 9.36e- 8
 5  2008 .net-4.0           6      58390 1.76e- 9
 6  2008 .net-assembly      3      58390 8.80e-10
 7  2008 .net-core          1      58390 2.93e-10
 8  2008 2d                42      58390 1.23e- 8
 9  2008 32-bit            19      58390 5.57e- 9
10  2008 32bit-64bit        4      58390 1.17e- 9
# ... with 40,508 more rows


Please fill in the following `testthat` test template to test if:
- `by_tag_year_fraction` is of class `tbl_df`
- a new column named "fraction" was created (hint: use `colnames()` with the `%in%` operator)
- the contents of the "fraction" column are correct

Include a helpful feedback message for failing submissions. The test should fail since the above solution is incorrect.
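
As a small illustration of the `colnames()` hint, the sketch below checks for a column name on a toy base R data frame. The data frame `toy` is hypothetical and only mirrors the column names used in the project.

```r
# Toy data with the same column names as by_tag_year (values from its first rows)
toy <- data.frame(number = c(54, 5910), year_total = c(58390, 58390))

# Add the new column under the wrong name, as in the incorrect submission
toy$frac <- toy$number / toy$year_total

# %in% returns a single TRUE/FALSE, which is what expect_true() needs
has_fraction <- "fraction" %in% colnames(toy)
has_frac <- "frac" %in% colnames(toy)

print(has_fraction)  # FALSE: no column is named "fraction"
print(has_frac)      # TRUE: the column was named "frac" instead
```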

In [7]:
run_tests({
    # .... YOUR TEST(S) HERE ....
    test_that("by_tag_year_fraction is correct", {
        expect_is(by_tag_year_fraction, "tbl_df",
            info = "Did you use `mutate()` with the correct arguments?")
        expect_true("fraction" %in% colnames(by_tag_year_fraction),
            info = "by_tag_year_fraction does not have a column named 'fraction'. Did you use mutate() to name your column 'fraction'?")
        expect_identical(by_tag_year_fraction$fraction[1], 54/58390,
            info = "The fraction column does not have the correct values. Did you check your equation?")
    })
})

0/1 tests passed
> fail :: by_tag_year_fraction is correct
success
"fraction" %in% colnames(by_tag_year_fraction) isn't true.
by_tag_year_fraction does not have a column named 'fraction'. Did you use mutate() to name your column 'fraction'?
 
---

A potential **correct** solution is as follows. Please process the cell below.

In [8]:
# Add fraction column
by_tag_year_fraction <- by_tag_year %>%
  mutate(fraction = number / year_total)

# Print the new table
print(by_tag_year_fraction)

# A tibble: 40,518 x 5
    year tag           number year_total  fraction
   <dbl> <chr>          <dbl>      <dbl>     <dbl>
 1  2008 .htaccess         54      58390 0.000925 
 2  2008 .net            5910      58390 0.101    
 3  2008 .net-2.0         289      58390 0.00495  
 4  2008 .net-3.5         319      58390 0.00546  
 5  2008 .net-4.0           6      58390 0.000103 
 6  2008 .net-assembly      3      58390 0.0000514
 7  2008 .net-core          1      58390 0.0000171
 8  2008 2d                42      58390 0.000719 
 9  2008 32-bit            19      58390 0.000325 
10  2008 32bit-64bit        4      58390 0.0000685
# ... with 40,508 more rows


Please copy and paste the test you just wrote into the cell below and process it. The test should pass.

In [9]:
run_tests({
    # .... YOUR TEST(S) HERE ....
    test_that("by_tag_year_fraction is correct", {
        expect_is(by_tag_year_fraction, "tbl_df",
            info = "Did you use `mutate()` with the correct arguments?")
        expect_true("fraction" %in% colnames(by_tag_year_fraction),
            info = "by_tag_year_fraction does not have a column named 'fraction'. Did you use mutate() to name your column 'fraction'?")
        expect_identical(by_tag_year_fraction$fraction[1], 54/58390,
            info = "The fraction column does not have the correct values. Did you check your equation?")
    })
})

1/1 tests passed
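
The test above inspects only the first value of `fraction`. One possible way to cover the whole column is to compare it against `number / year_total` with `expect_equal()`, which allows for small floating-point differences. The following is a sketch under stated assumptions, not the reference solution: it uses a three-row toy tibble (values copied from the first rows shown earlier) instead of the real dataset, and the names `toy`, `student`, `wrong`, and `expected` are invented for the example.

```r
library(testthat)
library(dplyr)

# Toy stand-in for by_tag_year (first three rows of the real data)
toy <- tibble(
  year = 2008,
  tag = c(".htaccess", ".net", ".net-2.0"),
  number = c(54, 5910, 289),
  year_total = 58390
)

# A correct submission and the expected column values
student <- toy %>% mutate(fraction = number / year_total)
expected <- toy$number / toy$year_total

# Compare the entire column, not just the first element
test_that("fraction equals number / year_total for every row", {
  expect_equal(student$fraction, expected,
    info = "The fraction column does not have the correct values. Did you check your equation?")
})

# The squared-denominator mistake from the incorrect submission would not match
wrong <- toy %>% mutate(fraction = number / year_total^2)
print(isTRUE(all.equal(wrong$fraction, expected)))
```

In the notebook itself, the same expectation could be written against `by_tag_year` and `by_tag_year_fraction` inside `run_tests()`.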