Commit 2182321

Add travis, appveyor, covr, + rename function arguments

juliasilge committed Jun 13, 2019
1 parent e117ba8 commit 2182321
Showing 10 changed files with 101 additions and 20 deletions.
3 changes: 3 additions & 0 deletions .Rbuildignore
@@ -3,3 +3,6 @@
 ^LICENSE\.md$
 ^README\.Rmd$
 ^CODE_OF_CONDUCT\.md$
+^\.travis\.yml$
+^appveyor\.yml$
+^codecov\.yml$
7 changes: 7 additions & 0 deletions .travis.yml
@@ -0,0 +1,7 @@
# R for travis: see documentation at https://docs.travis-ci.com/user/languages/r

language: R
cache: packages

after_success:
- Rscript -e 'covr::codecov()'
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -22,6 +22,7 @@ Suggests:
     tidytext,
     janeaustenr,
     ggplot2,
-    testthat (>= 2.1.0)
+    testthat (>= 2.1.0),
+    covr
VignetteBuilder: knitr
RoxygenNote: 6.1.1
22 changes: 11 additions & 11 deletions R/bind_log_odds.R
@@ -5,14 +5,14 @@
 #' is added as a column. This function supports non-standard evaluation through
 #' the tidyeval framework.
 #'
-#' @param tbl A tidy dataset with one row per item and feature
-#' @param item Column of items for identifying differences, such as words or
+#' @param tbl A tidy dataset with one row per feature and set
+#' @param feature Column of features for identifying differences, such as words or
 #' bigrams with text data
-#' @param feature Column of features between which to compare items, such as
+#' @param set Column of sets between which to compare features, such as
 #' documents for text data
-#' @param n Column containing item-feature counts
+#' @param n Column containing feature-set counts
 #'
-#' @details The arguments \code{item}, \code{feature}, and \code{n}
+#' @details The arguments \code{feature}, \code{set}, and \code{n}
#' are passed by expression and support \link[rlang]{quasiquotation};
#' you can unquote strings and symbols. Grouping is preserved but ignored.
#'
@@ -40,25 +40,25 @@
#' @importFrom dplyr count left_join mutate rename group_by ungroup group_vars
#' @export

-bind_log_odds <- function(tbl, item, feature, n) {
-  item <- enquo(item)
+bind_log_odds <- function(tbl, feature, set, n) {
   feature <- enquo(feature)
+  set <- enquo(set)
   n_col <- enquo(n)

   ## groups are preserved but ignored
   grouping <- group_vars(tbl)
   tbl <- ungroup(tbl)

-  freq1_df <- count(tbl, !!item, wt = !!n_col)
+  freq1_df <- count(tbl, !!feature, wt = !!n_col)
   freq1_df <- rename(freq1_df, freq1 = n)

-  freq2_df <- count(tbl, !!feature, wt = !!n_col)
+  freq2_df <- count(tbl, !!set, wt = !!n_col)
   freq2_df <- rename(freq2_df, freq2 = n)

-  df_joined <- left_join(tbl, freq1_df, by = as_name(item))
+  df_joined <- left_join(tbl, freq1_df, by = as_name(feature))
   df_joined <- mutate(df_joined, freqnotthem = freq1 - !!n_col)
   df_joined <- mutate(df_joined, total = sum(!!n_col))
-  df_joined <- left_join(df_joined, freq2_df, by = as_name(feature))
+  df_joined <- left_join(df_joined, freq2_df, by = as_name(set))
   df_joined <- mutate(df_joined,
     freq2notthem = total - freq2,
     l1them = (!!n_col + freq1) / ((total + freq2) - (!!n_col + freq1)),
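With the renamed arguments, a call now puts the feature column first, then the set column, then the counts. A minimal sketch, assuming the dplyr and tibble packages are available; the toy table and its column names (`document`, `word`, `n`) are illustrative, not fixed by the package:

```r
library(dplyr)
library(tidylo)

# Hypothetical toy data: one row per feature (word) per set (document)
word_counts <- tibble::tribble(
  ~document, ~word,   ~n,
  "a",       "apple", 10,
  "a",       "pear",   2,
  "b",       "apple",  1,
  "b",       "pear",  12
)

# feature column first, then the set column, then the count column;
# the weighted log odds is appended as a new column
bind_log_odds(word_counts, word, document, n)
```

Per the documentation above, any grouping on the input tibble is preserved in the result but ignored by the computation itself.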
3 changes: 3 additions & 0 deletions README.Rmd
@@ -19,6 +19,9 @@ theme_set(theme_light())


<!-- badges: start -->
+[![Travis build status](https://travis-ci.org/juliasilge/tidylo.svg?branch=master)](https://travis-ci.org/juliasilge/tidylo)
+[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/juliasilge/tidylo?branch=master&svg=true)](https://ci.appveyor.com/project/juliasilge/tidylo)
+[![Codecov test coverage](https://codecov.io/gh/juliasilge/tidylo/branch/master/graph/badge.svg)](https://codecov.io/gh/juliasilge/tidylo?branch=master)
<!-- badges: end -->

How can we measure how the usage or frequency of some **feature**, such as words, differs across some group or **set**, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times, so how do we know which differences are meaningful?
3 changes: 3 additions & 0 deletions README.md
@@ -10,6 +10,9 @@


<!-- badges: start -->
+[![Travis build status](https://travis-ci.org/juliasilge/tidylo.svg?branch=master)](https://travis-ci.org/juliasilge/tidylo)
+[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/juliasilge/tidylo?branch=master&svg=true)](https://ci.appveyor.com/project/juliasilge/tidylo)
+[![Codecov test coverage](https://codecov.io/gh/juliasilge/tidylo/branch/master/graph/badge.svg)](https://codecov.io/gh/juliasilge/tidylo?branch=master)
<!-- badges: end -->

How can we measure how the usage or frequency of some **feature**, such as words, differs across some group or **set**, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times, so how do we know which differences are meaningful?
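The plain log odds ratio alluded to in the README compares the odds of a feature in one set against its odds in all other sets combined. A minimal sketch; the function and argument names here are illustrative, not part of tidylo:

```r
# Plain (unweighted) log odds ratio: a feature counted y1 times out of n1
# total counts in the set of interest, versus y2 times out of n2 total
# counts everywhere else.
log_odds_ratio <- function(y1, n1, y2, n2) {
  log((y1 / (n1 - y1)) / (y2 / (n2 - y2)))
}

log_odds_ratio(10, 12, 1, 13)  # ≈ 4.09
```

Small counts make this ratio swing wildly, which is exactly the sampling-variability problem the README describes and the weighted log odds in `bind_log_odds` is designed to address.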
52 changes: 52 additions & 0 deletions appveyor.yml
@@ -0,0 +1,52 @@
# DO NOT CHANGE the "init" and "install" sections below

# Download script file from GitHub
init:
  ps: |
        $ErrorActionPreference = "Stop"
        Invoke-WebRequest http://raw.github.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1 -OutFile "..\appveyor-tool.ps1"
        Import-Module '..\appveyor-tool.ps1'

install:
  ps: Bootstrap

cache:
- C:\RLibrary

environment:
NOT_CRAN: true
# env vars that may need to be set, at least temporarily, from time to time
# see https://github.com/krlmlr/r-appveyor#readme for details
# USE_RTOOLS: true
# R_REMOTES_STANDALONE: true

# Adapt as necessary starting from here

build_script:
- travis-tool.sh install_deps

test_script:
- travis-tool.sh run_tests

on_failure:
- 7z a failure.zip *.Rcheck\*
- appveyor PushArtifact failure.zip

artifacts:
- path: '*.Rcheck\**\*.log'
name: Logs

- path: '*.Rcheck\**\*.out'
name: Logs

- path: '*.Rcheck\**\*.fail'
name: Logs

- path: '*.Rcheck\**\*.Rout'
name: Logs

- path: '\*_*.tar.gz'
name: Bits

- path: '\*_*.zip'
name: Bits
12 changes: 12 additions & 0 deletions codecov.yml
@@ -0,0 +1,12 @@
comment: false

coverage:
  status:
    project:
      default:
        target: auto
        threshold: 1%
    patch:
      default:
        target: auto
        threshold: 1%
12 changes: 6 additions & 6 deletions man/bind_log_odds.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions vignettes/tidy_log_odds.Rmd
@@ -84,7 +84,7 @@ Why you might choose log odds over tf-idf? TODO for Tyler

## Counting things other than words

-Text analysis is a main motivator for this implementation of weighted log odds, but this is a general approach for measuring how much more likely one item (any kind of item, not just a word or bigram) is to be associated than another for some set of features (any kind of feature, not just a document or book).
+Text analysis is a main motivator for this implementation of weighted log odds, but this is a general approach for measuring how much more likely one feature (any kind of feature, not just a word or bigram) is to be associated than another for some set or group (any kind of set, not just a document or book).

To demonstrate this, let's look at everybody's favorite data about cars. What do we know about the relationship between number of gears and engine shape `vs`?

@@ -104,4 +104,4 @@ gear_counts %>%

For engine shape `vs = 0`, having three gears has the highest log odds, while for engine shape `vs = 1`, having four gears has the highest log odds. This dataset is small enough that you can look at the count data and see how this is working.

-More importantly, you can notice that this approach is useful both in the initial motivating example of text data but also more generally whenever you have counts in some kind of groups and you want to find what is more likely to come from which group, compared to the other groups.
+More importantly, you can notice that this approach is useful both in the initial motivating example of text data but also more generally whenever you have counts in some kind of groups and you want to find what feature is more likely to come from which group, compared to the other groups.
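The `gear_counts` table referenced in the hunk above can be sketched from the built-in `mtcars` data. This is an assumed reconstruction, since the vignette's setup chunk is not part of this diff:

```r
library(dplyr)

# One row per (engine shape, gear count) pair, with n = number of cars
# in each combination -- counts suitable for passing to bind_log_odds()
gear_counts <- mtcars %>%
  count(vs, gear)

gear_counts
```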
