add R script and makefile command to check urls for latest news
sckott committed Apr 23, 2018
1 parent 6041b51 commit 8deaca2
Showing 6 changed files with 46 additions and 5 deletions.
2 changes: 2 additions & 0 deletions Makefile
@@ -0,0 +1,2 @@
check:
	Rscript -e 'source("check_urls.R")'
6 changes: 6 additions & 0 deletions README.md
@@ -1,2 +1,8 @@
ropensci weekly emails
========

check urls for a weekly news:

```
make check
```
2 changes: 1 addition & 1 deletion _config.yml
@@ -8,4 +8,4 @@ author: ropensci

markdown: kramdown

-exclude: ["posts_heatmap_calendar.md", "ignore"]
+exclude: ["posts_heatmap_calendar.md", "ignore", "Makefile", "check_urls.R", "README.md"]
2 changes: 0 additions & 2 deletions _site/README.md

This file was deleted.

4 changes: 2 additions & 2 deletions _site/feed.xml
@@ -5,8 +5,8 @@
<description>rOpenSci Newsletter, including software, software reviews, what's happening in the community, events, and more.</description>
<link>http://localhost:4000/</link>
<atom:link href="http://localhost:4000/feed.xml" rel="self" type="application/rss+xml" />
-<pubDate>Mon, 23 Apr 2018 16:01:47 -0700</pubDate>
-<lastBuildDate>Mon, 23 Apr 2018 16:01:47 -0700</lastBuildDate>
+<pubDate>Mon, 23 Apr 2018 16:03:02 -0700</pubDate>
+<lastBuildDate>Mon, 23 Apr 2018 16:03:02 -0700</lastBuildDate>
<generator>Jekyll v3.7.3</generator>

<item>
35 changes: 35 additions & 0 deletions check_urls.R
@@ -0,0 +1,35 @@
options(stringsAsFactors = FALSE)

# helper functions
stract <- function(str, pattern) regmatches(str, regexpr(pattern, str))
last <- function(x) x[length(x)]

# get most recent news file path
f <- list.files("_site", full.names = TRUE, pattern = "[0-9]{4}-[0-9]{2}-[0-9]{2}")
fdates <- stract(f, "[0-9]{4}-[0-9]{2}-[0-9]{2}")
path <- file.path(grep(last(sort(fdates)), f, value = TRUE), "index.html")
cat("\nchecking ", path, "\n")

# extract URLs

require(xml2, quietly = TRUE, warn.conflicts = FALSE)
html <- read_html(path)
bod <- xml_find_all(html, "//body")
urls <- unique(xml_attr(xml_find_all(bod, '//a[contains(@href, "http")]'), "href"))
cat("found", length(urls), "URLs", "\n")

# check URLs

library(crul, quietly = TRUE, warn.conflicts = FALSE)
conn <- crul::Async$new(urls = urls)
res <- conn$get()
stats <- vapply(res, "[[", numeric(1), "status_code")
df <- data.frame(url = urls, code = stats)
bad <- df[df$code >= 400 | df$code < 200, ]
if (NROW(bad) == 0) {
  cat("all good :)", "\n")
} else {
  # cat("check the following:\n", paste0(urls[1:2], collapse = "\n "), "\n")
  cat("check the following:", "\n")
  print(bad)
}
