-
- What does this package do? (explain in 50 words or less)
Web scraping allows one to gather information of scientific value - mainly social science related, in my experience. While scraping web pages, one should respect the permissions declared in robots.txt files.
The package provides functions to retrieve and parse robots.txt files. The core functionality is to check a bot's (user agent's) permission to access one or more resources (paths) on a given domain. To ease checking, all functions are bundled together with the relevant data in an R6 class, robotstxt, but everything also works either functionally or object-oriented, depending on the user's preference.
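To illustrate the core idea - parsing a robots.txt and checking per-bot path permissions - here is a conceptual sketch using Python's standard-library `urllib.robotparser`. This is not the R package's API, and the robots.txt rules below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: all bots are barred from /private/,
# and "badbot" is barred from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: badbot
Disallow: /
"""

# Parse the rules and query permissions per user agent and path,
# analogous to checking a bot's permission for one or more paths.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("mybot", "/index.html"))   # public path, generic bot
print(parser.can_fetch("mybot", "/private/x"))    # disallowed for all bots
print(parser.can_fetch("badbot", "/index.html"))  # badbot is disallowed everywhere
```

The R package performs the same kind of check, additionally vectorized over paths and bundled with the parsed data fields.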
-
- Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).
```
Package: robotstxt
Type: Package
Title: A 'robots.txt' Parser and Webbot/Webspider/Webcrawler Permissions Checker
Version: 0.1.0
Author: Peter Meissner
Maintainer: Peter Meissner <retep.meissner@gmail.com>
Description: Class ('R6') and accompanying methods to
    parse and check 'robots.txt' files. Data fields are provided as
    data frames and vectors. Permissions can be checked by providing
    path character vectors and optional bot names.
License: MIT + file LICENSE
LazyData: TRUE
BugReports: https://github.com/petermeissner/robotstxt/issues
URL: https://github.com/petermeissner/robotstxt
Imports:
    R6 (>= 2.1.1),
    stringr (>= 1.0.0),
    httr (>= 1.0.0)
Suggests:
    knitr,
    rmarkdown,
    dplyr,
    testthat
Depends:
    R (>= 3.0.0)
VignetteBuilder: knitr
RoxygenNote: 5.0.1
```
-
- URL for the package (the development repository, not a stylized html page)
https://github.com/petermeissner/robotstxt
-
- What data source(s) does it work with (if applicable)?
robots.txt files.
-
- Who is the target audience?
Package developers and users that want an easy way to be nice while gathering data from the web.
-
- Are there other R packages that accomplish the same thing? If so, what is different about yours?
None that I know of.
Yes, good guidelines!
No.
With or without ropensci.
yes, MIT
-
- Does `R CMD check` or `devtools::check()` produce any errors or warnings? If so paste them below.
No:
* DONE
Status: OK
R CMD check succeeded
-
- Please add explanations below for any exceptions to the above:
Does not apply.
-
- If this is a resubmission following rejection, please explain the change in circumstances.
No, this is not a resubmission.