paths_allowed gives error if www is included in URL #50

Closed
mine-cetinkaya-rundel opened this issue May 1, 2020 · 10 comments

@mine-cetinkaya-rundel
Contributor

See reprex below:

library(robotstxt)

# doesn't work
paths_allowed("https://www.google.com")
#> www.google.com
#> Error in if (is_http) {: argument is of length zero

# works
paths_allowed("https://google.com")
#>  google.com                      No encoding supplied: defaulting to UTF-8.
#> [1] TRUE

Created on 2020-05-01 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       macOS Catalina 10.15.4      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-05-01                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date       lib source        
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports      1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
#>  callr          3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
#>  cli            2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  codetools      0.2-16  2018-12-24 [1] CRAN (R 4.0.0)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  curl           4.3     2019-12-02 [1] CRAN (R 4.0.0)
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools       2.3.0   2020-04-10 [1] CRAN (R 4.0.0)
#>  digest         0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi          0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs             1.4.1   2020-04-04 [1] CRAN (R 4.0.0)
#>  future         1.17.0  2020-04-18 [1] CRAN (R 4.0.0)
#>  future.apply   1.5.0   2020-04-17 [1] CRAN (R 4.0.0)
#>  globals        0.12.5  2019-12-07 [1] CRAN (R 4.0.0)
#>  glue           1.4.0   2020-04-03 [1] CRAN (R 4.0.0)
#>  highr          0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 4.0.0)
#>  httr           1.4.1   2019-08-05 [1] CRAN (R 4.0.0)
#>  knitr          1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  listenv        0.8.0   2019-12-05 [1] CRAN (R 4.0.0)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild       1.0.7   2020-04-25 [1] CRAN (R 4.0.0)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx       3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
#>  ps             1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
#>  R6             2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp           1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#>  remotes        2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
#>  rlang          0.4.5   2020-03-01 [1] CRAN (R 4.0.0)
#>  rmarkdown      2.1     2020-01-20 [1] CRAN (R 4.0.0)
#>  robotstxt    * 0.6.2   2018-07-18 [1] CRAN (R 4.0.0)
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  spiderbar      0.2.2   2019-08-19 [1] CRAN (R 4.0.0)
#>  stringi        1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat       2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  usethis        1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
#>  withr          2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun           0.13    2020-04-13 [1] CRAN (R 4.0.0)
#>  yaml           2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

It looks like the issue is related to httr, and this PR might fix it, but I'm not sure.

@petermeissner
Contributor

petermeissner commented May 1, 2020

Thanks for reporting.
I have not checked what is going on in any depth, but I tried the PR you linked and it makes the error vanish.
Still, this needs some further investigation.

@petermeissner
Contributor

So, it's {httr} not being very forthcoming ...

  • using www.google.com it will break since the protocol (http or https) is not specified
  • using google.com will not break despite the protocol not being specified

... I would expect both to work or neither.

Having checked the linked pull request and its comments, I expect the issue to be resolved within httr, so I will make no changes in robotstxt.

@petermeissner
Contributor

There will be no httr release in the near future ... so I am working on a workaround within robotstxt.
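
Taking the diagnosis above at face value (the host reaches httr without a protocol), one possible guard is to prepend a scheme when none is present. This is only a sketch: `ensure_scheme()` and its `default` argument are hypothetical names, not part of robotstxt or httr, and the actual workaround in robotstxt is not shown in this thread.

```r
# Hypothetical helper (not from robotstxt or httr): make sure a URL
# carries an explicit scheme before it is handed to an HTTP client.
ensure_scheme <- function(url, default = "https") {
  # TRUE where the string already starts with something like "http://"
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", url)
  # Leave URLs with a scheme untouched, prefix the others
  ifelse(has_scheme, url, paste0(default, "://", url))
}

ensure_scheme(c("www.google.com", "http://google.com"))
```

Defaulting to https is an assumption made for the example; a real workaround might need to try http as a fallback.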

@mine-cetinkaya-rundel
Contributor Author

Thank you!

@petermeissner
Contributor

@mine-cetinkaya-rundel could you please check if the latest dev version works for you?

@mine-cetinkaya-rundel
Contributor Author

@petermeissner It works, thank you! Though I do get warnings.

library(robotstxt)
packageVersion("robotstxt")
#> [1] '0.7.2'

# works, but warning
paths_allowed("https://www.google.com")
#>  www.google.com
#> Warning in FUN(X[[i]], ...): partial argument match of 'x' to 'xp'
#> [1] TRUE

# also works, but also warning
paths_allowed("https://google.com")
#>  google.com
#> Warning in FUN(X[[i]], ...): partial argument match of 'x' to 'xp'
#> [1] TRUE

Created on 2020-05-04 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       macOS Catalina 10.15.4      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-05-04                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date       lib source                             
#>  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                     
#>  backports      1.1.6      2020-04-05 [1] CRAN (R 4.0.0)                     
#>  callr          3.4.3      2020-03-28 [1] CRAN (R 4.0.0)                     
#>  cli            2.0.2      2020-02-28 [1] CRAN (R 4.0.0)                     
#>  codetools      0.2-16     2018-12-24 [1] CRAN (R 4.0.0)                     
#>  crayon         1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                     
#>  curl           4.3        2019-12-02 [1] CRAN (R 4.0.0)                     
#>  desc           1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                     
#>  devtools       2.3.0      2020-04-10 [1] CRAN (R 4.0.0)                     
#>  digest         0.6.25     2020-02-23 [1] CRAN (R 4.0.0)                     
#>  ellipsis       0.3.0      2019-09-20 [1] CRAN (R 4.0.0)                     
#>  evaluate       0.14       2019-05-28 [1] CRAN (R 4.0.0)                     
#>  fansi          0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                     
#>  fs             1.4.1      2020-04-04 [1] CRAN (R 4.0.0)                     
#>  future         1.17.0     2020-04-18 [1] CRAN (R 4.0.0)                     
#>  future.apply   1.5.0      2020-04-17 [1] CRAN (R 4.0.0)                     
#>  globals        0.12.5     2019-12-07 [1] CRAN (R 4.0.0)                     
#>  glue           1.4.0      2020-04-03 [1] CRAN (R 4.0.0)                     
#>  highr          0.8        2019-03-20 [1] CRAN (R 4.0.0)                     
#>  htmltools      0.4.0.9003 2020-05-01 [1] Github (rstudio/htmltools@984b39c) 
#>  httr           1.4.1      2019-08-05 [1] CRAN (R 4.0.0)                     
#>  knitr          1.28       2020-02-06 [1] CRAN (R 4.0.0)                     
#>  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.0.0)                     
#>  magrittr       1.5        2014-11-22 [1] CRAN (R 4.0.0)                     
#>  memoise        1.1.0      2017-04-21 [1] CRAN (R 4.0.0)                     
#>  pkgbuild       1.0.7      2020-04-25 [1] CRAN (R 4.0.0)                     
#>  pkgload        1.0.2      2018-10-29 [1] CRAN (R 4.0.0)                     
#>  prettyunits    1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                     
#>  processx       3.4.2      2020-02-09 [1] CRAN (R 4.0.0)                     
#>  ps             1.3.2      2020-02-13 [1] CRAN (R 4.0.0)                     
#>  R6             2.4.1      2019-11-12 [1] CRAN (R 4.0.0)                     
#>  Rcpp           1.0.4.6    2020-04-09 [1] CRAN (R 4.0.0)                     
#>  remotes        2.1.1      2020-02-15 [1] CRAN (R 4.0.0)                     
#>  rlang          0.4.6      2020-05-02 [1] CRAN (R 4.0.0)                     
#>  rmarkdown      2.1        2020-01-20 [1] CRAN (R 4.0.0)                     
#>  robotstxt    * 0.7.2      2020-05-04 [1] Github (ropensci/robotstxt@891f1d4)
#>  rprojroot      1.3-2      2018-01-03 [1] CRAN (R 4.0.0)                     
#>  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                     
#>  spiderbar      0.2.2      2019-08-19 [1] CRAN (R 4.0.0)                     
#>  stringi        1.4.6      2020-02-17 [1] CRAN (R 4.0.0)                     
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                     
#>  testthat       2.3.2      2020-03-02 [1] CRAN (R 4.0.0)                     
#>  triebeard      0.3.0      2016-08-04 [1] CRAN (R 4.0.0)                     
#>  urltools       1.7.3      2019-04-14 [1] CRAN (R 4.0.0)                     
#>  usethis        1.6.1.9000 2020-05-01 [1] Github (r-lib/usethis@4487260)     
#>  withr          2.2.0      2020-04-20 [1] CRAN (R 4.0.0)                     
#>  xfun           0.13       2020-04-13 [1] CRAN (R 4.0.0)                     
#>  yaml           2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

@petermeissner
Contributor

petermeissner commented May 7, 2020

That's weird, I cannot reproduce the warning on my Windows machine - neither with R 3.6.2 nor with R 4.0.0 (fresh install). Can you spot a difference, do you have an idea? Do you know where this comes from?

library(robotstxt)
packageVersion("robotstxt")
#> [1] '0.7.2'

paths_allowed("https://www.google.com")
#>  www.google.com
#> [1] TRUE

paths_allowed("https://google.com")
#>  google.com
#> [1] TRUE

Created on 2020-05-07 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-05-07                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package      * version date       lib source        
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports      1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
#>  callr          3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
#>  cli            2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  codetools      0.2-16  2018-12-24 [2] CRAN (R 4.0.0)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  curl           4.3     2019-12-02 [1] CRAN (R 4.0.0)
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools       2.3.0   2020-04-10 [1] CRAN (R 4.0.0)
#>  digest         0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi          0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs             1.4.1   2020-04-04 [1] CRAN (R 4.0.0)
#>  future         1.17.0  2020-04-18 [1] CRAN (R 4.0.0)
#>  future.apply   1.5.0   2020-04-17 [1] CRAN (R 4.0.0)
#>  globals        0.12.5  2019-12-07 [1] CRAN (R 4.0.0)
#>  glue           1.4.0   2020-04-03 [1] CRAN (R 4.0.0)
#>  highr          0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 4.0.0)
#>  httr           1.4.1   2019-08-05 [1] CRAN (R 4.0.0)
#>  knitr          1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  listenv        0.8.0   2019-12-05 [1] CRAN (R 4.0.0)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild       1.0.8   2020-05-07 [1] CRAN (R 4.0.0)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx       3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
#>  ps             1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
#>  R6             2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp           1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#>  remotes        2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
#>  rlang          0.4.6   2020-05-02 [1] CRAN (R 4.0.0)
#>  rmarkdown      2.1     2020-01-20 [1] CRAN (R 4.0.0)
#>  robotstxt    * 0.7.2   2020-05-07 [1] local         
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  spiderbar      0.2.2   2019-08-19 [1] CRAN (R 4.0.0)
#>  stringi        1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat       2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  triebeard      0.3.0   2016-08-04 [1] CRAN (R 4.0.0)
#>  urltools       1.7.3   2019-04-14 [1] CRAN (R 4.0.0)
#>  usethis        1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
#>  withr          2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun           0.13    2020-04-13 [1] CRAN (R 4.0.0)
#>  yaml           2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] C:/Users/peter/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.0/library

@mine-cetinkaya-rundel
Contributor Author

I have the following in my .Rprofile:

options(
  warn = 1,
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE,
  warnPartialMatchAttr = TRUE
)

If you have warnPartialMatchArgs unset or FALSE (the default), this might be the reason for the difference.
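
A minimal sketch of how this option produces the warning, independent of robotstxt: the function `f` and its formal `xp` are made up for illustration and simply mirror the names in the warning message above.

```r
# Hypothetical function whose only formal argument is named `xp`
f <- function(xp) xp

# With the option on, calling f(x = ...) partially matches `x` to `xp`
# and R signals a warning; capture its message instead of printing it.
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(f(x = 1), warning = function(w) conditionMessage(w))

# With the option off, the same call matches silently and returns 1.
options(warnPartialMatchArgs = FALSE)
quiet <- tryCatch(f(x = 1), warning = function(w) "warned")

# Restore whatever the session had before
options(old)

msg
quiet
```

Spelling out the full argument name at the call site (`f(xp = 1)`) avoids the warning regardless of the option.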

@petermeissner
Contributor

fixed

@hrbrmstr

"thanks, package spiderbar_0.2.3.tar.gz is on its way to CRAN." 🎉
