Station list request throws 403: Forbidden
#137
Comments
Never mind! Just saw the duplicates (primarily #133) |
Re-opening because this is still occurring for me with the latest release (see #136 (comment)). |
Well, it was passing all CI tests yesterday. Wonder what's changed (again). |
It might be something particular to my setup! I'll try to investigate this arvo. |
Not just you, I've confirmed it locally as well. |
Our staff are also reporting problems from inside our codebase, which uses Docker images that were last built a month ago. That does suggest a change on BOM's end. |
Just to clarify, are you using bomrang in your codebase? |
Yup (although we're looking to switch obs providers at some point later in the year, which'll likely mean that `bomrang` comes out).
…On Mon, 12 Apr 2021, 19:07 Adam H. Sparks, ***@***.***> wrote:
Just to clarify, are you using *bomrang* in your codebase?
|
I just tested. Using the current version of bomrang's user agent string:
`options(HTTPUserAgent = paste0("{bomrang} R package (", utils::packageVersion("bomrang"), ") https://github.com/ropensci/bomrang"))`
returns an "Error 403 forbidden". Using:
`options(HTTPUserAgent = "")`
returns the requested data. I'm not one for conspiracies, but since we explicitly said in the user agent string that the request was coming from bomrang, and now it's blocked right after we implemented that because RStudio was blocked... 😕 Does anyone know anyone at BOM? |
It's another change, but if that's the case we can just spoof a regular browser HTTPUserAgent 🕵️♂️ |
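The user agent comparison reported above can be made repeatable with a small helper that sets `HTTPUserAgent` only for the duration of one call. This is a minimal sketch, not code from bomrang: `bomrang_user_agent()` and `with_user_agent()` are hypothetical names, and the package version is passed in as an argument so the sketch runs without bomrang installed.

```r
# Hypothetical helper: rebuild the user agent string quoted in this thread.
# `pkg_version` is an argument so that bomrang need not be installed.
bomrang_user_agent <- function(pkg_version) {
  paste0("{bomrang} R package (", pkg_version,
         ") https://github.com/ropensci/bomrang")
}

# Hypothetical helper: evaluate `expr` with a temporary HTTPUserAgent option
# and restore the previous value afterwards, even if `expr` fails.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)
  force(expr)  # `expr` is lazy, so it is evaluated here, after the option is set
}

# Offline check that the option is only changed inside the call.
inside <- with_user_agent("", getOption("HTTPUserAgent"))
```

To reproduce the 403 versus success comparison, the same wrapper could be put around a base R download call such as `download.file()`, once with `bomrang_user_agent(...)` and once with `""`, pointed at the station list URL (which is not given in this thread).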
@Rensa and I are discussing some options. That's certainly one. We also discussed caching the station lists as the other functions do as well. |
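The caching option mentioned above could look roughly like the following. It is only a sketch under assumptions: `cached_fetch()`, the `fetch_fun` argument, and the 30-day default are all hypothetical and are not how bomrang's other functions are known to cache. The idea is to hit BOM's servers only when the local copy is missing or stale.

```r
# Hypothetical sketch: return a cached copy of a station list if it is recent
# enough, otherwise call `fetch_fun()` and store the result for next time.
cached_fetch <- function(fetch_fun, cache_file, max_age_days = 30) {
  if (file.exists(cache_file)) {
    age_days <- as.numeric(
      difftime(Sys.time(), file.mtime(cache_file), units = "days")
    )
    if (age_days <= max_age_days) {
      return(readRDS(cache_file))  # cache hit: no request is made
    }
  }
  result <- fetch_fun()            # cache miss or stale: fetch again
  saveRDS(result, cache_file)
  result
}

# Usage with a stand-in fetcher; a real one would download the station list.
fetch_calls <- 0
fake_fetch <- function() {
  fetch_calls <<- fetch_calls + 1
  data.frame(site = "000001", name = "Example station")
}
cache_path <- tempfile(fileext = ".rds")
first  <- cached_fetch(fake_fetch, cache_path)
second <- cached_fetch(fake_fetch, cache_path)  # served from the cache file
```

Passing the fetcher in as a function keeps the caching logic independent of whichever transport (HTTP or FTP) ends up being used.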
I'll have an ask around and see if the station lists are on FTP too! |
The historical resource URLs are all HTTP requests as well, now that I look further. |
So far it's just the HTTP requests in the package that are being blocked by BOM. I wonder, is there some statement we've missed saying that this isn't allowable under BOM guidelines? |
OK, there is this:
http://www.bom.gov.au/other/copyright.shtml The 9 & 3 bulletins would fall into this category for sure. This one seems (to me) to be murkier, but perhaps this is what BOM is classifying it as and we should respect the TOS. |
It looks like the BOM has recently "made changes to the web site". When using the Python requests library to access the HTML observations, BOM returns the following error: "Potential automated request detected! We are making changes to our website therefore web scraping is no longer supported. Please contact us by filling in the details at http://reg.bom.gov.au/screenscraper/screenscraper_enquiry_form/ and we will get in touch with you." |
Ah, OK, that added bit of information is good to know. We're not seeing that message with the R requests, only that they're suddenly forbidden with no warning or explanation like this. I looked at the form that this error message points to. I guess I can fill it out, but given the copyright page's statement on scraping, I'm not clear what response, if any, a general request like bomrang's would get, since I'm not the end user per se. |
I'm also curious about these changes and I think it is a bad move. If you go to the FAQ at http://www.bom.gov.au/waterdata/, they specifically mention/demonstrate an API to access water data which I have been using:
No one in their right mind would use this except via scripting. Getting around the block by changing the user agent is trivial, but it's not a long-term solution. I might fill in the form too and ask them what is going on. |
From the README
|
I'm manually visiting the station list URL in the browser and it's fine. Wondering if there've been any backend changes (eg. the curl discussion I recently saw on runapp) that're causing problems for anyone else? Can anyone else reproduce this?