-
Notifications
You must be signed in to change notification settings - Fork 7
Weather stations #18
Comments
Thanks for your thoughts! BackgroundYou're right, the reason there is only a copy-pasted CSV included in the package is that the listing is not available through the API. This was a quick-and-dirty solution that can be improved. Which stations should be included?Including all sorts of stations, not only weather, stations would make sense. There was never an explicit decision to include only weather stations, but I think that at the time the list was mostly dominated by them. Looking at the current list at http://ilmatieteenlaitos.fi/havaintoasemat there seems to be more variability now. Renaming Whether to include the list in the package, or to scrape live from the web?Scraping the table of observation stations directly from the web would make sense. I don't think adding a dependency package is that bad, although If the API changes, then other parts of the Which language should be used?I agree, although most users are probably Finnish, keeping everything in English is more straight forward. Should the need for using Finnish ever arise, we could address it then. Feel very welcome to leave another PR with the implementation. In case you're interested, I'm very happy to add you as a collaborator as well. |
Thanks. Pretty much as I thought. I'll write an implementation for a
I think that should cover all bases quite well. Changes in API would surely break things, but I meant that since the station list is not available through the API and would instead be scraped just from the website, I'd imagine the risk of a breaking change is much bigger. That's why I thought point number 5 could be included above. I can just go through my own fork and a PR, but happy to be listed as a collaborator too. I'll make a PR anyway so that the code gets reviewed. |
Excellent, few quick additional comments.
This is up to you, but I would leave the group-based filtering to the user.
While I see the logic of not depending on the HTML table (or Ideally the station listing too would be available over the API, perhaps this could be tipped to the FMI. Finally, note that as described in #14 , there is a new high-level API in the makings for the |
I agree it could (and maybe should) be left to the user. I was thinking more about the current
Good idea. Do you mind if I use
I'll put it in |
We don't have to worry about backward compatibility with the function names at this point, so I think renaming the function is fine.
|
Great to have you contributing here, Ilari! Just a quick thought on the function names and combatibility, it would be br, Juuso On Sun, Oct 9, 2016 at 12:07 PM, Joona Lehtomäki notifications@github.com
|
Thanks for chiming in, @ouzor !
Yes, it would be a simple as that, and if you think this is important then by all means. However, from design point of view I think if we do this then such wrappers should be implemented for all station types for consistency (e.g. |
It seems that right now all tests that have been written are in |
I hadn't even noticed that there are tests in |
Regarding the comment from @ouzor - why not use the .Deprecated function within the old function for a while? This will produce a warning message that the old function is going to be replaced soon. But I also agree that for this particular pkg the backwards compatibility is not yet such an issue, just replacing the function name would do as well. |
Actually that is exactly what I already did. That way things shouldn't break for anyone (even though they might not even expect the API to be stable), but it also kind of justifies for having no corresponding functions for other groups, like At the moment, I have the function return a tibble. Is that ok? Compared to returning a vanilla I added a download test that should fail if the FMI website changes too much, and another test that compares the local copy within the package to a downloaded one to see if it's still up-to-date. If it needs updating, it also prints out the exact command that was used to save the current version, so there is no need to wonder then what the exact parameters were that were used with |
I have switched to use to tibble also in other packages recently, I would support that. No major issues so far with that. |
Sure, fine by me. The only thing that probably needs to checked (literally
Sounds good! |
Discussed functionality implemented in d0d8e86 , closing this issue. |
FYI @ilarischeinin , test
At closer inspection, here are the differing rows for > local_stations[!local_stations$Lon == downloaded_stations$Lon,]
Name FMISID LPNN WMO Lat Lon Elevation Groups Started
60 Inari parish village 102046 9615 NA 68.91 27.01 128 Precipitation stations 2003
> downloaded_stations[!local_stations$Lon == downloaded_stations$Lon,]
Name FMISID LPNN WMO Lat Lon Elevation Groups Started
60 Inari parish village 102046 9615 NA 68.91 27.02 127 Precipitation stations 2003 > local_stations[!local_stations$Elevation == downloaded_stations$Elevation,]
Name FMISID LPNN WMO Lat Lon Elevation Groups Started
60 Inari parish village 102046 9615 NA 68.91 27.01 128 Precipitation stations 2003
292 Tampere Härmälä 101124 1222 2763 61.47 23.75 85 Weather stations 1997
293 Tampere Siilinkari 101311 2219 2943 61.52 23.75 98 Weather stations 1989
294 Tampere Tampella 151049 2223 2744 61.50 23.76 92 Weather stations 2012
> downloaded_stations[!local_stations$Elevation == downloaded_stations$Elevation,]
Name FMISID LPNN WMO Lat Lon Elevation Groups Started
60 Inari parish village 102046 9615 NA 68.91 27.02 127 Precipitation stations 2003
292 Tampere Härmälä 101124 1222 2763 61.47 23.75 84 Weather stations 1997
293 Tampere Siilinkari 101311 2219 2943 61.52 23.75 96 Weather stations 1989
294 Tampere Tampella 151049 2223 2744 61.50 23.76 91 Weather stations 2012 Lo and behold, there are actual differences in the values. I have no idea how the table is generated on the FMI end, but I'll re-create the local CSV as per the informative instructions in the test error message. |
Thanks for the info! Nice to see that the test works, but I wonder what to think about the actual changes. If it's a one-time thing to improve accuracy, that's great. But if there is something in the system they're retrieved from that can cause the numbers to fluctuate slightly, then not so great. I guess then the comparisons should be done on a per-column basis, and allow some tolerance for some columns ( Interesting that while the change in other values is just by 1 on the last digit, the |
I agree, it would be nice to know what's causing this. Rounding was what I had in mind as well, but I couldn't figure out how that would work either. I would wait and see whether this is something that happens irregularly. If so, then better that the test catches that. If it something that happens on a regular basis (e.g. table being automatically regenerated with some [hopefully explainable!] fluctuation in the values) then we can consider setting the tolerance. |
Let's do that. What I had in mind with rounding was if for some reason it's (incorrectly) done twice. 127.46 rounds to 127, but if for some reason it's first rounded to 127.5, and then subsequently rounded again, that could result in 128. So, a change like this could hypothetically happen if someone noticed that they're rounding twice and removed the extra rounding step. (I don't see how it could've resulted in a change to tie-breaking rules, as there are both odd and even numbers on the local and downloaded sides.) Anyways, since the |
At the moment, a list of weather stations is included in
inst/extdata/weather_stations.csv
. I suppose the reason for the inclusion is that these are not available through the API. I have some possible minor contributions in mind, and would like to hear the package authors' views on three points:Which stations should be included?
Right now, the list contains only stations of type "Sääasemat". This is the most common type, and can be used to retrieve information on e.g. temperature and wind. But other types of stations, such as "Poijut" (waves) and "Mareografit" (sea level) are not included.
I think it could be argued either way whether it should return all station types (for completeness) or only "Sääasemat" (for consistent behavior). One should also note that the function is called fmi_weather_stations(), not fmi_stations().
However, function
valid_fmisid()
also uses this list to check for the validity of FMISIDs. For example, 134220 is the FMISID of the wave buoy of Northern Baltic Proper, but this function says that it is not a valid FMISID. This use would point towards including all stations (possibly with a new functionfmi_stations()
).Whether to include the list in the package, or to scrape live from the web?
The downside of including the list within the package is of course that it can get out-of-date (as it apparently has to some extent). Another option could be to include a function that scrapes it live from the FMI website, something along the lines of:
Downsides include the additional package dependency (
XML
), and of course the fact that just like the included list can get out-of-date, this function can break at any time if FMI changes their website. Naturally, there's also the hybrid option of including one version of the list within the package and also providing a function to download it from the web.Which language should be used?
The list can of course be in either Finnish or English. When scraping, this would apply to both column names (e.g. "Ryhmät" vs. "Groups") and contents ("Sääasemat" vs. "Weather stations"). Naturally, most (and probably practically all) use for this package is from people who have at least some level of understanding of Finnish. But sill, one could argue for keeping everything in English (like command names and documentation are).
The text was updated successfully, but these errors were encountered: