Discussion on accessing Australian data #17
Comments
I'd add my own packages to this list: (suggestions for API changes welcome)
The ABS contains some very rich data. However, their interface leaves a lot of room for improvement. The ATO has tidier data and a more proactive approach to releasing data. Some quick enhancements to the |
Wow, that's great, thanks @HughParsonage ! An outcome of the ozunconf could be to improve upon existing package documentation, perhaps making a pull request to your packages with READMEs or vignettes with examples, and perhaps even package websites, to improve accessibility. This issue might also fracture out into multiple projects, for example, one group working on documentation, another group working on searching and finding datasets and APIs, and another writing packages for existing data. Really excited for this issue! |
This would be a huge productivity benefit for many people. On a related, but probably separate issue: has anybody been following the discussions around Indigenous Data Sovereignty? It seems to me that open source software/data and projects like this would interface very nicely and provide support for Indigenous communities to control, preserve and generate data, under the leadership and/or acquiescence of the Indigenous community, obviously. |
Great ideas re In relation to the data from |
Another example, the Australian Road Deaths Database, contains monthly, quite clean data. An example of a super brief analysis is here. It looks like there's a bunch of other interesting data, like the airport traffic data. |
I love this topic, I have some explorations of TAS open data, cadastre, address, roads etc. Collating sources is a very good plan, I think the synching/reading is pretty well covered by other general tools, but I'm probably going to need to outline the bowerbird way-of-life to show why. (And maybe a good example for a shared vm to prepare...) |
It looks as though the portal is primarily WMS (images rendered) and CSV, which is not much good. From a quick scan it looks as though going to state-based opendata sources will be better, but happy to be shown otherwise. |
Another resource that might be useful: |
@mdsumner I'm keen to see the bowerbird way of life! It would be great if we can determine a way to get the shapefiles out from these sources, or even if we can point to where they are stored so we can access them. |
An example of throwing bowerbird at a data.gov.au dataset:
This will create the local directory I'm assuming that roughly the same template (with different We like bowerbird because (a) it's recursive, so you generally only need to specify the top-level directory, even if the data set contains many files; and (b) it will do incremental updates, so you can run the sync process again later and it will only download what has changed. The |
FYI, |
Another data source to potentially look at - Queensland police data: https://www.police.qld.gov.au/online/data/default.htm |
@jonocarroll awesome! I think that for the scope of the ozunconf and to make it easier to maintain, it might be easiest to wrap up the data sources into individual packages and then get ozdata to import them? |
Progress in this space looks both useful and achievable. Also, I'm officially wearing a Stats NZ hat (metaphorically) at this conference and there could be some useful suggestions / opportunities to feed back to New Zealand on this. |
Extending and/or generalising the Census2016 packages by Hugh Parsonage at https://github.com/hughparsonage/Census2016 and It probably isn't necessary to include the raw data in such packages, because it is all freely available online, and thus can be downloaded on-demand by functions in the package. ABS are reasonably good at keeping data resources available at specific URLs, once published (but some maintenance is inevitable). It may even be possible to spider and parse the ABS web site pages to dynamically determine data download URLs, which would be more robust. |
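To make the "download on demand" idea concrete, here is a minimal sketch of a cached fetch, written in Python purely for illustration (the same shape would work as an R function inside a data package). The names `cache_path` and `fetch_cached` are hypothetical, and the `download` argument is an assumption added so the download step can be swapped out or tested without network access.

```python
import hashlib
import urllib.request
from pathlib import Path


def cache_path(url, cache_dir):
    """Map a URL to a stable local file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    # Keep the original extension (e.g. ".csv" or ".zip"), ignoring any query string.
    suffix = Path(url.split("?")[0]).suffix
    return Path(cache_dir) / f"{digest}{suffix}"


def fetch_cached(url, cache_dir, download=None):
    """Return a local copy of url, downloading only if it is not cached yet."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_path(url, cache_dir)
    if not target.exists():
        if download is None:
            # Default downloader from the standard library: (url, filename).
            download = urllib.request.urlretrieve
        download(url, str(target))
    return target
```

With this shape, the package ships only a list of URLs, and the first call for a dataset pays the download cost. If a published URL moves, only the URL list needs maintenance.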
Assorted comments (sorry for a very bowerbird focus)!
|
NationalMap uses the CKAN API to list datasets in data.*.gov.au, but also has lots of other sources of data - manually listed datasets, the ABS SDMX API, various Esri services etc. Most of the "national datasets" are hand-curated. |
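For anyone who hasn't used the CKAN API, a rough sketch of listing datasets through its `package_search` action might look like the following (Python, for illustration only). The base URL `https://ckan.example.org` is a placeholder, not the real data.gov.au endpoint, and the helper names are hypothetical. The response shape assumed here (`success`, then `result.results` with `name`, `title` and `resources`) is the standard CKAN action API layout.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarise_results(payload):
    """Reduce a decoded package_search response to name, title and file formats."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summaries = []
    for dataset in payload["result"]["results"]:
        # Collect the distinct resource formats (CSV, WMS, SHP, ...) per dataset.
        formats = sorted(
            {r["format"].upper() for r in dataset.get("resources", []) if r.get("format")}
        )
        summaries.append(
            {"name": dataset["name"], "title": dataset.get("title", ""), "formats": formats}
        )
    return summaries
```

Summarising the formats per dataset would make it easy to see at a glance which entries are only WMS or CSV and which offer something like shapefiles.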
Hey everyone, and sorry to chime in so late. (Had a very full-on last week). I used to work on NationalMap, and have been pretty active around the open data space in Australia for a few years, working with many government bodies at different levels. (I'm generally working on data that is "useful but boring", rather than ripe for statistical analysis, machine learning etc., however.) But lots of the aggregators and links in the above thread are new to me - that's awesome.

Just to add to the list, I've been working on http://opencouncildata.github.io/Platform, which is another approach to aggregating data - it focuses on data that meet the Open Council Data Standards. The main relevance would be some of the aggregated datasets, like the 500,000-odd trees that power opentrees.org, that might be of interest.

There is also Magda, which I think is meant to eventually replace CKAN as the registry for data.gov.au. It was just being started around the time I left Data61, so I don't know much about it.

Finally, one more interesting dataset you may like is http://github.com/stevage/BikeTrafficCounts, which is - well, read the README.

A dream I've had for a while is to map out the whole potential open data universe as some kind of grid, and start filling in the boxes, based on whether data exists and is public, exists but is not public, or does not exist. That is, instead of starting, like most catalogues, from the question of "what is available" and trying to organise those into some useful structure, I'd like to start from the question of "what do people want", and provide definitive answers like "that is not available". It should be possible to start at some high level like "water", and subdivide that domain into "freshwater > river levels > Yarra River > ..."

But I'm a bit scared of the ontological work required to make that meaningful :) (I do suspect that that approach, where you map out the domain, and draw attention to blanks, will yield useful pressure - much in the way that map.opencouncildata.org has been surprisingly useful at encouraging councils to join the open data movement.)

Anyway, I'm really looking forward to supporting whatever project people want to work on, however best. (Caveat: I don't know R :) ) |
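One way to picture the "map the domain and draw attention to blanks" idea is as a small tree where every topic carries an availability status. The sketch below is in Python for illustration only. The class and function names are hypothetical, and the `UNKNOWN` status is an extra assumption for topics nobody has looked into yet.

```python
from dataclasses import dataclass, field
from enum import Enum


class Availability(Enum):
    PUBLIC = "exists and is public"
    NOT_PUBLIC = "exists but is not public"
    DOES_NOT_EXIST = "does not exist"
    UNKNOWN = "not yet investigated"  # assumption: a default for unfilled boxes


@dataclass
class Topic:
    name: str
    status: Availability = Availability.UNKNOWN
    children: list = field(default_factory=list)


def walk(topic, prefix=()):
    """Yield (path, topic) pairs for a topic and everything beneath it."""
    path = prefix + (topic.name,)
    yield path, topic
    for child in topic.children:
        yield from walk(child, path)


def blanks(root):
    """List leaf topics that are not confirmed public, as 'a > b > c' paths."""
    return [
        " > ".join(path)
        for path, topic in walk(root)
        if not topic.children and topic.status is not Availability.PUBLIC
    ]


# The example hierarchy from the comment above; no real status is claimed for it.
example = Topic("water", children=[
    Topic("freshwater", children=[
        Topic("river levels", children=[Topic("Yarra River")]),
    ]),
])
```

Calling `blanks(example)` returns the leaf paths that still need an answer, which is the list you would publish to apply pressure. The hard part remains the ontology itself, not the data structure.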
Steve I think that's a really useful approach - a resource where people can see what's available, what could be for the asking/pressing and what's not would be useful across all sorts of domains. |
(I should mention that there is the open data census but it's really about scoring organisations on a very small number of datasets rather than actually facilitating access to data.) |
In terms of Aussie data I have been curious about Australian real estate prices, e.g. sold, rental etc. I think there is definitely some interesting data mining and analysis that could be done there. @HughParsonage I see you've got a package for NSW property prices. Is this something that would be worth generalising to other parts of Aus, like Vic or Qld? Also @stevage I like the idea of being able to look at what datasets are available for a given gridded location - so often we search by data type, not the other way around. |
@SAUNDERSK1 While it would be certainly worth generalizing, I'm not aware that the other state governments have released such data. |
There are a lot of Australian data sources available through resources such as data.gov.au, which contains a huge amount of public data. However, it is almost guaranteed that you will need to invest a solid chunk of time in cleaning the data, preparing it for analysis, and checking its quality.
I'd like to develop a catalog/table/similar that describes Australian datasets for analysis that are ready, or near-ready to analyse. Or perhaps even just discuss this idea here on the repo.
What I imagine is something like a table where you have columns like:
2016 Rainfall at Toowong_Bowls_Club
etc.)

This could help direct the efforts of researchers and analysts, knowing the state of what is ready to access, and also identify those data sources that might be ripe for an R package containing the data, or a way to access it.
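As a starting point for discussion, the catalogue could be as simple as a flat table with one row per dataset. The sketch below is in Python for illustration only. Every column name (`dataset`, `source`, `readiness`, `r_package`) is a hypothetical placeholder rather than an agreed schema, and the readiness labels are assumptions too.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CatalogueEntry:
    dataset: str         # human-readable name of the dataset
    source: str          # where it is published (placeholder values are fine)
    readiness: str       # assumed labels: "ready", "near-ready", "needs cleaning"
    r_package: str = ""  # name of an R package providing access, if any


def to_csv(entries):
    """Render catalogue entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(CatalogueEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def package_candidates(entries):
    """Datasets that look analysable but have no R package listed yet."""
    return [
        e.dataset
        for e in entries
        if e.readiness in ("ready", "near-ready") and not e.r_package
    ]
```

A plain CSV like this can live in the repo and be edited by pull request, and a filter such as `package_candidates` gives a quick shortlist of datasets that might be worth wrapping.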
I can think of a few R packages and datasets that we could add right now:
Related to this, there was an R package developed to access data from data.gov.au - ozdata, which could be very useful in accessing the data.
@stevage, do you have any ideas of where we could start looking? Or thoughts on this topic?