Load ABS FORs #6

jayvdb · 2016-04-02T07:37:44Z

ERA uses the ABS FOR (2008) research classification vocabulary, which is described at https://en.wikipedia.org/wiki/Australian_and_New_Zealand_Standard_Research_Classification, and the origin is http://www.abs.gov.au/ausstats/abs@.nsf/0/6BB427AB9696C225CA2574180004463E

The ERA key documents include a matrix of the ABS FORs, together with other associated business process information, which could be useful to load. It was not included in the techpack, but comes as a separate and quite small (45KB) XLS file, such as http://content.webarchive.nla.gov.au/gov/wayback/20140212022156/http://www.arc.gov.au/xls/era12/ERA_2012_Discipline_Matrix.xls

However, it would be good to fetch this reference data from the source (ABS), possibly even in a separate repository so it can be used for non-ERA purposes. The source also includes other closely related official reference data, such as a mapping from the 2008 vocabulary to the ABS 2003 vocabulary, and other vocabularies.

Doing analysis here, both looking for a simple solution to get this functional ASAP and collecting notes for the 'right' solution.

jayvdb · 2016-04-02T12:48:50Z

Callista Research and other systems I have developed had a non-standard XML format for this data, which was used to create JSON and populate databases. The XML was also used to make sense of ERA SEER XML documents using XSLT. It would be nice to use a standardised XML format.

Searching on GitHub for FOR codes and descriptions...

Mark Gregson produced a non-standard XML version http://files.eprints.org/564/

There are a few systems using a CSV file, or similar static files, for the data:
https://github.com/jcu-eresearch/tdh.metadata/blob/master/tdh/metadata/browser/for_codes.csv
https://github.com/IntersectAustralia/acdata/blob/master/config/FOR_CodeList.csv
https://github.com/IntersectAustralia/metadata-aggregator/blob/master/sydma-install/resource/research_subject_code.csv
https://github.com/rrothwell/nectar_visualisation/blob/master/web/data/for_codes_final_2.json
https://github.com/datagovau/ckanext-agls/blob/master/ckanext/agls/ABS%20Fields%20Of%20Research.csv
https://github.com/au-research/ANDS-Registry-Core/blob/master/etc/misc/vocabularies/anzsrc-for/ANZSRC-FOR-EXPORT.csv
https://github.com/rd-switchboard/RD-Switchboard-Net/blob/master/etc/misc/vocabularies/anzsrc-for/ANZSRC-FOR-EXPORT.csv
https://github.com/NeCTAR-RC/nectar-dashboard/blob/832e99b0ea736ee36adb556cdf3e73c9b1c7a340/nectar_dashboard/rcallocation/migrations/0001_initial.py
https://github.com/NeCTAR-RC/nectar-dashboard/blob/832e99b0ea736ee36adb556cdf3e73c9b1c7a340/nectar_dashboard/rcallocation/for_choices.py
https://github.com/NeCTAR-RC/langstroth/blob/master/nectar_allocations/models/forcode.py
https://github.com/IntersectAustralia/dc2c/blob/master/mecat/subject_codes.py
https://github.com/sprinsloo/Research-Flagship/blob/master/build/reporting/fields-research.html / https://github.com/sprinsloo/Research-Flagship/blob/master/source/reporting/fields-research.html.erb
https://github.com/IntersectAustralia/exsite9/blob/master/exsite9/rootfiles/configuration/fieldsOfResearch.sql
https://github.com/IntersectAustralia/ap11_webapp/blob/master/db/create_research_subject_code.sql
https://github.com/anu-doi/DataCommons/blob/master/DataCommons/extras/sql/20120620_create_select_codes_table.sql
https://github.com/CurtinUniversity/Research-Data-Manager/blob/master/Urdms.Dmp/Urdms.Dmp/Database/Migrations/20110906145800_CreateFieldOfResearchList.cs

With only two columns, it isn't possible to record some of the niggly details about FORs, such as when a non-precise code is usable for classification. There were one or two of these, but maybe they can be inferred algorithmically (for example, codes with no child nodes).
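
A minimal sketch of that inference, assuming a two-column (code, name) CSV and the usual 2/4/6-digit FOR hierarchy, where a code's parent is the code with its last two digits removed. The column names, the function names and the sample rows are illustrative assumptions, not taken from any of the files listed above; the sample is deliberately truncated, so the result only demonstrates the mechanism and says nothing about the real vocabulary.

```python
import csv
import io

# Illustrative, truncated sample in a two-column layout (assumed header names).
SAMPLE_CSV = """code,name
01,Mathematical Sciences
0101,Pure Mathematics
010101,Algebra and Number Theory
010102,Algebraic and Differential Geometry
0102,Applied Mathematics
02,Physical Sciences
"""


def load_codes(text):
    """Read (code, name) rows into a dict keyed by the code string."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["code"].strip(): row["name"].strip() for row in reader}


def parent_of(code):
    """Return the parent code (last two digits dropped), or None for 2-digit codes."""
    return code[:-2] if len(code) > 2 else None


def leaf_codes(codes):
    """Return codes that no other code names as its parent, sorted."""
    parents = {parent_of(code) for code in codes}
    return sorted(code for code in codes if code not in parents)


def usable_nonprecise_codes(codes):
    """Return 2- or 4-digit codes that have no children in the loaded data."""
    return [code for code in leaf_codes(codes) if len(code) < 6]


if __name__ == "__main__":
    codes = load_codes(SAMPLE_CSV)
    for code in usable_nonprecise_codes(codes):
        print(code, codes[code])
```

Codes must be kept as strings, since the leading zeros are significant. Whether "no child nodes" really identifies every non-precise code that is usable for classification would still need checking against the ABS source.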

https://github.com/anzsrco/anzsrco is described as "Unofficial AusNZ Standard Research Classification Ontology", and has two branches:
https://github.com/anzsrco/anzsrco/tree/master
https://github.com/anzsrco/anzsrco/tree/gh-pages , which is http://anzsrco.github.io/anzsrco/

ANDS has the FORs available as an XML vocab. https://vocabs.ands.org.au/anzsrc-for
ANDS is recommending that datasets include FORs in the RIF-CS data. See http://guides.ands.org.au/rda-cpg/describecpas

RIF-CS generated from Java: https://github.com/eresearchrmit/seaports-pacific/blob/master/src/main/java/edu/rmit/eres/seaports/controller/RIFCSController.java

https://github.com/AustralianAntarcticDataCentre/metadata_xml_convert/search?utf8=%E2%9C%93&q=anzsrc contains a subset of the FORs in a GMX XML standard format, used by XSLT. e-atlas and other systems are using this same system.

https://github.com/mlwbarlow/scripts-as-required/blob/master/python/RDACollectionsSubjectsReport.py loads FORs into a database.

https://github.com/dedickinson/forcsv/ is a neat project that loads FORs and SEOs into a Hyper SQL Database. It doesn't use travis-ci or have tests, but it is definitely worth investigating further.

Some other, more structured approaches:
https://github.com/uqlibrary/fez/blob/master/.docker/development/backend/db/seed/cvs.sql
https://github.com/IntersectAustralia/ap11_webapp/blob/master/db/data.yml
https://github.com/anzsrco/anzsrco/blob/39496a380a6ee593dcd9250dfad01dd0320f6e67/versions/0.1/for08.n3
https://github.com/gu-eresearch/VIVO/blob/b1783c0f7486f963821bec9123b7998bd02c537b/productMods/WEB-INF/filegraph/tbox/for08.n3

It looks like the best approach is to upgrade the anzsrco repo to meet the ERA needs.
