Load ABS FORs #6

Open
jayvdb opened this issue Apr 2, 2016 · 1 comment

Comments

@jayvdb
Owner

jayvdb commented Apr 2, 2016

ERA uses the ABS FOR (2008) research classification vocabulary, which is described at https://en.wikipedia.org/wiki/Australian_and_New_Zealand_Standard_Research_Classification. The original source is http://www.abs.gov.au/ausstats/abs@.nsf/0/6BB427AB9696C225CA2574180004463E

The ERA key documents include a matrix of the ABS FORs, along with other associated business process information, which could be useful to load. It was not included in the techpack, but comes as a separate and quite small (45KB) XLS file, such as http://content.webarchive.nla.gov.au/gov/wayback/20140212022156/http://www.arc.gov.au/xls/era12/ERA_2012_Discipline_Matrix.xls

However, it would be good to fetch this reference data from the source (ABS), possibly even in a separate repository so it can be used for non-ERA purposes. The source also includes other tightly related official reference data, such as the mapping from the 2008 vocabulary to the ABS 2003 vocabulary, and other vocabularies.

Doing analysis here, both looking for a simple solution to get this functional asap, and collecting notes for the 'right' solution.

@jayvdb
Owner Author

jayvdb commented Apr 2, 2016

Callista Research and other systems I have developed had a non-standard XML format for this data, which was used to create JSON and populate databases. The XML was also used to make sense of ERA SEER XML documents using XSLT. It would be nice to use a standardised XML format.
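
As a rough illustration of the XML-to-JSON step, here is a minimal sketch. The XML layout (`for-codes`, `division`, `group`, `field` elements with `code` and `name` attributes) is invented for this example and is not the Callista Research format or any standard; the function names are hypothetical too.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML layout, invented for illustration only.
SAMPLE_XML = """
<for-codes version="2008">
  <division code="01" name="Mathematical Sciences">
    <group code="0101" name="Pure Mathematics">
      <field code="010101" name="Algebra and Number Theory"/>
    </group>
  </division>
</for-codes>
"""


def element_to_dict(element):
    """Recursively convert one code element into a plain dict."""
    node = {"code": element.get("code"), "name": element.get("name")}
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def for_xml_to_json(xml_text):
    """Parse the (assumed) XML layout and return a JSON string."""
    root = ET.fromstring(xml_text)
    divisions = [element_to_dict(child) for child in root]
    return json.dumps(
        {"version": root.get("version"), "divisions": divisions}, indent=2
    )


FOR_JSON = for_xml_to_json(SAMPLE_XML)
print(FOR_JSON)
```

The nested dicts keep the division/group/field hierarchy, so the same structure could feed a database loader as well as JSON output.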

Searching on github for FOR codes and descriptions...

Mark Gregson produced a non-standard XML version http://files.eprints.org/564/

There are a few systems using a CSV file, or similar static files, for the data:
https://github.com/jcu-eresearch/tdh.metadata/blob/master/tdh/metadata/browser/for_codes.csv
https://github.com/IntersectAustralia/acdata/blob/master/config/FOR_CodeList.csv
https://github.com/IntersectAustralia/metadata-aggregator/blob/master/sydma-install/resource/research_subject_code.csv
https://github.com/rrothwell/nectar_visualisation/blob/master/web/data/for_codes_final_2.json
https://github.com/datagovau/ckanext-agls/blob/master/ckanext/agls/ABS%20Fields%20Of%20Research.csv
https://github.com/au-research/ANDS-Registry-Core/blob/master/etc/misc/vocabularies/anzsrc-for/ANZSRC-FOR-EXPORT.csv
https://github.com/rd-switchboard/RD-Switchboard-Net/blob/master/etc/misc/vocabularies/anzsrc-for/ANZSRC-FOR-EXPORT.csv
https://github.com/NeCTAR-RC/nectar-dashboard/blob/832e99b0ea736ee36adb556cdf3e73c9b1c7a340/nectar_dashboard/rcallocation/migrations/0001_initial.py
https://github.com/NeCTAR-RC/nectar-dashboard/blob/832e99b0ea736ee36adb556cdf3e73c9b1c7a340/nectar_dashboard/rcallocation/for_choices.py
https://github.com/NeCTAR-RC/langstroth/blob/master/nectar_allocations/models/forcode.py
https://github.com/IntersectAustralia/dc2c/blob/master/mecat/subject_codes.py
https://github.com/sprinsloo/Research-Flagship/blob/master/build/reporting/fields-research.html / https://github.com/sprinsloo/Research-Flagship/blob/master/source/reporting/fields-research.html.erb
https://github.com/IntersectAustralia/exsite9/blob/master/exsite9/rootfiles/configuration/fieldsOfResearch.sql
https://github.com/IntersectAustralia/ap11_webapp/blob/master/db/create_research_subject_code.sql
https://github.com/anu-doi/DataCommons/blob/master/DataCommons/extras/sql/20120620_create_select_codes_table.sql
https://github.com/CurtinUniversity/Research-Data-Manager/blob/master/Urdms.Dmp/Urdms.Dmp/Database/Migrations/20110906145800_CreateFieldOfResearchList.cs

With only two columns, it isn't possible to record some of the niggly details about FORs, such as when a non-precise code is usable for classification. There were one or two of these, but maybe they can be inferred algorithmically (e.g. codes with no child nodes).
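
The "no child nodes" idea could be sketched roughly like this, assuming a two-column CSV with `code` and `description` headers (the files linked above may use different headers) and assuming that a code's parent is the code with its last two digits removed. The sample rows are a deliberately truncated subset, so `0102` only shows up as childless because its children are not listed here. Whether "no children" really identifies the codes that are usable for classification is untested.

```python
import csv
import io

# Truncated illustrative sample; column names are an assumption.
SAMPLE_CSV = """code,description
01,Mathematical Sciences
0101,Pure Mathematics
010101,Algebra and Number Theory
0102,Applied Mathematics
"""


def load_for_codes(csv_text):
    """Read a two-column CSV into a {code: description} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"].strip(): row["description"].strip() for row in reader}


def parent_code(code):
    """Return the parent code (two digits shorter), or None for a 2-digit code."""
    return code[:-2] if len(code) > 2 else None


def leaf_codes(codes):
    """Return the codes that no other code in the set names as its parent."""
    parents = {parent_code(code) for code in codes}
    return sorted(code for code in codes if code not in parents)


FOR_CODES = load_for_codes(SAMPLE_CSV)
LEAVES = leaf_codes(FOR_CODES)
print(LEAVES)  # ['010101', '0102']
```

Codes are kept as strings throughout so that leading zeros survive, which is easy to lose when a spreadsheet or database treats the column as numeric.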

https://github.com/anzsrco/anzsrco is described as "Unofficial AusNZ Standard Research Classification Ontology", and has two branches:
https://github.com/anzsrco/anzsrco/tree/master
https://github.com/anzsrco/anzsrco/tree/gh-pages , which is http://anzsrco.github.io/anzsrco/

ANDS has the FORs available as an XML vocab. https://vocabs.ands.org.au/anzsrc-for
ANDS is recommending that datasets include FORs in the RIF-CS data. See http://guides.ands.org.au/rda-cpg/describecpas

RIFCS generated from Java .. https://github.com/eresearchrmit/seaports-pacific/blob/master/src/main/java/edu/rmit/eres/seaports/controller/RIFCSController.java

https://github.com/AustralianAntarcticDataCentre/metadata_xml_convert/search?utf8=%E2%9C%93&q=anzsrc contains a subset of the FORs in a GMX XML standard format, used by XSLT. e-atlas and other systems are using this same system.

https://github.com/mlwbarlow/scripts-as-required/blob/master/python/RDACollectionsSubjectsReport.py loads FORs into a database.

https://github.com/dedickinson/forcsv/ is a neat project that loads FORs and SEOs into a Hyper SQL Database. It doesn't use travis-ci or have tests, but it is definitely worth investigating further.

Some other, more structured approaches:
https://github.com/uqlibrary/fez/blob/master/.docker/development/backend/db/seed/cvs.sql
https://github.com/IntersectAustralia/ap11_webapp/blob/master/db/data.yml
https://github.com/anzsrco/anzsrco/blob/39496a380a6ee593dcd9250dfad01dd0320f6e67/versions/0.1/for08.n3
https://github.com/gu-eresearch/VIVO/blob/b1783c0f7486f963821bec9123b7998bd02c537b/productMods/WEB-INF/filegraph/tbox/for08.n3

It looks like the best approach is to upgrade the anzsrco repo to meet the ERA needs.
