# Import dataset data in batches

Paul Walk edited this page Aug 1, 2019 · 2 revisions
- Clone the nims-mdr repo
- Copy the sample datasets to `/srv/ngdr/data/sample_datasets`

  NOTE: `/srv/ngdr/data` is mounted into docker at `/data/data`
- Open a shell in the docker container

  ```shell
  docker exec -it nims-hyrax_git_web_1 /bin/bash
  ```
- Start the rails console

  ```shell
  bundle exec rails c
  ```
The importer takes the following arguments:

- `import_dir`

  The import directory containing the dataset directories to import. The path needs to be relative to the rails application or an absolute path.
- `metadata_filename`

  Default is `nil`, which assumes there is just one xml file in the import directory and uses that as the metadata file. If a filename is given, it looks for an xml file with that name in the import directory.
- `collection_ids`

  Default is `nil`. We are not using this feature until the issue with collections is resolved. `collection_ids` expects an array of collection ids (e.g. `['qweq323', 'qwe422']`); the dataset will be made a member of those collections.
- `debug`

  Default is `false`. If set to `true`, a dry run of the import will take place. The dataset is not imported into Hyrax, but all of the processing is done, so you can see the parsed dataset attributes, the list of files to be imported, and any errors.
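The `metadata_filename` behaviour described above can be sketched as follows. This is an illustrative re-implementation of the lookup logic, not the importer's actual code; the method name `resolve_metadata_file` is hypothetical.

```ruby
# Illustrative sketch (NOT the importer's actual implementation) of the
# metadata-file lookup described above.
def resolve_metadata_file(import_dir, metadata_filename = nil)
  if metadata_filename
    # A filename was given: look for that xml file in the import directory.
    path = File.join(import_dir, metadata_filename)
    File.exist?(path) ? path : nil
  else
    # Default (nil): assume there is exactly one xml file and use it.
    xml_files = Dir.glob(File.join(import_dir, '*.xml'))
    xml_files.size == 1 ? xml_files.first : nil
  end
end
```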
```ruby
require 'importers/dataset_importer'

import_dir = '/data/data/sample_datasets/characterization'
metadata_filename = nil
collection_ids = nil
debug = true

i = Importers::DatasetImporter::BulkImporter.new(import_dir, metadata_filename, collection_ids, debug)
i.perform_create
```
This performs just a dry run of the bulk import: the datasets are not imported into Hyrax, but all of the processing is done. The outcomes are written to a timestamped log file in `/srv/ngdr/data/`, which contains the following columns:

- Current time
- Dataset directory - the name of the directory being imported
- attributes - the dataset attributes parsed from the xml file
- files - the list of files to be imported
- errors - the list of errors from the import, if any
- time taken - the time taken to do the import
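Once a run completes, the most recent log can be located and reviewed from the console. A minimal sketch, assuming the timestamped log files in `/srv/ngdr/data/` end in `.log` (the exact filename pattern is not documented here, so adjust the glob if needed):

```ruby
# Returns the path of the most recently modified *.log file in dir,
# or nil if there are none. Assumption: the importer's timestamped
# log files end in ".log".
def find_latest_log(dir)
  Dir.glob(File.join(dir, '*.log')).max_by { |f| File.mtime(f) }
end

# Print the latest log so the import outcomes can be reviewed.
if (log = find_latest_log('/srv/ngdr/data'))
  File.foreach(log) { |line| puts line }
end
```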
To perform the actual import, leave `debug` at its default of `false`:

```ruby
require 'importers/dataset_importer'

import_dir = '/data/data/sample_datasets/characterization'
i = Importers::DatasetImporter::BulkImporter.new(import_dir)
i.perform_create
```
As with the dry run, the outcomes are written to a timestamped log file in `/srv/ngdr/data/`, with the same columns as described above.