Import dataset data in batches

Paul Walk edited this page Aug 1, 2019 · 2 revisions

Steps to take before importing the datasets

  • Clone the nims-mdr repo

  • Copy the sample datasets to /srv/ngdr/data/sample_datasets

    NOTE: /srv/ngdr/data is mounted into docker at /data/data

  • Open a shell in the docker container

    docker exec -it nims-hyrax_git_web_1 /bin/bash
    
  • Start the rails console

    bundle exec rails c
    

The bulk importer takes the following input values:

  • import_dir
    The import directory containing the dataset directories to import. The path must be either absolute or relative to the Rails application root
  • metadata_filename
    Default is nil, in which case the importer assumes there is exactly one xml file in the import directory and uses it as the metadata file
    If a filename is given, the importer looks for an xml file with that name in the import directory
  • collection_ids
    Default is nil. We are not using this feature until the issue with collections is resolved.
    collection_ids expects an array of collection ids (e.g. ['qweq323', 'qwe422']); the dataset will be made a member of those collections
  • debug
    Default is false.
    If set to true, a dry run of the import takes place: the dataset is not imported into Hyrax, but all of the processing is done, and you can see the parsed dataset attributes, the list of files to be imported, and any errors.
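The defaults above imply a positional constructor signature, so trailing arguments can be omitted. A minimal sketch of that signature (the class name `BulkImporterSketch` is hypothetical and this is not the real importer source; it only illustrates the defaults described above):

```ruby
# Hypothetical sketch of the positional signature implied by the defaults above.
class BulkImporterSketch
  attr_reader :import_dir, :metadata_filename, :collection_ids, :debug

  def initialize(import_dir, metadata_filename = nil, collection_ids = nil, debug = false)
    @import_dir        = import_dir        # required: directory of dataset directories
    @metadata_filename = metadata_filename # nil => use the single xml file found
    @collection_ids    = collection_ids    # nil => do not add to any collection
    @debug             = debug             # true => dry run only
  end
end

i = BulkImporterSketch.new('/data/data/sample_datasets/characterization')
puts i.debug                      # defaults to false
puts i.metadata_filename.inspect  # defaults to nil
```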

To run the bulk importer in debug mode:

require 'importers/dataset_importer'
import_dir = '/data/data/sample_datasets/characterization'
metadata_filename = nil
collection_ids = nil
debug = true
i = Importers::DatasetImporter::BulkImporter.new(import_dir, metadata_filename, collection_ids, debug)
i.perform_create

This performs a dry run of the bulk import: the datasets are not imported into Hyrax, but all of the processing is done.
The outcomes are written to a timestamped log file in /srv/ngdr/data/.
The log file will contain the following columns:

  • Current time
  • Dataset directory - the name of the directory being imported
  • attributes - the dataset attributes parsed from the xml file
  • files - the list of files to be imported
  • errors - the list of errors from the import, if any
  • time taken - the time taken to do the import
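Because the log files are timestamped, the most recent run can be located by sorting the filenames. A minimal sketch, assuming the logs sit in a flat directory and their timestamped names sort chronologically (the `.log` extension and naming pattern are assumptions, not confirmed by the importer source):

```ruby
require 'tmpdir'

# Return the most recent timestamped log in a directory.
# Assumes timestamped filenames sort chronologically (e.g. import_YYYYMMDD.log).
def latest_log(dir)
  Dir.glob(File.join(dir, '*.log')).max
end

# Demonstration with a throwaway directory standing in for /srv/ngdr/data/
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'import_20190801.log'), '')
  File.write(File.join(dir, 'import_20190802.log'), '')
  puts latest_log(dir)  # prints the path of the newest log
end
```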

To bulk import the datasets:

require 'importers/dataset_importer'
import_dir = '/data/data/sample_datasets/characterization'
i = Importers::DatasetImporter::BulkImporter.new(import_dir)
i.perform_create

As with the dry run, the outcomes are written to a timestamped log file in /srv/ngdr/data/, with the same columns as listed above.