Import dataset data

Paul Walk edited this page Aug 1, 2019 · 2 revisions

Steps to do before importing the dataset

  • Clone the nims-mdr repo

  • Copy the sample datasets to /srv/ngdr/data/sample_datasets

    NOTE: /srv/ngdr/data is mounted into docker at /data/data

  • Open a shell in the running docker container

    docker exec -it nims-hyrax_git_web_1 /bin/bash
    
  • Start the rails console

    bundle exec rails c
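
Because /srv/ngdr/data on the host appears as /data/data inside the container, paths given to the importer need to be in the container form. The following is a minimal sketch of that translation, which could be pasted into the rails console. `container_path` is a hypothetical helper, not part of the repo, and the two mount points are simply the ones from the note above.

```ruby
require 'pathname'

# Assumption: mount points copied from the note above; adjust if your setup differs.
HOST_DATA_ROOT = Pathname.new('/srv/ngdr/data')
CONTAINER_DATA_ROOT = Pathname.new('/data/data')

# Hypothetical helper: translate a host path under HOST_DATA_ROOT
# into the path the rails console sees inside the container.
def container_path(host_path)
  relative = Pathname.new(host_path).relative_path_from(HOST_DATA_ROOT)
  # A leading '..' component means the path lies outside the mounted directory.
  if relative.each_filename.first == '..'
    raise ArgumentError, "#{host_path} is not under #{HOST_DATA_ROOT}"
  end
  (CONTAINER_DATA_ROOT + relative).to_s
end

puts container_path('/srv/ngdr/data/sample_datasets/characterization/AES-narrow')
```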
    

The importer takes the following input values

  • import_dir
    The dataset directory to import. The path must be either absolute or relative to the root of the rails application
  • metadata_filename
    Default is nil, in which case the importer assumes there is just one xml file in the import directory and uses it as the metadata file
    If a filename is given, the importer looks for an xml file with that name in the import directory
  • collection_ids
    Default is nil. This feature is not in use until the issue with collections is resolved.
    Expects an array of collection ids, e.g. ['qweq323', 'qwe422']. The dataset will be made a member of each of those collections
  • debug
    Default is false.
    If set to true, the import is a dry run: the dataset is not imported into Hyrax, but all of the processing is done, so you can inspect the parsed dataset attributes, the list of files to be imported, and any errors.
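
The metadata_filename rule above can be checked before running the importer. The following is only a sketch of the documented behaviour, not the importer's actual code. `resolve_metadata_file` is a hypothetical helper, and raising an error when the directory holds zero or several xml files is an assumption (the page does not say what the importer does in that case).

```ruby
# Hypothetical pre-flight check mirroring the metadata_filename rule.
# This is NOT the importer's own implementation.
def resolve_metadata_file(import_dir, metadata_filename = nil)
  if metadata_filename
    # A filename was given: it must exist inside the import directory.
    path = File.join(import_dir, metadata_filename)
    return path if File.file?(path)
    raise ArgumentError, "#{metadata_filename} not found in #{import_dir}"
  end

  # No filename given: expect a single xml file directly in the directory.
  xml_files = Dir.glob('*.xml', base: import_dir)
  unless xml_files.length == 1
    # Assumption: anything other than exactly one xml file is treated as an error here.
    raise ArgumentError, "expected one xml file in #{import_dir}, found #{xml_files.length}"
  end
  File.join(import_dir, xml_files.first)
end
```

Calling `resolve_metadata_file(import_dir)` in the rails console before the import shows which file would be picked up when metadata_filename is left as nil.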

To run the dataset importer in debug mode

require 'importers/dataset_importer'
import_dir = '/data/data/sample_datasets/characterization/AES-narrow'
metadata_filename = nil
collection_ids = nil
debug = true
i = Importers::DatasetImporter::Importer.new(import_dir, metadata_filename, collection_ids, debug)
i.perform_create

# This performs just a dry run of the import
# The dataset is not imported into Hyrax, but all of the processing is done
# To see the outcomes

# the title of the dataset
i.title 

# the list of files to be imported
i.files 

# the dataset attributes parsed from the xml file
i.attributes 

# the list of errors
i.errors

# The time taken to do the import.
# In debug mode this is a measure of just the processing time
i.time_taken

To import the dataset

require 'importers/dataset_importer'
import_dir = '/data/data/sample_datasets/characterization/AES-narrow'
i = Importers::DatasetImporter::Importer.new(import_dir)
i.perform_create

# To see the outcomes

# the title of the dataset
i.title 

# the list of files imported
i.files 

# the dataset attributes parsed from the xml file
i.attributes 

# the list of errors
i.errors 

# the time taken to do the import
i.time_taken
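
The two runs above can be chained so that the real import only happens after a clean dry run. The following is a minimal sketch under stated assumptions. `import_if_clean` is a hypothetical wrapper, not part of the importer, and it assumes `errors` returns an empty array-like value (or nil) when nothing went wrong. The importer class is passed in as an argument so the sketch does not depend on the application being loaded.

```ruby
# Hypothetical wrapper: dry run first, real import only if the dry run is clean.
# In the rails console, pass Importers::DatasetImporter::Importer as importer_class.
def import_if_clean(import_dir, importer_class, metadata_filename: nil, collection_ids: nil)
  # Fourth argument is debug; true requests a dry run.
  dry_run = importer_class.new(import_dir, metadata_filename, collection_ids, true)
  dry_run.perform_create

  # Assumption: errors is array-like and empty when the dry run found no problems.
  dry_run_errors = Array(dry_run.errors)
  return { ran_import: false, errors: dry_run_errors } unless dry_run_errors.empty?

  importer = importer_class.new(import_dir, metadata_filename, collection_ids, false)
  importer.perform_create
  { ran_import: true, errors: Array(importer.errors) }
end
```

For example, `import_if_clean('/data/data/sample_datasets/characterization/AES-narrow', Importers::DatasetImporter::Importer)` would return a hash saying whether the real import ran, along with the errors from whichever run was last.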