
Steps to analyse the data in MDR

Anusha Ranganathan edited this page May 21, 2024 · 2 revisions

To analyse the data in MDR, we will generate a CSV file listing the properties in each dataset, with a separate CSV file for each nested property.

The PR https://github.com/nims-dpfc/nims-hyrax/pull/563 has the code needed to generate these CSV files.

The spreadsheets will be saved within a directory named data_analysis_{datetime} in /srv/ngdr/data/. For example, /srv/ngdr/data/data_analysis_20240521T004339 is the directory created by the last run, and data_analysis_20240521T004339.tar.gz is an archive copied from it.
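The directory name follows a data_analysis_{datetime} pattern. The sketch below shows how such a name could be built and parsed. It assumes the timestamp format is `%Y%m%dT%H%M%S`, inferred only from the example name above; the code in the PR may build the name differently. Both helper names are hypothetical and are not part of nims-hyrax.

```ruby
require "time"

# Hypothetical helper: builds a name shaped like the example directory.
# The strftime format is an assumption inferred from data_analysis_20240521T004339.
def analysis_dir_name(time = Time.now)
  "data_analysis_#{time.strftime('%Y%m%dT%H%M%S')}"
end

# Hypothetical helper: recovers the run time from a directory or archive name.
# Returns nil when the name does not start with the expected pattern.
def analysis_dir_time(name)
  stamp = name[/\Adata_analysis_(\d{8}T\d{6})/, 1]
  stamp && Time.strptime(stamp, "%Y%m%dT%H%M%S")
end

puts analysis_dir_name(Time.new(2024, 5, 21, 0, 43, 39))
```

Parsing the name back into a time can help when several runs sit side by side in /srv/ngdr/data/ and the most recent one needs to be picked.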

In the test system, the run took a few hours to produce the CSV files.

How to generate the CSV files

  1. Start a Rails console in the web container

    docker exec -it nims-hyrax-web-1 /bin/bash
    rails c
    
  2. Run the code to generate the CSV files

    data_base_dir = "data"
    a = DataModelAnalysis.new(data_base_dir)
    a.run
    
  3. The code will

    • Create a directory named data_analysis_{datetime} in data, which is shared with the host at /srv/ngdr/data/
    • Write many CSV files into the data_analysis_{datetime} directory: works.csv, plus one CSV file for each nested property, as shown below
    root@805d968fd6fd:/data# cd data/
    root@805d968fd6fd:/data/data# ls -l
    
    total 4
    drwxr-xr-x 2 root root 4096 May 21 01:11 data_analysis_20240521T004339
    
    root@805d968fd6fd:/data/data# cd data_analysis_20240521T004339/
    root@805d968fd6fd:/data/data/data_analysis_20240521T004339# ls -ltr
    
    total 12324
    -rw-r--r-- 1 root root 5033784 May 21 01:11 works.csv
    -rw-r--r-- 1 root root  334729 May 21 01:11 works_head.csv
    -rw-r--r-- 1 root root  334729 May 21 01:11 works_tail.csv
    -rw-r--r-- 1 root root   63184 May 21 01:11 works_complex_date.csv
    -rw-r--r-- 1 root root  152956 May 21 01:11 works_complex_identifier.csv
    -rw-r--r-- 1 root root 1898723 May 21 01:11 works_complex_person.csv
    -rw-r--r-- 1 root root    9977 May 21 01:11 works_complex_version.csv
    -rw-r--r-- 1 root root   98641 May 21 01:11 works_complex_source.csv
    -rw-r--r-- 1 root root    9414 May 21 01:11 works_rights_notes.csv
    -rw-r--r-- 1 root root    9414 May 21 01:11 works_complex_rights.csv
    -rw-r--r-- 1 root root    6574 May 21 01:11 works_complex_event.csv
    -rw-r--r-- 1 root root   70470 May 21 01:11 works_updated_subresources.csv
    -rw-r--r-- 1 root root    1815 May 21 01:11 works_custom_property.csv
    -rw-r--r-- 1 root root  575869 May 21 01:11 works_complex_relation.csv
    -rw-r--r-- 1 root root  649087 May 21 01:11 works_complex_funding_reference.csv
    -rw-r--r-- 1 root root  603019 May 21 01:11 works_complex_contact_agent.csv
    -rw-r--r-- 1 root root  267653 May 21 01:11 works_complex_instrument.csv
    -rw-r--r-- 1 root root 1337316 May 21 01:11 works_complex_specimen_type.csv
    -rw-r--r-- 1 root root  513240 May 21 01:11 works_complex_chemical_composition.csv
    -rw-r--r-- 1 root root   38861 May 21 01:11 works_complex_structural_feature.csv
    -rw-r--r-- 1 root root  562590 May 21 01:11 works_complex_software.csv
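
Once a run has finished, a quick way to check the output is to count the data rows in each generated file. The following is a minimal sketch using Ruby's standard CSV library. It assumes each file starts with a header row, which has not been verified against the PR, and `csv_row_counts` is a hypothetical helper rather than part of `DataModelAnalysis`.

```ruby
require "csv"

# Hypothetical helper: maps each CSV file name in dir to its number of data rows.
# Assumes the first row of every file is a header (not verified against the PR).
def csv_row_counts(dir)
  Dir.glob(File.join(dir, "*.csv")).sort.to_h do |path|
    rows = 0
    # CSV.foreach streams the file, so large files such as works.csv
    # are not loaded into memory all at once.
    CSV.foreach(path, headers: true) { rows += 1 }
    [File.basename(path), rows]
  end
end

# Example usage, with the path taken from the listing above:
# csv_row_counts("data/data_analysis_20240521T004339").each do |name, rows|
#   puts format("%-45s %d", name, rows)
# end
```

Parsing with the CSV library instead of counting lines keeps the counts correct when a field contains embedded newlines.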