GerganaTancheva123 edited this page Feb 26, 2024 · 12 revisions

NMDataParser is a generic, configurable Excel spreadsheet parser that imports data stored in an Excel spreadsheet file. It accommodates row-based, column-based or mixed organizations of the data. The parser configuration is defined in a separate JSON file, which maps the custom spreadsheet structure onto the storage components of the internal eNanoMapper data model, such as “Substance”, “Protocol Application”, “Measurement”, “Parameters” and “Conditions”.

NMDataParser is an open source library developed in Java on top of the Ambit data model, making extensive use of the Apache POI library. The parser takes a simple input: a spreadsheet (Excel) file and a JSON configuration file. It returns an iterator over a list of substances (i.e. instances of the Java class SubstanceRecord from the Ambit/Java data model).
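
The "rows in, iterator of substances out" idea can be illustrated with a minimal, self-contained sketch. It is not the NMDataParser API: the class `IteratorSketch`, the simplified `SubstanceStub` (a stand-in for Ambit's much richer SubstanceRecord), the fixed three-column layout and the sample values are all assumptions made for illustration, and in-memory rows replace the spreadsheet that the real parser reads through Apache POI.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only: none of these names belong to NMDataParser or Ambit.
final class IteratorSketch {

    // Simplified stand-in for Ambit's SubstanceRecord (the real class is far richer).
    static final class SubstanceStub {
        final String name;
        final String endpoint;
        final double value;

        SubstanceStub(String name, String endpoint, double value) {
            this.name = name;
            this.endpoint = endpoint;
            this.value = value;
        }
    }

    // Wraps already-loaded rows in an iterator that builds one record per row on demand.
    // Assumed layout: column 0 = substance name, column 1 = endpoint, column 2 = value.
    static Iterator<SubstanceStub> records(List<String[]> rows) {
        final Iterator<String[]> source = rows.iterator();
        return new Iterator<SubstanceStub>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public SubstanceStub next() {
                String[] cells = source.next();
                return new SubstanceStub(cells[0], cells[1], Double.parseDouble(cells[2]));
            }
        };
    }

    public static void main(String[] args) {
        // Made-up rows standing in for spreadsheet rows.
        List<String[]> rows = Arrays.asList(
                new String[] {"sample-1", "SIZE", "110.0"},
                new String[] {"sample-2", "SIZE", "38.0"});
        Iterator<SubstanceStub> it = records(rows);
        while (it.hasNext()) {
            SubstanceStub s = it.next();
            System.out.println(s.name + " " + s.endpoint + " = " + s.value);
        }
    }
}
```

Returning an iterator rather than a fully built list means that records are created only as the caller consumes them, which is one plausible reason for this kind of interface when spreadsheets are large.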

NMDataParser working principle

The full power of NMDataParser is unleashed through a JSON configuration of the parsing process, i.e. the user has to define how the Excel data is mapped onto the eNanoMapper/Ambit data model. A comprehensive understanding of the data model (see section eNanoMapper Data Model) is mandatory for efficient use of NMDataParser and for an optimal setup of the parsing process (i.e. database import). The JSON syntax has been enriched with flexible options in order to cover the various layouts of the most popular Excel file templates used for storing experimental data and metadata in the majority of the NSC projects. The main features of the NMDataParser JSON configuration are:

  • Upload of NM data from various spreadsheet templates;
  • Universal approach (as far as possible);
  • Automation;
  • Configuration by an external JSON file;
  • Code reusability;
  • Encapsulation/hiding of the original raw data details (when needed);
  • Open source.
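
The benefit of keeping the mapping in an external configuration can be sketched as follows. This is a hedged illustration, not NMDataParser's actual configuration syntax: the class `MappingSketch`, the field names (`substanceName`, `endpoint`, `value`) and the use of an in-memory `Map` in place of a parsed JSON file are assumptions chosen to keep the example within the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only: the real parser reads its mapping from a JSON file.
final class MappingSketch {

    // Converts a spreadsheet column label such as "A" or "AB" to a zero-based index.
    static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Applies a "field name -> column label" mapping to one row.
    // Cells outside the row are reported as null.
    static Map<String, String> mapRow(Map<String, String> fieldToColumn, String[] row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldToColumn.entrySet()) {
            int i = columnIndex(e.getValue());
            out.put(e.getKey(), i < row.length ? row[i] : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed mapping; in practice this would come from the configuration file.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("substanceName", "A");
        config.put("endpoint", "C");
        config.put("value", "D");

        // Made-up row; column B holds a detail that the mapping simply ignores.
        String[] row = {"sample-1", "internal note", "SIZE", "110.0"};
        System.out.println(mapRow(config, row));
    }
}
```

Supporting a different template then only requires changing the mapping, not the code, and columns that are not mapped never reach the output, which matches the reusability and hiding features listed above.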

NMDataParser configuration syntax relies solely on the plain JSON format. JSON (JavaScript Object Notation) is an open standard data interchange format used to store and transmit data objects consisting of attribute–value pairs and arrays. JSON is also a human-readable, lightweight, text-based and language-independent syntax.

NMDataParser detects missing or wrongly typed attributes in the EDLs (Excel Data Locations) and other JSON sections and returns corresponding error messages. These error messages are crucial for configuring the NM data import correctly and are especially helpful when the JSON configuration is written manually. Manual configuration is quite common, since domain expert knowledge is used extensively in the process of parser configuration.
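
A check for missing or wrongly typed attributes could look roughly like the sketch below. It shows the general technique only: the class `ConfigCheckSketch`, the attribute names `columnIndex` and `iteration`, their expected types and the wording of the messages are all hypothetical and do not reproduce NMDataParser's real attributes or error texts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: attribute names and messages are invented for illustration.
final class ConfigCheckSketch {

    // Checks one location entry for required attributes and their expected value types.
    // Returns an empty list when the entry is acceptable.
    static List<String> check(String section, Map<String, Object> location) {
        List<String> errors = new ArrayList<>();
        requireType(section, location, "columnIndex", Integer.class, errors);
        requireType(section, location, "iteration", String.class, errors);
        return errors;
    }

    private static void requireType(String section, Map<String, Object> location,
                                    String key, Class<?> type, List<String> errors) {
        Object value = location.get(key);
        if (value == null) {
            errors.add(section + ": missing attribute \"" + key + "\"");
        } else if (!type.isInstance(value)) {
            errors.add(section + ": attribute \"" + key + "\" should be "
                    + type.getSimpleName() + " but is " + value.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        // A deliberately faulty entry: wrong type for one attribute, the other one absent.
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("columnIndex", "B");
        for (String message : check("substanceName", location)) {
            System.out.println(message);
        }
    }
}
```

Collecting all problems in a list, instead of stopping at the first one, lets someone editing the configuration by hand fix several mistakes in a single pass.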

Next: JSON Main Sections