Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide validation information via XML metadata #45

Closed
4 tasks done
kMutagene opened this issue Nov 5, 2020 · 6 comments
Closed
4 tasks done

Provide validation information via XML metadata #45

kMutagene opened this issue Nov 5, 2020 · 6 comments
Labels
Host: Excel Issues regarding Excel interop Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature
Milestone

Comments

@kMutagene
Copy link
Member

kMutagene commented Nov 5, 2020

It is possible to set metadata of cells via office.js. Especially for users that create templates for others to use, information on the proposed cell contents is helpful. This would be set via a JSON string in the XML tag with the key "validationParams" and should have the following fields:

  • importance - an integer between 0 and 100

  • content type - one of : ontologyTerm, text, number, int, decimal, url, boolean

  • if the content type is ontologyTerm, asdd the isA restriction if set: term[restriction]

  • Create a JSON schema for this

  • Create a form to generate this for a column

Edit: Office.js only supports custom tags, not custom attributes on existing tags. Therefore, we need to investigate alternative places to save this information. For now we can add the validation table to the app model and later use that data to generate the file/tags/etc we need.

Edit2(by @Freymaurer): As of the latest decision all swate validation data will now be saved in the excel custom xml, as desribed here.

  • Add validation data to the app model.
  • Swate can write validation data into excel custom xml.
@kMutagene kMutagene added Priority: High Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature Host: Excel Issues regarding Excel interop labels Nov 5, 2020
@kMutagene kMutagene added this to the 0.2-beta milestone Nov 5, 2020
@kMutagene kMutagene pinned this issue Nov 5, 2020
Freymaurer added a commit that referenced this issue Nov 13, 2020
Freymaurer added a commit that referenced this issue Nov 13, 2020
@Freymaurer
Copy link
Collaborator

Upon discussing this feature we decided to add the table validation info in the model to the cumstom xml data. See here and here.

@Freymaurer
Copy link
Collaborator

Freymaurer commented Nov 25, 2020

I thought about the information we need to store in the custom xml and i think the following could work.

  • CustomXml
    • TableValidation (SwateVersion)
      • Worksheet (WorksheetName, DateTime, UserList)
        • TableColumn (ColumnHeader, ColumnAdress, Importance, ValidationFormat, Unit)

Detail:

  • CustomXml (tag): Top level tag provided by excel for custom xml parts.
  • TableValidation (tag): This tag should exist only once and contains all table validation information for the file.
    • SwateVersion (attribute): This is the version of Swate used to generate this information.
  • Worksheet (tag): This tag can exist multiple times, once for each worksheet with a validation scheme in the document.
    • WorksheetName (attribute): The worksheet name, necessary to identify the correct worksheet.
    • DateTime (attribute): The date and time of the last update to the table validation scheme.
    • UserList (attribute); A list of users, who contributed to this table validationscheme. From first (oldest) to last (newest) contributer)
  • TableColumn (tag): This tag can exist multiple times, once for each column in the annotation table.
    • ColumnHeader (attribute): String of the column header.
    • ColumnAdress (attribute): String excel adress to column.
    • Importance (attribute): Integer value of importance. 0 - 100.
    • ValidationFormat (attribute): Contains the information about the expected content type. Related to the ContentType shown below.
    • Unit (attribute): String of unit for values in the columns. Can be null.
/// Model.fs
type ContentType =
    | OntologyTerm of string
    | Text
    | Url
    | Boolean
    | Number
    | Int
    | Decimal

Example:

<?xml version="1.0"?>
<CustomXml>
    <TableValidation SwateVersion="0.1.1" >
        <Worksheet WorksheetName="Experiment1" DateTime="2020-11-25 12:00" Userlist="FirstUser; SecondUser">
            <TableColumn ColumnHeader="Source Name" ColumnAdress="A" Importance="0" ValidationFormat="string" Unit="None">
                Source Name
            </TableColumn>
            <TableColumn ColumnHeader="Parameter [instrument model]" ColumnAdress="B" Importance="100" ValidationFormat="ParentOntology Instrument Model" Unit="None">
                instrument model
            </TableColumn>
            <TableColumn ColumnHeader="Factor [day light exposure]" ColumnAdress="C" Importance="100" ValidationFormat="Float" Unit="lux" >day light exposure</TableColumn>
        </Worksheet>
        <Worksheet>...</Worksheet>
        <Worksheet>...</Worksheet>
    </TableValidation>
</CustomXml>

Can use this for a nice visual xml representation.

@kMutagene
Copy link
Member Author

kMutagene commented Nov 25, 2020

Looking good! I have some comments:

  • Worksheet (tag): This tag can exist multiple times, once for each worksheet with a validation scheme in the document.

    While A tag named Worksheet does not exist in the context of OpenXML Spreadhseet ML, the Worksheet type is part of the object model, and the xlsx archive structure contains a worksheets folder. I would not align custom tag names so close to the core namespace. I think this tag can be completely removed. Just use a top-level Validation tag, with nested TableValidation tags, containing the attributes you currently use in Worksheet

  • there is no need for actual innerXML of the TableColumn tags, the information is already present in the tag as an attribute.

@Freymaurer
Copy link
Collaborator

I understand what you mean, but i would keep a layer for different worksheets around. If we use templating options between worksheets in the future, the separation by a worksheet tag could make accession easier. In addition i really think this will improve readability.

While A tag named Worksheet does not exist in the context of OpenXML Spreadhseet ML, the Worksheet type is part of the object mode, and the xlsx archive structure contains a worksheets folder.

Maybe we should just agree on a distinct naming convention. As both tags Worksheet and TableColumn are existing types in office.js, we should adjust the names for a clearer separation. Maybe go for SwateWorksheet and SwateTableColumn or WorksheetValidation and TableColumnValidation.

Tell me what you think about it, my opinion on this is not very firm.

@kMutagene
Copy link
Member Author

kMutagene commented Nov 25, 2020

Here is an adjusted example for clarification:

<?xml version="1.0"?>
<CustomXml>
    <Validation SwateVersion="0.1.1" >
        <TableValidation WorksheetName="Experiment1" DateTime="2020-11-25 12:00" Userlist="FirstUser; SecondUser">
            <ColumnValidation ColumnHeader="Source Name" ColumnAdress="A" Importance="0" ValidationFormat="string" Unit="None"/>
            <ColumnValidation ColumnHeader="Parameter [instrument model]" ColumnAdress="B" Importance="100" ValidationFormat="ParentOntology Instrument Model" Unit="None" />
            <ColumnValidation ColumnHeader="Factor [day light exposure]" ColumnAdress="C" Importance="100" ValidationFormat="Float" Unit="lux"/ >
        </TableValidation>
    </Validation>
</CustomXml>
  • The top level tag is now simply Validation
  • Your Worksheet is now TableValidation
  • Your TableColumn is now ColumnValidation
  • no unnecessary innerXML

It may also be important to add the Name of the annotation table to the attributes of TableValidation, as those are unique across sheets as we just learned.

@Freymaurer
Copy link
Collaborator

Done! This will work as of version 1.4!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Host: Excel Issues regarding Excel interop Type: Feature Request This item is confirmed by the maintainers to be a request for a new feature
Projects
None yet
Development

No branches or pull requests

2 participants