CEDAR Value Recommender

Introduction

CEDAR's Value Recommender is a metadata recommendation system that helps users to fill out metadata templates with the most appropriate values. The system finds and applies patterns in the previously entered values to generate recommendations for all the recommendation-enabled fields. CEDAR's Value Recommender is integrated in the CEDAR Workbench and it can also be invoked programmatically through its API.

This document describes the steps to enable recommendations for templates and use them to quickly and accurately create new metadata.

Getting started

Metadata suggestions are disabled by default. When creating your template, you can enable suggestions by following these steps:

Log in to the CEDAR Workbench, navigate to the folder where you want to create your new template, click on the New+ button, located on the top left corner of the screen, and select Template to start creating your template.
Add a new field to the template and use the Suggestions setting to enable recommendations for it. Note that:
- The Suggestions tab is only available for fields of type text.
- You must enable suggestions for a minimum of two fields in the template.
- The Value Recommender will make recommendations only when it can see a pattern worth recommending, which means there must be many repeated values in a field. Thus, fields with controlled vocabularies, or with relatively few likely choices, are the best candidates for recommendations.
Save the template and click on the arrow located on the top left corner of the screen to exit the Template Designer.

Once the template has been saved, users can start entering metadata and getting recommendations. At least one saved metadata instance must exist for a template before recommendations are generated. To fill out the metadata for a template with recommendations the user proceeds as for any other template:

In the line listing the template, click on the metadata tag (). This will generate a form to fill out the metadata. Alternatively, you can select your template and click on the template options menu (displayed as three vertical dots), choosing Populate to start entering metadata for the template.
Click on any field that is configured to support recommendations. You will receive a list of value recommendations presented as a drop-down.

Example

Suppose we have an Experiment template with three fields:

An experiment ID field that stores the identifier of the experiment.
A tissue field that captures the type of tissue tested in the experiment.
A disease field to enter the disease of interest.

Fig. 1 shows the template in the Template Designer and the Suggestions setting for the field tissue. Suppose that the user enabled suggestions for the fields tissue and disease.

Fig. 1. Screenshot of CEDAR’s Template Designer that shows an Experiment template with three fields: experiment ID, tissue, and disease. The user has clicked on the tissue field and enabled metadata recommendations for the field using the Suggestions tab.

Fig. 2 shows a screenshot of the Metadata Editor for a template generated from the Experiment template. The user entered the value skin for the field tissue and is about to enter a value for the field disease. In this case, the editor shows four suggested values: skin ulcer, atopic dermatitis, melanoma, psoriasis. The percentage shown between brackets next to each suggestion represents the strength of the recommendation. The recommendations provided by the system are context-sensitive, meaning that the values predicted for the disease field are generated and ranked based on the value entered for the tissue field. In this case, the four values suggested by the system are useful, since all of them are diseases that affect the skin.

Fig. 2. Screenshot of the Metadata Editor for the Experiment template. The user entered the value skin for the field tissue and is about to enter a value for the disease field. The recommendations system analyzes the CEDAR metadata repository on real time and suggests four diseases that affect the skin.

CEDAR's Value Recommender can be also invoked via its API. Here is an example of curl command to generate the previous suggestions via the API:

curl --request POST \
  --url https://valuerecommender.metadatacenter.org/command/recommend \
  --header 'authorization: apiKey <YOUR_CEDAR_API_KEY>' \
  --header 'content-type: application/json' \
  --data '{
  "populatedFields": [
    {
      "fieldPath": "tissue",
      "fieldValueLabel": "skin"
    }
  ],
  "templateId": <TEMPLATE_ID>,
  "targetField": {
    "fieldPath": "disease"
  }
}'

Frequently Asked Questions

Can I enable suggestions for text fields whose values have been restricted to terms from ontologies?

Yes. CEDAR’s Value Recommender works both for plain text values and ontology terms.

Can I access the source code?

Yes. The source code of CEDAR’s Value Recommender is open source and available on GitHub.

How does CEDAR’s Value Recommender work?

The system is based on a well-established data mining method known as association rule mining to generate real-time suggestions based on analyses of previously entered metadata. A paper describing this method in detail is available at https://doi.org/10.1093/database/baz059.

Will recommendations still work if I modify the template after metadata is collected?

No, they will not. To modify a template with existing metadata, you must create a new template version. The recommendations for the previous template version will be applied to any metadata created from that previous template, and metadata for the new template version will be based on metadata created for that version.

What are the technologies behind CEDAR’s Value Recommender?

CEDAR's Value Recommender has been implemented in Java. The system uses the Dropwizard framework to provide a REST-based API that is used by the Metadata Editor and can also be used directly by third-party applications. The rule extraction process is performed using the Java API of WEKA data mining software. All metadata for a particular template are transformed to WEKA's ARFF (Attribute-Relation File Format). The association rules are extracted using the Apriori algorithm, which is commonly used in association rule mining.

How should I cite this work?

Martínez-Romero, M., O’Connor, M. J., Egyedi, A. L., Willrett, D., Hardi, J., Graybeal, J., & Musen, M. A. (2019). Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases. Database, (2019), 1–25. https://doi.org/10.1093/database/baz059

Provide feedback

Saved searches

Use saved searches to filter your results more quickly