
PRISM Search


Web-based search engine to show similar clinical documents to a user-input clinical snippet


  • Python 3.8 (3.6+ may work but is untested)
  • scikit-learn
  • mojimoji
  • MedNER-J
  • Flask


If you use Poetry, just run poetry install. Otherwise, install the dependencies with pip (version 20.0.0 or later) by running pip install -r requirements.txt. You may want to create a virtual environment first.

You need to prepare a PRISM-annotated document source for search. We prepared for this purpose. Please adapt the code for the data format of your document data. The script,, is another example for PRISM's Q3 data.

Once this setup is complete, you should be able to run the server with python in Flask's development mode. Be aware that, by default, the app uses the PRISM Q3 data, so you need to modify the DATA source in for your preprocessed data.

The procedure for deploying this app to a production environment depends on the web server's configuration. Please consult your server administrators.


  1. Submit a clinical document at / (root) to find text relevant to it
  2. At /result, you will see the NER result for your input and its three top-ranked "similar" documents
  3. You can modify the similarity criteria:
    • Options to calculate similarity among clinical docs
    • Clinical NE tags to consider in similarity search
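
As a rough illustration of the tag option above, the sketch below keeps only the entities whose tag the user selected before any similarity is computed. It is not taken from this app's code: the function name, the (text, tag) tuple layout, and the tag names "disease" and "medicine" are all assumptions made for the example.

```python
def filter_entities_by_tag(entities, selected_tags):
    """Return the texts of entities whose tag is among selected_tags.

    `entities` is assumed to be a list of (text, tag) tuples; this layout
    is hypothetical and may differ from the app's actual NER output.
    """
    selected = set(selected_tags)
    return [text for text, tag in entities if tag in selected]


# Hypothetical NER output for a short input snippet.
ner_result = [
    ("fever", "disease"),
    ("aspirin", "medicine"),
    ("cough", "disease"),
]

# Consider only disease entities in the similarity search.
disease_only = filter_entities_by_tag(ner_result, ["disease"])
```

Filtering before vectorisation means that unselected tags contribute nothing to the similarity score.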

How it works

This app first applies PRISM-based clinical NER to your input document. The NER result is then used to calculate similarity against the search-source documents, which are NER-ed in advance.

The current version's similarity calculation is simply based on a so-called "bag of named entities" (BoNE). Like the "bag of words" (BoW), documents are vectorised into occurrence counts of the named entities appearing in the whole source. Then, the "similarity" among documents is calculated with the cosine-similarity measure.
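
The BoNE idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the app's actual implementation: the function names, the list-of-strings document layout, and the sample entities are hypothetical, and the app itself may well rely on scikit-learn (for instance CountVectorizer and cosine_similarity) rather than hand-written code.

```python
import math
from collections import Counter


def bone_vector(entities, vocabulary):
    """Count how often each vocabulary entity occurs in one document."""
    counts = Counter(entities)
    return [counts[entity] for entity in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_entities, source_docs, top_k=3):
    """Return (document index, score) pairs for the top_k most similar source docs."""
    # The vocabulary is every named entity appearing anywhere in the source.
    vocabulary = sorted({entity for doc in source_docs for entity in doc})
    query_vec = bone_vector(query_entities, vocabulary)
    scored = [
        (idx, cosine_similarity(query_vec, bone_vector(doc, vocabulary)))
        for idx, doc in enumerate(source_docs)
    ]
    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical pre-extracted entities for four source documents.
source_docs = [
    ["fever", "cough"],
    ["fracture"],
    ["fever", "cough", "headache"],
    ["rash"],
]
top_three = rank_similar(["fever", "cough"], source_docs)
```

In this toy example the first document matches the query exactly and ranks first, followed by the third document, which shares two of its three entities with the query.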

This similarity calculation can be regarded as a baseline for this purpose. Further improvements could be implemented.


Developed by Shuntaro Yada in Social Computing Lab. at NAIST.


To be announced.

