$expand sorting is different (worse) than the full snowstorm version #4

Closed
ivank opened this issue Jan 15, 2024 · 5 comments

ivank commented Jan 15, 2024

Example:

curl --silent 'https://snowstorm.ihtsdotools.org/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs&filter=Breast+Cancer&count=5' | jq

responds with:

{
  "resourceType": "ValueSet",
  "id": "5fc7dd97-888a-4385-aa05-8c2fabce0fe1",
  "url": "http://snomed.info/sct?fhir_vs",
  "status": "active",
  "copyright": "This value set includes content from SNOMED CT, which is copyright © 2002+ International Health Terminology Standards Development Organisation (SNOMED International), and distributed by agreement between SNOMED International and HL7. Implementer use of SNOMED CT is not covered by this agreement.",
  "expansion": {
    "id": "7cc5cd7b-f6ca-4602-9e76-76d2499dd01b",
    "timestamp": "2024-01-15T14:43:46+00:00",
    "total": 45,
    "offset": 0,
    "parameter": [
      {
        "name": "version",
        "valueUri": "http://snomed.info/sct|http://snomed.info/sct/900000000000207008/version/20240101"
      },
      {
        "name": "displayLanguage",
        "valueString": "en"
      }
    ],
    "contains": [
      {
        "system": "http://snomed.info/sct",
        "code": "254837009",
        "display": "Malignant tumor of breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "372064008",
        "display": "Malignant neoplasm of female breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "724451007",
        "display": "Fear of breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "134405005",
        "display": "Suspected breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "268547008",
        "display": "Screening for malignant neoplasm of breast"
      }
    ]
  }
}

Whereas the local snowstorm-lite server returns the following:

curl -u admin:yourAdminPassword --silent 'http://localhost:8085/fhir/ValueSet/$expand?url=http://snomed.info/sct?fhir_vs&filter=Breast+Cancer&count=5' | jq
{
  "resourceType": "ValueSet",
  "url": "http://snomed.info/sct?fhir_vs",
  "name": "SNOMED CT Implicit ValueSet of all concepts.",
  "status": "active",
  "copyright": "This value set includes content from SNOMED CT, which is copyright © 2002+ International Health Terminology Standards Development Organisation (SNOMED International), and distributed by agreement between SNOMED International and HL7. Implementer use of SNOMED CT is not covered by this agreement.",
  "expansion": {
    "identifier": "1e115cd2-d887-4b14-b399-4362974939b0",
    "timestamp": "2024-01-15T14:47:43+00:00",
    "total": 50,
    "parameter": [
      {
        "name": "version",
        "valueUri": "http://snomed.info/sct|http://snomed.info/sct/900000000000207008/version/20230131"
      }
    ],
    "contains": [
      {
        "system": "http://snomed.info/sct",
        "code": "717129004",
        "display": "Claus Model"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "724451007",
        "display": "Fear of breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "134405005",
        "display": "Suspected breast cancer"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "254843006",
        "display": "Familial cancer of breast"
      },
      {
        "system": "http://snomed.info/sct",
        "code": "254837009",
        "display": "Malignant tumor of breast"
      }
    ]
  }
}

Here "Fear of breast cancer" ranks higher than "Malignant tumor of breast", which is not the ordering we expected (the full version returned the expected ordering).

Maybe we have to configure / reindex something to make it work the same as the full version?

Docker image version: snomedinternational/snowstorm:9.2.0

@JonZammit
Contributor

Hi @ivank, just looking at the "version" parameter in the response: does your instance of Snowstorm-lite have the January 2023 version loaded?
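To make the version check concrete, here is a minimal sketch that reads the release date out of the "version" parameter of an expansion. The helper name `snomed_version_date` is hypothetical, and the sample dictionary only reproduces the relevant fields from the local response quoted above.

```python
from typing import Optional


def snomed_version_date(value_set: dict) -> Optional[str]:
    """Return the YYYYMMDD part of the expansion's "version" parameter, if present."""
    for param in value_set.get("expansion", {}).get("parameter", []):
        if param.get("name") == "version":
            # valueUri has the shape "<system>|<module URI>/version/<YYYYMMDD>"
            uri = param.get("valueUri", "")
            _, sep, date = uri.rpartition("/version/")
            return date if sep else None
    return None


# Trimmed copy of the local snowstorm-lite response from this issue.
local_response = {
    "expansion": {
        "parameter": [
            {
                "name": "version",
                "valueUri": "http://snomed.info/sct|http://snomed.info/sct/900000000000207008/version/20230131",
            }
        ]
    }
}

print(snomed_version_date(local_response))
```

Running the same helper on the public server's response would give the other release date, which makes a version mismatch easy to spot before comparing result order.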

@JonZammit
Contributor

Today I ran Ivan's query against my instance of snowstorm-lite which has the January 2024 edition loaded. My results were similar - 50 concepts in the expansion of this value set.

I'm not sure how the concepts are ordered in the response, but I believe the difference in totals (50 versus 45) is explained by snowstorm-lite including inactive concepts in the value set. These can be filtered out because they are flagged as inactive, for example:

            {
                "system": "http://snomed.info/sct",
                "inactive": true,
                "code": "366980001",
                "display": "Suspected breast cancer"
            }
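A minimal client-side sketch of that filtering, assuming the expansion has already been parsed into a dictionary. The function name `active_only` is hypothetical; the two sample entries are the active and inactive "Suspected breast cancer" concepts quoted in this thread.

```python
def active_only(value_set: dict) -> list:
    """Return the expansion entries that are not flagged as inactive."""
    contains = value_set.get("expansion", {}).get("contains", [])
    # Entries without an "inactive" key are treated as active.
    return [entry for entry in contains if not entry.get("inactive", False)]


sample = {
    "expansion": {
        "contains": [
            {
                "system": "http://snomed.info/sct",
                "code": "134405005",
                "display": "Suspected breast cancer",
            },
            {
                "system": "http://snomed.info/sct",
                "inactive": True,
                "code": "366980001",
                "display": "Suspected breast cancer",
            },
        ]
    }
}

for entry in active_only(sample):
    print(entry["code"], entry["display"])
```

Note that this only trims the entries on the page that was returned; the "total" reported by the server still counts the inactive concepts.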

kaicode self-assigned this Jan 17, 2024
@kaicode
Member

kaicode commented Jan 17, 2024

@ivank Thanks for reaching out.
Snowstorm Lite does not use the same search mechanism as Snowstorm. The Lite search is much faster, but the result ranking is not as good in some cases, because it sorts concepts by their average description length rather than by the length of the description that matched the search query.

The relevance of the results can be improved by searching the specific area of the hierarchy you are interested in using ECL. Examples:
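As an illustration only, here is a sketch of how an ECL-constrained request could be built. It assumes the FHIR implicit value set form `http://snomed.info/sct?fhir_vs=ecl/<expression>`, reuses the local base URL from this issue, and picks `<< 64572001` (descendants-or-self of what is assumed here to be the Disease concept) purely as an example constraint. The helper name `expand_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def expand_url(base: str, ecl: str, text_filter: str, count: int = 5) -> str:
    """Build a ValueSet/$expand URL restricted to the concepts selected by an ECL expression."""
    implicit_value_set = "http://snomed.info/sct?fhir_vs=ecl/" + ecl
    # Percent-encode every parameter so the nested "?" and "=" do not leak into the outer query.
    query = urlencode(
        {"url": implicit_value_set, "filter": text_filter, "count": count},
        quote_via=quote,
    )
    return f"{base}/ValueSet/$expand?{query}"


# Assumed constraint: limit the search to descendants-or-self of concept 64572001.
url = expand_url("http://localhost:8085/fhir", "<< 64572001", "Breast Cancer")
print(url)
```

Restricting the hierarchy this way should keep findings such as "Fear of breast cancer" out of the candidate set, so ranking matters less.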

I hope that helps.

Long Explanation
Snowstorm searches against individual descriptions, sorts them by the shortest matching description first, and then returns the unique concepts.
Snowstorm Lite only has concept documents, so it finds concepts that have at least one matching description and then sorts them by average description length.
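The difference between the two strategies can be sketched on toy data. This is not the actual Snowstorm or Snowstorm Lite code: the concept ids "A" and "B", their descriptions, the substring matching rule, and all function names are made up for illustration.

```python
from statistics import mean

# Hypothetical toy data (not real SNOMED CT content): concept id -> descriptions.
CONCEPTS = {
    "A": ["Breast cancer", "A much longer fully specified name for the same concept"],
    "B": ["Fear of breast cancer", "Breast cancer fear"],
}


def matches(description: str, query: str) -> bool:
    """Simplified matching: every query word occurs somewhere in the description."""
    text = description.lower()
    return all(word in text for word in query.lower().split())


def rank_by_shortest_match(concepts: dict, query: str) -> list:
    """Per-description ranking: order concepts by their shortest matching description."""
    scored = []
    for concept_id, descriptions in concepts.items():
        lengths = [len(d) for d in descriptions if matches(d, query)]
        if lengths:
            scored.append((min(lengths), concept_id))
    return [concept_id for _, concept_id in sorted(scored)]


def rank_by_average_length(concepts: dict, query: str) -> list:
    """Per-concept ranking: order matching concepts by average length of all their descriptions."""
    scored = []
    for concept_id, descriptions in concepts.items():
        if any(matches(d, query) for d in descriptions):
            scored.append((mean(len(d) for d in descriptions), concept_id))
    return [concept_id for _, concept_id in sorted(scored)]


print(rank_by_shortest_match(CONCEPTS, "breast cancer"))
print(rank_by_average_length(CONCEPTS, "breast cancer"))
```

In this toy set, concept "A" has the shortest matching term but also one very long description, so the per-description ranking puts it first while the average-length ranking puts "B" first. That is the same kind of inversion reported in this issue.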

@kaicode
Member

kaicode commented Mar 1, 2024

There is a fix for this in the develop branch: it applies the same sorting as the main Snowstorm product within the first 100 results, without a loss of performance.
Initial testing looks good!

kaicode closed this as completed in 6d8e64f on Apr 2, 2024
@kaicode
Member

kaicode commented Apr 2, 2024

This is fixed in the latest version 1.3.0-beta.
