page_type: sample
languages: csharp
products: azure, azure-search
name: Distinct sample skill for cognitive search
description: This custom skill removes duplicates from a list of terms.

Distinct

This custom skill removes duplicates from a list of terms.

Terms are considered the same if they differ only by casing, separators such as spaces, or punctuation, or if they share an entry in the thesaurus.
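The first part of that rule (casing, separators, punctuation) can be illustrated with a small normalization step. The sketch below is in Python for brevity and is not the sample's C# implementation; the function names and the exact set of ignored characters (ASCII punctuation and whitespace) are assumptions made for illustration.

```python
import string

# Assumption: "separators and punctuation" means ASCII punctuation plus whitespace.
_IGNORED = set(string.punctuation) | set(string.whitespace)


def normalize(term: str) -> str:
    """Lowercase a term and drop punctuation and whitespace characters."""
    return "".join(ch for ch in term.lower() if ch not in _IGNORED)


def same_ignoring_form(a: str, b: str) -> bool:
    """Return True when two terms differ only by casing, separators, or punctuation."""
    return normalize(a) == normalize(b)


print(same_ignoring_form("word", "WOrD"))           # True
print(same_ignoring_form("U.S.A", "usa"))           # True
print(same_ignoring_form("USA", "United states"))   # False: needs the thesaurus
```

Terms such as "USA" and "United states" do not match under normalization alone; treating them as the same term relies on the thesaurus described under Settings.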

Deploy to Azure

Requirements

This skill has no requirements beyond those described in the root README.md file.

Settings

This function uses a JSON file called thesaurus.json, which is located at the root of this project and is deployed with the function. The file contains a simple list of lists of synonyms. In each list of synonyms, the first entry is considered the canonical form. Replace this file with your own data.
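To make the thesaurus format concrete, here is a minimal Python sketch of how a list of lists of synonyms could drive deduplication. It is an illustration under stated assumptions, not the function's C# code: the thesaurus entries shown are hypothetical (chosen so that the terms from the sample input in this README collapse to three values), and `normalize`, `build_lookup`, and `distinct` are names invented for this example.

```python
import json
import string

# Assumption: ignored characters are ASCII punctuation plus whitespace.
_IGNORED = set(string.punctuation) | set(string.whitespace)

# Hypothetical thesaurus content: each inner list starts with the canonical form.
THESAURUS_JSON = """
[
    ["Microsoft", "MSFT", "Microsoft Corp."],
    ["USA", "United States", "U.S.A"]
]
"""


def normalize(term):
    """Lowercase a term and drop punctuation and whitespace characters."""
    return "".join(ch for ch in term.lower() if ch not in _IGNORED)


def build_lookup(synonym_lists):
    """Map every normalized synonym to the first (canonical) entry of its list."""
    lookup = {}
    for synonyms in synonym_lists:
        if not synonyms:
            continue
        canonical = synonyms[0]
        for synonym in synonyms:
            lookup.setdefault(normalize(synonym), canonical)
    return lookup


def distinct(words, lookup):
    """Return canonical forms in order of first appearance, without duplicates."""
    seen = set()
    result = []
    for word in words:
        # Terms missing from the thesaurus are kept as first encountered.
        canonical = lookup.get(normalize(word), word)
        key = normalize(canonical)
        if key not in seen:
            seen.add(key)
            result.append(canonical)
    return result


lookup = build_lookup(json.loads(THESAURUS_JSON))
words = ["MSFT", "U.S.A", "word", "United states", "WOrD", "Microsoft Corp."]
print(distinct(words, lookup))  # ['Microsoft', 'USA', 'word']
```

Building the lookup once and keying it by normalized form keeps each term check to a single dictionary access, at the cost of assuming that no synonym appears in more than one list.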

link-acronyms

Sample Input:

{
    "values": [
        {
            "recordId": "foobar2",
            "data":
            {
                "words": [
                    "MSFT",
                    "U.S.A",
                    "word",
                    "United states",
                    "WOrD",
                    "Microsoft Corp."
                ]
            }
        }
    ]
}

Sample Output:

{
    "values": [
        {
            "recordId": "foobar2",
            "data": {
                "distinct": {
                    "value": [
                        "Microsoft",
                        "USA",
                        "word"
                    ]
                }
            },
            "errors": [],
            "warnings": []
        }
    ]
}
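For callers that want to exercise the function outside a skillset, the request and response envelopes shown above can be built and read with a few lines of code. The following Python sketch is an assumption-laden illustration: `build_request` and `read_response` are hypothetical helper names, and the field names are taken only from the sample input and output in this README. It does not perform any HTTP call.

```python
def build_request(words_by_record):
    """Wrap {recordId: [terms]} in the request envelope shown in the sample input."""
    return {
        "values": [
            {"recordId": record_id, "data": {"words": list(words)}}
            for record_id, words in words_by_record.items()
        ]
    }


def read_response(response):
    """Split a response envelope into distinct terms and reported errors per record."""
    distinct_by_record = {}
    errors_by_record = {}
    for value in response.get("values", []):
        record_id = value["recordId"]
        errors = value.get("errors") or []
        if errors:
            # Keep the error entries as returned; their shape is not assumed here.
            errors_by_record[record_id] = errors
            continue
        data = value.get("data") or {}
        distinct_by_record[record_id] = (data.get("distinct") or {}).get("value", [])
    return distinct_by_record, errors_by_record


sample_response = {
    "values": [
        {
            "recordId": "foobar2",
            "data": {"distinct": {"value": ["Microsoft", "USA", "word"]}},
            "errors": [],
            "warnings": [],
        }
    ]
}

terms, errors = read_response(sample_response)
print(terms)   # {'foobar2': ['Microsoft', 'USA', 'word']}
print(errors)  # {}
```

Keeping results keyed by `recordId` mirrors the envelope, in which each record carries its own data, errors, and warnings.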

Sample Skillset Integration

To use this skill in a cognitive search pipeline, add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Distinct entities",
    "uri": "[AzureFunctionEndpointUrl]/api/link-acronyms-list?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document/merged_content",
    "inputs": [
        {
            "name": "words",
            "source": "/document/merged_content/organizations"
        }
    ],
    "outputs": [
        {
            "name": "distinct",
            "targetName": "distinct_organizations"
        }
    ]
}