# Tutorial: Build and deploy a custom skill

This notebook walks you through the end-to-end process of building an enrichment pipeline with the following Azure Cognitive Search features:

1. Cognitive skills
2. Azure Machine Learning skill
3. Knowledge store
4. Indexer cache
5. Incremental indexing
6. Search

## Using Azure Machine Learning to build and deploy a custom skill

Working with the open source [hotel reviews dataset](https://www.kaggle.com/datafiniti/hotel-reviews), this exercise builds, trains, and uses an Azure Machine Learning model to extract aspect-based sentiment from the reviews. Aspect-based sentiment assigns positive or negative scores to specific entities mentioned in a review, such as the room, the lobby, or the service. After the model is built and trained, it is deployed as an endpoint on an Azure Kubernetes Service (AKS) cluster, and its results feed into the AI enrichment pipeline of an Azure Cognitive Search service.
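To make aspect-based sentiment concrete, the following sketch works with hypothetical per-aspect records shaped like the `aspect_sentiment` field that the index defines later (`text`, `polarity`, `score`, `start`, `len`). The record values are made up, and treating `start` and `len` as character offsets into the review text is an assumption. `aspect_span` and `polarity_by_aspect` are illustrative helper names, not part of any Azure SDK.

```python
from collections import Counter, defaultdict

# Illustrative records for one review, shaped like the aspect_sentiment index field.
# The values are made up; "start" and "len" are assumed to be character offsets.
review_text = "The room was spotless but the lobby was noisy."
aspects = [
    {"text": "room", "polarity": "positive", "score": 0.9, "start": 4, "len": 4},
    {"text": "lobby", "polarity": "negative", "score": 0.8, "start": 30, "len": 5},
]


def aspect_span(text, aspect):
    """Return the substring of the review that an aspect record points at."""
    return text[aspect["start"]:aspect["start"] + aspect["len"]]


def polarity_by_aspect(reviews):
    """Count polarity labels per aspect term across a list of per-review aspect lists."""
    counts = defaultdict(Counter)
    for review_aspects in reviews:
        for aspect in review_aspects:
            counts[aspect["text"].lower()][aspect["polarity"]] += 1
    return counts


for aspect in aspects:
    print(aspect_span(review_text, aspect), aspect["polarity"])
```

Aggregating per aspect term is what lets you answer questions such as "how often is the lobby mentioned negatively?" across many reviews, rather than relying on a single document-level score.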

To train the aspect-based sentiment model, you will use the [NLP Recipes repository](https://github.com/microsoft/nlp-recipes/tree/master/examples/sentiment_analysis/absa). After the model is deployed, it is added to the enrichment pipeline as a custom skill for use by the Cognitive Search service.

Two datasets are provided. Training the model requires the larger one, `hotel_reviews_1000.csv`. If you prefer to skip the training step, download the smaller `hotel_reviews_100.csv` instead.

## Prerequisites

 - An Azure subscription. You can get a [free subscription](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
 - [Cognitive Search service](https://docs.microsoft.com/azure/search/search-get-started-arm)
 - [Cognitive Services resource](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows)
 - [Azure Storage account](https://docs.microsoft.com/azure/storage/common/storage-account-create?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal)
 - [Azure Machine Learning workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace)

## Walkthrough

In this notebook, you will:

1. Set up an enrichment pipeline:
   1. Create the datasource
   2. Create the skillset
   3. Create the index
   4. Create the indexer
2. Project the review text to the knowledge store as training data for the model.
3. Train an aspect-based sentiment model in Azure Machine Learning with the NLP Recipes repository.
4. Deploy the model as an endpoint on Azure Kubernetes Service (AKS).
5. Add the new custom skill to the enrichment pipeline to enrich the reviews with aspect-based sentiment.


## Setup

There are two options when working with this sample:

1. Option 1: Run everything on the free tier, except for the Azure Machine Learning inferencing endpoint, and skip training the model.
2. Option 2: Run the entire process end to end, including training the model.

To run the entire process end to end, start by downloading the larger of the two dataset files, `hotel_reviews_1000.csv`, which is required to train the aspect-based sentiment model.

To run on the free tier, start by downloading the smaller dataset, `hotel_reviews_100.csv`.

Upload the downloaded file to a blob container in your Azure Storage account.
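If you prefer to upload from Python rather than the Azure portal, a minimal sketch with the `azure-storage-blob` package (v12) could look like the following. `upload_reviews_csv` is a hypothetical helper; the commented usage assumes the `STORAGECONNSTRING` and `datasource_container` variables defined in the variables cell later in this notebook.

```python
from pathlib import Path


def upload_reviews_csv(container_client, csv_path):
    """Upload a local CSV file to a blob container, overwriting a blob with the same name.

    container_client is expected to be an azure.storage.blob.ContainerClient; any object
    with a compatible upload_blob(name=..., data=..., overwrite=...) method also works.
    Returns the blob name, which is the file name without its directory.
    """
    path = Path(csv_path)
    with path.open("rb") as data:
        container_client.upload_blob(name=path.name, data=data, overwrite=True)
    return path.name


# Example usage (requires `pip install azure-storage-blob` and your own values):
# from azure.storage.blob import ContainerClient
# client = ContainerClient.from_connection_string(STORAGECONNSTRING, container_name=datasource_container)
# client.create_container()  # only if the container does not exist yet
# upload_reviews_csv(client, "hotel_reviews_100.csv")
```

Passing the container client in, instead of creating it inside the function, keeps the helper easy to reuse with a different container or credential.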

You need to define a number of variables so that the calls to the Azure services succeed. Set up all of the required services first, then provide your subscription details and service names in the "Define the required variables" cell.


## Configure Cognitive Search enrichment pipeline

In [35]:
import requests
import json

## 1.0 Define the required variables

The following code cell defines information used to connect to your Azure subscription, the Azure Cognitive Search service, the Cognitive Services instance, the Azure Machine Learning workspace, and the Azure Storage account.

__WARNING__: While you can use the `default=` parameter to set these values, remember to remove any keys or connection strings from it before committing this notebook to a public repository such as GitHub.

If you want this notebook to pick up the values from environment variables, set the following environment variables in the shell you use to start the Jupyter notebook environment:

| Environment variable | Value |
| ----- | ----- |
| subscription_id | The ID of your Azure subscription. |
| resource_group | The name of your Azure resource group. |
| workspace_name | The name of your Azure Machine Learning workspace. |
| search_service_name | The name of your Azure Cognitive Search service. |
| api_key | The API key for your Azure Cognitive Search service. |
| STORAGEACCOUNTNAME | The name of your Azure Storage account. |
| STORAGEACCOUNTKEY | The access key for your Azure Storage account. |
| STORAGECONNSTRING | The string that allows access to the storage account. |
| know_store_cache | Same value as your STORAGECONNSTRING. |
| cog_svcs_key | The access key for the Cognitive Services instance. |
| cog_svcs_acct | The name of the Cognitive Services instance. |

How you set an environment variable depends on the shell or command line you are using. The following table shows the format for some common shells. If your shell is not listed, please see the documentation for your shell for information on setting environment variables.

| Shell/Command line| Example |
| ----- | ----- |
| Bash | `export subscription_id=0000-00000-000000-0000` |
| PowerShell | `$env:subscription_id = "0000-00000-000000-0000"` |
| CMD.EXE | `SET subscription_id=0000-00000-000000-0000` |
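A missing variable can surface later as an authentication or request error that is hard to trace back. This sketch checks the environment up front, assuming the variable names listed in the first table above; `missing_env_vars` and `REQUIRED_ENV_VARS` are illustrative names.

```python
import os

# Environment variable names from the table above.
REQUIRED_ENV_VARS = [
    "subscription_id", "resource_group", "workspace_name",
    "search_service_name", "api_key",
    "STORAGEACCOUNTNAME", "STORAGEACCOUNTKEY", "STORAGECONNSTRING",
    "know_store_cache", "cog_svcs_key", "cog_svcs_acct",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty (checks os.environ unless environ is given)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```

Environment variables are read when the Jupyter process starts, so if you set them after launching Jupyter, restart it before running this check.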


In [6]:
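# Configure all required variables for Cognitive Search. Values with an os.getenv call are read from the
# environment variable of the same name when it is set; otherwise replace the placeholder with your own value.
# The required services are created as part of the initial resource deployment step.
import os

## Configure variables for this example

# Search service name and admin API key from the Azure portal.
search_service_name = os.getenv('search_service_name', default='your search service name')
api_key = os.getenv('api_key', default='your api key')
# Search service endpoint without a trailing slash, for example https://<search-service-name>.search.windows.net
search_service = "search service endpoint"
# Leave the API version and content_type as they are listed here.
api_version = '2019-05-06-Preview'
content_type = 'application/json'

# Cognitive Services key and account name.
cog_svcs_key = os.getenv('cog_svcs_key', default='your cognitive services key')
cog_svcs_acct = os.getenv('cog_svcs_acct', default='cognitive services account name')

# Storage account used for the datasource, knowledge store, and indexer cache.
STORAGEACCOUNTNAME = os.getenv('STORAGEACCOUNTNAME', default='your storage account name')
STORAGEACCOUNTKEY = os.getenv('STORAGEACCOUNTKEY', default='your storage account key')
STORAGECONNSTRING = os.getenv('STORAGECONNSTRING', default="your storage account connection string")

# Create a container in the storage account, upload the hotel reviews CSV file to it, and specify its name below.
# This sample assumes you will use the same storage account for the datasource, knowledge store, and indexer cache.
datasource_container = 'your datasource container name'

# Connection string for the knowledge store and indexer cache; defaults to STORAGECONNSTRING.
know_store_cache = os.getenv('know_store_cache', default=STORAGECONNSTRING)
DS_STORAGE_ACCT_CONN_STR = know_store_cache

# Azure Machine Learning settings: the resource group your AML inference cluster will be deployed in,
# your subscription ID, and your AML workspace name.
resource_group = os.getenv('resource_group', default='your resource group name')
subscription_id = os.getenv('subscription_id', default='your subscription id')
workspace_name = os.getenv('workspace_name', default='your workspace name')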
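A quick way to catch a placeholder you forgot to replace is to scan the settings for the placeholder text used above. This is a heuristic sketch: `unreplaced_placeholders` and `PLACEHOLDER_PREFIXES` are illustrative names, and the prefixes only match the placeholder strings in this notebook.

```python
# Prefixes of the placeholder strings used in the variables cell.
PLACEHOLDER_PREFIXES = ("your ", "search service endpoint", "cognitive services account name")


def unreplaced_placeholders(settings):
    """Return, sorted, the names of string settings that still hold a placeholder value."""
    return sorted(
        name for name, value in settings.items()
        if isinstance(value, str) and value.startswith(PLACEHOLDER_PREFIXES)
    )


# In the notebook you could check every public global variable, for example:
# print(unreplaced_placeholders({k: v for k, v in globals().items() if not k.startswith("_")}))
example = {"api_key": "your api key", "api_version": "2019-05-06-Preview"}
print(unreplaced_placeholders(example))
```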

In [None]:
def construct_Url(service, resource, resource_name, action, api_version):
    # Build <service>/<resource>[/<resource_name>[/<action>]]?api-version=<api_version>.
    # The action segment is only used when a resource name is supplied.
    url = service + '/' + resource
    if resource_name:
        url += '/' + resource_name
        if action:
            url += '/' + action
    return url + '?api-version=' + api_version


headers = {'api-key': api_key, 'Content-Type': content_type}
# Test out the URLs to ensure that the configuration works
print(construct_Url(search_service, "indexes", "azureml-sentiment", "analyze", api_version))
print(construct_Url(search_service, "indexes", "azureml-sentiment", None, api_version))
print(construct_Url(search_service, "indexers", None, None, api_version))
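`construct_Url` concatenates strings, so a trailing slash on `search_service` produces a double slash in the URL. The following is an alternative sketch, not part of the original sample: the hypothetical `build_search_url` joins only the non-empty path segments, strips a trailing slash from the endpoint, and encodes the query string with `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode


def build_search_url(service, resource, resource_name=None, action=None,
                     api_version="2019-05-06-Preview"):
    """Build a Cognitive Search REST URL from its parts.

    Joins the non-empty path segments with '/', tolerates a trailing slash on the
    service endpoint, and encodes the api-version query parameter.
    """
    segments = [service.rstrip("/"), resource]
    segments += [part for part in (resource_name, action) if part]
    return "/".join(segments) + "?" + urlencode({"api-version": api_version})


print(build_search_url("https://example.search.windows.net/", "indexes", "azureml-sentiment", "analyze"))
```

Unlike `construct_Url`, this version keeps an `action` even when no resource name is given; that difference does not matter for the calls in this notebook.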

## 1.1 Create datasource
Before continuing, upload `hotel_reviews_1000.csv` or `hotel_reviews_100.csv` to the container named in `datasource_container`.
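Before creating the datasource, you can check that the uploaded file has the columns the pipeline expects. The column set below is taken from the document fields that the Shaper skill and the index read. Whether your CSV header uses exactly these names is an assumption (the indexer's field mappings may rename fields), so adjust the set if needed. `missing_columns` is an illustrative helper.

```python
import csv

# Assumption: the CSV header row uses the same names that the skillset and index in this
# notebook read from each document. Adjust this set if your file uses different headers.
EXPECTED_COLUMNS = {
    "address", "categories", "city", "country", "latitude", "longitude", "name",
    "postalCode", "province", "reviews_date", "reviews_dateAdded", "reviews_rating",
    "reviews_text", "reviews_title", "reviews_username",
}


def missing_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return, sorted, the expected column names that are absent from the CSV header row."""
    # utf-8-sig strips a leading byte order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return sorted(set(expected) - {column.strip() for column in header})


# Example usage with a local copy of the file:
# print(missing_columns("hotel_reviews_100.csv"))
```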

In [None]:
container = datasource_container

datsource_def = {
    'name': f'{datasource_container}-ds',
    'description': f'Datasource with hotel reviews',
    'type': 'azureblob',
    'subtype': None,
    'credentials': {
        'connectionString': f'{STORAGECONNSTRING}'
    },
    'container': {
        'name': f'{datasource_container}'
    },

}
r = requests.post(construct_Url(search_service, "datasources", None, None, api_version), data=json.dumps(datsource_def),  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
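The cell above prints the raw response. To make failures easier to spot, a small helper can turn the status code and JSON body into a one-line summary. This sketch assumes the OData-style error body (`{"error": {"code": ..., "message": ...}}`) that the service returns on failure; `describe_response` is a hypothetical name.

```python
def describe_response(status_code, body=None):
    """Summarize a Cognitive Search REST response as a one-line message.

    2xx status codes are reported as success. For other codes the message includes the
    "code" and "message" from an OData-style error body: {"error": {"code": ..., "message": ...}}.
    """
    if 200 <= status_code < 300:
        return f"OK ({status_code})"
    error = (body or {}).get("error") or {}
    details = " ".join(part for part in (error.get("code"), error.get("message")) if part)
    return f"Failed ({status_code}): {details}" if details else f"Failed ({status_code})"


# Example with the response from the cell above; a 204 No Content response has no JSON body.
# print(describe_response(r.status_code, r.json() if r.content else None))
```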

## 1.2 Create skillset

In [None]:
skillset_name = f'{datasource_container}-ss'
skillset_def = {
    'name': f'{skillset_name}',
    'description': 'Skillset to enrich hotel reviews with aspect based sentiment',
    'skills': [
        {
            '@odata.type': '#Microsoft.Skills.Text.KeyPhraseExtractionSkill',
            'name': '#1',
            'description': None,
            'context': '/document/reviews_text',
            'defaultLanguageCode': 'en',
            'maxKeyPhraseCount': None,
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'languageCode',
                    'source': '/document/language',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'keyPhrases',
                    'targetName': 'keyphrases'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Text.LanguageDetectionSkill',
            'name': '#2',
            'description': None,
            'context': '/document',
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'languageCode',
                    'targetName': 'language'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Text.TranslationSkill',
            'name': '#3',
            'description': None,
            'context': '/document/reviews_text',
            'defaultFromLanguageCode': None,
            'defaultToLanguageCode': 'en',
            'suggestedFrom': 'en',
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'translatedText',
                    'targetName': 'translated_text'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Util.ShaperSkill',
            'name': '#4',
            'description': None,
            'context': '/document',
            'inputs': [
                {
                    'name': 'address',
                    'source': '/document/address',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'categories',
                    'source': '/document/categories',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'city',
                    'source': '/document/city',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'country',
                    'source': '/document/country',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'latitude',
                    'source': '/document/latitude',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'longitude',
                    'source': '/document/longitude',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'name',
                    'source': '/document/name',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'postalCode',
                    'source': '/document/postalCode',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'province',
                    'source': '/document/province',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_date',
                    'source': '/document/reviews_date',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_dateAdded',
                    'source': '/document/reviews_dateAdded',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_rating',
                    'source': '/document/reviews_rating',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_title',
                    'source': '/document/reviews_title',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_username',
                    'source': '/document/reviews_username',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'AzureSearch_DocumentKey',
                    'source': '/document/AzureSearch_DocumentKey',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_content_type',
                    'source': '/document/metadata_storage_content_type',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_size',
                    'source': '/document/metadata_storage_size',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_last_modified',
                    'source': '/document/metadata_storage_last_modified',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_content_md5',
                    'source': '/document/metadata_storage_content_md5',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_name',
                    'source': '/document/metadata_storage_name',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_path',
                    'source': '/document/metadata_storage_path',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_file_extension',
                    'source': '/document/metadata_storage_file_extension',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'keyPhrases',
                    'source': '/document/reviews_text/keyphrases/*',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'languageCode',
                    'source': '/document/language',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'translatedText',
                    'source': '/document/reviews_text/translated_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'output',
                    'targetName': 'objectprojection'
                }
            ]
        }
    ],
    'cognitiveServices': {
        '@odata.type': '#Microsoft.Azure.Search.CognitiveServicesByKey',
        'description': f'/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{cog_svcs_acct}',
        'key': f'{cog_svcs_key}'
    },
    'knowledgeStore': {
        'storageConnectionString': f'{STORAGECONNSTRING}',
        'projections': [
            {
                'tables': [],
                'objects': [
                    {
                        'storageContainer':  f'{datasource_container}-enriched',
                        'referenceKeyName': None,
                        'generatedKeyName': None,
                        'source': '/document/objectprojection',
                        'sourceContext': None,
                        'inputs': []
                    }
                ],
                'files': []
            }
        ]
    }
}


r = requests.put(construct_Url(search_service, "skillsets", skillset_name, None, api_version), data=json.dumps(skillset_def),  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
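Each skill writes its outputs into the enrichment tree under its `context`, and the knowledge store projection reads `/document/objectprojection` from that tree. The following sketch lists the paths written by the skills and flags any object projection source that no skill produces. It is a simplification that assumes every output lands at `<context>/<targetName>`, which holds for the skills in this notebook; the helper names are illustrative.

```python
def enrichment_outputs(skillset):
    """Return the enrichment tree paths written by the skills: '<context>/<targetName>'."""
    return {
        f"{skill['context']}/{output['targetName']}"
        for skill in skillset["skills"]
        for output in skill.get("outputs", [])
    }


def unresolved_projection_sources(skillset):
    """Return object projection sources in the knowledge store that no skill writes."""
    produced = enrichment_outputs(skillset)
    knowledge_store = skillset.get("knowledgeStore") or {}
    return [
        projection["source"]
        for group in knowledge_store.get("projections", [])
        for projection in group.get("objects", [])
        if projection["source"] not in produced
    ]
```

For `skillset_def` above, `sorted(enrichment_outputs(skillset_def))` should return `/document/language`, `/document/objectprojection`, `/document/reviews_text/keyphrases`, and `/document/reviews_text/translated_text`, and `unresolved_projection_sources(skillset_def)` should return an empty list.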

## 1.3 Create index

In [None]:
indexname = f'{datasource_container}-idx'
index_def = {
    "name": f'{indexname}',
    "defaultScoringProfile": "",
    "fields": [
        {
            "name": "address",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "categories",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "city",
            "type": "Edm.String",
            "searchable": False,
            "filterable": True,
            "retrievable": True,
            "sortable": False,
            "facetable": True,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "country",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": True,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "latitude",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "longitude",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "name",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": True,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "postalCode",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": True,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "province",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "reviews_date",
            "type": "Edm.DateTimeOffset",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "reviews_dateAdded",
            "type": "Edm.DateTimeOffset",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "reviews_rating",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "reviews_text",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": "en.microsoft",
            "synonymMaps": []
        },
        {
            "name": "reviews_title",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": "en.microsoft",
            "synonymMaps": []
        },
        {
            "name": "reviews_username",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "AzureSearch_DocumentKey",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": True,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_content_type",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_size",
            "type": "Edm.Int64",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_last_modified",
            "type": "Edm.DateTimeOffset",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_content_md5",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_name",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_path",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "metadata_storage_file_extension",
            "type": "Edm.String",
            "searchable": False,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": None,
            "synonymMaps": []
        },
        {
            "name": "keyphrases",
            "type": "Collection(Edm.String)",
            "searchable": True,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": "en.microsoft",
            "synonymMaps": []
        },
        {
            "name": "language",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": "en.microsoft",
            "synonymMaps": []
        },
        {
            "name": "translated_text",
            "type": "Edm.String",
            "searchable": True,
            "filterable": False,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
            "key": False,
            "indexAnalyzer": None,
            "searchAnalyzer": None,
            "analyzer": "en.lucene",
            "synonymMaps": []
        },
        {
            "name": "aspect_sentiment",
            "type": "Collection(Edm.ComplexType)",
            "fields": [
                {
                    "name": "text",
                    "type": "Edm.String",
                    "searchable": True,
                    "filterable": False,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": False,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": "standard.lucene",
                    "synonymMaps": []
                },
                {
                    "name": "type",
                    "type": "Edm.String",
                    "searchable": False,
                    "filterable": True,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": False,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": None,
                    "synonymMaps": []
                },
                {
                    "name": "polarity",
                    "type": "Edm.String",
                    "searchable": False,
                    "filterable": True,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": True,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": None,
                    "synonymMaps": []
                },
                {
                    "name": "score",
                    "type": "Edm.Double",
                    "searchable": False,
                    "filterable": False,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": False,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": None,
                    "synonymMaps": []
                },
                {
                    "name": "start",
                    "type": "Edm.Int32",
                    "searchable": False,
                    "filterable": False,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": False,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": None,
                    "synonymMaps": []
                },
                {
                    "name": "len",
                    "type": "Edm.Int32",
                    "searchable": False,
                    "filterable": False,
                    "retrievable": True,
                    "sortable": False,
                    "facetable": False,
                    "key": False,
                    "indexAnalyzer": None,
                    "searchAnalyzer": None,
                    "analyzer": None,
                    "synonymMaps": []
                }
            ]
        }
    ],
    "scoringProfiles": [],
    "corsOptions": None,
    "suggesters": [
        {
            "name": "sg",
            "searchMode": "analyzingInfixMatching",
            "sourceFields": [
                "reviews_text"
            ]
        }
    ],
    "analyzers": [],
    "tokenizers": [],
    "tokenFilters": [],
    "charFilters": [],
    "encryptionKey": None,
    "similarity": None
}
r = requests.post(construct_Url(search_service, "indexes", None, None, api_version), data=json.dumps(index_def),  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
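The complex field's sub-fields need to match the keys that the scoring script in section 3.5 returns for each term (`text`, `type`, `polarity`, `score`, `start`, `len`). If they don't, indexing the aspect sentiment output is likely to report errors. Below is a minimal sketch of a check you can run against `index_def`. It assumes the complex field is named `aspect_sentiment`, which is the target the indexer in section 4.0 maps to. `missing_subfields` is a hypothetical helper, not part of any SDK.

```python
def missing_subfields(index_def, field_name, expected):
    """Return the expected sub-field names that a complex index field lacks."""
    for field in index_def.get("fields", []):
        if field["name"] == field_name:
            present = {sub["name"] for sub in field.get("fields", [])}
            return sorted(set(expected) - present)
    raise KeyError(f"field {field_name!r} not found in index definition")


# Keys that score.py emits for each term (see standard_sample_output in section 3.5)
TERM_KEYS = ["text", "type", "polarity", "score", "start", "len"]
```

For example, `missing_subfields(index_def, "aspect_sentiment", TERM_KEYS)` should return an empty list before you post the index.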

## 1.4 Create indexer

In [None]:
indexername = f'{datasource_container}-idxr'
indexer_def = {
    "name": f'{indexername}',
    "description": "Indexer to enrich hotel reviews",
    "dataSourceName": f'{datasource_container}-ds',
    "skillsetName": f'{datasource_container}-ss',
    "targetIndexName": f'{datasource_container}-idx',
    "disabled": None,
    "schedule": {
        "interval": "PT2H",
        "startTime": "0001-01-01T00:00:00Z"
      },
    "parameters": {
        "batchSize": None,
        "maxFailedItems": 0,
        "maxFailedItemsPerBatch": 0,
        "base64EncodeKeys": None,
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "delimitedText",
            "firstLineContainsHeaders": True,
            "delimitedTextDelimiter": ",",
            "delimitedTextHeaders": ""
        }
    },
   "fieldMappings": [
        {
            "sourceFieldName": "AzureSearch_DocumentKey",
            "targetFieldName": "AzureSearch_DocumentKey",
            "mappingFunction": {
                "name": "base64Encode",
                "parameters": None
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/reviews_text/keyphrases",
            "targetFieldName": "keyphrases",
            "mappingFunction": None
        },
        {
            "sourceFieldName": "/document/language",
            "targetFieldName": "language",
            "mappingFunction": None
        },
        {
            "sourceFieldName": "/document/reviews_text/translated_text",
            "targetFieldName": "translated_text",
            "mappingFunction": None
        }
    ],
    "cache": {
        "enableReprocessing": True,
        "storageConnectionString": f'{know_store_cache}'
    }
}
r = requests.post(construct_Url(search_service, "indexers", None, None, api_version), data=json.dumps(indexer_def),  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
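The indexer's `schedule.interval` is an ISO 8601 duration, so `PT2H` runs the indexer every two hours. The Azure Cognitive Search documentation limits the interval to between 5 minutes and 24 hours. The following stdlib-only sketch parses the day/time subset of the format and checks those bounds before you send the definition. The helper names are hypothetical. The regular expression deliberately ignores years, months, and fractional seconds.

```python
import re
from datetime import timedelta

# Day/time subset of ISO 8601 durations, e.g. "PT2H", "PT30M", "P1D"
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_interval(value):
    """Convert a duration such as 'PT2H' into a timedelta."""
    match = _DURATION.match(value)
    # Reject empty designators such as "P", "PT" or "P1DT"
    if not match or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def check_schedule_interval(value):
    """Return the interval as a timedelta if it lies between 5 minutes and 24 hours."""
    interval = parse_interval(value)
    if not timedelta(minutes=5) <= interval <= timedelta(hours=24):
        raise ValueError(f"interval {value!r} is outside 5 minutes to 24 hours")
    return interval
```

For example, `check_schedule_interval(indexer_def["schedule"]["interval"])` returns `timedelta(hours=2)` for the definition above.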

## 1.5 Run the indexer

In [None]:
r = requests.post(construct_Url(search_service, "indexers", indexername, "run", api_version), data=None,  headers=headers)
print(r)
# The run request returns 202 Accepted with an empty body, so there is no JSON to print

In [None]:
r = requests.get(construct_Url(search_service, "indexers", indexername, "status", api_version), data=None,  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
#print(res['lastResult']['status'] + ', ' + str(res['lastResult']['itemsProcessed'] ))
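The status response reports the most recent run under `lastResult`. Its `status` stays `inProgress` until the run finishes, and `lastResult` is null before the first run starts. Instead of re-running the status cell by hand, you can poll it. This is a minimal sketch that assumes you pass a zero-argument callable returning the parsed status JSON. The helper name and the default timings are illustrative.

```python
import time


def wait_for_indexer(get_status, poll_seconds=10.0, timeout_seconds=1800.0, sleep=time.sleep):
    """Poll an indexer status callable until the last run is no longer in progress.

    get_status: callable returning the parsed JSON of GET /indexers/{name}/status.
    Returns the final 'lastResult' dict.
    """
    waited = 0.0
    while True:
        # lastResult is null until the first run has started; treat that as pending
        last = get_status().get("lastResult") or {}
        status = last.get("status")
        if status is not None and status != "inProgress":
            return last
        if waited >= timeout_seconds:
            raise TimeoutError(f"indexer still running after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, call `wait_for_indexer(lambda: requests.get(construct_Url(search_service, "indexers", indexername, "status", api_version), headers=headers).json())`, then print `itemsProcessed` and `itemsFailed` from the returned dict.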

## Decision Point!

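If you chose the small dataset, skip to step 3.4 and register the models from the model folder you downloaded at the beginning of the exercise. Because there is no training run in that case, register the local files (for example, with `Model.register`) instead of calling `run.register_model`. Otherwise, continue with the next cell.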

## 2.0 Generate Training Data
Review the results in the knowledge store and generate training data.


In [None]:
%pip install azure-storage-blob
%pip install --upgrade azureml-sdk

### Connect to the Azure ML workspace and get the default datastore

In [None]:
from azureml.core import Workspace, Datastore
try:
    ws = Workspace(subscription_id, resource_group, workspace_name)
    print(ws.name, ws.location, ws.resource_group, sep='\t')
    print('Library configuration succeeded')
except Exception as e:
    # Stop here: every cell below needs a valid workspace
    raise RuntimeError('Workspace not found') from e

ds = ws.get_default_datastore()

## 2.1 Download the GloVe embeddings

Download the GloVe embeddings to the local training-data directory. **NOTE:** This file is about 2 GB, so the download may take some time to complete.

In [None]:
import os
import urllib.request

lib_root = os.path.dirname(os.path.abspath("__file__"))
dataset = os.path.join(lib_root, 'training-data')
# Create the local training-data folder if it doesn't exist yet
os.makedirs(dataset, exist_ok=True)

url = 'http://nlp.stanford.edu/data/glove.840B.300d.zip'
filename = os.path.join(dataset, 'glove.840B.300d.zip')
urllib.request.urlretrieve(url, filename)
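Because the archive is large, re-running the cell above downloads it again every time. The variant below skips the download when the file is already present. It also writes to a temporary `.part` file first, so an interrupted download isn't mistaken for a complete one. It uses only `urllib.request.urlretrieve` and its `reporthook` callback. `download_if_missing` is a hypothetical helper name.

```python
import os
import urllib.request


def download_if_missing(url, dest, report_every_mb=100):
    """Download url to dest unless dest already exists. Returns True if it downloaded."""
    if os.path.exists(dest):
        return False
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    partial = dest + ".part"
    step = report_every_mb * 1024 * 1024
    next_report = [step]  # mutable so the nested hook can update it

    def report(block_num, block_size, total_size):
        # urlretrieve calls this after each block; total_size is -1 if unknown
        done = block_num * block_size
        if done >= next_report[0]:
            print(f"{done // 2**20} MB of {max(total_size, 0) // 2**20} MB")
            next_report[0] += step

    urllib.request.urlretrieve(url, partial, reporthook=report)
    os.replace(partial, dest)  # expose the file only once the download completed
    return True
```

In the notebook, `download_if_missing(url, filename)` could replace the final `urlretrieve` call.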

Upload the GloVe embeddings and pre-extracted training data to the default storage for the workspace.

In [None]:
lib_root = os.path.dirname(os.path.abspath("__file__"))
ds = ws.get_default_datastore()
ds.upload('./training-data', target_path='hotels_data', overwrite=True, show_progress=True)

### (Optional) Remove the GloVe embeddings file

After uploading the GloVe embeddings, you may want to remove the file from your local storage to free up space. If so, run the following cell.

In [None]:
try:
    os.remove('./training-data/glove.840B.300d.zip')
except OSError:
    pass

## 2.2 (Optional) Extract training data

If you created the enrichment pipeline and processed the hotel records, you can use the following code cell to extract the training data and upload it. It will overwrite the default hotels_train.csv that was uploaded in the previous cell.

In [None]:
import json
import pandas as pd
from azure.storage.blob import ContainerClient

# Connect to the container that holds the enriched documents (knowledge store object projections)
container_client = ContainerClient.from_connection_string(know_store_cache, container_name=f'{datasource_container}-enriched')
blob_list = container_client.list_blobs()
count = 0
reviews = []
file_name = 'single.json'
# Extract the translated review text of up to 1,500 documents into a single list
for blob in blob_list:
    blob_client = container_client.get_blob_client(blob.name)

    with open(file_name, "wb") as my_blob:
        download_stream = blob_client.download_blob()
        my_blob.write(download_stream.readall())

    with open(file_name, encoding="utf8") as data_file:
        data = json.load(data_file)
        reviews.append(data['translatedText'])
    count = count + 1
    if count % 100 == 0:
        print("Processed " + str(count))
    if count == 1500:
        break
print(len(reviews))
df = pd.DataFrame(reviews, columns=['reviews'])

# Convert to CSV
df.to_csv('hotels_train.csv')
print('Successfully generated training data')
# Upload the CSV file to the default datastore for the machine learning workspace
ds.upload_files(['hotels_train.csv'], relative_root=None, target_path='hotels_data', overwrite=True, show_progress=True)
print('Uploaded training data to datastore')
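The loop above writes each blob to `single.json` only to read it back. Here is a sketch of the same extraction as a pure function that parses the blob bytes directly with `json.loads`. It assumes each object projection has a top-level `translatedText` field, which is the field the cell reads. Unlike the cell, `limit` counts kept reviews rather than blobs, and documents with a missing or empty field are skipped. `extract_reviews` is a hypothetical helper.

```python
import json


def extract_reviews(blob_payloads, field="translatedText", limit=1500):
    """Collect one review string per knowledge-store object projection."""
    reviews = []
    for payload in blob_payloads:
        doc = json.loads(payload)  # json.loads accepts bytes as well as str
        text = doc.get(field)
        if text:  # skip documents where the field is missing or empty
            reviews.append(text)
        if len(reviews) >= limit:
            break
    return reviews
```

With the container client from the cell above, you could call `extract_reviews(container_client.download_blob(b.name).readall() for b in container_client.list_blobs())`.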

## 3.0 Train the aspect-based sentiment model

## 3.1 Create the training cluster


In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cluster_name = "cpu-train"

# Verify that cluster does not exist already
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2',
                                                           vm_priority='lowpriority',
                                                           min_nodes=0,
                                                           max_nodes=4)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cluster.wait_for_completion(show_output=True)

## 3.2 Create the training script

The training script uses the data stored in the default storage account to train the model.

In [None]:
%%writefile train.py
import argparse
import json
import os 
from pathlib import Path
from nltk import flatten
from azureml.core import Run
from sklearn.metrics import f1_score
from azureml.core.model import Model
import shutil



# Load NLP Architect
from nlp_architect.models.absa.train.train import TrainSentiment
from nlp_architect.models.absa.inference.inference import SentimentInference

# Inputs
parser = argparse.ArgumentParser(description='ABSA Train')
parser.add_argument('--data_folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--asp_thresh', type=int, default=3)
parser.add_argument('--op_thresh', type=int, default=2)
parser.add_argument('--max_iter', type=int, default=3)

args = parser.parse_args()

# Download ABSA dependencies including spacy parser and glove embeddings 
from spacy.cli.download import download as spacy_download
from nlp_architect.utils.io import uncompress_file
from nlp_architect.models.absa import TRAIN_OUT

spacy_download('en')
GLOVE_ZIP = os.path.join(args.data_folder, 
                                 'hotels_data/glove.840B.300d.zip')
EMBEDDING_PATH = TRAIN_OUT / 'word_emb_unzipped' / 'glove.840B.300d.txt'


uncompress_file(GLOVE_ZIP, Path(EMBEDDING_PATH).parent)

hotels_train = os.path.join(args.data_folder, 
                                 'hotels_data/hotels_train.csv')

os.makedirs('outputs', exist_ok=True)

train = TrainSentiment(asp_thresh=args.asp_thresh,
                       op_thresh=args.op_thresh, 
                       max_iter=args.max_iter)

opinion_lex, aspect_lex = train.run(data=hotels_train,
                                    out_dir = './outputs')

print("Aspect Lexicon: {}\n".format(aspect_lex) + "=" * 40 + "\n")
print("Opinion Lexicon: {}".format(opinion_lex))


# Load the generated lexicons once to confirm they can be used for inference
inference = SentimentInference('/root/nlp-architect/cache/absa/train/lexicons/generated_aspect_lex.csv',
                               '/root/nlp-architect/cache/absa/train/lexicons/generated_opinion_lex_reranked.csv')

# Copy the lexicons to ./outputs; Azure ML uploads this folder with the run,
# so the files can be registered as models in step 3.4
shutil.copyfile('/root/nlp-architect/cache/absa/train/lexicons/generated_aspect_lex.csv', './outputs/hotel_aspect_lex.csv')
shutil.copyfile('/root/nlp-architect/cache/absa/train/lexicons/generated_opinion_lex_reranked.csv', './outputs/hotel_opinion_lex_reranked.csv')


## 3.3 Create and submit experiment

The following code cell creates a new experiment in the workspace. An experiment groups training runs and keeps their logs, metrics, and output files, including the trained models.

The cell also creates an estimator, which defines how the model is trained: the entry script, the compute target, environment variables, and the pip packages to install.

In [None]:
from azureml.core import Experiment
experiment_name = 'hotel-absa'
exp = Experiment(workspace=ws, name=experiment_name)

from azureml.train.estimator import Estimator
script_params = {
    '--data_folder': ds,
}

nlp_est = Estimator(source_directory='.',
                    script_params=script_params,
                    compute_target=cluster,
                    environment_variables={'NLP_ARCHITECT_BE': 'CPU'},
                    entry_script='train.py',
                    pip_packages=['git+https://github.com/NervanaSystems/nlp-architect.git@absa',
                                  'nlp-architect',
                                  'spacy==2.1.8'])
# Submit the run
run = exp.submit(nlp_est)
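The `script_params` dictionary reaches `train.py` as command-line arguments. The sketch below assumes the estimator passes each entry as a flag followed by its value, with the datastore replaced by its mount path on the compute target. It replays that with the same `argparse` definitions as `train.py`, so you can check locally which thresholds a run will use. `to_argv` is a hypothetical helper, not part of the Azure ML SDK.

```python
import argparse


def to_argv(script_params):
    """Flatten a script_params dict into an argv list: flag, value, flag, value, ..."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


# Same arguments that train.py declares
parser = argparse.ArgumentParser(description='ABSA Train')
parser.add_argument('--data_folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--asp_thresh', type=int, default=3)
parser.add_argument('--op_thresh', type=int, default=2)
parser.add_argument('--max_iter', type=int, default=3)
```

Any argument you leave out of `script_params` falls back to its default. For example, adding `'--max_iter': 5` changes only the iteration count.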

The following cell shows the run details, so you can monitor the progress of the run job.

In [None]:
# Show the run details
from azureml.widgets import RunDetails

RunDetails(run).show()

In [None]:
# Shows output of the run on stdout.
run.wait_for_completion(show_output=True)

## 3.4 Register Models

In [None]:
aspect_lex = run.register_model(model_name='hotel_aspect_lex', model_path='./outputs/hotel_aspect_lex.csv')
opinion_lex = run.register_model(model_name='hotel_opinion_lex', model_path='./outputs/hotel_opinion_lex_reranked.csv')

In [None]:
%pip install git+https://github.com/NervanaSystems/nlp-architect.git@absa --user

In [None]:
from nlp_architect.models.absa.inference.inference import SentimentInference

aspect_lex_path = "../models/hotel_aspect_lex.csv"
opinion_lex_path = "../models/hotel_opinion_lex_reranked.csv"
inference = SentimentInference(aspect_lex_path, opinion_lex_path)

docs = [ "Great", "The bathroom sink splashes water, need to be fixed. The bathtub/shower make loud noises. Other than that, it was a good stay." ]
sentiment_docs = []

for doc_raw in docs:
    sentiment_doc = inference.run(doc=doc_raw)
    if sentiment_doc is not None:
        print(sentiment_doc._sentences)
    else:
        print("Sentiment doc is None")
    sentiment_docs.append(sentiment_doc)
print(sentiment_docs[0])

## 3.5 Write Scoring Script


In [None]:
%%writefile score.py
from azureml.core.model import Model
from nlp_architect.models.absa.inference.inference import SentimentInference
from spacy.cli.download import download as spacy_download
import traceback
import json
# Inference schema for schema discovery
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType

def init():
    """
    Set up the ABSA model for Inference  
    """
    global SentInference
    spacy_download('en')
    aspect_lex = Model.get_model_path('hotel_aspect_lex')
    opinion_lex = Model.get_model_path('hotel_opinion_lex') 
    SentInference = SentimentInference(aspect_lex, opinion_lex)

# Sample input and output used by the inference schema decorators for schema discovery
standard_sample_input = {'text': 'a sample input record containing some text' }
standard_sample_output = {"sentiment": {"sentence": "This place makes false booking prices, when you get there, they say they do not have the reservation for that day.", 
                                        "terms": [{"text": "hotels", "type": "AS", "polarity": "POS", "score": 1.0, "start": 300, "len": 6}, 
                                                  {"text": "nice", "type": "OP", "polarity": "POS", "score": 1.0, "start": 295, "len": 4}]}}
@input_schema('raw_data', StandardPythonParameterType(standard_sample_input))
@output_schema(StandardPythonParameterType(standard_sample_output))    
def run(raw_data):
    try:
        input_txt = raw_data["text"]
        doc = SentInference.run(doc=input_txt)
        if doc is None:
            return None
        sentences = doc._sentences
        result = {"sentence": doc._doc_text}
        terms = []
        for sentence in sentences:
            for event in sentence._events:
                for x in event:
                    term = {"text": x._text, "type":x._type.value, "polarity": x._polarity.value, "score": x._score,"start": x._start,"len": x._len }
                    terms.append(term)
        result["terms"] = terms
        print("Success!")
        return {"sentiment": result}
    except Exception as e:
        # Return the error message to the client as a JSON object, like a successful result
        tb = traceback.format_exc()
        print("Failure!")
        print(tb)
        return {"error": str(e), "tb": tb}
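On success, the scoring script returns `{"sentiment": {"sentence": ..., "terms": [...]}}`, and the skill in section 4.0 reads the `sentiment` key. Here is a small stdlib-only check you can run locally against `standard_sample_output` or a live response from section 3.8 to confirm that shape. `validate_response` is a hypothetical helper. It checks keys only and does not check value types beyond `sentence`.

```python
TERM_KEYS = {"text", "type", "polarity", "score", "start", "len"}


def validate_response(payload):
    """Return a list of problems in a score.py response; an empty list means it looks valid."""
    if not isinstance(payload, dict) or "sentiment" not in payload:
        return ["top-level 'sentiment' key missing"]
    sentiment = payload["sentiment"]
    if not isinstance(sentiment, dict):
        return ["'sentiment' is not an object"]
    problems = []
    if not isinstance(sentiment.get("sentence"), str):
        problems.append("'sentence' is not a string")
    for i, term in enumerate(sentiment.get("terms", [])):
        missing = TERM_KEYS - term.keys()
        if missing:
            problems.append(f"term {i} missing {sorted(missing)}")
    return problems
```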

Define the software environment for inference. It lists the Python packages that both the models and the entry script need.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 
from azureml.core import Environment

conda = None
pip = ["azureml-defaults", "azureml-monitoring", 
       "git+https://github.com/NervanaSystems/nlp-architect.git@absa", 'nlp-architect', 'inference-schema',
       "spacy==2.0.18"]

conda_deps = CondaDependencies.create(conda_packages=None, pip_packages=pip)

myenv = Environment(name='myenv')
myenv.python.conda_dependencies = conda_deps

Use the environment and entry script to create an inference configuration.

In [None]:
from azureml.core.model import InferenceConfig
inf_config = InferenceConfig(entry_script='score.py', environment=myenv)

## 3.6 Create Inference Cluster

Creating a new cluster can take up to 30 minutes.

In [None]:
from azureml.core.compute import AksCompute, ComputeTarget

# Create a default provisioning configuration
prov_config = AksCompute.provisioning_configuration()
# Enable TLS (for HTTPS communication)
# Leaf domain label generates a name using the formula
#  "<leaf-domain-label>######.<azure-region>.cloudapp.azure.net"
#  where "######" is a random series of characters
prov_config.enable_ssl(leaf_domain_label="contoso")

# If there's an existing cluster, use it. Otherwise, create a new one.
cluster_name = 'amlskills'
try:
    aks_target = ComputeTarget(ws, cluster_name)
    print("Attaching to existing cluster")
except Exception:
    print("Creating new cluster")
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=cluster_name,
                                      provisioning_configuration=prov_config)
    # Wait for the create process to complete
    aks_target.wait_for_completion(show_output=True)

Define the deployment configuration. It sets the CPU cores and memory for each replica, the autoscaling behavior, authentication, and request timeouts.

In [None]:
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice, Webservice

# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=3,
                                                autoscale_refresh_seconds=10,
                                                autoscale_target_utilization=70,
                                                auth_enabled=True,
                                                cpu_cores=1, memory_gb=2,
                                                scoring_timeout_ms=5000,
                                                replica_max_concurrent_requests=2,
                                                max_request_wait_time=5000)
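With `replica_max_concurrent_requests=2` and `autoscale_target_utilization=70`, you can estimate ahead of time how many replicas a given load needs. The Azure Machine Learning documentation on AKS autoscaling describes the estimate in two steps. First, concurrent requests equal the target requests per second times the request processing time, divided by the target utilization. Second, the replica count is that number divided by the maximum concurrent requests per replica, rounded up. The sketch below does that arithmetic and clamps the result to `autoscale_min_replicas` and `autoscale_max_replicas`. The load figures you pass in are your own assumptions, and `estimate_replicas` is a hypothetical helper.

```python
from math import ceil


def estimate_replicas(target_rps, request_seconds, target_utilization_pct,
                      max_concurrent_per_replica, min_replicas, max_replicas):
    """Rough replica count for a target load, clamped to the autoscale bounds."""
    # Concurrent requests needed to hit the target rate at the target utilization
    concurrent = target_rps * request_seconds / (target_utilization_pct / 100)
    replicas = ceil(concurrent / max_concurrent_per_replica)
    return max(min_replicas, min(max_replicas, replicas))
```

For example, `estimate_replicas(2, 1.0, 70, 2, 1, 3)` estimates two replicas for two one-second requests per second under the configuration above. At ten requests per second, the estimate hits the cap of three replicas.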

## 3.7 Deploy Webservice

The following code deploys the models as a web service. The service uses the registered models, inference config, and deployment config to deploy to Azure Kubernetes Service.

In [None]:
from azureml.core.webservice import AksWebservice, Webservice

# Reference the registered models by name and load from the workspace
c_aspect_lex = Model(ws, 'hotel_aspect_lex')
c_opinion_lex = Model(ws, 'hotel_opinion_lex') 
service_name = "hotel-absa-v2"

aks_service = Model.deploy(workspace=ws,
                           name=service_name,
                           models=[c_aspect_lex, c_opinion_lex],
                           inference_config=inf_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target,
                           overwrite=True)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
print(aks_service.get_logs())
primary, secondary = aks_service.get_keys()

## 3.8 Test Webservice

In [None]:
import requests
import json

primary, secondary = aks_service.get_keys()

# Score a sample review; json.dumps handles quoting and escaping of the text
input_data = json.dumps({"raw_data": {"text": "This is a nice place for a relaxing evening out with friends. The owners seem pretty nice, too. I've been there a few times including last night. Recommend."}})

# AKS deployments with auth enabled need the service key in the Authorization header.
# Use a separate variable so the Cognitive Search `headers` used in section 4.0 stay intact.
score_headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + primary}

resp = requests.post(aks_service.scoring_uri, input_data, headers=score_headers)
print(resp.text)
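Each term in the response has a `type` (in the sample schema, `AS` marks an aspect and `OP` an opinion word) and a `polarity`. To read the result at a glance, you can group the aspect terms by polarity. This is a minimal sketch that assumes the response shape defined by `standard_sample_output` in `score.py`. Pass it `resp.json()`. `aspects_by_polarity` is a hypothetical helper.

```python
from collections import defaultdict


def aspects_by_polarity(response):
    """Group aspect terms (type 'AS') in a score.py response by their polarity."""
    grouped = defaultdict(list)
    # Tolerate a None response or an error payload without a 'sentiment' key
    terms = ((response or {}).get("sentiment") or {}).get("terms", [])
    for term in terms:
        if term.get("type") == "AS":
            grouped[term.get("polarity")].append(term.get("text"))
    return dict(grouped)
```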

## Endpoint Deployed!

Next, add the endpoint as an Azure ML skill. Either follow the instructions in [the tutorial](https://docs.microsoft.com/azure/search/cognitive-search-tutorial-aml-custom-skill) to add the skill in the portal, or continue through the notebook for the API experience.

## 4.0 Add AML Skill - Update Skillset

In [None]:
aks_service.get_logs()
#print(aks_service.scoring_uri)
#print(primary)

In [None]:
skillset_def = {
    'name': f'{skillset_name}',
    'description': 'Skillset to enrich hotel reviews with aspect based sentiment',
    'skills': [
        {
            '@odata.type': '#Microsoft.Skills.Custom.AmlSkill',
            'name': 'AspectSentiment',
            'description': 'AspectSentiment',
            'context': '/document/reviews_text',
            'uri': f'{aks_service.scoring_uri}',
            'key': f'{primary}',
            'resourceId': None,
            'region': None,
            'timeout': 'PT30S',
            'degreeOfParallelism': 0,
            'inputs': [
              {
                'name': 'raw_data',
                'sourceContext': '/document/reviews_text',
                'inputs': [
                  {
                    'name': 'text',
                    'source': '/document/reviews_text/translated_text'
                  }
                ]
              }
            ],
            'outputs': [
              {
                'name': 'sentiment',
                'targetName': 'absa'
              }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Text.KeyPhraseExtractionSkill',
            'name': '#1',
            'description': None,
            'context': '/document/reviews_text',
            'defaultLanguageCode': 'en',
            'maxKeyPhraseCount': None,
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'languageCode',
                    'source': '/document/language',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'keyPhrases',
                    'targetName': 'keyphrases'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Text.LanguageDetectionSkill',
            'name': '#2',
            'description': None,
            'context': '/document',
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'languageCode',
                    'targetName': 'language'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Text.TranslationSkill',
            'name': '#3',
            'description': None,
            'context': '/document/reviews_text',
            'defaultFromLanguageCode': None,
            'defaultToLanguageCode': 'en',
            'suggestedFrom': 'en',
            'inputs': [
                {
                    'name': 'text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'translatedText',
                    'targetName': 'translated_text'
                }
            ]
        },
        {
            '@odata.type': '#Microsoft.Skills.Util.ShaperSkill',
            'name': '#4',
            'description': None,
            'context': '/document',
            'inputs': [
                {
                    'name': 'address',
                    'source': '/document/address',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'categories',
                    'source': '/document/categories',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'city',
                    'source': '/document/city',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'country',
                    'source': '/document/country',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'latitude',
                    'source': '/document/latitude',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'longitude',
                    'source': '/document/longitude',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'name',
                    'source': '/document/name',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'postalCode',
                    'source': '/document/postalCode',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'province',
                    'source': '/document/province',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_date',
                    'source': '/document/reviews_date',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_dateAdded',
                    'source': '/document/reviews_dateAdded',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_rating',
                    'source': '/document/reviews_rating',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_text',
                    'source': '/document/reviews_text',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_title',
                    'source': '/document/reviews_title',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'reviews_username',
                    'source': '/document/reviews_username',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'AzureSearch_DocumentKey',
                    'source': '/document/AzureSearch_DocumentKey',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_content_type',
                    'source': '/document/metadata_storage_content_type',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_size',
                    'source': '/document/metadata_storage_size',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_last_modified',
                    'source': '/document/metadata_storage_last_modified',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_content_md5',
                    'source': '/document/metadata_storage_content_md5',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_name',
                    'source': '/document/metadata_storage_name',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_path',
                    'source': '/document/metadata_storage_path',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'metadata_storage_file_extension',
                    'source': '/document/metadata_storage_file_extension',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'keyPhrases',
                    'source': '/document/reviews_text/keyphrases/*',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'languageCode',
                    'source': '/document/language',
                    'sourceContext': None,
                    'inputs': []
                },
                {
                    'name': 'translatedText',
                    'source': '/document/reviews_text/translated_text',
                    'sourceContext': None,
                    'inputs': []
                }
            ],
            'outputs': [
                {
                    'name': 'output',
                    'targetName': 'objectprojection'
                }
            ]
        }
    ],
    'cognitiveServices': {
        '@odata.type': '#Microsoft.Azure.Search.CognitiveServicesByKey',
        'description': f'/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{cog_svcs_acct}',
        'key': f'{cog_svcs_key}'
    },
    'knowledgeStore': {
        'storageConnectionString': f'{know_store_cache}',
        'projections': [
            {
                'tables': [],
                'objects': [
                    {
                        'storageContainer':  f'{datasource_container}-enriched',
                        'referenceKeyName': None,
                        'generatedKeyName': None,
                        'source': '/document/objectprojection',
                        'sourceContext': None,
                        'inputs': []
                    }
                ],
                'files': []
            }
        ]
    }
}


r = requests.put(construct_Url(search_service, "skillsets", skillset_name, None, api_version), data=json.dumps(skillset_def),  headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
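As we read the skill definition, each record the indexer sends to the endpoint is a JSON object keyed by input name. The nested `raw_data` input becomes `{"raw_data": {"text": <translated_text>}}`, which matches what `run(raw_data)` in `score.py` expects. The `sentiment` key of the response is then stored at `/document/reviews_text/absa`, combining the skill `context` with `targetName`. That explains why the indexer below maps `/document/reviews_text/absa/terms`. Below is a simplified model of that shaping. It treats the enriched document as a flat `{path: value}` dict and ignores arrays and `/*` paths. It illustrates the mapping only and is not how the service is implemented.

```python
def build_request_body(inputs, enriched):
    """Resolve a skill 'inputs' list against a flat {path: value} document (simplified)."""
    body = {}
    for inp in inputs:
        if inp.get("inputs"):
            # Nested inputs (with a sourceContext) become a nested JSON object
            body[inp["name"]] = build_request_body(inp["inputs"], enriched)
        else:
            body[inp["name"]] = enriched.get(inp["source"])
    return body


def store_outputs(outputs, response, context, enriched):
    """Write each named response key to '<context>/<targetName>' in the document."""
    for out in outputs:
        enriched[f"{context}/{out['targetName']}"] = response.get(out["name"])
    return enriched
```

You can pass `skillset_def['skills'][0]['inputs']` and `skillset_def['skills'][0]['outputs']` from the cell above to these functions to see the shapes for your own definition.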

In [None]:
indexername = f'{datasource_container}-idxr'
indexer_def = {
    "name": indexername,
    "description": "Indexer to enrich hotel reviews",
    "dataSourceName": f'{datasource_container}-ds',
    "skillsetName": skillset_name,
    "targetIndexName": f'{datasource_container}-idx',
    "disabled": None,
    "schedule": {
        "interval": "PT2H",
        "startTime": "0001-01-01T00:00:00Z"
    },
    "parameters": {
        "batchSize": None,
        "maxFailedItems": 0,
        "maxFailedItemsPerBatch": 0,
        "base64EncodeKeys": None,
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "delimitedText",
            "firstLineContainsHeaders": True,
            "delimitedTextDelimiter": ",",
            "delimitedTextHeaders": ""
        }
    },
    "fieldMappings": [
        {
            "sourceFieldName": "AzureSearch_DocumentKey",
            "targetFieldName": "AzureSearch_DocumentKey",
            "mappingFunction": {
                "name": "base64Encode",
                "parameters": None
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/reviews_text/absa/terms",
            "targetFieldName": "aspect_sentiment",
            "mappingFunction": None
        },
        {
            "sourceFieldName": "/document/reviews_text/keyphrases",
            "targetFieldName": "keyphrases",
            "mappingFunction": None
        },
        {
            "sourceFieldName": "/document/language",
            "targetFieldName": "language",
            "mappingFunction": None
        },
        {
            "sourceFieldName": "/document/reviews_text/translated_text",
            "targetFieldName": "translated_text",
            "mappingFunction": None
        }
    ],
    "cache": {
        "enableReprocessing": True,
        "storageConnectionString": f'{know_store_cache}'
    }
}
r = requests.put(construct_Url(search_service, "indexers", indexername, None, api_version), data=json.dumps(indexer_def), headers=headers)
print(r)
res = r.json()
print(json.dumps(res, indent=2))
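The indexer's `schedule.interval` is an ISO 8601 duration. `"PT2H"` runs the indexer every two hours. If you prefer to configure the interval as a Python `timedelta`, the sketch below converts it to that format. It assumes the documented indexer schedule bounds of 5 minutes to 24 hours. `to_indexer_interval` is a hypothetical helper, not part of any Azure SDK.

```python
from datetime import timedelta

ONE_MINUTE = timedelta(minutes=1)


def to_indexer_interval(delta: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration such as 'PT2H' or 'PT1H30M'.

    Assumption: indexer intervals must be whole minutes between 5 minutes and 24 hours.
    """
    if delta % ONE_MINUTE:
        raise ValueError("interval must be a whole number of minutes")
    total_minutes = delta // ONE_MINUTE  # timedelta // timedelta -> int
    if not 5 <= total_minutes <= 24 * 60:
        raise ValueError("interval must be between 5 minutes and 24 hours")
    hours, minutes = divmod(total_minutes, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if minutes:
        duration += f"{minutes}M"
    return duration
```

For example, `indexer_def["schedule"]["interval"] = to_indexer_interval(timedelta(hours=2))` produces the same `"PT2H"` value used above. If the value is out of range, the function raises an error locally instead of waiting for the service to reject the request.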

In [None]:
r = requests.post(construct_Url(search_service, "indexers", indexername, "run", api_version), headers=headers)
print(r)
# Run Indexer responds with 202 Accepted and an empty body, so there is no JSON to parse here.
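Run Indexer only queues an execution, so the cell above returns before any reviews are enriched. To wait for the run to finish, you can poll the Get Indexer Status endpoint and inspect `lastResult.status`, which reports `inProgress` while an execution is running. The following is a minimal sketch. `wait_for_indexer` is a hypothetical helper, and `fetch_status` is any callable that returns the parsed status JSON. This design keeps the polling logic independent of the HTTP code.

```python
import time
from typing import Callable


def wait_for_indexer(fetch_status: Callable[[], dict],
                     poll_seconds: float = 10.0,
                     timeout_seconds: float = 600.0) -> dict:
    """Poll until the indexer's lastResult is present and no longer 'inProgress'.

    Returns the final lastResult dict, or raises TimeoutError when the deadline passes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # lastResult is null until the indexer has run at least once.
        last = fetch_status().get("lastResult")
        if last is not None and last.get("status") != "inProgress":
            return last
        if time.monotonic() >= deadline:
            raise TimeoutError("indexer did not finish before the timeout")
        time.sleep(poll_seconds)
```

In this notebook, `fetch_status` could be `lambda: requests.get(construct_Url(search_service, "indexers", indexername, "status", api_version), headers=headers).json()`. Passing `"status"` as the fourth argument of `construct_Url` is an assumption based on how `"run"` is used above. Immediately after the run request, `lastResult` may still describe the previous execution, so also compare its `startTime` if you need to be certain you are looking at the new run.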