Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Prerequisites and Environment Setup


Run these Jupyter notebooks from an Azure ML Compute Instance or Notebook VM.




## 1.0 Install packages
This example is based on the NER model from [NLP recipes](https://github.com/microsoft/nlp-recipes/blob/master/examples/named_entity_recognition/ner_wikigold_transformer.ipynb). Install the following package, which contains the utility functions and classes from the NLP Best Practices repo that are used for data preprocessing, model training, model scoring, and model evaluation.

In [None]:
!pip install ./nlp-recipes-utils/utils_nlp-2.0.0-py3-none-any.whl
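
After the install, it can help to confirm that the package is visible to the notebook kernel. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`) and assuming the distribution is named `utils_nlp`, as the wheel filename suggests. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment
        return None
```

Calling `installed_version("utils_nlp")` should return a version string if the wheel installed into the active kernel, and `None` otherwise. If it returns `None`, restarting the kernel after the install is worth trying first.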

## 2.0 Initialize workspace

To create or access an Azure ML Workspace, you need to import the AML library and provide the following information:
* A name for your workspace
* Your subscription id
* The resource group name

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace/?WT.mc_id=bert-notebook-abornst) object from the existing ML workspace you created during resource deployment.

In [None]:
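from azureml.core import Workspace

try:
    # Load the workspace details from the local config.json file
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, sep='\t')
    print('Library configuration succeeded')
except Exception as e:
    # Show the underlying error instead of hiding it
    print('Workspace not found:', e)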
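
`Workspace.from_config()` reads the three values listed above from a `config.json` file, which it looks for in the current directory or in a `.azureml` subfolder. On a Compute Instance this file is normally already present. If it is missing, the sketch below shows one way to create it. The helper name `write_workspace_config` is hypothetical, and the values you pass must come from your own deployment.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           config_dir=".azureml"):
    """Write a config.json with the keys that Workspace.from_config() reads."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(config_dir) / "config.json"
    # Create the .azureml folder if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path
```

Call the helper with your subscription id, resource group, and workspace name, then rerun the cell above.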

## 3.0 Prepare Data

### 3.1 Get Default Datastore

In [None]:
ds = ws.get_default_datastore()

### 3.2 Download training data

We will use the labeled dataset we generated earlier, which is in CoNLL format. Download this dataset (ner_dataset.txt) from the Azure Blob Storage container that was referenced while generating the labeled dataset (container name: labeled-data-df).

Note: Make sure to remove the number from the file name.
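
Before uploading, a quick sanity check of the downloaded file can catch formatting problems early. The following is a minimal sketch, assuming the common CoNLL layout of one whitespace-separated token and label per line, with a blank line between sentences. The exact column layout of your export may differ, and `read_conll` is a hypothetical helper, not part of the NLP recipes utilities.

```python
def read_conll(lines):
    """Group 'token label' lines into (tokens, labels) pairs, one per sentence."""
    sentences, tokens, labels = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"Expected a token and a label, got: {line!r}")
        # Assumption: the token is the first column and the label is the last
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:
        # Keep the last sentence when the file has no trailing blank line
        sentences.append((tokens, labels))
    return sentences
```

Passing an open file handle for the downloaded ner_dataset.txt to `read_conll` returns the sentences it contains, so checking the sentence count and the set of labels is a quick way to confirm the export looks as expected.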


### 3.3 Create a folder to upload data
Create a folder called entity-annotated-corpus and upload the downloaded training dataset (ner_dataset.txt) from the previous step into it.

In [None]:
import os

os.makedirs('entity-annotated-corpus', exist_ok=True)
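
Placing the downloaded file in this folder under the name ner_dataset.txt can also be scripted. The following is a minimal sketch, assuming the file was downloaded to a local path that you supply. `stage_training_file` and its parameters are hypothetical names and are not part of the Azure ML SDK.

```python
import shutil
from pathlib import Path


def stage_training_file(downloaded_path, target_dir="entity-annotated-corpus",
                        target_name="ner_dataset.txt"):
    """Copy the downloaded file into target_dir under the expected file name."""
    source = Path(downloaded_path)
    if not source.is_file():
        raise FileNotFoundError(f"Downloaded file not found: {source}")
    destination = Path(target_dir) / target_name
    # Create the target folder if it does not exist yet
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination
```

Because the copy always uses the target name ner_dataset.txt, this also takes care of removing the number from the downloaded file name.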

### 3.4 Upload the data to ML workspace datastore


In [None]:
ds.upload_files(["entity-annotated-corpus/ner_dataset.txt"], relative_root='.')

### 3.5 Create and register a Dataset from the uploaded data

 

In [None]:
from azureml.core import Dataset

# Point at the folder uploaded to the default datastore
datastore_paths = [(ds, 'entity-annotated-corpus')]
ner_ds = Dataset.File.from_files(path=datastore_paths)

# Register the dataset so it can be retrieved by name from the workspace
ner_ds = ner_ds.register(workspace=ws,
                         name='ner_ds_file',
                         description='ner training data')

## Next Step

Next, train the model by following the steps in 02_Train_Model.ipynb.