What we have here is a little off from what you might think of as the typical use case for custom named entity recognition, but it is a great fit.
If we open these up, what we have is a bunch of loan agreements, saved here as text files. Maybe they were exported as text, or the text was saved to a file.
They could be PDF files too; that just adds some processing you have to do. Each loan agreement has sections like parties, amounts, interest, payment, terms, and prepayment.
All of these sections are going to be relatively similar from one document to the next, so the model can learn that, for example, the interest value is listed under the interest section. If we open another one, the parties, the amounts, and the terms change, but we are still largely looking at the same form. As a business, we might have thousands of loans that we've given out, all using that same template.
So now that we're digitizing these, we need a way to extract the parties, the terms, the addresses, and so on from each one.

Note:

The data folder contains the text documents, along with a JSON file that holds the labels for each one.<br/>
The JSON has a `storageInputContainerName` field. Its value must match the name of the container in your Azure Storage account.
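
The container-name requirement in the note can be handled in code before the import. This is a minimal sketch, assuming the label file keeps `storageInputContainerName` under a top-level `metadata` object; the helper name `set_container_name` and the container name `loan-agreements` are hypothetical.

```python
import copy

def set_container_name(labels: dict, container_name: str) -> dict:
    """Return a copy of the label data whose container name is container_name.

    Assumption: the field lives at labels['metadata']['storageInputContainerName'].
    """
    updated = copy.deepcopy(labels)  # leave the caller's dict untouched
    metadata = updated.setdefault('metadata', {})
    metadata['storageInputContainerName'] = container_name
    return updated

# Tiny in-memory stand-in for the real label file (hypothetical values).
example_labels = {'metadata': {'projectName': 'LoanAgreement',
                               'storageInputContainerName': 'old-name'}}
fixed = set_container_name(example_labels, 'loan-agreements')
print(fixed['metadata']['storageInputContainerName'])
```

In the notebook, the same helper could be applied to the dict loaded from the label file before it is posted to the import endpoint.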

### Prerequisites

- Make sure you have created a Language resource and a storage account
- [Setup instructions are here](https://learn.microsoft.com/en-us/azure/ai-services/language-service/custom-named-entity-recognition/how-to/create-project?tabs=portal%2Clanguage-studio#using-a-pre-existing-language-resource)

In [1]:
# We will call the REST API directly for this rather than using the SDK

In [2]:
import requests, json, os, time
from dotenv import load_dotenv

In [3]:
load_dotenv()

True

In [6]:
key = os.environ['S0_LANGUAGE_KEY']
base_url = os.environ['S0_LANGUAGE_ENDPOINT']
projectName = 'LoanAgreement'
API_version = '2022-05-01'

#### Steps

- Import the Data (with labels)
- Train the model
- Deploy the model
- Make Predictions

In [10]:
# Following the quickstart: import the labeled data first

In [7]:
endpoint = f'{base_url}/language/authoring/analyze-text/projects/{projectName}/:import?api-version={API_version}'

In [9]:
endpoint

'https://ai-language-qna.cognitiveservices.azure.com//language/authoring/analyze-text/projects/LoanAgreement/:import?api-version=2022-05-01'

In [11]:
headers = {
    'Ocp-Apim-Subscription-Key': key,
    'Content-Type': 'application/json'
    }

In [23]:
#read the label file
label_file = '04a. loanAgreementsLabels.json'
with open(label_file, 'r') as f:
    data = json.load(f)

In [24]:
response = requests.post(endpoint, headers=headers, json=data)

# If the import request fails, the response has no 'operation-location' header.
# Checking that the key is present first gives a clear error instead of a KeyError.
if 'operation-location' in response.headers:
    operation_location = response.headers['operation-location']
else:
    raise RuntimeError(f'Import request failed ({response.status_code}): {response.text}')

In [25]:
response

<Response [202]>

In [26]:
operation_location

'https://ai-language-qna.cognitiveservices.azure.com/language/authoring/analyze-text/projects/LoanAgreement/import/jobs/3a7756cc-249d-4236-9246-a4da5e3d7cf4_638874432000000000?api-version=2022-05-01'

In [28]:
# Check the status of a long-running job (here, the project import and its labels) by querying its operation-location URL
def check_status(operation_location):
    status_response = requests.get(operation_location, headers=headers)
    return status_response.json()

In [27]:
check_status(operation_location)

{'jobId': '3a7756cc-249d-4236-9246-a4da5e3d7cf4_638874432000000000',
 'createdDateTime': '2025-07-07T19:47:46Z',
 'lastUpdatedDateTime': '2025-07-07T19:47:48Z',
 'expirationDateTime': '2025-07-14T19:47:46Z',
 'status': 'succeeded'}
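
A job does not always report `succeeded` on the first check, so the status call is usually repeated. The sketch below polls until the job finishes. It assumes that `succeeded`, `failed`, and `cancelled` are the statuses after which nothing changes; the helper name `wait_for_job` and its parameters are hypothetical.

```python
import time

# Assumption: these statuses mean the job will not change any further.
TERMINAL_STATUSES = {'succeeded', 'failed', 'cancelled'}

def wait_for_job(get_status, poll_seconds=5, timeout_seconds=600):
    """Call get_status() until the job reaches a terminal status or time runs out.

    get_status is any zero-argument callable that returns the job JSON as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = get_status()
        if job.get('status') in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{job.get('status')}' after {timeout_seconds}s")
        time.sleep(poll_seconds)

# Demo with canned responses instead of real HTTP calls.
canned = iter([{'status': 'notStarted'}, {'status': 'running'}, {'status': 'succeeded'}])
print(wait_for_job(lambda: next(canned), poll_seconds=0))
```

Against the real service, the call would look like `wait_for_job(lambda: check_status(operation_location))`.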

Check Language Studio to confirm that the project was created.

### Let's Train The Model

In [None]:
def train_model(model_name):
    # Use the :train endpoint rather than the :import one
    endpoint = f'{base_url}/language/authoring/analyze-text/projects/{projectName}/:train?api-version={API_version}'
    # Same headers as the import request
    headers = {
        'Ocp-Apim-Subscription-Key': key,
        'Content-Type': 'application/json'
    }
    # Request body, following the quickstart
    body = {
        "modelLabel": model_name,
        # The training config version happens to share its value with API_version here
        "trainingConfigVersion": API_version,
        "evaluationOptions": {
            # Train on 80% of the labeled documents and evaluate on the remaining 20%
            "kind": "percentage",
            "trainingSplitPercentage": 80,
            "testingSplitPercentage": 20
            # There are more possible parameters here, but we will use the defaults
        }
    }

    response = requests.post(endpoint, headers=headers, json=body)
    # print(response.text)
    # A failed request has no 'operation-location' header, so fail with a clear message
    if 'operation-location' not in response.headers:
        raise RuntimeError(f'Training request failed ({response.status_code}): {response.text}')
    return response.headers['operation-location']

In [None]:
training_model = train_model('LoanAgreementModel')

In [None]:
# This is the training job ID. It can be found in the response printed by train_model,
# and it is also the last path segment of the operation-location URL that train_model returns.
training_location = '4a4b92c7-91f1-456e-84d8-febb7fe6d39b_638874432000000000'
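
Rather than copying the job ID by hand, it can be taken from the operation-location URL. A minimal sketch, assuming the job ID is always the last path segment, as it is in the import job URL shown earlier; the helper name `job_id_from_operation_location` is hypothetical.

```python
from urllib.parse import urlparse

def job_id_from_operation_location(operation_location: str) -> str:
    """Return the last path segment of the URL, which holds the job ID."""
    path = urlparse(operation_location).path  # drops the ?api-version=... query
    return path.rstrip('/').rsplit('/', 1)[-1]

# Example using the import job URL shown earlier in this notebook.
example_url = (
    'https://ai-language-qna.cognitiveservices.azure.com/language/authoring/'
    'analyze-text/projects/LoanAgreement/import/jobs/'
    '3a7756cc-249d-4236-9246-a4da5e3d7cf4_638874432000000000?api-version=2022-05-01'
)
print(job_id_from_operation_location(example_url))
```

With this in place, `training_location = job_id_from_operation_location(training_model)` would avoid the hard-coded value.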

In [37]:
def check_training_status(training_location):
    endpoint = f'{base_url}/language/authoring/analyze-text/projects/{projectName}/train/jobs/{training_location}?api-version={API_version}'
    headers = {'Ocp-Apim-Subscription-Key': key}
    status_response = requests.get(endpoint, headers=headers)
    return status_response.json()

In [40]:
check_training_status(training_location)

{'result': {'modelLabel': 'LoanAgreementModel',
  'trainingConfigVersion': '2022-05-01',
  'trainingStatus': {'percentComplete': 0, 'status': 'notStarted'},
  'evaluationStatus': {'percentComplete': 0, 'status': 'notStarted'}},
 'jobId': '4a4b92c7-91f1-456e-84d8-febb7fe6d39b_638874432000000000',
 'createdDateTime': '2025-07-07T20:01:20Z',
 'lastUpdatedDateTime': '2025-07-07T20:10:37Z',
 'expirationDateTime': '2025-07-14T20:01:20Z',
 'status': 'running',
   'message': 'Entity `Date` is tagged in `20` training dataset examples, it is recommended to have `200` tags for better model quality.'},
   'message': 'Entity `BorrowerName` is tagged in `20` training dataset examples, it is recommended to have `200` tags for better model quality.'},
   'message': 'Entity `BorrowerAddress` is tagged in `20` training dataset examples, it is recommended to have `200` tags for better model quality.'},
   'message': 'Entity `BorrowerCity` is tagged in `20` training dataset examples, it is recommended to 
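
The warnings in this output are about how many tags each entity has. One way to see those counts before training is to tally the labels in the label file. This is a sketch under an assumption about the file layout: labels are expected at `assets.documents[].entities[].labels[].category`. The helper name `count_labels` and the file names in the sample are hypothetical.

```python
from collections import Counter

def count_labels(labels: dict) -> Counter:
    """Count labeled spans per entity category across all documents."""
    counts = Counter()
    for document in labels.get('assets', {}).get('documents', []):
        for region in document.get('entities', []):
            for label in region.get('labels', []):
                counts[label['category']] += 1
    return counts

# Tiny in-memory stand-in for the real label file (hypothetical values).
example = {'assets': {'documents': [
    {'location': 'loan1.txt', 'entities': [{'labels': [
        {'category': 'Date', 'offset': 0, 'length': 10},
        {'category': 'BorrowerName', 'offset': 20, 'length': 8},
    ]}]},
    {'location': 'loan2.txt', 'entities': [{'labels': [
        {'category': 'Date', 'offset': 0, 'length': 10},
    ]}]},
]}}
print(count_labels(example))
```

Entities whose count falls well below the recommended number in the warning are the ones that would benefit most from more labeled examples.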

In [None]:
# Deploy the model
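
The deployment step could be sketched as follows. It assumes the authoring API deploys a model with a PUT request to a `deployments/{name}` path under the project, with the trained model's label in the body, as the quickstart describes. The helper name `build_deployment_request`, the example endpoint, and the deployment name `LoanAgreementDeployment` are hypothetical.

```python
def build_deployment_request(base_url, project_name, deployment_name, model_label, api_version):
    """Return (url, body) for a PUT request that deploys a trained model."""
    url = (
        f"{base_url.rstrip('/')}/language/authoring/analyze-text/projects/"
        f"{project_name}/deployments/{deployment_name}?api-version={api_version}"
    )
    body = {'trainedModelLabel': model_label}
    return url, body

url, body = build_deployment_request(
    'https://example.cognitiveservices.azure.com/',  # placeholder endpoint
    'LoanAgreement', 'LoanAgreementDeployment', 'LoanAgreementModel', '2022-05-01')
print(url)
print(body)

# To send it from the notebook (uses the requests import and headers defined above):
# response = requests.put(url, headers=headers, json=body)
# deployment_location = response.headers['operation-location']
```

Deployment is also a long-running job, so the returned operation-location URL can be passed to `check_status` in the same way as for the import.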

In [None]:
#Test the model
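
Testing the deployed model means submitting text to the runtime analyze-text jobs endpoint. The sketch below only builds the request. It assumes a `CustomEntityRecognition` task that names the project and deployment, as the quickstart describes; the helper name `build_prediction_request`, the deployment name, the display and task names, and the sample sentence are hypothetical.

```python
def build_prediction_request(base_url, project_name, deployment_name, texts, api_version):
    """Return (url, body) for a POST request that runs custom NER on texts."""
    url = f"{base_url.rstrip('/')}/language/analyze-text/jobs?api-version={api_version}"
    body = {
        'displayName': 'Extract loan agreement entities',
        'analysisInput': {
            'documents': [
                {'id': str(i), 'language': 'en', 'text': text}
                for i, text in enumerate(texts, start=1)
            ]
        },
        'tasks': [{
            'kind': 'CustomEntityRecognition',
            'taskName': 'Entity Recognition',
            'parameters': {'projectName': project_name,
                           'deploymentName': deployment_name},
        }],
    }
    return url, body

pred_url, pred_body = build_prediction_request(
    'https://example.cognitiveservices.azure.com/',  # placeholder endpoint
    'LoanAgreement', 'LoanAgreementDeployment',
    ['This Loan Agreement is made between the Lender and the Borrower.'],  # made-up text
    '2022-05-01')
print(pred_url)

# To send it from the notebook (uses the requests import and headers defined above):
# response = requests.post(pred_url, headers=headers, json=pred_body)
# results = check_status(response.headers['operation-location'])
```

The prediction job is asynchronous as well, so the extracted entities only appear in the status response once the job has finished.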