# World Bank Financial Survey Q&A Model Project 

This project develops a NLP powererd question-answering system that is trained on World Bank Survey Data containing financial information gathered from various federal banks across the globe. This notebook walks the user through gathering/processing the data and training/deploying the final model. 

### Dataset Description
The World Bank survey dataset comprises of structured financial questions sent to financial instituitions worldwide. The dataset includes multi-dimensional survey responses, hierarchial question structures, and financial metrics. For this project, we will use the questions that have long-form textual answers to train an NLP model, rather than using binary response questions.

### Project Architecture

##### Phase 1 - Data Processing 

- Transform unstructured survey data into structurerd NLP training pairs
    - Parse all relevant sheets from excel file
    - Properly handle hierarchical question structures to ensure each question answer pair is standalone
- Indentify and Flag PII using a ML model in dataset

##### Phase 2 - Model Development & Fine Tuning

- Fine-tune a Google FLAN-T5-Base NLP model 
- Optimize the model's performance on this specific World Bank survey domain
- Evaluate model performance using validation and test sample sets

##### Phase 3 - Deployement

- Deploy fine-tuned model to production environment on Azure/HuggingFace that allows for web/API interaction

##### Future Steps (if time allows):
- train FLAN-T5-Base instead of FLAN-T5-Small to improve generalization/accuracy
- add some sort of sentiment analysis to clasify questions/answers (financial questions, admin questions, etc)
- get feedback on model performance (answer quality/hallucinations/knowledge gaps)  
- add additional survey questions to knowledge base 

##### Model Notes:
- The model is trained on a balanced subset that consists of all the long form questions in the dataset and a random 30% selection of classification/numerical questions.
    - This was done due to time constraints. Training on the entire dataset would have taken multiple days
    - Additionally, the dataset is severely imbalanced. I found that training with a high number of classification samples meant the model would learn to answer yes or 0 for each one
- Additionally, the model being trained is FLAN-T5-SMALL rather than FLAN-T5-BASE
    - SMALL is significantly smaller than BASE (60M vs 220M parameters)
    - The smaller model was chosen due to hardware limitations and time constraints
    - This means that this model's accuracy and ability to generalize isn't as good as it could be
    - In future versions, given enough time, I would have liked to implement the BASE model instead


In [None]:
# download data from World Bank Database
import requests

url = "https://datacatalogfiles.worldbank.org/ddh-published/0038632/2/DR0047737/2021_04_26_brss-public-release.xlsx"
response = requests.get(url)

with open("worldbank_data.xlsx", "wb") as f:
    f.write(response.content)

Now that data is downloaded, it needs to be converted from an xlsx file with row column format to something that works for t5 training (question:answer pairs).  

In [2]:
## read and process data
import pandas as pd
import re

# remove extra unnecessary information from question
# for example, "Select all that apply"
def simplify_question(qText):
    if pd.isna(qText):
        return ""
    
    text = str(qText).strip()
    
    # split on common instruction starters and take first part
    for splitter in [" Please ", " If ", " Include ", " Specify ", " Describe ", " List "]:
        if splitter in text:
            text = text.split(splitter)[0]
            break
    
    # if there's a question mark, take up to first one
    if "?" in text:
        text = text.split("?")[0] + "?"
    
    return text.strip()

# loads all sheets at once
allSheets = pd.ExcelFile("worldbank_data.xlsx")

# store samples
samples = []

# process all sheets except first 2 and last 1
process = allSheets.sheet_names[2:-1]

# read first sheet and extract countries
dfFirst = pd.read_excel(allSheets, sheet_name=process[0], header=None)
countries = [str(c) for c in dfFirst.iloc[0, 2:].values if not pd.isna(c)]

for sheet in process:
    # read current sheet
    df = pd.read_excel(allSheets, sheet_name=sheet, header=None)
    
    # create parent and base vars
    parent = None
    currBase = None
    
    # iterate through every row except header
    # get question index and question text
    for idx, row in df.iloc[1:].iterrows():
        qIndex = row[0]
        qText = row[1]
        
        # if the question index is null but text does exist 
        # then the question is a parent question
        # assign parent question and then clear prev base and move onto next row
        if pd.isna(qIndex) and not pd.isna(qText):
            parent = simplify_question(qText)  # ← Simplify parent too
            currBase = None
            continue
        
        # regex starts with Q and captures groups delimited by _
        # group 1 is the main question number
        # group 2 is sub-question number
        # group 3 is for multi-part questions with extra text
        # non-capturing group is for sections of index which are unnecessary
        match = re.match(r'Q(\d+)_([0-9_]+?)([a-zA-Z_]+)?(?:_[A-Z]|_\d{4}|$)', str(qIndex))
        
        # if regex matched then process row, otherwise skip
        if match:
            baseNum = f"{match.group(1)}_{match.group(2)}"
            isMulti = bool(match.group(3)) or bool(re.search(r'_\d{4}', str(qIndex)))
            part = match.group(3) if match.group(3) else ""
        else:
            continue
        
        # if new base is different to current base, update base
        if baseNum and baseNum != currBase:
            # reset parent if new question isn't multi part
            if not isMulti:
                parent = None
            currBase = baseNum
        
        # loop through each column
        for colIdx, country in enumerate(countries):
            
            # get answer for current column
            answer = row[colIdx + 2]
            
            # skip column if there's no answer
            if pd.isna(answer):
                continue
            
            # Simplify the question text
            simplifiedQ = simplify_question(qText)  # ← KEY CHANGE
            
            # if question is multi-part combine parent question and question text
            if isMulti and parent:
                completeQ = f"{parent} {simplifiedQ}"
            # otherwise just append question text
            else:
                completeQ = simplifiedQ
            
            # fill in sample entry
            sample = {
                "input": f"Answer this question about {country}: {completeQ}".strip(),
                "target": str(answer).strip()
            }
            
            # append sample to list
            samples.append(sample)

Now that the data is in proper training format, it needs to be checked for PII. We will use Microsoft's Presidio pre-trained ML library to detect PII (https://github.com/microsoft/presidio).

In [None]:
# install dependecies
# !pip install presidio_analyzer presidio_anonymizer
# !python -m spacy download en_core_web_lg

In [None]:
from presidio_analyzer import AnalyzerEngine
from tqdm import tqdm
import json

# initialize analyzer
analyzer = AnalyzerEngine()

# specific countries and years are necessary to the survey data
# do not flag these as PII
excludeWords = set(countries)
excludeWords.update(['2011', '2012', '2013', '2014', '2015', '2016'])

# only include entries that the model has 70%+ confidnece is PII
CONFIDENCE = 0.7

# only track unique PII values
seenPII = set()

# storage for PII
potentialPII = []

# iterate through every sample
for idx, sample in enumerate(tqdm(samples, desc='finding pii')):

    # get input question and target
    inputText = sample["input"]
    targetText = sample["target"]

    # analyze input and target
    inputRes = analyzer.analyze(text=inputText, language='en')
    targetRes = analyzer.analyze(text=targetText, language='en')

    # filter out exclude list from text matches
    inputRes = [r for r in inputRes 
                if r.score >= CONFIDENCE
                and not any(inputText[r.start:r.end] in word or word in inputText[r.start:r.end] for word in excludeWords)] 
    targetRes = [r for r in targetRes 
                 if r.score >= CONFIDENCE 
                 and not any(targetText[r.start:r.end] in word or word in targetText[r.start:r.end] for word in excludeWords)]

    # if pii is found
    isNewPII = False
    for r in inputRes:
        if inputText[r.start:r.end] not in seenPII:
            isNewPII = True
            seenPII.add(inputText[r.start:r.end])
    for r in targetRes:
        if targetText[r.start:r.end] not in seenPII:
            isNewPII = True
            seenPII.add(targetText[r.start:r.end])

    if isNewPII:
        res = {
            "input": inputText,
            "target": targetText,
            "inputPII": [{"type": r.entity_type, "text": inputText[r.start:r.end], "score": r.score} for r in inputRes],
            "targetPII": [{"type": r.entity_type, "text": targetText[r.start:r.end], "score": r.score} for r in targetRes]
        }
        potentialPII.append(res)

# dump all potential flagged PII into a json file
with open('potentialPII.json', 'w', encoding='utf-8') as f:
    json.dump(potentialPII, f, indent=2, ensure_ascii=False)


finding pii: 100%|██████████| 107833/107833 [35:25<00:00, 50.73it/s] 


The code dumps all potential PII matches to a seperate JSON file saved to the current directory (potentiallyPII.json). This file can now be manually checked to determine which flagged keywords are false postives and which are actually PII. Once all PII is removed from the dataset, the T5 model training can begin.

In [None]:
# install dependencies
# !pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
# !pip install transformers datasets accelerate


In [None]:
import torch
from torch.utils.data import DataLoader
from torch.nn import CrossEntropyLoss
from torch.optim.lr_scheduler import CosineAnnealingLR
from tqdm import tqdm
from transformers import (
    AutoTokenizer, 
    AutoModelForSeq2SeqLM, 
    DataCollatorForSeq2Seq,
)
from datasets import Dataset
import random

# balances samples to significantly reduce training time for project constraints
# also helps prevent the model from learning to predict yes/no for every question
samplesSmall = [s for s in samples if len(s["target"].split()) < 3]
samplesLarge = [s for s in samples if len(s["target"].split()) >= 3]
random.seed(42)
samplesBalanced = (
    random.sample(samplesLarge, min(int(len(samples) * 0.7), len(samplesLarge))) + 
    random.sample(samplesSmall, min(int(len(samples) * 0.3), len(samplesSmall)))
)
random.shuffle(samplesBalanced)

# convert existing data to hugging face dataset
data = Dataset.from_list(samplesBalanced)

# setup base model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

# tokenize inputs and targets
def preprocess(samples):
    modelInputs = tokenizer(
        samples["input"],
        max_length=512,
        truncation=True,
        padding=False
    )
    targets = tokenizer(
        samples["target"],
        max_length=128,
        truncation=True,
        padding=False
    )
    modelInputs["labels"] = targets["input_ids"]
    return modelInputs

# split data for testing and validation
trainValSplit = data.train_test_split(test_size=0.2)
valTestSplit = trainValSplit["test"].train_test_split(test_size=0.5)

splits = {
    "train": trainValSplit["train"],
    "validation": valTestSplit['train'],
    "test": valTestSplit["test"]
}

finalData = {
    "train": splits["train"].map(preprocess, batched=True, remove_columns=["input", "target"]),
    "validation": splits["validation"].map(preprocess, batched=True, remove_columns=["input", "target"]),
    "test": splits["test"].map(preprocess, batched=True, remove_columns=["input", "target"])
}

# create data collator
dataCollator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# create adam optimizer with loss function, learning rate, and weight decay params
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6, weight_decay=0.01)

# learning rate sceduling - reduces the lr over all epochs
# helps model converge better by reducing oscillation
scheduler = CosineAnnealingLR(optimizer, T_max=7)

# Create dataloaders
train_dataloader = DataLoader(
    finalData["train"], 
    batch_size=4, 
    shuffle=True, 
    collate_fn=dataCollator
)

val_dataloader = DataLoader(
    finalData["validation"],
    batch_size=4,
    collate_fn=dataCollator
)

# define epochs and specify gpu for training
num_epochs = 7
device = "cuda"

# training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}/{num_epochs}")
    for step, batch in enumerate(progress_bar):
        # input_ids is the tokenized input
        # attention_mask tells the model which tokens are important (padding vs content)
        # labels are the tokenized targets
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        
        # forward pass - feeding current batch to model 
        # logits are raw, unnormalized scores - can be considered the model's "thoughts"
        # loss function includes label_smoothing - makes the model less confident and improves generalization (ability to perform on unseen data)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        logits = outputs.logits
        loss_fct = CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))        
        
        # sets gradients back to zero so they are fresh for new batch
        optimizer.zero_grad()

        # computes the gradient of the loss of all the weights and biases
        loss.backward()

        # gradient clipping - prevents gradients from breaking if model updates by large amount
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        
        # updates model's parameters
        # Adam optimizer uses the gradients calculated by loss.backward() to make adjustments
        optimizer.step()

        # print loss and progress bar
        total_loss += loss.item()
        progress_bar.set_postfix({"loss": f"{loss.item():.4f}"})
            
    avg_train_loss = total_loss / len(train_dataloader)
    print(f"\nEpoch {epoch+1} - Avg Train Loss: {avg_train_loss:.4f}")

    # defines how big the next lr step should be - learning rate scheduling
    scheduler.step()
    
    # validation testing
    model.eval()
    total_val_loss = 0
    
    # validates model on samples without modifying weights
    with torch.no_grad():
        for batch in val_dataloader:
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["labels"].to(device)
            
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            total_val_loss += outputs.loss.item()
    
    avg_val_loss = total_val_loss / len(val_dataloader)
    print(f"Epoch {epoch+1} - Avg Val Loss: {avg_val_loss:.4f}")
    
# save model once finetuned
model.save_pretrained("./flan-t5-small-label-smooth-balanced")
tokenizer.save_pretrained("./flan-t5-small-label-smooth-balanced")


KeyboardInterrupt: 

Now that the model is fine-tuned on the initial dataset, it can be locally queried to it correctly provides predictions.

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import random

# load model
model = AutoModelForSeq2SeqLM.from_pretrained("./flan-t5-small-label-smooth-balanced").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("./flan-t5-small-label-smooth-balanced")

# select random test samples
test_indices = random.sample(range(len(data)), 20)

for idx in test_indices:
    sample = data[idx]
    question = sample["input"]
    true_answer = sample["target"]
    
    inputs = tokenizer(question, return_tensors="pt", max_length=512, truncation=True).to("cuda")
    outputs = model.generate(**inputs, max_length=128, num_beams=4)
    predicted = tokenizer.decode(outputs[0], skip_special_tokens=True)
        
    print(f"\nQ: {question[:70]}...")
    print(f"True: {true_answer[:60]}...")
    print(f"Pred: {predicted}")

# test on custom questions
print("\ncustom questions:")
custom_questions = [
    "Answer this question about United States: What body/agency grants banking licenses?",
    "Answer this question about France: What is the minimum capital requirement?",
    "Answer this question about Japan: Who regulates banks?"
]

for q in custom_questions:
    inputs = tokenizer(q, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=128, num_beams=4)
    pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"\nQ: {q}")
    print(f"A: {pred}")


Q: Answer this question about Canada: 3.1 Which regulatory capital adequa...
True: banks, bank holding companies, trust and loan companies, and...
Pred: All banks

Q: Answer this question about Guyana: 13.5 What were the total loans of t...
True: 219000...
Pred: 0

Q: Answer this question about Switzerland: 15.1 Are there any Islamic ban...
True: Yes...
Pred: Yes

Q: Answer this question about Guinea-Bissau: 11.12 Which mechanisms are p...
True: X...
Pred: X

Q: Answer this question about Albania: 3.5 What was the actual Tier 1 cap...
True: 13.45...
Pred: 0

Q: Answer this question about Angola: 10.7 Do banks disclose to the super...
True: X...
Pred: X

Q: Answer this question about Jordan: 8.5.1 How many times has the deposi...
True: No...
Pred: Yes

Q: Answer this question about Thailand: 11.1 i. Require banks to reduce/r...
True: X...
Pred: X

Q: Answer this question about Bolivia: 1.6 Which of the following are leg...
True: X...
Pred: X

Q: Answer this question about Costa Rica: 1

Now that we have made sure the model is working, we can upload it to a server. In this project, I'm using HuggingFace as its free and allows for easy testing. For production we would use Azure/AWS/GCP.

In [None]:
# you may need to run the authentication command directly in your terminal 
!pip install huggingface_hub
!hf auth login

^C


Now that you are logged in to huggingface, you must upload the trained model.

In [None]:
# upload model to 
model.push_to_hub("mian21/flan-t5-small-label-smooth-balanced")
tokenizer.push_to_hub("mian21/flan-t5-small-label-smooth-balanced")

model.safetensors: 100%|██████████| 308M/308M [03:35<00:00, 1.43MB/s] 
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


CommitInfo(commit_url='https://huggingface.co/mian21/flan-t5-small-label-smooth-balanced/commit/ed8880c8bd72f6bd3ac23a038a9d2d08e33922d3', commit_message='Upload tokenizer', commit_description='', oid='ed8880c8bd72f6bd3ac23a038a9d2d08e33922d3', pr_url=None, repo_url=RepoUrl('https://huggingface.co/mian21/flan-t5-small-label-smooth-balanced', endpoint='https://huggingface.co', repo_type='model', repo_id='mian21/flan-t5-small-label-smooth-balanced'), pr_revision=None, pr_num=None)

The model is now available from huggingface's website and can be loaded directly into your code using huggingface's autotrainer.

Additionally, the model can be viewed online through the huggingface model page or interacted with directly through the huggingface space.

https://huggingface.co/mian21/flan-t5-small-label-smooth-balanced  
https://huggingface.co/spaces/mian21/t5-demo

The model can also be queried through API calls when deployed on a cloud server like Google Cloud Platform, Amazon Web Services, or Microsft Azure. In this project, I will be using Azure to demostrate endpoint deployement.

In [None]:
# first install azure dependencies and login to azure
!pip install azure-ai-ml azure-cli azure-identity
!az login

Collecting azure-ai-ml
  Using cached azure_ai_ml-1.30.0-py3-none-any.whl.metadata (40 kB)
Collecting azure-cli
  Using cached azure_cli-2.78.0-py3-none-any.whl.metadata (8.8 kB)
Collecting azure-identity
  Using cached azure_identity-1.25.1-py3-none-any.whl.metadata (88 kB)
Collecting azure-core>=1.23.0 (from azure-ai-ml)
  Using cached azure_core-1.36.0-py3-none-any.whl.metadata (47 kB)
Collecting azure-mgmt-core>=1.3.0 (from azure-ai-ml)
  Using cached azure_mgmt_core-1.6.0-py3-none-any.whl.metadata (4.6 kB)
Collecting marshmallow<4.0.0,>=3.5 (from azure-ai-ml)
  Using cached marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting jsonschema<5.0.0,>=4.0.0 (from azure-ai-ml)
  Using cached jsonschema-4.25.1-py3-none-any.whl.metadata (7.6 kB)
Collecting strictyaml<2.0.0 (from azure-ai-ml)
  Using cached strictyaml-1.7.3-py3-none-any.whl.metadata (11 kB)
Collecting pyjwt<3.0.0 (from azure-ai-ml)
  Using cached PyJWT-2.10.1-py3-none-any.whl.metadata (4.0 kB)
Collecting azure-st

  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
presidio-anonymizer 2.2.360 requires cryptography<44.1, but you have cryptography 46.0.3 which is incompatible.


Now that you are logging into Azure, create the resource group and workspace for the new model.

In [4]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
import time

# define azure values
SUBSCRIPTION_ID = "a5937ed9-afe1-4645-9cf0-7e50f1e2b2d3"
RESOURCE_GROUP = "t5-model-deployment"
WORKSPACE_NAME = "t5-WB-workspace"
LOCATION = "eastus"

# authenticate
creds = DefaultAzureCredential()

# create resource client
resourceClient = ResourceManagementClient(creds, SUBSCRIPTION_ID)

# register ML provider
provider = resourceClient.providers.register('Microsoft.MachineLearningServices')
while provider.registration_state != 'Registered':
    time.sleep(20)
    provider.resourceClient.providers.get('Microsoft.MachineLearningServices')

# create resource group
rgRes = resourceClient.resource_groups.create_or_update(
    RESOURCE_GROUP,
    {"location": LOCATION}
)

# create workspace
mlClient = MLClient(creds, SUBSCRIPTION_ID, RESOURCE_GROUP)
workspace = Workspace(
    name=WORKSPACE_NAME,
    location=LOCATION,
    description="deployment workspace for WB fine-tuned t5 model"
)
workspace = mlClient.workspaces.begin_create(workspace).result()

print("azure deployment details:")
print(f"\nsubscription id: {SUBSCRIPTION_ID}")
print(f"\nresource group: {RESOURCE_GROUP}")
print(f"\nworkspace name: {WORKSPACE_NAME}")

Overriding of current TracerProvider is not allowed
Overriding of current LoggerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


azure deployment details:

subscription id: a5937ed9-afe1-4645-9cf0-7e50f1e2b2d3

resource group: t5-model-deployment

workspace name: t5-WB-workspace


Azure requires a scoring script that defines how to use the model to make predictions and a conda environment file that lists the endpoint requirements. Before we upload the model, we need to create these files.

In [16]:
import os

# make a new directory for azure files
os.makedirs("azure", exist_ok=True)

scoringScript = """import os, json, torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def init():
    global model, tokenizer

    # define model/tokenizer path
    modelDir = os.environ["AZUREML_MODEL_DIR"]
    model = AutoModelForSeq2SeqLM.from_pretrained(modelDir)
    tokenizer = AutoTokenizer.from_pretrained(modelDir)
    model.eval()

def run(rawData):
    try:
        # load data from parameter and get question
        data = json.loads(rawData)
        inputs = data["inputs"]

        # tokenize input
        encoded = tokenizer(
            inputs,
            max_length=512,
            truncation=True,
            padding=True,
            return_tensors="pt"
        )

        # generate output
        with torch.no_grad():
            outputs = model.generate(
                **encoded,
                max_length=128,
                num_beams=4,
                early_stopping=True
            )

        predictions = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        return json.dumps({"predictions": predictions})
    
    except Exception as e:
        return json.dumps({"error": str(e)})
"""

# write scoring file to azure directory
with open("azure/score.py", "w") as f:
    f.write(scoringScript)

condaENV = """name: model-env
channels:
    - conda-forge
    - defaults
dependencies:
    - python=3.8
    - pip
    - pip:
        - azureml-defaults
        - transformers==4.35.0
        - torch==2.1.0
        - inference-schema
        - sentencepiece
"""

# write conda file
with open("azure/conda.yaml", "w") as f:
    f.write(condaENV)

Before we create the actual endpoint itself, we must register the necessary resource providers for ML endpoints.

In [None]:
from azure.mgmt.resource import ResourceManagementClient
from azure.identity import DefaultAzureCredential
import time

SUBSCRIPTION_ID = "a5937ed9-afe1-4645-9cf0-7e50f1e2b2d3"

credential = DefaultAzureCredential()
resource_client = ResourceManagementClient(credential, SUBSCRIPTION_ID)

# list of all providers needed for ML endpoints
providersToRegister = [
    'Microsoft.MachineLearningServices',
    'Microsoft.ContainerInstance',
    'Microsoft.Storage',
    'Microsoft.KeyVault',
    'Microsoft.ContainerRegistry',
    'Microsoft.Insights',
    'Microsoft.Compute',
    'Microsoft.Network',
    'Microsoft.Cdn',
    'Microsoft.PolicyInsights'
]

for providerName in providersToRegister:
    provider = resource_client.providers.register(providerName)
    
    # wait for this provider to finish registering
    while provider.registration_state == 'Registering':
        time.sleep(20)
        provider = resource_client.providers.get(providerName)
    
    print(f"{providerName}: {provider.registration_state}\n")

Now that our environment is setup, we can deploy the model to Azure.

In [17]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration
)
from azure.identity import DefaultAzureCredential

# azure values we created earlier
SUBSCRIPTION_ID = "a5937ed9-afe1-4645-9cf0-7e50f1e2b2d3"
RESOURCE_GROUP = "t5-model-deployment"
WORKSPACE_NAME = "t5-WB-workspace"

# connect to workspace
mlClient = MLClient(
    DefaultAzureCredential(),
    subscription_id=SUBSCRIPTION_ID,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WORKSPACE_NAME
)

# register the model
model = Model(
    path="./flan-t5-small-label-smooth-balanced",
    name="flan-t5-WB-finetuned",
    description="T5 model fine tuned on World Bank Survey Data"
)
registeredModel = mlClient.models.create_or_update(model)
print(f"model registered: {registeredModel.name}")

# create endpoint
endpointName = "t5-endpoint"
endpoint = ManagedOnlineEndpoint(
    name=endpointName,
    description="Endpoint for fine-tuned FLAN-T5",
    auth_mode="key"
)
mlClient.online_endpoints.begin_create_or_update(endpoint).result()
print(f"endpoint created: {endpointName}")

# create deployment
deployment = ManagedOnlineDeployment(
    # azure uses blue/green naming conventions
    # blue represents production endpoint while green is for testing
    name="blue",
    endpoint_name=endpointName,
    model=registeredModel.id,
    code_configuration=CodeConfiguration(
        code="./azure",
        scoring_script="score.py"
    ),
    environment=Environment(
        conda_file="./azure/conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04"
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1
)
mlClient.online_deployments.begin_create_or_update(deployment).result()
print("deployment created")

# direct traffic to deployment
# we currently don't have a testing version as green so direct all traffic to blue/production
endpoint.traffic = {"blue": 100}
mlClient.online_endpoints.begin_create_or_update(endpoint).result()
print("traffic is configured")

Overriding of current TracerProvider is not allowed


Overriding of current LoggerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Your file exceeds 100 MB. If you experience low speeds, latency, or broken connections, we recommend using the AzCopyv10 tool for this file transfer.

Example: azcopy copy 'C:\Users\moham\Documents\GitHub\WorldBankCapstone\flan-t5-small-label-smooth-balanced' 'https://t5wbworkstorage970d839ad.blob.core.windows.net/azureml-blobstore-dd0e842a-c7a3-4848-9904-a0ed73b0cc9f/LocalUpload/461ad405d3e037c36d42e336257b58c081c05f9a1e2d74d23e354dd852dcb401/flan-t5-small-label-smooth-balanced' 

See https://learn.microsoft.com/azure/storage/common/storage-use-azcopy-v10 for more information.


model registered: flan-t5-WB-finetuned


Instance type Standard_DS2_v2 may be too small for compute resources. Minimum recommended compute SKU is Standard_DS3_v2 for general purpose endpoints. Learn more about SKUs here: https://learn.microsoft.com/azure/machine-learning/referencemanaged-online-endpoints-vm-sku-list
Check: endpoint t5-endpoint exists


endpoint created: t5-endpoint


[32mUploading azure (0.0 MBs): 100%|##########| 1428/1428 [00:00<00:00, 18070.39it/s]
[39m



..................................................................................................................................................................................................................................................................................

HttpResponseError: (ResourceNotReady) User container has crashed or terminated. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
Code: ResourceNotReady
Message: User container has crashed or terminated. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready