# Sentiment Analysis Using Azure's OpenAI Service with Parallel Processing

This notebook runs sentiment analysis on sentences from financial auditor reports using the GPT-3.5 Turbo model from Azure's OpenAI service. Each sentence is classified as positive, neutral, or negative. The analysis uses a random subset of 1000 samples from the "auditor_sentiment" dataset, available on the Hugging Face datasets hub.

To handle this volume of requests efficiently, the notebook uses a custom Python class, `ParallelAPIRequester`. The class manages and submits multiple requests in parallel, which is much faster than sending them one at a time.
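
As a rough illustration of why parallel submission helps, here is a minimal sketch using a thread pool. It is not the implementation of `ParallelAPIRequester`; `fake_api_call` and `run_parallel` are hypothetical names, and the sleep merely stands in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(request: dict) -> dict:
    """Hypothetical stand-in for a chat-completion call (assumption, not the real API)."""
    time.sleep(0.05)  # mimic network latency
    return {**request, "response": {"sentiment": "1"}}


def run_parallel(requests: list[dict], max_workers: int = 8) -> list[dict]:
    """Submit all requests concurrently; executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_api_call, requests))


demo_requests = [{"text": f"sentence {i}"} for i in range(16)]
start = time.perf_counter()
demo_results = run_parallel(demo_requests)
elapsed = time.perf_counter() - start
print(f"{len(demo_results)} requests finished in {elapsed:.2f}s")
```

Because the calls spend most of their time waiting, running them in threads lets the waits overlap, so total time grows far more slowly than the number of requests.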

The notebook walks through loading the dataset, preparing the requests, and sending them to the API in parallel.



In [31]:
import os
import yaml
import json
from datasets import load_dataset, Dataset
from src.Parallel_LLM_API_Requester import ParallelAPIRequesterConfig, ParallelAPIRequester

In [2]:
# Select a random subset of 1000 samples from the training set
dataset = load_dataset("FinanceInc/auditor_sentiment", split='train')
dataset = dataset.shuffle(seed=42).select(range(1000))
dataset

Dataset({
    features: ['sentence', 'label'],
    num_rows: 1000
})

In [29]:
# label: the sentiment class of the sentence, encoded as 2 (positive), 1 (neutral), or 0 (negative)
print(f"sentence: {dataset[350]['sentence']}")
print(f"label: {dataset[350]['label']}")

sentence: In the second quarter of 2010 , the group 's net profit rose to EUR3 .1 m from EUR2 .5 m in April-June 2009 .
label: 2


In [3]:
system_msg = """Assistant predicts the sentiment of a given text.
After users input the text and a JSON template, Assistant will provide the suggested response."""

user_msg = """{text}

Respond the sentiment in JSON according to the template:
{{
    "sentiment": "X"
}}

Replace the 'X' exclusively with either '2' for positive, '1' for neutral or '0' for negative.
Ensure the response strictly follows the format provided above.
"""
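
The doubled braces in `user_msg` are needed because the template is later filled in with `str.format`, which treats single braces as placeholders. A small sketch with a shortened, hypothetical template shows the effect:

```python
# Shortened, hypothetical template; only {text} is a placeholder.
template = """{text}

Respond in JSON:
{{
    "sentiment": "X"
}}
"""

# str.format replaces {text} and turns each {{ or }} into a literal brace.
rendered = template.format(text="No financial details were revealed .")
print(rendered)
```

With single braces around the JSON body, `format` would try to interpret the JSON as a replacement field and raise an error instead.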

In [4]:
# Prepare the requests to be sent to the API
# Note: each request needs an 'api_message' list of message dictionaries, each with a 'role' key (either "system" or "user") and a 'content' key

prepared_requests = []
for example in dataset:
    prepared_requests.append({
        "text": example['sentence'],
        "actual_label": example['label'],
        "api_message": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg.format(text=example['sentence'])},
        ],
    })

print(f"len of prepared_requests: {len(prepared_requests)}")
prepared_requests[:2]

len of prepared_requests: 1000


[{'text': "We have also cut our price projections for paper and packaging , '' an analyst with Goldman Sachs said on a note on Monday .",
  'actual_label': 0,
  'api_message': [{'role': 'system',
    'content': 'Assistant predicts the sentiment of a given text.\nAfter users input the text and a JSON template, Assistant will provide the suggested response.'},
   {'role': 'user',
    'content': 'We have also cut our price projections for paper and packaging , \'\' an analyst with Goldman Sachs said on a note on Monday .\n\nRespond the sentiment in JSON according to the template:\n{\n    "sentiment": "X"\n}\n\nReplace the \'X\' exclusively with either \'2\' for positive, \'1\' for neutral or \'0\' for negative.\nEnsure the response strictly follows the format provided above.\n'}]},
 {'text': 'No financial details were revealed .',
  'actual_label': 1,
  'api_message': [{'role': 'system',
    'content': 'Assistant predicts the sentiment of a given text.\nAfter users input the text and a JS

In [30]:
# Load the parameters and initialize the parallel API requester class

# Note: Set the limits in the config file appropriately so that requests are not rejected for exceeding the API's request or token limits.
with open("src/configs/sentiment_analysis.yml", "r") as f:
    config_dict = yaml.safe_load(f)
config = ParallelAPIRequesterConfig(**config_dict)

llm_sentiment_gpt35 = ParallelAPIRequester(name="gpt35_llm", config=config)

In [6]:
# Submit all the requests to the class and get the responses
results = llm_sentiment_gpt35.get_responses_parallel(prepared_requests)

Total cost after 100/1000 requests: 0.009487000000000002 dollars
Total cost after 200/1000 requests: 0.019251500000000005 dollars
Total cost after 300/1000 requests: 0.027012999999999992 dollars
Total cost after 400/1000 requests: 0.03617349999999998 dollars
Error while processing: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}
Total cost after 500/1000 requests: 0.043130
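
The error logged above carries a `content_filter_result` that says which category triggered the rejection. Assuming the error body is available as a Python dictionary with the structure shown in the log, a small helper (the name `filtered_categories` is hypothetical) could extract the offending categories:

```python
def filtered_categories(error_body: dict) -> list[str]:
    """Return the content-filter categories marked as filtered in an error body."""
    inner = error_body.get("error", {}).get("innererror", {})
    verdicts = inner.get("content_filter_result", {})
    return [name for name, verdict in verdicts.items() if verdict.get("filtered")]


# Structure copied from the logged error above; unrelated fields omitted.
error_body = {
    "error": {
        "code": "content_filter",
        "status": 400,
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": True, "severity": "medium"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        },
    }
}

print(filtered_categories(error_body))
```

How the dictionary is obtained from the raised exception depends on the client library version, so that step is left out here.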

In [9]:
# Save the results to a json file
with open("generatad_data/sentiment_analysis.json", 'w') as f:
    json.dump(results, f)

In [10]:
# Load the results from the json file
with open("generatad_data/sentiment_analysis.json", 'r') as f:
    results_json = json.load(f)

In [16]:
results_json[30]

{'text': 'Compared with the FTSE 100 index , which rose 94.9 points ( or 1.6 % ) on the day , this was a relative price change of -0.4 % .',
 'actual_label': 0,
 'api__system_message': 'Assistant predicts the sentiment of a given text.\nAfter users input the text and a JSON template, Assistant will provide the suggested response.',
 'api__user_message': 'Compared with the FTSE 100 index , which rose 94.9 points ( or 1.6 % ) on the day , this was a relative price change of -0.4 % .\n\nRespond the sentiment in JSON according to the template:\n{\n    "sentiment": "X"\n}\n\nReplace the \'X\' exclusively with either \'2\' for positive, \'1\' for neutral or \'0\' for negative.\nEnsure the response strictly follows the format provided above.\n',
 'response': {'sentiment': '0'}}

The `ParallelAPIRequester` class processed all 1000 requests in 1 minute and 24 seconds. It adds a `response` key to each request in the input list, so every response stays paired with the request that produced it.
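
Since each record holds both `actual_label` and `response`, a natural follow-up is to compare the two. The sketch below is one possible way to do that, not part of the original workflow: `score` is a hypothetical helper, the sample records are made up for illustration, and it assumes that a failed request has a missing or non-dictionary `response`.

```python
def score(records: list[dict]) -> tuple[float, int]:
    """Return (accuracy over parseable responses, number of skipped records)."""
    matches = []
    for record in records:
        response = record.get("response")
        if not isinstance(response, dict):
            continue  # assumption: failed requests carry no usable response
        try:
            predicted = int(response["sentiment"])
        except (KeyError, TypeError, ValueError):
            continue  # response did not follow the requested template
        matches.append(predicted == record["actual_label"])
    accuracy = sum(matches) / len(matches) if matches else 0.0
    return accuracy, len(records) - len(matches)


# Made-up records in the same shape as the saved results.
sample = [
    {"actual_label": 0, "response": {"sentiment": "0"}},
    {"actual_label": 1, "response": {"sentiment": "2"}},
    {"actual_label": 2, "response": None},
    {"actual_label": 1, "response": {"sentiment": "neutral"}},
]

accuracy, skipped = score(sample)
print(f"accuracy: {accuracy:.2f}, skipped: {skipped}")
```

Reporting the skipped count next to the accuracy keeps filtered or malformed responses from silently inflating or deflating the score.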

The class also keeps the run going when individual requests fail. It retries failed requests automatically and logs each error in detail, so an interruption such as the content policy violation shown above (error code 400) does not stop the remaining requests from being processed.
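
For readers curious how retries and logging of this kind can fit together, here is a minimal sketch using exponential backoff. It is an illustration under stated assumptions rather than the code inside `ParallelAPIRequester`: `call_with_retries`, `NonRetryableError`, `flaky`, and `blocked` are all hypothetical names.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


class NonRetryableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix (e.g. a content-filter rejection)."""


def call_with_retries(func, request, max_attempts=3, base_delay=0.01):
    """Call func(request), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(request)
        except NonRetryableError as exc:
            logger.warning("Giving up on request: %s", exc)
            return None
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    return None


attempts = {"count": 0}


def flaky(request):
    """Simulated call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return {"sentiment": "1"}


def blocked(request):
    """Simulated call that is always rejected."""
    raise NonRetryableError("simulated content filter rejection")


ok = call_with_retries(flaky, {"text": "x"})
rejected = call_with_retries(blocked, {"text": "y"})
print(ok, rejected)
```

Returning `None` for a request that cannot succeed lets the surrounding loop continue with the other requests, which matches the behavior seen in the run above.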