In [42]:
import os
from dotenv import load_dotenv
import json
import requests
import time
from openai import AzureOpenAI
import pandas as pd

In [None]:
client = AzureOpenAI(
  azure_endpoint="",  # Azure OpenAI resource endpoint
  api_key="",         # API key for that resource
  api_version=""      # API version string
)
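The cell above leaves the credentials as empty strings, and `os` and `load_dotenv` are imported but never used. One way to fill them in without hard-coding secrets is to read them from the environment. The following is a minimal sketch, not the notebook's actual configuration: the three variable names and the `load_azure_settings` helper are assumptions made for illustration.

```python
import os

# Hypothetical variable names; substitute whatever your environment defines.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def load_azure_settings(env=os.environ):
    """Return keyword arguments for AzureOpenAI, failing early if any value is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "api_version": env["AZURE_OPENAI_API_VERSION"],
    }
```

If the values live in a `.env` file, calling `load_dotenv()` first would populate the environment, after which `client = AzureOpenAI(**load_azure_settings())` could replace the hard-coded arguments.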

In [None]:
assistant = client.beta.assistants.create(
  model="", # replace with model deployment name.
  instructions="""Classify the tone of a given written entry as positive, negative, mixed, or neutral. Exclude phrases that are related to regular duties unless they indicate a strategic change and its impact. Exclude any phrases that are plans for the future.  Exclude any phrases that only describe current efforts without mentioning impact/outcomes. Provide reasoning, supporting quotes, a score based on the number of positive/negative phrases, and tags for your classification.

# Steps

1. **Read the Entry**: Carefully examine the given text to understand the context, content, and overall message.
2. **Identify Relevant Sections**: Exclude sentences that are part of regular duties unless they showcase a strategic change and its impact. Exclude any phrases that are plans for the future. Exclude any phrases that only describe current efforts without mentioning impact/outcomes.
3. **Determine the Tone**: Analyze the remaining content to determine whether the tone is positive, negative, mixed, or neutral.
4. **Provide Reasoning**: Explain the rationale for your classification based on the content and tone of the relevant sections.
5. **Support with Quotes**: Include specific quotes from the text that support your classification decision.
6. **Score**: Add 1 for every positive phrase, -1 for every negative phrase, and 0 for all else. Report the final score.
7. **Components**: Report the number of positive, negative, mixed, and neutral phrases.
8. **Provide tag(s)**: Include tag(s) describing the general subject of the narrative (for example, "data quality challenges", "drug stockouts", "healthcare statistics").

# Output Format

Provide a brief bulleted list, one item per line, including:
- Tone classification (positive, negative, mixed, or neutral).
- Reasoning for the classification decision.
- Supporting quotes from the text.
- Score
- Components
- Tag(s): recommend tags representing the general topic of the narrative (e.g., data quality issues, resource issues).

# Examples

**Example 1:**

Input: "FMOH ART register/EMR_ART patient level database is the primary data source. This indicator is collected monthly from ART register/ EMR_ART database at health facilities. ICAP collects data directly from the source using standard reporting forms that include all required disaggregation for age. Data is entered into ICAP DSS database where logic checks and data validations are conducted to ensure completeness, consistency, and logic. Site support visits to all ICAP-supported facilities include spot checking of data to ensure completeness and accuracy, and mentorship is provided on data quality. No challenges in reporting this indicator were encountered and there are no data quality issues to report."

Output: 
```
- Classification: Positive
- Reasoning: The text describes the quality and accuracy of the data system.
- Supporting Quotes: "No challenges in reporting this indicator were encountered and there are no data quality issues to report"
- Score: 2
- Components: 2 positive, 5 neutral
- Tag(s): Data quality
```

**Example 2:**

Input: "We saw an increase in patients returning for treatment. But the EMR encountered unexpected downtimes, affecting performance."

Output: 
```
- Classification: Mixed
- Reasoning: The text contains elements of both successful patient outcomes and unexpected issues that caused concern.
- Supporting Quotes: "increase in patients returning", "encountered unexpected downtimes, affecting performance"
- Score: 0 
- Components: 1 positive, 1 negative phrase
- Tag(s): Data quality, patient return
```

**Example 3:**

Input: "A total of 2492 clients were initiated on PrEP in Q3, bringing the project achievement to 74% (6,440/8712)."

Output:
```
- Classification: Neutral
- Reasoning: The text provides a factual update on the number of clients initiated on PrEP and the project's achievement percentage, without any qualitative assessment or indication of impact beyond the figures.
- Supporting Quotes: "A total of 2492 clients were initiated on PrEP in Q3", "bringing the project achievement to 74% (6,440/8712)"
- Score: 0
- Components: 2 neutral
- Tag(s): Health statistics
``` 


# Notes

- Be sure to separate regular duties from strategic changes or impactful events.
- Exclude any phrases that are plans for the future as they have not happened yet.
- Exclude any phrases that only describe current efforts without mentioning impact/outcomes. 
- Mixed tones should be noted when positive and negative elements coexist in significance.
- If the narrative uses the word "achievement" and reports a percentage, evaluate achievement of 100% or more as positive and low achievement as negative. If the achievement is close to 100% (use your discretion to decide how close is good enough), evaluate it as positive.
""",
  tools=[],
  tool_resources={},
  temperature=0.01,
  top_p=1
)
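The instructions ask the model to report a score equal to the number of positive phrases minus the number of negative phrases, alongside a "Components" line with the raw counts. Because the model does the arithmetic itself, it may be worth checking that the two fields agree. The following is a rough sketch of such a check; the function names are hypothetical, and it assumes the fields look like the examples in the prompt (for instance `Score: 1 (...)` and `Components: 3 positive, 2 negative, 0 mixed, 0 neutral`).

```python
import re

LABELS = ("positive", "negative", "mixed", "neutral")


def count_components(components_text):
    """Turn text such as '3 positive, 2 negative' into a dict of counts per label."""
    counts = {label: 0 for label in LABELS}
    pattern = r"(\d+)\s+(" + "|".join(LABELS) + r")"
    for number, label in re.findall(pattern, components_text.lower()):
        counts[label] = int(number)
    return counts


def score_is_consistent(score_text, components_text):
    """Check that the leading integer in the score equals positives minus negatives."""
    match = re.match(r"\s*(-?\d+)", score_text)
    if match is None:
        return False  # No leading integer to compare against
    counts = count_components(components_text)
    return int(match.group(1)) == counts["positive"] - counts["negative"]
```

Rows where the check fails could be flagged for manual review rather than discarded, since the classification itself may still be reasonable.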

In [44]:
df = pd.read_csv("NarrativesRaw__2024-11-14_.csv")

df.head()

Unnamed: 0.1,Unnamed: 0,Operating Unit,Org Level,Country,Indicator Bundle,Indicator,Support Type,Funding Agency,Mechanism Code,Implementing Mechanism Name,Period,Narrative
0,1,Angola,Operating Unit,Angola,Health Systems,CARE_CURR,TA,DOD,17397.0,CDU Angola,2015 Q2,The FAA have not authorized sharing their care...
1,2,Angola,Operating Unit,Angola,Health Systems,CARE_NEW,TA,DOD,17397.0,CDU Angola,2015 Q2,The FAA have not authorized sharing their care...
2,3,Angola,Operating Unit,Angola,Prevention,PP_PREV,TA,DOD,17397.0,CDU Angola,2015 Q2,"During this period, the HIV activists have rea..."
3,4,Angola,Operating Unit,Angola,Testing,HTS_TST,TA,DOD,17397.0,CDU Angola,2015 Q2,Monthly reports come to Luanda from all 24 VCT...
4,5,Angola,Operating Unit,Angola,Treatment,TX_CURR,TA,DOD,17397.0,CDU Angola,2015 Q2,The FAA have not authorized sharing their care...


In [None]:
def get_response(user_text):
  """Send one narrative to the assistant; return its reply text, or None on failure."""
  # Create a fresh thread so each narrative is classified independently
  thread = client.beta.threads.create()

  # Add the narrative to the thread as a user message
  client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=user_text
  )

  # Start a run of the assistant on the thread
  run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
  )

  # Poll until the run leaves its pending states
  while run.status in ['queued', 'in_progress', 'cancelling']:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
      thread_id=thread.id,
      run_id=run.id
    )

  if run.status == 'completed':
    # Messages are listed newest first by default, so data[0] is the assistant's reply
    messages = client.beta.threads.messages.list(
      thread_id=thread.id
    )
    reply = messages.data[0].content[0].text.value
    print(reply)
    return reply

  # No tools are configured, so 'requires_action' is not expected here; treat it
  # like any other non-completed status (failed, cancelled, expired).
  print(f"Run ended with status: {run.status}")
  return None
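The polling loop in `get_response` waits indefinitely while the run is queued or in progress, so a stuck run would block the whole batch. A bounded wait is one option. The sketch below is an assumption about how that could look, not part of the original notebook: `wait_until_done` and its parameters are hypothetical, and it takes a status-fetching callable so it does not depend on the client directly.

```python
import time

# The same pending states the notebook's loop checks for.
PENDING_STATUSES = {"queued", "in_progress", "cancelling"}


def wait_until_done(fetch_status, timeout_s=120.0, poll_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a non-pending status or the timeout elapses."""
    deadline = clock() + timeout_s
    status = fetch_status()
    while status in PENDING_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_s} seconds")
        sleep(poll_s)
        status = fetch_status()
    return status
```

Inside `get_response`, the callable could be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status`, with the returned status compared against `'completed'` as before.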




In [None]:
filtered_df = df[df['Period'].str.contains("2024", na=False)].copy()

# Loop through each Narrative and process
for index, row in filtered_df.iterrows():
    narrative = row['Narrative']
    if pd.isna(narrative):
        continue  # Nothing to classify for this row
    response = get_response(narrative)
    if response is None:
        continue  # Run did not complete; leave this row's new columns empty

    # Parse "- Key: value" lines from the response into a dict
    components = {}
    for line in response.splitlines():
        if ": " in line:  # Check for key-value separator
            key, val = line.split(": ", 1)
            components[key.strip("- ")] = val.strip()

    # Add components to the corresponding row in the filtered DataFrame
    for key, val in components.items():
        filtered_df.loc[index, key] = val  # Add as new columns

    time.sleep(30)  # Respect rate limits if necessary

filtered_df.head()

- Classification: Mixed
- Reasoning: The entry contains both positive outcomes and challenges. The positive aspects include a high number of contacts elicited per index case, a high yield of individuals testing positive, and all individuals identified as HIV positive through index testing being initiated on ART. However, there are challenges mentioned such as low rates of index contacts tested at the health facility level and the use of an outdated form that does not allow for IPV risk assessment.
- Supporting Quotes: "42 individuals tested positive (yield of 35.89%)", "all were initiated on ART during the reporting period", "Low rates of IC contacts tested at the HF"
- Score: 1 (3 positive phrases - 2 negative phrases)
- Components: 3 positive, 2 negative, 0 mixed, 0 neutral
- Tag(s): Health statistics, HIV testing, Data Quality Assessment, Index Case Testing, ART Initiation, Challenges in Health Services
- Classification: Mixed
- Reasoning: The text contains both positive outcomes, s
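The parsing loop above treats every line containing `": "` as a key-value pair, so a stray line in the reply (for example, a wrapped sentence that happens to contain a colon) would become its own column. A stricter parser could accept only the field names the prompt asks for. This is a sketch under that assumption; `parse_response` and the `FIELDS` tuple are illustrative names, and the field list is taken from the prompt's output format.

```python
import re

# Field names requested in the prompt's output format.
FIELDS = ("Classification", "Reasoning", "Supporting Quotes",
          "Score", "Components", "Tag(s)")
FIELD_RE = re.compile(
    r"^\s*-?\s*(" + "|".join(re.escape(name) for name in FIELDS) + r")\s*:\s*(.*)$"
)
FENCE = "`" * 3  # Code-fence marker the model may wrap its answer in


def parse_response(text):
    """Map each known field to its value, folding continuation lines into the previous field."""
    parsed = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            parsed[current] = match.group(2).strip()
        elif current and stripped and not stripped.startswith(FENCE):
            parsed[current] += " " + stripped
    return parsed
```

The returned dict has the same shape as `components` in the loop, so it could be dropped in there; unknown keys are ignored instead of creating new DataFrame columns.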

In [98]:
# index=False avoids writing yet another "Unnamed: 0" index column on the next read
filtered_df.to_csv("narratives_sentiments.csv", index=False)