# **Transcription with Conversation-Level Sentiment Annotations**
Below, we use the Hume and GPT APIs to generate conversation-level sentiment annotations for a Zoom conversation.

> We design a two-part pipeline to visualize Zoom meetings with conversation-level sentiment annotations. We introduce novel metrics that capture conversation-level sentiment along three axes: comprehension, consensus, and cordiality. To obtain these metrics, we first identify the sentiments each speaker expresses during each of their responses. To do so, we segment Zoom recordings by speaker and feed the video, audio (including voice prosody), and transcript (text content) of each segment to an off-the-shelf model that outputs a quantitative measure of the extent to which the speaker expresses each of 48 emotions. Then, for each segment, we combine the speaker's top 5 emotions (with weights), uniformly sampled facial expressions, and spoken words in an instruction-tuned prompt to a multimodal large language model to determine the conversation-level metrics.

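The two selection steps named in the abstract (keeping each segment's top 5 emotions with their weights, and uniformly sampling frames for facial expressions) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `top_k_emotions` and `uniform_frame_indices` are hypothetical, the emotion scores are assumed to arrive as a plain name-to-score dictionary, the raw scores are used directly as weights, and the frames are sampled at bin midpoints. The pipeline itself may do any of these differently.

```python
from typing import Dict, List, Tuple


def top_k_emotions(scores: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring emotions, highest first.

    Assumption: `scores` maps emotion name -> model score, and the raw
    score is used as the weight (no renormalization).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def uniform_frame_indices(n_frames: int, n_samples: int) -> List[int]:
    """Pick frame indices spread evenly across a segment of n_frames frames.

    Assumption: sampling the midpoint of each equal-width bin, so the
    samples avoid the very first and last frames of the segment.
    """
    if n_frames <= 0 or n_samples <= 0:
        return []
    n_samples = min(n_samples, n_frames)  # cannot sample more frames than exist
    step = n_frames / n_samples
    return [int(step * i + step / 2) for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical scores for one speaker segment (a real segment would have 48).
    example_scores = {
        "Joy": 0.8, "Anger": 0.1, "Calmness": 0.5, "Confusion": 0.3,
        "Interest": 0.7, "Boredom": 0.05, "Doubt": 0.2,
    }
    print(top_k_emotions(example_scores))
    print(uniform_frame_indices(n_frames=100, n_samples=4))
```

The ranked emotion list and the frames at the sampled indices would then be combined with the segment's transcript text when building the prompt for the multimodal model.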


# Initialization

In [42]:
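# Install libraries
!pip install hume
# Quote the extras specifier so the shell does not try to expand the brackets
!pip install "hume[stream]"
!pip install openai
!pip install python-dotenv
!pip install pydub
# Note: pydub needs the FFmpeg executable on PATH; the PyPI package named
# `ffmpeg` may not provide it, so install FFmpeg via the system package manager if needed
!pip install ffmpeg
!pip install moviepy
!pip install webvtt-py
!pip install opencv-python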




In [1]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import requests
import base64
from pydub import AudioSegment
from hume import HumeBatchClient
import json
import pandas as pd
import matplotlib.pyplot as plt
# from google.colab import userdata
import webvtt
import cv2
from moviepy.editor import VideoFileClip
import subprocess

## Performance Experiments

## Load datasets

In [2]:
dataset_directory = './dataset/outputs/merged_all'

# calculate the grouped predicted emotion for each modality