# NLTK VADER: single-pass sentiment on a transcript string

**Goal.** Use lightweight sentiment analysis on transcripts to triage support calls.

| What               | Why                                   | Outcome                       |
| ------------------ | ------------------------------------- | ----------------------------- |
| Sentiment analysis | Detect negative/neutral/positive tone | Prioritize problematic calls  |
| VADER (NLTK)       | Prebuilt lexicon + heuristic rules    | Fast, no training data needed |

**Setup.** Install NLTK and download the required data packages.

| Step          | Command                                                                                                   | Note                                                                                   |
| ------------- | --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| Install       | `pip install nltk`                                                                                        | Once per environment                                                                   |
| Download data | Python: `import nltk; nltk.download('vader_lexicon'); nltk.download('punkt'); nltk.download('punkt_tab')` | Sentiment lexicon + sentence tokenizer (newer NLTK releases load Punkt from `punkt_tab`) |

**VADER outputs.** What the analyzer returns and how to read it.

| Key        | Meaning                  |  Range | Interpretation                                   |
| ---------- | ------------------------ | -----: | ------------------------------------------------ |
| `neg`      | Negative proportion      |    0–1 | Higher ⇒ more negative tokens                    |
| `neu`      | Neutral proportion       |    0–1 | Higher ⇒ more neutral tokens                     |
| `pos`      | Positive proportion      |    0–1 | Higher ⇒ more positive tokens                    |
| `compound` | Normalized overall score | −1..+1 | ≤ −0.05 negative, ≥ +0.05 positive, else neutral |

`neg`, `neu`, and `pos` sum to 1 (up to rounding), so they describe how the text splits across the three classes; `compound` is the single number to threshold.
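Unlike the three proportions, `compound` starts as an unbounded sum of the rule-adjusted word valences and is then squashed into −1..+1. A minimal sketch of that squashing step, assuming the `score / sqrt(score² + alpha)` formula with `alpha = 15` from VADER's reference implementation; `normalize_compound` is a hypothetical name, not an NLTK function:

```python
import math


def normalize_compound(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1], VADER-style (sketch).

    Assumes VADER's reference formula score / sqrt(score**2 + alpha);
    alpha approximates the largest raw sum the authors expected.
    """
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clip defensively; mathematically the result already lies in (-1, 1).
    return max(-1.0, min(1.0, norm))


for raw in (-4.0, -1.0, 0.0, 1.0, 4.0, 20.0):
    print(f"raw={raw:+5.1f} -> compound={normalize_compound(raw):+.4f}")
```

Because the curve flattens quickly (a raw sum of 4 already maps to about 0.72), `compound` reflects accumulated sentiment rather than an average, so long texts with many sentiment words drift toward ±1.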

**Minimal code.** Initialize VADER and score a transcript string.


# Installation

```python
import nltk

# One-time per environment: fetch the VADER lexicon and the Punkt sentence tokenizer.
# Newer NLTK releases load Punkt from 'punkt_tab'. Re-running is safe: packages that
# are already installed are reported as up to date and skipped.
nltk.download('vader_lexicon')
nltk.download('punkt')
nltk.download('punkt_tab')
```

# Sentiment Analysis with NLTK VADER

NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner) scores a transcript string in a single pass. It is a lexicon- and rule-based sentiment analysis tool specifically attuned to sentiments expressed in social media.
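To make "lexicon and rule-based" concrete, here is a toy scorer; it is not VADER itself. It looks up each word in a tiny hypothetical lexicon and applies two rules in the spirit of VADER's heuristics: negation within the three preceding words flips and dampens a valence, and an ALL-CAPS word gets an emphasis boost. The constants `N_SCALAR = -0.74` and `C_INCR = 0.733` are the values used in VADER's reference code; the lexicon entries, negation list, and `toy_valence_sum` are invented for illustration.

```python
import re

# Hypothetical mini-lexicon: word -> valence (VADER's real lexicon has ~7,500 entries).
TOY_LEXICON = {"good": 1.9, "great": 3.1, "terrible": -2.5, "love": 3.2, "hate": -2.7}
# Hypothetical negation list (VADER's own list is much longer).
NEGATIONS = {"not", "never", "no", "isn't", "don't", "haven't"}
N_SCALAR = -0.74  # negation multiplier (value from VADER's reference code)
C_INCR = 0.733    # ALL-CAPS emphasis boost (value from VADER's reference code)


def toy_valence_sum(text: str) -> float:
    """Sum lexicon valences, applying ALL-CAPS emphasis and then negation (toy sketch)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok.lower())
        if valence is None:
            continue  # words outside the lexicon contribute nothing
        if tok.isupper() and len(tok) > 1:
            # Shouting intensifies the word in its own direction.
            valence += C_INCR if valence > 0 else -C_INCR
        # A negation among the three preceding tokens flips and dampens the valence.
        if any(t.lower() in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence *= N_SCALAR
        total += valence
    return total


for sample in ("good", "not good", "GOOD", "The service is terrible"):
    print(f"{sample!r}: {toy_valence_sum(sample):+.3f}")
```

The real analyzer adds more rules (booster words such as "very", "but" clauses, exclamation marks) and then normalizes the sum into `compound`.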

```python
from nltk.sentiment import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
```

```python
text = "I still haven't received my product. The service is terrible."
scores = sid.polarity_scores(text)
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
# 'compound' is a normalized, weighted composite score from -1 (most extreme negative)
# to +1 (most extreme positive): >= 0.05 is positive, <= -0.05 is negative,
# and anything in between is neutral.
```

```
{'neg': 0.279, 'neu': 0.721, 'pos': 0.0, 'compound': -0.4767}
```


```python
text = "I LOVE EVERYTHING about this! Best purchase ever!!! Amazing!!!"
scores = sid.polarity_scores(text)
print(scores)
```

```
{'neg': 0.0, 'neu': 0.251, 'pos': 0.749, 'compound': 0.9509}
```


Mixed signals: VADER picks up both "HATE" and "AWESOME", so `neg` and `pos` are both non-zero and the `compound` lands only weakly positive.

```python
text = "I HATE BEING BI-POLAR ITS AWESOME!"
scores = sid.polarity_scores(text)
print(scores)
```

```
{'neg': 0.334, 'neu': 0.27, 'pos': 0.396, 'compound': 0.1759}
```


A Spanish-language email, to see what happens outside VADER's English lexicon:

```python
# Spanish email. In English: "Hi Heriberto: We have been reviewing your teammates'
# documents to continue their registration process, and we noticed that Nuria's
# student ID is no longer valid. The contest requires students who are currently
# enrolled in an undergraduate/engineering program, so we kindly ask that you send
# another current official document proving she is still enrolled.
# Example: 2025 kardex. Thanks! Regards, Organizing Committee"
text = """
Hola Heriberto:
Hemos estado revisando los documentos de tus compañeros, para continuar
con su proceso de inscripción, y nos hemos percatado que la credencial
de estudiante de Nuria no cumple con la vigencia pertinente.
En el concurso nos solicitan alumnos que estén cursando la
licenciatura/ingeniería, por lo que solicitamos de la manera más atenta
que envíen otro documento oficial y vigente que acredite su permanencia
en la carrera.
Ejemplo: kardex 2025.

¡Gracias!

Saludos
Comité organizador
"""
scores = sid.polarity_scores(text)
print(scores)
```

```
{'neg': 0.033, 'neu': 0.967, 'pos': 0.0, 'compound': -0.3595}
```

Almost everything lands in `neu` because VADER's lexicon is English and most Spanish words carry no valence, so scores on non-English text are unreliable.
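One way to avoid scoring the email above is to gate VADER on a rough "is this English?" check. A crude sketch, assuming a small hypothetical set of common English function words and a hypothetical cutoff of 0.15; `english_ratio` and `looks_english` are illustrative names, and a dedicated language-identification library would be more robust:

```python
import re

# Hypothetical, deliberately small set of very common English function words.
ENGLISH_HINTS = {"the", "a", "an", "and", "is", "are", "was", "i", "you", "it",
                 "my", "to", "of", "for", "in", "on", "this", "that", "not", "with"}


def english_ratio(text: str) -> float:
    """Fraction of letter-only tokens that are common English function words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters, incl. accents
    if not tokens:
        return 0.0
    return sum(tok in ENGLISH_HINTS for tok in tokens) / len(tokens)


def looks_english(text: str, min_ratio: float = 0.15) -> bool:
    """Heuristic gate: below min_ratio (an assumed cutoff), skip VADER scoring."""
    return english_ratio(text) >= min_ratio


print(looks_english("I still haven't received my product. The service is terrible."))
print(looks_english("Hemos estado revisando los documentos de tus compañeros."))
```

The cutoff trades false rejections of very short English snippets against letting non-English text through; tune it on your own transcripts.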


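**Transcription.** Real calls start as audio. This helper transcribes a WAV file with the `SpeechRecognition` package (`pip install SpeechRecognition`) via `recognize_google`, which calls the Google Web Speech API and therefore needs network access.

```python
import speech_recognition as sr


def transcribe_audio(filename):
    """Transcribe a .wav audio file to text with the Google Web Speech API."""
    recognizer = sr.Recognizer()

    # Load the whole audio file into an AudioData object.
    with sr.AudioFile(filename) as source:
        audio_data = recognizer.record(source)

    try:
        # Returns the best transcription as a string (requires internet access).
        return recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        # The speech could not be understood: return empty text (nothing to score).
        return ""


# Test the function
text = transcribe_audio("data/raw/wav/audio1.wav")
```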

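The transcript below is raw ASR output, slips included (for example "friendship", which looks like a mistranscription), and such errors feed straight into the sentiment score.

```python
text = "we are looking at how to pronounce this word how do you say it correctly for reference this is the word of French origin and friendship is said as mayonnaise mayonnaise in English whether it is normally pronounced as mayonnaise mayonnaise"
scores = sid.polarity_scores(text)
print(scores)
```

```
{'neg': 0.0, 'neu': 0.932, 'pos': 0.068, 'compound': 0.4404}
```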




**Sentence-level analysis.** Sentiment shifts inside a call, so score per sentence.

| Reason                   | Method                           | Benefit                      |
| ------------------------ | -------------------------------- | ---------------------------- |
| Tone varies by utterance | Split into sentences, score each | Pinpoint troublesome moments |


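```python
# Sentence-by-sentence scoring
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import sent_tokenize  # needs the 'punkt' / 'punkt_tab' data

sid = SentimentIntensityAnalyzer()
transcript = "I still haven't received my product. The service is terrible. Thanks for the update."

# One (sentence, neg, neu, pos, compound) tuple per sentence
rows = []
for s in sent_tokenize(transcript):
    sc = sid.polarity_scores(s)
    rows.append((s, sc["neg"], sc["neu"], sc["pos"], sc["compound"]))

for r in rows:
    print(r)

# Score of the whole transcript, for comparison
total_scores = sid.polarity_scores(transcript)
print(total_scores)
```

```
("I still haven't received my product.", 0.0, 1.0, 0.0, 0.0)
('The service is terrible.', 0.508, 0.492, 0.0, -0.4767)
('Thanks for the update.', 0.0, 0.508, 0.492, 0.4404)
{'neg': 0.182, 'neu': 0.647, 'pos': 0.171, 'compound': -0.0516}
```

The whole-transcript `compound` of −0.0516 only just crosses the −0.05 cutoff, because the positive closing sentence offsets "The service is terrible." (−0.4767). Note also that "I still haven't received my product." scores 0.0: the core complaint contains no sentiment-bearing words, so VADER treats it as neutral.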
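To act on the per-sentence scores, pick out the most negative sentence and count how many cross the negative cutoff. A minimal sketch over the `(sentence, neg, neu, pos, compound)` tuples built above; `summarize_rows` is a hypothetical helper, and the sample rows are copied from the printed output:

```python
def summarize_rows(rows, neg_threshold=-0.05):
    """Return (most negative row, number of rows with compound <= neg_threshold).

    rows: iterable of (sentence, neg, neu, pos, compound) tuples.
    """
    rows = list(rows)
    if not rows:
        return None, 0
    worst = min(rows, key=lambda r: r[4])  # r[4] is the compound score
    n_negative = sum(1 for r in rows if r[4] <= neg_threshold)
    return worst, n_negative


# Rows copied from the sentence-level output above.
rows = [
    ("I still haven't received my product.", 0.0, 1.0, 0.0, 0.0),
    ("The service is terrible.", 0.508, 0.492, 0.0, -0.4767),
    ("Thanks for the update.", 0.0, 0.508, 0.492, 0.4404),
]
worst, n_negative = summarize_rows(rows)
print(worst[0], worst[4], n_negative)  # The service is terrible. -0.4767 1
```

Ranking on `compound` rather than `neg` means negation and emphasis rules are already folded into the ordering.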



**ASR → sentiment caveats.** Transcription quality affects scores.

| Issue                     | Effect on sentiment | Mitigation                                        |
| ------------------------- | ------------------- | ------------------------------------------------- |
| Missing negations (“not”) | Flips polarity      | Prefer higher-quality ASR; manual QA for outliers |
| Disfluencies/fillers      | Inflate neutrality  | Clean text or ignore fillers                      |
| Domain terms              | Unseen by lexicon   | Add custom lexicon or rules                       |
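For the "Disfluencies/fillers" row, a cleanup pass can strip standalone fillers before scoring. A sketch, assuming a small hypothetical filler list (extend it with whatever your ASR emits); word boundaries keep fillers like "um" from matching inside words such as "number":

```python
import re

# Hypothetical filler list; extend it with whatever your ASR emits.
FILLERS = ["um", "umm", "uh", "uhh", "erm", "hmm"]

# \b on both sides matches whole words only; ",?" also drops a trailing comma.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse the leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_fillers("Umm, the service is terrible"))
print(strip_fillers("um the number is uh twelve"))
```

For the "Domain terms" row, NLTK's analyzer keeps its word-to-valence mapping in the `sid.lexicon` dict, so `sid.lexicon.update({...})` is a common way to add domain terms; that attribute is an implementation detail rather than documented API, so check it against your NLTK version.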

**Pipeline.** Practical flow from audio to actionable sentiment.

| Stage      | Tool                                   | Output                             |
| ---------- | -------------------------------------- | ---------------------------------- |
| Transcribe | Your ASR (free or paid)                | Text or sentences                  |
| Score      | VADER on whole text and per sentence   | Compound and token proportions     |
| Act        | Thresholds on compound; flag negatives | Review segments with lowest scores |
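The three stages can be wired into a small function that scores each call and returns the ones to review, most negative first. A sketch under stated assumptions: `triage_calls` is hypothetical, calls arrive as an `{id: transcript}` dict, and the scorer is passed in so any function returning a VADER-style dict works (with NLTK, `sid.polarity_scores`). The stand-in scorer below returns compound values from the examples in this notebook so the code runs without NLTK data.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], Dict[str, float]]


def triage_calls(calls: Dict[str, str], score: Scorer,
                 neg_threshold: float = -0.05) -> List[Tuple[str, float]]:
    """Return (call_id, compound) pairs at or below neg_threshold, most negative first."""
    flagged = []
    for call_id, transcript in calls.items():
        compound = score(transcript)["compound"]
        if compound <= neg_threshold:
            flagged.append((call_id, compound))
    return sorted(flagged, key=lambda pair: pair[1])


# Stand-in scorer: compound values copied from the examples above (no NLTK needed).
_FAKE = {
    "I still haven't received my product. The service is terrible.": -0.4767,
    "I LOVE EVERYTHING about this! Best purchase ever!!! Amazing!!!": 0.9509,
    "I still haven't received my product. The service is terrible. Thanks for the update.": -0.0516,
}


def fake_score(text: str) -> Dict[str, float]:
    return {"compound": _FAKE[text]}


calls = {f"call-{i}": t for i, t in enumerate(_FAKE, start=1)}
print(triage_calls(calls, fake_score))
# [('call-1', -0.4767), ('call-3', -0.0516)]
```

With NLTK the call becomes `triage_calls(calls, sid.polarity_scores)`; flagged calls can then be split into sentences to find the lowest-scoring segments.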

**Heuristic thresholds.** Use simple rules for triage.

| Rule                      | Action           |
| ------------------------- | ---------------- |
| `compound ≤ −0.05`        | Flag as negative |
| `−0.05 < compound < 0.05` | Mark neutral     |
| `compound ≥ 0.05`         | Mark positive    |
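These rules translate directly into a labeling function. A minimal sketch; `label_compound` is a hypothetical name, and the ±0.05 cutoffs are the conventional defaults from the table above (a wider band may suit noisy ASR text):

```python
def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to 'negative', 'neutral', or 'positive'."""
    if compound <= -threshold:
        return "negative"
    if compound >= threshold:
        return "positive"
    return "neutral"


# Compound scores from the examples in this notebook.
for c in (-0.4767, -0.0516, 0.0, 0.1759, 0.9509):
    print(c, label_compound(c))
```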
