# Demonstration

- This notebook demonstrates key components of our Delta 8 analysis project.
- It shows how we preprocess tweets, extract keywords, analyze sentiment, and generate themes.
- These techniques can be applied to larger datasets for more comprehensive analysis.

## 1. Setup and Imports

In [1]:
# First, let's import the necessary libraries
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk import pos_tag
from nltk.sentiment import SentimentIntensityAnalyzer
from collections import Counter

# Download necessary NLTK data
# (newer NLTK releases may also require 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('vader_lexicon')

[nltk_data] Downloading package punkt to
[nltk_data]     /teamspace/studios/this_studio/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /teamspace/studios/this_studio/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /teamspace/studios/this_studio/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /teamspace/studios/this_studio/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

## 2. Sample Data

In [2]:
# Let's create some sample tweets for our demonstration
sample_tweets = [
    "Delta 8 THC offers a milder high compared to traditional marijuana.",
    "Just tried Delta 8 gummies and they helped with my anxiety!",
    "Is Delta 8 legal? Need to check the regulations in my state.",
    "Delta 8 products are becoming popular, but we need more research on long-term effects."
]

## 3. Text Preprocessing

In [3]:
def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text.lower())
    
    # Remove stop words
    stop_words = set(stopwords.words('english'))
    tokens = [token for token in tokens if token not in stop_words]
    
    # Keep only nouns, verbs, and adjectives
    tokens = [word for word, pos in pos_tag(tokens) if pos.startswith(('N', 'V', 'J'))]
    
    return tokens

## Let's preprocess our sample tweets

In [4]:
preprocessed_tweets = [preprocess_text(tweet) for tweet in sample_tweets]


## 4. Keyword Extraction

In [5]:
def extract_keywords(texts, top_n=5):
    all_words = []
    for text in texts:
        all_words.extend(text)
    return Counter(all_words).most_common(top_n)


## Extract top keywords from our sample

In [6]:
top_keywords = extract_keywords(preprocessed_tweets)

print("\nTop keywords:")
for keyword, count in top_keywords:
    print(f"{keyword}: {count}")


Top keywords:
delta: 3
thc: 1
offers: 1
milder: 1
high: 1


## 5. Sentiment Analysis

In [7]:
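# Create the analyzer once; rebuilding it on every call reloads the VADER lexicon
sia = SentimentIntensityAnalyzer()

def analyze_sentiment(text):
    # Returns a dict with 'neg', 'neu', 'pos', and 'compound' scores
    return sia.polarity_scores(text)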


In [8]:
# Analyze sentiment for each sample tweet
for tweet in sample_tweets:
    sentiment = analyze_sentiment(tweet)
    print(f"\nTweet: {tweet}")
    print(f"Sentiment: {sentiment}")



Tweet: Delta 8 THC offers a milder high compared to traditional marijuana.
Sentiment: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Tweet: Just tried Delta 8 gummies and they helped with my anxiety!
Sentiment: {'neg': 0.181, 'neu': 0.819, 'pos': 0.0, 'compound': -0.2481}

Tweet: Is Delta 8 legal? Need to check the regulations in my state.
Sentiment: {'neg': 0.0, 'neu': 0.87, 'pos': 0.13, 'compound': 0.128}

Tweet: Delta 8 products are becoming popular, but we need more research on long-term effects.
Sentiment: {'neg': 0.0, 'neu': 0.863, 'pos': 0.137, 'compound': 0.2263}


## 6. Theme Generation (Simplified)

In [9]:
def generate_themes(keywords):
    themes = {
        "Product": ["delta", "thc", "gummies"],
        "Effects": ["high", "anxiety"],
        "Legality": ["legal", "regulations"],
        "Research": ["effects", "research"]
    }
    
    keyword_themes = {}
    for keyword, _ in keywords:
        for theme, theme_keywords in themes.items():
            if keyword in theme_keywords:
                if theme not in keyword_themes:
                    keyword_themes[theme] = []
                keyword_themes[theme].append(keyword)
    
    return keyword_themes

## Generate themes from our top keywords

In [10]:
themes = generate_themes(top_keywords)

print("\nGenerated Themes:")
for theme, keywords in themes.items():
    print(f"{theme}: {', '.join(keywords)}")



Generated Themes:
Product: delta, thc
Effects: high
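
The nested loop in `generate_themes` rescans every theme list for each keyword. As a rough sketch of one alternative, assuming each keyword belongs to at most one theme, the theme table can be inverted once into a keyword-to-theme lookup. The names `THEME_TABLE`, `KEYWORD_TO_THEME`, `group_by_theme`, and `sample_keywords` are illustrative only, and the keyword counts are copied from the "Top keywords" output above.

```python
from collections import defaultdict

# Theme table copied from generate_themes above
THEME_TABLE = {
    "Product": ["delta", "thc", "gummies"],
    "Effects": ["high", "anxiety"],
    "Legality": ["legal", "regulations"],
    "Research": ["effects", "research"],
}

# Invert once: keyword -> theme (assumes a keyword appears in only one theme)
KEYWORD_TO_THEME = {
    keyword: theme
    for theme, theme_keywords in THEME_TABLE.items()
    for keyword in theme_keywords
}

def group_by_theme(keywords):
    """Group (keyword, count) pairs by theme, skipping unmapped keywords."""
    grouped = defaultdict(list)
    for keyword, _count in keywords:
        theme = KEYWORD_TO_THEME.get(keyword)
        if theme is not None:
            grouped[theme].append(keyword)
    return dict(grouped)

# Keyword counts taken from the "Top keywords" output earlier in the notebook
sample_keywords = [("delta", 3), ("thc", 1), ("offers", 1), ("milder", 1), ("high", 1)]
print(group_by_theme(sample_keywords))
```

For the five keywords above this yields the same grouping as the printed themes. If a keyword should ever map to several themes, the lookup would need to hold a list of themes per keyword instead.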


## 7. End-to-End Example

In [11]:
def analyze_tweet(tweet):
    preprocessed = preprocess_text(tweet)
    sentiment = analyze_sentiment(tweet)
    return preprocessed, sentiment

print("\nComplete Analysis of a Tweet:")
sample_tweet = "Delta 8 THC is gaining popularity, but we need more research on its long-term effects and legal status."
preprocessed, sentiment = analyze_tweet(sample_tweet)

print(f"Original Tweet: {sample_tweet}")
print(f"Preprocessed: {preprocessed}")
print(f"Sentiment: {sentiment}")


Complete Analysis of a Tweet:
Original Tweet: Delta 8 THC is gaining popularity, but we need more research on its long-term effects and legal status.
Preprocessed: ['delta', 'thc', 'gaining', 'popularity', 'need', 'research', 'long-term', 'effects', 'legal', 'status']
Sentiment: {'neg': 0.0, 'neu': 0.711, 'pos': 0.289, 'compound': 0.5719}


In [12]:
# analyze_tweet is already defined in the previous cell; reuse it on a second tweet

print("\nComplete Analysis of a Tweet:")
sample_tweet = "Delta 8 THC is harsh but it's the only thing I can buy."
preprocessed, sentiment = analyze_tweet(sample_tweet)

print(f"Original Tweet: {sample_tweet}")
print(f"Preprocessed: {preprocessed}")
print(f"Sentiment: {sentiment}")


Complete Analysis of a Tweet:
Original Tweet: Delta 8 THC is harsh but it's the only thing I can buy.
Preprocessed: ['delta', 'thc', 'harsh', 'thing', 'buy']
Sentiment: {'neg': 0.163, 'neu': 0.837, 'pos': 0.0, 'compound': -0.2382}


## Understanding Sentiment Scores

The sentiment scores come from VADER (Valence Aware Dictionary and sEntiment Reasoner), a sentiment analysis tool tuned for social media text. VADER combines a sentiment lexicon with rule-based adjustments to produce four scores: 'neg', 'neu', 'pos', and 'compound'. The 'neg', 'neu', and 'pos' scores give the proportions of the text rated negative, neutral, and positive respectively, and they sum to 1 (up to rounding). Together they describe the sentiment composition of the text.
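
As a quick sanity check on the proportion scores, the sketch below re-adds the 'neg', 'neu', and 'pos' values printed earlier in this notebook. The helper `proportions_sum_to_one` and its tolerance are assumptions made for illustration; the tolerance only allows for the three-decimal rounding seen in the outputs.

```python
import math

# (neg, neu, pos) values copied from the sentiment outputs shown earlier
reported_scores = [
    {"neg": 0.0, "neu": 1.0, "pos": 0.0},
    {"neg": 0.181, "neu": 0.819, "pos": 0.0},
    {"neg": 0.0, "neu": 0.87, "pos": 0.13},
    {"neg": 0.0, "neu": 0.863, "pos": 0.137},
]

def proportions_sum_to_one(scores, tolerance=0.005):
    """Return True if neg + neu + pos equals 1 within a rounding tolerance."""
    total = scores["neg"] + scores["neu"] + scores["pos"]
    return math.isclose(total, 1.0, abs_tol=tolerance)

for scores in reported_scores:
    print(scores, proportions_sum_to_one(scores))
```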

The 'compound' score is a unified sentiment measure. VADER sums the valence scores of the words in the text that appear in its lexicon, adjusts the sum with rules (for example for negation, intensifiers, and punctuation emphasis), and normalizes the result to lie between -1 (most extreme negative) and +1 (most extreme positive). It is usually the most useful score for judging overall sentiment. By convention, a compound score ≤ -0.05 is treated as negative, a score ≥ 0.05 as positive, and anything in between as neutral. These thresholds can be adjusted to suit your data. Because the scores are driven by individual lexicon words, they can diverge from a human reading: the tweet about gummies helping with anxiety scores as negative (compound -0.2481) even though it describes a benefit.
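
As a minimal sketch of the normalization and thresholding described above, the snippet below squashes a raw valence sum and labels a compound score. The formula `x / sqrt(x**2 + alpha)` with `alpha = 15` reflects our understanding of the reference VADER implementation and should be treated as an assumption; `normalize_valence` and `label_compound` are hypothetical helper names, not part of NLTK.

```python
import math

def normalize_valence(raw_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

def label_compound(compound, threshold=0.05):
    """Map a compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores copied from the outputs earlier in this notebook
for compound in (0.0, -0.2481, 0.128, 0.5719):
    print(compound, label_compound(compound))
```

Keeping the threshold as a parameter makes it easy to widen the neutral band for noisy data, for example `label_compound(score, threshold=0.2)`.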