# Indian Stock Market Sentiment Analysis from Reddit

This notebook builds a sentiment-analysis dataset from the r/indianstocks subreddit for a specific Indian stock ticker.

## Key Features:
- **Stock-Specific Analysis**: Analyzes Hinglish comments for specific Indian stocks (RELIANCE, TCS, HDFC, etc.)
- **Dual Sentiment Analysis**: Uses both TextBlob and VADER for comprehensive sentiment scoring
- **Stock Price Data**: Pulls one year of daily price history for the selected ticker via Yahoo Finance (`yfinance`)
- **Configurable Parameters**: Easy to change target stock, time period, and analysis depth
- **CSV Export**: Automatically produces two CSV files for further analysis

## Main Parameters to Customize:
- **selectedTickerSymbol**: The NSE stock ticker you want to explore (e.g., 'RELIANCE.NS')
- **howmanysubmissions**: Number of submissions to analyze (takes ~5 seconds each)
- **time_period**: Period for Reddit's "top" listing ('week', 'month', 'year', or 'all')
- **min_ticker_mentions**: Minimum mentions required to include a post

## Output Files:
1. `comment_analysis.csv` - Sentiment analysis results
2. `stockticker_history.csv` - Historical stock price data

Perfect for analyzing retail investor sentiment on Indian stocks! 📈

## 1. Environment Setup and Library Installation

First, install the required packages if you are running the notebook for the first time: `yfinance`, `pandas`, `praw`, `nltk`, `python-dotenv` and `textblob`.
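A minimal install cell, sketched under the assumption that `pip` is available in the notebook's Python environment. The package list mirrors the imports in the next cell (note that `python-dotenv` is imported as `dotenv`); the helper names `missing_packages` and `install_missing` are hypothetical. Nothing is installed unless `run_install=True`, so running the cell as-is only reports what is missing.

```python
import importlib.util
import subprocess
import sys

# PyPI package name -> module name used in the import cell
REQUIRED_PACKAGES = {
    "yfinance": "yfinance",
    "pandas": "pandas",
    "praw": "praw",
    "nltk": "nltk",
    "python-dotenv": "dotenv",
    "textblob": "textblob",
}


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the PyPI names whose module cannot be located in this environment."""
    return [pkg for pkg, module in packages.items()
            if importlib.util.find_spec(module) is None]


def install_missing(packages=REQUIRED_PACKAGES, run_install=False):
    """Install missing packages with this interpreter's pip, only when run_install is True."""
    missing = missing_packages(packages)
    if missing and run_install:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing


print("Missing packages:", install_missing(run_install=False))
```

Calling `install_missing(run_install=True)` installs into the same interpreter the kernel runs on, which avoids the common mismatch between a shell `pip` and the notebook's Python.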

## 2. Import Required Libraries

Import all the necessary libraries for Reddit API access, sentiment analysis, and data processing:

In [3]:
# Import all required libraries
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
import praw
import nltk
import os
import re
from dotenv import load_dotenv
from textblob import TextBlob
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings('ignore')

# Running total of ticker mentions, updated by count_ticker_mentions().
# Module-level code needs no `global` statement; the function declares it.
selectedTickerSymbolCount = 0

print("Ready for Indian stock market sentiment analysis!")

Ready for Indian stock market sentiment analysis!


## 3. Load Environment Configuration

Load Reddit API credentials from the .env file and set up configuration:

In [4]:
# Load environment variables from .env file
load_dotenv()

# Get Reddit API credentials from environment
reddit_client_id = os.getenv('REDDIT_CLIENT_ID')
reddit_client_secret = os.getenv('REDDIT_CLIENT_SECRET')
reddit_user_agent = os.getenv('REDDIT_USER_AGENT')

# Validate credentials
if not all([reddit_client_id, reddit_client_secret, reddit_user_agent]):
    print("Error: Reddit API credentials not found in .env file!")
    print("Please ensure your .env file contains:")
    print("REDDIT_CLIENT_ID=your_client_id")
    print("REDDIT_CLIENT_SECRET=your_client_secret")
    print("REDDIT_USER_AGENT=your_user_agent")
else:
    print("Reddit API credentials loaded successfully!")
    print(f" Client ID: {reddit_client_id[:8]}...")
    print(f" User Agent: {reddit_user_agent}")

Reddit API credentials loaded successfully!
 Client ID: iaho6cAo...
 User Agent: Common_Attitude_8079
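The credential check above prints a generic error when any value is missing. A small variant, sketched with a hypothetical `missing_env_vars` helper, names exactly which variables are unset or empty. It reads `os.environ`, which `load_dotenv()` populates from the `.env` file.

```python
import os

# The three variables the credential check reads from .env
REQUIRED_ENV_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from .env:", ", ".join(missing))
else:
    print("All Reddit credentials are set.")
```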


## 4. Initialize Reddit API Connection

Create a Reddit API client using PRAW:

In [5]:
# Create Reddit API client
try:
    reddit = praw.Reddit(
        client_id=reddit_client_id,
        client_secret=reddit_client_secret,
        user_agent=reddit_user_agent
    )
    
    # Test the connection
    test_sub = reddit.subreddit('indianstocks')
    print(f"Successfully connected to Reddit API!")
    print(f"Connected to r/indianstocks - {test_sub.display_name}")
    print(f"Subreddit subscribers: {test_sub.subscribers:,}")
    
except Exception as e:
    print(f" Error connecting to Reddit API: {e}")
    print("Please check your credentials in the .env file")

Version 7.7.1 of praw is outdated. Version 7.8.1 was released Friday October 25, 2024.


Successfully connected to Reddit API!
Connected to r/indianstocks - indianstocks
Subreddit subscribers: 103,528


## 5. Configure Analysis Parameters

Set up the main parameters for your analysis. **CUSTOMIZE THESE VALUES**:

In [None]:
# ===== MAIN CONFIGURATION PARAMETERS =====
# Change these values to customize your analysis

# SELECT YOUR STOCK (NSE symbols - add .NS for Yahoo Finance)
selectedTickerSymbol = 'ICICIBANK.NS'  # Now analyzing ICICI Bank

# TARGET SUBREDDIT (indianstocks is perfect for Indian stocks)
selectedsubreddit = 'indianstocks'

# NUMBER OF SUBMISSIONS TO ANALYZE
howmanysubmissions = 200  # Increased for comprehensive all-time analysis

# TIME PERIOD FOR POSTS
time_period = 'all'  # All-time data for maximum historical coverage

# MINIMUM TICKER MENTIONS (filter out posts with fewer mentions)
min_ticker_mentions = 1

# ANALYSIS SETTINGS
analyze_comments = True  # Set to False to analyze only post titles
max_comments_per_post = 100  # Increased for more Hinglish content
hinglish_only = True  # Only process Hinglish content
min_hinglish_score = 2  # Minimum Hinglish words required

# Indian stock tickers and their common name variations
stock_variations = {
    'RELIANCE.NS': ['reliance', 'ril', 'mukesh ambani'],
    'TCS.NS': ['tcs', 'tata consultancy', 'tata consulting'],
    'HDFCBANK.NS': ['hdfc bank', 'hdfc', 'hdfcbank'],
    'INFY.NS': ['infosys', 'infy'],
    'ICICIBANK.NS': ['icici bank', 'icici', 'icicibank'],
    'SBIN.NS': ['sbi', 'state bank', 'sbin'],
    'BHARTIARTL.NS': ['airtel', 'bharti airtel', 'bharti'],
    'ADANIENT.NS': ['adani', 'adani enterprises', 'gautam adani']
}

print("ANALYSIS CONFIGURATION")
print("=" * 40)
print(f"Target Stock: {selectedTickerSymbol}")
print(f"Subreddit: r/{selectedsubreddit}")
print(f"Submissions to analyze: {howmanysubmissions}")
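period_notes = {'week': 'past week', 'month': 'past month', 'year': 'past year'}
print(f"Time period: {time_period} ({period_notes.get(time_period, 'more than 1 year')})")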
print(f"Analyze comments: {analyze_comments}")
print(f"Min ticker mentions: {min_ticker_mentions}")
print(f"Hinglish only: {hinglish_only}")
print(f"Min Hinglish score: {min_hinglish_score}")
print("=" * 40)

ANALYSIS CONFIGURATION
Target Stock: ADANIENT.NS
Subreddit: r/indianstocks
Submissions to analyze: 200
Time period: all (more than 1 year)
Analyze comments: True
Min ticker mentions: 1
Hinglish only: True
Min Hinglish score: 2
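The configuration expects NSE symbols with a `.NS` suffix for Yahoo Finance, and `count_ticker_mentions` strips that suffix to get a search term. A hypothetical helper pair that makes both conversions explicit might look like this; it assumes plain NSE symbols and does not check that the symbol actually exists on Yahoo Finance.

```python
def to_yahoo_symbol(nse_symbol: str) -> str:
    """Upper-case an NSE symbol and append Yahoo Finance's '.NS' suffix once."""
    symbol = nse_symbol.strip().upper()
    return symbol if symbol.endswith(".NS") else f"{symbol}.NS"


def to_search_term(yahoo_symbol: str) -> str:
    """Drop a trailing '.NS' suffix and lower-case the symbol for text matching."""
    symbol = yahoo_symbol.strip()
    if symbol.upper().endswith(".NS"):
        symbol = symbol[:-3]
    return symbol.lower()


print(to_yahoo_symbol("adanient"))      # ADANIENT.NS
print(to_search_term("ICICIBANK.NS"))   # icicibank
```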


## 6. Define Sentiment Analysis Functions

Create functions for sentiment analysis using both TextBlob and VADER:

In [7]:
# Download required NLTK data first
import nltk
print("Downloading NLTK data...")
try:
    nltk.download('vader_lexicon', quiet=True)
    nltk.download('punkt', quiet=True)
    print("NLTK data downloaded successfully!")
except Exception as e:
    print(f"NLTK data download failed: {e}")

# Initialize VADER sentiment analyzer
try:
    sia = SentimentIntensityAnalyzer()
    print("VADER sentiment analyzer initialized!")
except Exception as e:
    print(f"Failed to initialize VADER: {e}")
    sia = None

def text_blob_sentiment(text, sub_entries_textblob):
    """
    Sentiment analysis using TextBlob
    Returns: 'Positive', 'Negative', or 'Neutral'
    """
    try:
        analysis = TextBlob(str(text))
        polarity = analysis.sentiment.polarity
        
        if polarity > 0.1:  # Polarity above +0.1 counts as positive
            sub_entries_textblob['positive'] += 1
            return 'Positive'
        elif polarity < -0.1:  # Polarity below -0.1 counts as negative
            sub_entries_textblob['negative'] += 1
            return 'Negative'
        else:
            sub_entries_textblob['neutral'] += 1
            return 'Neutral'
    except Exception:
        sub_entries_textblob['neutral'] += 1
        return 'Neutral'

def nltk_sentiment(text, sub_entries_nltk):
    """
    Sentiment analysis using VADER (NLTK)
    Returns: 'Positive', 'Negative', or 'Neutral'
    """
    try:
        if sia is None:
            sub_entries_nltk['neutral'] += 1
            return 'Neutral'
            
        vs = sia.polarity_scores(str(text))
        compound_score = vs['compound']
        
        if compound_score >= 0.05:  # Positive sentiment
            sub_entries_nltk['positive'] += 1
            return 'Positive'
        elif compound_score <= -0.05:  # Negative sentiment
            sub_entries_nltk['negative'] += 1
            return 'Negative'
        else:  # Neutral sentiment
            sub_entries_nltk['neutral'] += 1
            return 'Neutral'
    except Exception:
        sub_entries_nltk['neutral'] += 1
        return 'Neutral'

def is_hinglish(text, threshold=None):
    """
    Hinglish detection with a scoring system.
    The score is the number of distinct list words found in the text
    (substring match, so short words can also match inside longer words).
    Returns (is_hinglish_boolean, hinglish_score); the boolean is True when the
    score reaches `threshold`, which defaults to the configured min_hinglish_score.
    """
    if threshold is None:
        threshold = min_hinglish_score

    hinglish_words = [
        # Basic Hindi words
        'hai', 'hain', 'kar', 'kya', 'aur', 'bhi', 'main', 'yeh', 'woh', 'jo',
        'abhi', 'phir', 'bhai', 'yaar', 'achha', 'bura', 'sahi', 'galat',
        # Financial Hindi terms
        'paisa', 'lakh', 'crore', 'khareed', 'bech', 'munafa', 'nuksan',
        'gira', 'gaya', 'jayega', 'badhega', 'giregi', 'upar', 'niche',
        # Common Hinglish expressions
        'kaise', 'kahan', 'kab', 'kyun', 'koi', 'sabse', 'zyada', 'kam',
        'bhot', 'bahut', 'thoda', 'pura', 'sab', 'kuch', 'aise', 'waise',
        # Market specific
        'stock', 'share', 'market', 'trade', 'buy', 'sell', 'profit', 'loss',
        'rupee', 'rs', 'inr', 'portfolio', 'invest', 'investment',
        # Sentiment words
        'mast', 'zabardast', 'bakwas', 'bekar', 'kamaal', 'shandar'
    ]
    
    if not text:
        return False, 0
        
    text_lower = str(text).lower()
    hinglish_score = sum(1 for word in hinglish_words if word in text_lower)
    
    return hinglish_score >= threshold, hinglish_score

def get_sentiment_label(textblob_sentiment, vader_sentiment):
    """
    Combine TextBlob and VADER to create final sentiment label
    Returns: 'bullish', 'bearish', or 'neutral'
    """
    # Create scoring system
    score = 0
    
    # VADER scoring
    if vader_sentiment == 'Positive':
        score += 1
    elif vader_sentiment == 'Negative':
        score -= 1
    
    # TextBlob scoring  
    if textblob_sentiment == 'Positive':
        score += 1
    elif textblob_sentiment == 'Negative':
        score -= 1
    
    # Final label based on combined score
    if score > 0:
        return 'bullish'
    elif score < 0:
        return 'bearish'
    else:
        return 'neutral'

print("Sentiment analysis functions defined!")
print("Using TextBlob and VADER for comprehensive sentiment scoring")
print("Enhanced Hinglish detection with scoring system")
print("Added combined sentiment labeling (bullish/bearish/neutral)")

Downloading NLTK data...
NLTK data downloaded successfully!
VADER sentiment analyzer initialized!
Sentiment analysis functions defined!
Using TextBlob and VADER for comprehensive sentiment scoring
Enhanced Hinglish detection with scoring system
Added combined sentiment labeling (bullish/bearish/neutral)
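`is_hinglish` matches list words as substrings, so short entries fire inside unrelated English words (`'rs'` inside "yours", `'main'` inside "maintain", `'hai'` inside "chair", `'jo'` inside "enjoy"), `'invest'` and `'investment'` both score on the single word "investment", and the English market terms in the list count toward the score as well. A stricter variant, sketched under the assumption that whole-word matching is the intent, tokenizes the text first. The word set below is an abbreviated, hypothetical subset of the notebook's list, and `hinglish_score_whole_words` is a hypothetical name.

```python
import re

# Abbreviated, hypothetical subset of the notebook's hinglish_words list
HINGLISH_WORDS = {
    "hai", "hain", "kar", "kya", "aur", "bhi", "main", "yeh", "bhai", "yaar",
    "paisa", "lakh", "crore", "gira", "upar", "niche", "bahut", "bhot",
    "mast", "bakwas", "bekar", "rs", "jo",
}

TOKEN_RE = re.compile(r"[a-z]+")


def hinglish_score_whole_words(text, words=HINGLISH_WORDS, threshold=2):
    """Count distinct list words that appear as whole tokens; return (flag, score)."""
    if not text:
        return False, 0
    tokens = set(TOKEN_RE.findall(str(text).lower()))
    score = len(tokens & words)
    return score >= threshold, score


print(hinglish_score_whole_words("I enjoy my chair, yours is fine"))  # (False, 0)
print(hinglish_score_whole_words("Bhai yeh stock upar jayega"))       # (True, 3)
```

On "I enjoy my chair, yours is fine", the substring version scores 3 (`'jo'`, `'hai'`, `'rs'`) and flags the sentence as Hinglish; the token version scores 0. The trade-off is that inflected forms not in the list (for example "stocks" versus "stock") no longer match.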


## 7. Implement Comment Processing Functions

Functions to recursively process Reddit comments and count ticker mentions:

In [8]:
def count_ticker_mentions(text, ticker_symbol):
    """
    Count mentions of the ticker symbol and its variations in text
    """
    global selectedTickerSymbolCount
    
    if not text:
        return 0
    
    text_lower = str(text).lower()
    count = 0
    
    # Get the base ticker (remove .NS suffix for searching)
    base_ticker = ticker_symbol.replace('.NS', '').lower()
    
    # Combine the base ticker with any known name variations; using a set means a
    # term listed in both places (e.g. 'tcs' for TCS.NS) is only counted once
    search_terms = {base_ticker}
    search_terms.update(v.lower() for v in stock_variations.get(ticker_symbol, []))
    
    # Count substring occurrences of each distinct term
    for term in search_terms:
        count += text_lower.count(term)
    
    selectedTickerSymbolCount += count
    return count

def process_comment_replies(top_level_comment, sub_entries_textblob, sub_entries_nltk, ticker_symbol):
    """
    Recursively process comment replies
    """
    try:
        if hasattr(top_level_comment, 'replies') and len(top_level_comment.replies) > 0:
            for comment in top_level_comment.replies:
                try:
                    if hasattr(comment, 'body') and comment.body not in ['[deleted]', '[removed]']:
                        # Analyze sentiment
                        text_blob_sentiment(comment.body, sub_entries_textblob)
                        nltk_sentiment(comment.body, sub_entries_nltk)
                        
                        # Count ticker mentions
                        count_ticker_mentions(comment.body, ticker_symbol)
                        
                        # Process nested replies
                        process_comment_replies(comment, sub_entries_textblob, sub_entries_nltk, ticker_symbol)
                        
                except Exception as e:
                    continue
    except Exception as e:
        pass

def clean_text(text):
    """
    Clean text for better analysis
    """
    if not text:
        return ""
    
    # Remove URLs
    text = re.sub(r'http\S+|www\S+|https\S+', '', str(text), flags=re.MULTILINE)
    
    # Remove excessive whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

print("Comment processing functions defined!")
print("Ready to recursively analyze Reddit comment threads")
print("Ticker mention counting with variations included")

Comment processing functions defined!
Ready to recursively analyze Reddit comment threads
Ticker mention counting with variations included
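`count_ticker_mentions` counts substrings with `str.count`, so overlapping terms add up (for ADANIENT.NS, "Gautam Adani" counts once for `'gautam adani'` and again for `'adani'`), and short terms match inside other words (`'ril'` inside "april"). A whole-word alternative, sketched under the assumption that each mention should count once, puts the longer terms first in a regex alternation so the longest match is consumed before a shorter one can fire. `count_mentions_whole_words` is a hypothetical name, and unlike the notebook's function it does not update the global counter.

```python
import re


def count_mentions_whole_words(text, terms):
    """Count non-overlapping whole-word mentions of any term, preferring longer terms."""
    if not text or not terms:
        return 0
    # Longest alternatives first, so 'gautam adani' is consumed before 'adani' can match
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return len(re.findall(pattern, str(text).lower()))


adani_terms = ["adanient", "adani", "adani enterprises", "gautam adani"]
print(count_mentions_whole_words("Gautam Adani said Adani Enterprises will grow", adani_terms))  # 2
```

For that sentence the substring approach reports 4 (two `'adani'` hits plus one each for the longer phrases). Note that word boundaries also stop `'adani'` from matching run-together forms such as "adanigreen"; add those spellings to the term list if they matter.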


## 8. Fetch Stock Market Data

Use Yahoo Finance to get historical stock data for the selected ticker:

In [23]:
# Fetch stock market data using Yahoo Finance
print(f"Fetching stock data for {selectedTickerSymbol}...")

try:
    # Create ticker object
    selected_ticker = yf.Ticker(selectedTickerSymbol)
    
    # Get historical data (last 1 year for context)
    stock_history = selected_ticker.history(period="1y")
    
    # Get basic info
    stock_info = selected_ticker.info
    
    print("Stock data fetched successfully!")
    print("=" * 50)
    print(f"Company: {stock_info.get('longName', 'N/A')}")
    print(f"Sector: {stock_info.get('sector', 'N/A')}")
    print(f"Exchange: {stock_info.get('exchange', 'N/A')}")
    print(f"Market Cap: Rs{stock_info.get('marketCap', 0):,}")
    print("=" * 50)
    
    # Display recent price data
    if not stock_history.empty:
        latest_price = stock_history['Close'].iloc[-1]
        price_change = stock_history['Close'].iloc[-1] - stock_history['Close'].iloc[-2]
        price_change_pct = (price_change / stock_history['Close'].iloc[-2]) * 100
        
        print(f"Latest Price: Rs{latest_price:.2f}")
        print(f"Price Change: Rs{price_change:.2f} ({price_change_pct:+.2f}%)")
        print(f"Data Range: {stock_history.index[0].date()} to {stock_history.index[-1].date()}")
        print(f"Total Records: {len(stock_history)}")
    else:
        print(" No stock price data available")
        
except Exception as e:
    print(f" Error fetching stock data: {e}")
    stock_history = pd.DataFrame()
    stock_info = {}

Fetching stock data for ADANIENT.NS...
Stock data fetched successfully!
Company: Adani Enterprises Limited
Sector: Energy
Exchange: NSI
Market Cap: Rs2,914,012,364,800
Latest Price: Rs2524.10
Price Change: Rs-18.10 (-0.71%)
Data Range: 2024-10-08 to 2025-10-08
Total Records: 252
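The overview lists `stockticker_history.csv` as an output, but none of the cells in this section write `stock_history` to disk. One way to do it, sketched with a hypothetical `price_history_for_csv` helper: it turns the date index that `yfinance` returns into a plain `YYYY-MM-DD` column, the same format used for `Post_Date` and `Comment_Date` in the comment dataset.

```python
import pandas as pd


def price_history_for_csv(history: pd.DataFrame, path=None) -> pd.DataFrame:
    """Turn a DatetimeIndex into a plain 'Date' column; write a CSV if path is given."""
    out = history.reset_index()
    date_col = out.columns[0]  # the former index ('Date' for yfinance daily history)
    out[date_col] = pd.to_datetime(out[date_col]).dt.strftime("%Y-%m-%d")
    out = out.rename(columns={date_col: "Date"})
    if path is not None:
        out.to_csv(path, index=False)
    return out


# Assumed usage in this notebook:
# price_history_for_csv(stock_history, "stockticker_history.csv")
```

Formatting the date as a string drops the exchange time zone that `yfinance` attaches, which is usually what you want for daily bars.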


## 9. Process Reddit Submissions and Extract Individual Comments

Main analysis loop: process Reddit posts and extract individual comments (not just post titles) for sentiment analysis:

In [24]:
# Extract individual comments from Reddit posts for detailed sentiment analysis
print(f"Starting comment extraction from r/{selectedsubreddit} for {selectedTickerSymbol}")
print("Extracting individual Hinglish comments with sentiment labels")
print("=" * 60)

# Get Reddit submissions based on time period  
try:
    # PRAW's top() takes the period as time_filter; unrecognised values fall back to 'all'
    period = time_period if time_period in ('week', 'month', 'year') else 'all'
    submissions = reddit.subreddit(selectedsubreddit).top(time_filter=period, limit=howmanysubmissions)
    
    # Initialize comments dataframe
    comments_df = pd.DataFrame()
    
    # Analysis counters
    submission_counter = 1
    total_comments_extracted = 0
    hinglish_comments_found = 0
    
    print(f"Processing {howmanysubmissions} posts to extract individual comments...")
    print(" This may take several minutes...\n")
    
    # Process each submission
    for submission in submissions:
        try:
            # Clean title and get basic info
            clean_title = clean_text(submission.title)
            post_date = datetime.fromtimestamp(submission.created_utc)
            
            print(f"Post {submission_counter}: {clean_title[:60]}...")
            submission_counter += 1
            
            # Reset the running mention counter, then check whether the post title mentions our ticker
            selectedTickerSymbolCount = 0
            post_ticker_mentions = count_ticker_mentions(clean_title, selectedTickerSymbol)
            
            # Process comments from this post
            try:
                submission.comments.replace_more(limit=0)
                post_comments_processed = 0
                
                for comment in submission.comments[:max_comments_per_post]:
                    try:
                        if hasattr(comment, 'body') and comment.body not in ['[deleted]', '[removed]']:
                            clean_comment = clean_text(comment.body)
                            
                            # Skip very short comments
                            if len(clean_comment.split()) < 3:
                                continue
                            
                            # Check for Hinglish in this comment
                            is_hinglish_comment, hinglish_score = is_hinglish(clean_comment)
                            
                            # Check for ticker mentions in comment
                            selectedTickerSymbolCount = 0
                            comment_ticker_mentions = count_ticker_mentions(clean_comment, selectedTickerSymbol)
                            
                            # Keep the comment only if it passes the Hinglish filter (when hinglish_only is set)
                            # AND the comment or its parent post mentions the ticker
                            should_include = True
                            if hinglish_only:
                                should_include = is_hinglish_comment
                            
                            if should_include and (comment_ticker_mentions > 0 or post_ticker_mentions > 0):
                                # Analyze sentiment for this comment
                                comment_textblob = {'negative': 0, 'positive': 0, 'neutral': 0}
                                comment_nltk = {'negative': 0, 'positive': 0, 'neutral': 0}
                                
                                tb_sentiment = text_blob_sentiment(clean_comment, comment_textblob)
                                vader_sentiment = nltk_sentiment(clean_comment, comment_nltk)
                                combined_sentiment = get_sentiment_label(tb_sentiment, vader_sentiment)
                                
                                # Create comment record
                                comment_record = {
                                    'Post_Title': clean_title,
                                    'Post_ID': submission.id,
                                    'Post_Date': post_date.strftime('%Y-%m-%d'),
                                    'Post_Score': submission.score,
                                    'Comment_Text': clean_comment,
                                    'Comment_ID': comment.id,
                                    'Comment_Author': str(comment.author) if comment.author else '[deleted]',
                                    'Comment_Score': comment.score,
                                    'Comment_Date': datetime.fromtimestamp(comment.created_utc).strftime('%Y-%m-%d'),
                                    'Ticker': selectedTickerSymbol,
                                    'Ticker_Mentions_In_Comment': comment_ticker_mentions,
                                    'Ticker_Mentions_In_Post': post_ticker_mentions,
                                    'Is_Hinglish': is_hinglish_comment,
                                    'Hinglish_Score': hinglish_score,
                                    'VADER_Sentiment': vader_sentiment,
                                    'TextBlob_Sentiment': tb_sentiment,
                                    'Combined_Sentiment': combined_sentiment,
                                    # Raw scores behind the labels above
                                    'VADER_Compound': sia.polarity_scores(clean_comment)['compound'] if sia is not None else 0.0,
                                    'TextBlob_Polarity': TextBlob(clean_comment).sentiment.polarity
                                }
                                
                                # Append to comments dataframe
                                comments_df = pd.concat([comments_df, pd.DataFrame([comment_record])], ignore_index=True)
                                
                                total_comments_extracted += 1
                                if is_hinglish_comment:
                                    hinglish_comments_found += 1
                                
                                post_comments_processed += 1
                                
                    except Exception as e:
                        continue
                
                print(f"   Extracted {post_comments_processed} relevant comments")
                
            except Exception as e:
                print(f"   ⚠️ Error processing comments: {e}")
            
            print()  # Empty line for readability
            
        except Exception as e:
            print(f"    Error processing submission: {e}")
            continue
    
    print("Comment extraction completed!")
    print("=" * 60)
    print(f"COMMENT EXTRACTION SUMMARY:")
    print(f"    Total posts processed: {submission_counter - 1}")
    print(f"    Total comments extracted: {total_comments_extracted}")
    print(f"   Hinglish comments found: {hinglish_comments_found}")
    print(f"    Final comments dataset size: {len(comments_df)} records")
    
    # Show sentiment breakdown if we have data
    if not comments_df.empty:
        sentiment_counts = comments_df['Combined_Sentiment'].value_counts()
        print(f"\n📊 COMMENT SENTIMENT BREAKDOWN:")
        for sentiment, count in sentiment_counts.items():
            print(f"    {sentiment.title()}: {count} ({count/len(comments_df)*100:.1f}%)")
        
        hinglish_percentage = (comments_df['Is_Hinglish'].sum() / len(comments_df)) * 100
        print(f"\n🇮🇳 Hinglish Content: {hinglish_percentage:.1f}% of extracted comments")
    
except Exception as e:
    print(f" Error during comment extraction: {e}")
    comments_df = pd.DataFrame()

Starting comment extraction from r/indianstocks for ADANIENT.NS
Extracting individual Hinglish comments with sentiment labels
Processing 200 posts to extract individual comments...
 This may take several minutes...

Post 1: This is what real ball looks like.....
   Extracted 0 relevant comments

Post 2: Telegram Trader's...
   Extracted 0 relevant comments

Post 3: Never forget...
   Extracted 0 relevant comments

Post 4: I didn’t touch my stocks for 4 years and this is the result…...
   Extracted 0 relevant comments

Post 5: OMG Now What 😱...
   Extracted 0 relevant comments

Post 6: Indians are getting out of poverty 🇮🇳👍...
   Extracted 0 relevant comments
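The extraction cell grows `comments_df` with `pd.concat` once per accepted comment, which copies the whole frame every time. A common pandas alternative, sketched here with made-up example records, collects plain dicts in a list and builds the DataFrame once after the loop. The column names mirror a few of `comment_record`'s keys; `example_comments` and `example_df` are hypothetical names chosen so the sketch does not overwrite the notebook's `comments_df`.

```python
import pandas as pd

# Hypothetical stand-ins for the comment_record dicts built inside the loop
example_comments = [
    ("abc1", "Bhai adani upar jayega", "bullish"),
    ("abc2", "Adani gira bahut", "bearish"),
    ("abc3", "Sell adani ent", "neutral"),
]

records = []  # accumulate dicts; no DataFrame copies happen inside the loop
for comment_id, text, label in example_comments:
    records.append({
        "Comment_ID": comment_id,
        "Comment_Text": text,
        "Combined_Sentiment": label,
    })

# Build once; passing columns keeps the schema even when no comment qualified
columns = ["Comment_ID", "Comment_Text", "Combined_Sentiment"]
example_df = pd.DataFrame(records, columns=columns)
print(example_df["Combined_Sentiment"].value_counts(normalize=True).round(3))
```

With `normalize=True`, `value_counts` returns shares directly, which matches the percentage breakdown the extraction cell prints.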

## 10. Display and Export Comments Data

Show sample comments and export the individual comments dataset:

In [25]:
# Display and export the extracted comments
if not comments_df.empty:
    print("INDIVIDUAL COMMENTS ANALYSIS RESULTS")
    print("=" * 60)
    
    # Show sample comments by sentiment
    print("SAMPLE COMMENTS BY SENTIMENT:")
    print("-" * 40)
    
    for sentiment in ['bullish', 'bearish', 'neutral']:
        sentiment_comments = comments_df[comments_df['Combined_Sentiment'] == sentiment]
        if not sentiment_comments.empty:
            print(f"\n{sentiment.upper()} Comments ({len(sentiment_comments)} total):")
            for idx, row in sentiment_comments.head(2).iterrows():
                print(f"  \"{row['Comment_Text'][:100]}...\"")
                print(f"     {row['Comment_Date']} | Score: {row['Comment_Score']} | Hinglish: {row['Is_Hinglish']}")
    
    # Show top Hinglish comments
    hinglish_comments = comments_df[comments_df['Is_Hinglish'] == True]
    if not hinglish_comments.empty:
        print(f"\nTOP HINGLISH COMMENTS (Showing 3 examples):")
        print("-" * 40)
        top_hinglish = hinglish_comments.nlargest(3, 'Hinglish_Score')
        for idx, row in top_hinglish.iterrows():
            print(f"\"{row['Comment_Text'][:80]}...\"")
            print(f"   Sentiment: {row['Combined_Sentiment']} | Score: {row['Hinglish_Score']} | Date: {row['Comment_Date']}")
            print()
    
    # Display dataset info
    print(f"COMMENTS DATASET PREVIEW:")
    print("=" * 60)
    display_cols = ['Comment_Text', 'Combined_Sentiment', 'Is_Hinglish', 'Hinglish_Score', 'Comment_Date']
    preview_df = comments_df[display_cols].head(3).copy()
    # Truncate comment text for display
    preview_df['Comment_Text'] = preview_df['Comment_Text'].str[:50] + '...'
    print(preview_df.to_string(index=False))
    
    # Export comments to CSV
    print(f"\n💾 EXPORTING COMMENTS DATA:")
    print("=" * 30)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base_ticker = selectedTickerSymbol.replace('.NS', '')
    comments_filename = f"{base_ticker}_individual_comments_{timestamp}.csv"
    
    # Export all comments
    comments_df.to_csv(comments_filename, index=False)
    print(f"Comments data saved: {comments_filename}")
    print(f"   Total comments: {len(comments_df)}")
    print(f"   Columns: {len(comments_df.columns)}")
    
    # Export only Hinglish comments if available
    if not hinglish_comments.empty:
        hinglish_filename = f"{base_ticker}_hinglish_comments_only_{timestamp}.csv"
        hinglish_comments.to_csv(hinglish_filename, index=False)
        print(f"Hinglish-only data saved: {hinglish_filename}")
        print(f"   Hinglish comments: {len(hinglish_comments)}")
    
    print(f"\n🎯 FINAL COMMENTS SUMMARY:")
    print(f"   📝 Individual comments extracted: {len(comments_df)}")
    print(f"   🇮🇳 Pure Hinglish comments: {len(hinglish_comments)}")
    print(f"   📅 Date range: {comments_df['Comment_Date'].min()} to {comments_df['Comment_Date'].max()}")
    
    # Sentiment distribution
    sentiment_dist = comments_df['Combined_Sentiment'].value_counts()
    print(f"   📊 Sentiment distribution:")
    for sentiment, count in sentiment_dist.items():
        print(f"      {sentiment.title()}: {count} ({count/len(comments_df)*100:.1f}%)")

else:
    print("No comments found! Try:")
    print("   - Setting hinglish_only to False")
    print("   - Increasing howmanysubmissions")  
    print("   - Changing time_period to 'year' or 'all'")
    print("   - Reducing min_hinglish_score")

INDIVIDUAL COMMENTS ANALYSIS RESULTS
SAMPLE COMMENTS BY SENTIMENT:
----------------------------------------

BULLISH Comments (12 total):
  "Doubled my money in adani stocks from Jan 2023 to Dec 2023. Exited with all profit as I know it will..."
     2023-12-31 | Score: 2 | Hinglish: True
  "If you knew what you were doing, then you should average them as they break each support level. But ..."
     2025-02-19 | Score: 1 | Hinglish: True

BEARISH Comments (5 total):
  "Massive! Adani green, when did you buy at 11? Also do you have other portfolios?..."
     2024-11-07 | Score: 2 | Hinglish: True
  "Bad portfolio never buy stocks of same company if anything happens to Adani all will crash simultane..."
     2024-11-08 | Score: 2 | Hinglish: True

NEUTRAL Comments (2 total):
  "50% portfolio for adani stock 🙌😂..."
     2025-02-09 | Score: 1 | Hinglish: True
  "Sell Adani ent and Kalyan Jewellers..."
     2025-02-08 | Score: 1 | Hinglish: True

TOP HINGLISH COMMENTS (Showing 3 examples):

## 11. Analyze Sentiment per Post

Process each submission's title and comments together, with Hinglish filtering, and score sentiment per post:


In [26]:
# Main analysis - process Reddit submissions with enhanced Hinglish filtering
print(f"Starting enhanced analysis of r/{selectedsubreddit} for {selectedTickerSymbol}")
print("Focusing on Hinglish content with sentiment labeling")
print("=" * 60)

# Get Reddit submissions based on time period  
try:
    # PRAW's top() takes the period as time_filter; unrecognised values fall back to 'all'
    period = time_period if time_period in ('week', 'month', 'year') else 'all'
    submissions = reddit.subreddit(selectedsubreddit).top(time_filter=period, limit=howmanysubmissions)
    
    # Initialize results dataframe
    results_df = pd.DataFrame()
    
    # Analysis counters
    submission_counter = 1
    total_posts_with_ticker = 0
    total_ticker_mentions = 0
    hinglish_posts = 0
    
    print(f"Processing {howmanysubmissions} submissions for Hinglish content...")
    print(" This may take several minutes...\n")
    
    # Process each submission
    for submission in submissions:
        try:
            # Reset global counter for this submission
            selectedTickerSymbolCount = 0
            
            # Initialize sentiment counters
            sub_entries_textblob = {'negative': 0, 'positive': 0, 'neutral': 0}
            sub_entries_nltk = {'negative': 0, 'positive': 0, 'neutral': 0}
            
            # Clean title and analyze
            clean_title = clean_text(submission.title)
            
            print(f"Post {submission_counter}: {clean_title[:60]}...")
            submission_counter += 1
            
            # Analyze title sentiment
            text_blob_sentiment(clean_title, sub_entries_textblob)
            nltk_sentiment(clean_title, sub_entries_nltk)
            
            # Count ticker mentions in title
            count_ticker_mentions(clean_title, selectedTickerSymbol)
            
            # Check for Hinglish content in title
            title_hinglish, title_hinglish_score = is_hinglish(clean_title)
            
            # Check comments for Hinglish and sentiment if enabled
            comment_hinglish_scores = []
            if analyze_comments:
                try:
                    submission.comments.replace_more(limit=0)
                    processed_comments = 0
                    
                    # First pass: check for Hinglish in first 10 comments
                    for comment in submission.comments[:10]:
                        try:
                            if hasattr(comment, 'body') and comment.body not in ['[deleted]', '[removed]']:
                                _, comment_score = is_hinglish(comment.body)
                                if comment_score > 0:
                                    comment_hinglish_scores.append(comment_score)
                        except Exception:  # not a bare except, so Ctrl+C (KeyboardInterrupt) can still stop the long loop
                            continue
                    
                    # Calculate total Hinglish score early
                    total_hinglish_score = title_hinglish_score + sum(comment_hinglish_scores)
                    is_hinglish_content = total_hinglish_score >= 2
                    
                    # Only process all comments if content has potential
                    if selectedTickerSymbolCount >= min_ticker_mentions or is_hinglish_content:
                        for comment in submission.comments[:max_comments_per_post]:
                            try:
                                if hasattr(comment, 'body') and comment.body not in ['[deleted]', '[removed]']:
                                    clean_comment = clean_text(comment.body)
                                    
                                    # Analyze comment sentiment
                                    text_blob_sentiment(clean_comment, sub_entries_textblob)
                                    nltk_sentiment(clean_comment, sub_entries_nltk)
                                    
                                    # Count ticker mentions
                                    count_ticker_mentions(clean_comment, selectedTickerSymbol)
                                    
                                    # Process replies
                                    process_comment_replies(comment, sub_entries_textblob, sub_entries_nltk, selectedTickerSymbol)
                                    
                                    processed_comments += 1
                            except Exception:  # not a bare except, so Ctrl+C (KeyboardInterrupt) can still stop the long loop
                                continue
                        
                        print(f"   Processed {processed_comments} comments")
                    
                except Exception as e:
                    print(f"   ⚠️ Error processing comments: {e}")
                    total_hinglish_score = title_hinglish_score
            else:
                total_hinglish_score = title_hinglish_score
            
            # Final Hinglish determination
            is_hinglish_post = total_hinglish_score >= 2
            
            # Determine if we should record this post
            should_process = selectedTickerSymbolCount >= min_ticker_mentions
            if hinglish_only:
                should_process = should_process and is_hinglish_post
            
            if should_process:
                total_posts_with_ticker += 1
                total_ticker_mentions += selectedTickerSymbolCount
                
                if is_hinglish_post:
                    hinglish_posts += 1
                
                # Get post timestamp
                post_date = datetime.fromtimestamp(submission.created_utc)
                
                # Determine overall sentiment labels
                vader_sentiment = 'Neutral'
                if sub_entries_nltk.get('positive', 0) > sub_entries_nltk.get('negative', 0):
                    vader_sentiment = 'Positive'
                elif sub_entries_nltk.get('negative', 0) > sub_entries_nltk.get('positive', 0):
                    vader_sentiment = 'Negative'
                
                textblob_sentiment = 'Neutral'
                if sub_entries_textblob.get('positive', 0) > sub_entries_textblob.get('negative', 0):
                    textblob_sentiment = 'Positive'
                elif sub_entries_textblob.get('negative', 0) > sub_entries_textblob.get('positive', 0):
                    textblob_sentiment = 'Negative'
                
                # Get combined sentiment label
                combined_sentiment = get_sentiment_label(textblob_sentiment, vader_sentiment)
                
                # Create record with enhanced features
                record = {
                    'Title': clean_title,
                    'Ticker': selectedTickerSymbol,
                    'Date': post_date.strftime('%Y-%m-%d'),
                    'DateTime': post_date,
                    'Post_ID': submission.id,
                    'Score': submission.score,
                    'Num_Comments': submission.num_comments,
                    'Author': str(submission.author) if submission.author else '[deleted]',
                    'NumberOfTickerMentions': selectedTickerSymbolCount,
                    'Is_Hinglish': is_hinglish_post,
                    'Hinglish_Score': total_hinglish_score,
                    'VADER_Sentiment': vader_sentiment,
                    'TextBlob_Sentiment': textblob_sentiment,
                    'Combined_Sentiment': combined_sentiment,
                    'VADER_Negative': sub_entries_nltk.get('negative', 0),
                    'VADER_Positive': sub_entries_nltk.get('positive', 0),
                    'VADER_Neutral': sub_entries_nltk.get('neutral', 0),
                    'TextBlob_Negative': sub_entries_textblob.get('negative', 0),
                    'TextBlob_Positive': sub_entries_textblob.get('positive', 0),
                    'TextBlob_Neutral': sub_entries_textblob.get('neutral', 0)
                }
                
                # Append to results
                results_df = pd.concat([results_df, pd.DataFrame([record])], ignore_index=True)
                
                hinglish_status = f"(Hinglish score: {total_hinglish_score})" if is_hinglish_post else "(Not Hinglish)"
                print(f"   {selectedTickerSymbolCount} mentions, sentiment: {combined_sentiment} {hinglish_status}")
            else:
                if selectedTickerSymbolCount < min_ticker_mentions:
                    print(f"   Only {selectedTickerSymbolCount} mentions - Skipped")
                elif hinglish_only and not is_hinglish_post:
                    print(f"   Not Hinglish content (score: {total_hinglish_score}) - Skipped")
                else:
                    print(f"   Skipped")
            
            print()  # Empty line for readability
            
        except Exception as e:
            print(f"    Error processing submission: {e}")
            continue
    
    print("Enhanced analysis completed!")
    print("=" * 60)
    print(f"ENHANCED ANALYSIS SUMMARY:")
    print(f"    Total submissions processed: {submission_counter - 1}")
    print(f"    Posts mentioning {selectedTickerSymbol}: {total_posts_with_ticker}")
    print(f"    Total ticker mentions: {total_ticker_mentions}")
    print(f"    Hinglish posts found: {hinglish_posts}")
    print(f"    Final dataset size: {len(results_df)} records")
    
    # Show sentiment breakdown if we have data
    if not results_df.empty:
        sentiment_counts = results_df['Combined_Sentiment'].value_counts()
        print(f"\n📊 SENTIMENT BREAKDOWN:")
        for sentiment, count in sentiment_counts.items():
            print(f"    {sentiment.title()}: {count} ({count/len(results_df)*100:.1f}%)")
    
except Exception as e:
    print(f" Error during analysis: {e}")
    results_df = pd.DataFrame()

Starting enhanced analysis of r/indianstocks for ADANIENT.NS
Focusing on Hinglish content with sentiment labeling
Processing 200 submissions for Hinglish content...
 This may take several minutes...

Post 1: This is what real ball looks like.....
   Processed 63 comments
   Only 0 mentions - Skipped

Post 2: Telegram Trader's...
   Processed 16 comments
   Only 0 mentions - Skipped

Post 3: Never forget...
   Processed 4 comments
   Only 0 mentions - Skipped

Post 4: I didn’t touch my stocks for 4 years and this is the result…...
   Processed 45 comments
   Only 0 mentions - Skipped

Post 5: OMG Now What 😱...
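The per-post decision in the loop above mixes several conditions: the majority-vote labels for VADER and TextBlob, the ticker-mention minimum, and the optional Hinglish filter. Here is a minimal sketch that pulls them into two pure helpers, which makes the rules testable without calling the Reddit API. The names `majority_label` and `should_record` are hypothetical and are not part of the notebook. The Hinglish threshold of 2 mirrors the `>= 2` check in the loop, and the `min_ticker_mentions=1` used in the example calls is a made-up value.

```python
from typing import Dict


def majority_label(counts: Dict[str, int]) -> str:
    """Collapse per-post sentiment counts into one label.

    Mirrors the loop above: 'Positive' if positive hits outnumber
    negative ones, 'Negative' if the reverse, otherwise 'Neutral'
    (ties and all-neutral posts included).
    """
    pos = counts.get('positive', 0)
    neg = counts.get('negative', 0)
    if pos > neg:
        return 'Positive'
    if neg > pos:
        return 'Negative'
    return 'Neutral'


def should_record(ticker_mentions: int,
                  min_ticker_mentions: int,
                  hinglish_score: float,
                  hinglish_only: bool,
                  hinglish_threshold: float = 2) -> bool:
    """Return True if a post would be kept in results_df.

    A post needs at least `min_ticker_mentions` mentions. When
    `hinglish_only` is set, it must also reach the Hinglish threshold.
    """
    keep = ticker_mentions >= min_ticker_mentions
    if hinglish_only:
        keep = keep and hinglish_score >= hinglish_threshold
    return keep


# Example usage with made-up counts
print(majority_label({'positive': 5, 'negative': 2, 'neutral': 9}))  # Positive
print(should_record(0, 1, 4, hinglish_only=True))  # False: no ticker mentions
print(should_record(3, 1, 1, hinglish_only=True))  # False: Hinglish score below 2
print(should_record(3, 1, 2, hinglish_only=True))  # True
```

Keeping these rules outside the loop also makes it easier to see why the run above skipped posts whose comments were processed: comments are fetched when the Hinglish pre-check passes, but a post is only recorded once the mention minimum is met.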

## 10. Generate Sentiment Analysis Results

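Aggregate the VADER and TextBlob counts across all recorded posts, derive an overall sentiment score, and list the posts that mention the ticker most often: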

In [27]:
# Analyze and display sentiment results
if not results_df.empty:
    print("DETAILED SENTIMENT ANALYSIS RESULTS")
    print("=" * 60)
    
    # Calculate sentiment totals
    total_vader_pos = results_df['VADER_Positive'].sum()
    total_vader_neg = results_df['VADER_Negative'].sum()
    total_vader_neu = results_df['VADER_Neutral'].sum()
    
    total_textblob_pos = results_df['TextBlob_Positive'].sum()
    total_textblob_neg = results_df['TextBlob_Negative'].sum()
    total_textblob_neu = results_df['TextBlob_Neutral'].sum()
    
    # VADER Analysis
    print("VADER Sentiment Analysis:")
    vader_total = total_vader_pos + total_vader_neg + total_vader_neu
    if vader_total > 0:
        print(f"    Positive: {total_vader_pos} ({total_vader_pos/vader_total*100:.1f}%)")
        print(f"    Negative: {total_vader_neg} ({total_vader_neg/vader_total*100:.1f}%)")
        print(f"    Neutral: {total_vader_neu} ({total_vader_neu/vader_total*100:.1f}%)")
    
    print()
    
    # TextBlob Analysis
    print("TextBlob Sentiment Analysis:")
    textblob_total = total_textblob_pos + total_textblob_neg + total_textblob_neu
    if textblob_total > 0:
        print(f"    Positive: {total_textblob_pos} ({total_textblob_pos/textblob_total*100:.1f}%)")
        print(f"    Negative: {total_textblob_neg} ({total_textblob_neg/textblob_total*100:.1f}%)")
        print(f"    Neutral: {total_textblob_neu} ({total_textblob_neu/textblob_total*100:.1f}%)")
    
    print("\n" + "=" * 60)
    
    # Overall sentiment score calculation
    vader_sentiment_score = (total_vader_pos - total_vader_neg) / max(vader_total, 1)
    textblob_sentiment_score = (total_textblob_pos - total_textblob_neg) / max(textblob_total, 1)
    
    print(f"OVERALL SENTIMENT SCORES:")
    print(f"   VADER Score: {vader_sentiment_score:+.3f} (-1 to +1)")
    print(f"   TextBlob Score: {textblob_sentiment_score:+.3f} (-1 to +1)")
    
    # Determine overall sentiment
    avg_sentiment = (vader_sentiment_score + textblob_sentiment_score) / 2
    if avg_sentiment > 0.1:
        overall_sentiment = "BULLISH"
    elif avg_sentiment < -0.1:
        overall_sentiment = "BEARISH"
    else:
        overall_sentiment = "NEUTRAL"
    
    print(f"   Overall Sentiment: {overall_sentiment} ({avg_sentiment:+.3f})")
    
    # Show top posts by ticker mentions
    print(f"\nTOP POSTS BY {selectedTickerSymbol} MENTIONS:")
    print("-" * 60)
    top_posts = results_df.nlargest(5, 'NumberOfTickerMentions')
    for idx, row in top_posts.iterrows():
        print(f" {row['Title'][:80]}...")
        print(f"    {row['NumberOfTickerMentions']} mentions | {row['Date']} | {row['Score']} upvotes")
        print()
    
    # Hinglish analysis
    hinglish_count = results_df['Is_Hinglish'].sum()
    print(f"HINGLISH CONTENT ANALYSIS:")
    print(f"    Hinglish posts: {hinglish_count} out of {len(results_df)} ({hinglish_count/len(results_df)*100:.1f}%)")
    
    # Display first few rows of the dataset
    print(f"\nDATASET PREVIEW (First 3 rows):")
    print("=" * 60)
    display_columns = ['Title', 'Date', 'NumberOfTickerMentions', 'VADER_Positive', 'VADER_Negative', 'TextBlob_Positive', 'TextBlob_Negative']
    print(results_df[display_columns].head(3).to_string(index=False))
    
else:
    print("No data found! Try:")
    print("   - Reducing min_ticker_mentions")
    print("   - Increasing howmanysubmissions")
    print("   - Changing time_period to 'year' or 'all'")
    print("   - Checking if ticker symbol is correct")

DETAILED SENTIMENT ANALYSIS RESULTS
VADER Sentiment Analysis:
    Positive: 679 (34.7%)
    Negative: 301 (15.4%)
    Neutral: 976 (49.9%)

TextBlob Sentiment Analysis:
    Positive: 536 (27.4%)
    Negative: 169 (8.6%)
    Neutral: 1251 (64.0%)

OVERALL SENTIMENT SCORES:
   VADER Score: +0.193 (-1 to +1)
   TextBlob Score: +0.188 (-1 to +1)
   Overall Sentiment: BULLISH (+0.190)

TOP POSTS BY ADANIENT.NS MENTIONS:
------------------------------------------------------------
 30 M , can you guys rate my portfolio and how much can i improve here?...
    17 mentions | 2025-02-08 | 120 upvotes

 Buy now or wait ? Tell me experts...
    10 mentions | 2025-09-26 | 141 upvotes

 At the end of the year, I got a 46% return this year. How much return did you ge...
    9 mentions | 2023-12-30 | 386 upvotes

 Rate this portfolio...
    9 mentions | 2024-11-07 | 142 upvotes

 How Cooked Am I ??...
    8 mentions | 2024-11-21 | 108 upvotes

HINGLISH CONTENT ANALYSIS:
    Hinglish posts: 27 out of 2
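The overall score in the cell above is a net-balance ratio. For each analyzer it computes (positive - negative) / total, which lies between -1 and +1. The two ratios are then averaged, and the average is bucketed with the ±0.1 cutoffs. Below is a minimal standalone sketch of that rule; the helper names `net_score` and `overall_label` are hypothetical. Plugging in the counts printed above reproduces the VADER score of +0.193, the TextBlob score of +0.188, and the BULLISH label.

```python
def net_score(pos: int, neg: int, neu: int) -> float:
    """Net balance (pos - neg) / total, in [-1, +1]; 0.0 when there are no hits."""
    total = pos + neg + neu
    return (pos - neg) / max(total, 1)


def overall_label(vader_score: float, textblob_score: float,
                  cutoff: float = 0.1) -> str:
    """Average the two analyzer scores and bucket them with the notebook's ±0.1 cutoffs."""
    avg = (vader_score + textblob_score) / 2
    if avg > cutoff:
        return 'BULLISH'
    if avg < -cutoff:
        return 'BEARISH'
    return 'NEUTRAL'


# Counts taken from the ADANIENT.NS run printed above
vader = net_score(679, 301, 976)
textblob = net_score(536, 169, 1251)
print(f"{vader:+.3f} {textblob:+.3f} {overall_label(vader, textblob)}")
# +0.193 +0.188 BULLISH
```

Neutral hits sit in the denominator, so a mostly-neutral thread scores close to 0 even when most of its opinionated comments lean one way. In this run, 50 to 64% of hits were neutral. Dividing by `pos + neg` instead would measure only the opinionated share, but that is an alternative, not what the notebook computes.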

## 11. Export Data to CSV Files

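Save the sentiment results and the stock price history as timestamped CSV files named `<TICKER>_comment_analysis_<YYYYMMDD_HHMMSS>.csv` and `<TICKER>_stock_history_<YYYYMMDD_HHMMSS>.csv`, where `<TICKER>` is the symbol with its `.NS` suffix removed.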
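The naming scheme used in the next cell can be isolated into a small helper. This sketch uses a hypothetical `export_filenames` function that calls `str.removesuffix` (Python 3.9+) instead of `replace('.NS', '')`, so that only a trailing `.NS` is stripped. It also takes the timestamp as an argument, which keeps the result reproducible.

```python
from datetime import datetime
from typing import Tuple


def export_filenames(ticker: str, when: datetime) -> Tuple[str, str]:
    """Build the (sentiment, stock history) CSV names used by the export cell.

    Hypothetical helper: strips only a trailing '.NS' and formats the
    timestamp as YYYYMMDD_HHMMSS, matching the notebook's strftime pattern.
    """
    base = ticker.removesuffix('.NS')
    stamp = when.strftime('%Y%m%d_%H%M%S')
    return (f"{base}_comment_analysis_{stamp}.csv",
            f"{base}_stock_history_{stamp}.csv")


# Reproduces the names from the ADANIENT.NS run (timestamp 2025-10-08 16:06:50)
sentiment_csv, stock_csv = export_filenames('ADANIENT.NS', datetime(2025, 10, 8, 16, 6, 50))
print(sentiment_csv)  # ADANIENT_comment_analysis_20251008_160650.csv
print(stock_csv)      # ADANIENT_stock_history_20251008_160650.csv
```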

In [28]:
# Export results to CSV files
print("EXPORTING DATA TO CSV FILES")
print("=" * 40)

# Generate timestamp for unique filenames
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
base_ticker = selectedTickerSymbol.replace('.NS', '')

try:
    # Export sentiment analysis results
    if not results_df.empty:
        sentiment_filename = f"{base_ticker}_comment_analysis_{timestamp}.csv"
        results_df.to_csv(sentiment_filename, index=False)
        print(f"Sentiment data saved: {sentiment_filename}")
        print(f"   Records: {len(results_df)}")
        print(f"   Columns: {list(results_df.columns)}")
    else:
        print("No sentiment data to export")
    
    # Export stock price data
    if not stock_history.empty:
        stock_filename = f"{base_ticker}_stock_history_{timestamp}.csv"
        stock_history.to_csv(stock_filename, index=True)
        print(f"Stock data saved: {stock_filename}")
        print(f"   Price records: {len(stock_history)}")
        print(f"   Date range: {stock_history.index[0].date()} to {stock_history.index[-1].date()}")
    else:
        print("No stock data to export")
    
    print("\n" + "=" * 20)
    print("ANALYSIS COMPLETE!")
    print("=" * 20)
    
    if not results_df.empty:
        print(f"\nFINAL SUMMARY FOR {selectedTickerSymbol}:")
        print(f"   Posts analyzed: {len(results_df)}")
        print(f"   Total ticker mentions: {results_df['NumberOfTickerMentions'].sum()}")
        print(f"   Date range: {results_df['Date'].min()} to {results_df['Date'].max()}")
        print(f"   Hinglish content: {results_df['Is_Hinglish'].sum()} posts")
        print(f"   Overall sentiment: {overall_sentiment}")
        
        print(f"\nFILES CREATED:")
        print(f"   {sentiment_filename} - Sentiment analysis results")
        if not stock_history.empty:
            print(f"   {stock_filename} - Stock price history")
            
        print(f"\nNext Steps:")
        print(f"   Load the CSV files for further analysis")
        print(f"   Create visualizations to correlate sentiment with stock price")
        print(f"   Train machine learning models for prediction")
        print(f"   Analyze Hinglish sentiment patterns")
    
except Exception as e:
    print(f"❌ Error during export: {e}")
else:
    # Only report success when no export step raised
    print("\nAnalysis completed successfully!")

EXPORTING DATA TO CSV FILES
Sentiment data saved: ADANIENT_comment_analysis_20251008_160650.csv
   Records: 27
   Columns: ['Title', 'Ticker', 'Date', 'DateTime', 'Post_ID', 'Score', 'Num_Comments', 'Author', 'NumberOfTickerMentions', 'Is_Hinglish', 'Hinglish_Score', 'VADER_Sentiment', 'TextBlob_Sentiment', 'Combined_Sentiment', 'VADER_Negative', 'VADER_Positive', 'VADER_Neutral', 'TextBlob_Negative', 'TextBlob_Positive', 'TextBlob_Neutral']
Stock data saved: ADANIENT_stock_history_20251008_160650.csv
   Price records: 252
   Date range: 2024-10-08 to 2025-10-08

ANALYSIS COMPLETE!

FINAL SUMMARY FOR ADANIENT.NS:
   Posts analyzed: 27
   Total ticker mentions: 92
   Date range: 2023-12-30 to 2025-09-29
   Hinglish content: 27 posts
   Overall sentiment: BULLISH

FILES CREATED:
   ADANIENT_comment_analysis_20251008_160650.csv - Sentiment analysis results
   ADANIENT_stock_history_20251008_160650.csv - Stock price history

Next Steps:
   Load the CSV files for further analysis
   Create 