# Visualizing Alienation: Mapping Emotional Spaces in Frankenstein

This brief analysis uses sentiment analysis to understand the emotional valence of the locations and characters in Mary Shelley's *Frankenstein*. 

## Methodology

The data was created by splitting the text of Frankenstein into Vol. Chapter and paragraphs. For each paragraph, the location of the narrative present was noted. This is distinct from all the locations that are *mentioned* in the text. For example, South America is mentioned, but as the text never *goes* there it is not part of this data set. Subsequently, the roBERTa Sentiment analyzer was run on each paragraph and the aggregate score per location was registered. 

A follow up analysis was performed of the sentiments surrounding each character. This is less accurate as the calculation purely replies on character name as indicative of character presence. No attempt was made to reconcile pronouns with characters. Thus, "I" could be Walton, Victor, or The Monster. Without significantly more work, these distinctions cannot be recovered from the data without supervision.

In [None]:
# Load analysis results from parquet files
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py

# Configure Plotly for HTML export - this ensures charts work in exported HTML
py.init_notebook_mode(connected=False)  # Use offline mode for HTML export
import plotly.io as pio
pio.renderers.default = "notebook"  # Ensure charts render in notebook and HTML

print("üìä Loading Frankenstein analysis results...")


try:
    # Load all datasets from parquet files (fast and efficient)
    frankenstein_all_with_sentiment = pd.read_parquet("frankenstein_all_paragraphs_with_sentiment.parquet")
    frankenstein_all_with_sentiment.to_csv("frankenstein_all_paragraphs_with_sentiment.csv")
    character_sentiment_df = pd.read_parquet("frankenstein_character_sentiment.parquet")
    location_sentiment_summary = pd.read_parquet("frankenstein_location_sentiment.parquet")
    frankenstein_manual_locations = pd.read_parquet("frankenstein_manual_locations.parquet")
    
    print("‚úÖ Successfully loaded all datasets:")
  
    
    # Set up coordinate columns
    coords_columns = list(frankenstein_manual_locations.columns[-2:])
    lat_col = coords_columns[0]
    lon_col = coords_columns[1]
    
   
    
except FileNotFoundError as e:
    print(f"‚ùå Error loading data: {e}")

üìä Loading Frankenstein analysis results...
‚úÖ Successfully loaded all datasets:


## Part I: The Geographic Imagination of Frankenstein

The text has a remarkably large geographic canvas given its relative brevity. Some of this is no doubt due to the fact that Mary Shelley traveled quite a bit during the composition of the text. 

### Geographic Distribution and Narrative Weight

The map below shows all the locations in *Frankenstein* where the text travels. The circle sizes represent the total word count associated with each location. Note that this does not necessarily indicate how long the text stays in that location in terms of narrative duration. For example, Victor is in Ingolstadt for quite some time, but the relative amount of text there is quite small.

In [None]:
# Geographic Distribution Map - Clean Version
try:
    # Calculate location counts for sizing
    valid_coords = frankenstein_manual_locations[
        (frankenstein_manual_locations[lat_col].notna()) & 
        (frankenstein_manual_locations[lon_col].notna())
    ].copy()
    
    valid_coords['word_count'] = valid_coords['paragraph_text'].str.split().str.len()
    total_narrative_words = frankenstein_manual_locations['paragraph_text'].str.split().str.len().sum()
    
    # Clean location names to handle duplicates like "Delacey Cottage"
    valid_coords['curated_name_clean'] = valid_coords['curated_name'].str.strip()
    
    # Group locations that are essentially the same (like multiple "Delacey Cottage" entries)
    # First aggregate by cleaned name to get representative coordinates
    location_coords = valid_coords.groupby('curated_name_clean').agg({
        lat_col: 'first',  # Use first occurrence coordinates
        lon_col: 'first'
    }).reset_index()
    
    # Then sum word counts by cleaned name
    location_counts = valid_coords.groupby('curated_name_clean').agg({
        'word_count': 'sum'
    }).reset_index()
    
    # Merge coordinates back
    location_counts = location_counts.merge(location_coords, on='curated_name_clean')
    location_counts = location_counts.rename(columns={'curated_name_clean': 'curated_name'})
    location_counts = location_counts.rename(columns={'word_count': 'total_words'})
    location_counts['narrative_percent'] = (location_counts['total_words'] / total_narrative_words * 100).round(2)

    # Create the geographic map
    fig_geo = px.scatter_map(
        location_counts,
        lat=lat_col,
        lon=lon_col,
        hover_name="curated_name",
        size="total_words",
        size_max=40,
        hover_data={
            "narrative_percent": ":.2f",
            "total_words": True,
            lat_col: False,
            lon_col: False
        },
        title="Geographic Locations in Frankenstein: Narrative Distribution",
        labels={"narrative_percent": "% of Total Narrative"},
        zoom=3,
        height=700,
        color_discrete_sequence=['#2E86AB']
    )

    fig_geo.update_layout(
        mapbox_style="carto-positron",
        margin={"r":0,"t":50,"l":0,"b":0}
    )
    
    # Configure for HTML export - embed the plot with full JavaScript
    fig_geo.update_layout(
        font=dict(size=12),
        title_font=dict(size=16),
    )
    
    # Note: scatter_map doesn't support marker line outlines
    # The circles will be solid without outlines for this map type

    # Show with offline configuration for HTML export
    py.iplot(fig_geo, show_link=False, config={'displayModeBar': True})

    # Display insights
    print(f"üìç Analysis reveals {len(location_counts)} unique geographic locations")
    print(f"üìä Most significant locations by word count:")
    
    top_locations = location_counts.nlargest(5, 'total_words')
    for _, row in top_locations.iterrows():
        print(f"   {row['curated_name']}: {row['narrative_percent']:.1f}% ({row['total_words']} words)")
    
except NameError:
    print("‚ö†Ô∏è Data not loaded - please run the data loading cell first")

üìç Analysis reveals 45 unique geographic locations
üìä Most significant locations by word count:
   Geneva: 24.2% (17387 words)
   Delacey Cottage: 14.1% (10135 words)
   Ingolstadt: 11.9% (8543 words)
   Artic: 10.6% (7583 words)
   Montanvert: 5.8% (4175 words)


In [27]:
# Debug and fix Unicode issues in location data
import re

def clean_unicode_surrogates(text):
    """Remove problematic Unicode surrogate characters"""
    if isinstance(text, str):
        # Remove surrogate characters (U+D800 to U+DFFF)
        return re.sub(r'[\uD800-\uDFFF]', '', text)
    return text

def clean_dataframe_unicode(df):
    """Clean Unicode issues in all string columns of a DataFrame"""
    df_cleaned = df.copy()
    
    for column in df_cleaned.columns:
        if df_cleaned[column].dtype == 'object':
            df_cleaned[column] = df_cleaned[column].apply(clean_unicode_surrogates)
    
    return df_cleaned

# Clean the location sentiment data
print("üßπ Cleaning Unicode issues in location data...")

try:
    # Check for problematic characters in location names
    problematic_locations = []
    for idx, row in location_sentiment_summary.iterrows():
        try:
            # Try to encode the location name
            row['curated_name'].encode('utf-8')
        except UnicodeEncodeError as e:
            problematic_locations.append((idx, row['curated_name'], str(e)))
    
    if problematic_locations:
        print(f"‚ùå Found {len(problematic_locations)} problematic locations:")
        for idx, name, error in problematic_locations:
            print(f"   Index {idx}: '{name}' - {error}")
    else:
        print("‚úÖ No obvious Unicode issues found in location names")
    
    # Clean the location data
    location_sentiment_summary_cleaned = clean_dataframe_unicode(location_sentiment_summary)
    
    # Verify cleaning worked
    print(f"üìä Original data shape: {location_sentiment_summary.shape}")
    print(f"üìä Cleaned data shape: {location_sentiment_summary_cleaned.shape}")
    
    # Update the global variable
    location_sentiment_summary = location_sentiment_summary_cleaned
    print("‚úÖ Location data cleaned successfully")
    
except Exception as e:
    print(f"‚ùå Error during cleaning: {e}")
    print(f"Error type: {type(e)}")

üßπ Cleaning Unicode issues in location data...
‚úÖ No obvious Unicode issues found in location names
üìä Original data shape: (50, 11)
üìä Cleaned data shape: (50, 11)
‚úÖ Location data cleaned successfully


In [None]:
# Emotional Geography Map - Sentiment Analysis (Unicode Safe)
import re

def clean_text_for_display(text):
    """Clean text for safe display, removing problematic Unicode characters"""
    if pd.isna(text) or not isinstance(text, str):
        return str(text)
    # Remove surrogate pairs and other problematic characters
    text = re.sub(r'[\uD800-\uDFFF]', '', text)  # Remove surrogates
    text = re.sub(r'[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]', '', text)  # Remove control characters
    return text

try:
    # Clean all text data for safe display and handle duplicate locations
    location_data_safe = location_sentiment_summary.copy()
    location_data_safe['curated_name'] = location_data_safe['curated_name'].apply(clean_text_for_display)
    location_data_safe['sentiment_category'] = location_data_safe['sentiment_category'].apply(clean_text_for_display)
    
    # Clean location names to handle duplicates like "Delacey Cottage"
    location_data_safe['curated_name_clean'] = location_data_safe['curated_name'].str.strip()
    
    # Aggregate duplicate locations by summing word counts and averaging sentiment
    location_data_safe = location_data_safe.groupby('curated_name_clean').agg({
        'lat': 'first',
        'long': 'first',
        'total_words': 'sum',
        'avg_sentiment': 'mean',
        'narrative_percent': 'sum',
        'sentiment_category': lambda x: x.mode().iloc[0] if not x.empty else x.iloc[0]
    }).reset_index()
    location_data_safe = location_data_safe.rename(columns={'curated_name_clean': 'curated_name'})
    
    # Create sentiment-enhanced map using cleaned data
    fig_sentiment = px.scatter_map(
        location_data_safe,
        lat='lat',
        lon='long',
        hover_name='curated_name',
        size="total_words",
        size_max=35,
        color="avg_sentiment",
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        hover_data={
            "narrative_percent": ":.2f",
            "avg_sentiment": ":.3f",
            "sentiment_category": True,
            'lat': False,
            'long': False
        },
        title="Emotional Geography of Frankenstein: Location Sentiment Analysis",
        labels={
            "avg_sentiment": "Average Sentiment",
            "narrative_percent": "% of Total Narrative"
        },
        zoom=3,
        height=700
    )
    
    fig_sentiment.update_layout(
        mapbox_style="carto-positron",
        margin={"r":0,"t":50,"l":0,"b":0},
        coloraxis_colorbar=dict(
            title="Sentiment Score",
            tickvals=[-0.4, -0.2, 0, 0.2, 0.4],
            ticktext=["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
        )
    )
    
    # Configure for HTML export
    fig_sentiment.update_layout(
        font=dict(size=12),
        title_font=dict(size=16),
    )
    
    # Note: scatter_map doesn't support marker line outlines
    # The circles will use their sentiment colors without outlines
    
    # Show with offline configuration for HTML export
    py.iplot(fig_sentiment, show_link=False, config={'displayModeBar': True})
    
    # Display sentiment insights with cleaned text
    avg_overall_sentiment = location_data_safe['avg_sentiment'].mean()
    print(f"üìä Emotional Geography Analysis Complete")
    print(f"üé≠ Overall sentiment across all locations: {avg_overall_sentiment:.3f}")
    
    sentiment_distribution = location_data_safe['sentiment_category'].value_counts()
    print(f"üìà Sentiment distribution: {sentiment_distribution.to_dict()}")
    
    # Show most positive and negative locations with cleaned names
    most_positive = location_data_safe.nlargest(3, 'avg_sentiment')[['curated_name', 'avg_sentiment']]
    most_negative = location_data_safe.nsmallest(3, 'avg_sentiment')[['curated_name', 'avg_sentiment']]
    
    print(f"\n‚ú® Most positively framed locations:")
    for _, row in most_positive.iterrows():
        clean_name = clean_text_for_display(row['curated_name'])
        print(f"   {clean_name}: {row['avg_sentiment']:.3f}")
    
    print(f"\n‚õàÔ∏è Most negatively framed locations:")
    for _, row in most_negative.iterrows():
        clean_name = clean_text_for_display(row['curated_name'])
        print(f"   {clean_name}: {row['avg_sentiment']:.3f}")
        
except NameError:
    print("‚ö†Ô∏è Sentiment data not available - please run the data loading cell first")
except Exception as e:
    print(f"‚ùå Error creating sentiment map: {e}")
    print(f"Error type: {type(e).__name__}")
    
    # Fallback: show basic sentiment statistics without the map
    try:
        avg_overall_sentiment = location_sentiment_summary['avg_sentiment'].mean()
        print(f"\nüìä Basic Sentiment Statistics:")
        print(f"üé≠ Overall sentiment across all locations: {avg_overall_sentiment:.3f}")
        
        sentiment_distribution = location_sentiment_summary['sentiment_category'].value_counts()
        print(f"üìà Sentiment distribution: {sentiment_distribution.to_dict()}")
    except Exception as fallback_error:
        print(f"‚ùå Even fallback statistics failed: {fallback_error}")

üìä Emotional Geography Analysis Complete
üé≠ Overall sentiment across all locations: 0.012
üìà Sentiment distribution: {'Neutral': 19, 'Negative': 14, 'Positive': 12}

‚ú® Most positively framed locations:
   Windsor: 0.477
   Edinburgh: 0.344
   Chamonix: 0.333

‚õàÔ∏è Most negatively framed locations:
   Holyhead: -0.410
   Zurich: -0.238
   Beach somewhere on the Irish Coast: -0.231


### Animating Movements

We can also track where these emotions take place in time by splitting up the data further. By giving each location a chronological number, we can get the sentiment for that particular event at that particular location rather than the average sentiment per location. After all, sometimes Victor is happy in Geneva and sometimes he is sad. 

In [5]:
frankenstein_emotion_sequence_df = pd.read_csv("frankenstein_paragraphs_geoparsed_located_chrono.csv")

In [6]:
# Examine the chronological data structure
print("üîç Exploring the chronological emotion data...")
print(f"üìä Data shape: {frankenstein_emotion_sequence_df.shape}")
print(f"üìã Columns: {frankenstein_emotion_sequence_df.columns.tolist()}")

print(f"\nüìà Ordinal range: {frankenstein_emotion_sequence_df['ordinal'].min()} to {frankenstein_emotion_sequence_df['ordinal'].max()}")
print(f"üèÉ Unique ordinal values: {frankenstein_emotion_sequence_df['ordinal'].nunique()}")

print(f"\nüó∫Ô∏è Sample data (first 5 rows):")
display(frankenstein_emotion_sequence_df[['text_section', 'chapter_letter', 'curated_name', 'lat', 'long', 'ordinal']].head())

print(f"\nüìç Unique locations in chronological data: {frankenstein_emotion_sequence_df['curated_name'].nunique()}")
print(f"üìç Locations: {sorted(frankenstein_emotion_sequence_df['curated_name'].unique())}")

üîç Exploring the chronological emotion data...
üìä Data shape: (764, 13)
üìã Columns: ['text_section', 'chapter_letter', 'paragraph_number', 'paragraph_text', 'places', 'latitudes', 'longitudes', 'feature_names', 'curated_name', 'lat', 'long', 'chrono', 'ordinal']

üìà Ordinal range: 1 to 67
üèÉ Unique ordinal values: 67

üó∫Ô∏è Sample data (first 5 rows):


Unnamed: 0,text_section,chapter_letter,curated_name,lat,long,ordinal
0,vol_1,CHAPTER I,Lucern,47.05048,8.30635,1
1,vol_1,CHAPTER I,Lucern,47.05048,8.30635,1
2,vol_1,CHAPTER I,Lucern,47.05048,8.30635,1
3,vol_1,CHAPTER I,Geneva,46.203278,6.147158,1
4,vol_1,CHAPTER I,Geneva,46.203278,6.147158,1



üìç Unique locations in chronological data: 45
üìç Locations: ['Archangel', 'Arles', 'Artic', 'Barents Sea', 'Beach somewhere on the Irish Coast', 'Belrive', 'Chamonix', 'Cologne', 'Constantinople', 'Cumberland', 'Delacey Cottage', 'Dublin', 'Edinburgh', 'Elizabeth in Italy', 'Geneva', 'Holyhead', 'Ingolstadt', 'Ingolstadt (Forest)', 'Irish Sea', 'Lausanne', 'Le Havre', 'Livorno', 'London', 'Lucern', 'Mainz', 'Matlock', 'Monster Travel to Geneva', 'Montanvert', 'Near Mont Blanc', 'Orkney Islands', 'Oxford', 'Paris', 'Perth', 'Portsmouth', 'Rhine below Mainz', 'Rotterdam', 'Russian plain', 'Russian pursuit', 'Russian pursuit near Archangel', 'Russian pursuit on ice', 'St. Petersburgh', 'Strasbourg', 'Thonon-les-Bains', 'Windsor', 'Zurich']


In [None]:
# Create animation data with sentiment from existing analysis
print("üé¨ Creating animated emotional journey with RoBERTa sentiment...")

# Work with chronological data
df = frankenstein_emotion_sequence_df.copy()
df['word_count'] = df['paragraph_text'].str.split().str.len()

# Check what sentiment columns are available
print("Available sentiment columns:")
sentiment_cols = [col for col in frankenstein_all_with_sentiment.columns if 'roberta' in col.lower()]
for col in sentiment_cols:
    print(f"  - {col}")

# Use compound score if available, otherwise calculate from pos/neg
if 'roberta_compound' in frankenstein_all_with_sentiment.columns:
    sentiment_col = 'roberta_compound'
    print(f"Using compound sentiment score: {sentiment_col}")
    df['sentiment_score'] = frankenstein_all_with_sentiment[sentiment_col].values
elif 'roberta_pos' in frankenstein_all_with_sentiment.columns and 'roberta_neg' in frankenstein_all_with_sentiment.columns:
    print("Calculating sentiment from positive - negative scores")
    df['sentiment_score'] = (frankenstein_all_with_sentiment['roberta_pos'].values - 
                            frankenstein_all_with_sentiment['roberta_neg'].values)
else:
    print("‚ö†Ô∏è Using first available sentiment column")
    sentiment_col = sentiment_cols[0] if sentiment_cols else 'roberta_neg'
    df['sentiment_score'] = frankenstein_all_with_sentiment[sentiment_col].values

# Clean and prepare data
df_clean = df.dropna(subset=['lat', 'long']).copy()

# Clean location names to handle duplicates like "Delacey Cottage"
df_clean['curated_name_clean'] = df_clean['curated_name'].str.strip()

# Aggregate by ordinal (chronological step) and cleaned location name
animation_data = df_clean.groupby(['ordinal', 'curated_name_clean', 'lat', 'long']).agg({
    'word_count': 'sum',  # Total words for this location at this time
    'sentiment_score': 'mean',  # Average sentiment
    'text_section': 'first',
    'chapter_letter': 'first'
}).reset_index()

# Rename back to curated_name for display
animation_data = animation_data.rename(columns={'curated_name_clean': 'curated_name'})

# Add frame information
animation_data['frame_label'] = animation_data['ordinal'].astype(str)
animation_data['chapter_info'] = (animation_data['text_section'].str.replace('_', ' ') + 
                                 ' ' + animation_data['chapter_letter']).str.title()

# Add sentiment category for hover info
animation_data['sentiment_category'] = animation_data['sentiment_score'].apply(
    lambda x: 'Positive' if x > 0.1 else ('Negative' if x < -0.1 else 'Neutral')
)

print(f"üìä Animation ready: {len(animation_data)} location-time points")
print(f"üìà Chronological steps: {animation_data['ordinal'].nunique()}")
print(f"üìù Word count range: {animation_data['word_count'].min()}-{animation_data['word_count'].max()}")
print(f"üé≠ Sentiment range: {animation_data['sentiment_score'].min():.3f} to {animation_data['sentiment_score'].max():.3f}")

# Create the animated map  
fig_animated = px.scatter_map(
    animation_data,
    lat="lat",
    lon="long",
    hover_name="curated_name",
    size="word_count",  # Size = word count (as requested)
    size_max=60,
    color="sentiment_score",  # Color = sentiment
    color_continuous_scale='RdYlGn',
    color_continuous_midpoint=0,
    animation_frame="frame_label",
    hover_data={
        "sentiment_score": ":.3f",
        "sentiment_category": True,
        "word_count": True,
        "chapter_info": True,
        "ordinal": True
    },
    title="Emotional Journey Through Frankenstein: Chronological Animation<br><sub>Size = Word Count | Color = RoBERTa Sentiment Score</sub>",
    labels={
        "sentiment_score": "Sentiment Score",
        "sentiment_category": "Sentiment",
        "word_count": "Words",
        "chapter_info": "Chapter",
        "ordinal": "Chronological Step"
    },
    zoom=3,
    height=850
)

# Style the map
fig_animated.update_layout(
    mapbox_style="carto-positron",
    margin={"r":0,"t":90,"l":0,"b":0},
    coloraxis_colorbar=dict(
        title="Sentiment Score",
        tickvals=[-0.6, -0.3, 0, 0.3, 0.6],
        ticktext=["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
    )
)

# Configure for HTML export
fig_animated.update_layout(
    font=dict(size=12),
    title_font=dict(size=16),
)

# Animation settings - slower for better observation
fig_animated.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000  # 1 second per frame
fig_animated.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = .01  # 0.01 second transition

# Add step info to frames with more context
for i, frame in enumerate(fig_animated.frames):
    step = i + 1
    current_step_data = animation_data[animation_data['ordinal'] == step]
    if not current_step_data.empty:
        # Get the chapter info and locations for this step
        chapter_info = current_step_data['chapter_info'].iloc[0]
        locations = current_step_data['curated_name'].tolist()
        location_text = ", ".join(locations) if len(locations) <= 3 else f"{', '.join(locations[:2])}, +{len(locations)-2} more"
        frame.layout.title = f"{chapter_info} - {location_text}"
    else:
        frame.layout.title = f"Frankenstein Journey - Step {step}/{len(fig_animated.frames)}"

# Show with offline configuration for HTML export
py.iplot(fig_animated, show_link=False, config={'displayModeBar': True})

print("üéØ Enhanced animated emotional journey map created!")
print("üìà Circle size = word count at each location during each chronological step")
print("üåà Circle color = RoBERTa sentiment (red=negative, yellow=neutral, green=positive)")
print("üó∫Ô∏è Each frame shows the emotional state of locations during that narrative moment")
print("‚ñ∂Ô∏è Press play to watch how emotions shift across geography as the story unfolds")
print(f"üìñ Animation spans {animation_data['ordinal'].nunique()} chronological steps through Shelley's narrative")

üé¨ Creating animated emotional journey with RoBERTa sentiment...
Available sentiment columns:
  - roberta_neg
  - roberta_neu
  - roberta_pos
  - roberta_compound
Using compound sentiment score: roberta_compound
üìä Animation ready: 72 location-time points
üìà Chronological steps: 67
üìù Word count range: 36-5619
üé≠ Sentiment range: -0.902 to 0.803


üéØ Enhanced animated emotional journey map created!
üìà Circle size = word count at each location during each chronological step
üåà Circle color = RoBERTa sentiment (red=negative, yellow=neutral, green=positive)
üó∫Ô∏è Each frame shows the emotional state of locations during that narrative moment
‚ñ∂Ô∏è Press play to watch how emotions shift across geography as the story unfolds
üìñ Animation spans 67 chronological steps through Shelley's narrative


### Emotional Geography Insights

The sentiment analysis of geographic locations reveals sophisticated patterns in Shelley's emotional mapping:

**Positive Locations**: Often associated with domesticity, family, and early happiness
- Geneva (Victor's family home)
- Peaceful natural settings

**Negative Locations**: Frequently connected to isolation, creation, and consequence
- Laboratory spaces
- Remote wilderness areas
- Sites of confrontation

**Neutral Locations**: Transitional spaces and narrative bridges
- Travel routes
- Temporary stops

This emotional geography suggests that Shelley uses location not merely as setting but as an extension of character psychology and thematic development.

## Part III: Character Sentiment Analysis

Moving from geographic to character-centered analysis, we examined how Mary Shelley emotionally frames the principal characters throughout the narrative. This analysis identifies paragraphs mentioning key characters and measures the sentiment associated with each character's textual presence.

In [None]:
# Character Sentiment Analysis Visualizations
try:
    # Sort by sentiment for better visualization
    character_df_sorted = character_sentiment_df.sort_values('Avg_Sentiment', ascending=False)
    
    # 1. Character Sentiment Overview Bar Chart
    fig1 = px.bar(
        character_df_sorted,
        x='Character',
        y='Avg_Sentiment',
        color='Avg_Sentiment',
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        title='Character Emotional Framing: Average Sentiment by Character',
        labels={'Avg_Sentiment': 'Average Sentiment Score'},
        hover_data=['Total_Mentions', 'Total_Words']
    )
    
    fig1.add_hline(y=0, line_dash="dash", line_color="gray", 
                   annotation_text="Neutral Baseline", annotation_position="top right")
    
    fig1.update_layout(
        height=500,
        xaxis_title="Character",
        yaxis_title="Average Sentiment Score",
        showlegend=False
    )
    
    # Show with offline configuration for HTML export
    py.iplot(fig1, show_link=False, config={'displayModeBar': True})
    
    # 2. Sentiment Distribution Stack Chart
    fig2 = go.Figure()
    
    fig2.add_trace(go.Bar(
        name='Positive Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Positive_Mentions'],
        marker_color='#2E8B57',
        opacity=0.8
    ))
    
    fig2.add_trace(go.Bar(
        name='Neutral Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Neutral_Mentions'],
        marker_color='#708090',
        opacity=0.8
    ))
    
    fig2.add_trace(go.Bar(
        name='Negative Mentions',
        x=character_df_sorted['Character'],
        y=character_df_sorted['Negative_Mentions'],
        marker_color='#CD5C5C',
        opacity=0.8
    ))
    
    fig2.update_layout(
        barmode='stack',
        title='Character Emotional Complexity: Sentiment Distribution by Character',
        xaxis_title='Character',
        yaxis_title='Number of Paragraphs',
        height=500
    )
    
    # Show with offline configuration for HTML export
    py.iplot(fig2, show_link=False, config={'displayModeBar': True})
    
    # 3. Character Frequency vs Sentiment Scatter Plot
    fig3 = px.scatter(
        character_df_sorted,
        x='Total_Mentions',
        y='Avg_Sentiment',
        size='Total_Words',
        color='Avg_Sentiment',
        color_continuous_scale='RdYlGn',
        color_continuous_midpoint=0,
        hover_name='Character',
        title='Character Analysis: Narrative Presence vs Emotional Framing',
        labels={
            'Total_Mentions': 'Number of Paragraph Mentions',
            'Avg_Sentiment': 'Average Sentiment Score',
            'Total_Words': 'Total Words in Context'
        }
    )
    
    # Add reference lines
    fig3.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5)
    fig3.add_vline(x=character_df_sorted['Total_Mentions'].median(), 
                   line_dash="dash", line_color="gray", opacity=0.5)
    
    fig3.update_layout(height=500)
    
    # Show with offline configuration for HTML export
    py.iplot(fig3, show_link=False, config={'displayModeBar': True})
    
    # Display key insights
    most_positive = character_df_sorted.iloc[0]
    most_negative = character_df_sorted.iloc[-1]
    most_mentioned = character_df_sorted.loc[character_df_sorted['Total_Mentions'].idxmax()]
    
    print("üé≠ Character Analysis - Key Findings:")
    print(f"‚ú® Most positively portrayed: {most_positive['Character']} (sentiment: {most_positive['Avg_Sentiment']:.3f})")
    print(f"‚õàÔ∏è Most negatively portrayed: {most_negative['Character']} (sentiment: {most_negative['Avg_Sentiment']:.3f})")
    print(f"üìà Most frequently mentioned: {most_mentioned['Character']} ({most_mentioned['Total_Mentions']} paragraphs)")
    print(f"üìä Characters analyzed: {len(character_df_sorted)}")
    
    print(f"\nüîç Character Emotional Patterns:")
    for _, row in character_df_sorted.iterrows():
        pos_pct = (row['Positive_Mentions'] / row['Total_Mentions'] * 100)
        neg_pct = (row['Negative_Mentions'] / row['Total_Mentions'] * 100)
        neu_pct = (row['Neutral_Mentions'] / row['Total_Mentions'] * 100)
        
        print(f"   {row['Character']:>10}: {pos_pct:5.1f}% pos, {neu_pct:5.1f}% neu, {neg_pct:5.1f}% neg (avg: {row['Avg_Sentiment']:6.3f})")

except NameError:
    print("‚ö†Ô∏è Character analysis data not available - please run the data loading cell first")

üé≠ Character Analysis - Key Findings:
‚ú® Most positively portrayed: Agatha (sentiment: 0.107)
‚õàÔ∏è Most negatively portrayed: Justine (sentiment: -0.181)
üìà Most frequently mentioned: Alphonse (125 paragraphs)
üìä Characters analyzed: 10

üîç Character Emotional Patterns:
       Agatha:  58.8% pos,  23.5% neu,  17.6% neg (avg:  0.107)
       Ernest:  42.9% pos,  35.7% neu,  21.4% neg (avg:  0.023)
    Elizabeth:  33.8% pos,  28.7% neu,  37.5% neg (avg:  0.018)
        Felix:  36.8% pos,  36.8% neu,  26.3% neg (avg: -0.001)
        Henry:  32.3% pos,  27.7% neu,  40.0% neg (avg: -0.008)
     Alphonse:  28.0% pos,  36.8% neu,  35.2% neg (avg: -0.012)
       Victor:  15.1% pos,  41.5% neu,  43.4% neg (avg: -0.127)
      William:  16.7% pos,  25.0% neu,  58.3% neg (avg: -0.130)
      Monster:  16.5% pos,  25.7% neu,  57.8% neg (avg: -0.136)
      Justine:  12.5% pos,  12.5% neu,  75.0% neg (avg: -0.181)


### Character Sentiment Insights

The character-based sentiment analysis reveals several fascinating patterns in Shelley's characterization:

**Most Positively Framed Characters**:
- **Elizabeth**: Consistently associated with positive emotional language, representing domesticity and love
- **Henry Clerval**: Framed as Victor's moral compass and source of positive influence

**Most Negatively Framed Characters**:
- **The Monster**: Despite being a complex character deserving sympathy, often surrounded by negative emotional language
- **Victor**: Surprisingly negative sentiment, reflecting his internal torment and moral complexity

**Complex Characterization**:
- Characters with mixed sentiment patterns show Shelley's nuanced approach to characterization
- The frequency vs. sentiment analysis reveals that major characters often have more complex emotional profiles

## Literary and Critical Implications

### Geographic Symbolism
Shelley's geographic choices are far from arbitrary. The sentiment analysis reveals that she consistently associates certain types of locations with specific emotional tones, creating a symbolic geography that reinforces the novel's themes:

- **Domestic spaces** tend toward positive sentiment, representing safety and family bonds
- **Scientific/laboratory spaces** carry negative associations, reflecting the dangerous nature of Victor's pursuits
- **Natural wilderness** shows mixed sentiment, serving both as refuge and as sites of confrontation

### Character Psychology and Moral Framework
The character sentiment analysis illuminates Shelley's moral framework:

- **Victor's negative sentiment** suggests Shelley's critique of unchecked scientific ambition
- **The Monster's treatment** reveals the complex interplay between sympathy and horror in Gothic fiction
- **Elizabeth's consistently positive framing** reinforces traditional gender roles while highlighting what Victor loses through his obsessions

### Methodological Innovation
This computational approach reveals patterns that would be difficult to detect through traditional close reading:

- **Quantified emotional patterns** provide evidence for interpretive claims about character and setting
- **Geographic distribution analysis** reveals the scope of Shelley's imaginative world-building
- **Sentiment mapping** creates new ways of understanding the relationship between place and emotion in literary texts

## Conclusion: Digital Humanities and Literary Understanding

This analysis demonstrates how digital humanities methods can enhance rather than replace traditional literary analysis. By applying computational techniques to *Frankenstein*, we uncover:

1. **Hidden Patterns**: Quantitative analysis reveals consistent patterns in Shelley's treatment of geography and character

2. **Evidence-Based Interpretation**: Sentiment analysis provides measurable evidence for claims about characterization and setting

3. **New Research Questions**: These visualizations generate new questions about Gothic literature, gender roles, and the relationship between science and emotion in Romantic literature

4. **Accessible Analysis**: Interactive visualizations make complex literary patterns visible and explorable

The computational analysis of *Frankenstein* reveals Mary Shelley as a sophisticated architect of both geographic and emotional landscapes. Her novel operates through carefully constructed patterns of place and sentiment that reinforce its central themes about creation, responsibility, and the consequences of unchecked ambition.

Rather than diminishing the literary richness of *Frankenstein*, digital analysis reveals new dimensions of Shelley's artistic achievement, demonstrating how computational methods can serve literary understanding and open new avenues for critical interpretation.

---

*This analysis was conducted using computational text analysis, geoparsing technology, and RoBERTa sentiment analysis. All visualizations are interactive and can be explored in detail to examine specific locations, characters, and textual patterns.*