# Semantic Network Analysis

**Step 1: Data Loading and Preprocessing**
Load the Excel files using **pandas** and merge them into a single DataFrame. 
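
As a minimal sketch of this step, assuming the transcripts are spread across several workbooks that share the same columns, the files can be read one at a time and stacked with `pd.concat`. The helper names, the file names in the usage comment, and the `source_file` column are hypothetical and only illustrate the idea.

```python
import pandas as pd

def merge_frames(frames):
    """Stack per-file DataFrames into one, renumbering the index."""
    return pd.concat(frames, ignore_index=True)

def load_and_merge(paths):
    """Read each Excel file and record which file each row came from."""
    frames = []
    for path in paths:
        frame = pd.read_excel(path)       # needs an Excel engine such as openpyxl
        frame["source_file"] = str(path)  # hypothetical provenance column
        frames.append(frame)
    return merge_frames(frames)

# Hypothetical usage:
# df = load_and_merge(["interviews_a.xlsx", "interviews_b.xlsx"])
```

Keeping the source file on each row makes it possible to trace a concept back to the workbook it came from later on.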

**Step 2: Text Processing and Concept Extraction**
Using **NLP** techniques, we can extract key concepts from the transcripts.

**NLP Techniques**
- Tokenization and Part-of-Speech (POS) tagging, to keep only nouns, adjectives, and verbs.
- Named Entity Recognition (NER), to identify key entities such as locations and organizations.
- Bigram and trigram extraction, to capture relevant multi-word phrases.
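
The n-gram step can be sketched in plain Python. This is an illustration under assumptions, not the notebook's own method: the `ngrams` helper and the example tokens are made up, and in the actual pipeline the token list would come from spaCy, where POS tags are exposed as `token.pos_` and named entities as `doc.ents`.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["property", "tax", "rate", "increase"]  # made-up example tokens
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)   # ['property tax', 'tax rate', 'rate increase']
print(trigrams)  # ['property tax rate', 'tax rate increase']
```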

**Step 3: Building the Semantic Network**

Use **networkx** to create a semantic network graph that shows how words, phrases, or concepts are interconnected based on their meanings or semantic relationships.
- Nodes represent extracted concepts.
- Edges represent co-occurrence relationships between concepts within the same transcript.

Edge weights are determined by how often two concepts co-occur.
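
One way to compute such weights, sketched here with only the standard library, is to count each unordered concept pair once per transcript. The `cooccurrence_weights` helper and the concept lists are hypothetical stand-ins, and counting once per transcript is an assumption about what "frequency" should mean.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(documents):
    """Count, for each unordered concept pair, the documents containing both."""
    weights = Counter()
    for concepts in documents:
        # De-duplicate and sort so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(concepts)), 2):
            weights[pair] += 1
    return weights

# Made-up concept lists standing in for three transcripts
documents = [
    ["tax", "school", "budget"],
    ["tax", "budget"],
    ["school", "teacher"],
]
weights = cooccurrence_weights(documents)
print(weights[("budget", "tax")])  # 2
```

The resulting pairs can be loaded into a networkx graph with `G.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items())`.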

**Step 4: Visualizing the Semantic Network**
For visualizing the network, we can use tools like **matplotlib**, **pyvis**, or **Plotly** for interactive graphs.

In [43]:
!pip install pyvis --upgrade


In [44]:
import spacy
import pandas as pd
import networkx as nx
from itertools import combinations
from pyvis.network import Network

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

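# Function to process text and extract key concepts (lemmas of nouns and adjectives)
def extract_concepts(text):
    # Empty or missing cells (e.g. NaN) yield no concepts instead of failing on .lower()
    if not isinstance(text, str):
        return []
    doc = nlp(text.lower())
    return [
        token.lemma_
        for token in doc
        if token.pos_ in ('NOUN', 'ADJ')
        and not token.is_stop
        and token.dep_ not in ('aux', 'punct')
    ]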

# Load data from your Excel file
file_path = r"C:\Users\Shrey\Documents\Research\Dr. Sun Research\NLP_and_Sentiment_Analysis_Tasks\master-file.xlsx"
df = pd.read_excel(file_path)

# Assuming that the 'transcript' column exists in the data, apply concept extraction
df['concepts'] = df['transcript'].apply(extract_concepts)

# Create an empty graph
G = nx.Graph()

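# Build edges based on co-occurrence of concepts within the same transcript.
# Each transcript's concepts are de-duplicated first so that a repeated word
# does not create self-loops or inflate the counts; an edge weight is therefore
# the number of transcripts in which both concepts appear.
for concepts in df['concepts']:
    for concept1, concept2 in combinations(sorted(set(concepts)), 2):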
        if G.has_edge(concept1, concept2):
            G[concept1][concept2]['weight'] += 1
        else:
            G.add_edge(concept1, concept2, weight=1)

# List of words to remove
words_to_remove = ['cause', 'thing', 'idea', 'bill', 'sure', 'true', 'lot', 'sense', 'post', 'overall', 
                   'little', 'silver', 'well', 'type', 'people', 'free', 'kid', 'old', 'pro', 'con', 'sorry', 
                   'prompt', 'time', 'sort', 'issue', 'long', 'term', 'confusing', 'different', 'problem', 
                   'general', 'great', 'board', 'component', 'helpful', 'way', 'room', 'upfront', 'good', 'slow', 
                   'reason', 'hair', 'big', 'pretty', 'right', 'easy', 'clear', 'response', 'correct', 'beca', 
                   'perfect', 'situation', 'et', 'cetera', 'roll', 'new', 'tap', 'pta', 'understanding', 'bad', 
                   'gouging', 'run', 'hard', 'crazy', 'previous', 'page', 'similar', 'motivated', 'doublespeak', 
                   'lunch', 'stuff', 'average', 'gentleman', 'group', 'feeling', 'order', 'extra', 'owner', 
                   'worried', 'increase', 'એવું', 'વગર', 'વર્કના', 'ડિસ્કસ', 'પણ', 'કરો', 'આમ', 'કે', 'હું', 
                   'કરે', 'તો', 'શું', 'તડલાઇન', 'jacqueline', 'ટેબલગેટું', 'છે', 'processમાં', 'clean', 
                   'number', 'detail', 'kind', 'offer', 'topic', 'w', 'waiting', 'comment', 'diving', 'welcome', 
                   'table', 'request', 'second', 'stage', 'slight', 'gap', 'bit', 'struggle', 'reflective', 'film', 
                   'crack', 'condo', 'tom', 'tv', 'sound', 'clip', 'winter', 'outside', 'tool', 'smart', 'lieu', 
                   'ujsc', 'white', 'interested', 'numerous', 'goal', 'mute', 'thought', 'subject', 'course', 
                   'question', 'wood', 'able', 'list', 'game', 'excellent', 'loud', 'possible', 'mic', 'speaker', 
                   'proper', 'channel', 'use', 'fact', 'guy', 'grid', 'r', 'dog', 'buck']

# Filter out the words to remove from the graph's nodes.
# Sets keep the membership tests here and in the edge loop below fast.
excluded = set(words_to_remove)
filtered_nodes = {node for node in G.nodes() if node not in excluded}

# Create PyVis Network object
net = Network(notebook=True)

# Add nodes and edges to the PyVis network
for node in filtered_nodes:
    net.add_node(node)

for concept1, concept2 in G.edges():
    if concept1 in filtered_nodes and concept2 in filtered_nodes:
        weight = G[concept1][concept2]['weight']
        # Pass the count as 'value' so that edge thickness scales with co-occurrence frequency
        net.add_edge(concept1, concept2, value=weight)

# Set the physics layout
net.force_atlas_2based()

# Show the network
net.show("semantic_network.html")


semantic_network.html
