# Network Analysis

Network analysis can be used to understand communication patterns between employees.
An email network visualizes that communication as a graph in which each employee is a node
and each email exchange between two employees is an edge.
When analyzing the network, measures of how nodes and edges are distributed
are important to identify.
One key measure is betweenness centrality, which shows which
nodes are likely pathways of information and which employees may act as bridges that facilitate
communication, including communication related to wrongdoing.
The Python package NetworkX supports the creation and analysis of complex networks.
Email networks can help
uncover key individuals, groups, and relationships.
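
To illustrate the bridge idea, here is a minimal sketch on a made-up network. The employee names and edges are hypothetical and are not taken from the Enron data; the only library call assumed is NetworkX's `betweenness_centrality`.

```python
import networkx as nx

# Hypothetical employees: two tightly knit teams that are only
# connected through a single intermediary, "xavier".
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),  # team 1
    ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),  # team 2
    ("carol", "xavier"), ("xavier", "dave"),                 # bridge between teams
])

# Normalized betweenness: the share of shortest paths that pass through each node.
betweenness = nx.betweenness_centrality(G)

# The node with the highest score is the most likely information pathway.
bridge = max(betweenness, key=betweenness.get)
print(bridge)  # xavier
```

Every shortest path between the two teams runs through `xavier`, so that node scores highest even though it has only two connections.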

- `nxviz`
- `G = nx.from_pandas_edgelist(data, 'sender', 'recipient1', edge_attr=['date', 'subject'])` (replaces `from_pandas_dataframe`, which current NetworkX releases no longer provide)
- `nxviz.ArcPlot`
- `nxviz.CircosPlot`
- `networkx.draw_networkx(G, networkx.spring_layout(G, k=0.1), node_size=25, node_color='red', with_labels=True, edge_color='blue')`
    - `k` in `spring_layout` sets the optimal distance between nodes and changes the visualization (a small `k` was more useful here)
- Degree Centrality
- Betweenness Centrality
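
The calls listed above could fit together roughly as follows. This is a sketch under stated assumptions: the `data` rows are invented placeholders, the column names `sender`, `recipient1`, `date`, and `subject` follow the list above, and `draw_email_graph` is a hypothetical helper that needs `matplotlib`.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample rows; real data would come from the parsed email corpus.
data = pd.DataFrame({
    "sender": ["alice@example.com", "bob@example.com",
               "alice@example.com", "carol@example.com"],
    "recipient1": ["bob@example.com", "carol@example.com",
                   "carol@example.com", "dave@example.com"],
    "date": ["2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05"],
    "subject": ["Budget", "Re: Budget", "Meeting", "Follow-up"],
})

# Build an undirected graph; date and subject are stored on each edge.
# Repeated sender/recipient pairs collapse into a single edge in a plain Graph.
G = nx.from_pandas_edgelist(
    data, source="sender", target="recipient1", edge_attr=["date", "subject"]
)

# Node positions for plotting; seed makes the layout reproducible.
pos = nx.spring_layout(G, k=0.1, seed=42)

# Centrality measures from the list above.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)


def draw_email_graph(graph, layout):
    """Hypothetical helper: draw the graph with the styling from the notes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    nx.draw_networkx(graph, layout, node_size=25, node_color="red",
                     with_labels=True, edge_color="blue")
    plt.show()
```

Calling `draw_email_graph(G, pos)` would render the plot. Sorting `degree` or `betweenness` by value gives a quick ranking of the most connected or most intermediary addresses.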

[Kaggle - Enron Network Analysis](https://www.kaggle.com/code/jamestollefson/enron-network-analysis)

- Anomaly Detection, Social Network Analysis, Email Body Analysis

[Enron-Email-Analysis](https://github.com/mihir-m-gandhi/Enron-Email-Analysis)

[Network Analysis with the Enron Email Corpus](https://www.tandfonline.com/doi/pdf/10.1080/10691898.2015.11889734)
[Exploration of Communication Networks from the Enron Email Corpus](http://www.casos.cs.cmu.edu/publications/protected/2005-2006/diesner_2005_explorationsenron.pdf)


## Social Network Analysis

Refer to the documentation of the Python package `networkx` for information on network analysis in Python [1].
Good examples of social network analysis with Python can be found in [2].

- Pre-Processing:
    - Load the dataset
    - Clean Data: Remove any emails with missing information, irrelevant emails, etc.
- Build Network
    - Directed Graph: e.g., sender and recipient represent the nodes, and each email is a directed edge from sender to recipient
    - Analyze Network Properties: Centrality bar charts (Refer to [2])
        - Degree Centrality: Identify individuals who sent or received the most emails.
        - Betweenness Centrality: Identify key individuals who serve as intermediaries.
        - Closeness Centrality: Find individuals who are closest to others on average.
        - Eigenvector Centrality: Helps identify key figures who are well-connected to other influential individuals, providing deeper insight into the power dynamics within the Enron network.
            - High Eigenvector Centrality: Individuals with high scores are not only well-connected but also connected to other influential people in the network. These could be core members of critical communication networks or leaders in the organization.
            - Low Eigenvector Centrality: Individuals with low scores are likely peripheral or isolated in the network, or connected only to others with similarly low influence.
- Additional Network Ideas:
    - Detect Communities: Use community detection algorithms (e.g., Girvan-Newman) to find groups within the network.
    - Analyze Clusters: Investigate clusters within the network to understand isolated teams or departments.
- Visualize Network:
    - Color-code communities or size nodes by centrality for extra detail
- Reference NetworkX: [bibtex](http://conference.scipy.org.s3-website-us-east-1.amazonaws.com/proceedings/scipy2008/paper_2/reference.bib)
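
The outline above can be sketched end to end on a toy dataset. The rows, names, and the `weight` column are illustrative assumptions, not the Enron data. Eigenvector centrality is computed on the undirected version of the graph here as a simplification, because the power iteration can behave poorly on directed graphs with nodes that receive no email.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms.community import girvan_newman

# Hypothetical toy emails: two teams linked only by carol -> dave.
emails = pd.DataFrame({
    "sender":    ["alice", "alice", "bob", "alice", "bob", "carol",
                  "dave", "erin", "frank", None, "erin"],
    "recipient": ["bob", "bob", "alice", "carol", "carol", "dave",
                  "erin", "frank", "dave", "bob", "erin"],
})

# Pre-processing: drop rows with missing fields and self-addressed emails.
clean = emails.dropna(subset=["sender", "recipient"])
clean = clean[clean["sender"] != clean["recipient"]]

# Collapse repeated emails into one row per pair, with a count as the weight.
weighted = (clean.groupby(["sender", "recipient"])
                 .size()
                 .reset_index(name="weight"))

# Build network: each sender -> recipient pair becomes a directed edge.
G = nx.from_pandas_edgelist(weighted, "sender", "recipient",
                            edge_attr="weight", create_using=nx.DiGraph)

# Analyze network properties.
in_degree = nx.in_degree_centrality(G)      # who receives from many people
out_degree = nx.out_degree_centrality(G)    # who sends to many people
betweenness = nx.betweenness_centrality(G)  # intermediaries
closeness = nx.closeness_centrality(G)      # closest to others on average
eigenvector = nx.eigenvector_centrality(G.to_undirected(), max_iter=1000)

# Detect communities: the first Girvan-Newman split of the undirected graph.
communities = next(girvan_newman(G.to_undirected()))

# Visualization input: scale node sizes by betweenness for a later draw call.
node_sizes = [300 + 2000 * betweenness[n] for n in G.nodes]

for group in communities:
    print(sorted(group))
```

On this toy data the first split separates the two teams, since the single edge between `carol` and `dave` carries every cross-team shortest path. The centrality dictionaries can feed the bar charts mentioned above, and `node_sizes` can be passed as `node_size` to `nx.draw_networkx`.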

[1](https://networkx.org/documentation/stable/index.html)
[2](https://link.springer.com/book/10.1007/978-3-319-53004-8)