# Internal analysis: Italian Gang

In [8]:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

## Dataset import

In [None]:
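# Load the adjacency matrix; the first CSV column holds the node labels.
df = pd.read_csv("datasets/italian_CSV/ITALIAN_GANGS.csv", index_col=0)

# Build an undirected graph; nonzero cells of the matrix become edges.
G = nx.from_pandas_adjacency(df)

print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")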

### Graph visualization

In [None]:
# A fixed seed keeps the force-directed layout reproducible across runs.
pos = nx.spring_layout(G, k=0.5, iterations=50, seed=42)

plt.figure(figsize=(12, 10))
nx.draw(G, pos, with_labels=True, node_size=600, font_size=10)
plt.show()

## Within-network analysis (internal analysis of the gangs)

### A. General structural metrics
- **Density**
  - **What it is:** Ratio between existing ties and all possible ties.
  - **Why useful:** Measures cohesion; high density → easier communication and lower vulnerability to central node removal.

- **Average degree**
  - **What it is:** Average number of connections per node.
  - **Why useful:** Indicates member activity and level of engagement.

- **Network diameter and average path length**
  - **What it is:** Longest (diameter) and mean (average path length) shortest-path distance between pairs of nodes.
  - **Why useful:** Measures the network's efficiency in transmitting information or orders.

- **Clustering coefficient**
  - **What it is:** Probability that a node's neighbors are connected to each other.
  - **Why useful:** Highlights closed subgroups or internal "cells"; useful for understanding resilience and community structure.

- **Modularity**
  - **What it is:** Measure of how much denser ties are within communities than would be expected at random, for a given partition of the network.
  - **Why useful:** Reveals internal divisions and possible subgroups or "cliques".
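
The metrics in section A can be computed with standard NetworkX calls. The block below is a minimal sketch, not the notebook's own method: the helper name `structural_metrics` is hypothetical, `nx.florentine_families_graph()` is only a stand-in for the `G` built from the CSV, and greedy modularity maximization is one assumed choice of partition for the modularity score.

```python
import networkx as nx
from networkx.algorithms import community


def structural_metrics(G):
    """Return the section A metrics for a non-empty undirected graph G."""
    n = G.number_of_nodes()
    metrics = {
        "density": nx.density(G),
        "average_degree": 2 * G.number_of_edges() / n,
        "average_clustering": nx.average_clustering(G),
    }
    # Diameter and average path length are only defined on a connected graph,
    # so they are measured on the largest connected component.
    largest_cc = max(nx.connected_components(G), key=len)
    H = G.subgraph(largest_cc)
    metrics["diameter"] = nx.diameter(H)
    metrics["average_path_length"] = nx.average_shortest_path_length(H)
    # Modularity needs a partition; greedy maximization is an assumed choice.
    parts = community.greedy_modularity_communities(G)
    metrics["modularity"] = community.modularity(G, parts)
    metrics["n_communities"] = len(parts)
    return metrics


# Stand-in graph (assumption); in the notebook, pass the G loaded from the CSV.
G_demo = nx.florentine_families_graph()
result = structural_metrics(G_demo)
for name, value in result.items():
    print(name, round(value, 3))
```

If the real network turns out to be disconnected, the diameter and average path length reported here describe only its largest component, which is worth stating alongside the numbers.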

### B. Centrality metrics
- **Degree centrality**
  - **What it is:** Number of direct connections a node has.
  - **Why useful:** Identifies the most active or influential members.

- **Betweenness centrality**
  - **What it is:** Number of times a node lies on the shortest paths between other nodes.
  - **Why useful:** Highlights brokers or gatekeepers; nodes critical for the flow of information.


- **Closeness centrality**
  - **What it is:** Reciprocal of the sum of a node's distances to all other nodes.
  - **Why useful:** A node close to all others can quickly spread information or orders.

- **Eigenvector centrality / PageRank**
  - **What it is:** Importance based on being connected to other important nodes.
  - **Why useful:** Highlights leaders recognized by the most influential members.
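
The centrality measures in section B map directly onto NetworkX functions. The following is a rough sketch under stated assumptions: `centrality_scores` and `top_nodes` are hypothetical helper names, and `nx.florentine_families_graph()` again stands in for the notebook's `G`. PageRank is left out to keep the example dependency-free; `nx.pagerank(G)` could be added as another entry in the same dictionary.

```python
import networkx as nx


def centrality_scores(G):
    """Return a dict mapping each measure name to a {node: score} dict."""
    return {
        "degree": nx.degree_centrality(G),            # normalized by n - 1
        "betweenness": nx.betweenness_centrality(G),  # normalized by default
        "closeness": nx.closeness_centrality(G),
        "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    }


def top_nodes(scores, k=3):
    """Return the k highest-scoring nodes for each measure."""
    return {
        name: sorted(values, key=values.get, reverse=True)[:k]
        for name, values in scores.items()
    }


# Stand-in graph (assumption); in the notebook, pass the G loaded from the CSV.
G_demo = nx.florentine_families_graph()
scores = centrality_scores(G_demo)
top = top_nodes(scores)
for name, nodes in top.items():
    print(f"{name}: {nodes}")
```

Comparing the rankings side by side is often more informative than any single score: a node that ranks high on betweenness but low on degree is a candidate broker rather than a hub.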

### C. Roles and vulnerability
- **Identification of key roles (leader, broker, peripheral members)**
  - **Metrics:** Combination of centrality scores (degree, betweenness) and clustering.
  - **Why useful:** Identifies who leads, who mediates between subgroups, and who remains peripheral.


- **Cohesion / network robustness**
  - **Metrics:** Density, average path length, k-core decomposition.
  - **Why useful:** Supports vulnerability testing, i.e. measuring how the network fragments when central nodes are removed.


- **K-core / core-periphery structure**
  - **What it is:** Identifies the central core versus the periphery.
  - **Why useful:** Highlights implicit hierarchy and concentration of power.
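
The vulnerability test and the core-periphery split in section C can be sketched as follows. This is an illustrative approach, not necessarily the one the analysis uses: the helper names are hypothetical, the removal strategy (repeatedly deleting the current highest-betweenness node) is one assumed attack model, and `nx.florentine_families_graph()` stands in for the notebook's `G`.

```python
import networkx as nx


def largest_component_fraction(G, original_size):
    """Size of the largest connected component relative to the original graph."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / original_size


def targeted_removal(G, steps=3):
    """Remove the highest-betweenness node `steps` times, tracking fragmentation."""
    H = G.copy()  # work on a copy so the input graph is left untouched
    n = G.number_of_nodes()
    history = []
    for _ in range(steps):
        if H.number_of_nodes() == 0:
            break
        bc = nx.betweenness_centrality(H)  # recomputed after every removal
        target = max(bc, key=bc.get)
        H.remove_node(target)
        history.append((target, largest_component_fraction(H, n)))
    return history


# Stand-in graph (assumption); in the notebook, pass the G loaded from the CSV.
G_demo = nx.florentine_families_graph()

# k-core decomposition: nodes with the maximum core number form the inner core.
# nx.core_number rejects self-loops; if the adjacency matrix has a nonzero
# diagonal, drop them first with G.remove_edges_from(nx.selfloop_edges(G)).
core = nx.core_number(G_demo)
k_max = max(core.values())
inner_core = sorted(node for node, k in core.items() if k == k_max)
print(f"Innermost core (k={k_max}): {inner_core}")

history = targeted_removal(G_demo)
for node, fraction in history:
    print(f"removed {node}: largest component holds {fraction:.0%} of the nodes")
```

A sharp drop in the largest-component fraction after only a few removals suggests the network depends on a handful of brokers; a slow decline suggests redundancy in its ties.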
