# Community Detection Algorithms
- essential for evaluating group behaviour and emergent phenomena
- members of a community have more relationships within the group than outside of it
- reveals 
    - clusters of nodes, 
    - isolated groups, and 
    - network structure
- infers similar behaviour or preferences of peer groups
- estimates resiliency
- finds nested relationships
- prepares data for other analyses
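
The "more relationships within than outside" idea can be checked directly by counting edges. The snippet below is a minimal sketch in plain Python, not part of the notebook: the function name `internal_external_counts` and the toy edge list are hypothetical and exist only for illustration.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def internal_external_counts(edges: Iterable[Edge], group: Set[str]) -> Tuple[int, int]:
    """Count relationships inside `group` and relationships crossing its boundary."""
    internal = external = 0
    for src, dst in edges:
        src_in, dst_in = src in group, dst in group
        if src_in and dst_in:
            internal += 1      # both endpoints belong to the group
        elif src_in or dst_in:
            external += 1      # exactly one endpoint belongs to the group
    return internal, external


# Hypothetical toy graph: two tight triangles joined by a single bridge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),   # first triangle
    ("d", "e"), ("e", "f"), ("d", "f"),   # second triangle
    ("c", "d"),                           # bridge between the triangles
]

inside, outside = internal_external_counts(toy_edges, {"a", "b", "c"})
print(inside, outside)  # 3 relationships inside the group, 1 crossing its boundary
```

A candidate group whose internal count clearly exceeds its boundary count behaves like a community in the sense described above.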

| Algorithm type                            | What it does                                                                                                                     | Example use                                                                                                                        |
|-------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| Triangle Count and Clustering Coefficient | Measures how many nodes form triangles and the degree to which nodes tend to cluster together                                    | Estimating group stability and whether the network might exhibit “small-world” behaviors seen in graphs with tightly knit clusters |
| Strongly Connected Components             | Finds groups where each node is reachable from every other node in that same group following the direction of relationships      | Making product recommendations based on group affiliation or similar items                                                         |
| Connected Components                      | Finds groups where each node is reachable from every other node in that same group, regardless of the direction of relationships | Performing fast grouping for other algorithms and identifying islands                                                              |
| Label Propagation                         | Infers clusters by spreading labels based on neighborhood majorities                                                             | Understanding consensus in social communities or finding dangerous combinations of possible co-prescribed drugs                    |
| Louvain Modularity                        | Maximizes the presumed accuracy of groupings by comparing relationship weights and densities to a defined estimate or average    | In fraud analysis, evaluating whether a group has just a few discrete bad behaviors or is acting as a fraud ring                   |
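
To make the triangle count and clustering coefficient from the table concrete, here is a small sketch in plain Python. It treats relationships as undirected and uses the common local definition (triangles through a node divided by the number of possible pairs among its neighbours); the function names and the four-node toy graph are assumptions made for illustration, not the notebook's own code.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            adj[src].add(dst)
            adj[dst].add(src)
    return dict(adj)


def triangles_and_clustering(edges):
    """Map each node to (triangles through it, local clustering coefficient)."""
    adj = build_adjacency(edges)
    result = {}
    for node, nbrs in adj.items():
        # A triangle exists for every pair of neighbours that are themselves connected.
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        degree = len(nbrs)
        possible = degree * (degree - 1) / 2
        coeff = triangles / possible if possible else 0.0
        result[node] = (triangles, coeff)
    return result


# Hypothetical toy graph: one triangle (a, b, c) with a pendant node d attached to c.
stats = triangles_and_clustering([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
for node in sorted(stats):
    triangles, coeff = stats[node]
    print(node, triangles, round(coeff, 2))
```

Nodes `a` and `b` sit entirely inside the triangle, so their coefficient is 1.0, while `c` also connects to `d` and therefore closes only one of its three possible neighbour pairs.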

# Example Graph Data: The Software Dependency Graph
## Importing Data into Apache Spark

In [6]:
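import os.path as op

from graphframes import GraphFrame
from pyspark.sql.session import SparkSession
# Project-local helpers; assumed to define data_path, the directory holding the CSV files.
from code.script.community import *

spark = SparkSession.builder.appName('community').getOrCreate()

# Build the graph from the vertex (id) and edge (src, dst, relationship) CSV files.
g = GraphFrame(
    spark.read.csv(op.join(data_path, 'sw-nodes.csv'), header=True),
    spark.read.csv(op.join(data_path, 'sw-relationships.csv'), header=True)
)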

In [8]:
g.vertices.toPandas()

|   | id              |
|---|-----------------|
| 0 | six             |
| 1 | pandas          |
| 2 | numpy           |
| 3 | python-dateutil |
| 4 | pytz            |
| 5 | pyspark         |
| 6 | matplotlib      |
| 7 | spacy           |
| 8 | py4j            |
| 9 | jupyter         |


In [9]:
g.edges.toPandas()

|   | src             | dst             | relationship |
|---|-----------------|-----------------|--------------|
| 0 | pandas          | numpy           | DEPENDS_ON   |
| 1 | pandas          | pytz            | DEPENDS_ON   |
| 2 | pandas          | python-dateutil | DEPENDS_ON   |
| 3 | python-dateutil | six             | DEPENDS_ON   |
| 4 | pyspark         | py4j            | DEPENDS_ON   |
| 5 | matplotlib      | numpy           | DEPENDS_ON   |
| 6 | matplotlib      | python-dateutil | DEPENDS_ON   |
| 7 | matplotlib      | six             | DEPENDS_ON   |
| 8 | matplotlib      | pytz            | DEPENDS_ON   |
| 9 | spacy           | six             | DEPENDS_ON   |
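
As a quick sanity check on the imported data, the same nodes and edges can be grouped into connected components without Spark. This is a plain-Python sketch using breadth-first search and ignoring relationship direction; the `connected_components` helper is written here for illustration and is not the GraphFrames implementation.

```python
from collections import defaultdict, deque

# Nodes and DEPENDS_ON edges copied from the vertex and edge listings above.
nodes = ["six", "pandas", "numpy", "python-dateutil", "pytz",
         "pyspark", "matplotlib", "spacy", "py4j", "jupyter"]
edges = [
    ("pandas", "numpy"), ("pandas", "pytz"), ("pandas", "python-dateutil"),
    ("python-dateutil", "six"), ("pyspark", "py4j"),
    ("matplotlib", "numpy"), ("matplotlib", "python-dateutil"),
    ("matplotlib", "six"), ("matplotlib", "pytz"), ("spacy", "six"),
]


def connected_components(nodes, edges):
    """Group nodes that can reach each other when edge direction is ignored."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from the unvisited start node
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


components = connected_components(nodes, edges)
for component in components:
    print(sorted(component))
```

On this data the search yields three groups: the seven libraries linked through `six`, `numpy`, `pytz`, and `python-dateutil`; the pair `pyspark` and `py4j`; and `jupyter` on its own, since it appears in no relationship.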


# Triangle Count and Clustering Coefficient