# Modularization options with Neo4j (executed)

Short demo notebook that integrates various data sources into one graph via the Neo4j graph database.

## Show current package structure

In [1]:
%%bash
tree ./spring-framework-petclinic/src/main/java/

./spring-framework-petclinic/src/main/java/
└── org
    └── springframework
        └── samples
            └── petclinic
                ├── PetclinicInitializer.java
                ├── model
                │   ├── BaseEntity.java
                │   ├── NamedEntity.java
                │   ├── Owner.java
                │   ├── Person.java
                │   ├── Pet.java
                │   ├── PetType.java
                │   ├── Specialty.java
                │   ├── Vet.java
                │   ├── Vets.java
                │   ├── Visit.java
                │   └── package-info.java
                ├── repository
                │   ├── OwnerRepository.java
                │   ├── PetRepository.java
                │   ├── VetRepository.java
                │   ├── VisitRepository.java
                │   ├── jdbc
                │   │   ├── JdbcOwnerRepositoryImpl.java
                │   │   ├── JdbcPet.java
                │   │   ├── JdbcPetRepositoryImpl.java
            

## Set up connection to Neo4j
This requires a running Neo4j instance in the background.

### Establish connection to Neo4j graph database

In [2]:
from neo4j import GraphDatabase

URI = "bolt://localhost"
AUTH = ("neo4j", "neo4j")

driver = GraphDatabase.driver(URI, auth=AUTH)
driver.verify_connectivity()
session = driver.session()

Clean data from previous run

In [3]:
query="""
MATCH (a) -[r] -> () DELETE a, r
"""
session.run(query);

In [4]:
query="""
   MATCH (a) DELETE a
"""
session.run(query);
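
The two clean-up queries above could probably be collapsed into one statement. The following is a minimal sketch, assuming Cypher's `DETACH DELETE` clause, which removes a node together with its relationships. `clear_graph` and `_RecordingSession` are hypothetical names; the stand-in session only records queries so that the snippet runs without a database.

```python
class _RecordingSession:
    """Hypothetical stand-in for a Neo4j session; it only records queries."""

    def __init__(self):
        self.queries = []

    def run(self, query, **parameters):
        self.queries.append(query.strip())
        return self


def clear_graph(session):
    """Remove all nodes and their relationships in a single statement."""
    session.run("MATCH (n) DETACH DELETE n")


demo_session = _RecordingSession()
clear_graph(demo_session)
print(demo_session.queries)
```

With a real driver session, `clear_graph(session)` would take the place of cells 3 and 4.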

# Data import

## Import dependencies (from jdeps)

### Generating dataset

You can generate this kind of data with any tool that can show you dependencies between your classes. E.g. in Java, using `jdeps`:
    
    
`jdeps -e 'org.springframework.samples.petclinic.*' -v target/classes/ > spring_petclinic_deps.txt`

*Caution: `jdeps` only approximates the dependencies. For example, types that are used in generics are not shown here.*

### Show dataset

In [5]:
!head data/spring_petclinic_deps.txt

   org.springframework.samples.petclinic.model.NamedEntity -> org.springframework.samples.petclinic.model.BaseEntity classes
   org.springframework.samples.petclinic.model.Owner  -> org.springframework.samples.petclinic.model.Person classes
   org.springframework.samples.petclinic.model.Owner  -> org.springframework.samples.petclinic.model.Pet    classes
   org.springframework.samples.petclinic.model.Person -> org.springframework.samples.petclinic.model.BaseEntity classes
   org.springframework.samples.petclinic.model.Pet    -> org.springframework.samples.petclinic.model.NamedEntity classes
   org.springframework.samples.petclinic.model.Pet    -> org.springframework.samples.petclinic.model.Owner  classes
   org.springframework.samples.petclinic.model.Pet    -> org.springframework.samples.petclinic.model.PetType classes
   org.springframework.samples.petclinic.model.Pet    -> org.springframework.samples.petclinic.model.Visit  classes
   org.springframework.samples.petclinic.mode

### Import dataset to pandas

In [6]:
import pandas as pd

deps = pd.read_csv("data/spring_petclinic_deps.txt", names=["raw"], sep="\r")
deps.head()

Unnamed: 0,raw
0,org.springframework.samples.petclinic.model...
1,org.springframework.samples.petclinic.model...
2,org.springframework.samples.petclinic.model...
3,org.springframework.samples.petclinic.model...
4,org.springframework.samples.petclinic.model...


### Normalize data
*(always a messy thing...)*

In [7]:
# class entries begin with three spaces; copy to avoid writing into a view
deps = deps[deps['raw'].str.startswith("   ")].copy()
# separate the source from the target
splitted = deps['raw'].str.split("->", n=1, expand=True)
# remove whitespace from the source and get rid of inner classes
deps['from'] = splitted[0].str.strip().str.split(r"\$").str[0]
# get the target and the artifact names
splitted_2 = splitted[1].str.split(" ", n=2)
# also get rid of inner classes in the target
deps['to'] = splitted_2.str[1].str.split(r"\$").str[0]
deps['type'] = splitted_2.str[2].str.strip()
# the simple class name is the last segment of the fully qualified name
deps['name'] = deps['from'].str.split(".").str[-1]
deps.head()

Unnamed: 0,raw,from,to,type,name
0,org.springframework.samples.petclinic.model...,org.springframework.samples.petclinic.model.Na...,org.springframework.samples.petclinic.model.Ba...,classes,NamedEntity
1,org.springframework.samples.petclinic.model...,org.springframework.samples.petclinic.model.Owner,org.springframework.samples.petclinic.model.Pe...,classes,Owner
2,org.springframework.samples.petclinic.model...,org.springframework.samples.petclinic.model.Owner,org.springframework.samples.petclinic.model.Pet,classes,Owner
3,org.springframework.samples.petclinic.model...,org.springframework.samples.petclinic.model.Pe...,org.springframework.samples.petclinic.model.Ba...,classes,Person
4,org.springframework.samples.petclinic.model...,org.springframework.samples.petclinic.model.Pet,org.springframework.samples.petclinic.model.Na...,classes,Pet
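
For a single line, the normalization above can be sketched in plain Python. This is only an illustration under the assumption that class entries look like `   <source> -> <target> <artifact>`, as in the dataset shown earlier; `JDEPS_LINE` and `parse_jdeps_line` are hypothetical names, not part of the notebook.

```python
import re

# Assumed shape of one class entry: "   <source> -> <target> <artifact>"
JDEPS_LINE = re.compile(
    r"^\s{3,}(?P<source>\S+)\s+->\s+(?P<target>\S+)\s+(?P<artifact>.+?)\s*$"
)


def parse_jdeps_line(line):
    """Return a dict with from/to/type/name, or None for non-class lines."""
    match = JDEPS_LINE.match(line)
    if match is None:
        return None
    # strip inner-class suffixes such as Outer$Inner
    source = match.group("source").split("$")[0]
    target = match.group("target").split("$")[0]
    return {
        "from": source,
        "to": target,
        "type": match.group("artifact"),
        "name": source.rsplit(".", 1)[-1],
    }


line = ("   org.springframework.samples.petclinic.model.Owner  "
        "-> org.springframework.samples.petclinic.model.Person classes")
parsed = parse_jdeps_line(line)
print(parsed)
```

Lines without the leading indentation yield `None`, which mirrors the `startswith("   ")` filter in the cell above.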


### Transform data for source code file names into dictionary
To load data into Neo4j, we need a dict-like data structure (here: a list of records). We also drop duplicate entries to avoid creating several nodes for the same type.

In [8]:
names_data = deps[['from', 'name']].drop_duplicates().to_dict(orient='records')
names_data[:5]

[{'from': 'org.springframework.samples.petclinic.model.NamedEntity',
  'name': 'NamedEntity'},
 {'from': 'org.springframework.samples.petclinic.model.Owner',
  'name': 'Owner'},
 {'from': 'org.springframework.samples.petclinic.model.Person',
  'name': 'Person'},
 {'from': 'org.springframework.samples.petclinic.model.Pet', 'name': 'Pet'},
 {'from': 'org.springframework.samples.petclinic.model.PetType',
  'name': 'PetType'}]

### Import data into Neo4j

In [9]:
query="""
    UNWIND $data as dep_name
    CREATE (t:Type)
    SET
        t.fqn = dep_name.from,
        t.name = dep_name.name
    RETURN t.fqn, t.name
"""
session.run(query, data=names_data).to_df().head()

Unnamed: 0,t.fqn,t.name
0,org.springframework.samples.petclinic.model.Na...,NamedEntity
1,org.springframework.samples.petclinic.model.Owner,Owner
2,org.springframework.samples.petclinic.model.Pe...,Person
3,org.springframework.samples.petclinic.model.Pet,Pet
4,org.springframework.samples.petclinic.model.Pe...,PetType


### Create index for `fqn` for faster queries
Support both older and newer versions of Neo4j when creating the index.

In [10]:
query = ""

if driver.get_server_info().protocol_version[0] <= 3:
    query = "CREATE INDEX ON :Type(fqn)"
else:
    query = "CREATE INDEX FOR (t:Type) ON (t.fqn)"
    
session.run(query);

### Transform data for dependencies into a dictionary

In [11]:
deps_data = deps[['from', 'to']].to_dict(orient='records')
deps_data[:3]

[{'from': 'org.springframework.samples.petclinic.model.NamedEntity',
  'to': 'org.springframework.samples.petclinic.model.BaseEntity'},
 {'from': 'org.springframework.samples.petclinic.model.Owner',
  'to': 'org.springframework.samples.petclinic.model.Person'},
 {'from': 'org.springframework.samples.petclinic.model.Owner',
  'to': 'org.springframework.samples.petclinic.model.Pet'}]

### Connect nodes that depend on each other

In [12]:
query="""
    UNWIND $data as dep
    MATCH (from:Type {fqn : dep.from})
    MATCH (to:Type {fqn: dep.to})
    MERGE (from)-[:DEPENDS_ON]->(to)
    RETURN from.fqn, to.fqn
"""
session.run(query, data=deps_data).to_df().head()

Unnamed: 0,from.fqn,to.fqn
0,org.springframework.samples.petclinic.model.Owner,org.springframework.samples.petclinic.model.Pe...
1,org.springframework.samples.petclinic.model.Owner,org.springframework.samples.petclinic.model.Pet
2,org.springframework.samples.petclinic.model.Pet,org.springframework.samples.petclinic.model.Na...
3,org.springframework.samples.petclinic.model.Pet,org.springframework.samples.petclinic.model.Owner
4,org.springframework.samples.petclinic.model.Pet,org.springframework.samples.petclinic.model.Pe...


### Prepare results for dependency analysis

In [13]:
import json
query="""
    // *0..1 also matches the zero-length path, i.e. the type itself
    MATCH (type:Type)-[:DEPENDS_ON*0..1]->(directDependency:Type)
    RETURN type.fqn as name, COLLECT(DISTINCT directDependency.fqn) as imports
"""

json_data = session.run(query).to_df().to_json(orient="records")
print(json.dumps(json.loads(json_data), indent=4)[:500] + "\n...")

[
    {
        "name": "org.springframework.samples.petclinic.repository.jpa.JpaOwnerRepositoryImpl",
        "imports": [
            "org.springframework.samples.petclinic.repository.jpa.JpaOwnerRepositoryImpl",
            "org.springframework.samples.petclinic.model.Owner",
            "org.springframework.samples.petclinic.repository.OwnerRepository"
        ]
    },
    {
        "name": "org.springframework.samples.petclinic.repository.jpa.JpaPetRepositoryImpl",
        "imports": [
    
...
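
The `*0..1` variable-length pattern in the query also matches paths of length zero, which is why every type shows up in its own `imports` list in the output above. A plain-Python sketch of the same grouping, using a small hand-made edge list with shortened, illustrative names (not taken from the dataset):

```python
from collections import defaultdict


def collect_imports(types, edges):
    """Map each type to itself plus its direct dependencies (like *0..1)."""
    imports = defaultdict(list)
    for name in types:
        imports[name].append(name)         # zero-length path: the type itself
    for source, target in edges:
        if target not in imports[source]:  # emulate COLLECT(DISTINCT ...)
            imports[source].append(target)
    return [{"name": name, "imports": deps} for name, deps in imports.items()]


# Illustrative data only
types = ["model.Owner", "model.Person", "model.Pet"]
edges = [
    ("model.Owner", "model.Person"),
    ("model.Owner", "model.Pet"),
    ("model.Pet", "model.Owner"),
]
result = collect_imports(types, edges)
print(result)
```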


### Visualize dependencies

In [14]:
from IPython.display import HTML

with open("vis/template_hierarchical_edge_bundling_d3_inline.html") as html_template:
    html = html_template.read().replace("###JSON###", str(json_data))

    with open(f'output/source_code_file_dependencies.html', mode='w') as html_out:
        html_out.write(html)

HTML('<a href="output/source_code_file_dependencies.html" target="_blank">Source Code Files Dependencies</a>')

## Import lines of code information

### Generate dataset

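You can generate this data for various source code projects, e.g. via `cloc`. The paths in the dataset start with `./org/...`, which suggests that the command was run from within `src/main/java/`:

`cloc . --by-file --quiet --csv --out spring_petclinic_cloc.csv`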


### Show dataset

In [15]:
!head data/spring_petclinic_cloc.csv

language,filename,blank,comment,code,"github.com/AlDanial/cloc v 1.82  T=0.19 s (244.7 files/s, 16290.7 lines/s)"
Java,./org/springframework/samples/petclinic/repository/jdbc/JdbcOwnerRepositoryImpl.java,19,41,98
Java,./org/springframework/samples/petclinic/model/Owner.java,22,35,96
Java,./org/springframework/samples/petclinic/web/OwnerController.java,18,32,85
Java,./org/springframework/samples/petclinic/web/PetController.java,15,20,78
Java,./org/springframework/samples/petclinic/repository/jdbc/JdbcPetRepositoryImpl.java,15,26,75
Java,./org/springframework/samples/petclinic/service/ClinicServiceImpl.java,18,21,74
Java,./org/springframework/samples/petclinic/repository/jdbc/OneToManyResultSetExtractor.java,15,74,70
Java,./org/springframework/samples/petclinic/model/Pet.java,20,22,69
Java,./org/springframework/samples/petclinic/repository/jdbc/JdbcVisitRepositoryImpl.java,17,29,57


### Import data

In [16]:
# drop the last row (presumably cloc's trailing SUM row with the totals)
cloc = pd.read_csv("data/spring_petclinic_cloc.csv")[:-1].copy()
cloc.tail()

Unnamed: 0,language,filename,blank,comment,code,"github.com/AlDanial/cloc v 1.82 T=0.19 s (244.7 files/s, 16290.7 lines/s)"
42,Java,./org/springframework/samples/petclinic/reposi...,2,21,6,
43,Java,./org/springframework/samples/petclinic/web/pa...,1,3,1,
44,Java,./org/springframework/samples/petclinic/model/...,1,3,1,
45,Java,./org/springframework/samples/petclinic/reposi...,1,4,1,
46,Java,./org/springframework/samples/petclinic/reposi...,1,4,1,


### Normalize data
`cloc` delivers file paths, but we need a fully qualified name ("fqn") that matches the existing data.

In [17]:
cloc['fqn'] = cloc['filename'].str.replace("./", "", regex=False)\
                              .str.replace("/",".", regex=False)\
                              .str.replace(".java","", regex=False)
cloc.head()

Unnamed: 0,language,filename,blank,comment,code,"github.com/AlDanial/cloc v 1.82 T=0.19 s (244.7 files/s, 16290.7 lines/s)",fqn
0,Java,./org/springframework/samples/petclinic/reposi...,19,41,98,,org.springframework.samples.petclinic.reposito...
1,Java,./org/springframework/samples/petclinic/model/...,22,35,96,,org.springframework.samples.petclinic.model.Owner
2,Java,./org/springframework/samples/petclinic/web/Ow...,18,32,85,,org.springframework.samples.petclinic.web.Owne...
3,Java,./org/springframework/samples/petclinic/web/Pe...,15,20,78,,org.springframework.samples.petclinic.web.PetC...
4,Java,./org/springframework/samples/petclinic/reposi...,15,26,75,,org.springframework.samples.petclinic.reposito...
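
The chained `str.replace` calls above replace their pattern anywhere in the string. As a hedged alternative for a single path, the sketch below only trims the two ends; `path_to_fqn` is a hypothetical helper, and `removeprefix`/`removesuffix` assume Python 3.9 or newer.

```python
def path_to_fqn(path):
    """Turn a cloc path like './a/b/C.java' into 'a.b.C'."""
    # removeprefix/removesuffix only touch the start and the end of the string
    trimmed = path.removeprefix("./").removesuffix(".java")
    return trimmed.replace("/", ".")


fqn = path_to_fqn("./org/springframework/samples/petclinic/model/Owner.java")
print(fqn)
```

On a DataFrame, this could be applied via `cloc['filename'].map(path_to_fqn)`.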


### Generate dictionary

In [18]:
cloc_data = cloc.to_dict(orient='records')
cloc_data[:2]

[{'language': 'Java',
  'filename': './org/springframework/samples/petclinic/repository/jdbc/JdbcOwnerRepositoryImpl.java',
  'blank': 19,
  'comment': 41,
  'code': 98,
  'github.com/AlDanial/cloc v 1.82  T=0.19 s (244.7 files/s, 16290.7 lines/s)': nan,
  'fqn': 'org.springframework.samples.petclinic.repository.jdbc.JdbcOwnerRepositoryImpl'},
 {'language': 'Java',
  'filename': './org/springframework/samples/petclinic/model/Owner.java',
  'blank': 22,
  'comment': 35,
  'code': 96,
  'github.com/AlDanial/cloc v 1.82  T=0.19 s (244.7 files/s, 16290.7 lines/s)': nan,
  'fqn': 'org.springframework.samples.petclinic.model.Owner'}]

### Import into Neo4j

In [19]:
query="""
    UNWIND $data as loc
    MATCH (t:Type {fqn : loc.fqn})
    SET
        t.lines = loc.code,
        t.comments = loc.comment,
        t.blanks = loc.blank
    RETURN t.fqn, t.name, t.lines, t.comments, t.blanks
"""

session.run(query, data=cloc_data).to_df().head()

Unnamed: 0,t.fqn,t.name,t.lines,t.comments,t.blanks
0,org.springframework.samples.petclinic.reposito...,JdbcOwnerRepositoryImpl,98,41,19
1,org.springframework.samples.petclinic.model.Owner,Owner,96,35,22
2,org.springframework.samples.petclinic.web.Owne...,OwnerController,85,32,18
3,org.springframework.samples.petclinic.web.PetC...,PetController,78,20,15
4,org.springframework.samples.petclinic.reposito...,JdbcPetRepositoryImpl,75,26,15


## Import usage data

### Generate dataset

With coverage tools such as JaCoCo, you can get a glimpse of what happens while your application is in use.

See here for more details: https://www.feststelltaste.de/visualizing-production-coverage-with-jacoco-pandas-and-d3/

### Show dataset

In [20]:
!head data/spring_petclinic_production_coverage_data.csv

PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
org.springframework.samples.petclinic,PetclinicInitializer,0,24
org.springframework.samples.petclinic.model,NamedEntity,1,4
org.springframework.samples.petclinic.model,Specialty,0,1
org.springframework.samples.petclinic.model,PetType,0,1
org.springframework.samples.petclinic.model,Vets,4,0
org.springframework.samples.petclinic.model,Visit,0,12
org.springframework.samples.petclinic.model,BaseEntity,0,5
org.springframework.samples.petclinic.model,Person,0,7
org.springframework.samples.petclinic.model,Owner,14,26


### Import dataset

In [21]:
coverage = pd.read_csv("data/spring_petclinic_production_coverage_data.csv")
coverage.head()

Unnamed: 0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
0,org.springframework.samples.petclinic,PetclinicInitializer,0,24
1,org.springframework.samples.petclinic.model,NamedEntity,1,4
2,org.springframework.samples.petclinic.model,Specialty,0,1
3,org.springframework.samples.petclinic.model,PetType,0,1
4,org.springframework.samples.petclinic.model,Vets,4,0


### Enrich data
Calculate the ratio of executed lines of code per class (0.0 = never executed, 1.0 = fully executed).

In [22]:
coverage['lines'] = coverage.LINE_COVERED + coverage.LINE_MISSED
coverage['ratio'] = coverage.LINE_COVERED / coverage.lines
coverage.head()

Unnamed: 0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED,lines,ratio
0,org.springframework.samples.petclinic,PetclinicInitializer,0,24,24,1.0
1,org.springframework.samples.petclinic.model,NamedEntity,1,4,5,0.8
2,org.springframework.samples.petclinic.model,Specialty,0,1,1,1.0
3,org.springframework.samples.petclinic.model,PetType,0,1,1,1.0
4,org.springframework.samples.petclinic.model,Vets,4,0,4,0.0
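
For one class, the ratio is simply the covered lines divided by all measured lines. The sketch below spells this out in plain Python and adds a guard for classes without any measured lines, a case where the pandas division above would yield `NaN`. `coverage_ratio` is a hypothetical helper name.

```python
def coverage_ratio(line_covered, line_missed):
    """Share of executed lines, or None if no lines were measured."""
    lines = line_covered + line_missed
    if lines == 0:
        return None  # avoid dividing by zero for empty classes
    return line_covered / lines


# NamedEntity from the dataset above: 4 covered, 1 missed
named_entity_ratio = coverage_ratio(4, 1)
print(named_entity_ratio)
```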


### Normalize data

In [23]:
coverage['fqn'] = coverage["PACKAGE"] + "." + coverage["CLASS"]
coverage.head()

Unnamed: 0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED,lines,ratio,fqn
0,org.springframework.samples.petclinic,PetclinicInitializer,0,24,24,1.0,org.springframework.samples.petclinic.Petclini...
1,org.springframework.samples.petclinic.model,NamedEntity,1,4,5,0.8,org.springframework.samples.petclinic.model.Na...
2,org.springframework.samples.petclinic.model,Specialty,0,1,1,1.0,org.springframework.samples.petclinic.model.Sp...
3,org.springframework.samples.petclinic.model,PetType,0,1,1,1.0,org.springframework.samples.petclinic.model.Pe...
4,org.springframework.samples.petclinic.model,Vets,4,0,4,0.0,org.springframework.samples.petclinic.model.Vets


### Import data into Neo4j

In [24]:
query="""
    UNWIND $data as coverage
    MATCH (t:Type {fqn : coverage.fqn})
    MERGE (t)-[:HAS_MEASURE]->(m)
    SET 
        m:Measure:Coverage,
        m.ratio = coverage.ratio,
        m.lines = coverage.lines
    RETURN t.fqn as fqn, m.ratio as ratio, m.lines as lines
"""

session.run(query, data=coverage.to_dict(orient='records')).to_df().head()

Unnamed: 0,fqn,ratio,lines
0,org.springframework.samples.petclinic.model.Na...,0.8,5
1,org.springframework.samples.petclinic.model.Sp...,1.0,1
2,org.springframework.samples.petclinic.model.Pe...,1.0,1
3,org.springframework.samples.petclinic.model.Vets,0.0,4
4,org.springframework.samples.petclinic.model.Visit,1.0,12


# Check data

## Query Nodes

### List measures

In [25]:
query="""
   MATCH (n:Type)-[:HAS_MEASURE]->(m:Measure)
   RETURN n.fqn as fqn, n.lines as lines, m.ratio as ratio
"""

module_options = session.run(query).to_df()
module_options.head()

Unnamed: 0,fqn,lines,ratio
0,org.springframework.samples.petclinic.web.Visi...,35,0.8125
1,org.springframework.samples.petclinic.web.VetC...,40,0.3
2,org.springframework.samples.petclinic.util.Ent...,16,0.0
3,org.springframework.samples.petclinic.reposito...,16,0.0
4,org.springframework.samples.petclinic.reposito...,75,0.0


# Explore modularization options

## Explore existing modularization

### Extract existing main module structure

In [26]:
module_options['base_module'] = module_options['fqn'].str.split(".").str[4]
module_options.head()

Unnamed: 0,fqn,lines,ratio,base_module
0,org.springframework.samples.petclinic.web.Visi...,35,0.8125,web
1,org.springframework.samples.petclinic.web.VetC...,40,0.3,web
2,org.springframework.samples.petclinic.util.Ent...,16,0.0,util
3,org.springframework.samples.petclinic.reposito...,16,0.0,repository
4,org.springframework.samples.petclinic.reposito...,75,0.0,repository
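
Taking the fifth segment with `.str[4]` relies on the root package `org.springframework.samples.petclinic` having exactly four segments. A slightly more defensive sketch, assuming that root package and using the hypothetical names `ROOT_PACKAGE` and `base_module`, could look like this:

```python
ROOT_PACKAGE = "org.springframework.samples.petclinic"  # assumed root package


def base_module(fqn, root=ROOT_PACKAGE):
    """Return the first package segment below root, or None if there is none."""
    if not fqn.startswith(root + "."):
        return None
    remainder = fqn[len(root) + 1:].split(".")
    # a single remaining segment is a class that sits directly in the root package
    return remainder[0] if len(remainder) > 1 else None


print(base_module("org.springframework.samples.petclinic.web.VisitController"))
```

Classes directly in the root package, such as `PetclinicInitializer`, yield `None` here instead of their own class name.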


### Add base module information to graph

In [27]:
query="""
    UNWIND $data as module
    MATCH (t:Type {fqn : module.fqn})
    MERGE (m:Base:Module{name:module.base_module})
    MERGE (t)-[:BELONGS_TO]->(m)
    RETURN t.fqn as fqn, m.name as base_module
"""
session.run(query, data=module_options.to_dict(orient='records')).to_df().head()

Unnamed: 0,fqn,base_module
0,org.springframework.samples.petclinic.web.Visi...,web
1,org.springframework.samples.petclinic.web.VetC...,web
2,org.springframework.samples.petclinic.util.Ent...,util
3,org.springframework.samples.petclinic.reposito...,repository
4,org.springframework.samples.petclinic.reposito...,repository


### Add base module dependencies to graph

In [28]:
query = """
    MATCH (m1:Base:Module)<-[:BELONGS_TO]-(t1:Type)<-[:DEPENDS_ON]-(t2:Type)-[:BELONGS_TO]->(m2:Base:Module)
    WHERE m1 <> m2
    MERGE (m2)-[:USES]->(m1)
    RETURN DISTINCT(m2.name) as module, m1.name as dependent_module, COUNT(t2) as dependencies
"""
base_module_dependencies = session.run(query).to_df()
base_module_dependencies.head()

Unnamed: 0,module,dependent_module,dependencies
0,web,model,10
1,repository,model,18
2,repository,util,3
3,service,model,5


### Query for basic module statistics

In [29]:
query="""
    MATCH (t:Type)-[:BELONGS_TO]->(m:Base:Module)
    RETURN m.name as module_name, count(t) as classes
"""

session.run(query).to_df().head()

Unnamed: 0,module_name,classes
0,web,6
1,util,1
2,repository,12
3,model,9
4,service,1


### Generate JSON output for d3 visualization

In [30]:
json_data = base_module_dependencies.to_dict(orient='split')['data']
print(json.dumps(json_data, indent=4)[:200] + "\n...")

[
    [
        "web",
        "model",
        10
    ],
    [
        "repository",
        "model",
        18
    ],
    [
        "repository",
        "util",
        3
    ],
    [
        "ser
...


### Export data for visualization

In [31]:
with open("vis/template_chord_diagram_d3_inline.html") as html_template:
    html = html_template.read().replace("###JSON###", str(json_data))

    with open(f'output/chord_diagram_base_module.html', mode='w') as html_out:
        html_out.write(html)

HTML('<a href="output/chord_diagram_base_module.html" target="_blank">Open Chord Diagram for Base Modules</a>')
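
The cell above injects `str(json_data)` into the template. For a Python list, `str()` produces a Python literal (single quotes, `None`, `nan`), which is not guaranteed to be valid JSON. A minimal sketch of a more robust substitution, assuming only that the template contains the `###JSON###` placeholder; `render_template` and the inline template string are hypothetical:

```python
import json


def render_template(template, data, placeholder="###JSON###"):
    """Replace the placeholder with a JSON serialization of data."""
    # strings are assumed to be serialized JSON already (e.g. from to_json)
    payload = data if isinstance(data, str) else json.dumps(data)
    return template.replace(placeholder, payload)


# Hypothetical template and data, for illustration only
template = "<script>var matrix = ###JSON###;</script>"
html = render_template(template, [["web", "model", 10], ["util", "model", None]])
print(html)
```

`json.dumps` turns `None` into `null` and always uses double quotes, so the result can be parsed by JavaScript regardless of the values in the list.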

## Explore alternative modularization options

In [32]:
module_options.head()

Unnamed: 0,fqn,lines,ratio,base_module
0,org.springframework.samples.petclinic.web.Visi...,35,0.8125,web
1,org.springframework.samples.petclinic.web.VetC...,40,0.3,web
2,org.springframework.samples.petclinic.util.Ent...,16,0.0,util
3,org.springframework.samples.petclinic.reposito...,16,0.0,repository
4,org.springframework.samples.petclinic.reposito...,75,0.0,repository


### Extract domain based modules
*(here we use a very simple heuristic: we look for domain-related names that are part of the class names)*

In [33]:
domain_parts = ["Owner", "Pet", "Visit", "Vet", "Specialty", "Clinic"]

for domain_part in domain_parts:
    module_options.loc[module_options['fqn'].str.contains(domain_part), 'domain_part'] = domain_part

module_options.head()

Unnamed: 0,fqn,lines,ratio,base_module,domain_part
0,org.springframework.samples.petclinic.web.Visi...,35,0.8125,web,Visit
1,org.springframework.samples.petclinic.web.VetC...,40,0.3,web,Vet
2,org.springframework.samples.petclinic.util.Ent...,16,0.0,util,
3,org.springframework.samples.petclinic.reposito...,16,0.0,repository,Visit
4,org.springframework.samples.petclinic.reposito...,75,0.0,repository,Pet


### Come up with an alternative structure

In [34]:
domain_part_mapping = {
    "Visit" : "Checkup",
    "Pet" : "Patient",
    "Owner" : "Patient",
    "Vet" : "Doctor",
    "Specialty" : "Doctor"
} 
    
module_options['domain'] = module_options['domain_part'].map(domain_part_mapping).fillna("Framework")
module_options.head()

Unnamed: 0,fqn,lines,ratio,base_module,domain_part,domain
0,org.springframework.samples.petclinic.web.Visi...,35,0.8125,web,Visit,Checkup
1,org.springframework.samples.petclinic.web.VetC...,40,0.3,web,Vet,Doctor
2,org.springframework.samples.petclinic.util.Ent...,16,0.0,util,,Framework
3,org.springframework.samples.petclinic.reposito...,16,0.0,repository,Visit,Checkup
4,org.springframework.samples.petclinic.reposito...,75,0.0,repository,Pet,Patient
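
Taken together, the two cells above assign a domain in two steps: the last matching entry of `domain_parts` wins, and anything without a mapping falls back to `"Framework"`. The following plain-Python sketch reproduces that logic for a single name; `domain_of` is a hypothetical helper, and the lists are copied from the cells above.

```python
DOMAIN_PARTS = ["Owner", "Pet", "Visit", "Vet", "Specialty", "Clinic"]
DOMAIN_PART_MAPPING = {
    "Visit": "Checkup",
    "Pet": "Patient",
    "Owner": "Patient",
    "Vet": "Doctor",
    "Specialty": "Doctor",
}


def domain_of(fqn):
    """Assign a domain module based on substrings of the qualified name."""
    domain_part = None
    for part in DOMAIN_PARTS:
        if part in fqn:          # case-sensitive, like str.contains
            domain_part = part   # later entries override earlier ones
    # unmapped parts (e.g. "Clinic") and non-matches end up in "Framework"
    return DOMAIN_PART_MAPPING.get(domain_part, "Framework")


print(domain_of("org.springframework.samples.petclinic.web.VisitController"))
```

Because `"Clinic"` has no entry in the mapping, a class such as `ClinicServiceImpl` ends up in the `Framework` module.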


### Add alternative modules to graph

In [35]:
query="""
    UNWIND $data as module
    MATCH (t:Type {fqn : module.fqn})
    MERGE (m:Domain:Module{name:module.domain})
    MERGE (t)-[:BELONGS_TO]->(m)
    RETURN t.fqn as fqn, m.name
"""

session.run(query, data=module_options.to_dict(orient='records')).to_df().head()

Unnamed: 0,fqn,m.name
0,org.springframework.samples.petclinic.web.Visi...,Checkup
1,org.springframework.samples.petclinic.web.VetC...,Doctor
2,org.springframework.samples.petclinic.util.Ent...,Framework
3,org.springframework.samples.petclinic.reposito...,Checkup
4,org.springframework.samples.petclinic.reposito...,Patient


### Add domain module dependencies to graph

In [36]:
query = """
    MATCH (m1:Domain:Module)<-[:BELONGS_TO]-(t1:Type)<-[:DEPENDS_ON]-(t2:Type)-[:BELONGS_TO]->(m2:Domain:Module)
    WHERE m1 <> m2
    MERGE (m2)-[:USES]->(m1)
    RETURN DISTINCT(m2.name) as module, m1.name as dependent_module, COUNT(t2) as dependencies, SUM(t2.lines) as lines
"""
domain_module_dependencies = session.run(query).to_df()
domain_module_dependencies.head()

Unnamed: 0,module,dependent_module,dependencies,lines
0,Checkup,Patient,7,303
1,Doctor,Framework,3,99
2,Framework,Patient,3,222
3,Framework,Doctor,1,74
4,Framework,Checkup,1,74


### Visualize alternative modularization

In [37]:
json_data = domain_module_dependencies.to_dict(orient='split')['data']

with open("vis/template_chord_diagram_d3_inline.html") as html_template:
    html = html_template.read().replace("###JSON###", str(json_data))

    with open(f'output/chord_diagram_domain_module.html', mode='w') as html_out:
        html_out.write(html)

HTML('<a href="output/chord_diagram_domain_module.html" target="_blank">Open Chord Diagram for Domain Modules</a>')

### Export domain module dependencies (including dependencies within a module) as JSON

In [38]:
query = """
    MATCH (m1:Domain:Module)<-[:BELONGS_TO]-(t1:Type)<-[:DEPENDS_ON]-(t2:Type)-[:BELONGS_TO]->(m2:Domain:Module)
    RETURN DISTINCT(m2.name) as module, m1.name as dependent_module, COUNT(t2) as dependencies
"""
domain_module_dependencies = session.run(query).to_df()
json_data = domain_module_dependencies.to_dict(orient='split')['data']
with open("output/chord-diagram.json", mode='w') as json_file:
    json_file.write(json.dumps(json_data, indent=3))
json_data

[['Checkup', 'Patient', 7],
 ['Checkup', 'Checkup', 7],
 ['Doctor', 'Doctor', 6],
 ['Doctor', 'Framework', 3],
 ['Framework', 'Patient', 3],
 ['Framework', 'Doctor', 1],
 ['Framework', 'Checkup', 1],
 ['Patient', 'Patient', 23],
 ['Patient', 'Framework', 5],
 ['Patient', 'Checkup', 2]]

### Prepare results for dependency analysis

In [39]:
query="""
MATCH (m:Domain:Module)-[:USES]->(m_dep:Domain:Module)
RETURN m.name as name, COLLECT(DISTINCT m_dep.name) as imports
"""

json_data = session.run(query).to_df().to_json(orient="records")
print(json_data[:200])

[{"name":"Checkup","imports":["Patient"]},{"name":"Doctor","imports":["Framework"]},{"name":"Framework","imports":["Patient","Checkup","Doctor"]},{"name":"Patient","imports":["Framework","Checkup"]}]


### Create visualization based on data

In [40]:
with open("vis/template_hierarchical_edge_bundling_d3_inline.html") as html_template:
    html = html_template.read().replace("###JSON###", str(json_data))

    with open(f'output/domain_modules_dependencies.html', mode='w') as html_out:
        html_out.write(html)

HTML('<a href="output/domain_modules_dependencies.html" target="_blank">Domain Modules Dependencies</a>')

## Analyze weird dependencies from Framework to other modules

### List all classes in the Framework module

In [41]:
query = """
    MATCH (m1:Domain:Module {name:"Framework"})<-[:BELONGS_TO]-(t1:Type)
    RETURN t1.name as FrameworkType
"""
session.run(query).to_df()

Unnamed: 0,FrameworkType
0,ClinicServiceImpl
1,Person
2,EntityUtils
3,NamedEntity


### List dependencies from Framework to domain modules

In [42]:
query = """
    MATCH (m1:Domain:Module {name:"Framework"})<-[:BELONGS_TO]-(t1:Type)-[:DEPENDS_ON]->(t2:Type)-[:BELONGS_TO]->(m2:Domain:Module)
    RETURN t1.name as FrameworkType, t2.name as DomainType, m2.name as DomainModule
"""
session.run(query).to_df()

Unnamed: 0,FrameworkType,DomainType,DomainModule
0,ClinicServiceImpl,Pet,Patient
1,ClinicServiceImpl,PetType,Patient
2,ClinicServiceImpl,Vet,Doctor
3,ClinicServiceImpl,Visit,Checkup
4,ClinicServiceImpl,Owner,Patient
