In [None]:
neo4j_url=%env NEO4J_URL

%reload_ext cypher
%config CypherMagic.uri=neo4j_url + "/db/data"

# Identifying hot spots by change frequency with Software Analytics

## Question

<center>What parts of the code are changed the most?</center>

## Data Sources

<center><i>Which data can possibly answer our question? What information do we need?</i></center>

* Java structures of the CWA-Server scanned by jQAssistant and available in Neo4j
* Git history of the CWA-Server scanned by jQAssistant and available in Neo4j

## Heuristics

<center><i>Which assumptions do we want to make to simplify the answer to our question?</i></center>

* Merge commits are excluded from the analysis
* Each commit that touches a file counts as one change, regardless of how many lines were changed
* Only Java classes are considered in the analysis
* Test classes are excluded from this analysis
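
To make the counting rule concrete, here is a minimal sketch of the heuristics applied to a toy change list in plain Python. The data, the `count_changes` helper, and the use of a `src/test/` path convention to spot test classes are assumptions made for illustration only; in this notebook the actual filtering happens in the Cypher queries against the jQAssistant graph.

```python
from collections import Counter

# Hypothetical toy history: (commit id, is merge commit, changed file path)
changes = [
    ("c1", False, "src/main/java/app/Foo.java"),
    ("c1", False, "src/main/java/app/Bar.java"),
    ("c2", False, "src/main/java/app/Foo.java"),
    ("c3", True, "src/main/java/app/Foo.java"),      # merge commit: ignored
    ("c4", False, "src/test/java/app/FooTest.java"),  # test class: ignored
    ("c5", False, "README.md"),                       # not Java: ignored
]


def count_changes(changes):
    """Count one change per (commit, file) pair that passes all heuristics."""
    seen = set()
    counts = Counter()
    for commit, is_merge, path in changes:
        if is_merge:
            continue  # heuristic: merge commits are excluded
        if not path.endswith(".java"):
            continue  # heuristic: only Java classes are considered
        if path.startswith("src/test/") or "/src/test/" in path:
            continue  # heuristic: test classes are excluded (assumed path layout)
        if (commit, path) in seen:
            continue  # heuristic: a commit counts once per file
        seen.add((commit, path))
        counts[path] += 1
    return counts


change_counts = count_changes(changes)
for path, number_of_changes in change_counts.most_common():
    print(path, number_of_changes)
```

Only `Foo.java` and `Bar.java` survive the filters here; the size of each change plays no role in the count.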

## Validation

<center><i>What results do we expect from our analysis, how are they reviewed and presented in an understandable way?</i></center>

* Tabular overview of the Top-20 most changed classes
* Graphical overview of the CWA-Server showing the number of changes per class

## Implementation

<center><i>How can we implement the analysis step by step and in a comprehensible way?</i></center>

* Query that counts the number of commits per Java type
    * to be stored in a variable (commitsPerType) for later visualization
* Query that creates a hierarchy of packages and types including the number of commits per type
    * to be stored in a variable (packageTree) for later visualization
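
The hierarchy in the second step boils down to a flat table in which every element names its parent. The sketch below illustrates that shape in plain Python. It assumes, purely for illustration, that the parent can be derived from the fully qualified name; the notebook itself reads the parent from the `CONTAINS` relationships in Neo4j. The helpers `parent_of` and `build_rows` and the sample data are hypothetical.

```python
def parent_of(fqn):
    """Return the enclosing package of a fully qualified name, or '' at the root."""
    return fqn.rpartition(".")[0]


def build_rows(commits_per_type):
    """Expand {type fqn: commits} into Element/Parent/Size rows for types and packages."""
    rows = {}
    for fqn, commits in commits_per_type.items():
        rows[fqn] = {"Element": fqn, "Parent": parent_of(fqn), "Size": commits}
        # Walk up the package chain so that every parent also appears as an element.
        package = parent_of(fqn)
        while package and package not in rows:
            # Packages carry no commits of their own, so their size is 0.
            rows[package] = {"Element": package, "Parent": parent_of(package), "Size": 0}
            package = parent_of(package)
    return list(rows.values())


# Hypothetical commit counts per type
rows = build_rows({"app.service.Foo": 5, "app.service.Bar": 2, "app.Main": 1})
for row in rows:
    print(row)
```

A list like this could be turned into a data frame with `pd.DataFrame(rows)` and should then fit the `names`, `parents` and `values` arguments of `px.treemap`.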

In [None]:
commitsPerType = %cypher MATCH (:Main:Artifact)-[:CONTAINS]->(t:Type:Java), \
                               (t)-[:HAS_SOURCE]->(f:Git:File), \
                               (c:Git:Commit)-[:CONTAINS_CHANGE]->()-[:MODIFIES]->(f) \
                         WHERE NOT ()-[:DECLARES]->(t) \
                         RETURN t.fqn AS Type, t.name AS SimpleName, count(c) AS Commits \
                         ORDER BY Commits DESC \
                         LIMIT 20

In [None]:
packageTree = %cypher MATCH (:Main:Artifact)-[:CONTAINS]->(e:Java) \
                      WHERE (e:Type OR e:Package) AND e.fqn STARTS WITH "app.coronawarn.server" \
                            AND NOT ()-[:DECLARES]->(e) \
                      OPTIONAL MATCH (e)-[:HAS_SOURCE]->(source:Git:File), \
                                     (c:Git:Commit)-[:CONTAINS_CHANGE]->()-[:MODIFIES]->(source) \
                      WITH e, count(c) AS commits \
                      OPTIONAL MATCH (e)-[:DECLARES]->(method:Method) \
                      WITH e, commits, sum(method.effectiveLineCount) AS size \
                      OPTIONAL MATCH (parent:Package)-[:CONTAINS]->(e) \
                      WITH e, parent, commits, size \
                      RETURN DISTINCT e.fqn AS Element, parent.fqn AS Parent, commits AS Size, commits AS Color

## Results

In [None]:
import pandas as pd 
import plotly.express as px

### Most Changed Java Classes

In [None]:
commitsPerType

### Pie Chart: Most Changed Java Classes

In [None]:
df = commitsPerType.get_dataframe()
fig = px.pie(df, values='Commits', names='SimpleName', title='Top-20 Commits Per Type')
fig.show()

### Tree Map: Most Changed Java Classes

In [None]:
df = packageTree.get_dataframe()
fig = px.treemap(df, names='Element', parents='Parent', values='Size', color='Color')
fig.show()

## Next Steps

* Detailed analysis of the identified classes, to answer e.g.:
    * Do the classes violate the Separation-of-Concerns principle?
    * Are the classes adapters for external systems?
    * Do the classes contain complex, error-prone logic?
* Deriving actions to mitigate the risks posed by these classes, e.g.:
    * Splitting up the classes
    * Refactoring the logic to be less error-prone