# Code Hotspot Analysis
Step-by-step tutorial


## Raw data via the command line

### Lines of Code ("Complexity")

Variant using `cloc`

In [1]:
%%bash
cd spring-framework-petclinic/
cloc . --by-file --quiet --csv --out ../output/cloc_lines.csv
head ../output/cloc_lines.csv

language,filename,blank,comment,code,"github.com/AlDanial/cloc v 1.98  T=1.63 s (67.0 files/s 10622.5 lines/s)"
SVG,./src/main/webapp/resources/fonts/varela_round-webfont.svg,0,0,7875
SVG,./src/main/webapp/resources/fonts/montserrat-webfont.svg,0,0,1283
Maven,./pom.xml,41,32,483
XSD,./src/main/resources/cache/ehcache.xsd,13,1,405
XML,./src/test/jmeter/petclinic_test_plan.jmx,0,0,401
Bourne Shell,./mvnw,33,62,215
LESS,./src/main/webapp/resources/less/petclinic.less,44,13,182
Text,./LICENSE.txt,32,0,169
Markdown,./readme.md,73,0,156


### Change frequencies ("Hotness")

Changes per file with `git log`

In [2]:
%%bash
cd spring-framework-petclinic/
# quote the pathspec so that git, not the shell, expands the pattern
git log --name-only --no-merges --format="" -- '*.java' > ../output/changes.txt
head ../output/changes.txt

src/test/java/org/springframework/samples/petclinic/web/VetControllerTests.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcVisitRowMapper.java
src/main/java/org/springframework/samples/petclinic/util/EntityUtils.java
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcPetRepositoryImpl.java
src/main/java/org/springframework/samples/petclinic/repository/OwnerRepository.java
src/main/java/org/springframework/samples/petclinic/repository/PetRepository.java
src/main/java/org/springframework/samples/petclinic/repository/VetRepository.java


## Data wrangling with data science tools

### Lines of Code

Read the "Lines of Code" data

In [3]:
!head -n 5 output/cloc_lines.csv

language,filename,blank,comment,code,"github.com/AlDanial/cloc v 1.98  T=1.63 s (67.0 files/s 10622.5 lines/s)"
SVG,./src/main/webapp/resources/fonts/varela_round-webfont.svg,0,0,7875
SVG,./src/main/webapp/resources/fonts/montserrat-webfont.svg,0,0,1283
Maven,./pom.xml,41,32,483
XSD,./src/main/resources/cache/ehcache.xsd,13,1,405


In theory: read the file with the line counts directly

In [4]:
import pandas as pd
pd.read_csv("output/cloc_lines.csv").head(3)

Unnamed: 0,language,filename,blank,comment,code,github.com/AlDanial/cloc v 1.98 T=1.63 s (67.0 files/s 10622.5 lines/s)
0,SVG,./src/main/webapp/resources/fonts/varela_round...,0,0,7875,
1,SVG,./src/main/webapp/resources/fonts/montserrat-w...,0,0,1283,
2,Maven,./pom.xml,41,32,483,


In practice: read it with a few small adjustments

In [5]:
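# index_col=1: use the filename as index; [:-1]: drop the last row
# (presumably cloc's summary line); [['code']]: keep only the code column
lines = pd.read_csv("output/cloc_lines.csv", index_col=1)[:-1][['code']]
# remove the leading "./" so the paths match those reported by git log
lines.index = lines.index.str[2:]
lines.head()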

Unnamed: 0_level_0,code
filename,Unnamed: 1_level_1
src/main/webapp/resources/fonts/varela_round-webfont.svg,7875
src/main/webapp/resources/fonts/montserrat-webfont.svg,1283
pom.xml,483
src/main/resources/cache/ehcache.xsd,405
src/test/jmeter/petclinic_test_plan.jmx,401
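
The adjustments above can be tried on a small inline sample. This is a minimal sketch, assuming that the cloc CSV ends with a `SUM` summary row (which is what `[:-1]` would remove) and that every filename starts with `./`. The sample rows and the shortened header text are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature of output/cloc_lines.csv (values are illustrative).
sample_csv = io.StringIO(
    'language,filename,blank,comment,code,"cloc header text"\n'
    "Maven,./pom.xml,41,32,483\n"
    "Java,./src/main/java/Foo.java,10,5,98\n"
    "SUM,,51,37,581\n"
)

# index_col=1 makes the filename column the index;
# [:-1] drops the trailing SUM row; [["code"]] keeps only the code column.
sample_lines = pd.read_csv(sample_csv, index_col=1)[:-1][["code"]]

# Strip the leading "./" so the paths match the ones reported by git.
sample_lines.index = sample_lines.index.str[2:]
print(sample_lines)
```

The extra header column produced by cloc's version banner simply ends up as an empty column, which is why selecting only `code` is enough to get rid of it.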


### Change data

A look at the raw data

In [6]:
!head -n 5 output/changes.txt

src/test/java/org/springframework/samples/petclinic/web/VetControllerTests.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/web/VetController.java
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcVisitRowMapper.java


Read the file that lists every changed file (plus a column name)

In [7]:
change_per_file = pd.read_csv("output/changes.txt", names=['filepath'])
change_per_file.head()

Unnamed: 0,filepath
0,src/test/java/org/springframework/samples/petc...
1,src/main/java/org/springframework/samples/petc...
2,src/main/java/org/springframework/samples/petc...
3,src/main/java/org/springframework/samples/petc...
4,src/main/java/org/springframework/samples/petc...


Still to do: count how often each file was changed

In [8]:
changes = pd.DataFrame(change_per_file['filepath'].value_counts())
changes.columns = ["changes"]
changes.head()

Unnamed: 0_level_0,changes
filepath,Unnamed: 1_level_1
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcOwnerRepositoryImpl.java,26
src/test/java/org/springframework/samples/petclinic/service/AbstractClinicServiceTests.java,20
src/main/java/org/springframework/samples/petclinic/web/OwnerController.java,20
src/main/java/org/springframework/samples/petclinic/web/PetController.java,19
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcVetRepositoryImpl.java,19
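
The counting step can be checked on a small inline list. The sketch below uses made-up file paths and names the resulting column in one step via `rename(...).to_frame()`; that is an alternative spelling for illustration, not the notebook's own code.

```python
import pandas as pd

# Hypothetical change log: one row per (commit, file) pair,
# in the shape produced by `git log --name-only`.
sample_changes = pd.DataFrame(
    {"filepath": ["a/Foo.java", "a/Bar.java", "a/Foo.java", "a/Foo.java"]}
)

# value_counts() counts identical paths and sorts by frequency, most changed first.
sample_counts = sample_changes["filepath"].value_counts().rename("changes").to_frame()
print(sample_counts)
```

Because every commit that touches a file contributes exactly one line, the count per path is the number of commits that changed that file.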


Join the separate data sets

In [9]:
hotspots = changes.join(lines).dropna()
hotspots.head()

Unnamed: 0_level_0,changes,code
filepath,Unnamed: 1_level_1,Unnamed: 2_level_1
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcOwnerRepositoryImpl.java,26,98.0
src/test/java/org/springframework/samples/petclinic/service/AbstractClinicServiceTests.java,20,135.0
src/main/java/org/springframework/samples/petclinic/web/OwnerController.java,20,85.0
src/main/java/org/springframework/samples/petclinic/web/PetController.java,19,78.0
src/main/java/org/springframework/samples/petclinic/repository/jdbc/JdbcVetRepositoryImpl.java,19,48.0
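
What `join` and `dropna` do here can be seen on two tiny frames. This is a sketch with invented paths and numbers; the assumption is that some files in the git history have no line count, for instance because they no longer exist in the working tree.

```python
import pandas as pd

# Hypothetical inputs, both indexed by file path (values are illustrative).
sample_changes = pd.DataFrame({"changes": [3, 1]}, index=["a/Foo.java", "a/Gone.java"])
sample_lines = pd.DataFrame({"code": [98, 483]}, index=["a/Foo.java", "pom.xml"])

# join() is a left join on the index: every row of sample_changes is kept.
joined = sample_changes.join(sample_lines)

# a/Gone.java has no line count, so its code value is NaN; dropna() removes such rows.
# The intermediate NaN also explains why code is displayed as a float (98.0).
sample_hotspots = joined.dropna()
print(sample_hotspots)
```

Files that appear only in the line counts, such as `pom.xml` in this sample, never enter the result because the join starts from the change data.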


## Visualizing the data

Where do we get a visualization from?
[https://observablehq.com/@d3/gallery](https://observablehq.com/@d3/gallery)

We need the following data format and a D3 visualization template for the Circle Pack Hierarchy Chart.

- Complete file: <code><a href="demo/spring_petclinic_demo_flare.json" target="_blank">demo/spring_petclinic_demo_flare.json</a></code>
- Template: <code><a href="vis/template_circle_pack_hierarchy_chart_d3_inline.html" target="_blank">vis/template_circle_pack_hierarchy_chart_d3_inline.html</a></code> (view the page source to inspect it)

<small><pre>{
    'name': 'flare',
    'children': [{
            'name': 'src',
            'children': [{
                    'name': 'main',
                    'children': [{
                            'name': 'java',
                            'children': [{
                                    'name': 'org',
                                    'children': [{
                                            'name': 'springframework',
                                            'children': [{
                                                    'name': 'samples',
                                                    'children': [{
                                                            'name': 'petclinic',
                                                            'children': [{
                                                                    'name': 'repository',
                                                                    'children': [{
                                                                            'name': 'jdbc',
                                                                            'children': [{
                                                                                    'name': 'JdbcOwnerRepositoryImpl.java (158.0 [27])',
                                                                                    'size': 158.0,
                                                                                    'color': '#b40426'
                                                                                }, {
                                                                                    'name': 'JdbcVetRepositoryImpl.java (88.0 [20])',
                                                                                    'size': 88.0,
                                                                                    'color': '#f59d7e'
                                                                                }, {
                                                                                    'name': 'JdbcVisitRepositoryImpl.java (103.0 [19])',
                                                                                    'size': 103.0,
                                                                                    'color': '#f7aa8c'
                                                                                },
...
</pre></small>
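
To get a feel for this format: inner nodes carry `children`, leaves carry `size` and `color`. The sketch below walks such a structure and sums the leaf sizes per node, which is roughly the quantity a circle packing layout needs to size the enclosing circles. The helper `total_size` is hypothetical and the sample is a shortened copy of the excerpt above.

```python
def total_size(node):
    """Sum the 'size' of all leaves below a node (inner nodes have 'children')."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


# Shortened sample in the same shape as the excerpt above.
sample = {
    "name": "flare",
    "children": [{
        "name": "jdbc",
        "children": [
            {"name": "JdbcOwnerRepositoryImpl.java (158.0 [27])", "size": 158.0, "color": "#b40426"},
            {"name": "JdbcVetRepositoryImpl.java (88.0 [20])", "size": 88.0, "color": "#f59d7e"},
        ],
    }],
}

print(total_size(sample))
```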

Prepare the data for the visualization: convert it into the D3 data format (JSON), feed the HTML template for the Circle Pack Hierarchy Chart with the data, and generate the output file (no, we are not going to write this code ourselves today...).

In [10]:
from matplotlib import cm
from matplotlib.colors import rgb2hex
import json
from IPython.display import HTML


def create_plot_data(df, color_column_name, size_column_name, separator):
    """Derive color, size, and path columns for each row of df (indexed by path)."""
    plot_data = pd.DataFrame(index=df.index)
    plot_data['value_for_color'] = df[color_column_name]
    # scale to 0..1 relative to the maximum, then map onto the coolwarm colormap
    plot_data['ratio_for_color'] = plot_data['value_for_color'] / plot_data['value_for_color'].max()
    plot_data['color'] = plot_data['ratio_for_color'].apply(lambda x : rgb2hex(cm.coolwarm(x)))
    plot_data['size'] = df[size_column_name]
    # split the index at the last separator into parent path and leaf name
    plot_data[['path', 'name']] = df.index.str.rsplit(separator, n=1).to_list()
    plot_data['path_list'] = plot_data['path'].str.split(separator)
    return plot_data

def create_flare_json(df):

    json_data = {'name': 'flare', 'children': []}

    for _, series in df.iterrows():
        hierarchical_data = series['path_list']

        children = json_data['children']
        for part in hierarchical_data:
            entry = next((child for child in children if child.get('name', '') == part), None)
            if not entry:
                entry = {'name': part, 'children': []}
                children.append(entry)
            children = entry['children']

        children.append({
            'name': f"{series['name']} ({series['size']} [{series['value_for_color']}])",
            'size': series['size'],
            'color': series['color']
        })

    return json_data


def create_file(hotspots, color_column_name, size_column_name, separator, suffix=""):
    """Fill the HTML template with the data and write output/code_hotspots{suffix}.html."""
    json_data = create_flare_json(create_plot_data(hotspots, color_column_name, size_column_name, separator))
            
    with open("vis/template_circle_pack_hierarchy_chart_d3_inline.html") as html_template:
        html = html_template.read().replace("###JSON###", str(json_data))
        
        with open(f'output/code_hotspots{suffix}.html', mode='w') as html_out:
            html_out.write(html)
    
    return HTML(f'<a href="output/code_hotspots{suffix}.html" target="_blank">Hotspots via Circle Pack Hierarchy Chart {suffix}</a>')
            
create_file(hotspots, "changes", "code", "/")
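
`create_file` pastes `str(json_data)` into the template, which works as long as Python's text representation of the data happens to be valid JavaScript. A possible alternative, assuming the template only needs a JavaScript object literal at `###JSON###`, is `json.dumps`. The template string and the data below are made-up stand-ins, not the real files.

```python
import json

# Illustrative data with two values that Python and JavaScript write differently.
data = {"name": "flare", "children": [{"name": "O'Brien.java", "size": 3, "color": None}]}

# Stand-in for the real HTML template; only the placeholder matters here.
prefix, suffix = "<script>const data = ", ";</script>"
template = prefix + "###JSON###" + suffix

as_repr = template.replace("###JSON###", str(data))
as_json = template.replace("###JSON###", json.dumps(data))

print(as_repr)  # contains Python's None, which JavaScript does not know
print(as_json)  # contains null and consistently double-quoted strings
```

With the data used in this tutorial (strings and numbers only) both variants should yield a usable page; the difference only shows up for values such as `None` or booleans.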

## Challenge
The file `data/spring_petclinic_production_coverage_data.csv` contains code coverage measurements that were recorded during a representative usage of the application. Visualize this data with the Circle Pack Hierarchy Chart.

Read the data with pandas

In [11]:
coverage = pd.read_csv("data/spring_petclinic_production_coverage_data.csv")
coverage.head()

Unnamed: 0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
0,org.springframework.samples.petclinic,PetclinicInitializer,0,24
1,org.springframework.samples.petclinic.model,NamedEntity,1,4
2,org.springframework.samples.petclinic.model,Specialty,0,1
3,org.springframework.samples.petclinic.model,PetType,0,1
4,org.springframework.samples.petclinic.model,Vets,4,0


Compute `ratio`, i.e. the fraction of the code lines that were executed (a value between 0 and 1).

In [12]:
coverage['size'] = coverage['LINE_MISSED'] + coverage['LINE_COVERED']
coverage['ratio'] = coverage['LINE_COVERED'] / coverage['size']
coverage.head()

Unnamed: 0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED,size,ratio
0,org.springframework.samples.petclinic,PetclinicInitializer,0,24,24,1.0
1,org.springframework.samples.petclinic.model,NamedEntity,1,4,5,0.8
2,org.springframework.samples.petclinic.model,Specialty,0,1,1,1.0
3,org.springframework.samples.petclinic.model,PetType,0,1,1,1.0
4,org.springframework.samples.petclinic.model,Vets,4,0,4,0.0
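
The same computation on a small inline frame, as a sketch. The row named `Empty` is a hypothetical class without any measured lines, added to show that `0 / 0` yields `NaN` in pandas; treating such classes as not covered via `fillna(0.0)` is an assumption, not something the data set requires.

```python
import pandas as pd

# Two rows taken from the table above plus one hypothetical empty class.
sample_cov = pd.DataFrame({
    "CLASS": ["NamedEntity", "Vets", "Empty"],
    "LINE_MISSED": [1, 4, 0],
    "LINE_COVERED": [4, 0, 0],
})

sample_cov["size"] = sample_cov["LINE_MISSED"] + sample_cov["LINE_COVERED"]
# 0 / 0 yields NaN in pandas; here such classes are counted as not covered.
sample_cov["ratio"] = (sample_cov["LINE_COVERED"] / sample_cov["size"]).fillna(0.0)
print(sample_cov)
```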


Create a new column with the complete path to each source file (here: the fully qualified class name) and make it the index with `set_index`.

In [13]:
coverage['fqn'] = coverage['PACKAGE'] + "." + coverage['CLASS']
coverage = coverage.set_index(coverage['fqn'])
coverage.head()

Unnamed: 0_level_0,PACKAGE,CLASS,LINE_MISSED,LINE_COVERED,size,ratio,fqn
fqn,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
org.springframework.samples.petclinic.PetclinicInitializer,org.springframework.samples.petclinic,PetclinicInitializer,0,24,24,1.0,org.springframework.samples.petclinic.Petclini...
org.springframework.samples.petclinic.model.NamedEntity,org.springframework.samples.petclinic.model,NamedEntity,1,4,5,0.8,org.springframework.samples.petclinic.model.Na...
org.springframework.samples.petclinic.model.Specialty,org.springframework.samples.petclinic.model,Specialty,0,1,1,1.0,org.springframework.samples.petclinic.model.Sp...
org.springframework.samples.petclinic.model.PetType,org.springframework.samples.petclinic.model,PetType,0,1,1,1.0,org.springframework.samples.petclinic.model.Pe...
org.springframework.samples.petclinic.model.Vets,org.springframework.samples.petclinic.model,Vets,4,0,4,0.0,org.springframework.samples.petclinic.model.Vets
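
With this index, the dot plays the role that the slash plays in file paths. A short sketch of how such names split at the last separator, using two names from the table above:

```python
import pandas as pd

sample_fqn = pd.Index([
    "org.springframework.samples.petclinic.model.Vets",
    "org.springframework.samples.petclinic.PetclinicInitializer",
])

# rsplit with n=1 cuts only at the last separator: [package, class name].
package_and_class = sample_fqn.str.rsplit(".", n=1).to_list()
print(package_and_class)
```

The package part can then be split further at every dot to obtain the nesting levels of the chart.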


Use the `create_file` function from above to generate a visualization (hint: pass `suffix` as the last argument to write a file with a different name).

In [14]:
create_file(coverage, "ratio", "size", ".", "_coverage")