# Cheatsheet: Code Hotspot Analysis
<strong>Step-by-Step Demo</strong>

### Raw Data from the Command Line

#### Lines of Code
Simple variant with `find`

In [1]:
%%bash
cd spring-framework-petclinic/
find . -name "*.java" | xargs wc -l | head -n-1 > ../file_sizes.txt
head ../file_sizes.txt

   117 ./.mvn/wrapper/MavenWrapperDownloader.java
    47 ./src/main/java/org/springframework/samples/petclinic/model/BaseEntity.java
    48 ./src/main/java/org/springframework/samples/petclinic/model/NamedEntity.java
   153 ./src/main/java/org/springframework/samples/petclinic/model/Owner.java
     5 ./src/main/java/org/springframework/samples/petclinic/model/package-info.java
    55 ./src/main/java/org/springframework/samples/petclinic/model/Person.java
   111 ./src/main/java/org/springframework/samples/petclinic/model/Pet.java
    29 ./src/main/java/org/springframework/samples/petclinic/model/PetType.java
    30 ./src/main/java/org/springframework/samples/petclinic/model/Specialty.java
    78 ./src/main/java/org/springframework/samples/petclinic/model/Vet.java
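
The `wc -l` output above is plain text with one `<count> <path>` pair per line, so it can be parsed without extra tooling. A minimal sketch, assuming that layout; the helper name `parse_wc_output` is made up for illustration. Note that `wc -l` counts every line, including blank lines and comments.

```python
def parse_wc_output(text):
    """Parse `wc -l` output lines of the form '<count> <path>' into a dict."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only once so that paths containing spaces stay intact.
        count, path = line.split(maxsplit=1)
        # Drop the leading "./" that `find .` puts in front of every path.
        sizes[path.removeprefix("./")] = int(count)
    return sizes


sample = """\
   117 ./.mvn/wrapper/MavenWrapperDownloader.java
    47 ./src/main/java/org/springframework/samples/petclinic/model/BaseEntity.java
"""
file_sizes = parse_wc_output(sample)
```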


More precise variant with `cloc`

In [2]:
%%bash
cloc spring-framework-petclinic/

github.com/AlDanial/cloc v 1.98  T=3.09 s (35.6 files/s, 16318.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Text                             5            369              0          13764
CSV                              1              0              0          11611
SVG                              2              0              0           9158
CSS                              1            852             28           7131
Java                            59            593           1395           2163
XML                              9             54             94            590
Maven                            1             42             32            528
SQL                              8             71              0            414
JSP                              9            

`cloc` per file

In [3]:
%%bash
cloc spring-framework-petclinic/ --by-file

github.com/AlDanial/cloc v 1.98  T=3.25 s (33.9 files/s, 15547.3 lines/s)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
File                                                                                                                                                  blank        comment           code
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spring-framework-petclinic/git_diff_output.txt                                                                                                          325              0          13551
spring-framework-petclinic/git_diff_output.csv                                                                                                            0              0          11

Machine-readable output from `cloc`

In [None]:
%%bash
cd spring-framework-petclinic/
cloc . --by-file --quiet --csv --out ../lines.txt
head ../lines.txt

**Changes**

In [None]:
%%bash
cd spring-framework-petclinic/
git log --name-only --no-merges --format="" -- "*.java" > ../changes.txt
head ../changes.txt

### Data Wrangling with Data Science Tools

In theory: read the file with the line counts directly

In [None]:
!head -n 4 lines.txt

In [None]:
import pandas as pd
pd.read_csv("lines.txt").head(3)

In practice: read it in with minor adjustments

In [None]:
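# index_col=1 uses the file name as index; [:-1] drops the trailing summary row
lines = pd.read_csv("lines.txt", index_col=1)[:-1][['code']]
# strip the leading "./" so the paths match those from git log
lines.index = lines.index.str[2:]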
lines.head()
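
To see what the adjustments do, the same steps can be run on a tiny inline CSV. This is a sketch under assumptions: the sample mimics the layout expected from `cloc --by-file --csv` (a sixth header column with tool information and a final `SUM` row), and the file names and numbers are invented.

```python
import io

import pandas as pd

# Invented sample in the assumed cloc CSV layout.
sample_csv = io.StringIO(
    'language,filename,blank,comment,code,"tool info"\n'
    "Java,./src/main/java/Owner.java,20,40,93\n"
    "Java,./src/main/java/Pet.java,15,30,66\n"
    "SUM,,35,70,159\n"
)

# File name becomes the index, the SUM row is cut off, only 'code' is kept.
sample_lines = pd.read_csv(sample_csv, index_col=1)[:-1][["code"]]
# Remove the leading "./" from every path.
sample_lines.index = sample_lines.index.str[2:]
```

Without the `[:-1]` slice, the summary row would end up in the data as an entry with a missing file name.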

Read the file that lists every changed file

In [None]:
!head -n 5 changes.txt

In [None]:
change_per_file = pd.read_csv("changes.txt", names=['filepath'])
change_per_file.head()

Count the changes (occurrences of each file)

In [None]:
changes = pd.DataFrame(change_per_file['filepath'].value_counts())
changes.columns = ["changes"]
changes.head()
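
Counting occurrences is all that happens here: every line in the log stands for one commit that touched that file. A pandas-free sketch of the same idea with `collections.Counter`; the shortened file names are invented for illustration.

```python
from collections import Counter

# Invented log excerpt: one line per (commit, changed file) pair.
sample_log = """\
src/main/java/Owner.java
src/main/java/Pet.java
src/main/java/Owner.java
"""

# Count how often each path occurs, skipping empty lines.
change_counts = Counter(line for line in sample_log.splitlines() if line)
most_changed = change_counts.most_common(1)
```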

Combine the data

In [None]:
hotspots = changes.join(lines).dropna()
hotspots.head()
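
`join` matches both tables on their index (the file path) and keeps every row of the left table; `dropna` then removes files that have changes but no line count, for example files deleted in the meantime. A small sketch with invented paths and numbers:

```python
import pandas as pd

changes_demo = pd.DataFrame(
    {"changes": [27, 5]}, index=["src/A.java", "src/Deleted.java"]
)
lines_demo = pd.DataFrame(
    {"code": [158, 40]}, index=["src/A.java", "src/Untouched.java"]
)

# Left join on the index: "src/Deleted.java" gets NaN for 'code',
# "src/Untouched.java" is not part of the result at all.
joined = changes_demo.join(lines_demo)

# Keep only files that have both a change count and a line count.
hotspots_demo = joined.dropna()
```

Because the missing values turn the `code` column into floats, line counts later appear as `158.0` rather than `158`.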

### Behind the Scenes of the Visualization
We need: a data format and a template

<small><pre>{
    'name': 'flare',
    'children': [{
            'name': 'src',
            'children': [{
                    'name': 'main',
                    'children': [{
                            'name': 'java',
                            'children': [{
                                    'name': 'org',
                                    'children': [{
                                            'name': 'springframework',
                                            'children': [{
                                                    'name': 'samples',
                                                    'children': [{
                                                            'name': 'petclinic',
                                                            'children': [{
                                                                    'name': 'repository',
                                                                    'children': [{
                                                                            'name': 'jdbc',
                                                                            'children': [{
                                                                                    'name': 'JdbcOwnerRepositoryImpl.java (158.0 [27])',
                                                                                    'size': 158.0,
                                                                                    'color': '#b40426'
                                                                                }, {
                                                                                    'name': 'JdbcVetRepositoryImpl.java (88.0 [20])',
                                                                                    'size': 88.0,
                                                                                    'color': '#f59d7e'
                                                                                }, {
                                                                                    'name': 'JdbcVisitRepositoryImpl.java (103.0 [19])',
                                                                                    'size': 103.0,
                                                                                    'color': '#f7aa8c'
                                                                                },
                                                                                ...
</pre></small>

Prepare the data for the visualization.

In [None]:
from matplotlib import cm
from matplotlib.colors import rgb2hex

def create_plot_data(df, color_column_name, size_column_name, separator):
    plot_data = pd.DataFrame(index=df.index)
    plot_data['value_for_color'] = df[color_column_name]
    # scale to 0..1 relative to the largest value and map onto the coolwarm colormap
    plot_data['ratio_for_color'] = plot_data['value_for_color'] / plot_data['value_for_color'].max()
    plot_data['color'] = plot_data['ratio_for_color'].apply(lambda x : rgb2hex(cm.coolwarm(x)))
    plot_data['size'] = df[size_column_name]
    # split each file path into its directory part and the file name
    plot_data[['path', 'name']] = df.index.str.rsplit(separator, n=1).to_list()
    plot_data['path_list'] = plot_data['path'].str.split(separator)
    return plot_data
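
The two core steps, scaling the change counts and splitting the paths, can be followed without pandas. A minimal sketch: the change counts 27, 20 and 19 are taken from the JSON excerpt above, the paths are shortened for readability, and `split_path` is a made-up helper name.

```python
def split_path(filepath, separator="/"):
    """Return (list of directories, file name) for one file path."""
    path, name = filepath.rsplit(separator, maxsplit=1)
    return path.split(separator), name


change_counts = {
    "repository/jdbc/JdbcOwnerRepositoryImpl.java": 27,
    "repository/jdbc/JdbcVetRepositoryImpl.java": 20,
    "repository/jdbc/JdbcVisitRepositoryImpl.java": 19,
}

# The most frequently changed file gets ratio 1.0, all others less.
highest = max(change_counts.values())
ratios = {path: count / highest for path, count in change_counts.items()}

directories, name = split_path("repository/jdbc/JdbcOwnerRepositoryImpl.java")
```

A path without any separator, such as a file in the repository root, would fail in this sketch because `rsplit` then returns only one part.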

Convert to the D3 data format (JSON)

In [None]:


import json

def create_flare_json(df):
    json_data = {'name': 'flare', 'children': []}

    for _, series in df.iterrows():
        hierarchical_data = series['path_list']

        # walk down the directory levels, creating missing nodes on the way
        children = json_data['children']
        for part in hierarchical_data:
            entry = next((child for child in children if child.get('name', '') == part), None)
            if not entry:
                entry = {'name': part, 'children': []}
                children.append(entry)
            children = entry['children']

        # leaf node: the file itself, labeled "name (size [value_for_color])"
        children.append({
            'name': f"{series['name']} ({series['size']} [{series['value_for_color']}])",
            'size': float(series['size']),
            'color': series['color']
        })

    return json_data
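
How directory nodes are shared between files is easiest to see on two entries. The following is a simplified, pandas-free sketch of the same tree-building idea; `build_tree` and the shortened paths are invented for illustration, while sizes and colors are taken from the JSON excerpt above.

```python
def build_tree(files, separator="/"):
    """Build a nested name/children structure from (path, size, color) tuples."""
    root = {"name": "flare", "children": []}
    for filepath, size, color in files:
        *directories, filename = filepath.split(separator)
        children = root["children"]
        for part in directories:
            # Reuse an existing directory node or create a new one.
            entry = next((c for c in children if c["name"] == part), None)
            if entry is None:
                entry = {"name": part, "children": []}
                children.append(entry)
            children = entry["children"]
        # The file itself is a leaf with a size and a color.
        children.append({"name": filename, "size": size, "color": color})
    return root


tree = build_tree([
    ("src/jdbc/JdbcOwnerRepositoryImpl.java", 158.0, "#b40426"),
    ("src/jdbc/JdbcVetRepositoryImpl.java", 88.0, "#f59d7e"),
])
```

Both files end up under the same `src` and `jdbc` nodes, so each directory appears only once in the tree.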

Kick everything off: feed the HTML template with the data and create the file

In [None]:
import json
from IPython.display import HTML

def create_hotspot_file(hotspots, color_column_name, size_column_name, separator):
    json_data = create_flare_json(create_plot_data(hotspots, color_column_name, size_column_name, separator))

    with open("vis/template_hierarchical_d3_inline.html") as html_template:
        # json.dumps emits strict JSON; default=float covers NumPy numbers json cannot serialize itself
        html = html_template.read().replace("###FLARE_JSON###", json.dumps(json_data, default=float))

    with open('hotspots.html', mode='w') as html_out:
        html_out.write(html)

    return HTML('<a href="hotspots.html">hotspots.html</a>')

create_hotspot_file(hotspots, "changes", "code", "/")
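
The placeholder replacement can be tried in isolation, without the real template file. A sketch under assumptions: the one-line template below is hypothetical and only shares the `###FLARE_JSON###` marker with the real one, and the data is serialized with `json.dumps`, which produces strict JSON even for names that contain quotes.

```python
import json

# Hypothetical stand-in for the real HTML template.
prefix = "<script>var data = "
suffix = ";</script>"
template = prefix + "###FLARE_JSON###" + suffix

flare = {
    "name": "flare",
    "children": [{"name": "O'Brien.java", "size": 1.0, "color": "#b40426"}],
}

# Swap the marker for the serialized data.
html = template.replace("###FLARE_JSON###", json.dumps(flare))

# Read the embedded data back to check that nothing was lost.
embedded = json.loads(html[len(prefix):-len(suffix)])
```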

### End of the Demo