# Software Analysis - Performance

## Research Questions
* Which parts of the application code take a very long time to execute (bottlenecks)?
* To what extent can possible causes be identified using static metrics (e.g. size, complexity, dependencies)?

## Data Sources
* The Java project 'ArgoUML' is scanned by jQAssistant and stored in Neo4j.
* Kieker is used to obtain log data on execution time.

## Assumptions
* Timing data is found under the label 'Kieker' and is exposed there as 'duration'.
* If a method or class takes the longest, factors such as size, complexity, or number of lines may be a cause.

## Validation
* Overview of the types and their durations as well as the methods and their durations
* Comparison with other metrics to examine possible correlations
    
   

## Implementation
* Identify the nodes that have a 'duration' attribute
* Identify other metrics to compare against the 'duration' results

In [85]:
import py2neo
import pandas as pd

graph = py2neo.Graph(host='localhost', user='neo4j', password='neo4j')

In [5]:
from IPython.display import display, HTML

base_html = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
  </head>
  <body>
    <figure>
      {rendered_chart}
    </figure>
  </body>
</html>
"""


In [91]:
query="MATCH (t:Type)-[:DECLARES]->(m:Method) RETURN DISTINCT t.name as Type, m.name AS Method, m.duration AS Duration ORDER BY Duration DESC"
df = pd.DataFrame(graph.run(query).data())
df = df.sort_values('Duration',ascending=False)
df.head()

Unnamed: 0,Type,Method,Duration
90600,Main,main,70767740000.0
90601,Main,initializeSubsystems,65988030000.0
90602,SubsystemUtility,initSubsystem,40247770000.0
90603,InitModuleLoader,init,35304250000.0
90604,ModuleLoader2,doLoad,35304070000.0
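
The raw `Duration` values are hard to read at a glance. The following is a minimal sketch for formatting them, assuming the values are nanoseconds. The notebook does not state the unit, so this is an assumption; `format_ns` is a hypothetical helper and not part of the analysis code.

```python
def format_ns(duration_ns):
    """Format a duration for display. ASSUMPTION: the input is in nanoseconds."""
    units = [("s", 1e9), ("ms", 1e6), ("us", 1e3)]
    for suffix, factor in units:
        if duration_ns >= factor:
            return f"{duration_ns / factor:.2f} {suffix}"
    return f"{duration_ns:.0f} ns"


# Values taken from the table above and from the per-type table further down.
print(format_ns(70767740000.0))  # 70.77 s
print(format_ns(230562))         # 230.56 us
```

Under this assumption, `Main.main` would correspond to roughly 71 seconds of wall-clock time.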


In [92]:

import pygal

# Top 10 methods by duration (df is already sorted in descending order).
timeM_df = df[['Type','Method','Duration']][0:10]
timeM_df.head(10)

bar_chart1 = pygal.HorizontalBar(show_legend=True, human_readable=True, fill=True, legend_at_bottom=True, legend_at_bottom_columns=2)
bar_chart1.title = 'Duration Method'
for index, row in timeM_df.iterrows():
    bar_chart1.add(row['Method'],[{"value": row['Duration'], "label": row['Type']}])
display(HTML(base_html.format(rendered_chart=bar_chart1.render(is_unicode=True))))

In [93]:
# Returns the duration and the number of measured methods for each type.
query="MATCH (t:Type)-[:DECLARES]->(m:Method) WHERE m.duration IS NOT null RETURN DISTINCT t.name as Type, count(m.name) AS Method, m.duration AS Duration"
df = pd.DataFrame(graph.run(query).data())
df = df.groupby(["Type"])[["Duration", "Method"]].sum().astype(int).reset_index()

In [94]:
df.head()

Unnamed: 0,Type,Duration,Method
0,AbstractFilePersister,230562,1
1,ActionActivityDiagram,490282678,6
2,ActionAddAssociationRole,2175296,2
3,ActionAddAttribute,18761531,3
4,ActionAddAttribute$1,2903266,3


In [95]:
# Sort the DataFrame by duration in descending order.
df = df.sort_values('Duration',ascending=False)
time_df = df[['Type','Duration', 'Method']][0:10]
time_df.head(10)

Unnamed: 0,Type,Duration,Method
395,ModuleLoader2,274421403983,15
378,Main,159891351828,17
513,SubsystemUtility,40247768459,1
349,InitModuleLoader,35321625531,4
388,Model,13114964539,25
377,MDRModelImplementation,12120261090,41
448,ProjectBrowser,8280024882,28
453,ProjectManager,7403590481,20
224,DiagramFactory,6792236100,6
355,InitProfileSubsystem,6315739255,1
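
These per-type totals should be read with care: `ModuleLoader2` sums to about 274e9, while `Main.main` alone measured about 70.8e9. One possible explanation, which is an assumption and not verified here, is that the durations are inclusive, meaning they contain the time spent in callees. Summing them over nested or repeated calls would then count the same time more than once. The sketch below illustrates the effect on a hypothetical call chain built from the five slowest methods. The parent/child links are assumed for illustration only and are not taken from the Kieker data.

```python
# Inclusive durations as displayed in the method-level table (rounded values).
inclusive = {
    "main": 70_767_740_000,
    "initializeSubsystems": 65_988_030_000,
    "initSubsystem": 40_247_770_000,
    "init": 35_304_250_000,
    "doLoad": 35_304_070_000,
}

# ASSUMPTION: each method calls the next one exactly once (hypothetical chain).
children = {
    "main": ["initializeSubsystems"],
    "initializeSubsystems": ["initSubsystem"],
    "initSubsystem": ["init"],
    "init": ["doLoad"],
    "doLoad": [],
}


def self_times(inclusive, children):
    """Self time = inclusive time minus the inclusive time of direct callees."""
    return {
        name: total - sum(inclusive[child] for child in children[name])
        for name, total in inclusive.items()
    }


exclusive = self_times(inclusive, children)
naive_sum = sum(inclusive.values())

print(naive_sum)                # far larger than the duration of main
print(sum(exclusive.values()))  # equals the inclusive duration of main
```

If the real call tree were available, ranking methods by self time instead of inclusive time would point more directly at the code that actually consumes the time.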


In [96]:
# Create a bar chart for the duration.
import pygal
bar_chart = pygal.HorizontalBar(show_legend=True, human_readable=True, fill=True, legend_at_bottom=True, legend_at_bottom_columns=2)
bar_chart.title = 'Duration Types'
for index, row in time_df.iterrows():
    bar_chart.add(row['Type'],[{"value": row['Duration'], "label": "Number of methods: " + str(row['Method'])}])
display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))

In [103]:
query="MATCH (t:Type)-[:DECLARES]->(m:Method) RETURN t.name AS Type, m.name AS Method, m.duration AS Durations, sum(m.cyclomaticComplexity) AS complexity"

df = pd.DataFrame(graph.run(query).data())

df = df.groupby(['Type', 'Method']).first().reset_index()
df = df.dropna()

In [113]:
tree_df = df.groupby(['Type'], axis=0).apply(
    lambda x : [{"value" : l, "label" : c } for l, c in zip(x['Durations'], x['Method'])])
tree_df.head(5)
#print(tree_df[0:5])


treemap = pygal.Treemap(show_legend=False)
treemap.title = 'Duration'

[treemap.add(x[0], x[1]) for x in tree_df.items()]

display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))


TODO: compare with other metrics

In [108]:
query1="MATCH (class:Class)-[:DECLARES]->(method:Method) RETURN class.fqn, class.name, method.name AS Methods, sum(method.effectiveLineCount) AS lines"
df1 = pd.DataFrame(graph.run(query1).data())
#print(df1)
# Build a DataFrame with duration, number of lines, and complexity.
#pd.merge(df, df1, left_on='Method', right_on='Methods')
df_all = pd.merge(df, df1, left_on=['Type', 'Method'], right_on=['class.name', 'Methods'])
df_lines = df_all.sort_values('lines',ascending=False)
df_duration = df_all.sort_values('Durations',ascending=False)
df_complexity = df_all.sort_values('complexity',ascending=False)
df_lines.head()

Unnamed: 0,Type,Method,Durations,complexity,class.fqn,class.name,Methods,lines
1127,ModelAccessModelInterpreter,invokeFeature,70259851.0,181,org.argouml.profile.internal.ocl.uml14.ModelAc...,ModelAccessModelInterpreter,invokeFeature,297
772,Init,createChecklists,51550290.0,0,org.argouml.uml.cognitive.checklist.Init,Init,createChecklists,276
1964,UmlFactoryMDRImpl,buildNode,12972403.0,0,org.argouml.model.mdr.UmlFactoryMDRImpl,UmlFactoryMDRImpl,buildNode,198
1269,PerspectiveManager,getDefaultPerspectives,11410385.0,0,org.argouml.ui.explorer.PerspectiveManager,PerspectiveManager,getDefaultPerspectives,142
1779,UMLClassDiagram,createDiagramElement,148651127.0,0,org.argouml.uml.diagram.static_structure.ui.UM...,UMLClassDiagram,createDiagramElement,141


In [109]:
df_duration.head()

Unnamed: 0,Type,Method,Durations,complexity,class.fqn,class.name,Methods,lines
994,Main,main,70767740000.0,0,org.argouml.application.Main,Main,main,68
993,Main,main,70767740000.0,0,org.apache.log4j.chainsaw.Main,Main,main,3
992,Main,initializeSubsystems,65988030000.0,0,org.argouml.application.Main,Main,initializeSubsystems,27
1553,SubsystemUtility,initSubsystem,40247770000.0,0,org.argouml.application.SubsystemUtility,SubsystemUtility,initSubsystem,11
813,InitModuleLoader,init,35304250000.0,1,org.argouml.moduleloader.InitModuleLoader,InitModuleLoader,init,3
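
The first two rows show `Main.main` twice, once for `org.argouml.application.Main` and once for `org.apache.log4j.chainsaw.Main`. The merge joins on the simple class name, so the same duration is attached to both classes. The following is a minimal sketch for spotting such ambiguous simple names before merging. `ambiguous_simple_names` is a hypothetical helper, and the input is a hand-picked sample of the `class.fqn` values shown above.

```python
from collections import defaultdict


def ambiguous_simple_names(fqns):
    """Map each simple class name to its fully qualified names, keeping only clashes."""
    by_name = defaultdict(set)
    for fqn in fqns:
        simple = fqn.rsplit(".", 1)[-1]  # text after the last dot
        by_name[simple].add(fqn)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}


sample = [
    "org.argouml.application.Main",
    "org.apache.log4j.chainsaw.Main",
    "org.argouml.application.SubsystemUtility",
    "org.argouml.moduleloader.InitModuleLoader",
]

print(ambiguous_simple_names(sample))
# {'Main': ['org.apache.log4j.chainsaw.Main', 'org.argouml.application.Main']}
```

With the real data, the values of `df1['class.fqn']` could be passed in. Avoiding the duplicates entirely would probably require the duration query to return the fully qualified name as well, so that both sides can be joined on it.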


In [110]:
df_complexity.head()

Unnamed: 0,Type,Method,Durations,complexity,class.fqn,class.name,Methods,lines
1127,ModelAccessModelInterpreter,invokeFeature,70259851.0,181,org.argouml.profile.internal.ocl.uml14.ModelAc...,ModelAccessModelInterpreter,invokeFeature,297
252,CrDisambigStateName,predicate2,10029481.0,20,org.argouml.uml.cognitive.critics.CrDisambigSt...,CrDisambigStateName,predicate2,35
318,CrOppEndVsAttr,predicate2,9732413.0,20,org.argouml.uml.cognitive.critics.CrOppEndVsAttr,CrOppEndVsAttr,predicate2,32
508,FacadeMDRImpl,getName,15958096.0,19,org.argouml.model.mdr.FacadeMDRImpl,FacadeMDRImpl,getName,45
308,CrNoTransitions,predicate2,20357254.0,18,org.argouml.uml.cognitive.critics.CrNoTransitions,CrNoTransitions,predicate2,24


In [112]:
tree_df = df_all.groupby(['Type'], axis=0).apply(
    lambda x : [{"value" : l, "label" : c } for l, c in zip(x['lines'], x['Method'])])
tree_df.head(5)
#print(tree_df[0:5])


treemap = pygal.Treemap(show_legend=False)
treemap.title = 'Lines'

[treemap.add(x[0], x[1]) for x in tree_df.items()]

display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))

In [111]:
# Number of methods per class
query="MATCH (class:Class)-[:DECLARES]->(method:Method) RETURN class.name, count(method) AS countMethods ORDER BY countMethods DESC"
df_3 = pd.DataFrame(graph.run(query).data())
#df_3 = pd.merge(df_all, df_3, left_on=['Type'], right_on=['class.name'])
df_3.head()


Unnamed: 0,class.name,countMethods
0,UMLPackageImpl,1517
1,UMLValidator,698
2,ActionLexer,646
3,UMLAdapterFactory$1,499
4,FacadeEUMLImpl,427


In [117]:
query = "Match (class:Class)-[:HAS_METHOD]->(m:Method), (m)-[:HAS_COUNTER]->(c:Counter) Return class.fqn as Class, m.name as Method, max(c.missed) as Line_Missed, max(c.covered)  as Line_Covered"
coverage = pd.DataFrame(graph.run(query).data())

coverage['line_size'] = coverage['Line_Missed'] + coverage['Line_Covered']
coverage['line_covered_ratio'] =  coverage['Line_Covered'] / coverage['line_size']
coverage = coverage.drop_duplicates(subset=['Class', 'Method', 'Line_Missed' ,'Line_Covered'])
#print(coverage)

df_all_coverage = pd.merge(df_all, coverage, left_on=['class.fqn','Method'], right_on=['Class', 'Method'])
df_all_coverage = df_all_coverage[['class.name', 'Method', 'Class','Durations', 'Line_Missed', 'Line_Covered','line_size', 'line_covered_ratio']]
#df_all_coverage = df_all_coverage.drop_duplicates(subset=['class.name', 'Method','Durations', 'Line_Missed','Class', 'Line_Covered','line_size', 'line_covered_ratio'])
df_all_coverage = df_all_coverage.sort_values('Durations',ascending=False)
df_all_coverage.head()

Unnamed: 0,class.name,Method,Class,Durations,Line_Missed,Line_Covered,line_size,line_covered_ratio
701,Main,main,org.argouml.application.Main,70767740000.0,43,159,202,0.787129
700,Main,initializeSubsystems,org.argouml.application.Main,65988030000.0,0,95,95,1.0
1031,SubsystemUtility,initSubsystem,org.argouml.application.SubsystemUtility,40247770000.0,0,56,56,1.0
607,InitModuleLoader,init,org.argouml.moduleloader.InitModuleLoader,35304250000.0,0,5,5,1.0
728,ModuleLoader2,doLoad,org.argouml.moduleloader.ModuleLoader2,35304070000.0,0,4,4,1.0
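
The `line_covered_ratio` column is the covered count divided by the sum of missed and covered counts. The following is a small worked sketch of that formula using the first two rows of the table above. `covered_ratio` is a hypothetical helper; returning `None` for an empty method is a design choice made here to avoid a division by zero.

```python
def covered_ratio(missed, covered):
    """covered / (missed + covered), or None when there is nothing to cover."""
    size = missed + covered
    if size == 0:
        return None
    return covered / size


# Main.main: 43 missed, 159 covered
print(round(covered_ratio(43, 159), 6))  # 0.787129
# Main.initializeSubsystems: 0 missed, 95 covered
print(covered_ratio(0, 95))              # 1.0
```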


## Result
* The classes 'Main' and 'ModuleLoader2' have the longest total duration.
* The method 'main' takes the longest.
* However, other metrics also have an influence on the duration.

### Obstacles:
* For Kieker, the Java project has to be packaged as a JAR file (or at least no other option was found).
* Not every method has a duration.



## Next Step
