# Software Analysis - Performance

* Question
    * Which parts of the application code take very long to execute (bottlenecks)?
    * To what extent can possible causes be identified through static metrics (e.g. size, complexity, dependencies)?

* Data Sources
    * The Java project 'Argouml' is scanned by jQAssistant and represented in Neo4j
    * Kieker is used to obtain log data with timing information
* Heuristics
    * Timing data can be found under Kieker
    * The Main class might be the one that runs the longest
   
* Validation
    * Overview of the types and their durations, as well as the methods and their durations
    * Comparison with other metrics to check for correlations
* Implementation
    * Identification of the nodes that have a duration attribute
* Results
    * Summed per type, `ModuleLoader2` and `Main` show the largest recorded durations, followed by `SubsystemUtility` and `InitModuleLoader`. The slowest individual methods in the output are `Main.main` and `Main.initializeSubsystems`, so the heuristic about the Main class is only partly confirmed. A relationship between duration and the static metrics (lines, cyclomatic complexity) has not yet been quantified on the full data in this notebook.


In [1]:
import py2neo

graph = py2neo.Graph(host='localhost', user='', password='')

In [2]:
import pandas as pd

# query = "MATCH (t:Type:File) RETURN count(t)"
query = "MATCH (c1:Class)-[:EXTENDS]->(c2:Type) RETURN c1.fqn, c2.fqn LIMIT 20"
# query = "MATCH (t:Type)-[:DECLARES]->(m:Method) WHERE t.fqn STARTS WITH 'kieker' RETURN t.name AS Type, m.name AS Method, m.incomingCalls AS Calls, m.duration AS Duration"
df = pd.DataFrame(graph.run(query).data())

In [3]:
df.head()

Unnamed: 0,c1.fqn,c2.fqn
0,net.sf.saxon.sql.SQLClose$CloseInstruction,net.sf.saxon.expr.SimpleExpression
1,net.sf.saxon.sql.SQLClose,net.sf.saxon.style.ExtensionInstruction
2,net.sf.saxon.sql.SQLColumn,net.sf.saxon.style.StyleElement
3,net.sf.saxon.sql.SQLConnect,net.sf.saxon.style.ExtensionInstruction
4,net.sf.saxon.sql.SQLConnect$ConnectInstruction,net.sf.saxon.expr.SimpleExpression


In [4]:
# Fetch all methods with a recorded duration, then sum the durations per declaring type.
query = "MATCH (t:Type)-[:DECLARES]->(m:Method) WHERE m.duration IS NOT NULL RETURN DISTINCT t.name AS Type, m.name AS Method, m.effectiveLineCount AS loc, m.duration AS Duration ORDER BY loc DESC"
df = pd.DataFrame(graph.run(query).data())
df = df.groupby(["Type"]).Duration.sum().astype(int).reset_index()

In [5]:
df.head()

Unnamed: 0,Type,Duration
0,AbstractFilePersister,465133
1,ActionActivityDiagram,952283049
2,ActionAddAssociationRole,3321112
3,ActionAddAttribute,20721974
4,ActionAddAttribute$1,13442817
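
The summed `Duration` values above are raw numbers and hard to read at a glance. The notebook does not state their unit; the sketch below assumes nanoseconds, which is an assumption and should be checked against the Kieker configuration. `format_duration` is a hypothetical helper, not part of the original analysis.

```python
def format_duration(value, unit_ns=1):
    """Render a raw duration as a readable string.

    Assumption: `value` is given in nanoseconds (unit_ns=1). If the records
    use another unit, pass the number of nanoseconds per unit as `unit_ns`.
    """
    ns = value * unit_ns
    # Try the largest unit first and fall through to smaller ones.
    for factor, suffix in ((1_000_000_000, "s"), (1_000_000, "ms"), (1_000, "us")):
        if ns >= factor:
            return f"{ns / factor:.2f} {suffix}"
    return f"{ns} ns"


# Values copied from the grouped output above.
sample = {
    "AbstractFilePersister": 465133,
    "ActionActivityDiagram": 952283049,
    "ActionAddAssociationRole": 3321112,
}
readable = {name: format_duration(raw) for name, raw in sample.items()}
for name, text in readable.items():
    print(f"{name}: {text}")
```

With pandas, the same helper could be applied to the whole column via `df['Duration'].map(format_duration)`.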


In [6]:
# Sort the DataFrame by duration, longest first.
df = df.sort_values('Duration', ascending=False)
# Keep the columns Type and Duration for the ten longest-running types (loc_df is reused for the chart).
loc_df = df[['Type', 'Duration']][0:10]
loc_df.head(10)

Unnamed: 0,Type,Duration
412,ModuleLoader2,520923590514
395,Main,256564830009
541,SubsystemUtility,73288204051
366,InitModuleLoader,66772638908
405,Model,17878163117
394,MDRModelImplementation,15967665111
477,ProjectManager,13038439218
225,DiagramFactory,12285781592
558,TargetManager,12090262511
471,ProjectBrowser,10678935696
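
The heuristic that the Main class might run the longest can be checked directly against this table. The following is a minimal sketch that works on the ten totals copied from the output above; `rank_of` is a hypothetical helper, and the share it prints refers only to these ten rows, not to the whole application.

```python
# Totals copied from the top-10 table above (type name -> summed duration).
top_types = {
    "ModuleLoader2": 520923590514,
    "Main": 256564830009,
    "SubsystemUtility": 73288204051,
    "InitModuleLoader": 66772638908,
    "Model": 17878163117,
    "MDRModelImplementation": 15967665111,
    "ProjectManager": 13038439218,
    "DiagramFactory": 12285781592,
    "TargetManager": 12090262511,
    "ProjectBrowser": 10678935696,
}


def rank_of(totals, type_name):
    """Return the 1-based rank of `type_name` when sorted by total duration, descending."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return ordered.index(type_name) + 1


main_rank = rank_of(top_types, "Main")
# Share of the single largest type within the ten rows shown.
share_of_first = max(top_types.values()) / sum(top_types.values())
print(f"Main is ranked {main_rank}; the top type holds {share_of_first:.0%} of the top-10 total")
```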


In [7]:
from IPython.display import display, HTML

base_html = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
  </head>
  <body>
    <figure>
      {rendered_chart}
    </figure>
  </body>
</html>
"""


In [8]:
# Create a bar chart of the duration per type.
import pygal
bar_chart = pygal.HorizontalBar(show_legend=True, human_readable=True, fill=True, legend_at_bottom=True, legend_at_bottom_columns=2)
bar_chart.title = 'Duration Types'
for index, row in loc_df.iterrows():
    bar_chart.add(row['Type'],[{"value": row['Duration'], "label": row['Type']}])
display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))

In [9]:
# Only methods with a recorded duration are relevant for the chart.
query = "MATCH (t:Type)-[:DECLARES]->(m:Method) WHERE m.duration IS NOT NULL RETURN DISTINCT t.name AS Type, m.name AS Method, m.duration AS Duration ORDER BY Duration DESC"
df = pd.DataFrame(graph.run(query).data())

# Sort the DataFrame by duration, longest first.
df = df.sort_values('Duration', ascending=False)
# Keep the columns Type, Method and Duration for the ten longest-running methods.
loc_df = df[['Type', 'Method', 'Duration']][0:10]

bar_chart1 = pygal.HorizontalBar(show_legend=True, human_readable=True, fill=True, legend_at_bottom=True, legend_at_bottom_columns=2)
bar_chart1.title = 'Duration Method'
for index, row in loc_df.iterrows():
    bar_chart1.add(row['Method'],[{"value": row['Duration'], "label": row['Type']}])
display(HTML(base_html.format(rendered_chart=bar_chart1.render(is_unicode=True))))

In [10]:
query = "MATCH (t:Type)-[:DECLARES]->(m:Method) RETURN t.name AS Type, m.name AS Method, MAX(m.effectiveLineCount) AS lines, m.duration AS Durations, sum(m.cyclomaticComplexity) AS complexity"
df = pd.DataFrame(graph.run(query).data())
# Keep one row per (Type, Method) pair, drop methods without a recorded duration, and sort by duration.
df = df.groupby(['Type', 'Method']).first().reset_index()
df = df.dropna()
df = df.sort_values('Durations', ascending=False)


In [11]:
df.head()

Unnamed: 0,Type,Method,lines,Durations,complexity
47042,Main,main,68.0,116205400000.0,0
47041,Main,initializeSubsystems,27.0,108410800000.0,0
69294,SubsystemUtility,initSubsystem,11.0,73288200000.0,0
40903,InitModuleLoader,init,3.0,66746700000.0,1
49849,ModuleLoader2,doLoad,2.0,66746400000.0,0
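
To compare durations with the static metrics, as planned under Validation, a rank correlation is one possible starting point. The sketch below computes Spearman's coefficient in plain Python on the five rows shown above. This is an illustration under assumptions, not the analysis itself: five rows are far too few for any conclusion, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical. On the full data the same functions could be fed with `df['lines'].tolist()` and `df['Durations'].tolist()`.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if one input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# The five rows from the output above.
lines = [68.0, 27.0, 11.0, 3.0, 2.0]
durations = [116205400000.0, 108410800000.0, 73288200000.0, 66746700000.0, 66746400000.0]
complexity = [0, 0, 0, 1, 0]

rho_lines = spearman(lines, durations)
rho_complexity = spearman(complexity, durations)
print(f"lines vs. duration: {rho_lines:.2f}")
print(f"complexity vs. duration: {rho_complexity:.2f}")
```

Ranks are used instead of raw values because the durations span several orders of magnitude, which would let a few extreme methods dominate a plain Pearson coefficient.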


In [12]:
query="MATCH (class:Type:Class)-[:DECLARES]->(method:Method) RETURN class.name, count(method) AS Methods, sum(method.effectiveLineCount) AS lines ORDER BY Methods DESC"
df = pd.DataFrame(graph.run(query).data())
df.head()

Unnamed: 0,class.name,Methods,lines
0,UMLPackageImpl,1517,2833
1,UMLValidator,698,14793
2,ActionLexer,646,8198
3,UMLAdapterFactory$1,499,500
4,FacadeEUMLImpl,427,1044
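
The method count and the line count can be combined into a simple size metric, the average number of effective lines per method. The sketch below derives it from the five rows shown above; `lines_per_method` is a hypothetical helper, and the input mirrors the list of dictionaries that `graph.run(query).data()` returns for this query.

```python
# Rows copied from the output above, in the shape of graph.run(query).data().
records = [
    {"class.name": "UMLPackageImpl", "Methods": 1517, "lines": 2833},
    {"class.name": "UMLValidator", "Methods": 698, "lines": 14793},
    {"class.name": "ActionLexer", "Methods": 646, "lines": 8198},
    {"class.name": "UMLAdapterFactory$1", "Methods": 499, "lines": 500},
    {"class.name": "FacadeEUMLImpl", "Methods": 427, "lines": 1044},
]


def lines_per_method(rows):
    """Return (class name, average lines per method) pairs, largest average first."""
    result = []
    for row in rows:
        # Skip classes without methods or without a line count to avoid bad divisions.
        if not row["Methods"] or row["lines"] is None:
            continue
        result.append((row["class.name"], row["lines"] / row["Methods"]))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


ranking = lines_per_method(records)
for name, average in ranking:
    print(f"{name}: {average:.2f} lines per method")
```

Classes with many methods but few lines each, such as `UMLPackageImpl` in these rows, then appear at the lower end of the ranking.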


Unnamed: 0,loc
0,0
