Sat March 23 2019
===============

* Trying out Jupyter Notebooks (formerly known as IPython Notebooks)


The following graph shows the distribution of the Java memory footprints of the property maps of the message vertices traversed by LDBC SNB Query3 on the SF0100 dataset. I collected these data with OpenJDK's Java Object Layout (JOL) tool, using the code snippets below.

snb-interactive-torc/pom.xml:
```diff
diff --git a/snb-interactive-torc/pom.xml b/snb-interactive-torc/pom.xml
index 44a0d0a..f899a33 100644
--- a/snb-interactive-torc/pom.xml
+++ b/snb-interactive-torc/pom.xml
@@ -43,6 +43,16 @@
       <artifactId>logback-classic</artifactId>
       <version>1.1.2</version>
     </dependency>
+    <dependency>
+      <groupId>org.openjdk.jol</groupId>
+      <artifactId>jol-core</artifactId>
+      <version>0.9</version>
+    </dependency>
+    <dependency>
+      <groupId>org.openjdk.jol</groupId>
+      <artifactId>jol-cli</artifactId>
+      <version>0.9</version>
+    </dependency>
   </dependencies>
   <build>
     <plugins>
```

TorcDb.java:
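```java
import org.openjdk.jol.info.GraphLayout;
...

public static class LdbcQuery3Handler
...

    graph.fillProperties(messages.vSet);

    // Print JOL's footprint table (object count and bytes per class, plus a
    // total) for the property map of every message vertex in the result set.
    for (TorcVertex v : messages.vSet) {
        System.out.println(GraphLayout.parseInstance(v.getProperties()).toFootprint());
    }

    // Stop here; only the footprint measurements are needed.
    System.exit(0);
```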

Using QueryTester:
```bash
sudo LD_LIBRARY_PATH=/shome/jde/RAMCloud/obj.torcdb-experiments MAVEN_OPTS="-Xss100M -Xmx50G" ./bin/QueryTester.sh query3 8796093163356 Papua_New_Guinea Namibia 1306886400000 52 20 --repeat 1
```
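The notebook below reads the per-map sizes from `map_sizes.csv`, but the step that turns the printed footprints into that file is not shown. A minimal sketch of one way to do it, assuming each footprint table printed by JOL's `toFootprint()` ends in a row of the form `COUNT  SUM  (total)`; the function name `parse_footprint_totals` is hypothetical:

```python
import re

# Assumption: every JOL footprint table ends with a row such as
#     "         5                 200   (total)"
# where the first number is the object count and the second is the total
# size in bytes of the object graph (here, one property map).
TOTAL_ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+\(total\)\s*$")


def parse_footprint_totals(lines):
    """Return the total byte size from every footprint table in `lines`.

    `lines` can be any iterable of strings, e.g. an open file holding the
    captured stdout of the QueryTester run.
    """
    sizes = []
    for line in lines:
        match = TOTAL_ROW.match(line)
        if match:
            # Group 2 is the SUM column of the "(total)" row.
            sizes.append(int(match.group(2)))
    return sizes
```

Per-class rows are ignored because only the `(total)` row carries the whole map's footprint; passing an open file of the redirected QueryTester output yields one size per message vertex.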

Plotting the results in the notebook:
```python
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as FF

import numpy as np
import pandas as pd

py.init_notebook_mode(connected=True)

# Assumes the per-map footprints (in bytes) are in the first column.
sizes_df = pd.read_csv('map_sizes.csv')
sizes = sizes_df.iloc[:, 0]

num_sizes = len(sizes_df)

# Precomputed CDF with 'bytes' and 'percentile' columns.
cdf_df = pd.read_csv('map_sizes_cdf.csv')

trace1 = go.Scatter(x=cdf_df['bytes'],
                    y=cdf_df['percentile'],
                    mode='lines',
                    name='Map Size in Bytes')

layout = go.Layout(title='CDF of %d Property Map Java Memory Footprints' % num_sizes,
                   xaxis=dict(
                       title='Bytes',
                       range=[1000, 5000]
                   ),
                   yaxis=dict(
                       title='Percentile',
                       type='log',
                       # Log-axis ranges are in log10 units: 10^-1 to 10^0.1.
                       range=[-1, 0.1]
                   ))

fig = go.Figure(data=[trace1], layout=layout)

py.iplot(fig)

# Reduce the single column to scalars so the %d formatting is well defined.
print("Min: %dB\nMax: %dB\nSum: %dGB" % (np.min(sizes), np.max(sizes), np.sum(sizes) / 1000000000))
```

```
Min: 1072B
Max: 5520B
Sum: 20GB
```
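The notebook plots a precomputed `map_sizes_cdf.csv`. A minimal sketch of how an empirical CDF with the same `bytes` and `percentile` columns could be derived from the raw sizes, assuming `percentile` is stored as a fraction in (0, 1] (which the log y-axis range of 10^-1 to 10^0.1 suggests); `empirical_cdf` and `write_cdf_csv` are hypothetical names:

```python
import csv


def empirical_cdf(sizes):
    """Return (bytes, fraction) pairs: the fraction of maps whose footprint
    is less than or equal to `bytes`, one pair per distinct size."""
    ordered = sorted(sizes)
    n = len(ordered)
    points = []
    for i, size in enumerate(ordered, start=1):
        if points and points[-1][0] == size:
            # Repeated size: keep only the highest cumulative fraction.
            points[-1] = (size, i / n)
        else:
            points.append((size, i / n))
    return points


def write_cdf_csv(points, path):
    """Write the CDF with the column names the notebook reads."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["bytes", "percentile"])
        writer.writerows(points)
```

Collapsing duplicate sizes keeps the plotted line monotone without vertical steps at repeated values, which matters here because many property maps share the same footprint.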


Using `htop`, I observed that TorcDB's resident memory peaked at 29.1G. Therefore, for this query, roughly 66% of that peak is consumed by message vertex property maps alone.
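The 66% figure depends on units. A minimal sketch of the arithmetic, assuming htop's `G` suffix denotes GiB (2^30 bytes) while the notebook's `Sum` divides by 10^9 and truncates with `%d`, so "20GB" only bounds the true total to [20, 21) × 10^9 bytes. Naively dividing 20 by 29.1 gives about 69%; with consistent units the share falls in a range that contains 66%. `property_map_share` is a hypothetical helper:

```python
# Assumption: htop reports "29.1G" in GiB (2**30 bytes).
GIB = 2 ** 30
# The notebook's "Sum" line divides by 10**9 and truncates to an integer.
GB = 10 ** 9


def property_map_share(map_bytes, peak_rss_gib):
    """Fraction of peak resident memory taken up by property maps."""
    return map_bytes / (peak_rss_gib * GIB)


# "Sum: 20GB" means the true total lies in [20, 21) * 10**9 bytes.
low = property_map_share(20 * GB, 29.1)
high = property_map_share(21 * GB, 29.1)
print("property maps use between %.0f%% and %.0f%% of peak RSS" % (low * 100, high * 100))
```

This prints a range of roughly 64% to 67%.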