documentation update

guilhemmarchand · Mar 28, 2017 · 911a416 · 911a416
1 parent f2a5aac
commit 911a416
Show file tree

Hide file tree

Showing 3 changed files with 58 additions and 1 deletion.
diff --git a/docs/img/csv_vs_json.png b/docs/img/csv_vs_json.png
diff --git a/docs/img/csv_vs_json_result.png b/docs/img/csv_vs_json_result.png
diff --git a/docs/json_indexing.rst b/docs/json_indexing.rst
@@ -26,7 +26,64 @@ In this mode, Splunk identifies the fields name using the CSV header, then each
 
 The indexed CSV mode provides great performances at search time, and CSV data generates a low level of data volume which saves Splunk licensing costs.
 
-However, the disadvantage of this is an higher cost in storage requirements as Splunk has to generate an higher volume of tsidx files (indexed files) versus rawdata files within the indexes storage.
+However, the disadvantage of this is an higher cost in storage requirements due to the volume of tsidx (indexes files) versus raw data, in comparison indexing other kind of data that can be fully extracted at search time. (like key value or json extracted data)
 
+This is why we have introduced in the release 1.3.x a new optional mode that allows to generate the performance data (99.99% of the data volume) in JSON mode instead of legacy CSV data.
 
+**LIMITATION:**
 
+Currently, the JSON mode indexing is only available to the Python parser, if your hosts user Perl parser, this parameter will have no impact and CSV data will be generated.
+
+===============================================
+Comparison of CSV indexing versus JSON indexing
+===============================================
+
+There are advantages and disadvantages in both solutions, this is basically low licensing cost versus storage costs.
+
+The following benchmark has been made on a set of nmon cold data, 3 months of data for 6 servers with an interval of 4 minutes between performance measures. (2 x AIX, 2 x Linux, 2 x Solaris)
+
+.. image:: img/csv_vs_json.png
+   :alt: img/csv_vs_json.png
+   :align: center
+
+Notes: This is a benchmark that operated on a Splunk standalone server, using cold data. This not necessary fully representative of what to expect in term of results depending on various factors.
+It provides however a good vision, and results in real life should be closed enough.
+
+As a conclusion, in JSON data we generate much more data (approximately x 2 volume), licensing costs are then significantly highers due to this factor.
+
+But the lower volume of tsdix a better data compression ratio generates approximately 20% less of final storage costs.
+
+There is a well a lower performance level, which could be even more significant at scale.
+
+==========================
+Ok, how to use JSON then ?
+==========================
+
+Activating JSON mode is very easy, all you need is activating the "--json_output" option in your own "local/nmon.conf":
+
+**Create a "local/nmon.conf" and activate json mode:**::
+
+   # Since the release 1.3.0, AIX and Linux OS use the fifo_consumer.sh script to consume data produced by the fifo readers
+   # the following option allows specifying the options sent to the nmon2csv parsers
+
+   # consult the documentation to get the full list of available options
+
+   # --mode realtime --> explicitly manage realtime data (default)
+   # --use_fqdn --> use the host fully qualified domain name
+   # --json_output --> generate the performance data in json format instead of regular csv data
+
+   nmon2csv_options="--mode realtime --json_output"
+
+And deploy your new configuration.
+
+At next parsing cycle, the data will be generate in JSON mode and transparently rewritten to the nmon_data sourcetype.
+
+Et voila!
+
+.. image:: img/csv_vs_json_result.png
+   :alt: img/csv_vs_json_result.png
+   :align: center
+
+**Additional notes:**
+
+For more coherence, and best index performances, I would recommend to store the JSON nmon data into a separated and dedicated index.