Schema and data analyzer for MongoDB written in Go.

Overview

Mongoeye provides a quick overview of the data in your MongoDB database.

Key features

  • Fast: the fastest schema analyzer for MongoDB
  • Single binary: pre-built binaries for Windows, Linux, and macOS (Darwin)
  • Local analysis: quick local analysis using a parallel algorithm (MongoDB 2.0+)
  • Remote analysis: distributed analysis in the database using the aggregation framework (MongoDB 3.5.10+)
  • Rich features: histograms (value, length, weekday, hour), most frequent values, ...
  • Integrable: table, JSON or YAML output

Installation

Mongoeye is a single executable binary.

You can download the archive from the GitHub releases page and extract the binary for your platform.

Compilation

Compilation requires Go 1.8. All external dependencies are included in the repository, in the vendor directory.

Compilation process:

$ go get github.com/mongoeye/mongoeye
$ cd $GOPATH/src/github.com/mongoeye/mongoeye
$ make build

For development, you need additional dependencies that can be installed using make get-deps.

The test setup uses Docker to create the MongoDB test database.

If you want to contribute to this project, see the actions in Makefile and the _contrib directory.

Usage

mongoeye [host] database collection [flags]

The command mongoeye --help lists all available options.

Table output

The default output format is table. It shows only the schema, without the other analyses.

Example table output:

            KEY            │ COUNT  │   %    
────────────────────────────────────────────
  all documents            │ 2548   │        
  analyzed documents       │ 1000   │  39.2  
                           │        │        
  _id - objectId           │ 1000   │ 100.0  
  address                  │ 1000   │ 100.0  
  │ - int                  │    1   │   0.1  
  └╴- string               │  999   │  99.9  
  address line 2 - string  │ 1000   │ 100.0  
  name - string            │ 1000   │ 100.0  
  outcode - string         │ 1000   │ 100.0  
  postcode - string        │ 1000   │ 100.0  
  rating                   │ 1000   │ 100.0  
  │ - int                  │  523   │  52.3  
  │ - double               │  451   │  45.1  
  └╴- string               │   26   │   2.6  
  type_of_food - string    │ 1000   │ 100.0  
  URL - string             │ 1000   │ 100.0  

OK  0.190s (local analysis)
    1000/2548 docs (39.2%)
    9 fields, depth 2

JSON and YAML output

Use --format json or --format yaml flags to set these formats.

To write the output to a file, use the option -F /path/to/file.

Features

This chapter explains the features of Mongoeye and their various outputs.

Use --format json or --format yaml to get detailed results, otherwise only the schema table will appear.

The output of the analysis always contains these basic keys:

  • database: database name
  • collection: collection name
  • plan: local for local analysis, db for analysis using aggregation framework
  • duration: duration of analysis
  • allDocs: number of all documents in collection
  • analyzedDocs: number of analyzed documents from collection
  • fieldsCount: number of found fields
  • fields: result of the analysis for each field
    • name: name of field
    • level: level of nested field, 0 is root level
    • count: number of occurrences
    • types: result of the analysis for each type of field
      • type: name of type
      • count: number of occurrences of type

Example result:

database: company
collection: users
plan: local
duration: 46.515331ms
allDocs: 2548
analyzedDocs: 1000
fieldsCount: 9
fields:
  - name: rating
    level: 0
    count: 1000
    types:
    - type: int
      count: 549
      < other outputs according to settings >
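
The basic keys above map naturally onto a few Go structs. The following is a minimal sketch for reading JSON output, assuming that the JSON keys carry the same names as in the list above. The type names, the decodeResult helper and the sample document are hypothetical and hand-written from the example, not actual Mongoeye output. The duration is kept as raw JSON because its JSON encoding is not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypeResult holds the basic keys of one entry in "types".
// Other analysis keys (value, length, ...) are ignored by this sketch.
type TypeResult struct {
	Type  string `json:"type"`
	Count int    `json:"count"`
}

// FieldResult holds the basic keys of one entry in "fields".
type FieldResult struct {
	Name  string       `json:"name"`
	Level int          `json:"level"`
	Count int          `json:"count"`
	Types []TypeResult `json:"types"`
}

// Result holds the basic top-level keys of an analysis.
type Result struct {
	Database     string          `json:"database"`
	Collection   string          `json:"collection"`
	Plan         string          `json:"plan"`
	Duration     json.RawMessage `json:"duration"` // encoding not assumed
	AllDocs      int             `json:"allDocs"`
	AnalyzedDocs int             `json:"analyzedDocs"`
	FieldsCount  int             `json:"fieldsCount"`
	Fields       []FieldResult   `json:"fields"`
}

// decodeResult parses one JSON document into a Result.
func decodeResult(data []byte) (Result, error) {
	var r Result
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// Hand-written stand-in that mirrors the example result.
	sample := []byte(`{"database":"company","collection":"users","plan":"local",
		"allDocs":2548,"analyzedDocs":1000,"fieldsCount":9,
		"fields":[{"name":"rating","level":0,"count":1000,
		"types":[{"type":"int","count":549}]}]}`)
	r, err := decodeResult(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, f := range r.Fields {
		for _, t := range f.Types {
			fmt.Printf("%s: %s %d/%d\n", f.Name, t.Type, t.Count, f.Count)
		}
	}
}
```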

Value - min, max, avg

Use the flag --value or -v to enable calculation of minimum, maximum, and average values.

Supported types:

  • Minimum and maximum: objectId, double, string, bool, date, int, timestamp, long, decimal
  • Average: double, bool, int, long, decimal

Example result:

value:
  min: 11.565586
  max: 60.206787
  avg: 38.51128
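
For numeric types, the three statistics can be computed in a single pass. The sketch below only illustrates the idea for float64 values. The Stats type and the summarize function are hypothetical names, and how Mongoeye handles the other supported types is not shown.

```go
package main

import "fmt"

// Stats holds the minimum, maximum and average of a set of values.
type Stats struct {
	Min, Max, Avg float64
}

// summarize computes min, max and mean in one pass.
// ok is false when vals is empty, since no statistics exist then.
func summarize(vals []float64) (s Stats, ok bool) {
	if len(vals) == 0 {
		return Stats{}, false
	}
	s.Min, s.Max = vals[0], vals[0]
	sum := 0.0
	for _, v := range vals {
		if v < s.Min {
			s.Min = v
		}
		if v > s.Max {
			s.Max = v
		}
		sum += v
	}
	s.Avg = sum / float64(len(vals))
	return s, true
}

func main() {
	s, ok := summarize([]float64{11.5, 60.25, 38.5})
	fmt.Println(s, ok)
}
```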

Length - min, max, avg

Use the flag --length or -l to enable calculation of minimum, maximum, and average lengths.

Supported types: string, array, object

Example result:

length:
  min: 29
  max: 153
  avg: 112

Number of unique values

Use the flag --count-unique to count all unique values.

Supported types: double, string, date, int, timestamp, long, decimal

Example result:

unique: 894

Frequency of values

Use the flag --most-freq N or --least-freq N to get the N most or least frequent values.

Supported types: double, string, date, int, timestamp, long, decimal

Example result:

mostFrequent:
- value: USD
  count: 599
- value: EUR
  count: 21
- value: GBP
  count: 5
- value: CAD
  count: 4
leastFrequent:
- value: EUR
  count: 21
- value: GBP
  count: 5
- value: CAD
  count: 4
- value: JPY
  count: 3
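
Counting unique values and ranking them by frequency can both be done from one map of counts. The sketch below is an assumption about the general approach, not Mongoeye's implementation. ValueCount and frequencies are hypothetical names, and the descending order of both lists simply mirrors the example above.

```go
package main

import (
	"fmt"
	"sort"
)

// ValueCount pairs a value with its number of occurrences.
type ValueCount struct {
	Value string
	Count int
}

// frequencies returns the number of unique values and the n most and
// n least frequent values, both ordered by descending count.
func frequencies(values []string, n int) (unique int, most, least []ValueCount) {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	all := make([]ValueCount, 0, len(counts))
	for v, c := range counts {
		all = append(all, ValueCount{Value: v, Count: c})
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Count != all[j].Count {
			return all[i].Count > all[j].Count
		}
		return all[i].Value < all[j].Value // deterministic tie-break
	})
	if n < 0 {
		n = 0
	}
	if n > len(all) {
		n = len(all)
	}
	return len(all), all[:n], all[len(all)-n:]
}

func main() {
	values := []string{"USD", "EUR", "USD", "GBP", "USD", "EUR"}
	unique, most, least := frequencies(values, 2)
	fmt.Println(unique, most, least)
}
```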

Value histogram

Use the flag --value-hist or -V to generate a value histogram.

Supported types: objectId (processed as a date), double, date, int, long, decimal

Calculation of step

The flag --value-hist-steps sets the maximum number of steps (default 100).

  • The step for the int and long types is a whole number.
  • The step for the double and decimal types is:
    • the smallest product of 1, 2.5 or 5 and 10^n that keeps the number of steps within the maximum
    • e.g. ..., 100, 50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, ...
  • The step for the date and objectId types is rounded to:
    • 1, 2, 5, 10, 15, 30 seconds
    • 1, 2, 5, 10, 15, 30 minutes
    • 1, 2, 3, 6, 12 hours
    • 1, 2, 3, 4, ... days
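
The rule for the double and decimal types can be sketched as follows. This is one possible reading of the rule, not the code Mongoeye uses, and histStep is a hypothetical name. It picks the smallest candidate from the series 1, 2.5, 5 times a power of ten for which the range divided by the step does not exceed the maximum number of steps.

```go
package main

import (
	"fmt"
	"math"
)

// histStep returns the smallest value of the form m * 10^n, with m in
// {1, 2.5, 5}, for which rng/step does not exceed maxSteps.
// It returns 0 for a non-positive range or step limit.
func histStep(rng float64, maxSteps int) float64 {
	if rng <= 0 || maxSteps <= 0 {
		return 0
	}
	minStep := rng / float64(maxSteps)
	base := math.Pow(10, math.Floor(math.Log10(minStep)))
	// The factor 10 (that is, 1 * 10^(n+1)) guards against rounding in Log10.
	for _, m := range []float64{1, 2.5, 5, 10} {
		if step := m * base; step >= minStep {
			return step
		}
	}
	return 10 * base
}

func main() {
	// A range of 9.5 with at most 20 steps, then with the default 100.
	fmt.Println(histStep(9.5, 20))
	fmt.Println(histStep(9.5, 100))
}
```

With a range of 9.5 and a limit of 20 steps, the smallest admissible step is 9.5 / 20 = 0.475, so the next candidate in the series, 0.5, is chosen.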

Example result:

valueHistogram:
  start: 2.5
  end: 12
  range: 9.5
  step: 0.5
  numOfSteps: 19
  intervals: [36, 25, 14, 81, 95, 86, 59, 6, 82, 84, 62, 33, 19, 9, 1, 14, 67, 2, 45]
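
Given start, step and numOfSteps, each value falls into one interval. The sketch below shows one way to fill the intervals array. The bin function is a hypothetical name, and the handling of boundary values is an assumption: values outside the range are ignored, and a value equal to the upper bound is counted in the last interval.

```go
package main

import (
	"fmt"
	"math"
)

// bin counts values into numSteps intervals of width step, beginning at start.
// Assumptions of this sketch: out-of-range values are skipped and a value
// equal to the upper bound lands in the last interval.
func bin(values []float64, start, step float64, numSteps int) []int {
	if step <= 0 || numSteps <= 0 {
		return nil
	}
	end := start + step*float64(numSteps)
	intervals := make([]int, numSteps)
	for _, v := range values {
		if v < start || v > end {
			continue
		}
		i := int(math.Floor((v - start) / step))
		if i >= numSteps {
			i = numSteps - 1 // v is at the upper bound
		}
		intervals[i]++
	}
	return intervals
}

func main() {
	values := []float64{2.5, 2.9, 3.0, 11.9, 12, 13, 1}
	fmt.Println(bin(values, 2.5, 0.5, 19))
}
```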

Length histogram

Use the flag --length-hist or -L to generate a length histogram.

The flag --length-hist-steps sets the maximum number of steps (default 100).

Supported types: string, array, object

Example result:

lengthHistogram:
  start: 0
  end: 300
  range: 300
  step: 50
  numOfSteps: 6
  intervals: [96, 78, 3, 1, 1, 0]

Weekday histogram

Use the flag --weekday-hist or -W to generate a weekday histogram.

The day of the week is determined using the time zone from the --timezone flag (default local).

The first value is for Sunday.

Example result:

weekdayHistogram: [5, 48, 23, 124, 45, 15, 87]

Hour histogram

Use the flag --hour-hist or -H to generate an hour histogram.

The hour is determined using the time zone from the --timezone flag (default local).

First value is for interval [ 00, 01 ), last for interval [ 23, 24 ).

Example result:

hourHistogram: [47, 73, 18, 26, 30, 46, 91, 13, 28, 11, 52, 99, 76, 25, 94, 51, 87, 86, 19, 22, 11, 62, 28, 47]
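
Both the weekday and the hour histogram depend on converting each date to the chosen time zone before counting. The sketch below illustrates this with Go's time package. The dateHistograms function is a hypothetical name, dates are passed as Unix seconds for simplicity, and treating the value "local" as the system time zone is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// dateHistograms counts timestamps per weekday (index 0 = Sunday) and per
// hour (index 0 = interval [00, 01)) after converting them to the zone tz.
func dateHistograms(unixSeconds []int64, tz string) (weekday [7]int, hour [24]int, err error) {
	loc := time.Local // assumption: "local" means the system time zone
	if tz != "local" {
		if loc, err = time.LoadLocation(tz); err != nil {
			return weekday, hour, err
		}
	}
	for _, s := range unixSeconds {
		t := time.Unix(s, 0).In(loc)
		weekday[t.Weekday()]++
		hour[t.Hour()]++
	}
	return weekday, hour, nil
}

func main() {
	// 1495927800 is 2017-05-27 23:30:00 UTC, a Saturday.
	wd, hr, err := dateHistograms([]int64{1495927800}, "UTC")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(wd, hr)
}
```

The same instant can fall on a different weekday and hour in another zone, which is why the --timezone setting affects both histograms.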

Scope of analysis

The scope of analysis is defined by the following options.

The --match option is applied first:

  • selects the documents for analysis using the $match aggregation stage
  • the value is a string in JSON format
  • suitable for including documents in, or excluding them from, the analysis
  • by default (if the option is not present), all documents are included

The --sample option is applied second:

  • determines the sampling method using the $sort, $limit and $sample aggregation stages
  • valid values are: all, first:N, last:N, random:N, where N > 1
  • the default value is random:1000

The --project option is applied third:

  • modifies each document before the analysis using the $project aggregation stage
  • the value is a string in JSON format
  • suitable for including fields in, or excluding them from, the analysis
  • by default (if the option is not present), no projection is applied

Note: Be sure to escape JSON options correctly, e.g. --project "{\"Field\": 0}".
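
The order of the three options can be pictured as an aggregation pipeline built stage by stage. The sketch below uses only the standard library and prints the pipeline as JSON. All names in it are hypothetical, and the concrete stages chosen for first:N and last:N (a sort on _id followed by a limit) are an assumption of this sketch, not a description of what Mongoeye sends to the database.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Stage is one aggregation stage, e.g. {"$match": {...}}.
type Stage map[string]interface{}

// sampleStages translates a --sample value into stages (assumed mapping).
func sampleStages(sample string) ([]Stage, error) {
	if sample == "all" {
		return nil, nil
	}
	parts := strings.SplitN(sample, ":", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid sample %q", sample)
	}
	n, err := strconv.Atoi(parts[1])
	if err != nil || n <= 1 { // the text above requires N > 1
		return nil, fmt.Errorf("invalid sample size in %q", sample)
	}
	switch parts[0] {
	case "first":
		return []Stage{{"$sort": Stage{"_id": 1}}, {"$limit": n}}, nil
	case "last":
		return []Stage{{"$sort": Stage{"_id": -1}}, {"$limit": n}}, nil
	case "random":
		return []Stage{{"$sample": Stage{"size": n}}}, nil
	}
	return nil, fmt.Errorf("unknown sample method %q", parts[0])
}

// jsonStage parses a JSON option value and wraps it in the operator op.
func jsonStage(op, value string) (Stage, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(value), &doc); err != nil {
		return nil, fmt.Errorf("invalid JSON for %s: %v", op, err)
	}
	return Stage{op: doc}, nil
}

// buildPipeline applies the options in the documented order:
// match first, sample second, project third.
func buildPipeline(match, sample, project string) ([]Stage, error) {
	var pipeline []Stage
	if match != "" {
		s, err := jsonStage("$match", match)
		if err != nil {
			return nil, err
		}
		pipeline = append(pipeline, s)
	}
	ss, err := sampleStages(sample)
	if err != nil {
		return nil, err
	}
	pipeline = append(pipeline, ss...)
	if project != "" {
		s, err := jsonStage("$project", project)
		if err != nil {
			return nil, err
		}
		pipeline = append(pipeline, s)
	}
	return pipeline, nil
}

func main() {
	p, err := buildPipeline(`{"rating": {"$gt": 3}}`, "random:1000", `{"URL": 0}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.Marshal(p)
	fmt.Println(string(out))
}
```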

List of flags and options

Connection options

--host                    mongodb host (default "localhost:27017")
--connection-mode         connection mode (default "SecondaryPreferred")
--connection-timeout      connection timeout (default 5)
--socket-timeout          socket timeout (default 300)
--sync-timeout            sync timeout (default 300)

Authentication

-u, --user                username for authentication (default "admin")
-p, --password            password for authentication
    --auth-db             auth database (default: same as the working db)
    --auth-mech           auth mechanism

Input options

    --db                  database for analysis
    --col                 collection for analysis
    --match               filter documents before analysis (json, $match aggregation)
-s, --sample              all, first:N, last:N, random:N (default "random:1000")
    --project             filter/project fields before analysis (json, $project aggregation)
-d, --depth               max depth in nested documents (default 2)

Output options

    --full                all available analyzes
-v, --value               get min, max, avg value
-l, --length              get min, max, avg length
-V, --value-hist          get value histogram
    --value-hist-steps    max steps of value histogram >=3 (default 100)
-L, --length-hist         get length histogram
    --length-hist-steps   max steps of length histogram >=3 (default 100)
-W, --weekday-hist        get weekday histogram for dates
-H, --hour-hist           get hour histogram for dates
    --count-unique        get count of unique values
    --most-freq           get the N most frequent values
    --least-freq          get the N least frequent values
-f, --format              output format: table, json, yaml (default "table")
-F, --file                path to the output file

Other options

-t, --timezone            timezone, eg. UTC, Europe/Berlin (default "local")
    --use-aggregation     analyze with aggregation framework (mongodb 3.5.10+)
    --string-max-length   max string length (default 100)
    --array-max-length    analyze only first N array elements (default 20)
    --concurrency         number of local processes (default 0 = auto)
    --buffer              size of the buffer between local stages (default 5000)
    --batch               size of batch from database (default 500)
    --no-color            disable color output
    --version             show version
-h, --help                show this help

Environment variables

Environment variables can also be used for configuration.

The names of the environment variables have the MONGOEYE_ prefix and match the flags.

Instead of the --count-unique flag, for example, you can use export MONGOEYE_COUNT-UNIQUE=true.
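
The naming rule can be expressed as a small helper. This sketch only follows the single example given above (prefix plus the upper-cased flag name). The envName function is a hypothetical name, and the mapping for other flags is inferred from that example.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix is the prefix stated in the text above.
const envPrefix = "MONGOEYE_"

// envName derives the environment variable name for a long flag by
// stripping the leading dashes and upper-casing the rest.
func envName(flag string) string {
	return envPrefix + strings.ToUpper(strings.TrimLeft(flag, "-"))
}

func main() {
	fmt.Println(envName("--count-unique"))
}
```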

TODO

  • Create a shared library for integration into other languages (Python, Node.js, ...)
  • Selection of fields for analysis (include and exclude list)
  • TLS/SSL support
  • Create a web interface.

Donation

If this tool is useful to you, feel free to support its further development via PayPal.

License

Mongoeye is under the GPL-3.0 license. See the LICENSE file for details.

AMDG