## Creating and Importing a small dataset

This notebook provides step-by-step instructions to generate test data and import it into the Nuxeo instance deployed via the Nuxeo stack.

### Step 0 - Set up the connection with Nuxeo

In [2]:
%%bash
cat <<EOT > nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=$NXUSER
password=$NXPWD
EOT
#cat nuxeo.properties

In [3]:
!echo "127.0.0.1 nuxeo.docker.localhost" >> /etc/hosts

### Step 1 - Create the hierarchy

This hierarchy is simply the list of US states split across 2 repositories.

#### Produce messages

We use the `-m` option to split the data between 2 repositories.

In [56]:
!import.sh -o consumertree -l import/hierarchy -m 

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runConsumerFolderProducers
   nbThreads: 10 
   split: true 
   logName: import/hierarchy 
   logSize: 8 
#####################
Execution completed
elapsed:0.062
failures:0
messages:52
throughput:838.7096774193549
producers:1



#### Consume messages

Because the hierarchy is very small, we can run the import in synchronous mode and keep normal event processing (no bulk mode).

In [57]:
!import.sh -o import -l import/hierarchy-us-east -r us-east -b /
!import.sh -o import -l import/hierarchy-us-west -r us-west -b /

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 10 
   logName: import/hierarchy-us-east 
   blockDefaultSyncListeners: true 
   rootFolder: / 
   logSize: 8 
   batchSize: 500 
#####################
Execution completed
elapsed:60.041
committed:26
failures:0
consumers:8
throughput:0.43303742442664184

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 10 
   logName: import/hierarchy-us-west 
   blockDefaultSyncListeners: true 
   rootFolder: / 
   logSize: 8 
   batchSize: 500 
#####################
Execution completed
elapsed:60.078
committed:26
failures:0
consumers:8
throughput:0.432770731382536



Each import should display:

    committed:26
    failures:0
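
The check above can also be done programmatically. The following is a minimal sketch, assuming the captured output of `import.sh` is available as a string and that the summary is printed as `key:value` lines, as in the runs shown here. The helper names `parse_stats` and `import_ok` are hypothetical and not part of the import tooling.

```python
def parse_stats(output: str) -> dict:
    """Collect the numeric `key:value` lines printed by an import run."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no colon on this line
        try:
            stats[key] = float(value)
        except ValueError:
            pass  # non-numeric values such as URLs or `true`
    return stats


def import_ok(stats: dict, expected_committed: int) -> bool:
    """True when every expected document was committed without failure."""
    return stats.get("committed") == expected_committed and stats.get("failures") == 0


# Summary lines copied from the us-east run above.
sample = """Execution completed
elapsed:60.041
committed:26
failures:0
consumers:8
throughput:0.43303742442664184
"""
stats = parse_stats(sample)
print(import_ok(stats, 26))
```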

### Step 2 - Generate some customers

We will generate 1000 customers. For that, we use the datagen library to produce:

 - a CSV file with 1000 lines
 - 1000 JPEG files

In this example we use the `fileDigestTree` output so that the generated files are stored under their digest in a 2-level directory hierarchy that matches the default BinaryManager behavior.

NB: pdfbox currently emits a warning message that we have not yet been able to remove.

In [58]:
!rm metadata.csv
!mkdir -p idcards
!rm idcards/*
!gen.sh -t 4 -m pdf -d 1 -n 1000 -f jpeg  -x id -o fileDigestTree:../blobs/
# remove the first line that contains column names!    
!sed '1d' metadata.csv > idcards.csv 

rm: cannot remove 'metadata.csv': No such file or directory
rm: cannot remove 'idcards/*': No such file or directory
2020-08-31 16:44:01,071 main ERROR File contains an invalid element or attribute "filePattern"
2020-08-31 16:44:01,076 main ERROR appender File has no parameter that matches element Policies
Selected Generation mode:PDF
Output Driver:FolderDigestTreeWriter
Activated filter: jpeg (300 dpi)
Model = id
Init Injector
  Threads:4
  nbDocs:1000
  nbMonths:1
----------------------------------------------------------
Starting thread 0
Starting thread 1
Starting thread 2
Starting thread 3
00 % - 0 / 1000
   Throughput:0 d/s using 1 threads
[... repeated org.apache.pdfbox font-loading log lines omitted ...]
04 % - 41 / 1000
   Throughput:14 d/s using 4 threads
   Projected remaining time: 1 m,8 s
16 % - 164 / 1000
   Throughput:16 d/s using 4 threads
   Projected remaining time: 52 s
22 % - 220 / 1000
   Throughput:17 d/s using 4 threads
   Projected remaining time: 45 s
41 % - 414 / 1000
   Throughput:18 d/s using 4 threads
   Projected remaining time: 32 s
47 % - 471 / 1000
   Throughput:18 d/s using 4 threads
   Projected remaining time: 29 s
53 % - 527 / 1000
   Throughput:18 d/s using 4 threads
   Projected remaining time: 26 s
59 % - 590 / 1000
   Throughput:18 d/s using 4 threads
   Projected remaining time: 22 s
66 % - 663 / 1000
   Throughput:18 d/s
72 % - 721 / 1000
   Throughput:18 d/s using 4 threads
   Projected remaining time: 15 s
78 % - 780 / 1000
   Throughput:19 d/s using 4 threads
   Projected remaining time: 11 s
84 % - 838 / 1000
   Throughput:19 d/s using 4 threads
   Projected remaining time: 8 s
90 % - 897 / 1000
   Throughput:19 d/s using 4 threads
   Projected remaining time: 5 s
[... output truncated ...]

In [62]:
!wc -l idcards.csv
!head idcards.csv

1004 idcards.csv
61b1012da934624d40e6988cc2679502,IDCard-0E570-08A8E-53AE1E6-01.jpg,42635,Carolyn CATONE,CESAR CHAVEZ ON RAMP,Randalia,Iowa,Jan 01 2020,0E570-08A8E-53AE1E6-01,IOWA  ID,Jun 01 1975,null,Jun 01 2020
ae35abe2d40cb250ded587051b3ae3f6,IDCard-0D377-0F93A-3A16029-01.jpg,42775,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-01,NEW YORK  ID,Jun 01 1975,null,Feb 01 2022
d4ac0806e25d5d6001bef8d05331c9b1,IDCard-0D377-0F93A-3A16029-02.jpg,42801,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-02,NEW YORK  ID,Jul 01 1993,null,May 01 2020
fb2ea37e3bf5c984f1cc4aac92c710bd,IDCard-0D377-0F93A-3A16029-03.jpg,42702,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-03,NEW YORK  ID,Jan 01 1970,null,Jun 01 2021
381da38b44127f3571015e30d3bd84e1,IDCard-13A20-0E86C-69A4770-01.jpg,42289,Dolton TOYNE,PRIEST ST,Walnut Grove,Alabama,Jan 01 2020,13A20-0E86C-69A4770-01,ALABAMA  ID,Jan 01 1992,null,Oct 01 2019
fa5344

In [63]:
!ls ../blobs/

00  0e	1b  28	36  43	51  5e	6b  7a	87  94	a1  ae	bb  c8	d7  e4	f3
01  0f	1c  29	37  45	52  5f	6c  7b	88  95	a2  af	bc  c9	d8  e6	f4
02  10	1d  2a	38  46	53  60	6d  7c	89  96	a3  b0	bd  ca	d9  e7	f5
03  11	1e  2b	39  47	54  61	6e  7d	8a  97	a4  b1	be  cb	da  e9	f6
04  12	1f  2c	3a  48	55  62	6f  7e	8b  98	a5  b2	bf  cc	db  ea	f7
05  13	20  2d	3b  49	56  63	70  7f	8c  99	a6  b3	c0  cd	dc  eb	f8
06  14	21  2e	3c  4a	57  64	71  80	8d  9a	a7  b4	c1  cf	dd  ec	f9
07  15	22  2f	3d  4b	58  65	73  81	8e  9b	a8  b5	c2  d0	de  ed	fa
09  16	23  30	3e  4c	59  66	74  82	8f  9c	a9  b6	c3  d1	df  ee	fb
0a  17	24  31	3f  4d	5a  67	75  83	90  9d	aa  b7	c4  d2	e0  ef	fc
0b  18	25  32	40  4e	5b  68	76  84	91  9e	ab  b8	c5  d3	e1  f0	fd
0c  19	26  33	41  4f	5c  69	78  85	92  9f	ac  b9	c6  d4	e2  f1	fe
0d  1a	27  35	42  50	5d  6a	79  86	93  a0	ad  ba	c7  d5	e3  f2
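
The first-level directories listed above are two hex characters each and match the leading characters of the digests in `idcards.csv` (for example `61`, `ae`, `d4`). A minimal sketch of how a blob path could be derived from its digest follows. It assumes that the second level uses the next two hex characters, which is not shown in the listing; the function name `digest_tree_path` is hypothetical.

```python
from pathlib import PurePosixPath


def digest_tree_path(root: str, digest: str, depth: int = 2) -> PurePosixPath:
    """Build root/xx/yy/<digest>, where xx and yy are the leading hex pairs.

    Assumption: each directory level consumes two hex characters of the digest.
    """
    levels = [digest[2 * i: 2 * i + 2] for i in range(depth)]
    return PurePosixPath(root, *levels, digest)


# Digest taken from the first line of idcards.csv.
print(digest_tree_path("../blobs", "61b1012da934624d40e6988cc2679502"))
```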


### Step 3 - Produce the customer messages

We upload the CSV file in chunks so that the server produces the messages in Kafka.

In [4]:
!csvImport.sh -t 8 -p 4 -o ConsumerProducers -serverThreads 2 -b 100 -f idcards.csv -m -l import/customers4

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
Connected to Nuxeo Server 11.3-SNAPSHOT
Started ThreadPool with 8 threads
Send batch 1

Send batch 2

Send batch 3

Send batch 4

Send batch 5

Send batch 6

Send batch 7

Send batch 8

Send batch 9

Send batch 10

   Throughput:15688 lines/s using 8 threads
 Queue size: 3
 Tasks count: 11
 ############################## 
 CSV stats:
    Throughput:15688 lines/s
    Total lines:1004
 Producers stats:
    total producers:22
    total messages:2008
    total failures:0
    Throughput:2485
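
To illustrate the chunked upload, here is a minimal sketch of how a file's lines can be grouped into fixed-size batches. It assumes that `-b 100` is the batch size in lines, which the output suggests but does not confirm; the `batches` helper is hypothetical and is not the code used by `csvImport.sh`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 1004 placeholder lines, the same count as idcards.csv.
lines = [f"row-{i}" for i in range(1004)]
sizes = [len(b) for b in batches(lines, 100)]
print(len(sizes), sizes[-1])  # 11 batches, the last one holding 4 lines
```

Under that assumption, 1004 lines give 11 batches, which is consistent with the `Tasks count: 11` reported above.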


### Step 4 - Move the blobs

Copy the generated blobs to the target blob store.
For the Nuxeo Stack configuration, the target appears to be `/data/nuxeo-binaries/data`.
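
This step has no cell in the notebook. A minimal sketch of the copy, assuming the notebook process can write to the target directory and that Python 3.8+ is available (for `dirs_exist_ok`), could look like this. The `copy_blobs` helper is hypothetical, and the demo runs against a temporary directory rather than the real blob store.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union


def copy_blobs(src: Union[str, Path], dst: Union[str, Path]) -> int:
    """Merge the digest tree under src into dst; return the file count under dst."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return sum(1 for p in Path(dst).rglob("*") if p.is_file())


# Demo on a throwaway tree. For the real run the call would be along the lines of
# copy_blobs("../blobs", "/data/nuxeo-binaries/data"), paths taken from this notebook.
with tempfile.TemporaryDirectory() as tmp:
    leaf = Path(tmp, "blobs", "61", "b1")
    leaf.mkdir(parents=True)
    (leaf / "61b1012da934624d40e6988cc2679502").write_bytes(b"demo")
    copied = copy_blobs(Path(tmp, "blobs"), Path(tmp, "target"))
print(copied)
```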

### Step 5 - Consume the customer messages

In [5]:
!import.sh -o import -t 16 -l import/customers4-us-east -r us-east -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/customers4-us-east 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/9c0e180b-6d60-4a81-837f-49dd508fc716/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:60.026
committed:984
failures:0
consumers:4
throughput:16.392896411554993

Exit after 120 s


In [6]:
!import.sh -o import -t 16 -l import/customers4-us-west -r us-west -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/customers4-us-west 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/d780aabd-59b4-4874-86d8-bba82bc094fd/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:80.022
committed:1024
failures:0
consumers:4
throughput:12.796480967733872

Exit after 120 s
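
The reported `throughput` values appear to be `committed / elapsed`. The sketch below recomputes them from the two runs above. Note that every `elapsed` value in this notebook is close to 60 or 80 seconds, even for the 26-document hierarchy, so the figure may reflect a consumer wait time more than actual import speed; that interpretation is an assumption.

```python
import math


def throughput(committed: int, elapsed_s: float) -> float:
    """Documents committed per second of elapsed time."""
    return committed / elapsed_s


east = throughput(984, 60.026)   # us-east run
west = throughput(1024, 80.022)  # us-west run
print(east, west)
print(math.isclose(east, 16.392896411554993, rel_tol=1e-6))
```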


In [7]:
# each line generates 2 messages  
# each message generates 1 document
(984+1024)==2008==1004*2

True

### Step 6 - Index the 2 repositories

In [8]:
# start indexing via the Bulk Action Framework (BAF) on repository us-east
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"6057b8cb-c073-466c-96c5-9252aee168cc"}

In [9]:
# check status
! cid="6057b8cb-c073-466c-96c5-9252aee168cc"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"6057b8cb-c073-466c-96c5-9252aee168cc","state":"COMPLETED","processed":1026,"error":false,"errorCount":0,"total":1026,"action":"index","username":"Administrator","submitted":"2020-08-31T17:19:58.538Z","scrollStart":"2020-08-31T17:19:58.673Z","scrollEnd":"2020-08-31T17:19:58.695Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:20:07.706Z","processingMillis":0}

In [10]:
# start indexing via BAF on repository us-west
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"e06f1270-cc00-40bd-aff5-5aaa9cfbb92b"}

In [11]:
# check status
! cid="e06f1270-cc00-40bd-aff5-5aaa9cfbb92b"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"e06f1270-cc00-40bd-aff5-5aaa9cfbb92b","state":"COMPLETED","processed":1066,"error":false,"errorCount":0,"total":1066,"action":"index","username":"Administrator","submitted":"2020-08-31T17:24:44.742Z","scrollStart":"2020-08-31T17:24:44.763Z","scrollEnd":"2020-08-31T17:24:44.774Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:24:49.366Z","processingMillis":0}
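
Instead of re-running the status cell by hand, the status can be polled until the command completes. This is a minimal sketch: `fetch` is a hypothetical callable that returns the JSON body of the `/api/v1/bulk/<commandId>` request shown above, and only the `COMPLETED` state seen in this notebook is treated as terminal. The `RUNNING` response in the demo is an assumed intermediate state, used for illustration only.

```python
import json
import time
from typing import Callable, Tuple


def parse_bulk_status(body: str) -> dict:
    """Keep the fields needed to decide whether a bulk command is done."""
    status = json.loads(body)
    return {k: status.get(k) for k in ("state", "processed", "total", "errorCount")}


def wait_for_completion(fetch: Callable[[], str],
                        interval_s: float = 2.0,
                        max_polls: int = 300,
                        terminal: Tuple[str, ...] = ("COMPLETED",)) -> dict:
    """Poll `fetch` until the state is terminal or max_polls is exhausted."""
    for _ in range(max_polls):
        status = parse_bulk_status(fetch())
        if status["state"] in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError("bulk command did not reach a terminal state")


# Abbreviated copy of the us-east response above, preceded by a made-up
# intermediate response.
final = ('{"entity-type":"bulkStatus","state":"COMPLETED",'
         '"processed":1026,"errorCount":0,"total":1026}')
responses = iter(['{"state":"RUNNING","processed":500,"total":1026,"errorCount":0}', final])
result = wait_for_completion(lambda: next(responses), interval_s=0)
print(result)
```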

### Step 7 - Generate Accounts 

In [12]:
!rm metadata.csv
!mkdir -p accounts
!rm accounts/*
!gen.sh -t 4 -m pdf -d 1 -n 2500 -x letter -o fileDigestTree:../blobs/
# remove the first line that contains column names!    
!sed '1d' metadata.csv > accounts.csv 

rm: cannot remove 'accounts/*': No such file or directory
2020-08-31 17:25:50,628 main ERROR File contains an invalid element or attribute "filePattern"
2020-08-31 17:25:50,633 main ERROR appender File has no parameter that matches element Policies
Selected Generation mode:PDF
Output Driver:FolderDigestTreeWriter
Model = letter
Init Injector
  Threads:4
  nbDocs:2500
  nbMonths:1
----------------------------------------------------------
Starting thread 0
Starting thread 1
Starting thread 2
Starting thread 3
00 % - 0 / 2500
   Throughput:0 d/s using 2 threads
90 % - 2260 / 2500
   Throughput:1130 d/s using 4 threads
   Projected remaining time: 0 s
Flushing ...
Flush done.
----------------------------------------------------------
2504 files generated.
  Execution time: 2 s
  Average throughput:1252 docs/s
#### Projected generation time:
     - for 100M files: 0 day(s) and 22 hour(s)]
     - for 1B files: 9 day(s) and 5 hour(s)]
     - for 10B files: 92 day(s) and 10 hour(s)]


In [15]:
!head idcards.csv

61b1012da934624d40e6988cc2679502,IDCard-0E570-08A8E-53AE1E6-01.jpg,42635,Carolyn CATONE,CESAR CHAVEZ ON RAMP,Randalia,Iowa,Jan 01 2020,0E570-08A8E-53AE1E6-01,IOWA  ID,Jun 01 1975,null,Jun 01 2020
ae35abe2d40cb250ded587051b3ae3f6,IDCard-0D377-0F93A-3A16029-01.jpg,42775,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-01,NEW YORK  ID,Jun 01 1975,null,Feb 01 2022
d4ac0806e25d5d6001bef8d05331c9b1,IDCard-0D377-0F93A-3A16029-02.jpg,42801,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-02,NEW YORK  ID,Jul 01 1993,null,May 01 2020
fb2ea37e3bf5c984f1cc4aac92c710bd,IDCard-0D377-0F93A-3A16029-03.jpg,42702,Olamipo TUONG,15TH ST,Livonia Center,New York,Jan 01 2020,0D377-0F93A-3A16029-03,NEW YORK  ID,Jan 01 1970,null,Jun 01 2021
381da38b44127f3571015e30d3bd84e1,IDCard-13A20-0E86C-69A4770-01.jpg,42289,Dolton TOYNE,PRIEST ST,Walnut Grove,Alabama,Jan 01 2020,13A20-0E86C-69A4770-01,ALABAMA  ID,Jan 01 1992,null,Oct 01 2019
fa5344537a3ec64d7b

In [14]:
# for some reason, the 2nd customer is generated with 3 accounts ...
!sed '2,3d' idcards.csv > idcards-nodup.csv 
# Align accounts with customers
! wc -l accounts.csv
! wc -l idcards-nodup.csv


2504 accounts.csv
1002 idcards-nodup.csv
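
Rather than spotting such repeated customers by eye, they can be counted. The sketch below assumes that the ID column (for example `0D377-0F93A-3A16029-01`) ends with a per-customer sequence number, so that stripping the last `-NN` segment gives a customer key; this reading of the format is an assumption, and `repeated_customers` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def repeated_customers(ids: Iterable[str]) -> Dict[str, int]:
    """Count rows per customer key and keep the keys seen more than once."""
    counts = Counter(i.rsplit("-", 1)[0] for i in ids)
    return {key: n for key, n in counts.items() if n > 1}


# IDs taken from the first five lines of idcards.csv.
ids = [
    "0E570-08A8E-53AE1E6-01",
    "0D377-0F93A-3A16029-01",
    "0D377-0F93A-3A16029-02",
    "0D377-0F93A-3A16029-03",
    "13A20-0E86C-69A4770-01",
]
print(repeated_customers(ids))
```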


In [16]:
# this will check that for each account we have a customer and remove the orphaned accounts
!csvcheck.sh -i idcards-nodup.csv -a accounts.csv 

Reading the ids
Procesing /nxbench/notebooks/idcards-nodup.csv
collected 1002 keys
Procesing /nxbench/notebooks/accounts.csv

line processed: 2504 # accounts missing:483


In [17]:
!wc -l verified-accounts.csv
!wc -l accounts-withoutIDCard.csv
! rm accounts-withoutIDCard.csv
! rm idcards-nodup.csv

2021 verified-accounts.csv
483 accounts-withoutIDCard.csv
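
The counts are consistent: 2504 accounts minus 483 orphans leaves 2021 verified accounts. For readers without `csvcheck.sh`, the following minimal sketch shows the same kind of split. It is not the script's implementation: the column positions of the join key are parameters because the layout of `accounts.csv` is not shown here, and the sample rows are invented for illustration.

```python
import csv
import io
from typing import Iterable, List, Sequence, Tuple

Row = Sequence[str]


def split_accounts(idcard_rows: Iterable[Row], account_rows: Iterable[Row],
                   id_col: int, account_key_col: int) -> Tuple[List[Row], List[Row]]:
    """Separate accounts whose key matches a known customer ID from orphans."""
    known = {row[id_col] for row in idcard_rows}
    verified, orphaned = [], []
    for row in account_rows:
        (verified if row[account_key_col] in known else orphaned).append(row)
    return verified, orphaned


# Invented sample data; the real files have more columns.
idcards_csv = "d1,IDCard-A-01.jpg,A-01\nd2,IDCard-B-01.jpg,B-01\n"
accounts_csv = "x1,Letter-1.pdf,A-01\nx2,Letter-2.pdf,C-01\nx3,Letter-3.pdf,B-01\n"

verified, orphaned = split_accounts(
    csv.reader(io.StringIO(idcards_csv)),
    csv.reader(io.StringIO(accounts_csv)),
    id_col=2,
    account_key_col=2,
)
print(len(verified), len(orphaned))
print(2504 - 483 == 2021)
```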


### Step 8 - Produce the account messages

In [18]:
!csvImport.sh -t 8 -p 4 -o AccountProducers -serverThreads 2 -b 100 -f verified-accounts.csv -m -l import/accounts

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
Connected to Nuxeo Server 11.3-SNAPSHOT
Started ThreadPool with 8 threads
Send batch 1

Send batch 2

Send batch 3

Send batch 4

Send batch 5

Send batch 6

Send batch 7

Send batch 8

Send batch 9

Send batch 10

Send batch 11

Send batch 12

Send batch 13

Send batch 14

Send batch 15

Send batch 16

Send batch 17

Send batch 18

Send batch 19

Send batch 20

   Throughput:25263 lines/s using 8 threads
 Queue size: 13
 Tasks count: 21
 ############################## 
 CSV stats:
    Throughput:25263 lines/s
    Total lines:2021
 Producers stats:
    total producers:42
    total messages:4042
    total failures:0
    Throughput:4091


In [19]:
# each line will generate 2 messages
(2021*2)==4042

True

### Step 9 - Copy / Move the blobs

Copy the generated blobs to the target blob store.
For the Nuxeo Stack configuration, the target appears to be `/data/nuxeo-binaries/data`.

### Step 10 - Consume the Accounts messages

In [20]:
!import.sh -o import -t 16 -l import/accounts-us-east -r us-east -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/accounts-us-east 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/5a3d2f96-ee31-4fcc-ae53-390ec6d1c6e5/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:60.058
committed:2022
failures:0
consumers:4
throughput:33.66745479369943

Exit after 120 s


In [21]:
!import.sh -o import -t 16 -l import/accounts-us-west -r us-west -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/accounts-us-west 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/c0824792-3959-4abf-9656-5fa0817baed6/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:60.034
committed:2020
failures:0
consumers:4
throughput:33.64759969350701

Exit after 120 s


In [22]:
committed = 2020 + 2022
csvlines = 2021
produced = 4042
print(committed == 2 * csvlines)
print(committed == produced)

True
True
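
The numbers in this check were copied by hand from the two consumer runs. A small helper can read them from the captured output instead. The sketch below assumes the summary is printed as bare `key:value` lines, as in the outputs above; the function name `parse_stats` is hypothetical.

```python
import re


def parse_stats(output):
    """Return the numeric `key:value` summary lines of a run as a dict."""
    stats = {}
    for line in output.splitlines():
        # Only match lines such as "committed:2022" or "elapsed:60.058".
        match = re.fullmatch(r"(\w+):(-?\d+(?:\.\d+)?)", line.strip())
        if match:
            key, value = match.groups()
            stats[key] = float(value) if "." in value else int(value)
    return stats


east = parse_stats("Running completed\nelapsed:60.058\ncommitted:2022\nfailures:0\n")
west = parse_stats("Running completed\nelapsed:60.034\ncommitted:2020\nfailures:0\n")
print(east["committed"] + west["committed"] == 2 * 2021)
```

Lines such as `url=http://...` or `login=Administrator` do not match the pattern and are ignored.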


### Step 11 - Index

In [23]:
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"9cdd2484-aa73-4bc5-9156-68260aefb6c9"}

In [24]:
# check status
! cid="9cdd2484-aa73-4bc5-9156-68260aefb6c9"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"9cdd2484-aa73-4bc5-9156-68260aefb6c9","state":"COMPLETED","processed":3048,"error":false,"errorCount":0,"total":3048,"action":"index","username":"Administrator","submitted":"2020-08-31T17:44:24.490Z","scrollStart":"2020-08-31T17:44:24.520Z","scrollEnd":"2020-08-31T17:44:24.546Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:44:26.074Z","processingMillis":0}

In [25]:
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"89f23236-863f-43a1-997c-7580815528d6"}

In [26]:
# check status
! cid="89f23236-863f-43a1-997c-7580815528d6"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"89f23236-863f-43a1-997c-7580815528d6","state":"COMPLETED","processed":3086,"error":false,"errorCount":0,"total":3086,"action":"index","username":"Administrator","submitted":"2020-08-31T17:44:59.479Z","scrollStart":"2020-08-31T17:44:59.580Z","scrollEnd":"2020-08-31T17:44:59.625Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:45:05.970Z","processingMillis":0}
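
The status cells above have to be re-run by hand until `state` becomes `COMPLETED`. A minimal polling sketch is shown below. It uses the same `/api/v1/bulk/<commandId>` endpoint as the `curl` calls; the helper names, the polling interval and the timeout are assumptions, and only the `COMPLETED` state seen in the outputs above is treated as terminal.

```python
import base64
import json
import time
import urllib.request

BASE_URL = "http://nuxeo.docker.localhost/nuxeo"


def fetch_bulk_status(command_id, user, password, base_url=BASE_URL):
    """GET /api/v1/bulk/<commandId> and return the decoded JSON status."""
    request = urllib.request.Request(f"{base_url}/api/v1/bulk/{command_id}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_bulk(command_id, fetch, poll_seconds=2.0, timeout_seconds=600.0):
    """Call fetch(command_id) until the state is COMPLETED or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch(command_id)
        if status.get("state") == "COMPLETED":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"bulk command {command_id} did not complete in time")
        time.sleep(poll_seconds)


# Example call against a running stack (credentials come from the environment):
# import os
# status = wait_for_bulk(
#     "89f23236-863f-43a1-997c-7580815528d6",
#     lambda cid: fetch_bulk_status(cid, os.environ["NXUSER"], os.environ["NXPWD"]),
# )
# print(status["processed"], status["total"])
```

Passing the fetch function as a parameter keeps the polling loop independent from the HTTP call, so it can be exercised without a server.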

### Step 12 - Produce Statements messages

Statement PDFs are generated on the fly by the custom BinaryManager, so unlike Customers and Accounts there is no need to pre-generate CSV files and binaries: we can produce the messages directly.

Since we have 2021 accounts, we need to generate:

 - 2021 * 6 statements to go in the east/west repositories
 - 2021 * 54 statements to go in the archives repository

In [44]:
2021*6

12126

The Statement generator always produces a number of messages that is a multiple of the number of threads, so we round the target down accordingly.
With 64 threads, the target number should be:


In [51]:
int(2021*6 / 64)*64

12096
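
The same rounding can be written with integer floor division, which avoids going through a float. The sketch below applies it to both targets listed above (6 months for the east/west repositories, 54 months for the archives); the function name `round_down_to_multiple` is only illustrative.

```python
def round_down_to_multiple(count, multiple):
    """Largest multiple of `multiple` that is less than or equal to `count`."""
    return (count // multiple) * multiple


accounts = 2021
threads = 64

recent_target = round_down_to_multiple(accounts * 6, threads)
archive_target = round_down_to_multiple(accounts * 54, threads)
print(recent_target, archive_target)  # 12096 109120
```

The difference with the exact totals (12126 and 109134) is smaller than the number of threads, so only a handful of statements are dropped by the rounding.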

In [27]:
!import.sh -o statements -l import/Statements -m  -d 6 -n 12096 -t 64 -p 16 -a -w 18000


Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runStatementProducers
   nbThreads: 64 
   split: true 
   logName: import/Statements 
   seed: 2020 
   nbDocuments: 12096 
   logSize: 16 
   skip: 0 
   nbMonths: 6 
   monthOffset: 0 
   storeInCustomerFolder: true 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runStatementProducers/@async/653e33c2-5758-4822-b7d6-945a223e4246/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured

Running completed
elapsed:1.352
failures:0
messages:12096
throughput:8946.745562130176
producers:64

Exit after 60 s


### Step 13 - Consume the Statements messages

In [28]:
!import.sh -o import -t 16 -l import/Statements-us-east -r us-east -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/Statements-us-east 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/cf99e23f-93f1-480f-b8b3-4854a24069ed/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:104.594
committed:6048
failures:0
consumers:16
throughput:57.82358452683711

Exit after 120 s


In [29]:
!import.sh -o import -t 16 -l import/Statements-us-west -r us-west -b / -a -w 3600 -bulk

Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/Statements-us-west 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/cbb2c9a8-5414-41e0-adc6-8401b242b7e1/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:100.05
committed:6048
failures:0
consumers:16
throughput:60.44977511244378

Exit after 120 s


In [30]:
# sanity check
(6048+6048)==12096

True

### Step 14 - Final index

In [31]:
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"e9bfe5eb-af5f-4861-bb65-b5f9901c7ff6"}

In [32]:
# check status
! cid="e9bfe5eb-af5f-4861-bb65-b5f9901c7ff6"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-east' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"e9bfe5eb-af5f-4861-bb65-b5f9901c7ff6","state":"COMPLETED","processed":9096,"error":false,"errorCount":0,"total":9096,"action":"index","username":"Administrator","submitted":"2020-08-31T17:54:35.483Z","scrollStart":"2020-08-31T17:54:35.550Z","scrollEnd":"2020-08-31T17:54:35.685Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:54:41.061Z","processingMillis":0}

In [33]:
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"00b82bed-8449-4d7a-bd6e-fd369b541707"}

In [34]:
# check status
! cid="00b82bed-8449-4d7a-bd6e-fd369b541707"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:us-west' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"00b82bed-8449-4d7a-bd6e-fd369b541707","state":"COMPLETED","processed":9134,"error":false,"errorCount":0,"total":9134,"action":"index","username":"Administrator","submitted":"2020-08-31T17:55:28.370Z","scrollStart":"2020-08-31T17:55:28.446Z","scrollEnd":"2020-08-31T17:55:28.525Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T17:55:39.783Z","processingMillis":0}

### Step 15 - Generate the archived Statements

We need to generate 2021 * 54 statements, covering 54 months and skipping the first 6 months that were already produced in Step 12.
The generation needs to happen without the multi-repository split, so the `-m` option is not used.

In [35]:
int(2021*54 / 64)*64

109120

In [4]:
!import.sh -o statements -l import/Statements_Archive -storeInRoot -monthOffset 6 -d 54 -n 109120 -t 64 -p 16 -a -w 18000


Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runStatementProducers
   nbThreads: 64 
   split: false 
   logName: import/Statements_Archive 
   seed: 2020 
   nbDocuments: 109120 
   logSize: 16 
   storeInRoot: true 
   skip: 0 
   nbMonths: 54 
   monthOffset: 6 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runStatementProducers/@async/b991f60d-3df6-4730-b71d-08fd7f3954d2/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured

Running completed
elapsed:11.992
failures:0
messages:109120
throughput:9099.399599733155
producers:64

Exit after 60 s


### Step 16 - Consume the Archived Statements

In [5]:
!import.sh -o import -t 16 -l import/Statements_Archive -r archives -b / -a -w 3600 -bulk


Using config /nxbench/notebooks/nuxeo.properties
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
 (async mode actiavted)
Connected to Nuxeo Server 11.3-SNAPSHOT
Running Operation:StreamImporter.runDocumentConsumersEx
   nbThreads: 16 
   blockPostCommitListeners: true 
   logName: import/Statements_Archive 
   blockDefaultSyncListeners: true 
   blockAsyncListeners: true 
   rootFolder: / 
   logSize: 8 
   blockIndexing: true 
   batchSize: 500 
Async Automation Execution Scheduled
  => status url:[http://nuxeo.docker.localhost/nuxeo/site/api/v1/automation/StreamImporter.runDocumentConsumersEx/@async/644d2896-34ac-44b8-8d82-e0da96ed9d24/status]
#####################
Execution completed

waiting for end of Async Exec
url=http://nuxeo.docker.localhost/nuxeo
login=Administrator
Nuxeo Client configured
.
Running completed
elapsed:119.148
committed:109120
failures:0
consumers:16
throughput:915.8357672810287

Exit after 120 s


### Step 17 - Index Archives

In [6]:
! curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:archives' -X POST -d '{"params":{},"context":{}}' -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/automation/Elasticsearch.BulkIndex

{"commandId":"de18cda5-9a7f-4f94-b578-37588ca3a063"}

In [9]:
# check status
! cid="de18cda5-9a7f-4f94-b578-37588ca3a063"; \
  curl -H 'Content-Type:application/json+nxrequest' -H 'X-NXRepository:archives' \
  -u $NXUSER:$NXPWD http://nuxeo.docker.localhost/nuxeo/api/v1/bulk/$cid


{"entity-type":"bulkStatus","commandId":"de18cda5-9a7f-4f94-b578-37588ca3a063","state":"COMPLETED","processed":109136,"error":false,"errorCount":0,"total":109136,"action":"index","username":"Administrator","submitted":"2020-08-31T18:54:36.176Z","scrollStart":"2020-08-31T18:54:36.195Z","scrollEnd":"2020-08-31T18:54:37.666Z","processingStart":null,"processingEnd":null,"completed":"2020-08-31T18:55:38.478Z","processingMillis":0}