
Gentics Mesh Helm Chart #1

Open
Jotschi opened this issue Oct 18, 2018 · 48 comments

Comments

@Jotschi
Contributor

Jotschi commented Oct 18, 2018

It would be useful if we could provide a Helm chart for Gentics Mesh in our public k8s Helm repo.

@Jotschi
Contributor Author

Jotschi commented Oct 18, 2018

@cschockaert Do you perhaps already use Helm charts for Gentics Mesh? How do you set up and maintain your deployment?

@cschockaert

cschockaert commented Oct 19, 2018

Yes, I created a Helm chart to get getmesh working in a cluster; I can share it with you if you want.
At the moment the chart is in our private repository, but I can push it to GitHub.

I don't really understand the question 'how do you set up and maintain your deployment'. Do you mean how I wrote the Helm chart, or how we manage the lifecycle of our apps using getmesh?

@Jotschi
Contributor Author

Jotschi commented Oct 19, 2018

@cschockaert It would be great if you could share the repo. We would like to provide a Helm chart for Mesh.
My question was about how you manage the lifecycle of Mesh itself. Do you just update the chart and scale down / up?

@cschockaert

@Jotschi I'm currently upgrading our chart to work with 0.27.0; I'll push it to GitHub after that.

@cschockaert

@Jotschi
Contributor Author

Jotschi commented Oct 23, 2018

@cschockaert Great! I'll take a look. Thanks

@cschockaert

cschockaert commented Nov 1, 2018

@Jotschi have you tested the chart?

I don't know what we are doing wrong, but when we run in multi-master mode and modify nodes in a mesh-ui instance through a reverse proxy that round-robins requests, we (very quickly) end up in a situation where the instances are no longer 'synchronized' with each other.

That's why we stick to 1 master and 1 mesh-ui instance for editing nodes, plus x slaves (replicas) used only for GET queries.

Are you sure cluster mode works properly, even under heavy load, with modifications / publishing of the same node on multiple instances at the same time?

@Jotschi
Contributor Author

Jotschi commented Nov 1, 2018

@cschockaert We have customers who run Gentics Mesh in clustered mode with a system in front that regularly updates lots of nodes. The UI, however, is not used much in that setup.

That said, I can only think of a few things:

  • The UI has an internal state (stored in the browser) which it updates only when it deems necessary. Could you check whether your "async" issue is still visible if you log in again via an anonymous tab? If the issue is no longer visible, the stored state was not updated. If the issue is still visible, it would be helpful if you could list the actions you took so that I can try to reproduce the issue.

  • Secondly (I don't think this is the case for you): do all instances use the same Elasticsearch instance? Are all instances clustered? The UI currently relies heavily on Elasticsearch. If you run multiple instances of Gentics Mesh and each instance uses a dedicated ES, you would also run into "sync" issues.

  • Is the sync issue persistent or does it disappear after a few seconds?
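To rule out the second point, one could compare the Elasticsearch URL configured in every Mesh pod. A minimal sketch, assuming the URL is passed as a literal `MESH_ELASTICSEARCH_URL` environment variable on a container named `getmesh` (both appear later in this thread) and that the pods carry a hypothetical `app=getmesh` label:

```shell
#!/bin/sh
# Hypothetical label selector; adjust to the labels your chart actually sets.
MESH_SELECTOR="${MESH_SELECTOR:-app=getmesh}"

# Print the MESH_ELASTICSEARCH_URL of the "getmesh" container in every matching pod,
# one per line. Only sees the variable if it is set as a literal value (not valueFrom).
mesh_es_urls() {
  kubectl get pods -l "$MESH_SELECTOR" -o jsonpath='{range .items[*]}{.spec.containers[?(@.name=="getmesh")].env[?(@.name=="MESH_ELASTICSEARCH_URL")].value}{"\n"}{end}'
}

# Succeed only if stdin contains exactly one distinct non-empty line.
all_same() {
  count=$(grep -v '^$' | sort -u | wc -l | tr -d ' ')
  [ "$count" -eq 1 ]
}
```

Usage: `mesh_es_urls | all_same && echo "all pods share one ES URL"`. Settings injected via `valueFrom` or a mounted mesh.yml will not show up in this check.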

I have not yet tested the chart; I hope to find some time soon. I would like to create a dedicated repo for the Helm chart under the Gentics GitHub umbrella. One thing I would like to change is the configuration management: I think it would be easier to use only environment variables instead of a ConfigMap for the config YAML file.
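A minimal sketch of the env-variables-only idea: generate a `KEY=VALUE` file that can be passed to `docker run --env-file` (or translated into a pod's `env:` list) instead of mounting mesh.yml from a ConfigMap. The variable names are the ones Mesh reports in the startup log later in this thread; the function name and values are illustrative assumptions, and a real cluster needs more settings (volumes, ports, keystore password).

```shell
#!/bin/sh
# Write a docker --env-file style file with the Mesh settings seen in this thread.
# Usage: write_mesh_env <file> <node-name> <elasticsearch-url>
write_mesh_env() {
  file=$1 node=$2 es_url=$3
  cat > "$file" <<EOF
MESH_CLUSTER_ENABLED=true
MESH_CLUSTER_NAME=default
MESH_NODE_NAME=$node
MESH_ELASTICSEARCH_URL=$es_url
MESH_ELASTICSEARCH_START_EMBEDDED=false
MESH_AUTH_KEYSTORE_PATH=/config/keystore.jceks
EOF
}
```

For example: `write_mesh_env mesh.env mesh-node-1 http://elasticsearch:9200`, followed by `docker run --env-file mesh.env gentics/mesh:0.32.0`.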

@cschockaert

cschockaert commented Nov 4, 2018

@Jotschi
Yes, we use a dedicated ES instance.

It may be related to the mesh-ui internal state. Perhaps when we save & publish, mesh-ui instance A saves the node but mesh-ui instance B publishes it (because XHR requests from mesh-ui are load-balanced across different master instances).

As far as I remember, I saw server-side error logs about index exceptions.

I'll try to reproduce the errors again and report them here if needed.

If the issue is only in mesh-ui and not in the Mesh core API, a simple solution would be to pin the browser to one instance, e.g. with a cookie.
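If the problem does turn out to be UI-only, the cookie-based pinning could be done at the ingress. A sketch assuming the ingress-nginx controller, whose `nginx.ingress.kubernetes.io/affinity=cookie` annotation enables sticky sessions; the ingress name, namespace and the cookie name `mesh-route` are placeholders. Setting `KUBECTL=echo` prints the command instead of running it.

```shell
#!/bin/sh
# Override with KUBECTL=echo to print the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Pin each browser to one Mesh instance using an ingress-nginx affinity cookie.
# Usage: enable_sticky_sessions <ingress-name> <namespace>
enable_sticky_sessions() {
  "$KUBECTL" annotate --overwrite ingress "$1" -n "$2" \
    nginx.ingress.kubernetes.io/affinity=cookie \
    nginx.ingress.kubernetes.io/session-cookie-name=mesh-route
}
```

Other ingress controllers or a plain reverse proxy have their own sticky-session settings; this only covers ingress-nginx.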

@zhou

zhou commented May 10, 2019

I got the Helm chart working for the non-cluster setup. Very cool.

However, I cannot get the cluster to work, and we need it for our production setup. I was able to set up the external (non-embedded) Elasticsearch, but when I start the master pod, it complains about a
java.lang.NullPointerException at
com.gentics.mesh.graphdb.OrientDBDatabase.startServer(OrientDBDatabase.java:469)

Then the DB and Mesh shut themselves down. The log trace is shown below. It seems the DistributedManager is null. Is the plugin (ODistributedAbstractPlugin?) missing from the configuration? @cprerovsky Could you please take a look? Thanks.

20:13:44.334 [amx-mesh-getmesh-master-0] INFO  [main] - Extracting OrientDB Studio
20:13:44.362 [amx-mesh-getmesh-master-0] INFO  [main] - Starting OrientDB Server
2019-05-10 20:13:44:331 INFO  Detected limit of amount of simultaneously open files is 1048576,  limit of open files for disk cache will be set to 523776 [ONative]
2019-05-10 20:13:44:384 INFO  Loading configuration from input stream [OServerConfigurationLoaderXml]
2019-05-10 20:13:44:640 INFO  OrientDB Server v3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x) is starting up... [OServer]
2019-05-10 20:13:44:648 INFO  System is started under an effective user : `mesh` [OEngineLocalPaginated]
2019-05-10 20:13:44:653 INFO  WAL maximum segment size is set to 4,820 MB [OrientDBDistributed]
2019-05-10 20:13:44:656 INFO  Databases directory: /mesh [OServer]
2019-05-10 20:13:44:664 INFO  Creating the system database 'OSystem' for current server [OSystemDatabase]
2019-05-10 20:13:44:666 INFO  Direct IO for WAL located in /mesh/OSystem is allowed with block size 4096 bytes. [OCASDiskWriteAheadLog]
2019-05-10 20:13:44:667 INFO  Page size for WAL located in /mesh/OSystem is set to 4096 bytes. [OCASDiskWriteAheadLog]
2019-05-10 20:13:44:774 INFO  Storage 'plocal:/mesh/OSystem' is created under OrientDB distribution : 3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x) [OLocalPaginatedStorage]
2019-05-10 20:13:45:210 INFO  Listening binary connections on 127.0.0.1:2424 (protocol v.37, socket=default) [OServerNetworkListener]
2019-05-10 20:13:45:218 INFO  Listening http connections on 127.0.0.1:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2019-05-10 20:13:45:228 INFO  Found ORIENTDB_ROOT_PASSWORD variable, using this value as root's password [OServer]
2019-05-10 20:13:45:475 INFO  ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2019-05-10 20:13:45:478 INFO  OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2019-05-10 20:13:45:481 INFO  OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2019-05-10 20:13:45:492 WARNI GREMLIN language not available (not in classpath) [OGremlinHelper]
2019-05-10 20:13:45:498 INFO  OrientDB Studio available at http://127.0.0.1:2480/studio/index.html [OServer]
20:13:45.509 [amx-mesh-getmesh-master-0] ERROR [main] - Error while starting mesh
java.lang.NullPointerException: null
        at com.gentics.mesh.graphdb.OrientDBDatabase.startServer(OrientDBDatabase.java:469) ~[mesh.jar:na]
        at com.gentics.mesh.cli.BootstrapInitializerImpl.init(BootstrapInitializerImpl.java:246) ~[mesh.jar:na]
        at com.gentics.mesh.cli.MeshImpl.run(MeshImpl.java:131) [mesh.jar:na]
        at com.gentics.mesh.cli.MeshImpl.run(MeshImpl.java:95) [mesh.jar:na]
        at com.gentics.mesh.server.ServerRunner.main(ServerRunner.java:60) [mesh.jar:na]
20:13:45.510 [amx-mesh-getmesh-master-0] INFO  [main] - Mesh shutting down...
2019-05-10 20:13:45:498 INFO  OrientDB Server is active v3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x). [OServer]
2019-05-10 20:13:45:599 INFO  OrientDB Server is shutting down... [OServer]
2019-05-10 20:13:45:600 INFO  Shutting down listeners: [OServer]
2019-05-10 20:13:45:602 INFO  - ONetworkProtocolBinary localhost/127.0.0.1:2424: [OServer]
2019-05-10 20:13:45:607 INFO  - ONetworkProtocolHttpDb localhost/127.0.0.1:2480: [OServer]
2019-05-10 20:13:45:608 INFO  Shutting down protocols [OServer]
2019-05-10 20:13:45:610 INFO  Shutting down plugins: [OServerPluginManager]
2019-05-10 20:13:45:610 INFO  - graph [OServerPluginManager]
2019-05-10 20:13:45:610 INFO  Shutting down databases: [OServer]
2019-05-10 20:13:45:610 INFO  Orient Engine is shutting down... [Orient]
2019-05-10 20:13:45:611 INFO  - shutdown storage: OSystem... [OrientDBDistributed]
8.294: [GC (System.gc()) [PSYoungGen: 40077K->4418K(263680K)] 47625K->11966K(643072K), 0.0080520 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
8.302: [Full GC (System.gc()) [PSYoungGen: 4418K->0K(263680K)] [ParOldGen: 7547K->10498K(379392K)] 11966K->10498K(643072K), [Metaspace: 41115K->41115K(1087488K)], 0.1080328 secs] [Times: user=0.14 sys=0.00, real=0.11 secs] 
2019-05-10 20:13:45:787 WARNI Storing configuration for property:'plugin.directory' not existing in current version [OClusterBasedStorageConfiguration]
2019-05-10 20:13:45:947 INFO  Clearing byte buffer pool [Orient]
2019-05-10 20:13:45:948 INFO  OrientDB Engine shutdown complete [Orient]
2019-05-10 20:13:45:953 INFO  OrientDB Server shutdown complete
20:13:45.961 [] INFO  [main] - Shutdown completed...
20:13:45.968 [] INFO  [Thread-0] - Mesh shutting down...
20:13:45.968 [] INFO  [Thread-0] - Shutdown completed...

@Jotschi
Contributor Author

Jotschi commented May 10, 2019

@zhou Which version of Gentics Mesh are you using?

@zhou

zhou commented May 10, 2019

@Jotschi I upgraded to 0.32.0. Single node setup works great.

@Jotschi
Contributor Author

Jotschi commented May 10, 2019

@zhou Thanks for the info. I have to check this.

@cschockaert

The chart is not the latest version; I'll update it.

@zhou

zhou commented May 10, 2019

@Jotschi Thanks. I'm using a modified Helm Chart version of this. https://github.com/cschockaert/getmesh-chart

@zhou

zhou commented May 10, 2019

@cschockaert do you have the cluster working in your latest chart? Looking forward to the update. Thanks!

@cschockaert

Yes, we are running in production with clustering enabled on GCP. You need NFS to get it working.

@zhou

zhou commented May 10, 2019

I figured out how to mount an NFS volume in AWS.

@zhou

zhou commented May 10, 2019

@cschockaert I'm very curious to see what's missing to set up the cluster. Is a handler missing in orientdb-server-config.xml?

@zhou

zhou commented May 13, 2019

@cschockaert I know you must be busy, but if you get a chance, could you please update the chart or point me in the right direction? I'd really appreciate it.

@cschockaert

Well, I'll try to update it, but I need to clean it of sensitive information first.

@Jotschi
Contributor Author

Jotschi commented May 13, 2019

@zhou Can you post the full startup log and your settings? What is your node and cluster name? Somehow the OrientDB Server can't start in distributed mode. I suspect a configuration issue.

@zhou

zhou commented May 13, 2019

@Jotschi I will post the full log after I try it in another namespace. The settings are mostly those of the original Helm chart.

@zhou

zhou commented May 13, 2019

@Jotschi Please find the full log below:

kubectl logs pod/amx-mesh1-getmesh-master-0 --namespace=amx-mesh1 --container=getmesh -f

+ ls -alrth
total 146648
-rw-r--r-- 1 root root 143.2M May 8 16:19 mesh.jar
drwxr-sr-x 2 mesh mesh 4.0K May 8 16:22 data
lrwxrwxrwx 1 root mesh 7 May 8 16:22 config -> /config
lrwxrwxrwx 1 root mesh 14 May 8 16:23 elasticsearch -> /elasticsearch
drwxr-sr-x 1 mesh mesh 4.0K May 8 16:23 .
drwxr-xr-x 1 root root 4.0K May 13 19:52 ..
+ export MESH_NODE_NAME=amx-mesh1-getmesh-master-0
+ exec java -Xmx2048m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -DignoreSnapshotUpgradeCheck=true -cp /mesh/mesh.jar com.gentics.mesh.server.ServerRunner -initCluster
    Picked up JAVA_TOOL_OPTIONS: -Xms512m -Xmx512m -XX:MaxDirectMemorySize=256m -Dstorage.diskCache.bufferSize=256
    19:52:43.060 [] INFO [main] - Configuration file {mesh.yml} was not found within classpath.
    19:52:43.134 [] INFO [main] - Loading configuration file {config/mesh.yml}.
    19:52:43.453 [] INFO [main] - Setting env via field access {MESH_CLUSTER_ENABLED=true}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_CLUSTER_NAME=default}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_GRAPH_DB_DIRECTORY=/graphdb}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_GRAPH_BACKUP_DIRECTORY=/backups}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_ELASTICSEARCH_URL=http://amx-mesh1-elasticsearch-client:9200}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_ELASTICSEARCH_START_EMBEDDED=false}
    19:52:43.454 [] INFO [main] - Setting env via field access {MESH_BINARY_DIR=/uploads}
    19:52:43.455 [] INFO [main] - Setting env via field access {MESH_AUTH_KEYSTORE_PATH=/config/keystore.jceks}
    19:52:43.455 [] INFO [main] - Setting env via field access {MESH_PLUGIN_DIR=/plugins}
    19:52:43.455 [] INFO [main] - Setting env via field access {MESH_NODE_NAME=amx-mesh1-getmesh-master-0}
    19:52:43.498 [amx-mesh1-getmesh-master-0] INFO [main] - ###############################################################
    19:52:43.499 [amx-mesh1-getmesh-master-0] INFO [main] - # Mesh Version 0.32.0 2019-05-08T16:18:36Z #
    19:52:43.499 [amx-mesh1-getmesh-master-0] INFO [main] - # Gentics Software #
    19:52:43.499 [amx-mesh1-getmesh-master-0] INFO [main] - #-------------------------------------------------------------#
    19:52:43.502 [amx-mesh1-getmesh-master-0] INFO [main] - # Vert.x Version: 3.7.0 #
    19:52:43.502 [amx-mesh1-getmesh-master-0] INFO [main] - # Cluster Name: default #
    19:52:43.502 [amx-mesh1-getmesh-master-0] INFO [main] - # Mesh Node Name: amx-mesh1-getmesh-master-0 #
    19:52:43.502 [amx-mesh1-getmesh-master-0] INFO [main] - ###############################################################
    May 13, 2019 7:52:44 PM com.orientechnologies.common.log.OLogManager log
    INFO: Detected limit of amount of simultaneously open files is 1048576, limit of open files for disk cache will be set to 523776
    1.463: [GC (Metadata GC Threshold) [PSYoungGen: 65798K->13908K(153088K)] 65798K->13988K(502784K), 0.0194363 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
    1.483: [Full GC (Metadata GC Threshold) [PSYoungGen: 13908K->0K(153088K)] [ParOldGen: 80K->13374K(349696K)] 13988K->13374K(502784K), [Metaspace: 20537K->20537K(1067008K)], 0.0479472 secs] [Times: user=0.07 sys=0.01, real=0.05 secs]
    19:52:44.766 [amx-mesh1-getmesh-master-0] INFO [main] - Init cluster flag was found. Creating initial graph database now.
    May 13, 2019 7:52:44 PM com.orientechnologies.common.log.OLogManager log
    INFO: OrientDB config DISKCACHE=256MB
    May 13, 2019 7:52:44 PM com.orientechnologies.common.log.OLogManager log
    INFO: System is started under an effective user : mesh
    May 13, 2019 7:52:44 PM com.orientechnologies.common.log.OLogManager log
    INFO: WAL maximum segment size is set to 1,117 MB
    May 13, 2019 7:52:45 PM com.orientechnologies.common.log.OLogManager log
    INFO: Direct IO for WAL located in /graphdb/storage is allowed with block size 4096 bytes.
    May 13, 2019 7:52:45 PM com.orientechnologies.common.log.OLogManager log
    INFO: Page size for WAL located in /graphdb/storage is set to 4096 bytes.
    May 13, 2019 7:52:46 PM com.orientechnologies.common.log.OLogManager log
    INFO: Storage 'plocal:/graphdb/storage' is opened under OrientDB distribution : 3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x)
    3.795: [GC (Allocation Failure) [PSYoungGen: 131584K->10988K(176640K)] 144958K->24370K(526336K), 0.0203847 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
    5.018: [GC (Metadata GC Threshold) [PSYoungGen: 36109K->7828K(217600K)] 49492K->21219K(567296K), 0.0160170 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
    5.034: [Full GC (Metadata GC Threshold) [PSYoungGen: 7828K->0K(217600K)] [ParOldGen: 13390K->11528K(395264K)] 21219K->11528K(612864K), [Metaspace: 34408K->34408K(1081344K)], 0.0756587 secs] [Times: user=0.11 sys=0.00, real=0.07 secs]
    19:52:47.993 [amx-mesh1-getmesh-master-0] WARN [main] - You disabled the upgrade check for snapshot upgrades. Please note that upgrading a snapshot version to a release version could create unforseen errors since the snapshot may have altered your data in a way which was not anticipated by the release.
    19:52:47.994 [amx-mesh1-getmesh-master-0] WARN [main] - Press any key to continue. This warning will only be shown once.
    19:52:47.994 [amx-mesh1-getmesh-master-0] INFO [main] - Invoking database changelog check...
    19:52:48.125 [amx-mesh1-getmesh-master-0] INFO [main] - Creating database indices. This may take a few seconds...
    19:52:49.068 [amx-mesh1-getmesh-master-0] INFO [main] - Changelog completed.
    19:52:49.068 [amx-mesh1-getmesh-master-0] INFO [main] - Updating stored database revision and mesh version.
    May 13, 2019 7:52:49 PM com.orientechnologies.common.log.OLogManager log
    INFO: Orient Engine is shutting down...
    May 13, 2019 7:52:49 PM com.orientechnologies.common.log.OLogManager log
    INFO: - shutdown storage: storage...
    6.514: [GC (System.gc()) [PSYoungGen: 90356K->2195K(263680K)] 101884K->13731K(658944K), 0.0217463 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
    6.536: [Full GC (System.gc()) [PSYoungGen: 2195K->0K(263680K)] [ParOldGen: 11536K->7586K(395264K)] 13731K->7586K(658944K), [Metaspace: 36482K->36482K(1083392K)], 0.1505813 secs] [Times: user=0.16 sys=0.00, real=0.15 secs]
    May 13, 2019 7:52:49 PM com.orientechnologies.common.log.OLogManager log
    INFO: Clearing byte buffer pool
    May 13, 2019 7:52:49 PM com.orientechnologies.common.log.OLogManager log
    INFO: OrientDB Engine shutdown complete

19:52:49.457 [amx-mesh1-getmesh-master-0] INFO [main] - Extracting OrientDB Studio
19:52:49.485 [amx-mesh1-getmesh-master-0] INFO [main] - Starting OrientDB Server
2019-05-13 19:52:49:449 INFO Detected limit of amount of simultaneously open files is 1048576, limit of open files for disk cache will be set to 523776 [ONative]
2019-05-13 19:52:49:516 INFO Loading configuration from input stream [OServerConfigurationLoaderXml]
2019-05-13 19:52:49:772 INFO OrientDB Server v3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x) is starting up... [OServer]
2019-05-13 19:52:49:780 INFO System is started under an effective user : mesh [OEngineLocalPaginated]
2019-05-13 19:52:49:781 INFO WAL maximum segment size is set to 4,787 MB [OrientDBDistributed]
2019-05-13 19:52:49:784 INFO Databases directory: /mesh [OServer]
2019-05-13 19:52:49:790 INFO Creating the system database 'OSystem' for current server [OSystemDatabase]
2019-05-13 19:52:49:792 INFO Direct IO for WAL located in /mesh/OSystem is allowed with block size 4096 bytes. [OCASDiskWriteAheadLog]
2019-05-13 19:52:49:793 INFO Page size for WAL located in /mesh/OSystem is set to 4096 bytes. [OCASDiskWriteAheadLog]
2019-05-13 19:52:49:904 INFO Storage 'plocal:/mesh/OSystem' is created under OrientDB distribution : 3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x) [OLocalPaginatedStorage]
2019-05-13 19:52:50:297 INFO Listening binary connections on 127.0.0.1:2424 (protocol v.37, socket=default) [OServerNetworkListener]
2019-05-13 19:52:50:307 INFO Listening http connections on 127.0.0.1:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2019-05-13 19:52:50:317 INFO Found ORIENTDB_ROOT_PASSWORD variable, using this value as root's password [OServer]
2019-05-13 19:52:50:528 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2019-05-13 19:52:50:534 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2019-05-13 19:52:50:536 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2019-05-13 19:52:50:543 WARNI GREMLIN language not available (not in classpath) [OGremlinHelper]
2019-05-13 19:52:50:550 INFO OrientDB Studio available at http://127.0.0.1:2480/studio/index.html [OServer]
19:52:50.560 [amx-mesh1-getmesh-master-0] ERROR [main] - Error while starting mesh
java.lang.NullPointerException: null
at com.gentics.mesh.graphdb.OrientDBDatabase.startServer(OrientDBDatabase.java:469) ~[mesh.jar:na]
at com.gentics.mesh.cli.BootstrapInitializerImpl.init(BootstrapInitializerImpl.java:246) ~[mesh.jar:na]
at com.gentics.mesh.cli.MeshImpl.run(MeshImpl.java:131) [mesh.jar:na]
at com.gentics.mesh.cli.MeshImpl.run(MeshImpl.java:95) [mesh.jar:na]
at com.gentics.mesh.server.ServerRunner.main(ServerRunner.java:60) [mesh.jar:na]
19:52:50.561 [amx-mesh1-getmesh-master-0] INFO [main] - Mesh shutting down...
2019-05-13 19:52:50:553 INFO OrientDB Server is active v3.0.18 - Veloce (build 747595e790a081371496f3bb9c57cec395644d82, branch 3.0.x). [OServer]
2019-05-13 19:52:50:620 INFO OrientDB Server is shutting down... [OServer]
2019-05-13 19:52:50:625 INFO Shutting down listeners: [OServer]
2019-05-13 19:52:50:626 INFO - ONetworkProtocolBinary localhost/127.0.0.1:2424: [OServer]
2019-05-13 19:52:50:630 INFO - ONetworkProtocolHttpDb localhost/127.0.0.1:2480: [OServer]
2019-05-13 19:52:50:631 INFO Shutting down protocols [OServer]
2019-05-13 19:52:50:631 INFO Shutting down plugins: [OServerPluginManager]
2019-05-13 19:52:50:631 INFO - graph [OServerPluginManager]
2019-05-13 19:52:50:631 INFO Shutting down databases: [OServer]
2019-05-13 19:52:50:632 INFO Orient Engine is shutting down... [Orient]
2019-05-13 19:52:50:632 INFO - shutdown storage: OSystem... [OrientDBDistributed]
8.168: [GC (System.gc()) [PSYoungGen: 40815K->4422K(263680K)] 48402K->12008K(658944K), 0.0082943 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
8.176: [Full GC (System.gc()) [PSYoungGen: 4422K->0K(263680K)] [ParOldGen: 7586K->10488K(395264K)] 12008K->10488K(658944K), [Metaspace: 41163K->41163K(1087488K)], 0.1015348 secs] [Times: user=0.19 sys=0.00, real=0.10 secs]
2019-05-13 19:52:50:859 WARNI Storing configuration for property:'plugin.directory' not existing in current version [OClusterBasedStorageConfiguration]
2019-05-13 19:52:51:012 INFO Clearing byte buffer pool [Orient]
2019-05-13 19:52:51:013 INFO OrientDB Engine shutdown complete [Orient]
2019-05-13 19:52:51:013 INFO OrientDB Server shutdown complete
19:52:51.016 [] INFO [main] - Shutdown completed...
19:52:51.024 [] INFO [Thread-0] - Mesh shutting down...
19:52:51.024 [] INFO [Thread-0] - Shutdown completed...
Heap
PSYoungGen total 263680K, used 9002K [0x00000000d5580000, 0x00000000e8980000, 0x0000000100000000)
eden space 242176K, 3% used [0x00000000d5580000,0x00000000d5e4ab90,0x00000000e4200000)
from space 21504K, 0% used [0x00000000e4200000,0x00000000e4200000,0x00000000e5700000)
to space 21504K, 0% used [0x00000000e7480000,0x00000000e7480000,0x00000000e8980000)
ParOldGen total 395264K, used 10488K [0x0000000080000000, 0x0000000098200000, 0x00000000d5580000)
object space 395264K, 2% used [0x0000000080000000,0x0000000080a3e198,0x0000000098200000)
Metaspace used 41165K, capacity 42050K, committed 42456K, reserved 1087488K
class space used 4945K, capacity 5292K, committed 5376K, reserved 1048576K

@Jotschi
Contributor Author

Jotschi commented May 13, 2019

I did not spot anything problematic except that you seem to run mesh via:

exec java -Xmx2048m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -DignoreSnapshotUpgradeCheck=true -cp /mesh/mesh.jar com.gentics.mesh.server.ServerRunner -initCluster

Normally you would not use ignoreSnapshotUpgradeCheck. Is there a reason why you use this setting? Additionally, I would recommend using -initCluster only once and no longer passing it after the database has been set up, although it should be ignored once a database exists.
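One way to follow this advice in a container entrypoint is to add `-initCluster` only while the graph database directory is still empty. A sketch assuming, as in the log above, that `MESH_GRAPH_DB_DIRECTORY` is `/graphdb` and that OrientDB keeps its files in a `storage` subdirectory (the log shows `plocal:/graphdb/storage`); the function name is hypothetical, and presumably only the first master should ever receive the flag.

```shell
#!/bin/sh
# Print "-initCluster" only if no graph database exists yet under the given directory.
# Usage: init_cluster_flag <graphdb-dir>
init_cluster_flag() {
  storage="$1/storage"
  if [ ! -d "$storage" ] || [ -z "$(ls -A "$storage" 2>/dev/null)" ]; then
    printf '%s\n' "-initCluster"
  fi
}
```

In the entrypoint, `$(init_cluster_flag "${MESH_GRAPH_DB_DIRECTORY:-/graphdb}")` would be appended unquoted to the `exec java ... ServerRunner` line, so that it expands to nothing once the database exists.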

You could try a simpler node name (e.g. test). But I don't think the dashes are problematic.

@zhou

zhou commented May 13, 2019

@Jotschi I removed ignoreSnapshotUpgradeCheck and tried again; the result is the same. Also, I don't understand why this setting would cause a NullPointerException when trying to get the distributedManager.

@zhou

zhou commented May 14, 2019

@cschockaert Understood. Please take your time and post the update whenever it's convenient. Thanks.

@cschockaert

@zhou updated chart in this branch: cschockaert/getmesh-chart#1

This is a WIP version, but it works in cluster mode.

There is a dedicated image for the UI, but you can change that.
You need to add a keystore in secrets/ and a keystore password.

@zhou

zhou commented May 14, 2019

@cschockaert Thanks a lot. I will check it out today. I did already create a keystore.

@zhou

zhou commented May 14, 2019

@cschockaert Thanks for the updated chart. I'm still in the process of testing it.

I see you use two images: mesh-tools and mesh-ui. What are those, and why do they have to be separate from Mesh itself?

@cschockaert

mesh-ui is a fork of the old getmesh UI, because we needed a quick-and-dirty fix in it.
It's a standalone getmesh UI without the getmesh server API.
mesh-tools is a getmesh image with some scripts and tools for backup and restore.

@zhou

zhou commented May 15, 2019

@cschockaert I ran into the error below when installing the chart. Did we miss some settings in values.yaml?

Error: render error in "getmesh/templates/getmesh-master-statefulset.yaml": template: getmesh/templates/getmesh-master-statefulset.yaml:144:20: executing "getmesh/templates/getmesh-master-statefulset.yaml" at <.Values.master.toler...>: can't evaluate field tolerations in type interface {}

@cschockaert

Yes, add this missing block (with an empty value) to the values file.
I'll update it.

@zhou

zhou commented May 15, 2019

@cschockaert After expanding the instance group, I'm able to start the cluster. However, it seems the keystore I created for the old chart doesn't work anymore. I tried to recreate the keystore using this command:

keytool -genseckey -keystore keystore.jceks -storetype jceks -storepass secret -keyalg HMacSHA256 -keysize 2048 -alias HS256 -keypass <key_pass>

It still gave me the error below:
19:49:05.518 [amx-mesh1-getmesh-master-0] ERROR [vert.x-eventloop-thread-8] - Error:
java.lang.RuntimeException: java.io.EOFException
at io.vertx.ext.auth.jwt.impl.JWTAuthProviderImpl.<init>(JWTAuthProviderImpl.java:115) ~[mesh.jar:na]
at io.vertx.ext.auth.jwt.JWTAuth.create(JWTAuth.java:56) ~[mesh.jar:na]
at io.vertx.ext.auth.jwt.JWTAuth.create(JWTAuth.java:45) ~[mesh.jar:na]
at com.gentics.mesh.auth.provider.MeshJWTAuthProvider.<init>(MeshJWTAuthProvider.java:76) ~[mesh.jar:na]
at com.gentics.mesh.auth.provider.MeshJWTAuthProvider_Factory.get(MeshJWTAuthProvider_Factory.java:45) ~[mesh.jar:na]
at com.gentics.mesh.auth.provider.MeshJWTAuthProvider_Factory.get(MeshJWTAuthProvider_Factory.java:11) ~[mesh.jar:na]
at dagger.internal.DoubleCheck.get(DoubleCheck.java:47) ~[mesh.jar:na]
at com.gentics.mesh.auth.handler.MeshJWTAuthHandler_Factory.get(MeshJWTAuthHandler_Factory.java:38) ~[mesh.jar:na]
at com.gentics.mesh.auth.handler.MeshJWTAuthHandler_Factory.get(MeshJWTAuthHandler_Factory.java:11) ~[mesh.jar:na]
at dagger.internal.DoubleCheck.get(DoubleCheck.java:47) ~[mesh.jar:na]
at com.gentics.mesh.auth.MeshAuthChain_Factory.get(MeshAuthChain_Factory.java:35) ~[mesh.jar:na]
at com.gentics.mesh.auth.MeshAuthChain_Factory.get(MeshAuthChain_Factory.java:9) ~[mesh.jar:na]
at dagger.internal.DoubleCheck.get(DoubleCheck.java:47) ~[mesh.jar:na]
at com.gentics.mesh.router.RouterStorage_Factory.get(RouterStorage_Factory.java:56) ~[mesh.jar:na]
at com.gentics.mesh.router.RouterStorage_Factory.get(RouterStorage_Factory.java:14) ~[mesh.jar:na]
at com.gentics.mesh.rest.RestAPIVerticle.start(RestAPIVerticle.java:180) ~[mesh.jar:na]
at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$8(DeploymentManager.java:552) ~[mesh.jar:na]
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:320) ~[mesh.jar:na]
at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38) ~[mesh.jar:na]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[mesh.jar:na]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) ~[mesh.jar:na]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326) ~[mesh.jar:na]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) ~[mesh.jar:na]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[mesh.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111-internal]
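One plausible reading of a `java.io.EOFException` while loading the keystore is that the file Mesh opened was empty or truncated rather than malformed; further down in this thread the actual cause was the keystore not being at the path the chart expected. A pre-flight check, as a sketch: it fails on a missing or zero-byte file and, if a password is given and `keytool` is on the PATH, asks `keytool -list` to open the JCEKS store. The function name is hypothetical.

```shell
#!/bin/sh
# Pre-flight check for a JCEKS keystore before it is packaged into the chart.
# Usage: check_keystore <path> [storepass]
check_keystore() {
  ks=$1 pass=${2:-}
  if [ ! -f "$ks" ]; then
    echo "missing keystore: $ks" >&2; return 1
  fi
  if [ ! -s "$ks" ]; then
    echo "empty keystore: $ks" >&2; return 1
  fi
  # Only verify the contents when a password and keytool are available.
  if [ -n "$pass" ] && command -v keytool >/dev/null 2>&1; then
    keytool -list -keystore "$ks" -storetype jceks -storepass "$pass" >/dev/null || return 1
  fi
  return 0
}
```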

@zhou

zhou commented May 15, 2019

@cschockaert Another thing: can we use an external, shared Elasticsearch environment? And how would we set that up?

@cschockaert

You can use the keystore / password created on startup by a vanilla getmesh instance.
For ES, you can customize the URL in values.yaml and disable the chart's ES installation by removing elasticsearch from requirements.yaml.
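Before pointing the chart at an external Elasticsearch, it may help to confirm that the URL is reachable from inside the cluster (e.g. from a pod, since service names like `amx-mesh1-elasticsearch-client` only resolve there). A sketch polling Elasticsearch's `_cluster/health` endpoint; the retry count and delay defaults are arbitrary, and `CURL` can be overridden (e.g. `CURL=true`) to dry-run the loop.

```shell
#!/bin/sh
# Override with CURL=true / CURL=false to dry-run the retry loop.
CURL="${CURL:-curl}"

# Poll <es-url>/_cluster/health until it answers, giving up after <tries> attempts.
# Usage: wait_for_es <es-url> [tries] [delay-seconds]
wait_for_es() {
  url=$1 tries=${2:-30} delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$CURL" -fsS "$url/_cluster/health" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  echo "Elasticsearch at $url not reachable" >&2
  return 1
}
```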

@zhou

zhou commented May 16, 2019

@cschockaert @Jotschi I tried both approaches: 1) the keystore / password from a vanilla getmesh instance; 2) a keystore created with keytool. Both still give me java.io.EOFException at io.vertx.ext.auth.jwt.impl.JWTAuthProviderImpl.<init>(JWTAuthProviderImpl.java:115) ~[mesh.jar:na]

Please advise. Thanks.

@Jotschi
Contributor Author

Jotschi commented May 16, 2019

@zhou
Are you using OpenJDK 8? So far, Gentics Mesh only runs on Java 8. We do, however, plan to support Java 11/12.

@zhou

zhou commented May 16, 2019

@Jotschi The image is using openjdk version "1.8.0_111-internal". My laptop is running JDK 11. Would you suggest using keytool from Java 8 to create the keystore file?

@Jotschi
Contributor Author

Jotschi commented May 16, 2019

@zhou I personally would run:

mkdir keystore
docker run -e MESH_AUTH_KEYSTORE_PASS=mypass -v $PWD/keystore:/keystore gentics/mesh:0.32.0

You can stop Mesh with CTRL+C after a few seconds. The keystore should then be in the keystore folder.
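The same steps as a script, as a sketch: instead of pressing CTRL+C, start the container detached, wait until the keystore file appears in the mounted folder, then remove the container. The container name `mesh-keystore-gen` and the file name `keystore.jceks` are assumptions; check what Mesh actually writes into `keystore/` before relying on it.

```shell
#!/bin/sh
# Wait up to <timeout> seconds for <path> to exist and be non-empty.
# Usage: wait_for_file <path> [timeout-seconds]
wait_for_file() {
  path=$1 timeout=${2:-60}
  waited=0
  while [ ! -s "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Hypothetical wrapper around the docker command above; the container name and
# the keystore file name (keystore.jceks) are assumptions.
generate_keystore() {
  pass=$1
  mkdir -p keystore
  docker run -d --name mesh-keystore-gen \
    -e MESH_AUTH_KEYSTORE_PASS="$pass" \
    -v "$PWD/keystore:/keystore" gentics/mesh:0.32.0 >/dev/null || return 1
  wait_for_file keystore/keystore.jceks 120
  status=$?
  docker rm -f mesh-keystore-gen >/dev/null
  return "$status"
}
```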

@zhou

zhou commented May 16, 2019

I think I got the cluster running!

It seems the new chart uses a NEW directory called secrets/. After I created the directory, put the keystore inside it, and installed the chart again, it worked!

@cschockaert can you confirm that this is part of the updates? Thanks!
@Jotschi thanks for the info as well!

@cschockaert

cschockaert commented May 16, 2019

You are right. I updated the keystore path in the chart and added the secrets folder (containing the keystore) to .gitignore, since we don't want to share sensitive information.

@zhou

zhou commented May 16, 2019

That caught me off guard: when I checked out the new branch, there was no "secrets" dir in it, so I still put the keystore in the "config" dir like before, and the cluster wouldn't start.

Only after examining the keystore-related information very carefully did I find "secrets/" in the .gitignore and getmesh-cm.yaml. Then I realized the location had changed and finally got the cluster running.

Maybe creating the "secrets" dir with a simple README file in it would help others going forward. In .gitignore, we could list the specific file to keep it out of the git repo, for example: secrets/keystore.jceks
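That suggestion could be scripted roughly as follows: keep a `secrets/` directory in the chart repo with a README, and ignore only the keystore file itself so the README can be committed. A sketch; the function name and README wording are hypothetical, and `secrets/keystore.jceks` is the example path from the comment above.

```shell
#!/bin/sh
# Create secrets/ with a README and make sure only the keystore is git-ignored.
# Usage: init_secrets_dir <chart-dir>
init_secrets_dir() {
  dir=$1
  mkdir -p "$dir/secrets"
  if [ ! -f "$dir/secrets/README.md" ]; then
    cat > "$dir/secrets/README.md" <<'EOF'
Put the Mesh keystore here as keystore.jceks before installing the chart.
The keystore itself is git-ignored; only this README is committed.
EOF
  fi
  gi="$dir/.gitignore"
  touch "$gi"
  # Make sure the last existing line ends with a newline before appending.
  if [ -s "$gi" ] && [ -n "$(tail -c 1 "$gi")" ]; then
    printf '\n' >> "$gi"
  fi
  # Append the entry only once so the function is safe to re-run.
  grep -qxF 'secrets/keystore.jceks' "$gi" ||
    printf '%s\n' 'secrets/keystore.jceks' >> "$gi"
}
```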

Still, thanks a lot for sharing the Chart. @cschockaert

@cschockaert

Yeah, sorry, I forgot to tell you. The change was made some time ago; I just pushed our draft work to the repository.

@zhou

zhou commented May 16, 2019

@cschockaert No problem.

Did you push some more updates? To the master branch or the feat/ branch?

Edit: just realized you were explaining the reason. No problem at all.

Also, I'll probably have some further questions when I tweak the cluster to use a shared Elasticsearch cluster.

Thanks again for sharing the work!

@zhou

zhou commented May 17, 2019

@cschockaert could you please also share the Dockerfile to build mesh-ui? Thanks

@cschockaert

Hello @zhou, mesh-ui is our custom fork of mesh-ui, which we use with the official Mesh Docker server image. You can simply leave it out and fall back to the official UI provided by Mesh.

@cschockaert

cschockaert commented Apr 19, 2021

Basically, it's just a fork of mesh-ui v1 with some bugs fixed. Since we rely on the old mesh-ui, we did not open any MR against the official Mesh repository.
