How do you move configuration from cluster A to cluster B ? #10714

fabltd · 2023-05-16T07:40:59Z

Hi

I have EMQX V5 installed in our dev cluster and would like to migrate the config to the prod cluster.

Whilst I can find the configuration in the pod, I cannot find out where it is stored or how it is shared between the replicas. It's also not documented?

How do you move configuration from cluster A to cluster B?

Thank you

HJianBo · 2023-05-16T09:42:01Z

Hi @fabltd, do you know which version of EMQX you are currently using?

The current version (5.0.x) does not have a configuration migration feature. If you need to migrate the configuration in this version, you will need to do it manually.

For example, manually merge /opt/emqx/etc/emqx.conf and /opt/emqx/data/configs/cluster.hocon (or /opt/emqx/data/configs/cluster-override.conf) inside the emqx container, and copy the result to the new node's /opt/emqx/etc/emqx.conf file.

Update: added a Feature Request label. We will try to deliver this kind of functionality in v5.1.0.
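
A minimal sketch of the first half of that manual procedure (getting the two files out of a running dev pod), assuming a Kubernetes deployment. The function name `fetch_emqx_config` and the `KUBECTL` override variable are hypothetical helpers, and the namespace `mqtt` and pod `emqx-core-0` in the example call are taken from later in this thread; the file paths are the ones named above. Merging the files is still a manual step.

```shell
# Sketch only: copy emqx.conf and the cluster override file from a pod
# into a local directory. Names and defaults here are illustrative.
fetch_emqx_config() {
  local ns="$1" pod="$2" out_dir="$3"
  local kubectl="${KUBECTL:-kubectl}"   # overridable, e.g. for a dry run

  mkdir -p "$out_dir" || return 1

  # kubectl cp relies on a `tar` binary being available in the container.
  "$kubectl" -n "$ns" cp "$pod:/opt/emqx/etc/emqx.conf" \
    "$out_dir/emqx.conf" || return 1

  # If your build writes cluster.hocon instead, change the file name here.
  "$kubectl" -n "$ns" cp "$pod:/opt/emqx/data/configs/cluster-override.conf" \
    "$out_dir/cluster-override.conf"
}

# Example call (assumed names):
#   fetch_emqx_config mqtt emqx-core-0 ./dev-config
```

Pointing `KUBECTL` at a function that only prints its arguments gives a dry run without touching the cluster.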

fabltd · 2023-05-17T00:30:06Z

Hi

Yes, I am using v5.0.25. I can see that the data is not persistent; it is using an emptyDir.

I have implemented the config below to add persistence. The PVCs are created and I can see 3 disks.

However, EMQX won't start; it is stuck in CrashLoopBackOff due to the following error:

mkdir: cannot create directory ‘/opt/emqx/data/configs’: Permission denied

Any idea why it cannot write?

apiVersion: apps.emqx.io/v2alpha1
kind: EMQX
metadata:
  name: emqx
spec:
  image: emqx/emqx:5.0.25
  # Core
  coreTemplate:
    spec:
      volumeClaimTemplates:
        storageClassName: standard
        resources:
          requests:
            storage: 20Mi
        accessModes:
          - ReadWriteOnce
      replicas: 3
  # BootStrap Config
  bootstrapConfig: |
    dashboard {
      default_username: "admin"
      default_password: "public"
    }
  # Dashboard
  dashboardServiceTemplate:
    metadata:
      name: emqx-dashboard
    spec:
      type: NodePort
      selector:
        apps.emqx.io/db-role: core
      ports:
        - name: "dashboard-listeners-http-bind"
          protocol: TCP
          port: 18083
          targetPort: 18083
          nodePort: 30008
  # Listeners
  listenersServiceTemplate:
    metadata:
      name: emqx-listeners
    spec:
      type: LoadBalancer
      ports:
        - name: "tcp-default"
          protocol: TCP
          port: 1883
          targetPort: 1883

Rory-Z · 2023-05-17T02:58:14Z

Hi @fabltd, please check this: emqx/emqx-operator#716

fabltd · 2023-05-17T03:12:44Z

@Rory-Z - Thanks, that fixed it. Why is it not in the docs?

Rory-Z · 2023-05-17T03:24:28Z

Hi @fabltd, this is in the documentation: https://docs.emqx.com/en/emqx-operator/latest/deployment/on-aws-eks.html#quickly-deploy-an-emqx-cluster

Or please let me know which document you were reading; maybe we missed it there.

fabltd · 2023-05-17T03:31:47Z

@Rory-Z

Yes, it's not mentioned in the link above, or here:

https://github.com/emqx/emqx-operator/blob/main/docs/en_US/tasks/configure-emqx-persistence.md

Should it be added to this doc?

Rory-Z · 2023-05-17T03:39:16Z

You can open a PR against the main-2.1 branch of emqx/emqx-operator.git.
On the main branch of emqx/emqx-operator.git, the EMQX Operator already adds a default value for podSecurityContext, but that change has not been released yet.

fabltd · 2023-05-17T03:47:19Z

@Rory-Z not sure if you can help with the original ask:

In my dev cluster the config appears to all be in a file called cluster-override.conf.

I have copied this to the prod cluster and restarted the pods, but none of my dev rules are showing.

Any ideas?

Rory-Z · 2023-05-17T05:54:43Z

> @Rory-Z not sure if you can help with the original ask:
>
> In my dev cluster the config appears to all be in a file called cluster-override.conf.
>
> I have copied this to the prod cluster and restarted the pods, but none of my dev rules are showing.
>
> Any ideas?

Copying cluster-override.conf is the right way, but I also don't know why the rules are missing.
@zhongwencool Any ideas?

zhongwencool · 2023-05-17T06:26:31Z

You should stop all nodes and only then copy cluster-override.conf;
otherwise a restarted node will copy cluster-override.conf back from a node that is still running with the old config.
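
The ordering described here (stop everything, copy, then start) could be sketched as follows. This is an illustration under stated assumptions, not a tested migration tool: it assumes bare (non-Kubernetes) EMQX nodes reachable over SSH with the `emqx` command on the remote PATH. The function name `migrate_override_conf`, the `SSH`/`SCP` override variables, and the host names in the example are all hypothetical.

```shell
# Sketch only: replace cluster-override.conf on every node while the
# whole cluster is down, so no running node can re-sync its old copy.
migrate_override_conf() {
  local src="$1" dest="$2"
  shift 2                              # remaining arguments are the hosts
  local ssh="${SSH:-ssh}" scp="${SCP:-scp}" host

  # 1. Stop every node first.
  for host in "$@"; do
    "$ssh" "$host" emqx stop || return 1
  done

  # 2. Copy the new file to every node.
  for host in "$@"; do
    "$scp" "$src" "$host:$dest" || return 1
  done

  # 3. Start the nodes again.
  for host in "$@"; do
    "$ssh" "$host" emqx start || return 1
  done
}

# Example call (hypothetical hosts; the destination path depends on the install):
#   migrate_override_conf ./cluster-override.conf \
#     /opt/emqx/data/configs/cluster-override.conf emqx-a emqx-b emqx-c
```

The three separate loops are the point of the sketch: a single loop that stops, copies, and starts one node at a time would reintroduce the problem described above.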

Rory-Z · 2023-05-17T06:31:48Z

> you should stop all nodes, then copy cluster-override.conf, otherwise the restart node will copy the old running node's cluster-override.conf

Maybe you could copy the content of cluster-override.conf into .spec.bootstrapConfig of the apps.emqx.io/v2alpha1 EMQX resource (for a bare EMQX node, that is etc/emqx.conf) and create a new cluster?
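
For anyone trying this suggestion, a small helper along these lines prints the file indented so it can be pasted under `bootstrapConfig: |` in the manifest. It is only a sketch: `indent_for_bootstrap` is a hypothetical name, and the indentation width has to match whatever your own manifest uses under that key (the default of 4 is an arbitrary choice).

```shell
# Sketch only: print FILE with every line prefixed by WIDTH spaces,
# ready to paste into a YAML block scalar such as `bootstrapConfig: |`.
indent_for_bootstrap() {
  local file="$1" width="${2:-4}"
  local pad="" i=0

  # Build the padding string one space at a time (portable across shells).
  while [ "$i" -lt "$width" ]; do
    pad="$pad "
    i=$((i + 1))
  done

  sed "s/^/$pad/" "$file"
}

# Example call:
#   indent_for_bootstrap ./cluster-override.conf 4
```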

fabltd · 2023-05-18T03:16:50Z

I was unable to get cluster-override.conf to work. I understand later releases of v5 have moved to the file

cluster.hocon

As a test I built a test cluster and configured some options. Following this I built a new cluster and migrated the .hocon file.

However, this did not work as expected; the new cluster gives the following error:

500 INTERNAL_ERROR:error, function_clause, [{emqx_rule_engine_api,'-get_rule_metrics/1-fun-0-',['emqx@10.20.0.7',#{counters => #{},gauges => #{},rate => #{current => 0.0,last5m => 0.0,max => 0.0},slides => #{}}],[{file,"emqx_rule_engine_api.erl"},{line,524}]},{emqx_rule_engine_api,'-get_rule_metrics/1-lc$^1/1-0-',3,[{file,"emqx_rule_engine_api.erl"},{line,567}]},{emqx_rule_engine_api,'/rules/:id/metrics',2,[{file,"emqx_rule_engine_api.erl"},{line,426}]},{minirest_handler,apply_callback,3,[{file,"minirest_handler.erl"},{line,111}]},{minirest_handler,handle,2,[{file,"minirest_handler.erl"},{line,44}]},{minirest_handler,init,2,[{file,"minirest_handler.erl"},{line,27}]},{cowboy_handler,execute,2,[{file,"cowboy_handler.erl"},{line,41}]},{cowboy_stream_h,execute,3,[{file,"cowboy_stream_h.erl"},{line,318}]},{cowboy_stream_h,request_process,3,[{file,"cowboy_stream_h.erl"},{line,302}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]

fabltd · 2023-05-18T03:37:59Z

The above error results in the dashboard being unresponsive and I am unable to make changes to the rules engine.

@HJianBo wrote:

> Hi @fabltd, do you know which version of EMQX you are currently using?
>
> The current version (5.0.x) does not have a configuration migration feature. If you need to migrate the configuration in this version, you will need to do it manually.
>
> For example, manually merge /opt/emqx/etc/emqx.conf and /opt/emqx/data/configs/cluster.hocon (or /opt/emqx/data/configs/cluster-override.conf) inside the emqx container, and copy them to the new node's /opt/emqx/etc/emqx.conf file
>
> Updates: Added a Feature Request label. We will try delivering this kind of functionality in v5.1.0

This does not work.

The crash occurs every time.

Steps to reproduce

Copy the config to emqx-core-0:data/configs
Copy the certs to emqx-core-0:data/certs

No other files are copied.

All pods restarted:

kubectl -n mqtt rollout restart statefulset emqx-core

Crash seen in dashboard when going to Flows view.

fabltd · 2023-05-18T06:57:20Z

@Rory-Z Any idea?

Rory-Z · 2023-05-18T07:08:57Z

> @Rory-Z Any idea?

I have no idea; I think the function_clause error is an EMQX bug.

JimMoen · 2023-05-18T07:33:19Z

See the first frame of the stack trace:

['emqx@10.20.0.7',#{counters => #{},gauges => #{},rate => #{current => 0.0,last5m => 0.0,max => 0.0},slides => #{}}]

It seems that fetching the metrics failed on node emqx@10.20.0.7. Are you sure the rule has been created on all nodes?

HJianBo · 2023-05-18T07:49:47Z

Can you check if there are any error logs when each EMQX node starts up?

Also, could you query the List All Rules API on each node to see whether the rules you specified have been created correctly?
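
Both checks could be scripted roughly as below. This is a sketch under assumptions: it assumes the rules endpoint is `GET /api/v5/rules` on the dashboard port and that an API key and secret have been created for HTTP basic auth. The function names and the `CURL`/`KUBECTL` override variables are hypothetical; the namespace, pod name, and port in the example come from elsewhere in this thread.

```shell
# Sketch only: list the rules known to one node via its REST API.
list_rules() {
  local base_url="$1" api_key="$2" api_secret="$3"
  local curl="${CURL:-curl}"
  "$curl" -sS -u "$api_key:$api_secret" "$base_url/api/v5/rules"
}

# Sketch only: show log lines mentioning "error" for one pod.
startup_errors() {
  local ns="$1" pod="$2"
  local kubectl="${KUBECTL:-kubectl}"
  "$kubectl" -n "$ns" logs "$pod" | grep -i error
}

# Example calls (assumed names and credentials):
#   kubectl -n mqtt port-forward pod/emqx-core-0 18083:18083 &
#   list_rules http://localhost:18083 "$API_KEY" "$API_SECRET"
#   startup_errors mqtt emqx-core-0
```

Port-forwarding to one pod at a time, rather than going through the dashboard Service, is what makes the answer specific to a single node.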

fabltd · 2023-05-18T07:51:56Z

I just added the config as suggested and this happens.

It crashes the rules engine.

HJianBo · 2023-05-18T07:54:44Z

Could you please share the .hocon configuration, if possible?

fabltd · 2023-05-18T07:55:36Z

Hi, can I email it to you?

HJianBo · 2023-05-18T08:01:31Z

Yes, of course, heeejianbo@gmail.com

fabltd · 2023-05-18T08:58:04Z

Emailed. Let me know if you would like access to the cluster; it is running in Google Cloud.

fabltd · 2023-05-22T08:16:59Z

Thanks for fixing. How do I update to the fixed version?

fabltd · 2023-05-24T13:25:29Z

It looks like, after setting up from scratch, there is still an issue with metrics. Both of my replicant nodes have crashed.

initial call: mria_rlog_replica:init/1, pid: <0.2102.0>, registered_name: '$mria_meta_shard', exit: {{timeout,{gen_server,call,[mria_lb,{probe,'emqx@emqx-core-2.emqx-headless.mqtt.svc.cluster.local','$mria_meta_shard'}]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},{mria_rlog,subscribe,4,[{file,"mria_rlog.erl"},{line,167}]},{mria_rlog_replica,try_connect,3,[{file,"mria_rlog_replica.erl"},{line,395}]},{mria_rlog_replica,handle_reconnect,1,[{file,"mria_rlog_replica.erl"},{line,341}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1205}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [<0.2101.0>,mria_shards_sup,mria_rlog_sup,mria_sup,<0.1902.0>], message_queue_len: 0, messages: [], links: [<0.2101.0>], dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.3.92093067>,next => #Fun<rand.0.92093067>,type => exsss,uniform => #Fun<rand.1.92093067>,uniform_n => #Fun<rand.2.92093067>},[244355015406896546|90618611208143776]}},{'$logger_metadata$',#{domain => [mria,rlog,replica],shard => '$mria_meta_shard'}}], trap_exit: true, status: running, heap_size: 6772, stack_size: 29, reductions: 12103; neighbours:

Any ideas? This all worked in dev but crashes in prod.

fabltd · 2023-05-24T13:39:42Z

This error is shown in the dashboard

500 NODE_DOWN:bad rpc call 'emqx@10.140.1.3', Reason {'EXIT', {badarg, [{ets,select_count, [emqx_activated_alarm, [{'$1',[],[true]}]], [{error_info, #{cause => id, module => erl_stdlib_errors}}]}, {emqx_mgmt_api, '-counting_total_fun/1-fun-0-',2, [{file,"emqx_mgmt_api.erl"}, {line,357}]}, {emqx_mgmt_api, maybe_apply_total_query,2, [{file,"emqx_mgmt_api.erl"}, {line,333}]}, {emqx_mgmt_api,do_select,2, [{file,"emqx_mgmt_api.erl"}, {line,299}]}, {emqx_mgmt_api,do_query,2,[]}]}}

This is the IP of the failed replicant pod.

Restarting all core pods and then the replicants seems to have got the replicant running again.
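
That restart order (cores first, replicants only once the cores are back) could be sketched like this. It is an illustration, not a confirmed recovery procedure: the function name and `KUBECTL` override are hypothetical, `emqx-core` is the StatefulSet name used earlier in this thread, and `deployment/emqx-replicant` is an assumed name for the replicant workload that may not match what your operator version creates.

```shell
# Sketch only: restart the core StatefulSet, wait for it to become ready,
# then restart the replicant workload and wait for that too.
restart_cores_then_replicants() {
  local ns="$1" core_sts="$2" replicant="$3"
  local kubectl="${KUBECTL:-kubectl}"

  "$kubectl" -n "$ns" rollout restart "statefulset/$core_sts" || return 1
  "$kubectl" -n "$ns" rollout status "statefulset/$core_sts" || return 1

  # $replicant is given as kind/name, e.g. deployment/emqx-replicant (assumed).
  "$kubectl" -n "$ns" rollout restart "$replicant" || return 1
  "$kubectl" -n "$ns" rollout status "$replicant"
}

# Example call (assumed names):
#   restart_cores_then_replicants mqtt emqx-core deployment/emqx-replicant
```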

fabltd · 2023-05-30T08:23:24Z

@HJianBo

I am still having issues. I built a new install from scratch. It worked for a few days, but now it's showing the following error again:

500 INTERNAL_ERROR:error, function_clause, [{emqx_rule_engine_api,'-get_rule_metrics/1-fun-0-',['emqx@10.140.5.6',#{counters => #{},gauges => #{},rate => #{current => 0.0,last5m => 0.0,max => 0.0},slides => #{}}],[{file,"emqx_rule_engine_api.erl"},{line,524}]},{emqx_rule_engine_api,'-get_rule_metrics/1-lc$^1/1-0-',3,[{file,"emqx_rule_engine_api.erl"},{line,567}]},{emqx_rule_engine_api,'-get_rule_metrics/1-lc$^1/1-0-',3,[{file,"emqx_rule_engine_api.erl"},{line,568}]},{emqx_rule_engine_api,'/rules/:id/metrics',2,[{file,"emqx_rule_engine_api.erl"},{line,426}]},{minirest_handler,apply_callback,3,[{file,"minirest_handler.erl"},{line,111}]},{minirest_handler,handle,2,[{file,"minirest_handler.erl"},{line,44}]},{minirest_handler,init,2,[{file,"minirest_handler.erl"},{line,27}]},{cowboy_handler,execute,2,[{file,"cowboy_handler.erl"},{line,41}]},{cowboy_stream_h,execute,3,[{file,"cowboy_stream_h.erl"},{line,318}]},{cowboy_stream_h,request_process,3,[{file,"cowboy_stream_h.erl"},{line,302}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]

fabltd · 2023-05-30T09:25:33Z

@HJianBo - I have updated to 5.0.26. I note the release notes say the metrics issue should be fixed, but it's still occurring.

thalesmg · 2023-05-30T14:55:55Z

Hi @fabltd, thanks for the logs.

The fix mentioned in the changelog you saw was for the bridges API, but the crash you encountered was in the rule engine API. We'll fix it here.

fabltd added the BUG label May 16, 2023

Rory-Z added help wanted and removed BUG labels May 16, 2023

HJianBo added the Enhancement label May 16, 2023

Rory-Z transferred this issue from emqx/emqx-operator May 16, 2023

Rory-Z assigned zhongwencool and HJianBo May 16, 2023

Rory-Z changed the title ~~Documentation for K8s Operator~~ How do you move configuration from cluster A to cluster B ? May 16, 2023

HJianBo added Feature and removed Enhancement help wanted labels May 16, 2023

HJianBo mentioned this issue May 19, 2023: fix(bridge_api): don't crash when formatting empty/unknown bridge metrics #10743 (merged)

HJianBo added a commit to HJianBo/emqx that referenced this issue May 30, 2023: fix(rule): don't crash when formatting empty/unknown rule metrics (c7ffc80)

fixes: emqx#10714 (comment)

thalesmg mentioned this issue May 30, 2023: fix(rule_engine_api): don't crash when formatting empty metrics #10884 (merged)

thalesmg added a commit to thalesmg/emqx that referenced this issue May 30, 2023: fix(rule_engine_api): don't crash when formatting empty metrics (57aacb4)

Fixes https://emqx.atlassian.net/browse/EMQX-10073. Fixes emqx#10714 (comment). Similar issue to emqx#10743, but on the rule engine API.

thalesmg closed this as completed in #10884 May 30, 2023
