XSite replication not working #106

Open
shivan opened this issue Nov 16, 2020 · 7 comments · May be fixed by #107

Comments


shivan commented Nov 16, 2020

Hi,

I tried XSite replication with two local Docker containers.

Config1 (using port 7300):

jgroups:
  diagnostics: true
  encrypt: false
xsite:
  transport: tunnel
  address: 10.35.50.133
  name: NYC
  port: 7300
  backups:
    - address: 10.35.50.133
      name: LON
      port: 7200
logging:
  console:
    level: debug

Config 2 (using port 7200):

jgroups:
  diagnostics: true
  encrypt: false
xsite:
  transport: tunnel
  address: 10.35.50.133
  name: LON
  port: 7200
  backups:
    - address: 10.35.50.133
      name: NYC
      port: 7300
logging:
  console:
    level: debug

My Docker-Commands:

docker run --rm -p 11222:11222 -p 7300:7300 -v c:/work/InfiniSpan:/user-config --name infinispan1 -e IDENTITIES_PATH="/user-config/identities.yaml" -e CONFIG_PATH="/user-config/config1.yaml" --network bridge infinispan/server:11.0.4.Final-2

docker run --rm -p 12222:11222 -p 7200:7300 -v c:/work/InfiniSpan:/user-config --name infinispan2 -e IDENTITIES_PATH="/user-config/identities.yaml" -e CONFIG_PATH="/user-config/config2.yaml" --network Bridge2 infinispan/server:11.0.4.Final-2

But when I create a new REPL_SYNC Cache on one of them, it won't be created on the other one automatically.
It only works when both are on the same network, but then they communicate through Multicast instead of XSite.

Is this docker container really working with XSite?

I'm using 11.0.4.Final-2

Member

pruivo commented Nov 16, 2020

With transport: tunnel, GossipRouter containers must be running to form the tunnel.
If you want to connect the containers directly, you have to remove the transport setting.
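
A minimal sketch of the two configs from the first comment with the transport line removed. The helper name write_site_config is hypothetical; the addresses, site names, and ports are copied from the original configs, and the output file names match the CONFIG_PATH values in the docker commands.

```shell
# Write a site config with no "transport" key, so the sites connect directly.
# Hypothetical helper. Args: site name, site port, backup site name,
# backup site port, output file.
write_site_config() {
  cat > "$5" <<EOF
jgroups:
  diagnostics: true
  encrypt: false
xsite:
  address: 10.35.50.133
  name: $1
  port: $2
  backups:
    - address: 10.35.50.133
      name: $3
      port: $4
logging:
  console:
    level: debug
EOF
}

write_site_config NYC 7300 LON 7200 config1.yaml
write_site_config LON 7200 NYC 7300 config2.yaml
```

The files are written to the current directory; copy them to the directory mounted as /user-config before starting the containers.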

Member

pruivo commented Nov 16, 2020

Cache creation is never sent through the cross-site channel. The cache must be created manually in both sites.

Author

shivan commented Nov 17, 2020

Hi @pruivo,
thanks for the clarification. I have now removed "transport". Then I created a cache "MyCache" with

  • org.infinispan.DIST_SYNC

or even with

  • org.infinispan.REPL_SYNC

on both sides. Then created a key K1 on one side, but it won't appear on the other.

The logs show:

Server 1:

-------------------------------------------------------------------
GMS: address=_3ba606cfc4dd-53898:NYC, cluster=relay-global, physical address=10.35.50.130:7300
-------------------------------------------------------------------
...
07:36:53,451 INFO  (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.pbcast.GMS] _3ba606cfc4dd-53898:NYC: no members discovered after 3008 ms: creating cluster as coordinator
07:36:53,452 DEBUG (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.pbcast.NAKACK2]
[_3ba606cfc4dd-53898:NYC setDigest()]
existing digest:  []
new digest:       _3ba606cfc4dd-53898:NYC: [0 (0)]
resulting digest: _3ba606cfc4dd-53898:NYC: [0 (0)]
07:36:53,453 DEBUG (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.pbcast.GMS] _3ba606cfc4dd-53898:NYC: installing view [_3ba606cfc4dd-53898:NYC|0] (1) [_3ba606cfc4dd-53898:NYC]
07:36:53,453 DEBUG (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
07:36:53,487 DEBUG (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.pbcast.GMS] _3ba606cfc4dd-53898:NYC: created cluster (first member). My view is [_3ba606cfc4dd-53898:NYC|0], impl is CoordGmsImpl
07:36:53,488 INFO  (jgroups-7,3ba606cfc4dd-53898) [org.jgroups.protocols.relay.RELAY2] _3ba606cfc4dd-53898:NYC: joined bridge cluster 'relay-global'
07:36:53,492 INFO  (jgroups-6,3ba606cfc4dd-53898) [org.infinispan.XSITE] ISPN000439: Received new x-site view: [NYC]

Server 2:

-------------------------------------------------------------------
GMS: address=_c8b821a383cb-41966:LON, cluster=relay-global, physical address=10.35.50.130:7200
-------------------------------------------------------------------
...
07:37:27,701 INFO  (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.pbcast.GMS] _c8b821a383cb-41966:LON: no members discovered after 3006 ms: creating cluster as coordinator
07:37:27,702 DEBUG (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.pbcast.NAKACK2]
[_c8b821a383cb-41966:LON setDigest()]
existing digest:  []
new digest:       _c8b821a383cb-41966:LON: [0 (0)]
resulting digest: _c8b821a383cb-41966:LON: [0 (0)]
07:37:27,703 DEBUG (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.pbcast.GMS] _c8b821a383cb-41966:LON: installing view [_c8b821a383cb-41966:LON|0] (1) [_c8b821a383cb-41966:LON]
07:37:27,704 DEBUG (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
07:37:27,732 DEBUG (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.pbcast.GMS] _c8b821a383cb-41966:LON: created cluster (first member). My view is [_c8b821a383cb-41966:LON|0], impl is CoordGmsImpl
07:37:27,732 INFO  (jgroups-6,c8b821a383cb-41966) [org.jgroups.protocols.relay.RELAY2] _c8b821a383cb-41966:LON: joined bridge cluster 'relay-global'
07:37:27,736 INFO  (jgroups-7,c8b821a383cb-41966) [org.infinispan.XSITE] ISPN000439: Received new x-site view: [LON]
08:09:22,618 DEBUG (SINGLE_PORT-ServerIO-4-2) [org.infinispan.commons.dataconversion.MediaTypeResolver] Loaded mime.types with 983 file types

So both sides report "no members discovered"; they still cannot find each other.

Member

pruivo commented Nov 17, 2020

The port mapping is incorrect. The relay cluster always binds to port 7900 in the container.
Try to use -p 7300:7900 for the first command and -p 7200:7900 for the second.
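
As a sketch, the two docker commands from the first comment with only the relay port mapping changed. The helper xsite_docker_cmd is hypothetical and prints each command instead of running it, so the mapping can be checked first; every other value is copied from the original commands.

```shell
# Port the relay (cross-site) cluster binds to inside the container,
# as stated in the comment above.
RELAY_PORT=7900

# Print the docker command for one site instead of running it.
# Hypothetical helper, not part of the image.
# Args: container name, host REST port, host relay port, config file, network.
xsite_docker_cmd() {
  echo "docker run --rm -p $2:11222 -p $3:$RELAY_PORT" \
    "-v c:/work/InfiniSpan:/user-config --name $1" \
    "-e IDENTITIES_PATH=/user-config/identities.yaml" \
    "-e CONFIG_PATH=/user-config/$4" \
    "--network $5 infinispan/server:11.0.4.Final-2"
}

xsite_docker_cmd infinispan1 11222 7300 config1.yaml bridge    # NYC
xsite_docker_cmd infinispan2 12222 7200 config2.yaml Bridge2   # LON
```

Pipe a printed line to sh to start that container.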

The templates org.infinispan.DIST_SYNC and org.infinispan.REPL_SYNC don't have cross-site enabled. I'll give you a proper config to test once we get a proper x-site view :)

Author

shivan commented Nov 17, 2020

Looks much better now...

But why do I get a read timeout now?

09:31:17,420 DEBUG (jgroups-6,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.pbcast.STABLE] _e7257e072ef3-53682:NYC: resume task started, max_suspend_time=33000
09:31:17,428 DEBUG (jgroups-6,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.pbcast.GMS] _e7257e072ef3-53682:NYC: installing view [_e7257e072ef3-53682:NYC|1] (2) [_e7257e072ef3-53682:NYC, _fe644dc03cb6-33733:LON]
09:31:17,444 DEBUG (FD_SOCK pinger-13,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: pingable_mbrs=[_e7257e072ef3-53682:NYC, _fe644dc03cb6-33733:LON], ping_dest=_fe644dc03cb6-33733:LON
09:31:17,454 INFO  (jgroups-10,e7257e072ef3-53682) [org.infinispan.XSITE] ISPN000439: Received new x-site view: [LON, NYC]
09:31:17,524 DEBUG (jgroups-6,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
09:31:18,485 DEBUG (FD_SOCK pinger-13,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: _fe644dc03cb6-33733:LON closed socket (eof)
09:31:18,486 DEBUG (FD_SOCK pinger-13,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: broadcasting suspect(_fe644dc03cb6-33733:LON)
09:31:18,492 DEBUG (jgroups-6,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: suspecting [_fe644dc03cb6-33733:LON]
09:31:18,507 DEBUG (jgroups-6,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: broadcasting unsuspect(_fe644dc03cb6-33733:LON)
09:31:18,520 WARN  (TcpServer.Acceptor[7900]-2,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer Socket[addr=/172.17.0.1,port=59978,localport=7900]: java.net.SocketTimeoutException: Read timed out java.net.SocketTimeoutException: Read timed out
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200)
        at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:247)
        at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:53)
        at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:126)
        at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:111)
        at java.base/java.lang.Thread.run(Thread.java:834)

09:31:18,538 DEBUG (jgroups-15,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.FD_SOCK] _e7257e072ef3-53682:NYC: suspecting []
09:33:17,574 DEBUG (jgroups-15,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.UNICAST3] _e7257e072ef3-53682:NYC: closing expired connection for _fe644dc03cb6-33733:LON (120079 ms old) in send_table
09:33:17,575 DEBUG (jgroups-15,relay-global,_e7257e072ef3-53682:NYC) [org.jgroups.protocols.UNICAST3] _e7257e072ef3-53682:NYC: closing expired connection for _fe644dc03cb6-33733:LON (120079 ms old) in recv_table

Even when I run telnet localhost 7200 or telnet localhost 7300 from my system command line, I get this error:

[org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer Socket[addr=/172.17.0.1,port=60032,localport=7900]: java.net.SocketTimeoutException: Read timed out java.net.SocketTimeoutException: Read timed out

[org.jgroups.protocols.TCP] JGRP000006: failed accepting connection from peer Socket[addr=/172.19.0.1,port=53402,localport=7900]: java.net.SocketTimeoutException: Read timed out java.net.SocketTimeoutException: Read timed out

Member

pruivo commented Nov 17, 2020

No idea. It pops up at the beginning for some reason but it doesn't affect the cluster.
The telnet commands are expected to create the exception. JGroups expects some handshake data when accepting the connection and telnet doesn't send that.

To enable xsite in a cache, just create the cache with the following configuration.

On LON site:

<infinispan>
  <cache-container>
    <replicated-cache name="xsite-cache" statistics="true">
      <backups>
        <backup site="NYC" strategy="SYNC"/>
      </backups>
    </replicated-cache>
  </cache-container>
</infinispan>

On NYC site:

<infinispan>
  <cache-container>
    <replicated-cache name="xsite-cache" statistics="true">
      <backups>
        <backup site="LON" strategy="SYNC"/>
      </backups>
    </replicated-cache>
  </cache-container>
</infinispan>

You can use the console (http://localhost:11222) or curl (or a similar tool):
curl -XPOST -u <user>:<password> -H "Content-Type: application/xml" -d "@<path/to/cache.xml>" http://localhost:11222/rest/v2/caches/xsite-cache
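
Once the cache exists on both sites, replication can be checked by writing a key on one site and reading it from the other. The sketch below assumes the host ports from the docker commands in this thread (11222 for NYC, 12222 for LON) and the per-entry endpoint of the REST v2 API (/rest/v2/caches/{cache}/{key}). The helpers entry_url and check_xsite are hypothetical, and depending on how the server authenticates you may also need curl's --digest flag.

```shell
CACHE=xsite-cache
NYC_PORT=11222   # host port mapped to the NYC container
LON_PORT=12222   # host port mapped to the LON container

# Build the REST v2 URL of one cache entry. Args: host port, key.
entry_url() {
  echo "http://localhost:$1/rest/v2/caches/$CACHE/$2"
}

# Write a key on NYC, then read it back from LON.
# Args: user:password, key, value.
check_xsite() {
  curl -sf -u "$1" -X PUT -H "Content-Type: text/plain" -d "$3" \
    "$(entry_url "$NYC_PORT" "$2")" &&
  curl -sf -u "$1" "$(entry_url "$LON_PORT" "$2")"
}
```

For example, check_xsite '<user>:<password>' K1 hello should print the stored value if the entry reached LON, and return a non-zero status otherwise.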

Author

shivan commented Nov 17, 2020

Thanks.

Looks good now. I cannot see the entries under "Activity", but I do see "Entries (4)", and when I search for the key, it appears!

Entered one item at NYC and searched for it on LON. WORKED!

Many thanks!

I think that example should be added to the readme at the section XSite Replication. I think I'll create a PR for that.

shivan linked a pull request Nov 17, 2020 that will close this issue