Autosharding tests: inputs & syntax validation #158

Open
6 tasks done
donhardman opened this issue Sep 26, 2023 · 21 comments
Labels
bug Something isn't working

Comments

@donhardman
Contributor

donhardman commented Sep 26, 2023

  • Validate the execution of the "create table" command.
  • Ensure the minimum/maximum values for the replication factor and shards are set.
  • Verify the error handling for incorrect syntax.
  • Differentiate between local sharded and distributed tables based on syntax.
  • Check the error prompt when the replication factor exceeds the number of available nodes.
  • Handle errors for incorrect or unsupported values.
@donhardman
Contributor Author

donhardman commented Sep 26, 2023

The syntax for sharding is CREATE TABLE A (...) shards=N rf=N

  • where N is a positive number
  • (...) may be omitted
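
As a rough illustration of the syntax above, here is a minimal shell sketch of a pattern check. The function name `is_sharded_create` and the regular expression are hypothetical and are not how Buddy or the daemon parse queries; the sketch assumes that `shards` comes before `rf`, that both are required, and that an optional `cluster:` prefix may precede the table name.

```shell
# Hypothetical helper (not part of Manticore or Buddy): succeeds when the
# statement looks like CREATE TABLE <name> [(...)] shards=N rf=N, where N is
# a positive integer. Matching is case-insensitive.
is_sharded_create() {
  printf '%s\n' "$1" | grep -Eiq \
    '^[[:space:]]*create[[:space:]]+table[[:space:]]+([a-z_][a-z0-9_]*:)?[a-z_][a-z0-9_]*[[:space:]]*(\(.*\))?[[:space:]]*shards[[:space:]]*=[[:space:]]*[1-9][0-9]*[[:space:]]+rf[[:space:]]*=[[:space:]]*[1-9][0-9]*[[:space:]]*;?[[:space:]]*$'
}

# Example: report whether a statement matches the assumed pattern.
if is_sharded_create "create table a shards=3 rf=2"; then
  echo "looks like a sharded CREATE TABLE"
fi
```

A check like this is only useful for sanity-checking test inputs; the real behavior has to be confirmed against the daemon.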

The tests should cover the following cases and ensure that we have implemented the syntax correctly. They should return an error when required, and also perform the logic correctly when there are no errors:

  • create table a or default syntax works as expected
  • create table a random=X does not work
  • create table a shards=N rf=N works fine with different N as positive numbers
  • create table a SHARDS=x RF=x works fine also
  • create table a (...) shards=N rf=N works fine
  • create table a (...) shards=T rf=T does not work when T is a non-numeric value
  • create table a (...) shards=N rf=N with very high values should not work (what is the maximum allowed?)
  • create table a shards=N rf=2 on a single node cannot be executed (rf=2 requires two nodes)
  • create table c:a shards=N rf=3 on a two-node cluster should show an error
  • create table c:a SHARDS=x RF=x does not work when no cluster named c exists

Should there be additional cases to examine within the syntax/validation group, we ought to include those as well.
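
The replication-factor cases in the list above (rf=2 on a single node, rf=3 on two nodes) reduce to one rule: rf must not exceed the number of available nodes. A minimal sketch of that expectation follows; the helper name `rf_allowed` is hypothetical, and the upper limit on shards and rf is deliberately left out because it is still an open question here.

```shell
# Hypothetical helper: rf_allowed RF NODES
# Succeeds when RF is at least 1 and does not exceed the number of nodes.
rf_allowed() {
  wanted_rf=$1
  node_count=$2
  [ "$wanted_rf" -ge 1 ] && [ "$wanted_rf" -le "$node_count" ]
}

# Example: decide which outcome a test case should expect.
if rf_allowed 2 1; then
  echo "expect success"
else
  echo "expect an error"
fi
```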

@PavelShilin89

Colleagues, I spent a lot of time on this and tried different approaches. However, the tests with create table a shards=N rf=N and create table a SHARDS=x RF=x fail. The subsequent tests build on these, so I can't run them yet.

Logs:

/Users/pavelshilin/Desktop/WORK/clt/clt test -d -t ./test/clt-tests/sharding/syntax/queries.rec ghcr.io/manticoresoftware/manticoresearch:test-kit-latest
Replaying data from the file: ./test/clt-tests/sharding/syntax/queries.rec
The replay result will be stored to the file: ./test/clt-tests/sharding/syntax/queries.rep
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
––– input –––
set -b
––– output –––
––– input –––
apt-get update -y > /dev/null; echo $?
––– output –––
0
––– input –––
apt-get install -y git > /dev/null; echo $?
––– output –––
0
––– input –––
git clone -b fix/sharding -q https://github.com/manticoresoftware/manticoresearch-buddy.git /workdir; echo $?
––– output –––
0
––– input –––
cd /workdir; echo $?
––– output –––
0
––– input –––
COMPOSER_HOME="$HOME/.config/composer" composer install -q; echo $?
––– output –––
0
––– input –––
cd /.clt; echo $?
––– output –––
0
––– input –––
mkdir -p /var/{run,lib,log}/manticore-1/{a,b,c,d,e}
––– output –––
––– input –––
searchd -c test/clt-tests/sharding/base/config/searchd-1.conf
––– output –––
Manticore %{SEMVER} #!/[a-z0-9]{7,9}@[0-9]{6}/!# dev (columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE})
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-%{YEAR}, Manticore Software LTD (https://manticoresearch.com)
[#!/[0-9]{2}:[0-9]{2}.[0-9]{3}/!#] [%{NUMBER}] using config file '%{PATH}' (%{NUMBER} chars)...
starting daemon version '%{SEMVER} #!/[a-z0-9]{7,9}@[0-9]{6}/!# dev (columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE})' ...
listening on all interfaces for mysql, port=%{NUMBER}
listening on all interfaces for sphinx and http(s), port=%{NUMBER}
listening on all interfaces for sphinx and http(s), port=%{NUMBER}
––– input –––
tail -n 100 -f /var/log/manticore-1/searchd.log | grep -qm1 'started v' && echo "Buddy started!"
––– output –––
Buddy started!
––– input –––
mysql -h0 -P19306 -e "CREATE TABLE a"
––– output –––
––– input –––
mysql -h0 -P19306 -e "SHOW TABLES\G"
––– output –––
*************************** 1. row ***************************
Index: a
Type: rt
––– input –––
mysql -h0 -P19306 -e "сreate table b shards=3 rf=2"
––– output –––
- dsadfds
+ ERROR 1064 (42000) at line 1: P02: syntax error, unexpected $end near '(null)'
––– input –––
mysql -h0 -P19306 -e "create table c SHARDS=3 RF=2"
––– output –––
- dsadfds
+ ERROR 1064 (42000) at line 1: Waiting timeout exceeded.

@donhardman
Contributor Author

Can I have the original queries.rec file to reproduce this on my side?

@PavelShilin89

Here is my file

queries.rec.zip

@sanikolaev
Collaborator

The solution was discussed in Slack - https://manticoresearch.slack.com/archives/C06270TRJAD/p1697711746155199 :

Here's how you can start Manticore to prepare autosharding tests locally:

snikolaev@dev2:/tmp$ git clone https://github.com/manticoresoftware/manticoresearch

snikolaev@dev2:/tmp$ cd manticoresearch/

snikolaev@dev2:/tmp/manticoresearch$ git checkout 181365f0b5b54f787f53b39c038f12c0fabbe206

snikolaev@dev2:/tmp/manticoresearch$ docker pull ghcr.io/manticoresoftware/manticoresearch:test-kit-8953ac8

snikolaev@dev2:/tmp/manticoresearch$ docker run -it -v $(pwd):/.clt/ ghcr.io/manticoresoftware/manticoresearch:test-kit-8953ac8 bash

root@444df2769cc2:/# apt-get update -y && apt-get install -y git
Get:1 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy InRelease [1838 kB]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:5 http://repo.manticoresearch.com/repository/manticoresearch_jammy_dev jammy/main amd64 Packages [3082 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1271 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1351 kB]
Fetched 7880 kB in 1s (9425 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (1:2.34.1-1ubuntu1.10).
The following package was automatically installed and is no longer required:
  manticore-executor
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.

root@444df2769cc2:/# git clone -b fix/sharding -q https://github.com/manticoresoftware/manticoresearch-buddy.git /workdir
root@444df2769cc2:/# cd /workdir/
root@444df2769cc2:/workdir# git checkout 824731fd30d321400ba62272cc353fdc6493e3c7

root@444df2769cc2:/workdir# COMPOSER_HOME="$HOME/.config/composer" composer install -q

root@444df2769cc2:/workdir# mkdir -p /var/{run,lib,log}/manticore-1/{a,b,c,d,e}

root@444df2769cc2:/workdir# cd /.clt/

root@444df2769cc2:/.clt# searchd -c test/clt-tests/sharding/base/config/searchd-1.conf
Manticore 6.2.13 8953ac8e6@231019 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com/)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com/)

[31:57.500] [579] using config file '/.clt/test/clt-tests/sharding/base/config/searchd-1.conf' (440 chars)...
starting daemon version '6.2.13 8953ac8e6@231019 dev (columnar 2.2.5 b8be4eb@230928) (secondary 2.2.5 b8be4eb@230928)' ...
listening on all interfaces for mysql, port=19306
listening on all interfaces for sphinx and http(s), port=19312
listening on all interfaces for sphinx and http(s), port=19308

root@444df2769cc2:/.clt# mysql -h0 -P19306 -e "create table t(id bigint) shards=10 rf=1"
root@444df2769cc2:/.clt# mysql -h0 -P19306 -e "show tables\G" | grep 'Index: t' | sort
Index: t
Index: t_s0
Index: t_s1
Index: t_s2
Index: t_s3
Index: t_s4
Index: t_s5
Index: t_s6
Index: t_s7
Index: t_s8
Index: t_s9
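
To compare against output like the listing above, the expected table names can be generated rather than typed by hand. This is a minimal sketch; the helper name `expected_shard_tables` is hypothetical, and the `NAME_sN` pattern is taken only from the single-node, rf=1 listing above, so it is an assumption that it holds for other setups.

```shell
# Hypothetical helper: expected_shard_tables NAME SHARDS
# Prints NAME followed by NAME_s0 .. NAME_s(SHARDS-1), one per line,
# mirroring the table names seen in the listing above.
expected_shard_tables() {
  base_name=$1
  shard_count=$2
  printf '%s\n' "$base_name"
  shard_idx=0
  while [ "$shard_idx" -lt "$shard_count" ]; do
    printf '%s_s%s\n' "$base_name" "$shard_idx"
    shard_idx=$((shard_idx + 1))
  done
}

# Example: print the names expected for "create table t(id bigint) shards=10 rf=1".
expected_shard_tables t 10
```

With more than 10 shards, pipe both sides through sort before comparing, since t_s10 sorts before t_s2.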

@sanikolaev
Collaborator

The purpose of this task is to test the syntax. Whether a table is created or not is covered in another test - #159

@PavelShilin89 pls ask @donhardman to review your PR once the tests are done.

@sanikolaev
Collaborator

@PavelShilin89 What are you waiting for from @donhardman? I see you've assigned this issue to him but didn't leave a comment.

@donhardman
Contributor Author

Which pull request is related to this task?

@PavelShilin89

@donhardman all the changes were in https://github.com/manticoresoftware/manticoresearch/pull/1595/files. I'm already looking at your comments on the pull request.

@PavelShilin89

Next, I created a new branch and a new pull request (https://github.com/manticoresoftware/manticoresearch/pull/1642/files) to separate our tasks.

@sanikolaev
Collaborator

sanikolaev commented Dec 1, 2023

@PavelShilin89 it's not clear what each PR is about:
[image]

Please update the titles.

@PavelShilin89

I removed the old sharding syntax branch and its pull request. I kept a single current branch, sharding syntax new, with one up-to-date pull request in it.

@PavelShilin89

@donhardman Please check the tests in the test/sharding-syntax-new branch for your part:

/Users/pavelshilin/Desktop/WORK/manticoresearch/test/clt-tests/sharding/syntax/queries.rec
/Users/pavelshilin/Desktop/WORK/manticoresearch/test/clt-tests/sharding/syntax/queries-syntax.rec
/Users/pavelshilin/Desktop/WORK/manticoresearch/test/clt-tests/sharding/syntax/queries-negative-test.rec

Please give feedback on whether the behavior and error messages are as intended.

@donhardman
Contributor Author

As we discussed with Sergey, we'll stick to the plan where both shards and rf need to be integers. So the error below is the expected behavior, and the query is not sent to Buddy at all.

––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:m(id bigint) SHARDS='10' RF='2'"
––– output –––
- ERROR 1064 (42000) at line 1: Failed to parse query
+ ERROR 1064 (42000) at line 1: P03: syntax error, unexpected BIGINT, expecting '=' near 'bigint SHARDS = '10' RF = '2''
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:j(id bigint) SHARDS='10' RF='3'"
––– output –––
- ERROR 1064 (42000) at line 1: Failed to parse query
+ ERROR 1064 (42000) at line 1: P03: syntax error, unexpected BIGINT, expecting '=' near 'bigint SHARDS = '10' RF = '3''
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:x(id bigint) SHARDS='10' RF='10'"
––– output –––
- ERROR 1064 (42000) at line 1: Failed to parse query
+ ERROR 1064 (42000) at line 1: P03: syntax error, unexpected BIGINT, expecting '=' near 'bigint SHARDS = '10' RF = '10''

  1. Works fine, no errors when I run it

This is expected behaviour; please update the tests.

––– input –––
mysql -h0 -P1306 -e "create table l(id bigint) SHARDS=-10 RF=-1"
––– output –––
- ERROR 1064 (42000) at line 1: Failed to parse query
+ ERROR 1064 (42000) at line 1: P03: syntax error, unexpected BIGINT, expecting '=' near 'bigint SHARDS = '-10' RF = '-1''
––– input –––

Also expected, please update. The query is valid, so getting no error is correct:

––– input –––
mysql -h0 -P1306 -e "create table c:m SHARDS=10 RF=2"
––– output –––
- ERROR 1064 (42000) at line 1: Cluster 'c' does not exist
––– input –––

This one is also valid, because we now just proxy the query to the daemon, and the daemon ignores options that it cannot handle. Please update this case:

––– input –––
mysql -h0 -P1306 -e "create table h random=5"
––– output –––
- ERROR 1064 (42000) at line 1: You cannot set rf greater than 1 when creating single node sharded table.
––– input –––

This looks weird to me; I will check and fix it if there is an issue:

––– input –––
mysql -h0 -P1306 -e "create table t(id bigint) shards=a rf=b"
––– output –––
- ERROR 1064 (42000) at line 1: Failed to parse query

@donhardman
Contributor Author

I have fixed the shards=a rf=b case; please validate the other fixes.

@PavelShilin89

@donhardman I noticed that in the test /Users/pavelshilin/Desktop/WORK/manticoresearch/test/clt-tests/sharding/syntax/queries-negative-test.rec after updating ghcr.io/manticoresoftware/manticoresearch:test-kit-latest the command results have changed and some of them are not as expected. Please check.

@PavelShilin89

@donhardman In the ./test/clt-tests/sharding/syntax/queries.rec test on the test/sharding-syntax-new branch, some of the queries that create a sharded table in a cluster on different nodes fail with the error:

ERROR 1064 (42000) at line 1: Waiting timeout exceeded.

@PavelShilin89 PavelShilin89 removed their assignment Mar 14, 2024
@PavelShilin89 PavelShilin89 added the bug Something isn't working label Mar 14, 2024
@donhardman
Contributor Author

Here's the situation:
We have Servers #1 and #2 clustered on Cluster C. Server #1 successfully creates a local sharded table, but Server #2 cannot create a sharded table, and that's expected.

The problem starts when a daemon syncs the tables to Server #1 and overwrites everything there. Consequently, the buddy daemon, already loaded with the information that it's the master node (since it was set up as local), fails to recognize the cluster setup. Therefore, we can't detect this issue unless we run a consistency check, which isn't ideal.

The process here seems flawed. We shouldn't be testing this, or we need to implement a way to validate consistency changes on the buddy daemon—which isn't a great solution either. Alternatively, we could simply throw an error when attempting actions that shouldn't be executed in the first place.

@sanikolaev
Collaborator

Here's the situation:

To simplify:

  • you have a cluster of 2 nodes
  • on node 1: create table t shards=2 rf=1 (w/o prefix "c:")
  • on node 2: create table c:t shards=2 rf=2

This sequence causes the issue. We are not going to add support for it for now, as it requires significant changes, including architectural ones. We'd better just document this edge case.

@PavelShilin89 please update the test accordingly.

@donhardman pls update it in the docs.
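
For the test update, the edge case above can be described as "the same table name created once without a cluster prefix and once with one". Below is a minimal sketch that flags such a pair of identifiers. All helper names are hypothetical, and the word "local" is used here only as a marker for "no prefix"; it is not a Manticore concept.

```shell
# Hypothetical helpers, for illustration only.

# table_scope IDENT: prints the cluster name for "cluster:table",
# or "local" when there is no "cluster:" prefix.
table_scope() {
  case "$1" in
    *:*) printf '%s\n' "${1%%:*}" ;;
    *)   printf 'local\n' ;;
  esac
}

# table_name IDENT: prints the table name without any "cluster:" prefix.
table_name() {
  printf '%s\n' "${1#*:}"
}

# mixed_scope_conflict A B: succeeds when A and B refer to the same table
# name but differ in scope (for example "t" and "c:t").
mixed_scope_conflict() {
  [ "$(table_name "$1")" = "$(table_name "$2")" ] &&
    [ "$(table_scope "$1")" != "$(table_scope "$2")" ]
}

# Example: the pair from the edge case above.
if mixed_scope_conflict "t" "c:t"; then
  echo "unsupported combination: local and clustered table with the same name"
fi
```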

@sanikolaev
Collaborator

sanikolaev commented Apr 16, 2024

@donhardman as discussed, pls also explain to Pavel the difference between the syntax and functionality tests that were designed initially.

@donhardman
Contributor Author

As we discussed earlier, we decided it's a good idea to align the tests with the original focus of this issue: inputs and syntax validation.

This means we already have tests covering replication logic and cluster creation, so this task is mainly about syntax. What we should check is just the syntax, ensuring it works as expected without generating any results.

Additionally, in the case where we create a local table and then try to create a clustered table on another node – this is a current limitation. I advise grouping the tests by requirements in different files for easier understanding. Also, to test the syntax, we don't need to start 8 cluster nodes, so let's save some time by running syntax checks only.

We have other checks that perform multi-node tests. If you want to add tests with 8 cluster nodes, feel free to add an extra .rec file there. Currently, we're testing up to 5 nodes.

As for a pull request or documentation, we don't have that yet because we're still waiting until all the tests are completed to ensure it works properly. I tried searching for it but couldn't find anything. I've created a task to implement it.

@donhardman donhardman removed their assignment Apr 16, 2024