Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result file was not created for successful/failed run of gemini #236

Open
aleksbykov opened this issue Apr 23, 2020 · 9 comments
Open

Result file was not created for successful/failed run of gemini #236

aleksbykov opened this issue Apr 23, 2020 · 9 comments
Labels
bug Something isn't working
Milestone

Comments

@aleksbykov
Copy link
Contributor

gemini tool : v.1.6.9
gemini cmd:

/$HOME/gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.67.95 --outfile /home/centos/gemini_result_a8d56584-9884-4c15-9747-59856e8d0791.log --seed 70 --oracle-cluster=10.0.129.183 '

Gemini finished exit status 0, but result log file was not created:

during run several actions for test cluster node were applyed:
NoCorruptRepair TerminateAndReplaceNode CorruptThenRepair TruncateMonkey AbortRepairMonkey

from output:

{"L":"INFO","T":"2020-04-22T09:07:43.538Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-04-22T09:37:43.661Z","M":"Warmup done"}
2020/04/22 10:17:17 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
2020/04/22 10:17:54 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"generator","M":"stopping partition key generation loop","keys_created":13493536991,"keys_emitted":4808295}
{"L":"INFO","T":"2020-04-22T12:37:51.941Z","M":"All jobs complete"}
(END)
@aleksbykov aleksbykov changed the title Result file was not created for successful run of gemini Result file was not created for successful/failed run of gemini May 6, 2020
@aleksbykov
Copy link
Contributor Author

Gemini detect error, but result file was not created and no any describe messages in log:
gemini command:

gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.149.48 --outfile /home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log --seed 36 --oracle-cluster=10.0.138.207

gemini result in console log:

Seed:			36
Maximum duration:	3h0m0s
Warmup duration:	30m0s
Concurrency:		100
Test cluster:		[10.0.149.48]
Oracle cluster:		[10.0.138.207]
Output file:		/home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log
Schema: {
    "keyspace": {
        "name": "ks1",
        "replication": {
            "class": "SimpleStrategy",
            "replication_factor": "3"
        },
        "oracle_replication": {
            "class": "SimpleStrategy",
            "replication_factor": "1"
        }
    },
    "tables": [
        {
            "name": "table1",
            "partition_keys": [
                {
                    "name": "pk0",
                    "type": "tinyint"
                },
                {
                    "name": "pk1",
                    "type": "tinyint"
                },
                {
                    "name": "pk2",
                    "type": "tinyint"
                }
            ],
            "clustering_keys": [
                {
                    "name": "ck0",
                    "type": "blob"
                },
                {
                    "name": "ck1",
                    "type": "decimal"
                }
            ],
            "columns": [
                {
                    "name": "col0",
                    "type": "blob"
                },
                {
                    "name": "col1",
                    "type": "uuid"
                },
                {
                    "name": "col2",
                    "type": "inet"
                },
                {
                    "name": "col3",
                    "type": {
                        "types": [
                            "date",
                            "decimal",
                            "decimal",
                            "timestamp",
                            "double",
                            "float",
                            "float",
                            "timestamp"
                        ],
                        "frozen": false
                    }
                },
                {
                    "name": "col4",
                    "type": {
                        "key_type": "bigint",
                        "value_type": "double",
                        "frozen": false
                    }
                },
                {
                    "name": "col5",
                    "type": {
                        "key_type": "bigint",
                        "value_type": "tinyint",
                        "frozen": false
                    }
                },
                {
                    "name": "col6",
                    "type": {
                        "types": [
                            "smallint",
                            "timestamp",
                            "decimal",
                            "inet",
                            "int",
                            "inet",
                            "varint",
                            "decimal",
                            "uuid"
                        ],
                        "frozen": true
                    }
                },
                {
                    "name": "col7",
                    "type": "float"
                },
                {
                    "name": "col8",
                    "type": "varchar"
                },
                {
                    "name": "col9",
                    "type": "tinyint"
                },
                {
                    "name": "col10",
                    "type": {
                        "key_type": "date",
                        "value_type": "decimal",
                        "frozen": true
                    }
                },
                {
                    "name": "col11",
                    "type": {
                        "types": {
                            "udt_3606525730_0": "double",
                            "udt_3606525730_1": "varint",
                            "udt_3606525730_2": "duration",
                            "udt_3606525730_3": "timeuuid",
                            "udt_3606525730_4": "text",
                            "udt_3606525730_5": "tinyint",
                            "udt_3606525730_6": "timeuuid",
                            "udt_3606525730_7": "tinyint",
                            "udt_3606525730_8": "float"
                        },
                        "type_name": "udt_3606525730",
                        "frozen": true
                    }
                },
                {
                    "name": "col12",
                    "type": "ascii"
                }
            ],
            "indexes": [
                {
                    "name": "col7_idx",
                    "column": "col7",
                    "column_idx": 7
                },
                {
                    "name": "col9_idx",
                    "column": "col9",
                    "column_idx": 9
                }
            ],
            "materialized_views": [
                {
                    "name": "table1_mv_0",
                    "partition_keys": [
                        {
                            "name": "col1",
                            "type": "uuid"
                        },
                        {
                            "name": "pk0",
                            "type": "tinyint"
                        },
                        {
                            "name": "pk1",
                            "type": "tinyint"
                        },
                        {
                            "name": "pk2",
                            "type": "tinyint"
                        }
                    ],
                    "clustering_keys": [
                        {
                            "name": "ck0",
                            "type": "blob"
                        },
                        {
                            "name": "ck1",
                            "type": "decimal"
                        }
                    ]
                }
            ],
            "known_issues": {
                "https://github.com/scylladb/scylla/issues/3708": true
            },
            "table_options": null
        }
    ]
}
{"L":"INFO","T":"2020-05-05T20:48:35.366Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-05-05T21:18:35.629Z","M":"Warmup done"}
{"L":"WARN","T":"2020-05-05T22:16:08.263Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"generator","M":"stopping partition key generation loop","keys_created":11285996639,"keys_emitted":4351495}
{"L":"INFO","T":"2020-05-06T00:18:43.688Z","M":"All jobs complete"}
Error: gemini encountered errors, exiting with non zero status
gemini encountered errors, exiting with non zero status

When error detected, on one of test cluster node was running ENOSPC nemesis( simulate the end of space on node). But 2 other nodes are running without errors and not any errors in logs on them.

@dahankzter dahankzter added this to the 1.7 milestone May 14, 2020
@dahankzter
Copy link
Contributor

This should be handled better now after application of PR #239 perhaps we can close this and repoen it after trying in the upcoming release @aleksbykov ?

@aleksbykov
Copy link
Contributor Author

@dahankzter yes we can close it

@aleksbykov
Copy link
Contributor Author

@dahankzter , unfortunately the issue is still reproduced.Not always

Seed:			54
Maximum duration:	3h0m0s
Warmup duration:	30m0s
Concurrency:		100
Test cluster:		[10.0.252.39]
Oracle cluster:		[10.0.130.36]
Output file:		/home/centos/gemini_result_dcaeb75a-f9d5-4dc7-9008-7f15586bdd49.log
Schema: {
    "keyspace": {
        "name": "ks1",
        "replication": {
            "class": "SimpleStrategy",
            "replication_factor": "3"
        },
        "oracle_replication": {
            "class": "SimpleStrategy",
            "replication_factor": "1"
        }
    },
    "tables": [
        {
            "name": "table1",
            "partition_keys": [
                {
                    "name": "pk0",
                    "type": "smallint"
                },
                {
                    "name": "pk1",
                    "type": "smallint"
                },
                {
                    "name": "pk2",
                    "type": "tinyint"
                }
            ],
            "clustering_keys": [
                {
                    "name": "ck0",
                    "type": "smallint"
                },
                {
                    "name": "ck1",
                    "type": "float"
                }
            ],
            "columns": [
                {
                    "name": "col0",
                    "type": {
                        "types": {
                            "udt_3625021091_0": "tinyint",
                            "udt_3625021091_1": "timeuuid",
                            "udt_3625021091_2": "inet"
                        },
                        "type_name": "udt_3625021091",
                        "frozen": true
                    }
                },
                {
                    "name": "col1",
                    "type": {
                        "types": [
                            "timeuuid",
                            "date",
                            "uuid",
                            "text",
                            "inet",
                            "varchar",
                            "date",
                            "text",
                            "inet",
                            "inet",
                            "varchar"
                        ],
                        "frozen": false
                    }
                },
                {
                    "name": "col2",
                    "type": "varchar"
                },
                {
                    "name": "col3",
                    "type": "varint"
                },
                {
                    "name": "col4",
                    "type": {
                        "key_type": "varint",
                        "value_type": "text",
                        "frozen": true
                    }
                },
                {
                    "name": "col5",
                    "type": "varint"
                },
                {
                    "name": "col6",
                    "type": "bigint"
                },
                {
                    "name": "col7",
                    "type": "date"
                },
                {
                    "name": "col8",
                    "type": {
                        "types": {
                            "udt_3663736436_0": "uuid",
                            "udt_3663736436_1": "blob",
                            "udt_3663736436_2": "tinyint",
                            "udt_3663736436_3": "boolean",
                            "udt_3663736436_4": "timeuuid",
                            "udt_3663736436_5": "float",
                            "udt_3663736436_6": "timeuuid",
                            "udt_3663736436_7": "duration",
                            "udt_3663736436_8": "text"
                        },
                        "type_name": "udt_3663736436",
                        "frozen": true
                    }
                },
                {
                    "name": "col9",
                    "type": "float"
                },
                {
                    "name": "col10",
                    "type": "varchar"
                },
                {
                    "name": "col11",
                    "type": "float"
                },
                {
                    "name": "col12",
                    "type": "float"
                }
            ],
            "indexes": [
                {
                    "name": "table1_col3_idx",
                    "column": "col3",
                    "column_idx": 3
                },
                {
                    "name": "table1_col5_idx",
                    "column": "col5",
                    "column_idx": 5
                },
                {
                    "name": "table1_col9_idx",
                    "column": "col9",
                    "column_idx": 9
                },
                {
                    "name": "table1_col11_idx",
                    "column": "col11",
                    "column_idx": 11
                },
                {
                    "name": "table1_col12_idx",
                    "column": "col12",
                    "column_idx": 12
                }
            ],
            "materialized_views": [
                {
                    "name": "table1_mv_0",
                    "partition_keys": [
                        {
                            "name": "col10",
                            "type": "varchar"
                        },
                        {
                            "name": "pk0",
                            "type": "smallint"
                        },
                        {
                            "name": "pk1",
                            "type": "smallint"
                        },
                        {
                            "name": "pk2",
                            "type": "tinyint"
                        }
                    ],
                    "clustering_keys": [
                        {
                            "name": "ck0",
                            "type": "smallint"
                        },
                        {
                            "name": "ck1",
                            "type": "float"
                        }
                    ]
                }
            ],
            "known_issues": {
                "https://github.com/scylladb/scylla/issues/3708": true
            }
        }
    ]
}
{"L":"INFO","T":"2020-07-02T10:02:00.806Z","N":"generator","M":"starting partition key generation loop"}
{"L":"WARN","T":"2020-07-02T10:32:07.757Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:07.757Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:09.269Z","M":"result channel closed"}
gemini encountered errors, exiting with non zero status
Error: gemini encountered errors, exiting with non zero status

@roydahan
Copy link
Collaborator

roydahan commented Oct 1, 2020

@slivne / @penberg do you have someone that can handle this?
Some jobs are failing and we can't tell why and where was the error.
It could hide a real scylla bug.

aleksbykov pushed a commit to aleksbykov/gemini that referenced this issue Dec 24, 2020
Due to issue scylladb#236
gemini result file could be not created.

Let print the Sample result to stdout if error detected.
This allow to investigate issue while the scylladb#236 is not fixed
aleksbykov pushed a commit to aleksbykov/gemini that referenced this issue Jan 4, 2021
Due to issue scylladb#236
gemini result file could be not created.

Let print the Sample result to stdout if error detected.
This allow to investigate issue while the scylladb#236 is not fixed
penberg pushed a commit that referenced this issue Jan 4, 2021
Due to issue #236
gemini result file could be not created.

Let print the Sample result to stdout if error detected.
This allow to investigate issue while the #236 is not fixed
@dkropachev
Copy link
Collaborator

@roydahan, @aleksbykov , is this still the case ? I see we are doing .Sync on the end, so it should work properly, unless there is underlying golang issue that makes it loose data on exit.

@aleksbykov
Copy link
Contributor Author

@dkropachev , yes, again got such results. Looks like gemini stucked on exiting:
Job: https://jenkins.scylladb.com/job/enterprise-2023.1/job/gemini-/job/gemini-3h-with-nemesis-test/10/

gemini output:

}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}
(E}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}

and there is no result file.

@dkropachev
Copy link
Collaborator

@aleksbykov , Absense of result channel closed gives me a clue that workgroup did not quit on waiting:

		if err := g.Wait(); err != nil {
			logger.Debug("error detected", zap.Error(err))
		}

That part was redone using different mechanism at ef29ea5

So, i could have gone on v.1.8.0, could you please upgrade gemini on 2023.1 branch and let's see if it happens there.

@dkropachev
Copy link
Collaborator

dkropachev commented Jul 8, 2023

@aleksbykov , did it happen recently ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants