Query stalls in Presto 0.108 #3191
Comments
I also encountered a similar problem (https://groups.google.com/forum/#!topic/presto-users/Aw3XOhl5j7w). I'm using Presto 0.108 + jdk1.8.0_45 + HDP 2.1 + CentOS 6.5 with the Hive connector. Some queries never return on Presto 0.108, but they do return on Presto 0.107. Thanks.
Could you look in the web UI and tell me what the status of the query is? In particular, it would be helpful to know:
Also, when you say "stall", do you mean that the query never finishes, or does it just take a long time?
I see. What state are the tasks in the child stage in?
All of the child tasks are in the RUNNING state. I'm going to check the heap dump or the thread stacks (/v1/thread) to see where the query is blocked.
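When attaching jstack or calling `/v1/thread` is not convenient, similar thread-state information is available in-process through the JDK's `ThreadMXBean`. The following is a minimal sketch, assuming the code runs inside the JVM you want to inspect (for example, a test harness around an embedded server). `ThreadDumpSummary` and its methods are hypothetical names; this is not how Presto's `/v1/thread` endpoint is implemented.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: summarizes an in-process thread dump (similar data to jstack).
public class ThreadDumpSummary
{
    // Counts live threads per Thread.State
    public static Map<Thread.State, Integer> countByState()
    {
        ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) includes stack traces
        for (ThreadInfo info : mxBean.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Lists "thread name -> top stack frame" for threads in the given state;
    // many worker threads sharing the same top frame hints at where they are stuck
    public static List<String> topFrames(Thread.State state)
    {
        List<String> result = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != state) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            result.add(info.getThreadName() + " -> " + top);
        }
        return result;
    }

    public static void main(String[] args)
    {
        System.out.println(countByState());
        topFrames(Thread.State.BLOCKED).forEach(System.out::println);
    }
}
```

The thread that takes the dump always appears as RUNNABLE, so a single RUNNABLE thread by itself is not evidence of a hang.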
I'm seeing an issue where stages never get any tasks scheduled. It might be the same problem; it did not happen with 0.107. It occurred twice after a restart, when we sent ~10 queries all at once. All of them were stuck in the same way and never recovered, and no other queries were running. Subsequent queries were fine.
@xerial, also take a look at the task info (a JSON document), which you can find by clicking the task's link on the query page. If you don't mind sharing its contents, that would also help with debugging.
My case is shown in this screenshot: http://gyazo.com/83a8c1e2b52b2a3e28821cf954fd5f34
The Presto query is the following:
@cberner
Here is the JSON data for the tasks of a stalled query. It looks like the child tasks have already finished processing the data, but the parent task is still waiting for the result.
@xerial, it looks like the problem is that the TableScan is hung or slow. Your screenshot shows that none of the splits have finished running, and the JSON stats suggest that none of the tasks have produced any output. For example:
@yuananf, we have seen that one internally. There is a bug where a stage fails immediately during setup, but the query misses the notification. @xerial, open the info for one of the table scan tasks (click the task ID in the table), look for "getOutputCalls", then wait a minute and check it again. The counter is incremented each time the operator's output method is called. There are stats for each of an operator's main methods, so with this technique you can see whether the operators are still being invoked. If they are not, it is likely that all of the worker threads are hung.
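The progress check described above amounts to comparing two snapshots of a counter. A minimal sketch, assuming you have copied the per-operator `getOutputCalls` values from a task's info page by hand, a minute apart, into two maps keyed by operator name. `OperatorProgressCheck` is a hypothetical helper, and the operator names in `main` are only illustrative inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: finds operators whose call counter did not move between snapshots.
public class OperatorProgressCheck
{
    public static List<String> stalledOperators(Map<String, Long> before, Map<String, Long> after)
    {
        List<String> stalled = new ArrayList<>();
        // TreeMap gives a stable, sorted order for the report
        for (Map.Entry<String, Long> entry : new TreeMap<>(before).entrySet()) {
            Long later = after.get(entry.getKey());
            // Only compare operators present in both snapshots
            if (later != null && later <= entry.getValue()) {
                stalled.add(entry.getKey());
            }
        }
        return stalled;
    }

    public static void main(String[] args)
    {
        Map<String, Long> first = Map.of("TableScanOperator", 120L, "FilterAndProjectOperator", 118L);
        Map<String, Long> second = Map.of("TableScanOperator", 120L, "FilterAndProjectOperator", 131L);
        System.out.println(stalledOperators(first, second)); // [TableScanOperator]
    }
}
```

If every operator on a worker shows up as stalled, that matches the "all worker threads are hung" case, and a thread dump of that worker is the next thing to look at.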
Here is the jstack output of a Presto 0.108 worker in my case:
We found bug #3212, which would cause stalls for JOINs. We're not sure about the TABLESAMPLE ones, though.
This might be related to our connector implementation. Let me close this ticket for now. Thanks!
Feature Toggles should allow teams to modify system behavior without changing code. Feature Toggles are configured using Google Guice. Basic toggle definitions are created using `FeatureToggleBinder`. `FeatureToggleBinder` creates a `FeatureToggle`, and additional configuration can be done using the feature configuration.

At the current stage, Feature Toggles support:

- if/else-based feature toggles
- Dependency Injection-based hot reloading of an implementation without a restart (this requires refactoring the code to introduce an interface through which the new implementation/class is injected)
- various toggle strategies in addition to simple on/off toggles

Configuration: to enable feature toggle configuration, four lines are needed in the `config.properties` file:

```
features.config-source-type=file
features.config-source=/etc/feature-config.properties
features.config-type=properties
features.refresh-period=30s
```

- `features.config-source-type` is the source type for the Feature Toggles configuration
- `features.config-source` is the source (file) of the configuration
- `features.config-type` is the format in which the configuration is stored (json or properties)
- `features.refresh-period` is the configuration refresh period

Defining Feature Toggles

Feature toggles are defined in a Google Guice module using `FeatureToggleBinder`.

Simple feature toggle definition:

```
featureToggleBinder(binder)
        .featureId("featureXX")
        .bind();
```

This example creates bindings for `@Inject`:

```
@Inject
public Runner(@FeatureToggle("featureXX") Supplier<Boolean> isFeatureXXEnabled)
{
    this.isFeatureXXEnabled = isFeatureXXEnabled;
}
```

`isFeatureXXEnabled` can be used to test whether the feature is enabled or disabled:

```
boolean testFeatureXXEnabled()
{
    return isFeatureXXEnabled.get();
}
```

Hot-reloadable feature toggle definition:

```
featureToggleBinder(binder, Feature01.class)
        .featureId("feature01")
        .baseClass(Feature01.class)
        .defaultClass(Feature01Impl01.class)
        .allOf(Feature01Impl01.class, Feature01Impl02.class)
        .bind();
```

Adding a Feature Toggle switching strategy:

```
featureToggleBinder(binder)
        .featureId("feature04")
        .toggleStrategy("AllowAll")
        .toggleStrategyConfig(ImmutableMap.of("key", "value", "key2", "value2"))
        .bind();
```

Example `feature-config.properties` file:

```
# feature query-logger
feature.query-logger.enabled=true
feature.query-logger.strategy=OsToggle
feature.query-logger.strategy.os_name=.*Linux.*

# feature query-rate-limiter
feature.query-rate-limiter.currentInstance=com.facebook.presto.server.protocol.QueryBlockingRateLimiter

# feature query-cancel
feature.query-cancel.strategy=AllowList
feature.query-cancel.strategy.allow-list-source=.*IDEA.*
feature.query-cancel.strategy.allow-list-user=.*prestodb
```

In this example, for the first feature, `query-logger`, changing the value of `feature.query-logger.enabled` to `false` will disable the feature. Changes take effect within the refresh period.

Pass column delimiter info to reader (prestodb#6338)

Summary: Pull Request resolved: facebookincubator/velox#6338
Reviewed By: Yuhta
Differential Revision: D48457913
fbshipit-source-id: 57d76dfa229de3801bf3181f780a485b628427ad

Memory pool refactoring by removing MemoryPoolBase and ScopedMemoryPool (prestodb#3191)

Summary: Remove `MemoryPoolBase` and `ScopedMemoryPool`, and consolidate them into one memory pool interface, `MemoryPool`, and one production implementation, `MemoryPoolImp`. For details about the memory pool hierarchy and ownership, see the class comment for `MemoryPool`.

A query gets the root memory pool object from the memory manager by calling `IMemoryManager::getChild()`. `IMemoryManager` is the memory manager interface. During query execution, we call a parent memory pool's `getChild()` to create a child memory pool object.

This PR also changes the references between parent and child memory pool objects:

1. A parent pool object tracks each child pool object through a raw pointer.
2. A child pool object holds a shared reference to its parent pool object, so a parent pool object can only be destroyed after all of its child pool objects have been destroyed.
3. Destroying a child pool object removes the raw pointer tracked in the parent pool and releases the shared reference to its parent.

Pull Request resolved: facebookincubator/velox#3191
Reviewed By: mbasmanova
Differential Revision: D41206814
Pulled By: xiaoxmeng
fbshipit-source-id: d9ce695c9cf2f558b56c46fc67fd6b7e1c7eac57
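The parent/child ownership rules in the memory pool commit above can be illustrated with a small Java analogue. This is a sketch under stated assumptions, not Velox's C++ `MemoryPool` API: `PoolNode`, `getChild`, and `release` are hypothetical names. Because Java has no destructors, "destroyed" is modeled as an explicit flag that becomes true only once a pool's owner has released it and all of its children are gone, which is the effect the child's shared reference has in the C++ version.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical Java analogue of the Velox parent/child pool ownership rules.
final class PoolNode
{
    private final String name;
    private final PoolNode parent; // child -> parent reference keeps the parent alive
    // parent -> child bookkeeping (the raw-pointer registry in the C++ version)
    private final Set<PoolNode> children = Collections.newSetFromMap(new IdentityHashMap<>());
    private boolean ownerReleased;
    private boolean destroyed;

    private PoolNode(String name, PoolNode parent)
    {
        this.name = name;
        this.parent = parent;
    }

    static PoolNode root(String name)
    {
        return new PoolNode(name, null);
    }

    // Creates a child pool and registers it with this (parent) pool
    PoolNode getChild(String childName)
    {
        if (destroyed) {
            throw new IllegalStateException("pool " + name + " is destroyed");
        }
        PoolNode child = new PoolNode(childName, this);
        children.add(child);
        return child;
    }

    // Called by the pool's owner when it no longer needs the pool
    void release()
    {
        ownerReleased = true;
        tryDestroy();
    }

    boolean isDestroyed()
    {
        return destroyed;
    }

    int liveChildren()
    {
        return children.size();
    }

    private void tryDestroy()
    {
        if (destroyed || !ownerReleased || !children.isEmpty()) {
            return; // still held by its owner or by a live child
        }
        destroyed = true;
        if (parent != null) {
            parent.children.remove(this); // unregister from the parent
            parent.tryDestroy();          // and drop this child's hold on it
        }
    }
}

public class PoolOwnershipDemo
{
    public static void main(String[] args)
    {
        PoolNode query = PoolNode.root("query");
        PoolNode operator = query.getChild("operator");
        query.release();
        System.out.println(query.isDestroyed());    // false: a child is still alive
        operator.release();
        System.out.println(operator.isDestroyed()); // true
        System.out.println(query.isDestroyed());    // true: its last child is gone
    }
}
```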
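For the Feature Toggles description earlier in this thread, the `features.refresh-period` behavior (a change to `feature.query-logger.enabled` taking effect within the refresh period) can be sketched as a `Supplier<Boolean>` backed by a properties source that is re-read at most once per period. This is a hypothetical illustration, not Presto's implementation: `RefreshingFeatureConfig`, its methods, the injectable clock, and the fallback to a default value when the key is missing are all assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: toggles backed by a periodically re-read properties source.
final class RefreshingFeatureConfig
{
    private final Supplier<Properties> source; // e.g. reads /etc/feature-config.properties
    private final long refreshNanos;           // mirrors features.refresh-period
    private final LongSupplier clock;          // System::nanoTime in real use
    private Properties current;
    private long loadedAt;

    RefreshingFeatureConfig(Supplier<Properties> source, Duration refreshPeriod, LongSupplier clock)
    {
        this.source = source;
        this.refreshNanos = refreshPeriod.toNanos();
        this.clock = clock;
        this.current = source.get();
        this.loadedAt = clock.getAsLong();
    }

    // Returns cached properties, reloading once the refresh period has elapsed
    synchronized Properties properties()
    {
        long now = clock.getAsLong();
        if (now - loadedAt >= refreshNanos) {
            current = source.get();
            loadedAt = now;
        }
        return current;
    }

    // Reads feature.<id>.enabled; defaultValue is used when the key is absent
    Supplier<Boolean> toggle(String featureId, boolean defaultValue)
    {
        String key = "feature." + featureId + ".enabled";
        return () -> {
            String value = properties().getProperty(key);
            return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
        };
    }

    static Properties parse(String text)
    {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}

public class FeatureToggleRefreshDemo
{
    public static void main(String[] args)
    {
        String[] fileContents = {"feature.query-logger.enabled=true"};
        long[] fakeNanos = {0};
        RefreshingFeatureConfig config = new RefreshingFeatureConfig(
                () -> RefreshingFeatureConfig.parse(fileContents[0]),
                Duration.ofSeconds(30),
                () -> fakeNanos[0]);
        Supplier<Boolean> queryLogger = config.toggle("query-logger", true);

        System.out.println(queryLogger.get()); // true
        fileContents[0] = "feature.query-logger.enabled=false"; // simulate editing the file
        System.out.println(queryLogger.get()); // true: still within the refresh period
        fakeNanos[0] += Duration.ofSeconds(30).toNanos();
        System.out.println(queryLogger.get()); // false: reloaded after the period
    }
}
```

Injecting the clock keeps the refresh logic testable without sleeping; in production it would simply be `System::nanoTime`.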
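The `AllowList` and `OsToggle` strategies in the Feature Toggles `feature-config.properties` example appear to match request or host attributes against regular expressions. Below is a minimal sketch of how such strategies could be evaluated. It assumes full-string regex matching (which the `.*Linux.*` style patterns suggest) and assumes the `strategy.` sub-keys are handed to the strategy as a map. `ToggleStrategy`, `RequestContext`, and the strategy classes are hypothetical, not Presto's classes.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical strategy interface: decides whether a feature is on for a request
interface ToggleStrategy
{
    boolean check(RequestContext context, Map<String, String> config);
}

// Hypothetical request attributes that strategies can inspect
final class RequestContext
{
    final String source;
    final String user;

    RequestContext(String source, String user)
    {
        this.source = source;
        this.user = user;
    }
}

// AllowList: on only if source and user match the configured regexes
final class AllowListStrategy implements ToggleStrategy
{
    @Override
    public boolean check(RequestContext context, Map<String, String> config)
    {
        return matches(config.get("allow-list-source"), context.source)
                && matches(config.get("allow-list-user"), context.user);
    }

    // An unset pattern means "no restriction" (an assumption of this sketch)
    private static boolean matches(String regex, String value)
    {
        return regex == null || (value != null && Pattern.matches(regex, value));
    }
}

// OsToggle: on only if this JVM's os.name matches the configured regex
final class OsToggleStrategy implements ToggleStrategy
{
    @Override
    public boolean check(RequestContext context, Map<String, String> config)
    {
        String regex = config.get("os_name");
        return regex == null || Pattern.matches(regex, System.getProperty("os.name", ""));
    }
}

public class ToggleStrategyDemo
{
    public static void main(String[] args)
    {
        Map<String, String> queryCancel = Map.of(
                "allow-list-source", ".*IDEA.*",
                "allow-list-user", ".*prestodb");
        ToggleStrategy allowList = new AllowListStrategy();
        System.out.println(allowList.check(new RequestContext("IntelliJ IDEA", "test_prestodb"), queryCancel)); // true
        System.out.println(allowList.check(new RequestContext("presto-cli", "test_prestodb"), queryCancel)); // false
    }
}
```

Treating an unset pattern as "no restriction" is a design choice of this sketch; a real implementation might instead disable the feature when its strategy configuration is incomplete.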
Although I couldn't find the exact cause, some queries started to stall in Presto 0.108. For example, the following queries stalled:
This is not specific to the TABLESAMPLE operation, since other types of queries (e.g., joins) also stall.
This bug might be related to our connector implementation, but the stall seems to happen when entries are skipped. Please let me know if you know anything about this problem.
Thanks.