Query stalls in Presto 0.108 #3191

Closed
xerial opened this issue Jun 29, 2015 · 18 comments

Comments

@xerial
Member

xerial commented Jun 29, 2015

I couldn't find the exact cause, but some queries started to stall in Presto 0.108. For example, the following queries stalled:

SELECT * FROM nasdaq TABLESAMPLE BERNOULLI (1);
SELECT count(l_linenumber) FROM tpch_s1.lineitem TABLESAMPLE BERNOULLI(1)

This is not specific to the TABLESAMPLE operation, since other types of queries (e.g., joins) also stall.

This bug might be related to our connector implementation, but the stall appears to happen when entries are skipped. Please let me know if you know anything about this problem.
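
For context on why skipping matters here: `TABLESAMPLE BERNOULLI (p)` keeps each row independently with probability p percent, so at 1 percent almost every entry is skipped. A minimal sketch of that semantics, assuming nothing about Presto's actual operator (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of Bernoulli sampling semantics; not Presto code.
final class BernoulliSampleSketch
{
    private BernoulliSampleSketch() {}

    // Keeps each row independently with probability percent / 100.
    static <T> List<T> sample(List<T> rows, double percent, Random random)
    {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // nextDouble() is in [0, 1), so 100 keeps every row and 0 keeps none
            if (random.nextDouble() * 100 < percent) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With a small percentage, long runs of consecutive rows produce no output at all, which is the situation in which the stall seems to show up.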

Thanks.

@wyukawa
Contributor

wyukawa commented Jun 29, 2015

I also encountered a similar problem.

https://groups.google.com/forum/#!topic/presto-users/Aw3XOhl5j7w

I am using Presto 0.108 + jdk1.8.0_45 + HDP 2.1 + CentOS 6.5.

I use the Hive connector.

Some queries don't return on Presto 0.108, but do return on Presto 0.107.

Thanks.

@cberner
Contributor

cberner commented Jun 29, 2015

Could you look in the web ui and tell me what the status of the query is? In particular, it would be helpful to know:

  • does the UI report it as BLOCKED?
  • what state are the different stages in?
  • for stages that are still running, what state are the tasks in?

@cberner
Contributor

cberner commented Jun 29, 2015

Also, when you say "stall" do you mean that it never finishes, or does it just take a long time?

@xerial
Member Author

xerial commented Jun 29, 2015

@cberner

  • Yes. It shows BLOCKED at the root stage, and the child stages are in the RUNNING state.
  • By stall, I mean it never finishes and makes no progress in processed rows (e.g., the outputPositions count).

@cberner
Contributor

cberner commented Jun 29, 2015

I see. What state are the tasks in the child stage in?

@xerial
Member Author

xerial commented Jun 30, 2015

All of the child tasks are in RUNNING state.

I'm going to check the heap dump or thread stacks (/v1/thread) to see where the query is blocked.

@electrum
Contributor

I'm seeing an issue where stages never get any tasks scheduled. It might be the same problem. It did not happen with 0.107.

It occurred twice after a restart, when we sent ~10 queries all at once. All were stuck the same way and never recovered. No other queries were running. Subsequent queries were fine.

@cberner
Contributor

cberner commented Jun 30, 2015

@xerial also take a look at the task info (a JSON document), which you can find by clicking the task link on the query page. If you don't mind sharing the info in there, it would be helpful for debugging too.

@yuananf
Contributor

yuananf commented Jun 30, 2015

Is this helpful?
After restarting a Presto cluster, I executed show tables and it also never finished. The web page looked like this:

[Screenshot: timline 20150630140035]

@wyukawa
Contributor

wyukawa commented Jun 30, 2015

My case is as follows:

http://gyazo.com/83a8c1e2b52b2a3e28821cf954fd5f34

The Presto query is as follows:

WITH 
... AS 
    (
    SELECT
       .....
    FROM
        presto_view
    WHERE 
        ...
    GROUP BY 
        ...
    )
SELECT
    ....
    SUM(...)
        OVER(
            PARTITION BY
                ...
        ) AS ...
FROM
    ...

@xerial
Member Author

xerial commented Jun 30, 2015

@cberner
Sure. After excluding some private information, I'll share it.

@xerial
Member Author

xerial commented Jun 30, 2015

Here is the json data of the tasks of a stalled query:
https://gist.github.com/xerial/b14041846ff246849275

20150630_071237_06262_zieat

It looks like the child tasks have already finished processing the data, but the parent task is still waiting for the result.

@cberner
Contributor

cberner commented Jun 30, 2015

@xerial it looks like the problem is that the TableScan is hung or slow. Your screenshot shows that none of the splits have finished running, and from the JSON stats it looks like none of the tasks have produced output. For example, 20150630_071237_06262_zieat.1.0 shows 0 outputPositions. Can you try running jstack on one of your workers to see if the TableScan thread is stuck?
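
In case it helps when reading the dump, here is a minimal sketch (not part of Presto) that scans saved jstack output for threads whose stack contains a given substring. It assumes the usual HotSpot layout, where each thread entry starts with a quoted name and each frame line starts with `at`; the class name `JstackFilterSketch` and whatever keyword you pass are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for skimming a saved jstack dump; not part of Presto.
final class JstackFilterSketch
{
    private JstackFilterSketch() {}

    // Returns the names of threads that have at least one frame containing keyword.
    static Set<String> threadsWithFrame(String dump, String keyword)
    {
        Set<String> matches = new LinkedHashSet<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // Thread header: the name sits between the first pair of quotes
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : null;
            }
            else if (current != null && line.trim().startsWith("at ") && line.contains(keyword)) {
                matches.add(current);
            }
        }
        return matches;
    }
}
```

Taking two dumps a few seconds apart and comparing the matching threads gives a rough idea of whether they are stuck in the same frames or still moving.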

@dain
Contributor

dain commented Jun 30, 2015

@yuananf we have seen that one internally. There is some bug where a stage fails immediately during setup, but the query misses the notification.

@xerial Open the info for one of the table scan tasks (click the task id in the table), then look for "getOutputCalls", wait a minute, and check it again. The number should increment each time the output method on the operator is called. There are stats for each of the main methods of an operator, so with this technique you can see whether the operators are still being invoked. If they are not, it is likely that all of the worker threads are hung.
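
The comparison itself is simple enough to sketch. The snippet below assumes you have already copied the "getOutputCalls" values from two snapshots of the task info into maps keyed by operator name; how you extract them is up to you, and the class name and operator names used here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two samples of a per-operator call counter.
final class OperatorProgressSketch
{
    private OperatorProgressSketch() {}

    // Returns the operators whose counter did not increase between the two samples.
    static List<String> stalledOperators(Map<String, Long> before, Map<String, Long> after)
    {
        List<String> stalled = new ArrayList<>();
        for (Map.Entry<String, Long> entry : before.entrySet()) {
            Long later = after.get(entry.getKey());
            // Operators missing from the second sample are ignored rather than guessed at
            if (later != null && later <= entry.getValue()) {
                stalled.add(entry.getKey());
            }
        }
        return stalled;
    }
}
```

If every operator in a task shows up as stalled across a one-minute gap, that points toward hung worker threads rather than a slow scan.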

@wyukawa
Contributor

wyukawa commented Jul 1, 2015

Here is the jstack output from a Presto 0.108 worker in my case:

https://gist.github.com/wyukawa/2f8d09a03aef8b844370

@cberner
Contributor

cberner commented Jul 1, 2015

@dain I think this will fix the issue with stages that fail during setup not failing the query: #3204

It doesn't explain some of the other hangs though

@cberner
Contributor

cberner commented Jul 1, 2015

We found a bug, #3212, which would cause stalls for JOINs. Not sure about the TABLESAMPLE ones, though.

@xerial
Member Author

xerial commented Jul 7, 2015

This might be related to our connector implementation. Let me close this ticket for now.
If we find another cause, we will open a new ticket.

Thanks!

@xerial xerial closed this as completed Jul 7, 2015
branimir-vujicic added a commit to axiomq/presto that referenced this issue Sep 18, 2023
Feature Toggles should allow teams to modify system behavior without changing
code. Feature Toggles are configured using Google Guice. Basic definitions of
toggles are created using FeatureToggleBinder. FeatureToggleBinder creates a
FeatureToggle, and additional configuration can be done using feature
configuration.
At the current stage, Feature Toggles support:
- if / else based feature toggles
- Dependency Injection based feature toggles
  Hot reloading an implementation without restart requires code refactoring to
  add an interface when injecting the new implementation/class
- using various toggle strategies along with simple on / off toggles

Configuration:

To enable feature toggle configuration, four lines are needed in the
config.properties file:
```
features.config-source-type=file
features.config-source=/etc/feature-config.properties
features.config-type=properties
features.refresh-period=30s
```

`features.config-source-type` is the source type for the Feature Toggles configuration
`features.config-source` is the source (file) of the configuration
`features.config-type` is the format in which the configuration is stored (json or properties)
`features.refresh-period` is the configuration refresh period

Defining Feature Toggles

Feature toggle definition is done in a Google Guice module using `FeatureToggleBinder`

Simple feature toggle definition:
```
    featureToggleBinder(binder)
            .featureId("featureXX")
            .bind();
```
This example creates a binding that can be injected with @Inject:
```
    @Inject
    public Runner(@FeatureToggle("featureXX") Supplier<Boolean> isFeatureXXEnabled)
    {
        this.isFeatureXXEnabled = isFeatureXXEnabled;
    }
```
`isFeatureXXEnabled` can be used to test whether the feature is enabled or disabled:
```
    boolean testFeatureXXEnabled()
    {
        return isFeatureXXEnabled.get();
    }
```

hot reloadable feature toggle definition
```
featureToggleBinder(binder, Feature01.class)
        .featureId("feature01")
        .baseClass(Feature01.class)
        .defaultClass(Feature01Impl01.class)
        .allOf(Feature01Impl01.class, Feature01Impl02.class)
        .bind();
```
adding Feature Toggle switching strategy
```
featureToggleBinder(binder)
                        .featureId("feature04")
                        .toggleStrategy("AllowAll")
                        .toggleStrategyConfig(ImmutableMap.of("key", "value", "key2", "value2"))
```
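
To make the idea of a toggle strategy more concrete, here is a minimal sketch of what a regex-based strategy could look like. It is an assumption for illustration only: the interface `ToggleStrategySketch`, the class `RegexAllowListSketch`, and the use of plain string maps for config and context are hypothetical and may not match the interfaces introduced by this commit.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical strategy shape; the real interface in the commit may differ.
interface ToggleStrategySketch
{
    boolean isEnabled(Map<String, String> config, Map<String, String> context);
}

// Hypothetical strategy: every configured key must match its regex in the context.
final class RegexAllowListSketch
        implements ToggleStrategySketch
{
    @Override
    public boolean isEnabled(Map<String, String> config, Map<String, String> context)
    {
        for (Map.Entry<String, String> rule : config.entrySet()) {
            String actual = context.get(rule.getKey());
            // A missing context value or a non-matching value disables the feature
            if (actual == null || !Pattern.matches(rule.getValue(), actual)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the config map plays the role of the map passed to `toggleStrategyConfig`, and the context would carry request attributes such as a user or source name.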

feature-config.properties file example
```
# feature query-logger
feature.query-logger.enabled=true
feature.query-logger.strategy=OsToggle
feature.query-logger.strategy.os_name=.*Linux.*

#feature.query-rate-limiter
feature.query-rate-limiter.currentInstance=com.facebook.presto.server.protocol.QueryBlockingRateLimiter

# feature.query-cancel
feature.query-cancel.strategy=AllowList
feature.query-cancel.strategy.allow-list-source=.*IDEA.*
feature.query-cancel.strategy.allow-list-user=.*prestodb
```
In this example, for the first feature, `query-logger`,
changing the value of `feature.query-logger.enabled` to `false` will disable the feature.
Changes take effect within the refresh period.
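
One way a `Supplier<Boolean>` can pick up such a change without re-injection is sketched below. This is an illustration under assumptions, not the commit's implementation: the class `ReloadableTogglesSketch` is hypothetical, it only understands `feature.<id>.enabled` keys, and it treats a feature with no `enabled` key as enabled.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical holder of on/off flags that can be refreshed from properties.
final class ReloadableTogglesSketch
{
    private static final String PREFIX = "feature.";
    private static final String SUFFIX = ".enabled";

    private final Map<String, Boolean> enabled = new ConcurrentHashMap<>();

    // Updates flags from properties named feature.<id>.enabled.
    void reload(Properties properties)
    {
        for (String name : properties.stringPropertyNames()) {
            if (name.length() > PREFIX.length() + SUFFIX.length()
                    && name.startsWith(PREFIX) && name.endsWith(SUFFIX)) {
                String id = name.substring(PREFIX.length(), name.length() - SUFFIX.length());
                enabled.put(id, Boolean.parseBoolean(properties.getProperty(name)));
            }
        }
    }

    // Reads the current value on every call, so a later reload is visible to holders.
    Supplier<Boolean> supplierFor(String featureId)
    {
        // Assumption: a feature without an explicit flag counts as enabled
        return () -> enabled.getOrDefault(featureId, true);
    }
}
```

A scheduled task that re-reads the file and calls `reload` once per refresh period would then give the "effective within refresh period" behavior described above.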
Pass column delimiter info to reader (prestodb#6338)

Summary: Pull Request resolved: facebookincubator/velox#6338

Reviewed By: Yuhta

Differential Revision: D48457913

fbshipit-source-id: 57d76dfa229de3801bf3181f780a485b628427ad
Memory pool refactoring by removing MemoryPoolBase and ScopedMemoryPool (prestodb#3191)

Summary:
Remove MemoryPoolBase and ScopedMemoryPool, and consolidate into
one memory pool interface MemoryPool and one production implementation
MemoryPoolImp. For the details about the memory pool hierarchy and
ownership see the class comment for MemoryPool.

A query gets the root memory pool object from the memory manager by calling
IMemoryManager::getChild(). IMemoryManager is the interface of the memory
manager. During query execution, we call a parent memory pool's getChild()
to create a child memory pool object.

This PR also changes the references between parent and child memory pool
objects:
1. the parent pool object tracks each child pool object through a raw pointer;
2. a child pool object holds a shared reference to its parent pool object, so a parent
    pool object can only be destroyed after all its child pool objects have been destroyed;
3. destroying a child pool object removes its raw pointer tracked in the parent pool
    and releases the shared reference on its parent.
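
The lifetime rule in the list above can be modeled with a small sketch. Java has no raw versus shared pointer distinction, so this only illustrates the invariant (a parent cannot be destroyed while children exist, and a child unregisters itself on destruction) with an explicit check; it is not the Velox C++ code, and the class `PoolSketch` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the parent/child lifetime rule; not the Velox implementation.
final class PoolSketch
{
    private final PoolSketch parent;                           // child keeps its parent reachable
    private final Set<PoolSketch> children = new HashSet<>();  // parent only tracks its children
    private boolean destroyed;

    private PoolSketch(PoolSketch parent)
    {
        this.parent = parent;
    }

    static PoolSketch root()
    {
        return new PoolSketch(null);
    }

    PoolSketch getChild()
    {
        if (destroyed) {
            throw new IllegalStateException("pool already destroyed");
        }
        PoolSketch child = new PoolSketch(this);
        children.add(child);
        return child;
    }

    void destroy()
    {
        if (!children.isEmpty()) {
            throw new IllegalStateException("child pools still alive");
        }
        if (destroyed) {
            return;
        }
        destroyed = true;
        if (parent != null) {
            // Mirrors step 3: the child removes itself from the parent's tracking
            parent.children.remove(this);
        }
    }

    int liveChildren()
    {
        return children.size();
    }
}
```

In C++ the shared reference makes the ordering automatic through reference counting, whereas this model has to reject an early `destroy()` call explicitly.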

Pull Request resolved: facebookincubator/velox#3191

Reviewed By: mbasmanova

Differential Revision: D41206814

Pulled By: xiaoxmeng

fbshipit-source-id: d9ce695c9cf2f558b56c46fc67fd6b7e1c7eac57