Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to configure insert into Hive partition via configuration property #4999

Conversation

ssheikin
Copy link
Contributor

@ssheikin ssheikin commented Aug 27, 2020

Allow to set default value for hive.insert_existing_partitions_behavior property

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from a373929 to bec6d28 Compare August 28, 2020 15:47
@ssheikin ssheikin changed the title Add property to hive connector Allow to configure insert into Hive partition via configuration property Aug 28, 2020
@ssheikin ssheikin requested a review from kokosing August 28, 2020 15:49
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum Would you please take a look?

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch 2 times, most recently from 55fa327 to 92b56bc Compare September 1, 2020 14:13
@PostConstruct
public void validate()
{
InsertExistingPartitionsBehavior.validate(insertExistingPartitionsBehavior, immutablePartitions);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we deprecate immutable partitions property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. immutable partitions = true is used to protect setting hive.insert-existing-partitions-behavior to APPEND state via session properties.

Copy link
Member

@losipiuk losipiuk Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could extend hive.insert-existing-partitions-behavior with FORCE_ERROR (name subject to discussion) which behaves as ERROR, but also be non overridable by session property.

Though personally I feel that having separate properties is easier to understand.

Copy link
Contributor Author

@ssheikin ssheikin Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason immutable partitions=true allows setting insert-existing-partitions-behavior=OWERWRITE that's why it is not an option. Apart from that deprecating breaks backwards compatibility and requires a longer discussion.
Won't do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the code correctly now if we specify hive.immutable-partitions=true and NOT specify hive.insert-existing-partitions-behavior the Presto will fail to start.
This is valid setup today. To be consistent with what we have right now we should default hive.insert-existing-partitions-behavior to ERROR if hive.immutable-partitions is true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you add test for above.

Copy link
Contributor Author

@ssheikin ssheikin Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

I've tried to keep old logic of insert-existing-partitions-behavior calculation, when hive.insert-existing-partitions-behavior is NOT specified:

    private Optional<InsertExistingPartitionsBehavior> insertExistingPartitionsBehavior = Optional.empty();

    public InsertExistingPartitionsBehavior getInsertExistingPartitionsBehavior()
    {
        return insertExistingPartitionsBehavior.orElseThrow(() -> new NoSuchElementException("insertExistingPartitionsBehavior was not set during presto initialization"));
    }

    @Config("hive.insert-existing-partitions-behavior")
    @ConfigDescription("Default value for insert existing partitions behavior")
    public HiveConfig setInsertExistingPartitionsBehavior(InsertExistingPartitionsBehavior insertExistingPartitionsBehavior)
    {
        this.insertExistingPartitionsBehavior = Optional.of(requireNonNull(insertExistingPartitionsBehavior, "insertExistingPartitionsBehavior is null"));
        return this;
    }

    @PostConstruct
    public void validate()
    {
        if (insertExistingPartitionsBehavior.isEmpty()) {
            insertExistingPartitionsBehavior = getInsertExistingPartitionsBehaviorForBackwardCompatibility();
        }
        else {
            InsertExistingPartitionsBehavior.validate(insertExistingPartitionsBehavior.get(), immutablePartitions);
        }
    }

    private Optional<InsertExistingPartitionsBehavior> getInsertExistingPartitionsBehaviorForBackwardCompatibility()
    {
        return Optional.of(immutablePartitions ? ERROR : APPEND);
    }

but seems it's not good idea from frameworks' point of view (even not looking to even more complicated logic), because tests are not configured to maintain such complex logic TestHiveConfig#testDefaults() does not invoke @PostConstruct method.

  • However it is possible to move logic from @PostConstruct to getter (code becomes not that clear).
    private Optional<InsertExistingPartitionsBehavior> insertExistingPartitionsBehavior = Optional.empty();

    public InsertExistingPartitionsBehavior getInsertExistingPartitionsBehavior()
    {
        if (insertExistingPartitionsBehavior.isEmpty()) {
            insertExistingPartitionsBehavior = getInsertExistingPartitionsBehaviorForBackwardCompatibility();
        }
        return insertExistingPartitionsBehavior.orElseThrow(() -> new NoSuchElementException("insertExistingPartitionsBehavior was not set during presto initialization"));
    }

    private Optional<InsertExistingPartitionsBehavior> getInsertExistingPartitionsBehaviorForBackwardCompatibility()
    {
        return Optional.of(immutablePartitions ? ERROR : APPEND);
    }

    @Config("hive.insert-existing-partitions-behavior")
    @ConfigDescription("Default value for insert existing partitions behavior")
    public HiveConfig setInsertExistingPartitionsBehavior(InsertExistingPartitionsBehavior insertExistingPartitionsBehavior)
    {
        this.insertExistingPartitionsBehavior = Optional.of(requireNonNull(insertExistingPartitionsBehavior, "insertExistingPartitionsBehavior is null"));
        return this;
    }

    @PostConstruct
    public void validate()
    {
        insertExistingPartitionsBehavior.ifPresent(v -> InsertExistingPartitionsBehavior.validate(v, immutablePartitions));
    }

@kokosing I see three possible solutions:

  1. apply solution proposed by @losipiuk which changes default value hive.insert-existing-partitions-behavior to ERROR, which breaks backward compatibility when immutablePartitions=false (in this case insert-existing-partitions-behavior was set to APPEND).
  2. Break backwards compatibility for one of valid setups today: hive.immutable-partitions=true and hive.insert-existing-partitions-behavior is NOT specified.
  3. Implement logic in getter.

What solution would be better?

Copy link
Contributor Author

@ssheikin ssheikin Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll implement solution 3.

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 92b56bc to 7ff1101 Compare September 4, 2020 12:35
@findepi findepi requested review from losipiuk and removed request for findepi September 9, 2020 17:24
@findepi
Copy link
Member

findepi commented Sep 9, 2020

@zaz968m there are conflicts now. can you please rebase?

@ssheikin ssheikin closed this Sep 10, 2020
@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 7ff1101 to 9f6c316 Compare September 10, 2020 06:23
@ssheikin ssheikin reopened this Sep 10, 2020
@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 9da6b0f to 25c2056 Compare September 10, 2020 06:53
losipiuk
losipiuk previously approved these changes Sep 10, 2020
Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum can you review the docs?

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 25c2056 to e345b66 Compare September 10, 2020 20:22
Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking back LGTM for now until last piece of discussion is resolved

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from e345b66 to 5848ca6 Compare September 15, 2020 13:41
@ssheikin
Copy link
Contributor Author

@losipiuk @kokosing all comments are addressed.

assertEquals(APPEND, actual);
}

private Object getDefaultValueInsertExistingPartitionsBehavior(Connector connector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use concrete type here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires unnecessary explicit casting. I think it does not worth adding additional characters in this case.
However I would be glad to substitute method's body with shorter and more understandable implementation, but I can't find one.

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 5848ca6 to 0b043e3 Compare September 16, 2020 11:53
@ssheikin
Copy link
Contributor Author

@electrum all comments are addressed

@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from 0b043e3 to c9ec327 Compare September 16, 2020 13:49
presto-docs/src/main/sphinx/connector/hive.rst Outdated Show resolved Hide resolved
@ssheikin ssheikin force-pushed the sasha/hive-insert-existing-partitions-behavior branch from c9ec327 to 5534628 Compare September 16, 2020 22:20
@losipiuk losipiuk merged commit 6149fa2 into trinodb:master Sep 17, 2020
@losipiuk losipiuk mentioned this pull request Sep 17, 2020
9 tasks
@martint martint added this to the 342 milestone Sep 24, 2020
@ssheikin ssheikin deleted the sasha/hive-insert-existing-partitions-behavior branch October 7, 2020 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants