Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement insert overwrite unpartitioned table #648

Open
wants to merge 3 commits into
base: master
from

Conversation

4 participants
@xumingming
Copy link
Contributor

commented Apr 18, 2019

For #510, Implement insert overwrite unpartitioned table

@martint

This comment has been minimized.

Copy link
Member

commented Apr 19, 2019

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Apr 19, 2019

@prestosql prestosql deleted a comment from cla-bot bot Apr 22, 2019

@prestosql prestosql deleted a comment from cla-bot bot Apr 22, 2019

@electrum electrum self-requested a review Apr 22, 2019

@xumingming

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

Any chance this PR get reviewed?

@electrum

This comment has been minimized.

Copy link
Member

commented May 7, 2019

@xumingming My sincere apologies for the delay. I am reviewing this now.

@electrum
Copy link
Member

left a comment

Thanks for this change. I know that the Hive code, especially in SemiTransactionalHiveMetastore, can be difficult to work with.

Regarding usage of Mockito, we have a general policy of not using mocking libraries in Presto, as they tend to encourage bad practices when writing code, and they often result in brittle tests that depend on internal implementation details rather than external behavior. For TestSemiTransactionalHiveMetastore, it's more difficult to suggest a replacement, since you are injecting faults and testing side effects. I'm going to follow up shortly.

*/
WriteInfo getQueryWriteInfo(LocationHandle locationHandle);

WriteInfo getTableWriteInfo(LocationHandle locationHandle);
default WriteInfo getTableWriteInfo(LocationHandle locationHandle)

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

There are only two callers of this. Instead of adding a default method, let's add the flag to both callers.

{
return new WriteInfo(locationHandle.getTargetPath(), locationHandle.getWritePath(), locationHandle.getWriteMode());
if (overwrite) {

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: invert this condition and return early. This makes the code a bit simpler.

if (!overwrite) {
    return new WriteInfo(locationHandle.getTargetPath(), locationHandle.getWritePath(), locationHandle.getWriteMode());
}

WriteMode writeMode = locationHandle.getWriteMode();
switch ...
case DIRECT_TO_TARGET_EXISTING_DIRECTORY:
case DIRECT_TO_TARGET_NEW_DIRECTORY:
default:
throw new UnsupportedOperationException(format(

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

I believe this error message is visible to end users when using S3. In that case, we should use PrestoException with a more friendly error message:

throw new PrestoException(NOT_SUPPORTED, "Overwriting unpartitioned table not supported when writing directly to target directory");

(this error message is still not great, but better than mentioning an implementation detail)

@@ -1347,18 +1352,37 @@ public HiveInsertTableHandle beginInsert(ConnectorSession session, ConnectorTabl
for (PartitionUpdate partitionUpdate : partitionUpdates) {
if (partitionUpdate.getName().isEmpty()) {
// insert into unpartitioned table
if (!table.get().getStorage().getStorageFormat().getInputFormat().equals(

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: keep this on one line (don't wrap) so that it looks like the check above.

This comment has been minimized.

Copy link
@xumingming

xumingming May 10, 2019

Author Contributor

Will do.

I'd like to discuss it a little bit here, I wrap because the line is too long to read, and I find that the presto code are often very very long in one line, it does not annoy you guys? Or because you are all using very big monitor?

partitionStatistics);

if (partitionUpdate.getUpdateMode() == OVERWRITE) {
PrincipalPrivileges principalPrivileges = getTablePrincipalPrivileges(

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: our code style for argument wrapping is to either

  • wrap every argument onto a separate line
  • keep all arguments on the same line

For the getTablePrincipalPrivileges and createTable calls, let's keep them on one line, since that is similar to the existing code in this method. finishInsertIntoExistingTable should stay wrapped, as it's longer.

@@ -85,4 +101,52 @@ public void testCreateMixedPredicate()

createPredicate(ImmutableList.of(TEST_COLUMN_HANDLE), partitions.build());
}

@Test
public void testGetTablePrincipalPrivileges()

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

What we want to test here is the conversion from Set<HivePrivilegeInfo> in getTablePrincipalPrivileges(). This is difficult because it mixes the conversion with fetching partitions from the metastore.

Instead, we can move the conversion elsewhere, such as helper method on PrincipalPrivileges:

public static PrincipalPrivileges fromHivePrivilegeInfo(Set<HivePrivilegeInfo> privileges)

Then the method is easy to test without needing to mock the metastore.

This comment has been minimized.

Copy link
@dain

dain May 9, 2019

Member

I think you can use the FileHiveMetastore with the TestingHdfsEnvironment for this test.


public Assertion(LocationHandle locationHandle, boolean overwrite)
{
HdfsEnvironment hdfsEnvironment = mock(HdfsEnvironment.class);

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Rather than mock(), reuse TestingHdfsEnvironment in TestBackgroundHiveSplitLoader (make it public)

public class TestHiveLocationService
{
@Test
public void testGetTableWriteInfo_append()

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: use an underscore violates the standard Java method naming convention. Name this testGetTableWriteInfoAppend(). I do agree it's more readable in this case, but it looks out of place in our code base.

@Test
public void testGetTableWriteInfo_append()
{
assertThat(locationHandle(LocationHandle.WriteMode.STAGE_AND_MOVE_TO_TARGET_DIRECTORY), false)

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: Static import all the WriteMode constants to make the tests more readable

}

private static LocationHandle locationHandle(LocationHandle.WriteMode writeMode,
String targetPath, String writePath)

This comment has been minimized.

Copy link
@electrum

electrum May 7, 2019

Member

Nit: don't wrap the parameters here (or put each one on a separate line)

pom.xml Outdated
@@ -1126,6 +1126,20 @@
<artifactId>failsafe</artifactId>
<version>2.0.1</version>
</dependency>

<dependency>
<groupId>org.mockito</groupId>

This comment has been minimized.

Copy link
@dain

dain May 9, 2019

Member

We have found over the years that mocking frameworks lead to brittle tests that really don't end up testing much. I know there are good ways to use mocking frameworks, but it is too easy to use them in ways that ends up with difficult to maintain tests. As a result, we have actively avoided them in Presto.

Instead of mocking frameworks, we typically have a local, in-memory, version of interfaces or services for testing. For example, TestingAccessControlManager or TestingMetadata. I'll comment below on how I think we can test this without introducing mockito to Presto.

@@ -85,4 +101,52 @@ public void testCreateMixedPredicate()

createPredicate(ImmutableList.of(TEST_COLUMN_HANDLE), partitions.build());
}

@Test
public void testGetTablePrincipalPrivileges()

This comment has been minimized.

Copy link
@dain

dain May 9, 2019

Member

I think you can use the FileHiveMetastore with the TestingHdfsEnvironment for this test.

import static org.testng.Assert.assertEquals;
import static org.testng.Assert.fail;

public class TestSemiTransactionalHiveMetastore

This comment has been minimized.

Copy link
@dain

dain May 9, 2019

Member

I think you can use the FileHiveMetastore with the TestingHdfsEnvironment for this test.

@xumingming

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2019

Thanks @electrum @dain for the reviewing, I'll update the code in the recent days.

Clean up code according to review comments
1. Removed mockito dependency
2. Cleaned up code style
@xumingming

This comment has been minimized.

Copy link
Contributor Author

commented May 12, 2019

Hi @electrum , @dain , I have cleaned up the codes:

  1. Removed mockito
  2. Cleaned up code style.
  3. Removed two tests -- Without Mockito, it requires a fully functional TestingHdfsEnviroment & TestingFileSystem.
@@ -1126,6 +1126,13 @@
<artifactId>failsafe</artifactId>
<version>2.0.1</version>
</dependency>

<dependency>
<groupId>org.objenesis</groupId>

This comment has been minimized.

Copy link
@dain

dain May 14, 2019

Member

This doesn't seem to be used anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.