Add support for Aliyun Object Storage Service (OSS) #485

@@ -211,6 +211,16 @@ ifneq "$(PROTOCOL)" ""
        sed $(SED_OPTS) "s|YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME|$(WASB_ACCOUNT_NAME)|" $(PROTOCOL_HOME)/$(PROTOCOL)-site.xml; \
        sed $(SED_OPTS) "s|YOUR_AZURE_BLOB_STORAGE_ACCOUNT_KEY|$(WASB_ACCOUNT_KEY)|" $(PROTOCOL_HOME)/$(PROTOCOL)-site.xml; \
    fi; \
    if [ $(PROTOCOL) = oss ]; then \
        if [ -z "$(OSS_ACCESS_KEY_ID)" ] || [ -z "$(OSS_SECRET_ACCESS_KEY)" ] || [ -z "$(OSS_ENDPOINT)" ]; then \
            echo "Aliyun Keys or Endpoint (OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ENDPOINT) not set"; \
            rm -rf $(PROTOCOL_HOME); \
            exit 1; \
        fi; \
        sed $(SED_OPTS) "s|YOUR_OSS_ACCESS_KEY_ID|$(OSS_ACCESS_KEY_ID)|" $(PROTOCOL_HOME)/$(PROTOCOL)-site.xml; \
Comment: Are these calls to
Reply: yes
        sed $(SED_OPTS) "s|YOUR_OSS_SECRET_ACCESS_KEY|$(OSS_SECRET_ACCESS_KEY)|" $(PROTOCOL_HOME)/$(PROTOCOL)-site.xml; \
        sed $(SED_OPTS) "s|YOUR_OSS_ENDPOINT|$(OSS_ENDPOINT)|" $(PROTOCOL_HOME)/$(PROTOCOL)-site.xml; \
    fi; \
    echo "Created $(PROTOCOL) server configuration"; \
fi
endif
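The recipe above first checks that OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY and OSS_ENDPOINT are all non-empty, then fills the template with three sed substitutions. As a rough illustration only, here is the same check-then-substitute logic as a small Java class; OssSiteTemplate and render are hypothetical names and are not part of this PR.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical Java rendering of the Makefile's oss-site.xml step: verify that the
 * three OSS_* variables are set, then replace the YOUR_OSS_* placeholders.
 */
public class OssSiteTemplate {

    // Template placeholder -> environment variable supplying its value,
    // matching the three sed calls in the Makefile.
    static final Map<String, String> PLACEHOLDERS = new LinkedHashMap<>();

    static {
        PLACEHOLDERS.put("YOUR_OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_ID");
        PLACEHOLDERS.put("YOUR_OSS_SECRET_ACCESS_KEY", "OSS_SECRET_ACCESS_KEY");
        PLACEHOLDERS.put("YOUR_OSS_ENDPOINT", "OSS_ENDPOINT");
    }

    static String render(String template, Map<String, String> env) {
        // Check every variable before substituting anything, like the [ -z ... ] guard that exits 1.
        for (String var : PLACEHOLDERS.values()) {
            String value = env.get(var);
            if (value == null || value.isEmpty()) {
                throw new IllegalStateException(
                        "Aliyun keys or endpoint (OSS_ACCESS_KEY_ID, OSS_SECRET_ACCESS_KEY, OSS_ENDPOINT) not set");
            }
        }
        String result = template;
        for (Map.Entry<String, String> entry : PLACEHOLDERS.entrySet()) {
            // String.replace is literal: '&', '|' or '\' in a value need no escaping here,
            // whereas they are special inside a sed s|...|...| replacement.
            result = result.replace(entry.getKey(), env.get(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values only; a real run would pass System.getenv().
        Map<String, String> env = new HashMap<>();
        env.put("OSS_ACCESS_KEY_ID", "example-id");
        env.put("OSS_SECRET_ACCESS_KEY", "example-secret");
        env.put("OSS_ENDPOINT", "example-endpoint");
        System.out.println(render("<value>YOUR_OSS_ENDPOINT</value>", env)); // prints <value>example-endpoint</value>
    }
}
```

One design difference worth keeping in mind: String.replace substitutes literally, while a sed s|...|...| replacement treats &, \ and the | delimiter specially, so a secret containing any of those characters would need escaping before it reaches the Makefile's sed calls.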

@@ -285,6 +285,22 @@
        <version>5.4.0</version>
    </dependency>

    <!-- Aliyun Dependencies -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aliyun</artifactId>
        <version>${hdp.hadoop.version}</version>
    </dependency>

    <!-- Use version 3.8.1 (version 3.0.0 would produce a lot of output for
         a NoSuchKey error). The issue is detailed here:
         https://github.com/aliyun/aliyun-oss-java-sdk/issues/145 -->
Comment: Why would we be hitting the NoSuchKey error?
    <dependency>
        <groupId>com.aliyun.oss</groupId>
        <artifactId>aliyun-sdk-oss</artifactId>
        <version>3.8.1</version>
    </dependency>

    <!-- HADOOP Dependencies -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>

@@ -16,6 +16,7 @@ public String getExternalTablePath(String basePath, String path) {
            return StringUtils.removeStart(path, basePath);
        }
    },
    OSS("oss"),
Comment: In line with my comment above, is it possible to make this
Reply: Yeah, I think we need a good name for Aliyun's service. OSS is what their service is called ... but it's pretty confusing.
    S3("s3"),
    WASBS("wasbs");
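On the naming question in this thread: because each constant carries its scheme string separately, the Java name of the constant can change without touching the oss scheme that Hadoop's fs.oss.impl setting binds to. Below is a simplified, hypothetical enum showing that separation; StorageProtocol, ALIYUN_OSS, toUri and fromScheme are illustrative names only, and the getExternalTablePath override from the real enum is omitted.

```java
import java.util.Arrays;
import java.util.Optional;

/**
 * Hypothetical, simplified stand-in for the protocol enum in this hunk. The Java
 * constant name and the URI scheme are independent, so the constant could carry
 * a more descriptive name while the scheme stays "oss".
 */
public enum StorageProtocol {
    ALIYUN_OSS("oss"), // illustrative name only; the PR uses OSS("oss")
    S3("s3"),
    WASBS("wasbs");

    private final String scheme;

    StorageProtocol(String scheme) {
        this.scheme = scheme;
    }

    public String scheme() {
        return scheme;
    }

    /** Builds a Hadoop-style URI such as oss://bucket/dir/file for this protocol. */
    public String toUri(String bucket, String path) {
        // Strip leading slashes so "/dir/file" and "dir/file" give the same URI.
        return scheme + "://" + bucket + "/" + path.replaceFirst("^/+", "");
    }

    /** Finds the constant for a scheme string, e.g. the value of a PROTOCOL setting. */
    public static Optional<StorageProtocol> fromScheme(String scheme) {
        return Arrays.stream(values())
                .filter(p -> p.scheme.equalsIgnoreCase(scheme))
                .findFirst();
    }
}
```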

@@ -367,7 +367,7 @@ public void textFormatBZip2CopyFromStdin() throws Exception {
     *
     * @throws Exception if test fails to run
     */
-    @Test(groups = {"features", "gpdb", "hcfs", "security"}, timeOut = 120000)
+    @Test(groups = {"features", "gpdb", "hcfs", "security"}, timeOut = 180000)
Comment: Do we expect the newer Aliyun tests to run slower? Why did the timeout need to be increased?
    public void textFormatWideRowsInsert() throws Exception {

        int rows = 10;

@@ -829,4 +829,105 @@ under the License.
            <resolver>org.greenplum.pxf.plugins.json.JsonResolver</resolver>
        </plugins>
    </profile>

    <!-- Aliyun (Alibaba Cloud) profiles -->
    <profile>
        <name>oss:text</name>
        <description>This profile is suitable for reading delimited single-line records
            from plain-text, tab-delimited files on Alibaba Cloud
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:csv</name>
        <description>This profile is suitable for reading delimited single-line records
            from plain-text CSV files on Alibaba Cloud
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:text:multi</name>
        <description>This profile is suitable for reading delimited single- or multi-line
            records (with quoted linefeeds) from plain-text files on Alibaba Cloud. It is not splittable
            (non-parallel) and is slower than HdfsTextSimple.
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsFileFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:parquet</name>
        <description>A profile for reading and writing Parquet data from Alibaba Cloud
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.ParquetFileAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.ParquetResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:avro</name>
        <description>This profile is suitable for reading Avro files (i.e. fileName.avro)
Comment: it seems like you specify
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.AvroFileAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.AvroResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:json</name>
        <description>
            Access JSON data either as:
            * one JSON record per line (default)
            * or multiline JSON records with an IDENTIFIER parameter indicating a member name used
              to determine the encapsulating JSON object to return
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.json.JsonAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.json.JsonResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:AvroSequenceFile</name>
        <description>
            Read Avro-format data stored in a sequence file, with a separate schema file, from Alibaba Cloud
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.SequenceFileAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.AvroResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
    <profile>
        <name>oss:SequenceFile</name>
        <description>
            Profile for accessing Sequence files serialized with a custom Writable class
        </description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.SequenceFileAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.WritableResolver</resolver>
        </plugins>
        <protocol>oss</protocol>
    </profile>
</profiles>
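Each new profile follows one shape: an oss:<format> name, a fragmenter/accessor/resolver triple taken from the existing hdfs and json plugin packages, and a <protocol>oss</protocol> element. A minimal sketch, using only the JDK DOM parser, that indexes such a file by profile name and flags a profile whose name prefix disagrees with its <protocol>; the ProfileIndex class and that consistency rule are assumptions for illustration, not how PXF itself loads profiles.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that indexes <profile> entries by name using only the JDK. */
public class ProfileIndex {

    /** Plugin classes and protocol declared by one profile (null when absent). */
    static final class Plugins {
        final String fragmenter;
        final String accessor;
        final String resolver;
        final String protocol;

        Plugins(String fragmenter, String accessor, String resolver, String protocol) {
            this.fragmenter = fragmenter;
            this.accessor = accessor;
            this.resolver = resolver;
            this.protocol = protocol;
        }
    }

    static Map<String, Plugins> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse profiles XML", e);
        }
        Map<String, Plugins> profiles = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("profile");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element profile = (Element) nodes.item(i);
            String name = text(profile, "name");
            if (name == null) {
                throw new IllegalArgumentException("<profile> without <name>");
            }
            Plugins plugins = new Plugins(text(profile, "fragmenter"), text(profile, "accessor"),
                    text(profile, "resolver"), text(profile, "protocol"));
            // Assumed consistency rule: a "prefix:format" name should match its <protocol>.
            int colon = name.indexOf(':');
            if (colon > 0 && plugins.protocol != null
                    && !name.substring(0, colon).equals(plugins.protocol)) {
                throw new IllegalArgumentException(
                        "Profile " + name + " declares protocol " + plugins.protocol);
            }
            profiles.put(name, plugins);
        }
        return profiles;
    }

    /** Trimmed text of the first descendant element named tag, or null if there is none. */
    private static String text(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }
}
```

Profiles without a <protocol> element, or whose names have no colon prefix, are indexed without the check.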

@@ -0,0 +1,27 @@
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.oss.endpoint</name>
        <value>YOUR_OSS_ENDPOINT</value>
        <description>Aliyun OSS endpoint to connect to. An up-to-date list is
            provided in the Aliyun OSS Documentation.</description>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>YOUR_OSS_ACCESS_KEY_ID</value>
        <description>Aliyun Access Key ID</description>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>YOUR_OSS_SECRET_ACCESS_KEY</value>
        <description>Aliyun Access Key Secret</description>
    </property>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>org.apache.hadoop.fs.aliyun.oss.OSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
    </property>
</configuration>
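The template is a flat list of <property> elements, three of which start out as YOUR_OSS_* placeholders. As a hedged sketch (a hypothetical OssSiteCheck class built on the JDK DOM parser, not Hadoop's Configuration API), this reports which of fs.oss.endpoint, fs.oss.accessKeyId and fs.oss.accessKeySecret are still missing or unsubstituted in a rendered file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical pre-flight check for a rendered oss-site.xml (JDK DOM only). */
public class OssSiteCheck {

    // The three properties the template expects the Makefile to fill in.
    static final List<String> REQUIRED =
            Arrays.asList("fs.oss.endpoint", "fs.oss.accessKeyId", "fs.oss.accessKeySecret");

    /** Reads each <property>'s <name> and <value> into a map. */
    static Map<String, String> readProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse site XML", e);
        }
        Map<String, String> props = new HashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = firstText(property, "name");
            if (name != null) {
                String value = firstText(property, "value");
                props.put(name, value == null ? "" : value);
            }
        }
        return props;
    }

    /** Returns the required keys that are missing, empty, or still a YOUR_* placeholder. */
    static List<String> unresolved(String xml) {
        Map<String, String> props = readProperties(xml);
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getOrDefault(key, "");
            if (value.isEmpty() || value.startsWith("YOUR_")) {
                problems.add(key);
            }
        }
        return problems;
    }

    /** Trimmed text of the first descendant element named tag, or null if there is none. */
    private static String firstText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }
}
```

For the template exactly as committed, all three keys would be reported, since every one of those values still begins with YOUR_.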

Comment: I don't know if these are following an existing convention or if they have to be named this way because third-party deps require them, but the OSS_* naming seems very generic. Could we prepend ALIYUN_ to the environment variables?