Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,11 @@ public interface PluginTask
@ConfigDefault("null")
public Optional<String> getAccessKeyId();

@Config("http_proxy")
@ConfigDefault("null")
public Optional<HttpProxy> getHttpProxy();
public void setHttpProxy(Optional<HttpProxy> httpProxy);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: How users should configure when they don't want any HTTP proxy for Embulk while the env HTTP_PROXY is set?

Copy link
Contributor Author

@muga muga Apr 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR doesn't support it. If they want, they need to unset env HTTP_PROXY for Embulk and embulk-input-s3 for now.

I think that we could take one of the following approaches.

  1. Not support. Users need to unset env HTTP_PROXY
  2. Provide an option to create HttpProxy from env variables specified by users
  3. Stop creating HttpProxy from env variables.

1st one was OK for me. I'm not confident that such requests are really minor but, how about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that it can be a kind of compatibility issue. If a user has been setting the HTTP_PROXY env so far, this change may break their behavior.

I thought the best option can be eventually 2. What do you think about taking 3 for the time being, and implementing 2 later?

Copy link
Contributor Author

@muga muga Apr 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I will fix this by 3rd one.

As another idea, how about introducing new EMBULK_HTTP_PROXY env? If users can configure http proxy for Embulk as this en. The name is reasonable for us. Because we don't need to care of the compatibility issues so much. If no problem, I will care of this when this HttpProxy is moved to embulk/embulk-util-config.git.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 326407e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should introduce new standards in environment variables. No one will get benefits with the new env.


It's still a design choice we should have a kind of "default" envs to use. I'm personally -1 on defaults. My idea is just to receive a new env configuration such as :

http_proxy:
  - env: [ "HTTP_PROXY", "http_proxy", ... ]  # First match
  # Exclusive from host, port, ...

Even if it has a kind of default like below, the benefit of users are very small. Users save just ~10 key types in making a configuration. Is it really a benefit?

  - env: []  # Use default
  - env_default: True

I believe users achieve more benefits by the proxy configuration is always explicit and clear. They won't be confused any implicit configurations they're unaware.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your thoughts. I think that it's good to have both of your env option and default env. The default env is helped for the following use cases.

  • An Embulk user might have multiple Embulk config files for ingesting data from S3 to somewhere. Because there're various bucket names, paths and file formats. It doesn't make sense to define same http proxy configuration for each Embulk config file.
  • The http proxy configuration is sometimes required by both input plugin and output plugin (executor plugin as well). For example, if an user ingests data from S3 to S3, he needs to declare http proxy configuration in both of in: and out:. It's better to separate the configuration from others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so to be honest. If users are typing their configurations every runtime, it may be useful to skip typing such env configurations. But, it's just per making configurations. Does this really matter? Users can utilize Liquid templates as well.

I agree that the benefits are not zero, but I think the risk is larger by that default proxy configuration.

  • While we encourage to use this HttpProxy class for many plugins, but our encourage will not cover all. HttpProxy is just a tooling. Implicit configurations may cause confusions especially when used with third-party non-HttpProxy plugins.
  • For your latter case, there are expectedly many cases as well that no proxy is used for input while output needs proxy: such as uploading (internal) application server data into external services. The risk of later confusion is expectedly larger than just a few key types in making a configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I changed my mind. Because my suggestion is not very good for this layer. I will introduce env option but, not on this PR. It will be implemented on https://github.com/embulk/embulk-util-config


@Config("incremental")
@ConfigDefault("true")
public boolean getIncremental();
Expand Down Expand Up @@ -144,9 +149,38 @@ protected ClientConfiguration getClientConfiguration(PluginTask task)
clientConfig.setMaxErrorRetry(3); // SDK default: 3
clientConfig.setSocketTimeout(8*60*1000); // SDK default: 50*1000

// set http proxy
if (task.getHttpProxy().isPresent()) {
setHttpProxyInAwsClient(clientConfig, task.getHttpProxy().get());
}

return clientConfig;
}

private void setHttpProxyInAwsClient(ClientConfiguration clientConfig, HttpProxy httpProxy)
{
// host
clientConfig.setProxyHost(httpProxy.getHost());

// port
if (httpProxy.getPort().isPresent()) {
clientConfig.setProxyPort(httpProxy.getPort().get());
}

// useHttps
clientConfig.setProtocol(httpProxy.useHttps() ? Protocol.HTTPS : Protocol.HTTP);

// user
if (httpProxy.getUser().isPresent()) {
clientConfig.setProxyUsername(httpProxy.getUser().get());
}

// password
if (httpProxy.getPassword().isPresent()) {
clientConfig.setProxyPassword(httpProxy.getPassword().get());
}
}

private FileList listFiles(PluginTask task)
{
try {
Expand Down
60 changes: 60 additions & 0 deletions embulk-input-s3/src/main/java/org/embulk/input/s3/HttpProxy.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
package org.embulk.input.s3;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Optional;

/**
* HttpProxy is config unit for Input/Output plugins' configs.
*
* TODO
* This unit will be moved to embulk/embulk-plugin-units.git.
*/
public class HttpProxy
{
private final String host;
private final Optional<Integer> port;
private final boolean https;
private final Optional<String> user;
private final Optional<String> password; // TODO use SecretString

@JsonCreator
public HttpProxy(
@JsonProperty("host") String host,
@JsonProperty("port") Optional<Integer> port,
@JsonProperty("https") boolean https,
@JsonProperty("user") Optional<String> user,
@JsonProperty("password") Optional<String> password)
{
this.host = host;
this.port = port;
this.https = https;
this.user = user;
this.password = password;
}

public String getHost()
{
return host;
}

public Optional<Integer> getPort()
{
return port;
}

public boolean useHttps()
{
return https;
}

public Optional<String> getUser()
{
return user;
}

public Optional<String> getPassword()
{
return password;
}
}
112 changes: 112 additions & 0 deletions embulk-input-s3/src/test/java/org/embulk/input/s3/TestHttpProxy.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
package org.embulk.input.s3;

import com.google.common.base.Optional;
import com.google.common.collect.ImmutableMap;
import org.embulk.EmbulkTestRuntime;
import org.embulk.config.ConfigSource;
import org.embulk.input.s3.S3FileInputPlugin.S3PluginTask;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;

import java.net.URISyntaxException;
import java.util.Map;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

public class TestHttpProxy
{
@Rule
public EmbulkTestRuntime runtime = new EmbulkTestRuntime();

private ConfigSource config;

@Before
public void createResources()
{
config = runtime.getExec().newConfigSource();
setupS3Config(config);
}

@Test
public void checkDefaultHttpProxy()
{
ConfigSource conf = config.deepCopy();
setupS3Config(conf);
S3PluginTask task = conf.loadConfig(S3PluginTask.class);
assertTrue(!task.getHttpProxy().isPresent());
}

@Test
public void checkHttpProxy()
{
{ // specify host
String host = "my_host";
Map<String, Object> httpProxyMap = ImmutableMap.<String, Object>of("host", host);
ConfigSource conf = config.deepCopy().set("http_proxy", httpProxyMap);
S3PluginTask task = conf.loadConfig(S3PluginTask.class);

assertHttpProxy(new HttpProxy(host, Optional.<Integer>absent(), false, Optional.<String>absent(), Optional.<String>absent()),
task.getHttpProxy().get());
}

{ // specify host, port, use_ssl
String host = "my_host";
int port = 8080;
boolean useSsl = true;
Map<String, Object> httpProxyMap = ImmutableMap.<String, Object>of(
"host", host,
"port", 8080,
"https", true);
ConfigSource conf = config.deepCopy().set("http_proxy", httpProxyMap);
S3PluginTask task = conf.loadConfig(S3PluginTask.class);

assertHttpProxy(new HttpProxy(host, Optional.of(port), true, Optional.<String>absent(), Optional.<String>absent()),
task.getHttpProxy().get());
}

{ // specify host, port, use_ssl, user, password
String host = "my_host";
int port = 8080;
boolean useSsl = true;
String user = "my_user";
String password = "my_pass";
Map<String, Object> httpProxyMap = ImmutableMap.<String, Object>of(
"host", host,
"port", 8080,
"https", true,
"user", user,
"password", password);
ConfigSource conf = config.deepCopy().set("http_proxy", httpProxyMap);
S3PluginTask task = conf.loadConfig(S3PluginTask.class);

assertHttpProxy(new HttpProxy(host, Optional.of(port), true, Optional.of(user), Optional.of(password)),
task.getHttpProxy().get());
}
}

private static void setupS3Config(ConfigSource config)
{
config.set("bucket", "my_bucket").set("path_prefix", "my_path_prefix");
}

private static void assertHttpProxy(HttpProxy expected, HttpProxy actual)
{
assertEquals(expected.getHost(), actual.getHost());
assertEquals(expected.getPort().isPresent(), actual.getPort().isPresent());
if (expected.getPort().isPresent()) {
assertEquals(expected.getPort().get(), actual.getPort().get());
}
assertEquals(expected.useHttps(), actual.useHttps());
assertEquals(expected.getUser().isPresent(), actual.getUser().isPresent());
if (expected.getUser().isPresent()) {
assertEquals(expected.getUser().get(), actual.getUser().get());
}
assertEquals(expected.getPassword().isPresent(), actual.getPassword().isPresent());
if (expected.getPassword().isPresent()) {
assertEquals(expected.getPassword().get(), actual.getPassword().get());
}
}
}