Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for tunneling JDBC traffic over an HTTP proxy #541

Merged
merged 14 commits into from Feb 23, 2016
Merged

Add support for tunneling JDBC traffic over an HTTP proxy #541

merged 14 commits into from Feb 23, 2016

Conversation

kunalkandekar
Copy link

Frequently data stores to be accessed by Gobblin reside outside data centers. In these cases, outbound access from a data center typically needs to go through a gateway proxy for security purposes. In some cases this is an HTTP proxy. However, some protocols like JDBC don't support the concept of "proxies", let alone HTTP proxies, and hence a solution is needed to enable this.

This Pull Request provides a method of tunneling JDBC connections over an HTTP proxy. Note that it is currently only implemented for JDBC, but can be extended to work with any other TCP-based protocol.

The way this works for JDBC is:

  1. When a JdbcProvider is invoked, it checks if the WorkUnit has a proxy host and port defined.
  2. If so, it extracts the remote host and port from the connectionUrl.
  3. It then creates a Tunnel instance configured with the remote host and port. The Tunnel starts a thread that listens on an arbitrary port on localhost.
  4. The JdbcProvider then changes the connectionUrl to replace the remote host and port with the localhost and port and passes it on to the driver.
  5. Hence when the driver creates a JDBC connection, it connects to the Tunnel socket. The Tunnel then connects to the remote host through the proxy via a HTTP CONNECT request.
  6. If established successfully, the Tunnel then just relays bytes back and forth.
  7. When the JDBC data source is closed down, the Tunnel is shut down as well.

The Tunnel can accept as many connections as the JdbcExtractor opens. The Tunnel uses NIO to minimize resource usage.

@chavdar
Copy link
Contributor

chavdar commented Dec 18, 2015

It looks like the patch uses Java 8 features. Currently, Gobblin is Java 7-based.

@kunalkandekar
Copy link
Author

Ah, right. Will fix.

@kunalkandekar
Copy link
Author

The test that fails has a 137 return code from its Gradle Test Executor. It could be that a specific test is failing because it's using up too much memory, in which case TravisCI SIGKILLS it (see: https://discuss.gradle.org/t/gradle-travis-ci/11928/3). I'll try to fix the test to use less memory.

@kunalkandekar
Copy link
Author

@chavdar It seems TravisCI is killing certain test processes (possibly because they are consuming too many resources.) I just significantly reduced the memory usage of some tests, but they still continue to get killed. Can you think of anything we can do to debug or sidestep this issue?

@kunalkandekar
Copy link
Author

While I'm trying to debug the TravisCI/gradle issue, here are instructions for performing end-to-end testing for these changes using gobblin standalone and a publicly accessible MySQL database: https://gist.github.com/kunalkandekar/b4f8c5fff6df62084e93

@chavdar
Copy link
Contributor

chavdar commented Jan 7, 2016

@sahilTakiar can you please help reviewing this?

@sahilTakiar
Copy link
Contributor

Will start reviewing today.

@sahilTakiar
Copy link
Contributor

Few general comments:

1: Can you squash all your commits into a single commit (http://stackoverflow.com/questions/5189560/squash-my-last-x-commits-together-using-git)
2: Can you remove all mentions of "GaaP" so that this code is not tied to a specific proxy - you should just be able to make it a configuration option (like user.agent.name or something along those lines)
3: Is there a more detailed design doc on how this all works?
4: Adding more javadocs to the Tunnel.java class would be very helpful, and consider breaking the Tunnel.java class down into multiple classes as it is pretty long.
5: If you could follow the Gobblin coding style guide that would be great (https://github.com/linkedin/gobblin/wiki/CodingStyle)

@kunalkandekar
Copy link
Author

Thanks for the comments, will incorporate the feedback.

@liyinan926
Copy link
Contributor

@kunalkandekar just one more comment: Gobblin is not using the standard LinkedIn coding style so none of our classes use variable names starting with _. Check it out here https://github.com/linkedin/gobblin/wiki/CodingStyle.

@kunalkandekar
Copy link
Author

Thanks @sahilTakiar, am incorporating your feedback. Out of curiosity, could you explain the rationale between squashing commits into a single one? There is no detailed design doc, only the comment on the original commit, will include details in the JavaDocs.

@sahilTakiar
Copy link
Contributor

We generally try to have 1 commit per new feature in Gobblin. This way we can have a single, descriptive commit message describing the change.

@kunalkandekar
Copy link
Author

@sahilTakiar I see. Would it make sense to squash the commits once I push my current changes or should I do so beforehand?

@chavdar
Copy link
Contributor

chavdar commented Jan 11, 2016

@pcadabam Can you also help @sahilTakiar with the review?

@pcadabam-zz
Copy link
Contributor

Looking into it.

*
* Implements a tunnel through a proxy to resource on the internet
*/
public class Tunnel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class seems too long. Can you pull out the private classes into a different file? Also Can you add more javadoc for the public methods and the classes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this also what @sahilTakiar suggested. I'm refactoring accordingly.

@sahilTakiar
Copy link
Contributor

@kunalkandekar you can go ahead to push all your local changes, and squash the commits once the PR is ready to be merged

_delayedDoubleEchoServer = startDoubleEchoServer(1000);
System.out.println("Delayed DoubleEchoServer on " + _delayedDoubleEchoServer.getServerSocketPort());
_talkFirstEchoServer = startTalkFirstEchoServer();
System.out.println("TalkFirstEchoServer on " + _delayedDoubleEchoServer.getServerSocketPort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@pcadabam-zz
Copy link
Contributor

Overall looks good.
Main comments around documentation and some refactor. Like @sahilTakiar said please squash the commits and update the pr.
Here are the steps

git rebase -i HEAD~num_commits_to_squash
An editor will show up where you can squash/pick commits
git push upstream master --force


private static final ByteBuffer OK_REPLY = ByteBuffer.wrap("HTTP/1.1 200".getBytes());
private static final Set<ByteBuffer> OK_REPLIES = new HashSet<ByteBuffer>(
Arrays.asList(OK_REPLY, ByteBuffer.wrap("HTTP/1.0 200".getBytes())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be HTTP 1.1 instead of HTTP 1.0 - might be good to make the HTTP version a string constant so that version changes are easy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP/1.0 string is actually in addition to the HTTP/1.1 response. This ProxySetupHandler compares the response from the proxy against both strings. This is to handle the case where the intermediate proxy is based on HTTP 1.0 rather than 1.1. May help to move them to string constants, however.

kkandeka added 3 commits January 21, 2016 21:18
* @author kkandekar@linkedin.com
*/
abstract class EasyThread extends Thread {
static Set<EasyThread> ALL_THREADS = Collections.synchronizedSet(new HashSet<EasyThread>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this final so it can't be mutated by any other class

@sahilTakiar
Copy link
Contributor

@kunalkandekar LGTM!

Are there any other changes you want to commit before I merge this?

@kunalkandekar
Copy link
Author

Hey @sahilTakiar, I'm just running some end-to-end tests once more on the latest changes with some adverse situations (connections being cut unexpectedly, the proxy dying halfway through, etc.) to ensure these are handled cleanly. Will ping as soon as I'm done.

@sahilTakiar
Copy link
Contributor

Can you resolve the merge conflicts?

@kunalkandekar
Copy link
Author

Whoops, hadn't noticed the conflicts, sorry. Merged and pushed.

@sahilTakiar
Copy link
Contributor

Do you know what's going on with the Java 8 build on Travis? Let's just disable any tests that currently don't run on Travis using the disabledOnTravis tag.

@kunalkandekar
Copy link
Author

@sahilTakiar I've tried disabling all Tunnel tests by adding them to the "disabledOnTravis" group, but they are apparently still running and the build is still failing. Could you take a look and let me know if I'm doing it wrong?

sahilTakiar added a commit that referenced this pull request Feb 23, 2016
Add support for tunneling JDBC traffic over an HTTP proxy
@sahilTakiar sahilTakiar merged commit 07be6b5 into apache:master Feb 23, 2016
@sahilTakiar
Copy link
Contributor

@kunalkandekar thanks for doing all of this and for addressing all the comments! This feature has now been merged!

@kunalkandekar
Copy link
Author

@sahilTakiar nice! Thanks!

sahilTakiar pushed a commit to sahilTakiar/gobblin that referenced this pull request Mar 3, 2016
sahilTakiar added a commit that referenced this pull request Mar 3, 2016
Fixing FindBugs warnings for gobblin-tunnel from #541
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants