Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDBC Writer #893

Merged
merged 19 commits into from May 12, 2016
Merged

JDBC Writer #893

merged 19 commits into from May 12, 2016

Conversation

jinhyukchang
Copy link
Contributor

Pull request for JDBC Writer. Will combine to one commit after the review.

Design doc: https://docs.google.com/a/linkedin.com/document/d/1pSWZtgNssxaV1c19xdd1rKft9JaXM2qmqDwwbIb0scM/edit?usp=sharing

@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 42.527% when pulling 19e7dbd on jinhyukchang:master into f5573e1 on linkedin:master.

@kadaan
Copy link
Contributor

kadaan commented Mar 29, 2016

Any reason the design doc can't be public?

@coveralls
Copy link

Coverage Status

Coverage increased (+1.9%) to 43.763% when pulling f4e408a on jinhyukchang:master into f5573e1 on linkedin:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.9%) to 43.684% when pulling ac1de18 on jinhyukchang:master into f5573e1 on linkedin:master.

@chavdar
Copy link
Contributor

chavdar commented Apr 15, 2016

@kadaan Internally, for reviews, we use a company google docs account. Once we finalize the internal review, we'll share the design.

@@ -102,6 +102,7 @@
public static final String JOB_JAR_FILES_KEY = "job.jars";
public static final String JOB_LOCAL_FILES_KEY = "job.local.files";
public static final String JOB_HDFS_FILES_KEY = "job.hdfs.files";
public static final String JOB_JAR_HDFS_FILES_KEY = "job.jars.hdfs";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make name consistent with above property: job.hdfs.jars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* @see gobblin.converter.AvroToAvroConverterBase#convertSchema(org.apache.avro.Schema, gobblin.configuration.WorkUnitState)
*/
@Override
public Schema convertSchema(Schema inputSchema, WorkUnitState workUnit) throws SchemaConversionException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this method need to first populate a list of fields to remove, and then remove them remove them each from the schema? Can't it simply create a Schema based on the value of CONVERTER_AVRO_FIELDS_PICK_FIELDS.

AvroUtils.getFieldSchema will fetch the Schema of a field given a dot separated path to the field name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just to reduce it to remove problem as removal tool is already there in Gobblin. It seems like it can be easily done by AvroUtils and will update it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was exploring AvroUtils and was not able to find an easy way to reconstruct Schema mainly because getFieldSchema only provides leaf information which means anyway converter needs to traverse the Schema.
I've used Trie to present dependencies from input and reconstruct the Schema. (Just traversing Schema and add field when it matches was dropped once I learned Schema fields are immutable). Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yes its unfortunate that just having the leaf Schemas is not enough. A Trie is one way to do this, but I think any nested data format would work. For example, I think you could do this using JSON, which has the advantage that you can use standard libraries like GSON. However, your current Trie approach works too, so I will leave it up to you.


private void validateNotEmpty(String s, String name) {
if(StringUtils.isEmpty(s)) {
throw new IllegalArgumentException(name + " should not be empty.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Guava's Preconditions to do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

…of removing fields, Using Lombok, added converterInitializer to retrieve date field info once
@jinhyukchang
Copy link
Contributor Author

jinhyukchang commented Apr 29, 2016

Update the code per PR comments.

Conflicts:
	build.gradle
	gobblin-core/src/main/java/gobblin/converter/filter/AvroSchemaFieldRemover.java
	gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/MRJobLauncher.java
@jinhyukchang
Copy link
Contributor Author

Merged from origin to sync up to date.

@coveralls
Copy link

coveralls commented Apr 29, 2016

Coverage Status

Coverage increased (+0.9%) to 45.179% when pulling 67598da on jinhyukchang:master into 911ab2e on linkedin:master.

import gobblin.writer.commands.JdbcWriterCommands;
import gobblin.writer.commands.JdbcWriterCommandsFactory;

public class AvroToJdbcEntryConverterInitializer implements ConverterInitializer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add.

@jinhyukchang
Copy link
Contributor Author

@sahilTakiar Just updated the code addressing PR. Would you take a look when you have a chance?

@coveralls
Copy link

coveralls commented May 3, 2016

Coverage Status

Coverage increased (+0.7%) to 44.991% when pulling 09ffbfc on jinhyukchang:master into 911ab2e on linkedin:master.

@sahilTakiar
Copy link
Contributor

hey @jinhyukchang I discussed with @chavdar and everything in this PR looks good! I can merge this PR, but I have one last request.

Can you add some documentation to the Gobblin Docs about this feature, you can probably just copy what you have in your Google Doc. If you have time its always nice to add documentation on configuration keys and documentation on the different plugin points. This is generally useful for any new Gobblin user who wants to write to a RDMS (I already know of one user waiting on this PR because they want to use this).

All Gobblin documentation is checked into GitHub under gobblin-docs. I suggest putting the documentation in a new file called gobblin-docs/user-guide/Gobblin-JDBC-Writer.md. The format of the files is all in Markdown. It's should be a relatively simple process, but if you have more questions this page has more details: http://gobblin.readthedocs.io/en/latest/developer-guide/Contributing/

@chavdar
Copy link
Contributor

chavdar commented May 9, 2016

@jinhyukchang we will feature the JDBC writer as one of the highlights for our next release (0.7.0) so it will be great to have some supporting documentation.

@jinhyukchang
Copy link
Contributor Author

@sahilTakiar , @chavdar ,

Great, thanks for the review and will add documentation shortly.

@jinhyukchang
Copy link
Contributor Author

jinhyukchang commented May 10, 2016

@sahilTakiar , @chavdar
Just added documentation. Let me know how it looks. (http://jnchang-ld1:8000/)

Seems now there's a conflict with master branch and I will resolve it once documentation is reviewed.

@sahilTakiar
Copy link
Contributor

Documentation LGTM!

When you rebase this PR, can you also squash all the commits together into one commit?

@jinhyukchang
Copy link
Contributor Author

Sure, will do that.

Update code following findbugs report.

Conflicts:
	gobblin-runtime/src/main/java/gobblin/runtime/AbstractJobLauncher.java
@coveralls
Copy link

coveralls commented May 10, 2016

Coverage Status

Coverage increased (+0.5%) to 45.648% when pulling 387da5a on jinhyukchang:master into 9971e73 on linkedin:master.

@coveralls
Copy link

coveralls commented May 11, 2016

Coverage Status

Coverage increased (+0.7%) to 45.968% when pulling 43d40e0 on jinhyukchang:master into 73e8720 on linkedin:master.

@sahilTakiar sahilTakiar merged commit 98893b4 into apache:master May 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants