New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JDBC Writer #893
JDBC Writer #893
Conversation
Any reason the design doc can't be public? |
@kadaan Internally, for reviews, we use a company google docs account. Once we finalize the internal review, we'll share the design. |
@@ -102,6 +102,7 @@ | |||
public static final String JOB_JAR_FILES_KEY = "job.jars"; | |||
public static final String JOB_LOCAL_FILES_KEY = "job.local.files"; | |||
public static final String JOB_HDFS_FILES_KEY = "job.hdfs.files"; | |||
public static final String JOB_JAR_HDFS_FILES_KEY = "job.jars.hdfs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make name consistent with above property: job.hdfs.jars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* @see gobblin.converter.AvroToAvroConverterBase#convertSchema(org.apache.avro.Schema, gobblin.configuration.WorkUnitState) | ||
*/ | ||
@Override | ||
public Schema convertSchema(Schema inputSchema, WorkUnitState workUnit) throws SchemaConversionException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this method need to first populate a list of fields to remove, and then remove them remove them each from the schema? Can't it simply create a Schema
based on the value of CONVERTER_AVRO_FIELDS_PICK_FIELDS
.
AvroUtils.getFieldSchema
will fetch the Schema
of a field given a dot separated path to the field name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just to reduce it to remove problem as removal tool is already there in Gobblin. It seems like it can be easily done by AvroUtils and will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was exploring AvroUtils and was not able to find an easy way to reconstruct Schema mainly because getFieldSchema only provides leaf information which means anyway converter needs to traverse the Schema.
I've used Trie to present dependencies from input and reconstruct the Schema. (Just traversing Schema and add field when it matches was dropped once I learned Schema fields are immutable). Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yes its unfortunate that just having the leaf Schemas is not enough. A Trie is one way to do this, but I think any nested data format would work. For example, I think you could do this using JSON, which has the advantage that you can use standard libraries like GSON. However, your current Trie approach works too, so I will leave it up to you.
|
||
private void validateNotEmpty(String s, String name) { | ||
if(StringUtils.isEmpty(s)) { | ||
throw new IllegalArgumentException(name + " should not be empty."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use Guava's Preconditions
to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
…of removing fields, Using Lombok, added converterInitializer to retrieve date field info once
Update the code per PR comments. |
Conflicts: build.gradle gobblin-core/src/main/java/gobblin/converter/filter/AvroSchemaFieldRemover.java gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/MRJobLauncher.java
Merged from origin to sync up to date. |
import gobblin.writer.commands.JdbcWriterCommands; | ||
import gobblin.writer.commands.JdbcWriterCommandsFactory; | ||
|
||
public class AvroToJdbcEntryConverterInitializer implements ConverterInitializer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add.
@sahilTakiar Just updated the code addressing PR. Would you take a look when you have a chance? |
hey @jinhyukchang I discussed with @chavdar and everything in this PR looks good! I can merge this PR, but I have one last request. Can you add some documentation to the Gobblin Docs about this feature, you can probably just copy what you have in your Google Doc. If you have time its always nice to add documentation on configuration keys and documentation on the different plugin points. This is generally useful for any new Gobblin user who wants to write to a RDMS (I already know of one user waiting on this PR because they want to use this). All Gobblin documentation is checked into GitHub under |
@jinhyukchang we will feature the JDBC writer as one of the highlights for our next release (0.7.0) so it will be great to have some supporting documentation. |
@sahilTakiar , @chavdar , Great, thanks for the review and will add documentation shortly. |
@sahilTakiar , @chavdar Seems now there's a conflict with master branch and I will resolve it once documentation is reviewed. |
Documentation LGTM! When you rebase this PR, can you also squash all the commits together into one commit? |
Sure, will do that. |
Update code following findbugs report. Conflicts: gobblin-runtime/src/main/java/gobblin/runtime/AbstractJobLauncher.java
Pull request for JDBC Writer. Will combine to one commit after the review.
Design doc: https://docs.google.com/a/linkedin.com/document/d/1pSWZtgNssxaV1c19xdd1rKft9JaXM2qmqDwwbIb0scM/edit?usp=sharing