rjrudin commented Jan 29, 2020

The key changes are in MarkLogicSinkTask and RunFlowWriteBatchListener. See the comments in those classes for the changes.

This is not intended to be merged; it's just being created for feedback for now.

There are 3 new properties for using this: a DHF flow name, an optional set of steps, and an option for logging each flow response. If the flow name is set, then that flow will be run. Otherwise, only the regular ingestion happens.
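As a rough illustration of how those three properties could drive the behavior described above, here is a minimal Java sketch. The class name `FlowSettings` and the three property keys are assumptions made for this example; the comment does not name the actual keys, so check MarkLogicSinkTask and the connector configuration for the real ones.

```java
import java.util.Map;

// Hypothetical holder for the three flow-related settings.
// The property keys are assumed for illustration and are not taken from this PR.
final class FlowSettings {
    static final String FLOW_NAME = "ml.datahub.flow.name";           // assumed key
    static final String FLOW_STEPS = "ml.datahub.flow.steps";         // assumed key
    static final String LOG_RESPONSE = "ml.datahub.flow.logResponse"; // assumed key

    final String flowName;     // null when no flow is configured
    final String steps;        // null when no steps are configured
    final boolean logResponse; // false when the option is absent

    FlowSettings(Map<String, String> config) {
        this.flowName = trimToNull(config.get(FLOW_NAME));
        this.steps = trimToNull(config.get(FLOW_STEPS));
        // Boolean.parseBoolean returns false for null, so a missing option means "do not log".
        this.logResponse = Boolean.parseBoolean(config.get(LOG_RESPONSE));
    }

    // A flow is run only when a flow name is set; otherwise only regular ingestion happens.
    boolean shouldRunFlow() {
        return flowName != null;
    }

    // Treats a missing or whitespace-only value the same as "not set".
    private static String trimToNull(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }
}
```

With this shape, a sink task could build `FlowSettings` once at startup and attach a flow-running listener only when `shouldRunFlow()` is true.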

I had to make one plumbing change - DatabaseClientCreator is now DatabaseClientConfigBuilder, as I needed to reuse the DatabaseClientConfig object. That's a small plumbing change though that could be made independently of this change.

rjrudin commented Feb 7, 2020

DH 5.2.0-rc1 will be available on the VPN soon; once it is, I'll update this to depend on that instead of depending on a locally built DH snapshot.

rjrudin changed the title from "WIP: Prototype of running a flow using DHF 5.2.0" to "Can now run a flow using DHF 5.2.0" on Mar 20, 2020
BillFarber commented

I was able to make this work today, but I ran into two minor issues:

  1. I had to download and add commons-logging-1.2.jar to the /kafka/libs directory. Without that jar, the connector crashed, even when I set the logging option to false. The missing jar also kept the flow from completing in DHF.
  2. Based on your comment above, "optional set of steps", I thought I could leave that option blank. However, that also caused a failure. Something like "no such step found". So, I was forced to specify the step(s) I wanted to run.

rjrudin commented Mar 28, 2020

I'll check on the logging. I remember having to remove a logging jar (I forget which one) from the DHF jar because, with it included, the connector wouldn't start up.

For #2 - I am assuming that when you didn't specify any steps, you ran a flow that included an ingestion step. An ingestion step - which tries to read files from disk - won't work in the connector. So the set of steps is still optional, because your flow may not have any ingestion steps.

BillFarber commented

For #2 - The flow consists of a single Mapping step.

rjrudin commented Mar 30, 2020

I somehow fixed the error about commons-logging not existing by removing the exclusion of the spring-boot dependency from DH. I don't know why that fixed the problem, as "gradle dependencies" still doesn't show commons-logging being included anywhere. Including spring-boot doesn't really matter - it only adds the spring-boot jar, as its two dependencies - spring-context and spring-core - were already being depended on by other DH dependencies.

For the error when no steps are specified - I'm getting a test/fix in place for that. I reproduced it when the property existed without a value.
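To make the failure mode concrete, here is one way a guard for "property present but empty" could look in Java. This is only a sketch and not the fix that went into the PR: the `StepParser` class is hypothetical, and a comma-separated step list is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes steps are given as a comma-separated string.
final class StepParser {

    // Returns the configured step identifiers. An empty list means "no steps specified",
    // which covers both a missing property and a property that exists without a value.
    static List<String> parseSteps(String raw) {
        List<String> steps = new ArrayList<>();
        if (raw == null) {
            return steps;
        }
        for (String token : raw.split(",")) {
            String step = token.trim();
            // Skip empty tokens so "" or "1,,2" never produce a blank step name.
            if (!step.isEmpty()) {
                steps.add(step);
            }
        }
        return steps;
    }
}
```

A caller could then pass explicit steps to the flow only when the returned list is non-empty, so a blank value never turns into a request for a step with an empty name.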

BillFarber commented

Both fixes worked for me.

BillFarber merged commit 2d47762 into marklogic:develop Apr 2, 2020
BillFarber added a commit that referenced this pull request Apr 19, 2020
* Updating the AWS quickstart document.

* Updating the AWS quickstart document. (#19)

* Can now run a flow using DHF 5.2.0 (#21)

Co-authored-by: Rob Rudin <rob.rudin@marklogic.com>

* Updating the AWS quickstart document. (#23)

Co-authored-by: rjrudin <rjrudin@gmail.com>
Co-authored-by: Rob Rudin <rob.rudin@marklogic.com>
rjrudin deleted the feature/runFlow branch August 10, 2022 21:12