
Key Component

Source Classes

Source classes represent the protocols used to connect to data systems. In the Gobblin framework, a source class plays two roles:

  • As the planner when the job starts: it generates work units and initiates extractors
  • As an agent acting on behalf of the job within each work unit, once that work unit is picked up by a task executor
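The two roles can be illustrated with a minimal Python sketch of the lifecycle. This is a toy model, not Gobblin code: `ToySource`, `WorkUnit`, `CountingExtractor`, and `run_job` are hypothetical names. In Gobblin itself, the planner role corresponds to `Source.getWorkunits(SourceState)` and the agent role to `Source.getExtractor(WorkUnitState)`. The sketch collapses the task executor into a simple loop.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class WorkUnit:
    # Hypothetical stand-in for a Gobblin work unit: one slice of the job.
    job_name: str
    partition: int


class CountingExtractor:
    """Hypothetical extractor that yields two fake records for one work unit."""

    def __init__(self, work_unit: WorkUnit):
        self.work_unit = work_unit

    def read_records(self) -> Iterator[str]:
        for i in range(2):
            yield f"{self.work_unit.job_name}-p{self.work_unit.partition}-r{i}"


class ToySource:
    # Role 1 (planner): called once when the job starts.
    def get_workunits(self, job_name: str, partitions: int) -> List[WorkUnit]:
        return [WorkUnit(job_name, p) for p in range(partitions)]

    # Role 2 (agent): called for each work unit a task executor picks up.
    def get_extractor(self, work_unit: WorkUnit) -> CountingExtractor:
        return CountingExtractor(work_unit)


def run_job(source: ToySource, job_name: str, partitions: int) -> List[str]:
    """Simulate the driver: plan once, then run each work unit as a task."""
    records: List[str] = []
    for wu in source.get_workunits(job_name, partitions):  # planning phase
        extractor = source.get_extractor(wu)  # task-execution phase
        records.extend(extractor.read_records())
    return records
```

Calling `run_job(ToySource(), "demo", 2)` plans two work units and then reads two records from each, in work-unit order. A real executor would run these tasks in parallel, possibly on different machines, which is why the agent role has to rebuild its context from the work unit alone.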

In DIL, work-unit generation is the same for all protocols, so it is handled by MultistageSource. An extractor, however, is initialized with a connection object, and connection objects are tied to specific protocols, so extractor initialization is handled by separate protocol-specific subclasses.

Each subclass also holds a set of job keys that gives its extractors the proper execution context; therefore, the agent function is handled in the subclasses as well.
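The division of labor between MultistageSource and its subclasses can be sketched as a Python model of the Java classes. `MultistageSourceSketch`, `HttpSourceSketch`, `JdbcSourceSketch`, `Connection`, `Extractor`, and the job-key names are all hypothetical; they only show where each responsibility lives. The base class generates work units identically for every protocol, while each subclass keeps its own job keys and builds the protocol-specific connection when it initializes an extractor.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkUnit:
    index: int


@dataclass
class Connection:
    # Hypothetical connection object; each protocol would have its own type.
    protocol: str
    job_keys: Dict[str, str]


@dataclass
class Extractor:
    # Hypothetical extractor bound to one work unit and one connection.
    work_unit: WorkUnit
    connection: Connection


class MultistageSourceSketch(ABC):
    """Planner role: work-unit generation is shared by every protocol."""

    def get_workunits(self, count: int) -> List[WorkUnit]:
        # Identical planning logic regardless of the underlying protocol.
        return [WorkUnit(i) for i in range(count)]

    @abstractmethod
    def get_extractor(self, work_unit: WorkUnit) -> Extractor:
        """Agent role: each protocol subclass initializes its own extractor."""


class HttpSourceSketch(MultistageSourceSketch):
    # Hypothetical job keys an HTTP extractor would need.
    KEY_NAMES = ("endpoint", "method")

    def __init__(self, config: Dict[str, str]):
        # The subclass keeps only the job keys relevant to its protocol.
        self.job_keys = {k: config[k] for k in self.KEY_NAMES if k in config}

    def get_extractor(self, work_unit: WorkUnit) -> Extractor:
        # Protocol-specific initialization: attach an HTTP-style connection.
        return Extractor(work_unit, Connection("http", self.job_keys))


class JdbcSourceSketch(MultistageSourceSketch):
    # Hypothetical job keys a JDBC extractor would need.
    KEY_NAMES = ("jdbc_url", "query")

    def __init__(self, config: Dict[str, str]):
        self.job_keys = {k: config[k] for k in self.KEY_NAMES if k in config}

    def get_extractor(self, work_unit: WorkUnit) -> Extractor:
        # Protocol-specific initialization: attach a JDBC-style connection.
        return Extractor(work_unit, Connection("jdbc", self.job_keys))
```

Adding a new protocol in this model means writing one subclass with its own job keys and `get_extractor`; the planning logic in the base class is reused unchanged.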