
ETL class #187

Closed
aellenhicks opened this issue Aug 16, 2016 · 13 comments

@aellenhicks

We are requesting a class for the extract, transform, and load (ETL) process that has data items (or, more generally, ICEs) as specified input and data items (or, more generally, ICEs) as specified output.

@aellenhicks
Author

Following up on this class request. We need this for a project. Is it possible to get this in the next two weeks?

@alanruttenberg
Contributor

Sorry for the delay in responding. I would be comfortable having you or a designate be a committer on the project, so that processes such as ETLs could be added directly by you. Would this lead to a better situation for you?

mcourtot added the NTR label Nov 11, 2016
@mcourtot
Contributor

Hi @aellenhicks: could you please provide a label, definition, and position in the hierarchy? If you have a reference for the class (PMID or otherwise) and an example of usage, that would be ideal.

@alanruttenberg
Contributor

Proposed: A planned process which takes as input a database and fills another
database by extracting concretizations of information entities from
the first, transforming them, and loading the transformed
concretizations into the second.

Has specified input information content entity
Has specified output information content entity

Editor note: We don't currently define database in IAO, as the bare
term is ambiguous. Reasonable interpretations of the word might be a
material entity, an information structure, or an information content
entity. However, this definition commits, at least, to there being some
material thing which bears concretizations of information entities and
to new concretizations being created during the process. We
consider the ETL process in terms of information entities rather than
the concretizations. No commitment is made as to whether the
specified output information content entities are the same as or
different from the originals; both scenarios are plausible.

@aellenhicks
Author

Concretizations are specifically dependent continuants, so this raises the question of how they can be extracted from one database and loaded into another. The editor note specifies that the loaded concretization is a new (and hence different) thing, but the definition does not seem to convey that. Is it really the concretization that is extracted?

I like the fact that the output can be identical to the input.

@alanruttenberg
Contributor

The editor note is just that. We haven't yet talked about processes in which the concretizations are inputs or outputs, but clearly they are involved in the process. Extracting and loading are like "copying", in the sense that copying is a process by which a new concretization is made. The struggle is to use the familiar words "extract", "transform", and "load" and mesh them with IAO, particularly as IAO isn't fully developed. Do you have a proposal for an alternative wording, or do you think there's a better account?

@aellenhicks
Author

It seems to me that copying or extracting also involves the generically dependent continuant, or ICE. So perhaps transforming is an algorithmic way of producing a new ICE (and hence a new SDC, i.e. specifically dependent continuant, that concretizes it) from another ICE. Does this analysis sound right to you?

@alanruttenberg
Copy link
Contributor

Only some transformations create new ICEs, I think. For example, suppose you are correcting a misspelling during the transform (not uncommon); in that case I would think that both concretizations are of the same ICE.

Would this do better?

A planned process which takes as input a database, copies concretizations from it, optionally transforms them, and then copies the result to a second database.

@alanruttenberg
Contributor

However it should be noted that lack of a clear identity criterion for information content entities is an outstanding problem for IAO. For the most part we have been loose about what counts as a concretization.

@alanruttenberg
Contributor

btw, I'm proposing the editor preferred term "database extract, transform, and load process" and alternative term "ETL", ok?

alanruttenberg added a commit that referenced this issue Dec 21, 2016
@aellenhicks
Author

aellenhicks commented Dec 22, 2016 via email

@alanruttenberg
Contributor

What I mean by transformed is not that the concretization is changed, but rather that there is a process that takes one concretization as input and creates another as output. The newly created concretization might be of the same information artifact, or might be the seed of a new one. Take an analogy to copying a piece of writing by hand. You look at what you are copying and then write something else. Depending on what you are doing, the something else might be another concretization of the original ICE, or not. For example, if you copy from cursive to block lettering, you are creating a new concretization of the same ICE. On the other hand, if you copy most of the writing but substitute fictional names for real names in the original writing, you are creating a concretization of a new ICE (perhaps originally concretized in your head).

Summary: Transformation in the sense I intend is not changing something. It is a process in which something new is created that is constructed, at least in part, by using the input as a template.
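To make the "extract, transform, load" reading concrete, here is a minimal sketch using Python's standard `sqlite3` module with two in-memory databases. The table `notes` and the functions `extract`, `fix_spelling`, `pseudonymize`, `load`, and `etl` are hypothetical illustrations, not part of IAO. The code only shows the mechanics: the source rows are read but never modified, and the target rows are newly written using the source as a template. Whether an output row concretizes the same ICE or a new one is the ontological judgment discussed above; the comments just record which reading the examples in this thread would suggest.

```python
import sqlite3

# Hypothetical schema: a single table of free-text notes.
SCHEMA = "CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)"

def extract(src):
    """Extract: read rows out of the source database (source is not changed)."""
    return src.execute("SELECT id, text FROM notes ORDER BY id").fetchall()

def fix_spelling(text):
    # Hypothetical misspelling correction; per the thread, the result would
    # be read as a new concretization of the *same* ICE.
    return text.replace("recieved", "received")

def pseudonymize(text):
    # Hypothetical substitution of a fictional name for a real one; per the
    # thread, the result would concretize a *new* ICE.
    return text.replace("Alice", "Patient A")

def load(dst, rows):
    """Load: write newly created rows into the target database."""
    dst.executemany("INSERT INTO notes (id, text) VALUES (?, ?)", rows)
    dst.commit()

def etl(src, dst, transform=lambda t: t):
    # Transform builds new values using the extracted ones as a template;
    # the default transform is the identity (a plain copy).
    rows = [(row_id, transform(text)) for row_id, text in extract(src)]
    load(dst, rows)
    return rows

# Usage: populate a source database, then run ETL into an empty target.
src = sqlite3.connect(":memory:")
src.execute(SCHEMA)
src.execute("INSERT INTO notes (text) VALUES (?)", ("Alice recieved the dose",))
src.commit()

dst = sqlite3.connect(":memory:")
dst.execute(SCHEMA)
etl(src, dst, lambda t: pseudonymize(fix_spelling(t)))

print(dst.execute("SELECT text FROM notes").fetchall())
# [('Patient A received the dose',)]
```

The identity default for `transform` mirrors the "optionally transforms" wording: with no transform, the process is a plain copy into the second database.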

@alanruttenberg
Contributor

This was added in the last release.


3 participants