New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15096: Cannot reuse association database with ApPipeTask reruns #29
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the documentation updates. With a couple clarifications I think it will be good. Thank you for keeping the docs in sync with the code changes!
@@ -64,7 +64,7 @@ To process your ingested data, run | |||
|
|||
.. prompt:: bash | |||
|
|||
ap_pipe.py repo --calib repo/calibs --rerun processed --id visit=123456 ccdnum=42 filter=g --template templates | |||
ap_pipe.py repo --calib repo/calibs --rerun processed -c associator.level1_db.db_name=association.db --id visit=123456 ccdnum=42 filter=g --template templates | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @morriscb said over on Jira that the config field should be db_url
, not db_name
. Unless I'm misinterpreting, please update this accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the config field is db_name
. It we for some reason had to configure l1dbproto
then that would be relevant, but AFAIK we don't. (I can't test it because AssociationDBSqliteTask
and AssociationL1DBProtoTask
have incompatible APIs, and ap_pipe
assumes the former.)
.. note:: | ||
|
||
You must :ref:`configure <command-line-task-config-howto>` the association database location. | ||
In the examples above, this is done with the ``-c`` option, but a personal config file may be more convenient if you intend to run ``ap_pipe`` many times. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the user fails to configure the database location and tries to run ap_pipe anyway? Please add a sentence about what will happen in this situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps more importantly, it's not clear where (path?) this Must-Be-Named database will be created (or is expected to already exist). Please clarify this and make sure it is consistent with the pseudo file-tree in the Expected Outputs section. Can a path be given for the db_name
/db_url
or just a name?
You say the database name must be specified, but please explain what it does with the name. Something like, "ap_pipe will look for a database that already exists at the location specified. If it finds one, it will connect to it and continue associating, and if it doesn't find one, it will create one."
@@ -104,11 +108,11 @@ something like | |||
|
|||
.. code-block:: none | |||
|
|||
association.db <--- the Prompt Products Database with DIAObjects | |||
repo/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I take away from this is that the database with whatever name the user chooses will be created in the same directory which contains repo
, assuming there is not already a file there with that name. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's created relative to the working directory. Is there some obvious other place to put it, given that spelling out a full rerun path kind of defeats the point of reruns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I think I tend to assume that "repo" is the working directory even though that's not a requirement and the examples do spell out repo
rather than .
as the input.
Please replace "It should look something like" with the following, or something close to it:
"At runtime, if you do not provide a path to an existing Prompt Products Database, a new database with the name you provide will be created (in your working directory or at the path you specify). The result from running ap_pipe
should look something like"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sentence sounds too much like saying there's a default location. How about changing the example to -c associator.level1_db.db_name=ppdb/association.db
and modifying the "after" directory layout to match? Will that make things unambiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds good. I understand you want to avoid hinting at any kind of default. Thanks!
4cdf04a
to
450df32
Compare
450df32
to
e28c1e7
Compare
Because ApPipeTask can't work with in-memory databases, users must provide a location themselves. The documentation has been updated accordingly.
e28c1e7
to
285599e
Compare
This PR removes the database-in-repository model from
ap_pipe
as well as all the code that was necessary to enforce it. SQLite configs are no longer allowed to specify:memory:
, as testing revealed that in-memory databases are incompatible with the requirement that the database be fully initialized beforeAssociationTask
is created. The documentation has been updated to emphasize that the user must always choose a database location in addition to the usual output repository.