Skip to content

Conversation

@allieychen
Copy link
Contributor

With the following change, the user is able to use docker run to run VT:

  • Add a script to use pipelines to run variant transforms.
  • Add a new docker file to have the above script as the entry point.

Tested: integration tests & manually tested on the new image.
Todo: add tests and update the doc

@coveralls
Copy link

coveralls commented Feb 8, 2019

Pull Request Test Coverage Report for Build 1637

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 88.937%

Totals Coverage Status
Change from base Build 1635: 0.0%
Covered Lines: 7042
Relevant Lines: 7918

💛 - Coveralls

# See the License for the specific language governing permissions and
# limitations under the License.

readonly google_cloud_project="${PROJECT}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GOOGLE_CLOUD_PROJECT variable should be used for this: it's already automatically set in some environments like Cloud Shell

However, we shouldn't force the user to set it anyway if we can avoid it. Given that you are mapping in $HOME/.config, the current project should be recoverable from gcloud

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Aaron! Updated. PTAL :)

@allieychen allieychen force-pushed the docker branch 5 times, most recently from f5bb3e9 to 406fbdb Compare February 13, 2019 19:05
# See the License for the specific language governing permissions and
# limitations under the License.

FROM google/cloud-sdk

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need cloud-sdk? Usually cloud-sdk:slim is sufficient...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


readonly google_cloud_project="${GOOGLE_CLOUD_PROJECT:-$(gcloud config get-value project)}"
readonly temp_location="${TEMP_LOCATION}"
readonly service_account_scopes="${SERVICE_ACCOUNT_SCOPE:-https://www.googleapis.com/auth/cloud-platform}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Isn't cloud-platform always sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cloud-platform is sufficient. I don't expect many users would change this value, but I don't know whether there are cases that they want to limit their scope.

What I am trying to do is to provide flexibility, and the user can still have the options to change all pipelines' flags (see pipelines_flag below). I am debating with myself whether I should do this, since on one hand I don't want to expose the details of pipelines tool, but on the other hand we should allow the user to customize it.

# limitations under the License.

readonly google_cloud_project="${GOOGLE_CLOUD_PROJECT:-$(gcloud config get-value project)}"
readonly temp_location="${TEMP_LOCATION}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do users typically want to inspect the temporary files? Ideally we would just build a bucket, set a reasonable TTL and delete it when we are done.

This isn't something we should do in this change but would be good to think about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a good point and removing it can simplify the running of our tool. I have created an issue (issue 442) for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the temp_location provided here is for pipelines logging. And the user doesn't need to provide one.

Copy link
Contributor Author

@allieychen allieychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Aaron! I have updated the base image. PTAL and let me know your opinions on my other comments. :)


readonly google_cloud_project="${GOOGLE_CLOUD_PROJECT:-$(gcloud config get-value project)}"
readonly temp_location="${TEMP_LOCATION}"
readonly service_account_scopes="${SERVICE_ACCOUNT_SCOPE:-https://www.googleapis.com/auth/cloud-platform}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cloud-platform is sufficient. I don't expect many users would change this value, but I don't know whether there are cases that they want to limit their scope.

What I am trying to do is to provide flexibility, and the user can still have the options to change all pipelines' flags (see pipelines_flag below). I am debating with myself whether I should do this, since on one hand I don't want to expose the details of pipelines tool, but on the other hand we should allow the user to customize it.

# limitations under the License.

readonly google_cloud_project="${GOOGLE_CLOUD_PROJECT:-$(gcloud config get-value project)}"
readonly temp_location="${TEMP_LOCATION}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a good point and removing it can simplify the running of our tool. I have created an issue (issue 442) for it.

# See the License for the specific language governing permissions and
# limitations under the License.

FROM google/cloud-sdk
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

# limitations under the License.

readonly google_cloud_project="${GOOGLE_CLOUD_PROJECT:-$(gcloud config get-value project)}"
readonly temp_location="${TEMP_LOCATION}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the temp_location provided here is for pipelines logging. And the user doesn't need to provide one.

@allieychen allieychen force-pushed the docker branch 6 times, most recently from 0273d44 to f151b68 Compare March 5, 2019 20:06
@allieychen
Copy link
Contributor Author

Thanks Aaron! Based on our off-line discussion, I have modified the runner to take flags rather than environment variables. One sample run command:
docker run -v ~/.config:/root/.config NEW_DOCKER_IMAGE --command "vcf_to_bq --input_pattern INPUT_PATTERN --output_table OUTPUT_TABLE --temp_location TEMP_LOCATION".
PTAL :)

function parse_args {
while [[ "$#" -gt 0 ]]; do
case "$1" in
${PROJECT_OPT})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just inline these flags:
"--project" etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

zones="$2"
;;

${COMMAND_OPT})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to make this a positional argument instead

docker run -v ~/.config:/root/.config ... vcf_to_bq --input_pattern INPUT_PATTERN --output_table OUTPUT_TABLE --temp_location TEMP_LOCATION

Optional, though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@allieychen allieychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Aaron! Now we can run it as
docker run -v ~/.config:/root/.config NEW_DOCKER_IMAGE "vcf_to_bq --input_pattern INPUT_PATTERN --output_table OUTPUT_TABLE --temp_location TEMP_LOCATION".
PTAL :)


function main {
options=`getopt -o '' -l project:,temp_location:,docker_image:,zones: -- "$@"`
parse_args "$@"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you pass $options not $@ here?

Copy link
Contributor Author

@allieychen allieychen Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can. getopt inserts -- between the long arguments and the command (i.e.,
vcf_to_bq --input_pattern ...), which I find hard to understand, so I used "$@" instead (I should remove options= though). Why do u prefer $options?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - as discussed offline, let's just add a comment that we are using getopt only for checking (and drop options=)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to put this inside parse_args since that function makes assumptions about the state of the arguments (ie, if the getopt line is removed, parse_args will fail).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@allieychen allieychen merged commit 7a2309a into googlegenomics:master Mar 14, 2019
@allieychen allieychen deleted the docker branch March 14, 2019 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants