Skip to content

Conversation

chenmoneygithub
Copy link
Contributor

  1. Add cloud training support for BERT example, including cloud logging and GCS I/O.
  2. Add tensorboard logging to BERT example.
  3. Replace the hard-coded TransformerEncoder with keras_nlp.TransformerEncoder.

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. I think overall, we need to support a few more things here.

  • It should be possible to mix and match gcs buckets.
  • It should be possible to run locally from data on a public gcs bucket, with local logging and saving.

I think the simplest way to do this would be to try to make our inputs behave like the gfile API. Where anything with a gs:// prefix is handled correctly. I think we can for the most part do that by just calling gfile routines, instead of forking our code based on google cloud storage or no.

"Skip restoring from checkpoint if True",
)

flags.DEFINE_bool(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the full invocation end up looking like? Could we make this more simple where all of these paths could be gs:// paths? Similar to how gfile handles this? So something like

bert_train.py \
  --input_files=gs://foo-bucket/foo-bert-data \
  --checkpoint_save_directory=gs://bar-bucket/training-01/pretrained-checkpoints \
  --saved_model_output=gs://bar-bucket/training-01/pretrained-model \
  ...

I don't think we should demand that logging and source data are coming from the same bucket. And overall I think a flow like this would be simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!

import shutil
import sys

import google.cloud.logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this import break the current quick start unless we are pip installing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!

def get_checkpoint_callback():
if FLAGS.checkpoint_save_directory is not None:
if FLAGS.use_cloud_storage:
storage_client = storage.Client()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use GFile as a consistent interface? E.g. https://www.tensorflow.org/api_docs/python/tf/io/gfile/rmtree

GFile says it should support gs:// prefix for the entire API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!

@mattdangerw
Copy link
Member

Quick notes from meeting

  • delete the util file, now unused.
  • update the input_files -> input_directory arg for all the other scripts (split sentences and preprocess).
  • update the readme quickstart

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to me.

Let's make sure we test the simple local invocation in the README before we merge, to make sure we didn't break anythin.

@chenmoneygithub
Copy link
Contributor Author

Checked! The quick start still works!

@chenmoneygithub chenmoneygithub merged commit 837c8f4 into keras-team:master Jun 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants