Can we run this in Google Colab? #11

Closed
logfella opened this issue Nov 13, 2019 · 3 comments

@logfella

Hey, I would love to try this out, but I'm not very proficient with ML.

Is it possible to run the biggest T5 model in a Google Colab notebook? Did anyone set one up? Thanks!

@adarob
Collaborator

adarob commented Nov 13, 2019

It is possible to do inference on the 11B model in a Colab and to train/fine-tune the 3B param model using the free TPU. We will be releasing a notebook in the near future.

If you launch a larger TPU in Cloud, you can connect to it and train the 11B model via Colab.
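For what it's worth, a rough sketch of what that connection can look like from a notebook cell (the TPU name, zone, and project below are placeholders, and this is just one way to do it rather than necessarily what the upcoming notebook will use):

```python
import os
import tensorflow as tf  # TF 1.15-era API

# Free Colab TPU: the TPU runtime exposes its address via an environment variable.
if "COLAB_TPU_ADDR" in os.environ:
    tpu_address = "grpc://" + os.environ["COLAB_TPU_ADDR"]
else:
    # Cloud TPU created with `ctpu up`: resolve it by name instead.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        tpu="my-tpu-name", zone="us-central1-b", project="my-project")
    tpu_address = resolver.get_master()

print("TPU address:", tpu_address)
# This grpc://... address is what the fine-tuning job is pointed at,
# e.g. via the `tpu` argument of the t5 model wrapper.
```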

@anatoly-khomenko

Hi @adarob,

I was trying to fine-tune the 11B model on Google Cloud and got an out-of-memory error on the TPU.
Is there anything I should change in the parameters?

Here is how I run the fine-tuning:

```sh
export PROJECT=projectname
export ZONE=us-central1-b
export BUCKET=gs://uniquebucketname
export TPU_NAME=t5-ex2
export DATA_DIR="${BUCKET}/t5-boolq-data-dir"
export MODEL_DIR="${BUCKET}/t5_boolq-small-model_dir"

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE \
  --tpu-size=v3-8 --tpu-only --tf-version=1.15.dev20190821

t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --t5_tfds_data_dir="${DATA_DIR}" \
  --gin_file="dataset.gin" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '2x2'" \
  --gin_param="MIXTURE_NAME = 'super_glue_boolq_v102'" \
  --gin_file="gs://t5-data/pretrained_models/11B/operative_config.gin"
```

The complete stack trace is attached:

T5-11B-TPU-stack-trace.txt
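
One guess, not verified here: with `model_parallelism = 1`, each of the v3-8's eight cores has to hold a full copy of the 11B-parameter weights, which is far more than the 16 GB of HBM per core. Splitting the model across the cores is the usual counter-measure; a sketch using the Python wrapper follows (step counts are placeholders, and even with model parallelism a v3-8 may simply be too small to train the 11B checkpoint, which is presumably why a larger Cloud TPU slice was suggested above).

```python
# Sketch only: the same BoolQ fine-tune, with the model split across all eight
# cores of the v3-8 instead of replicated on each one. Whether the 11B model
# fits for training on a v3-8 even then is not something verified here.
import tensorflow as tf
import t5

# Resolve the grpc address of the existing "t5-ex2" node.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu="t5-ex2", zone="us-central1-b", project="projectname")
tpu_address = resolver.get_master()

model = t5.models.MtfModel(
    model_dir="gs://uniquebucketname/t5_boolq-11B-model_dir",
    tpu=tpu_address,
    tpu_topology="2x2",           # v3-8
    model_parallelism=8,          # split the weights across the 8 cores
    batch_size=8,
    sequence_length={"inputs": 512, "targets": 512},
)

model.finetune(
    mixture_or_task_name="super_glue_boolq_v102",
    pretrained_model_dir="gs://t5-data/pretrained_models/11B",
    finetune_steps=25000,         # placeholder step count
)
```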

@alespeggio

> It is possible to do inference on the 11B model in a Colab and to train/fine-tune the 3B param model using the free TPU. We will be releasing a notebook in the near future.
>
> If you launch a larger TPU in Cloud, you can connect to it and train the 11B model via Colab.

Hi @adarob,

I'm trying to fine-tune the pre-trained T5-small model (60 million parameters) on a custom dataset in Google Colab (with the free TPU). However, even with an extremely small dataset, the notebook runs out of RAM (35 GB on Google Colab). You said that it is possible to fine-tune the 3B-parameter model using the free TPU, so I'm wondering if I'm doing something wrong.

The commands I run are listed below.
I install the package with:

```sh
pip install t5[gcp]
```

and then execute this command to fine-tune the model on my dataset:

```sh
t5_mesh_transformer \
  --model_dir="/content/small" \
  --gin_file="dataset.gin" \
  --gin_file="/content/small/operative_config.gin" \
  --gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn" \
  --gin_param="tsv_dataset_fn.filename = 'custom_dataset.tsv'" \
  --gin_file="learning_rate_schedules/constant_0_001.gin" \
  --gin_param="run.train_steps = 1010000"
```
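
Two things stand out, though both are assumptions rather than a confirmed diagnosis: the command above never passes a `--tpu` address, so the model is presumably being built on the Colab host VM itself (whose RAM is what fills up) rather than on the TPU, and the model directory points at `/content`, while TPU jobs generally need everything on GCS. A rough sketch of driving the same fine-tune from a notebook cell with the TPU attached explicitly and the checkpoints on GCS, assuming the `t5.models.MtfModel` wrapper and a Task already registered over the TSV data (every name, path, and size below is a placeholder):

```python
import os
import t5

# The Colab TPU runtime exposes the accelerator's address; without it the
# job runs on the host VM and uses its RAM instead of the TPU's memory.
TPU_ADDRESS = "grpc://" + os.environ["COLAB_TPU_ADDR"]

model = t5.models.MtfModel(
    model_dir="gs://my-bucket/t5-small-finetune",  # GCS rather than /content: the TPU cannot read the local disk
    tpu=TPU_ADDRESS,
    tpu_topology="2x2",
    model_parallelism=1,
    batch_size=64,
    sequence_length={"inputs": 128, "targets": 32},
    learning_rate_schedule=0.001,                  # matches constant_0_001.gin above
)

model.finetune(
    mixture_or_task_name="my_tsv_task",            # hypothetical Task registered over custom_dataset.tsv
    pretrained_model_dir="gs://t5-data/pretrained_models/small",
    finetune_steps=10000,                          # relative to the checkpoint; run.train_steps above is the absolute total
)
```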

Have others encountered the same issue?

Thank you!
