
[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models #89

Open
LysandreJik opened this issue Nov 8, 2019 · 5 comments

@LysandreJik

Hi, this issue concerns ALBERT, specifically the V2 models, and in particular the xlarge version 2.

TL;DR: the ALBERT-xlarge V2 model seems to behave differently from all the other V1/V2 models.

The models are accessible through the HUB; in order to inspect them, I save checkpoints which I then load into the AlbertModel available in modeling.py. I use this script to save the checkpoint to a file.
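
For reference, the export step looks roughly like this (a minimal sketch; the module URL and output path are illustrative, not the exact script):

import tensorflow as tf
import tensorflow_hub as hub

# Instantiate the HUB module so its variables (named "module/...") exist
# in the graph, then dump them to a regular TF1 checkpoint.
albert = hub.Module("https://tfhub.dev/google/albert_xlarge/2")

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    tf.train.Saver().save(sess, "/tmp/albert_xlarge_v2/model.ckpt")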

In a different script, I load the checkpoint into a model from modeling.py (a line has to be added so that the modeling scope begins with module, the same as the HUB module); I load the checkpoint in this script. In that same script I also load a HUB module, and I compare the outputs of both models given the same input values.
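
The HUB side of the comparison boils down to something like this (a sketch; the module URL and token ids are placeholders):

import tensorflow as tf
import tensorflow_hub as hub

# Placeholder token ids; the real script feeds SentencePiece-tokenized text.
input_ids = tf.constant([[2, 10975, 25, 3]])
input_mask = tf.ones_like(input_ids)
segment_ids = tf.zeros_like(input_ids)

albert = hub.Module("https://tfhub.dev/google/albert_xlarge/2")
outputs = albert(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens", as_dict=True)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    hub_pooled, hub_sequence = sess.run(
        [outputs["pooled_output"], outputs["sequence_output"]])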

For every model, I check that the difference is close to zero by taking the maximum absolute difference between tensor values (the comparison is included at the bottom of the second script; a sketch follows the results below). Here are the results:

ALBERT-BASE-V1 max difference: pooled 8.009374e-06, full transformer 2.3543835e-06
ALBERT-LARGE-V1 max difference: pooled 2.5719404e-05, full transformer 1.8417835e-05
ALBERT-XLARGE-V1 max difference: pooled 0.0006218478, full transformer 0.0
ALBERT-XXLARGE-V1 max difference: pooled 0.0, full transformer 1.0311604e-05

ALBERT-BASE-V2 max difference: pooled 2.3335218e-05, full transformer 4.9591064e-05
ALBERT-LARGE-V2 max difference: pooled 0.00015488267, full transformer 0.00010347366
ALBERT-XLARGE-V2 max difference: pooled 1.9535216, full transformer 5.152705
ALBERT-XXLARGE-V2 max difference: pooled 1.7762184e-05, full transformer 2.592802e-06
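
The comparison itself is just the maximum elementwise absolute difference, roughly (a sketch; the argument names are placeholders):

import numpy as np

def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two output arrays."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))

# e.g. max_abs_diff(hub_pooled, ckpt_pooled)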

Is there an issue with this model in particular, or does it have an architectural change that sets it apart from the others?
I have had no problems replicating the SQuAD results on all of the V1 models, but I could not do so on the V2 models apart from the base one. Is this related? Thank you for your time.

@insop

insop commented Nov 25, 2019

(a line has to be added so that the modeling scope begins with module, the same as the HUB module)

Hi @LysandreJik

I have tried to add the module scope, but I couldn't get it working with a single line.
Below is how I got the module scope done and got your compare_albert.py working; it doesn't look good, but it works.

Could you tell me how you got the module scope done?

Thank you,

$ diff -uN a.py b.py
--- a.py        2019-11-25 01:07:50.000000000 -0800
+++ b.py        2019-11-25 01:08:23.000000000 -0800
@@ -1,4 +1,4 @@
-def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0):
+def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0, add_scope='module'):
   """Compute the union of the current variables and checkpoint variables."""
   assignment_map = {}
   initialized_variable_names = {}
@@ -8,8 +8,15 @@
     name = var.name
     m = re.match("^(.*):\\d+$", name)
     if m is not None:
-      name = m.group(1)
-    name_to_variable[name] = var
+      # add 'module' scope name to match tf hub module
+      if add_scope is not None:
+          name = 'module/' + m.group(1)
+      else:
+          name = m.group(1)
+      # NOTE: store name as value for scope matching
+      # since 'var' value was not used
+      name_to_variable[name] = m.group(1)
+  
   init_vars = tf.train.list_variables(init_checkpoint)
   init_vars_name = [name for (name, _) in init_vars]
 
@@ -20,7 +27,7 @@
   else:
     assignment_map = collections.OrderedDict()
 
-  for name in name_to_variable:
+  for name, old_name in name_to_variable.items():
     if name in init_vars_name:
       tvar_name = name
     elif (re.sub(r"/group_\d+/", "/group_0/",
@@ -50,7 +57,12 @@
       if not group_matched:
         assignment_map[0][tvar_name] = name
     else:
-      assignment_map[tvar_name] = name
+      if add_scope is not None:
+        # add 'module' scope name to match tf hub module
+        # <'module/'+ xxx, xxx>
+        assignment_map[tvar_name] = old_name
+      else:
+        assignment_map[tvar_name] = name
     initialized_variable_names[name] = 1
     initialized_variable_names[six.ensure_str(name) + ":0"] = 1
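
For context, the map produced by the patched function is consumed in the usual TF1 way (a sketch; init_checkpoint is assumed to point at the checkpoint exported from the HUB module):

# With the patch, assignment_map maps checkpoint names ("module/...")
# to the corresponding graph variable names (without the prefix).
tvars = tf.trainable_variables()
assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(
    tvars, init_checkpoint, add_scope='module')
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)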

@LysandreJik
Author

Hi @insop, to add the module scope, I added the following line at line 194 of modeling.py:

with tf.variable_scope("module"):

This results in the __init__ method of AlbertModel beginning with these few lines:

[...]
    config = copy.deepcopy(config)
    if not is_training:
      config.hidden_dropout_prob = 0.0
      config.attention_probs_dropout_prob = 0.0

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]

    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

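    # added line: opens the extra "module/" scope so variable names match the HUB module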
    with tf.variable_scope("module"):
      with tf.variable_scope(scope, default_name="bert"):
        with tf.variable_scope("embeddings"):
          # Perform embedding lookup on the word ids.
          (self.word_embedding_output,
[...]

@insop

insop commented Nov 26, 2019

Hi @LysandreJik
Thanks a lot, it works like a charm!

@LysandreJik
Author

Great to hear, please let me know if you manage to convert the v2 models/reproduce the results!

@insop

insop commented Nov 27, 2019

Hi @LysandreJik

I have run your script (compare_albert.py, with a different input_string; see below) on the V2 models.
My run of the large model shows a bigger difference, though not as large as your numbers for xlarge.
For xlarge, the difference seems okay.

I have a question about running SQuAD, but I will post it in the other open issue where I saw you.

I thought I saw you on another post, but I was mistaken.
Were you able to run run_squad_sp.py?
(To avoid digressing here, I could find another way to communicate in case you were able to run run_squad_sp.py without any issue.)

Thank you,


$ python -c 'import tensorflow as tf; print(tf.__version__)'
1.15.0

# one change I've made is this
# Create inputs
#input_sentence = "this is nice".lower()
input_sentence = "The most difficult thing is the decision to act, the rest is merely tenacity. The fears are paper tigers. You can do anything you decide to do. You can act to change and control your life; and the procedure, the process is its own reward.".lower()


model: base

Comparing the HUB and TF1 layers
-- pooled            1.5154481e-05
-- full transformer  3.1471252e-05


model: large

Comparing the HUB and TF1 layers
-- pooled            0.014360733
-- full transformer  0.014184952


model: xlarge

Comparing the HUB and TF1 layers
-- pooled            1.6540289e-06
-- full transformer  4.9889088e-05

model: xxlarge

Comparing the HUB and TF1 layers
-- pooled            2.5779009e-05
-- full transformer  1.8566847e-05

@andrewluchen transferred this issue from google-research/google-research Jan 6, 2020