
how does one get attention values out of the predictions? #143

Open
silky opened this issue Dec 21, 2018 · 3 comments

Comments

@silky
Contributor

commented Dec 21, 2018

thanks for this library! i've managed to train a simple model and i'm excited to do more.

the config i'm using is:

(pytext) 12:04 PM noon ∈ pytext (master*) ♫ cat sst2.json
{
  "config": {
    "task": {
      "DocClassificationTask": {
        "model": {
          "representation": {
            "BiLSTMDocAttention": {}
          }
        },
        "data_handler": {
          "columns_to_read": [
            "doc_label",
            "text"
          ],
          "train_path": "data/train.tsv",
          "eval_path": "data/val.tsv",
          "test_path": "data/test.tsv"
        },
        "trainer": {
          "epochs": 150
        }
      }
    }
  }
}

and following the tutorial i can train and get predictions. awesome!

but how do i find the attention?

it doesn't appear to be anywhere in the results of pytext.create_predictor ... ?

thanks for any help!

@silky

Contributor Author

commented Dec 21, 2018

i've now gotten to the point that i've looked at the shapes of all the numpy arrays in workspace.blobs. there's one, with the informative name "107", that has the right dimension, but the values don't look right. nothing else has the right size.
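The shape-hunting step described above can be sketched roughly like this (a minimal, self-contained sketch: `blobs` here is a toy dict standing in for caffe2's `workspace.blobs`, and the attention shape `(1, 12)` is an assumed batch-of-one, 12-token example):

```python
import numpy as np

def find_blobs_by_shape(blobs, expected_shape):
    """Return names of workspace blobs whose array shape matches expected_shape.

    `blobs` is a mapping from blob name to numpy array (like caffe2's
    workspace.blobs); per-token attention weights for a batch of one
    12-token sentence would have shape (1, 12), for example.
    """
    return sorted(name for name, arr in blobs.items()
                  if isinstance(arr, np.ndarray) and arr.shape == expected_shape)

# toy stand-in for workspace.blobs
blobs = {
    "107": np.random.rand(1, 12),   # candidate attention weights
    "42": np.random.rand(1, 300),   # e.g. an embedding row
}
print(find_blobs_by_shape(blobs, (1, 12)))  # → ['107']
```

As the thread shows, shape alone is ambiguous: several intermediate tensors can share the attention shape, which is why inspecting the graph (below in the thread) is the more reliable route.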

am i missing something here?

near as i can tell, attention is turned on in my model...

@bethebunny

Contributor

commented Dec 21, 2018

Hey, we've only recently gotten this as a feature request, so we haven't done anything explicit to export the attention weights. You're on the right track: looking in the caffe2 workspace blobs is the quickest way to get this working without code changes. I can't give you a great answer about where they'll be in the workspace blobs, though, as I haven't had time to look into it yet.

If you do something like this: https://github.com/facebookresearch/pytext/blob/master/pytext/__init__.py#L80
or the similar "load_from_db" function in caffe2, you can print out the caffe2 graph structure. That will give you a better idea of which blobs are the outputs of the attention operators, although reading these structures is a bit obtuse.
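Scanning the graph structure along those lines might look like the sketch below. The `ops` list here is a hand-written toy fragment, not a real exported graph; with an actual caffe2 `NetDef` the pairs would be built from its `op` field, as noted in the comment:

```python
def summarize_ops(ops):
    """Render one line per operator as 'op_type -> outputs' so that
    attention-related outputs are easy to spot when scanning the graph."""
    return ["{} -> {}".format(op_type, ", ".join(outputs))
            for op_type, outputs in ops]

# With a real caffe2 NetDef the pairs would come from something like
#   ops = [(op.type, list(op.output)) for op in predict_net.op]
# (field names per the caffe2 protobuf). Toy fragment standing in for
# part of an exported BiLSTMDocAttention graph:
ops = [
    ("Softmax", ["107"]),       # attention weights typically exit a Softmax
    ("BatchMatMul", ["110"]),   # weighted sum over the LSTM states
]
for line in summarize_ops(ops):
    print(line)
```

Operator types like `Softmax` feeding a reduction over the sequence dimension are the usual signature of an attention layer, which narrows down which anonymous numeric blob names to fetch.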

It's also possible that your model isn't learning anything particularly explainable from the attention inputs, which would explain why the values look confusing; I'm not the best person to say how likely that is, but I've heard claims both ways.

@silky

Contributor Author

commented Dec 22, 2018

thanks for that tip @bethebunny; using the graph i was able to figure out which tensor it was

(image: caffe2 graph visualization)

(i picked node 110).

i'm happy with this hack for now, but also happy to be directed on how to add this into the codebase via a pr so that it's much nicer in the future :)
