
Add text embedding serving #214

Merged (17 commits into elixir-nx:main) on Jun 2, 2023

Conversation

@coderrg (Contributor) commented on May 26, 2023

Resolves #206

@jonatanklosko (Member)

@coderrg awesome thanks, just one nitpick from me!

@seanmor5 you've been using embeddings, so please check if this matches what you'd expect :D

@seanmor5 (Contributor)

The only thing I would add is an option to include a function that transforms the output attribute as part of the serving. I'm not sure HF has this, but it seems common with sentence transformer embeddings.
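
A rough sketch of what that could look like (the embedding_function option name here is hypothetical, not an existing option):

    # Hypothetical: pass a transform to run on the extracted embedding,
    # e.g. L2 normalization as commonly done for sentence transformers.
    serving =
      Bumblebee.Text.text_embedding(model_info, tokenizer,
        embedding_function: fn embedding ->
          Nx.divide(embedding, Nx.LinAlg.norm(embedding, axes: [-1], keep_axes: true))
        end
      )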

@rajrajhans (Contributor)

Thanks for the great work!

A beginner question about this, if someone could help.

This line, encoder.(params, input)[output_attribute], always extracts output_attribute from the result of encoder.(params, input).

However, if I'm trying to get a CLIP text embedding with textual_projection, then my model would look something like this:

    {:ok, %{model: text_model, params: text_params, spec: text_spec}} =
      Bumblebee.load_model({:hf, clip_model_name},
        module: Bumblebee.Text.ClipText,
        architecture: :base
      )

    my_model =
      text_model
      |> Axon.nx(& &1.pooled_state)
      |> Axon.dense(512, use_bias: false, name: "text_projection")

Here, the output of my_model would be exactly the Nx tensor I want. But since the text embedding serving always tries to extract output_attribute, I have to add another layer that wraps the actual output in a map:

    my_model_2 =
      text_model
      |> Axon.nx(& &1.pooled_state)
      |> Axon.dense(512, use_bias: false, name: "text_projection")
      |> Axon.nx(fn x -> %{my_output_attribute: x} end)

and then pass :my_output_attribute as the value of the :output_attribute option.

1. Is there another way around this? Please correct me if I'm wrong.
2. Maybe making :output_attribute optional here would be a good way out?

@coderrg (Contributor, Author) commented on May 27, 2023

I appreciate the suggestions @jonatanklosko and @seanmor5 — I've implemented those changes! @seanmor5 I know folks will sometimes apply L2 normalization to their embeddings, so I added support for it; let me know if there were other functions you had in mind. And @rajrajhans, thank you for bringing that up. For models like CLIP that have a projection head, it would be helpful to have the option to directly retrieve the model output.

My guess is that for most models, the pooled state (as an attribute of the output) would be used as the embedding, so perhaps it would be best to add a non-default option :none for output_attribute to account for the projection head case (e.g., CLIP). I've gone ahead and done this for now, but I can also make output_attribute optional altogether if that would be preferred. Curious to hear what other folks think.
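
Concretely, with the :none value described above, the projected model from the earlier comment could be served without the wrapping layer. A sketch (reusing my_model, text_params, text_spec, and clip_model_name from that comment):

    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, clip_model_name})

    # With output_attribute: :none the serving uses the model output directly,
    # so no extra Axon.nx wrapping layer is needed.
    serving =
      Bumblebee.Text.text_embedding(
        %{model: my_model, params: text_params, spec: text_spec},
        tokenizer,
        output_attribute: :none
      )

    Nx.Serving.run(serving, "a photo of a cat")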

@trodrigu

This is looking great @coderrg! I was curious about the postprocessing and mean pooling. Should this be a postprocessing function or is this out of scope?

@coderrg (Contributor, Author) commented on May 28, 2023

Thanks @trodrigu — that makes sense. I've added support for mean_pooling as an embedding function based on your example since it seems useful for sentence transformers. Does my implementation line up with what you'd expect?

Also, since mean pooling and other functions applied to output embeddings are not mutually exclusive, I changed embedding_function to embedding_functions so that it takes a list of functions to be applied to the output embedding in order. The supported functions are now :l2_normalization and :mean_pooling.
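
For reference, mean pooling in the sentence-transformers sense is a masked average of the token-level hidden states, so padding tokens don't skew the embedding. A minimal Nx sketch (module and function names here are illustrative, not the code in this PR):

    defmodule Example.Pooling do
      import Nx.Defn

      # hidden_state: {batch, seq_len, hidden}, attention_mask: {batch, seq_len}
      defn mean_pooling(hidden_state, attention_mask) do
        # Expand the mask to {batch, seq_len, 1} so it broadcasts over hidden dims
        mask = Nx.new_axis(attention_mask, -1)

        summed = Nx.sum(hidden_state * mask, axes: [1])
        # Number of unmasked tokens per sequence, clamped to avoid division by zero
        count = Nx.max(Nx.sum(mask, axes: [1]), 1.0e-9)

        summed / count
      end
    end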

@jonatanklosko (Member)

Awesome, thanks for the feedback everyone!

@coderrg I left a couple more comments, but it's looking great :D

@coderrg (Contributor, Author) commented on Jun 1, 2023

Thanks @jonatanklosko, I've implemented your suggestions. I also added nil as the default option for :output_pool and :embedding_processor — please let me know if I misunderstood that suggestion, though.
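
With those options, end-to-end usage of the serving would look roughly like this (a sketch assuming a typical sentence-transformers checkpoint, the :mean_pooling and :l2_norm option values, and an %{embedding: ...} result shape):

    {:ok, model_info} =
      Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

    {:ok, tokenizer} =
      Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})

    # Take the raw hidden state, pool it with a masked mean, then L2-normalize
    serving =
      Bumblebee.Text.text_embedding(model_info, tokenizer,
        output_attribute: :hidden_state,
        output_pool: :mean_pooling,
        embedding_processor: :l2_norm
      )

    %{embedding: embedding} = Nx.Serving.run(serving, "Hello world")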

@jonatanklosko (Member) left a review:

Perfect, a couple of final comments and we should be good to go!

Review threads (all resolved):
- lib/bumblebee/text.ex
- lib/bumblebee/text/text_embedding.ex
- test/bumblebee/text/text_embedding_test.exs

coderrg and others added 7 commits on June 1, 2023
Co-authored-by: Jonatan Kłosko <jonatanklosko@gmail.com>

@jonatanklosko (Member) left a review:

Awesome, thanks!

@jonatanklosko merged commit 23de64b into elixir-nx:main on Jun 2, 2023 (1 check passed).