This is conceptual work in progress. The maintainer is actively researching this; please do not work on it.
Problem Statement
When you submit "where is my phoone" and you look up similar sentences, you may get results like:

where is my phone
where is my credit card
Depending on your task, either the "where is" part of the sentence or the "phone" part is more important. The encoder, however, may be very brittle when it comes to spelling errors. So, to put it more generally:
The similarity in an embedded space is, in our case, very much "general". I'm using "general" here, as opposed to "specific", to indicate that these similarities have been constructed without a particular task in mind.
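To make the "general" point concrete, here's a minimal sketch. It uses a toy character-bigram encoder as a stand-in for a real sentence encoder (purely hypothetical; a real setup would use something like a transformer-based embedding). Both candidates come back with a nonzero score, and nothing in the encoder knows whether the "where is" intent or the "phone" entity should dominate:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'general' encoder: character-bigram counts.

    A stand-in for a real sentence encoder; it has no notion of
    which parts of the sentence matter for your task.
    """
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

query = embed("where is my phoone")
for candidate in ["where is my phone", "where is my credit card"]:
    print(candidate, round(cosine(query, embed(candidate)), 3))
```

The ranking here happens to put "where is my phone" first, but the scores are task-agnostic: the encoder would produce the exact same numbers whether you care about intents or about entities.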
Similar Issue
Suppose that we are deduplicating records and we have a zipcode, a city, a first name, and a last name. How would our encoding understand that sharing a city is not a strong signal while sharing a first name certainly is? Can we really expect a standard encoding to understand this? Without labels ... I think not.
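This is exactly where labels could help. As a minimal sketch (not a proposed implementation), suppose we had labeled record pairs with per-field match indicators; even a plain logistic regression learns that city matches carry little weight. The data below is made up to illustrate the point:

```python
from math import exp

# Hypothetical labeled record pairs: per-field match indicators
# [zipcode, city, first_name, last_name] plus whether the pair is
# a true duplicate. The labels encode that sharing a city is weak
# evidence while sharing a first name is strong evidence.
pairs = [
    ([1, 1, 1, 1], 1),
    ([0, 1, 1, 1], 1),
    ([1, 1, 1, 0], 1),
    ([1, 1, 0, 0], 0),
    ([0, 1, 0, 1], 0),
    ([0, 0, 0, 1], 0),
]

# Plain logistic regression trained with gradient descent.
w, b, lr = [0.0] * 4, 0.0, 0.5
for _ in range(2000):
    for x, y in pairs:
        p = 1 / (1 + exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
        b -= lr * (p - y)
        w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]

for name, wi in zip(["zipcode", "city", "first_name", "last_name"], w):
    print(name, round(wi, 2))
```

The learned weight for the first-name field ends up well above the city weight, which is the kind of task-specific knowledge a generic encoding has no way to pick up on its own.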
The idea is that this will allow us to "fine-tune" what similarity actually means in our embedded space.
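One hypothetical way to picture this "fine-tuning": keep the embeddings frozen and learn nonnegative per-dimension weights from labeled pairs. The 2-d embeddings below are made up (dimension 0 loosely encoding the "where is" intent, dimension 1 the object mentioned), and by construction the plain dot product ties at 0.90 for both pairs; the learned weights break the tie in favor of the task:

```python
# Made-up 2-d embeddings for three sentences.
emb = {
    "where is my phone": [0.9, 0.9],
    "where is my credit card": [0.9, 0.1],
    "my phone is broken": [0.1, 0.9],
}

# Labeled pairs for an intent-routing task: both "where is"
# questions should be similar; the complaint should not be.
data = [
    ("where is my phone", "where is my credit card", 1),
    ("where is my phone", "my phone is broken", 0),
]

def wsim(a, b, w):
    # Similarity = dot product reweighted per dimension.
    return sum(wi * ai * bi for wi, ai, bi in zip(w, emb[a], emb[b]))

# Start from the plain dot product: both pairs score 0.90, a tie.
w = [1.0, 1.0]
lr = 0.1
for _ in range(100):
    for a, b, y in data:
        for i, (ai, bi) in enumerate(zip(emb[a], emb[b])):
            step = lr * ai * bi           # d(similarity)/d(w_i)
            w[i] += step if y else -step  # pull similar pairs up, push others down
            w[i] = max(w[i], 0.0)         # keep weights nonnegative
```

After training, the intent dimension dominates and the "where is" pair outranks the complaint; had we labeled for an entity task instead, the same procedure would upweight the other dimension. That's the sense in which similarity becomes task-specific.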
I'm not sure whether this is best done as a separate package called embetter or as a submodule here. Either way, I wanted this idea written down somewhere so that I can discuss it with certain folks.
koaning changed the title from "embetter submodule" to "embetter: better embeddings" on Oct 31, 2021.