Test search with metadata - can someone explain what metadata is stored/associated? #59

Closed
shashi-netra opened this issue May 31, 2019 · 5 comments
Labels
question Further information is requested

Comments

@shashi-netra

I am still trying to figure out SPTAG, but I am unclear on what the metadata methods provide.
For example, in the Test() method in the sample, the metadata is generated like this:

    m = ''
    for i in range(n):
        m += str(i) + '\n'
    m = m.encode()

Does this mean that each line of the metadata is associated with the corresponding row of the vectors?
Can someone provide a simple example of storing metadata with the vectors?
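
To make the question concrete, here is a minimal Python sketch of what the sample's loop appears to build: one newline-terminated entry per vector, so that entry i would annotate row i. That pairing is an assumption read off the sample, not confirmed SPTAG behaviour. The vectors, the labels, and the helper names `build_metadata` and `split_metadata` are hypothetical, and nothing here calls SPTAG.

```python
# Sketch only: plain Python, no SPTAG import. It illustrates the ASSUMED
# pairing "metadata line i <-> vector row i".
vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # hypothetical 2-d vectors
labels = ["red", "green", "blue"]               # hypothetical labels, one per vector


def build_metadata(labels):
    """Join one label per vector into a single newline-terminated byte blob."""
    for label in labels:
        if "\n" in label:
            raise ValueError("a label must not contain a newline")
    return "".join(label + "\n" for label in labels).encode()


def split_metadata(blob):
    """Recover the per-row labels by splitting at the newline terminators."""
    return blob.decode().split("\n")[:-1]


meta = build_metadata(labels)

# The blob must hold exactly one entry per vector for the pairing to make sense.
if len(split_metadata(meta)) != len(vectors):
    raise ValueError("metadata entries and vectors are out of step")

for row, label in enumerate(split_metadata(meta)):
    print(row, vectors[row], label)
```

With this layout the sample's `str(i) + '\n'` loop would simply label each row with its own row number.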

@midneet

midneet commented Jun 5, 2019

I've tried a few examples of building an index with metadata using IndexBuild. I used simple word2vec data, so each line of my input looks like this:
apple\t0.229233|0.21099|0.108552|0.135154|-0.045957|....|-0.040005|0.1802|0.103172|-0.202125|-0.135632|-0.057288
If you then run the queries through IndexSearch, it returns results: for each query line it lists the most similar (shortest-distance) entries in the index in the format "distance@metadata", separated by "|", like this:
apple:0.00@apple|0.001@Apple|0.05@apple pen|....|0.5@fruit|
So I think the metadata can be anything you want to use to annotate the vectors you input. In my case the metadata is the original word before it was transformed into an embedding. Still, that's just my guess :P
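
The two line formats described above can be sketched in Python as follows. This is only a reading of the formats as quoted in this comment (a tab between metadata and vector, "|" between components, "distance@metadata" items after the query), not a description of the IndexBuild or IndexSearch tools themselves. The function names and the sample values are hypothetical.

```python
def parse_input_line(line, sep="|"):
    """Split 'metadata<TAB>v1|v2|...' into (metadata, [floats])."""
    meta, _, raw = line.rstrip("\n").partition("\t")
    return meta, [float(x) for x in raw.split(sep)]


def parse_result_line(line):
    """Split 'query:dist@meta|dist@meta|...|' into (query, [(dist, meta), ...])."""
    query, _, rest = line.rstrip("\n").partition(":")
    hits = []
    for item in rest.split("|"):
        if not item:  # the trailing '|' leaves one empty item
            continue
        dist, _, meta = item.partition("@")
        hits.append((float(dist), meta))
    return query, hits


# Hypothetical, shortened lines in the two formats.
word, vector = parse_input_line("apple\t0.25|0.5|-0.125")
query, hits = parse_result_line("apple:0.00@apple|0.001@Apple|0.05@apple pen|0.5@fruit|")

print(word, vector)
for dist, meta in hits:
    print(query, dist, meta)
```

Read this way, the metadata is just an opaque label that travels with its vector and comes back next to the distance.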

@MaggieQi added the "question" label (Further information is requested) on Jul 10, 2019
@joskei

joskei commented Aug 20, 2019

I still don't get how to make this work for words. Do you have a simple sample dataset?

@shashi-netra
Author

I just couldn't get it to work for any kind of data. And my questions here have remained unanswered. It seems the Microsoft team couldn't be bothered, unfortunately, and I have given up on using this tool.

@shashi-netra
Author

BTW we have recently open-sourced pgANN that solves this problem with a PostgreSQL backend. HTH.

@joskei

joskei commented Aug 21, 2019

So here's my code. It returns something, but I'm not sure why it returns what it does. Can somebody explain? It is based on the sample code from the GitHub site.

```csharp
using Microsoft.ANN.SPTAGManaged;
using System;
using System.IO;
using System.Text;

namespace SPTAG_Tester
{
    class Program
    {
        static int dimension = 2;
        static int n = 14;
        static int k = 3;

        // Packs n vectors into one flat byte buffer; every component of vector i is (float)i.
        static byte[] createFloatArray(int n)
        {
            byte[] data = new byte[n * dimension * sizeof(float)];

            for (int i = 0; i < n; i++)
                for (int j = 0; j < dimension; j++)
                    Array.Copy(BitConverter.GetBytes((float)i), 0, data, (i * dimension + j) * sizeof(float), sizeof(float));
            return data;
        }

        // Returns 14 hard-coded, newline-terminated labels as ASCII bytes (the parameter n is not used).
        static byte[] createMetadata(int n)
        {
            StringBuilder sb = new StringBuilder();

            sb.Append("kitten\n");
            sb.Append("hamster\n");
            sb.Append("tarantula\n");
            sb.Append("puppy\n");
            sb.Append("crocodile\n");
            sb.Append("dolphin\n");
            sb.Append("panda bear\n");
            sb.Append("lobster\n");
            sb.Append("capybara\n");
            sb.Append("elephant\n");
            sb.Append("mosquito\n");
            sb.Append("goldfish\n");
            sb.Append("horse\n");
            sb.Append("chicken\n");

            return Encoding.ASCII.GetBytes(sb.ToString());
        }

        static void Main()
        {
            // Build the index from the vectors and the metadata, then save it.
            {
                AnnIndex idx = new AnnIndex("BKT", "Float", dimension);
                idx.SetBuildParam("DistCalcMethod", "L2");
                byte[] data = createFloatArray(n);

                byte[] meta = createMetadata(n);
                idx.BuildWithMetaData(data, meta, n, true);
                idx.Save("testcsharp");
            }

            // Reload the index and query it; createFloatArray(1) yields the single vector (0, 0).
            AnnIndex index = AnnIndex.Load("testcsharp");
            BasicResult[] res = index.SearchWithMetaData(createFloatArray(1), k);
            for (int i = 0; i < res.Length; i++)
                Console.WriteLine("result " + i.ToString() + ":" + res[i].Dist.ToString() + "@(" + res[i].VID.ToString() + "," + Encoding.ASCII.GetString(res[i].Meta) + ")");
            Console.WriteLine("test finish!");

            Console.ReadLine();
        }
    }
}
```
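
For comparison, the buffer that createFloatArray produces can be sketched in Python: n vectors of `dimension` values each, stored row by row as 4-byte floats. The little-endian byte order is an assumption (BitConverter uses the byte order of the machine it runs on), and `pack_vectors` and `unpack_vector` are hypothetical helper names, not part of SPTAG.

```python
import struct


def pack_vectors(vectors, dimension):
    """Pack row-major vectors into one flat buffer of 4-byte little-endian floats."""
    buf = bytearray()
    for vec in vectors:
        if len(vec) != dimension:
            raise ValueError("every vector needs exactly `dimension` values")
        buf += struct.pack("<%df" % dimension, *vec)
    return bytes(buf)


def unpack_vector(buf, row, dimension):
    """Read vector number `row` back out of the flat buffer."""
    offset = row * dimension * 4  # 4 bytes per float
    return list(struct.unpack_from("<%df" % dimension, buf, offset))


# Same data as createFloatArray(14) with dimension = 2: vector i is (i, i).
data = pack_vectors([[float(i)] * 2 for i in range(14)], 2)

print(len(data))                  # number of bytes in the buffer
print(unpack_vector(data, 3, 2))  # the vector stored in row 3
```

In a real application the rows would presumably be embeddings (for example word vectors) rather than (i, i), with `dimension` set to the embedding length.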

The result is:
result 0:0@(0,kitten )
result 1:2@(1,hamster )
result 2:8@(2,tarantula )

So, some questions:

  • Why is it returning the first 3?
  • Can I do a search based on my metadata? How?
  • What should be the content of my "data" variable (the one generated from createFloatArray(int n))?
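
On the first question, the posted distances 0, 2 and 8 are consistent with a squared Euclidean distance between the query (0, 0), which is what createFloatArray(1) produces, and the stored vectors (0, 0), (1, 1) and (2, 2). The Python sketch below reproduces those numbers with a plain brute-force scan. It assumes that "L2" means squared L2 and that metadata line i belongs to vector i; neither is confirmed in this thread, and the scan stands in for SPTAG's BKT search rather than implementing it.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


labels = ["kitten", "hamster", "tarantula", "puppy", "crocodile", "dolphin",
          "panda bear", "lobster", "capybara", "elephant", "mosquito",
          "goldfish", "horse", "chicken"]

# Same vectors as createFloatArray(14) with dimension = 2: vector i is (i, i).
vectors = [[float(i), float(i)] for i in range(len(labels))]

# createFloatArray(1) contains only the i = 0 vector, so the query is (0, 0).
query = [0.0, 0.0]

# Brute-force ranking of every stored vector by its distance to the query.
ranked = sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))

for rank, vid in enumerate(ranked[:3]):
    dist = squared_l2(query, vectors[vid])
    print(f"result {rank}:{dist:g}@({vid},{labels[vid]})")
```

Under this reading the first three rows come back simply because they are the three vectors closest to (0, 0). A query of (3, 3) would put row 3 (puppy) first instead.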

5 participants