Corrupted or unsupported index after saving. #40

Closed
janfait opened this issue Nov 4, 2023 · 3 comments

Comments

janfait commented Nov 4, 2023

Hello, I am stuck on the problem below and would appreciate any tips.

My vectors look like this:

[[7.91172300e-01 6.69090297e-01 2.91000000e+02]
 [6.11795087e-01 3.69995315e-01 8.11000000e+02]
 [6.12826115e-01 3.79121037e-01 6.68000000e+02]
 [4.94505465e-01 3.66105550e-01 1.79000000e+02]
 [8.57812207e-01 3.69706741e-01 2.87000000e+02]
 [4.87957676e-01 3.83922704e-01 1.90000000e+02]
 [5.79707092e-01 5.88521933e-01 8.22000000e+02]
 [8.77284651e-01 3.60034340e-01 3.27000000e+02]
 [6.96175913e-01 4.77069307e-01 2.67000000e+02]
 [8.37530029e-01 6.95131995e-01 7.31000000e+02]]

Building and saving my index with this process works nicely.

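    # Imports assumed here; they were not shown in the original post.
    import pandas as pd
    from voyager import Index, Space

    # Load the feature columns and the item ids from the CSV
    df = pd.read_csv(input_csv)
    vectors = df[['Size', 'Gps', 'CategoryCluster']].values
    ids = df['Id'].tolist()
    index = Index(Space.Euclidean, num_dimensions=vectors.shape[1])

    index.add_items(vectors, ids)

    # Test that the index works: look up one stored vector and query its neighbors
    queries = index.get_vectors([884])
    neighbors, distances = index.query(queries, k=5)
    print(neighbors)
    print(distances)

    index.save(index_path)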

The print statements return the data below. All good.

[[ 884 556793 524883 662437 529508]]
[[0. 0.0011078 0.00121032 0.00268939 0.00401055]]

When trying to read the index for later use with:

index = Index.load(index_path)

I get:
RuntimeError: Index seems to be corrupted or unsupported. Advancing to the next linked list requires 13312 additional bytes (from position 129997), but index data only has 130147 bytes in total.
It is not clear to me where to start with debugging. Do you have any tips on what could be wrong here?

I am on Windows 10 Pro
Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz, 2301 MHz
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
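
One possible starting point for debugging, sketched here with the standard library only: compare the size of the saved file on disk with the "bytes in total" figure in the error message (130147 above). If the file on disk is larger, the loader saw fewer bytes than were written. The helper name `describe_index_file` is hypothetical, and the idea that carriage-return/line-feed pairs or Ctrl-Z bytes are involved is an assumption; nothing in this thread confirms the root cause.

```python
import os


def describe_index_file(path):
    """Summarize a saved index file using a binary read.

    Hypothetical diagnostic helper. It reports the size on disk and counts
    byte patterns that a text-mode read on Windows can alter or treat as
    end-of-file. Whether those bytes caused the error in this thread is an
    assumption, not a confirmed diagnosis.
    """
    size_on_disk = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode: bytes are returned unmodified
        data = f.read()
    return {
        "size_on_disk": size_on_disk,
        "bytes_read_binary": len(data),
        "crlf_pairs": data.count(b"\r\n"),
        "ctrl_z_bytes": data.count(b"\x1a"),
    }


# Example usage (index_path is the path passed to index.save above):
# print(describe_index_file(index_path))
```

If `size_on_disk` is noticeably larger than the total reported in the error, that points at how the file is being read rather than at the index contents.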

janfait commented Nov 4, 2023

I was able to get it running in Docker, so I assume the problem was related to my operating system. Closing.

janfait closed this as completed Nov 4, 2023
@naediros

For anyone running into the same problem as @janfait:
try opening the file with open() in 'rb' mode and passing the file object to Index.load instead of the path. That works fine for me on Windows 10 Pro without Docker (with Python 3.9, at least).

with open('my_index.voy', 'rb') as f:
    index = Index.load(f) 
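
The workaround above can be wrapped in a small helper so every call site opens the file in binary mode. This is only a sketch: `load_via_binary_handle` is a hypothetical name, and the loader is passed in as a parameter so the helper does not depend on any particular library. It assumes the loader accepts a binary file-like object, as `Index.load(f)` does in the snippet above.

```python
def load_via_binary_handle(path, loader):
    """Open `path` in binary mode and hand the open file object to `loader`.

    Hypothetical convenience wrapper around the workaround in this thread.
    `loader` is any callable that accepts a binary file-like object.
    """
    with open(path, "rb") as f:  # 'rb' disables newline and EOF translation
        return loader(f)


# Example usage, assuming Index is imported as in the original post:
# index = load_via_binary_handle("my_index.voy", Index.load)
```

Passing the loader as an argument keeps the helper testable without the index library installed.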

@han1399013493
Thanks very much.
