recommender: inverted_index: rows cleared by clear_row occasionally do not disappear in distributed mode #682

kmaehashi opened this Issue Feb 24, 2014 · 3 comments



kmaehashi commented Feb 24, 2014

When using the recommender with inverted_index in distributed mode, rows erased by clear_row are occasionally still returned by the get_all_rows API, even after MIX runs.

Steps to reproduce:

Run jubarecommender with inverted_index in distributed mode.

jubaconfig --debug --zookeeper localhost:2181 --cmd write --type recommender --name my-cluster --file /opt/jubatus/share/jubatus/example/config/recommender/inverted_index.json
jubarecommender --zookeeper localhost:2181 -n my-cluster --interval_count 512 & 

Run the following code.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import time

from jubatus.recommender.client import Recommender
from jubatus.common import Datum

interval_count = 512

if __name__ == '__main__':
    cli = Recommender("", 9199, "my-cluster")
    print "update_row"
    for i in xrange(interval_count - 1):
        cli.update_row("foo", Datum({"key-" + str(i): i}))

    print "clear_row"
    cli.clear_row("foo") # this update request triggers MIX

    print "waiting for MIX to complete..."
    time.sleep(10)  # assumption: the wait length is a guess; adjust to your MIX duration

    print "get_all_rows"
    print cli.get_all_rows()

However, when I change the server argument to --interval_count 10 and set interval_count = 10 in the script, the row disappears as expected.

@kmaehashi kmaehashi added the bug label Feb 24, 2014

@kmaehashi kmaehashi added the _update label Feb 24, 2014

@kmaehashi kmaehashi added this to the 0.5.3 milestone Feb 25, 2014

@kmaehashi kmaehashi added the _updated label Feb 25, 2014


hido commented Mar 23, 2014

If this line is changed to pass a small constant as the value, the script always succeeds, even with a larger interval_count:

        cli.update_row("foo", Datum({"key-" + str(i): value}))

This suggests that the occasional failure is caused by floating-point error in the equals-to-0.0f conditions that inverted_index_storage uses to decide whether or not to erase a cell.
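To illustrate the kind of numerical error described here, the following is a minimal sketch and not the Jubatus implementation. It assumes a single-precision accumulator to which squared values are added and later subtracted; `to_f32` and `residual_norm` are hypothetical helpers that emulate float32 rounding with the standard `struct` module.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def residual_norm(values):
    """Add each squared value to a float32 accumulator, then subtract them again.

    Hypothetical model of an update followed by a clear. With exact arithmetic
    the result would always be 0.0.
    """
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v * v))
    for v in values:
        acc = to_f32(acc - to_f32(v * v))
    return acc


# Small constant values: every intermediate sum is exactly representable.
small = residual_norm([1.0] * 511)

# One large value followed by a small one: 4096**2 + 1 is not representable
# in float32, so the small contribution is lost when it is added.
large = residual_norm([4096.0, 1.0])
```

Here `small` comes out as exactly `0.0`, while `large` leaves a nonzero residue, so an equals-to-0.0f check would treat the second case as still holding data. This is consistent with the script succeeding when the values are a small constant.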

@kmaehashi kmaehashi modified the milestones: Near Future, 0.5.3 Mar 24, 2014

@kmaehashi kmaehashi added the v1.0 label Jul 11, 2016

@kmaehashi kmaehashi modified the milestones: Near Future, 0.9.5 Sep 5, 2016

@kmaehashi kmaehashi assigned kmaehashi and unassigned hido Sep 5, 2016


kmaehashi commented Sep 26, 2016

Maybe we can add a new interface such as mark_column_removed to inverted_index_storage that sets column2norm to 0, and call it from inverted_index::remove_row.
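The idea can be sketched with a toy Python model. This is an assumption-laden illustration, not the actual C++ code: `ToyStorage`, `add`, `subtract`, and `live_columns` are hypothetical names. Only `column2norm` and `mark_column_removed` come from the proposal above, and the float32 accumulation is assumed.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ToyStorage(object):
    """Toy stand-in for a storage that tracks a squared norm per column."""

    def __init__(self):
        self.column2norm = {}

    def add(self, column, value):
        norm = self.column2norm.get(column, 0.0)
        self.column2norm[column] = f32(norm + f32(value * value))

    def subtract(self, column, value):
        # Removal by arithmetic: may leave a nonzero residue in float32.
        norm = self.column2norm.get(column, 0.0)
        self.column2norm[column] = f32(norm - f32(value * value))

    def mark_column_removed(self, column):
        # Removal by assignment: the norm becomes exactly zero.
        self.column2norm[column] = 0.0

    def live_columns(self):
        # Assumed rule: a column is listed while its norm is not exactly 0.
        return sorted(c for c, n in self.column2norm.items() if n != 0.0)


values = [4096.0, 1.0]

by_subtraction = ToyStorage()
for v in values:
    by_subtraction.add("foo", v)
for v in values:
    by_subtraction.subtract("foo", v)

by_marking = ToyStorage()
for v in values:
    by_marking.add("foo", v)
by_marking.mark_column_removed("foo")
```

In this model `by_subtraction.live_columns()` still contains `"foo"`, whereas `by_marking.live_columns()` is empty. Setting the norm explicitly avoids relying on the subtractions cancelling exactly.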


kmaehashi commented Oct 19, 2016

Fixed via LGTM except for comments above.

@kmaehashi kmaehashi closed this Oct 19, 2016
