recommender: inverted_index: rows cleared by clear_row occasionally do not disappear in distributed mode #682

Closed
kmaehashi opened this Issue Feb 24, 2014 · 3 comments

Owner

kmaehashi commented Feb 24, 2014

When using recommender with inverted_index in distributed mode, rows erased by clear_row are occasionally still returned by the get_all_rows API, even after MIX has run.

Steps to reproduce:

Run jubarecommender with inverted_index in distributed mode.

jubaconfig --debug --zookeeper localhost:2181 --cmd write --type recommender --name my-cluster --file /opt/jubatus/share/jubatus/example/config/recommender/inverted_index.json
jubarecommender --zookeeper localhost:2181 -n my-cluster --interval_count 512 & 

Run the following code.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import time

from jubatus.recommender.client import Recommender
from jubatus.common import Datum

interval_count = 512

if __name__ == '__main__':
    cli = Recommender("127.0.0.1", 9199, "my-cluster")
    print "update_row"
    for i in xrange(interval_count - 1):
        cli.update_row("foo", Datum({"key-" + str(i): i}))

    print "clear_row"
    cli.clear_row("foo") # this update request triggers MIX

    print "waiting for MIX to complete..."
    time.sleep(3)

    print "get_all_rows"
    print cli.get_all_rows()

However, when I change the server argument to --interval_count 10 and set interval_count = 10 in the script, the row disappears as expected.

kmaehashi added the bug label Feb 24, 2014

kmaehashi added the _update label Feb 24, 2014

kmaehashi added this to the 0.5.3 milestone Feb 25, 2014

kmaehashi added the _updated label Feb 25, 2014

Contributor

hido commented Mar 23, 2014

If this line is modified to use a small constant as the value, the script always succeeds, even with a larger interval_count:

        cli.update_row("foo", Datum({"key-" + str(i): value}))

This suggests that the occasional failure is caused by floating-point error when inverted_index_storage decides whether to erase a cell using an exact equals-to-0.0f comparison.
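To illustrate why an exact comparison with 0.0f can fail for large values but not for small constants, here is a minimal sketch in Python. It does not reproduce the Jubatus C++ code: it assumes, purely for illustration, that a running single-precision sum of squared values is accumulated and later decremented by the same amounts. The helpers `f32` and `residual_after_add_then_remove` are hypothetical names introduced for this example.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def residual_after_add_then_remove(values):
    """Add each squared value to a float32 accumulator, then subtract them all again."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v * v))
    for v in values:
        acc = f32(acc - f32(v * v))
    return acc


if __name__ == "__main__":
    # Small constants: every intermediate sum is exactly representable.
    print(residual_after_add_then_remove([1.0] * 511) == 0.0)  # True
    # 4096**2 == 2**24; adding 1.0 to it is lost to rounding in float32.
    print(residual_after_add_then_remove([4096.0, 1.0]))  # -1.0
    # Values 0..510 as in the script above; their squares sum to 44347135 > 2**24.
    print(residual_after_add_then_remove([float(i) for i in range(511)]))
```

Once the accumulator passes 2**24, single precision can no longer represent every integer, so adding and then subtracting the same terms is not guaranteed to return exactly 0.0. With a small constant value the sums stay below that threshold, which is consistent with the observation above.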

kmaehashi modified the milestones: Near Future, 0.5.3 Mar 24, 2014

kmaehashi added the v1.0 label Jul 11, 2016

kmaehashi modified the milestones: Near Future, 0.9.5 Sep 5, 2016

kmaehashi assigned kmaehashi and unassigned hido Sep 5, 2016

Owner

kmaehashi commented Sep 26, 2016

Maybe we can add a new interface such as mark_column_removed to inverted_index_storage that sets column2norm to 0, and call it from inverted_index::remove_row.
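The idea can be sketched with a toy model in Python. This is not the Jubatus implementation: `ToyNormStorage` and its methods are hypothetical stand-ins, and only the names `mark_column_removed` and `column2norm` come from the proposal above. The sketch contrasts removal by subtraction, whose result depends on rounding, with removal by explicitly setting the norm to zero.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class ToyNormStorage(object):
    """Hypothetical storage that tracks one squared norm per column in float32."""

    def __init__(self):
        self.column2norm = {}

    def add(self, column, value):
        # Accumulate the squared value in single precision.
        current = self.column2norm.get(column, 0.0)
        self.column2norm[column] = f32(current + f32(value * value))

    def subtract(self, column, value):
        # Arithmetic removal: the result depends on rounding and may not be 0.0.
        current = self.column2norm.get(column, 0.0)
        self.column2norm[column] = f32(current - f32(value * value))

    def mark_column_removed(self, column):
        # Explicit removal: the norm becomes exactly 0.0 regardless of rounding.
        self.column2norm[column] = 0.0

    def is_removed(self, column):
        return self.column2norm.get(column, 0.0) == 0.0


if __name__ == "__main__":
    storage = ToyNormStorage()
    values = [4096.0, 1.0]  # 4096**2 == 2**24, so the added 1.0 is rounded away
    for v in values:
        storage.add("foo", v)
    for v in values:
        storage.subtract("foo", v)
    print(storage.is_removed("foo"))  # False: the residual norm is -1.0
    storage.mark_column_removed("foo")
    print(storage.is_removed("foo"))  # True
```

Setting the value directly avoids depending on additions and subtractions cancelling exactly, which is the property the equals-to-zero check needs.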

Owner

kmaehashi commented Oct 19, 2016

Fixed via LGTM except for comments above.

kmaehashi closed this Oct 19, 2016
