Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recommender: clear_row don't free acquired resources #294

Closed
rimms opened this issue Mar 15, 2013 · 11 comments
Closed

recommender: clear_row don't free acquired resources #294

rimms opened this issue Mar 15, 2013 · 11 comments

Comments

@rimms
Copy link
Member

rimms commented Mar 15, 2013

I tried to measure memory size on before and after call clear_row.

The result is following. The value is RSS size which got by using get_status.

method after start-up after update_row 100K ID after clear_row 10K ID
lsh 4312 521888 522048
minhash 4308 216708 216840
euclid_lsh 4336 236324 236672
inverted_index 4292 434044 434192

We cannot use jubarecommender long time without some solution (e.g. stop service and add memory ...).

@suma
Copy link
Member

suma commented Mar 18, 2013

Current implementation doesn't provide its solution without clear.

Because their remove_row only free id value(std::string), not execute compaction the storage and container(like unordered_map).

https://github.com/jubatus/jubatus/blob/master/src/recommender/lsh.cpp#L92
https://github.com/jubatus/jubatus/blob/master/src/recommender/lsh.hpp#L76
https://github.com/jubatus/jubatus/blob/master/src/recommender/recommender_base.hpp#L81
typedef pfi::data::unordered_map<std::string, row_t> tbl_t; in storage/storage_type.hpp
https://github.com/jubatus/jubatus/blob/master/src/storage/sparse_matrix_storage.hpp#L65
https://github.com/jubatus/jubatus/blob/master/src/storage/sparse_matrix_storage.cpp#L147

@beam2d
Copy link
Member

beam2d commented Mar 19, 2013

As @suma said, current implementation doesn't run any compaction on tables in recommenders.
We can add compaction codes on master tables, which include orig_ and master of algorithm-specific tables. Diff tables are cleared at the end of MIX.

Note that we cannot save memory before MIX, because we must retain markers representing removals in diff and share them with other servers on MIX. For example, bit_index_storage::remove_row only overwrites the row in diff by an empty bit vector.
https://github.com/jubatus/jubatus/blob/master/src/storage/bit_index_storage.cpp#L68

@rimms
Copy link
Member Author

rimms commented Jun 24, 2013

@suma, @beam2d, Thank you.

I understood that it's specification of Jubatus.

@rimms rimms closed this as completed Jun 24, 2013
@kmaehashi
Copy link
Member

I'm currently facing this problem in my project using standalone-mode... and hoping this to be solved in Jubatus layer.
How about defining inherited version of each storage class for standalone-mode and removing entry in there (instead of overwriting zero for distributed-mode)?

@beam2d
Copy link
Member

beam2d commented Feb 3, 2014

We can erase an entry in the diff table that is not contained in the master table, because we do not need to MIX such entry with other servers. If we fix the implementation to behave as such, the standalone-mode server just erases items on removal, while distributed-mode servers correctly MIXes removals.

@kmaehashi kmaehashi self-assigned this Feb 3, 2014
@kmaehashi
Copy link
Member

In today's meeting, we agreed to fix this using the method proposed by @beam2d.
I've reopened, changed milestone to Near Future and assigned to me.

@kmaehashi kmaehashi reopened this Feb 3, 2014
@kmaehashi
Copy link
Member

After applying #659, #685, and #694, I observed some mitigation.

Some things to note:

  • In the current architecture, we cannot "completely" remove rows, as records like column2id_ cannot be cleared.
  • As far as I tested, RSS value does not shrink by just erasing elements. I suppose it depends on how unordered_map adjusts its bucket size, and memory fragmentation may also be involved.

Here are RSS values tested on Recommender (inverted_index):

0.5.2

  1. Startup : 5776 KB
  2. update_row (10k) : 13564 KB
  3. clear_row (10k) : 13544 KB
  4. update_row (10k) : 18896 KB

0.5.2 + Patch #659, #685 and #694

  1. Startup : 5784 KB
  2. update_row (10k) : 13456 KB
  3. clear_row (10k) : 13472 KB // does not shrink!
  4. update_row (10k) : 15824 KB // about 3 MB better than 0.5.2

@kumagi kumagi assigned gintenlabo and unassigned kmaehashi Mar 3, 2014
@kumagi
Copy link
Member

kumagi commented Mar 3, 2014

We should try rehash of unordered_map.

@kmaehashi
Copy link
Member

From discussion in meeting in 2014-03-03, we asked @gintenlabo to try rehash and other possible ideas.

@kmaehashi kmaehashi modified the milestones: 0.6.0, Near Future May 7, 2014
@kmaehashi kmaehashi modified the milestones: Near Future, 0.6.0 Jun 16, 2014
@kmaehashi kmaehashi modified the milestones: 0.6.3, Near Future Sep 1, 2014
@gintenlabo
Copy link
Contributor

I tried table.rehash(0); to shrink buffers, but memory usage is not changed after clear_row.
c.f. https://github.com/gintenlabo/jubatus_core/tree/fix-issue-294

@kmaehashi kmaehashi modified the milestones: Pending, 0.6.3 Sep 17, 2014
@kmaehashi
Copy link
Member

As we think this behavior is by-design, we close this issue.

@kmaehashi kmaehashi removed this from the Pending milestone Jul 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants