New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smaller transactions #396

Closed
wants to merge 25 commits into
base: develop
from

Conversation

Projects
None yet
5 participants
@jdonkervliet
Contributor

jdonkervliet commented Mar 1, 2015

Hi Syncany team,

This PR is supposed to close issue #364 and provide Syncany with a the ability to send smaller transactions to the remote. It does this by creating multiple DatabaseVersion objects during the indexing phase.

As an added bonus, the indexing is now pipelined with the actual transactions. This means that the complete process becomes faster.

We have provided a limited amount of test code and JavaDoc documentation for our additions. These are also included in this PR.

It's a big feature and also quite technical. We welcome any and all feedback you might have.

thegeman and others added some commits Feb 27, 2015

Implementing pipelined file indexing and uploading.
The indexing of files is now done by a threaded method that creates multiple databaseVersions and makes them available to an iterator.

When a new databaseVersion is made is currently decided by a size in bytes in the UpOperation class. This setting should be moved to a configuration container.
Cleaning up code.
Moved some code around. In particular the checks for when to stop
indexing/transactions.

Also provided a few 'Listener' interfaces which make the structure we use for
filling a queue with DatabaseVersions much cleaner. These interfaces may need to
be placed in a util package. Would like to discuss this.
Moved transaction size limit.
This was located in the global settings, and is now in the UpOperation settings, since it only affects the UpOperation.
Polishing for #364
Moved TransactionSize option from global config to UpOperationOptions.
Added JavaDoc for 2 new large methods.
Hooked test files in test suite.
@binwiederhier

This comment has been minimized.

Show comment
Hide comment
@binwiederhier

binwiederhier Mar 3, 2015

Member

Here comes my review. Don't be alarmed, they are always long :) Also, I have been told to not sugarcode things (coughs, Pim, coughs), so don't take any comments the wrong way, please!

Also: This is a dry review -- meaning I'm looking at the code on Github. A change this big needs to be run though, so another review will follow after this (after changes).

(a) The reponsibility of the Deduper is deduplication. It does not and should not know anything about transactions. I suppose calling in maxTotalSize and making the argument optional would be good enough for now.

(b) I don't like the firstFileIndex being passed to the deduplicate() method. Any chance we could leave the method to be unaware of any indices and just let it walk through the file list?

(c) I have multiple issues with the AsyncIndexer:

(c.1) First, it runs a thread. This is not obvious by the name of the class and very unusal for an Iterator. I'm not saying it's a no-go, it's just uncommon and hence questionable.

(c.2) The thread starts in the constructor and is uncontrollable thread (cannot be stopped). If a thread has to be used inside the iterator, it should be started in hasNext()...

(c.3) hasNext() implements busy waiting. Could this be done with a BlockingQueue?

(c.4) If you don't implement a method, you should throw an exception, or is remove() called some place else?

(d) The names DatabaseVersionIterator and DatabaseVersionListener are very generic and don't really say what they iterate over (a normal list? No!) or listen to (Changes of any kind? No!).

(e) General "Clean Code"-provoked comment: Naming boolean expressions makes the code much easier to understand for others:

if (remoteDatabaseVersionsToResume != null && remoteTransactionsToResume != null 
   && remoteDatabaseVersionsToResume.size() == remoteTransactionsToResume.size()) {
...
}

// vs.

boolean resumingPossibleDueToConsistentState = 
  remoteDatabaseVersionsToResume != null && remoteTransactionsToResume != null &&
  remoteDatabaseVersionsToResume.size() == remoteTransactionsToResume.size();

if (resumingPossible) { 
...
}

(f) I largely understand and agree with what you are doing in the UpOperation. I only looked at the diff but I understand what's happening. That's awesome! 👍 However, I feel that I need to test this and see it in Eclipse, so you'll have to wait for a second round of reviewing. 👍 😄

Member

binwiederhier commented Mar 3, 2015

Here comes my review. Don't be alarmed, they are always long :) Also, I have been told to not sugarcode things (coughs, Pim, coughs), so don't take any comments the wrong way, please!

Also: This is a dry review -- meaning I'm looking at the code on Github. A change this big needs to be run though, so another review will follow after this (after changes).

(a) The reponsibility of the Deduper is deduplication. It does not and should not know anything about transactions. I suppose calling in maxTotalSize and making the argument optional would be good enough for now.

(b) I don't like the firstFileIndex being passed to the deduplicate() method. Any chance we could leave the method to be unaware of any indices and just let it walk through the file list?

(c) I have multiple issues with the AsyncIndexer:

(c.1) First, it runs a thread. This is not obvious by the name of the class and very unusal for an Iterator. I'm not saying it's a no-go, it's just uncommon and hence questionable.

(c.2) The thread starts in the constructor and is uncontrollable thread (cannot be stopped). If a thread has to be used inside the iterator, it should be started in hasNext()...

(c.3) hasNext() implements busy waiting. Could this be done with a BlockingQueue?

(c.4) If you don't implement a method, you should throw an exception, or is remove() called some place else?

(d) The names DatabaseVersionIterator and DatabaseVersionListener are very generic and don't really say what they iterate over (a normal list? No!) or listen to (Changes of any kind? No!).

(e) General "Clean Code"-provoked comment: Naming boolean expressions makes the code much easier to understand for others:

if (remoteDatabaseVersionsToResume != null && remoteTransactionsToResume != null 
   && remoteDatabaseVersionsToResume.size() == remoteTransactionsToResume.size()) {
...
}

// vs.

boolean resumingPossibleDueToConsistentState = 
  remoteDatabaseVersionsToResume != null && remoteTransactionsToResume != null &&
  remoteDatabaseVersionsToResume.size() == remoteTransactionsToResume.size();

if (resumingPossible) { 
...
}

(f) I largely understand and agree with what you are doing in the UpOperation. I only looked at the diff but I understand what's happening. That's awesome! 👍 However, I feel that I need to test this and see it in Eclipse, so you'll have to wait for a second round of reviewing. 👍 😄

@binwiederhier

This comment has been minimized.

Show comment
Hide comment
@binwiederhier

binwiederhier Mar 3, 2015

Member

(g) The FilteredListener imports sun.reflect.generics.reflectiveObjects.NotImplementedException, which is probably not on purpose. Also: Why don't you just not implement isValid() in FilteredListener? It's abstract anyway, right, so why not leave it out?

(h) In util, there is an interface Listener with a method process() without any comments. There is no way anyone will ever know what this is for unless you give some context or name it to reflect what it's for.

Member

binwiederhier commented Mar 3, 2015

(g) The FilteredListener imports sun.reflect.generics.reflectiveObjects.NotImplementedException, which is probably not on purpose. Also: Why don't you just not implement isValid() in FilteredListener? It's abstract anyway, right, so why not leave it out?

(h) In util, there is an interface Listener with a method process() without any comments. There is no way anyone will ever know what this is for unless you give some context or name it to reflect what it's for.

jdonkervliet added some commits Mar 4, 2015

Removed DatabaseVersionIterator.
This class is replaced by using a stock BlockingQueue.

jdonkervliet added some commits Mar 4, 2015

Removed method that is no longer needed.
Replaced by using an end-of-stream marker.
Changed the name of several classes.
Changed the '*Listener' names to '*Consumer' names to better reflect their purpose.
@thegeman

This comment has been minimized.

Show comment
Hide comment
@thegeman

thegeman Mar 4, 2015

Contributor

Thanks for the feedback so far. As Pim said, no need to sugar-coat, we can take it 😉

I discussed with @jdonkervliet our initial thoughts, so I will include those below. In general, I will refrain from commenting on the major design decisions we made (parallelization, how transactions are built, how failures are handled, etc.) until you have had the chance to dig into the code with an impartial mindset. We would be more than happy to discuss at a later time.

(a) We agree that the Deduper should not have knowledge of transactions. Perhaps renaming it to something more generic like maxTotalSize is indeed a cleaner solution.

(b) The issue with the file indices is that we somehow need to call the deduplicate method multiple times, while starting at different points in the list of files. This could be done with e.g. an Iterator instead of a list and an index, but then we can not provide the file index to listener.onFileStart. Without knowing exactly what this event triggers, we dare not remove the notion of file indices. Any thoughts on this?

(c) The comments on the AsyncIndexer are fair. @jdonkervliet is working on addressing this at the moment.

(d) In general, most of the new classes we introduced seemed sensible at the time (because we knew what they were supposed to do), but looking back we agree that some of their names are ambiguous at best. We will propose something more descriptive soon, unless you have some suggestions for this.

(e) Fair point, we will change this and also have a look at the rest of our changes for similar issues.

(f) Glad to hear you agree (so far) with the approach we have taken. Looking forward to the second round of reviewing on this because this class in particular was tricky to get right. That is, to get it to do what we think is right 😉

(g) Fixed by @jdonkervliet.

(h) Comments are indeed lacking in some places, especially the utility classes. It was work in progress, but we decided to share our changes with you as soon as we had a "decent" version. Will be completed soon.

Contributor

thegeman commented Mar 4, 2015

Thanks for the feedback so far. As Pim said, no need to sugar-coat, we can take it 😉

I discussed with @jdonkervliet our initial thoughts, so I will include those below. In general, I will refrain from commenting on the major design decisions we made (parallelization, how transactions are built, how failures are handled, etc.) until you have had the chance to dig into the code with an impartial mindset. We would be more than happy to discuss at a later time.

(a) We agree that the Deduper should not have knowledge of transactions. Perhaps renaming it to something more generic like maxTotalSize is indeed a cleaner solution.

(b) The issue with the file indices is that we somehow need to call the deduplicate method multiple times, while starting at different points in the list of files. This could be done with e.g. an Iterator instead of a list and an index, but then we can not provide the file index to listener.onFileStart. Without knowing exactly what this event triggers, we dare not remove the notion of file indices. Any thoughts on this?

(c) The comments on the AsyncIndexer are fair. @jdonkervliet is working on addressing this at the moment.

(d) In general, most of the new classes we introduced seemed sensible at the time (because we knew what they were supposed to do), but looking back we agree that some of their names are ambiguous at best. We will propose something more descriptive soon, unless you have some suggestions for this.

(e) Fair point, we will change this and also have a look at the rest of our changes for similar issues.

(f) Glad to hear you agree (so far) with the approach we have taken. Looking forward to the second round of reviewing on this because this class in particular was tricky to get right. That is, to get it to do what we think is right 😉

(g) Fixed by @jdonkervliet.

(h) Comments are indeed lacking in some places, especially the utility classes. It was work in progress, but we decided to share our changes with you as soon as we had a "decent" version. Will be completed soon.

@binwiederhier

This comment has been minimized.

Show comment
Hide comment
@binwiederhier

binwiederhier Mar 4, 2015

Member

Quick note on (h): I'm a fan of descriptive names more than I am of comments. I you name your methods, variables and classes correctly, you need fewer comments. Now back to work.

Member

binwiederhier commented Mar 4, 2015

Quick note on (h): I'm a fan of descriptive names more than I am of comments. I you name your methods, variables and classes correctly, you need fewer comments. Now back to work.

@pimotte

This comment has been minimized.

Show comment
Hide comment
@pimotte

pimotte Mar 28, 2015

Member

Fixed some stuff from this branch, pushed this to: https://github.com/syncany/syncany/tree/feature/issue364-pim

Member

pimotte commented Mar 28, 2015

Fixed some stuff from this branch, pushed this to: https://github.com/syncany/syncany/tree/feature/issue364-pim

@pimotte

This comment has been minimized.

Show comment
Hide comment
@pimotte

pimotte Apr 10, 2015

Member

Superseded by #437

Member

pimotte commented Apr 10, 2015

Superseded by #437

@pimotte pimotte closed this Apr 10, 2015

@cr0 cr0 removed the status:in-progress label Apr 10, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment