RepositoryIterator and ReferenceIterator implementations #20
Merge #19 first.
Codecov Report
@@ Coverage Diff @@
## master #20 +/- ##
============================================
- Coverage 88.46% 86.63% -1.83%
- Complexity 11 25 +14
============================================
Files 6 10 +4
Lines 182 232 +50
Branches 17 23 +6
============================================
+ Hits 161 201 +40
- Misses 14 19 +5
- Partials 7 12 +5
  } catch {
    case _: URISyntaxException => None
  }
}).distinct.min
Is `min` used because of something specific or just to get one of the results?
min === sorted.head
yes, that's what I meant, do we need them sorted or do we just want the first?
We want the first one after sorting them.
Discussed IRL: sorted is needed, lgtm then
I must admit that it's not clear to me either why sorting is needed.
It could be a good idea to document that.
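To make the `min` question concrete, here is a minimal sketch of the pattern in the diff. The thread does not record why sorting is needed, so the motivation shown here (a pick that does not depend on the order of the remote URLs) is only one plausible reading. `RepoIdSketch`, `toId`, and `pickId` are hypothetical names, and the "host + path" id format is an assumption, not the PR's actual code.

```scala
import java.net.{URI, URISyntaxException}

object RepoIdSketch {
  // Hypothetical helper: turn a remote URL into "host/path",
  // or None when the URL does not parse or has no host.
  def toId(url: String): Option[String] =
    try {
      val uri = new URI(url)
      Option(uri.getHost).map(host => host + uri.getPath.stripSuffix(".git"))
    } catch {
      case _: URISyntaxException => None
    }

  // Pick one id deterministically: the smallest one after de-duplication.
  // On a non-empty collection, min returns the same element as sorted.head.
  def pickId(urls: Seq[String]): Option[String] = {
    val ids = urls.flatMap(toId).distinct
    if (ids.isEmpty) None else Some(ids.min)
  }
}
```

With this shape, `pickId` returns the same id for any ordering of the same URLs, whereas taking `head` without sorting would return whichever URL happened to come first.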
}

object RepositoryProvider {
  var provider: RepositoryProvider = _
If this is a singleton, what about moving everything to the `RepositoryProvider` object instead of having the class and manually managing the singleton?
Also, if you do `RepositoryProvider("foo")` and then `RepositoryProvider("bar")`, what you get is a repository provider with "foo" as localPath, which is a bit misleading.
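The behaviour described in this comment can be reproduced with a reduced sketch. Only the `var provider` field appears in the diff; the body of `apply` below is an assumption about how the singleton is managed, and the `SingletonSketch` wrapper is hypothetical.

```scala
object SingletonSketch {
  class RepositoryProvider(val localPath: String)

  object RepositoryProvider {
    // Manually managed singleton instance, as in the diff.
    var provider: RepositoryProvider = _

    // Assumed behaviour: create the instance on first call, reuse it afterwards.
    def apply(localPath: String): RepositoryProvider = {
      if (provider == null) {
        provider = new RepositoryProvider(localPath)
      }
      provider
    }
  }
}
```

Under that assumption, calling `RepositoryProvider("foo")` and then `RepositoryProvider("bar")` returns the same instance both times, so the second call silently ignores `"bar"`.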
This code is from another PR. Can you comment on it there? #19
object SivaRDDProvider {
  var provider: SivaRDDProvider = _

  def apply(sc: SparkContext): SivaRDDProvider = {
same as with RepositoryProvider
This code is from another PR. Can you comment on it there? #19
Having changes from different PRs together makes review harder.
Yes, sorry about that, but at this early stage of the project it is really difficult to split functionality and move forward without depending on other in-progress functionality.
Using a base abstract class RootedRepoIterator, we add two implementations: one to iterate repository metadata (repository id, urls, is fork) and one to iterate reference metadata (repository_id, name, hash).
With this RootedRepoIterator we should be able to implement CommitIterator and BlobIterator too.
Filter logic must be implemented before starting on BlobIterator.
- Split test logic into traits so it can be used in all the Specs.
- Added a BaseRootedRepoIterator trait with a helper to test iterators more easily.
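As a rough illustration of the design described above, the sketch below shows one base iterator with two row-producing subclasses. It is not the PR's implementation: the in-memory `Repo` type, the row case classes, and the `rows` hook are all hypothetical stand-ins, and the real classes presumably read from Git storage rather than from a `Seq`.

```scala
object IteratorSketch {
  // Hypothetical in-memory stand-in for a repository; refs are (name, hash) pairs.
  final case class Repo(id: String, urls: Seq[String], isFork: Boolean,
                        refs: Seq[(String, String)])

  final case class RepositoryRow(id: String, urls: Seq[String], isFork: Boolean)
  final case class ReferenceRow(repositoryId: String, name: String, hash: String)

  // Base class: walks the repositories and lets each subclass decide
  // which rows a single repository contributes.
  abstract class RootedRepoIterator[T](repos: Seq[Repo]) extends Iterator[T] {
    protected def rows(repo: Repo): Iterator[T]

    private lazy val underlying: Iterator[T] = repos.iterator.flatMap(rows)

    override def hasNext: Boolean = underlying.hasNext
    override def next(): T = underlying.next()
  }

  // One row per repository: id, urls, is fork.
  class RepositoryIterator(repos: Seq[Repo])
      extends RootedRepoIterator[RepositoryRow](repos) {
    protected def rows(repo: Repo): Iterator[RepositoryRow] =
      Iterator.single(RepositoryRow(repo.id, repo.urls, repo.isFork))
  }

  // One row per reference: repository id, name, hash.
  class ReferenceIterator(repos: Seq[Repo])
      extends RootedRepoIterator[ReferenceRow](repos) {
    protected def rows(repo: Repo): Iterator[ReferenceRow] =
      repo.refs.iterator.map { case (name, hash) => ReferenceRow(repo.id, name, hash) }
  }
}
```

In this shape, a CommitIterator or BlobIterator would only need to supply its own `rows` method, which is one way the base class could support the follow-up iterators mentioned above.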
402afa9 to 53ced47