-
-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SubSsampler dataset #66
Add SubSsampler dataset #66
Conversation
I'll wait up on #63 to be merged to ensure the tests are running correctly! If I may, for next PRs, try to make sure that they are not inter-dependent, that will make things more straightforward for reviewing 👍 |
Ok, sorry for that ! |
Codecov Report
@@ Coverage Diff @@
## master #66 +/- ##
==========================================
+ Coverage 85.21% 86.25% +1.04%
==========================================
Files 20 21 +1
Lines 771 866 +95
==========================================
+ Hits 657 747 +90
- Misses 114 119 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! Would you mind opening a different PR for the model?
@x0s I have made the changes, tell me it's ok for you ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Thanks for adapting your code,
You seem to use a class where only a function is needed. There're actually not benefit doing so, because it encourage implicit states. As you know from the zen of python:
Explicit is better than implicit.
Simple is better than complex.
I don't have time to review the algorithm(subsampling part), but it could be great to have more comments to understand how you make it.
Also, we may use a lighter fixture for the checkedlabels
or metadata
. It could be good to make it consistent with already existing fixtures to avoid confusion.
Deterministic behaviour is really expected, we need to be able to reproduce our result so they're stable across time
Also please make the whole PR compliant to PEP8 naming convention as you said it to do so:
PS : Ok for the PEP8, I will change that
Using this package may help.
@@ -5,6 +5,8 @@ | |||
import numpy as np | |||
import pandas as pd | |||
import torch | |||
from random import SystemRandom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using SystemRandom
? it generates unreproducible sequences that also change depending on the system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I use random I got this error
Standard pseudo-random generators are not suitable for security/cryptographic purposes.
in Codacy/PR Quality Review so I switch to SystemRandom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with SystemRandom
but if we only need a scalar random value, you can easily discard Codacy warning. If there is indeed a lack of availability on some systems as mentioned in the earlier reference, it might be safer to stick with a bare random.random()
.
Alternatively, this file has a numpy dependency which has a large support of numpy.random
!
frame_per_seq: int | ||
frame per sequence to take | ||
|
||
frame_per_seq: float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the issue here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think he was referring to the double parameter's description of frame_per_seq
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we were talking about the same thing but there are two mentions of the frame_per_seq
, and there is no mention of the seed
argument in the constructor docstring. Mind fixing that up? :)
Example | ||
------- | ||
subsampler = WildFireSubSampler(dataset.metadata, 2) | ||
dataset.metadata = subsampler.computeSubSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata
attribute is not expected to be modified this way.
It is way better to initialize a Dataset
with the expected metadata instead of modifying the metadata attribute.
Remember that at init, a Dataset is bound to a metadata
file and a path leading to the frames
described in the metadata file.
Dodging init may create divergence and unexpected behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks, not here anymore in last update
self.metadata = metadata | ||
self.metadata.index = np.arange(len(self.metadata)) | ||
self.imgs = self.metadata['imgFile'] | ||
self.SubSetImgs = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow PEP8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are saying that because the s of SubSetImgs should be lower case right ?
|
||
self.assertEqual(len(wildfire), 1999) | ||
|
||
subsampler = WildFireSubSampler(wildfire.metadata, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Subsampler input is a metadata file and outputs another metadata file.
Why not simply init a WildFireDataset with the new subsampled metadata file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I have change that !
@@ -191,3 +193,97 @@ def set_splits(self, dataframes): | |||
# Determine estimated(posterior) parameters | |||
self.n_samples_ = {set_: len(self.splits[set_]) for set_ in ['train', 'val', 'test']} | |||
self.ratios_ = {set_: (self.n_samples_[set_] / len(self.wildfire)) for set_ in ['train', 'val', 'test']} | |||
|
|||
|
|||
class WildFireSubSampler: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this subsampler is not bound to WildFire Dataset and can be used with any wildfire video dataset annotated the same way. To help user understand what's made here, The class name can reflect better this possibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
Parameters | ||
--------- | ||
metadata: Pandas.DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be great to share same interface with WildFireDataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevent anymore with last update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the files changed, is this still an issue?
meta2 = subsampler.computeSubSet() | ||
|
||
self.assertEqual(len(meta2), 400) | ||
self.assertFalse(wildfire.metadata['imgFile'].values.tolist() == meta2['imgFile'].values.tolist()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need reproducibility when dealing with random numbers. If not ensured, we cannot guarantee any result nor stability.
If you don't want to be bound to a random generator, seeds can be hyperoptimized, while maintaining the predictor deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done !
from pyronear.datasets.wildfire import WildFireDataset, WildFireSubSampler | ||
|
||
|
||
class WildFireDatasetSubSampler(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please append with Tester
so this class name doesn't collide with others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but I don't understand your comment :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think he was suggesting to append Tester
at the end of this class's name :) (so WildFireDatasetSubSamplerTester
or something like WildFireSubSamplerTester
is you want a shorter name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind changing the class name so that we can close this thread? 🙏
self.SubSetImgs = [] | ||
self.probTh = probTh | ||
self.frame_per_seq = frame_per_seq | ||
# Define sequences numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you re-extracting the fBase from the imgFile ? Shouldn't it be already present since previous step was Frame extraction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right sorry for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the docstring to fix, the PR looks alright, almost there! I'm still having concerns about adding new .csv
content files each time we want to add a new feature and its unittests though.
frame_per_seq: int | ||
frame per sequence to take | ||
|
||
frame_per_seq: float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we were talking about the same thing but there are two mentions of the frame_per_seq
, and there is no mention of the seed
argument in the constructor docstring. Mind fixing that up? :)
from pyronear.datasets.wildfire import WildFireDataset, WildFireSubSampler | ||
|
||
|
||
class WildFireDatasetSubSampler(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind changing the class name so that we can close this thread? 🙏
We do not need a new csv for each feture only a csv complete enough to test all cases. I suggest to open an issue on this subject and to create a PR soon to solve it. I just made a commit to change the name of the test class and to remove the error in the doc. Can we validate the PR? |
Just missing the |
d4a7cdc
to
363346a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I'll open an issue later for fixture & content files addition in PR, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Cette PR introduit mon wildfireSubSampler DataSet, le but de ce dataset est de combiner K frame d’une même séquence. Ce Dataset permet d’entrainer une version modifiée du ResNet18, le ssResNet18. Ce model passe les K frame dans la partie convolution du model indépendamment puis les assemble dans la partie fully connected pour prédire s’il y a du feu ou non. L’idée et d’ajouter une sorte d’information temporelle pour améliorer les prédictions