
Add SubSampler dataset #66

Merged
51 commits merged into pyronear:master on Sep 18, 2020

Conversation

MateoLostanlen (Member)

This PR introduces my wildfireSubSampler dataset. The goal of this dataset is to combine K frames from the same sequence. It makes it possible to train a modified version of ResNet18, the ssResNet18. This model passes the K frames through the convolutional part of the network independently, then assembles them in the fully connected part to predict whether there is fire or not. The idea is to add a kind of temporal information to improve the predictions.
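
A minimal PyTorch sketch of the idea described above (a hypothetical illustration, not the actual ssResNet18 code from this PR): a shared ResNet-18 backbone processes each of the K frames independently, and the pooled features are concatenated before a fully connected head predicts fire / no fire.

import torch
import torch.nn as nn
from torchvision.models import resnet18


class KFrameResNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, k_frames: int = 2):
        super().__init__()
        backbone = resnet18(pretrained=False)
        # Keep everything up to (and including) global average pooling, i.e. drop the final fc layer.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.k_frames = k_frames
        # The K pooled feature vectors (512 each) are concatenated and classified jointly.
        self.head = nn.Linear(512 * k_frames, 1)

    def forward(self, x):
        # x has shape (batch, K, 3, H, W): run the shared backbone on each frame independently.
        feats = [self.features(x[:, i]).flatten(1) for i in range(self.k_frames)]
        return self.head(torch.cat(feats, dim=1))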

@MateoLostanlen MateoLostanlen requested a review from frgfm May 17, 2020 15:52
@MateoLostanlen MateoLostanlen added the type: enhancement New feature or request label May 17, 2020
@MateoLostanlen MateoLostanlen added this to In progress in PyroNear kanban via automation May 17, 2020
@MateoLostanlen MateoLostanlen added module: datasets Related to datasets module: models Related to models labels May 17, 2020
frgfm (Member) commented May 19, 2020

I'll wait for #63 to be merged to ensure the tests are running correctly! If I may, for the next PRs, try to make sure that they are not inter-dependent; that will make reviewing more straightforward 👍

MateoLostanlen (Member, Author)

OK, sorry about that!

codecov bot commented May 19, 2020

Codecov Report

Merging #66 into master will increase coverage by 1.04%.
The diff coverage is 95.45%.


@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
+ Coverage   85.21%   86.25%   +1.04%     
==========================================
  Files          20       21       +1     
  Lines         771      866      +95     
==========================================
+ Hits          657      747      +90     
- Misses        114      119       +5     
Flag Coverage Δ
#unittests 86.25% <95.45%> (+1.04%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pyronear/datasets/wildfire/wildfire.py 97.22% <95.34%> (-1.14%) ⬇️
pyronear/datasets/wildfire/__init__.py 100.00% <100.00%> (ø)
pyronear/utils/collect_env.py 70.66% <0.00%> (-1.34%) ⬇️
pyronear/datasets/openfire.py 89.18% <0.00%> (-0.29%) ⬇️
pyronear/nn/__init__.py 100.00% <0.00%> (ø)
pyronear/models/utils.py 98.18% <0.00%> (ø)
pyronear/models/resnet.py 82.60% <0.00%> (ø)
pyronear/nn/functional.py 100.00% <0.00%> (ø)
pyronear/datasets/utils.py 91.66% <0.00%> (ø)
pyronear/utils/__init__.py 100.00% <0.00%> (ø)
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 61e6410...b86ca35

frgfm (Member) left a comment

Thanks again! Would you mind opening a different PR for the model?

Outdated review threads (now resolved) on: pyronear/datasets/wildfire/split_strategy.py, pyronear/datasets/wildfire/wildfire.py, test/test_datasets_wildfire_split.py
PyroNear kanban automation moved this from In progress to Review in progress May 19, 2020
@MateoLostanlen MateoLostanlen removed the module: models Related to models label May 22, 2020
MateoLostanlen (Member, Author)

@x0s I have made the changes; let me know if it is OK for you.

@MateoLostanlen MateoLostanlen requested a review from x0s June 30, 2020 08:54
x0s previously requested changes Jun 30, 2020

x0s (Contributor) left a comment

Hi, thanks for adapting your code.

You seem to use a class where only a function is needed. There is actually no benefit in doing so, because it encourages implicit state. As you know from the Zen of Python:

Explicit is better than implicit.
Simple is better than complex.

I don't have time to review the algorithm (the subsampling part), but it would be great to have more comments explaining how it works.

Also, we could use a lighter fixture for the checked labels or metadata. It would be good to make it consistent with the already existing fixtures to avoid confusion.

Deterministic behaviour is really expected: we need to be able to reproduce our results so they are stable over time.

Also, please make the whole PR compliant with the PEP 8 naming conventions, as you said you would:

PS: OK for the PEP 8, I will change that

Using this package may help.
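
As a purely illustrative sketch of the function-over-class suggestion (the column name 'seq' and the signature are assumptions, not code from this PR), the subsampling can be expressed as a pure function of the metadata with no hidden state:

import pandas as pd


def subsample_sequences(metadata: pd.DataFrame, frame_per_seq: int, seed: int = 42) -> pd.DataFrame:
    # Keep `frame_per_seq` randomly chosen frames per sequence, reproducibly for a fixed seed.
    return (metadata
            .groupby('seq', group_keys=False)  # 'seq' is assumed to identify the video sequence
            .apply(lambda g: g.sample(n=min(frame_per_seq, len(g)), random_state=seed)))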

@@ -5,6 +5,8 @@
import numpy as np
import pandas as pd
import torch
from random import SystemRandom
x0s (Contributor):

Why are you using SystemRandom? It generates unreproducible sequences that also change depending on the system.

MateoLostanlen (Author):

When I used random I got this error in the Codacy/PR Quality Review: "Standard pseudo-random generators are not suitable for security/cryptographic purposes." So I switched to SystemRandom.

frgfm (Member):

I'm not familiar with SystemRandom, but if we only need a scalar random value, you can easily discard the Codacy warning. If there is indeed a lack of availability on some systems, as mentioned in the earlier reference, it might be safer to stick with a bare random.random().

Alternatively, this file already has a numpy dependency, and numpy.random has broad support!
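
A minimal sketch of the seeded numpy alternative, assuming only a few reproducible draws are needed:

import numpy as np

rng = np.random.default_rng(seed=42)            # deterministic for a fixed seed, portable across systems
p = rng.random()                                # scalar in [0, 1)
picks = rng.choice(10, size=2, replace=False)   # e.g. pick 2 frame indices out of 10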

frame_per_seq: int
frame per sequence to take

frame_per_seq: float
x0s (Contributor):

typo

MateoLostanlen (Author):

What is the issue here?

frgfm (Member):

I think he was referring to the duplicated parameter description of frame_per_seq!

MateoLostanlen (Author):

Thanks, it's fixed

frgfm (Member):

Not sure if we were talking about the same thing, but there are two mentions of frame_per_seq, and there is no mention of the seed argument in the constructor docstring. Mind fixing that up? :)
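
For reference, a hedged sketch of what the deduplicated Parameters section could look like (the wording is an assumption, not the merged docstring):

Parameters
----------
metadata: pandas.DataFrame
    Metadata describing the frames of each sequence
frame_per_seq: int
    Number of frames to keep per sequence
seed: int
    Seed of the random generator, fixed for reproducibility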

Example
-------
subsampler = WildFireSubSampler(dataset.metadata, 2)
dataset.metadata = subsampler.computeSubSet()
x0s (Contributor):

The metadata attribute is not expected to be modified this way.
It is much better to initialize a Dataset with the expected metadata instead of modifying the metadata attribute afterwards.
Remember that at init, a Dataset is bound to a metadata file and a path leading to the frames described in that metadata file.
Dodging init may create divergence and unexpected behaviour.

MateoLostanlen (Author):

OK, thanks; this is not here anymore in the latest update.
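
A minimal sketch of the initialization pattern recommended above (file names and the WildFireDataset constructor arguments are assumptions, not code from this PR):

import pandas as pd
from pyronear.datasets.wildfire import WildFireDataset, WildFireSubSampler

metadata = pd.read_csv('wildfire_dataset.csv')   # hypothetical metadata file
subsampled_metadata = WildFireSubSampler(metadata, 2).computeSubSet()

# Build the dataset from the subsampled metadata up front, instead of mutating
# dataset.metadata afterwards, so frames and metadata stay consistent from __init__.
dataset = WildFireDataset(metadata=subsampled_metadata, path_to_frames='frames/')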

self.metadata = metadata
self.metadata.index = np.arange(len(self.metadata))
self.imgs = self.metadata['imgFile']
self.SubSetImgs = []
x0s (Contributor):

Please follow PEP8

MateoLostanlen (Author):

You are saying that because the "s" in SubSetImgs should be lowercase, right?


self.assertEqual(len(wildfire), 1999)

subsampler = WildFireSubSampler(wildfire.metadata, 2)
x0s (Contributor):

The Subsampler takes a metadata file as input and outputs another metadata file.

Why not simply init a WildFireDataset with the new subsampled metadata file?

MateoLostanlen (Author):

You are right, I have changed that!

@@ -191,3 +193,97 @@ def set_splits(self, dataframes):
# Determine estimated(posterior) parameters
self.n_samples_ = {set_: len(self.splits[set_]) for set_ in ['train', 'val', 'test']}
self.ratios_ = {set_: (self.n_samples_[set_] / len(self.wildfire)) for set_ in ['train', 'val', 'test']}


class WildFireSubSampler:
x0s (Contributor):

Actually this subsampler is not bound to the WildFire Dataset and can be used with any wildfire video dataset annotated the same way. To help users understand what is done here, the class name could better reflect this possibility.

MateoLostanlen (Author):

Done


Parameters
---------
metadata: Pandas.DataFrame
x0s (Contributor):

It would be great to share the same interface as WildFireDataset.

MateoLostanlen (Author):

Not relevant anymore with the latest update.

frgfm (Member):

Since the files changed, is this still an issue?

meta2 = subsampler.computeSubSet()

self.assertEqual(len(meta2), 400)
self.assertFalse(wildfire.metadata['imgFile'].values.tolist() == meta2['imgFile'].values.tolist())
x0s (Contributor):

We definitely need reproducibility when dealing with random numbers. If it is not ensured, we cannot guarantee any result or stability.
If you don't want to be bound to a particular random generator, seeds can be hyperoptimized while keeping the predictor deterministic.

MateoLostanlen (Author):

Done !
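
A hedged sketch of the kind of reproducibility check being discussed, assuming the subsampler exposes the seed argument mentioned above and reusing the wildfire dataset from this test:

sub_a = WildFireSubSampler(wildfire.metadata, 2, seed=42).computeSubSet()
sub_b = WildFireSubSampler(wildfire.metadata, 2, seed=42).computeSubSet()
# Same seed, same subset: the result is stable across runs.
self.assertEqual(sub_a['imgFile'].tolist(), sub_b['imgFile'].tolist())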

from pyronear.datasets.wildfire import WildFireDataset, WildFireSubSampler


class WildFireDatasetSubSampler(unittest.TestCase):
x0s (Contributor):

Please append Tester to this class name so it doesn't collide with others.

MateoLostanlen (Author):

I'm sorry but I don't understand your comment :/

frgfm (Member):

I think he was suggesting appending Tester at the end of this class's name :) (so WildFireDatasetSubSamplerTester, or something like WildFireSubSamplerTester if you want a shorter name)

frgfm (Member):

Would you mind changing the class name so that we can close this thread? 🙏

self.SubSetImgs = []
self.probTh = probTh
self.frame_per_seq = frame_per_seq
# Define sequences numbers
x0s (Contributor):

Why are you re-extracting the fBase from the imgFile? Shouldn't it already be present, since the previous step was frame extraction?

MateoLostanlen (Author):

Yes, you are right, sorry about that.

frgfm (Member) left a comment

Apart from the docstring to fix, the PR looks alright, almost there! I still have concerns about adding new .csv content files each time we want to add a new feature and its unit tests, though.

frame_per_seq: int
frame per sequence to take

frame_per_seq: float
frgfm (Member):

Not sure if we were talking about the same thing, but there are two mentions of frame_per_seq, and there is no mention of the seed argument in the constructor docstring. Mind fixing that up? :)

from pyronear.datasets.wildfire import WildFireDataset, WildFireSubSampler


class WildFireDatasetSubSampler(unittest.TestCase):
frgfm (Member):

Would you mind changing the class name so that we can close this thread? 🙏

MateoLostanlen (Member, Author)

> Apart from the docstring to fix, the PR looks alright, almost there! I still have concerns about adding new .csv content files each time we want to add a new feature and its unit tests, though.

We do not need a new csv for each feature, only a csv complete enough to test all cases. I suggest opening an issue on this subject and creating a PR soon to solve it.

I just made a commit to change the name of the test class and to remove the error in the doc. Can we validate the PR?

frgfm (Member) commented Sep 17, 2020

> Apart from the docstring to fix, the PR looks alright, almost there! I still have concerns about adding new .csv content files each time we want to add a new feature and its unit tests, though.
>
> We do not need a new csv for each feature, only a csv complete enough to test all cases. I suggest opening an issue on this subject and creating a PR soon to solve it.
>
> I just made a commit to change the name of the test class and to remove the error in the doc. Can we validate the PR?

Just missing the seed argument in the docstring 🙏
Well, it's far from being one .csv for all features. In this PR, for instance, we introduce a 2,000-line content file for testing. Let's discuss this in an issue as you mentioned; it will be easier then!

frgfm previously approved these changes Sep 17, 2020
frgfm (Member) left a comment

Looks good to me! I'll open an issue later for fixture & content files addition in PR, thanks!

frgfm (Member) left a comment

Looks good to me!

PyroNear kanban automation moved this from Review in progress to Reviewer approved Sep 18, 2020
@MateoLostanlen MateoLostanlen merged commit 8116724 into pyronear:master Sep 18, 2020
PyroNear kanban automation moved this from Reviewer approved to Done Sep 18, 2020
@MateoLostanlen MateoLostanlen deleted the addSubSamplerDataSet branch January 16, 2021 09:48
Labels
module: datasets (Related to datasets), type: enhancement (New feature or request)

3 participants