[BENCHMARK] Run parameter benchmarks with different input data and update them. #207

@Irallia Irallia commented May 22, 2022

min_var_length = 30 (remains)

[Plots: min_var_length results (all), old value vs. new value, five plot pairs]

max_var_length = 10,000 (changed from 100,000)

[Plots: max_var_length results (all), old value vs. new value, five plot pairs]

max_tol_inserted_length = 50 (remains)

[Plots: max_tol_inserted_length results (all), old value vs. new value, five plot pairs]

max_tol_deleted_length = 50 (remains)

[Plots: max_tol_deleted_length results (all), old value vs. new value, five plot pairs]

max_overlap = 50 (changed from 10)

[Plots: max_overlap results (all), old value vs. new value, five plot pairs]

partition_max_distance = 50 (changed from 1,000)

[Plots: partition_max_distance results (all), old value vs. new value, five plot pairs]

hierarchical_clustering_cutoff = 0.3 (changed from 0.5)

[Plots: hierarchical_clustering_cutoff results (all), old value vs. new value, five plot pairs]

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>
[TEST] Update tests

Signed-off-by: Lydia Buntrock <lydia.buntrock@fu-berlin.de>

codecov bot commented May 22, 2022

Codecov Report

Merging #207 (c86937f) into master (fe3f74c) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #207   +/-   ##
=======================================
  Coverage   98.35%   98.35%           
=======================================
  Files          18       18           
  Lines         850      850           
=======================================
  Hits          836      836           
  Misses         14       14           

Irallia commented May 22, 2022

iGenVar only - new Plots:

[Plots, old values vs. new values: iGenVar_only-results (all); iGenVar_only-results DUP_as_INS (all)]

@Irallia Irallia self-assigned this May 22, 2022

@joergi-w joergi-w left a comment

Looks good! 😄
Two ideas for the code:

Comment on lines +150 to +152
min_qual=list(range(config["quality_ranges"]["iGenVar"]["from"],
config["quality_ranges"]["iGenVar"]["to"],
config["quality_ranges"]["iGenVar"]["step"])))

This is copied into each input – is there a way to pre-define it?
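
One way to pre-define this could be to build the quality list once near the top of the Snakefile and reference it in every input. The following is a minimal sketch in plain Python (Snakefiles accept Python at module level); the `config` values and the names `quality_list` and `IGENVAR_MIN_QUAL` are assumptions for illustration, not taken from the repository.

```python
# Hypothetical stand-in for the Snakemake config; the real values live in
# the workflow's config file.
config = {"quality_ranges": {"iGenVar": {"from": 1, "to": 10, "step": 2}}}


def quality_list(tool):
    """Return the list of min_qual values configured for the given tool."""
    bounds = config["quality_ranges"][tool]
    return list(range(bounds["from"], bounds["to"], bounds["step"]))


# Computed once; each expand(...) call could then use min_qual=IGENVAR_MIN_QUAL.
IGENVAR_MIN_QUAL = quality_list("iGenVar")
print(IGENVAR_MIN_QUAL)
```

With this in place, the three-line `list(range(...))` expression would only need to appear once per tool.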

dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"],
parameter_name="partition_max_distance"),
expand("results/parameter_benchmarks/{dataset}/plots/{parameter_name}.results.all.png",
dataset=["Illumina_Paired_End", "Illumina_Mate_Pair", "MtSinai_PacBio", "PacBio_CCS", "10X_Genomics"],

The dataset list is also always the same.

Good point, this appears in every benchmark workflow. So I think I will try to make it shorter in another PR.
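
A shared constant is one possible way to shorten this. The sketch below uses plain Python to build the same kind of plot paths that the `expand(...)` calls describe; `DATASETS`, `PLOT_PATTERN` and `plot_targets` are hypothetical names, and in the Snakefile itself the constant could simply be passed as `dataset=DATASETS`.

```python
# Dataset names as they appear in the reviewed expand(...) calls.
DATASETS = [
    "Illumina_Paired_End",
    "Illumina_Mate_Pair",
    "MtSinai_PacBio",
    "PacBio_CCS",
    "10X_Genomics",
]

# Path pattern copied from the reviewed snippet.
PLOT_PATTERN = "results/parameter_benchmarks/{dataset}/plots/{parameter_name}.results.all.png"


def plot_targets(parameter_name):
    """Build one plot path per dataset for a single benchmarked parameter."""
    return [
        PLOT_PATTERN.format(dataset=dataset, parameter_name=parameter_name)
        for dataset in DATASETS
    ]


targets = plot_targets("partition_max_distance")
print(len(targets))
print(targets[0])
```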

@joshuak94 joshuak94 left a comment

Looks good! How did you come up with the new values?

Irallia commented May 24, 2022

Looks good! How did you come up with the new values?

With exactly this PR. So far I had tested iGenVar's parameters on only one dataset; now I have added others. I then changed only one parameter at a time and observed how our results behave. After some back and forth, I ended up with the default values and plots shown here. The plots show that larger or smaller values would not give any improvement.
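
The one-parameter-at-a-time procedure described above can be sketched as follows. This is only an illustration: the default values are ones listed in this PR, while the candidate values and the function name `one_at_a_time` are assumptions, and the actual benchmark runs through the Snakemake workflow rather than a script like this.

```python
# Defaults as stated in this PR (a subset of the benchmarked parameters).
DEFAULTS = {
    "min_var_length": 30,
    "max_overlap": 50,
    "hierarchical_clustering_cutoff": 0.3,
}

# Hypothetical candidate values to try for each parameter.
CANDIDATES = {
    "min_var_length": [10, 30, 50],
    "max_overlap": [10, 50, 100],
    "hierarchical_clustering_cutoff": [0.1, 0.3, 0.5],
}


def one_at_a_time(defaults, candidates):
    """Yield (name, value, settings) with only the named parameter changed from its default."""
    for name, values in candidates.items():
        for value in values:
            settings = dict(defaults)  # copy, so the defaults stay untouched
            settings[name] = value
            yield name, value, settings


runs = list(one_at_a_time(DEFAULTS, CANDIDATES))
for name, value, settings in runs:
    print(name, value, settings)
```

Each generated setting differs from the defaults in at most one parameter, so a change in the results can be attributed to that parameter alone.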

@Irallia Irallia merged commit 71ad200 into seqan:master May 24, 2022