Don't concatenate data for `region_df` / `field_df` before necessary #72

hoxbro · 2023-11-23T17:18:34Z

Save intermediate data for `field_df` and `region_df` in a list. The list is concatenated into the respective dataframe only when that dataframe is accessed. The before-and-after timings below come from running the following script:

This branch: Define annotations (size=5,000): 0.067 seconds
Main branch: Define annotations (size=5,000): 6.149 seconds

from contextlib import contextmanager
from time import perf_counter

import numpy as np
import pandas as pd

from holonote.annotate import Annotator, SQLiteDB


@contextmanager
def catchtime(message):
    start = perf_counter()
    yield
    print(f"{message}: {perf_counter() - start:.3f} seconds")


annotator = Annotator({"TIME": int}, connector=SQLiteDB(table_name="slowness"))


N = 5_000
rng = np.random.default_rng(1337)
start_time = np.arange(0, N * 2, 2)
end_time = start_time + 1
data = pd.DataFrame(
    {
        "start_time": start_time,
        "end_time": end_time,
        "description": rng.choice(["A", "B"], N),
    }
)

with catchtime(f"Define annotations (size={N:,})"):
    annotator.define_annotations(data, TIME=("start_time", "end_time"))
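
The idea described above (buffer new rows in a list, concatenate only on access) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `LazyFrame`, its `add` method, and its `df` property are hypothetical names, and only standard pandas calls are used.

```python
import pandas as pd


class LazyFrame:
    """Hypothetical buffer: collects rows in a list, concatenates on demand."""

    def __init__(self, columns):
        self._columns = list(columns)
        self._df = pd.DataFrame(columns=self._columns)
        self._pending = []  # (index, row) pairs not yet merged into self._df

    def add(self, index, **row):
        # Cheap list append; no DataFrame is built or copied here.
        self._pending.append((index, row))

    @property
    def df(self):
        # Merge all pending rows in one step, only when the frame is requested.
        if self._pending:
            idx = [i for i, _ in self._pending]
            rows = [r for _, r in self._pending]
            new = pd.DataFrame(rows, index=idx, columns=self._columns)
            self._df = new if self._df.empty else pd.concat([self._df, new])
            self._pending.clear()
        return self._df


buf = LazyFrame(["start", "end"])
for i in range(3):
    buf.add(f"id{i}", start=2 * i, end=2 * i + 1)
result = buf.df  # the single concatenation happens here
```

Concatenating once per access rather than once per added row avoids copying the growing dataframe on every insertion, which is the likely source of the slowdown measured on the main branch.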

hoxbro · 2023-11-23T17:19:28Z

holonote/annotate/annotator.py

@@ -149,9 +149,7 @@ def clear_regions(self):
    def _add_annotation(self, **fields):
        # Primary key specification is optional
        if self.connector.primary_key.field_name not in fields:
-            index_val = self.connector.primary_key(
-                self.connector, list(self.annotation_table._field_df.index)


I don't think this is needed anymore but I'm not completely sure.

hoxbro · 2023-11-23T17:21:09Z

holonote/annotate/table.py

@@ -20,8 +20,6 @@ class AnnotationTable(param.Parameterized):

    columns = ("region", "dim", "value", "_id")

-    index = param.List(default=[])


This class is not used as a parameterized class. Dropping the `index` parameter makes it possible to replace the `_update_index` method with a property that looks up the index.
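
A rough sketch of what "a property that looks up the index" could look like is given below. The class `Table` and its contents are hypothetical stand-ins, not the actual `AnnotationTable` implementation; the only point is that a derived property needs no separate update step.

```python
import pandas as pd


class Table:
    """Hypothetical plain class (no param.Parameterized base)."""

    def __init__(self):
        self._field_df = pd.DataFrame(
            {"description": ["A", "B"]}, index=["id0", "id1"]
        )

    @property
    def index(self):
        # Derived from the dataframe on each access, so there is no
        # stored copy that could fall out of sync with the data.
        return list(self._field_df.index)


table = Table()
table._field_df.loc["id2"] = ["A"]  # add a row; no manual index update needed
current_index = table.index
```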

codspeed-hq · 2023-11-24T09:50:34Z

CodSpeed Performance Report

Merging #72 will improve performance by ×11

Comparing `dont_concat_before_nessesary` (94410c6) with `main` (a8562ed)

Summary

⚡ 3 improvements

Benchmarks breakdown

| | Benchmark | `main` | `dont_concat_before_nessesary` | Change |
|---|---|---|---|---|
| ⚡ | `test_define_annotations[1000]` | 14,238.4 ms | 405.1 ms | ×35 |
| ⚡ | `test_define_annotations[100]` | 873.8 ms | 43.1 ms | ×20 |
| ⚡ | `test_define_annotations[10]` | 86.4 ms | 7.6 ms | ×11 |

hoxbro added 7 commits November 23, 2023 17:36

Don't concat region_df with new data before nessesary

5a4b15a

Don't concat field_df with new data before nessesary

b2044ab

Remove _update_index

eef4f05

Ensure new fields are merged in update_annotations

0a71293

Update other places

62549c4

Reduce calls to field_df before nessesary

e34a56d

Have AnnotationTable be a normal class

0c4ad3a

hoxbro commented Nov 23, 2023

jbednar changed the title ~~Don't concatenate data for region_df / field_df before nessesary~~ Don't concatenate data for region_df / field_df before necessary Nov 23, 2023

hoxbro mentioned this pull request Nov 24, 2023

Benchmark #73

Merged

hoxbro added 2 commits November 24, 2023 10:35

Merge branch 'main' into dont_concat_before_nessesary

2f3312e

More clean up

6f9edd5

Don't create a dataframe for each commit in add_annotation

b7a3696

jlstevens reviewed Nov 24, 2023

holonote/annotate/table.py (outdated, resolved)

hoxbro added 2 commits November 24, 2023 11:49

Make {region, field}_df private again

d3afe56

Remove code

4b42679

jlstevens reviewed Nov 24, 2023

holonote/annotate/table.py (resolved)

Reset when reverting to snapshot

94410c6

hoxbro merged commit 61e7811 into main Nov 27, 2023
17 checks passed

hoxbro deleted the dont_concat_before_nessesary branch November 27, 2023 08:59
