Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't concatenate data for region_df / field_df before necessary #72

Merged
merged 13 commits into from
Nov 27, 2023

Conversation

hoxbro
Copy link
Member

@hoxbro hoxbro commented Nov 23, 2023

Save intermediate data for field_df and region_df in a list. This list is then concatenated into the respective dataframe when called. An example of code before and after can be run with this code:

This branch: Define annotations (size=5,000): 0.067 seconds
Main branch: Define annotations (size=5,000): 6.149 seconds

from contextlib import contextmanager
from time import perf_counter

import numpy as np
import pandas as pd

from holonote.annotate import Annotator, SQLiteDB


@contextmanager
def catchtime(message):
    start = perf_counter()
    yield
    print(f"{message}: {perf_counter() - start:.3f} seconds")


annotator = Annotator({"TIME": int}, connector=SQLiteDB(table_name="slowness"))


N = 5_000
rng = np.random.default_rng(1337)
start_time = np.arange(0, N * 2, 2)
end_time = start_time + 1
data = pd.DataFrame(
    {
        "start_time": start_time,
        "end_time": end_time,
        "description": rng.choice(["A", "B"], N),
    }
)

with catchtime(f"Define annotations (size={N:,})"):
    annotator.define_annotations(data, TIME=("start_time", "end_time"))

@@ -149,9 +149,7 @@ def clear_regions(self):
def _add_annotation(self, **fields):
# Primary key specification is optional
if self.connector.primary_key.field_name not in fields:
index_val = self.connector.primary_key(
self.connector, list(self.annotation_table._field_df.index)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed anymore but I'm not completely sure.

@@ -20,8 +20,6 @@ class AnnotationTable(param.Parameterized):

columns = ("region", "dim", "value", "_id")

index = param.List(default=[])
Copy link
Member Author

@hoxbro hoxbro Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is not used as a parameterized class and makes it possible to remove the method _update_index for a property that looks up the index.

@jbednar jbednar changed the title Don't concatenate data for region_df / field_df before nessesary Don't concatenate data for region_df / field_df before necessary Nov 23, 2023
@hoxbro hoxbro mentioned this pull request Nov 24, 2023
Copy link

codspeed-hq bot commented Nov 24, 2023

CodSpeed Performance Report

Merging #72 will improve performances by ×11

Comparing dont_concat_before_nessesary (94410c6) with main (a8562ed)

Summary

⚡ 3 improvements

Benchmarks breakdown

Benchmark main dont_concat_before_nessesary Change
test_define_annotations[1000] 14,238.4 ms 405.1 ms ×35
test_define_annotations[100] 873.8 ms 43.1 ms ×20
test_define_annotations[10] 86.4 ms 7.6 ms ×11

@hoxbro hoxbro merged commit 61e7811 into main Nov 27, 2023
17 checks passed
@hoxbro hoxbro deleted the dont_concat_before_nessesary branch November 27, 2023 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants