Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telemetry: Record DB creation time #1210

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

edoardopirovano
Copy link
Contributor

Now that we've established that TRAP caching seems to be working as expected, I'd like to move on to measuring what performance gains it's giving us. In order to do that, we actually need some telemetry covering the extraction time, which we do not have at the moment. For scanned languages, this happens as part of the DB creation (alongside finalization). We could try and record a more granular time for just the extraction time of individual languages, and separately recording the time for TRAP import. However, I'm hesitant to add more language-specific fields to our telemetry so I think for now just recording the total DB creation time will be enough and is certainly an improvement on what we have now (nothing).

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@edoardopirovano edoardopirovano requested a review from a team as a code owner August 24, 2022 11:05
Copy link
Contributor

@henrymercer henrymercer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly sensible, but I wonder whether the term "database creation time" could be misleading, as it's not clear to me whether it includes tracing or not. "Extraction time" is another option, however this also seems unclear. If it's sufficient to just look at scanned languages, we could go for something like "scanned_languages_database_creation_time"?

@edoardopirovano
Copy link
Contributor Author

edoardopirovano commented Aug 24, 2022

This looks mostly sensible, but I wonder whether the term "database creation time" could be misleading, as it's not clear to me whether it includes tracing or not. "Extraction time" is another option, however this also seems unclear. If it's sufficient to just look at scanned languages, we could go for something like "scanned_languages_database_creation_time"?

Hmm, indeed this is why opened this for review before doing the backend changes so we can decide on precisely what we want to measure and how to call it. As it's set up at the moment, this records (extraction of scanned languages + TRAP import time for all languages). I'd be keen to not add extaction_time_$lang fields for each scanned language since then we have to add DB columns whenever we add a language. We can't easily separate out the TRAP import times by language either because it's all done by one CLI command on the cluster. We could separate out scanned_languages_extraction_time from trap_import_time if we wanted to, by plumbing this down a bit deeper.

Edit: Hmm, part of that isn't quite true. We actually don't call finalize on the whole cluster, so we could separate out the TRAP import of individual languages if we wanted to. I still think it's a bad idea for the same reasons that separating out individual extractions of scanned languages is a bad idea.

@edoardopirovano
Copy link
Contributor Author

@henrymercer How about the slightly finer-grained version I've just pushed that splits up extraction of scanned languages from TRAP import? That's as fine-grained as we can go without adding per-language fields.

src/analyze.ts Fixed Show resolved Hide resolved
@henrymercer
Copy link
Contributor

That looks reasonable to me. We already need to add DB columns whenever we add a language, so we wouldn't be introducing additional friction, however I can appreciate the desire to avoid adding too much to what is already a long list of columns.

src/analyze.ts Show resolved Hide resolved
src/analyze.ts Outdated Show resolved Hide resolved
@edoardopirovano edoardopirovano merged commit 92c650b into main Aug 24, 2022
@edoardopirovano edoardopirovano deleted the edoardo/record-db-creation-time branch August 24, 2022 14:14
@github-actions github-actions bot mentioned this pull request Aug 25, 2022
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants