
feat: use object.groupBy as much as possible#28

Merged
ryoppippi merged 2 commits into main from refactor on Jun 7, 2025
Conversation

@ryoppippi
Owner

No description provided.

@ryoppippi ryoppippi requested a review from Copilot June 7, 2025 20:48
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Hello @ryoppippi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR focuses on refactoring the data loading and aggregation logic within src/data-loader.ts. The primary goal, as indicated by the title, is to leverage the Object.groupBy method where applicable to simplify and potentially improve the readability of the code that groups data entries before aggregating them.

Highlights

  • Refactor Data Aggregation: The core change involves replacing the manual iteration and aggregation using Map objects with a pattern that first collects all relevant data entries into an array, then uses Object.groupBy to group them by a specific key (date, session key, or month), and finally uses reduce on each group to calculate the aggregated totals.
  • Apply Object.groupBy to Daily Data: The loadDailyUsageData function has been updated to group individual usage entries by date using Object.groupBy before summing up tokens and cost for each day.
  • Apply Object.groupBy to Session Data: The loadSessionData function now uses Object.groupBy to group usage entries by a session key (projectPath/sessionId). The aggregation logic within the reduce step also handles finding the latest activity timestamp and collecting unique versions for each session.
  • Apply Object.groupBy to Monthly Data: The loadMonthlyUsageData function, which previously aggregated daily data from loadDailyUsageData into months using a Map, now uses Object.groupBy to group the daily usage objects by month.
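In outline, the collect → group → reduce pattern described in these highlights looks like the sketch below. This is a minimal standalone illustration, not the PR's actual code: `Entry` and its fields are simplified stand-ins for the types in src/data-loader.ts, and a local `groupBy` helper mirrors the semantics of ES2024's `Object.groupBy` (natively available on Node 21+), including its `Partial` return type.

```typescript
type Entry = { date: string; inputTokens: number; costCents: number };

// Local stand-in with the same semantics as ES2024 Object.groupBy:
// keys that never occur map to undefined, hence the Partial type.
function groupBy<T>(
	items: readonly T[],
	key: (item: T) => string,
): Partial<Record<string, T[]>> {
	const groups: Partial<Record<string, T[]>> = {};
	for (const item of items) {
		(groups[key(item)] ??= []).push(item);
	}
	return groups;
}

const entries: Entry[] = [
	{ date: "2025-06-07", inputTokens: 10, costCents: 1 },
	{ date: "2025-06-07", inputTokens: 5, costCents: 2 },
	{ date: "2025-06-08", inputTokens: 7, costCents: 3 },
];

// Collect → group → reduce, as in the refactored loaders.
// `group ?? []` handles the possibly-undefined group arrays,
// mirroring the null checks in the PR.
const daily = Object.entries(groupBy(entries, (e) => e.date)).map(
	([date, group]) =>
		(group ?? []).reduce(
			(acc, e) => ({
				date,
				inputTokens: acc.inputTokens + e.inputTokens,
				totalCostCents: acc.totalCostCents + e.costCents,
			}),
			{ date, inputTokens: 0, totalCostCents: 0 },
		),
);
```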

Changelog

  • src/data-loader.ts
    • Refactored loadDailyUsageData to use Object.groupBy for date-based aggregation.
    • Refactored loadSessionData to use Object.groupBy for session-based aggregation, including updates to how lastActivity and versions are determined.
    • Refactored loadMonthlyUsageData to use Object.groupBy for month-based aggregation of daily data.
    • Adjusted filtering logic in loadDailyUsageData and loadSessionData to occur after the main aggregation step.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@pkg-pr-new

pkg-pr-new bot commented Jun 7, 2025

Open in StackBlitz

npm i https://pkg.pr.new/ryoppippi/ccusage@28

commit: 2d67512

@ryoppippi ryoppippi merged commit 5fbaede into main Jun 7, 2025
9 checks passed
@ryoppippi ryoppippi deleted the refactor branch June 7, 2025 20:49
Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the daily, session, and monthly data loaders to use Object.groupBy for grouping, replacing manual Map usage and reducing boilerplate aggregation logic.

  • Replaced manual Map builds with Object.groupBy in loadDailyUsageData, loadSessionData, and loadMonthlyUsageData
  • Aggregations now use Array.prototype.reduce on grouped entries
  • Date-range filtering and sorting retain existing semantics but operate on grouped results
Comments suppressed due to low confidence (1)

src/data-loader.ts:150

  • [nitpick] The variable name allEntries is generic and reused across multiple functions. Consider renaming it to dailyEntries to improve clarity and maintainability.
const allEntries: { data: UsageData; date: string; cost: number }[] = [];

Comment on lines +183 to +188

    .map(([date, entries]) => {
      if (entries == null) {
        return undefined;
      }

      return entries.reduce(

Copilot AI Jun 7, 2025


The null check for entries is redundant because Object.groupBy always returns an array for each key. You can remove this branch to simplify the code.

Suggested change

    - .map(([date, entries]) => {
    -   if (entries == null) {
    -     return undefined;
    -   }
    -
    -   return entries.reduce(
    + .map(([date, entries]) =>
    +   entries.reduce(

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +215
    .filter((item) => item != null)
    .filter((item) => {
      // Filter by date range if specified

Copilot AI Jun 7, 2025


[nitpick] Chaining multiple .filter calls leads to extra array traversals. Consider merging the non-null check and date-range filter into a single pass to reduce iterations.

Suggested change
    - .filter((item) => item != null)
    - .filter((item) => {
    -   // Filter by date range if specified
    + .filter((item) => {
    +   // Ensure item is not null and filter by date range if specified
    +   if (item == null) {
    +     return false;
    +   }

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully refactors loadDailyUsageData, loadSessionData, and loadMonthlyUsageData to use Object.groupBy for aggregating data. This change makes the grouping intent very clear and leverages a modern JavaScript feature.

The implementation correctly handles the Partial<Record<K, T[]>> return type of Object.groupBy by checking for null or undefined entry arrays and filtering them out, which is good practice.

One notable improvement in loadSessionData is the addition of default values for sessionId and projectPath when parsing file paths, enhancing robustness.

The primary area for consideration is the potential impact on performance, as the new approach generally involves more passes over the data arrays compared to the previous Map-based in-place aggregation. This is discussed in more detail in the specific comments.

Summary of Findings

  • Potential Performance Impact in loadDailyUsageData: The refactor to use Object.groupBy in loadDailyUsageData results in multiple iterations over the collected entries (grouping, then mapping/reducing, then filtering). This could be less performant than the previous single-pass Map-based aggregation for large datasets.
  • Potential Performance Impact in loadSessionData: In loadSessionData, after grouping by session, each session's data is iterated multiple times (to find the latest entry, to collect versions, and to sum totals). This contrasts with the previous incremental update approach and might affect performance for sessions with many entries.
  • Potential Performance Impact in loadMonthlyUsageData: The loadMonthlyUsageData function now performs Object.groupBy and subsequent aggregation on the results of loadDailyUsageData. This means the dailyData array is iterated multiple times after it has already been computed, potentially adding overhead.

Merge Readiness

The pull request successfully implements the goal of using Object.groupBy. However, there are medium-severity concerns regarding potential performance degradation in all three refactored functions due to increased data iterations. It would be beneficial to discuss these performance aspects, and potentially benchmark the changes against the previous implementation if large datasets are common, before merging.
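If the benchmarking suggested above is pursued, a micro-benchmark comparing the two aggregation styles could be sketched as follows. Everything here is hypothetical illustration, not code from the PR: `Row`, the key distribution, and both function names are invented, and a hand-rolled grouping loop stands in for ES2024's `Object.groupBy` so the sketch runs on any Node version.

```typescript
type Row = { key: string; value: number };

// Synthetic workload: 100k rows spread over 365 keys (roughly "one year
// of daily entries"). Purely illustrative numbers.
const rows: Row[] = Array.from({ length: 100_000 }, (_, i) => ({
	key: `k${i % 365}`,
	value: i,
}));

// Style 1: single-pass, in-place Map aggregation (the previous approach).
function aggregateWithMap(data: Row[]): Map<string, number> {
	const totals = new Map<string, number>();
	for (const row of data) {
		totals.set(row.key, (totals.get(row.key) ?? 0) + row.value);
	}
	return totals;
}

// Style 2: group first, then reduce each group (the Object.groupBy style).
function aggregateWithGroupBy(data: Row[]): Map<string, number> {
	const groups: Partial<Record<string, Row[]>> = {};
	for (const row of data) {
		(groups[row.key] ??= []).push(row);
	}
	const totals = new Map<string, number>();
	for (const [key, group] of Object.entries(groups)) {
		totals.set(key, (group ?? []).reduce((sum, row) => sum + row.value, 0));
	}
	return totals;
}

for (const [name, fn] of [
	["map", aggregateWithMap],
	["groupBy", aggregateWithGroupBy],
] as const) {
	const start = performance.now();
	const result = fn(rows);
	console.log(`${name}: ${(performance.now() - start).toFixed(1)}ms, ${result.size} groups`);
}
```

Both styles are O(n); the question raised in the review is only about the constant factor from the extra passes and intermediate arrays.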

As an AI assistant, I am not authorized to approve pull requests. Please ensure this code is reviewed and approved by the appropriate team members.

Comment on lines +178 to 222
    // Group by date using Object.groupBy
    const groupedByDate = Object.groupBy(allEntries, (entry) => entry.date);

    // Aggregate each group
    const results = Object.entries(groupedByDate)
      .map(([date, entries]) => {
        if (entries == null) {
          return undefined;
        }

        return entries.reduce(
          (acc, entry) => ({
            date,
            inputTokens:
              acc.inputTokens + (entry.data.message.usage.input_tokens ?? 0),
            outputTokens:
              acc.outputTokens + (entry.data.message.usage.output_tokens ?? 0),
            cacheCreationTokens:
              acc.cacheCreationTokens +
              (entry.data.message.usage.cache_creation_input_tokens ?? 0),
            cacheReadTokens:
              acc.cacheReadTokens +
              (entry.data.message.usage.cache_read_input_tokens ?? 0),
            totalCost: acc.totalCost + entry.cost,
          }),
          {
            date,
            inputTokens: 0,
            outputTokens: 0,
            cacheCreationTokens: 0,
            cacheReadTokens: 0,
            totalCost: 0,
          },
        );
      })
      .filter((item) => item != null)
      .filter((item) => {
        // Filter by date range if specified
        if (options?.since || options?.until) {
          const dateStr = item.date.replace(/-/g, ""); // Convert to YYYYMMDD
          if (options.since && dateStr < options.since) return false;
          if (options.until && dateStr > options.until) return false;
        }
        return true;
      });
Contributor


medium

The new approach using Object.groupBy followed by Object.entries().map().reduce() is clear in its intent. However, it introduces multiple iterations over the allEntries data:

  1. Object.groupBy iterates once.
  2. Object.entries().map() iterates over the groups, and reduce() iterates over entries within each group.
  3. The subsequent .filter() for date range iterates again.

The previous Map-based approach aggregated data in a more incremental fashion, typically involving fewer full passes over the dataset.

Have you considered the potential performance implications for very large log files or a high number of entries? It might be worth benchmarking this against the older implementation if performance is critical for large datasets.

The check if (entries == null) (line 184) and the filter((item) => item != null) (line 213) are correctly implemented to handle the Partial<Record<K, T[]>> return type of Object.groupBy where group arrays can be undefined.
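To make the typing point concrete: `Object.groupBy` is declared to return `Partial<Record<K, T[]>>`, so every group lookup is `T[] | undefined` and must be narrowed before use, even though groups that appear in `Object.entries` always hold non-empty arrays at runtime. A minimal sketch, using a local helper with the same signature in place of the ES2024 built-in so it runs on any Node version:

```typescript
// Local stand-in mirroring the signature of ES2024 Object.groupBy.
function groupBy<T, K extends PropertyKey>(
	items: readonly T[],
	key: (item: T) => K,
): Partial<Record<K, T[]>> {
	const groups: Partial<Record<K, T[]>> = {};
	for (const item of items) {
		const k = key(item);
		const bucket = groups[k] ?? [];
		bucket.push(item);
		groups[k] = bucket;
	}
	return groups;
}

const byParity = groupBy([1, 2, 3], (n) => (n % 2 === 0 ? "even" : "odd"));

// byParity.even has type number[] | undefined: the compiler forces a
// narrowing step like this, which is why the PR keeps its null checks.
const evens = byParity.even ?? [];
```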

Comment on lines +302 to 371
    // Group by session using Object.groupBy
    const groupedBySessions = Object.groupBy(
      allEntries,
      (entry) => entry.sessionKey,
    );

    // Aggregate each session group
    const results = Object.entries(groupedBySessions)
      .map(([_, entries]) => {
        if (entries == null) {
          return undefined;
        }

        // Find the latest timestamp for lastActivity
        const latestEntry = entries.reduce((latest, current) =>
          current.timestamp > latest.timestamp ? current : latest,
        );

        // Collect all unique versions
        const versionSet = new Set<string>();
        for (const entry of entries) {
          if (entry.data.version) {
            versionSet.add(entry.data.version);
          }
        }

        // Aggregate totals
        const aggregated = entries.reduce(
          (acc, entry) => ({
            sessionId: latestEntry.sessionId,
            projectPath: latestEntry.projectPath,
            inputTokens:
              acc.inputTokens + (entry.data.message.usage.input_tokens ?? 0),
            outputTokens:
              acc.outputTokens + (entry.data.message.usage.output_tokens ?? 0),
            cacheCreationTokens:
              acc.cacheCreationTokens +
              (entry.data.message.usage.cache_creation_input_tokens ?? 0),
            cacheReadTokens:
              acc.cacheReadTokens +
              (entry.data.message.usage.cache_read_input_tokens ?? 0),
            totalCost: acc.totalCost + entry.cost,
            lastActivity: formatDate(latestEntry.timestamp),
            versions: Array.from(versionSet).sort(),
          }),
          {
            sessionId: latestEntry.sessionId,
            projectPath: latestEntry.projectPath,
            inputTokens: 0,
            outputTokens: 0,
            cacheCreationTokens: 0,
            cacheReadTokens: 0,
            totalCost: 0,
            lastActivity: formatDate(latestEntry.timestamp),
            versions: Array.from(versionSet).sort(),
          },
        );

        return aggregated;
      })
      .filter((item) => item != null)
      .filter((item) => {
        // Filter by date range if specified
        if (options?.since || options?.until) {
          const dateStr = item.lastActivity.replace(/-/g, ""); // Convert to YYYYMMDD
          if (options.since && dateStr < options.since) return false;
          if (options.until && dateStr > options.until) return false;
        }
        return true;
      });
Contributor


medium

Similar to loadDailyUsageData, this refactoring for loadSessionData uses Object.groupBy effectively. The addition of default values for sessionId (line 266) and projectPath (line 268) is a nice improvement for robustness.

Regarding performance and complexity within the aggregation step (lines 310-361):

  • Finding latestEntry involves a pass over entries for the current session group.
  • Collecting versionSet involves another pass over entries.
  • The final reduce to aggregate totals is a third pass over entries.

This multi-pass approach within each group's processing, after the initial Object.groupBy, could be less efficient than the previous Map-based method that updated session data incrementally in one pass over the raw log lines.

Could this lead to noticeable slowdowns for sessions with a very large number of individual log entries? The increased number of iterations per session group might be a concern for performance-sensitive scenarios.
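One way to address the multi-pass concern raised here is to fold the latest timestamp, the version set, and the token totals into a single reduce over each group. The sketch below uses simplified stand-in types and a hypothetical `aggregateSession` helper, not the PR's actual code.

```typescript
// Simplified stand-in for the per-entry shape used in src/data-loader.ts.
type SessionEntry = {
	timestamp: string; // ISO 8601, so lexicographic comparison works
	version?: string;
	inputTokens: number;
	cost: number;
};

// One pass per group: latest activity, unique versions, and totals
// are all accumulated in the same reduce, mutating the seed object.
function aggregateSession(entries: SessionEntry[]) {
	const acc = entries.reduce(
		(acc, e) => {
			if (e.timestamp > acc.lastActivity) acc.lastActivity = e.timestamp;
			if (e.version) acc.versions.add(e.version);
			acc.inputTokens += e.inputTokens;
			acc.totalCost += e.cost;
			return acc;
		},
		{
			lastActivity: "",
			versions: new Set<string>(),
			inputTokens: 0,
			totalCost: 0,
		},
	);
	return { ...acc, versions: Array.from(acc.versions).sort() };
}
```

Whether the extra passes matter in practice depends on session sizes; this shape mainly trades a little readability for one traversal per group.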

Comment on lines +386 to +418
    // Group daily data by month using Object.groupBy
    const groupedByMonth = Object.groupBy(dailyData, (data) =>
      data.date.substring(0, 7), // Extract YYYY-MM from YYYY-MM-DD
    );

    // Aggregate each month group
    const monthlyArray = Object.entries(groupedByMonth)
      .map(([month, dailyEntries]) => {
        if (dailyEntries == null) {
          return undefined;
        }

        return dailyEntries.reduce(
          (acc, data) => ({
            month,
            inputTokens: acc.inputTokens + data.inputTokens,
            outputTokens: acc.outputTokens + data.outputTokens,
            cacheCreationTokens:
              acc.cacheCreationTokens + data.cacheCreationTokens,
            cacheReadTokens: acc.cacheReadTokens + data.cacheReadTokens,
            totalCost: acc.totalCost + data.totalCost,
          }),
          {
            month,
            inputTokens: 0,
            outputTokens: 0,
            cacheCreationTokens: 0,
            cacheReadTokens: 0,
            totalCost: 0,
          },
        );
      })
      .filter((item) => item != null);
Contributor


medium

In loadMonthlyUsageData, the dailyData (which is already processed and potentially filtered by loadDailyUsageData) is now processed again:

  1. Object.groupBy(dailyData, ...) iterates over dailyData.
  2. Object.entries().map().reduce() iterates over the monthly groups and their constituent daily entries.

The previous implementation iterated dailyData once to populate monthlyMap.

This change might introduce additional overhead, especially since dailyData itself is the result of prior processing. Was this potential performance trade-off considered? For instance, if loadDailyUsageData returns a large number of daily records, these subsequent iterations could add up.
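For comparison, the previous single-pass Map-based shape referenced in this comment can be sketched as follows (simplified stand-in types tracking only `totalCost`; the real implementation also sums the four token fields):

```typescript
type Daily = { date: string; totalCost: number };

// Previous style: one pass over dailyData, updating a Map in place.
function toMonthly(dailyData: Daily[]): { month: string; totalCost: number }[] {
	const monthlyMap = new Map<string, { month: string; totalCost: number }>();
	for (const data of dailyData) {
		const month = data.date.substring(0, 7); // YYYY-MM from YYYY-MM-DD
		const existing = monthlyMap.get(month) ?? { month, totalCost: 0 };
		existing.totalCost += data.totalCost;
		monthlyMap.set(month, existing);
	}
	return Array.from(monthlyMap.values());
}
```

Since the number of months is tiny compared to raw log entries, the overhead difference here is likely the smallest of the three refactored functions.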

ryoppippi added a commit that referenced this pull request Jun 22, 2025
feat: use object.groupBy as much as possible
