Conversation
Hello @ryoppippi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request. This PR focuses on refactoring the data loading and aggregation logic within src/data-loader.ts. The primary goal, as indicated by the title, is to leverage the Object.groupBy method where applicable to simplify and potentially improve the readability of the code that groups data entries before aggregating them.
Highlights
- **Refactor Data Aggregation**: The core change replaces the manual iteration and aggregation using `Map` objects with a pattern that first collects all relevant data entries into an array, then uses `Object.groupBy` to group them by a specific key (date, session key, or month), and finally uses `reduce` on each group to calculate the aggregated totals.
- **Apply Object.groupBy to Daily Data**: The `loadDailyUsageData` function has been updated to group individual usage entries by date using `Object.groupBy` before summing up tokens and cost for each day.
- **Apply Object.groupBy to Session Data**: The `loadSessionData` function now uses `Object.groupBy` to group usage entries by a session key (`projectPath/sessionId`). The aggregation logic within the `reduce` step also handles finding the latest activity timestamp and collecting unique versions for each session.
- **Apply Object.groupBy to Monthly Data**: The `loadMonthlyUsageData` function, which previously aggregated daily data from `loadDailyUsageData` into months using a `Map`, now uses `Object.groupBy` to group the daily usage objects by month.
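The collect → group → reduce pattern described above can be sketched minimally as follows. The entry shape is a simplified assumption (not the real `UsageData`), and a hand-rolled `groupBy` with the same `Partial<Record<…>>` signature as `Object.groupBy` stands in so the sketch also runs on runtimes older than Node 21:

```typescript
// Simplified stand-in for the PR's entry shape (assumption, not the real UsageData).
type Entry = { date: string; cost: number };

// Hand-rolled grouping helper with the same Partial<Record<...>> signature as
// Object.groupBy, so this runs on runtimes that lack it (Node < 21).
function groupBy<T>(
  items: T[],
  key: (item: T) => string,
): Partial<Record<string, T[]>> {
  const out: Partial<Record<string, T[]>> = {};
  for (const item of items) {
    (out[key(item)] ??= []).push(item);
  }
  return out;
}

// 1) collect entries into an array
const allEntries: Entry[] = [
  { date: "2024-01-01", cost: 1 },
  { date: "2024-01-01", cost: 2 },
  { date: "2024-01-02", cost: 3 },
];

// 2) group by key, 3) reduce each group to an aggregate
const totals = Object.entries(groupBy(allEntries, (e) => e.date)).map(
  ([date, entries]) =>
    entries!.reduce(
      (acc, e) => ({ date, totalCost: acc.totalCost + e.cost }),
      { date, totalCost: 0 },
    ),
);
// totals → [{ date: "2024-01-01", totalCost: 3 }, { date: "2024-01-02", totalCost: 3 }]
```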
Changelog
- src/data-loader.ts
  - Refactored `loadDailyUsageData` to use `Object.groupBy` for date-based aggregation.
  - Refactored `loadSessionData` to use `Object.groupBy` for session-based aggregation, including updates to how `lastActivity` and `versions` are determined.
  - Refactored `loadMonthlyUsageData` to use `Object.groupBy` for month-based aggregation of daily data.
  - Adjusted filtering logic in `loadDailyUsageData` and `loadSessionData` to occur after the main aggregation step.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Pull Request Overview
This PR refactors the daily, session, and monthly data loaders to use Object.groupBy for grouping, replacing manual Map usage and reducing boilerplate aggregation logic.
- Replaced manual `Map` builds with `Object.groupBy` in `loadDailyUsageData`, `loadSessionData`, and `loadMonthlyUsageData`
- Aggregations now use `Array.prototype.reduce` on grouped entries
- Date-range filtering and sorting retain existing semantics but operate on grouped results
Comments suppressed due to low confidence (1)
src/data-loader.ts:150
- [nitpick] The variable name `allEntries` is generic and reused across multiple functions. Consider renaming it to `dailyEntries` to improve clarity and maintainability.

```ts
const allEntries: { data: UsageData; date: string; cost: number }[] = [];
```
```ts
.map(([date, entries]) => {
  if (entries == null) {
    return undefined;
  }
  // ...
  return entries.reduce(
```
The null check for `entries` never fires at runtime: `Object.entries` only enumerates keys that `Object.groupBy` actually assigned, and each of those keys maps to a non-empty array. The branch exists only to satisfy the `Partial<Record<K, T[]>>` return type, so you can remove it (e.g. with a non-null assertion) to simplify the code.
Suggested change:

```diff
-.map(([date, entries]) => {
-  if (entries == null) {
-    return undefined;
-  }
-  return entries.reduce(
+.map(([date, entries]) =>
+  entries.reduce(
```
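Whether the branch is removable is more a typing question than a runtime one. A small sketch illustrates it, using a hand-rolled helper with the same `Partial<Record<K, T[]>>` signature as `Object.groupBy` (an assumption standing in for it, so the code runs on runtimes without `Object.groupBy`):

```typescript
// Hand-rolled stand-in matching Object.groupBy's Partial<Record<K, T[]>>
// signature (assumption; lets this run on runtimes without Object.groupBy).
function groupBy<T>(
  items: T[],
  key: (item: T) => string,
): Partial<Record<string, T[]>> {
  const out: Partial<Record<string, T[]>> = {};
  for (const item of items) {
    (out[key(item)] ??= []).push(item);
  }
  return out;
}

const grouped = groupBy([{ d: "a" }, { d: "a" }, { d: "b" }], (x) => x.d);

// Object.entries only enumerates keys that were actually assigned, so at
// runtime every enumerated value is a non-empty array...
for (const [, entries] of Object.entries(grouped)) {
  console.assert(Array.isArray(entries) && entries.length > 0);
}

// ...undefined only appears on direct access to an absent key, which is why
// TypeScript's Partial<Record<...>> typing still demands the null check.
console.assert(grouped["missing"] === undefined);
```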
```ts
.filter((item) => item != null)
.filter((item) => {
  // Filter by date range if specified
```
[nitpick] Chaining multiple .filter calls leads to extra array traversals. Consider merging the non-null check and date-range filter into a single pass to reduce iterations.
Suggested change:

```diff
-.filter((item) => item != null)
-.filter((item) => {
-  // Filter by date range if specified
+.filter((item) => {
+  // Ensure item is not null and filter by date range if specified
+  if (item == null) {
+    return false;
+  }
```
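The merged single-pass filter can be written with a type predicate so the result narrows away `undefined`, just as the separate `item != null` filter does. A sketch with simplified, assumed types (`Daily` and `inRange` are illustrative names, not from the PR):

```typescript
// Simplified daily record (assumption, not the project's real type).
type Daily = { date: string; totalCost: number };

// One pass: drop undefined entries and apply the date-range check together.
// The `item is Daily` predicate makes the narrowing explicit for TypeScript.
function inRange(
  results: (Daily | undefined)[],
  since?: string,
  until?: string,
): Daily[] {
  return results.filter((item): item is Daily => {
    if (item == null) return false;
    const dateStr = item.date.replace(/-/g, ""); // Convert to YYYYMMDD
    if (since != null && dateStr < since) return false;
    if (until != null && dateStr > until) return false;
    return true;
  });
}

const filtered = inRange(
  [
    undefined,
    { date: "2024-01-01", totalCost: 1 },
    { date: "2024-02-01", totalCost: 2 },
  ],
  "20240115",
);
// filtered → [{ date: "2024-02-01", totalCost: 2 }]
```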
Code Review
This pull request successfully refactors loadDailyUsageData, loadSessionData, and loadMonthlyUsageData to use Object.groupBy for aggregating data. This change makes the grouping intent very clear and leverages a modern JavaScript feature.
The implementation correctly handles the Partial<Record<K, T[]>> return type of Object.groupBy by checking for null or undefined entry arrays and filtering them out, which is good practice.
One notable improvement in loadSessionData is the addition of default values for sessionId and projectPath when parsing file paths, enhancing robustness.
The primary area for consideration is the potential impact on performance, as the new approach generally involves more passes over the data arrays compared to the previous Map-based in-place aggregation. This is discussed in more detail in the specific comments.
Summary of Findings
- **Potential Performance Impact in `loadDailyUsageData`**: The refactor to use `Object.groupBy` in `loadDailyUsageData` results in multiple iterations over the collected entries (grouping, then mapping/reducing, then filtering). This could be less performant than the previous single-pass Map-based aggregation for large datasets.
- **Potential Performance Impact in `loadSessionData`**: In `loadSessionData`, after grouping by session, each session's data is iterated multiple times (to find the latest entry, to collect versions, and to sum totals). This contrasts with the previous incremental update approach and might affect performance for sessions with many entries.
- **Potential Performance Impact in `loadMonthlyUsageData`**: The `loadMonthlyUsageData` function now performs `Object.groupBy` and subsequent aggregation on the results of `loadDailyUsageData`. This means the `dailyData` array is iterated multiple times after it has already been computed, potentially adding overhead.
Merge Readiness
The pull request successfully implements the goal of using Object.groupBy. However, there are medium-severity concerns regarding potential performance degradation in all three refactored functions due to increased data iterations. It would be beneficial to discuss these performance aspects, and potentially benchmark the changes against the previous implementation if large datasets are common, before merging.
As an AI assistant, I am not authorized to approve pull requests. Please ensure this code is reviewed and approved by the appropriate team members.
```ts
// Group by date using Object.groupBy
const groupedByDate = Object.groupBy(allEntries, (entry) => entry.date);

// Aggregate each group
const results = Object.entries(groupedByDate)
  .map(([date, entries]) => {
    if (entries == null) {
      return undefined;
    }

    return entries.reduce(
      (acc, entry) => ({
        date,
        inputTokens:
          acc.inputTokens + (entry.data.message.usage.input_tokens ?? 0),
        outputTokens:
          acc.outputTokens + (entry.data.message.usage.output_tokens ?? 0),
        cacheCreationTokens:
          acc.cacheCreationTokens +
          (entry.data.message.usage.cache_creation_input_tokens ?? 0),
        cacheReadTokens:
          acc.cacheReadTokens +
          (entry.data.message.usage.cache_read_input_tokens ?? 0),
        totalCost: acc.totalCost + entry.cost,
      }),
      {
        date,
        inputTokens: 0,
        outputTokens: 0,
        cacheCreationTokens: 0,
        cacheReadTokens: 0,
        totalCost: 0,
      },
    );
  })
  .filter((item) => item != null)
  .filter((item) => {
    // Filter by date range if specified
    if (options?.since || options?.until) {
      const dateStr = item.date.replace(/-/g, ""); // Convert to YYYYMMDD
      if (options.since && dateStr < options.since) return false;
      if (options.until && dateStr > options.until) return false;
    }
    return true;
  });
```
The new approach using `Object.groupBy` followed by `Object.entries().map().reduce()` is clear in its intent. However, it introduces multiple iterations over the `allEntries` data:

- `Object.groupBy` iterates once.
- `Object.entries().map()` iterates over the groups, and `reduce()` iterates over entries within each group.
- The subsequent `.filter()` for date range iterates again.

The previous Map-based approach aggregated data in a more incremental fashion, typically involving fewer full passes over the dataset.

Have you considered the potential performance implications for very large log files or a high number of entries? It might be worth benchmarking this against the older implementation if performance is critical for large datasets.

The check `if (entries == null)` (line 184) and the `filter((item) => item != null)` (line 213) are correctly implemented to handle the `Partial<Record<K, T[]>>` return type of `Object.groupBy`, where group arrays can be `undefined`.
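If benchmarking is attempted, a rough micro-benchmark along these lines could compare the two shapes. Everything here is an assumption for illustration: the entry type, the synthetic sizes, and a hand-rolled grouping loop standing in for `Object.groupBy`; timings are machine-dependent.

```typescript
type Entry = { date: string; cost: number };

// Synthetic workload: 100k entries spread over 28 dates (an assumption).
const entries: Entry[] = Array.from({ length: 100_000 }, (_, i) => ({
  date: `2024-01-${String((i % 28) + 1).padStart(2, "0")}`,
  cost: 1,
}));

// Previous shape: one pass, aggregating into a Map in place.
function viaMap(items: Entry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of items) {
    totals.set(e.date, (totals.get(e.date) ?? 0) + e.cost);
  }
  return totals;
}

// New shape: group first (stand-in for Object.groupBy), then reduce per group.
function viaGroupReduce(items: Entry[]): Record<string, number> {
  const grouped: Partial<Record<string, Entry[]>> = {};
  for (const e of items) {
    (grouped[e.date] ??= []).push(e);
  }
  return Object.fromEntries(
    Object.entries(grouped).map(([date, group]) => [
      date,
      group!.reduce((sum, e) => sum + e.cost, 0),
    ]),
  );
}

const t0 = performance.now();
const mapTotals = viaMap(entries);
const t1 = performance.now();
const groupTotals = viaGroupReduce(entries);
const t2 = performance.now();
console.log(
  `map: ${(t1 - t0).toFixed(1)}ms, group+reduce: ${(t2 - t1).toFixed(1)}ms`,
);
```

Both shapes must produce identical totals; only the constant factors (extra allocations and passes in the group-then-reduce shape) differ.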
```ts
// Group by session using Object.groupBy
const groupedBySessions = Object.groupBy(
  allEntries,
  (entry) => entry.sessionKey,
);

// Aggregate each session group
const results = Object.entries(groupedBySessions)
  .map(([_, entries]) => {
    if (entries == null) {
      return undefined;
    }

    // Find the latest timestamp for lastActivity
    const latestEntry = entries.reduce((latest, current) =>
      current.timestamp > latest.timestamp ? current : latest,
    );

    // Collect all unique versions
    const versionSet = new Set<string>();
    for (const entry of entries) {
      if (entry.data.version) {
        versionSet.add(entry.data.version);
      }
    }

    // Aggregate totals
    const aggregated = entries.reduce(
      (acc, entry) => ({
        sessionId: latestEntry.sessionId,
        projectPath: latestEntry.projectPath,
        inputTokens:
          acc.inputTokens + (entry.data.message.usage.input_tokens ?? 0),
        outputTokens:
          acc.outputTokens + (entry.data.message.usage.output_tokens ?? 0),
        cacheCreationTokens:
          acc.cacheCreationTokens +
          (entry.data.message.usage.cache_creation_input_tokens ?? 0),
        cacheReadTokens:
          acc.cacheReadTokens +
          (entry.data.message.usage.cache_read_input_tokens ?? 0),
        totalCost: acc.totalCost + entry.cost,
        lastActivity: formatDate(latestEntry.timestamp),
        versions: Array.from(versionSet).sort(),
      }),
      {
        sessionId: latestEntry.sessionId,
        projectPath: latestEntry.projectPath,
        inputTokens: 0,
        outputTokens: 0,
        cacheCreationTokens: 0,
        cacheReadTokens: 0,
        totalCost: 0,
        lastActivity: formatDate(latestEntry.timestamp),
        versions: Array.from(versionSet).sort(),
      },
    );

    return aggregated;
  })
  .filter((item) => item != null)
  .filter((item) => {
    // Filter by date range if specified
    if (options?.since || options?.until) {
      const dateStr = item.lastActivity.replace(/-/g, ""); // Convert to YYYYMMDD
      if (options.since && dateStr < options.since) return false;
      if (options.until && dateStr > options.until) return false;
    }
    return true;
  });
```
Similar to `loadDailyUsageData`, this refactoring for `loadSessionData` uses `Object.groupBy` effectively. The addition of default values for `sessionId` (line 266) and `projectPath` (line 268) is a nice improvement for robustness.

Regarding performance and complexity within the aggregation step (lines 310-361):

- Finding `latestEntry` involves a pass over `entries` for the current session group.
- Collecting `versionSet` involves another pass over `entries`.
- The final `reduce` to aggregate totals is a third pass over `entries`.

This multi-pass approach within each group's processing, after the initial `Object.groupBy`, could be less efficient than the previous Map-based method that updated session data incrementally in one pass over the raw log lines.

Could this lead to noticeable slowdowns for sessions with a very large number of individual log entries? The increased number of iterations per session group might be a concern for performance-sensitive scenarios.
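If the per-group passes ever become a bottleneck, the three traversals can be collapsed into a single plain loop per group. A hedged sketch with simplified, assumed types (the real code also tracks per-token-type counts and formats the timestamp via `formatDate`, both omitted here):

```typescript
// Simplified session entry (assumption, not the project's real type).
type SessionEntry = { timestamp: string; version?: string; cost: number };

// One pass per group: latest timestamp, unique versions, and totals together.
function aggregateSession(entries: SessionEntry[]) {
  let latest = entries[0]; // groups produced by grouping are non-empty
  const versions = new Set<string>();
  let totalCost = 0;
  for (const entry of entries) {
    if (entry.timestamp > latest.timestamp) latest = entry;
    if (entry.version != null) versions.add(entry.version);
    totalCost += entry.cost;
  }
  return {
    lastActivity: latest.timestamp, // the real code would apply formatDate here
    versions: Array.from(versions).sort(),
    totalCost,
  };
}

const summary = aggregateSession([
  { timestamp: "2024-01-01T10:00:00Z", version: "1.0.0", cost: 1 },
  { timestamp: "2024-01-02T09:00:00Z", version: "1.1.0", cost: 2 },
]);
// summary → { lastActivity: "2024-01-02T09:00:00Z", versions: ["1.0.0", "1.1.0"], totalCost: 3 }
```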
Removed (previous Map-based aggregation):

```ts
for (const data of dailyData) {
  // Extract YYYY-MM from YYYY-MM-DD
  const month = data.date.substring(0, 7);

  const existing = monthlyMap.get(month) || {
    month,
    inputTokens: 0,
    outputTokens: 0,
    cacheCreationTokens: 0,
    cacheReadTokens: 0,
    totalCost: 0,
  };

  existing.inputTokens += data.inputTokens;
  existing.outputTokens += data.outputTokens;
  existing.cacheCreationTokens += data.cacheCreationTokens;
  existing.cacheReadTokens += data.cacheReadTokens;
  existing.totalCost += data.totalCost;

  monthlyMap.set(month, existing);
}
```

Added (`Object.groupBy`-based aggregation):

```ts
// Group daily data by month using Object.groupBy
const groupedByMonth = Object.groupBy(dailyData, (data) =>
  data.date.substring(0, 7),
);

// Aggregate each month group
const monthlyArray = Object.entries(groupedByMonth)
  .map(([month, dailyEntries]) => {
    if (dailyEntries == null) {
      return undefined;
    }

    return dailyEntries.reduce(
      (acc, data) => ({
        month,
        inputTokens: acc.inputTokens + data.inputTokens,
        outputTokens: acc.outputTokens + data.outputTokens,
        cacheCreationTokens:
          acc.cacheCreationTokens + data.cacheCreationTokens,
        cacheReadTokens: acc.cacheReadTokens + data.cacheReadTokens,
        totalCost: acc.totalCost + data.totalCost,
      }),
      {
        month,
        inputTokens: 0,
        outputTokens: 0,
        cacheCreationTokens: 0,
        cacheReadTokens: 0,
        totalCost: 0,
      },
    );
  })
  .filter((item) => item != null);
```
In `loadMonthlyUsageData`, the `dailyData` (which is already processed and potentially filtered by `loadDailyUsageData`) is now processed again:

- `Object.groupBy(dailyData, ...)` iterates over `dailyData`.
- `Object.entries().map().reduce()` iterates over the monthly groups and their constituent daily entries.

The previous implementation iterated `dailyData` once to populate `monthlyMap`.

This change might introduce additional overhead, especially since `dailyData` itself is the result of prior processing. Was this potential performance trade-off considered? For instance, if `loadDailyUsageData` returns a large number of daily records, these subsequent iterations could add up.
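For reference, the single-pass Map shape this comment describes looks roughly like the sketch below, with a simplified daily record (an assumption; the real code also carries the token-count fields):

```typescript
// Simplified daily record (assumption, not the project's real type).
type Daily = { date: string; totalCost: number };
type Monthly = { month: string; totalCost: number };

// One pass over dailyData: fold each record into its month's running total.
function monthlyTotals(dailyData: Daily[]): Monthly[] {
  const monthlyMap = new Map<string, Monthly>();
  for (const data of dailyData) {
    const month = data.date.substring(0, 7); // YYYY-MM from YYYY-MM-DD
    const existing = monthlyMap.get(month) ?? { month, totalCost: 0 };
    existing.totalCost += data.totalCost;
    monthlyMap.set(month, existing);
  }
  return [...monthlyMap.values()];
}

const monthly = monthlyTotals([
  { date: "2024-01-01", totalCost: 1 },
  { date: "2024-01-15", totalCost: 2 },
  { date: "2024-02-01", totalCost: 4 },
]);
// monthly → [{ month: "2024-01", totalCost: 3 }, { month: "2024-02", totalCost: 4 }]
```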
feat: use object.groupBy as much as possible
No description provided.