
@ChenZiHong-Gavin
Collaborator

No description provided.

Contributor

Copilot AI left a comment


Pull Request Overview

This pull request adds support for Reader classes to handle different file formats in GraphGen. The changes introduce a new reader architecture that unifies file handling for TXT, JSON, JSONL, and CSV formats, replacing scattered file reading logic with a centralized approach.

  • Refactors file reading logic into dedicated Reader classes for each format
  • Adds CSV support to the existing TXT, JSON, and JSONL file types
  • Removes the legacy file reading utilities and input data type configuration
  • Updates the web UI to support CSV files and reorganizes example files
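The reader architecture summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR. The class names (`BaseReader`, `TxtReader`, `JsonReader`, `JsonlReader`, `CsvReader`), the `read()` method, the `READERS` registry, the `read_file()` helper, and the `text_column="content"` default are all hypothetical. Only the `text_column` attribute and the "Missing ..." error message come from the diff excerpt reviewed below.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class BaseReader(ABC):
    """Hypothetical base class: every reader returns a list of dict documents."""

    def __init__(self, text_column: str = "content"):
        # Assumed default key that holds each document's text.
        self.text_column = text_column

    @abstractmethod
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        """Parse file_path into a list of documents."""

    def _check(self, doc: Any) -> Dict[str, Any]:
        # Reject records that lack the text column (same message as the reviewed diff).
        if not isinstance(doc, dict) or self.text_column not in doc:
            raise ValueError(f"Missing '{self.text_column}' in document: {doc}")
        return doc


class TxtReader(BaseReader):
    def read(self, file_path):
        # Assumption: the whole .txt file is a single document.
        text = Path(file_path).read_text(encoding="utf-8")
        return [{self.text_column: text}] if text.strip() else []


class JsonReader(BaseReader):
    def read(self, file_path):
        # Assumption: the file holds a top-level JSON array of objects.
        with open(file_path, encoding="utf-8") as f:
            return [self._check(doc) for doc in json.load(f)]


class JsonlReader(BaseReader):
    def read(self, file_path):
        # One JSON object per non-blank line.
        with open(file_path, encoding="utf-8") as f:
            return [self._check(json.loads(line)) for line in f if line.strip()]


class CsvReader(BaseReader):
    def read(self, file_path):
        # The header row supplies the dict keys; all values are strings.
        with open(file_path, encoding="utf-8", newline="") as f:
            return [self._check(dict(row)) for row in csv.DictReader(f)]


# Hypothetical registry mapping file extensions to reader classes.
READERS = {".txt": TxtReader, ".json": JsonReader, ".jsonl": JsonlReader, ".csv": CsvReader}


def read_file(file_path: str, text_column: str = "content") -> List[Dict[str, Any]]:
    """Pick a reader by file extension and return the parsed documents."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix](text_column).read(file_path)
```

In this sketch, supporting another format means adding one subclass and one registry entry, and callers use a single entry point instead of per-format branches.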

Reviewed Changes

Copilot reviewed 41 out of 44 changed files in this pull request and generated 3 comments.

Per-file summary (file: description):

  • graphgen/bases/base_reader.py: Introduces an abstract base class for file readers
  • graphgen/models/reader/: Adds reader implementations for TXT, JSON, JSONL, and CSV formats
  • graphgen/utils/file.py: Removes the legacy file reading utility
  • webui/count_tokens.py: Adds CSV support to the token counting functionality
  • webui/app.py: Updates the UI to support CSV files and removes legacy data loading
  • graphgen/graphgen.py: Refactors to use the new reader architecture
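For the CSV support in `webui/count_tokens.py`, one plausible shape is to read the text column of each row and sum per-row token counts. This is a hypothetical sketch: `count_csv_tokens`, `whitespace_tokenize`, and the `text_column="content"` default are illustrative names, and the whitespace tokenizer is a stand-in because the tokenizer the project actually uses is not shown here.

```python
import csv
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer; a real setup would use the model's own tokenizer.
    return text.split()


def count_csv_tokens(
    file_path: str,
    text_column: str = "content",
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> int:
    """Sum token counts over the text column of every CSV row (hypothetical helper)."""
    total = 0
    with open(file_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            # Missing or short rows yield None; treat them as empty text.
            total += len(tokenize(row.get(text_column) or ""))
    return total
```

Passing the tokenizer in as a parameter keeps the CSV handling independent of whichever tokenizer the UI is configured with.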


                f"Missing '{self.text_column}' in document: {doc}"
            )
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON line: {line}. Error: {e}")

Copilot AI Sep 23, 2025


Replace the print statement with proper logging via the logger utility available in the project. This keeps error handling and log formatting consistent across the codebase.
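The change the comment asks for could look like the following minimal sketch. It assumes the project's logger behaves like a standard `logging.Logger`, so the stdlib `logging` module stands in for it; `parse_jsonl_lines` and the `"graphgen"` logger name are hypothetical.

```python
import json
import logging
from typing import Any, Dict, Iterable, List

# Assumption: the project's logger behaves like a standard logging.Logger.
logger = logging.getLogger("graphgen")


def parse_jsonl_lines(lines: Iterable[str], text_column: str = "content") -> List[Dict[str, Any]]:
    """Parse JSONL lines, logging (not printing) lines that fail to decode."""
    docs = []
    for line in lines:
        if not line.strip():
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as e:
            # Log and skip the bad line instead of printing to stdout.
            logger.error("Error decoding JSON line: %s. Error: %s", line, e)
            continue
        if not isinstance(doc, dict) or text_column not in doc:
            raise ValueError(f"Missing '{text_column}' in document: {doc}")
        docs.append(doc)
    return docs
```

Passing `line` and `e` as arguments instead of pre-formatting an f-string lets `logging` skip the string formatting when the ERROR level is filtered out.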

[os.path.join(examples_dir, "txt_demo.txt")],
[os.path.join(examples_dir, "raw_demo.jsonl")],
[os.path.join(examples_dir, "chunked_demo.json")],
[os.path.join(examples_dir, "jsonl_demo.jsonl")],

Copilot AI Sep 23, 2025


The example file references point to non-existent files. Based on the diff, there is no 'jsonl_demo.jsonl' file in the examples directory; only files such as 'csv_demo.csv' and 'json_demo.json' are being added.

Suggested change (remove this line):
[os.path.join(examples_dir, "jsonl_demo.jsonl")],
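One defensive option for the example list is to build it only from files that actually exist, so a stale reference such as `jsonl_demo.jsonl` drops out instead of pointing at a missing file. The helper name `existing_examples` is hypothetical; it returns one-element rows to match the list-of-lists shape of the `os.path.join(examples_dir, ...)` entries in the excerpt.

```python
import os
from typing import List


def existing_examples(examples_dir: str, names: List[str]) -> List[List[str]]:
    """Return one-element example rows, skipping files missing on disk (hypothetical helper)."""
    rows = []
    for name in names:
        path = os.path.join(examples_dir, name)
        # Keep only entries that point at a real file.
        if os.path.isfile(path):
            rows.append([path])
    return rows
```

Silently skipping can also hide a typo in a filename; raising an error for a missing file is the stricter alternative.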

ChenZiHong-Gavin and others added 3 commits September 23, 2025 16:58
@ChenZiHong-Gavin merged commit 363a560 into main Sep 23, 2025
2 checks passed
@ChenZiHong-Gavin deleted the refact-reader branch September 23, 2025 09:05
@ChenZiHong-Gavin mentioned this pull request Sep 23, 2025
31 tasks

2 participants