feat: support Reader classes #50
Pull Request Overview
This pull request adds support for Reader classes to handle different file formats in GraphGen. The changes introduce a new reader architecture that unifies file handling for TXT, JSON, JSONL, and CSV formats, replacing scattered file reading logic with a centralized approach.
- Refactors file reading logic into dedicated Reader classes for each format
- Adds CSV support to the existing TXT, JSON, and JSONL file types
- Removes the legacy file reading utilities and input data type configuration
- Updates the web UI to support CSV files and reorganizes example files
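As an illustration of the architecture summarized above, here is a minimal sketch of one reader class per format behind a shared abstract base. The class names, the `read` signature, the `"content"` default for `text_column`, and the `get_reader` helper are assumptions made for this example, not the code in `graphgen/bases/base_reader.py` or `graphgen/models/reader/`.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class BaseReader(ABC):
    """Hypothetical base class: one reader per file format."""

    def __init__(self, text_column: str = "content"):
        # Name of the field that holds the document text (assumed default).
        self.text_column = text_column

    @abstractmethod
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        """Return the file contents as a list of document dicts."""


class TxtReader(BaseReader):
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        # A plain-text file becomes a single document.
        text = Path(file_path).read_text(encoding="utf-8")
        return [{self.text_column: text}]


class JsonReader(BaseReader):
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        with open(file_path, encoding="utf-8") as f:
            data = json.load(f)
        # Accept either a list of documents or a single document.
        return data if isinstance(data, list) else [data]


class JsonlReader(BaseReader):
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        with open(file_path, encoding="utf-8") as f:
            # One JSON document per non-empty line.
            return [json.loads(line) for line in f if line.strip()]


class CsvReader(BaseReader):
    def read(self, file_path: str) -> List[Dict[str, Any]]:
        with open(file_path, newline="", encoding="utf-8") as f:
            # Each row becomes a dict keyed by the header row.
            return [dict(row) for row in csv.DictReader(f)]


_READERS = {
    ".txt": TxtReader,
    ".json": JsonReader,
    ".jsonl": JsonlReader,
    ".csv": CsvReader,
}


def get_reader(file_path: str, text_column: str = "content") -> BaseReader:
    """Pick a reader from the file extension (hypothetical helper)."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in _READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return _READERS[suffix](text_column)
```

With a layout like this, adding a new format means adding one class and one entry in the extension map, which is the kind of centralization the overview describes.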
Reviewed Changes
Copilot reviewed 41 out of 44 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| graphgen/bases/base_reader.py | Introduces abstract base class for file readers |
| graphgen/models/reader/ | Adds reader implementations for TXT, JSON, JSONL, and CSV formats |
| graphgen/utils/file.py | Removes legacy file reading utility |
| webui/count_tokens.py | Adds CSV support to token counting functionality |
| webui/app.py | Updates UI to support CSV files and removes legacy data loading |
| graphgen/graphgen.py | Refactors to use new reader architecture |
            f"Missing '{self.text_column}' in document: {doc}"
        )
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON line: {line}. Error: {e}")
Copilot (AI), Sep 23, 2025
Replace print statement with proper logging using the logger utility available in the project. This ensures consistent error handling and logging format across the codebase.
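One way the suggested change could look is sketched below. It uses the standard-library `logging` module as a stand-in, because the project's own logger utility is not shown in this excerpt. The function name `parse_jsonl_lines`, the logger name, and the use of `ValueError` for a missing text column are assumptions made for illustration.

```python
import json
import logging
from typing import Any, Dict, Iterable, List

# Stand-in for the project's logger utility (assumed; not shown in the diff).
logger = logging.getLogger("graphgen.reader")


def parse_jsonl_lines(
    lines: Iterable[str], text_column: str = "content"
) -> List[Dict[str, Any]]:
    """Parse JSONL lines, logging malformed lines instead of printing them."""
    docs: List[Dict[str, Any]] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as e:
            # Lazy %-style arguments: formatting happens only if the record is emitted.
            logger.error("Error decoding JSON line %d: %r. Error: %s", line_no, line, e)
            continue
        if not isinstance(doc, dict) or text_column not in doc:
            # Exception type is an assumption; the excerpt does not show it.
            raise ValueError(f"Missing '{text_column}' in document: {doc}")
        docs.append(doc)
    return docs
```

Routing the message through a logger lets callers control level, format, and destination in one place, which a bare `print` does not allow.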
    [os.path.join(examples_dir, "txt_demo.txt")],
    [os.path.join(examples_dir, "raw_demo.jsonl")],
    [os.path.join(examples_dir, "chunked_demo.json")],
    [os.path.join(examples_dir, "jsonl_demo.jsonl")],
Copilot (AI), Sep 23, 2025
The example list references a file that does not appear to exist. Based on the diff, there is no 'jsonl_demo.jsonl' file in the examples directory; only files such as 'csv_demo.csv' and 'json_demo.json' are being added.
    [os.path.join(examples_dir, "jsonl_demo.jsonl")],
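A mismatch like this can be caught before the UI is built by checking the example list against the directory. The helper below is a hypothetical sketch; the name `missing_examples` and the idea of running it at startup or in a test are assumptions, not part of this pull request.

```python
import os
from typing import Iterable, List


def missing_examples(examples_dir: str, names: Iterable[str]) -> List[str]:
    """Return the example file names that are not present in examples_dir."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(examples_dir, name))
    ]
```

For instance, a test could assert that `missing_examples(examples_dir, [...])` returns an empty list for every file name the web UI offers as an example.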
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>