feat: Add ExcelDataLoader for Data Formulator #158

Closed
rafaelascanio wants to merge 1 commit into microsoft:main from rafaelascanio:feat/excel-data-loader

@rafaelascanio

This commit adds a data loader to Microsoft's Data Formulator that loads data directly from Excel files (`.xlsx`).

Key changes include:

- Created `ExcelDataLoader` class in `py-src/data_formulator/data_loader/excel_data_loader.py`, inheriting from `ExternalDataLoader`.
- Implemented methods in `ExcelDataLoader`:
    - `list_params()`: Defines `file_path` as a required parameter.
    - `__init__()`: Initializes the loader with the Excel file path and DuckDB connection, using pandas and openpyxl to read the file.
    - `list_tables()`: Lists all sheets in the Excel file as available tables, providing sheet names, column details, and sample data.
    - `ingest_data()`: Loads data from a specified Excel sheet into a DuckDB table, using the inherited `ingest_df_to_duckdb` method. Supports custom table naming via `name_as`.
    - `view_query_sample()`: Returns a JSON sample of the first few rows of a specified sheet.
    - `ingest_data_from_query()`: Raises `NotImplementedError`, because direct querying does not apply to Excel files in this context.
- Registered `ExcelDataLoader` in `py-src/data_formulator/data_loader/__init__.py` to make it available to the application.
- Added `openpyxl` to `requirements.txt`; pandas needs it to read `.xlsx` files (`pandas` was already listed).
- Added unit tests in `py-src/data_formulator/tests/test_excel_data_loader.py` covering initialization, listing tables, data ingestion, sample viewing, and error handling. Test setup generates a temporary Excel file with multiple sheets.
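
For reviewers who want the shape of the loader at a glance, here is a minimal, dependency-free sketch of the method list above. It is not the code in this PR: `SheetLoaderSketch` is a hypothetical stand-in, sheets are passed in as plain in-memory rows instead of being read with pandas and openpyxl, a dictionary stands in for the DuckDB tables, and the parameter-dictionary keys, the `sample_size` and `limit` arguments, and the sheet-name normalization are all assumptions made for illustration.

```python
import json
import re
from typing import Any, Dict, List, Optional


class SheetLoaderSketch:
    """Hypothetical stand-in mirroring the methods listed in this PR."""

    def __init__(self, sheets: Dict[str, List[Dict[str, Any]]]) -> None:
        # Assumption: sheet name -> list of row dicts, standing in for
        # whatever the real loader reads from the .xlsx file.
        self.sheets = sheets
        # Assumption: a plain dict stands in for the DuckDB tables.
        self.ingested: Dict[str, List[Dict[str, Any]]] = {}

    @staticmethod
    def list_params() -> List[Dict[str, Any]]:
        # The PR describes one required parameter, file_path.
        # The dictionary keys used here are illustrative only.
        return [{"name": "file_path", "type": "string", "required": True}]

    def list_tables(self, sample_size: int = 2) -> List[Dict[str, Any]]:
        # One entry per sheet: name, column names, and a few sample rows.
        tables = []
        for name, rows in self.sheets.items():
            columns = list(rows[0].keys()) if rows else []
            tables.append(
                {"name": name, "columns": columns, "sample_rows": rows[:sample_size]}
            )
        return tables

    def ingest_data(self, sheet_name: str, name_as: Optional[str] = None) -> str:
        if sheet_name not in self.sheets:
            raise ValueError(f"Unknown sheet: {sheet_name!r}")
        # Assumption: without name_as, the sheet name is normalized into
        # a lowercase identifier with underscores.
        table_name = name_as or re.sub(r"\W+", "_", sheet_name).strip("_").lower()
        self.ingested[table_name] = list(self.sheets[sheet_name])
        return table_name

    def view_query_sample(self, sheet_name: str, limit: int = 5) -> str:
        # JSON sample of the first few rows of one sheet.
        return json.dumps(self.sheets[sheet_name][:limit])

    def ingest_data_from_query(self, query: str, name_as: str) -> None:
        # Sheets are not a query engine, so this is deliberately unsupported.
        raise NotImplementedError("Excel sheets cannot be queried directly")


if __name__ == "__main__":
    loader = SheetLoaderSketch(
        {"Q1 Sales": [{"region": "EU", "total": 10}, {"region": "US", "total": 7}]}
    )
    print(loader.list_tables())
    print(loader.ingest_data("Q1 Sales"))                # derived table name
    print(loader.ingest_data("Q1 Sales", name_as="q1"))  # explicit table name
```

Treating each sheet as one table keeps the loader's surface the same as a database-backed loader, which is presumably why the query-based method is the only one left unimplemented.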

With this loader, existing Excel-based datasets can be brought into Data Formulator for visualization and analysis.