Skip to content

Spreadsheet Parser is a full-stack app that ingests Google Spreadsheets, validates and normalizes data, and stores it in MongoDB. It features a React interface for managing datasets, viewing records, and exporting data as CSV, with a Node.js/Express backend integrated with the Google Sheets API.

License

Notifications You must be signed in to change notification settings

reyden142/spreadsheet-parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“Š Spreadsheet Parser

Node.js Express React MongoDB Vite Google Sheets API Railway

Full-stack web application that ingests Google Spreadsheets, validates and normalizes data, stores it in MongoDB, and provides a React UI to manage parsed datasets.

🌐 Live Website

Access the application: https://spreadsheet-parser-production.up.railway.app

Note: The application is deployed on Railway. Make sure all environment variables are configured correctly for production use.


πŸ“Έ Screenshots

Main Interface - Spreadsheet Ingestion

Main Interface The main interface showing the spreadsheet URL input form and ingestion summary with statistics.

Dataset Management & Viewer

Dataset Management The dataset management interface displaying the list of ingested datasets and the detailed data viewer with all parsed rows and columns.


✨ Features

  • πŸ“₯ Ingest Google Sheets - Parse and import data from Google Spreadsheets
  • βœ… Data Validation - Automatic validation and normalization of spreadsheet data
  • πŸ’Ύ MongoDB Storage - Persistent storage of parsed datasets
  • πŸ“‹ Dataset Management - View, download, and delete parsed datasets
  • πŸ“₯ CSV Export - Download datasets as CSV files
  • 🎨 Modern UI - Clean and intuitive React interface

πŸ—οΈ Architecture at a Glance

  • Backend (/backend): Express API, Google Sheets client, parser/validator, MongoDB persistence (Mongoose), CSV exporter.
  • Frontend (/frontend): React + Vite single-page UI for ingestion, listing datasets, viewing rows, downloading CSV, deleting datasets.
  • Database: MongoDB (local or Atlas). Each ingest is stored as a document containing headers, rows, and parsing summary/logs.

πŸ“‹ Prerequisites

  • βœ… Node.js 18+
  • βœ… npm 9+
  • βœ… MongoDB 6+ (local mongod or Atlas connection string)
  • βœ… Google Cloud project with Sheets API enabled and a service account that has access to your copied spreadsheet.

πŸ”§ Google Sheets Setup

  1. πŸ“„ Open the provided sample sheet and choose File β†’ Make a copy.
  2. πŸ‘€ Share your copy with the Google service account email (from the credentials you create below) as a Viewer.
  3. πŸ”— Grab the URL of your copy and keep it handy for testing.

Example test sheet URL (replace with your copy): https://docs.google.com/spreadsheets/d/YOUR_COPY_ID/edit#gid=0


βš™οΈ Environment Variables

Backend Environment Variables

Create /backend/.env (based on backend/env.example):

PORT=4000
MONGODB_URI=mongodb://localhost:27017/spreadsheet_parser
GOOGLE_SERVICE_ACCOUNT_EMAIL=your-service-account@project.iam.gserviceaccount.com
GOOGLE_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
GOOGLE_SHEETS_SCOPES=https://www.googleapis.com/auth/spreadsheets.readonly
ALLOWED_ORIGIN=http://localhost:5173
  • Service account: In Google Cloud console create credentials β†’ Service Account β†’ add Sheets API role (or basic). Download JSON key, copy client_email and private_key. Store the private key exactly as shown above (escaped newlines).

Frontend Environment Variables (Optional)

Create /frontend/.env:

VITE_API_BASE_URL=http://localhost:4000

Note: In production, the frontend uses relative URLs (same origin), so this is only needed for local development with separate servers.


πŸš€ Installation & Running

Local Development

1. Backend

cd backend
npm install
npm run dev

Server runs on http://localhost:4000.

2. Frontend

cd frontend
npm install
npm run dev

Vite dev server runs on http://localhost:5173.

Production Build

# Install all dependencies
npm run install:all

# Build frontend
npm run build

# Start backend (serves both API and frontend)
npm start

πŸ“– How to Use

Step 1: Access the Application 🌐

  1. Open your web browser
  2. Navigate to: https://spreadsheet-parser-production.up.railway.app

Step 2: Prepare Your Google Sheet πŸ“Š

  1. Create a Google Spreadsheet or use an existing one
  2. Ensure the first row contains column headers
  3. Fill in your data rows below the headers
  4. Share the spreadsheet with your Google Service Account email (set in environment variables) as a Viewer

Step 3: Ingest a Spreadsheet πŸ“₯

  1. On the homepage, you'll see an input field for the Google Sheet URL
  2. Copy the Google Spreadsheet URL (format: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=0)
  3. Paste the URL into the input field
  4. Click the "Ingest Spreadsheet" button
  5. Wait for the processing to complete

Step 4: View Results βœ…

After ingestion, you'll see:

  • Summary Card showing:
    • Total rows read
    • Rows successfully inserted
    • Rows skipped (if any)
    • Skip reasons (if applicable)
  • Dataset List showing all ingested spreadsheets

Step 5: Manage Datasets πŸ“‹

For each dataset, you can:

  • πŸ‘οΈ View - Click to see the full dataset with all rows and columns
  • πŸ“₯ Download CSV - Export the dataset as a CSV file
  • πŸ—‘οΈ Delete - Remove the dataset from the database

Step 6: View Dataset Details πŸ”

  1. Click on any dataset in the list
  2. View the complete data in a table format
  3. See headers and all parsed rows
  4. Navigate back to the dataset list using the back button

πŸ”„ Workflow

  1. πŸ“„ Open the React UI.
  2. πŸ“‹ Paste your copied Google Sheet URL and click Ingest Spreadsheet.
  3. βš™οΈ Backend fetches the first worksheet, validates headers, coerces values (numbers/dates), skips empty rows, logs issues, and stores everything in MongoDB.
  4. βœ… The UI refreshes showing:
    • Summary of the ingest (rows read/inserted/skipped, warnings/logs).
    • Table of datasets with view / CSV download / delete actions.
    • Dataset viewer renders rows as a table.

πŸ”Œ API Endpoints

Method Endpoint Description
GET /health Health check endpoint
POST /api/spreadsheets Ingest spreadsheet by URL
GET /api/spreadsheets List stored datasets
GET /api/spreadsheets/:id Retrieve dataset (headers + rows + summary)
GET /api/spreadsheets/:id/csv Download dataset as CSV
DELETE /api/spreadsheets/:id Delete dataset

βœ… Testing Checklist

  • βœ… Start MongoDB (mongod) locally or ensure Atlas cluster reachable.
  • βœ… Run backend & frontend dev servers.
  • βœ… In UI, ingest your sheet URL.
  • βœ… Confirm:
    • Summary shows expected counts/logs.
    • Dataset list refreshes with new entry.
    • Viewer renders rows with correct headers.
    • CSV download returns well-formed file.
    • Delete action removes dataset.

πŸ› Troubleshooting

Authentication Errors πŸ”

  • Ensure the service account email has Viewer access to your copied sheet
  • Verify that GOOGLE_PRIVATE_KEY retains newline escapes (\\n in env file)
  • Check that Google Sheets API is enabled in your Google Cloud project

Data Issues πŸ“Š

  • Duplicate headers / empty header: Fix the sheet's first row, then re-run ingest
  • Missing data: Check that rows aren't completely empty (they'll be skipped)

Connection Issues πŸ”Œ

  • CORS issues: Set ALLOWED_ORIGIN to your frontend origin (e.g., http://localhost:5173)
  • Mongo connection failures:
    • Verify the MongoDB URI
    • Ensure MongoDB is running (for local) or Atlas cluster is accessible
    • Whitelist IP addresses in MongoDB Atlas Network Access settings
    • For Railway deployment, use 0.0.0.0/0 to allow all IPs

Deployment Issues πŸš€

  • Environment variables not working: Ensure variables are set at the service level in Railway, not project level
  • Frontend not loading: Check that npm run build completed successfully and frontend/dist exists
  • Port issues: Railway automatically sets PORT environment variable

πŸ“¦ Project Structure

spreadsheet-parser/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ controllers/     # Request handlers
β”‚   β”‚   β”œβ”€β”€ infrastructure/  # Database connection
β”‚   β”‚   β”œβ”€β”€ models/          # Mongoose models
β”‚   β”‚   β”œβ”€β”€ routes/          # Express routes
β”‚   β”‚   β”œβ”€β”€ services/        # Business logic (Google Sheets, parser)
β”‚   β”‚   └── index.js          # Express app entry point
β”‚   └── package.json
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ App.jsx           # Main React component
β”‚   β”‚   β”œβ”€β”€ main.jsx          # React entry point
β”‚   β”‚   └── styles.css        # Application styles
β”‚   β”œβ”€β”€ index.html
β”‚   └── package.json
β”œβ”€β”€ package.json              # Root package.json (monorepo)
β”œβ”€β”€ railway.toml              # Railway deployment config
└── README.md

πŸ“ License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2025 Reyden Jenn Cagata

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


πŸ“§ Support

For issues, questions, or contributions, please open an issue on the repository.


Made with ❀️ using Node.js, Express, React, and MongoDB

About

Spreadsheet Parser is a full-stack app that ingests Google Spreadsheets, validates and normalizes data, and stores it in MongoDB. It features a React interface for managing datasets, viewing records, and exporting data as CSV, with a Node.js/Express backend integrated with the Google Sheets API.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages