📊 Spreadsheet Parser

Full-stack web application that ingests Google Spreadsheets, validates and normalizes data, stores it in MongoDB, and provides a React UI to manage parsed datasets.

🌐 Live Website

Access the application: https://spreadsheet-parser-production.up.railway.app

Note: The application is deployed on Railway. Make sure all environment variables are configured correctly for production use.

📸 Screenshots

Main Interface - Spreadsheet Ingestion

The main interface showing the spreadsheet URL input form and ingestion summary with statistics.

Dataset Management & Viewer

The dataset management interface displaying the list of ingested datasets and the detailed data viewer with all parsed rows and columns.

✨ Features

📥 Ingest Google Sheets - Parse and import data from Google Spreadsheets
✅ Data Validation - Automatic validation and normalization of spreadsheet data
💾 MongoDB Storage - Persistent storage of parsed datasets
📋 Dataset Management - View, download, and delete parsed datasets
📥 CSV Export - Download datasets as CSV files
🎨 Modern UI - Clean and intuitive React interface

🏗️ Architecture at a Glance

Backend (/backend): Express API, Google Sheets client, parser/validator, MongoDB persistence (Mongoose), CSV exporter.
Frontend (/frontend): React + Vite single-page UI for ingestion, listing datasets, viewing rows, downloading CSV, deleting datasets.
Database: MongoDB (local or Atlas). Each ingest is stored as a document containing headers, rows, and parsing summary/logs.

📋 Prerequisites

✅ Node.js 18+
✅ npm 9+
✅ MongoDB 6+ (local mongod or Atlas connection string)
✅ Google Cloud project with Sheets API enabled and a service account that has access to your copied spreadsheet.

🔧 Google Sheets Setup

📄 Open the provided sample sheet and choose File → Make a copy.
👤 Share your copy with the Google service account email (from the credentials you create below) as a Viewer.
🔗 Grab the URL of your copy and keep it handy for testing.

Example test sheet URL (replace with your copy): https://docs.google.com/spreadsheets/d/YOUR_COPY_ID/edit#gid=0

⚙️ Environment Variables

Backend Environment Variables

Create /backend/.env (based on backend/env.example):

PORT=4000
MONGODB_URI=mongodb://localhost:27017/spreadsheet_parser
GOOGLE_SERVICE_ACCOUNT_EMAIL=your-service-account@project.iam.gserviceaccount.com
GOOGLE_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
GOOGLE_SHEETS_SCOPES=https://www.googleapis.com/auth/spreadsheets.readonly
ALLOWED_ORIGIN=http://localhost:5173

Service account: In Google Cloud console create credentials → Service Account → add Sheets API role (or basic). Download JSON key, copy client_email and private_key. Store the private key exactly as shown above (escaped newlines).

Frontend Environment Variables (Optional)

Create /frontend/.env:

VITE_API_BASE_URL=http://localhost:4000

Note: In production, the frontend uses relative URLs (same origin), so this is only needed for local development with separate servers.

🚀 Installation & Running

Local Development

1. Backend

cd backend
npm install
npm run dev

Server runs on http://localhost:4000.

2. Frontend

cd frontend
npm install
npm run dev

Vite dev server runs on http://localhost:5173.

Production Build

# Install all dependencies
npm run install:all

# Build frontend
npm run build

# Start backend (serves both API and frontend)
npm start

📖 How to Use

Step 1: Access the Application 🌐

Open your web browser
Navigate to: https://spreadsheet-parser-production.up.railway.app

Step 2: Prepare Your Google Sheet 📊

Create a Google Spreadsheet or use an existing one
Ensure the first row contains column headers
Fill in your data rows below the headers
Share the spreadsheet with your Google Service Account email (set in environment variables) as a Viewer

Step 3: Ingest a Spreadsheet 📥

On the homepage, you'll see an input field for the Google Sheet URL
Copy the Google Spreadsheet URL (format: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=0)
Paste the URL into the input field
Click the "Ingest Spreadsheet" button
Wait for the processing to complete

Step 4: View Results ✅

After ingestion, you'll see:

Summary Card showing:
- Total rows read
- Rows successfully inserted
- Rows skipped (if any)
- Skip reasons (if applicable)
Dataset List showing all ingested spreadsheets

Step 5: Manage Datasets 📋

For each dataset, you can:

👁️ View - Click to see the full dataset with all rows and columns
📥 Download CSV - Export the dataset as a CSV file
🗑️ Delete - Remove the dataset from the database

Step 6: View Dataset Details 🔍

Click on any dataset in the list
View the complete data in a table format
See headers and all parsed rows
Navigate back to the dataset list using the back button

🔄 Workflow

📄 Open the React UI.
📋 Paste your copied Google Sheet URL and click Ingest Spreadsheet.
⚙️ Backend fetches the first worksheet, validates headers, coerces values (numbers/dates), skips empty rows, logs issues, and stores everything in MongoDB.
✅ The UI refreshes showing:
- Summary of the ingest (rows read/inserted/skipped, warnings/logs).
- Table of datasets with view / CSV download / delete actions.
- Dataset viewer renders rows as a table.

🔌 API Endpoints

Method	Endpoint	Description
`GET`	`/health`	Health check endpoint
`POST`	`/api/spreadsheets`	Ingest spreadsheet by URL
`GET`	`/api/spreadsheets`	List stored datasets
`GET`	`/api/spreadsheets/:id`	Retrieve dataset (headers + rows + summary)
`GET`	`/api/spreadsheets/:id/csv`	Download dataset as CSV
`DELETE`	`/api/spreadsheets/:id`	Delete dataset

✅ Testing Checklist

✅ Start MongoDB (mongod) locally or ensure Atlas cluster reachable.
✅ Run backend & frontend dev servers.
✅ In UI, ingest your sheet URL.
✅ Confirm:
- Summary shows expected counts/logs.
- Dataset list refreshes with new entry.
- Viewer renders rows with correct headers.
- CSV download returns well-formed file.
- Delete action removes dataset.

🐛 Troubleshooting

Authentication Errors 🔐

Ensure the service account email has Viewer access to your copied sheet
Verify that GOOGLE_PRIVATE_KEY retains newline escapes (\\n in env file)
Check that Google Sheets API is enabled in your Google Cloud project

Data Issues 📊

Duplicate headers / empty header: Fix the sheet's first row, then re-run ingest
Missing data: Check that rows aren't completely empty (they'll be skipped)

Connection Issues 🔌

CORS issues: Set ALLOWED_ORIGIN to your frontend origin (e.g., http://localhost:5173)
Mongo connection failures:
- Verify the MongoDB URI
- Ensure MongoDB is running (for local) or Atlas cluster is accessible
- Whitelist IP addresses in MongoDB Atlas Network Access settings
- For Railway deployment, use 0.0.0.0/0 to allow all IPs

Deployment Issues 🚀

Environment variables not working: Ensure variables are set at the service level in Railway, not project level
Frontend not loading: Check that npm run build completed successfully and frontend/dist exists
Port issues: Railway automatically sets PORT environment variable

📦 Project Structure

spreadsheet-parser/
├── backend/
│   ├── src/
│   │   ├── controllers/     # Request handlers
│   │   ├── infrastructure/  # Database connection
│   │   ├── models/          # Mongoose models
│   │   ├── routes/          # Express routes
│   │   ├── services/        # Business logic (Google Sheets, parser)
│   │   └── index.js          # Express app entry point
│   └── package.json
├── frontend/
│   ├── src/
│   │   ├── App.jsx           # Main React component
│   │   ├── main.jsx          # React entry point
│   │   └── styles.css        # Application styles
│   ├── index.html
│   └── package.json
├── package.json              # Root package.json (monorepo)
├── railway.toml              # Railway deployment config
└── README.md

📝 License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2025 Reyden Jenn Cagata

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Support

For issues, questions, or contributions, please open an issue on the repository.

Made with ❤️ using Node.js, Express, React, and MongoDB

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
backend		backend
frontend		frontend
img		img
README.md		README.md
license		license
package.json		package.json
railway.toml		railway.toml
start.sh		start.sh

License

reyden142/spreadsheet-parser

Folders and files

Latest commit

History

Repository files navigation