The Multilingual AI Annotation Platform for Low-Resource Languages
LinguaLabel is an annotation platform and marketplace that connects AI companies with native speakers of underserved languages. It is built for the roughly 3 billion people whose languages lack adequate NLP tools.
- Only ~20 of the world's 7,000+ languages have adequate NLP tools
- AI companies spend $1B+ annually on annotation but can't access rare language speakers
- Scale AI, Mercor, and Surge AI focus on English-first markets
- Purpose-built annotation tools for multilingual NLP (RTL support, complex scripts, audio)
- Native speaker marketplace with diaspora and in-country recruitment
- Quality-first approach with multi-annotator consensus and expert review
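The multi-annotator consensus in the last bullet could work roughly like this. It is a minimal sketch that assumes each item is labeled by several annotators and that items without a clear majority go to expert review. The `resolve` function, the `ConsensusResult` type, and the 2/3 agreement threshold are hypothetical and do not describe LinguaLabel's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsensusResult:
    label: Optional[str]  # winning label, or None when the item is escalated
    agreement: float      # share of annotators who chose the most common label
    needs_review: bool    # True when the item should go to expert review


def resolve(labels: list[str], min_agreement: float = 2 / 3) -> ConsensusResult:
    """Majority vote over one item's labels; escalate ties and low agreement."""
    if not labels:
        return ConsensusResult(label=None, agreement=0.0, needs_review=True)
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    # A tie for first place means there is no clear majority label.
    tied = sum(1 for count in counts.values() if count == top_count) > 1
    if tied or agreement < min_agreement:
        return ConsensusResult(label=None, agreement=agreement, needs_review=True)
    return ConsensusResult(label=top_label, agreement=agreement, needs_review=False)


print(resolve(["POS", "POS", "NEG"]))
print(resolve(["POS", "NEG", "NEU"]))
```

Escalating ties explicitly keeps the result independent of the order in which annotations arrive.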
| Region | Languages |
|---|---|
| Indian | Hindi, Bengali |
| African | Swahili, Yoruba |
| Arabic | Egyptian Arabic, Gulf Arabic |
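Egyptian and Gulf Arabic are written right-to-left, which is why RTL support is listed among the tool features. An annotation UI might pick the text direction for each segment from its first strongly directional character. The sketch below does this with Python's standard `unicodedata` module. The `base_direction` helper is hypothetical, and it is a simplified form of the first-strong rule in Unicode UAX #9 that ignores isolates and embeddings.

```python
import unicodedata

# Bidi classes of strong right-to-left characters: Hebrew-type (R) and Arabic-type (AL).
RTL_CLASSES = {"R", "AL"}


def base_direction(text: str, default: str = "ltr") -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    # Digits, punctuation, and whitespace are not strong; fall back to the default.
    return default


for sample in ["مرحبا", "Habari za asubuhi", "123 مرحبا", "2024"]:
    print(sample, base_direction(sample))
```

The result can be used as the `dir` attribute of the element that renders the segment.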
- Frontend: Next.js 15 (TypeScript, Tailwind CSS)
- Backend: Python 3.10+ (FastAPI, SQLAlchemy)
- Database: PostgreSQL
- Annotation: Label Studio SDK integration
- Payments: Stripe Connect for annotator payouts
- Infrastructure: Vercel (frontend) + Railway (backend)
LinguaLabel/
├── frontend/ # Next.js web application
│ ├── src/app/ # App router pages
│ └── src/lib/ # API client and utilities
├── backend/ # FastAPI server
│ ├── app/
│ │ ├── core/ # Config, database, security
│ │ ├── models/ # SQLAlchemy models
│ │ ├── routers/ # API endpoints
│ │ ├── schemas/ # Pydantic schemas
│ │ └── services/ # External integrations
│ └── alembic/ # Database migrations
├── annotation/ # Label Studio customizations
├── docs/ # Documentation
└── scripts/ # Deployment and utility scripts
- Node.js 18+
- Python 3.10+
- PostgreSQL 15+
- Docker (optional, for Label Studio)
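Before running the setup steps, you can check the prerequisites with a small script. This is a hypothetical helper that is not part of the repository. It assumes `node` and `psql` are on your PATH. Note that `psql --version` reports the client version, which can differ from the PostgreSQL server you actually connect to.

```python
import re
import shutil
import subprocess
import sys

# Minimum major versions taken from the prerequisites list.
CHECKS = {
    "Node.js": (["node", "--version"], (18,)),
    "PostgreSQL client": (["psql", "--version"], (15,)),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted number out of output like 'v18.19.0' or 'psql (PostgreSQL) 15.4'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    # Compare only as many components as the minimum specifies.
    return parse_version(text)[: len(minimum)] >= minimum


def main() -> None:
    print(f"Python {sys.version.split()[0]}: ok={sys.version_info >= (3, 10)}")
    for name, (command, minimum) in CHECKS.items():
        if shutil.which(command[0]) is None:
            print(f"{name}: not found on PATH")
            continue
        output = subprocess.run(command, capture_output=True, text=True, check=False).stdout.strip()
        try:
            print(f"{name} {output}: ok={meets_minimum(output, minimum)}")
        except ValueError:
            print(f"{name}: could not parse {output!r}")


if __name__ == "__main__":
    main()
```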
# Clone the repository
git clone https://github.com/rohanbsher/LinguaLabel.git
cd LinguaLabel
# Frontend setup
cd frontend
npm install
npm run dev
# Runs on http://localhost:3000
# Backend setup (in another terminal)
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your database URL and API keys
# Run database migrations
alembic upgrade head
# Start the server
uvicorn main:app --reload --port 8000
# API docs at http://localhost:8000/docs
# Label Studio (optional, in another terminal)
docker run -d -p 8080:8080 -v labelstudio-data:/label-studio/data heartexlabs/label-studio:latest
# Runs on http://localhost:8080

Backend (.env):
# Database
DATABASE_URL=postgresql://user:password@localhost/lingualabel
# Security
SECRET_KEY=your-secret-key-min-32-chars
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30
# CORS
CORS_ORIGINS=["http://localhost:3000"]
# Label Studio (optional)
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your-api-key
# Stripe (optional)
STRIPE_SECRET_KEY=sk_test_...
STRIPE_PUBLISHABLE_KEY=pk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

Frontend (.env.local):
NEXT_PUBLIC_API_URL=http://localhost:8000

- Connect your GitHub repository to Vercel
- Set the root directory to frontend/
- Add the environment variable NEXT_PUBLIC_API_URL, set to your Railway backend URL
# Or deploy via CLI
cd frontend
npx vercel --prod

- Create a new Railway project
- Add PostgreSQL from the Railway marketplace
- Connect your GitHub repository
- Set the root directory to backend/
- Add environment variables:
  - DATABASE_URL (auto-filled from the PostgreSQL add-on)
  - SECRET_KEY (generate a secure random string)
  - CORS_ORIGINS=["https://your-app.vercel.app"]
  - LABEL_STUDIO_URL (optional)
  - STRIPE_SECRET_KEY (optional)
The railway.json config will automatically run migrations on deploy.
For production Label Studio, you can:
- Deploy on Railway using Docker
- Use Label Studio Cloud (https://app.heartex.com/)
- Self-host on any Docker-capable platform
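Whichever hosting option you choose, the backend reaches Label Studio through LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY. Below is a minimal sketch of the kind of call a sync step might make: importing tasks through Label Studio's REST endpoint `POST /api/projects/{id}/import`. `build_import_request` is hypothetical and is not the code in backend/app/services/. The `text` key inside `data` is an assumption and must match the variable used in the project's labeling config. Legacy Label Studio API tokens use the `Token` scheme shown here; check your version's API documentation if you use personal access tokens instead.

```python
import json
import urllib.request


def build_import_request(base_url: str, api_key: str, project_id: int,
                         texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a Label Studio task-import request."""
    # Each task wraps its payload in "data"; "text" must match the labeling config ($text).
    tasks = [{"data": {"text": text}} for text in texts]
    body = json.dumps(tasks, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/projects/{project_id}/import",
        data=body,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_import_request("http://localhost:8080", "your-api-key", 1,
                           ["Habari za asubuhi", "مرحبا"])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Label Studio instance.
```

Encoding with `ensure_ascii=False` keeps non-Latin scripts readable in the request body. Either form is valid JSON.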
- POST /api/auth/register - Register a new user
- POST /api/auth/login - Log in and get an access token
- GET /api/auth/me - Get current user info
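The login endpoint returns an access token, and the example .env pairs ALGORITHM=HS256 with ACCESS_TOKEN_EXPIRE_MINUTES=30. That combination suggests an HS256-signed JWT that expires after 30 minutes. The standard-library sketch below shows what signing and verifying such a token involves: base64url segments and an HMAC-SHA256 signature, following the RFC 7519 structure. The function names and the `sub`/`iat`/`exp` claim layout are assumptions, and the backend most likely uses a JWT library rather than hand-written code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Optional


def _b64url_encode(raw: bytes) -> str:
    # JWT segments are base64url without "=" padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that _b64url_encode stripped.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret_key: str) -> bytes:
    return hmac.new(secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_access_token(subject: str, secret_key: str, expire_minutes: int = 30,
                        now: Optional[float] = None) -> str:
    """Issue a token whose exp claim is expire_minutes after issuance."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": issued_at, "exp": issued_at + expire_minutes * 60}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret_key))}"


def decode_access_token(token: str, secret_key: str, now: Optional[float] = None) -> dict[str, Any]:
    """Verify signature, algorithm, and expiry; return the claims or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(_sign(signing_input, secret_key), _b64url_decode(signature)):
        raise ValueError("invalid signature")
    header_segment, _, claims_segment = signing_input.partition(".")
    if json.loads(_b64url_decode(header_segment)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_segment))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```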
- GET /api/projects - List projects
- POST /api/projects - Create a new project
- GET /api/projects/{id} - Get project details
- POST /api/projects/{id}/tasks - Add tasks to project
- POST /api/projects/{id}/sync - Sync with Label Studio
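Calling the project endpoints from a script might look like the sketch below. It assumes the token from POST /api/auth/login is sent with the Bearer scheme, which is common for FastAPI apps but not confirmed here. `api_request` is hypothetical, and so is the task payload shape; the real request schemas live in backend/app/schemas/.

```python
import json
import urllib.request
from typing import Any, Optional


def api_request(base_url: str, token: str, method: str, path: str,
                body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the LinguaLabel API."""
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    data = None
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{base_url.rstrip('/')}{path}", data=data,
                                  headers=headers, method=method)


# Hypothetical payload for adding a Yoruba task to project 1.
req = api_request("http://localhost:8000", "ACCESS_TOKEN", "POST", "/api/projects/1/tasks",
                  body={"tasks": [{"text": "Ẹ kú àárọ̀"}]})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running backend.
```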
- GET /api/payments/status - Get Stripe Connect status
- POST /api/payments/connect/onboard - Start Connect onboarding
- GET /api/payments/earnings - Get earnings summary
- POST /api/payments/withdraw - Request withdrawal
- GET /api/languages - List supported languages
- GET /api/stats - Platform statistics
- Phase 1-7: Core platform (auth, dashboard, projects)
- Phase 8: Label Studio integration
- Phase 9: Stripe Connect payments
- Phase 10: Deployment configuration
- Take rate: 25-35% on annotator payments
- Target customers: AI labs, translation companies, research institutions
- Year 1 goal: $700K ARR, 800 annotators, 30 languages
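Here is a worked example of the take-rate range. It assumes the rate is deducted from the gross amount a customer pays for annotation work and that money is stored as integer cents. Stripe's API also expresses amounts in the smallest currency unit. `split_payment` is hypothetical. If the rate were instead a markup on top of the annotator's payout, the arithmetic would differ.

```python
from decimal import ROUND_HALF_UP, Decimal


def split_payment(gross_cents: int, take_rate: Decimal) -> tuple[int, int]:
    """Split a customer payment in cents into (platform_fee, annotator_payout)."""
    if not Decimal("0") <= take_rate <= Decimal("1"):
        raise ValueError("take_rate must be between 0 and 1")
    # Round the fee to a whole cent; the payout gets the remainder, so nothing is lost.
    fee = int((Decimal(gross_cents) * take_rate).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return fee, gross_cents - fee


for rate in ("0.25", "0.30", "0.35"):
    print(rate, split_payment(10_000, Decimal(rate)))
```

At a 30% take rate, a $100.00 job leaves $70.00 for annotators and $30.00 for the platform.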
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License
Built with passion for multilingual AI.
Repository: https://github.com/rohanbsher/LinguaLabel
Serving the 3 billion people whose languages are underserved by AI.