goyalayush-tech/script-bridge-forge


NeoGrammy - Bridging Ancient Wisdom with Modern AI

A unified auxiliary script system for India that preserves the phonetic fidelity and cultural heritage of 50+ Indian languages while embracing the power of modern AI technology.

🌟 Features

  • Advanced Script Converter: AI-powered transliteration between NeoGrammy and 50+ Indian languages with 95%+ phonetic accuracy
  • Corpus Explorer: Interactive database of ancient and modern texts with parallel views
  • AI Assistant: Smart suggestions for transliteration help and morphological analysis
  • Community Feedback: Collaborative system for continuous improvement
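  • Unicode-Ready: Private-use-style block mapping (U+16000–U+160FF) for NeoGrammy characters; note that this range lies in Plane 1, outside Unicode's designated Private Use Areas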
  • Speech Integration: Text-to-speech and speech-to-text capabilities
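
To make the code point mapping concrete, here is a minimal Python sketch. It assumes, purely for illustration, that NeoGrammy glyphs are numbered from 0 and assigned sequentially within U+16000–U+160FF; the function names and the sequential numbering are hypothetical and not taken from the project.

```python
# Hypothetical sketch: map a zero-based glyph index into the range
# U+16000-U+160FF named in the feature list. Sequential assignment is an
# assumption made for illustration only.

BLOCK_START = 0x16000
BLOCK_END = 0x160FF  # inclusive, so the block holds 256 slots

# The three Private Use Areas defined by the Unicode Standard.
UNICODE_PUA_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def glyph_to_char(index: int) -> str:
    """Return the character for a zero-based glyph index in the block."""
    code_point = BLOCK_START + index
    if index < 0 or code_point > BLOCK_END:
        raise ValueError(f"glyph index {index} is outside the 256-slot block")
    return chr(code_point)


def char_to_glyph(char: str) -> int:
    """Inverse of glyph_to_char: recover the glyph index from a character."""
    code_point = ord(char)
    if not BLOCK_START <= code_point <= BLOCK_END:
        raise ValueError(f"U+{code_point:04X} is not in the NeoGrammy block")
    return code_point - BLOCK_START


def in_unicode_pua(code_point: int) -> bool:
    """True if the code point falls in a standard Unicode Private Use Area."""
    return any(lo <= code_point <= hi for lo, hi in UNICODE_PUA_RANGES)
```

A check such as `in_unicode_pua(BLOCK_START)` is useful when choosing fonts and input methods, because software may treat code points outside the designated Private Use Areas differently from true private-use characters.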

🚀 Quick Start

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Deploy to GitHub Pages
npm run deploy:github-pages

# Preview production build locally
npm run preview

🚀 Deployment

Quick Deploy Options

  1. GitHub Pages (Recommended)

    npm run deploy:github-pages

    Your site will be available at: https://goyalayush-tech.github.io/script-bridge-forge/

  2. Manual Deployment

    npm run build
    # Upload the 'dist/' folder to your web server

  3. Automated CI/CD

    • GitHub Actions workflow automatically deploys on main branch pushes
    • Staging environment available via develop branch
    • Production deployment via manual approval

Production Checklist

  • Build optimization complete
  • Static asset optimization
  • SEO and meta tags configured
  • Error boundaries implemented
  • Performance monitoring ready
  • Analytics integration prepared

📚 Research & Datasets

NeoGrammy leverages a comprehensive ecosystem of Indian NLP datasets. See our detailed research documentation:

  • Research & Datasets Guide - Complete overview of datasets powering NeoGrammy
  • Text Corpora: IndicCorp, Samanantar, BPCC, ILCI
  • Speech Datasets: Shrutilipi, IndicVoices, Kathbath
  • Task-Specific: IndicXlit, Dakshina, Naamapadam

🏗️ Architecture

Front-End

  • React + Vite: Modern, fast development experience
  • TailwindCSS + shadcn/ui: Beautiful, accessible UI components
  • Framer Motion: Smooth animations and transitions

Back-End (Planned)

  • FastAPI (Python): High-performance REST & WebSocket APIs
  • PostgreSQL/Neo4j: Advanced database for linguistic data
  • AI Models: IndicXlit, morphological analyzers, semantic bridges

Key Technologies

  • IndicXlit: 11M-parameter Transformer for transliteration
  • Three-Tier Pipeline: Phonological → Morphological → Semantic processing
  • Unicode Integration: Custom font with ligatures for conjuncts
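
The three-tier pipeline can be pictured as three functions applied in order to a shared record. The sketch below is only a structural illustration: the `Analysis` type, the stage functions, and their placeholder logic are all hypothetical, and a real implementation would call trained models rather than string operations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Analysis:
    """Carries the input text plus annotations added by each tier."""
    text: str
    phonemes: List[str] = field(default_factory=list)
    morphemes: List[str] = field(default_factory=list)
    gloss: str = ""


Stage = Callable[[Analysis], Analysis]


def phonological(a: Analysis) -> Analysis:
    # Placeholder: treat each non-space character as one "phoneme".
    a.phonemes = [ch for ch in a.text if not ch.isspace()]
    return a


def morphological(a: Analysis) -> Analysis:
    # Placeholder: treat whitespace-separated tokens as morphemes.
    a.morphemes = a.text.split()
    return a


def semantic(a: Analysis) -> Analysis:
    # Placeholder: a real tier would attach meaning; here we only count tokens.
    a.gloss = f"{len(a.morphemes)} token(s)"
    return a


# Order mirrors Phonological -> Morphological -> Semantic.
PIPELINE: List[Stage] = [phonological, morphological, semantic]


def run_pipeline(text: str) -> Analysis:
    """Run every stage in order, passing the same record along."""
    result = Analysis(text=text)
    for stage in PIPELINE:
        result = stage(result)
    return result
```

Keeping each tier as a plain function over one record means a tier can be swapped out (for example, replacing the placeholder with a model call) without touching the others.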

🎯 Supported Languages

Scheduled Languages (22)

  • Indo-Aryan: Hindi, Bengali, Marathi, Gujarati, Punjabi, Urdu, Assamese, Odia, Kashmiri, Konkani, Maithili, Nepali, Sindhi, Dogri, Sanskrit
  • Dravidian: Tamil, Telugu, Kannada, Malayalam
  • Austro-Asiatic: Santali
  • Tibeto-Burman: Manipuri, Bodo

Historical & Classical

  • Sanskrit, Vedic Sanskrit, Pali, Prakrit, Apabhraṃśa, Gandhari, Classical Tamil

Regional & Tribal

  • Bhojpuri, Magahi, Awadhi, Tulu, Gondi, Ho, Mundari, Khasi, Sora, and more

📊 Performance Metrics

  • Phonetic Preservation: ≥95% accuracy
  • Semantic Preservation: ≥90% accuracy
  • Morphological Accuracy: ≥85%
  • Accent/Prosody Fidelity: ≥80%
  • Model Size: ≤500MB
  • Inference Latency: ≤100ms per sentence
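
The thresholds above lend themselves to an automated check. The following sketch assumes evaluation results arrive as a plain dictionary; the metric key names and the `failed_metrics` helper are hypothetical, while the threshold values are copied from the list.

```python
# Thresholds copied from the metrics list; accuracies are fractions,
# model size is in MB, latency is in milliseconds per sentence.
MIN_THRESHOLDS = {
    "phonetic_preservation": 0.95,
    "semantic_preservation": 0.90,
    "morphological_accuracy": 0.85,
    "prosody_fidelity": 0.80,
}
MAX_THRESHOLDS = {
    "model_size_mb": 500,
    "latency_ms": 100,
}


def failed_metrics(measured: dict) -> list:
    """Return names of metrics that are missing or miss their threshold."""
    failures = []
    for name, minimum in MIN_THRESHOLDS.items():
        # A missing metric counts as a failure.
        if measured.get(name, float("-inf")) < minimum:
            failures.append(name)
    for name, maximum in MAX_THRESHOLDS.items():
        if measured.get(name, float("inf")) > maximum:
            failures.append(name)
    return failures
```

An empty return value means every threshold is met, so a check like this could gate a release in a test suite.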

🤝 Contributing

We welcome contributions from linguists, developers, and researchers! See our Contributing Guide for details.

Areas of Interest

  • Dataset expansion and annotation
  • Model training and fine-tuning
  • UI/UX improvements
  • Documentation and localization
  • Research partnerships

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • AI4Bharat & Bhashini for foundational datasets and models
  • Mozilla Common Voice for speech data contributions
  • Academic Partners for historical corpus digitization
  • Open Source Community for collaborative development

📞 Contact


Bridging 5,000 years of linguistic heritage with cutting-edge AI technology
