kritsanan1/Python-Data-Science


Isan Traditional Music Dataset

AI Recognition System for Phin and Khaen Playing Styles

🎵 Overview

This project provides a structured, ethically compliant dataset and ML pipeline for training AI models to recognize and analyze the playing styles of two traditional Isan instruments:

  • Phin (พิณ): A plucked lute from Northeastern Thailand, usually with two or three strings
  • Khaen (แคน): A bamboo mouth organ, the national instrument of Laos

🎯 Key Features

  • Ethical Compliance: Full consent, cultural attribution, respectful data handling
  • 🎼 Comprehensive Feature Extraction: MFCCs, chroma, spectral, temporal, and pitch features
  • 🏗️ Structured Dataset: Organized by instrument and playing technique
  • 🤖 ML Pipeline: Complete training and evaluation workflow
  • 📊 Detailed Metadata: Cultural context, performer info, technical specifications
  • 📚 Documentation: Full reproducibility and transparency

📁 Project Structure

.
├── dataset_manager.py       # Dataset creation and feature extraction
├── model_training.py        # ML training pipeline
├── main.py                  # Demo and visualization
├── dataset/                 # Main dataset directory
│   ├── phin/               # Phin recordings
│   │   ├── fast_plucking/
│   │   ├── legato/
│   │   ├── vibrato/
│   │   └── harmonic_plucking/
│   ├── khaen/              # Khaen recordings
│   │   ├── single_note/
│   │   ├── harmonic_overblow/
│   │   ├── drone/
│   │   └── melodic_phrase/
│   ├── processed_features/  # Extracted features (.npz)
│   ├── annotations/         # Transcriptions
│   └── metadata/           # Documentation
└── models/                 # Trained models
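
The layout above can be reproduced with a few lines of standard-library Python. This is only a sketch: `create_layout` is a hypothetical helper, not a function from `dataset_manager.py`, and the repository's `IsanInstrumentDataset` class may well set up these folders on its own.

```python
from pathlib import Path

# Technique folders exactly as listed in the project tree.
TECHNIQUES = {
    "phin": ["fast_plucking", "legato", "vibrato", "harmonic_plucking"],
    "khaen": ["single_note", "harmonic_overblow", "drone", "melodic_phrase"],
}
SHARED_DIRS = ["processed_features", "annotations", "metadata"]


def create_layout(root):
    """Create the dataset tree under `root` and return the directories (hypothetical helper)."""
    root = Path(root)
    created = []
    for instrument, techniques in TECHNIQUES.items():
        for technique in techniques:
            created.append(root / instrument / technique)
    created.extend(root / name for name in SHARED_DIRS)
    for path in created:
        # exist_ok=True makes the call safe to repeat on an existing dataset.
        path.mkdir(parents=True, exist_ok=True)
    return created
```

Calling `create_layout("./dataset")` a second time leaves existing recordings untouched, so the same helper could also be used after adding a new technique name to `TECHNIQUES`.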

🚀 Quick Start

1. Initialize Dataset Structure

from dataset_manager import IsanInstrumentDataset

# Create dataset infrastructure
dataset = IsanInstrumentDataset("./dataset")
dataset.generate_dataset_description()

2. Add Recordings with Metadata

# Create metadata (reuses the `dataset` object from step 1)
metadata = dataset.create_metadata_entry(
    instrument="phin",
    technique="fast_plucking",
    performer="Master Somchai Isan",
    recording_date="2025-11-20",
    location="Sakon Nakhon",
    key="Isan pentatonic scale",
    tempo="♩=80",
    cultural_context="Lam Vang folk song"
)

# Process audio file
features, feature_vector = dataset.process_recording(
    audio_path="path/to/recording.wav",
    metadata=metadata,
    save_features=True
)

3. Train ML Model

python model_training.py

4. Run Demo

python main.py

🎼 Supported Techniques

Phin (พิณ):

  • Fast Plucking: Rapid successive plucks
  • Legato: Smooth, connected notes
  • Vibrato: Pitch modulation
  • Harmonic Plucking: Overtone emphasis

Khaen (แคน):

  • Single Note: Individual note playing
  • Harmonic Overblow: Harmonic overtones
  • Drone: Sustained notes
  • Melodic Phrase: Melodic sequences

📊 Audio Specifications

  • Sample Rate: 44.1 kHz
  • Bit Depth: 24-bit
  • Format: WAV (uncompressed)
  • Contexts: Solo and ensemble performances
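
A recording can be checked against these specifications before it is added. The sketch below uses Python's standard `wave` module; `check_wav_spec` is a hypothetical helper and is not part of this repository. Note that older Python versions may reject WAV files written with an extensible header, which some 24-bit recorders produce.

```python
import wave

EXPECTED_RATE = 44_100      # 44.1 kHz
EXPECTED_SAMPLE_WIDTH = 3   # 24-bit audio = 3 bytes per sample


def check_wav_spec(path):
    """Return a list of problems; an empty list means the file matches the spec."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"bit depth is {8 * width}-bit, expected 24-bit")
    return problems
```

Returning a list of messages rather than raising on the first mismatch lets a batch import report every problem with a file at once.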

🔬 Feature Extraction

Each recording is analyzed for:

  • MFCCs (13 coefficients): Timbral characteristics
  • Chroma (12 bins): Pitch class distribution
  • Spectral Features: Centroid, rolloff, bandwidth
  • Temporal: Zero-crossing rate, RMS energy
  • Onset Detection: Plucking/breath dynamics
  • Pitch Tracking: Melodic contour
  • Tempo: Beat tracking
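
To make the two temporal features concrete, here is a dependency-free sketch of zero-crossing rate and RMS energy for a single frame of samples. It is illustrative only and is not the code in `dataset_manager.py`; librosa offers `librosa.feature.zero_crossing_rate` and `librosa.feature.rms` for the same quantities, with framing and normalization that may differ slightly from this version.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))
```

A frame that alternates between `1.0` and `-1.0` has a zero-crossing rate of 1.0 and an RMS of 1.0, while a constant frame of `0.5` has a rate of 0.0 and an RMS of 0.5. Noisy pluck attacks tend to raise the first value; sustained drones tend to keep it low.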

🤝 Ethical Guidelines

  1. Consent: All recordings require explicit informed consent
  2. Attribution: Performers and communities must be credited
  3. Cultural Respect: Document cultural origins and context
  4. Privacy: Protect performer identity when requested
  5. Usage: Research and education purposes only
  6. Compensation: Fair compensation for performers
  7. Representation: Include diverse regional styles
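
Some of these guidelines can be enforced mechanically before a recording enters the dataset. The sketch below assumes metadata is a plain dictionary; the `consent_documented` and `anonymize` keys and both helper functions are hypothetical and do not appear in the `create_metadata_entry` call shown in this README.

```python
# Fields taken from the metadata example in the Quick Start section.
REQUIRED_FIELDS = (
    "instrument",
    "technique",
    "performer",
    "recording_date",
    "cultural_context",
)


def ethics_problems(metadata):
    """Return reasons the entry should not be ingested; empty means no objection."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not metadata.get(name)
    ]
    # `consent_documented` is a hypothetical flag, not a field defined by this project.
    if metadata.get("consent_documented") is not True:
        problems.append("no documented informed consent")
    return problems


def public_view(metadata):
    """Copy of the metadata with the performer masked if anonymity was requested."""
    view = dict(metadata)
    # `anonymize` is likewise a hypothetical flag.
    if view.get("anonymize"):
        view["performer"] = "Anonymous performer"
    return view
```

A check like this covers only consent, attribution fields, and privacy. Compensation, representation, and permitted usage still need human review.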

📈 Evaluation Metrics

  • Accuracy: Overall classification performance
  • F1-Score: Especially for rare techniques
  • Cross-Cultural: Generalization across regions
  • Expert Validation: Feedback from traditional musicians
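
The reason F1 matters for rare techniques is easiest to see with numbers. The following sketch computes per-class and macro-averaged F1 in plain Python; it is an illustration, not the evaluation code in `model_training.py`. With scikit-learn, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the macro average.

```python
def per_class_f1(y_true, y_pred):
    """Map each label to its F1 score, computed as 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Suppose three clips are `legato`, one is `vibrato`, and a model predicts `legato` for all four. Accuracy is 0.75, yet the F1 for `vibrato` is 0.0 and the macro F1 is only 3/7 (about 0.43), which exposes the missed rare class.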

🛠️ Dependencies

# Install required packages (managed by Replit)
librosa>=0.10.0
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
scikit-learn>=1.3.0

📚 Citation & Usage

When using this dataset:

  1. Credit the performers and communities
  2. Acknowledge the cultural heritage
  3. Reference the Isan/Lao cultural context
  4. Use responsibly for research and education
  5. Share improvements back to the community

🌏 Cultural Context

This dataset represents musical traditions from:

  • Isan Region (Northeastern Thailand)
  • Lao PDR (Laos)
  • Various regional styles and variations

📖 References

  • Thai National Radio Archives
  • Isan Music Preservation Society
  • Traditional Lao Music Documentation Project

🤔 FAQ

Q: How do I add my own recordings?
A: Use dataset.process_recording() with proper metadata and consent documentation.

Q: Can I contribute to the dataset?
A: Yes! Ensure ethical guidelines are followed and provide complete metadata.

Q: What if I need to classify a new technique?
A: Add the technique to the dataset structure and retrain the model.

📞 Support

Before asking a question or submitting a contribution:

  • Ensure all ethical guidelines are followed
  • Provide complete metadata
  • Include consent documentation

Dataset created for AI research in traditional music analysis
Respecting cultural heritage and intellectual property rights 🎶

About

Good starting point for working on data science projects. Contains numpy, pandas, matplotlib, tensorflow and more.
