# Waveform Audio and Specialized Formats
- Audio files (WAV), ARFF, NetCDF, HDF5
- Real examples: Audio processing, Scientific data, Multi-format export

In [1]:
import numpy as np
from scipy import io
from scipy.io import wavfile
import tempfile
import os
print('Specialized I/O module loaded')

Specialized I/O module loaded


## WAV Audio Files

**Purpose**: Read/write uncompressed audio
**Format**: PCM WAV (Waveform Audio File Format)
**Functions**:
- `wavfile.read()`: Load WAV file
- `wavfile.write()`: Save WAV file

In [2]:
print('WAV Audio File I/O\n')

# Generate test audio signal
rate = 44100  # Sample rate (Hz)
duration = 2  # seconds
freq = 440  # A4 note (Hz)

t = np.linspace(0, duration, int(rate * duration))
audio = np.sin(2 * np.pi * freq * t)

# Scale to 16-bit integer
audio_int = np.int16(audio * 32767)

print(f'Audio signal:')
print(f'  Sample rate: {rate} Hz')
print(f'  Duration: {duration} s')
print(f'  Frequency: {freq} Hz (A4 note)')
print(f'  Samples: {len(audio_int):,}')
print(f'  Data type: {audio_int.dtype}')

WAV Audio File I/O

Audio signal:
  Sample rate: 44100 Hz
  Duration: 2 s
  Frequency: 440 Hz (A4 note)
  Samples: 88,200
  Data type: int16


In [3]:
# Write WAV file
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
    wav_file = f.name

wavfile.write(wav_file, rate, audio_int)
print(f'\nSaved to: {os.path.basename(wav_file)}')
print(f'File size: {os.path.getsize(wav_file):,} bytes')


Saved to: tmpg_4rqpc0.wav
File size: 176,444 bytes


In [4]:
# Read WAV file
rate_loaded, data_loaded = wavfile.read(wav_file)

print(f'\nLoaded WAV file:')
print(f'  Sample rate: {rate_loaded} Hz')
print(f'  Shape: {data_loaded.shape}')
print(f'  Data type: {data_loaded.dtype}')
print(f'  Min value: {data_loaded.min()}')
print(f'  Max value: {data_loaded.max()}')
print(f'  Duration: {len(data_loaded) / rate_loaded:.2f} s')

os.unlink(wav_file)
print('\n✓ WAV file I/O successful')


Loaded WAV file:
  Sample rate: 44100 Hz
  Shape: (88200,)
  Data type: int16
  Min value: -32766
  Max value: 32766
  Duration: 2.00 s

✓ WAV file I/O successful
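
The round trip above used a mono int16 signal. As a minimal sketch, assuming the same `wavfile.write()` / `wavfile.read()` calls, the following also writes a stereo int16 file and a float32 file and checks what comes back. The file names, the 8000 Hz rate, and the 220 Hz second channel are illustrative choices, not part of the original example.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

rate = 8000  # Hz (illustrative; any positive integer rate works)
t = np.arange(rate) / rate  # exactly one second of evenly spaced samples

left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 220 * t)

# Stereo data is a 2-D array shaped (n_samples, n_channels)
stereo_int16 = np.int16(np.column_stack([left, right]) * 32767)

# Float data is written as IEEE float WAV; keep values within [-1.0, 1.0]
mono_float32 = left.astype(np.float32)

with tempfile.TemporaryDirectory() as tmpdir:
    stereo_path = os.path.join(tmpdir, 'stereo.wav')
    float_path = os.path.join(tmpdir, 'float.wav')

    wavfile.write(stereo_path, rate, stereo_int16)
    wavfile.write(float_path, rate, mono_float32)

    stereo_rate, stereo_back = wavfile.read(stereo_path)
    float_rate, float_back = wavfile.read(float_path)

print(stereo_back.shape, stereo_back.dtype)
print(float_back.shape, float_back.dtype)
print(np.array_equal(stereo_back, stereo_int16))
print(np.array_equal(float_back, mono_float32))
```

The dtype of the array passed to `wavfile.write()` determines the sample format in the file, so the int16 scaling step matters only when 16-bit PCM output is wanted.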


## Real Example: Audio Signal Processing

**Scenario**: Load audio, apply filter, save result
**Application**: Audio effects, noise reduction

In [5]:
from scipy import signal

print('Audio Processing Pipeline\n')

# Generate noisy audio
np.random.seed(42)
rate = 44100
t = np.linspace(0, 1, rate)

# Clean signal: 440 Hz sine wave
clean = np.sin(2 * np.pi * 440 * t)

# Add high-frequency noise
noise = 0.1 * np.sin(2 * np.pi * 5000 * t)
noisy = clean + noise

print('Signal components:')
print(f'  Main frequency: 440 Hz')
print(f'  Noise frequency: 5000 Hz')
print(f'  SNR: {20 * np.log10(np.std(clean) / np.std(noise)):.1f} dB\n')

# Design low-pass filter
nyquist = rate / 2
cutoff = 1000  # Hz
order = 4
b, a = signal.butter(order, cutoff / nyquist, btype='low')

# Apply filter
filtered = signal.filtfilt(b, a, noisy)

print('Filtering:')
print(f'  Filter type: Butterworth')
print(f'  Order: {order}')
print(f'  Cutoff: {cutoff} Hz')
print(f'  Result: Noise reduced\n')

# Save original and filtered
for name, data in [('noisy', noisy), ('filtered', filtered)]:
    audio_int = np.int16(data / np.max(np.abs(data)) * 32767)
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        fname = f.name
    wavfile.write(fname, rate, audio_int)
    size = os.path.getsize(fname)
    print(f'{name.capitalize()}: {os.path.basename(fname)} ({size:,} bytes)')
    os.unlink(fname)

print('\n✓ Audio processing complete')

Audio Processing Pipeline

Signal components:
  Main frequency: 440 Hz
  Noise frequency: 5000 Hz
  SNR: 20.0 dB

Filtering:
  Filter type: Butterworth
  Order: 4
  Cutoff: 1000 Hz
  Result: Noise reduced

Noisy: tmp0xak8ors.wav (88,244 bytes)
Filtered: tmpff4ij8l_.wav (88,244 bytes)

✓ Audio processing complete
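
The pipeline reports "Noise reduced" without measuring it. One way to check, sketched here under the assumption that the signal lasts exactly one second (so FFT bins are 1 Hz apart), is to compare the amplitude of the 440 Hz and 5000 Hz components before and after filtering. The helper `tone_amplitude` is a hypothetical name introduced for this illustration.

```python
import numpy as np
from scipy import signal

rate = 44100
t = np.arange(rate) / rate  # one second, so FFT bin k corresponds to k Hz

clean = np.sin(2 * np.pi * 440 * t)
noise = 0.1 * np.sin(2 * np.pi * 5000 * t)
noisy = clean + noise

# Same filter design as in the pipeline: 4th-order Butterworth, 1000 Hz cutoff
b, a = signal.butter(4, 1000 / (rate / 2), btype='low')
filtered = signal.filtfilt(b, a, noisy)


def tone_amplitude(x, freq_hz):
    """Amplitude of the sinusoid at freq_hz (assumes 1 Hz bin spacing)."""
    spectrum = np.fft.rfft(x)
    return 2 * np.abs(spectrum[freq_hz]) / len(x)


for label, x in [('noisy', noisy), ('filtered', filtered)]:
    print(f'{label:>8}: 440 Hz = {tone_amplitude(x, 440):.4f}, '
          f'5000 Hz = {tone_amplitude(x, 5000):.2e}')
```

Because `filtfilt` runs the filter forward and backward, the attenuation at each frequency is applied twice and the output has no phase shift relative to the input.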


## ARFF Format (Weka)

**Purpose**: Attribute-Relation File Format
**Use**: Machine learning datasets (Weka)
**Function**: `io.arff.loadarff()`

In [6]:
print('ARFF Format (Weka)\n')

print('ARFF is used by Weka machine learning toolkit')
print('scipy.io.arff.loadarff() can read ARFF files\n')

print('Typical ARFF file structure:')
print('  @relation dataset_name')
print('  @attribute feature1 numeric')
print('  @attribute feature2 {class1, class2}')
print('  @data')
print('  1.5, class1')
print('  2.3, class2\n')

print('Loading:')
print('  from scipy.io import arff')
print('  data, meta = arff.loadarff("data.arff")')

ARFF Format (Weka)

ARFF is used by Weka machine learning toolkit
scipy.io.arff.loadarff() can read ARFF files

Typical ARFF file structure:
  @relation dataset_name
  @attribute feature1 numeric
  @attribute feature2 {class1, class2}
  @data
  1.5, class1
  2.3, class2

Loading:
  from scipy.io import arff
  data, meta = arff.loadarff("data.arff")
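
The cell above only prints the file structure. As a runnable sketch, `arff.loadarff()` also accepts a file-like object, so the same toy structure can be loaded from an in-memory string. The data values are the illustrative ones printed above, not a real dataset.

```python
from io import StringIO

from scipy.io import arff

# Inline copy of the structure printed above (toy data for illustration)
arff_text = """@relation dataset_name
@attribute feature1 numeric
@attribute feature2 {class1,class2}
@data
1.5,class1
2.3,class2
"""

data, meta = arff.loadarff(StringIO(arff_text))

print(meta.names())      # attribute names in file order
print(meta.types())      # 'numeric' or 'nominal' per attribute
print(data['feature1'])  # numeric column as floats

# Nominal values come back as bytes; decode them for string comparisons
labels = [value.decode() for value in data['feature2']]
print(labels)
```

`data` is a NumPy structured array, so columns are accessed by attribute name rather than by position.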


## NetCDF Format

**Purpose**: Network Common Data Form
**Use**: Climate, oceanography, atmospheric science
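**Library**: Use netCDF4-python (not part of SciPy) for NetCDF 4 files
**Alternative**: `scipy.io.netcdf_file` reads and writes NetCDF 3 (classic) files only; the older `scipy.io.netcdf` module path is deprecated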

In [7]:
print('NetCDF Format\n')

print('NetCDF is standard for scientific array data')
print('Self-describing, portable, scalable\n')

print('Use netCDF4-python library:')
print('  from netCDF4 import Dataset')
print('  nc = Dataset("data.nc", \'r\')')
print('  temperature = nc.variables[\'temp\'][:]\n')

print('Common in:')
print('  - Climate modeling')
print('  - Weather forecasting')
print('  - Oceanography')
print('  - Satellite data')

NetCDF Format

NetCDF is standard for scientific array data
Self-describing, portable, scalable

Use netCDF4-python library:
  from netCDF4 import Dataset
  nc = Dataset("data.nc", 'r')
  temperature = nc.variables['temp'][:]

Common in:
  - Climate modeling
  - Weather forecasting
  - Oceanography
  - Satellite data
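
When netCDF4-python is not available, a NetCDF 3 file can still be written and read with SciPy alone. The following is a minimal sketch using `scipy.io.netcdf_file`; the variable name `temp` follows the snippet above, while the dimension name, values, and units are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.nc')

    # Write: one dimension, one float64 variable with a units attribute
    with netcdf_file(path, 'w') as nc:
        nc.createDimension('time', 4)
        temp = nc.createVariable('temp', 'd', ('time',))
        temp[:] = np.array([14.2, 14.8, 15.1, 14.9])
        temp.units = 'degC'

    # Read: mmap=False loads the data into memory so it outlives the file
    with netcdf_file(path, 'r', mmap=False) as nc:
        temperature = nc.variables['temp'][:].copy()
        units = nc.variables['temp'].units

print(temperature)
print(units)
```

This path does not handle NetCDF 4 files, which are HDF5-based; for those, netCDF4-python remains the tool to use.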


## HDF5 Format

**Purpose**: Hierarchical Data Format
**Use**: Large scientific datasets
**Library**: Use h5py or pytables
**Features**: Compression, chunking, hierarchical

In [8]:
print('HDF5 Format\n')

print('HDF5 is industry standard for large data')
print('Hierarchical structure (like filesystem)\n')

print('Use h5py library:')
print('  import h5py')
print('  with h5py.File("data.h5", \'r\') as f:')
print('      dataset = f[\'group/dataset\'][:]\n')

print('Features:')
print('  - Compression (gzip, lzf)')
print('  - Chunking for large arrays')
print('  - Parallel I/O')
print('  - Metadata storage')
print('\nUsed by: NASA, CERN, genomics, finance')

HDF5 Format

HDF5 is industry standard for large data
Hierarchical structure (like filesystem)

Use h5py library:
  import h5py
  with h5py.File("data.h5", 'r') as f:
      dataset = f['group/dataset'][:]

Features:
  - Compression (gzip, lzf)
  - Chunking for large arrays
  - Parallel I/O
  - Metadata storage

Used by: NASA, CERN, genomics, finance


## Real Example: Multi-Format Data Export

**Scenario**: Export analysis results in multiple formats
**Use case**: Share with different tools/platforms

In [9]:
print('Multi-Format Data Export\n')

# Generate analysis results
np.random.seed(42)
results = {
    'measurements': np.random.randn(100, 5),
    'timestamps': np.arange(100),
    'statistics': {
        'mean': np.array([0.1, -0.2, 0.3, 0.0, 0.1]),
        'std': np.array([1.1, 0.9, 1.2, 1.0, 0.8])
    }
}

print('Analysis results:')
print(f'  Measurements: {results["measurements"].shape}')
print(f'  Timepoints: {len(results["timestamps"])}\n')

# Export to MATLAB
with tempfile.NamedTemporaryFile(suffix='.mat', delete=False) as f:
    mat_file = f.name
io.savemat(mat_file, results)
mat_size = os.path.getsize(mat_file)
print(f'MATLAB format: {os.path.basename(mat_file)}')
print(f'  Size: {mat_size:,} bytes')
print(f'  For: MATLAB users\n')

# Export sparse correlation matrix to Matrix Market
from scipy import sparse  # not imported in the setup cell; needed for csr_matrix

corr_matrix = np.corrcoef(results['measurements'].T)
corr_sparse = sparse.csr_matrix(corr_matrix)
corr_sparse.data[np.abs(corr_sparse.data) < 0.3] = 0  # Sparsify
corr_sparse.eliminate_zeros()

with tempfile.NamedTemporaryFile(suffix='.mtx', delete=False, mode='w') as f:
    mtx_file = f.name
io.mmwrite(mtx_file, corr_sparse)
mtx_size = os.path.getsize(mtx_file)
print(f'Matrix Market: {os.path.basename(mtx_file)}')
print(f'  Size: {mtx_size:,} bytes')
print(f'  For: Sparse matrix tools\n')

# Export summary audio (beep encoding)
beep_freq = 440 + results['statistics']['mean'][0] * 100
rate = 8000
t = np.linspace(0, 0.1, int(rate * 0.1))
beep = np.sin(2 * np.pi * beep_freq * t)
beep_int = np.int16(beep * 32767)

with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
    wav_file = f.name
wavfile.write(wav_file, rate, beep_int)
wav_size = os.path.getsize(wav_file)
print(f'Audio (WAV): {os.path.basename(wav_file)}')
print(f'  Size: {wav_size:,} bytes')
print(f'  For: Audio analysis tools')

print(f'\nTotal size: {(mat_size + mtx_size + wav_size):,} bytes')
print('\n✓ Multi-format export complete')

# Cleanup
os.unlink(mat_file)
os.unlink(mtx_file)
os.unlink(wav_file)

Multi-Format Data Export

Analysis results:
  Measurements: (100, 5)
  Timepoints: 100

MATLAB format: tmpdmvb82qv.mat
  Size: 5,360 bytes
  For: MATLAB users

## Summary

### WAV Audio Files:
```python
from scipy.io import wavfile

# Write
wavfile.write('audio.wav', rate, data)

# Read
rate, data = wavfile.read('audio.wav')
```

### ARFF (Weka):
```python
from scipy.io import arff
data, meta = arff.loadarff('dataset.arff')
```

### Format Comparison:

| Format | Purpose | Size | Compression | Metadata |
|--------|---------|------|-------------|----------|
| **MATLAB** | Data exchange | Medium | Optional | Limited |
| **Matrix Market** | Sparse matrices | Small | No | Minimal |
| **WAV** | Audio | Large | No | Basic |
| **NetCDF** | Scientific arrays | Medium | Yes | Rich |
| **HDF5** | Large datasets | Variable | Yes | Rich |
| **ARFF** | ML datasets | Small | No | Schema |

### Use Cases:

**WAV Files**:
- Audio signal processing
- Sound effects
- Speech recognition data
- Music analysis

**ARFF Files**:
- Machine learning datasets
- Weka integration
- Feature engineering
- Classification tasks

**NetCDF/HDF5**:
- Climate data
- Satellite imagery
- Genomics
- Physics simulations

### Best Practices:

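✓ **Choose the right format**: Match it to the consuming tool and purpose  
✓ **Document the format version**: Readers differ in what they support (e.g. NetCDF 3 vs. NetCDF 4)  
✓ **Include metadata**: Units, timestamps  
✓ **Test cross-platform**: Open the files in Python, MATLAB, and R  
✓ **Compress large files**: Saves space and bandwidth  
✓ **Validate after export**: Read the file back and compare (round-trip test)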
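
A round-trip test can be as small as writing, reading back, and comparing. The sketch below does this for the MATLAB and Matrix Market formats with `scipy.io`; the file names and random data are illustrative, and a flat dictionary is used instead of the nested one from the export example to keep the comparison simple.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

rng = np.random.default_rng(42)
measurements = rng.standard_normal((100, 5))
timestamps = np.arange(100)

# Keep only correlations with |r| >= 0.3, as in the export example
corr = np.corrcoef(measurements.T)
corr[np.abs(corr) < 0.3] = 0
corr_sparse = sparse.csr_matrix(corr)

with tempfile.TemporaryDirectory() as tmpdir:
    mat_path = os.path.join(tmpdir, 'results.mat')
    mtx_path = os.path.join(tmpdir, 'corr.mtx')

    io.savemat(mat_path, {'measurements': measurements,
                          'timestamps': timestamps})
    io.mmwrite(mtx_path, corr_sparse)

    mat_back = io.loadmat(mat_path)
    mtx_back = io.mmread(mtx_path)

# MATLAB has no 1-D arrays, so timestamps comes back with shape (1, 100)
print(mat_back['timestamps'].shape)
print(np.array_equal(mat_back['measurements'], measurements))
print(np.array_equal(mat_back['timestamps'].ravel(), timestamps))
print(np.allclose(mtx_back.toarray(), corr))
```

The shape change for 1-D arrays is the kind of difference a round-trip test catches early; `np.allclose` is used for the Matrix Market check because that format stores values as text.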