In [None]:
import re
pattern = "segment_~C_~d~t_~l_secs"
help = ("Base filename pattern with tokens: ~D ~T (overall YYMMDD/HHMMSS), "
        "~d ~t (segment start YYMMDD/HHMMSS), ~E ~e (segment end YYMMDD/HHMMSS), "
        "~C (counter), ~L (segment length HHMMSS).")
# Strip every ~X token, then collapse the trailing "_..." remainder to one "_"
subs = ["C", "D", "T", "d", "t", "E", "e", "L", "l"]
for token in subs:
    pattern = pattern.replace(f"~{token}", "")
pattern = re.sub(r"_+[^_]*$", "_", pattern)
print(pattern)


segm__ent_


# Task Name: Fix, Upgrade and Extend Mono Recorder

# Overview

This tool is a continuous mono recorder with silence-aware segmentation and asynchronous LAME encoding to MP3. It is a work in progress that works fairly well in command-line mode, but it has a few bugs and I have a pile of enhancements I would like to implement.

# Input: 

- A system sound input accessible to python with a common, mature python library 

# Output:

- A series of MP3 (and optionally WAV) files containing the input chopped up into segments
- playlist files for all the segments in a full recording
- a CSV file with the list of files
- a live text-based spectrogram of the sound input

# Issues

- Can't quit, pause the recording, or break segments during the recording. And keyboard interrupt is basically broken.
- Can't control the tool during execution. It needs a few keypress shortcuts (e.g. "q" to quit, "p" to pause, "B" to break segments with no gap search, and "b" to end a segment with a gap search).
- Something is wrong with the shutdown at the end or when a keyboard interrupt happens. The current software tends to save and process a few zero-length segments once the max-time ("--max-total-minutes") is reached.
- The application should time out and exit when max-time has been reached, unless recording was paused. If there was a pause, it should add the total pause time to the max-time.
- Need an hh:mm:ss timestamp on the `[INFO]` updates printed to the screen.
- Silence detection doesn't work very well. The silence/gap detector almost always times out, which forces a break. This results in a mid-word break and often missed content.
  - I am thinking of an algorithm that keeps an updated 2-dimensional frequency grid of binned sound power: effectively an envelope histogram binned into a 2D map by recurrence rate and duration. The algorithm builds a map of the trailing tens of minutes of sound. When the maximum recording segment length is reached, the algorithm starts looking for a sound gap in which to break the file. It should use the histogram to calculate the longest gap duration that is likely to recur with > 75% probability before the halfway point of the gap-timeout window ("--max-pause-minutes" command-line parameter). Once half the gap-timeout window has expired, it should accept shorter and shorter gaps until the end of the window. If the gap-seeker window times out, it should cut off the sound but duplicate the last few seconds of the file being wrapped up onto the beginning of the next file. Most of the audio we're recording works with hour-long segments, so there's plenty of time to build a sound-gap histogram. It would be nice to print the current 2D histogram on a keypress of "h", using the same color-bar color scheme that's in use for the spectrogram. There's no need to reset the histogram between segments.
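
A minimal sketch of the gap-selection idea above, under stated assumptions. It swaps the full 2D envelope histogram for a simpler trailing list of observed gap durations, and it assumes gaps arrive roughly as a Poisson process, so the chance of seeing at least one gap of length at least d within a horizon H is 1 - exp(-rate * H). `GapStats`, `target_gap`, and `acceptable_gap` are hypothetical names, not part of the existing recorder.

```python
import math
from collections import deque


class GapStats:
    """Trailing record of observed silence gaps (hypothetical helper)."""

    def __init__(self, history_s=1800.0):
        self.history_s = history_s  # how far back to remember, in seconds
        self.gaps = deque()         # (end_time_s, duration_s), oldest first

    def add_gap(self, end_time_s, duration_s):
        """Record one gap; times are seconds since the recording started."""
        self.gaps.append((end_time_s, duration_s))

    def _trim(self, now_s):
        # Forget gaps that ended before the trailing history window.
        while self.gaps and self.gaps[0][0] < now_s - self.history_s:
            self.gaps.popleft()

    def recurrence_probability(self, min_duration_s, horizon_s, now_s):
        """Estimate P(at least one gap >= min_duration_s within horizon_s)."""
        self._trim(now_s)
        observed_s = min(now_s, self.history_s)
        if observed_s <= 0:
            return 0.0
        count = sum(1 for _, d in self.gaps if d >= min_duration_s)
        rate = count / observed_s  # gaps per second (Poisson assumption)
        return 1.0 - math.exp(-rate * horizon_s)

    def target_gap(self, horizon_s, now_s, probability=0.75):
        """Longest observed gap length likely to recur within horizon_s."""
        self._trim(now_s)
        best = 0.0
        # Probability can only fall as the required duration grows,
        # so the last duration that passes is the longest acceptable one.
        for d in sorted({dur for _, dur in self.gaps}):
            if self.recurrence_probability(d, horizon_s, now_s) > probability:
                best = d
        return best


def acceptable_gap(target_s, floor_s, elapsed_s, window_s):
    """Hold the target for the first half of the window, then relax it
    linearly down to floor_s by the end of the window."""
    target_s = max(target_s, floor_s)
    if window_s <= 0:
        return floor_s
    half = window_s / 2.0
    if elapsed_s <= half:
        return target_s
    frac = min(1.0, (elapsed_s - half) / half)
    return target_s - frac * (target_s - floor_s)
```

In use, the recorder would call `target_gap` with a horizon of half the "--max-pause-minutes" window when the segment length is reached, then pass the result through `acceptable_gap` on each audio block. The linear relaxation is one possible reading of "shorter and shorter gaps"; a stepped schedule would work equally well.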

# Enhancements

- It's unclear how far into a segment we are. A little progress bar and time-recorded/time-left display would be great.
- Fill out more [Extended M3U](https://en.wikipedia.org/wiki/M3U#Extended_M3U) fields in the playlist files (m3u). Currently only the required header (`#EXTM3U`) is used, followed by a list of filenames.
  - Include `#EXTINF` as `#EXTINF:123,Artist Name – Track Title`, followed on the next line by the filename (e.g. `artist - title.mp3`)
    - playlist extended info format = "%artist% - %title%"
    - playlist filename format = "%artist%_%album%_00_Playlist.m3u"
    - tag to filename conversion format = "%artist%_%album%_$num(%track%,2)_%title%"
  - update each mp3 file with a standard set of MP3 tags, including the M-of-N track number
    - Use a mature Python library like "taglib" or similar
  - for both playlist and mp3 tags set the default genre to "audiobook"
  - Documentation - use a common doc framework for python apps
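
The playlist and filename formats listed above could be sketched as follows. This is an illustration only: `Track`, `playlist_filename`, `track_filename`, and `build_m3u` are hypothetical names, and the `#EXTGENRE` line is an assumption about how to carry the default genre in the playlist, since player support for that directive varies.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """One encoded segment (hypothetical container for this sketch)."""
    path: str
    artist: str
    title: str
    duration_s: float


def playlist_filename(artist, album):
    # Mirrors the format "%artist%_%album%_00_Playlist.m3u".
    return f"{artist}_{album}_00_Playlist.m3u"


def track_filename(artist, album, track_no, title, ext="mp3"):
    # Mirrors "%artist%_%album%_$num(%track%,2)_%title%" (2-digit track).
    return f"{artist}_{album}_{track_no:02d}_{title}.{ext}"


def build_m3u(tracks, genre="audiobook"):
    """Return Extended M3U text with one #EXTINF line per track."""
    lines = ["#EXTM3U", f"#EXTGENRE:{genre}"]  # genre line is an assumption
    for t in tracks:
        # #EXTINF:<whole seconds>,<artist> - <title>, then the file path.
        lines.append(f"#EXTINF:{round(t.duration_s)},{t.artist} - {t.title}")
        lines.append(t.path)
    return "\n".join(lines) + "\n"
```

Building the text as a string keeps the function easy to unit test; the caller decides where and with which encoding to write it.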
  
# Extensions

- add a ["pls" playlist](https://en.wikipedia.org/wiki/PLS_(file_format)) output file as a playlist format option
- build a cross-platform python GUI for the application 
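
For the "pls" option, a minimal sketch of a writer is shown here. `build_pls` is a hypothetical helper name; the layout follows the usual PLS convention of a `[playlist]` header, numbered `File`/`Title`/`Length` keys, and `NumberOfEntries` and `Version=2` lines.

```python
def build_pls(entries):
    """Build PLS playlist text from (path, title, duration_seconds) tuples."""
    entries = list(entries)
    lines = ["[playlist]"]
    for n, (path, title, duration_s) in enumerate(entries, start=1):
        lines.append(f"File{n}={path}")
        lines.append(f"Title{n}={title}")
        lines.append(f"Length{n}={round(duration_s)}")  # whole seconds
    lines.append(f"NumberOfEntries={len(entries)}")
    lines.append("Version=2")
    return "\n".join(lines) + "\n"
```

Because the inputs are the same path, title, and duration already needed for the m3u output, both playlist formats could be generated from one list of segment records.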

# Work Environment

- This is a Mac OS machine. 
- Development environment: vscode for my own coding. I would like to be able to keep working in vscode if I want to insert additional changes, so do nothing that prevents this.
- Source Control: This code exists in a local Git repo with a remote on github. Create branches for new features/bug fixes. Unit test and merge back to main if tests pass, then do regression tests to confirm main is functioning. Push to remote branch on github every few merges to main.
- Testing should use a mature test framework such as pytest - I've not been using a test framework but I should

# Build

- I use astral UV and virtual environments for building 
- I often switch between mac OS and linux for development. Occasionally I go to windows.
- I want to be able to deploy the app using Astral's UV from my github repo

# Phase 1 - Testing Notes
- comments:
  - Nice work on phase 1 so far
- dev process
  - please check out the "phase-1" branch in the local git repo and commit the current state
- test assist:
  - I have an audiobook playing to an audio input device named "BlackHole 16ch". You have to specify its name in quotation marks, otherwise it gets hung up on the space in the device name, e.g. `--device "BlackHole 16ch"`. This is what I'm using to test these changes.
- command line parameters:
  - Need to have a way to turn off the spectrograms at the start (but they can still be turned on by the run-time hotkeys)
- hotkeys --> requires carriage return - a bit clunky
  - update --> rather than waiting for a command to be entered at the console, let's hook the keyboard interrupt (e.g. cntl-c) to enter a command menu (like the behavior of Linux's screen command, which requires a cntl-a XXX for commands).
    - This keyboard interrupt effectively pauses the recording (makes the pause command 'p' unnecessary). 
    - upon keyboard-interrupt, print message '? for hot-key help, ENTER resumes recording' and show recording status bar 
    - 'cntl-c cntl-c' would be the same as 'cntl-c q' for graceful quit.  
    - to resume recording hit "return" with no command
  - Hotkey enhancements for Phase 1
    - need a help hotkey --> suggest '?'
- Issue #1: quit from pause
  - Behavior: I paused the recording and then pressed cntl-c. (same behavior with p --> q)
  - observed output: ` [INFO] Quit command received CR [INFO] Final segment flushed. idx=2, [CR] [INFO] Waiting for 3 encoder(s) to complete... [INFO] All encoders completed.`
    - another time, at around segment 0005 of a 10 minute overall recording, I observed: ```[INFO] Waiting for 7 encoder(s) to complete...```

  - issue - why are so many encoders being instantiated? is this normal?
- Issue #2: Short segments still happening but no longer being saved to disk:
  - Command line: ` python main.py --minutes 0.5 --pause-seconds 1.0 --outdir ./test_recordings --device "BlackHole 16ch" --max-total-minutes 3 --base-filename test_$(shuf -n 1 /usr/share/dict/words)~C_~l --delete-wav-after-encode`
    - Note that I'm using `--base-filename test_$(shuf -n 1 /usr/share/dict/words)~C_~l` to get a unique filename for each test run
  - issues: 
  1. see the warning about get_cmap function. Lets put a fix for that in a future phase. (low priority)
  2. investigate what's triggering all the short segments
  - output: `[spectrogram omitted]` is copied in place of the actual spectrogram
```
/Users/leif/Documents/sinkt/SourceCode/Python/SoundStuff/001_mono_time_recorder/main.py:53: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.
  JET_CMAP = cm.get_cmap("jet")  # matplotlib < 3.7

╔══════════════════════════════════════════════════════════╗
║                   HOTKEY COMMANDS                        ║
╠══════════════════════════════════════════════════════════╣
║  q  │ Quit - Stop recording and exit gracefully          ║
║  p  │ Pause/Resume - Toggle recording pause              ║
║  b  │ Break with gap - Cut segment at next good gap      ║
║  B  │ Break immediate - Force segment cut now            ║
║  h  │ Histogram - Show current gap histogram (if enabled)║
║  s  │ Toggle spectrogram - Show/hide spectrogram         ║
╚══════════════════════════════════════════════════════════╝

[INFO] Starting writer thread…
[INFO] Opening input stream…
[INFO] Recording… Press Ctrl+C to stop.
[INFO] Config: segment=0.5 min, pause>=1.00s, max-pause-window=1 min, sr=48000, blocksize=2048, thr=0.01
[INFO] Output dir: test_recordings
[INFO] CSV: test_nay_segments_251108_100950.csv
[INFO] MP3 M3U: test_nay_pl_mp3_251108_100950.m3u

[INFO] Segment length reached 0.50 min; looking for a pause (≥ 1.00s, thr=0.01).

[INFO] Pause search window: up to 1 minute(s) before forced cut.
 [spectrogram omitted]  s

[INFO] Spectrogram DISABLED
[INFO] Segment cut (pause detected). idx=0000  samples:2136064
[INFO] Segment length reached 0.50 min; looking for a pause (≥ 1.00s, thr=0.01).
[INFO] Pause search window: up to 1 minute(s) before forced cut.
[INFO] Segment cut (pause detected). idx=0001  samples:2347008
[INFO] Segment length reached 0.50 min; looking for a pause (≥ 1.00s, thr=0.01).
[INFO] Pause search window: up to 1 minute(s) before forced cut.
[INFO] Segment cut (pause detected). idx=0002  samples:1574912
[INFO] Segment length reached 0.50 min; looking for a pause (≥ 1.00s, thr=0.01).
[INFO] Pause search window: up to 1 minute(s) before forced cut.
[INFO] Max active time reached (3.0 min). Forcing split and exit.
[INFO] Segment cut (max active time). idx=0003  samples:2582528
[INFO] Finalizing current segment and stopping. idx=4
[INFO] Skipping segment 4 (duration 0.04s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=5
[INFO] Skipping segment 5 (duration 0.09s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=6
[INFO] Skipping segment 6 (duration 0.13s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=7
[INFO] Skipping segment 7 (duration 0.17s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=8
[INFO] Skipping segment 8 (duration 0.21s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=9
[INFO] Skipping segment 9 (duration 0.26s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=10
[INFO] Skipping segment 10 (duration 0.30s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=11
[INFO] Skipping segment 11 (duration 0.34s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=12
[INFO] Skipping segment 12 (duration 0.38s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=13
[INFO] Skipping segment 13 (duration 0.43s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=14
[INFO] Skipping segment 14 (duration 0.47s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=15
[INFO] Skipping segment 15 (duration 0.51s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=16
[INFO] Skipping segment 16 (duration 0.55s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=17
[INFO] Skipping segment 17 (duration 0.60s < min 1.00s)
[INFO] Finalizing current segment and stopping. idx=18
[INFO] Skipping segment 18 (duration 0.64s < min 1.00s)
[INFO] Final segment flushed. idx=19
[INFO] Skipping segment 19 (duration 0.64s < min 1.00s)
[INFO] Waiting for 4 encoder(s) to complete...
[INFO] All encoders completed.
```
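
The log above shows the "Max active time reached" path, and the Issues list asks both for pause time to extend the max-time and for timestamps on the `[INFO]` lines. A minimal sketch of both, under stated assumptions: `ActiveTimeBudget` and `format_info` are hypothetical names, not the recorder's current code, and the clock is injectable so the logic can be exercised under pytest without real waiting.

```python
import time


class ActiveTimeBudget:
    """Tracks recording time, excluding time spent paused (sketch)."""

    def __init__(self, max_total_minutes, clock=time.monotonic):
        self._clock = clock
        self._limit_s = max_total_minutes * 60.0
        self._start = clock()
        self._paused_total = 0.0
        self._pause_started = None  # None while recording is active

    def pause(self):
        if self._pause_started is None:
            self._pause_started = self._clock()

    def resume(self):
        if self._pause_started is not None:
            self._paused_total += self._clock() - self._pause_started
            self._pause_started = None

    def active_seconds(self):
        """Elapsed wall time minus all completed and ongoing pauses."""
        now = self._clock()
        paused = self._paused_total
        if self._pause_started is not None:
            paused += now - self._pause_started
        return now - self._start - paused

    def expired(self):
        return self.active_seconds() >= self._limit_s


def format_info(message, now=None):
    """Return an [INFO] line prefixed with an hh:mm:ss timestamp."""
    if now is None:
        now = time.localtime()
    return f"[INFO {time.strftime('%H:%M:%S', now)}] {message}"
```

With this shape, the main loop would check `expired()` once per audio block and trigger a single shutdown path, for example `print(format_info("Max active time reached"))` followed by one final flush.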


