
Any documentation available? #10

Open
aexposit opened this issue Jan 4, 2023 · 1 comment

Comments

@aexposit

aexposit commented Jan 4, 2023

Hi, is there any documentation for getting started with contributing to this interesting project?
My goal is to try to adapt it to real-time audio-to-MIDI.
I would like to better understand the application flow, with some description of the functions and their input and output parameters, without having to interpret all the code.
For example:

  1. How are OUTPUT_TO_TENSOR_NAME.frames, OUTPUT_TO_TENSOR_NAME.onsets, and OUTPUT_TO_TENSOR_NAME.contours related to note-on, note-off, and pitch-bend messages?
  2. What is melodiaTrick?
  3. What is energy?
  4. What does the outputToNotesPoly function do?

Do you have any performance numbers for the model.execute call in ms (single batch)?
Thanks a lot

@sherwyn33

It's sort of documented in the code, but here is my understanding for questions 2, 3, and 4:
frames: A frame activation matrix describes segments of audio analyzed for frequency content over time. Each frame in the matrix represents a specific time slice and its frequency data. This is crucial for tracking how sounds change over time within an audio file.

onsets: An onset activation matrix identifies the specific points in time when new notes begin. Each value in the matrix indicates the likelihood of a note starting at that time and frequency. Detecting onsets accurately is vital for correctly identifying the start of notes.
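To make the two matrices concrete, here is an illustrative sketch. The time-major layout, the value range, and the sizes are assumptions for the example, not the library's actual internals:

```typescript
// Illustrative only: both matrices are assumed to be time-major 2-D
// arrays with values in [0, 1]; activations[t][f] gives the strength
// at time frame t and frequency bin f. The sizes here are made up.
const nFrames = 4;
const nBins = 3;
const frameMatrix: number[][] = Array.from(
  { length: nFrames },
  () => new Array(nBins).fill(0)
);
frameMatrix[1][2] = 0.8; // frequency bin 2 is active at time frame 1
```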

onsetThresh: This threshold sets the minimum amplitude of an onset activation that must be reached to consider it an actual onset of a note. This helps in filtering out false positives and ensuring that only significant note beginnings are recognized.
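A minimal sketch of how such a threshold could be applied (hypothetical helper, not the library's API):

```typescript
// Keep only the (time, bin) cells whose onset activation reaches
// onsetThresh; everything below is treated as a false positive.
function pickOnsets(
  onsets: number[][],
  onsetThresh: number
): Array<[number, number]> {
  const hits: Array<[number, number]> = [];
  for (let t = 0; t < onsets.length; t++) {
    for (let f = 0; f < onsets[t].length; f++) {
      if (onsets[t][f] >= onsetThresh) {
        hits.push([t, f]);
      }
    }
  }
  return hits;
}
```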

frameThresh: This threshold is used to determine whether a note should continue. If the amplitude of a frame activation drops below this level, it indicates that the note has ended or is too soft to be considered as continuing.
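A hypothetical sketch of that continuation rule, with names and the exclusive-end convention invented for illustration:

```typescript
// Walk forward from a note's start frame and stop as soon as the frame
// activation for its frequency bin drops below frameThresh.
function noteEndFrame(
  frames: number[][],
  bin: number,
  start: number,
  frameThresh: number
): number {
  let t = start;
  while (t < frames.length && frames[t][bin] >= frameThresh) {
    t++;
  }
  return t; // exclusive end frame
}
```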

minNoteLen: This defines the minimum length a note must have to be recognized. This is measured in frames, not time directly, helping to prevent the recognition of very short, possibly erroneous notes.
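A short sketch of that filter; the note record and field names are assumptions for illustration:

```typescript
// Hypothetical note record measured in frames (endFrame is exclusive).
interface SketchNote {
  startFrame: number;
  endFrame: number;
  bin: number;
}

// Drop any note shorter than minNoteLen frames.
function keepLongEnough(notes: SketchNote[], minNoteLen: number): SketchNote[] {
  return notes.filter((n) => n.endFrame - n.startFrame >= minNoteLen);
}
```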

inferOnsets: When this setting is true, the algorithm will automatically add onsets if there are large differences in frame amplitudes, suggesting a significant change in the audio that likely corresponds to a new note starting.
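The idea can be sketched as follows; the jump threshold and function are invented for the example and are not how the library necessarily implements it:

```typescript
// Treat a large jump in frame activation between consecutive frames as
// an inferred onset for that bin, returning the frame indices.
function inferExtraOnsets(
  frames: number[][],
  bin: number,
  jumpThresh: number
): number[] {
  const extra: number[] = [];
  for (let t = 1; t < frames.length; t++) {
    if (frames[t][bin] - frames[t - 1][bin] >= jumpThresh) {
      extra.push(t);
    }
  }
  return extra;
}
```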

maxFreq and minFreq: These settings define the frequency range within which notes can be recognized. Frequencies outside this range will be ignored, which can be useful for filtering out noise or other unwanted audio components.
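One way to picture the range check, under the assumption that each frequency bin corresponds to one semitone above some base MIDI pitch (the base of 21 = A0 below is a guess, not confirmed from the code); the Hz conversion itself is the standard MIDI-to-frequency formula:

```typescript
// Convert a semitone bin to Hz via the standard MIDI-to-frequency
// formula (MIDI 69 = A4 = 440 Hz). baseMidi is an assumption.
function binToHz(bin: number, baseMidi = 21): number {
  return 440 * Math.pow(2, (bin + baseMidi - 69) / 12);
}

// A bin is usable only if its frequency falls inside [minFreq, maxFreq].
function binInFreqRange(bin: number, minFreq: number, maxFreq: number): boolean {
  const hz = binToHz(bin);
  return hz >= minFreq && hz <= maxFreq;
}
```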

melodiaTrick: This involves a specific enhancement where semitones near a peak in frequency data are removed, presumably to clean up the data and avoid misinterpretation of pitches that are close to actual note peaks.
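A hedged sketch of that idea (the window size and the exact suppression rule are guesses based on the description above):

```typescript
// Once a peak bin has been taken as a note, zero out the peak and its
// immediate semitone neighbours so they are not re-detected as
// separate, spurious pitches.
function suppressNearPeak(row: number[], peakBin: number, window = 1): number[] {
  return row.map((v, i) => (Math.abs(i - peakBin) <= window ? 0 : v));
}
```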

energyTolerance: This parameter allows a certain number of frames to drop below the threshold (potentially zero amplitude) without terminating the note. This can help in maintaining the continuity of notes through brief drops in sound level.
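Combining this with the frameThresh rule above gives something like the following sketch (again hypothetical, not the library's actual code):

```typescript
// Allow up to energyTolerance consecutive frames below frameThresh
// before ending the note; report the end as one past the last
// above-threshold frame (exclusive).
function noteEndWithTolerance(
  frames: number[][],
  bin: number,
  start: number,
  frameThresh: number,
  energyTolerance: number
): number {
  let gap = 0;
  let lastGood = start;
  for (let t = start; t < frames.length; t++) {
    if (frames[t][bin] >= frameThresh) {
      gap = 0;
      lastGood = t + 1;
    } else if (++gap > energyTolerance) {
      break;
    }
  }
  return lastGood;
}
```

With a tolerance of 1, a single weak frame in the middle of a note does not split it; two in a row do.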
