Video Presentation Outline


Eric Chin edited this page Nov 15, 2013 · 4 revisions

Introduction

Names
- Eric Chin
- David Corbett
- Pranav Gandhi
- Daniel Moran
Project details (general requirements, etc.)
- Requirements
  - Compile and run on the CCIS Linux system
  - Given two sets of audio files, identify the files in one set that are derived from files in the other set
Brief outline of presentation
- Introduction
- Architecture
- Algorithms
- Critical data structures

Software Architecture

Modules
- Data ingestion
- Data normalization (to canonical form)
  - WAVE audio format
    - 44100 Hz
    - 16 bits per sample
    - 1 channel
- Comparison (using FFTs and MFCCs)
- Data analysis (determine if compared audio constitutes a match)
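
To make the canonical form concrete, here is a minimal sketch of how the normalization module might check that a file is already canonical before comparison. It assumes Python's standard `wave` module and NumPy; the names `is_canonical` and `read_samples` are hypothetical and not taken from the project code.

```python
import wave

import numpy as np

# Canonical form from the outline above.
CANONICAL_RATE = 44100     # samples per second
CANONICAL_WIDTH = 2        # bytes per sample (16 bits)
CANONICAL_CHANNELS = 1     # mono


def is_canonical(path):
    """Return True if the WAVE file at `path` is 44100 Hz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == CANONICAL_RATE
                and wav.getsampwidth() == CANONICAL_WIDTH
                and wav.getnchannels() == CANONICAL_CHANNELS)


def read_samples(path):
    """Read a canonical WAVE file into a 1-D array of 16-bit samples."""
    if not is_canonical(path):
        raise ValueError("not in canonical form: {}".format(path))
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    # WAVE stores 16-bit PCM as little-endian signed integers.
    return np.frombuffer(frames, dtype="<i2")
```

Files that fail the check would still need to be converted (resampled, downmixed) by the normalization step, which this sketch does not attempt.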

Critical Data Structures

Representing audio
- Use WAVE file
- Each WAVE file is represented by an array of FFTs, one per chunk of samples (overlapping chunks?)
- Each WAVE file is also represented by its MFCCs
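
One way the "sequence of FFTs" representation could look is sketched below. The chunk size, the hop (which controls overlap), and the name `fft_chunks` are assumptions for illustration; the outline leaves open whether chunks overlap at all.

```python
import numpy as np


def fft_chunks(samples, chunk_size=4096, hop=2048):
    """Split samples into chunks and return one magnitude spectrum per chunk.

    With hop < chunk_size the chunks overlap; hop == chunk_size gives
    back-to-back chunks. Trailing samples that do not fill a chunk are dropped.
    """
    samples = np.asarray(samples, dtype=np.float64)
    n_bins = chunk_size // 2 + 1
    if len(samples) < chunk_size:
        return np.empty((0, n_bins))
    n_chunks = 1 + (len(samples) - chunk_size) // hop
    frames = np.stack([samples[i * hop:i * hop + chunk_size]
                       for i in range(n_chunks)])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))
```

The result is a 2-D array with one row per chunk and one column per frequency bin, so two files can be compared row by row.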
Purpose

Algorithms

FFT
- Purpose
- Implementation (numpy.fft)
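
As a small illustration of what `numpy.fft` provides, the sketch below finds the strongest frequency in a signal. The function name `dominant_frequency` and the 440 Hz test tone are illustrative assumptions, not part of the project.

```python
import numpy as np


def dominant_frequency(samples, rate=44100):
    """Return the frequency (in Hz) of the largest-magnitude FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]


# One second of a 440 Hz sine tone at the canonical sample rate.
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 440.0 * t)
print(dominant_frequency(tone))
```

For the one-second tone the frequency bins are 1 Hz apart, so the reported value should be very close to 440 Hz.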
MFCC
- Purpose
- Implementation
  1. Take the Fourier transform of (a windowed excerpt of) a signal
  2. Use triangular overlapping windows to map the powers of the spectrum onto the Mel scale
  3. Take the log of the power at each of the Mel frequencies
  4. Take the discrete cosine transform of the list of Mel log powers, as if it were a signal
  5. The MFCCs are the amplitudes of the resulting spectrum
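
The five steps above can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the project's implementation: the Hz-to-Mel formula is one common convention, the Hamming window, the 26 filters, and the 13 coefficients are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    # One common Mel-scale convention (an assumption, not from the outline).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate):
    """Triangular overlapping filters with edges evenly spaced in Mel."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc(frame, rate=44100, n_filters=26, n_coeffs=13):
    n_fft = len(frame)
    # 1. Fourier transform of a windowed excerpt of the signal.
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # 2. Map the spectrum powers onto the Mel scale.
    mel_power = mel_filterbank(n_filters, n_fft, rate) @ power
    # 3. Log of the power at each Mel frequency (offset avoids log(0)).
    log_mel = np.log(mel_power + 1e-10)
    # 4. Discrete cosine transform (unnormalized DCT-II) of the log powers.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    # 5. The MFCCs are the amplitudes of the resulting spectrum.
    return basis @ log_mel
```

Calling `mfcc` on one chunk of samples (for example 2048 samples) returns a vector of `n_coeffs` values; a whole file would be represented by one such vector per chunk.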
