# ANFIS for Human Speech DSP

## Course

ECE 595 - Machine Learning

## Group Members

Oliver Bartz\
Jacob Lister

## Project Description

We recreate the ANFIS model from the paper linked under Relevant Reading, using similar data, in MATLAB. The input is speech mixed with background noise and white noise, and the model is trained to isolate the primary speech signal.

## Relevant Reading

This project is based on the following paper:\
<https://ieeexplore.ieee.org/document/10167890>

## About ANFIS

ANFIS stands for Adaptive-Network-Based Fuzzy Inference System. It is a low-complexity machine learning algorithm that uses fuzzy inference rules to determine the model's output.

Below is a link to the paper that first proposed ANFIS as a method.\
<https://ieeexplore.ieee.org/document/256541>
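
To make the idea of fuzzy inference rules concrete, here is a minimal Python sketch of the forward pass of a two-input, first-order Sugeno ANFIS. It is an illustration only, not the project's MATLAB model: the Gaussian membership functions, the grid of rules, and all names and parameter values (`gaussian_mf`, `anfis_forward`, the example centers, widths, and consequents) are assumptions chosen for the example.

```python
import math
from itertools import product


def gaussian_mf(x, center, sigma):
    """Gaussian membership grade of x, a value in (0, 1]."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, y, mfs_x, mfs_y, consequents):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    mfs_x, mfs_y: lists of (center, sigma) pairs, one per fuzzy set.
    consequents:  one (p, q, r) triple per rule, ordered like
                  itertools.product(mfs_x, mfs_y).
    """
    # Layer 1: membership grade of each input in each of its fuzzy sets.
    mu_x = [gaussian_mf(x, c, s) for c, s in mfs_x]
    mu_y = [gaussian_mf(y, c, s) for c, s in mfs_y]
    # Layer 2: firing strength of each rule (product of membership grades).
    w = [a * b for a, b in product(mu_x, mu_y)]
    # Layer 3: normalize the firing strengths so they sum to 1.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: each rule's linear consequent f = p*x + q*y + r.
    f = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5: output is the normalized-weight sum of the rule outputs.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical parameters: two fuzzy sets per input gives 2 x 2 = 4 rules.
mfs_x = [(-1.0, 1.0), (1.0, 1.0)]
mfs_y = [(-1.0, 1.0), (1.0, 1.0)]
consequents = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.0, 0.0), (0.2, 0.2, 0.1)]
output = anfis_forward(0.3, -0.4, mfs_x, mfs_y, consequents)
```

Training adjusts the membership parameters and the consequent triples; the sketch covers inference only.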

## Dataset

The data is a subset of the DARPA TIMIT Acoustic-Phonetic Continuous Speech corpus, found on the following Kaggle page:\
<https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech>

We extracted 20 random audio files from speakers with different dialect backgrounds, 10 male and 10 female. Each sample is 2-3 seconds long and contains one sentence of speech. We then used Audacity to mix in white noise and background noise taken from the following link:\
<https://www.youtube.com/watch?v=xeDsWdc1vgg>

We exported the mixed audio samples and used them as input data for the model.
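
The mixing itself was done by hand in Audacity. For readers who want to script a comparable step, the following Python sketch adds white Gaussian noise to a signal at a chosen linear SNR. It is an assumption-laden stand-in, not the procedure used here: the function name `mix_at_snr`, the sine tone standing in for speech, and the target SNR value are all hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_linear):
    """Scale `noise` so power(clean) / power(scaled noise) == snr_linear, then add it."""
    gain = math.sqrt(mean_power(clean) / (snr_linear * mean_power(noise)))
    return [s + gain * n for s, n in zip(clean, noise)]


rng = random.Random(0)   # fixed seed so the example is repeatable
fs = 16000               # TIMIT recordings are sampled at 16 kHz
# Stand-in for one second of speech: a 220 Hz tone (hypothetical test signal).
clean = [math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
white = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy = mix_at_snr(clean, white, snr_linear=1.6)
```

Scaling the noise rather than the speech keeps the clean reference unchanged, which is convenient when the clean signal is also the training target.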

## Network Structure

The following image represents a standard ANFIS model with two inputs. In this case, n1 represents the sound data with noise included and n2 represents the clean sound data.\
![image.png](attachment:image.png)

## Training Details

After testing and tweaking the training parameters, we settled on the following approach:
- Train the initial model on data containing only background noise and normal speech (no white noise)
- Tune the model on data containing only white noise and normal speech (no background noise)
- Complete 30 epochs of training
- Include validation data to prevent overfitting during tuning
- Train by minimizing RMSE
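
The RMSE objective and the role of validation data in the list above can be sketched in a few lines of Python. This is a generic illustration rather than the MATLAB training code: `rmse`, `best_epoch`, and the per-epoch error values are hypothetical.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


def best_epoch(validation_rmse):
    """0-based index of the epoch with the lowest validation RMSE."""
    return min(range(len(validation_rmse)), key=validation_rmse.__getitem__)


# Hypothetical validation errors: they fall, then rise as overfitting begins.
val_errors = [0.50, 0.31, 0.22, 0.25, 0.29]
chosen = best_epoch(val_errors)  # parameters from this epoch would be kept
```

Keeping the parameters from the epoch with the lowest validation error, rather than the final epoch, is one common way validation data guards against overfitting.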

## Testing and Evaluation Methods

Evaluation was a multi-faceted process. Looking at SNR helped provide a quantitative measure of noise removal. Additionally, listening to the filtered audio helped as a "sanity check," ensuring reported numbers matched qualitative observations. We also looked at graphed audio signals and their spectrograms.
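
As a rough illustration of the SNR measure, the Python sketch below computes a linear SNR by treating the difference between a processed signal and the clean reference as noise. The exact definition used for the numbers in this report is not stated, so this formula, along with the names `snr_linear` and `to_db`, is an assumption.

```python
import math


def snr_linear(clean, estimate):
    """Signal power over residual power, treating (estimate - clean) as noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf  # estimate matches the clean reference exactly
    return signal_power / noise_power


def to_db(power_ratio):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(power_ratio)


# Toy signals (hypothetical): the estimate overshoots every sample by 0.5.
clean = [1.0, -1.0, 1.0, -1.0]
estimate = [1.5, -1.5, 1.5, -1.5]
ratio = snr_linear(clean, estimate)
```

Applying the same function to the noisy input and to each filter's output gives directly comparable numbers.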

### Wiener Filter

As a performance baseline, we passed the same data through a Wiener filter and compared its output with that of the ANFIS model.
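
For orientation, here is a small Python sketch of one common Wiener-style denoiser, the local-statistics (adaptive) form. The Wiener filter used in this project may differ; the function name `wiener_local`, the window size, the edge handling, and the noise-variance estimate are all assumptions made for this example.

```python
def wiener_local(x, window=5, noise_var=None):
    """Local-statistics Wiener filter for a 1-D signal.

    Each sample is pulled toward its local mean, more strongly where the
    local variance is close to the estimated noise variance.
    """
    n = len(x)
    half = window // 2
    means, variances = [], []
    for i in range(n):
        # Window is truncated at the signal edges (no zero padding).
        segment = x[max(0, i - half):min(n, i + half + 1)]
        m = sum(segment) / len(segment)
        v = sum((s - m) ** 2 for s in segment) / len(segment)
        means.append(m)
        variances.append(v)
    if noise_var is None:
        # Assumption: estimate noise power as the average local variance.
        noise_var = sum(variances) / n
    out = []
    for xi, m, v in zip(x, means, variances):
        if v <= noise_var:
            out.append(m)  # window looks like noise only: keep the local mean
        else:
            gain = (v - noise_var) / v  # fraction of variance attributed to signal
            out.append(m + gain * (xi - m))
    return out


# Hypothetical noisy samples.
noisy = [0.9, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
filtered = wiener_local(noisy, window=3)
```

Unlike ANFIS, this filter has no training step; it relies only on local signal statistics and a noise-power estimate.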

### Results

Below are time-domain plots of the various signals:\
![image.png](attachment:image.png)

### SNR Comparison:

| Signal Type          | SNR (Linear) |
| -------------------- | ------------ |
| Input data (noisy)   | 1.62966      |
| ANFIS filtered data  | 1.71903      |
| Wiener filtered data | 1.61094      |

This tells us that our ANFIS model outperforms the Wiener filter: it raises the SNR of the input data from 1.63 to 1.72, giving the listener clearer speech. It also shows that the Wiener filter slightly reduced the quality of the input data (1.61 versus 1.63), introducing additional noise.\
These conclusions are supported by our qualitative observations when listening to the audio samples.
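
To put the table in perspective, the short Python snippet below computes each filter's SNR change relative to the noisy input. The SNR values are copied from the table above; the helper `relative_change` is a name introduced only for this example.

```python
# Linear SNR values taken from the comparison table.
snr = {"input": 1.62966, "anfis": 1.71903, "wiener": 1.61094}


def relative_change(filtered, baseline):
    """Fractional change of `filtered` relative to `baseline`."""
    return (filtered - baseline) / baseline


anfis_change = relative_change(snr["anfis"], snr["input"])
wiener_change = relative_change(snr["wiener"], snr["input"])

print(f"ANFIS:  {anfis_change:+.2%}")
print(f"Wiener: {wiener_change:+.2%}")
```

This works out to an improvement of about 5.5% for ANFIS and a loss of a little over 1% for the Wiener filter, so the gap between the two methods is real but modest.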

### Spectrograms:

Below are spectrograms of the various signals shown in the first figure of this section.\
![image-3.png](attachment:image-3.png)\
![image-4.png](attachment:image-4.png)\
![image-5.png](attachment:image-5.png)\
![image-2.png](attachment:image-2.png)

## Discussion and Conclusion

After completing the model, we noticed that both our ANFIS model and our Wiener filter performed far worse than the results reported in the paper we were referencing. Comparing our implementation with the paper's description, we realized that the SNR of our original data (human speech with added background and white noise) is much lower than one would expect in almost any communication system. We believe this is the cause of the poorer performance of both the ANFIS model and the Wiener filter. Future work could test our model at a higher input SNR to see whether it can match or beat the paper's results.

Overall, we were able to use the ANFIS model to remove noise from a speech signal and to compare its performance with another noise suppression method, the Wiener filter.