# Project Blog Post
2024-05-17
Wow! You completed an awesome project with your group. It’s time for you to tell the world (and especially me) about it on your blog.

Your project blog post is the authoritative description of what you achieved and what you learned. It should describe all the achievement, effort, and learning that you want us to factor into your final grade in the course. Your blog post should have eight sections, which are described below.

You can write the first seven sections of your blog post as a group and all post them on your separate blogs. If you take this path, you should just make sure to note it. The final section should be completed individually.

# 1 Abstract

Your abstract is a one-paragraph summary of the problem you addressed, the approach(es) that you used to address it, and the big-picture results that you obtained.

At the end of your abstract, **please include a link to the GitHub repository** that houses your project code and other deliverables.

# 2 Introduction
<!-- 
Your introduction should describe the big-picture problem that you aimed to address in your project. What’s the problem to be solved, and why is it important? Who has tried solving this problem already, and what did they do? I would expect most introductions to reference no fewer than 2 scholarly studies that attempted similar tasks, although 5 is probably a better target.

You may be able to recycle some content from your project proposal for your introduction.

When citing scholarly studies in a blog post, please use Quarto citations. For a quick overview, see the appendix on references in Quarto. -->

This project addresses the first step in denoising audio signals: detecting whether noise is present. In particular, we aimed to classify whether noise had been added to a speech signal. Denoising has many uses: it provides cleaner signals in music production, helps restore historical recordings [@moliner_two-stage_2022], and can even support the study of our environment (as in the research on noise filtering for beehives from @varkonyi_dynamic_2023). Previous work on audio classification and denoising has processed signals in various ways. @mcloughlin_timefrequency_2020 used a combination of a spectrogram and a cochleogram (both 2D representations of an audio signal) together with a model built from convolutional and fully connected layers. Similarly, @verhaegh_algorithms_2004 tested the efficacy of different processing techniques and found that running a signal through a sequence of filterbanks achieved the highest accuracy overall (90% across different classification tasks, including noise). They also found mel-frequency cepstral coefficients (MFCCs) to be highly effective, at 85% average accuracy.

# 3 Values Statement

Your values statement should be a few paragraphs that address the following set of questions:

- Who are the potential **users** of your project? Who, *other than your users*, might still be **affected** by your project?
- **Who benefits** from technology that solves the problem you address?
- **Who could be harmed** by technology that solves the problem you address?
- What is **your personal reason** for working on this problem?
- Based on your reflection, **would the world be a more equitable, just, joyful, peaceful, or sustainable place** based on the technology that you implemented?

# 4 Materials and Methods

## Your Data

<!-- Include some discussion of where it came from, who collected it (include a citation), how it was collected, and what each row represents (a person, an environmental event, a body of text, etc) Please also include a discussion of potential limitations in the data: who or what is represented, and who or what isn’t?

In structuring your description of the data, I encourage you to address many of the questions outlined in Gebru et al. (2021), although it is not necessary for you to write a complete data sheet for your data set. -->

Our project is based on data from the Microsoft Scalable Noisy Speech Dataset (MS-SNSD) [@reddy2019scalable]. MS-SNSD draws on two speech datasets obtained under license: one from the University of Edinburgh [@inproceedings], whose speakers were recruited from across Great Britain by advertisement, and one from Graz University [@Pirker2011APT], which recruited native English speakers through advertisements at various institutions. Similarly, MS-SNSD obtained its noise samples under license from freesound.org (a website that hosts user-submitted sound samples) and from the DEMAND dataset by @thiemann_demand_2013. These noise samples were therefore recorded by researchers or freesound.org users capturing their environments (traffic, noise in public spaces, humming appliances, etc.). MS-SNSD also provides a Python program that automatically combines the speech and noise data.
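To make the idea of combining speech and noise concrete, here is a minimal sketch of one common approach: scaling the noise so that the mixture reaches a chosen signal-to-noise ratio (SNR). This is an illustration only, not the MS-SNSD program itself. The function names, the toy signals, and the 10 dB target are all assumptions made for the example.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # SNR_dB = 20 * log10(rms(speech) / rms(scaled_noise)), solved for the gain.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for real recordings (hypothetical data).
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=10)

# Recover the added noise and measure the SNR that was actually achieved.
residual = [m - s for m, s in zip(noisy, speech)]
achieved_snr_db = 20 * math.log10(rms(speech) / rms(residual))
```

Lower SNR values give louder noise relative to the speech, so a classifier trained on such mixtures would likely find low-SNR examples easier to flag than high-SNR ones.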

After reading in the data from WAV files, we converted each file to a mel spectrogram. Each row of the data (or each 2D array, before flattening) represents one audio file as a mel spectrogram. These audio signals are 10 seconds of speech, either clean (no noise) or accompanied by added background noise. One limitation of this dataset is that it does not contain recordings of languages other than English. As such, training a model on it does not guarantee its usefulness across languages.
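A mel spectrogram places its frequency bands at equal steps on the mel scale rather than in hertz, so low frequencies get finer resolution than high ones. The sketch below illustrates that spacing using the HTK-style mel formula. The choice of formula, the helper names, and the 0 to 8000 Hz range are assumptions for illustration; audio libraries differ in the exact variant they use, and this is not necessarily the one used in the project.

```python
import math


def hz_to_mel(freq_hz):
    """Convert hertz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# 128 bands, matching the spectrogram height; the frequency range is assumed.
centers = mel_band_centers(n_mels=128, f_min=0.0, f_max=8000.0)
lowest_gap = centers[1] - centers[0]
highest_gap = centers[-1] - centers[-2]
```

Comparing `lowest_gap` with `highest_gap` shows that neighboring bands are packed much more tightly at the low end of the spectrum than at the high end.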

## Your Approach
<!-- This is the primary section where you should describe what you did. Carefully describe:

- What features of your data you used as predictors for your models, and what features (if any) you used as targets.
- Whether you subset your data in any way, and for what reasons.
- What model(s) you used trained on your data, and how you chose them.
- How you trained your models, and on what hardware.
- How you evaluated your models (loss, accuracy, etc), and the size of your test set.
- If you performed an audit for bias, how you approached this and what metrics you used. -->

To process our data, we used the mel spectrogram representation of audio files of 160,000 samples each (truncated to exactly 10 seconds for consistent sizing). Converting these audio files to spectrograms gave data instances of size 128 x 313 pixels, or 40,064 features per instance. For our targets, we created a vector indicating whether or not each audio sample had noise added (1 if noise is present, 0 if the signal is clean). We used a subset of our data (roughly 3,200 audio files) in order to limit training time.

Our model was trained on Google Colab using the T4 GPU when available (and its default CPU when not). The model consisted of a convolutional layer and a linear layer. When we tried adding more complexity to the model, we immediately saw a dip in accuracy, and the accuracy of the current model was already promising. We evaluated our model in terms of accuracy on a test set of 662 audio files.
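The sizes quoted above can be checked with a little arithmetic. The sketch below assumes a hop length of 512 samples with centered (padded) framing, which is a common default and is consistent with the 313 frames reported, but it is not confirmed by the project itself. The sample rate follows from 160,000 samples spanning 10 seconds. The helper names are hypothetical.

```python
NUM_SAMPLES = 160_000  # samples per audio file (from the text)
SAMPLE_RATE = 16_000   # implied by 160,000 samples over 10 seconds
N_MELS = 128           # spectrogram height (from the text)
HOP_LENGTH = 512       # assumed hop size between frames


def num_frames(num_samples, hop_length):
    """Frame count for a centered (padded) short-time Fourier transform."""
    return 1 + num_samples // hop_length


def make_targets(has_noise_flags):
    """Binary target vector: 1 if noise was added, 0 if the clip is clean."""
    return [1 if flag else 0 for flag in has_noise_flags]


frames = num_frames(NUM_SAMPLES, HOP_LENGTH)   # spectrogram width
features = N_MELS * frames                     # features after flattening
duration_seconds = NUM_SAMPLES / SAMPLE_RATE   # clip length in seconds

# Three hypothetical clips: noisy, clean, noisy.
targets = make_targets([True, False, True])

print(frames, features, duration_seconds, targets)
```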

# 5 Results

This is the section in which you describe the main findings or achievements of your model. You can report things like accuracies on train/test data, loss scores, comparisons to previous models, etc. To compare a small set of numbers, tables are fine, but more complex phenomena should be illustrated with figures. Both figures and tables should include appropriate captions, axis labels, legends, and any other professional annotations. It’s fine for your figures to be constructed either manually or as computational outputs (e.g. from Pandas).

Please remember: your results do not speak for themselves. While figures and tables are highly effective forms of communication, your prose is necessary to tell your story.

# 6 Concluding Discussion

Your conclusion is the right time to assess:

- In what ways did our project work?
- Did we meet the goals that we set at the beginning of the project?
- How do our results compare to the results of others who have also studied similar problems?
- If we had more time, data, or computational resources, what might we do differently in order to improve further?

# 7 Group Contributions Statement

<!-- When writing your group contributions statement, please keep in mind that everyone’s contributions are visible in the commit history of your GitHub repository. I do check these commit histories in case I suspect highly imbalanced divisions of labor.
In your group contributions statement, please include a short paragraph for each group member describing how they contributed to the project:

Who worked on which parts of the source code?
Who performed or visualized which experiments?
Who led the writing of which parts of the blog post?
Etc. -->

Jeff focused more on the data processing portion of the source code. This included processing the data into mel spectrograms and creating tensors from the data. In doing so, they also visualized the test mel spectrogram values. They also led the writing of the introduction and the materials and methods sections of the blog post. Further, they fixed bugs that were causing roadblocks in the training process, such as making sure that the data had the correct dimensionality (e.g. adding an extra dimension for the single color channel the data has).

# 8 Personal Reflection

This is the only section that you are required to write individually and not with your project group.
At the very end of your blog post, in a few paragraphs, respond to the following questions:

- What did you learn from the process of researching, implementing, and communicating about your project?
- How do you feel about what you achieved? Did you meet your initial goals? Did you exceed them or fall short? In what ways?
- In what ways will you carry the experience of working on this project into your next courses, career stages, or personal life?