Representative_Data_Collection.txt
To make the vocal_numeric model as robust as possible, additional data will be collected and added to the current data taken from the public datasets.
The data will be collected using an iPhone and then preprocessed locally before being added to the data already in Edge Impulse.
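As a concrete illustration of the local preprocessing step, the sketch below downmixes a clip to mono and peak-normalizes it before upload. This is a minimal sketch under stated assumptions, not the project's actual pipeline: it assumes the iPhone recordings have already been converted to 16-bit PCM WAV (iPhones record .m4a by default), and the function name preprocess_wav is hypothetical.

```python
import wave
import numpy as np

def preprocess_wav(in_path, out_path, target_peak=0.9):
    """Downmix a 16-bit PCM WAV clip to mono and peak-normalize it.

    Hypothetical preprocessing step; assumes the recording has already
    been converted from the iPhone's native .m4a format to 16-bit WAV.
    """
    with wave.open(in_path, "rb") as wf:
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

    # Downmix stereo to mono by averaging the channels.
    if n_channels > 1:
        pcm = pcm.reshape(-1, n_channels).mean(axis=1)

    # Peak-normalize so the loudest sample sits at target_peak of full scale.
    peak = np.abs(pcm).max() or 1
    pcm = (pcm * (target_peak * 32767 / peak)).astype(np.int16)

    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
```

Consistent loudness and a single channel keep the uploaded samples comparable to the public-dataset clips already in the project.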
The thrust of this data collection will be to include additional samples from under-represented communities, thereby making the device as accessible as possible.
To achieve this, I will connect with local community organizations: I will present a discussion of why I think this is important, provide examples of what I am doing and how it will work, and request their help in reaching people who may never have heard of this type of technology or who may be skeptical of it. By reaching out to established organizations, I will be able to reach a greater number of people with greater credibility.
As this is my first attempt to engage in this sort of data collection, I will attempt to use best practices outlined in the edX course, and I expect I will learn a lot about the real-world issues in accomplishing this task.
UPDATE 07/25/2021 -
I was correct that I would learn about real-world issues in this effort. The community group with which I was hoping to work has had a change in leadership, and I will no longer be able to coordinate a voice recording drive with them. This is disappointing, as I was sincerely hoping to do some in-person data collection among under-represented members of my local community.
Due to this circumstance, the model for this project currently includes only recordings from the speech_commands dataset and the common_voice datasets along with the additional noise and background recordings which I have made. I hope to include more voice recordings captured by myself in future iterations of this project.
UPDATE 08/01/2021 -
Since my initial idea to reach out to the community did not work out as anticipated, I have taken a different approach. I have submitted the following article to a local, community-focused newspaper (see below). I am hopeful that this will encourage people to participate in the Mozilla Common Voice project and help create more inclusive and diverse datasets for use in projects like this one. I hope to continue to update the project as the Mozilla library expands. I will update this note once I find out if the article has been accepted or published. Please note that I have done my best to simplify some of the technological details for the reading public in this article. Article follows:
***
Making your voice heard: the future of voice command technology
Have you heard of Alexa or Siri? The Amazon and Apple voice assistants have taken the idea of talking with a computer from science fiction to reality in only a few years, and the pace of these advances is only speeding up. Within the next 10 years, voice command technology will be everywhere. It will not just be in your phone; it will be in your home, in your car, and in your daily life.
This raises a more important question than whether you have heard of Alexa or Siri: can Alexa and Siri hear you? Despite the best efforts of the programmers and all of the computing power of Amazon and Apple, computer voice assistants still struggle to understand people in certain circumstances. An accent or a speech mannerism can be enough to keep a voice recognition system from working correctly, preventing people who have an accent or who speak a certain way from using the system at all. This has historically affected people of color, and those for whom English is not their first language, most significantly.
What can be done about this? To find a solution, it is important to understand a little about how these devices work. Voice recognition systems are created by converting audio recordings of words into computer files which capture the frequencies that make up the sound. The computers then calculate which combinations of frequencies define a specific word, and use that calculation to measure whether a new audio recording contains enough of those frequencies to be recognized as the same word. The catch is that the computers can only recognize frequencies they have heard before. We are smart enough to recognize a familiar word said in a new way. Computers are not. They can only recognize a word if they have heard similar frequencies before.
This fact makes it vitally important to make sure that there are enough audio recordings of different types of people available to teach the computers to recognize words correctly from everyone. While there are, of course, many recordings of people speaking, most of these recordings are under copyright and cannot be used. This leaves only a few sets of recordings which were created with voice recognition in mind and which carry the correct permissions for use in this field. These audio libraries are woefully short of variety when it comes to accents and speech mannerisms, leaving many people out in the cold. People are working to expand these libraries to include more types of people, but to do this they need help.
So what can we do? While it can seem intimidating to try to participate in a large technological project, it can actually be very simple to have an impact. The largest public project of this type is sponsored by the non-profit Mozilla foundation and is called Common Voice. This project is collecting a huge library of spoken words and phrases in many languages which can be freely used in voice recognition projects by anyone. Participating requires no sign-up, just a web browser and a microphone; you most likely have what you need on your phone. Head to commonvoice.mozilla.org to make your voice a part of the project.
Voice recognition devices are going to be found in more and more parts of our lives. They promise to make many tasks easier and more convenient. The challenge we face is to make sure that these benefits reach as many people as possible. The possibility of a division between people who can use the devices and those who can’t is a real concern. It is up to us to make our voices heard. By participating in an open voice recognition library such as Mozilla’s Common Voice project, we can help ourselves and those who speak and sound like us get all of the benefits that the future holds.
***
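The frequency-matching idea the article describes (a word is recognized only if its frequency content resembles examples the system has already heard) can be sketched with a toy example. This is an illustration only, not how Edge Impulse or this project's model actually works; real keyword-spotting models use learned features such as MFCCs rather than raw averaged spectra, and the pure tones below merely stand in for recorded voices.

```python
import numpy as np

def spectral_fingerprint(samples, frame_len=256):
    # Split the audio into frames, window each one, and average the
    # magnitude spectra into a single frequency-content "fingerprint".
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return spectra.mean(axis=0)

def similarity(a, b):
    # Cosine similarity: 1.0 means identical frequency content.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sr = 8000                                  # sample rate in Hz
t = np.arange(sr) / sr                     # one second of time steps
tone_known = np.sin(2 * np.pi * 220 * t)   # stands in for a "known" voice
tone_close = np.sin(2 * np.pi * 225 * t)   # a similar-sounding voice
tone_far = np.sin(2 * np.pi * 1200 * t)    # a very different voice

known = spectral_fingerprint(tone_known)
print(similarity(known, spectral_fingerprint(tone_close)))  # high: recognized
print(similarity(known, spectral_fingerprint(tone_far)))    # low: not recognized
```

The "voice" whose frequencies resemble the training example scores high, while the unfamiliar one scores near zero, which is the article's point: a system trained without recordings from a community will fail to recognize that community's speech.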