Make it possible to collect voice answers to free text questions #3426

Closed
evilaliv3 opened this issue Apr 17, 2023 · 43 comments

Comments

@evilaliv3
Member

Proposal

To improve the accessibility of the tool, it would be very interesting to implement a feature that makes it possible to collect voice answers to free text questions.

Motivation and context

The idea for this feature comes from the general aim of continuously improving the accessibility of the tool.

It would also improve compliance with upcoming laws, such as those inspired by the whistleblowing directive, that still allow traditional tools based on telephone lines. While these do not offer the same security as modern digital tools, they remain widely used thanks to their usability.

@evilaliv3
Member Author

evilaliv3 commented Apr 17, 2023

It would be very interesting to look at the browser capabilities and see if there is any way to manipulate the audio recording to anonymize the voice of the whistleblower.

Examples of audio manipulations that change voice pitch can be found at: https://github.com/urtzurd/html-audio

According to the scientific literature, altering voice pitch, as typically done in TV shows, is a disguise that does not make sense at all, because by analyzing the sound it is possible to try to restore the original.
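
As a rough illustration of why a plain pitch change is a weak disguise, the sketch below shifts a test tone by naive resampling and then undoes the shift by applying the inverse factor. This is only a toy example: `resample`, `sineWave` and `maxAbsDifference` are hypothetical helper names, the 1.5 factor is arbitrary, and resampling changes speed together with pitch, which real pitch shifters avoid.

```javascript
// Naive pitch change by resampling with linear interpolation.
// factor > 1 raises the pitch (and shortens the signal).
function resample(samples, factor) {
  const outLength = Math.floor(samples.length / factor);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * factor;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}

// Test signal: a pure tone standing in for a voice recording.
function sineWave(freq, sampleRate, seconds) {
  const out = new Float32Array(Math.floor(sampleRate * seconds));
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.sin((2 * Math.PI * freq * i) / sampleRate);
  }
  return out;
}

// Largest sample-wise difference over the common length of two signals.
function maxAbsDifference(a, b) {
  const n = Math.min(a.length, b.length);
  let max = 0;
  for (let i = 0; i < n; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

const original = sineWave(200, 8000, 1);
const shifted = resample(original, 1.5);      // the "disguised" signal
const restored = resample(shifted, 1 / 1.5);  // attacker applies the inverse factor

console.log("original vs shifted:", maxAbsDifference(original, shifted));
console.log("original vs restored:", maxAbsDifference(original, restored));
```

An attacker who does not know the factor could simply try a range of candidate values, which is why a fixed or guessable shift offers little protection.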

The scientific community seems to agree instead that the best solution is typically to do voice-to-text followed by text-to-voice, to ensure that no trace of the original timbre remains in the recording.
This is actually not doable in this context, as it requires user tuning and capabilities not available in a browser.

The middle-ground alternative is implementing a vocoder, as described here: https://security.stackexchange.com/questions/227146/is-changing-pitch-enough-for-anonymizing-a-persons-voice

Vocoders are typically researched with a focus on performance, in order to be used for real-time communications. In the context of whistleblowing and GlobaLeaks, I think performance is not such a big deal as long as the voice is properly anonymized.

Regarding the approach to adopt, it would probably be best to achieve a full implementation of a vocoder directly in the browser, to avoid transferring the original voice recording to the server and to continue our research aimed at implementing a fully end-to-end encrypted system.

The community of reference on this topic seems to be the Voice Privacy Challenge community.

Among the recent literature, an interesting work is "Research in methods for achieving secure voice anonymization - Evaluation and improvement of voice anonymization techniques for whistleblowing" by ERIK HELLMAN and MATTIAS NORDSTRAND, performed at KTH in synergy with the whistleblowing operator Nebulr AB: http://kth.diva-portal.org/smash/get/diva2:1678546/FULLTEXT01.pdf

@evilaliv3
Member Author

evilaliv3 commented Apr 18, 2023

@Voice-Privacy-Challenge @Natalia-T @brijmohan @TonyWangX: are you aware of any research effort on lightweight implementations that could run in a browser in JavaScript? Thank you.

@evilaliv3 evilaliv3 removed their assignment Apr 19, 2023
@evilaliv3
Member Author

evilaliv3 commented Apr 19, 2023

@vebmaylrie: we are testing your work (https://github.com/sarulab-speech/lightweight_spkr_anon) and it seems that the current algorithms require training, and the results do not seem to hide the gender of the speaker.

Are you aware of any vocoder that takes this issue into account and:

  1. is designed to work without any pre-training
  2. targets a gender-neutral output that clearly masks the original pitch level

Thank you for your feedback on this matter.

@msmannan00
Contributor

@evilaliv3
https://tinglok.netlify.app/files/cvc/
A machine-learning-backed AI model.

@evilaliv3
Member Author

Very interesting Abdul.

In my opinion it would be very interesting to first see whether there are lightweight but effective approaches that we could run on the client.
If we could do the editing on the client, we would not have to manage the file on the server's disk, which solves the problem of leaving the original copy of the file in the slack space of the disk.

@msmannan00
Contributor

Screenshot from 2023-06-10 19-16-18
Screenshot from 2023-06-10 19-16-33
Screenshot from 2023-06-10 19-16-47
Screenshot from 2023-06-10 19-16-47-1
Screenshot from 2023-06-10 19-16-55
Screenshot from 2023-06-10 19-17-01
Screenshot from 2023-06-10 19-17-08

@msmannan00
Contributor

functionality so far

@msmannan00
Contributor

After recording, it shows as not required, like this:
Screenshot from 2023-06-10 19-22-20

@evilaliv3
Member Author

Very nice update @msmannan00 !

I see that you considered adding a flag to every type of question to enable the feature.
I would probably suggest just adding a specific new type of question, "Voice message".

Regarding the UI, I like its simplicity. If possible, I would hide the player interface while the message has not yet been recorded.

What do you think?

\cc @gianlucagilardi

@danielvaknine

danielvaknine commented Jun 10, 2023 via email

@msmannan00
Contributor

msmannan00 commented Jun 10, 2023

Very nice update @msmannan00 !

I see that you considered adding a flag to every type of question to enable the feature.
I would probably suggest just adding a specific new type of question, "Voice message".

Regarding the UI, I like its simplicity. If possible, I would hide the player interface while the message has not yet been recorded.

What do you think?

\cc @gianlucagilardi

Sure, that sounds easier and much cleaner; I will change it accordingly.

@evilaliv3
Member Author

evilaliv3 commented Jun 10, 2023

Thanks @msmannan00; actually I'm very much in doubt about the final choice on this.

Based on the suggestion of @danielvaknine, which I know is similar to your original idea, I suggest the following:

We probably want the whole questionnaire to be able to switch from written to oral, so this should probably be a setting of the channel.
On the channel we could have a type: "WRITTEN CHANNEL/ORAL CHANNEL/BOTH".
When the channel is configured as "BOTH", it should present the choice "Would you like to make a written report or an oral report?" and offer two buttons, "Written report" and "Oral report".
This interface should host a configurable message where we would inform users about the limits of the oral channel, for example the technical difficulty or impossibility of anonymizing the voice.

What do you all think?

This ticket deals specifically with free text questions, but I think in our next meeting it would be great to brainstorm on what to do for the other types of questions, like checkboxes, terms of service, and dropdowns. I suggest keeping them as they are, because they could be related to mandatory consents, which are impossible to handle orally, or to conditional workflows.

@evilaliv3
Member Author

Another topic that comes to mind is how to validate a mandatory question:

  • should the audio recording last at least ~10 seconds? (but what to do if it lasts less?)
  • should the audio be analyzed to check that it recorded something, validating the collected signal somehow?
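
Both checks could be sketched roughly as follows, assuming the recording is available as an array of samples in the range [-1, 1]. The function names, the RMS-based silence test and the `0.01` threshold are assumptions for illustration only and would need tuning on real recordings.

```javascript
// Hypothetical defaults: 10 seconds as suggested above; the silence
// threshold is an assumed value, not a measured one.
const MIN_SECONDS = 10;
const SILENCE_RMS_THRESHOLD = 0.01;

// Root mean square level of the signal; close to 0 for silence.
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Returns whether a mandatory voice answer can be accepted, and why not.
function validateRecording(samples, sampleRate, minSeconds = MIN_SECONDS) {
  const duration = samples.length / sampleRate;
  if (duration < minSeconds) {
    return { valid: false, reason: "too_short" };
  }
  if (rms(samples) < SILENCE_RMS_THRESHOLD) {
    return { valid: false, reason: "silent" };
  }
  return { valid: true, reason: null };
}
```

If the recording is too short or silent, the interface could report the `reason` and ask the user to record again rather than blocking the submission without explanation.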

@msmannan00
Contributor

Yes, that was something I was already thinking about; I will look into it.

@evilaliv3
Member Author

evilaliv3 commented Jun 11, 2023 via email

@msmannan00
Contributor

msmannan00 commented Jun 13, 2023

No, I believe having separate answers is better, as a text interface that also has voice would be confusing to non-IT people. The questionnaire properties that you mentioned, like mandatory, are a good idea; since the deadline is almost the end of the month, we have already added the option to make an audio question mandatory and to set the minimum required seconds. It is shown in the images below.

@msmannan00
Contributor

Screenshot from 2023-06-13 20-39-26
Screenshot from 2023-06-13 20-39-57
Screenshot from 2023-06-13 20-40-24
Screenshot from 2023-06-13 20-40-48

@msmannan00
Contributor

actualsound.webm
changedsound.webm

This feature belongs to vasilis build, but it is here just so that, if you think this is better, we can integrate it and work on it more in the future. Transformation applied:

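// Flatten the recorded left-channel data into a single sample buffer
var modbuffer = flattenArray(leftchannel, recordingLength);
// Masking chain, applied in order: low-pass filter, pitch shift, time stretch, bit crusher
modbuffer = applyLowPassFilter(modbuffer, randomPitch);
modbuffer = pitchShift(modbuffer, randomPitch);
modbuffer = applyTimeStretching(modbuffer, stretchAmount);
modbuffer = applyBitcrusher(modbuffer);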
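
The helper functions in the chain above are not shown in this thread. As a rough idea of what two of the steps could look like, here is a minimal sketch of a one-pole low-pass filter and a bit crusher. `lowPassOnePole`, `bitcrush` and their parameters are hypothetical names chosen for this example and are not the prototype's actual code.

```javascript
// One-pole low-pass filter: each output moves a fraction `alpha`
// (0 < alpha <= 1) toward the current input; smaller alpha removes
// more high-frequency content.
function lowPassOnePole(samples, alpha) {
  const out = new Float32Array(samples.length);
  let prev = 0;
  for (let i = 0; i < samples.length; i++) {
    prev = prev + alpha * (samples[i] - prev);
    out[i] = prev;
  }
  return out;
}

// Bit crusher: quantize samples in [-1, 1] to a reduced bit depth,
// discarding fine amplitude detail.
function bitcrush(samples, bits) {
  const levels = Math.pow(2, bits - 1);
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = Math.round(samples[i] * levels) / levels;
  }
  return out;
}
```

Both steps mostly degrade the signal rather than disguise the speaker, so on their own they should not be treated as anonymization.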

@evilaliv3
Member Author

Thank you @msmannan00 for this update! It all seems very interesting.

What I described above about selecting the type of channel (oral or written) will actually be needed, because users will have to decide whether to make an oral or a written report, based on their own choice and needs. I agree, though, that we could postpone this and discuss this specific topic in a new ticket, #3482.

Regarding the anonymization of the voice, I've created a dedicated ticket, #3483.
I find your prototype interesting; would you please clarify the rationale for the algorithms involved?

  • low pass filter
  • random pitch shift
  • random time stretch
  • bit crusher

@danielvaknine

Hi all! To answer @evilaliv3 , we think the whistleblower should be able to see/listen to their own recording. Unlike @gianlucagilardi , we also believe the whistleblower should be able to download their recording, just as they can export/download their report today. But it's not a strict requirement from our side so we will leave it up to you.

Regarding the actual implementation in the questionnaire – what did we end up with? Will it be:

  1. an extra question in the questionnaire where they can record a voice note?
  2. a small mic symbol on every question where the whistleblower can choose to answer in a voice note instead?
  3. A "first step" where the whistleblower chooses if they want to do a verbal or written report, before seeing the questionnaire?

In our opinion, number 2 is the best option, with an interface like this on every question:
Screenshot 2023-06-18 at 13 03 14

Looking very much forward to seeing this great functionality in action!

@evilaliv3
Member Author

@danielvaknine actually these questions have already been answered within the discussions and proposals by @gianlucagilardi and @msmannan00.

At the moment, the implementation we are working on will only add a new question type, "voice message".
With a simple but powerful implementation like this, we will be able to create questionnaires dedicated to oral reporting, and we will be able to use the existing "channels" feature to create:

  1. a channel dedicated to written reporting, featuring a written questionnaire
  2. a channel dedicated to oral reporting, featuring an oral questionnaire

For the topic that you are discussing, we opened ticket #3482.
Please post your ideas there, if any. Thank you.

As discussed, we currently agree not to mix oral and written reporting because, as noted above, only free text questions can be replaced by an oral recording; it is impossible to implement dropdowns and conditional logic with oral reporting and for this reason we consider.

@gianlucagilardi

Hi all! To answer @evilaliv3 , we think the whistleblower should be able to see/listen to their own recording. Unlike @gianlucagilardi , we also believe the whistleblower should be able to download their recording, just as they can export/download their report today.

Can (s)he?
If I am not missing some configuration, the whistleblower cannot export the answers/report and the only way to get a copy is out of platform via screenshots. This is the reason I stand for playback and not download, just for consistency sake: either you can export anything or you can export nothing :)

@msmannan00
Contributor

Hi all! To answer @evilaliv3 , we think the whistleblower should be able to see/listen to their own recording. Unlike @gianlucagilardi , we also believe the whistleblower should be able to download their recording, just as they can export/download their report today.

Can (s)he? If I am not missing some configuration, the whistleblower cannot export the answers/report and the only way to get a copy is out of platform via screenshots. This is the reason I stand for playback and not download, just for consistency sake: either you can export anything or you can export nothing :)

Sorry for the late reply. What I meant was not from the system itself. For instance, a person could play the recording and, when the voice comes out of the speaker, put their mobile phone in front of it to record the audio coming out of the system. Besides that, while the audio is playing on their system, there is a lot of software, for instance imdb, that can download the sound even when the application itself does not support export, because the audio is playing in the browser. It is something like the screenshot case you mentioned: most apps don't allow screenshots, but no one can stop someone from taking a picture of the screen with another phone's camera :)

@msmannan00
Contributor

msmannan00 commented Jun 24, 2023

The latest update.

  1. Recorder indication added
    Screenshot from 2023-06-25 01-54-44
    Screenshot from 2023-06-25 01-54-34

  2. minimum and maximum range added. default min is 10 seconds, default max is 60 seconds
    Screenshot from 2023-06-25 02-10-19

@msmannan00
Contributor

msmannan00 commented Jun 24, 2023

Hello again, I am sharing the live demo so that you can comment more easily.

https://demo.whistleaks.com/

username: recipient
password: UmT@123456789
username: admin
password: UmT@123456789

@evilaliv3
Member Author

Thank you @msmannan00

I like the interface.

I would suggest, if possible, not showing the player and the length until the user starts recording.

By the way, at the moment I'm encountering some issues testing it; does it work for you?

I was able to record only once on Chrome, but not on Firefox and Brave.
Chrome also reports that we should probably use a more up-to-date API; the error is "[Deprecation] The ScriptProcessorNode is deprecated. Use AudioWorkletNode instead. (https://bit.ly/audio-worklet)"
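
For reference, a minimal sketch of what a recorder based on `AudioWorkletNode` could look like is shown below. It is not the demo's code: the processor name `recorder-processor`, the helper `copyFirstChannel` and the idea of posting each block to the main thread are assumptions for illustration. The class is only defined when the file is loaded inside an audio worklet scope.

```javascript
// Copy the first channel of one input so it can be safely posted to
// the main thread (the engine reuses the underlying buffers).
function copyFirstChannel(channels) {
  return channels && channels[0]
    ? Float32Array.from(channels[0])
    : new Float32Array(0);
}

// AudioWorkletProcessor and registerProcessor exist only inside an
// AudioWorkletGlobalScope, so guard the definition.
if (typeof AudioWorkletProcessor !== "undefined") {
  class RecorderProcessor extends AudioWorkletProcessor {
    // Called by the audio engine for every block of samples.
    process(inputs) {
      const samples = copyFirstChannel(inputs[0]);
      if (samples.length > 0) {
        this.port.postMessage(samples);
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor("recorder-processor", RecorderProcessor);
}
```

On the main thread, the module would be loaded with `audioContext.audioWorklet.addModule(...)`, a node created with `new AudioWorkletNode(audioContext, "recorder-processor")`, and the blocks collected through `node.port.onmessage`.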

@msmannan00
Contributor

msmannan00 commented Jun 24, 2023

Sure, I will look into it. I tested in most browsers, but let me check. As for the player, I had disabled it until a recording exists, but I will completely hide it in the next build until something is recorded.

@msmannan00
Contributor

As suggested, the player and stop icon are hidden until recording has finished.

Screenshot from 2023-06-27 02-27-22

Screenshot from 2023-06-27 02-26-57

@danielvaknine

I'm not quite sure that I understand the new "shield" symbol, can you please explain what it is?

Screenshot 2023-07-04 at 11 35 58

@danielvaknine

I'm not quite sure that I understand the new "shield" symbol, can you please explain what it is?

Screenshot 2023-07-04 at 11 35 58

@msmannan00 could you clarify this? If it's intentional – what is it?

@msmannan00
Contributor

Sure, the shield represents secure audio, meaning the audio is manipulated. Without the shield, the normal audio will be recorded and saved; with the shield, the altered audio will be uploaded and saved.

@danielvaknine

I'm sorry but I don't quite understand how it should be operated and I think this might be a confusing extra function for the whistleblower to use without instructions etc

@msmannan00
Contributor

Moving forward with audio masking, this is the current status:

  1. Code is optimized
  2. For WAV file generation and other features, local browser libraries are being used at the moment
  3. Stereo voice is converted into a single mono channel and the sample rate is lowered to decrease the file size
  4. Low pass filter implemented

For masking, two major things have been done for the time being:

  1. Very small time stretching implemented, with random shifting to add a pinch of manipulation
  2. Pitch shifting introduced with a factor between 3 and -3, ignoring the range between 1 and -1, applied after time stretching, which makes the voice more distinct
  3. Noise cancellation also implemented to improve voice quality
  4. Other filters like white noise and bitcrusher etc. have been removed
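
The size reduction and the choice of pitch factor described above could be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the downsampling uses simple block averaging with an integer factor, and the unit of the pitch factor (the thread does not say whether it is semitones or something else) is left unspecified.

```javascript
// Mix a stereo recording down to a single mono channel.
function toMono(left, right) {
  const n = Math.min(left.length, right.length);
  const mono = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    mono[i] = 0.5 * (left[i] + right[i]);
  }
  return mono;
}

// Lower the sample rate by an integer factor, averaging each group of
// samples (a crude form of anti-aliasing).
function downsample(samples, factor) {
  const n = Math.floor(samples.length / factor);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += samples[i * factor + j];
    }
    out[i] = sum / factor;
  }
  return out;
}

// Pick a random pitch factor in [-3, -1] or [1, 3], skipping the range
// between -1 and 1 where the change would be too subtle.
function randomPitchFactor(random = Math.random) {
  const magnitude = 1 + 2 * random();
  return random() < 0.5 ? -magnitude : magnitude;
}
```

Passing the random source as a parameter makes it possible to swap `Math.random` for a cryptographically stronger generator, which may matter if the factor is meant to be hard to guess.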

@evilaliv3
Member Author

evilaliv3 commented Jul 22, 2023 via email

@evilaliv3
Member Author

Implemented and merged since 4.12

@danielvaknine

Is there anywhere we could test this function, e.g. https://demo.whistleaks.com/ or https://try.globaleaks.org/ ?
