The Audio Auditor: Participant-Level Membership Inference in Internet of Things Voice Services

The paper has been accepted by the PPML workshop 2019: The Audio Auditor: Participant-Level Membership Inference in Internet of Things Voice Services

Methodology

The primary task in training an audio auditor is to build several shadow models that infer the decision boundary of the target ASR model. We assume the target model's learning algorithm Al_tar is known to the auditor, so the shadow models are trained with the same algorithm (Al_shd = Al_tar). Unlike the target model, the shadow models' ground truth is fully known to us. For a user u who queries a shadow model with her audio samples, we collapse the features extracted from the model's outputs for these samples into one record, labeled "member" if u ∈ D_shd^train and "nonmember" otherwise. Together, these processed and labeled records form the training dataset for a binary classifier, the audit model, which is trained with a supervised learning algorithm. As also evidenced in [19], the more shadow models are built, the more accurate the audit model becomes.
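The labeling step can be sketched as follows. This is a minimal sketch, assuming each shadow model's outputs have already been collapsed into one record per user and that the auditor holds the set of training users for each shadow model. The names `label_shadow_records` and `build_audit_training_set` are hypothetical, not functions from this repository.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical container: one collapsed feature record per user, per shadow model.
Record = Dict[str, object]


def label_shadow_records(
    records_by_user: Dict[str, Record],
    shadow_train_users: Set[str],
) -> List[Tuple[Record, str]]:
    """Label each user's collapsed record as 'member' or 'nonmember'.

    The auditor trained the shadow model itself, so this label is ground truth.
    """
    labeled = []
    for user, record in sorted(records_by_user.items()):
        label = "member" if user in shadow_train_users else "nonmember"
        labeled.append((record, label))
    return labeled


def build_audit_training_set(
    shadows: List[Tuple[Dict[str, Record], Set[str]]],
) -> List[Tuple[Record, str]]:
    """Pool the labeled records from every shadow model into one training set."""
    training_set: List[Tuple[Record, str]] = []
    for records_by_user, train_users in shadows:
        training_set.extend(label_shadow_records(records_by_user, train_users))
    return training_set


if __name__ == "__main__":
    # Toy data: two shadow models, each with one member and one nonmember speaker.
    shadow1 = ({"spk_a": {"Probability1": 0.91}, "spk_b": {"Probability1": 0.42}}, {"spk_a"})
    shadow2 = ({"spk_c": {"Probability1": 0.88}, "spk_d": {"Probability1": 0.37}}, {"spk_c"})
    for record, label in build_audit_training_set([shadow1, shadow2]):
        print(record, label)
```

Pooling across shadow models is what lets additional shadow models enlarge the audit model's training set.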

For participant-level membership inference, several user-relevant features are extracted from each output: the transcription text (denoted TXT), the posterior probability (denoted Probability), and the audio frame length (denoted Frame_Length). The features of the auditor's training set are written as: *{TXT1=type(string), Probability1=type(float), Frame_Length1=type(integer), ..., TXTn=type(string), Probabilityn=type(float), Frame_Lengthn=type(integer), class}*, where n is the number of audio samples belonging to a speaker.
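A minimal sketch of how one speaker's n outputs could be flattened into a single record with the field names above. `UtteranceOutput` and `to_auditor_record` are hypothetical helpers; how the repository's scripts actually assemble these records is not shown here.

```python
from typing import Dict, List, NamedTuple, Union


class UtteranceOutput(NamedTuple):
    """One ASR output for a single audio sample (assumed structure)."""
    txt: str            # transcription text (TXT)
    probability: float  # posterior probability (Probability)
    frame_length: int   # audio frame length (Frame_Length)


def to_auditor_record(
    outputs: List[UtteranceOutput], label: str
) -> Dict[str, Union[str, float, int]]:
    """Collapse one speaker's n outputs into a single auditor training record."""
    record: Dict[str, Union[str, float, int]] = {}
    for i, out in enumerate(outputs, start=1):
        record[f"TXT{i}"] = out.txt
        record[f"Probability{i}"] = float(out.probability)
        record[f"Frame_Length{i}"] = int(out.frame_length)
    record["class"] = label  # "member" or "nonmember"
    return record


if __name__ == "__main__":
    rec = to_auditor_record(
        [UtteranceOutput("hello world", 0.93, 312), UtteranceOutput("good morning", 0.81, 198)],
        "member",
    )
    print(list(rec))
    # ['TXT1', 'Probability1', 'Frame_Length1', 'TXT2', 'Probability2', 'Frame_Length2', 'class']
```

Speakers with different n produce records of different widths. A tabular classifier would need a fixed n (for example by truncating or padding), a choice this sketch leaves open.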

Build the ASR model

Goals

Build three ASR models: one serves as the target model and the other two serve as shadow models.

Steps for each ASR model:

Install the pytorch-kaldi toolkit.
Split the TIMIT dataset into 3 subsets.
Follow pytorch-kaldi's instructions to build the model on its TIMIT subset.
Collect the transcription results from the built ASR model and save them as a .log file in the data/ folder.
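The repository does not state how the three TIMIT subsets were drawn. Assuming the split is done by speaker, so that each speaker's audio lands in exactly one subset and the participant-level member/nonmember labels stay unambiguous, it could look like the sketch below. `split_speakers` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def split_speakers(speakers: Sequence[str], n_subsets: int = 3, seed: int = 0) -> List[List[str]]:
    """Shuffle speaker IDs and deal them round-robin into n disjoint subsets."""
    ids = sorted(set(speakers))  # de-duplicate and fix the order before shuffling
    rng = random.Random(seed)    # seeded so the split is reproducible
    rng.shuffle(ids)
    subsets: List[List[str]] = [[] for _ in range(n_subsets)]
    for i, spk in enumerate(ids):
        subsets[i % n_subsets].append(spk)
    return subsets


if __name__ == "__main__":
    subsets = split_speakers([f"spk{i:02d}" for i in range(10)])
    target, shadow1, shadow2 = subsets  # one subset per ASR model
    print([len(s) for s in subsets])  # [4, 3, 3]
```

Round-robin dealing keeps the subsets within one speaker of each other in size.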

Train an Auditor model

Preprocess:

~$ ./log2csv.sh
~$ python txt2csv.py
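The contents of log2csv.sh and txt2csv.py, and the exact layout of the pytorch-kaldi .log output, are not described here. The sketch below only illustrates the kind of log-to-CSV conversion this step performs, assuming a hypothetical line format `<utt_id> <probability> <frame_length> <transcription ...>`. The functions `parse_log_lines` and `write_csv` are illustrative names, not the repository's code.

```python
import csv
import io
from typing import Iterable, List, TextIO, Union

Row = List[Union[str, float, int]]


def parse_log_lines(lines: Iterable[str]) -> List[Row]:
    """Parse lines in an ASSUMED format: '<utt_id> <probability> <frame_length> <transcription ...>'."""
    rows: List[Row] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or truncated lines
        try:
            prob, frames = float(parts[1]), int(parts[2])
        except ValueError:
            continue  # skip lines whose numeric fields do not parse
        rows.append([parts[0], " ".join(parts[3:]), prob, frames])
    return rows


def write_csv(rows: List[Row], out: TextIO) -> None:
    """Write parsed rows under a header that reuses the feature names TXT, Probability, Frame_Length."""
    writer = csv.writer(out)
    writer.writerow(["utt_id", "TXT", "Probability", "Frame_Length"])
    writer.writerows(rows)


if __name__ == "__main__":
    log = ["spk01_utt1 0.93 312 hello world", "", "spk01_utt2 0.81 198 good morning"]
    buf = io.StringIO()
    write_csv(parse_log_lines(log), buf)
    print(buf.getvalue())
```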

Build the auditor model using the decision tree algorithm:

~$ python audit_member.py
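The internals of audit_member.py are not shown here. As an illustration of the decision tree idea only, the sketch below fits a one-level decision tree (a decision stump) that chooses the single numeric feature and threshold with the lowest weighted Gini impurity. A full implementation such as scikit-learn's `DecisionTreeClassifier` grows deeper trees; whether the script uses it is not stated. The stump works on numeric features such as Probability and Frame_Length, not on the TXT strings.

```python
from collections import Counter
from typing import List, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


class DecisionStump:
    """A one-level decision tree: one threshold split on one numeric feature."""

    def fit(self, X: List[List[float]], y: List[str]) -> "DecisionStump":
        n = len(y)
        best = None  # (weighted_gini, feature, threshold, left_labels, right_labels)
        for j in range(len(X[0])):
            values = sorted(set(row[j] for row in X))
            # Candidate thresholds: midpoints between consecutive distinct values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [lab for row, lab in zip(X, y) if row[j] <= t]
                right = [lab for row, lab in zip(X, y) if row[j] > t]
                score = (len(left) * gini(left) + len(right) * gini(right)) / n
                if best is None or score < best[0]:
                    best = (score, j, t, left, right)
        if best is None:
            # Every feature is constant: always predict the majority class.
            majority = Counter(y).most_common(1)[0][0]
            self.feature, self.threshold = 0, float("inf")
            self.left_label = self.right_label = majority
            return self
        _, self.feature, self.threshold, left, right = best
        self.left_label = Counter(left).most_common(1)[0][0]
        self.right_label = Counter(right).most_common(1)[0][0]
        return self

    def predict(self, X: List[List[float]]) -> List[str]:
        return [self.left_label if row[self.feature] <= self.threshold else self.right_label for row in X]


if __name__ == "__main__":
    # Toy features: [Probability, Frame_Length]
    X = [[0.95, 300], [0.91, 280], [0.40, 310], [0.35, 290]]
    y = ["member", "member", "nonmember", "nonmember"]
    stump = DecisionStump().fit(X, y)
    print(stump.predict([[0.9, 295], [0.3, 295]]))  # ['member', 'nonmember']
```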

Plot the performance of the auditor model:

~$ python plot_fig.py
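Which metrics plot_fig.py draws is not described here. A common way to summarize an audit model is accuracy, precision, recall, and F1 with "member" as the positive class; the sketch below computes these from true and predicted labels. `audit_metrics` is a hypothetical helper, and the plotting itself is left out.

```python
from typing import Dict, Sequence


def audit_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "member"
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary member/nonmember audit."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # members flagged as members
    fp = sum(t != positive and p == positive for t, p in pairs)  # nonmembers flagged as members
    fn = sum(t == positive and p != positive for t, p in pairs)  # members missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = ["member", "member", "nonmember", "nonmember"]
    preds = ["member", "nonmember", "member", "nonmember"]
    print(audit_metrics(truth, preds))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```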
