# Dataset Processing for WSJ0

## Decompress data files and convert to WAV
Used sph2pipe (a <a href='https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools'>SPHERE conversion tool</a> distributed by the <a href='https://www.ldc.upenn.edu/about'>Linguistic Data Consortium (LDC)</a>) to decompress NIST SPHERE audio files with <a href='https://www.loc.gov/preservation/digital/formats/fdd/fdd000199.shtml'>Shorten lossless compression</a> and convert audio signal to .wav format. The following code uses the bash shell to iterate through the WSJ0 file structure and process audio samples labeled as speaker-independent training samples recorded directly into a Sennheiser microphone. More information on the WSJ audio corpus (e.g., directory structure, filenaming formats) is available <a href='https://catalog.ldc.upenn.edu/docs/LDC94S13A/wsj1.txt'>here</a>.

In [6]:
%%bash
for dir in ./csr_1/11-1.1/wsj0/si_tr_s/*/
    do
        for f in $dir*.wv1
            do
                sph2pipe -f rif $f ${f%.*}.wav
            done
    done

Input file ./csr_1/11-1.1/wsj0/si_tr_s/01j/01jo0301.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010j.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010o.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010p.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010q.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010r.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010s.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010t.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010u.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010v.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010w.wv1 is not a valid SPHERE file
Input file ./csr_1/11-1.1/wsj0/si_tr_s/01p/01pa010x.wv1 is not a valid SPHERE file
Inpu

In [35]:
%%bash
for dir in ./csr_1/11-2.1/wsj0/si_tr_s/*/
    do
        for f in $dir*.wv1
            do
                sph2pipe -f rif $f ${f%.*}.wav
            done
    done

Input file ./csr_1/11-2.1/wsj0/si_tr_s/02b/02ba0101.wv1 is not a valid SPHERE file
Input file ./csr_1/11-2.1/wsj0/si_tr_s/20l is not a valid SPHERE file
Unable to open (jgisrael@iu.edu)/*.wv1 as input
Unable to open ./csr_1/11-2.1/wsj0/si_tr_s/20l/*.wv1 as input
Input file ./csr_1/11-2.1/wsj0/si_tr_s/20m is not a valid SPHERE file
Unable to open (jgisrael@iu.edu as input
Unable to open 2)/*.wv1 as input
Input file ./csr_1/11-2.1/wsj0/si_tr_s/20m is not a valid SPHERE file
Unable to open (jgisrael@iu.edu)/*.wv1 as input
Unable to open ./csr_1/11-2.1/wsj0/si_tr_s/20m/*.wv1 as input


## Combine data samples with random amounts of overlap to simulate crosstalk

To simulate conversations with potential for crosstalk, audio was combined from pairs of speakers with a random overlap between each pair of signals.  Overlap was varied between .5 seconds (8,000 frames), which accounts for virtually no overlap due to delay at the beginning and end of each recording, and 3 seconds, (48,000 frames), which would provide a simulation of conversants speaking over each other to varying degrees.  Magnitude was normalized during any period of overlap to maintain consistent range throughout the combined audio segment. Each speaker was paired with the same second speaker, so mixed speaker audio samples can be further mixed to create longer simulated conversations featuring the same two speakers only.

In [28]:
import librosa
from IPython.display import Audio
import glob
import random

In [34]:
mixed11_1 = []
dirs = glob.glob('./csr_1/11-1.1/wsj0/si_tr_s/*')
random.seed(12)
for n in range(0,len(dirs),2):
    try:
        path1=dirs[n]   
        path2=dirs[n+1]
        files1 = glob.glob(path1+'/*.wav')
        files2 = glob.glob(path2+'/*.wav')
        for f in range(len(files1)):
            try:
                # load two files
                a, ar = librosa.load(files1[f], sr = None)
                b, br = librosa.load(files2[f], sr = None)

                # create random overlap (from 0 to 2 seconds)
                overlap = randint(8000, 48000)
                if (len(a)>overlap) and (len(b)>overlap):
                    c = (a[len(a)-overlap:] + b[:overlap])/2
                    joined = np.concatenate((a[:len(a)-overlap],c,b[overlap:]))
                else:
                    joined = np.concatenate((a,b))
                # output mixed file
                path = 'mixed/si_tr_s/'
                outfile = files1[f][-12:-4]+'_'+files2[f][-12:-4]+'.wav'
                mixed11_1.append(outfile)
                outpath = path+outfile
                librosa.output.write_wav(outpath, joined, sr=ar)

            except IndexError:
                print('There are no more files in '+str(path2)+'.')
            
            except EOFError:
                print('Unable to read file pair: '+ str(path2) + ' and ' +str(path2)+'.')
                
    except IndexError:
        print('There are no more directories to combine.')
        

There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/012.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/018.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/018.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/01e.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01j and ./csr_1/11-1.1/wsj0/si_tr_s/01j.
There are no more files in ./csr_1/11-1.1/wsj0/si_tr_s/01j.
There

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01p and ./csr_1/11-1.1/wsj0/si_tr_s/01p.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01r and ./csr_1/11-1.1/wsj0/si_tr_s/01r.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01t and ./csr_1/11-1.1/wsj0/si_tr_s/01t.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01v and ./csr_1/11-1.1/wsj0/si_tr_s/01v.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01x and ./csr_1/11-1.1/wsj0/si_tr_s/01x.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/01z and ./csr_1/11-1.1/wsj0/si_tr_s/01z.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/021 and ./csr_1/11-1.1/wsj0/si_tr_s/021.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/023 and ./csr_1/11-1.1/wsj0/si_tr_s/023.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/025 and ./csr_1/11-1.1/wsj0/si_tr_s/025.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/027 and ./csr_1/11-1.1/wsj0/si_tr_s/027.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_tr_s/029 and ./csr_1/11-1.1/wsj0/si_tr_s/029.
Unable to read file pair: ./csr_1/11-1.1/wsj0/si_t

In [36]:
len(mixed)

1519

In [37]:
mixed11_1 = mixed

In [38]:
mixed11_2 = []
dirs = glob.glob('./csr_1/11-2.1/wsj0/si_tr_s/*')
random.seed(12)
for n in range(0,len(dirs),2):
    try:
        path1=dirs[n]   
        path2=dirs[n+1]
        files1 = glob.glob(path1+'/*.wav')
        files2 = glob.glob(path2+'/*.wav')
        for f in range(len(files1)):
            try:
                # load two files
                a, ar = librosa.load(files1[f], sr = None)
                b, br = librosa.load(files2[f], sr = None)

                # create random overlap (from 0 to 2 seconds)
                overlap = randint(8000, 48000)
                if (len(a)>overlap) and (len(b)>overlap):
                    c = (a[len(a)-overlap:] + b[:overlap])/2
                    joined = np.concatenate((a[:len(a)-overlap],c,b[overlap:]))
                else:
                    joined = np.concatenate((a,b))
                # output mixed file
                path = 'mixed/si_tr_s/'
                outfile = files1[f][-12:-4]+'_'+files2[f][-12:-4]+'.wav'
                mixed11_2.append(outfile)
                outpath = path+outfile
                librosa.output.write_wav(outpath, joined, sr=ar)

            except IndexError:
                print('There are no more files in '+str(path2)+'.')
            
            except EOFError:
                print('Unable to read file pair: '+ str(path2) + ' and ' +str(path2)+'.')
                
    except IndexError:
        print('There are no more directories to combine.')
        

Unable to read file pair: ./csr_1/11-2.1/wsj0/si_tr_s/02c and ./csr_1/11-2.1/wsj0/si_tr_s/02c.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/02c.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/209.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There are no more files in ./csr_1/11-2.1/wsj0/si_tr_s/20j.
There

Generated 4,356 files with two voices each.  True speakers and the order in which each speaks is preserved in the name of each file (and stored as binary streams in mixed11_1.pkl and mixed11_2.pkl below.

In [40]:
import pickle

# save lists of mixed filenames
with open('mixed11_1.pkl', 'wb') as f:  
    pickle.dump(mixed11_1, f)
with open('mixed11_2.pkl', 'wb') as f:  
    pickle.dump(mixed11_2, f)