Fixes + Channel Selection for CHiME-7 Task #4934
Merged
39 commits
69fe371 addressing Taejin pointed out issues (popcornell)
5bf3c5a addressing Taejin pointed out issues (popcornell)
644e58e fixed md5sum check on original chime6 script (popcornell)
dc770b4 Merge branch 'master' of https://github.com/espnet/espnet (popcornell)
445e3ae adding channel selection (popcornell)
b217ed7 revert (popcornell)
2518f81 revert (popcornell)
721a64a revert (popcornell)
c4a58b2 added skip stages to asr dprep (popcornell)
f57c1d5 added flag to generate evaluation (popcornell)
447bd9d addes contain function to data.sh (popcornell)
1f106f7 minor changes to run.sh (popcornell)
0d84fcb with pretrained (popcornell)
263c36c data.sh, skipping for decoding only (popcornell)
92dedbd soundfile much faster than torchaudio (popcornell)
133dcd6 revised channel selection (popcornell)
a594319 applied linters (popcornell)
d888a38 applied linters (popcornell)
e8dc4d3 added jiwer and conda prefix (popcornell)
dd91c90 added dr kamo suggestion (popcornell)
17efdb2 changed stage (popcornell)
a531c03 better default (popcornell)
a412cc4 readme changed instructions (popcornell)
d99720d gss2lhotse changed (popcornell)
df99724 Merge branch 'master' into chime7task1 (popcornell)
ea808d1 prevent exiting on data.sh (popcornell)
4180cf7 sox is appended after (popcornell)
31292ce data prep is needed (popcornell)
d73ddda addressed LDC path issues with train calls and mixer6 (popcornell)
930d388 changed error display (popcornell)
ebe8db9 some comments changed (popcornell)
270fad9 default is 80% mics channel selection (popcornell)
cfbb957 Merge branch 'chime7task1' of https://github.com/popcornell/espnet (popcornell)
c1abe1b applied black (popcornell)
f28cca8 applied black (popcornell)
e87df34 added registration link to README.md (popcornell)
a49870a added details about evaluation script (popcornell)
00308e1 added details about non determinism in GSS inference (popcornell)
5179f7a Merge branch 'master' into chime7task1 (popcornell)
@@ -0,0 +1,77 @@

```python
import argparse
import glob
import hashlib
import json
import os
from pathlib import Path

import tqdm


def md5_file(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def glob_check(root_folder, has_eval=False, input_json=None):

    all_files = []
    for ext in [".json", ".uem", ".wav", ".flac"]:
        all_files.extend(
            glob.glob(os.path.join(root_folder, "**/*{}".format(ext)), recursive=True)
        )

    for f in tqdm.tqdm(all_files):
        digest = md5_file(f)
        if not has_eval and Path(f).parent == "eval":
            continue

        if not input_json[str(Path(f).relative_to(root_folder))] == digest:
            print(
                "MD5 Checksum for {} is not the same. "
                "Data has not been generated correctly."
                "You can retry to generate it or re-download it."
                "If this does not work, please reach us. ".format(
                    str(Path(f).relative_to(root_folder))
                )
            )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        "Compute MD5 hash for each file recursively to check"
        "if the data generation and download was successful or not."
    )

    parser.add_argument(
        "-c,--chime7dasr_root",
        type=str,
        metavar="STR",
        dest="chime7_root",
        help="Path to chime7dasr dataset main directory."
        "It should contain chime6, dipco and mixer6 as sub-folders.",
    )
    parser.add_argument(
        "-e,--has_eval",
        required=False,
        type=int,
        default=0,
        dest="has_eval",
        help="Whether to check also " "for evaluation (released later).",
    )
    parser.add_argument(
        "-i,--input_json",
        type=str,
        default="local/chime7_dasr_md5.json",
        dest="input_json",
        required=False,
        help="Input JSON file to check against containing md5 checksums for each file.",
    )
    args = parser.parse_args()
    with open(args.input_json, "r") as f:
        checksum_json = json.load(f)

    glob_check(args.chime7_root, bool(args.has_eval), checksum_json)
```
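A few details of `glob_check` as shown may be worth a second look. `Path(f).parent == "eval"` compares a `Path` object with a `str`, which never evaluates to true, so evaluation files are not actually skipped. The digest is also computed before the skip test, and `input_json[...]` raises `KeyError` for any file that is absent from the JSON. The following is only a minimal sketch of one possible tightening, not the code merged in this PR. The name `find_mismatches`, the assumption that evaluation files sit in a directory literally named `eval`, and the assumption that the JSON keys use forward slashes are all hypothetical.

```python
import hashlib
from pathlib import Path


def md5_file(fname, chunk_size=8192):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def find_mismatches(
    root_folder, expected, has_eval=False, exts=(".json", ".uem", ".wav", ".flac")
):
    """Return sorted relative paths whose MD5 is missing from or differs from `expected`.

    `expected` maps relative paths (forward slashes, an assumption) to hex digests.
    """
    root = Path(root_folder)
    bad = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in exts:
            continue
        # Assumption: evaluation files live directly in a directory named "eval".
        # Compare the directory *name*; a Path never equals a plain string.
        if not has_eval and path.parent.name == "eval":
            continue  # skipped before hashing, so no time is spent on these files
        rel = path.relative_to(root).as_posix()
        # .get() treats a file that is absent from the manifest as a mismatch
        # instead of raising KeyError.
        if expected.get(rel) != md5_file(path):
            bad.append(rel)
    return sorted(bad)
```

Returning the list of offending paths, rather than printing inside the loop, lets a caller decide whether to print a message, exit with a non-zero status, or retry the data generation.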
Thank you for mentioning me, but sorry, I don't have a Ph.D. :->
Oh I am sorry. I can remove it. Honestly you deserve an honorary one ;)