
Hash issues in AdvancedTreeSearchLmImageAndGlobalCacheJob #430

Open
NeoLegends opened this issue Jul 18, 2023 · 2 comments · May be fixed by #446
Labels
bug Something isn't working

Comments

@NeoLegends
Contributor

The way AdvancedTreeSearchJob constructs AdvancedTreeSearchLmImageAndGlobalCacheJob makes the latter prone to hash issues.

I'm running decodings on LibriSpeech and found the job instances listed below in my work folder. All of them compute the same 1.8 GB LM image but have different hashes. In my case this is likely due to differing TDP scales, which go into the CRP of the LM image job and thereby change its hash.

Another candidate for (a different kind of) hash issue is the feature scorer, which also goes into the CRP but is explicitly kept in the hash.
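To make the mechanism concrete, here is a toy model of the problem. It assumes a job hash is derived from all of the job's inputs; `toy_job_hash` is a hypothetical stand-in, not Sisyphus's real hashing, and the CRP fields are made up for illustration. Sweeping only the TDP scale yields one job per value even though the LM image would be identical for all of them.

```python
import hashlib
import json


def toy_job_hash(inputs: dict) -> str:
    # Hypothetical stand-in for a job hash: SHA-1 over the sorted, JSON-serialized inputs.
    # This is NOT Sisyphus's actual hashing scheme, only an illustration.
    blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()[:12]


# Hypothetical CRP contents: the LM image only depends on "lm" and "lexicon",
# but the combined job hashes everything it receives, including the TDP scale.
base_crp = {"lm": "librispeech-4gram.arpa.gz", "lexicon": "lexicon.xml.gz"}
tdp_scales = [0.1, 0.5, 1.0]

combined_hashes = {toy_job_hash({**base_crp, "tdp_scale": s}) for s in tdp_scales}

print(len(combined_hashes))  # 3 distinct jobs, all of which would build the same LM image
```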

```
❯ shasum $(ls work/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.*/output/*.image | xargs -n 1 cf)
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.9ZuNbY6vfq4B/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.akOZ3zJXuFWb/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.gqX8MWYitMdb/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.j2O4KrVDm2OD/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.JH9tbiSyYaU5/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.JMMQd6Qt7L4Y/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.KuQpQX2Bh0x3/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.KZg6mlchtLBB/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.NCzSYGARj48g/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.NLuAGBkMBB3P/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.nLvpXSLvNPai/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.onldbKiFZ9Yu/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.osWzDVEIASI1/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.sz9KWZd5IkfX/output/lm-1.image
4cc098b119a7f2e5d692aebacfd92ef1a132bf76  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.uRxs6uEiLpHJ/output/lm-1.image
```
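The same check can be scripted so that duplicate outputs are grouped by content instead of being compared by eye. This is a sketch that assumes the output files are readable directly under the work directory; the `cf` step from the command above (a site-specific cache-fetch tool) is deliberately left out, and `report` and `group_by_content` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Same glob as in the shell command above, relative to the work directory.
IMAGE_PATTERN = (
    "i6_core/recognition/advanced_tree_search/"
    "AdvancedTreeSearchLmImageAndGlobalCacheJob.*/output/*.image"
)


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file, read in chunks so multi-GB LM images don't need to fit in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_content(paths) -> dict:
    """Map each content digest to the list of files that have exactly that content."""
    groups = defaultdict(list)
    for p in paths:
        p = Path(p)
        groups[sha1_of(p)].append(p)
    return dict(groups)


def report(work_dir: str = "work", pattern: str = IMAGE_PATTERN) -> None:
    """Print each distinct output once, followed by the job folders that produced it."""
    groups = group_by_content(sorted(Path(work_dir).glob(pattern)))
    for digest, files in groups.items():
        print(f"{digest}  {len(files)} file(s)")
        for f in files:
            print(f"    {f}")
```

Calling `report("work")` prints one line per distinct LM image; passing `pattern` with `*.cache` instead of `*.image` does the same for the global caches.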

The computed global caches, however, do differ, so splitting the job into two jobs (one for the LM image, one for the global cache) might be a solution:

```
❯ shasum $(ls work/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.*/output/*.cache | xargs -n 1 cf)
032e23952d190f2037a5171e4a379e8f8943a51f  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.9ZuNbY6vfq4B/output/global.cache
5f8cb12a3cc495bab18e07883001696f42a70c08  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.akOZ3zJXuFWb/output/global.cache
da28cab82e48d176718a6dceb138f77224e21545  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.gqX8MWYitMdb/output/global.cache
3b5cd599e3f00375335681af962578dd8b7af1a8  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.j2O4KrVDm2OD/output/global.cache
66ce732c9656b7a32ff9d05760eb35d63720b424  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.JH9tbiSyYaU5/output/global.cache
2cbe533d5245886945cdd525657ca07e9d3b0831  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.JMMQd6Qt7L4Y/output/global.cache
16b7cf5a95e06b658a7f40763a5cb8aa4ec2b808  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.KuQpQX2Bh0x3/output/global.cache
27fb128dd0a76e1bc3be1fc423452ca4e230e91d  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.KZg6mlchtLBB/output/global.cache
dd2773af1471fb8a6527bfd3fba03e4613ce6869  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.NCzSYGARj48g/output/global.cache
2128afe006bfb39a6a8bc89c445082dcba8ca0ba  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.NLuAGBkMBB3P/output/global.cache
2964b6d362cd896e2b144a3d9ba614404457c08a  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.nLvpXSLvNPai/output/global.cache
34d97b5f4fcdb962cde87d5dcd71a059a2a23829  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.onldbKiFZ9Yu/output/global.cache
94edb564a1b966c6ae27747625bbff7fb6ad13d6  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.osWzDVEIASI1/output/global.cache
2e34c5a2d41de5f7c3ba7021c4cbfdfa4749032d  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.sz9KWZd5IkfX/output/global.cache
36fb52477284c7377bffbabe55d8e56b3912c60d  /var/tmp/mgunz/work/asr3/raissi/shared_workspaces/gunz/2023-05--thesis-baselines-tf2/i6_core/recognition/advanced_tree_search/AdvancedTreeSearchLmImageAndGlobalCacheJob.uRxs6uEiLpHJ/output/global.cache
```
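A toy model of why the split would help: if the LM image job is keyed only on the inputs the image actually depends on, a TDP sweep collapses to a single image job, while the global cache job still gets one instance per configuration. Everything here is hypothetical: `toy_job_hash` is not Sisyphus's real hash, the field names are invented, and the TDP scale merely stands in for whatever search-setup parameters make the caches differ.

```python
import hashlib
import json


def toy_job_hash(job_name: str, inputs: dict) -> str:
    # Hypothetical stand-in for a job hash: SHA-1 over job name plus sorted, serialized inputs.
    blob = json.dumps({"job": job_name, "inputs": inputs}, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()[:12]


# Assumption for illustration: only these CRP fields influence the LM image.
LM_IMAGE_KEYS = ("lm", "lexicon")


def split_jobs(crp: dict) -> tuple:
    """Return (image_job_hash, cache_job_hash) for a split LM-image / global-cache setup."""
    lm_inputs = {k: crp[k] for k in LM_IMAGE_KEYS}
    image_job = toy_job_hash("LmImage", lm_inputs)  # keyed on LM-relevant inputs only
    cache_job = toy_job_hash("GlobalCache", crp)    # keyed on the full (search) config
    return image_job, cache_job


crps = [
    {"lm": "librispeech-4gram.arpa.gz", "lexicon": "lexicon.xml.gz", "tdp_scale": s}
    for s in (0.1, 0.5, 1.0)
]
image_jobs = {split_jobs(c)[0] for c in crps}
cache_jobs = {split_jobs(c)[1] for c in crps}

print(len(image_jobs), len(cache_jobs))  # 1 3
```

The design point is simply that each job's hash should cover exactly the inputs its output depends on; the combined job cannot do that, because its two outputs depend on different subsets of the CRP.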
@NeoLegends NeoLegends added the bug Something isn't working label Jul 18, 2023
@michelwi
Contributor

> splitting the job into two jobs (one for LM image, one for global cache) might be a solution:

There are already CreateLmImageJob and BuildGlobalCacheJob for that.

@Atticus1806
Contributor

What's the status here? Is this "solved"?
If there is replacement code that reproduces the joint behavior with the two jobs, can you post an example, @NeoLegends?
