ctc loss get nan after some epochs in pytorch 1.0.0.dev20181115 #14401

Closed
WenmuZhou opened this Issue Nov 27, 2018 · 15 comments

WenmuZhou commented Nov 27, 2018

🐛 Bug

When I train a CNN-RNN-CTC text recognition model, the loss becomes NaN after some epochs. The same model trains fine on PyTorch 0.4 with warpctc.

Steps to reproduce the behavior:

    1. Download the code from https://github.com/WenmuZhou/crnn.pytorch
    2. Change the CTC loss from warpctc to nn.CTCLoss()
    3. Run

  • PyTorch Version (e.g., 1.0): 1.0.0.dev20181115
  • OS (e.g., Linux): Ubuntu 16.04
  • How you installed PyTorch (conda, pip, source): pip3
  • Build command you used (if compiling from source):
  • Python version: 3.5.2
  • CUDA/cuDNN version: 8.0/6.0
  • GPU models and configuration: 1080 Ti

Contributor

t-vi commented Nov 27, 2018

Can you please try to find a more minimal repro?
One way to do this is to keep the last inputs to the loss around and check when the gradients become NaN.
Note that, in contrast to WarpCTC, PyTorch's CTC will not zero out losses that are infinite. A loss is infinite when the target sequence (plus the number of repeated symbols) is longer than the input sequence, so that the likelihood of the target is 0. Also, the default scaling of the two losses is not identical.
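The length condition described here can be checked on a batch before it reaches the loss. The following is a minimal sketch in plain Python, not code taken from PyTorch or WarpCTC. The helper names `min_input_length` and `find_infeasible` are made up for illustration, and targets are assumed to be plain lists of label indices.

```python
def min_input_length(target):
    """Shortest input length that can emit `target` under CTC.

    Every pair of equal adjacent labels needs a blank between them,
    so each repeat costs one extra time step.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def find_infeasible(input_lengths, targets):
    """Indices of samples whose target cannot fit into the input (loss would be inf)."""
    return [
        i
        for i, (length, target) in enumerate(zip(input_lengths, targets))
        if length < min_input_length(target)
    ]


# Hypothetical batch: the second and third samples are too short for their targets.
print(find_infeasible([4, 2, 3], [[1, 2], [1, 1], [1, 1, 2]]))
```

Samples flagged this way could be logged or dropped before calling the loss, which would also show whether infinite losses are involved at all.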

That said, I should publish my CRNN or adapt @meijieru's as a tutorial.

WenmuZhou commented Nov 27, 2018

Sorry, the NaN appears after some epochs, not iterations. I will try to train on my dataset with the original crnn code in PyTorch 1.0.

WenmuZhou commented Nov 28, 2018

@t-vi I have trained on my dataset with the crnn from https://github.com/meijieru/crnn.pytorch using PyTorch 1.0.0.dev20181115, and the NaN appears again.

log

/home/zj/.local/share/virtualenvs/crnn.pytorch-LLhhRepM/bin/python3 /data1/zj/tmp/crnn/train.py
Namespace(adadelta=False, adam=True, alphabet='[Chinese character set omitted: garbled in extraction]', batchSize=128, beta1=0.5, cuda=False, displayInterval=100, expr_dir='expr', imgH=32, imgW=320, keep_ratio=False, lr=0.001, manualSeed=1234, n_test_disp=10, nepoch=25, ngpu=3, nh=256, pretrained='', random_sample=False, saveInterval=3812, trainRoot='/data1/zj/data/crnn/txt/train2.txt', valInterval=3812, valRoot='/data/zhy/crnn/Chinese_character/test2.txt', workers=2)
3182
/data1/zj/tmp/crnn/dataset.py:95: UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].
  batch_index = random_start + torch.range(0, self.batch_size - 1)
/data1/zj/tmp/crnn/dataset.py:100: UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].
  tail_index = random_start + torch.range(0, tail - 1)
[0/25][100/3183] Loss: 65.721420
[0/25][200/3183] Loss: 46.322186
[0/25][300/3183] Loss: 45.079075
[0/25][400/3183] Loss: 45.056099
[0/25][500/3183] Loss: 43.011551
[0/25][600/3183] Loss: 42.256248
[0/25][700/3183] Loss: 40.859932
[0/25][800/3183] Loss: 38.856678
[0/25][900/3183] Loss: 38.780174
[0/25][1000/3183] Loss: 38.821262
[0/25][1100/3183] Loss: 37.860348
[0/25][1200/3183] Loss: 38.885632
[0/25][1300/3183] Loss: 36.587494
[0/25][1400/3183] Loss: 36.316475
[0/25][1500/3183] Loss: 36.450573
[0/25][1600/3183] Loss: 36.330582
[0/25][1700/3183] Loss: 36.654968
[0/25][1800/3183] Loss: 34.034744
[0/25][1900/3183] Loss: 33.613831
[0/25][2000/3183] Loss: 33.611679
[0/25][2100/3183] Loss: 32.302063
[0/25][2200/3183] Loss: 30.326666
[0/25][2300/3183] Loss: 30.058506
[0/25][2400/3183] Loss: 29.838854
[0/25][2500/3183] Loss: 28.068876
[0/25][2600/3183] Loss: 26.872030
[0/25][2700/3183] Loss: 27.359308
[0/25][2800/3183] Loss: 27.951128
[0/25][2900/3183] Loss: 24.630781
[0/25][3000/3183] Loss: 29.925066
[0/25][3100/3183] Loss: 28.307350
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 30.476603, accuray: 0.001172
[1/25][100/3183] Loss: 25.731989
[1/25][200/3183] Loss: 25.836498
[1/25][300/3183] Loss: 22.858639
[1/25][400/3183] Loss: 21.557499
[1/25][500/3183] Loss: 20.976313
[1/25][600/3183] Loss: 20.750202
[1/25][700/3183] Loss: 19.771132
[1/25][800/3183] Loss: 19.182550
[1/25][900/3183] Loss: 19.947617
[1/25][1000/3183] Loss: 20.535437
[1/25][1100/3183] Loss: 19.803825
[1/25][1200/3183] Loss: 20.040930
[1/25][1300/3183] Loss: 18.623062
[1/25][1400/3183] Loss: 18.415588
[1/25][1500/3183] Loss: 16.263596
[1/25][1600/3183] Loss: 17.585356
[1/25][1700/3183] Loss: 15.060411
[1/25][1800/3183] Loss: 13.365780
[1/25][1900/3183] Loss: 15.119957
[1/25][2000/3183] Loss: 14.550542
[1/25][2100/3183] Loss: 12.441657
[1/25][2200/3183] Loss: 13.298325
[1/25][2300/3183] Loss: 14.847809
[1/25][2400/3183] Loss: 14.725545
[1/25][2500/3183] Loss: 12.868505
[1/25][2600/3183] Loss: 13.277777
[1/25][2700/3183] Loss: 13.113804
[1/25][2800/3183] Loss: 10.752190
[1/25][2900/3183] Loss: 13.290190
[1/25][3000/3183] Loss: 12.438826
[1/25][3100/3183] Loss: 13.062715
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 17.139116, accuray: 0.226641
[2/25][100/3183] Loss: 12.244771
[2/25][200/3183] Loss: 11.012156
[2/25][300/3183] Loss: 11.889796
[2/25][400/3183] Loss: 11.356812
[2/25][500/3183] Loss: 11.319090
[2/25][600/3183] Loss: 11.720028
[2/25][700/3183] Loss: 11.196298
[2/25][800/3183] Loss: 10.103928
[2/25][900/3183] Loss: 10.714120
[2/25][1000/3183] Loss: 10.153719
[2/25][1100/3183] Loss: 11.812366
[2/25][1200/3183] Loss: 12.453034
[2/25][1300/3183] Loss: 7.903906
[2/25][1400/3183] Loss: 9.546275
[2/25][1500/3183] Loss: 8.225714
[2/25][1600/3183] Loss: 9.973562
[2/25][1700/3183] Loss: 8.002295
[2/25][1800/3183] Loss: 8.856401
[2/25][1900/3183] Loss: 7.986436
[2/25][2000/3183] Loss: 8.362708
[2/25][2100/3183] Loss: 7.604479
[2/25][2200/3183] Loss: 8.513789
[2/25][2300/3183] Loss: 8.991019
[2/25][2400/3183] Loss: 7.689240
[2/25][2500/3183] Loss: 7.119688
[2/25][2600/3183] Loss: 8.076567
[2/25][2700/3183] Loss: 8.323510
[2/25][2800/3183] Loss: 7.215753
[2/25][2900/3183] Loss: 7.799768
[2/25][3000/3183] Loss: 7.499628
[2/25][3100/3183] Loss: 6.189898
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 12.519956, accuray: 0.318672
[3/25][100/3183] Loss: 7.428660
[3/25][200/3183] Loss: 7.202775
[3/25][300/3183] Loss: 7.009318
[3/25][400/3183] Loss: 7.417906
[3/25][500/3183] Loss: 6.851774
[3/25][600/3183] Loss: 6.744330
[3/25][700/3183] Loss: 6.302882
[3/25][800/3183] Loss: 6.306813
[3/25][900/3183] Loss: 6.322616
[3/25][1000/3183] Loss: 5.097471
[3/25][1100/3183] Loss: 6.283050
[3/25][1200/3183] Loss: 5.875562
[3/25][1300/3183] Loss: 5.808622
[3/25][1400/3183] Loss: 6.673018
[3/25][1500/3183] Loss: 5.380853
[3/25][1600/3183] Loss: 5.605940
[3/25][1700/3183] Loss: 6.273378
[3/25][1800/3183] Loss: 5.914124
[3/25][1900/3183] Loss: 5.941648
[3/25][2000/3183] Loss: 5.577864
[3/25][2100/3183] Loss: 5.055161
[3/25][2200/3183] Loss: 5.138505
[3/25][2300/3183] Loss: 6.077575
[3/25][2400/3183] Loss: 5.813436
[3/25][2500/3183] Loss: 5.216923
[3/25][2600/3183] Loss: 4.696380
[3/25][2700/3183] Loss: 4.874946
[3/25][2800/3183] Loss: 4.509214
[3/25][2900/3183] Loss: 5.000929
[3/25][3000/3183] Loss: 4.715156
[3/25][3100/3183] Loss: 4.670076
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 10.486181, accuray: 0.444688
[4/25][100/3183] Loss: 4.615477
[4/25][200/3183] Loss: 4.356646
[4/25][300/3183] Loss: 4.340986
[4/25][400/3183] Loss: 4.581572
[4/25][500/3183] Loss: 4.129970
[4/25][600/3183] Loss: 4.017388
[4/25][700/3183] Loss: 3.912269
[4/25][800/3183] Loss: 4.241610
[4/25][900/3183] Loss: 4.324024
[4/25][1000/3183] Loss: 4.095037
[4/25][1100/3183] Loss: 3.769271
[4/25][1200/3183] Loss: 4.055880
[4/25][1300/3183] Loss: 3.484961
[4/25][1400/3183] Loss: 3.639617
[4/25][1500/3183] Loss: 4.443375
[4/25][1600/3183] Loss: 3.431197
[4/25][1700/3183] Loss: 4.194182
[4/25][1800/3183] Loss: 3.613052
[4/25][1900/3183] Loss: 3.779229
[4/25][2000/3183] Loss: 4.271651
[4/25][2100/3183] Loss: 3.732967
[4/25][2200/3183] Loss: 4.306026
[4/25][2300/3183] Loss: 3.734200
[4/25][2400/3183] Loss: 4.042048
[4/25][2500/3183] Loss: 3.265435
[4/25][2600/3183] Loss: 7.407653
[4/25][2700/3183] Loss: 5.202195
[4/25][2800/3183] Loss: 4.953123
[4/25][2900/3183] Loss: 3.617289
[4/25][3000/3183] Loss: 3.352068
[4/25][3100/3183] Loss: 3.546509
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 6.322558, accuray: 0.541328
[5/25][100/3183] Loss: 3.490037
[5/25][200/3183] Loss: 3.078195
[5/25][300/3183] Loss: 3.525270
[5/25][400/3183] Loss: 2.678599
[5/25][500/3183] Loss: 2.839092
[5/25][600/3183] Loss: 2.913834
[5/25][700/3183] Loss: 2.901745
[5/25][800/3183] Loss: 2.683181
[5/25][900/3183] Loss: 3.632576
[5/25][1000/3183] Loss: 3.175895
[5/25][1100/3183] Loss: 3.025608
[5/25][1200/3183] Loss: 2.589254
[5/25][1300/3183] Loss: 3.271782
[5/25][1400/3183] Loss: 3.338230
[5/25][1500/3183] Loss: 2.997161
[5/25][1600/3183] Loss: 3.237594
[5/25][1700/3183] Loss: 2.816560
[5/25][1800/3183] Loss: 2.218376
[5/25][1900/3183] Loss: 2.567822
[5/25][2000/3183] Loss: 2.607018
[5/25][2100/3183] Loss: 2.642059
[5/25][2200/3183] Loss: 3.179220
[5/25][2300/3183] Loss: 2.863872
[5/25][2400/3183] Loss: 3.295633
[5/25][2500/3183] Loss: 2.636028
[5/25][2600/3183] Loss: 2.995386
[5/25][2700/3183] Loss: 3.340012
[5/25][2800/3183] Loss: 3.232339
[5/25][2900/3183] Loss: 2.892770
[5/25][3000/3183] Loss: 2.638034
[5/25][3100/3183] Loss: 2.828083
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 11.405902, accuray: 0.457422
[6/25][100/3183] Loss: 2.762812
[6/25][200/3183] Loss: 2.911571
[6/25][300/3183] Loss: 2.617536
[6/25][400/3183] Loss: 2.490563
[6/25][500/3183] Loss: 2.430371
[6/25][600/3183] Loss: 2.440724
[6/25][700/3183] Loss: 2.188026
[6/25][800/3183] Loss: 2.617560
[6/25][900/3183] Loss: 2.483878
[6/25][1000/3183] Loss: 2.241127
[6/25][1100/3183] Loss: 2.506879
[6/25][1200/3183] Loss: 2.323107
[6/25][1300/3183] Loss: 2.600743
[6/25][1400/3183] Loss: 2.508388
[6/25][1500/3183] Loss: 1.949326
[6/25][1600/3183] Loss: 2.624498
[6/25][1700/3183] Loss: 2.427222
[6/25][1800/3183] Loss: 2.441707
[6/25][1900/3183] Loss: 2.225923
[6/25][2000/3183] Loss: 2.314634
[6/25][2100/3183] Loss: 2.664900
[6/25][2200/3183] Loss: 2.465270
[6/25][2300/3183] Loss: 2.580495
[6/25][2400/3183] Loss: 2.351754
[6/25][2500/3183] Loss: 2.235313
[6/25][2600/3183] Loss: 2.194835
[6/25][2700/3183] Loss: 2.215188
[6/25][2800/3183] Loss: 2.186677
[6/25][2900/3183] Loss: 2.248293
[6/25][3000/3183] Loss: 2.337588
[6/25][3100/3183] Loss: 2.124885
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 6.212533, accuray: 0.552656
[7/25][100/3183] Loss: 1.896407
[7/25][200/3183] Loss: 2.083073
[7/25][300/3183] Loss: 2.515302
[7/25][400/3183] Loss: 2.145261
[7/25][500/3183] Loss: 2.186478
[7/25][600/3183] Loss: 2.070558
[7/25][700/3183] Loss: 1.993173
[7/25][800/3183] Loss: 1.928573
[7/25][900/3183] Loss: 1.877471
[7/25][1000/3183] Loss: 1.802724
[7/25][1100/3183] Loss: 1.980011
[7/25][1200/3183] Loss: 2.286077
[7/25][1300/3183] Loss: 2.047954
[7/25][1400/3183] Loss: 2.025887
[7/25][1500/3183] Loss: 2.666587
[7/25][1600/3183] Loss: 1.794558
[7/25][1700/3183] Loss: 2.216647
[7/25][1800/3183] Loss: 2.252706
[7/25][1900/3183] Loss: 2.090238
[7/25][2000/3183] Loss: 1.917650
[7/25][2100/3183] Loss: 2.274139
[7/25][2200/3183] Loss: 1.898832
[7/25][2300/3183] Loss: 1.878344
[7/25][2400/3183] Loss: 2.132056
[7/25][2500/3183] Loss: 1.818222
[7/25][2600/3183] Loss: 1.829502
[7/25][2700/3183] Loss: 2.073719
[7/25][2800/3183] Loss: 2.003965
[7/25][2900/3183] Loss: 1.771917
[7/25][3000/3183] Loss: 1.935509
[7/25][3100/3183] Loss: 1.735716
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 5.872131, accuray: 0.527109
[8/25][100/3183] Loss: 2.138914
[8/25][200/3183] Loss: 2.130441
[8/25][300/3183] Loss: 1.943353
[8/25][400/3183] Loss: 1.683992
[8/25][500/3183] Loss: 1.650922
[8/25][600/3183] Loss: 1.504823
[8/25][700/3183] Loss: 1.608612
[8/25][800/3183] Loss: 1.903825
[8/25][900/3183] Loss: 1.468518
[8/25][1000/3183] Loss: 2.179544
[8/25][1100/3183] Loss: 1.693508
[8/25][1200/3183] Loss: 1.570979
[8/25][1300/3183] Loss: 1.620996
[8/25][1400/3183] Loss: 1.691496
[8/25][1500/3183] Loss: 1.674799
[8/25][1600/3183] Loss: 1.830665
[8/25][1700/3183] Loss: 2.090158
[8/25][1800/3183] Loss: 2.155529
[8/25][1900/3183] Loss: 1.966394
[8/25][2000/3183] Loss: 1.753778
[8/25][2100/3183] Loss: 2.298392
[8/25][2200/3183] Loss: 10.281125
[8/25][2300/3183] Loss: 2.610883
[8/25][2400/3183] Loss: 2.206998
[8/25][2500/3183] Loss: 1.783428
[8/25][2600/3183] Loss: 2.023059
[8/25][2700/3183] Loss: 1.977133
[8/25][2800/3183] Loss: 1.743329
[8/25][2900/3183] Loss: 1.698944
[8/25][3000/3183] Loss: 1.500533
[8/25][3100/3183] Loss: 1.665660
Start val
[10 validation samples (raw CTC output => prediction, gt: ground truth) omitted; Chinese text garbled in extraction]
Test loss: 4.431122, accuray: 0.625391
[9/25][100/3183] Loss: 1.708904
[9/25][200/3183] Loss: 1.514301
[9/25][300/3183] Loss: 1.461609
[9/25][400/3183] Loss: 1.255323
[9/25][500/3183] Loss: 1.501552
[9/25][600/3183] Loss: nan
[9/25][700/3183] Loss: nan
[9/25][800/3183] Loss: nan
[9/25][900/3183] Loss: nan

freesouls commented Nov 28, 2018

I see similar behavior in my own project:

    1. Training from scratch with PyTorch's built-in CTCLoss results in NaN after several hundred batches.
    2. Fine-tuning a model that was trained with warpctc, using PyTorch's built-in CTCLoss, makes the loss slowly grow larger and larger.

WenmuZhou changed the title from "ctc loss get nan after some iters in pytorch 1.0.0.dev20181115" to "ctc loss get nan after some epochs in pytorch 1.0.0.dev20181115" Nov 28, 2018

Contributor

t-vi commented Nov 28, 2018

Again, if you can, please capture the last inputs to the loss function before the loss becomes Inf/NaN. I would most appreciate it.
There is a forum thread that shows some ways to debug where your network goes wrong with CTC loss.
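One possible way to keep the last inputs around is a small bounded history that is checked after every step. The sketch below is plain Python with made-up loss values. The class name `LastInputsRecorder` and the contents of `inputs` are illustrative assumptions, and in a real training loop the recorded inputs would presumably be detached copies of the tensors passed to the loss.

```python
import math
from collections import deque


class LastInputsRecorder:
    """Keeps the most recent loss inputs so they can be inspected after a NaN/Inf."""

    def __init__(self, keep=3):
        # deque(maxlen=...) drops the oldest entry once it is full.
        self.history = deque(maxlen=keep)

    def check(self, step, inputs, loss_value):
        """Record this step; return True while the loss is still finite."""
        self.history.append((step, inputs, loss_value))
        return math.isfinite(loss_value)


# Stand-in for a training loop: the loss values here are made up.
recorder = LastInputsRecorder(keep=2)
for step, loss_value in enumerate([1.50, 1.26, float("nan")]):
    inputs = {"input_lengths": [80], "target_lengths": [12]}  # placeholder data
    if not recorder.check(step, inputs, loss_value):
        print("non-finite loss at step", step)
        for old_step, old_inputs, old_loss in recorder.history:
            print(old_step, old_loss, old_inputs)
        break
```

Once the loop stops, the history holds the offending batch and the one before it, which is usually enough to build a small standalone repro.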

Member

soumith commented Nov 28, 2018

I think you should move the discussion to the forums, because it's not at all obvious that this is a bug; it's more likely numerical instability due to hyperparameters or incorrect usage.

Go ahead and read https://discuss.pytorch.org/t/ctcloss-performance-of-pytorch-1-0-0/ and if it's not suitable to add your concerns, open a new topic on https://discuss.pytorch.org/

soumith closed this Nov 28, 2018

northeastsquare commented Dec 8, 2018

@t-vi I have run into this problem as well.

gaochangw commented Dec 11, 2018

I see the same problem when training an RNN: the loss becomes NaN after a few epochs if I replace the CTCLoss from warpctc_pytorch with the CTCLoss in torch.nn.

  • PyTorch Version (e.g., 1.0): 1.0.0.dev20181115
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.6.6
  • CUDA/cuDNN version: 10/7.4.1
  • GPU models and configuration: GTX 1080
  • Any other relevant information:

A workaround is to pass log_prob.cpu() to the loss function instead of log_prob, but this makes training about 2x slower.

freesouls commented Dec 14, 2018

@t-vi @soumith

I have found something interesting that may cause the NaN problem. With identical inputs, torch.nn.CTCLoss and warpctc give the same loss/cost for each sample, but their gradients are not the same. You can reproduce the results by running the following script. To make it use warpctc, change two places: 1) import the warpctc loss, and 2) remove log_probs = probs.log_softmax(2).

#!/usr/bin/env python
# encoding: utf-8

import torch

def test_pytorch_ctc(blank_label = 0, reduce = False):
    if reduce:
        ctc_loss = torch.nn.CTCLoss(blank = blank_label, reduction = 'sum')
    else:
        ctc_loss = torch.nn.CTCLoss(blank = blank_label, reduction = 'none')

    print("BLANK LABEL", blank_label)

    probs = torch.FloatTensor([
        [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]],
        [[0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.5, 0.2, 0.1]],
        [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]],
        [[0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.5, 0.2, 0.1]],
    ])
    probs.requires_grad = True
    # (T=4, N=2, C=5)
    print("PROBS SIZE", probs.size())
    labels = torch.IntTensor([1, 2, 1, 2])
    label_sizes = torch.IntTensor([2, 2])
    seqs = torch.IntTensor([4, 2])
    log_probs = probs.log_softmax(2)
    cost_cpu = ctc_loss(log_probs, labels, seqs, label_sizes)
    if not reduce:
        cost_sum = cost_cpu.sum()
        cost_sum.backward()
        cpu_cost = cost_sum.item()
        cpu_grads = probs.grad
    else:
        cost_cpu.backward()
        cpu_cost = cost_cpu.item()
        cpu_grads = probs.grad

    # gpu_probs is a non-leaf copy of probs, so the GPU backward pass below
    # accumulates into probs.grad, which cpu_grads also refers to.
    gpu_probs = probs.cuda()
    gpu_log_probs = gpu_probs.log_softmax(2)
    cost_gpu = ctc_loss(gpu_log_probs, labels, seqs, label_sizes)
    if not reduce:
        cost_sum = cost_gpu.sum()
        cost_sum.backward()
        gpu_cost = cost_sum.item()
        gpu_grads = probs.grad
    else:
        cost_gpu.backward()
        gpu_cost = cost_gpu.item()
        gpu_grads = probs.grad

    print("CPU COST", cpu_cost)
    print("GPU COST", gpu_cost)
    print("COST CPU", cost_cpu)
    print("COST GPU", cost_gpu)
    print("CPU GRAD", cpu_grads)
    print("GPU GRAD", gpu_grads)


if __name__ == "__main__":
    test_pytorch_ctc()
    test_pytorch_ctc(reduce = False)

The gradients from torch.nn.CTCLoss are:

tensor([[[-0.1209, -0.9413,  0.3541,  0.3541,  0.3541],
         [ 0.1770, -0.8230,  0.2919,  0.1770,  0.1770]],

        [[-0.1650, -0.4774, -0.0658,  0.3541,  0.3541],
         [ 0.1787,  0.1787, -0.7335,  0.1975,  0.1787]],

        [[-0.1250,  0.1639, -0.7470,  0.3541,  0.3541],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[-0.2627,  0.3541, -0.7995,  0.3541,  0.3541],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]

while the gradients from warpctc are:

tensor([[[-0.1209, -0.9413,  0.3541,  0.3541,  0.3541],
         [ 0.3541, -1.6459,  0.5838,  0.3541,  0.3541]],

        [[-0.1650, -0.4774, -0.0658,  0.3541,  0.3541],
         [ 0.3573,  0.3573, -1.4669,  0.3949,  0.3573]],

        [[-0.1250,  0.1639, -0.7470,  0.3541,  0.3541],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[-0.2627,  0.3541, -0.7995,  0.3541,  0.3541],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]

The probs and grads have size (T, N, C) = (4, 2, 5).
We can see that the gradients of torch.nn.CTCLoss for the second sample (N=1) are half of those of warpctc.
We can change seqs = torch.IntTensor([4, 2]) in the script above, for example to seqs = torch.IntTensor([2, 2]) or seqs = torch.IntTensor([4, 4]). Only when seqs is [4, 4], which means all input_lengths equal T (I found this statement in the PyTorch docs), do torch.nn.CTCLoss and warpctc give the same gradients. Otherwise, for samples whose input_lengths are smaller than T, the gradients of torch.nn.CTCLoss are half of those of warpctc.
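The factor of one half can be checked directly from the two printed tensors. The sketch below uses only the second-sample rows for the first two time steps, copied from the output above; the helper name elementwise_ratios is hypothetical.

```python
# Rows for the second sample (batch index 1) at t=0 and t=1, copied from
# the printed gradients above.
torch_rows = [
    [0.1770, -0.8230, 0.2919, 0.1770, 0.1770],
    [0.1787, 0.1787, -0.7335, 0.1975, 0.1787],
]
warpctc_rows = [
    [0.3541, -1.6459, 0.5838, 0.3541, 0.3541],
    [0.3573, 0.3573, -1.4669, 0.3949, 0.3573],
]


def elementwise_ratios(a_rows, b_rows):
    """Divide each entry of a_rows by the matching entry of b_rows."""
    return [[a / b for a, b in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]


ratios = elementwise_ratios(torch_rows, warpctc_rows)
for t, row in enumerate(ratios):
    print(f"t={t}:", [round(r, 3) for r in row])
```

Every ratio is within the four-digit rounding of the printout of 0.5, while the rows for the first sample are identical in both outputs.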

@t-vi

Contributor

t-vi commented Dec 14, 2018

Thank you for your analysis!
If I insert the following at the bottom of your test_pytorch_ctc function, I indeed get an error for seqs = torch.tensor([4, 2]) but not for seqs = torch.tensor([4, 4]).

torch.autograd.gradcheck(lambda x: ctc_loss(x.log_softmax(2), labels, seqs, label_sizes), gpu_log_probs.double(), atol=1e-4)

That could indeed point to a bug in PyTorch's CTC loss, but what's strange is that the same line with log_probs instead of gpu_log_probs seems to succeed, and when I print the difference between the GPU and CPU grads it is 0.
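One way to keep the CPU and GPU gradients apart and to run the finite-difference check on either device is sketched below. It is a variant of the script above under stated assumptions, not the code used in this thread: the helper names are hypothetical, torch.autograd.grad is used so that nothing accumulates in probs.grad, and the GPU part only runs when CUDA is available. In the original script, cpu_grads and gpu_grads are both assigned from probs.grad, which may be why their difference prints as 0.

```python
import torch

# (T=4, N=2, C=5), same values as in the script above.
PROBS = [
    [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]],
    [[0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.5, 0.2, 0.1]],
    [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]],
    [[0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.5, 0.2, 0.1]],
]


def ctc_sum(probs, input_lengths):
    """Summed CTC loss for the fixed targets [1, 2] and [1, 2]."""
    targets = torch.tensor([1, 2, 1, 2], dtype=torch.long, device=probs.device)
    target_lengths = torch.tensor([2, 2], dtype=torch.long)
    lengths = torch.tensor(input_lengths, dtype=torch.long)
    ctc = torch.nn.CTCLoss(blank=0, reduction="sum")
    return ctc(probs.log_softmax(2), targets, lengths, target_lengths)


def ctc_grads(device, input_lengths=(4, 2)):
    """Gradient w.r.t. a fresh leaf tensor, so nothing is accumulated."""
    probs = torch.tensor(PROBS, device=device, requires_grad=True)
    (grad,) = torch.autograd.grad(ctc_sum(probs, input_lengths), probs)
    return grad.cpu()


def gradcheck_ctc(device, input_lengths=(4, 2)):
    """Compare analytical and numerical gradients in double precision."""
    x = torch.tensor(PROBS, dtype=torch.double, device=device, requires_grad=True)
    return torch.autograd.gradcheck(
        lambda t: ctc_sum(t, input_lengths), (x,), atol=1e-4
    )


cpu_grad = ctc_grads("cpu")
print("CPU gradcheck passed:", gradcheck_ctc("cpu"))
if torch.cuda.is_available():
    gpu_grad = ctc_grads("cuda")
    print("max |CPU - GPU| =", (cpu_grad - gpu_grad).abs().max().item())
```

With separate leaf tensors per device, a nonzero difference for input lengths (4, 2) would isolate the discrepancy to one backend.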

I'm not sure I understand all of it (and I didn't immediately get the warpctc variant to run).

I don't quite understand your comment about why this causes NaNs. My working hypothesis is that NaNs are seen only for inputs whose loss is infinite. So far, @jinserk (thanks!) was the only one who provided specific inputs where the gradients are NaN (see the forum thread), and those fell into this case.
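Whether a sample falls into the infinite-loss case can be checked before calling the loss. The sketch below follows the condition given earlier in this issue (the target length plus the number of repeated symbols must not exceed the input length); the function names are hypothetical.

```python
def min_input_length(target):
    """Shortest input length for which CTC can emit `target`.

    Each pair of equal adjacent labels needs one blank in between, so the
    minimum is len(target) plus the number of such repeats.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def alignment_exists(input_length, target):
    """True if at least one valid CTC alignment exists for this sample."""
    return input_length >= min_input_length(target)


# Example: the target [1, 1, 2] needs a blank between the two 1s.
print(min_input_length([1, 1, 2]))
print(alignment_exists(3, [1, 1, 2]))
```

Samples for which alignment_exists returns False have zero likelihood and therefore infinite loss. Later PyTorch releases also accept a zero_infinity argument on torch.nn.CTCLoss for this situation.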

@jinserk

jinserk commented Dec 14, 2018

@freesouls, the different gradients in your code might originate from this. warpctc ignores grad_outputs in its backward, which is not the right way to do it, so you cannot compare the two CTC losses by their gradients.
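The effect of ignoring grad_outputs can be illustrated with a toy custom autograd function. This is not warpctc's code, only a sketch of the mechanism: SquaredSum and grad_of_scaled_loss are hypothetical names, and the loss is sum(x**2) rather than CTC.

```python
import torch


class SquaredSum(torch.autograd.Function):
    """Toy loss sum(x**2) whose backward can optionally ignore grad_output."""

    @staticmethod
    def forward(ctx, x, use_grad_output):
        ctx.save_for_backward(x)
        ctx.use_grad_output = use_grad_output
        return (x ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        local_grad = 2 * x
        if ctx.use_grad_output:
            # Correct chain rule: scale by the incoming gradient.
            local_grad = grad_output * local_grad
        # No gradient for the boolean flag.
        return local_grad, None


def grad_of_scaled_loss(scale, use_grad_output):
    """Gradient of scale * sum(x**2) at x = [1, 2, 3]."""
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    (SquaredSum.apply(x, use_grad_output) * scale).backward()
    return x.grad


print(grad_of_scaled_loss(0.5, use_grad_output=True))
print(grad_of_scaled_loss(0.5, use_grad_output=False))
```

When grad_output is ignored, scaling or reducing the loss after the function has no effect on the gradient, so gradients from such an implementation differ from a correct one by whatever factor was applied downstream.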

@t-vi

Contributor

t-vi commented Dec 15, 2018

My hypotheses here are

  • CuDNN-based implementation works as expected,
  • CPU implementation appears to work,
  • There seems to be a bug in the native GPU forward for different input lengths.

Would these match your observations?

@WenmuZhou

WenmuZhou commented Dec 17, 2018

yes

  • The CPU implementation works
  • Adding .to(torch.float64) after log_softmax works
  • CuDNN-based implementation: I don't know how to use it

@gaochangw

gaochangw commented Dec 17, 2018

yes

  • The CPU implementation works
  • Adding .to(torch.float64) after log_softmax works
  • CuDNN-based implementation: I don't know how to use it

Actually, I still encountered NaN even when I used torch.float64 on log_softmax. It just took longer to appear than without torch.float64.

@t-vi

Contributor

t-vi commented Dec 17, 2018

@gaochangw Can you please extract a set of inputs producing NaN? The currently known cases are (legitimately, even if not helpful) infinite losses.
@WenmuZhou CuDNN is selected automatically, unless disabled, whenever it can handle the input. You can see whether it has been used by inspecting loss.grad_fn (which points to the backward that has either "native" or "cudnn" in its name).
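A small sketch of that inspection follows. The helper backward_node_names is hypothetical. It walks the autograd graph because, with a reduction such as 'sum' or 'mean', loss.grad_fn itself is the reduction node and the CTC node sits one step further back. The exact node names vary between PyTorch versions, so the code only prints them.

```python
import torch


def backward_node_names(loss):
    """Class names of all backward nodes reachable from loss.grad_fn."""
    names, stack, seen = [], [loss.grad_fn], set()
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        names.append(type(node).__name__)
        # next_functions holds (node, input_index) pairs.
        stack.extend(fn for fn, _ in node.next_functions)
    return names


logits = torch.randn(4, 2, 5, requires_grad=True)
targets = torch.tensor([1, 2, 1, 2], dtype=torch.long)
input_lengths = torch.tensor([4, 4], dtype=torch.long)
target_lengths = torch.tensor([2, 2], dtype=torch.long)

ctc = torch.nn.CTCLoss(blank=0, reduction="sum")
loss = ctc(logits.log_softmax(2), targets, input_lengths, target_lengths)
print(backward_node_names(loss))
```

On a CPU tensor, as here, the cuDNN backward cannot appear; to check a GPU run, the same helper can be called on the loss computed from CUDA inputs.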

t-vi added a commit to t-vi/pytorch that referenced this issue Jan 7, 2019

facebook-github-bot added a commit that referenced this issue Jan 9, 2019

Fix cuda native loss_ctc for varying input length (#15798)
Summary:
Thank you, freesouls, for the reproducing example!

This is strictly fixing the bug in gradients for varying length inputs discussed in the middle-to-bottom of the bug report. I'll have a feature patch regarding inf losses -> NaN grads separately.

Fixes: #14401
Pull Request resolved: #15798

Differential Revision: D13605739

Pulled By: soumith

fbshipit-source-id: 167ff42399c7e4cdfbd88d59bac5d25b57c0363f

soumith added a commit that referenced this issue Jan 17, 2019

Fix cuda native loss_ctc for varying input length (#15798)

soumith added a commit that referenced this issue Jan 18, 2019

Fix cuda native loss_ctc for varying input length (#15798)