From 7b3ef16a435831012c27bcd46378a3e60d80c954 Mon Sep 17 00:00:00 2001
From: Danil Kolesnikov
Date: Thu, 27 Apr 2023 16:02:51 -0700
Subject: [PATCH] chore: lfg

---
 README.md                                    | 13 ++-
 api/__pycache__/evaluator_app.cpython-38.pyc | Bin 0 -> 12521 bytes
 api/__pycache__/text_utils.cpython-38.pyc    | Bin 0 -> 6582 bytes
 nextjs/package.json                          |  2 +-
 nextjs/utils/variables.ts                    |  3 +-
 pages/index.tsx                              | 82 -------------------
 6 files changed, 12 insertions(+), 88 deletions(-)
 create mode 100644 api/__pycache__/evaluator_app.cpython-38.pyc
 create mode 100644 api/__pycache__/text_utils.cpython-38.pyc
 delete mode 100644 pages/index.tsx

diff --git a/README.md b/README.md
index 3d5a5bca7..3f731633c 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ The app can be used in two ways:
 
 ![image](https://user-images.githubusercontent.com/122662504/234627824-2304f741-9f7b-4252-bdb4-ef2bdfd8139a.png)
 
-- `Playground`: Input a set of documents that you want to ask questions about. Optionally, also include your own test set of question-answer pairs related to the documents; see an example [here](https://github.com/langchain-ai/auto-evaluator/tree/main/api/docs/karpathy-lex-pod). If you do not supply a test set, the app will auto-generate one. If the test set is smaller than the desired number of eval questions specified in the top left, the app will auto-generate the remainder. 
+- `Playground`: Input a set of documents that you want to ask questions about. Optionally, also include your own test set of question-answer pairs related to the documents; see an example [here](https://github.com/langchain-ai/auto-evaluator/tree/main/api/docs/karpathy-lex-pod). If you do not supply a test set, the app will auto-generate one. If the test set is smaller than the desired number of eval questions specified in the top left, the app will auto-generate the remainder.
 
 ![image](https://user-images.githubusercontent.com/122662504/234629201-4c17b411-f910-476b-9bf6-1246c7c5a307.png)
 
@@ -41,7 +41,7 @@ The app can be used in two ways:
 
 - For each question, we use a `RetrievalQA` chain to answer it.
 - This will fetch chunks that are relevant to the question from the `retriever` and pass them to the LLM.
-- We expose the `QA_CHAIN_PROMPT` used for to pass this context to the LLM [here](https://github.com/langchain-ai/auto-evaluator/blob/main/api/text_utils.py). 
+- We expose the `QA_CHAIN_PROMPT` used to pass this context to the LLM [here](https://github.com/langchain-ai/auto-evaluator/blob/main/api/text_utils.py).
 
 `Model-graded evaluation`:
 
@@ -52,7 +52,7 @@ The app can be used in two ways:
 (2) The app will evaluate the `similarity of the LLM generated answer` relative to ground truth answer.
 
 - The prompts for both can be seen [here](https://github.com/dankolesnikov/evaluator-app/blob/main/api/text_utils.py)
-- Users can select which grading prompt to use. [Here](https://rlancemartin.notion.site/Auto-Evaluator-Opportunities-7b3459dc2ae34440ae3481fe6f43ba40) are some notes in prompt selection from our experience. 
+- Users can select which grading prompt to use. [Here](https://rlancemartin.notion.site/Auto-Evaluator-Opportunities-7b3459dc2ae34440ae3481fe6f43ba40) are some notes on prompt selection from our experience.
 
 `Experimental results`:
 
@@ -109,10 +109,15 @@ Test the `api` locally:
 curl -X POST -F "files=@Docs/0333_text.txt" -F "num_eval_questions=1" -F "chunk_chars=1000" -F "overlap=100" -F "split_method=RecursiveTextSplitter" -F "retriever_type=similarity-search" -F "embeddings=OpenAI" -F "model_version=gpt-3.5-turbo" -F "grade_prompt=Fast" -F "num_neighbors=3" http://localhost:8000/evaluator-stream
 ```
 
-Run the frontend from `nextjs` folder and view web app at specified URL (e.g., `http://localhost:3001/`):
+Run the frontend from the `nextjs` folder and view the web app at the specified URL (e.g., `http://localhost:3000/`):
 
 `yarn dev`
 
+### Environment Variables
+
+`EVALUATOR_API_URL=http://127.0.0.1:8000` - used by the frontend.
+`OPENAI_API_KEY=` - used by the backend.
+
 ## Deployment
 
 Doppler auth:

diff --git a/api/__pycache__/evaluator_app.cpython-38.pyc b/api/__pycache__/evaluator_app.cpython-38.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c910c93498ac0095d13bdb158caf090d52d90704
GIT binary patch
literal 12521
[... base85-encoded binary data, plus the diffs for api/__pycache__/text_utils.cpython-38.pyc, nextjs/package.json, and nextjs/utils/variables.ts, omitted ...]

diff --git a/pages/index.tsx b/pages/index.tsx
deleted file mode 100644
--- a/pages/index.tsx
+++ /dev/null
@@ -1,82 +0,0 @@
[...]
-  const [opened, setOpened] = useState(false);
-  const mobileWidth = useMediaQuery("(max-width: 390px)");
-  const form = useForm({
-    defaultValues: {
-      evalQuestionsCount: 5,
-      chunkSize: 1000,
-      overlap: 100,
-      splitMethod: "RecursiveTextSplitter",
-      embeddingAlgorithm: "OpenAI",
-      model: "gpt-3.5-turbo",
-      retriever: "similarity-search",
-      gradingPrompt: "Fast",
-      numNeighbors: 3,
-      files: [],
-    },
-  });
-
-  return (
[...]
-  );
-};
-export default HomePage;
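For context on the README hunk above: it describes answering each test question with a `RetrievalQA` chain that retrieves relevant chunks and passes them to the LLM via `QA_CHAIN_PROMPT`. The sketch below shows how such a chain is typically wired up in LangChain as of the time of this patch; it is a minimal illustration, not the repo's implementation, and the prompt text, sample document, and retriever setup are assumptions (the real `QA_CHAIN_PROMPT` lives in `api/text_utils.py`).

```python
# Hypothetical sketch of a RetrievalQA chain with a custom prompt.
# Requires `langchain`, `faiss-cpu`, and an OPENAI_API_KEY in the environment.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

# Stand-in for the repo's QA_CHAIN_PROMPT; the real template is in api/text_utils.py.
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following context to answer the question.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
)

# Build a toy retriever over a single placeholder document.
docs = ["The auto-evaluator grades a QA chain's answers against a test set."]
retriever = FAISS.from_texts(docs, OpenAIEmbeddings()).as_retriever(search_kwargs={"k": 3})

# The chain fetches relevant chunks and stuffs them into the prompt's {context}.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

print(qa_chain.run("What does the auto-evaluator do?"))
```

The model-graded evaluation the README mentions uses separate grading prompts from the same `text_utils.py` file rather than the QA prompt shown here.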
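The curl smoke test in the README hunk above can also be driven from Python. The snippet below is a rough equivalent using `requests`, assuming the backend is running locally on port 8000 and streams its results; the file path is the one from the curl example.

```python
# Rough Python equivalent of the README's curl smoke test against the local API.
import requests

FORM_FIELDS = {
    "num_eval_questions": "1",
    "chunk_chars": "1000",
    "overlap": "100",
    "split_method": "RecursiveTextSplitter",
    "retriever_type": "similarity-search",
    "embeddings": "OpenAI",
    "model_version": "gpt-3.5-turbo",
    "grade_prompt": "Fast",
    "num_neighbors": "3",
}

with open("Docs/0333_text.txt", "rb") as f:
    # The endpoint streams results, so consume the response line by line.
    response = requests.post(
        "http://localhost:8000/evaluator-stream",
        data=FORM_FIELDS,
        files={"files": f},
        stream=True,
    )
    response.raise_for_status()
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
```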