docs: add chatbot tutorial #3382
Codecov Report
@@ Coverage Diff @@
## master #3382 +/- ##
==========================================
+ Coverage 89.71% 90.24% +0.53%
==========================================
Files 152 152
Lines 10840 10840
==========================================
+ Hits 9725 9783 +58
+ Misses 1115 1057 -58
Latency summary
Current PR yields:
Breakdown
Backed by latency-tracking. Further commits will update this comment.
The code in this tutorial is currently too obtuse: most of the lines are about downloading the dataset, not about the actual Flow/Executor logic.
Can you refactor it to use a Hugging Face `datasets` dataset (I think it should be possible) and add some line breaks to the code snippets for readability?
Also, assume that the users are very lazy: where possible, show what the output of each code snippet would be (the Hugging Face course does this well; see for example https://huggingface.co/course/chapter3/2?fw=pt).
My bad, the HF dataset is not the same as the one used on Kaggle. Still, check whether the HF one can be a good substitute for the Kaggle one.
I've left some comments that will improve readability, but we really need to get rid of all the downloading code: its complexity dominates the whole tutorial.
Either we figure out how to use HF datasets (I would create one myself, but I am not sure about licensing), or we add instructions to the prerequisites on how to manually download the file from Kaggle. That would be much nicer.
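For reference, a minimal sketch of a possible middle ground, assuming the dataset file is reachable at a plain URL: download it once and reuse the local copy on later runs. The helper name `fetch_once` and its arguments are hypothetical and are not taken from this PR.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_once(url: str, target: Path) -> Path:
    """Download `url` to `target` only if `target` does not exist yet.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so that an interrupted download
        # is never mistaken for a complete file on the next run.
        tmp = target.with_name(target.name + ".part")
        urlretrieve(url, tmp)
        tmp.replace(target)
    return target
```

With a helper like this, the tutorial body would need a single call, and the download details could live in the prerequisites or a small utility module.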
@tadejsv I think the authors of the tutorial opted for downloading each time to make usage faster and easier. Also keep in mind that the downloaded dataset is hosted by Jina, not Kaggle:
b23ed8c to f6fa7d7
Great job, looks much better now. Just some final minor changes.
Looks good 👍
Hmmm... I'm unable to do line comments in the file. Here's my feedback:
"We will build a fuzzy search demo on the source code"
Change to: "We will build a fuzzy search demo for source code"
Also re-order tutorials in docs/index.md:
- Chatbot
- Executor
- Executor on GPU
- Practice learning
done
LGTM 👍
closes: https://github.com/jina-ai/internal-tasks/issues/170