I really love the way these learning materials and the curriculum are designed. However, I have some review comments that might help make them better. I don't have any comments on Sections 1 and 3; I feel they are great in their current form. Do let me know if I should make a PR addressing any of these comments.
🎓 Tokenization Probably the first thing most NLP algorithms have to do is split the text into tokens, or words. While this sounds simple, having to account for punctuation and different language's word and sentence delimiters can make it tricky.
Maybe we could give some intuition on why tokenization involves more than splitting a sentence on whitespace, by adding:
Though it might seem straightforward to simply split a sentence into words, you might need other methods, or additional steps on top of this.
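To make the point concrete, here is a minimal sketch (my own illustration, not taken from the curriculum) comparing a plain whitespace split with a simple regex-based tokenizer. The function names `whitespace_tokenize` and `regex_tokenize` and the regex pattern are assumptions chosen for this example; real tokenizers handle many more cases.

```python
import re


def whitespace_tokenize(text):
    """Split only on whitespace; punctuation stays glued to the words."""
    return text.split()


def regex_tokenize(text):
    """Split into words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


sentence = "Don't panic, it's only tokenization!"

# Whitespace splitting leaves "panic," and "tokenization!" as single tokens.
print(whitespace_tokenize(sentence))

# The regex version separates the comma and the exclamation mark from the words.
print(regex_tokenize(sentence))
```

Even this small example shows that "panic," and "panic" would be treated as different tokens by a whitespace split, which is one reason extra handling is usually needed.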
For the "Tasks common to NLP" section, I feel we should add a mention of "word embeddings", owing to their importance. Maybe we could add something like this:
🎓 Embeddings Embeddings are a way to convert your text data into numbers in a meaningful way. This is done so that words with similar meanings, or words used together, cluster together in a high-dimensional space.
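As a rough sketch of what "clustering together" means, the snippet below ranks words by cosine similarity. The 3-dimensional vectors are hypothetical values I made up for illustration; they are not taken from Word2Vec or any trained model, and the helper names `cosine_similarity` and `most_similar` are my own.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from data and have many more dimensions.
toy_embeddings = {
    "toy": [0.9, 0.8, 0.1],
    "lego": [0.8, 0.9, 0.2],
    "console": [0.7, 0.6, 0.3],
    "tax": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_n=2):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar("toy", toy_embeddings))
```

With these made-up vectors, 'lego' and 'console' rank closest to 'toy', while 'tax' is far away, which mirrors the clustering behaviour described above.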
Optionally we could also add:
Try playing around with word embeddings from a popular model (Word2Vec) here. Can you see how clicking on a word shows words with similar meanings clustering around it? E.g., if you inspect the word 'toy', it clusters with words such as 'disney', 'lego', 'playstation', and 'console'.
However, I do understand this might make the section a bit too deep at this stage. What do you think?
Hi @jlooper,
1-Introduction-to-NLP
This section looks pretty good to me and I don't have any reviews which might help make this better.
2-Tasks
3-Translation-Sentiment
This section looks pretty good to me and I don't have any reviews which might help make this better.