This is a self-practice project for building an LLM from scratch. I used a Burmese dataset collected from Wikipedia articles.
NOTE: The generated words may not be meaningful, because a bigram model looks only at the single previous token (or word) when generating the next token.
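To illustrate why bigram output tends to be incoherent, here is a minimal sketch of a count-based bigram model in plain Python. It is not the code used in this project (which may well be a neural bigram model); the function names `train_bigram` and `generate`, the whitespace tokenization, and the tiny toy corpus are all assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample up to `length` new tokens, each conditioned only on the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last token was never seen with a successor
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


# Toy corpus (hypothetical, not taken from the dataset), split on whitespace.
toy_tokens = "မြန်မာ စာ မြန်မာ စကား".split()
counts = train_bigram(toy_tokens)
sample = generate(counts, start="မြန်မာ", length=5)
print(" ".join(sample))
```

Because each step sees only the last token, the model has no memory of anything earlier in the sequence, so longer samples drift quickly even when every adjacent pair looks plausible.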
VERSION 1
Loss
Generation samples
Dataset
https://www.kaggle.com/datasets/myominhtet/burmese-wikipedia-articles144k