Question about the OCR capability

Great work indeed!

From the description in the paper, I could not find any dedicated OCR module. I am curious how LLaVA acquires the ability to understand text in images (e.g., the famous chicken nuggets example). Is there any magic in the training dataset?