I am wondering how the number 40000 or 60000 was chosen. Is it a rough estimate of the corpus vocabulary size? Thanks.