Language support for OCR tool #4
Comments
{
"AFR": "afr",
"AMH": "amh",
"ARA": "ara",
"ASM": "asm",
"AZE": "aze",
"AZE_CYRL": "aze_cyrl",
"BEL": "bel",
"BEN": "ben",
"BOD": "bod",
"BOS": "bos",
"BUL": "bul",
"CAT": "cat",
"CEB": "ceb",
"CES": "ces",
"CHI_SIM": "chi_sim",
"CHI_TRA": "chi_tra",
"CHR": "chr",
"CYM": "cym",
"DAN": "dan",
"DEU": "deu",
"DZO": "dzo",
"ELL": "ell",
"ENG": "eng",
"ENM": "enm",
"EPO": "epo",
"EST": "est",
"EUS": "eus",
"FAS": "fas",
"FIN": "fin",
"FRA": "fra",
"FRK": "frk",
"FRM": "frm",
"GLE": "gle",
"GLG": "glg",
"GRC": "grc",
"GUJ": "guj",
"HAT": "hat",
"HEB": "heb",
"HIN": "hin",
"HRV": "hrv",
"HUN": "hun",
"IKU": "iku",
"IND": "ind",
"ISL": "isl",
"ITA": "ita",
"ITA_OLD": "ita_old",
"JAV": "jav",
"JPN": "jpn",
"KAN": "kan",
"KAT": "kat",
"KAT_OLD": "kat_old",
"KAZ": "kaz",
"KHM": "khm",
"KIR": "kir",
"KOR": "kor",
"KUR": "kur",
"LAO": "lao",
"LAT": "lat",
"LAV": "lav",
"LIT": "lit",
"MAL": "mal",
"MAR": "mar",
"MKD": "mkd",
"MLT": "mlt",
"MSA": "msa",
"MYA": "mya",
"NEP": "nep",
"NLD": "nld",
"NOR": "nor",
"ORI": "ori",
"PAN": "pan",
"POL": "pol",
"POR": "por",
"PUS": "pus",
"RON": "ron",
"RUS": "rus",
"SAN": "san",
"SIN": "sin",
"SLK": "slk",
"SLV": "slv",
"SPA": "spa",
"SPA_OLD": "spa_old",
"SQI": "sqi",
"SRP": "srp",
"SRP_LATN": "srp_latn",
"SWA": "swa",
"SWE": "swe",
"SYR": "syr",
"TAM": "tam",
"TEL": "tel",
"TGK": "tgk",
"TGL": "tgl",
"THA": "tha",
"TIR": "tir",
"TUR": "tur",
"UIG": "uig",
"UKR": "ukr",
"URD": "urd",
"UZB": "uzb",
"UZB_CYRL": "uzb_cyrl",
"VIE": "vie",
"YID": "yid"
}
It would be nice to have non-acronym names for them too.
Those are mostly ISO three-letter codes, but not all. I found
I selected that table as temp0 and ran:
Object.fromEntries(
  Array.from(temp0.querySelectorAll('tr')).slice(1).map(tr => {
    const cells = tr.querySelectorAll('td'); // first cell: code, second cell: name
    return [cells[0].innerText, cells[1].innerText];
  })
)
{
"afr": "Afrikaans",
"amh": "Amharic",
"ara": "Arabic",
"asm": "Assamese",
"aze": "Azerbaijani",
"aze_cyrl": "Azerbaijani - Cyrillic",
"bel": "Belarusian",
"ben": "Bengali",
"bod": "Tibetan",
"bos": "Bosnian",
"bul": "Bulgarian",
"cat": "Catalan; Valencian",
"ceb": "Cebuano",
"ces": "Czech",
"chi_sim": "Chinese - Simplified",
"chi_tra": "Chinese - Traditional",
"chr": "Cherokee",
"cym": "Welsh",
"dan": "Danish",
"deu": "German",
"dzo": "Dzongkha",
"ell": "Greek, Modern (1453-)",
"eng": "English",
"enm": "English, Middle (1100-1500)",
"epo": "Esperanto",
"est": "Estonian",
"eus": "Basque",
"fas": "Persian",
"fin": "Finnish",
"fra": "French",
"frk": "German Fraktur",
"frm": "French, Middle (ca. 1400-1600)",
"gle": "Irish",
"glg": "Galician",
"grc": "Greek, Ancient (-1453)",
"guj": "Gujarati",
"hat": "Haitian; Haitian Creole",
"heb": "Hebrew",
"hin": "Hindi",
"hrv": "Croatian",
"hun": "Hungarian",
"iku": "Inuktitut",
"ind": "Indonesian",
"isl": "Icelandic",
"ita": "Italian",
"ita_old": "Italian - Old",
"jav": "Javanese",
"jpn": "Japanese",
"kan": "Kannada",
"kat": "Georgian",
"kat_old": "Georgian - Old",
"kaz": "Kazakh",
"khm": "Central Khmer",
"kir": "Kirghiz; Kyrgyz",
"kor": "Korean",
"kur": "Kurdish",
"lao": "Lao",
"lat": "Latin",
"lav": "Latvian",
"lit": "Lithuanian",
"mal": "Malayalam",
"mar": "Marathi",
"mkd": "Macedonian",
"mlt": "Maltese",
"msa": "Malay",
"mya": "Burmese",
"nep": "Nepali",
"nld": "Dutch; Flemish",
"nor": "Norwegian",
"ori": "Oriya",
"pan": "Panjabi; Punjabi",
"pol": "Polish",
"por": "Portuguese",
"pus": "Pushto; Pashto",
"ron": "Romanian; Moldavian; Moldovan",
"rus": "Russian",
"san": "Sanskrit",
"sin": "Sinhala; Sinhalese",
"slk": "Slovak",
"slv": "Slovenian",
"spa": "Spanish; Castilian",
"spa_old": "Spanish; Castilian - Old",
"sqi": "Albanian",
"srp": "Serbian",
"srp_latn": "Serbian - Latin",
"swa": "Swahili",
"swe": "Swedish",
"syr": "Syriac",
"tam": "Tamil",
"tel": "Telugu",
"tgk": "Tajik",
"tgl": "Tagalog",
"tha": "Thai",
"tir": "Tigrinya",
"tur": "Turkish",
"uig": "Uighur; Uyghur",
"ukr": "Ukrainian",
"urd": "Urdu",
"uzb": "Uzbek",
"uzb_cyrl": "Uzbek - Cyrillic",
"vie": "Vietnamese",
"yid": "Yiddish"
}
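A code-to-name map like the one above could feed the select box directly. This is only a sketch: `buildLanguageOptions` and `renderOptions` are hypothetical helper names, and defaulting to "eng" is an assumption, not something the tool is confirmed to do.

```javascript
// Hypothetical helper: turn a { code: name } map into option data sorted by display name.
function buildLanguageOptions(names) {
  return Object.entries(names)
    .map(([code, name]) => ({ value: code, label: name }))
    .sort((a, b) => a.label.localeCompare(b.label));
}

// Hypothetical helper: render the options as HTML, escaping the characters that matter here.
function renderOptions(options, selected = "eng") {
  const escape = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
  return options
    .map(({ value, label }) => {
      const sel = value === selected ? " selected" : "";
      return `<option value="${escape(value)}"${sel}>${escape(label)}</option>`;
    })
    .join("\n");
}

// Three entries taken from the map above.
const sample = { deu: "German", eng: "English", chi_sim: "Chinese - Simplified" };
console.log(renderOptions(buildLanguageOptions(sample)));
```

Sorting by display name rather than by code keeps entries like "Tibetan" (bod) where a reader would look for them.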
It would be neat if this were bookmarkable, so you could bookmark a specific language. May as well make the back/forward buttons and the URL bar work too. ChatGPT research: https://chat.openai.com/share/bba673d6-7681-4648-bca7-0adb25527130
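One way the bookmarkable state might work is to keep the language in a query string parameter. The sketch below assumes a parameter named `language` and a fallback of "eng"; both are illustrative choices, and `languageFromUrl` and `urlWithLanguage` are hypothetical helper names.

```javascript
// Read the language code from a URL, falling back when it is missing or unsupported.
function languageFromUrl(href, supported, fallback = "eng") {
  const code = new URL(href).searchParams.get("language");
  return code !== null && supported.includes(code) ? code : fallback;
}

// Return a new URL string with the language parameter set to the given code.
function urlWithLanguage(href, code) {
  const url = new URL(href);
  url.searchParams.set("language", code);
  return url.toString();
}

const supported = ["eng", "deu", "chi_sim"];
console.log(languageFromUrl("https://example.com/ocr?language=deu", supported));
console.log(urlWithLanguage("https://example.com/ocr", "chi_sim"));
```

In the browser, the select box's change handler could call `history.pushState(null, "", urlWithLanguage(location.href, code))`, and a `popstate` listener could re-read the language with `languageFromUrl(location.href, supported)` so that back and forward restore the selection.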
A few people mentioned that it doesn't work for their language. Tesseract.js has that support built in, but I need to expose it as a select box.
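Before handing the select box value to Tesseract.js, it may help to validate it against the supported codes. Tesseract accepts several languages joined with "+", such as "eng+chi_tra". The helper below is a hypothetical sketch; how the resulting string is passed to Tesseract.js depends on the library version and is not shown.

```javascript
// Hypothetical helper: validate one or more selected codes and join them with "+",
// the separator Tesseract uses for multi-language recognition.
function toTesseractLang(selected, supported) {
  const codes = Array.isArray(selected) ? selected : [selected];
  const unknown = codes.filter((code) => !supported.includes(code));
  if (codes.length === 0 || unknown.length > 0) {
    const detail = unknown.join(", ") || "(none given)";
    throw new Error(`Unsupported language code(s): ${detail}`);
  }
  return codes.join("+");
}

// A few of the codes from the list in this issue.
const supportedCodes = ["eng", "deu", "chi_sim", "chi_tra"];
console.log(toTesseractLang("deu", supportedCodes));
console.log(toTesseractLang(["eng", "chi_tra"], supportedCodes));
```

Rejecting unknown codes up front avoids a failed language-data download later, which is harder to report clearly to the user.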