# ISO-morphic languages

Nick Danis (nsdanis@wustl.edu)

Code for [this post](https://www.nickdanis.com/posts/2021/5/3/exhaustive-list-of-all-iso-morphic-languages-languages-whose-name-equals-its-iso-639-3-code). Data files are from [Ethnologue](https://www.ethnologue.com/codes) and comply with the terms of use. 

In [None]:
import pandas as pd

df = pd.read_csv('LanguageCodes.tab', sep='\t', header=0)

# find the 3-letter-long language names
df['name_three'] = (df['Name'].str.len() == 3)

# find the ISO-morphic language names
df['iso_morphic'] = (df['LangID'] == df['Name'].str.lower())
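
To make the test above concrete, here is a minimal sketch on a tiny in-memory table. The rows are made up for illustration and are not real Ethnologue entries; only the column names `LangID` and `Name` come from the data file used above.

```python
import pandas as pd

# Made-up rows for illustration only; not real Ethnologue entries.
toy = pd.DataFrame({
    "LangID": ["abc", "def", "ghi"],
    "Name": ["Abc", "Defg", "Xyz"],
})

# Same two flags as in the cell above
toy["name_three"] = toy["Name"].str.len() == 3
toy["iso_morphic"] = toy["LangID"] == toy["Name"].str.lower()

# Codes whose lowercased name equals the code itself
toy_iso = toy.loc[toy["iso_morphic"], "LangID"].tolist()
print(toy_iso)
```

Note that a three-letter name is necessary but not sufficient: the made-up row `Xyz` has three letters but does not match its code `ghi`.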

In [None]:
# Count languages by name length (three letters or not) and by ISO-morphism
df.groupby(['name_three','iso_morphic'])['name_three'].count().reset_index(name="count")

In [None]:
# New dataframe with the ISO-morphic languages
# (.copy() so that adding columns later does not trigger SettingWithCopyWarning)
iso1 = df.loc[df['iso_morphic']].copy()
iso1.head()

In [None]:
# for printing the markdown list of languages
url_prefix = "https://www.ethnologue.com/language/"

def md_list(names):
    # print each name as a numbered markdown list item, starting at 1
    for i, name in enumerate(names, start=1):
        print(f"{i}.", name)

In [None]:
# Add URL column to iso1
iso1['URL'] = "[" + iso1['Name'] + "](" + url_prefix + iso1['LangID'] + ")"
iso1.head()

In [None]:
# print the markdown list of URLs for the ISO-morphic languages
md_list(iso1.URL.values.tolist())
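
If the list needs to be written to a file rather than printed, a variant that returns the lines may be handy. This is only a sketch: `md_link` and `md_lines` are hypothetical helper names that do not appear in the notebook, and the two entries are made-up placeholders.

```python
def md_link(name, code, prefix="https://www.ethnologue.com/language/"):
    """Return a Markdown link built the same way as the URL column above."""
    return f"[{name}]({prefix}{code})"


def md_lines(items):
    """Return numbered Markdown list lines (1-based) instead of printing them."""
    return [f"{i}. {item}" for i, item in enumerate(items, start=1)]


# Made-up placeholder entries, not real language names
lines = md_lines([md_link("Abc", "abc"), md_link("Xyz", "xyz")])
print("\n".join(lines))
```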

In [None]:
# print the plain (unlinked) names of the ISO-morphic languages
md_list(iso1['Name'].tolist())

In [None]:
# same numbered list as above, written as an explicit loop
for i, name in enumerate(iso1['Name'].tolist(), start=1):
    print(f"{i}.", name)

## Weakly ISO-morphic languages

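This section uses `LanguageIndex.tab`, which lists every `LangID`-`Name` pair, including each alternate name of a language. A language is weakly ISO-morphic if at least one of its alternate names equals its code, even though its primary name does not.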

In [None]:
df2 = pd.read_csv('LanguageIndex.tab', sep='\t', header=0)

# find the 3-letter-long language names
df2['name_three'] = (df2['Name'].str.len() == 3)

# find the ISO-morphic language names
df2['iso_morphic'] = (df2['LangID'] == df2['Name'].str.lower())

In [None]:
# Saves all ISO-morphic name-code pairs to new DF
iso2 = df2.loc[df2['iso_morphic'] == True]

In [None]:
df2.groupby(['name_three','iso_morphic'])['iso_morphic'].count()

Since we don't care how many of a language's alternate names are ISO-morphic, group by the codes themselves to find every code that has at least one ISO-morphic name.

In [None]:
# weakly ISO-morphic languages
weakly = iso2.groupby(['LangID'])['Name'].count().reset_index()
weakly
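
As a small illustration of why the grouping is needed, the sketch below uses made-up index rows in which one code matches two of its alternate names. The column name `n_matches` is an assumption introduced here for readability.

```python
import pandas as pd

# Made-up index rows: one row per (code, alternate name) pair
toy_index = pd.DataFrame({
    "LangID": ["aaa", "aaa", "bbb", "bbb", "ccc"],
    "Name": ["Aaa", "AAA", "Bbb", "Other", "Else"],
})
toy_index["iso_morphic"] = toy_index["LangID"] == toy_index["Name"].str.lower()

# Keep only matching pairs, then collapse to one row per code
matches = toy_index.loc[toy_index["iso_morphic"]]
per_code = matches.groupby("LangID")["Name"].count().reset_index(name="n_matches")
print(per_code)
```

Code `aaa` matches twice but appears only once after grouping, and `ccc` drops out because none of its names match.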

An outer join of the original `iso1` df with the `weakly` df above, on `LangID`, lets us compare the two. We want the codes that appear only in `weakly`, so we filter to `right_only` (filtering to `both` would return the original list of ISO-morphic languages).

In [None]:
merge = iso1.merge(weakly, on="LangID", how="outer", indicator=True)
weakly_iso = merge.loc[merge._merge == "right_only"]
weakly_iso.head()
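
The behaviour of `indicator=True` can be checked on a toy pair of tables. This is a sketch with hypothetical codes: `strict` stands in for `iso1`, `weak` stands in for `weakly`, and `n_names` is an illustrative column name.

```python
import pandas as pd

# Hypothetical codes standing in for `iso1` (primary name matches the code)
strict = pd.DataFrame({"LangID": ["aaa", "bbb"]})
# Hypothetical codes standing in for `weakly` (any name matches the code)
weak = pd.DataFrame({"LangID": ["aaa", "bbb", "ccc"], "n_names": [1, 2, 1]})

# indicator=True adds a `_merge` column: left_only, right_only, or both
merged = strict.merge(weak, on="LangID", how="outer", indicator=True)

only_weak = merged.loc[merged["_merge"] == "right_only", "LangID"].tolist()
in_both = merged.loc[merged["_merge"] == "both", "LangID"].tolist()
print(only_weak)
print(in_both)
```

Using bracket access (`merged["_merge"]`) rather than attribute access avoids any confusion with DataFrame attributes whose names begin with an underscore.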

Save these values to a list:

In [None]:
weak_list = weakly_iso.LangID.values.tolist()

Use these values to flag the matching rows in the original `df`, which carries the full language information, and save the flagged rows to a new df, `w_iso`, containing only the weakly ISO-morphic languages.

In [None]:
df['only_weak'] = df['LangID'].isin(weak_list)
# .copy() so that the URL column can be added without SettingWithCopyWarning
w_iso = df.loc[df['only_weak']].copy()

In [None]:
# make URL column for the weakly ISO-morphic languages
w_iso['URL'] = "["+w_iso['Name']+"]("+url_prefix+w_iso['LangID']+")"

In [None]:
md_list(w_iso.URL.values.tolist())