## Analysis of the list of jesuit communities in China ("Chrétientés") by Dehergne


This notebook analyzes the list of Jesuit communities in China, as compiled by Dehergne in his work 
"Repertoire des jésuites de Chine, 1552-1773" (Paris, 1973). 

This notebook produces tables with the names of communties, their wkidata code, and comments related to
modern chinese spelling of the place name. 

The notebook also uses records related to place names in individual missionary entries in the "Repertoire"
to list all the people present in a community at a given point in time.

Excel files produced in this notebook:
- inferences/residences-1644.xlsx
- inferences/residences-1701.xlsx
- inferences/residences-1644-1701.xlsx
- inferences/residences-names-1644-1701.xlsx
- inferences/residences-1644-1701-no-wikidata.xlsx'


In [936]:
import timelink

print("Timelink version:", timelink.__version__)

Timelink version: 1.1.26


### Setup

In [937]:

from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()
# tlnb.print_info()



## The residences of the Jesuits in China
The Repertoir includes two lists of Jesuits residences in China.

One reports to 1644 [Planche: Carte des Chrétientés Chinoises de la fin des Ming (1644)](../sources/dehergne-locations-1644.cli)
and the second to 1701  [VII. Carte des résidences de Chine en l'année 1701](../sources/dehergne-locations-1701.cli).


Function to extract the wikidate associated with the name of a residence.

In [938]:
import re

def get_wikidata_id(geo_entity, if_missing=''):
    """ Check the obs field for wikidata links

    Returns a tuple of the cleaned comment and the wikidata id"""
    extra_info = geo_entity.extra_info
    name_comment = extra_info.get('name', {}).get('comment','')
    name_original = extra_info.get('name', {}).get('original','')

    pattern = r'@wikidata\:\s*(Q[0-9]*)'
    wikidata_in_comment = re.findall(pattern, name_comment)
    comment_without_wikidata = re.sub(pattern, '', name_comment)
    # Sometimes the wikidata id is in the original name
    wikidata_in_original = re.findall(pattern, name_original)
    original_without_wikidata = re.sub(pattern, '', name_original)
    return comment_without_wikidata + original_without_wikidata, wikidata_in_comment[0] if wikidata_in_comment else wikidata_in_original[0] if wikidata_in_original else if_missing

# extract from the "comment" column
def extract_coordinates(comment):
    """
    Parse various coordinate formats from text comment and return a tuple (lat, lon).
    Supported formats:
      1. 'coordinates: <lat><N/S>, <lon><E/W>'
      2. 'latitude: <decimal>, longitude: <decimal>'
      3. Signed decimal degrees: '<+ or -><decimal>, <+ or -><decimal>'
      4. DMS: '<deg>°<min>'<sec>"<N/S> <deg>°<min>'<sec>"<E/W>'
    """
    if not comment:
        return None
    # Return None if comment does not contain "coordinates:" nor "latitude:" nor "longitude:"
    if not re.search(r'coordinates:|latitude:|longitude:', comment, flags=re.IGNORECASE):
        return None

    # 1. explicit coordinate tag
    m = re.search(r'coordinates:\s*([-\d.]+)([NS]),\s*([-\d.]+)([EW])', comment, flags=re.IGNORECASE)
    if m:
        lat, ns, lon, ew = m.groups()
        lat = float(lat) * (1 if ns == 'N' else -1)
        lon = float(lon) * (1 if ew == 'E' else -1)
        return (lat, lon)

    # 2. labeled decimal degrees
    m = re.search(r'latitude:\s*([-\d.]+),\s*longitude:\s*([-\d.]+)', comment, flags=re.IGNORECASE)
    if m:
        lat, lon = m.groups()
        return (float(lat), float(lon))

    # 3. signed decimal degrees with +/− signs
    m = re.search(r'([-+]?\d+(?:\.\d+)?),\s*([-+]?\d+(?:\.\d+)?)', comment, flags=re.IGNORECASE)
    if m:
        lat, lon = m.groups()
        return (float(lat), float(lon))

    # 4. DMS format
    dms = re.search(r'(\d+)°(\d+)\'(\d+\.?\d*)"([NS])[\s,]+(\d+)°(\d+)\'(\d+\.?\d*)"([EW])', comment)
    if dms:
        d, m1, s, ns, D, m2, s2, ew = dms.groups()
        def dms_to_decimal(deg, minu, sec, hemi):
            dd = float(deg) + float(minu) / 60 + float(sec) / 3600
            return dd * (1 if hemi in ('N', 'E') else -1)
        lat = dms_to_decimal(d, m1, s, ns)
        lon = dms_to_decimal(D, m2, s2, ew)
        return (lat, lon)

    raise ValueError(f"Could not parse coordinates from comment: {comment}")


In [939]:
# test the extract_coordinates function

test_comments = [
    "coordinates: 48.8588443N, 2.2943506E",
    "latitude: 48.8588443, longitude: 2.2943506",
    "coordinates:48.8588443, 2.2943506",
    "coordinates:48°51'31.84\"N 2°17'39.66\"E",
    "latitude:26.4269, longitude:111.9940",
    """In the Chinese translation it is recognized as “章浦”（漳浦）, which is wrong. It should be "后坂", because "Aupua" corresponds to the pronunciation of "后坂" in the southern Fujian dialect. In Dehergne(1957), it has another transcription "Heupuen", coordinates: 24.50213852506329N, 117.6917197408656E""",
     """provinces, coordinates:32°3'39"N, 118°46'44"E """,
     "there, coordinates: 34.55258137628194N,109.04672038131238E/",
     """coordinates: 31°35'0"N, 105°58'19"E""",
     "coordinates: 30.845691234919308N, 121.3179093082612E",
"No coordinates here",
]

for comment in test_comments:
    try:
        coords = extract_coordinates(comment)
        print(f"Comment: {comment} => Coordinates: {coords}")
    except ValueError as e:
        print(f"Comment: {comment} => Error: {e}")

Comment: coordinates: 48.8588443N, 2.2943506E => Coordinates: (48.8588443, 2.2943506)
Comment: latitude: 48.8588443, longitude: 2.2943506 => Coordinates: (48.8588443, 2.2943506)
Comment: coordinates:48.8588443, 2.2943506 => Coordinates: (48.8588443, 2.2943506)
Comment: coordinates:48°51'31.84"N 2°17'39.66"E => Coordinates: (48.85884444444444, 2.2943499999999997)
Comment: latitude:26.4269, longitude:111.9940 => Coordinates: (26.4269, 111.994)
Comment: In the Chinese translation it is recognized as “章浦”（漳浦）, which is wrong. It should be "后坂", because "Aupua" corresponds to the pronunciation of "后坂" in the southern Fujian dialect. In Dehergne(1957), it has another transcription "Heupuen", coordinates: 24.50213852506329N, 117.6917197408656E => Coordinates: (24.50213852506329, 117.6917197408656)
Comment: provinces, coordinates:32°3'39"N, 118°46'44"E  => Coordinates: (32.06083333333333, 118.77888888888889)
Comment: there, coordinates: 34.55258137628194N,109.04672038131238E/ => Coordinates: (

## List of residences in 1644

In [940]:
import pandas as pd
from sqlalchemy import select

geo1, geo2, geo3 = tlnb.db.get_model(['geo1','geo2','geo3'])
stmt = select(geo1).where(geo1.inside == 'deh-chre-1644')

place_list = []

with tlnb.db.session() as session:
    result = session.execute(stmt).fetchall()
    for province, in result:
        comment, wikidata = get_wikidata_id(province, if_missing='No wikidata')
        print(province.id, province.name, wikidata, comment)
        place_list.append({'province': province.name,
                           'id': province.id,
                           'level': 'province',
                           'name': province.name,
                           'name_original': province.with_extra_info().name,
                           'province_wikidata_id': wikidata,
                           'wikidata_id': wikidata,
                           'comment': comment})
        fous = session.execute(select(geo2).where(geo2.inside == province.id)).fetchall()
        province_wikdata = wikidata
        for fou, in fous:

            comment, wikidata = get_wikidata_id(fou, if_missing='No wikidata')
            print(' ', fou.name,  wikidata, comment)
            place_list.append({'province': province.name,
                               'province_id': province.id,
                               'province_wikidata_id': province_wikdata,
                               'id': fou.id,
                               'level': 'fou',
                               'fou':fou.name,
                               'name': fou.name,
                               'name_original': fou.with_extra_info().name,
                               'fou_wikidata_id': wikidata,
                               'wikidata_id': wikidata,
                               'comment': comment})
            geo3s = session.execute(select(geo3).where(geo3.inside == fou.id)).fetchall()
            fou_wikidata = wikidata
            for tcheou_hien, in geo3s:
                comment, wikidata = get_wikidata_id(tcheou_hien, if_missing='No wikidata')
                print('   ', tcheou_hien.name, wikidata,comment,  )
                place_list.append({'province': province.name,
                                    'province_id': province.id,
                                    'province_wikidata_id': province_wikdata,
                                    'fou_id': fou.id,
                                    'fou_wikidata_id': fou_wikidata,
                                    'id': tcheou_hien.id,
                                    'level': 'tcheou-hien',
                                    'fou':fou.name,
                                    'name': tcheou_hien.name,
                                    'name_original': tcheou_hien.with_extra_info().name,
                                    'wikidata_id': wikidata,
                                    'comment': comment})


deh-r1644-chekiang Chekiang Q16967 Tche-kiang, today Zhejiang, 浙江,  @dehergne:396
  Hangchou Q4970 Hang-tcheou, today Hangzhou, 杭州, 
    Fuyang Q1011103 Fou-yang, today Fuyang, 富阳, 
    Jenho Q9385136 Jen-houo, today Renhe, 仁和县 (), Historical county name, coordinates: 30.448897N, 120.307504E
  Chüchow Q58235 K'iu-tcheou, today Quzhou, 衢州, , in the Chinese translation it is recognized as “遂州”, which is wrong, both phonetically and geographically. In Dehergne(1957), it is noted as "衢州".
  Huchow Q42664 Hou-tcheou, today Huzhou, 湖州, 
    Tehtsing Q1191987 "Tehtsing du Huchow, Té-ts'ingTeching# today Deqing, 德清, "
  Kashing Q58178 Kia-hing, today Jiaxing, 嘉兴, 
    Kashan Q1361347 Kia-chan, today Jiashan, 嘉善, Kaosham
    Tangsi Q10931032 "T'ang-k'i Tangchi", today Tangqi, 塘栖 , in the Chinese translation it is recognized as “塘拪”
    Tsungteh Q10270889 Tch'ong-té,today Chongde, 崇德县 , Historical county name, located in the present Chongfu 崇福镇Tsungteh (Shihmen)
    Tungsiang Q1204548 T'ong-hian

Show one place in kleio

In [941]:
place_kleio_id = "deh-r1644-chuanchow"

from timelink.api.models import Geoentity

with tlnb.db.session() as session:
    geo_place = session.get(Geoentity, place_kleio_id)
    print(geo_place.to_kleio(show_contained=True))



geo2$Chüanchow#Quanzhou, Ts'iuen-tcheou, today Quanzhou, 泉州, @wikidata:Q68695/geo2
  atr$activa/sim/1625#cerca de
  atr$residencia-missao/Dominicanos/1632
  atr$geoentity:name@wikidata/"https://www.wikidata.org/wiki/Q68695"#Quanzhou, Ts'iuen-tcheou, today Quanzhou, 泉州, @wikidata:Q68695%Q68695/1644
  atr$residencia-missao/Jesuíta/<1645#antes de
  geo3$Amoy#Hia-men,Shamen, today Xiamen, 厦门, @wikidata:Q68744,%Amoy (Szeming)/geo3
    atr$residencia-missao/Dominicanos/1594%tentatives OP 1594
    atr$geoentity:name@wikidata/"https://www.wikidata.org/wiki/Q68744"#Hia-men,Shamen, today Xiamen, 厦门, @wikidata:Q68744,%Q68744/1644
    atr$alternative-name@wikidata/"https://www.wikidata.org/wiki/Q1374907"%Siming/1644
    atr$alternative-name/Siming#today 思明, @wikidata:Q1374907, Siming is the historical name and the current central district of Xiamen in the Chinese translation it is written as“四明” (wrong character)/1644
  geo3$Anhai#Ngan-hai,today Anhai,安海, @wikidata:Q4764330%principaux bourgs/geo3


In [942]:
pd.set_option('display.max_rows', 300)
# create a dataframe from the list
places_1644_df = pd.DataFrame(place_list)
places_1644_df.info()
places_1644_df['year'] = 1644
cols=['year','id','level','province',  'fou','name', 'name_original', 'province_wikidata_id', 'fou_wikidata_id','wikidata_id', 'comment']
places_1644_df[cols].sort_values(by=['province', 'fou', 'name']).head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   province              215 non-null    object
 1   id                    215 non-null    object
 2   level                 215 non-null    object
 3   name                  215 non-null    object
 4   name_original         215 non-null    object
 5   province_wikidata_id  215 non-null    object
 6   wikidata_id           215 non-null    object
 7   comment               215 non-null    object
 8   province_id           197 non-null    object
 9   fou                   197 non-null    object
 10  fou_wikidata_id       197 non-null    object
 11  fou_id                135 non-null    object
dtypes: object(12)
memory usage: 20.3+ KB


Unnamed: 0,year,id,level,province,fou,name,name_original,province_wikidata_id,fou_wikidata_id,wikidata_id,comment
89,1644,deh-r1644-chuchow-tcheou,fou,Anhwei,Chuchow,Chuchow,"Chuchow%Chuchow, today Chuzhou, 滁州, @wikidata:...",Q40956,Q114045,Q114045,"Chuchow, today Chuzhou, 滁州,"
90,1644,deh-r1644-hweichow,fou,Anhwei,Hweichow,Hweichow,"Hweichow#Hoei-tcheou, today Huizhou, 徽州, @wiki...",Q40956,Q4358404,Q4358404,"Hoei-tcheou, today Huizhou, 徽州,"
92,1644,deh-r1644-tungment,tcheou-hien,Anhwei,Hweichow,Tungmen,"Tungmen#""""""(?) In Dehergne(1957, p51): ""Chréti...",Q40956,Q4358404,No wikidata,"(?) In Dehergne(1957, p51): ""Chrétienté au Wuy..."
91,1644,deh-r1644-wuyan-hien,tcheou-hien,Anhwei,Hweichow,Wuyan hien,"Wuyan hien#Ou-yuen, today Wuyuan Xian, 婺源县, @w...",Q40956,Q4358404,Q1357710,"Ou-yuen, today Wuyuan Xian, 婺源县, , Wuyuan hist..."
88,1644,deh-r1644-anhwei,province,Anhwei,,Anhwei,"Anhwei#Anhui, Ngon-hoei, today Anhui, 安徽, @wik...",Q40956,,Q40956,"Anhui, Ngon-hoei, today Anhui, 安徽,"


Export list of residences in 1644 to a file and list of wikidata
ids so they can be used to fetch coordinates and other information.

Ensure that notebook `wikidata-linked-data.ipynb` has been run after the next cell to update cache of linked data.


In [943]:
places_1644_df.to_excel('../inferences/residences-1644.xlsx', index=False)
# save to wikidata ids directory
places_1644_df['wikidata_id'].to_csv('../inferences/wikidata-references/residences-1644.csv', index=False)

### Load wikidata data

Requires that `wikidata-linked-data.ipynb` has been run first to generate the `locations_wikidata_info` file used here.


In [944]:
# load wikidata data
from dehergne_util import locations_wikidata_info_file

# load the locations wikidata info from xlsx file
print("Loading locations wikidata info from", locations_wikidata_info_file)
locations_wikidata_info = pd.read_excel(locations_wikidata_info_file, dtype=str)
locations_wikidata_info.set_index('wikidata_id', inplace=True)

# merge the two dataframes
merged_df = pd.merge(places_1644_df, locations_wikidata_info, on='wikidata_id', how='left')
cols = ['level', 'province', 'fou', 'name', 'wikidata_id', 'coordinates','latitude','longitude','comment', 'province_wikidata_id', 'fou_wikidata_id']
merged_df[cols].sort_values(by=['province', 'fou', 'name']).head(20)

Loading locations wikidata info from ../inferences/wikidata-references/locations_wikidata_info.xlsx


Unnamed: 0,level,province,fou,name,wikidata_id,coordinates,latitude,longitude,comment,province_wikidata_id,fou_wikidata_id
89,fou,Anhwei,Chuchow,Chuchow,Q114045,"(32.30621, 118.31148)",32.30621,118.31148,"Chuchow, today Chuzhou, 滁州,",Q40956,Q114045
90,fou,Anhwei,Hweichow,Hweichow,Q4358404,"(29.869722222222222, 118.42194444444445)",29.86972222222222,118.4219444444444,"Hoei-tcheou, today Huizhou, 徽州,",Q40956,Q4358404
92,tcheou-hien,Anhwei,Hweichow,Tungmen,No wikidata,,,,"(?) In Dehergne(1957, p51): ""Chrétienté au Wuy...",Q40956,Q4358404
91,tcheou-hien,Anhwei,Hweichow,Wuyan hien,Q1357710,"(29.25, 117.85)",29.25,117.85,"Ou-yuen, today Wuyuan Xian, 婺源县, , Wuyuan hist...",Q40956,Q4358404
88,province,Anhwei,,Anhwei,Q40956,"(31.833333333333, 117)",31.833333333333,117.0,"Anhui, Ngon-hoei, today Anhui, 安徽,",Q40956,
4,fou,Chekiang,Chüchow,Chüchow,Q58235,"(28.95445, 118.8763)",28.95445,118.8763,"K'iu-tcheou, today Quzhou, 衢州, , in the Chines...",Q16967,Q58235
2,tcheou-hien,Chekiang,Hangchou,Fuyang,Q1011103,"(30.04998, 119.93697)",30.04998,119.93697,"Fou-yang, today Fuyang, 富阳,",Q16967,Q4970
1,fou,Chekiang,Hangchou,Hangchou,Q4970,"(30.25, 120.1675)",30.25,120.1675,"Hang-tcheou, today Hangzhou, 杭州,",Q16967,Q4970
3,tcheou-hien,Chekiang,Hangchou,Jenho,Q9385136,,,,"Jen-houo, today Renhe, 仁和县 (), Historical coun...",Q16967,Q4970
5,fou,Chekiang,Huchow,Huchow,Q42664,"(30.8925, 120.0875)",30.8925,120.0875,"Hou-tcheou, today Huzhou, 湖州,",Q16967,Q42664


Missing coordinates

In [945]:
# list rows where coordinates are not available
missing_coords = merged_df[merged_df['coordinates'].isna()]
print(f"Rows with missing coordinates ({len(missing_coords)}):")
missing_coords[cols].sort_values(by=['province', 'fou', 'name'])


Rows with missing coordinates (33):


Unnamed: 0,level,province,fou,name,wikidata_id,coordinates,latitude,longitude,comment,province_wikidata_id,fou_wikidata_id
92,tcheou-hien,Anhwei,Hweichow,Tungmen,No wikidata,,,,"(?) In Dehergne(1957, p51): ""Chrétienté au Wuy...",Q40956,Q4358404
3,tcheou-hien,Chekiang,Hangchou,Jenho,Q9385136,,,,"Jen-houo, today Renhe, 仁和县 (), Historical coun...",Q16967,Q4970
16,tcheou-hien,Chekiang,Ningpo,Wuking,No wikidata,,,,"(?) Uchim,Ou kin, Dehergne(1957) did not give ...",Q16967,Q42780
33,tcheou-hien,Fukien,Changchow,Aupua,Q14420305,,,,"today Houban, 后坂 (), (@geonames:1977135), coor...",Q41705,Q68814
38,tcheou-hien,Fukien,Chüanchow,Tsingkiang,Q128883,,,,"Chingchiang ,Tsin-kiang, today Jingjiang, 靖江, ...",Q41705,Q68695
28,tcheou-hien,Fukien,Foochow,Niensien,No wikidata,,,,"(?) Dehergne(1957, p30): ""A côté de Hai keu ...",Q41705,Q68481
45,tcheou-hien,Fukien,Funing,Tingteo,No wikidata,,,,(?) 藤头？顶头？ in the Chinese translation it is re...,Q41705,Q241877
66,tcheou-hien,Fukien,Taiwan,Camarri,No wikidata,,,,"today Jinbaoli, 金包里, in the Chinese translatio...",Q41705,Q22502
65,tcheou-hien,Fukien,Taiwan,Taparri,Q7420445,,,,"""""""hoje：大包里 in the Chinese translation it is r...",Q41705,Q22502
81,tcheou-hien,Hukwang,Kingchow,Meng kia k´i,No wikidata,,,,(?) in the Chinese translation it is recognize...,Q1014420,Q14135188


Scan comments for specific mentions of coordinates

In [946]:
# set the coordinates for the places that have coordinates in the comment column
# interate rows with iterrows
for index, row in merged_df[merged_df['coordinates'].isna()].iterrows():
    print()
    print("Looking for coordinates of:", row['id'], row['name'])
    # if row['comment'] is not NaN and contains "coordinates", "latitude", or "longitude"
    if not re.search(r'coordinates:|latitude:|longitude:', row['comment'], flags=re.IGNORECASE):
        print("No coordinates found in comment for row:", index, row['name'])
        continue
    print(row['comment'])
    coords = extract_coordinates(row['comment'])
    if coords:
        print("Found coordinates:", coords)
        merged_df.at[index, 'coordinates'] = coords
        merged_df.at[index, 'latitude'] = coords[0]
        merged_df.at[index, 'longitude'] = coords[1]



Looking for coordinates of: deh-r1644-jenho Jenho
Jen-houo, today Renhe, 仁和县 (), Historical county name, coordinates: 30.448897N, 120.307504E
Found coordinates: (30.448897, 120.307504)

Looking for coordinates of: deh-r1644-wuking Wuking
No coordinates found in comment for row: 16 Wuking

Looking for coordinates of: deh-r1644-niensien Niensien
No coordinates found in comment for row: 28 Niensien

Looking for coordinates of: deh-r1644-aupua Aupua
today Houban, 后坂 (), (@geonames:1977135), coordinates: 24.50213852506329N, 117.6917197408656EAu-poa,Heupuen
Found coordinates: (24.50213852506329, 117.6917197408656)

Looking for coordinates of: deh-r1644-tsinkiang Tsingkiang
No coordinates found in comment for row: 38 Tsingkiang

Looking for coordinates of: deh-r1644-tingteo Tingteo
No coordinates found in comment for row: 45 Tingteo

Looking for coordinates of: deh-r1644-taparri Taparri
"""hoje：大包里 in the Chinese translation it is recognized as 塔巴里,in Dehergne(1957) it is noted as 大包里, which

List places with no coordinates


In [947]:
# List rows in merged_df where the 'coordinates' column is NaN
missing_coords_df = merged_df[merged_df['coordinates'].isna()]
missing_coords_df

Unnamed: 0,province,id,level,name,name_original,province_wikidata_id,wikidata_id,comment,province_id,fou,...,portuguese_description,coordinates,latitude,longitude,administrative_entity_id,administrative_entity_label_en,administrative_entity_label_zh,country_id,country_label,label
16,Chekiang,deh-r1644-wuking,tcheou-hien,Wuking,"Wuking#(?) Uchim,Ou kin, Dehergne(1957) did no...",Q16967,No wikidata,"(?) Uchim,Ou kin, Dehergne(1957) did not give ...",deh-r1644-chekiang,Ningpo,...,,,,,,,,,,
28,Fukien,deh-r1644-niensien,tcheou-hien,Niensien,"Niensien#""""""(?) Dehergne(1957, p30): ""A côté...",Q41705,No wikidata,"(?) Dehergne(1957, p30): ""A côté de Hai keu ...",deh-r1644-fukien,Foochow,...,,,,,,,,,,
38,Fukien,deh-r1644-tsinkiang,tcheou-hien,Tsingkiang,"Tsingkiang#""""""Chingchiang ,Tsin-kiang, today J...",Q41705,Q128883,"Chingchiang ,Tsin-kiang, today Jingjiang, 靖江, ...",deh-r1644-fukien,Chüanchow,...,,,,,,,,,,
45,Fukien,deh-r1644-tingteo,tcheou-hien,Tingteo,"Tingteo#""""""(?) 藤头？顶头？ in the Chinese translati...",Q41705,No wikidata,(?) 藤头？顶头？ in the Chinese translation it is re...,deh-r1644-fukien,Funing,...,,,,,,,,,,
81,Hukwang,deh-r1644-meng-kia-ki,tcheou-hien,Meng kia k´i,"Meng kia k´i#""""""(?) in the Chinese translation...",Q1014420,No wikidata,(?) in the Chinese translation it is recognize...,deh-r1644-hukwang,Kingchow,...,,,,,,,,,,
92,Anhwei,deh-r1644-tungment,tcheou-hien,Tungmen,"Tungmen#""""""(?) In Dehergne(1957, p51): ""Chréti...",Q40956,No wikidata,"(?) In Dehergne(1957, p51): ""Chrétienté au Wuy...",deh-r1644-anhwei,Hweichow,...,,,,,,,,,,
111,Kiangsu,deh-r1644-kaokia,tcheou-hien,Kaokia,"Kaokia#""""""(?) today Gaojia, 高家, in the Chinese...",Q16963,No wikidata,"(?) today Gaojia, 高家, in the Chinese translati...",deh-r1644-kiangsu,Sungkiang,...,,,,,,,,,,
151,Kwangtung,deh-r1644-hwanghsiaping,tcheou-hien,Hwanghsiaping,"Hwanghsiaping#""""""(?) Hwanghsiaping, today Vank...",Q15175,No wikidata,"(?) Hwanghsiaping, today Vankaxen, 黄下坪？, In th...",deh-r1644-kwangtung,Schiuchow,...,,,,,,,,,,
153,Kwangtung,deh-r1644-yangsiang,tcheou-hien,Yangsiang,"Yangsiang#""""""(?) In the Chinese translation, i...",Q15175,No wikidata,"(?) In the Chinese translation, it is recogniz...",deh-r1644-kwangtung,Schiuchow,...,,,,,,,,,,
186,Shangtung,deh-r1644-kwanchang,tcheou-hien,Kwanchang,"Kwanchang#""""""(?) In Dehergne(1957), there is n...",Q43407,No wikidata,"(?) In Dehergne(1957), there is no ""Kwanchang""...",deh-r1644-shangtung,Tsinan,...,,,,,,,,,,


### Map the structure of residences linking places to their enclosing administrative units.

In [948]:
%pip install plotly

52708.06s - pydevd: Sending message related to process being replaced timed-out after 5 seconds


.bash_profile RUN!
Note: you may need to restart the kernel to use updated packages.


In [949]:
# loop over the merged dataframe and print the information

locations = []

for index, row in merged_df.sort_values(['province', 'fou', 'name']).iterrows():
    print(f" id: {row['id']} Level: {row['level']} Province: {row['province']} {row['province_wikidata_id']}, Fou: {row['fou']} {row['fou_wikidata_id']}, Tcheou/Hien: {row['name']}")
    lat = row['latitude']
    lon = row['longitude']
    place_type = row['type'] if 'type' in row else ''
    label = (
        (f"    {row['name']}") +
        (f" {row['comment']}" if row['comment'] else '') +
        (f" {row['english_label']}" if row['english_label'] else '') +
        (f"({row['english_description']})" if row['english_description'] else '') +
        (f" {row['chinese_label']}" if row['chinese_label'] else '') +
        (f"({row['chinese_description']})" if row['chinese_description'] else '')
    )
    print(label)
    level = row['level']
    if level == 'province':
        print(f"    Province: {row['province']} (Wikidata ID: {row['province_wikidata_id']})")
    elif level == 'fou':
        print(f"    Fou: {row['fou']} (Wikidata ID: {row['fou_wikidata_id']})")
    elif level == 'tcheou_hien':
        print(f"    Tcheou/Hien: {row['name']} (Wikidata ID: {row['wikidata_id']})")

 id: deh-r1644-chuchow-tcheou Level: fou Province: Anhwei Q40956, Fou: Chuchow Q114045, Tcheou/Hien: Chuchow
    Chuchow Chuchow, today Chuzhou, 滁州,  Chuzhou(prefecture-level city in Anhui, China) 滁州市(中国安徽省的地级市)
    Fou: Chuchow (Wikidata ID: Q114045)
 id: deh-r1644-hweichow Level: fou Province: Anhwei Q40956, Fou: Hweichow Q4358404, Tcheou/Hien: Hweichow
    Hweichow Hoei-tcheou, today Huizhou, 徽州,  Huizhou(region in Anhui and Jiangxi, China) 徽州(nan)
    Fou: Hweichow (Wikidata ID: Q4358404)
 id: deh-r1644-tungment Level: tcheou-hien Province: Anhwei Q40956, Fou: Hweichow Q4358404, Tcheou/Hien: Tungmen
    Tungmen (?) In Dehergne(1957, p51): "Chrétienté au Wuyüan (Ou-yuen) hien, à la frontière du Kiangsi, et se nomme Tungmen, à l'ouest du Wuyüan, route de Kingtehchen." in the Chinese translation it is recogniazed as “东门村”, but it seems that there is no 东门村 in this area.Wuyuän hien le bourg de Tungmen nan(nan) nan(nan)
 id: deh-r1644-wuyan-hien Level: tcheou-hien Province: Anhwei Q4095

In [950]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 28 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   province                        215 non-null    object
 1   id                              215 non-null    object
 2   level                           215 non-null    object
 3   name                            215 non-null    object
 4   name_original                   215 non-null    object
 5   province_wikidata_id            215 non-null    object
 6   wikidata_id                     215 non-null    object
 7   comment                         215 non-null    object
 8   province_id                     197 non-null    object
 9   fou                             197 non-null    object
 10  fou_wikidata_id                 197 non-null    object
 11  fou_id                          135 non-null    object
 12  year                            215 non-null    in

In [951]:
from textwrap import wrap
import plotly.graph_objects as go

# generate a dictionary for quick lookup with id as key
locations_wikidata_dict = {row['id']: row for index, row in merged_df.iterrows()}

# Define marker styles for each level
level_styles = {
    'province': {'color': 'red', 'size': 12},
    'fou': {'color': 'blue', 'size': 8},
    'tcheou-hien': {'color': 'green', 'size': 5}
}

fig = go.Figure()


for level, style in level_styles.items():
    fig.add_trace(go.Scattermap(
        lon=[None],
        lat=[None],
        mode='markers',
        marker=dict(color=style['color'], size=style['size']),
        name=level
    ))
# Add lines for 'fou' and 'tcheou_hien'
for _, row in merged_df.iterrows():
    level = row.get('level')
    if level == 'fou':
        id_origin = row.get('province_id', None)
        id_destination = row.get('id', None)
        lat_origin = locations_wikidata_dict.get(id_origin, {}).get('latitude', None)
        lon_origin = locations_wikidata_dict.get(id_origin, {}).get('longitude', None)
        lat_destination = locations_wikidata_dict.get(id_destination, {}).get('latitude', None)
        lon_destination = locations_wikidata_dict.get(id_destination, {}).get('longitude', None)
        if lat_origin and lon_origin and lat_destination and lon_destination:
            fig.add_trace(go.Scattermap(
                lon=[float(lon_origin), float(lon_destination)],
                lat=[float(lat_origin), float(lat_destination)],
                mode='lines',
                line=dict(width=1, color='red'),
                showlegend=False
            ))
    elif level == 'tcheou-hien':
        id_origin = row.get('fou_id', None)
        id_destination = row.get('id', None)
        lat_origin = locations_wikidata_dict.get(id_origin, {}).get('latitude', None)
        lon_origin = locations_wikidata_dict.get(id_origin, {}).get('longitude', None)
        lat_destination = locations_wikidata_dict.get(id_destination, {}).get('latitude', None)
        lon_destination = locations_wikidata_dict.get(id_destination, {}).get('longitude', None)
        if id_destination == 'Q7420445' or row['name'] == "Camarri": # special case for Taparri
            print(f"Debugging: {id_origin} -> {id_destination}")
            pass
        if lat_origin and lon_origin and lat_destination and lon_destination:
            fig.add_trace(go.Scattermap(
                lon=[float(lon_origin), float(lon_destination)],
                lat=[float(lat_origin), float(lat_destination)],
                mode='lines',
                line=dict(width=1, color='blue'),
                showlegend=False
            ))
# Add Markers for each location
for index, row in merged_df.iterrows():
    lat = row.get('latitude')
    lon = row.get('longitude')
    level = row.get('level')
    name = row.get('name')
    english_description = row.get('english_description', '')
    chinese_description = row.get('chinese_description', '')
    wikidata_id = row.get('wikidata_id', '')
    coordinates = row.get('coordinates', '')
    if name == "Camarri":
        print(f"Debugging: {name} with wikidata_id {wikidata_id}")

    if wikidata_id != 'No wikidata':
        english_label = row.get('english_label', '')
        chinese_label = row.get('chinese_label', '')
        portuguese_label = row.get('portuguese_label', '')
        english_description = row.get('english_description', '')
        chinese_description = row.get('chinese_description', '')
        portuguese_description = row.get('portuguese_description', '')
        wikidata_label = (
                         f"en:{english_label} ({english_description})<br>"
                         f"zh:{chinese_label} ({chinese_description})<br>"
                         f"pt:{portuguese_label} ({portuguese_description})"
        )
    else:
        wikidata_label = "No wikidata"
        english_label = ''
        chinese_label = ''
        portuguese_label = ''
        english_description = ''
        chinese_description = ''
        portuguese_description = ''

    name_original = row.get('name_original', '')
    comment = "<br>".join(wrap(name_original, width=40))
    hover_text = f"name: {name}<br>original: {comment}<br><br>wikidata ({wikidata_id})<br>{wikidata_label} <br>coordinates: {coordinates}"


    if level == 'tcheou-hien':
        pass
    if pd.notnull(lat) and pd.notnull(lon) and level in level_styles:
        style = level_styles[level]
        fig.add_trace(go.Scattermap(
            lon=[float(lon)],
            lat=[float(lat)],
            mode='markers+text',
            marker=dict(color=style['color'], size=style['size']),
            text=[name],
            textposition='top center',
            hovertext=[hover_text],
            hoverinfo='text',
            hoverlabel=dict(bgcolor='white', font_size=12),
            showlegend=False
        ))

# fig.update_layout(mapbox_style="open-street-map")
# fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_traces(textfont=dict(size=6))
fig.update_layout({
    "map": {"center": {"lat": 28.0,
                       "lon": 120.0},
            "zoom": 4,
            "style": "carto-positron",
            }
    },
    title= "Residences 1644",
    autosize=True,
    )

fig.show(config={"responsive": True})
fig.write_html("../inferences/residences_1644_map.html",
               config={"responsive": True},
               include_plotlyjs=True,
               auto_open=False)


Debugging: deh-r1644-taiwan -> deh-r1644-camarri
Debugging: Camarri with wikidata_id No wikidata
