# Setup

### Run Previous Script

Uncomment these lines when debugging this notebook on its own; otherwise, run this notebook after the master script.

In [1]:
#%run ./1_Master_Script.ipynb
#%run ./2_Cleaning_Layers.ipynb
#%run ./3.1_City_Scraping.ipynb

Master Script has been run successfully!
Cleaning Layers Script has been run successfully!
Chosen Study Area: Stolen Lands
City info not found. Regenerating new copy. Error printed below:
ERROR MESSAGE: [Errno 2] No such file or directory: 'D:/0Coding Projects/GitHub/My Repositories/Pathfinder Mapping/Data/Cleaned_Data/city_info.txt'
Regenerate Cities: True
City Scraping Script has been run successfully!


### Pathfinder Wiki Cities

NOTE: this is called in the City Scraping script.

In [2]:
cities

Unnamed: 0,Name,link,capital,size,text,articleLength,geometry
0,Aaminiut,https://pathfinderwiki.com/wiki/Aaminiut,False,2,<p><b>Aaminiut</b> is the largest town in the ...,700,POINT (-43.3315 67.15341)
1,Aaramor,https://pathfinderwiki.com/wiki/Aaramor,False,2,<p>The city-fortress of <b>Aaramor</b> is loca...,1400,POINT (-5.35496 51.01807)
2,Abberton,https://pathfinderwiki.com/wiki/Abberton,False,3,"<p><b>Abberton</b> is a small, declining town ...",600,POINT (-0.55697 32.20194)
3,Abken,https://pathfinderwiki.com/wiki/Abken,False,2,"<p>One of the newest settlements in <a href=""h...",1600,POINT (-20.54576 44.95886)
4,Absalom,https://pathfinderwiki.com/wiki/Absalom,True,0,"<p>For more than 4,000 years, <b>Absalom</b> (...",14500,POINT (-0.23431 30.88863)
...,...,...,...,...,...,...,...
846,Zimar,https://pathfinderwiki.com/wiki/Zimar,False,1,"<p>One of the main defensive <a href=""https://...",2300,POINT (6.7078 31.58416)
847,Ziplatna,https://pathfinderwiki.com/wiki/Ziplatna,False,1,<p><b>Ziplatna</b> is the northernmost of the ...,300,POINT (-99.19609 14.74814)
848,Zlatomesto,https://pathfinderwiki.com/wiki/Zlatomesto,False,2,<p><b>Zlatomesto</b> is a small town in <a hre...,900,POINT (-24.7329 51.73437)
849,Zom Kullan,https://pathfinderwiki.com/wiki/Zom_Kullan,True,0,"<p><b>Zom Kullan</b>, the capital city of <a h...",400,POINT (153.268 2.37258)


In [3]:
cities_info.keys()

dict_keys(['data', 'timestamp'])

# Cleaning

### Change CRS

In [4]:
crs_cities = cities.copy()

crs_cities = crs_cities.to_crs(chosen_crs)

### Filtering

#### Study Area

Let's keep only the cities that fall within the study area (`studyarea`) defined elsewhere.

In [5]:
cities_filter = crs_cities.copy()

cities_filter = gpd.sjoin(cities_filter, studyarea)
cities_filter = cities_filter.drop(columns = ['index_right'])
cities_filter = cities_filter.reset_index(drop = True)

# Diagnostic
cities_before = len(crs_cities)
cities_after = len(cities_filter)

print(f"RESULT: From {cities_before} cities, {cities_before - cities_after} were removed, leaving just {cities_after}.")

RESULT: From 851 cities, 835 were removed, leaving just 16.


#### Remove Narland Cities

Narland is another name for The Stolen Lands. The city data describes the setting as it stands *after* the events of the campaign I am running, so it includes cities in The Stolen Lands that should not exist yet. This section removes those cities.

In [6]:
cities_narland = cities_filter.copy()

narland_list = cities_narland.copy()
stolen = study_dict['stolen_lands'].copy()
stolen = stolen.loc[stolen['province'] == 'Narland']
narland_list = gpd.clip(narland_list,stolen)
narland_list = narland_list.loc[narland_list['Name'] != 'Restov'] # On the wrong side of the border!
narland_list = narland_list.loc[narland_list['Name'] != "Nivakta's Crossing"] # On the wrong side of the border!
narland_list = narland_list['Name'].to_list()

cities_narland = cities_narland.loc[~cities_narland['Name'].isin(narland_list)]

# Diagnostic
cities_before = len(cities_filter)
cities_after = len(cities_narland)

print(f"RESULT: From {cities_before} cities, {cities_before - cities_after} were removed, leaving just {cities_after}.")

RESULT: From 16 cities, 2 were removed, leaving just 14.


#### Filter City Info

Now let's take the remaining cities, and filter city info by them.

In [7]:
city_info_filtered = {
    'data': dict(),
    'timestamp': cities_info['timestamp']
}

city_keep = cities_narland['Name'].to_list()

for city in city_keep:

    try:
        city_info_filtered['data'][city] = cities_info['data'][city]
    except KeyError as e: # City has no scraped entry
        #print(f"ERROR with {city}: {e}")
        city_info_filtered['data'][city] = 'ERROR'

#city_info_filtered
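
The same filtering can be written without `try`/`except` by using `dict.get` with a default. This is a minimal sketch on made-up data; `sample_info` and `sample_keep` are hypothetical stand-ins for `cities_info` and `city_keep`, and the timestamp is invented.

```python
# Hypothetical stand-ins for cities_info and city_keep
sample_info = {
    'data': {
        'Pitax': {'Population': '5,881✝'},
        'Restov': {'Population': '18670'},
    },
    'timestamp': '2024-01-01',  # made-up value
}
sample_keep = ['Pitax', 'Restov', 'Mivon']

# Keep only the wanted cities; cities with no scraped entry get 'ERROR'
sample_filtered = {
    'data': {
        city: sample_info['data'].get(city, 'ERROR')
        for city in sample_keep
    },
    'timestamp': sample_info['timestamp'],
}

print(sample_filtered['data']['Mivon'])  # prints: ERROR
```

The `'ERROR'` placeholder is kept here so that later cells, which expect it, would behave the same way.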

### Enrichment

Now let's use the cities we have left, and enrich them with the data scraped from the wiki.

#### Features

Let's figure out how many features there are. Some pages vary in what features they have, so let's loop through the keys to find all unique keys.

In [8]:
cities_info_keys = list()

for city in city_keep:

    try:
        keys = city_info_filtered['data'][city].keys()
    except AttributeError:
        # Entry is the 'ERROR' placeholder string, not a dict
        continue
    
    keys = list(keys)

    # Combine without duplicates
    cities_info_keys = list(set(cities_info_keys + keys))

cities_info_keys

['Nation',
 'Region',
 'Land',
 'Size',
 'Leader',
 'Government',
 'Capital',
 'Titles',
 'Adjective',
 'Languages',
 'Demonym',
 'Alignment',
 'Level',
 'Religions',
 'Ruler',
 'Demographics',
 'Population']
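
Because the keys pass through `set`, their order in the output above is arbitrary and can change between runs. If a stable column order matters, one option is to collect keys in first-seen order with a plain `dict`. A small sketch, assuming made-up entries; `sample_data` and its placeholder values are hypothetical.

```python
# Hypothetical per-city entries; 'ERROR' marks a failed lookup
sample_data = {
    'Pitax': {'Nation': 'x', 'Population': 'x'},
    'Restov': {'Nation': 'x', 'Ruler': 'x'},
    'Mivon': 'ERROR',
}

unique_keys = {}
for info in sample_data.values():
    if isinstance(info, dict):
        # dict.fromkeys keeps insertion order and ignores repeats on update
        unique_keys.update(dict.fromkeys(info))

feature_names = list(unique_keys)
print(feature_names)  # prints: ['Nation', 'Population', 'Ruler']
```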

#### Extraction

Now that we have all the possible keys, we can start creating lists to then add to the cities dataframe.

In [9]:
cities_enrich = cities_narland.copy()

for feature in cities_info_keys:

    feature_list = list()

    for city in city_keep:

        try:
            city_feature = city_info_filtered['data'][city][feature]
        except (KeyError, TypeError):
            # Feature missing for this city, or the entry is the 'ERROR' placeholder
            city_feature = pd.NA

        feature_list.append(city_feature)

    cities_enrich[feature] = feature_list
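
The extraction logic can be illustrated without a dataframe. The sketch below builds one list per feature from made-up entries; `build_columns` and `sample_data` are hypothetical names, and `None` stands in for `pd.NA` to keep the example free of dependencies.

```python
# Hypothetical scraped entries; 'ERROR' marks a city whose lookup failed
sample_data = {
    'Pitax': {'Population': '5,881✝', 'Alignment': 'x'},
    'Restov': {'Population': '18670'},
    'Mivon': 'ERROR',
}
sample_features = ['Population', 'Alignment']

def build_columns(data, features, missing=None):
    """Return {feature: [one value per city]}, in the iteration order of `data`."""
    columns = {}
    for feature in features:
        columns[feature] = [
            info.get(feature, missing) if isinstance(info, dict) else missing
            for info in data.values()
        ]
    return columns

sample_columns = build_columns(sample_data, sample_features)
print(sample_columns['Population'])  # prints: ['5,881✝', '18670', None]
```

Each list has one entry per city, in the same order as the cities, which is what allows it to be assigned directly as a new column.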

#### Clean Numbers

Some columns are numeric, but were read as strings because some values contain extra characters, such as the cross symbol (✝). Let's remove them.

This function removes a defined list of symbols. The drawback is that the list must be expanded manually, but that comes with the benefit of added control over the cleaning process.

In [10]:
def remove_symbols(value):

    symbols = ['✝'] # Add more symbols to this list
    # do NOT add ',', as it's cleaned elsewhere

    # Leave missing scalar values (pd.NA, np.nan, None) untouched
    if pd.isna(value):
        return value

    result = str(value)

    for symbol in symbols:

        # Plain-text replace, chained so that every symbol is removed
        result = result.replace(symbol, '')

    return result

Now, let's strip the commas used as thousands separators and convert the result to an integer. Values that cannot be converted are returned unchanged.

In [11]:
def remove_commas(value):

    try:
        
        result = int(value.replace(',', ''))

    except Exception as e:

        #print(e)

        result = value

    return result
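
As a worked example, the two cleaning steps can be combined into a single pass. This is only a sketch: `clean_count` is a hypothetical helper, not part of the notebook, and the inputs are values of the kind seen in the `Population` column.

```python
def clean_count(value, symbols=('✝',)):
    """Hypothetical helper: strip marker symbols and commas, then cast to int."""
    if not isinstance(value, str):
        return value  # leave missing or already-numeric values untouched

    text = value
    for symbol in symbols:
        text = text.replace(symbol, '')
    text = text.replace(',', '').strip()

    try:
        return int(text)
    except ValueError:
        return value  # not a clean integer; keep the original

print(clean_count('5,881✝'))  # prints: 5881
print(clean_count('18670'))   # prints: 18670
print(clean_count(None))      # prints: None
```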

In [12]:
cleaning_list = [ # Names of columns to clean
    'Population'
]

method_list = [
    remove_symbols,
    remove_commas
]

for column in cleaning_list:

    new_col = cities_enrich[column]
    
    # Apply layers of cleaning
    for method in method_list:

        new_col = new_col.apply(method)

    # Create diagnostic
    city_diagnostic = cities_enrich[['Name', column]].copy()
    city_diagnostic['TEST'] = new_col

    # Overwrite
    cities_enrich[column] = new_col

# TESTING
city_diagnostic

Unnamed: 0,Name,Population,TEST
0,Avendale,11280,11280.0
1,Brunderton,1120,1120.0
3,Jovvox,1450,1450.0
4,Littletown,297,297.0
5,Mivon,,
6,Mormouth,7401,7401.0
7,New Stetven,32850,32850.0
8,Nivakta's Crossing,140,140.0
9,Pitax,"5,881✝",5881.0
10,Restov,18670,18670.0


### Size 0 Dummy Variable

I want to flag which cities, regardless of capital status, belong to the largest size class (`size == 0`).

In [13]:
size0_cities = cities_enrich.copy()

size0_cities['size0'] = size0_cities['size'] == 0

# New Layers

### City Alignment (IDEA)

The idea is that, for a good-aligned party, 'good'-aligned cities should generally be safer than neutral ones, and neutral ones safer than evil ones. The countryside around each city should represent that.

How to actually implement this, though, is uncertain. How should overlaps be handled? Should it be done through buffering, or some kind of weighted interpolation (the weight being for size of city)?
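
To make the interpolation option concrete, here is a rough sketch of inverse-distance weighting in plain Python. It is an illustration, not a decision: the alignment scores (+1, 0, -1), the size weighting `1 / (1 + size)`, the function name `safety_at`, and the coordinates are all assumptions.

```python
import math

# Assumed scoring, not taken from the source data
ALIGNMENT_SCORE = {'good': 1.0, 'neutral': 0.0, 'evil': -1.0}

def safety_at(point, cities, power=2.0):
    """Inverse-distance-weighted safety score at `point`.

    `cities` is a list of (x, y, alignment, size) tuples, where size follows
    the convention above that 0 is the largest class.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, alignment, size in cities:
        distance = math.hypot(point[0] - x, point[1] - y)
        if distance == 0:
            return ALIGNMENT_SCORE[alignment]  # exactly on a city
        # Larger cities (smaller size number) get more influence
        weight = (1.0 / (1 + size)) / distance ** power
        weighted_sum += weight * ALIGNMENT_SCORE[alignment]
        weight_total += weight
    return weighted_sum / weight_total

# Made-up cities: a large good city and a smaller evil one
sample_cities = [
    (0.0, 0.0, 'good', 0),
    (10.0, 0.0, 'evil', 2),
]
print(safety_at((5.0, 0.0), sample_cities))
```

At the midpoint the score comes out at roughly 0.5, because the larger good city outweighs the smaller evil one. Overlaps need no special handling with this approach, since every city contributes to every point; a buffer-based approach would need an explicit rule for overlapping buffers.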

# Final

In [14]:
final_cities = size0_cities.copy()

# Run Message

This prints a confirmation message when this script is run from another script.

In [15]:
print("City Cleaning Script has been run successfully!")

City Cleaning Script has been run successfully!
