# Explore the plot measurements captured when a city is settled

The database contains the city centre and all the plots in the first two rings around it, that is:
- ring 0 is the city centre itself and contains 1 plot,
- ring 1 is the tiles adjacent to the city centre and contains 6 plots,
- ring 2 is the tiles 2 tiles away from the city centre and contains 12 plots.

That means we capture plot information for the 19 nearest tiles.
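The ring sizes above follow from the hex grid: ring 0 is a single plot and every ring r >= 1 holds 6 * r plots. The short sketch below reproduces the arithmetic; the function names are illustrative and are not part of the database or the game API.

```python
def plots_in_ring(ring: int) -> int:
    """Number of hex plots exactly `ring` steps from the city centre."""
    return 1 if ring == 0 else 6 * ring


def plots_within(radius: int) -> int:
    """Total number of plots at distance <= radius from the city centre."""
    return sum(plots_in_ring(r) for r in range(radius + 1))


ring_sizes = [plots_in_ring(r) for r in range(3)]
total_plots = plots_within(2)
print(ring_sizes, total_plots)  # [1, 6, 12] 19
```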

Data captured per tile includes:
- owner of the plot at the time the city was settled (most civs start with 7 of the 19 tiles)
- recordedCityId - an identifier shared by all the tiles that "could" belong to a city when it was settled
- terrain information
- feature information
- resource, resourceCount, and resourceType information
- count of workers on the plot
- district on the plot - at settlement this is unlikely to be anything other than 'City Centre'
- hasRiver - whether the plot has an adjacent river
- isWater - whether the plot is water
- isLake - whether the plot is part of a lake
- isCity - whether the plot is the city centre, so it is true for only one plot per city
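The ring sizes and the isCity rule give simple invariants that can be checked per city. The following is a minimal sketch on toy data, assuming every city has a full set of 19 recorded plots (the text implies this but does not verify it). The helper names check_city_plots and toy_city are hypothetical; only the column names come from the table.

```python
import pandas as pd

# Expected number of plots per ring, as described above.
EXPECTED_RING_SIZES = {0: 1, 1: 6, 2: 12}


def check_city_plots(plots: pd.DataFrame) -> pd.DataFrame:
    """Return one row per recordedCityId with simple consistency flags."""
    per_city = plots.groupby('recordedCityId').agg(
        plotCount=('plotId', 'count'),
        cityTiles=('isCity', 'sum'),
    )
    # Rows: cities, columns: rings, values: number of plots in that ring.
    ring_counts = (
        plots.groupby(['recordedCityId', 'ring']).size().unstack(fill_value=0)
    )
    expected = pd.Series(EXPECTED_RING_SIZES)
    ring_counts = ring_counts.reindex(columns=expected.index, fill_value=0)
    per_city['ringsOk'] = ring_counts.eq(expected, axis='columns').all(axis=1)
    per_city['ok'] = (
        (per_city['plotCount'] == 19)
        & (per_city['cityTiles'] == 1)
        & per_city['ringsOk']
    )
    return per_city


def toy_city(city_id: int, first_plot_id: int) -> pd.DataFrame:
    """Build a made-up city with a complete set of 19 plots (illustration only)."""
    return pd.DataFrame({
        'plotId': range(first_plot_id, first_plot_id + 19),
        'recordedCityId': city_id,
        'ring': [0] + [1] * 6 + [2] * 12,
        'isCity': [1] + [0] * 18,
    })


toy = pd.concat([toy_city(1, 1), toy_city(2, 20)], ignore_index=True)
report = check_city_plots(toy)
print(report)
```

Running the same function on the real cityPlotsSettled rows would flag any city whose recorded plots do not match the description.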

The purpose of this workbook is to investigate how to present this information as input to the ML model.
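One possible presentation, shown here only as a sketch and not as the approach this workbook settles on, is to one-hot encode the categorical columns and sum them per recordedCityId, giving one row per city with counts such as "number of Grassland plots". The function name city_feature_table and the toy rows are made up for illustration; the column names are the ones in the table.

```python
import pandas as pd


def city_feature_table(plots: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-plot rows into one row of counts per city."""
    categorical = ['terrain', 'feature', 'resource']
    flags = ['hasRiver', 'isWater', 'isLake']
    encoded = pd.get_dummies(
        plots[['recordedCityId'] + categorical + flags],
        columns=categorical,
        dtype=int,
    )
    # Summing 0/1 indicator columns yields per-city counts.
    return encoded.groupby('recordedCityId').sum()


# Made-up rows; a real city would have 19 plots.
toy = pd.DataFrame({
    'recordedCityId': [1, 1, 1, 2, 2],
    'terrain': ['Grassland', 'Plains', 'Grassland', 'Desert', 'Coast and Lake'],
    'feature': ['None', 'Woods', 'None', 'Oasis', 'None'],
    'resource': ['None', 'None', 'Wheat', 'None', 'Fish'],
    'hasRiver': [1, 0, 1, 0, 0],
    'isWater': [0, 0, 0, 0, 1],
    'isLake': [0, 0, 0, 0, 0],
})
features = city_feature_table(toy)
print(features)
```

This representation discards where each plot sits relative to the city centre; keeping the ring as part of the column name would be one way to retain some of that information.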

I purposely did not capture the "yields", to make the machine learning classification challenge more realistic. Also, there is no relation between any of the database Id values and the in-game Ids.

In [1]:
import sqlite3
import matplotlib.pyplot as plt
import pandas as pd
#pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 200)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 700)
pd.set_option('display.max_colwidth', 100)

In [2]:
cnx = sqlite3.connect('Database/Civ6CitySettledData.db')
cur = cnx.cursor()
#print(cnx)
#print(cur)

## Retrieve the city plots as settled data
cityPlotsSettled contains the plot information for the city centre, the plots adjacent to it, and the plots 2 tiles away, all recorded at the time the city was settled. The idea is to look at this information and "predict" the yield performance of the city.

The primary identity for this database is the cityId. That is, this database (and experiment) is designed to look at city growth over time. The same cityName, leaderName etc. can occur in multiple games, so each time we add the recorded information for a game we create new cityIds.

In [3]:
# After experimentation I decided to use all the plots within 2 tiles from the city centre.
sqlSelect = 'SELECT * FROM cityPlotsSettled'

# Other options I experimented with are:
#sqlSelect = 'SELECT * FROM cityPlotsSettled WHERE ownerCityId <> "None"'
#sqlSelect = 'SELECT * FROM cityPlotsSettled WHERE ring == 0'
#sqlSelect = 'SELECT * FROM cityPlotsSettled WHERE ring == 1'
#sqlSelect = 'SELECT * FROM cityPlotsSettled WHERE ring == 2'

cityPs = pd.read_sql_query(sqlSelect, cnx)
print(cityPs.shape)
print(cityPs.dtypes)

(9538, 15)
plotId              int64
ownerCityId       float64
recordedCityId      int64
ring                int64
terrain            object
feature            object
resource           object
resourceCount       int64
resourceType       object
workers             int64
district           object
hasRiver            int64
isWater             int64
isLake              int64
isCity              int64
dtype: object


## Plot terrain counts

In [4]:
df = cityPs[['plotId', 'terrain']].groupby('terrain').count().sort_values(by='plotId', ascending=False)
df.rename(columns={'plotId':'Count'}, inplace=True)
print(df)

                      Count
terrain                    
Grassland              2476
Plains                 2409
Plains (Hills)          996
Coast and Lake          992
Grassland (Hills)       971
Desert                  537
Grassland (Mountain)    286
Desert (Hills)          216
Plains (Mountain)       214
Tundra                  197
Ocean                   100
Desert (Mountain)        69
Tundra (Hills)           58
Tundra (Mountain)        16
Snow                      1


## Plot features counts
The city-centre plot has no feature, because features are "removed" when the city is settled.
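This claim can be checked by counting the feature values on the plots where isCity is 1. The sketch below uses made-up rows and assumes a missing feature is stored as the string 'None', as the printed counts suggest; the function name is hypothetical.

```python
import pandas as pd


def city_centre_feature_counts(plots: pd.DataFrame) -> pd.Series:
    """Count the feature values that occur on city-centre plots."""
    centre = plots[plots['isCity'] == 1]
    return centre['feature'].value_counts()


# Made-up rows for illustration only.
toy = pd.DataFrame({
    'plotId': [1, 2, 3, 4],
    'isCity': [1, 0, 0, 1],
    'feature': ['None', 'Woods', 'Marsh', 'None'],
})
centre_counts = city_centre_feature_counts(toy)
print(centre_counts)
```

If the statement holds for the real data, the result contains 'None' as its only value.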

In [5]:
df = cityPs[['plotId', 'feature']].groupby('feature').count().sort_values(by='plotId', ascending=False)
df.rename(columns={'plotId':'Count'}, inplace=True)
print(df)

             Count
feature           
None          6963
Woods         1294
Rainforest     818
Marsh          226
Floodplains    164
Reef            60
Oasis           13


## Plot resource counts

In [6]:
df = cityPs[['plotId', 'resource']].groupby('resource').count().sort_values(by='plotId', ascending=False)
#print(len(cityPs[['plotId', 'resource']].groupby('resource').count().sort_values(by='plotId', ascending=False)))
df.rename(columns={'plotId':'Count'}, inplace=True)
print(df)

          Count
resource       
None       7478
Stone       274
Wheat       221
Cattle      121
Rice        113
Bananas     107
Sheep        99
Horses       86
Fish         70
Uranium      62
Iron         55
Aluminum     54
Coal         54
Copper       52
Deer         51
Niter        46
Gypsum       34
Crabs        34
Jade         33
Amber        32
Oil          28
Sugar        27
Spices       25
Tobacco      24
Turtles      23
Wine         23
Ivory        23
Tea          22
Coffee       21
Mercury      20
Furs         20
Salt         20
Cotton       20
Marble       18
Silk         18
Truffles     15
Citrus       15
Olives       14
Diamonds     14
Incense      13
Cocoa        13
Dyes         13
Silver       12
Whales       12
Pearls        9


In [7]:
#print(cityPs[['plotId', 'resourceType']].groupby('resourceType').count().sort_values(by='plotId', ascending=False))

In [8]:
#print(cityPs[['plotId', 'district']].groupby('district').count().sort_values(by='plotId', ascending=False))

In [9]:
df = cityPs[['plotId', 'hasRiver']].groupby('hasRiver').count().sort_values(by='plotId', ascending=False)
df.rename(columns={'plotId':'Count'}, inplace=True)
print(df)

          Count
hasRiver       
0          5875
1          3663


In [10]:
#print(cityPs[['plotId', 'resourceCount']].groupby('resourceCount').count().sort_values(by='plotId', ascending=False))

In [11]:
#cityPs[cityPs['resourceType'] == 'Strategic'].resource.unique()
#cityPs[cityPs['resourceType'] == 'Luxury'].resource.unique()
#cityPs[cityPs['resourceType'] == 'Bonus'].resource.unique()

In [12]:
#cols = list(cityPs['category'].unique())
#print(type(cols))
#print(len(cols))
#print(cols)

In [13]:
cur.close()
cnx.close()