## Plot maps for Lingscape data using CartoFrames

With this script, you can plot maps from the Lingscape database using the CartoFrames package.
Find the reference here: https://cartoframes.readthedocs.io/en/stable/index.html

This script was used in the following study:

Purschke, Christoph (forthcoming): Crowdscapes. Using user-generated content to (re)construct linguistic landscapes of cities. Linguistics Vanguard.

In [None]:
#Load the required packages
import matplotlib.pyplot as plt
import cartoframes
from cartoframes import Credentials, Layer, styling, BaseMap
import pandas as pd
%matplotlib inline

Here you need to specify your credentials to access the CARTO server.
If you don't have an account already, visit https://carto.com and get one.

You will need the user name of your account and your personal API key, which you can find in your profile settings.

In [None]:
# Define user credentials as CartoContext (cc)
USERNAME = ''
APIKEY = ''
creds = Credentials(username=USERNAME, key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)
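
If you prefer not to store the API key in the notebook itself, the credentials could be read from environment variables instead. The following is only a minimal sketch: the helper `load_carto_credentials` and the variable names `CARTO_USERNAME` and `CARTO_APIKEY` are hypothetical and not defined by CARTO or CartoFrames.

```python
import os


def load_carto_credentials(env=None):
    """Return (username, api_key) read from environment variables.

    CARTO_USERNAME and CARTO_APIKEY are hypothetical names chosen for
    this sketch; they are not defined by CARTO or CartoFrames.
    """
    env = os.environ if env is None else env
    username = env.get("CARTO_USERNAME", "")
    api_key = env.get("CARTO_APIKEY", "")
    if not username or not api_key:
        raise ValueError("Set CARTO_USERNAME and CARTO_APIKEY before running the notebook.")
    return username, api_key
```

The returned pair could then be assigned to `USERNAME` and `APIKEY` in the cell above, e.g. `USERNAME, APIKEY = load_carto_credentials()`.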

First, open and inspect the dataset. For every photo, it contains the following information (= columns):

- **user_id:** an anonymous unique identifier for contributors
- **user_group:** clustering of user participation in groups
- **created:** date and time the photo was uploaded to the server
- **date:** only year and month of the upload (for easy filtering)
- **lng, lat:** the geographic coordinates of the photo
- **id:** the reference id of the photo
- **country:** the country where the photo was uploaded
- **city:** the location where the photo was uploaded
- **languages:** a list of languages visible in the photo; scheme: original name (English name)
- **iso_codes:** a list of ISO 639-2/3 language labels of the languages in the photo
- **lang_count:** the number of languages visible in the photo

In [None]:
# Load the Lingscape data from a local CSV file into a pandas DataFrame
df = pd.read_csv('pins_data_2019-04.csv', sep=',')
# Show columns and data types
df.dtypes

In [None]:
# Print first rows of the dataset.
df.head()

With the following step, you can add an extra column **date_day**, derived from the column **created**, that contains only the year, month, and day of the upload.

In [None]:
# Define start, stop, and step values for slicing
start, stop, step = 0, -10, 1

# Convert the timestamp column to string for slicing
df["created"] = df["created"].astype(str)

# Slice off the last 10 characters of the string, leaving the date part
df["date_day"] = df["created"].str.slice(start, stop, step)
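
The slicing above drops the last ten characters, so it depends on the exact length of the timestamp string. As an alternative sketch, assuming **created** holds timestamps that pandas can parse, the date could be extracted by parsing instead of slicing. The sample values below are invented for illustration and may not match the real column format.

```python
import pandas as pd

# Invented sample rows; the format of the real "created" column may differ.
sample = pd.DataFrame({"created": ["2019-04-01 12:30:45", "2019-04-02 08:05:10"]})

# Parse the strings as timestamps and keep only year, month, and day.
sample["date_day"] = pd.to_datetime(sample["created"]).dt.strftime("%Y-%m-%d")
```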

With the following cell, you can filter the dataset using a specified geobox, i.e., a set of geographic coordinates that define an area on the map. The paper uses three predefined areas; the cell below includes the one for Vienna, and you can easily add new ones. Simply go to the website http://boundingbox.klokantech.com, define a region, choose the format "DublinCore", and copy the values into a dictionary using the equivalents indicated.

Then, define the geobox you want to use for filtering by setting the variable **geobox** to the desired location.

**NOTE:** If you don't want to filter the data using one of the filters defined in the following cells, simply skip them without executing them.

In [None]:
# geobox equivalences: westlimit=lng_low; eastlimit=lng_high; southlimit=lat_low; northlimit=lat_high
geobox_vienna = {'westlimit': 16.181831, 'eastlimit': 16.577513, 'southlimit': 48.117907, 'northlimit': 48.322668}

# Set active geobox for filtering
geobox = geobox_vienna

# Define filter for latitude
lat_high = df.lat < geobox['northlimit']
lat_low = df.lat > geobox['southlimit']
geobox_lat = lat_low & lat_high
# Filter dataset by latitude
df = df[geobox_lat]

# Define filter for longitude
lng_high = df.lng < geobox['eastlimit']
lng_low = df.lng > geobox['westlimit']
geobox_lng = lng_low & lng_high
# Filter dataset by longitude
df = df[geobox_lng]
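
The two filtering steps above can also be combined into a single mask. The following is a small sketch on invented sample points; the helper name `filter_by_geobox` is hypothetical, while the dictionary keys and the column names **lat** and **lng** follow the cell above.

```python
import pandas as pd


def filter_by_geobox(frame, box):
    """Keep rows whose coordinates lie strictly inside the bounding box."""
    inside = (
        (frame["lat"] > box["southlimit"])
        & (frame["lat"] < box["northlimit"])
        & (frame["lng"] > box["westlimit"])
        & (frame["lng"] < box["eastlimit"])
    )
    return frame[inside]


geobox_vienna = {"westlimit": 16.181831, "eastlimit": 16.577513,
                 "southlimit": 48.117907, "northlimit": 48.322668}

# Invented sample points: one inside the Vienna geobox, one far outside it.
points = pd.DataFrame({
    "id": [1, 2],
    "lat": [48.2082, 49.6116],
    "lng": [16.3738, 6.1319],
})

vienna_points = filter_by_geobox(points, geobox_vienna)
```

Wrapping the filter in a function makes it easier to switch between several geoboxes without rewriting the comparison logic.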

With the following cell, you can filter the dataset by the column "lang_count", i.e., the number of languages tagged per sign.

In [None]:
# Define upper limit of languages per sign
countrange_high = df.lang_count < 5
# Define lower limit of languages per sign
countrange_low = df.lang_count > 0
countrange = countrange_low & countrange_high
# Filter the dataset by the defined range
df = df[countrange]
# Sort dataset by number of languages in ascending order
df = df.sort_values(['lang_count'])
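
For integer counts, the same range can be expressed with inclusive bounds in one step. This is a sketch on invented sample rows; the names `min_langs` and `max_langs` are introduced here for illustration only.

```python
import pandas as pd

# Invented sample rows for illustration.
signs = pd.DataFrame({"id": [1, 2, 3, 4], "lang_count": [0, 3, 1, 6]})

# Inclusive bounds; for integer counts this equals "> 0 and < 5".
min_langs, max_langs = 1, 4

# Keep rows within the range and sort by number of languages (ascending).
in_range = signs[signs["lang_count"].between(min_langs, max_langs)]
in_range = in_range.sort_values("lang_count")
```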

With the following cell, you can filter the dataset by the contributions of a specific group of users. If you want to focus on a different group, simply replace the group variable name with the one you want to use.

In [None]:
# Define filters for specific user groups using the column "user_group"
powers = df.user_group == 1
regulars = df.user_group == 2
casuals = df.user_group == 0

# Filter the dataset as for the defined user groups
df = df[powers]
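
If you switch between groups often, the group codes could be kept in a dictionary and selected by name. The following is a sketch on invented sample rows; the mapping `USER_GROUPS` and the helper `select_user_group` are hypothetical names, and the codes mirror those in the cell above.

```python
import pandas as pd

# Group codes as used in the cell above: 0 = casual, 1 = power, 2 = regular.
USER_GROUPS = {"casuals": 0, "powers": 1, "regulars": 2}


def select_user_group(frame, group_name):
    """Return the rows contributed by the named user group."""
    return frame[frame["user_group"] == USER_GROUPS[group_name]]


# Invented sample rows for illustration.
uploads = pd.DataFrame({"id": [10, 11, 12], "user_group": [1, 0, 1]})
power_uploads = select_user_group(uploads, "powers")
```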

In the following cell, you can filter the dataset by a specific language. Simply enter the ISO code of the desired language between the quotation marks.

In [None]:
# Define language filter by entering ISO code (na=False excludes rows without codes)
df = df[df['iso_codes'].str.contains("LTZ", na=False)]
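
Note that `str.contains` matches substrings, so a code would also match if it happens to appear inside a longer label. If exact matching is needed, the codes could be compared one by one. The sketch below assumes that **iso_codes** is stored as a comma-separated string, which may not match the real data; the helper `has_language` and the sample rows are invented for illustration.

```python
import pandas as pd


def has_language(codes, iso_code):
    """Check whether iso_code is one of the comma-separated codes."""
    if not isinstance(codes, str):
        return False  # treat missing values as "language not present"
    tokens = [token.strip().upper() for token in codes.split(",")]
    return iso_code.upper() in tokens


# Invented sample rows; the comma separator is an assumption.
photos = pd.DataFrame({"id": [1, 2, 3], "iso_codes": ["LTZ, FRA", "DEU", None]})
ltz_photos = photos[photos["iso_codes"].apply(has_language, iso_code="LTZ")]
```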

After filtering is done, you need to upload the dataset to the CARTO server so that the CARTO-specific column "CARTO geometry" can be added to the data. Then, the table is downloaded again. This step is required for mapping. 

In [None]:
# Write the table to the CARTO server under a new name so it can be mapped
cc.write(df, 'lingscape_data_filtered', lnglat=('lng', 'lat'),
         overwrite=True)

# Reload table from server
df = cc.read('lingscape_data_filtered')

If you want to, you can print basic statistics for the filtered dataset, such as the number of uploads, the average number of languages per sign, and the distribution of languages per sign.

In [None]:
# Define & print basic statistics for dataset.
total = df['lang_count'].count()
users = df.groupby('user_id').count()
mean = df['lang_count'].mean()
numbers = df.groupby('lang_count').count()

print("Pins total: " + str(total))
print("Users total: " + str(len(users)))
print("Languages per sign average: " + str(mean))

print("Distribution of x-lingual signs")
numbers["city"]
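
The same figures can be computed without grouping the full table, using `nunique` and `value_counts`. This is a sketch on invented sample rows, not the Lingscape data; the variable names are chosen for illustration.

```python
import pandas as pd

# Invented sample rows for illustration.
pins = pd.DataFrame({
    "user_id": ["a", "a", "b", "c"],
    "lang_count": [1, 2, 2, 3],
})

total_pins = len(pins)                     # number of uploads
total_users = pins["user_id"].nunique()    # number of distinct contributors
mean_langs = pins["lang_count"].mean()     # average number of languages per sign
# Number of signs per language count, ordered by language count.
distribution = pins["lang_count"].value_counts().sort_index()
```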

Now, finally, let us plot a map. You can choose from different predefined color schemes by un-commenting the one you want to use. Alternatively, you can add a new one.

Apart from colors, you can change several arguments of the function **cc.map()** to design your map:

Basemap:
- **source:** basemap used; available options: "light", "dark", "voyager"
- **labels:** position of labels; options: "back", "front", "None"

Layer:
- dataset to be used for plotting the **dots**; specify it as the first element in the brackets
- **size:** size of the dots; you can set a single size for all points or define a variable size based on the value of a specific column
- **color:** color scheme for plotting the dots; you can define the column to use for styling the dots ("column") and the color **scheme** to be used; within the color scheme, you can specify the number of **bins** to use, i.e., the number of different values to plot, and the **bin_method** to be used (in our case, "category")
- **lat, lng:** the coordinates used as center of the map section that you will plot
- **zoom:** the zoom level to be used on the basemap, based on the defined location (lat, lng)
- **interactive:** plot an interactive or static map; available options: True (for an interactive map), False (for a static map)
- **size:** (as an argument of **cc.map()**, not of the layer) image size in pixels, given as (width, height)

For further information about the available color schemes and bin options, see the package reference: https://cartoframes.readthedocs.io/en/stable/styling.html

**NOTE:** The number of colors you enter has to match the number of "bins" defined in the function **cc.map**.
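
A quick check before plotting can catch a mismatch between colors and bins early. The helper `check_color_bins` below is a hypothetical convenience function written for this sketch; it is not part of CartoFrames.

```python
def check_color_bins(colors, bins):
    """Raise ValueError if the number of colors differs from the number of bins."""
    if len(colors) != bins:
        raise ValueError(f"{len(colors)} colors given, but bins={bins}")
    return colors


# Four colors for four bins, as in the scheme for languages per sign.
colors = check_color_bins(["#4CBDE5", "#59AE39", "#F0CB0E", "#E94E48"], bins=4)
```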

In [None]:
# Choose a color scheme for the dots on your map according to the type of map you want to plot
#colors = ["#4CBDE5", "#59AE39", "#F0CB0E", "#E94E48"] # colors used to plot number of languages per sign
colors = ["#59AE39"] # contributions of power users (green)
#colors = ["#A50021"] # contributions of regular users (red)
#colors = ["#0D83B7"] # contributions of casual users (blue)

# Call the function to plot the map
cc.map(layers=[BaseMap(source='light', labels='back'), 
                Layer('lingscape_data_filtered',
                #size={'column': 'lang_count', 'max': 16, 'min': 13}, # set variable size by value
                size=18, # set one size for all points
                color={'column': 'user_group', 'scheme': styling.custom(colors, bins= 1, bin_method= 'category')})],
                #lat=48.203207, lng=16.355315, # location: Vienna
                lat=48.206681, lng=16.438382, # location: Vienna / user 1
                zoom=13,
                interactive=False, 
                size=(1600, 1200))

# Define file name for saving the map
map_name = "vienna_power users.png"
# Save to file
plt.savefig(map_name, dpi=300, format='png') 