## Plot maps from the Lingscape database using CARTOframes

With this script, you can plot maps from the Lingscape database using the CARTOframes package by CARTO.
You can find it here: https://carto.com/developers/cartoframes/

Please make sure to use at least CARTOframes v1.0.0.

This notebook was used in the following study:

Purschke, Christoph (forthcoming): Using crowdsourced data to explore the linguistic landscape of cities. Results from the participatory research project Lingscape. In: Brunn, Stanley / Kehrein, Roland (eds.): *Handbook of the Changing World Language Map.* Heidelberg: Springer.

In [None]:
#Load the required packages
import matplotlib.pyplot as plt
from cartoframes.viz import Map, Layer, color_bins_style, color_category_style
from geopandas import GeoDataFrame, points_from_xy
import pandas as pd
%matplotlib inline

First, open and inspect the dataset. For every photo, it contains the following information (= columns):

- **user_id:** an anonymous unique identifier for contributors
- **created:** date and time the photo was uploaded to the server
- **date:** only year and month of the upload (for easy filtering)
- **lng, lat:** the geographic coordinates of the photo
- **id:** the reference id of the photo
- **country:** the country where the photo was uploaded
- **city:** the location where the photo was uploaded
- **languages:** a list of languages visible in the photo; scheme: original name (English name)
- **iso_codes:** a list of ISO 639-2/3 language labels of the languages in the photo
- **lang_count:** the number of languages visible in the photo

In [None]:
# Load the Lingscape dataset from a CSV file into a pandas DataFrame
df = pd.read_csv('pins_data_2019-04.csv', sep=',')
# Show columns and data types
df.dtypes

In [None]:
# Print first rows of the dataset.
df.head()

The following step derives an extra column **date_day** from the column **created**; it contains only the year, month, and day of the upload.

In [None]:
# Define start, stop, and step for slicing
start, stop, step = 0, -10, 1

# Convert date to string for slicing
df["created"] = df["created"].astype(str)

# Slice the string, dropping its last 10 characters
df["date_day"] = df["created"].str.slice(start, stop, step)
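
As an alternative to fixed-width slicing, the day can also be derived by parsing the timestamp. This is a minimal sketch, assuming that **created** holds timestamps that pandas can parse; the sample values below are made up for illustration and are not taken from the Lingscape data.

```python
import pandas as pd

# Hypothetical sample rows; the real format of "created" may differ
sample = pd.DataFrame({"created": ["2016-10-30 14:05:11", "2016-11-01 09:30:00"]})

# Parse the timestamps, then format them as YYYY-MM-DD strings
sample["date_day"] = pd.to_datetime(sample["created"]).dt.strftime("%Y-%m-%d")

print(sample["date_day"].tolist())
```

Parsing does not depend on the exact length of the timestamp string, so it may be more robust if the export format changes.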

With the following cell, you can filter the dataset using a specified geobox, i.e., a set of geographic coordinates that define an area on the map. The three predefined areas are the locations used in the paper, but you can easily add new ones: go to the website http://boundingbox.klokantech.com, define a region, choose the format "DublinCore", and copy the values into a dictionary using the equivalences indicated in the code comment.

Then, define the geobox you want to use for filtering by setting the variable **geobox** to the desired location.

#### NOTE: If you don't want to filter the data using one of the filters defined in the following cells, simply skip them without executing them.

In [None]:
# geobox equivalences: westlimit=lng_low; eastlimit=lng_high; southlimit=lat_low; northlimit=lat_high
geobox_luxembourg = {'westlimit': 6.06917, 'eastlimit': 6.203649, 
                     'southlimit': 49.560986, 'northlimit': 49.65487}
geobox_vienna = {'westlimit': 16.181831, 'eastlimit': 16.577513, 
                 'southlimit': 48.117907, 'northlimit': 48.322668}
geobox_vancouver = {'westlimit': -123.224961, 'eastlimit': -123.023242, 
                    'southlimit': 49.198445, 'northlimit': 49.316171}

# Set active geobox for filtering
geobox = geobox_luxembourg

# Define filter for latitude
lat_high = df.lat < geobox['northlimit']
lat_low = df.lat > geobox['southlimit']
geobox_lat = lat_low & lat_high
# Filter dataset by latitude
df = df[geobox_lat]

# Define filter for longitude
lng_high = df.lng < geobox['eastlimit']
lng_low = df.lng > geobox['westlimit']
geobox_lng = lng_low & lng_high
# Filter dataset by longitude
df = df[geobox_lng]
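
The geobox filter can also be wrapped in a small function so that different areas can be tried without overwriting **df**. This is a minimal sketch: the helper name `filter_by_geobox` and the sample points are hypothetical, while the dictionary keys follow the geobox dictionaries defined above.

```python
import pandas as pd

def filter_by_geobox(frame, box):
    """Return the rows of `frame` whose lat/lng lie strictly inside `box`."""
    inside_lat = (frame["lat"] > box["southlimit"]) & (frame["lat"] < box["northlimit"])
    inside_lng = (frame["lng"] > box["westlimit"]) & (frame["lng"] < box["eastlimit"])
    return frame[inside_lat & inside_lng]

# Hypothetical sample points: one inside the Luxembourg geobox, one in Vienna
points = pd.DataFrame({"id": [1, 2],
                       "lat": [49.61, 48.20],
                       "lng": [6.13, 16.37]})
box_luxembourg = {"westlimit": 6.06917, "eastlimit": 6.203649,
                  "southlimit": 49.560986, "northlimit": 49.65487}

inside = filter_by_geobox(points, box_luxembourg)
print(inside["id"].tolist())
```

Because the function returns a new DataFrame, the unfiltered data stays available for a second geobox.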

With the following cell, you can restrict the dataset to contributions by specific (anonymous) users. To focus on other users, replace the Lingscaper names with the ones you want to use.

In [None]:
# Filter dataframe by specific users using the column "user_id"
user_1 = df.user_id == "Lingscaper_544"
user_2 = df.user_id == "Lingscaper_170"
user_3 = df.user_id == "Lingscaper_7"
user_4 = df.user_id == "Lingscaper_622"
# Combine with OR: a row is kept if it belongs to any of the defined users
userfilter = user_1 | user_2 | user_3 | user_4
# Restrict the dataset to the defined users
df = df[userfilter]
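
A selection of several users can also be written with `Series.isin`, which keeps a row if its **user_id** matches any name in a list. This is a minimal sketch on made-up sample rows; "Lingscaper_1" is a hypothetical id used only to show a non-matching row.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({"user_id": ["Lingscaper_544", "Lingscaper_1", "Lingscaper_7"]})

# Users to keep; extend or shorten this list as needed
selected_users = ["Lingscaper_544", "Lingscaper_170", "Lingscaper_7", "Lingscaper_622"]

# Keep every row whose user_id appears in the list
subset = sample[sample["user_id"].isin(selected_users)]

print(subset["user_id"].tolist())
```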

With the following cell, you can filter the dataset by the column "lang_count", i.e., the number of languages tagged per sign.

In [None]:
# Define upper limit of languages per sign
countrange_high = df.lang_count < 5
# Define lower limit of languages per sign
countrange_low = df.lang_count > 0
countrange = countrange_low & countrange_high
# Restrict the dataset to the defined range
df = df[countrange]
# Sort dataset by number of languages in ascending order
df = df.sort_values(['lang_count'])
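
For whole-number counts, the same range can be expressed with `Series.between`, which includes both bounds by default. This is a minimal sketch on made-up values, assuming that **lang_count** contains integers.

```python
import pandas as pd

# Hypothetical sample counts
sample = pd.DataFrame({"lang_count": [7, 0, 3, 1, 5, 4]})

# Keep signs with 1 to 4 languages (both bounds included), sorted ascending
subset = sample[sample["lang_count"].between(1, 4)].sort_values("lang_count")

print(subset["lang_count"].tolist())
```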

In the following cell, you can filter the dataset for a specific language. Simply enter the ISO code of the desired language as the argument of **str.contains( )**.

In [None]:
# Define language filter by entering ISO code
df = df[df['iso_codes'].str.contains("LTZ")]
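
If some rows have no language codes, `str.contains` returns a missing value for them, and pandas cannot use such a result as a filter. This is a minimal sketch of one way to handle that with the `na` parameter, assuming that **iso_codes** is stored as text such as "DEU, LTZ"; the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows, including one without language codes
sample = pd.DataFrame({"iso_codes": ["DEU, LTZ", "FRA", None, "LTZ, ENG"]})

# Treat missing values as "no match" and search for the literal text "LTZ"
mask = sample["iso_codes"].str.contains("LTZ", na=False, regex=False)
subset = sample[mask]

print(subset.index.tolist())
```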

In the following cell, you can filter the dataset for specific days as defined in the column "date_day" above. Simply enter the days to filter for in the list "dayfilter".

In [None]:
# Define extra filter for specific days
# (used for the uploads per day analysis in Vancouver)
dayfilter = ['2016-10-30', '2016-11-01', '2016-11-02', '2016-11-04', '2016-11-05']
df = df[df["date_day"].str.contains('|'.join(dayfilter))]
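
When **date_day** holds exactly one day per row, an exact match with `Series.isin` is an alternative to the pattern search. This is a minimal sketch on made-up sample rows, assuming that the values are formatted as YYYY-MM-DD.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({"date_day": ["2016-10-30", "2016-10-31", "2016-11-04"]})

dayfilter = ['2016-10-30', '2016-11-01', '2016-11-02', '2016-11-04', '2016-11-05']

# Keep only rows whose day is exactly one of the listed days
subset = sample[sample["date_day"].isin(dayfilter)]

print(subset["date_day"].tolist())
```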

In order to plot the data, we need to add the CARTO-specific geometry to our dataset. We can do so by encoding the columns "lat"/"lng" to a geometry using *geopandas*.

In [None]:
# Encode lat/lng from dataset to CARTO geometry
gdf = GeoDataFrame(df, geometry=points_from_xy(df['lng'], df['lat']))

If you want to, you can print basic statistics for the filtered dataset, such as the number of uploads, the average number of languages per sign, and the distribution of languages per sign.

In [None]:
# Define & print basic statistics for dataset.
total = df['lang_count'].count()
mean = df['lang_count'].mean()
numbers = df.groupby('lang_count').count()

print("Pins total: " + str(total))
print("Languages average: " + str(mean))

numbers["city"]
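
The distribution of languages per sign can also be computed directly with `value_counts`, which does not rely on another column such as "city" being filled in. This is a minimal sketch on made-up counts.

```python
import pandas as pd

# Hypothetical sample counts
sample = pd.DataFrame({"lang_count": [1, 2, 2, 3, 1, 2]})

# Number of signs per language count, ordered by language count
distribution = sample["lang_count"].value_counts().sort_index()

# Share of each language count in the sample
share = (distribution / distribution.sum()).round(2)

print(distribution)
print(share)
```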

Now, finally, let us plot a map. You can choose from different predefined color schemes by un-commenting the one you want to use. Alternatively, you can also add a new one.

Apart from colors, you can change several arguments of the function **Map( )** to design your map:

Map:
- **Layer:** see below
- **basemap:** basemap used; available options: "Positron", "Voyager", "Darkmatter"
- **show_info:** display center and zoom information in the map
- **viewport:** specify the center of the map (lat, lng) and zoom level for plotting
- **size:** define the size (width, height) of the map to plot
- **is_static:** plot static or interactive maps

Layer:
- **1st element:** dataset to be used for plotting the **dots**; specify as first element in the bracket
- **2nd element:** type of **mapping style** used for plotting; see below
- **default_legend:** show **legend** for the plotted elements

Mapping style:
- **color_bins_style:** plot data using **numerical information**, in our case the number of languages per sign
    - *1st element:* specify the data column to use, in our case "lang_count"
    - *method:* specify the bin method, in our case "equal"
    - *bins:* specify the number of bins to use; this should match the number of categories in the data
    - *size:* set the point size for the data
    - *palette:* specify the color palette; you can use one of the specified lists of colors or one of the predefined palettes from CARTO
    - *stroke_width:* set the stroke width for the points to plot
- **color_category_style:** plot data using **categorical information**, in our case the different days of upload from Vancouver
    - *1st element:* specify the data column to use, in our case "date_day"
    - *top:* number of categories to plot (between 1 and 16); this should match the number of categories in the data and the number of colors in your palette
    - *size:* set the point size for the data
    - *palette:* specify the color palette; you can use one of the specified lists of colors or one of the predefined palettes from CARTO
    - *stroke_width:* set the stroke width for the points to plot

For further information about the available styling and mapping options, see the package reference: https://carto.com/developers/cartoframes/reference/

**NOTE:** The number of colors you enter has to match the number of "bins" defined in **color_bins_style( )** (or the value of "top" in **color_category_style( )**).
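
Before plotting, you can check whether the chosen palette matches the data. This is a minimal sketch: the helper `palette_matches` is hypothetical and not part of CARTOframes, and it only compares the number of colors with the number of distinct values in a column.

```python
import pandas as pd

def palette_matches(values, palette):
    """Return True if `palette` has one color per distinct value in `values`."""
    return pd.Series(values).nunique() == len(palette)

# Hypothetical sample: signs with 1 to 4 languages
lang_counts = [1, 2, 3, 4, 2, 1]

four_colors = ["#4CBDE5", "#59AE39", "#F0CB0E", "#E94E48"]
five_colors = four_colors + ["#A50021"]

print(palette_matches(lang_counts, four_colors))
print(palette_matches(lang_counts, five_colors))
```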

In [None]:
# Set the coordinates for the center of the map & zoom level

location = {'lat': 49.610856, 'lng': 6.129689, 'zoom': 14} # Luxembourg
#location = {'lat': 48.203207, 'lng': 16.355315, 'zoom': 12} # Vienna
#location = {'lat': 49.285815, 'lng': -123.116671, 'zoom': 12} # Vancouver
#location = {'lat': 48.206681, 'lng': 16.438382, 'zoom': 13} # Vienna (only for "user 1" in the analysis)

# Choose a color scheme for the dots on your map according to the type of map you want to plot

colors = ["#4CBDE5", "#59AE39", "#F0CB0E", "#E94E48"] # colors used to plot number of languages per sign
#colors = ["#4CBDE5", "#59AE39", "#F0CB0E", "#E94E48", "#A50021"] # colors to plot uploads/day for Vancouver
#colors = ["#59AE39"] # presence German (green) / Vienna orientation strategy (exhaustive)
#colors = ["#0D83B7"] # presence French (blue) / Vienna orientation strategy (area focus)
#colors = ["#A50021"] # presence English (red) / Vienna orientation strategy (strolling)
#colors = ["#E9561A"] # presence Luxembourgish (orange) / Vienna orientation strategy (street focus)
#colors = ["#F0CB0E"] # presence Italian (yellow)


# Call the function to plot the map

Map(
    Layer(gdf, 
          color_bins_style('lang_count', method='equal', bins=5, size=10, palette=colors, stroke_width=0.1),
          #color_category_style('date_day', top=5, size=10, palette=colors, stroke_width=0.1),
          default_legend=False),
    basemap='Positron',
    show_info=False,
    viewport=location,
    #size=(1600, 800),
    is_static=False
)