# Troubleshooting `to_geojson` method in ArcGIS Python API

The purpose of this notebook is to document a potential bug with the `to_geojson` method in the ArcGIS Python API. This came about when I was attempting to create a workflow to pull data from AGOL and add it to a table in PostGIS. I got some weird errors with the Greenway Trails dataset: when I went to insert the GeoJSON data into PostGIS, I got an error about 3D features. However, when I examined the GeoJSON, it appeared that there were no 3D features. Curious. After some more poking around, I discovered that the coordinates of LineString geometries were being nested as if they belonged to MultiLineString geometries. That's not necessarily bad, but the `type` property for the geometry remained `LineString`. When I manually changed all the `LineString` values to `MultiLineString`, I was able to insert the data properly into PostGIS.

This notebook examines the `to_geojson` method on a few single-part test datasets (line, point, and polygon) to determine whether there is a potential bug.

In [1]:
import json
import copy

from arcgis.gis import GIS
from arcgis import features
from arcgis import geometry
from IPython.display import display

import geojson
import folium
import pprint

## Helper Functions and Variables

In [2]:
def agol_feature_layer(gis, item_id, layer):
    """Get a feature layer by its item ID
    
    Parameters
    ----------
    gis : obj
        A GIS object from the ArcGIS Python API
    item_id : str
        ArcGIS Online item id
    layer : int
        Layer number in item
        
    Returns
    -------
    A single layer within a feature layer
    
    Requires
    --------
    arcgis
        ArcGIS Python API
    """
    # Fetch the item by ID, then return the requested layer, as the docstring describes
    item = gis.content.get(item_id)
    return item.layers[layer]

In [3]:
def agol_layer_to_geojson(gis, item_id, layer, out_sr = 4326):
    """Return a layer on AGOL as GeoJSON
    
    Parameters
    ----------
    gis : obj
        A GIS object from the ArcGIS Python API
    item_id : str
        ArcGIS Online item id
    layer : int
        Layer number in item
    out_sr : int
        Output spatial reference (The default is 4326 which is the EPSG code for WGS84. According to the spec, this is the only coordinate reference system you should use with GeoJSON, but technically you can use others)
        
    Returns
    -------
    geojson.feature.FeatureCollection
        A GeoJSON representation of the layer from ArcGIS Online
        
    Requires
    --------
    arcgis
        ArcGIS Python API
    geojson
        python-geojson
    """
    item = gis.content.get(item_id)
    feature_layer = item.layers[layer]
    feature_set = feature_layer.query(out_sr = out_sr)
    return geojson.loads(feature_set.to_geojson)

In [4]:
def geojson_validity(data):
    """Return True if `data` is valid GeoJSON, otherwise return its validation errors."""
    if not data.is_valid:
        return data.errors()
    return True

In [5]:
test_items = {
    'line': 'd2fc5983dbdd4aa38242d05354c2b853',
    'point': '161925e18a354cafabd1ee8768869adf',
    'polygon': 'aa1c7b4d7aa348edade07d24ec118794'
}

In [6]:
gis = GIS()

## Mapping data from AGOL

First I just want to show that these layers can be mapped directly from AGOL using the ArcGIS Python API.

In [7]:
agol_map = gis.map('Raleigh, NC', zoomlevel = 9)

for k, v in test_items.items():
    agol_map.add_layer(agol_feature_layer(gis, v, 0))

agol_map

MapView(layout=Layout(height='400px', width='100%'), zoom=9.0)

OK, so that works without issue. But when using the `to_geojson` method, things are a bit different. So we'll try something similar to above but use folium as the mapping library and the `to_geojson` method as a means of getting data to add to the map.

In [8]:
geojson_map1 = folium.Map([35.779591, -78.638176], zoom_start = 11)

for k, v in test_items.items():
    folium.GeoJson(agol_layer_to_geojson(gis, v, 0)).add_to(geojson_map1)

geojson_map1

So nothing shows up. I need to see whether this is my error or something else, so we'll look at each layer in isolation.

### Polygon Only

In [9]:
geojson_map_poly_only = folium.Map([35.779591, -78.638176], zoom_start = 11)
folium.GeoJson(agol_layer_to_geojson(gis, test_items['polygon'], 0)).add_to(geojson_map_poly_only)
geojson_map_poly_only

### Point Only

In [10]:
geojson_map_pt_only = folium.Map([35.779591, -78.638176], zoom_start = 11)
folium.GeoJson(agol_layer_to_geojson(gis, test_items['point'], 0)).add_to(geojson_map_pt_only)
geojson_map_pt_only

### Line Only

In [11]:
geojson_map_line_only = folium.Map([35.779591, -78.638176], zoom_start = 11)
folium.GeoJson(agol_layer_to_geojson(gis, test_items['line'], 0)).add_to(geojson_map_line_only)
geojson_map_line_only

So it looks like the line data is the culprit. Let's make sure it's not a folium thing by trying to add the polygon and point layers only to the same map.

In [12]:
geojson_map_pt_poly = folium.Map([35.779591, -78.638176], zoom_start = 11)

for k, v in test_items.items():
    if k != 'line':
        folium.GeoJson(agol_layer_to_geojson(gis, v, 0)).add_to(geojson_map_pt_poly)

geojson_map_pt_poly

Indeed, it looks like there is an issue with the line data. The single-part points and polygons are showing up just fine.

## Examining the Line data

So how can we make the line GeoJSON valid? Let's take a closer look. We can use the `geojson` package to test the validity of the line GeoJSON and, hopefully, repair it so that it is valid according to the GeoJSON spec.

In [13]:
geojson_line = agol_layer_to_geojson(gis, test_items['line'], 0)
# pprint output is not valid GeoJSON because it uses single quotes, but it is good for visual scanning.
# Use geojson.dumps() to return actual GeoJSON.
pprint.pprint(geojson_line)

{'features': [{'geometry': {'coordinates': [[[-78.8271882474162,
                                              35.8700004012947],
                                             [-78.7907960355022,
                                              35.8444008470209],
                                             [-78.7509705960491,
                                              35.8260308576235],
                                             [-78.7358643948772,
                                              35.8148954608556],
                                             [-78.7029054105023,
                                              35.8032016132432],
                                             [-78.6891725003461,
                                              35.8037585021739],
                                             [-78.6548402249555,
                                              35.8093271767413],
                                             [-78.6280610501508, 35.8199065827],
         

Is this valid GeoJSON?

In [14]:
pprint.pprint(geojson_validity(geojson_line))

['the "coordinates" member must be an array of two or more positions',
 'the "coordinates" member must be an array of two or more positions']


It appears that there is an issue with the `"coordinates"` member of each feature's geometry. The validator expects an array of two or more positions. However, each feature instead has an array that contains a single nested array, and that inner array holds the actual coordinate pairs.
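To make the nesting concrete, here is a minimal sketch that counts how many list levels wrap the numbers in a `"coordinates"` array. The helper name `coordinate_depth` is my own and is not part of `arcgis` or `geojson`. The sample coordinates come from the second feature in this dataset, and the extra wrapping list reproduces the shape observed in the output above.

```python
def coordinate_depth(coords):
    """Count the list levels that wrap the numbers in a coordinates array.

    Hypothetical helper for illustration only.
    """
    depth = 0
    while isinstance(coords, (list, tuple)) and coords:
        depth += 1
        coords = coords[0]  # descend into the first element
    return depth


# Shape the spec describes for a LineString: an array of positions
expected = [
    [-78.6761262356976, 35.9028225091032],
    [-78.7317445218304, 35.864992443334],
    [-78.6919190823773, 35.8505232315841],
]

# Shape observed in the output above: the same array wrapped in one more list
as_returned = [expected]

print(coordinate_depth(expected))     # 2
print(coordinate_depth(as_returned))  # 3
```

A depth of 2 is what a LineString should have; a depth of 3 is what a MultiLineString (or a Polygon) uses.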

According to the GeoJSON spec on [LineString](https://tools.ietf.org/html/rfc7946#section-3.1.4) geometries:
> For type "LineString", the "coordinates" member is an array of two or more positions.
   
It appears whatever we have here is not a valid array for a LineString geometry. What does the spec say about [MultiLineString](https://tools.ietf.org/html/rfc7946#section-3.1.5) geometries?

> For type "MultiLineString", the "coordinates" member is an array of LineString coordinate arrays.

Looking at the format of the line geometries, they appear to be formatted as MultiLineString geometries, but the `"type"` value is set to `"LineString"`. There appear to be a couple of options here. We could either change the `"type"` value to `"MultiLineString"` or make the `"geometry"` value a valid `LineString`. Let's explore both options.

### Convert `"type"` to `"MultiLineString"`

In [15]:
geojson_multiline = copy.deepcopy(geojson_line)
for f in geojson_multiline['features']:
    # Relabel each geometry so its type matches the nesting of its coordinates
    f['geometry']['type'] = 'MultiLineString'
    
pprint.pprint(geojson_multiline)

{'features': [{'geometry': {'coordinates': [[[-78.8271882474162,
                                              35.8700004012947],
                                             [-78.7907960355022,
                                              35.8444008470209],
                                             [-78.7509705960491,
                                              35.8260308576235],
                                             [-78.7358643948772,
                                              35.8148954608556],
                                             [-78.7029054105023,
                                              35.8032016132432],
                                             [-78.6891725003461,
                                              35.8037585021739],
                                             [-78.6548402249555,
                                              35.8093271767413],
                                             [-78.6280610501508, 35.8199065827],
         

In [16]:
geojson_map_multiline = folium.Map([35.779591, -78.638176], zoom_start = 11)
folium.GeoJson(geojson_multiline).add_to(geojson_map_multiline)
geojson_map_multiline

Great! The line data are on the map. It'd be nice to make sure this is valid GeoJSON.

In [17]:
pprint.pprint(geojson_validity(geojson_multiline))

['the "coordinates" member must be an array of two or more positions',
 'the "coordinates" member must be an array of two or more positions']


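It appears this is not valid according to the `geojson` package. Note, however, that the messages are identical to the LineString errors above. My best explanation is that this is an artifact of how the data were validated rather than a problem with the data: `geojson.loads` creates a `geojson.LineString` object for each geometry at load time, and editing the `"type"` key afterward does not change the object's class, so `errors()` most likely still applies the LineString rule to a `"coordinates"` array of length one. The spec's definition of MultiLineString quoted above does not require more than one part. Consistent with that, when I ran the GeoJSON of `geojson_multiline` through [GeoJSONLint](https://geojsonlint.com), it considered the data valid.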
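One way to cross-check this result without relying on the `geojson` package is a small hand-written check that reads the `"type"` key at validation time. The sketch below encodes only the two spec sentences quoted above; it is not a full GeoJSON validator, and the function names are hypothetical.

```python
def is_position(value):
    """A position is an array of at least two numbers (x, y and optionally z)."""
    return (
        isinstance(value, (list, tuple))
        and len(value) >= 2
        and all(isinstance(n, (int, float)) and not isinstance(n, bool) for n in value)
    )


def is_line(coords):
    """LineString coordinates: an array of two or more positions."""
    return (
        isinstance(coords, (list, tuple))
        and len(coords) >= 2
        and all(is_position(p) for p in coords)
    )


def line_geometry_is_valid(geometry):
    """Check a LineString or MultiLineString dict against the quoted spec wording."""
    coords = geometry.get("coordinates")
    if geometry.get("type") == "LineString":
        return is_line(coords)
    if geometry.get("type") == "MultiLineString":
        # An array of LineString coordinate arrays; a single part is allowed
        return isinstance(coords, (list, tuple)) and all(is_line(c) for c in coords)
    raise ValueError("only line geometries are handled in this sketch")


part = [
    [-78.6761262356976, 35.9028225091032],
    [-78.7317445218304, 35.864992443334],
    [-78.6919190823773, 35.8505232315841],
]
as_returned = {"type": "LineString", "coordinates": [part]}
retyped = {"type": "MultiLineString", "coordinates": [part]}

print(line_geometry_is_valid(as_returned))  # False
print(line_geometry_is_valid(retyped))      # True
```

Under this reading, a MultiLineString with a single part passes, which agrees with what GeoJSONLint reported.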

Can we get these data to pass the `geojson` package's validation? Perhaps if we keep the `"type"` value as `"LineString"` and remove the extra level of nesting from `"coordinates"`, we will.

### Repair formatting of LineString geometries

In [18]:
geojson_line_repair = copy.deepcopy(geojson_line)
for f in geojson_line_repair['features']:
    f['geometry']['coordinates'] = f['geometry']['coordinates'][0]

pprint.pprint(geojson_line_repair)

{'features': [{'geometry': {'coordinates': [[-78.8271882474162,
                                             35.8700004012947],
                                            [-78.7907960355022,
                                             35.8444008470209],
                                            [-78.7509705960491,
                                             35.8260308576235],
                                            [-78.7358643948772,
                                             35.8148954608556],
                                            [-78.7029054105023,
                                             35.8032016132432],
                                            [-78.6891725003461,
                                             35.8037585021739],
                                            [-78.6548402249555,
                                             35.8093271767413],
                                            [-78.6280610501508, 35.8199065827],
                        

In [19]:
geojson_map_line_repair = folium.Map([35.779591, -78.638176], zoom_start = 11)
folium.GeoJson(geojson_line_repair).add_to(geojson_map_line_repair)
geojson_map_line_repair

OK, that looks good. Now let's see if the data are valid.

In [20]:
pprint.pprint(geojson_validity(geojson_line_repair))

True


It appears that they are. Hooray! Let's dump the output to copy and paste into GeoJSONLint:

In [21]:
print(geojson.dumps(geojson_line_repair))

{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-78.8271882474162, 35.8700004012947], [-78.7907960355022, 35.8444008470209], [-78.7509705960491, 35.8260308576235], [-78.7358643948772, 35.8148954608556], [-78.7029054105023, 35.8032016132432], [-78.6891725003461, 35.8037585021739], [-78.6548402249555, 35.8093271767413], [-78.6280610501508, 35.8199065827], [-78.616388076518, 35.826587586458]]}, "properties": {"OBJECTID": 2, "id": 1, "Shape__Length": 0.232859019406969, "style": {}, "highlight": {}}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-78.6761262356976, 35.9028225091032], [-78.7317445218304, 35.864992443334], [-78.6919190823773, 35.8505232315841]]}, "properties": {"OBJECTID": 3, "id": null, "Shape__Length": 0.109636900894782, "style": {}, "highlight": {}}}]}


This should be evaluated as valid!

It appears there may be a bug in the ArcGIS Python API: `to_geojson` nests the coordinates of single-part LineString geometries as if they belonged to MultiLineString geometries, while leaving the `"type"` value as `"LineString"`.
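The repair used in this notebook keeps only the first element of `"coordinates"`, which is safe for these single-part test data but would drop parts from a genuinely multi-part line. As a possible workaround in the meantime, here is a hedged sketch that handles both cases. `repair_line_geometry` is a hypothetical helper, and it assumes that multi-part lines come back with the same extra nesting and the same `"LineString"` label, which I have not tested.

```python
import copy


def repair_line_geometry(geometry):
    """Return a copy of `geometry` whose type matches its coordinate nesting.

    Hypothetical helper; not part of arcgis or geojson.
    """
    fixed = copy.deepcopy(geometry)
    coords = fixed.get("coordinates")
    if fixed.get("type") != "LineString" or not coords:
        return fixed  # nothing this sketch knows how to repair

    first = coords[0]
    # In a well-formed LineString the first element is a position
    # (a list of numbers), so its own first element is not a list.
    nested = (
        isinstance(first, (list, tuple))
        and len(first) > 0
        and isinstance(first[0], (list, tuple))
    )
    if not nested:
        return fixed

    if len(coords) == 1:
        # Single part: unwrap it, as the notebook does by hand
        fixed["coordinates"] = coords[0]
    else:
        # Assumption: several parts mean a true multi-part line, so relabel it
        fixed["type"] = "MultiLineString"
    return fixed


part_a = [[-78.6761262356976, 35.9028225091032], [-78.7317445218304, 35.864992443334]]
part_b = [[-78.7317445218304, 35.864992443334], [-78.6919190823773, 35.8505232315841]]

single = repair_line_geometry({"type": "LineString", "coordinates": [part_a]})
multi = repair_line_geometry({"type": "LineString", "coordinates": [part_a, part_b]})
print(single["type"], multi["type"])
```

The helper returns a copy and leaves well-formed geometries untouched, so it could be applied to every feature in a collection before inserting into PostGIS.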