# Geog 573 (675) Advanced Geocomputing and Geospatial Big Data Analytics
# by Prof. Song Gao (song.gao@wisc.edu)
# Lab1-GeoJSON

## Geospatial Data Formats Review


## JSON

JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute/key–value pairs and array/list data types (or any other serializable types). 
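To make the key–value and array structure concrete, here is a minimal sketch that parses a small JSON text with Python's standard json module. The place record and its field names are made up for illustration.

```python
import json

# A hypothetical place record: key-value pairs, a nested object, and an array
place_text = '''
{
  "name": "Science Hall",
  "year": 2018,
  "location": {"city": "Madison", "state": "WI"},
  "categories": ["education", "landmark"]
}
'''

place = json.loads(place_text)       # JSON object -> Python dict
print(type(place).__name__)
print(place["location"]["city"])     # nested JSON object -> nested dict
print(place["categories"][0])        # JSON array -> Python list
```

Each JSON object becomes a Python dictionary and each JSON array becomes a list, so nested values are reached by chaining keys and indexes.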

## JSON vs. CSV

### 1. JSON is "self describing" (human readable) and better at showing hierarchical / relational data
e.g., A list of points of interest categories  
Yelp Places: https://www.yelp.com/developers/documentation/v3/all_category_list  

<br>
Twitter Streaming Data API: Filtering Tweets by location
https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location#usage_examples

### 2. CSV will lose data (Q: Why?)

Venue Reviews (e.g., Yelp, TripAdvisor)
Example: https://gist.githubusercontent.com/shiondev/9569051/raw/0aeff13e78cecd190f7c3877b8e7e30bf9a75e93/sample_datafiniti_json_with_reviews

### 3. The standard CSV reader application (Excel) limits the maximum number of records
Excel is great for loading small, highly-structured spreadsheet files. But it’s terrible at loading files that may have 10,000 rows, 100+ columns, with some of these columns populated by unstructured text like addresses, reviews or descriptions. 
This results in some fields spilling over into adjacent columns, which makes the data unreadable.

In [1]:
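# Example: building CSV records by hand and splitting them on commas
attributes = "place,address,year"  # the intended column names
data = []

# Record 1: the address itself contains commas
placename = "Science Hall"
address = "N Park St., Madison, WI 53593"
year = 2018
alist = placename + ',' + address + ',' + str(year)
data.append(alist)

# Record 2: the same address written without commas
placename = "Science Hall"
address = "N Park St. Madison WI 53593"
year = 2019
alist = placename + ',' + address + ',' + str(year)
data.append(alist)

# Splitting on ',' yields 5 fields for record 1 but 3 fields for record 2
for record in data:
    recordList = record.split(',')
    print(recordList)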

['Science Hall', 'N Park St.', ' Madison', ' WI 53593', '2018']
['Science Hall', 'N Park St. Madison WI 53593', '2019']
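One common way around the spill-over shown above is to let a CSV library quote fields that contain commas. The sketch below uses Python's standard csv module on the same two records. It is an illustration of quoting, not part of the original lab code.

```python
import csv
import io

rows = [
    ["Science Hall", "N Park St., Madison, WI 53593", 2018],
    ["Science Hall", "N Park St. Madison WI 53593", 2019],
]

# Write the records to an in-memory CSV; csv.writer quotes fields containing commas
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["place", "address", "year"])
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back; quoted fields are kept together as one value
parsed = list(csv.reader(io.StringIO(csv_text)))
for record in parsed[1:]:
    print(record)
```

Both records now come back with three fields. Note that csv.reader returns every value as a string, so the year would still need to be converted back to an integer.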


### 4. JSON is easier to work with at scale
Most modern Web APIs natively support JSON input and output. Several database technologies (including most NoSQL variations) support it. It’s significantly easier to work with within most programming languages as well.

## JSON vs. XML
XML stands for eXtensible Markup Language. XML is a markup language much like HTML. XML was designed to store and transport data. 


JSON Example
{"employees":[
  { "firstName":"John", "lastName":"Doe" },
  { "firstName":"Anna", "lastName":"Smith" },
  { "firstName":"Peter", "lastName":"Jones" }
]}

XML Example
<employees>
  <employee>
    <firstName>John</firstName> <lastName>Doe</lastName>
  </employee>
  <employee>
    <firstName>Anna</firstName> <lastName>Smith</lastName>
  </employee>
  <employee>
    <firstName>Peter</firstName> <lastName>Jones</lastName>
  </employee>
</employees>

### JSON is Like XML Because
Both JSON and XML are "self describing" (human readable) <br>
Both JSON and XML are hierarchical (values within values)  <br>
Both JSON and XML can be parsed and used by lots of programming languages  <br>
Both JSON and XML can be fetched with an XMLHttpRequest  <br>

### JSON is Unlike XML Because
JSON doesn't use end tags <br>
JSON is shorter <br>
JSON is quicker to read and write <br>
JSON can use arrays <br>
JSON is easier to parse.
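As a rough illustration of the parsing difference, the sketch below reads a shortened version of the employees example (two employees instead of three) in both formats using only the Python standard library. The variable names are chosen for this example.

```python
import json
import xml.etree.ElementTree as ET

json_text = (
    '{"employees": ['
    '{"firstName": "John", "lastName": "Doe"},'
    '{"firstName": "Anna", "lastName": "Smith"}]}'
)

xml_text = """
<employees>
  <employee><firstName>John</firstName><lastName>Doe</lastName></employee>
  <employee><firstName>Anna</firstName><lastName>Smith</lastName></employee>
</employees>
""".strip()

# JSON: one call gives a list of dictionaries
from_json = json.loads(json_text)["employees"]

# XML: walk the element tree and build the dictionaries by hand
root = ET.fromstring(xml_text)
from_xml = [
    {child.tag: child.text for child in employee}
    for employee in root.findall("employee")
]

print(from_json == from_xml)
```

Both routes end with the same list of dictionaries, but the JSON text maps onto Python data types directly, while the XML tree has to be traversed element by element.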



### Example
USGS National Land Cover Database (NLCD): https://www.sciencebase.gov/catalog/item/5825a0ebe4b01fad86db699a
<br>XML: https://www.sciencebase.gov/catalog/item/5825a0ebe4b01fad86db699a?format=isohtml
<br>JSON: https://www.sciencebase.gov/catalog/item/5825a0ebe4b01fad86db699a?format=json

## GeoJSON

GeoJSON is an open standard geospatial data interchange format that represents simple geographic features and their nonspatial attributes. Based on JavaScript Object Notation (JSON), GeoJSON is a format for encoding a variety of geographic data structures. It uses a geographic coordinate reference system, World Geodetic System 1984, and units of decimal degrees.

In [None]:
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
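The Feature above can be read in Python like any other JSON text. The sketch below parses it and unpacks the coordinates, which GeoJSON stores in longitude, latitude order. The range check is a simple sanity test added for illustration.

```python
import json

feature_text = '''
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
  "properties": {"name": "Dinagat Islands"}
}
'''

feature = json.loads(feature_text)

# GeoJSON positions are ordered [longitude, latitude]
lon, lat = feature["geometry"]["coordinates"]

# Sanity check: decimal degrees must fall inside the valid ranges
in_range = -180 <= lon <= 180 and -90 <= lat <= 90
print(feature["properties"]["name"], lon, lat, in_range)
```

Swapping the two values is an easy mistake when the source data lists latitude first, so a range check like this can catch some (though not all) ordering errors.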

## Supported Geometry Types
Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. Geometric objects with additional properties are Feature objects. Sets of features are contained by FeatureCollection objects.
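The relationship between geometries, Feature objects, and a FeatureCollection can be sketched with plain Python dictionaries. The helper make_point_feature and the second point's name are hypothetical and exist only for this example.

```python
import json

def make_point_feature(name, lon, lat):
    """Wrap a Point geometry and a name property in a Feature (illustrative helper)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }

# A FeatureCollection is an object holding a list of Feature objects
collection = {
    "type": "FeatureCollection",
    "features": [
        make_point_feature("Dinagat Islands", 125.6, 10.1),
        make_point_feature("Example point", 30, 10),
    ],
}

print(json.dumps(collection, indent=2))
```

The same pattern extends to the other geometry types: only the "type" and the nesting of "coordinates" inside "geometry" change.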

### Geometry primitives:
### Points


In [15]:
from IPython.display import Image  # displays an image from a URL in the notebook
Image(url= "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/SFA_Point.svg/102px-SFA_Point.svg.png", width=100, height=100)

In [None]:
{
    "type": "Point", 
    "coordinates": [30, 10]
}

### LineString

In [None]:
{
    "type": "LineString", 
    "coordinates": [
        [30, 10], [10, 30], [40, 40]
    ]
}

In [20]:
Image(url= "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b9/SFA_LineString.svg/102px-SFA_LineString.svg.png", width=100, height=100)

### Polygon

In [None]:
{
    "type": "Polygon", 
    "coordinates": [
        [[30, 10], [40, 40], [20, 40], [10, 20], [30, 10]]
    ]
}

In [23]:
Image(url= "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/SFA_Polygon.svg/1280px-SFA_Polygon.svg.png", width=100, height=100)

### Polygon with hole

In [None]:
{
    "type": "Polygon", 
    "coordinates": [
        [[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]], 
        [[20, 30], [35, 35], [30, 20], [20, 30]]
    ]
}

In [24]:
Image(url= "https://upload.wikimedia.org/wikipedia/commons/thumb/5/55/SFA_Polygon_with_hole.svg/102px-SFA_Polygon_with_hole.svg.png",width=100, height=100) 
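In the polygon examples, each ring repeats its first position as its last position, and the first ring is the exterior boundary while any further rings are holes. A small check of that structure, assuming the hypothetical helper name ring_is_closed, might look like this:

```python
def ring_is_closed(ring):
    """A linear ring needs at least four positions and identical first and last positions."""
    return len(ring) >= 4 and ring[0] == ring[-1]

polygon = {
    "type": "Polygon",
    "coordinates": [
        [[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]],
        [[20, 30], [35, 35], [30, 20], [20, 30]],
    ],
}

# The first ring is the exterior; the remaining rings are holes
exterior, *holes = polygon["coordinates"]
print(ring_is_closed(exterior))
print([ring_is_closed(hole) for hole in holes])
```

This only tests ring closure. It does not check winding order or whether the hole actually lies inside the exterior ring.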

## Processing JSON with Python

Documentation: https://docs.python.org/2/library/json.html#module-json

## Converting Dictionary to JSON in Python
json.dumps() converts a dictionary to a JSON-formatted str object, not to a dictionary. To use the data as a dictionary again, you have to parse the string with the json.loads() method.
<br>
Think of json.dumps() as a save method and json.loads() as a retrieve method.

In [7]:
import json

r = {'placetype': 'coffee shop', 'name': 'starbucks', 'rating': 4.5}
r = json.dumps(r)           # serialize: dict -> JSON string
loaded_r = json.loads(r)    # parse: JSON string -> dict
print(loaded_r['rating'])
print(r)
print(type(r))              # str
print(type(loaded_r))       # dict

4.5
{"placetype": "coffee shop", "rating": 4.5, "name": "starbucks"}
<type 'str'>
<type 'dict'>
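json.dumps() also accepts optional formatting arguments. The sketch below uses indent and sort_keys on the same record to produce a more readable string. Whether to pretty-print is a matter of taste, and the parsed data is the same either way.

```python
import json

record = {'placetype': 'coffee shop', 'name': 'starbucks', 'rating': 4.5}

compact = json.dumps(record)                            # single-line string
pretty = json.dumps(record, indent=2, sort_keys=True)   # indented, keys in alphabetical order

print(pretty)
# Both strings parse back to the same dictionary
print(json.loads(compact) == json.loads(pretty))
```

Sorting the keys makes the output order predictable, which is convenient when comparing files or checking them into version control.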


## Conversion from CSV to GeoJSON 

In [2]:
import pandas as pd
city_df = pd.read_csv("CityPop.csv") # try header=None
print(city_df.head(5)) #print the first couple of rows


The read_csv() function has an argument called skiprows that allows you to specify the number of lines to skip at the start of the file. Setting skiprows equal to 1 would skip only the first line. Here we set skiprows equal to 10, which skips the header line and the first nine records, so the next line (the record with id 10) is treated as the header:

In [11]:
city_df = pd.read_csv("CityPop.csv", skiprows = 10)
print(city_df.head(5)) #print the first couple of rows

   10  24.9267101  67.0343704         Karachi       Karachi.1     0  3.99  \
0  11  -34.608509  -58.373489    Buenos_Aires    Buenos Aires  8.10  8.74   
1  12   34.053490 -118.245323     Los_Angeles     Los Angeles  8.38  8.93   
2  13   39.906570  116.387650         Beijing         Beijing  4.43  4.83   
3  14  -22.912161  -43.175011  Rio_de_Janeiro  Rio de Janeiro  6.64  7.56   
4  15   14.587430  120.983681          Manila          Manila  3.53  5.00   

   5.05   6.03   7.15   8.47  10.02  11.62  13.12  
0  9.42   9.96  10.51  11.15  11.85  12.55  13.07  
1  9.51  10.18  10.88  11.34  11.81  12.30  12.76  
2  5.37   6.02   6.79   8.14   9.76  11.45  12.39  
3  8.58   9.09   9.59  10.17  10.80  11.37  11.95  
4  5.95   6.89   7.97   9.40   9.96  10.76  11.63  


### Check the data attribute types

In [12]:
city_df = pd.read_csv("CityPop.csv")
print(city_df['id'].dtypes)
print(city_df['latitude'].dtypes)
print(city_df['label'].dtypes)

int64
float64
object


Because strings have variable length, they are stored with the object dtype by default.

More Tutorials: 
<br> https://data36.com/pandas-tutorial-1-basics-reading-data-files-dataframes-data-selection/
<br> https://www.geeksforgeeks.org/python-read-csv-using-pandas-read_csv/

In [57]:
## get the header 
header=list(city_df.columns.values)
print (header)

['id', 'latitude', 'longitude', 'city', 'label', 'yr1970', 'yr1975', 'yr1980', 'yr1985', 'yr1990', 'yr1995', 'yr2000', 'yr2005', 'yr2010']


In [None]:
## iterate over each row
for index, row in city_df.iterrows():
    print(row)
    pop_allyears = {}  # embedded population dictionary for this city (new dict per row)
    cityname = row['city']    # city name
    citylabel = row['label']  # city label
    pop_allyears['yr1970'] = row['yr1970']
    pop_allyears['yr1975'] = row['yr1975']
    pop_allyears['yr1980'] = row['yr1980']
    #######

In [14]:
## add the data into a list of dictionaries
city_pop_list = []  # all cities
for row in city_df.itertuples(index=True):
    city = {}  # embedded dictionary for one city
    city['id'] = getattr(row, "id")
    city['latitude'] = getattr(row, "latitude")
    city['longitude'] = getattr(row, "longitude")
    city['city'] = getattr(row, "city")      # city name
    city['label'] = getattr(row, "label")    # city label
    city['yr1970'] = getattr(row, "yr1970")
    city['yr1975'] = getattr(row, "yr1975")
    city['yr1980'] = getattr(row, "yr1980")

    city_pop_list.append(city)

print(city_pop_list)

[{'yr1970': 23.3, 'city': 'Tokyo', 'yr1975': 26.61, 'yr1980': 28.55, 'longitude': 139.80894469999998, 'label': 'Tokyo', 'latitude': 35.6832085, 'id': 1L}, {'yr1970': 3.53, 'city': 'New_Delhi', 'yr1975': 4.43, 'yr1980': 5.56, 'longitude': 77.2008133, 'label': 'New Delhi', 'latitude': 28.608280199999996, 'id': 2L}, {'yr1970': 7.62, 'city': 'Sao_Paulo', 'yr1975': 9.61, 'yr1980': 12.09, 'longitude': -46.6546402, 'label': 'Sao Paulo', 'latitude': -23.5628395, 'id': 3L}, {'yr1970': 5.81, 'city': 'Mumbai', 'yr1975': 7.08, 'yr1980': 8.66, 'longitude': 72.8300934, 'label': 'Mumbai', 'latitude': 18.93013, 'id': 4L}, {'yr1970': 8.77, 'city': 'Mexico_City', 'yr1975': 10.69, 'yr1980': 13.01, 'longitude': -99.1331635, 'label': 'Mexico City', 'latitude': 19.4319592, 'id': 5L}, {'yr1970': 16.19, 'city': 'New_York', 'yr1975': 15.88, 'yr1980': 15.6, 'longitude': -73.83270259999999, 'label': 'New York', 'latitude': 40.7820015, 'id': 6L}, {'yr1970': 6.04, 'city': 'Shanghai', 'yr1975': 5.63, 'yr1980': 5.97

In [15]:
## convert the list of dictionaries to a JSON string
import json
city_pop_json = json.dumps(city_pop_list)
print(type(city_pop_json))

<type 'str'>


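If json.dumps() raises a TypeError because a value is a NumPy type rather than a native Python type, see: https://stackoverflow.com/questions/9452775/converting-numpy-dtypes-to-native-python-types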
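One possible way to handle non-native numeric types during serialization is the default argument of json.dumps(), which is called for any object the encoder cannot handle. The sketch below relies on the fact that NumPy scalars expose an item() method returning a native Python value. To stay self-contained it uses a hypothetical stand-in class, FakeInt64, instead of importing NumPy, and the function name to_native is also made up.

```python
import json

def to_native(value):
    """Fallback for json.dumps: unwrap scalars that expose .item(), as NumPy scalars do."""
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")

class FakeInt64:
    """Hypothetical stand-in for a NumPy integer scalar (assumption for this demo)."""
    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value

city = {"id": FakeInt64(1), "city": "Tokyo", "yr1970": 23.3}

# Without default=to_native, json.dumps(city) would raise a TypeError
print(json.dumps(city, default=to_native))
```

The alternative discussed at the link is to convert values to native types before building the dictionaries. Either approach works, and the hook is simply less intrusive when the data structure is already built.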

In [19]:
## add the data into a list of dictionaries
city_pop_list = []  # all cities
for row in city_df.itertuples(index=True):
    city = {}  # embedded dictionary for one city
    city['id'] = getattr(row, "id")
    city['latitude'] = getattr(row, "latitude")
    city['longitude'] = getattr(row, "longitude")
    city['city'] =  getattr(row, "city")
    city['label'] =  getattr(row, "label")
    city['yr1970'] = getattr(row, "yr1970")
    city['yr1975'] = getattr(row, "yr1975")
    city['yr1980'] = getattr(row, "yr1980")
    city_pop_list.append(city)
print(city_pop_list)
## convert the list of dictionaries to a JSON string
import json
city_pop_json = json.dumps(city_pop_list)
print(type(city_pop_json))

## output: write a json file
with open('city_pop.json', 'w') as fw:
    fw.write(city_pop_json)
print('Done')

[{'yr1970': 23.3, 'city': 'Tokyo', 'yr1975': 26.61, 'yr1980': 28.55, 'longitude': 139.80894469999998, 'label': 'Tokyo', 'latitude': 35.6832085, 'id': 1L}, {'yr1970': 3.53, 'city': 'New_Delhi', 'yr1975': 4.43, 'yr1980': 5.56, 'longitude': 77.2008133, 'label': 'New Delhi', 'latitude': 28.608280199999996, 'id': 2L}, {'yr1970': 7.62, 'city': 'Sao_Paulo', 'yr1975': 9.61, 'yr1980': 12.09, 'longitude': -46.6546402, 'label': 'Sao Paulo', 'latitude': -23.5628395, 'id': 3L}, {'yr1970': 5.81, 'city': 'Mumbai', 'yr1975': 7.08, 'yr1980': 8.66, 'longitude': 72.8300934, 'label': 'Mumbai', 'latitude': 18.93013, 'id': 4L}, {'yr1970': 8.77, 'city': 'Mexico_City', 'yr1975': 10.69, 'yr1980': 13.01, 'longitude': -99.1331635, 'label': 'Mexico City', 'latitude': 19.4319592, 'id': 5L}, {'yr1970': 16.19, 'city': 'New_York', 'yr1975': 15.88, 'yr1980': 15.6, 'longitude': -73.83270259999999, 'label': 'New York', 'latitude': 40.7820015, 'id': 6L}, {'yr1970': 6.04, 'city': 'Shanghai', 'yr1975': 5.63, 'yr1980': 5.97
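A quick way to confirm that a JSON file was written correctly is to read it back and compare it with the data in memory. The sketch below does this for the first two city records from the output above. It writes to a temporary directory (an assumption made so the example does not overwrite city_pop.json) and uses json.dump() and json.load(), which work on file objects directly.

```python
import json
import os
import tempfile

# First two records, copied from the printed output
city_pop_list = [
    {"id": 1, "latitude": 35.6832085, "longitude": 139.80894469999998,
     "city": "Tokyo", "label": "Tokyo",
     "yr1970": 23.3, "yr1975": 26.61, "yr1980": 28.55},
    {"id": 2, "latitude": 28.608280199999996, "longitude": 77.2008133,
     "city": "New_Delhi", "label": "New Delhi",
     "yr1970": 3.53, "yr1975": 4.43, "yr1980": 5.56},
]

# Write to a temporary location so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "city_pop.json")
with open(path, "w") as fw:
    json.dump(city_pop_list, fw)

# Read the file back and compare with the original list
with open(path) as fr:
    reloaded = json.load(fr)

print(len(reloaded), reloaded[0]["city"])
print(reloaded == city_pop_list)
```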

### Lab Assignment 1 (10 points)

### Hand-in 
•	Please collect your answers in a single .ipynb or .py file called lab1_yourname.ipynb or lab1_yourname.py <br>
•	Submit the file to the assignment folder called “Lab 1”. <br>
•	Include appropriate comments to explain what each line or block of code accomplishes. You must comment your code for full credit. <br>
•	Include the GeoJSON file you generated. <br>

### Task

In the last lab, we created Python scripts that read in the content of CityPop.csv, stored the data in certain containers, and converted them to JSON. 
Now, try to create the GeoJSON format, taking the geospatial information (latitude and longitude) into account.

Due Date: Two weeks later (Feb. 7, 2019)