# Iterables, Sequences, Conditional Execution & Files

This week we will be working on two different, yet related topics: (1) point pattern analysis and (2) GeoJSON.  For the former you will be leveraging the readings about iterables and sequences to write some point pattern analysis algorithms.  For the latter, you will be working on reading and processing a GeoJSON file of US cities.

The readings are quite good with respect to how iterables and iteration, conditional execution, and file access work.  This notebook is purely supplemental, and focuses on filling in some gaps to make the assignment easier.

## Iterables and Sequences

Here, the linkage that I want to make is between the mathematical notation you are likely to see when researching how to compute a given statistic and possible implementation methods.

As a simple example, let's take the average of $n$ numbers.  This could be notated as:

$\frac{1}{n}\sum_{i=1}^{n}x_{i}$, where $n$ is the number of values and $x_{i}$ is the value at position $i$.

Looking at this, we see the need to divide (multiply by the reciprocal) some summation by $n$.  To realize this in code, I know that I need to compute the summation.  This could be accomplished with a `for` loop.

In [23]:
x = list(range(10))
print('My input values are: ', x)

summation = 0
for value in x:  # Iterate over the input values themselves
    summation += value  # This is the same as summation = summation + value
    
mean = summation / len(x)
print('The mean value is: ', mean)

My input values are:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The mean value is:  4.5


In [27]:
# As an aside, Python built-ins make this even easier (if a bit less literal).
x = range(10)
mean = sum(x) / len(x)
print('The mean value is: ', mean)

The mean value is:  4.5


### Continue
The continue statement is quite useful when doing comparisons.  For example:

In [32]:
for i in range(3):
    for j in range(3):
        if i == j:
            continue
        else:
            print(i, j)

0 1
0 2
1 0
1 2
2 0
2 1


Here is another line by line breakdown:
1. Iterate over the values 0, 1, 2 (`range(3)`).
2. Inside of the first loop, iterate over the values 0, 1, 2 again.
3. Check whether `i` and `j` are equal.
4. If line 3 is True, then we continue: skip the rest of the inner for loop body and return to line 2 for the next value of `j`.  If this was the last time line 2 would be executed, return to line 1.

The idea is that we can check some condition and depending on the result, either proceed with the code execution or `continue` with the for loop.

In the week that we look at functional programming we will have a look at a clear way to accomplish this.  For now, this is great practice using iteration and conditional execution.
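As a minimal sketch of the same pattern applied to a comparison, the cell below finds the smallest gap between any two different values in a list, using `continue` to skip comparing a value with itself.  The list `values` and the name `smallest_gap` are made up for illustration.

```python
# Hypothetical 1-D values; in practice these might be coordinates.
values = [4.0, 9.5, 1.0, 7.25]

smallest_gap = None
for i in range(len(values)):
    for j in range(len(values)):
        if i == j:
            continue  # Skip comparing a value with itself
        gap = abs(values[i] - values[j])
        # Keep the gap if it is the first one seen or smaller than the best so far
        if smallest_gap is None or gap < smallest_gap:
            smallest_gap = gap

print('The smallest gap between two different values is: ', smallest_gap)
```

Without the `continue`, every value would be compared with itself and the smallest gap would always be zero.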

## Files

The readings offer one way to get access to a file and read information either line by line or in its entirety.  Another syntax that you will frequently encounter utilizes a context manager.  Without diving into lots of detail about how `__enter__` and `__exit__` work with a context manager, we can simply say that the `with` keyword manages entering, working in, and exiting from some runtime context.  Take the following example:

In [3]:
with open('example.txt', 'r') as f:
    lines = f.readlines()
    for l in lines:
        print(l)

Shall I compare thee to a summer's day?

Thou art more lovely and more temperate:

Rough winds do shake the darling buds of May,

And summer's lease hath all too short a date.



Here, a text file is being opened in read mode (`'r'`) using a context manager (`with`).  Since everything in Python is an object, we go ahead and assign the opened file object to a variable, `f`.  The text file is then being read in (as per the readings from last week) and the lines printed.
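One practical consequence of using `with` is that the file is closed for us when the block ends.  The sketch below illustrates this using a throwaway file (the name `sketch.txt` and its contents are made up so that the cell does not depend on `example.txt`).

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sketch.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

with open(path, 'r') as f:
    lines = f.readlines()
    print('Inside the with block, f.closed is: ', f.closed)

# The variable f still exists, but the context manager has closed the file.
print('Outside the with block, f.closed is: ', f.closed)

# Each line keeps its trailing newline; rstrip() avoids double-spaced output.
for line in lines:
    print(line.rstrip('\n'))
```

The trailing newline on each line is also why the printed sonnet above appears double spaced.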

How does this benefit us?  Enter the need to read some kind of structured data that already has a Python module to support parsing.  Take this example, where we load a tweet (the response returned when querying Twitter through their API).

## Working with JSON

In [7]:
import json

with open('example.json', 'r') as f:
    d = json.load(f)

d  # iPython pretty prints

{'contributors': None,
 'coordinates': None,
 'created_at': 'Thu Oct 21 16:02:46 +0000 2010',
 'entities': {'hashtags': [],
  'urls': [{'expanded_url': None,
    'indices': [69, 100],
    'url': 'http://gnip.com/success_stories'}],
  'user_mentions': [{'id': 16958875,
    'id_str': '16958875',
    'indices': [25, 30],
    'name': 'Gnip, Inc.',
    'screen_name': 'gnip'}]},
 'favorited': False,
 'geo': None,
 'id': 28039652140,
 'id_str': '28039652140',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'place': None,
 'retweet_count': None,
 'retweeted': False,
 'source': 'web',
 'text': "what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories",
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Fri Oct 24 23:22:09 +0000 2008',
  'description': 'Gnip makes it really easy for you to collect social dat

This is a 'boring' tweet: no spatial data.  How can we access the tweet text?  The loaded information is a Python dictionary, so by key.

In [8]:
d['text']

"what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories"

As a slightly more interesting example: how can we get the number of followers that the person who wrote the tweet has?  Looking at the JSON above, I see that the `followers_count` value is nested inside of a dict with the key `user`.  This looks like a dict within a dict.

In [9]:
print(d['user']['followers_count'])

260


Or we can assign the sub-dict to a variable and access the value that way.

In [13]:
user = d['user']
print(user.keys())
print()  # Print a blank line
print(user['followers_count'])

dict_keys(['id_str', 'contributors_enabled', 'utc_offset', 'following', 'verified', 'profile_background_color', 'created_at', 'notifications', 'show_all_inline_media', 'favourites_count', 'description', 'profile_sidebar_fill_color', 'follow_request_sent', 'lang', 'profile_sidebar_border_color', 'location', 'profile_image_url', 'profile_use_background_image', 'profile_text_color', 'listed_count', 'friends_count', 'protected', 'statuses_count', 'time_zone', 'id', 'screen_name', 'profile_background_image_url', 'profile_background_tile', 'profile_link_color', 'name', 'geo_enabled', 'followers_count', 'url'])

260


As an aside that will help with the assignment, think about how you might work with a list of tweets (or a list of GeoJSON objects).  Here is a really contrived example:

In [16]:
# Here I create a list of 10 identical tweets.  
# Print these if you want to see what we get or try something like, ['a'] * 10
tweets = [d] * 10 

In [19]:
for t in tweets:  # Iterate over the list:
    print(t['text'], t['user']['followers_count'])

what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to happy customers http://gnip.com/success_stories 260
what we've been up to at @gnip -- delivering data to ha

## GeoJSON
Above I talked for a moment about GeoJSON.  What is GeoJSON?

> GeoJSON is a format for encoding a variety of geographic data structures. A GeoJSON object may represent a geometry, a feature, or a collection of features. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Features in GeoJSON contain a geometry object and additional properties, and a feature collection represents a list of features.

> A complete GeoJSON data structure is always an object (in JSON terms). In GeoJSON, an object consists of a collection of name/value pairs -- also called members. For each member, the name is always a string. Member values are either a string, number, object, array or one of the literals: true, false, and null. An array consists of elements where each element is a value as described above.

From: http://geojson.org/geojson-spec.html

Here is an example of a GeoJSON file with mixed geometries (it is highly unlikely that you will encounter mixed geometries, but it is a valid example nonetheless).  I have added the triple quotes because I want to create a formatted string and retain the `\n` newline characters.
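A minimal sketch of such a file, assuming a feature collection modeled on the example in the GeoJSON specification.  The variable names `geojson_text` and `geojson` are my own, and the coordinates and property values are purely illustrative.

```python
import json

# Illustrative values only; the structure follows the GeoJSON specification.
geojson_text = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
     "properties": {"prop0": "value0"}},
    {"type": "Feature",
     "geometry": {"type": "LineString",
                  "coordinates": [[102.0, 0.0], [103.0, 1.0],
                                  [104.0, 0.0], [105.0, 1.0]]},
     "properties": {"prop0": "value0", "prop1": 0.0}},
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
                                   [100.0, 1.0], [100.0, 0.0]]]},
     "properties": {"prop0": "value0", "prop1": {"this": "that"}}}
  ]
}"""

# json.loads() parses a string; json.load() reads from an open file object.
geojson = json.loads(geojson_text)
print(type(geojson))
print(geojson.keys())
```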

A few things to notice here:

  * The geojson is a dictionary when loaded into Python
  * Root level keys are "type" and "features".  A "crs" (coordinate reference system) key could also be included.
      * "features" are the 'good' stuff containing geometries and attributes (properties).
      * "features" is a list, so iteration is by position (and not key like a dictionary).
  * Every feature is a dictionary with a "type" key plus the two keys that hold the data: "geometry" and "properties".  
      * "geometry" and "properties" are dictionaries with values that can be basic types, lists, dicts, etc.
      
The implications of this are:
  * If you want to iterate over the features you first need to grab the list out of the dictionary, e.g. `my_list_of_features = geojson['features']`.
  * Once you have the list of features, if you want to access a specific attribute or geometry, you need to index by position and then go back to dictionary style access, e.g. `my_list_of_features[1]['properties']['prop0']`.
  
My biggest suggestion when working with structured data like this is to experiment.  The iPython notebooks work great to do this.  Simply get the file into memory (using the example above) and start iterating.  If the file is long, remember that you can always add a `break` to a loop.
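As a rough sketch of that kind of experimentation, the cell below builds a tiny feature collection by hand (the city names and coordinates are invented, not taken from the assignment data), pulls out the list of features, and uses `break` to stop after looking at the first two.

```python
# A small hand-made FeatureCollection; names and coordinates are made up.
geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [-112.0, 33.5]},
         'properties': {'name': 'City A'}},
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [-110.9, 32.2]},
         'properties': {'name': 'City B'}},
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [-111.6, 35.2]},
         'properties': {'name': 'City C'}},
    ],
}

# Grab the list out of the dictionary, then iterate by position.
my_list_of_features = geojson['features']

inspected = 0
for feature in my_list_of_features:
    # Back to dictionary style access for each individual feature
    print(feature['properties']['name'], feature['geometry']['coordinates'])
    inspected += 1
    if inspected == 2:
        break  # Two features are enough to understand the structure
```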

## Formulas

Here are a few formulas to help with this week.

* Mean Center: $\frac{\sum_{i=1}^{n}x_{i}}{n},\frac{\sum_{i=1}^{n}y_{i}}{n}$, where $n$ is the number of points, $x_{i}$ is the X coordinate of point $i$, $y_{i}$ is the Y coordinate of point $i$, and $i$ is an index.
* Average Nearest Neighbor Distance: $\bar{D} = \frac{\sum_{i=1}^{n}d_{i}}{n}$, where $\bar{D}$ is the mean nearest neighbor distance, $d_{i}$ is the distance from the $i^{th}$ observation to its nearest neighbor among all other observations $j$ ($i \neq j$), and $n$ is the total number of observations.
* Expected Average Nearest Neighbor Distance: $\bar{E(d)} = \frac{1}{2}\sqrt{\frac{A}{n}}$, where $\bar{E(d)}$ is the expected average nearest neighbor distance, $A$ is the area (this statistic is very dependent upon this value), and $n$ is the total number of observations.  For those wondering why the statistic is so dependent upon $A$, recall edge effects from your spatial statistics course(s).
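To connect the notation to code, here is one possible sketch of the three formulas using plain `for` loops.  The points, the area, and all variable names are hypothetical, and this is only one of several reasonable ways to write it, not the required structure for the assignment.

```python
import math

# Hypothetical points as (x, y) tuples and a hypothetical study area A.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (6.0, 8.0)]
area = 64.0
n = len(points)

# Mean center: sum the x and y coordinates separately, then divide by n.
sum_x = 0
sum_y = 0
for x, y in points:
    sum_x += x
    sum_y += y
mean_center = (sum_x / n, sum_y / n)

# Average nearest neighbor distance: for each point i, find the smallest
# distance to any other point j (i != j), then average those distances.
total_nearest = 0
for i in range(n):
    nearest = None
    for j in range(n):
        if i == j:
            continue  # A point is not its own neighbor
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        distance = math.hypot(dx, dy)  # Euclidean distance
        if nearest is None or distance < nearest:
            nearest = distance
    total_nearest += nearest
mean_nearest = total_nearest / n

# Expected average nearest neighbor distance: 0.5 * sqrt(A / n)
expected_nearest = 0.5 * math.sqrt(area / n)

print('Mean center: ', mean_center)
print('Average nearest neighbor distance: ', mean_nearest)
print('Expected average nearest neighbor distance: ', expected_nearest)
```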

# Week 5 Deliverables (E4) - Due 2/23/16
For this week make sure that you have completed the following:
    
   
* Fork Assignment 4 to your own github repository.
    * You can access assignment 4 [HERE](https://github.com/Geospatial-Python/assignment_04)
* Clone the repository locally
* Make the necessary code changes to `point_pattern.py` so that tests are passing locally
    * Like last assignment, we are going to be working with point patterns.  The readings focused on iteration, sequences, and conditional execution.  We are going to use these concepts to write functions to:
        1. Read a geojson file
        2. Parse a geojson file to find the largest city by population
        3. Write your own code to do something interesting with the geojson
        4. Compute the mean center of a point pattern
        5. Compute the average distance between neighbors
        6. Compute the minimum bounding rectangle (MBR) on a point pattern
        7. Compute the area of a MBR
        8. Compute the expected mean distance for a given point pattern
        
* For #3, above, make sure to update `test/tests.py` with a passing test
* Submit a pull request to the Geospatial-Python Assignment 4 repository.

Any questions, please post on the discussion forum.