# If you don't have Python, you can use Repl.it:

# https://repl.it/languages/Python3

# or the binder associated with this repo, at
# https://github.com/jgarst/json-class

# What do we do when we program?

### Usually we are taking text information from another program, processing it, and emitting more text
#### * Take cells from a spreadsheet and produce a pdf
#### * Take temperature readings from a device, send an email if the readings are bad
#### * Take text descriptions of people and deliver a webpage

# Today we are going to write the first half of a program.
#### * Read data from an API
#### * Manipulate it
#### * ???
#### * Profit!

# Here is an API that gives us the name of every person currently in space.

### http://api.open-notify.org/astros.json

### This is JSON data, a common text-based specification for storing data
### We are going to read it in and convert it to native Python types.

## JSON is a lightweight specification (15 pages) for text-based data

## It describes:
| <span style="font-size: xx-large;">Element</span> | <span style="font-size: xx-large;">Syntax</span> |
| ------- | ------ |
| <span style="color:blue;font-size: xx-large;">strings</span> | <span style="font-size: xx-large;">"..."</span> |
| <span style="color:blue;font-size: xx-large;">booleans</span> | <span style="font-size: xx-large;">true/false</span> |
| <span style="color:blue;font-size: xx-large;">integers</span> | <span style="font-size: xx-large;">123</span> |
| <span style="color:blue;font-size: xx-large;">floats</span> | <span style="font-size: xx-large;">1.2 or 3.7E-5</span> |
| <span style="color:blue;font-size: xx-large;">nulls</span> | <span style="font-size: xx-large;">null</span> |
| <span style="color:red;font-size: xx-large;">Objects</span> | <span style="font-size: xx-large;">{ key1: value1, key2: value2}</span> |
| <span style="color:red;font-size: xx-large;">Array</span> | <span style="font-size: xx-large;">[ value1, value2, value3]</span> |

# Python also has these types

In [None]:
my_string = 'my favorite string'
my_boolean = True
my_int = 3
my_float = 4.5
my_null = None

# Python allows us to act on this data

In [None]:
print(5 + 5)
print(True or False)
print(3 / 2)
print('hello ' + 'noisebridge')

# We need to be careful to distinguish between type and value

###  555-5555 is a phone number, not a math concept, and should be stored as a string

In [None]:
# What is the difference between
print(5 + 5)
print('5' + '5')

# Trick Question!
# What happens if we add numbers and text?

In [None]:
print("5" + 5)

1. "55"
2. 55
3. "10"
4. 10
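
None of the four options is what Python does. A minimal sketch of what happens instead, and of the explicit conversions that state which operation is meant (the variable names here are illustrative only):

```python
# Adding a string and an integer raises an error instead of guessing.
try:
    result = "5" + 5
except TypeError as error:
    result = None
    print("TypeError:", error)

# Convert explicitly to say which operation is meant.
as_text = "5" + str(5)    # string concatenation
as_number = int("5") + 5  # integer addition
print(as_text)
print(as_number)
```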

# Python has a zen

In [None]:
import this

## Gene name errors are widespread in the scientific literature

Mark Ziemann, Yotam Eren and Assam El-Osta

https://doi.org/10.1186/s13059-016-1044-7
    
>The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating point numbers.  A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.

## Adding times (from Think Python)

If I leave my house at 6:52 am and run 1 mile at an easy pace (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at an easy pace again, what time do I get home for breakfast?

* Use the modulo operator (`%`).
* Print your answer in a human readable format.
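
One possible approach, as a sketch rather than the only answer: convert everything to seconds, add, then split the total back into hours, minutes, and seconds with floor division and modulo. All variable names are made up for this example.

```python
# Work in seconds so that every quantity has the same unit.
start = 6 * 3600 + 52 * 60   # 6:52 am as seconds since midnight
easy_pace = 8 * 60 + 15      # 8:15 per mile
tempo_pace = 7 * 60 + 12     # 7:12 per mile

run_time = 2 * easy_pace + 3 * tempo_pace  # two easy miles, three tempo miles
finish = start + run_time

hours = finish // 3600         # whole hours since midnight
minutes = finish % 3600 // 60  # whole minutes left over after the hours
seconds = finish % 60          # seconds left over after the minutes

arrival = f"{hours}:{minutes:02d}:{seconds:02d} am"
print(arrival)
```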

## Which of these statements are syntax errors, and why?

In [None]:
a = 5

In [None]:
5 = a

In [None]:
a, b = 5, 4
c = a + b

In [None]:
a, b = 5, 4
a + b = c

In [None]:
a = 5, 9
b = a + c
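
One way to check your answers is to ask Python to compile each cell without running it. This is a small sketch; `is_syntax_error` is a helper invented for this example, built on the built-in `compile` function.

```python
# The five cells above, as strings.
snippets = [
    "a = 5",
    "5 = a",
    "a, b = 5, 4\nc = a + b",
    "a, b = 5, 4\na + b = c",
    "a = 5, 9\nb = a + c",
]

def is_syntax_error(source):
    """Return True if Python rejects the source before running it."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return True
    return False

results = [is_syntax_error(source) for source in snippets]
for source, rejected in zip(snippets, results):
    print(repr(source), "-> syntax error" if rejected else "-> compiles")
```

Only the cells that assign to something other than a name are syntax errors. The last cell compiles, but run on its own it fails later with a `NameError`, because `c` was never defined. That is a runtime error, not a syntax error.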

# What about all those squiggly things?

In [None]:
# people currently in space, from http://api.open-notify.org/astros.json

{'number': 3,
 'people': [{'craft': 'ISS', 'name': 'Oleg Kononenko'},
            {'craft': 'ISS', 'name': 'David Saint-Jacques'},
            {'craft': 'ISS', 'name': 'Anne McClain'}]}

# JSON has two types of collections
# Python has four primary collections

| <span style="font-size: xx-large;">JSON</span> | <span style="text-align: center; font-size: xx-large;">Syntax</span> | <span style="font-size: xx-large;">Python</span>  |
| ------- |:------ | - |
| <span style="font-size: xx-large;">nulls</span> | <span style="font-size: xx-large;">null</span> | <span style="font-size: xx-large;">None</span> |
| <span style="font-size: xx-large;">Object</span> | <span style="font-size: xx-large;">{ key1: value1, key2: value2}</span> | <span style="font-size: xx-large;">Dictionary</span> |
| <span style="font-size: xx-large;">Array</span> | <span style="font-size: xx-large;">[ value1, value2, value3]</span> | <span style="font-size: xx-large;">List</span> |
| <span/> | <span style="font-size: xx-large;">( value1, value2, value3)</span> | <span style="font-size: xx-large;">Tuple</span> |
| <span/> | <span style="font-size: xx-large;">{ value1, value2, value3}</span> | <span style="font-size: xx-large;">Set</span> |


## List: [ , , ]

* A list is a sequence
* A list is mutable
* Think 'examples of X'

In [None]:
# Python collections support
# adding
# removing
# access
# membership
# iterating
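
A minimal sketch of those five operations on a list. The names and values are placeholders chosen for this example.

```python
crew = ['Oleg', 'David']

crew.append('Anne')         # adding to the end
crew.remove('David')        # removing by value
first = crew[0]             # access by position, counting from zero
is_aboard = 'Anne' in crew  # membership test

for name in crew:           # iterating, in order
    print(name)
```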

## Write code that computes the accumulation of a list

In [6]:
accumulate([1, 2, 3])

[1, 3, 6]
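
One possible solution, sketched with a plain loop and a running total. Try it yourself first; this is not the only way to write it.

```python
def accumulate(numbers):
    """Return the running totals of a list of numbers."""
    totals = []
    running = 0
    for number in numbers:
        running += number       # add the next number to the total so far
        totals.append(running)  # record the total at this position
    return totals

print(accumulate([1, 2, 3]))
```

The standard library also provides `itertools.accumulate`, which does the same job lazily.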

## Dictionary    { : , : }

* Unique Key Value Pairs
* Keys must be immutable
* Think 'I want to look this up later'

In [None]:
# add
# remove
# access
# membership
# iterating
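
A minimal sketch of the same operations on a dictionary. The variable names are illustrative, and the data is borrowed from the astronaut example.

```python
craft_by_name = {'Oleg Kononenko': 'ISS'}

craft_by_name['Anne McClain'] = 'ISS'     # adding a key and its value
del craft_by_name['Oleg Kononenko']       # removing by key
craft = craft_by_name['Anne McClain']     # access by key
known = 'Anne McClain' in craft_by_name   # membership tests the keys

for name, ship in craft_by_name.items():  # iterating over key/value pairs
    print(name, ship)
```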

## Write code that calculates the frequency of numbers in a list

In [8]:
count([1, 5, 5, 3, 2, 1])

{1: 2, 5: 2, 3: 1, 2: 1}
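
One possible solution, sketched with a dictionary that maps each number to how many times it has been seen so far:

```python
def count(numbers):
    """Return a dictionary mapping each number to how often it appears."""
    frequencies = {}
    for number in numbers:
        # get() supplies 0 the first time a number is seen
        frequencies[number] = frequencies.get(number, 0) + 1
    return frequencies

print(count([1, 5, 5, 3, 2, 1]))
```

The standard library's `collections.Counter` packages up this pattern.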

## Tuple    ( , , )

* Immutable sequence
* Think 'group of associated data'
* Also look up typing.NamedTuple

In [None]:
# add/remove
# access
# membership
# iterating
# sort order
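
A short sketch of how tuples behave for each of these. The values are arbitrary examples.

```python
point = (3, 4)

x = point[0]            # access by position
has_four = 4 in point   # membership
longer = point + (5,)   # "adding" builds a new tuple; point is unchanged

for coordinate in point:  # iterating
    print(coordinate)

# Tuples compare element by element, which gives them a natural sort order.
ordered = sorted([(2, 'b'), (1, 'z'), (1, 'a')])

# Changing a tuple in place is not allowed.
try:
    point[0] = 10
except TypeError as error:
    print("TypeError:", error)
```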

## Set    { , , }

* Unique Collection
* Can be used for unique counting or to test for membership

In [None]:
# for each
# membership
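
A minimal sketch of unique counting and membership testing with a set. The visitor names are invented for this example.

```python
visits = ['ann', 'bob', 'ann', 'cat', 'bob']

unique_visitors = set(visits)         # duplicates collapse
unique_count = len(unique_visitors)   # unique counting
seen_bob = 'bob' in unique_visitors   # membership test
seen_dan = 'dan' in unique_visitors

# Sets have no defined order, so sort when the order of output matters.
for visitor in sorted(unique_visitors):
    print(visitor)
```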

# This isn't much, but we can combine these bits to create complex representations of the world.

In [None]:
spreadsheet = [
    {'column1': 'row1-value1', 'column2': 'row1-value2'},
    {'column1': 'row2-value1', 'column2': 'row2-value2'}
]

### We can access data in nested structures through repeated square bracket access

In [None]:
spreadsheet = [
    {'column1': 'row1-value1', 'column2': 'row1-value2'},
    {'column1': 'row2-value1', 'column2': 'row2-value2'}
]

row = spreadsheet[1]
value = row['column2']

In [None]:
value = spreadsheet[1]['column2']

# Let's talk about manipulating JSON data in Python!

In [None]:
import requests
from pprint import pprint

# Fetch the API response and parse its JSON body into Python types
space_people = requests.get('http://api.open-notify.org/astros.json').json()

pprint(space_people)

In [None]:
space_people = {
    "people": [
        {"name": "Oleg Kononenko", "craft": "ISS"}, 
        {"name": "David Saint-Jacques", "craft": "ISS"}, 
        {"name": "Anne McClain", "craft": "ISS"},
    ], 
    "number": 3, 
}
pprint(space_people)

### 1. Access the nested collections and print the name of everyone in space
### 2. Programmatically add yourself to space!
#### Don't forget to update the number of people in space!
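
One possible solution, as a sketch. It starts from the hard-coded data above so that it runs without a network connection, and `'Your Name'` is a placeholder.

```python
space_people = {
    "people": [
        {"name": "Oleg Kononenko", "craft": "ISS"},
        {"name": "David Saint-Jacques", "craft": "ISS"},
        {"name": "Anne McClain", "craft": "ISS"},
    ],
    "number": 3,
}

# 1. The dictionary holds a list of dictionaries: loop over the list,
#    then look up 'name' in each person.
for person in space_people["people"]:
    print(person["name"])

# 2. Append a new person, then recompute the count from the list itself
#    so that the two can never disagree.
space_people["people"].append({"name": "Your Name", "craft": "ISS"})
space_people["number"] = len(space_people["people"])
```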

# These are internal to Python, what about the rest of the world?

# Writing and reading JSON
## Strings
* Input with json.loads
* Output with json.dumps

## Files
* Input with json.load
* Output with json.dump
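
A small round trip through both pairs of functions. This is a sketch: the file name `astronaut.json` and the temporary directory are choices made for this example.

```python
import json
import os
import tempfile

record = {"name": "Anne McClain", "craft": "ISS"}

# Strings: dumps produces text, loads parses text.
text = json.dumps(record)
from_text = json.loads(text)

# Files: dump writes to an open file, load reads from one.
path = os.path.join(tempfile.mkdtemp(), "astronaut.json")
with open(path, "w") as f:
    json.dump(record, f)
with open(path) as f:
    from_file = json.load(f)

print(text)
print(from_file)
```

The trailing `s` in `loads` and `dumps` stands for "string".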

In [None]:
import json
json.loads('{"people": [{"name": "Oleg Kononenko", "craft": "ISS"}, {"name": "David Saint-Jacques", "craft": "ISS"}, {"name": "Anne McClain", "craft": "ISS"}], "number": 3, "message": "success"}')

## What Doesn't JSON have?
* Comments
* Anything fancy.  Dates, links, colors, etc.
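
For example, the `json` module refuses to encode a date. A common workaround, sketched here with an arbitrary example date, is to store the date as an ISO 8601 string and convert it back after loading.

```python
import json
from datetime import date

event = {"craft": "ISS", "date": date(2019, 1, 15)}  # arbitrary example date

# JSON has no date type, so encoding a date object fails.
try:
    json.dumps(event)
    failed = False
except TypeError as error:
    failed = True
    print("TypeError:", error)

# Store the date as text instead, and convert it back after loading.
as_text = json.dumps({"craft": "ISS", "date": event["date"].isoformat()})
restored = json.loads(as_text)
restored_date = date.fromisoformat(restored["date"])
print(as_text)
```

The reader of the file has to know that the field holds a date, because the JSON itself only says it is a string.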

## Use TOML/YAML Instead

## TOML vs YAML is a good argument for minimal data formats
* YAML is 86 pages
* TOML is comparable to JSON in size
* Loading YAML is a security risk by default
* Lots of variation between parsers.  Lots of incomplete implementations.

## Things to watch out for: 


### Writing down something in JSON that can't be represented in the language.

In [None]:
# The extra digits do not fit in a Python float, so they are silently dropped
parsed = json.loads('{"number": 1.6000000000000000000001}')
print(parsed)
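
If those digits matter, one option is to ask `json.loads` to build `decimal.Decimal` values instead of floats, through its `parse_float` parameter. A minimal sketch:

```python
import json
from decimal import Decimal

text = '{"number": 1.6000000000000000000001}'

as_float = json.loads(text)                         # default: Python float
as_decimal = json.loads(text, parse_float=Decimal)  # keeps every digit

print(as_float["number"])
print(as_decimal["number"])
```

The trade-off is that `Decimal` arithmetic is slower than float arithmetic, and `json.dumps` will not encode a `Decimal` without extra help.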

## Things to watch out for:

### Encoding/Decoding can get expensive

## Let's write some text to disk!

In [None]:
# open and write

In [None]:
# file descriptors are finite

In [None]:
# errors are really hard to handle
# using each file only once is really hard

In [None]:
# Context managers!
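
A sketch of the progression in the cells above: closing a file by hand, then letting a `with` block do it. The file name `notes.txt` and the temporary directory are choices made for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# By hand: every open() needs a matching close(), even if writing fails.
# Forgetting it leaks a file descriptor, and the system only has so many.
f = open(path, "w")
try:
    f.write("first line\n")
finally:
    f.close()

# With a context manager: the file is closed when the block ends,
# whether it ends normally or with an error.
with open(path, "a") as f:
    f.write("second line\n")

closed_after_with = f.closed

with open(path) as f:
    contents = f.read()
print(contents)
```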

## There are a lot of fiddly bits to files, how do you look up the options and operations?

In [None]:
help(open)

## Python Context Managers help us avoid these sorts of mistakes

## How do you read a file line by line?

In [None]:
coconuts = 'coconuts.txt'
with open(coconuts, 'w') as f:
    f.write("I've got a lovely bunch of coconuts\n")
    f.write("There they are, all standing in a row\n")
    f.write("Big ones, small ones, some as big as your head\n")
    f.write("Give them a twist a flick of the wrist\n")
    f.write("That's what the showman said")
    
with open(coconuts, 'r') as f:
    for line in f:
        print(line, end='')  # each line already ends with a newline

# Takeaways!
* Look up file handling options with help, dir, and ?
* Avoid dangling files and other resources by using context managers (`with` statement)
* Use Python types and collections to unambiguously represent your data
* Look up David Beazley's talk "Built in Super Heroes" for more information on collections
* To make your software more robust, use text data
* Start with JSON, evolve as needed