# If you don't have Python, you can use ReplIt:

# https://repl.it/languages/Python3

# What are these type things, and why do we need them?

# Types determine representation and behavior
* CSV files don't have types
* Bash (shell) scripts don't have types

## Gene name errors are widespread in the scientific literature

Mark Ziemann, Yotam Eren and Assam El-Osta

https://doi.org/10.1186/s13059-016-1044-7
    
>The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating point numbers.  A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.

## List: [ , , ]

* Mutable sequence
* Think 'examples of X'

## Tuple    ( , , )

* Immutable sequence
* Think 'group of associated data'
* Also look up typing.NamedTuple

## Dictionary    { : , : }

* Unique Key Value Pairs
* Keys must be immutable
* Think 'I want to look this up later'

## Set    { , , }

* Unique Collection
* Can be used for unique counting or to test for membership
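
As a rough illustration of the four collections above, here is a minimal sketch. The variable names are made up for this example, and the values reuse the astronaut data shown in the next slide.

```python
# List: a mutable sequence, here "examples of crew members".
crew = ["Oleg Kononenko", "David Saint-Jacques"]
crew.append("Anne McClain")

# Tuple: an immutable group of associated data (name, craft).
astronaut = ("Anne McClain", "ISS")
name, craft = astronaut

# Dictionary: unique keys mapped to values, for looking things up later.
craft_by_name = {"Oleg Kononenko": "ISS", "Anne McClain": "ISS"}

# Set: a unique collection; duplicates collapse into one element.
crafts = {"ISS", "ISS", "ISS"}

print(len(crew))                       # number of crew members in the list
print(name, craft)                     # values unpacked from the tuple
print(craft_by_name["Anne McClain"])   # lookup by key
print(len(crafts), "ISS" in crafts)    # unique count and membership test
```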

# Combine these collections to represent the world

In [2]:
# people currently in space, from http://api.open-notify.org/astros.json

space_people = {
    "people": [
        {"name": "Oleg Kononenko", "craft": "ISS"}, 
        {"name": "David Saint-Jacques", "craft": "ISS"}, 
        {"name": "Anne McClain", "craft": "ISS"}
    ], 
    "number": 3, 
}

## 1. Copy this data from the url, then use it to print the name of everyone in space.
## 2. Programmatically add yourself to space!

# These are internal to Python, what about the rest of the world?

## JSON is a lightweight specification (15 pages) for text-based data

## It describes:
| <span style="font-size: xx-large;">Element</span> | <span style="font-size: xx-large;">Syntax</span> |   |
| ------- | ------ | - |
| <span style="color:blue;font-size: xx-large;">strings</span> | <span style="font-size: xx-large;">"..."</span> | |
| <span style="color:blue;font-size: xx-large;">booleans</span> | <span style="font-size: xx-large;">true/false</span> | |
| <span style="color:blue;font-size: xx-large;">integers</span> | <span style="font-size: xx-large;">123</span> | |
| <span style="color:blue;font-size: xx-large;">floats</span> | <span style="font-size: xx-large;">1.2 or 3.7E-5</span> | |
| <span style="color:blue;font-size: xx-large;">nulls</span> | <span style="font-size: xx-large;">null</span> | <span style="font-size: xx-large;">called None in Python</span> |
| <span style="color:red;font-size: xx-large;">Objects</span> | <span style="font-size: xx-large;">{ "key1": value1, "key2": value2 }</span> | <span style="font-size: xx-large;">called dictionary in Python; keys must be strings</span> |
| <span style="color:red;font-size: xx-large;">Array</span> | <span style="font-size: xx-large;">[ value1, value2, value3]</span> | <span style="font-size: xx-large;">called list in Python</span> |
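
To see how the table maps onto Python types, the sketch below decodes a small made-up document with the standard library `json` module and prints the Python type of each value. The key names are hypothetical and chosen only to mirror the table rows.

```python
import json

# Hypothetical document that uses every JSON element from the table.
text = """
{
  "string": "hello",
  "boolean": true,
  "integer": 123,
  "float": 3.7E-5,
  "null": null,
  "object": {"key1": "value1"},
  "array": [1, 2, 3]
}
"""

data = json.loads(text)

# Show which Python type each JSON element was decoded into.
for key, value in data.items():
    print(key, type(value).__name__, repr(value))
```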

## Describe the difference between "123" and 123 in a few sentences.

## Why doesn't JSON provide Set and Tuple collections?

# Why use JSON?

## Text data is much more robust than binary data
* Self documenting
* Much much easier to debug
* Easier to version
* Cost doesn't matter for small data
* Most data is small data

## Having ambiguous data suggests unlimited work.
* There is always another date format
* It is hard to guess what format other tools expect

## Poorly specified formats are hard to implement
* Many CSV parsers fail when values contain commas or newlines
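
A small sketch of that failure mode, using made-up data: splitting on commas by hand breaks a quoted field apart, while Python's standard `csv` module honours the quoting.

```python
import csv
import io

# Hypothetical two-line CSV whose quoted fields contain commas.
raw = 'name,quote\n"Smith, Jane","Hello, world"\n'

# Naive parsing: split every line on commas, ignoring the quotes.
naive = [line.split(",") for line in raw.splitlines()]

# The csv module understands quoted fields.
parsed = list(csv.reader(io.StringIO(raw)))

print(len(naive[1]), naive[1])    # the naive split yields too many fields
print(len(parsed[1]), parsed[1])  # the csv module keeps two fields
```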

# Writing and reading JSON
## Strings
* Input with json.loads
* Output with json.dumps

## Files
* Input with json.load
* Output with json.dump
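
The four functions can be tried together in a short round trip. This is a minimal sketch: the file name `astronaut.json` and the temporary directory are assumptions made for the example.

```python
import json
import os
import tempfile

record = {"name": "Anne McClain", "craft": "ISS"}

# Strings: dumps encodes to a str, loads decodes a str.
text = json.dumps(record)
from_text = json.loads(text)

# Files: dump writes to an open file object, load reads from one.
path = os.path.join(tempfile.mkdtemp(), "astronaut.json")
with open(path, "w") as f:
    json.dump(record, f, indent=2)
with open(path) as f:
    from_file = json.load(f)

print(text)
print(from_text == record, from_file == record)
```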

# What happens when you encode something that cannot be represented in JSON?

## What doesn't JSON have?
* Comments
* Anything fancy.  Dates, links, colors, etc.
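
One way to explore this is to hand `json.dumps` a set and a date. In the sketch below the event data and the `to_jsonable` helper are hypothetical; `default=` is the standard hook that `json.dumps` calls for objects it cannot encode on its own.

```python
import json
from datetime import date

# Hypothetical data with a set, a date, and a tuple.
event = {"tags": {"iss"}, "when": date(2019, 1, 1), "pair": ("a", "b")}

try:
    json.dumps(event)
except TypeError as error:
    print("cannot encode:", error)

def to_jsonable(value):
    """Hypothetical fallback, called only for objects json cannot encode."""
    if isinstance(value, (set, frozenset)):
        return sorted(value)          # a set becomes a sorted array
    if isinstance(value, date):
        return value.isoformat()      # a date becomes a plain string
    raise TypeError(f"Cannot encode {type(value).__name__}")

text = json.dumps(event, default=to_jsonable)
round_trip = json.loads(text)

print(text)
print(round_trip)
```

Note that the round trip does not restore the original types: the set and the tuple come back as lists and the date comes back as a string, so the reader has to know what each field was meant to be.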

## Use TOML/YAML Instead

## TOML vs YAML is a good argument for minimal data formats
* YAML is 86 pages
* TOML is comparable to JSON in size
* Loading YAML from untrusted sources can be a security risk unless you use a safe loader (for example, PyYAML's `yaml.safe_load`)
* Lots of variation between parsers, and many incomplete implementations

## Things to watch out for: 


### Writing something in JSON that the language reading it can't represent exactly.

In [15]:
import json

# The literal has more digits than a Python float can hold, so it is rounded.
l = json.loads('{"number": 1.6000000000000000000001}')
print(l)

{'number': 1.6}
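
If the extra digits matter, one option is to ask the decoder for `decimal.Decimal` values instead of floats. The sketch below uses the `parse_float` parameter of `json.loads` from the standard library; whether `Decimal` is the right choice depends on what the rest of your program expects.

```python
import json
from decimal import Decimal

text = '{"number": 1.6000000000000000000001}'

# Default decoding rounds to the nearest float.
as_float = json.loads(text)["number"]

# parse_float receives the original digits as a string.
as_decimal = json.loads(text, parse_float=Decimal)["number"]

print(as_float)
print(as_decimal)
```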


## Things to watch out for:

### Encoding/Decoding can get expensive

## Let's write some text to disk!

In [1]:
# https://repl.it/languages/Python3

f = open('input.txt', mode='w')
f.write("Hello World!")
f.close()

f = open('input.txt', mode='r')
print(f.read())
f.close()

Hello World!


## There are a lot of fiddly bits to files, how do you look up the options and operations?

## What do you expect to happen if you read after closing?
## What do you expect to happen if you open many files without closing them?
## What happens if an error occurs while you are reading?

## Python Context Managers help us avoid these sorts of mistakes

In [4]:
with open('input.txt') as f:
    print(f.read())
    
print(f.closed)

Hello World!
True
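
For comparison, here is roughly what the `with` statement saves us from writing by hand. This is a sketch rather than the exact expansion, and it writes its own `input.txt` in a temporary directory so that it runs on its own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as f:
    f.write("Hello World!")

# Manual version: close the file even if read() raises an error.
f = open(path)
try:
    contents = f.read()
finally:
    f.close()

print(contents)
print(f.closed)

# Reading after closing raises ValueError.
try:
    f.read()
except ValueError as error:
    closed_error = error
    print("read after close failed:", error)
```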


## What do you expect to happen if you read twice?

In [5]:
with open('input.txt') as f:
    print(f.read())
    print(f.read())


Hello World!



## Files contain a position that advances as you read and write.
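
The position can be inspected with `tell` and moved with `seek`. A minimal sketch, assuming a small text file created just for this example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as f:
    f.write("Hello World!")

with open(path) as f:
    start = f.tell()    # position before reading anything
    first = f.read()    # reads to the end of the file
    second = f.read()   # nothing left, so this is an empty string
    f.seek(0)           # move back to the beginning
    third = f.read()    # the whole file again

print(start)
print(repr(first), repr(second), repr(third))
```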

## How do you read a file line by line?

In [6]:
coconuts = 'coconuts.txt'
with open(coconuts, 'w') as f:
    f.write("I've got a lovely bunch of coconuts\n")
    f.write("There they are, all standing in a row\n")
    f.write("Big ones, small ones, some as big as your head\n")
    f.write("Give them a twist a flick of the wrist\n")
    f.write("That's what the showman said")
    
with open(coconuts, 'r') as f:
    for line in f:
        print(line)

I've got a lovely bunch of coconuts

There they are, all standing in a row

Big ones, small ones, some as big as your head

Give them a twist a flick of the wrist

That's what the showman said
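
The blank lines in the output appear because each line read from the file keeps its trailing `\n` and `print` adds another. One way to avoid that, sketched here with a shortened copy of the file, is to strip the newline before printing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "coconuts.txt")
with open(path, "w") as f:
    f.write("I've got a lovely bunch of coconuts\n")
    f.write("There they are, all standing in a row\n")

with open(path) as f:
    # Each line still ends with "\n", so remove it before printing.
    lines = [line.rstrip("\n") for line in f]

for line in lines:
    print(line)
```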


# Takeaways!
* Look up file handling options with help, dir, and ?
* Avoid dangling files and other resources by using context managers (`with` statement)
* Use Python types and collections to unambiguously represent your data
* Look up David Beazley's talk "Built-in Super Heroes" for more information on collections
* To make your software more robust, use text data
* Start with JSON, evolve as needed

# Challenge! 
# Implement functools.lru_cache