# Python 

There is a famous document, [PEP 8](https://www.python.org/dev/peps/pep-0008/), that lays out guidelines for how to style your code. You should read it, but here are some highlights:

* Use 4-space indentation, and no tabs.
    * 4 spaces are a good compromise between small indentation (allows greater nesting depth) and large indentation (easier to read). Tabs introduce confusion, and are best left out.
* Wrap lines so that they don’t exceed 79 characters.
    * This helps users with small displays and makes it possible to have several code files side-by-side on larger displays.
* Use blank lines to separate functions and classes, and larger blocks of code inside functions.
* When possible, put comments on a line of their own.
* Use docstrings.
* Use spaces around operators and after commas, but not directly inside bracketing constructs: `a = f(1, 2) + g(3, 4)`.
* Name your functions consistently; the convention is to use `lower_case_with_underscores` for functions and methods.
* Don’t use fancy encodings if your code is meant to be used in international environments. Python’s default, UTF-8, or even plain ASCII work best in any case.
* Likewise, don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code.
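
As a rough illustration of several of these guidelines working together, here is a small sketch. The function name `average_score` and the numbers are made up for this example; they do not come from PEP 8.

```python
def average_score(scores):
    """Return the mean of a list of numeric scores."""
    # Comments sit on their own line; the body uses 4-space indentation
    total = sum(scores)
    return total / len(scores)


# Spaces around operators and after commas, none just inside the brackets
result = average_score([90, 80, 70]) + 1
print(result)
```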

## String Methods

* Strings have a bunch of *methods* that can be used to manipulate the text contents of the string
* See the [Python documentation](https://docs.python.org/3.5/library/stdtypes.html#string-methods) for a full list of string methods

In [3]:
# use the lowercase method on a string literal
"The Batman".lower()

'the batman'

In [11]:
# Put a string in a variable
superhero = "The Batman"

superhero.upper()

'THE BATMAN'

* The `split` method is very handy

In [14]:
# Make a longer string and split it
superhero = "The Batman bit the frog; he died of poison"
superhero.split()

['The', 'Batman', 'bit', 'the', 'frog;', 'he', 'died', 'of', 'poison']

In [15]:
# Split on a semicolon instead of a space
superhero.split(";")

['The Batman bit the frog', ' he died of poison']
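
* Notice that the second piece keeps its leading space. One way to tidy that up, sketched here with the same sentence, is to call `strip` on the piece. The variable names `first`, `second`, and `second_clean` are just for illustration.

```python
superhero = "The Batman bit the frog; he died of poison"

# Split on the semicolon, which gives exactly two pieces for this sentence
first, second = superhero.split(";")

# strip() removes whitespace from both ends of a string
second_clean = second.strip()
print(repr(second))
print(repr(second_clean))
```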

In [16]:
# Check whether every character in the string is a digit
"50000".isdigit()

True

In [18]:
# Doesn't work for money
"$50000".isdigit()

False
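
* If you do need to handle money strings, one possible approach is to remove the symbols first and then check. This is only a sketch: it assumes the only extra characters are a dollar sign and commas, and the value `"$50,000"` is made up.

```python
price = "$50,000"

# Remove the currency symbol and thousands separators before checking
digits_only = price.replace("$", "").replace(",", "")
print(digits_only.isdigit())

# Once we know it is all digits, converting to an integer is safe
amount = int(digits_only)
print(amount)
```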

* String formatting is a *super* useful way to programmatically create strings
* This is a very powerful system, so definitely [check out the documentation](https://docs.python.org/3.5/library/string.html#formatstrings)

In [27]:
# create a string template
template_string = "My name is {}"

In [28]:
name = "Dr. Strange"
template_string.format(name)

'My name is Dr. Strange'

In [30]:
template_string = "Oh we are using made up names. Hello, {you}! My name is {me}"
my_name = "Spiderman"

In [31]:
template_string.format(me=my_name, you=name)

'Oh we are using made up names. Hello, Dr. Strange! My name is Spiderman'
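
* Beyond filling in names, `format` fields can be numbered and can carry a format spec after a colon. The sketch below shows both; the item name and price are invented for the example.

```python
# Numbered fields can be reordered or repeated
repeated = "{0} and {1}, then {0} again".format("Batman", "Robin")
print(repeated)

# A format spec after the colon controls how a value is rendered:
# .2f means fixed-point with two decimals, >8 right-aligns in 8 characters
price_line = "{item}: {price:>8.2f}".format(item="Batarang", price=19.5)
print(price_line)
```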

* Use triple quotes to make strings with newlines

In [19]:
# Make a Frosty string and split on the lines
multiline_example = """Nature’s first green is gold,
Her hardest hue to hold.
Her early leaf’s a flower;
But only so an hour.
Then leaf subsides to leaf,
So Eden sank to grief,
So dawn goes down to day
Nothing gold can stay."""

multiline_example.splitlines()

['Nature’s first green is gold,',
 'Her hardest hue to hold.',
 'Her early leaf’s a flower;',
 'But only so an hour.',
 'Then leaf subsides to leaf,',
 'So Eden sank to grief,',
 'So dawn goes down to day',
 'Nothing gold can stay.']

* What if we wanted to split the lines and then split the words?

In [20]:
multiline_example.splitlines().split()

AttributeError: 'list' object has no attribute 'split'

---

## List Comprehensions

* [List comprehensions](https://docs.python.org/3.5/tutorial/datastructures.html#list-comprehensions) provide a concise way to create lists.
* If you find yourself creating lists by performing operations in a `for` loop, a list comprehension is a more *pythonic* way to do this
* For example, squaring all the values in a list

In [1]:
# Create a list of squares
# Code from Python documentation
squares = []
for x in range(10):
    squares.append(x**2)

squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

* Here is the same problem solved with a list comprehension

In [2]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

* If you find yourself creating an empty list, looping, and appending to that list then you might consider a list comprehension
* A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. 

In [7]:
[(x, y)
 for x in ["Bruce", "Tony", "The"]
 for y in ["Wayne", "Stark", "Joker"]
 if x != y]

[('Bruce', 'Wayne'),
 ('Bruce', 'Stark'),
 ('Bruce', 'Joker'),
 ('Tony', 'Wayne'),
 ('Tony', 'Stark'),
 ('Tony', 'Joker'),
 ('The', 'Wayne'),
 ('The', 'Stark'),
 ('The', 'Joker')]

* This is equivalent to 

In [8]:
# combine these two lists
combs = []
for x in ["Bruce", "Tony", "The"]:
    for y in ["Wayne", "Stark", "Joker"]:
        if x != y:
            combs.append((x, y))

combs

[('Bruce', 'Wayne'),
 ('Bruce', 'Stark'),
 ('Bruce', 'Joker'),
 ('Tony', 'Wayne'),
 ('Tony', 'Stark'),
 ('Tony', 'Joker'),
 ('The', 'Wayne'),
 ('The', 'Stark'),
 ('The', 'Joker')]

* You can call functions in list comprehensions to do more powerful processing

In [12]:
list_of_strings = ['Bruce WAYNE', "  The JOKER", "   ThAnOS   "]

def clean_string(to_clean):
    return to_clean.strip().lower()

cleaned = [clean_string(x) for x in list_of_strings]
cleaned

['bruce wayne', 'the joker', 'thanos']

* And this works with string methods, so now we can split the Frost poem

In [21]:
# Make a Frosty string and split on the lines
multiline_example = """Nature’s first green is gold,
Her hardest hue to hold.
Her early leaf’s a flower;
But only so an hour.
Then leaf subsides to leaf,
So Eden sank to grief,
So dawn goes down to day
Nothing gold can stay."""

tokenized = [line.split() for line in multiline_example.splitlines()]
tokenized

[['Nature’s', 'first', 'green', 'is', 'gold,'],
 ['Her', 'hardest', 'hue', 'to', 'hold.'],
 ['Her', 'early', 'leaf’s', 'a', 'flower;'],
 ['But', 'only', 'so', 'an', 'hour.'],
 ['Then', 'leaf', 'subsides', 'to', 'leaf,'],
 ['So', 'Eden', 'sank', 'to', 'grief,'],
 ['So', 'dawn', 'goes', 'down', 'to', 'day'],
 ['Nothing', 'gold', 'can', 'stay.']]
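
* The extra `for` and `if` clauses mentioned earlier are handy with nested lists like this one. As a sketch, using just the last two tokenized lines to keep it short, two `for` clauses flatten the list of lists and an `if` clause filters the result. The names `words` and `long_words` are illustrative.

```python
tokenized = [
    ["So", "dawn", "goes", "down", "to", "day"],
    ["Nothing", "gold", "can", "stay."],
]

# Two for clauses: the outer loop is written first, the inner loop second
words = [word for line in tokenized for word in line]
print(words)

# An if clause filters; keep only words longer than four characters
long_words = [word for word in words if len(word) > 4]
print(long_words)
```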

* You can also do dictionary comprehensions

In [16]:
# Create a dictionary of squares from a list of numbers
square_lookup = {x: x**2 for x in [2, 4, 6]}
print(square_lookup[2])
square_lookup

4


{2: 4, 4: 16, 6: 36}
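
* Dictionary comprehensions accept the same `if` clause as list comprehensions. A small sketch with made-up names:

```python
heroes = ["Batman", "Spiderman", "Thor"]

# Map each name to its length
name_lengths = {name: len(name) for name in heroes}
print(name_lengths)

# Build a second dictionary from the first, keeping only the longer names
long_names = {name: n for name, n in name_lengths.items() if n > 5}
print(long_names)
```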

## Working with Modules

* Python's [standard library](https://docs.python.org/3/library/) is very comprehensive 
    * Interact with your operating system with `os`
    * Work with emails using `email`
    * Run a web server with `http.server`
* To import a module, use the `import` statement; this loads the module into memory
    * Use the syntax `import <module name> as <arbitrary name>` to use a different name
* The same statement is also used to import 3rd-party libraries
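
Here is a short sketch of the two import forms side by side. The alias `dt` is a common choice but any name works, and the date is arbitrary.

```python
# Plain import: refer to the module by its full name
import datetime

# Aliased import: the same module under a shorter name
import datetime as dt

# Both names point at the same module object
print(dt is datetime)

release = dt.date(2008, 7, 18)
print(release.isoformat())
```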

In [35]:
import datetime

In [33]:
import pathlib

In [None]:
# Put the cursor next to the period and hit tab
pathlib.

In [None]:
# The Path object is most useful

In [34]:
pathlib.Path(".")

PosixPath('.')

---

## Traversing the Filesystem

* Pathlib
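
As a minimal sketch of what traversal with `pathlib` can look like, the example below builds a throwaway directory so it does not depend on your own files. The file and folder names are made up; `iterdir` and `rglob` are methods of `Path`.

```python
import pathlib
import tempfile

# Build a temporary directory tree to explore
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "diabetes.csv").write_text("id,chol\n")
    (root / "notes.txt").write_text("hello\n")

    # iterdir() lists the immediate children of a directory
    top_level = sorted(p.name for p in root.iterdir())
    print(top_level)

    # rglob() searches recursively for names matching a pattern
    csv_files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.csv")
    )
    print(csv_files)
```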

---

## Working With CSV Files

* CSV (comma-separated values) files store tabular data. Think of them as very simple spreadsheets, like Excel, except that the content is stored as plain text.
* Python has a CSV parser as part of the standard library
* To parse CSV files, we use the `csv` module.
* The csv module provides a number of built-in functions to make it easier to parse and iterate through CSV files.
 

In [None]:
#  load the CSV module 
import csv

* Now we need to tell Python to open `diabetes.csv` and to treat the resulting `diabetes_file` handle as a CSV file.
* We do that by calling the `reader()` function of the `csv` module

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)

* At this point, the entire CSV file is treated as a table - a collection of rows and columns
* We can iterate (loop) through this table and get access to each individual row, just like reading a file line by line
* But the `csv` module automatically splits each row into separate values!

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)
    
    # loop over the file and print the row contents 
    for row in diabetes_data:
        print(row)
    

* You probably noticed that the row variable is just a list - it is a list of values contained in each column.
* You can access individual columns exactly the same way you would access values in a list.
* For example, the value of cholesterol is in a column called 'chol', which is the second column and therefore has the index of 1

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)
    
    # loop over the file and print the row contents 
    for row in diabetes_data:
        print(row[1]) # print only the values for the chol column

* You probably also noticed that the first row does not contain data - it's just the column headers
* In order for us to do any mathematical or statistical operations on the data, we need to EXCLUDE the header
* We have to skip the header row. We can do this by calling the `next()` function on the CSV reader, which pulls the header row off before the loop starts

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)

    # use next to skip the header row
    headers = next(diabetes_data)
    print(headers)

    # loop over the remaining rows of the file
    for row in diabetes_data:
    for row in diabetes_data:
        print(row[1]) # print only the values for the chol column
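
* With the header out of the way, the remaining values can be used for arithmetic. The sketch below computes a mean cholesterol value. It uses a small in-memory sample instead of the real `diabetes.csv`; the `id` and `age` columns and all the numbers are invented, and only the `chol` column in second position is taken from the description above. It also shows `csv.DictReader`, which reads the header itself and lets you look up columns by name.

```python
import csv
import io

# Made-up sample standing in for diabetes.csv (values are illustrative only)
sample = io.StringIO("id,chol,age\n1,203,46\n2,165,29\n3,228,58\n")

diabetes_data = csv.reader(sample)

# Pull the header row off the reader, not the underlying file
headers = next(diabetes_data)
chol_index = headers.index("chol")

# The csv module gives us strings, so convert before doing arithmetic
chol_values = [float(row[chol_index]) for row in diabetes_data]
mean_chol = sum(chol_values) / len(chol_values)
print(headers)
print(mean_chol)

# Alternative: DictReader consumes the header and keys each row by column name
sample.seek(0)
by_name = [float(row["chol"]) for row in csv.DictReader(sample)]
print(by_name)
```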
