# Python 

There is a famous document, [PEP 8](https://www.python.org/dev/peps/pep-0008/), that lays out some guidelines for how to style your code. You should read it, but here are some highlights:

* Use 4-space indentation, and no tabs.
    * 4 spaces are a good compromise between small indentation (allows greater nesting depth) and large indentation (easier to read). Tabs introduce confusion, and are best left out.
* Wrap lines so that they don’t exceed 79 characters.
    * This helps users with small displays and makes it possible to have several code files side-by-side on larger displays.
* Use blank lines to separate functions and classes, and larger blocks of code inside functions.
* When possible, put comments on a line of their own.
* Use docstrings.
* Use spaces around operators and after commas, but not directly inside bracketing constructs: `a = f(1, 2) + g(3, 4)`.
* Name your functions consistently; the convention is to use `lower_case_with_underscores` for functions and methods.
* Don’t use fancy encodings if your code is meant to be used in international environments. Python’s default, UTF-8, or even plain ASCII work best in any case.
* Likewise, don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code.
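
* As a rough illustration, here is a small sketch that follows the highlights above: 4-space indentation, a docstring, comments on their own lines, spaces around operators, and a `lower_case_with_underscores` name.
* The function `mean_of_squares` and its `default` parameter are made up for this example; they do not come from PEP 8.

```python
def mean_of_squares(values, default=0.0):
    """Return the mean of the squared values, or `default` for empty input."""
    if not values:
        return default

    # Spaces around operators and after commas, none directly inside brackets
    total = sum(value ** 2 for value in values)
    return total / len(values)


result = mean_of_squares([1, 2, 3])
print(result)
```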

## String Methods

* Strings have a bunch of *methods* that can be used to manipulate the text contents of the string
* See the [Python documentation](https://docs.python.org/3.5/library/stdtypes.html#string-methods) for a full list of string methods

In [None]:
# use the lowercase method on a string literal
"The Batman".lower()

In [None]:
# Put a string in a variable
superhero = "The Batman"

superhero.upper()

* The `split` method is very handy

In [None]:
# Make a longer string and split it
superhero = "The Batman bit the frog; he died of poison"
superhero.split()

In [None]:
# Split on a semicolon instead of a space
superhero.split(";")

In [None]:
# Check to see if every character in the string is a digit
"50000".isdigit()

In [None]:
# Doesn't work for money
"$50000".isdigit()
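
* One possible workaround, sketched under the assumption that amounts look like `$50000` or `$50,000`, is to strip the extra characters before calling `isdigit()`.
* The helper name `is_whole_dollar_amount` is hypothetical, and this sketch deliberately rejects amounts with cents.

```python
def is_whole_dollar_amount(text):
    """Return True for strings such as "$50000" or "$50,000"."""
    # Drop surrounding whitespace, leading dollar signs, and thousands separators
    cleaned = text.strip().lstrip("$").replace(",", "")
    # An empty string is not a number, and isdigit() returns False for it
    return cleaned.isdigit()


print(is_whole_dollar_amount("$50000"))
print(is_whole_dollar_amount("$50,000"))
print(is_whole_dollar_amount("fifty"))
```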

* String formatting is a *super* useful way to programmatically create strings
* This is a very powerful system, so definitely [check out the documentation](https://docs.python.org/3.5/library/string.html#formatstrings)

In [None]:
# create a string template
template_string = "My name is {}"

In [None]:
name = "Dr. Strange"
template_string.format(name)

In [None]:
template_string = "Oh we are using made up names. Hello, {you}! My name is {me}"
my_name = "Spiderman"

In [None]:
template_string.format(me=my_name, you=name)
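
* Part of what makes the format system powerful is the format specification that can follow a colon inside the braces. Here is a small sketch; the item, price, and count values are invented for illustration.

```python
# <10 left-aligns in 10 characters, >8.2f right-aligns a float with 2 decimals,
# and 03d pads an integer with zeros to 3 digits
price_template = "{item:<10}|{price:>8.2f}|{count:03d}"
line = price_template.format(item="Batarang", price=19.5, count=7)
print(line)  # Batarang  |   19.50|007

# A comma in the format spec adds thousands separators
salary = "{:,}".format(50000)
print(salary)  # 50,000
```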

* Use triple quotes to make strings with newlines

In [None]:
# Make a Frosty string and split on the lines
multiline_example = """Nature’s first green is gold,
Her hardest hue to hold.
Her early leaf’s a flower;
But only so an hour.
Then leaf subsides to leaf,
So Eden sank to grief,
So dawn goes down to day
Nothing gold can stay."""

multiline_example.splitlines()

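* What if we wanted to split the lines and then split the words?

In [None]:
# This raises an AttributeError: splitlines() returns a list,
# and lists do not have a split() method
multiline_example.splitlines().split()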

---

## List Comprehensions

* [List comprehensions](https://docs.python.org/3.5/tutorial/datastructures.html#list-comprehensions) provide a concise way to create lists.
* If you find yourself creating lists by performing operations in a `for` loop, a list comprehension is a more *pythonic* way to do this
* For example, squaring all the values in a list

In [None]:
# Create a list of squares
# Code from Python documentation
squares = []
for x in range(10):
    squares.append(x**2)

squares

* Here is the same problem solved with a list comprehension

In [None]:
squares = [x**2 for x in range(10)]
squares

* If you find yourself creating an empty list, looping, and appending to that list then you might consider a list comprehension
* A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. 

In [None]:
[(x, y) for x in ["Bruce","Tony","The"] for y in ["Wayne", "Stark","Joker"] if x != y]

* This is equivalent to 

In [None]:
# combine these two lists
combinations = []
for x in ["Bruce","Tony","The"]:
    for y in ["Wayne", "Stark","Joker"]:
        if x != y:
            combinations.append((x, y))

combinations
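
* The `if` clause also works with a single `for` clause, where it acts as a filter. A minimal sketch, with example numbers chosen only for illustration:

```python
# Keep only the even numbers, then square them
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# The equivalent loop, for comparison
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

print(even_squares)  # [0, 4, 16, 36, 64]
```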

* You can call functions in list comprehensions to do more powerful processing

In [None]:
list_of_strings = ['Bruce WAYNE', "  The JOKER", "   ThAnOS   "]

def clean_string(to_clean):
    return to_clean.strip().lower()

cleaned = [clean_string(x) for x in list_of_strings]
cleaned

* And this works with string methods, so now we can split the Frost poem

In [None]:
# Make a Frosty string and split on the lines
multiline_example = """Nature’s first green is gold,
Her hardest hue to hold.
Her early leaf’s a flower;
But only so an hour.
Then leaf subsides to leaf,
So Eden sank to grief,
So dawn goes down to day
Nothing gold can stay."""

tokenized = [line.split() for line in multiline_example.splitlines()]
tokenized

* You can also do dictionary comprehensions

In [None]:
# Create a dictionary of squares from a list of numbers
square_lookup = {x: x**2 for x in [2, 4, 6]}
print(square_lookup[2])
square_lookup
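
* Dictionary comprehensions accept an `if` clause too, and they can be built from another dictionary's items. A small sketch using made-up names:

```python
heroes = ["Bruce", "Tony", "Thanos"]

# Map each name to its length
name_lengths = {name: len(name) for name in heroes}

# Keep only the entries whose name is longer than 4 characters
long_names = {name: length for name, length in name_lengths.items() if length > 4}

print(name_lengths)
print(long_names)
```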

---

## Traversing the Filesystem

* [Pathlib](https://docs.python.org/3/library/pathlib.html) is a relatively recent addition to the Python standard library (added in Python 3.4). It provides some very handy features for working with files and moving around the filesystem
    * Real Python has [a nice introduction to Pathlib](https://realpython.com/python-pathlib/)
* Instead of strings, Pathlib makes path and file objects and provides methods for interacting with those objects
* What is nice about Pathlib is that it is operating-system agnostic, which means the same code will work on Windows, Mac, or Linux.
    * Hardcoded path strings often break on Windows, which uses backslashes rather than forward slashes as path separators
* I recommend using `pathlib` over `os`, `glob`, and `shutil`

In [None]:
# import the Path object from pathlib
from pathlib import Path


In [None]:
# get the current working directory
Path.cwd()

In [None]:
# Get the parent directory
Path.cwd().parent

In [None]:
# Make a path to the extra pandas content
# save to a variable so we can reuse it
extra_content_path = Path.cwd().parent.joinpath("day-two","extra-pandas")
extra_content_path

In [None]:
# iterdir gives us a generator 
# so we have to iterate to see the contents
for file in extra_content_path.iterdir():
    # print the path
    print(file)

* See how `pathlib` replaces `os.path`? It can also function like the `glob` module
* What if we just wanted to see the csv files in the extra pandas content directory?

In [None]:
for file in extra_content_path.glob("*.csv"):
    print(file)

* We can also get additional information about those files

In [None]:
# loop over the csv files in the dir
for file in extra_content_path.glob("*.csv"):
    print("Full Path:", file)
    print("File Name:", file.name)
    print("File Stem:", file.stem)
    print("File Suffix:", file.suffix)
    print()
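
* If you do not have the extra pandas content directory on hand, the same ideas can be tried in a temporary directory. This is only a sketch: the file names below are invented, and `tempfile` is used so nothing is left behind on disk.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)

    # Create a few empty placeholder files (hypothetical names)
    for file_name in ["heroes.csv", "villains.csv", "notes.txt"]:
        data_dir.joinpath(file_name).touch()

    # sorted() gives a predictable order; glob() itself does not promise one
    csv_files = sorted(data_dir.glob("*.csv"))
    stems = [file.stem for file in csv_files]
    suffixes = {file.suffix for file in csv_files}

print(stems)     # ['heroes', 'villains']
print(suffixes)  # {'.csv'}
```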
    

* Pathlib can also stand in for some of what `os` and `shutil` are used for
* It has some nice methods for working with directories & files

In [None]:
# Check to see if the test directory exists
Path('test').exists()

In [None]:
# Create the test directory
Path('test').mkdir()

In [None]:
# Check to see if the test directory exists
Path('test').exists()

In [None]:
# create a path to a markdown file
test_file = Path('test').joinpath('test.md')

In [None]:
# check to see if the file exists
test_file.exists()

* What we have is a *path* to a file, but the file needs to be created
* Pathlib provides some *very handy* methods for quickly reading and writing text (or bytes) to a file

In [None]:
# Some text for the file
text = """# This is a test file

Written in markdown."""

# Use the write_text method to spit the text to disk
test_file.write_text(text)

In [None]:
# Read the text contents of the file
test_file.read_text()

* Now that we are done with these files we can remove them

In [None]:
# Delete the test file
test_file.unlink()

In [None]:
# Check to see if it exists
test_file.exists()
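
* `unlink()` only removes the file, so the directory is still there afterwards. A minimal sketch of the full cycle, including `rmdir()` to remove the (now empty) directories; the `scratch` directory and `note.md` file are hypothetical names, created under a temporary location so your own files are not touched.

```python
import tempfile
from pathlib import Path

# Work inside a fresh temporary directory
base_dir = Path(tempfile.mkdtemp())

# exist_ok=True avoids an error if the directory is already there
scratch_dir = base_dir.joinpath("scratch")
scratch_dir.mkdir(exist_ok=True)

# Write a file, then read it back
note = scratch_dir.joinpath("note.md")
note.write_text("# Scratch note", encoding="utf-8")
contents = note.read_text(encoding="utf-8")

# Remove the file first; rmdir() only works on empty directories
note.unlink()
scratch_dir.rmdir()
base_dir.rmdir()

still_there = base_dir.exists()
print(contents)
print(still_there)
```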

* This is just a quick overview of a very powerful module that was added in Python 3.4
* Be sure to [read the documentation](https://docs.python.org/3/library/pathlib.html) for more information

---

## Working With CSV Files

* CSV (comma-separated values) files are used to store tabular data. They are like very simplified spreadsheets (think Excel), except the content is stored in plaintext.
* Python has a CSV parser as part of the standard library
* To parse CSV files, we use the `csv` module.
* The csv module provides a number of built-in functions to make it easier to parse and iterate through CSV files.
 

In [None]:
#  load the CSV module 
import csv

* Now we need to tell Python to open `diabetes.csv` and to treat the resulting file handle, `diabetes_file`, as a CSV file.
* We do that by calling the `reader()` function of the `csv` module

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)

* At this point, the entire CSV file is treated as a table, that is, a collection of rows and columns
* We can iterate (loop) through this table and get access to each individual row, just like reading a file line by line
* But the `csv` module automatically splits each row into separate values!

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)
    
    # loop over the file and print the row contents 
    for row in diabetes_data:
        print(row)
    

* You probably noticed that the row variable is just a list: it holds the values from that row, one per column.
* You can access individual columns exactly the same way you would access values in a list.
* For example, the value of cholesterol is in a column called 'chol', which is the second column and therefore has an index of 1

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)
    
    # loop over the rows in the file
    for row in diabetes_data:
        print(row[1]) # print only the values for the chol column

* You probably also noticed that the first row does not contain data - it's just the column headers
* In order for us to do any mathematical or statistical operations on the data, we need to EXCLUDE the header
* We have to skip the header row. We can do this with the `next()` function, which consumes the header row before the loop starts

In [None]:
# open the diabetes file
with open("diabetes.csv", 'r') as diabetes_file:
    # Create a CSV reader 
    diabetes_data = csv.reader(diabetes_file)

    # use next on the reader to pull off the header row as a list
    headers = next(diabetes_data)
    print(headers)

    # loop over the remaining rows in the file
    for row in diabetes_data:
        print(row[1]) # print only the values for the chol column
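
* With the header out of the way, the remaining values can be converted to numbers and used in calculations. Here is a sketch under some loud assumptions: the rows below are invented sample data (not the real `diabetes.csv`), and the `id` and `age` columns are made up. Only the idea that `chol` is the second column comes from the notes above.
* `io.StringIO` stands in for an open file so the example runs without any file on disk.

```python
import csv
import io
import statistics

# Invented sample data standing in for a real CSV file
sample_csv = "id,chol,age\n1,203,46\n2,165,29\n3,228,58\n"

with io.StringIO(sample_csv) as sample_file:
    reader = csv.reader(sample_file)

    # Pull off the header row, then look up the column position by name
    headers = next(reader)
    chol_index = headers.index("chol")

    # csv.reader yields strings, so convert each value before doing math
    chol_values = [float(row[chol_index]) for row in reader]

mean_chol = statistics.mean(chol_values)
print(headers)
print(chol_values)
print(mean_chol)
```

* Looking up the index with `headers.index("chol")` avoids hardcoding `1`, which helps if the column order ever changes.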
