# Python Data Analysis


Agenda:

1) Anaconda/Jupyter environment set up

2) Basic python syntax

3) Pandas

4) Data analysis example using CER data

Download Anaconda if you don't already have it installed. Anaconda is a distribution of Python that contains Python itself and all of the data analysis packages that we need. We could download base Python and all the packages we need separately, but Anaconda makes it easier.
 - https://www.anaconda.com/

### Why use python? How does it fit into our tech stack?


Python is very good for more advanced data analysis. Most of the work you will see today can be done in Excel or other data analysis tools. However, Python allows us to automate the common analysis operations we do every day at the CER.



* SQL: Used at the CER for data management (creating databases, tables, views, queries, etc.), and data analysis with queries/joins. SQL is good to know because we are moving a lot of our data infrastructure to relational databases in the cloud. Tableau also works more smoothly with SQL than with Python.

* Python: Used at the CER for more advanced/reproducible data analysis, modelling, web scraping, process automation, and visualizations. Python is general purpose, unlike SQL.

* R: I have never used R, but a lot of people at the CER use it. Pretty much anything that can be done in Python can also be done in R, but I think there are some benefits to Python that make it better.

* Power Query: Used for quick and easy (and reproducible) data analysis without any code.

* Tableau: Used to visualize the outputs from SQL, Python, and Power Query.



What cool things can be done in Python that are difficult or impossible with the other tools?

* Web scraping

* Task scheduling, automating emails, file systems, and database inserts/updates

* Modelling, machine learning, econometrics

* Algorithms, advanced data structures, optimization

* Preparing data for use in other programs (Tableau, Highcharts, etc.)

* RegP: Sections of the CER Act/regulations contain "rules" that can be written into a custom algorithm for scenario testing.

### Pandas

Why Pandas? Pandas is the foundation of Python's data ecosystem. If you want to do modelling, machine learning, data viz, or anything more advanced, you need to know Pandas first!

## Jupyter Notebooks



Jupyter Notebook is a coding environment that can run both Python and R (not in the same notebook). It is good for learning because we can isolate code into individual cells and see what happens after each block of code is run.

You can write all your code in Jupyter notebooks without a problem, but more advanced coding and algorithms may call for a more sophisticated integrated development environment (IDE). Common IDEs used at the CER include Spyder, PyCharm, and Visual Studio Code.

## Packages

When we start Python, there are only a few base functions and objects loaded. We almost always need to load some extra functionality into Python by importing packages. I like to start off each script by importing the needed packages (sometimes called modules or libraries) at the top of the script.

The main package that we use at the CER, and the main package for all data analysis, is called Pandas. This package comes standard with the Anaconda distribution of Python.

Pandas extends the functionality of base Python. Pandas is the foundation for pretty much all tabular data work in Python, including modelling, econometrics, and machine learning.

In [116]:
import pandas as pd # packages can be given a shorter alias on import. We will talk about pandas later!
pd.__version__
# if we weren't using Anaconda (and hadn't installed pandas ourselves), this cell would raise an import error, because pandas is not part of Python's standard library

'1.2.1'

In [117]:
import sys
import os
# these modules are part of Python's standard library. We don't need Anaconda to use them
sys.version

'3.8.5 (default, Sep  4 2020, 07:30:14) \n[GCC 7.3.0]'

In [118]:
## Show the output of every expression in a cell, not just the last one. Don't add this cell to your actual projects.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Check your Working Directory

Python uses the concept of a working directory (exactly like many other programs). Every time we start Python, it is started 'somewhere' in our computer's filesystem. Where Python is started depends on environment variables in your operating system and how you started Python. I like to set the working directory for my session to be wherever my project files are. I will typically organize a directory like the structure below, then set the working directory to the top level:

`python_cer --->
        --> 2020lqdptrlmgs.xlsx
        --> connection.json
        --> Python Data Analysis.ipynb
        --> Etc...`
        
So in this case, I would set my working directory to the location of python_cer. There should be one Jupyter notebook and one data file for this session. You need to change the working directory to wherever you have downloaded the files on your computer (on your filesystem). If you downloaded them to your 'documents' folder and named the folder 'python_cer' (and your name is Grant), the file path might be this:

`c:\users\grant\documents\python_cer`

Figure out where you have downloaded the folder to, and change the code below to assign the correct working directory. We use the os package to change the working directory.


The working directory is important, and a common source of errors. Having the wrong working directory is like selecting the wrong table in SQL...
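
One way to make a script less sensitive to the working directory is to build full paths from a known project folder and check that the files you expect are actually there. The snippet below is a minimal sketch, not the approach used in this session: it creates a stand-in project folder in a temporary location so that it runs anywhere, and the names `project_dir`, `config_file`, and `data_file` are illustrative assumptions.

```python
from pathlib import Path
import tempfile

# Stand-in for your real project folder (created in a temporary location for illustration)
project_dir = Path(tempfile.mkdtemp()) / "python_cer"
project_dir.mkdir()
(project_dir / "connection.json").write_text("{}")  # pretend this file was downloaded

# Build full paths from the project folder instead of relying on the working directory
config_file = project_dir / "connection.json"
data_file = project_dir / "2020lqdptrlmgs.xlsx"

print(config_file.exists())  # True: the file is where we expect it
print(data_file.exists())    # False: this file is missing from the stand-in folder
```

Checking `exists()` before reading a file gives a much clearer signal than the error you get when the working directory is wrong.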


In [119]:
print('The current working directory is: '+os.getcwd())
#os.chdir(r'C:\Users\mossgran\Documents')
#print('The new working directory is: '+os.getcwd())
#os.chdir(r'C:\Users\mossgran\Documents\python_cer')
#print('Working directory is changed back to: '+os.getcwd())

The current working directory is: /home/grant/Documents/cer_python_training


In [120]:
os.listdir()

['new notebook.ipynb',
 'Part 1 - Python syntax and programming.ipynb',
 'readme.md',
 '.ipynb_checkpoints',
 'Part 2 - Pandas.ipynb',
 '2020lqdptrlmgs.xlsx',
 '.gitignore',
 'connection.json',
 '.git']

## Aside: Python comments "#"

The hash symbol (#) is used to create a Python comment. Comments are ignored by Python when you run your code. Comments serve two purposes: they can explain how certain sections of your code work, and they can record the rationale behind your design choices. This is critical when others need to use or improve your code, but it is also super useful for yourself.

When returning to code that you wrote in the past, your own comments can save you a lot of pain...

You can also "comment out" lines of code when you want to temporarily "delete" code. This is a really good practice that programmers use all the time. Don't ever delete code unless you are absolutely certain that you don't need it.

## OS package & File Control

The os package comes with Python itself, not with Anaconda. Why do we need to import this package? Why doesn't Python just include all the os functionality to begin with?

In [121]:
# make sure that you are in the right working directory...
# os.mkdir("excel_files") # this is a relative path (no leading slash), so the folder is created inside the working directory

# we can also specify a full path
# os.mkdir(os.path.join(os.getcwd(), "excel_files"))
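
As a hedged illustration of creating folders with the os package, the sketch below works inside a temporary folder so it cannot clutter your project. The name `base_dir` is an assumption standing in for your own project folder.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()  # stand-in for your project folder

# Join the base folder and a folder name into one full path
excel_dir = os.path.join(base_dir, "excel_files")

# makedirs creates the folder; exist_ok=True means no error if it already exists
os.makedirs(excel_dir, exist_ok=True)
os.makedirs(excel_dir, exist_ok=True)  # safe to run a second time

print(os.path.isdir(excel_dir))  # True

# Gotcha: a leading slash makes the second part an absolute path,
# so os.path.join throws away everything before it
print(os.path.join("project", "/excel_files"))  # /excel_files
```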

## Base Python Objects

Python is object oriented. What does this mean? Everything in Python is an object, and each object has its own properties and methods. For example, someone's name would be a string object in Python, "grant moss", and a specific function (method) can be called on that object to capitalize the name: "Grant Moss".

Base Python includes such computer language fundamentals as strings, numbers (both integer and floating point), and boolean types (True, False). 

Because Python is object oriented, each instance of a base Python class (like a variable that references a string) inherits the methods of that class. In the case of strings, these include case and search methods.
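
A short sketch of what "methods attached to an object" looks like in practice. The variable names `name` and `number` are just for illustration.

```python
name = "grant moss"   # a string object
number = 7            # an integer object

print(type(name))     # <class 'str'>
print(name.title())   # Grant Moss
print(name.upper())   # GRANT MOSS
print(name)           # grant moss (the original string is unchanged)

# Methods belong to the type: strings have .upper(), integers do not
print(hasattr(name, "upper"), hasattr(number, "upper"))  # True False
```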

For an in-depth review of Python's object-orientation, read the chapter of the excellent free online book *A Byte of Python* here: https://python.swaroopch.com/oop.html

We will go over a few basic data types that are used in essentially all programming languages, and in Excel too.


From now on, always be thinking about types. This is no different than Excel!

### Most basic types: Numeric values & Strings

In [122]:
#numeric values
x = 4 #everything to the right of the assignment "=" is evaluated first, and then assigned to the variable on the left
y = 4.4
print('x is: '+str(type(x)))
print('y is: '+str(type(y)))

#types can be changed. This changes the properties of the data
x = str(x)
print('x is now: '+str(type(x)))

#python automatically determines the data type of a variable. This makes python easy to write, but we need to be careful..

x is: <class 'int'>
y is: <class 'float'>
x is now: <class 'str'>
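
Why do we need to be careful? The same operator can do different things depending on the type. A small sketch with illustrative variable names:

```python
a = "4"   # a string that looks like a number
b = 4     # an actual integer

print(a + a)        # 44  (strings are joined together)
print(b + b)        # 8   (numbers are added)
print(a * 3)        # 444 (the string is repeated)
print(int(a) + b)   # 8   (convert the string first, then add)
```

Numbers that arrive as text (for example from a file or user input) usually need to be converted before doing math with them.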


### Assignment versus conditionals

First Python "gotcha"

In [123]:
x = 4 # The assignment operator "=" creates a variable
x == 5 # The comparison operator "==" is a type of conditional. It checks for equality and returns True or False

False

### Assignment & re-assignment

In [124]:
# re assign an existing variable
r = 1
r = r+5
print(r)

r2 = r
r2 = r2+5
print(r,r2)
# the original variable "r" stays the same. This isn't always the case for mutable Python objects (such as a DataFrame)

6
6 11


A variable must be assigned before it can be used for anything. Python runs a script from top to bottom. Variables are case sensitive. This notebook uses extremely bad practices for variable names. Try to give variables descriptive names that give other programmers an indication of what the variable contains or is used for.
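
Re-assignment behaves differently for mutable objects such as lists (covered later in this session). The sketch below, with illustrative names, shows that assigning a list to a second variable does not copy it.

```python
original = [1, 2, 3]
alias = original            # both names now point to the same list
alias.append(4)
print(original)             # [1, 2, 3, 4] (changed through the alias!)

independent = original.copy()   # make a real copy instead
independent.append(5)
print(original)             # [1, 2, 3, 4]
print(independent)          # [1, 2, 3, 4, 5]
```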

### Other conditionals

* "==" equal to (we've seen this)
* "!=" not equal to (the same as <> in SQL)
* ">" greater than
* "<" less than
* ">=" greater than or equal to
* "<=" less than or equal to

Conditionals return (output) a boolean object (True/False)

In [125]:
x != 4
x > 4
x < 4
x >= 4
x <= 4

False

False

False

True

True

In [126]:
## the result of a comparison can be stored in a variable
z = 3 > 4
print(type(z))

## Assign references to values
z1 = True
z2 = False

<class 'bool'>


In [127]:
z1

True

In [128]:
not z1

False

## And/Or
### Similar to SQL

In [129]:
print("True or False: ",True or False) #only one side needs to be true
print("True and False: ",True and False) #both sides need to be true
print("3 == 3 and 4==8: ", 3 == 3 and 4==8)
print("3 == 3 or 4==8: ", 3 == 3 or 4==8)

True or False:  True
True and False:  False
3 == 3 and 4==8:  False
3 == 3 or 4==8:  True


In [130]:
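# same result here, using the bitwise operators & and |. These bind more tightly than comparisons, so the brackets are required
(3 == 3) & (4==8) #brackets can be used in python to enforce order of operations, the same as high school math...
(3 == 3) | (4==8)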

False

True
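
The keywords `and`/`or` and the symbols `&`/`|` are not interchangeable. The symbols are bitwise operators that are evaluated before comparisons, so leaving out the brackets can silently change the answer. A minimal sketch:

```python
print(2 > 1 and 3 > 2)     # True
print(2 > 1 & 3 > 2)       # False: evaluated as 2 > (1 & 3) > 2, which is 2 > 1 > 2
print((2 > 1) & (3 > 2))   # True: brackets force the comparisons to happen first
```

For plain True/False logic, `and`/`or` are the safer choice. The `&`/`|` form with brackets shows up again when filtering in Pandas.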

## First Data Type: strings

In [131]:
#strings
## String conversion
zip_code = str(75064) # making strings like this is not the way to go. Save this one for changing types.
zip_code = '75064'
zip_code
type(zip_code)

'75064'

str

In [132]:
# strings are sliceable
# First character
zip_code[0] #python starts counting from zero
# Second character
zip_code[1]

'7'

'5'

Zero based indexing

[0,1,2,3,4]\
[7,5,0,6,4]

In [133]:
fname = 'John Smith'

fname[0:2] # First two letters
fname[-2:] # Last two letters

fname.lower()

'Jo'

'th'

'john smith'

In [134]:
try:
    zip_code[0] = '0'
except TypeError:
    # raise
    print('this wont work!')
#this won't work! Strings are immutable and can't be modified in place.
#immutable means that an object can't be changed after it is created.

this wont work!


## Aside: try/except
The try/except error catching pattern is super important. We will probably talk about it a bit later, but it's not critical for learning. Keep this pattern in mind when writing your own programs, especially when you are working with "unpredictable" data or user input.

In [135]:
try:
    # the code here will be run
    pass
except:
    # if the code above fails, then this code will be run
    pass
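
A hedged example of the pattern applied to "unpredictable" data: converting text to numbers when some values are not numbers at all. The list `raw_values` is made up for illustration, and the loop syntax is covered later in this session.

```python
raw_values = ["100,000", "b/d", "250"]   # made-up input; not every value is a number
parsed = []

for text in raw_values:
    try:
        # remove the thousands separator, then convert to an integer
        parsed.append(int(text.replace(",", "")))
    except ValueError:
        # int() raises ValueError when the text is not a valid number
        parsed.append(None)

print(parsed)  # [100000, None, 250]
```

Catching a specific error (here `ValueError`) rather than using a bare `except:` means unexpected problems are not hidden by accident.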

In [136]:
## Strings are iterable. We haven't talked about loops yet.
for t in zip_code:
    print(t)

7
5
0
6
4


In [137]:
## Modifying Strings
string1 = 'Crude oil exports -> 100,000 -> b/d -> 6/23/2020'
string1.upper()
string1.capitalize()
string1.lower()
string1.title()

'CRUDE OIL EXPORTS -> 100,000 -> B/D -> 6/23/2020'

'Crude oil exports -> 100,000 -> b/d -> 6/23/2020'

'crude oil exports -> 100,000 -> b/d -> 6/23/2020'

'Crude Oil Exports -> 100,000 -> B/D -> 6/23/2020'

In [138]:
string1.find('oil')
string1.startswith('Crude')
string1.endswith('Crude')

6

True

False

## Aside: Data Types

All the functions we just used:

- upper()
- capitalize()
- lower()
- title()
- find()
- startswith()
- endswith()

are specific to strings! All "instances" of strings have these "methods" attached to them by default. This may seem obvious and trivial, but it's the way we need to think of things from now on.

In [139]:
# what happens when we don't think about types.
thisisaNumber = 5
try:
    thisisaNumber.capitalize()
except AttributeError:
    # raise
    print('this wont work!')

this wont work!


In [140]:
dir(string1) # don't do this, just google "how to capitalize a string in python"

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [141]:
## Split on the "->" separator
string1_list = string1.split("->")
string1_list

['Crude oil exports ', ' 100,000 ', ' b/d ', ' 6/23/2020']

In [142]:
#list comprehension, we'll talk about this later
string1_list = [x.strip() for x in string1_list]
string1_list

['Crude oil exports', '100,000', 'b/d', '6/23/2020']

In [143]:
## Join with a new separator
"_".join(string1_list)

'Crude oil exports_100,000_b/d_6/23/2020'

## Dates and Times

In [144]:
## Get today's date
import datetime
date_today = datetime.date.today()
date_today
type(date_today)

datetime.date(2021, 2, 16)

datetime.date

In [145]:
date_today.year
date_today.month
date_today.day

2021

2

16

### strftime and strptime

Convert strings to dates (strptime) and dates to strings (strftime)

In [146]:
string_date = '2016-05-16'
date = datetime.datetime.strptime(string_date,'%Y-%m-%d')
date

datetime.datetime(2016, 5, 16, 0, 0)

In [147]:
#https://strftime.org/
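
The cell above only goes from string to date. As a sketch of the full round trip, the snippet below parses a date written as month/day/year and writes it back out in year-month-day form. The variable names are illustrative.

```python
import datetime

# strptime: string -> datetime ("p" for parse). The format must match the text.
parsed = datetime.datetime.strptime("6/23/2020", "%m/%d/%Y")
print(parsed)     # 2020-06-23 00:00:00

# strftime: datetime -> string ("f" for format)
iso_text = parsed.strftime("%Y-%m-%d")
print(iso_text)   # 2020-06-23
```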

In [148]:
#dateutil is a third-party package (included with Anaconda) that extends Python's built-in datetime functionality
#relativedelta allows for adding and subtracting time from a Python date
import dateutil.relativedelta as relativedelta
date

datetime.datetime(2016, 5, 16, 0, 0)

In [149]:
# relativedelta is one of many python date utility modules
date_plus1day = date + relativedelta.relativedelta(days=1)
date_plus1month = date + relativedelta.relativedelta(months=1)
date_plus1year = date + relativedelta.relativedelta(years=1)
date_plus1day
date_plus1month
date_plus1year

datetime.datetime(2016, 5, 17, 0, 0)

datetime.datetime(2016, 6, 16, 0, 0)

datetime.datetime(2017, 5, 16, 0, 0)

## Data Structures (containers or collections) & Basic Programming

What if we want to organize multiple python objects into a single object? We call this a 'collection' of objects. An example might be a list of customer_ids, or a list of addresses...  
There are multiple collections (or containers) in Python we can use. We will go over the basics here:

* Tuple
 * Immutable, fixed length sequence
* Lists
 * Mutable sequence of objects
* Sets
 * Unique collection of objects
* Dicts (dictionaries)
 * Collection of key-value pairs: each object is looked up by its "key". Similar to an object in JavaScript.
* DataFrame
 * Introduced in Pandas. We will talk about this one later.
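
To make the list above concrete, here is a minimal sketch that creates one of each built-in container. All of the values are made up for illustration.

```python
coordinates = (51.05, -114.07)              # tuple: fixed length, cannot be changed
colours = ["red", "orange", "yellow"]       # list: ordered, can grow and shrink
unique_ids = {101, 102, 102, 103}           # set: duplicates are dropped
units = {"crude": "b/d", "gas": "Bcf/d"}    # dict: values are looked up by key

colours.append("green")     # lists are mutable
print(colours)              # ['red', 'orange', 'yellow', 'green']
print(len(unique_ids))      # 3
print(units["crude"])       # b/d
print(coordinates[0])       # 51.05
```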

## Aside: Data Structure gotcha

1) A common mistake that most new programmers make (including myself) is to use the list data structure for everything. Try to use the data structure that is most appropriate for the task.

2) Let your data structures do the heavy lifting. This will become more obvious with practice, but when you take advantage of the unique properties that each data structure has to offer, your programs will become more powerful and flexible, and will require a lot less code.

3) Data structures are objects the same way that strings and number variables are objects. This means that data structures have methods attached to them that you can use! Don't re-invent the wheel. The list object has a sort method. Don't write your own code to sort a list because it has already been done for you!!

4) Pandas. Once we start talking about pandas and DataFrames, don't ignore these data structures!
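
Point 3 in practice: a short sketch of the sorting tools that already exist, using a made-up list called `volumes`. Note the difference between `sorted()`, which returns a new list, and `.sort()`, which changes the list in place and returns `None`.

```python
volumes = [300, 100, 200]

largest_first = sorted(volumes, reverse=True)  # new list; the original is untouched
print(largest_first)   # [300, 200, 100]
print(volumes)         # [300, 100, 200]

result = volumes.sort()  # sorts in place and returns None
print(result)            # None
print(volumes)           # [100, 200, 300]

# other built-ins that save you from writing loops
print(min(volumes), max(volumes), sum(volumes))  # 100 300 600
```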

In [150]:
# there are two ways to make a new, empty data structure (a list in this case)
newDataStructure = []
newDataStructure = list()

### lists

In [151]:
# make a new list
colList = ["red","orange","yellow"]
colList
type(colList)

['red', 'orange', 'yellow']

list

In [152]:
colList.sort() # every list, no matter what it contains, has the .sort() method
print(colList)

['orange', 'red', 'yellow']


In [153]:
# just like strings, lists are sliceable
# First Element
colList[0]
# First and Second Element
colList[0:2]
# Last element
colList[-1]

'orange'

['orange', 'red']

'yellow'

In [154]:
list2 = ["a","b","c"]
list2.extend(["d","e"]) # add list to list
list2

['a', 'b', 'c', 'd', 'e']

In [155]:
### List functions
## Append an element
list2 = ["a","b","c"]
list2.append("d")
list2

['a', 'b', 'c', 'd']

In [156]:
list3 = [1,2,3]+list2 #add list to list.
list3

[1, 2, 3, 'a', 'b', 'c', 'd']

In [157]:
# check if a list contains an element
print("a" in list3)

True


## Before we move on to other data structures, lets talk about loops

The loop is a foundational component of every programming language. As a general rule: if you need to do something more than once, use a for loop.

In [158]:
# for loop. Most common loop in python. Your code should be riddled with for loops
loopList = [1, 2, 3]
for listElement in loopList: # "listElement" becomes a variable. You can name it whatever you want!
    print("Current value in loop: ", listElement, type(listElement))

Current value in loop:  1 <class 'int'>
Current value in loop:  2 <class 'int'>
Current value in loop:  3 <class 'int'>


In [159]:
# Task: multiply each list element by 2
# wrong way to do this:
loopList[0] = loopList[0] * 2
loopList[1] = loopList[1] * 2
loopList[2] = loopList[2] * 2
print(loopList)

[2, 4, 6]


In [160]:
# This task is the perfect candidate for a loop!
# why doesn't this work? The "l" variable becomes separate from the list
loopList = [1, 2, 3]
for l in loopList: # l is not a good variable name
    l = l * 2 # l becomes its own variable, and does not connect back to our original list!
print(loopList)

[1, 2, 3]


In [161]:
#There are two ways to do this properly

# use a counter. The counter "pattern" is extremely common in most programming languages
counter = 0
for l in loopList:
    loopList[counter] = l * 2
    counter = counter + 1
print("Counter pattern: ",loopList)

# use a new list
loopList = [1, 2, 3]
newList = []
for l in loopList:
    newList.append(l * 2)
print("new list pattern: ", newList)

Counter pattern:  [2, 4, 6]
new list pattern:  [2, 4, 6]


In [162]:
# an even better way. Enumerate works on "iterable" objects, like the list.
# fewer variables are better than more!
loopList = [1, 2, 3]
for counter,l in enumerate(loopList):
    loopList[counter] = l * 2
print(loopList)

[2, 4, 6]


### Aside: why start with python?

In [163]:
# this is a for loop in Javascript
# var i;
# for (i = 0; i < list.length; i++) {
#   text += list[i] + "<br>";
# }

### list comprehension. A pythonic take on the for loop

In [164]:
# pythonic way to do this. Fewer lines are better than more lines!
loopList = [1, 2, 3]
newList = [l * 2 for l in loopList]
print(newList)
# this pattern is worth knowing!

[2, 4, 6]


In [165]:
# more complicated list comprehension
[i*i for i in range(10) if i > 5]

# same as:
greaterThan5 = []
for i in range(10):
    if i > 5:
        greaterThan5.append(i*i)
print(greaterThan5)       

[36, 49, 64, 81]

[36, 49, 64, 81]


In [166]:
# what is range? It creates an iterable sequence of numbers, similar to a list. I rarely use range
print(range(10))

range(0, 10)
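
Printing a range does not show its contents. Wrapping it in `list()` does, which makes the start, stop, and step arguments easier to see. A minimal sketch:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]  (stop value is not included)
print(list(range(2, 6)))       # [2, 3, 4, 5]     (start, stop)
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]  (start, stop, step)
```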


### Dictionaries

A dictionary, or dict, is a collection of objects that have been mapped to a key (a collection of key-value pairs). Values are looked up by key rather than by position. (Since Python 3.7, dicts remember the order in which keys were added, but they still cannot be indexed by position.)

Dictionaries are underutilized by a lot of programmers. Use them!

In [167]:
prices = {'henry hub':[2.5, 2, 2.3 ,3.2],
          'aeco':[1.1, 0.9, 1.4, 1.6]}

In [168]:
print('aeco prices: ',prices['aeco']) # dictionaries use "key" based indexing

aeco prices:  [1.1, 0.9, 1.4, 1.6]
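
Asking for a key that does not exist raises a `KeyError`. Two common ways to guard against that are the `in` check and the `.get()` method. This sketch uses its own dictionary, `hub_prices`, so it does not overwrite the `prices` variable used in this notebook; the key "dawn" is deliberately missing.

```python
hub_prices = {"henry hub": [2.5, 2, 2.3, 3.2],
              "aeco": [1.1, 0.9, 1.4, 1.6]}

print("aeco" in hub_prices)         # True
print("dawn" in hub_prices)         # False

# .get() returns None (or a default you choose) instead of raising an error
print(hub_prices.get("dawn"))       # None
print(hub_prices.get("dawn", []))   # []
print(hub_prices.get("aeco"))       # [1.1, 0.9, 1.4, 1.6]
```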


In [169]:
# why use a dictionary? I can do the same thing with a list of lists
pricesList = [
    ['henry hub',[2.5, 2, 2.3 ,3.2]],
    ['aeco',[1.1, 0.9, 1.4, 1.6]]
]
aecoPrices = pricesList[1][1]
print('aeco prices: ',aecoPrices)
# but now I need to remember that aeco prices are the second element in the list!
# This is what programmers would call unsafe. It could lead to errors in my program later on.

aeco prices:  [1.1, 0.9, 1.4, 1.6]


In [170]:
prices.keys()

dict_keys(['henry hub', 'aeco'])

In [171]:
prices.values()

dict_values([[2.5, 2, 2.3, 3.2], [1.1, 0.9, 1.4, 1.6]])

In [172]:
for key,value in prices.items():
    print(value)

[2.5, 2, 2.3, 3.2]
[1.1, 0.9, 1.4, 1.6]


In [173]:
# add to the dictionary
differential = []
for aeco,hh in zip(prices['aeco'],prices['henry hub']):
    differential.append(round(hh-aeco,2))
prices['differential'] = differential
print(prices)

{'henry hub': [2.5, 2, 2.3, 3.2], 'aeco': [1.1, 0.9, 1.4, 1.6], 'differential': [1.4, 1.1, 0.9, 1.6]}
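
The cell above uses `zip` without introducing it. `zip` pairs up the elements of two (or more) sequences so they can be looped over together. A small sketch using the same price values, with the loop rewritten as a list comprehension:

```python
aeco = [1.1, 0.9, 1.4, 1.6]
henry_hub = [2.5, 2, 2.3, 3.2]

pairs = list(zip(aeco, henry_hub))
print(pairs[0])   # (1.1, 2.5)  (first aeco price paired with first henry hub price)

differential = [round(hh - a, 2) for a, hh in zip(aeco, henry_hub)]
print(differential)   # [1.4, 1.1, 0.9, 1.6]

# zip stops at the end of the shortest sequence
print(len(list(zip([1, 2, 3], ["a", "b"]))))   # 2
```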


### sets

In [174]:
# sets can only contain unique values. Why would we ever need this?
nonUniqueList = [1, 1, 2, 2, 3, 3]
# knowing what we know so far, how would i create a list that contains only the unique elements of "nonUniqueList"

In [175]:
uniqueList = []
for l in nonUniqueList:
    if l not in uniqueList:
        uniqueList.append(l)
print(uniqueList)

[1, 2, 3]


In [176]:
# let the data structures do the heavy lifting! Think about data structures before you dive into the code!
print(list(set(nonUniqueList)))

[1, 2, 3]
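
Two things worth knowing about sets, sketched below with made-up values. First, a set does not promise to keep the original order, so if order matters one alternative is `dict.fromkeys`, which keeps the first occurrence of each value in order. Second, sets support comparisons between groups (overlap, combination, difference).

```python
non_unique = [3, 1, 3, 2, 1]
ordered_unique = list(dict.fromkeys(non_unique))   # unique values, original order kept
print(ordered_unique)   # [3, 1, 2]

group_a = {"crude oil", "natural gas", "ngl"}
group_b = {"crude oil", "electricity"}

# sorted() is used only to print the results in a predictable order
print(sorted(group_a & group_b))   # ['crude oil']  (in both)
print(sorted(group_a | group_b))   # ['crude oil', 'electricity', 'natural gas', 'ngl']  (in either)
print(sorted(group_a - group_b))   # ['natural gas', 'ngl']  (in a but not in b)
```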


## Python logic control

### If-Else flows

The `if` statement checks a condition. If true, it executes the first code block. Additional if conditions can be checked with `elif`. If none of these are true, the `else` block is executed. 

In [177]:
# this works like the IF function in Excel
the_answer = 42

if the_answer == 42:
    print ("good job.")
elif the_answer == 0:
    print ("nice try.")
else:
    print ("better luck next time.")

good job.


In [178]:
## While Loops
counter = 5
while counter > 0: # is the statement true?
    print("Still Positive...i = " + str(counter))
    counter = counter - 1
    print(counter)

Still Positive...i = 5
4
Still Positive...i = 4
3
Still Positive...i = 3
2
Still Positive...i = 2
1
Still Positive...i = 1
0


## Functions

We should avoid writing the same code many times. *Computers are good at iteration :)*  
Thus, we can wrap code into functions for reuse.  

* Functions in Python:
 * Use `def` to start
 * We define the function name, the parameters (inputs)
 * Whitespace is significant as always
 * Use `return` statement to return values (multiple allowed)


This is the general syntax to define a new function:

>`def new_func(par1,par2,some_other_par="default:"):
    _do_some_stuff_
    return some_related_stuff`

Functions should do one thing. Don't put all your code into one function. Custom functions can be used within other custom functions.


Let's try a simple function!

In [179]:
# First Function
def mult_by_3(x, weight=1): #required vs optional parameters
    result = x*weight*3
    return result

mult_by_3(4)
mult_by_3(4,0.5)
mult_by_3(weight=0.25,x=4) #order doesn't matter if parameter names are used (keyword arguments)

12

6.0

3.0
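
The bullet list above mentions that a function can return multiple values. A minimal sketch of what that looks like; the function name `min_max_mean` and its input are made up for illustration.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value, and the average of a list."""
    return min(values), max(values), sum(values) / len(values)

# the three returned values can be unpacked into three variables
low, high, average = min_max_mean([2, 4, 9])
print(low, high, average)   # 2 9 5.0

# or kept together as a single tuple
summary = min_max_mean([2, 4, 9])
print(type(summary))        # <class 'tuple'>
```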

In [180]:
# think about the code we have written so far, what do you think would be useful to have in a function?
def uniqueList(nonUniqueList): #this is an example of better naming conventions
    return list(set(nonUniqueList))

## Aside: function gotchas

1) Function names should be unique. Don't create two functions in your project that have the same name. Don't create your own enumerate() function, because this function already exists in Python; redefining it is confusing and can lead to errors.

2) Functions shouldn't be too "long". Functions should do one thing, and should have a name that describes what the function does.

3) Variables that are created in the function are "scoped" to that function. This is actually a good thing! We need to start thinking about something called "namespace". Variables that are defined in a cell outside of a function are "global" and variables that are defined inside a function are "local". There is nothing wrong with having global variables, but the best practice is to only have global variables if they are needed globally.

4) Functions shouldn't have too many parameters. If your function starts to have a lot of parameters, it can get confusing for you and other programmers. Maybe the function should be split into separate functions.

5) Functions should be general: pass the data in as a parameter instead of hard-coding it inside the function.


## Function namespace

In [181]:
myGlobal = "  grant moss  "

def formatGrantMoss():
    return myGlobal.strip().capitalize() # I can access global variables inside functions!

print(formatGrantMoss())

# this is an example of an anti-pattern, or code that goes against best practices. What are the anti-patterns here?

Grant moss


In [182]:
# lets fix the first anti-pattern

def formatGrantMoss():
    myLocal = "  grant moss  "
    return myLocal.strip().capitalize() # myLocal only exists inside this function

print(formatGrantMoss())

try:
    print(myLocal)
except NameError:
    print('cant access this variable!')
# this is better, because now I can use the "myLocal" variable name in other places

Grant moss
cant access this variable!


In [183]:
# lets fix the second anti-pattern

def formatName(name):
    return name.strip().capitalize()

print(formatName("  grant moss  "))

Grant moss


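### Closure. A function within a function.
Defining a function inside another function creates a local function, which has similar benefits to a local variable. (Strictly speaking, a closure is a nested function that also remembers variables from the function that encloses it.)

Why use closures?

1) Useful for resolving namespace problems. I can define "processText1" elsewhere in my program without a conflict!
2) Helps to make your code even more modular and readable.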


In [184]:
# closure. A useful pattern for resolving namespace conflicts

def processText1(text):
    print('Running globally defined processText1')

def formatName(name):
    
    def processText1(text):
        return text.strip()
    def processText2(text):
        return text.capitalize()
    
    name = processText1(name)
    name = processText2(name)
    name = processText1(name) # make extra sure that there is no whitespace..
    return name

print(formatName("  grant moss  "))

Grant moss
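
For comparison, here is a sketch of a nested function that "remembers" a value from the function that created it, which is what the term closure usually refers to. The names `make_multiplier`, `double`, and `triple` are made up for illustration.

```python
def make_multiplier(factor):
    def multiply(value):
        # "factor" is not defined here; it is remembered from the enclosing call
        return value * factor
    return multiply   # return the inner function itself, without calling it

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(4))   # 8
print(triple(4))   # 12
```

Each call to `make_multiplier` produces a separate function with its own remembered `factor`.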


### Data structure challenge:

Given the following list, create a list of boolean values that indicates whether the first value in each pair is larger than or equal to the second.

Try solving it with a loop and with a list comprehension separately.

In [185]:
list5 = [[1,3],[5,2],[24,5],[150,150]]

In [186]:
bool_list = []
for l in list5:
    if l[0] >= l[-1]:
        bool_list.append(True)
    else:
        bool_list.append(False)       
bool_list

[False, True, True, True]

In [187]:
[l[0] >= l[-1] for l in list5] # the comparison already returns True or False

[False, True, True, True]

In [188]:
# other useful challenges
# https://www.practicepython.org/

### Part 1 Notes:

1) You may be wondering how all of this relates to data analysis. In the next session we will go over pandas, which is kind of like the Excel of Python. We will start to see more actual data work, but the patterns and syntax from this session will continue to show up.