In [1]:
%load_ext rpy2.ipython

# PYTHON PRIMER

This *Jupyter notebook* is meant to be used as a primer on the following topics:
- [Programming](#prog)
 - [Intro](#prog_intro)
 - [SQL vs Programming](#prog_sqlvsprog)
- [Jupyter Notebooks Basics](#jn)
 - [Introduction to Jupyter Notebooks](#jn_intro)
 - [Installing Jupyter Notebooks](#jn_install)
 - [Starting with Jupyter Notebooks](#jn_start)
 - [Notebook cells](#jn_cells)
   - [Adding/Removing cells](#jn_cells_add)
   - [Code cells](#jn_cells_code)
   - [Hypertext cells](#jn_cells_hypertext)
- [Python Basics](#py)
 - [Variables](#py_vars)
   - [Intro](#py_vars_intro)
   - [Variable assignment](#py_vars_assign)
   - [Variable names](#py_vars_names)
 - [Data Types](#py_dtypes)
   - [Intro](#py_dtypes_intro)
   - [Numerical](#py_dtypes_numerical)
   - [Strings](#py_dtypes_strings)
   - [Boolean](#py_dtypes_boolean)
 - [Data Structures](#py_dstructs)
   - [Intro](#py_dstructs_intro)
   - [Lists](#py_dstructs_lists)
   - [Dictionaries](#py_dstructs_dicts)
 - [Indexing](#py_index)
   - [Intro](#py_index_intro)
   - [Zero-based Index](#py_index_zero)
   - [Negative Index](#py_index_negative)
   - [Sequence Index](#py_index_sequence)
 - [Indentation](#py_indentation)
 - [Modules](#py_modules)
   - [Intro](#py_modules_intro)
   - [Installing Modules](#py_modules_install)
   - [Loading Modules](#py_modules_load)
   - [Aliasing Modules](#py_modules_alias)
 - [Functions](#py_funcs)
   - [Intro](#py_funcs_intro)
   - [Calling a Function](#py_funcs_call)
   - [Chaining Functions](#py_funcs_chain)
   - [Defining a Function](#py_funcs_define)
 - [Flow Control](#py_flow)
   - [If Else](#py_flow_ifelse)
   - [For Loops](#py_flow_for)
 - [Resources](#py_resources)

____

## Programming <a class="anchor" id="prog"></a>

**Intro** <a class="anchor" id="prog_intro"></a>

"Programming is the process of taking an algorithm (a set of instructions) and encoding it into a notation, a programming language, so that it can be executed by a computer."

Typically, programming languages are general-purpose: they allow you to build any kind of application. Anything from Chrome to Jabber, from Tableau to Odyssey, was once coded (and is still maintained) using one or more programming languages.

**SQL vs Programming** <a class="anchor" id="prog_sqlvsprog"></a>

While SQL may look like a programming language (after all, it encodes a set of instructions understood by a computer), some people argue that it is not a full-fledged programming language, because SQL's functionality is fairly limited.

In practice, SQL is only used along with a relational database to extract, transform or load data. Programming languages like Python or R, on the other hand, are much more powerful and can be used for more complex tasks, such as creating a graphical user interface, connecting to other types of data sources (e.g. reading Excel files), scraping websites, etc.

In general, **think of SQL as the tool to retrieve data from a database, and a programming language as a tool to consume and perform actions on said data.**

____

## Jupyter Notebooks <a class="anchor" id="jn"></a>

**Introduction to Jupyter Notebooks** <a class="anchor" id="jn_intro"></a>

As mentioned before, this document is called a *Jupyter notebook*.

Notebooks are a different way of organizing code: they mix it with other media resources to create documents that are rich in content. This makes them more user-friendly, especially for presentations.

**Installing Jupyter Notebooks** <a class="anchor" id="jn_install"></a>

In order to start using *Jupyter notebooks*, the first thing needed is to install Python. I recommend installing *Anaconda*, a Python distribution that bundles many useful tools related to working with Python.

You can get *Anaconda* at __[https://www.anaconda.com/download/](https://www.anaconda.com/download/)__.

* Be sure to check out *Miniconda* if you ever need a lighter version of the product.

One of the tools that come bundled in *Anaconda* is *Jupyter Lab*. *Jupyter Lab* is what I will be using during the presentation.

If you are looking for a GUI similar to that of *RStudio*, *Anaconda* comes with *Spyder*:

- RStudio

![rstudio](rstudio.png)

- Spyder

![spyder](spyder.png)

**Starting Jupyter Notebooks** <a class="anchor" id="jn_start"></a>

To start *Jupyter Notebooks*, open the *Anaconda Prompt*...

![anacondaprompt](anacondaprompt.png)

Move to the directory you want as your working directory and type **jupyter lab** (or **jupyter notebook**)...

![jupyterlab](jupyterlab.png)

**Notebook Cells** <a class="anchor" id="jn_cells"></a>

A notebook consists of two main types of cells:

- Code cells: The content of these cells is parsed and run by the Python interpreter.
- Hypertext cells: These cells can contain almost anything (standard text, LaTeX, HTML, images, etc.).

You can run a cell with code by selecting it (a selected cell will have a blue indicator to the left) and pressing *Ctrl + Enter* to execute the code and display output (if any).

Similarly, *Ctrl + Enter* will render the contents of a hypertext cell.

To **switch between a code and hypertext** cell, select the cell in *command mode* (press *Esc* first if you are editing it) and press __Y__ for code or __M__ for hypertext.

**Adding/Removing cells** <a class="anchor" id="jn_cells_add"></a>

A cell can be added to a notebook by selecting an existing cell (i.e. clicking on it) and pressing the keys:
* '**A**' if you want to add the cell **A**bove the selected cell.
* '**B**' if you want to add the cell **B**elow the selected cell.

To remove the selected cell, press '**D**' twice.

**<font color='red'>Exercise:</font>**
Create a cell above and below this cell.

**Code cells** <a class="anchor" id="jn_cells_code"></a>

Behind each *Jupyter notebook* there is a Python interpreter. The interpreter is in charge of parsing and running the lines of code written in the *Code cells*. The output of the code (e.g. when using the print function) is redirected to and displayed in the notebook.

The value of the last line of code in each cell is displayed automatically, as long as that line is an expression rather than a variable assignment:

In [2]:
# Cell with variable assignment, no result printed
capacity = 180
res_hold = 99

In [3]:
# The last line of this cell is not a variable assignment
ldf = res_hold/capacity
ldf

0.55

Note that anything you define in one cell (variables, functions, objects, etc.) is shared by the whole notebook once that cell has been run.

You can explicitly print results to the notebook by applying the *print* function:

In [4]:
print('The legacy airlines are:')
for airline in ['AA','DL','UA']:
    print('-',airline)
'These are all based in the US'

The legacy airlines are:
- AA
- DL
- UA


'These are all based in the US'

To display the documentation of functions, packages and modules, you can always use the *?* operator.

In [5]:
?str.replace

Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new.  If the optional argument count is
given, only the first count occurrences are replaced.
Type:      method_descriptor

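The *?* operator only works inside IPython/Jupyter. In a plain Python script, the built-in `help()` function and the `__doc__` attribute give access to the same documentation; the exact wording of the docstring may vary between Python versions.

```python
# help() prints the documentation that IPython's ? operator shows
help(str.replace)

# The raw docstring text is also stored in the __doc__ attribute
print(str.replace.__doc__)
```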

**Hypertext cells** <a class="anchor" id="jn_cells_hypertext"></a>

Hypertext cells are pretty flexible: they accept raw text as well as text in other formats.

- Raw text is parsed as Markdown (a lightweight markup language that is rendered as HTML):  
*This text* contains **many elements** of ~~markdown~~

- Tables can also be written using markdown:

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| zebra stripes | are neat      |    $1 |

- TeX math expressions can be included: 
$-b \pm \sqrt{b^2 - 4ac} \over 2a$  
$x = a_0 + \frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3 + a_4}}}$  
$\forall x \in X, \quad \exists y \leq \epsilon$

* Inline HTML can be written as well: 

<style>
th {
background-color:#55FF33;
}
td {
background-color:#00FFFF;
}
</style>

<table><tr><th>bar</th><th>bar</th></tr><tr><td>foo</td><td>foo</td></tr></table>

- Images can be added from the web with HTML:  
<img src="https://smlogin.aa.com/login/images/aa-logo.gif" alt="American Airlines Logo">

- Or from a local source with markdown:  
![aa-livery](American-Airlines.jpg)

Other media can be added by using code cells...

In [6]:
from IPython.display import YouTubeVideo
YouTubeVideo('dQw4w9WgXcQ')

_____

# Python Basics <a class="anchor" id="py"></a>

## Variables <a class="anchor" id="py_vars"></a>

**What is a variable** <a class="anchor" id="py_vars_intro"></a>

Variables are, simply put, __names that refer to values stored in the computer's memory__. When you create a variable, Python stores the value in memory and binds the name to it. After a variable is created, you can refer to the stored value by using the name of the variable.

**Variable assignment** <a class="anchor" id="py_vars_assign"></a>

A variable is created with an **assignment command**

In [7]:
my_first_variable = 5
my_first_variable

5

A variable can be created from other variables

In [8]:
my_second_variable = my_first_variable + my_first_variable
my_second_variable

10

A variable can be reassigned a value with another assignment command

In [9]:
my_first_variable = my_first_variable + 1
my_first_variable

6

In [10]:
# Notice how reassigning the value of my_first_variable does not change the value of my_second_variable
my_second_variable

10
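Reassigning a variable based on its own value is so common that Python has shorthand *augmented assignment* operators such as `+=` (the FizzBuzz example later in this notebook uses `+=` on strings). A minimal sketch, with a hypothetical `flights` variable:

```python
flights = 56
flights += 2   # same as: flights = flights + 2
flights -= 1   # same as: flights = flights - 1
flights *= 2   # same as: flights = flights * 2
print(flights)  # 114
```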

**<font color='red'>Exercise:</font>** Declare a variable *my_name* with your name, and a variable *my_age* with your age.

In [11]:
# PUT YOUR CODE HERE




# =)

**Variable Names** <a class="anchor" id="py_vars_names"></a>

A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume). Rules for Python variables:  
* A variable name must start with a letter or the underscore character
* A variable name cannot start with a number
* A variable name can only contain alpha-numeric characters and underscores (A-Z, a-z, 0-9, and _ )
* Variable names are case-sensitive (age, Age and AGE are three different variables)
* A variable name cannot be a reserved keyword (e.g. for, if, def)
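Python can check these naming rules for you. The sketch below uses the string method `isidentifier()` together with `keyword.iskeyword()` from the standard library; the latter flags reserved words such as `for`, which also cannot be used as variable names. The candidate names are made up for illustration.

```python
import keyword

candidates = ['total_volume', '_hidden', 'carName2', '2nd_flight', 'my-name', 'for']
results = {}
for name in candidates:
    # isidentifier() applies the naming rules; iskeyword() rejects reserved words
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', 'valid' if results[name] else 'invalid')
```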

In [12]:
ten = 10
TEN = 'ten'
TeN = 10000
tEn = 0.10

In [13]:
# printing the values on each variable defined in the previous cell
ten, TEN, TeN, tEn

(10, 'ten', 10000, 0.1)

## Data Types <a class="anchor" id="py_dtypes"></a>

**Intro** <a class="anchor" id="py_dtypes_intro"></a>

Objects in Python can be of many different data types, the most basic and common being numerical, strings and booleans.  
An object's type determines what kind of operations can be applied to it.
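To see which type an object has, you can pass it to the built-in `type()` function. A short sketch:

```python
# type() reports the data type of any value or variable
print(type(2.5))    # <class 'float'>
print(type(2))      # <class 'int'>
print(type('DFW'))  # <class 'str'>
print(type(True))   # <class 'bool'>
```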

**Numerical** <a class="anchor" id="py_dtypes_numerical"></a>

We've seen the numerical data type before. Numerical data-type objects (values/variables) can be used in mathematical operations.

In [14]:
# CREATING TWO NUMERICAL VARIABLES
a = 2.5 # A quantity with decimals is called a Float
b = 2 # A quantity without decimals is called an Integer

# ADDITION
print(a + b)
# SUBTRACTION
print(a - 5)
# MULTIPLICATION
print(a * b)
# DIVISION
print(a / b)
# EXPONENTIATION
print(a ** b)

# COMBINATION
c = (a ** 2 + b ** 2) ** 0.5 # raising to 0.5 takes the square root
c

4.5
-2.5
5.0
1.25
6.25


3.2015621187164243
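Two more numerical operators are worth knowing: floor division `//` and modulo `%` (the remainder of a division). The FizzBuzz example later in this notebook relies on `%`. A small sketch with hypothetical flight numbers:

```python
flights = 57
days = 28  # February 2019 had 28 days

print(flights / days)   # true division always returns a float
print(flights // days)  # 2 -> floor division drops the remainder
print(flights % days)   # 1 -> modulo returns the remainder
print(15 % 3 == 0)      # True -> a remainder of 0 means "divisible by"
```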

**Strings** <a class="anchor" id="py_dtypes_strings"></a>

A string variable is a variable that holds text. Most mathematical operations will not work on string variables, and the few that do (*+* and *\**) behave differently than they do with numbers. Nevertheless, string variables support other useful operations.

In [15]:
destination = 'DFW'
origin = 'GDL'

# CONCATENATION
market = origin + destination
print(market)
# REPETITION
print(origin * 5)

GDLDFW
GDLGDLGDLGDLGDL
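Mixing strings and numbers in the same operation raises a `TypeError`; converting the number with `str()` first fixes it. The sketch below catches the error so the cell keeps running (the `flights` value is just an example):

```python
origin = 'GDL'
flights = 56

concat_failed = False
try:
    message = origin + ' has ' + flights + ' flights'  # str + int is not allowed
except TypeError as err:
    concat_failed = True
    print('TypeError:', err)

# Converting the number to a string first makes the concatenation valid
message = origin + ' has ' + str(flights) + ' flights'
print(message)  # GDL has 56 flights
```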


**Booleans** <a class="anchor" id="py_dtypes_boolean"></a>

Boolean variables can only have two values: True or False. These values are very useful for controlling the flow of your code. As with other data types, booleans can be used in (boolean-specific) operations.

In [16]:
# AND
print(True and True)
# OR
print(True or False)
# NOT
print(not True)

True
True
False
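In practice, booleans usually come from comparisons such as `==`, `>` or `<=`. The sketch below uses the same passenger and capacity figures as the If Else example later in this notebook:

```python
pax = 110
cap = 130
load_factor = pax / cap  # roughly 0.846

is_full = pax == cap                                    # == tests equality
is_strong = load_factor > 0.90                          # > tests "greater than"
is_normal = load_factor > 0.80 and load_factor <= 0.90  # combine comparisons with and
print(is_full, is_strong, is_normal)  # False False True
```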


## Data Structures <a class="anchor" id="py_dstructs"></a>

**Intro** <a class="anchor" id="py_dstructs_intro"></a>

Data structures are more complex data types. The most common data structures used in Python are lists and dictionaries. Both store multiple values in a single object.

**Lists** <a class="anchor" id="py_dstructs_lists"></a>

Lists are objects that can store multiple values while keeping the order of the elements. These values can then be retrieved and/or updated at will.

A list can be created by enclosing the comma-separated elements in square brackets.

In [17]:
gdl_markets = ['GDLDFW', 'GDLPHX', 'GDLCLT']
gdl_feb_flights = [56, 28, 26]

In [18]:
gdl_markets

['GDLDFW', 'GDLPHX', 'GDLCLT']

In [19]:
gdl_feb_flights

[56, 28, 26]

Values can be retrieved from a list through *indexing*: write the list followed by a number within square brackets. That number, the index, is the position of the element to retrieve.

As in many other programming languages, Python is *zero-indexed*: the first element is stored in position 0.

In [20]:
# First element in list
print(gdl_markets[0], 'has', gdl_feb_flights[0], 'flights in February 2019.')
# Second element in list
print(gdl_markets[1], 'has', gdl_feb_flights[1], 'flights in February 2019.')
# Third element in list
print(gdl_markets[2], 'has', gdl_feb_flights[2], 'flights in February 2019.')

GDLDFW has 56 flights in February 2019.
GDLPHX has 28 flights in February 2019.
GDLCLT has 26 flights in February 2019.
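Lists can also be updated: assigning to an index replaces that element, and the `append()` method adds an element at the end. The sketch below works on fresh copies of the lists so the variables above are left untouched; the new market and flight counts are hypothetical.

```python
markets = ['GDLDFW', 'GDLPHX', 'GDLCLT']
flights = [56, 28, 26]

flights[1] = 30           # hypothetical value: update the second element in place
markets.append('GDLORD')  # hypothetical market: add an element at the end
flights.append(20)        # hypothetical flight count for the new market

print(markets)       # ['GDLDFW', 'GDLPHX', 'GDLCLT', 'GDLORD']
print(flights)       # [56, 30, 26, 20]
print(len(markets))  # 4
```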


**Dictionaries** <a class="anchor" id="py_dstructs_dicts"></a>

Similar to lists, dictionaries hold multiple values. Unlike lists, values in a dictionary are not associated with a numerical position. Instead, a dictionary stores its values in key-value pairs, and each value is retrieved through its key.

In [21]:
gdl_feb_pax = {'GDLDFW': 7168, 'GDLPHX': 3584, 'GDLCLT': 1976}

To retrieve a value from a dictionary, you must follow the dictionary with a key surrounded by square brackets.

In [22]:
# Retrieve GDLDFW
print('GDLDFW', 'has a Feb 2019 max capacity of', gdl_feb_pax['GDLDFW'])

GDLDFW has a Feb 2019 max capacity of 7168


If a key does not exist in a dictionary, Python will raise a *KeyError*.

In [23]:
gdl_feb_pax['GDLORD']

KeyError: 'GDLORD'
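To avoid the `KeyError`, you can check for a key with the `in` operator, or use the dictionary's `get()` method, which returns `None` (or a default you choose) for missing keys. The sketch uses a copy of the dictionary above; the `GDLORD` value is hypothetical.

```python
feb_pax = {'GDLDFW': 7168, 'GDLPHX': 3584, 'GDLCLT': 1976}

print('GDLORD' in feb_pax)       # False -> check for a key before indexing
print(feb_pax.get('GDLORD'))     # None  -> get() does not raise KeyError
print(feb_pax.get('GDLORD', 0))  # 0     -> or returns a default of your choice
print(feb_pax.get('GDLDFW', 0))  # 7168  -> existing keys are returned as usual

feb_pax['GDLORD'] = 1500         # hypothetical value: assigning to a new key adds it
print(feb_pax['GDLORD'])         # 1500
```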

* It is important to note that there are many other Data Structures, this is only a sample!

## Indexing <a class="anchor" id="py_index"></a>

**Intro** <a class="anchor" id="py_index_intro"></a>

When it comes to indexing (retrieving elements from a data structure), Python has some pretty neat built-in tricks.

**Zero-Based Indexing** <a class="anchor" id="py_index_zero"></a>

Python, unlike R, is zero-indexed. That means that the first element in data structures like strings (similar to *R*'s *character* class) or lists (similar to *R* vectors) can be retrieved by using the indexing operation *[0]*.

In [30]:
y_cabin_string = 'OQNSGVWLMKHY'
print('The first inventory in the Y cabin is', y_cabin_string[0])
y_cabin_list = ['O', 'Q', 'N', 'S', 'G', 'V', 'W', 'L', 'M', 'K', 'H', 'Y'] # we could have also used list(y_cabin_string)
print('The second inventory in the Y cabin is', y_cabin_list[1])

The first inventory in the Y cabin is O
The second inventory in the Y cabin is Q


**Negative Indexing** <a class="anchor" id="py_index_negative"></a>

A handy thing that sets apart *Python* from other languages is *negative indexing*. With it, you can refer to items from last to first as opposed to the traditional first to last.

For example, to retrieve the last element of a vector in R without knowing its length, you have to do something similar to:

In [31]:
%%R -i y_cabin_list
y_cabin_list[length(y_cabin_list)]

While in Python you can do the following:

In [32]:
y_cabin_list[-1] #Getting the last element of Y CABIN

'Y'

**Sequence Indexing** <a class="anchor" id="py_index_sequence"></a>

If you want to grab specific subsets of a data structure like a list, you can use *sequence indexing* (also called *slicing*):

In [33]:
y_cabin_list[:5] # First five inventory classes in Y CABIN

['O', 'Q', 'N', 'S', 'G']

In [34]:
y_cabin_list[-5:] # Last five inventory classes in Y CABIN

['L', 'M', 'K', 'H', 'Y']

In [35]:
y_cabin_list[5:10] # Positions 5 through 9 of Y CABIN (the end index is excluded)

['V', 'W', 'L', 'M', 'K']

You could even do something more complicated like:

In [36]:
numbers = [0,1,2,3,4,5,6,7,8,9]
even_numbers = numbers[::2]
print(even_numbers)
multiples_of_three = numbers[3::3]
print(multiples_of_three)

[0, 2, 4, 6, 8]
[3, 6, 9]
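All of these slices follow the general form `[start:stop:step]`: `start` is included, `stop` is excluded, and `step` controls how many positions to jump. A quick sketch of those rules on the Y cabin list:

```python
y_cabin_list = ['O', 'Q', 'N', 'S', 'G', 'V', 'W', 'L', 'M', 'K', 'H', 'Y']

# start is included, stop is excluded, so [5:10] has 10 - 5 = 5 elements
print(y_cabin_list[5:10])       # ['V', 'W', 'L', 'M', 'K']
print(len(y_cabin_list[5:10]))  # 5
# splitting at the same index gives back the whole list with no overlap
print(y_cabin_list[:5] + y_cabin_list[5:] == y_cabin_list)  # True
# a negative step walks backwards, so [::-1] reverses the list
print(y_cabin_list[::-1][:3])   # ['Y', 'H', 'K']
```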


## Indentation <a class="anchor" id="py_indentation"></a>

One of Python's most distinctive features is that it enforces human readability by getting rid of the curly braces other programming languages use to delimit blocks of code. Instead, Python relies on indentation.

In [37]:
%%R
# Code in R for FizzBuzz
solveFizzBuzz = function(x) {
  for (i in seq(1,x)) {
    response = ''
    if (i %% 3 == 0) {
      response = paste0(response, 'Fizz')
    }
    if (i %% 5 == 0) {
      response = paste0(response, 'Buzz')
    }
    if (i %% 3 != 0 & i %% 5 != 0) {
      response = as.character(i)
    }
    print(response)
  }
}

In [38]:
# Code in Python for FizzBuzz
def solveFizzBuzz(x):
    for i in range(1,x+1):
        response = ''
        if i % 3 == 0:
            response += 'Fizz'
        if i % 5 == 0:
            response += 'Buzz'
        if i % 3 != 0 and i % 5 != 0:
            response = str(i)
        print(response)    

In [39]:
solveFizzBuzz(15)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
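Because indentation replaces the braces, it changes what the code does. In the sketch below, the only difference between the two loops is whether the last line is indented; the second part uses the built-in `compile()` to show that a missing indent is an error (`IndentationError`), not just a style issue.

```python
inside = []
for i in range(3):
    inside.append(i)      # indented: runs on every iteration
    inside.append('end')  # also indented: runs on every iteration

outside = []
for i in range(3):
    outside.append(i)     # indented: runs on every iteration
outside.append('end')     # not indented: runs once, after the loop

print(inside)   # [0, 'end', 1, 'end', 2, 'end']
print(outside)  # [0, 1, 2, 'end']

# Code whose if-block is not indented is rejected before it can run
bad_code = "if True:\nprint('missing indentation')"
raised = False
try:
    compile(bad_code, '<example>', 'exec')
except IndentationError as err:
    raised = True
    print('IndentationError:', err.msg)
```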


## Modules <a class="anchor" id="py_modules"></a>

**Intro** <a class="anchor" id="py_modules_intro"></a>

*Modules* play the same role as R *libraries*: they are code written by other people, with specialized functions/objects that will make your life easier as an analyst/coder.

**Installing modules** <a class="anchor" id="py_modules_install"></a>

To install any module, you can type **conda install YOUR_MODULE** in the *Anaconda Prompt* or run the following code cell:

In [40]:
# The following line is commented to avoid installing an already installed module.
#!conda install YOUR_MODULE 

**Loading modules** <a class="anchor" id="py_modules_load"></a>

There are different ways to load a module, each giving you a different level of granularity over what gets loaded. We'll review a few of them.

* Loading a full module:

In [41]:
import datetime

In [42]:
datetime.datetime.now() # We are calling the now method of the datetime class inside the datetime module

datetime.datetime(2019, 1, 14, 10, 53, 10, 912562)

* Loading specific objects (here, classes) from a module:

In [43]:
from datetime import datetime, timedelta

In [44]:
datetime.now() # Calling the same method after loading only the datetime class

datetime.datetime(2019, 1, 14, 10, 53, 11, 575556)
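The cell above also imports `timedelta`, which represents a duration. Adding a `timedelta` to a `datetime` shifts it, and subtracting two `datetime` objects gives a `timedelta`. The departure time below is hypothetical.

```python
from datetime import datetime, timedelta

departure = datetime(2019, 2, 1, 10, 30)  # hypothetical departure time
arrival = departure + timedelta(hours=2, minutes=45)
print(arrival)                            # 2019-02-01 13:15:00

block_time = arrival - departure          # datetime - datetime gives a timedelta
print(block_time)                         # 2:45:00
```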

**Aliasing** <a class="anchor" id="py_modules_alias"></a>

Aliasing is the process of loading a module under an alias to save typing.

In [45]:
from datetime import datetime as dt

In [46]:
dt.now()

datetime.datetime(2019, 1, 14, 10, 53, 12, 509556)
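Aliases also work for whole modules, not just for objects inside them; for example, `import pandas as pd` and `import numpy as np` are widely used conventions. The sketch below sticks to the standard-library `statistics` module so it runs without extra installs:

```python
import statistics as stats  # alias a whole standard-library module

gdl_feb_flights = [56, 28, 26]
print(round(stats.mean(gdl_feb_flights), 2))  # 36.67
print(stats.median(gdl_feb_flights))          # 28
```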

## Functions <a class="anchor" id="py_funcs"></a>

**Intro** <a class="anchor" id="py_funcs_intro"></a>

Functions are pieces of code that can be run just by calling the function's name. A function can accept any number of parameters and can return any number of values (several values are returned together as a tuple).
Functions in *Python* work much the same way as in R.

**Calling a function** <a class="anchor" id="py_funcs_call"></a>

Similar to *R*, you can call a function by its name followed by a set of parentheses with any parameters.

In [47]:
airport_codes = 'ABE,    BWI,  CLT,   DFW,       EWR'
airport_codes = airport_codes.replace(' ', '') # Substituting a substring with another string (plain text, not a regex)
print(airport_codes)
airport_codes = airport_codes.split(sep=',') # Separating the string by a substring (plain text, not a regex)
print(airport_codes)

ABE,BWI,CLT,DFW,EWR
['ABE', 'BWI', 'CLT', 'DFW', 'EWR']


**Chaining functions** <a class="anchor" id="py_funcs_chain"></a>

You can achieve complex operations with one-liners by using *Python*'s chaining capabilities.

In [48]:
airport_codes = 'ABE,    BWI,  CLT,   DFW,       EWR'
airport_codes.replace(' ','').split(',')

['ABE', 'BWI', 'CLT', 'DFW', 'EWR']

To chain methods, the object returned by the method on the left must implement the method on the right. Here, *replace* returns a string, and strings implement *split*.
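The sketch below shows what happens when that rule is broken: `split()` returns a list, and lists have no `upper()` method, so Python raises an `AttributeError`. Calling `upper()` while the value is still a string keeps the chain valid.

```python
airport_codes = 'ABE,    BWI,  CLT,   DFW,       EWR'

# replace() returns a str, and str has a split() method, so this chain works
codes = airport_codes.replace(' ', '').split(',')
print(codes)  # ['ABE', 'BWI', 'CLT', 'DFW', 'EWR']

# split() returns a list, and lists have no upper() method, so this chain fails
chain_failed = False
try:
    airport_codes.replace(' ', '').split(',').upper()
except AttributeError as err:
    chain_failed = True
    print('AttributeError:', err)

# Calling upper() while the value is still a str keeps the chain valid
print('abe, bwi'.upper().replace(' ', '').split(','))  # ['ABE', 'BWI']
```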

**Defining functions** <a class="anchor" id="py_funcs_define"></a>

To define/create your own functions, you use the **def** keyword. If the function should return a value, use the **return** keyword.

In [49]:
def directional2alpha(directional):
    orig, dest = [directional[:3], directional[3:]] # This is called variable unpacking: 'GDLCLT' becomes 'GDL' and 'CLT'
    alpha = ''.join(sorted([orig,dest]))
    return alpha

The function can now be called by using its name followed by the correct parameters inside parentheses:

In [50]:
print(directional2alpha('GDLCLT'))
print(directional2alpha('CLTGDL'))

CLTGDL
CLTGDL
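A function can also hand back several values at once: separating them with commas in the `return` statement returns a tuple, which the caller can unpack into separate variables. `split_directional` below is a hypothetical helper written for this illustration.

```python
def split_directional(directional):
    """Hypothetical helper: split a directional market such as 'GDLCLT'."""
    origin = directional[:3]
    destination = directional[3:]
    return origin, destination  # two values are returned together as a tuple

orig, dest = split_directional('GDLCLT')  # unpack the tuple into two variables
print(orig, dest)                         # GDL CLT
print(split_directional('CLTGDL'))        # ('CLT', 'GDL')
```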


## Flow Control <a class="anchor" id="py_flow"></a>

**If Else Statements** <a class="anchor" id="py_flow_ifelse"></a>

An If Else statement lets you run selected lines of code according to the result of a logical expression.

An If Else statement always starts with the *if* keyword, followed by the logical expression and a colon (:). The indented lines that follow will only be run if the expression between the *if* keyword and the colon evaluates to True. If there are other conditions, the keywords *elif* (short for else if) and *else* can be used.

In [51]:
pax = 110
cap = 130

if pax/cap > 0.90:
    print('Strong Flight')
elif 0.80 < pax/cap <= 0.90:
    print('Normal Flight')
else:
    print('Weak Flight')

Normal Flight
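Wrapping the same logic in a function makes it easy to check every branch. Because Python only tests an *elif* when every condition above it was False, the second branch already implies a load factor of at most 0.90. `classify_flight` is a hypothetical helper and the passenger counts are made up.

```python
def classify_flight(pax, cap):
    """Hypothetical helper: label a flight by its load factor."""
    load_factor = pax / cap
    if load_factor > 0.90:
        return 'Strong Flight'
    elif load_factor > 0.80:  # only reached when load_factor <= 0.90
        return 'Normal Flight'
    else:
        return 'Weak Flight'

print(classify_flight(125, 130))  # Strong Flight
print(classify_flight(110, 130))  # Normal Flight
print(classify_flight(90, 130))   # Weak Flight
```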


**For Loops** <a class="anchor" id="py_flow_for"></a>

*For loops* allow you to run a block of code a fixed number of times. They are usually used to iterate over the values of a list.

Usually, *For loops* contain the following elements:
1. the keyword *for*
2. a variable name
3. the keyword *in*
4. a list (or the *range* function) on which to iterate
5. a colon (:)

The indented lines that follow will be run as many times as there are elements in the specified list. Furthermore, on the n-th iteration, the variable specified in step 2 takes the value of the n-th element in the list.

In [52]:
hubs = ['LAX','PHX','DFW','MIA','CLT','DCA','JFK','LGA','ORD','BOS']
for hub in hubs:
    print("One of AA's hubs is", hub)

One of AA's hubs is LAX
One of AA's hubs is PHX
One of AA's hubs is DFW
One of AA's hubs is MIA
One of AA's hubs is CLT
One of AA's hubs is DCA
One of AA's hubs is JFK
One of AA's hubs is LGA
One of AA's hubs is ORD
One of AA's hubs is BOS
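The *range* function mentioned in step 4 produces the numbers 0, 1, ..., n - 1, which is handy when you need the position of each element, for example to walk two related lists side by side. A short sketch that also keeps a running total:

```python
gdl_markets = ['GDLDFW', 'GDLPHX', 'GDLCLT']
gdl_feb_flights = [56, 28, 26]

print(list(range(len(gdl_markets))))  # [0, 1, 2]

total_flights = 0
for i in range(len(gdl_markets)):
    print(gdl_markets[i], 'has', gdl_feb_flights[i], 'flights')
    total_flights += gdl_feb_flights[i]  # accumulate a running total

print('Total:', total_flights)  # Total: 110
```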


## Resources <a class="anchor" id="py_resources"></a>

This was just a primer for Python. There are many other concepts that are useful for coding with Python, but basic programs can be written with this knowledge. If you want to learn more, please refer to:

**Markdown Cheatsheet**

This is a great resource to format your hyper-text cells: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

**Codecademy: Learn Python 3**

A nice interactive, free, well-rounded Python course: https://www.codecademy.com/learn/learn-python-3

**Datacamp: Python Programmer**  
This free, interactive online course is a must for any new aspiring data analyst: https://www.datacamp.com/onboarding/learn?technology=python  
If you are interested in more in-depth analysis classes, take a look at their other offerings.

**Automate the boring stuff**

Great free online book for beginners with cool step-by-step projects that will help you automate tasks like web scraping or even playing games like Candy Crush: https://automatetheboringstuff.com/