# INTRODUCTION TO PYTHON

---
January, 2018   -- Version 2.7  
Maria L Zamora Maass  



Here are the basic topics we will cover with Python: 

1. Python and Jupyter notebooks (terminal commands, formats, widgets) 
2. Basic data structures and classes
3. Control flow and relational operators
4. Functions and development modules (IPython enhancements and PyDev)


Reference texts and links:

* Learning Python, 5th Edition by Mark Lutz, O'Reilly Media, 2013, ISBN 978-1-4493-5573-9
* Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin, Prentice Hall, 2008, ISBN 978-0-13-235088-4
* Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, O'Reilly Media, 2012, ISBN 978-1-4493-1979-3
* [Codecademy](https://www.codecademy.com/learn/learn-python): Introduction to Data Analysis, 10 weeks

## 1. Python and Jupyter notebooks 

On one side, **Python**, one of the most widely used languages, especially for data science. A large ecosystem of tools and packages (a.k.a. modules or libraries) has been built on top of it: IPython, Pandas, NumPy, Matplotlib, and more. 

On the other side, **Jupyter notebooks**. This platform is "an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media". It is essentially an app for creating and sharing documents that mix live code and static content, useful for tasks such as data cleaning, data transformation, numerical simulation, statistical modeling, and machine learning. 

***

***

In particular, Jupyter (IPython) notebooks are made up of **cells**. There are two main types of cells: *Markdown and Code cells*. You can edit a Markdown cell by double-clicking on it. To run a cell (which also returns a Markdown cell to display mode), press the Run (▶|) button or Shift+Enter; to interrupt a running cell, press the Stop (◼︎) button.


> More details: [IPython notebooks](http://jupyter.org/)



What are **Markdown** cells (like this one) for? They handle formatting, formulas (equations), static code, images, tables, links, and more.

<br>
Double-click this cell to see its Markdown source.

<br>
<b>TeX math equations</b> 

Source: `$ s={\sqrt {\frac {\sum _{i=1}^{N}(x_{i}-{\overline {x}})^{2}}{N-1}}} $`

<br>
 $ s={\sqrt {\frac {\sum _{i=1}^{N}(x_{i}-{\overline {x}})^{2}}{N-1}}} $   
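
The equation above is the sample standard deviation (with N - 1 in the denominator). As an illustrative sketch, not part of the original notebook, here it is in plain Python, term by term; the function name `sample_std` and the data are made up for the example, and the code runs on Python 2.7 and 3.

```python
import math

def sample_std(values):
    """Sample standard deviation s, dividing by N - 1 as in the formula."""
    n = len(values)
    mean = sum(values) / float(n)                        # x-bar
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

s = sample_std([2, 4, 4, 4, 5, 5, 7, 9])
print(s)
```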

<br>
<br>
<b> Tables</b>

 Type | Qty | Price
--- | --- | ---
10YR | 2M | 100.2
5YR | 1.5M | 100.3
2YR | 4M | 100.1
3YR | 1.5M | 100.2


<br>
<br>
<b>Code in other languages</b>
```javascript
var s = "JS string example";
alert(s);
```

<br>
<br>
<b> Links </b>
<a href="https://www.youtube.com/watch?v=lmoNmY-cmSI" target="_blank"><img 
src="video.png" 
width="240" height="180" border="10" /></a>

<br>
<br>
<b> Images </b>
<img src="images/markdown.jpg" width="400" height="400" border="10" />



***

*** 


Now, let's look at a **code** cell. It can hold Python code or even terminal commands (lines starting with ! are sent to the system shell). Each time we run a cell, the notebook keeps its results available for the following cells; the number on the left of each cell (In [n] / Out [n]) is its execution counter, which we can use to refer back to inputs and outputs.

In [1]:

```python
##### Terminal commands: lines starting with "!" run in the system shell

!printf "---------Disk File system:"
!df -haT 
!printf " "

!printf "---------Let's find TMP:"
!df -haT | grep "tmp" 
!printf " "

# Windows-only: list the mapped network drives
!printf "---------Net use command:"
!net use
```

```
---------Disk File system:
Filesystem                           Type    Size  Used Avail Use% Mounted on
C:/Program Files/Git                 ntfs    477G  179G  299G  38% /
C:/Program Files/Git/usr/bin         -          -     -     -    - /bin
C:/Users/MZAMOR~1/AppData/Local/Temp -          -     -     -    - /tmp
C:                                   -          -     -     -    - /c
F:                                   netapp  5.7T  4.9T  874G  86% /f
G:                                   netapp  2.4T  1.8T  617G  75% /g
J:                                   ntfs    137G   41G   97G  30% /j
M:                                   ntfs    300M   43M  258M  15% /m

---------Let's find TMP:
C:/Users/MZAMOR~1/AppData/Local/Temp -          -     -     -    - /tmp

---------Net use command:
New connections will be remembered.


Status       Local     Remote                    Network

-------------------------------------------------------------------------------
             F:        \\ad.j
```

In [2]:

```python
##### Python using Jupyter notebook

## Capture shell output in a Python variable (no need to declare it);
## IPython returns an SList: a list of lines with extra helper methods
var_python = !df -haT 

## Attributes of the Python variable
print "\n---------Python variable(list)"
print var_python.fields()        # each line split on whitespace
print "\n---------Python grep (TMP) "
print var_python.grep("tmp")     # keep only the lines containing "tmp"

!printf "\n---------Python variable in the terminal"
!echo "{var_python}"   
## Alternative syntax: !echo "$var_python"

##### Python using the library called "subprocess" (modules)
##### https://docs.python.org/2/library/subprocess.html#module-subprocess

print "\n---------Call the terminal using modules"
import subprocess

## Pass the command as a list; subprocess runs it directly, so shell=True is not needed
print subprocess.check_output( [ "ls" ] ) 
```

```
---------Python variable(list)
[['Filesystem', 'Type', 'Size', 'Used', 'Avail', 'Use%', 'Mounted', 'on'], ['C:/Program', 'Files/Git', 'ntfs', '477G', '179G', '299G', '38%', '/'], ['C:/Program', 'Files/Git/usr/bin', '-', '-', '-', '-', '-', '/bin'], ['C:/Users/MZAMOR~1/AppData/Local/Temp', '-', '-', '-', '-', '-', '/tmp'], ['C:', '-', '-', '-', '-', '-', '/c'], ['F:', 'netapp', '5.7T', '4.9T', '874G', '86%', '/f'], ['G:', 'netapp', '2.4T', '1.8T', '617G', '75%', '/g'], ['J:', 'ntfs', '137G', '41G', '97G', '30%', '/j'], ['M:', 'ntfs', '300M', '43M', '258M', '15%', '/m']]

---------Python grep (TMP) 
['C:/Users/MZAMOR~1/AppData/Local/Temp -          -     -     -    - /tmp']

---------Python variable in the terminal
"['Filesystem                           Type    Size  Used Avail Use% Mounted on', 'C:/Program Files/Git                 ntfs    477G  179G  299G  38% /', 'C:/Program Files/Git/usr/bin         -          -     -     -    - /bin', 'C:/Users/MZAMOR~1/AppData/Local/Temp -       
```

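The ls call above only works where an ls executable exists. A more portable sketch of the same subprocess idea, assuming only the standard library: it passes the command as a list (no shell involved) and uses the running interpreter, sys.executable, as the child program purely because it exists on every platform. The child command itself is illustrative.

```python
import subprocess
import sys

# Program and arguments as a list: no shell parses the command line
output = subprocess.check_output(
    [sys.executable, "-c", "print('hello from a child process')"])

# check_output returns bytes on Python 3 (str on Python 2); decode to get text
text = output.decode("utf-8").strip()
print(text)
```

check_output raises subprocess.CalledProcessError if the child exits with a non-zero status, so failures do not pass silently.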
## 2. Basic data structures 


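**2.1** Python's basic data **structures** (dictionaries, lists, tuples, and sets) are built-in types; the **collections** module adds specialized containers on top of them, such as deque, namedtuple, Counter, OrderedDict, and defaultdict.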

* Dictionary: 
    * { } or dict( )
    * Prefer {}: it is faster because it compiles to a single BUILD_MAP instruction, while dict() first has to look up the name "dict" (LOAD_NAME, or LOAD_GLOBAL inside a function) and then call it (CALL_FUNCTION).
    * {key1: value1, key2: value2}  
    * {"prices": [100, 101, 109], "action": ["buy", "sell"]}
    * A collection of key-value pairs, similar to an object whose attribute names map to values.
    * Similar to JSON, but a live in-memory object rather than a text (string) format.


* Lists
    * [ ] or list( )
    * [101, 100, 101.5, 103, 108]
    * Ordered, mutable sequences of objects that can grow or shrink. Similar to arrays.
    * LIFO stacks (last-in, first-out) using the methods append() and pop()
    * FIFO queues (first-in, first-out) using _collections.deque_ and methods like popleft(), since removing the first item of a list is slow (O(n))
    

* Tuples
    * ( "price", "action" )
    * ( 100, "buy" )
    * Ordered, immutable sequences of objects. Similar to arrays.
    * Usually a bit faster and lighter than lists and, being immutable (hashable when their items are), they can be used as dictionary keys.
    * { ("price","action"): (100,"buy"), ("hour","minute"): (14,58) }
    

* Sets
    * set( [100, 101.2, 100, 101, 100] )
    * Mutable, unordered collection of _unique_ elements. 
    * Similar to arrays but without duplicate values (good for IDs)
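
A minimal sketch of the four structures above, reusing their example values (the variable names are illustrative): a list as a LIFO stack, a collections.deque as a FIFO queue, tuples as dictionary keys, and a set dropping duplicates.

```python
from collections import deque

# Dictionary: string keys must be quoted; values can be any object, even lists
order = {"prices": [100, 101, 109], "action": ["buy", "sell"]}
order["prices"].append(110)

# List as a LIFO stack: append() pushes, pop() removes the LAST item added
lifo_stack = [101, 100, 101.5]
lifo_stack.append(103)
last_in = lifo_stack.pop()          # 103 comes out first

# deque as a FIFO queue: append() enqueues, popleft() removes the FIRST item added
fifo_queue = deque([101, 100, 101.5])
fifo_queue.append(103)
first_in = fifo_queue.popleft()     # 101 comes out first

# Tuples are immutable (hashable), so they can be dictionary keys
quotes = {("price", "action"): (100, "buy"), ("hour", "minute"): (14, 58)}
price_action = quotes[("price", "action")]

# Set: duplicates are dropped automatically
unique_prices = set([100, 101.2, 100, 101, 100])

print(last_in)
print(first_in)
print(price_action)
```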
    
   
   
   
    
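Now, some abstract data types are implemented in the **Queue** (renamed queue in Python 3) and **heapq** modules:

* Priority queues
    * They can be built on top of a list (that is what heapq does), but items always come out lowest value first.
    * In other words, the lowest value has the highest priority and is served first. When two items share the same priority, PriorityQueue does not keep insertion order: with tuples, the tie is broken by comparing the next elements of each tuple.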
    

**To see more, import a module, for example collections:**

**Type a dot after its name and press Tab to see all available options, or inspect its \_\_dict\_\_**

**Try other modules: Python is all about modules!**


In [3]:

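```python
import collections

# collections.            <- type the dot, then press Tab to list its contents
# collections.__dict__    <- dictionary with every name the module defines
```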


In [4]:
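```python
### Let's import one more module
import timeit

# Each statement runs 1,000,000 times (the default); the result is the total in seconds
print timeit.timeit("{}")
print timeit.timeit("dict()")
```

```
0.0421159803666
0.169504540305
```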
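
Where does the gap come from? A small sketch with the standard dis module disassembles both expressions. Opcode names vary by version (for example, Python 3.11+ reports CALL where 2.7 reports CALL_FUNCTION), but {} builds the dictionary with a single BUILD_MAP instruction, while dict() first looks up the name and then calls it. The helper name show_bytecode is hypothetical.

```python
import dis

def show_bytecode(source):
    """Compile one expression and print the CPython bytecode it produces."""
    code = compile(source, "<example>", "eval")
    dis.dis(code)
    return code

literal_code = show_bytecode("{}")      # expect a BUILD_MAP instruction
call_code = show_bytecode("dict()")     # expect LOAD_NAME dict, then a call opcode
```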


In [5]:

```python
### Queue and priority example
from Queue import PriorityQueue    # Python 3: from queue import PriorityQueue

q = PriorityQueue()   # Remember: lowest first; ties are broken by the next tuple items
q.put( (101.5,"buy","trader ZG") )
q.put( (101.1,"buy","trader ZG") )
q.put( (101.3,"buy","trader DB") )
q.put( (101.5,"buy","trader DB") )

while not q.empty():
    print q.get()
```

```
(101.1, 'buy', 'trader ZG')
(101.3, 'buy', 'trader DB')
(101.5, 'buy', 'trader DB')
(101.5, 'buy', 'trader ZG')
```


> The Queue module is designed for threaded programming: its queues are thread-safe. [Take a look here](https://docs.python.org/3/library/queue.html)

> To define priorities you can use [heapq](https://docs.python.org/2/library/heapq.html) too
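
In the PriorityQueue output above, the two 101.5 orders come out as 'trader DB' before 'trader ZG' even though ZG was queued first: the tie is settled by comparing trader names. If ties should instead be served in arrival order, a common pattern (described in the heapq documentation) inserts a running counter right after the priority. This is a sketch with illustrative data, not the only way to do it.

```python
import heapq
import itertools

# Illustrative orders: (price, action, trader), same as the PriorityQueue cell
orders = [(101.5, "buy", "trader ZG"),
          (101.1, "buy", "trader ZG"),
          (101.3, "buy", "trader DB"),
          (101.5, "buy", "trader DB")]

heap = []
arrival = itertools.count()   # yields 0, 1, 2, ... in insertion order
for price, action, trader in orders:
    # The counter sits right after the priority, so equal prices are
    # compared by arrival number and the trader names are never compared
    heapq.heappush(heap, (price, next(arrival), action, trader))

served = []
while heap:
    price, _, action, trader = heapq.heappop(heap)   # lowest price first
    served.append((price, action, trader))
    print((price, action, trader))
```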


**2.2** Now let's see **classes and functions**. There are two ways to store the attributes of a class's instances: 

- Traditional option: define each attribute through "self" (each instance automatically gets its own "\_\_dict\_\_" to hold them)
- Using "slots": declare the attribute names up front with "\_\_slots\_\_" (instances then get no "\_\_dict\_\_")

> What's the difference? 
By default each object has a *\_\_dict\_\_* holding all its attributes, and that dictionary takes memory in every instance. If we want to save space because **many** instances will be created, we can use **\_\_slots\_\_**, but we must be aware of its constraints: a slotted instance has no \_\_dict\_\_, so we cannot add attributes that are not listed in \_\_slots\_\_, while a **\_\_dict\_\_** is a regular read-write structure (literally a dictionary). Also, a \_\_slots\_\_ declaration only applies to the class that defines it: a subclass that does not declare its own \_\_slots\_\_ gets a \_\_dict\_\_ again.


> All details about [classes](https://docs.python.org/2.7/tutorial/classes.html)

> More details about slots: [documentation](https://docs.python.org/3/reference/datamodel.html#slots)


In [6]:
```python
class sampleClass(object):
    
    def __init__(self, *args, **kwargs):
        # Each assignment through self is stored in the instance's __dict__
        self.price = 101.64
        self.action = "buy"

        
a = sampleClass()
print a.__dict__
print a.__dict__['price']
```

```
{'action': 'buy', 'price': 101.64}
101.64
```


In [7]:
```python
class sampleClassSlots(object):
    
    ## Attributes declared up front; Python implements each slot as a "descriptor" on the class
    __slots__ = ('price', 'action') 
    
    def __init__(self, *args, **kwargs):
        self.price = 101.64
        self.action = "buy"

        
b = sampleClassSlots()
print b.__slots__
print b.price
```

```
('price', 'action')
101.64
```


In [8]:

```python
### Total seconds for 1,000,000 instantiations (timeit accepts a callable).
### timeit was imported in an earlier cell, so we don't need to import it again.

print timeit.timeit( sampleClassSlots )
print timeit.timeit( sampleClass )
```

```
0.513779768102
0.600130114014
```


In [9]:

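```python
### Let's see the size in BYTES with the "sys" module (a very important one)
import sys

print sys.getsizeof( a.__dict__ )
### Note: b.__slots__ is the class's tuple of names, shared by every instance,
### so this second number is not a per-instance measurement
print sys.getsizeof( b.__slots__ )
```

```
272
64
```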
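
Because b.__slots__ is one tuple stored on the class and shared by all instances, the second number above is not a per-instance cost. A fairer sketch compares one instance's own footprint: the object plus its __dict__ for a regular class, versus the object alone for a slotted class. The class names are hypothetical, sys.getsizeof is shallow (it does not count the float and string values), and exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys

class RegularQuote(object):
    # Hypothetical class: attributes live in a per-instance __dict__
    def __init__(self, price, action):
        self.price = price
        self.action = action

class SlottedQuote(object):
    # Same attributes, stored in fixed slots; instances get no __dict__
    __slots__ = ("price", "action")

    def __init__(self, price, action):
        self.price = price
        self.action = action

regular = RegularQuote(101.64, "buy")
slotted = SlottedQuote(101.64, "buy")

# Shallow per-instance footprint: the object itself plus its own __dict__, if any
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)

print("regular: %d bytes, slotted: %d bytes" % (regular_size, slotted_size))
```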


In [10]:

```python
### Let's try to add more attributes:

a.__dict__['hour'] = 12          # works: __dict__ is a regular dictionary
print a.__dict__

b.__slots__ = ('price', 'action', 'hour')   # fails: a slotted instance cannot rebind __slots__
print b.__slots__
```

```
{'action': 'buy', 'price': 101.64, 'hour': 12}


AttributeError: 'sampleClassSlots' object attribute '__slots__' is read-only
```
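
The error above comes from trying to rebind __slots__ on an instance. A sketch of what does and does not work, with hypothetical class names: assigning an undeclared attribute raises AttributeError, a subclass can declare extra slots, and a subclass without its own __slots__ gets a __dict__ back.

```python
class QuoteSlots(object):
    # Hypothetical slotted class: only "price" and "action" can be stored
    __slots__ = ("price", "action")

    def __init__(self):
        self.price = 101.64
        self.action = "buy"

class QuoteSlotsWithHour(QuoteSlots):
    # Declare only the NEW name; "price" and "action" are inherited slots
    __slots__ = ("hour",)

class QuoteSlotsSubclass(QuoteSlots):
    pass   # no __slots__ here, so instances get a __dict__ again

quote = QuoteSlots()
try:
    quote.hour = 12                # "hour" is not a declared slot
    added_hour = True
except AttributeError as error:
    added_hour = False
    print("AttributeError: %s" % error)

with_hour = QuoteSlotsWithHour()
with_hour.hour = 12                # allowed: declared in the subclass's __slots__

subclassed = QuoteSlotsSubclass()
subclassed.hour = 12               # allowed: stored in the instance's __dict__
print(subclassed.__dict__)         # {'hour': 12}
```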

## 3. Control Flow and relational operators


At this point, you might have noticed that in Python we don't need to declare variables (they are created when we first assign a value), and we don't end statements with semicolons or wrap blocks in braces. One more particular aspect: Python identifies blocks by **indentation**. A line ending in a colon opens a block, and the indented lines that follow belong to it. The interpreter accepts either spaces or tabs, but we should not mix them; the standard convention (PEP 8) is 4 spaces per level.

_Note: a backslash \ at the end of a line continues a statement on the next line; inside parentheses, brackets, or braces, a statement can span several lines without it._
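
A short illustrative sketch of both kinds of line continuation, plus a block delimited only by indentation (the values are made up):

```python
quotes = [100, 101, 99.8]

# A trailing backslash continues the statement on the next line
total = quotes[0] + \
        quotes[1] + \
        quotes[2]

# Inside parentheses no backslash is needed
total_again = (quotes[0] +
               quotes[1] +
               quotes[2])

# The colon opens a block; the indented lines below it belong to that block
above_100 = []
for quote in quotes:
    if quote > 100:
        above_100.append(quote)   # 4 spaces per level, as PEP 8 recommends
print(above_100)
```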

*** 

***

Once we know that, we can look at the basic control flow statements:

- If statement  
    - if *condition* : ... 
    - elif *condition* : ... 
    - else : ...
- For statement
    - for i in range(n) : ... i ...
- While statement
    -  while *condition* : ...



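Inside these blocks we can also use **pass** (do nothing), **continue** (jump to the next loop iteration), or **break** (exit the loop); continue and break only apply inside loops.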
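
A sketch of the three keywords on the price data used in the next cells; the `limit` value is a hypothetical stop level added only for illustration:

```python
prices = [100, 101, 99.8, 99.5, 101.2, 100.5, 101.2, 102.2]
benchmark = 100
limit = 102   # hypothetical stop level, for illustration only

kept = []
for price in prices:
    if price == benchmark:
        pass                  # placeholder: do nothing and carry on
    if price < benchmark:
        continue              # skip prices below the benchmark
    if price > limit:
        break                 # leave the loop at the first price above the limit
    kept.append(price)

print(kept)
```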



In [None]:
```python
#### Data to use

prices = [100, 101, 99.8, 99.5, 101.2, 100.5, 101.2, 102.2]
benchmark = 100
```


In [None]:

```python
allPricesToUse, exceptionPrices = [], []

for i in prices:
    if i > benchmark: allPricesToUse.append(i)    # strictly above the benchmark
    else: exceptionPrices.append(i)               # at or below the benchmark

print "Success", allPricesToUse, "Fail", exceptionPrices
```

> There is also a shorter way to do this using **list comprehensions**. 



In [None]:

```python
### One line per list
allPricesToUse = [ i for i in prices if i > benchmark ]
exceptionPrices = [ i for i in prices if i <= benchmark ]   # same test as the loop's else branch

print "Success", allPricesToUse, "Fail", exceptionPrices
```


In [None]:

```python
### Cartesian product: two "for" clauses give every (x, y) pair of the two lists
joints = [ (x,y) for x in allPricesToUse  for y in exceptionPrices ]

print "\n", "Joints", joints
```
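
The same Cartesian product is available from the standard library as itertools.product. A small sketch with illustrative lists (named differently so they do not overwrite the notebook's variables):

```python
import itertools

above = [101, 101.2]   # illustrative values
below = [100, 99.8]

pairs = [(x, y) for x in above for y in below]
# itertools.product yields the same pairs, in the same order, lazily
pairs_product = list(itertools.product(above, below))

print(pairs_product)   # [(101, 100), (101, 99.8), (101.2, 100), (101.2, 99.8)]
```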



> This syntax is also available for **dictionaries** and **sets** (dict and set comprehensions).




In [None]:


```python
allPricesToUse =  { i for i in prices if i > benchmark }     # set comprehension
# dict comprehension (position -> price); <= matches the "Fail" test used above
exceptionPrices = { num: price for num,price in enumerate(prices) if price <= benchmark }

print "\n", "Successes in a Set (no duplicates)  - ", allPricesToUse, "\n"
print "Fails in a dictionary  - ", exceptionPrices
```



> More Control flow specifications [documented here](https://docs.python.org/3/tutorial/controlflow.html)


Here are the rest of the relational operators, plus the logical ones, so we can use them in control flows or any other expression:

- Equal ==
- strictly less than < 
- less than or equal <=
- strictly greater than >
- greater than or equal >=
- not equal !=
- object identity "is"
- negated object identity "is not"
- logical "and" (short-circuits); not the same as "&", which is bitwise AND on integers and intersection on sets
- logical "or" (short-circuits); not the same as "|", which is bitwise OR on integers and union on sets
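
A minimal sketch contrasting value equality with identity, and the logical operators with the bitwise ones (values are illustrative):

```python
left = [100, 101]
right = [100, 101]

same_values = left == right          # True: equal contents
same_object = left is right          # False: two distinct list objects
different_object = left is not right

# "and"/"or" short-circuit and return one of their operands
logical_and = 6 and 3                # 3: 6 is truthy, so the right operand is returned
logical_or = 0 or 101                # 101: 0 is falsy, so the right operand is returned

# "&"/"|" work bit by bit on integers
bitwise_and = 6 & 3                  # 2  (0b110 & 0b011 == 0b010)
bitwise_or = 6 | 3                   # 7  (0b110 | 0b011 == 0b111)

print(same_values)
print(same_object)
print(logical_and)
print(bitwise_and)
```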


## 4. Functions and development modules


**4.1 FUNCTIONS** 


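To define a function we write the word "def", the function name, and its parameters in parentheses (empty parentheses if there are none), followed by a colon and an indented body. When calling it, we can pass arguments by position, by name (keyword arguments), or unpack a sequence with \* and a dictionary with \*\*. Look at the following examples: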



In [None]:

```python
def myArgsFunctions(internal1, internal2, internal3, internal4):
    print (internal1 + 1, internal2 + 2, internal3 + 3, internal4 + 4  )

    
### Traditional input (positional arguments)
myArgsFunctions(10,200,30,4000)

### Keyword arguments: any order works
myArgsFunctions(internal2=200, internal1=10, internal4=4000, internal3=30)
    
### Unpacking: * spreads a tuple into positional arguments, ** spreads a dictionary into keyword arguments
orderedArg = (10,200)
dictArg = {'internal3':30, 'internal4':4000}

myArgsFunctions(*orderedArg , **dictArg)
```


In [None]:

```python
def myArgsFunctions(**kwargs):
    # **kwargs collects every keyword argument into a dictionary
    print kwargs['internal1']+1 

    
### Keyword arguments
myArgsFunctions(internal2=200, internal1=10, internal4=4000, internal3=30)
    
### Input sent as dictionary 
dictArg = {'internal1':10, 'internal3':30, 'internal4':4000}
myArgsFunctions(**dictArg)
```
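
The same collectors work for positional arguments: *name gathers extra positional arguments into a tuple, and **name gathers keyword arguments into a dictionary. A sketch with a hypothetical describe_order function:

```python
def describe_order(action, *prices, **details):
    # action  -> the first positional argument
    # prices  -> tuple holding any extra positional arguments
    # details -> dictionary holding any keyword arguments
    return action, prices, details

result = describe_order("buy", 101.1, 101.3, trader="ZG", hour=14)
print(result)
```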


> Functions are objects too, so we can pass a function as an argument and call it inside another function.


In [None]:

```python
def firstFunction(price,action):
    print "%s at %s"%(action, price)     # %s keeps decimals that %d would truncate

def secondFunction(func,inputValues):
    func(*inputValues)                    # call whichever function was passed in
    
values = (101,"buy")
secondFunction( firstFunction, values )
```


> Now let's take a look at functions that 'return' values.


In [None]:

```python
externalVar = "buy"

def myFunction(internalVar):
    internalVar = str(internalVar)                      # convert the number to a string
    internalVar = internalVar + " - " + externalVar     # reads the external variable
    return internalVar

print myFunction(101.32)
```



> In this last example, look at how **type conversion** (typecasting) works: we call the type itself, str(), int(), or float(), as a function (remember how we turned a list into a set with set(): same concept!).

> See how we concatenate strings simply by adding them with **"+"**. In the following example, we'll do the same with the augmented assignment **+=** (x += y is shorthand for x = x + y).

> Moreover, variables assigned inside a function are local by default. A function can read external (global) variables, as myFunction reads externalVar, but if it assigns a name that also exists outside, the local variable shadows the external one. To rebind an external variable from inside a function, declare it with the **global** statement.
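
A few conversions and string operations in one sketch (values are illustrative):

```python
price_text = str(101.32)               # float -> str
quantity = int("2")                    # str -> int
price = float("101.5")                 # str -> float
truncated = int(101.9)                 # float -> int drops the fractional part
unique_values = set([100, 101, 100])   # list -> set, duplicates removed

label = price_text + " - " + "buy"     # "+" concatenates strings
label += " x" + str(quantity)          # same as: label = label + " x" + str(quantity)
print(label)                           # 101.32 - buy x2
```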


In [None]:

```python
externalVar = "buy"
internalVar = 101.32

def myFunction(internalVar):

    internalVar  = str(internalVar)
    internalVar += " - " + externalVar    # same as internalVar = internalVar + " - " + externalVar
    return internalVar

print myFunction(104)
```


> See how assigning to internalVar inside the function does not change the external variable with the same name: after the call, internalVar is still 101.32.

In [None]:

```python
externalVar = "buy"
internalVar = 101.32

def myFunction(internalVar):
    
    internalVar  = str(internalVar)       # this internalVar is local to the function
    internalVar += " - " + externalVar
    return internalVar


print myFunction(104)
print internalVar                         # the external internalVar is unchanged
```
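
To actually rebind an external variable, the function must declare it global. A sketch with a hypothetical counter variable:

```python
counter = 0   # hypothetical module-level (global) variable

def set_local():
    counter = 1        # creates a new LOCAL name; the global counter is untouched
    return counter

def increment_global():
    global counter     # rebind the module-level name instead
    counter += 1
    return counter

set_local()
print(counter)         # still 0
increment_global()
print(counter)         # now 1
```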



**4.2 PYDEV** 

One relevant aspect for Python development is **PyDev**, a Python IDE (plugin) for Eclipse.

<br>

Once Python and Eclipse are installed, we must tell Eclipse where the Python installation is:
* Open Window ▸ Preferences ▸ PyDev ▸ Interpreters ▸ Python Interpreter.


Now we can create a new project under PyDev:
* Select File ▸ New ▸ Project ▸ PyDev ▸ PyDev Project.
* Select Window ▸ Open Perspective ▸ Other ▸ PyDev perspective


Then, we can create and run scripts:

* Right-click on the src folder ▸ New ▸ PyDev Module (create a module and write the source code). 
* Right-click the module ▸ Run As ▸ Python Run
* Debugging: add a breakpoint, then Debug As ▸ Python Run



> Getting started with [PyDev and Eclipse](http://www.pydev.org/manual_101_root.html)


**4.3** Another important tool is **IPython**, an interactive and dynamic environment for Python that can also run Unix shell commands. It has a decoupled two-process communication model (a front end talking to a kernel) and parallel computing features. We can use IPython commands here (Jupyter notebooks) or in the console:


<img src="images/Ipython.jpg" width="700" height="700" border="10" />

IPython examples:


* Get previous results using the input or output number.
    * Input: `In[n]`, `_in` (e.g. `_i5` for input 5), or `_ih[n]`
    * Output: `Out[n]`, `_n` (e.g. `_5` for output 5), or `_oh[n]`
    
    
* Introspection, when we don't know the exact names of methods/attributes
    * `?` before or after an object prints information about it; `??` also shows its source code when available
    * `?` can be combined with the `*` wildcard to search for names
    * Example: `a?` to find out the type and value of the "a" object 
    * Example: `*time*?` to list every object or method whose name contains "time"
    
    
* Whether you use IPython in the console or in a notebook, press the Tab key to complete the names of a module's contents or an object's attributes


* Magic commands. These IPython-specific functions are designed to enhance our Python experience:
    * We can run Python scripts with `%run path/myscript.py`
    * We can time a statement with `%timeit`
    * We can start the debugger with `%debug` (post-mortem, after an exception) or turn on automatic debugging on every exception with `%pdb`



> Parallel computing with IPython: [documentation](http://ipyparallel.readthedocs.io/en/latest/intro.html#examples)

> Magic functions: Lines and cells [details here](https://ipython.org/ipython-doc/3/interactive/magics.html)

In [None]:

```python
### Let's use some magic commands

%timeit sampleClassSlots()
%timeit sampleClass()
```


In [None]:

```python
### Let's search for class types and objects

b?
*time*?

Mytime = 1
*time*?
```


In [None]:

```python
### Let's take a look at the input stored as In[13]

In[13]
```



## Next class:

- Python scripts and modules
- Pandas and NumPy
- Processing files
- Regular expressions and unstructured data 
- Basic visualization techniques 
    