# INFO 212: Data Science Programming 1
## CCI at Drexel University
### Yuan An, PhD
### Associate Professor

## Week 1-Lecture 1: Setup and Review: Python Language Basics, IPython, and Jupyter Notebooks

Welcome to INFO 212: Data Science Programming I! Starting today, we will look at the main tools and ideas in the data scientist's toolbox. We aim to learn how to write interactive code for extracting, cleansing, wrangling, transforming, reshaping, and analyzing data. We will mainly use the Python programming language in the Jupyter notebook environment. However, to be productive in loading, processing, and keeping track of data and analyses, we should also be proficient with supporting tools such as the Linux command line and version control with git.

This is the first course in the sequence of data science programming offerings. In this course, we will cover various Python packages for high-performance data analysis and visualization, including numpy, pandas, matplotlib, and seaborn. In the second course, we will study the primary Python package, scikit-learn, for building basic predictive models to solve some data challenge problems.

To get started, you need to create a programming environment as described later in [Set up the Development Environment](#Set-up-the-Development-Environment).

> **Try It!** As you work through this notebook, you can run all the notebook cells (a cell is a block of either code or text). To see the markdown source of a text cell, simply double-click the cell. A code cell works like a text editor. Running the cells yourself will help cement your understanding of the concepts we're talking about. Once you've written code in a cell, click inside the cell (the box with the code in it) and press CTRL + ENTER to run it.

Here's what we're going to do today:

* [Set up the Development Environment](#Set-up-the-Development-Environment)
* [Course Introduction](#Course-Introduction)
* [Introduction to Data Analysis Tasks](#Introdcution-to-Data-Analysis-Tasks)
* [Python Language Basics, IPython, and Jupyter Notebooks](#Python-Language-Basics,-IPython,-and-Jupyter-Notebooks)

Let's get started!

## Set up the Development Environment
To prepare for this course, you need a Python data analysis environment. For an easy setup, I recommend everybody download and install the latest Anaconda Distribution:
[https://www.anaconda.com/download/](https://www.anaconda.com/download/)

Once you have installed the Anaconda distribution on your local machine, start Anaconda Navigator and use it to manage packages and launch Jupyter Notebook.
<img src = "anaconda-navigator.png" width = 800 height= 700>

#### Now, launch the jupyter notebook and start to code and analyze....

# Course Introduction

## What Is This Course About?

This course introduces the main tools and ideas in the data scientist's toolbox. It focuses on writing interactive programming code for extracting, cleansing, wrangling, transforming, reshaping, and analyzing data. It covers practical tools and ideas including the Linux command line, version control, git, and interactive programming, and it studies various Python packages for high-performance data analysis.

The primary focus of analysis is on structured data, such as
* Multidimensional arrays (matrices)
* Tabular or spreadsheet-like data in which each column may be a different type (string, numeric, date, or otherwise). This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files
* Multiple tables of data interrelated by key columns 
* Evenly or unevenly spaced time series

### Essential Python Libraries

The following essential libraries from the scientific Python ecosystem will be used throughout the course:

#### NumPy
NumPy, short for Numerical Python, is the foundational package for scientific computing
in Python. It provides, among other things:
* A fast and efficient multidimensional array object, `ndarray`
* Functions for performing element-wise computations with arrays or mathematical operations between arrays
* Tools for reading and writing array-based data sets to disk
* Linear algebra operations, Fourier transform, and random number generation
* Tools for connecting C, C++, and Fortran code to Python
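
As a minimal sketch of the first two points in the list above, the snippet below builds a small `ndarray` and applies element-wise arithmetic to it. The array values and variable names are illustrative and are not taken from the course materials.

```python
import numpy as np

# A 2 x 3 array of integers (illustrative values)
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic applies to every element at once, with no explicit loop
doubled = arr * 2
summed = arr + arr

print(arr.shape)     # dimensions of the array
print(doubled)
print(summed.sum())  # sum of all elements of arr + arr
```

The same operations on nested Python lists would need explicit loops, which is one reason NumPy arrays are the usual starting point for numerical work.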

#### Pandas
pandas provides rich data structures and functions designed to make working with
structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients enabling Python to be a powerful and productive data analysis environment.
The primary object in pandas that will be used in this course is the DataFrame, a two-dimensional, tabular, column-oriented data structure with both row and column labels.
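
To make "row and column labels" concrete, here is a small sketch that builds a DataFrame by hand. The three passengers are copied from the Titanic sample shown later in this notebook; using `PassengerId` as the row label is a choice made only for this illustration.

```python
import pandas as pd

# Three passengers copied from the Titanic sample shown later in this notebook
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Heikkinen, Miss. Laina",
            "Allen, Mr. William Henry",
        ],
        "Sex": ["male", "female", "male"],
        "Age": [22.0, 26.0, 35.0],
    },
    index=[1, 3, 5],  # PassengerId used as the row label (illustrative choice)
)

print(df.columns.tolist())  # column labels
print(df.index.tolist())    # row labels
print(df.loc[3, "Age"])     # look up one value by row label and column label
```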

#### matplotlib
matplotlib is the most popular Python library for producing plots and other 2D data
visualizations. 

#### SciPy
SciPy is a collection of packages addressing a number of different standard problem
domains in scientific computing.

#### Scikit-Learn
Scikit-learn is a machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means, etc.

## Set up Directories
I recommend that you put all your notebooks and datasets in a directory called "info212" on your local disk. Test navigating to the directory in each of the following ways:
* Navigate to the directory "info212" from the notebook homepage.
* Navigate to the directory "info212" from a command line tool.
* Navigate to the directory "info212" from a file explorer in your OS.

Create a sub-directory called "datasets" under "info212" and download the data set file we will use for each lecture and assignment to the "datasets" sub-directory.
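
If you prefer to create the folders from Python rather than from a file explorer, the following sketch shows one way to do it with the standard `pathlib` module. A temporary folder stands in for your real working location here; that stand-in is an assumption of this example, so replace `base` with the folder where you actually want "info212" to live.

```python
from pathlib import Path
import tempfile

# A temporary folder stands in for your real working location (assumption)
base = Path(tempfile.mkdtemp())

# Create info212/datasets in one step; parents=True also creates "info212"
datasets_dir = base / "info212" / "datasets"
datasets_dir.mkdir(parents=True, exist_ok=True)

print(datasets_dir.is_dir())
print(datasets_dir.relative_to(base))
```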

Download this notebook to "info212" and open it in your browser.


## Import Conventions
The Python community has adopted a number of naming conventions for commonly used
modules:

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### A Quick Peek
Let us load a real data set into a pandas DataFrame object and take a first look at the data. The first thing we need to do is load the libraries (we did this in the cell above, so we don't need to repeat it). Today we'll use a data set about the Titanic passengers. It contains various information about the passengers who boarded the Titanic. The data set comes from the Kaggle getting-started competition: [Titanic Survival Prediction](https://www.kaggle.com/c/titanic). We will use the training data set called train.csv. The extension 'csv' indicates that the data is formatted as comma-separated values. CSV files are a standard way to save and load data in data analytics tasks.

You can download the train.csv file from the Week 1 module of the course website. Create a sub-directory called 'titanic' under 'datasets' and put the file in 'titanic'. Let us load the data and then we will explain the fields of the data.
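
To see what "comma-separated values" means in practice, the sketch below parses two lines laid out like train.csv using Python's standard `csv` module. The lines use a subset of the columns from the Titanic output shown in this notebook; the variable names are illustrative.

```python
import csv
import io

# Header and first row, using a subset of the columns in train.csv
raw_text = (
    "PassengerId,Survived,Pclass,Name,Sex,Age\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22.0\n'
)

reader = csv.reader(io.StringIO(raw_text))
header = next(reader)
first_row = next(reader)

print(header)
print(first_row)
# The quoted name stays in one field even though it contains a comma
print(first_row[3])
```

Note that every parsed value is still a string at this point. Converting columns to numbers is one of the things `pd.read_csv()` does for us.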

In [10]:
    
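# Adjust the path if train.csv is not in the notebook's working directory,
# e.g. "datasets/titanic/train.csv" when following the layout described above
titanic_df = pd.read_csv("train.csv")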
titanic_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S



The `pd.read_csv()` function reads a CSV file and creates a pandas DataFrame. The `head()` method of the DataFrame object returns the first 5 rows of the data set, as displayed in the cell above.
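
If you do not have train.csv at hand yet, you can still try both calls on a few lines of text. The sketch below feeds three passengers (six columns each, copied from the output above) to `pd.read_csv()` through `io.StringIO`, which makes a string behave like a file. The name `sample_df` is illustrative.

```python
import io
import pandas as pd

# Six columns of the first three passengers, copied from the output above
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22.0
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0
3,1,3,"Heikkinen, Miss. Laina",female,26.0
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; n defaults to 5
print(sample_df.head(2))
print(sample_df.shape)
```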

As you can see, the data is formatted as a table with rows and columns. Each row represents a single passenger. Each column represents a feature (also called a property or field) of a passenger.

How many features does the data use to describe a passenger?

In [11]:
# This statement gives the list of columns in the DataFrame
titanic_df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [12]:
# This statement shows the dimension of the DataFrame. The second value gives the 
# number of columns
titanic_df.shape

(891, 12)

OK. There are 12 fields for each passenger. Then, what does each field mean? Now it is time to look at the data documentation or speak with the creator or owner of the data set to figure out the meaning of each field. No data analysis can happen unless you know the meaning of the data. 

Here is the data dictionary provided by the Kaggle website:
<table>
<tr><th>Variable</th><th>Definition</th><th>Key</th></tr>
<tr><td>survival</td><td>Survival</td><td>0 = No, 1 = Yes</td></tr>
<tr><td>pclass</td><td>Ticket class</td><td>1 = 1st, 2 = 2nd, 3 = 3rd</td></tr>
<tr><td>sex</td><td>Sex	</td><td></td></tr>
<tr><td>Age</td><td>Age in years</td><td></td></tr>	
<tr><td>sibsp</td><td># of siblings / spouses aboard the Titanic</td><td></td></tr>	
<tr><td>parch</td><td># of parents / children aboard the Titanic</td><td>	</td></tr>
<tr><td>ticket</td><td>Ticket number</td><td></td></tr>	
<tr><td>fare</td><td>Passenger fare	</td><td></td></tr>
<tr><td>cabin</td><td>Cabin number</td><td></td></tr>	
<tr><td>embarked</td><td>Port of Embarkation</td><td>	C = Cherbourg, Q = Queenstown, S = Southampton</td></tr>
</table>
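
A data dictionary is also useful in code. As a small sketch, the snippet below uses the port codes from the table above to turn `Embarked` codes into readable port names. The five codes are those of the first five passengers shown earlier; the variable names are illustrative.

```python
import pandas as pd

# Port codes and names taken from the data dictionary above
port_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Embarked values of the first five passengers shown earlier
embarked = pd.Series(["S", "C", "S", "S", "S"], name="Embarked")

# map() replaces each code with the matching dictionary value
embarked_full = embarked.map(port_names)
print(embarked_full.tolist())
```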

# Introdcution to Data Analysis Tasks

In this course we will learn the Python tools to work productively with data. The tasks required generally fall into a number of different broad groups:
* **_Interacting with the outside world_**
    - Reading and writing with a variety of file formats and databases. 
    
* **_Preparation_**
    - Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis.
    
* **_Transformation_**
    - Applying mathematical and statistical operations to groups of data sets to derive new data sets. For example, aggregating a large table by group variables.
    
* **_Modeling and computation_**
    - Connecting your data to statistical models, machine learning algorithms, or other computational tools.
    
* **_Presentation_**
    - Creating interactive or static graphical visualizations or textual summaries.
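
As one hedged illustration of the "Transformation" task above (aggregating a table by a group variable), the sketch below computes the mean fare per sex. It uses only the `Sex` and `Fare` values of the first five passengers in the Titanic sample, so the numbers describe those five rows and not the full data set.

```python
import pandas as pd

# Sex and Fare of the first five passengers in the Titanic sample
fares = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "female", "male"],
        "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    }
)

# Split the rows by Sex, then compute the mean Fare within each group
mean_fare = fares.groupby("Sex")["Fare"].mean()
print(mean_fare)
```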


In [16]:
# For example, we can count the passengers in the Titanic data set.
# count() returns the number of non-missing values in a column.
# The next cell breaks the counts down into male and female passengers.
print(titanic_df.PassengerId.count())
print(titanic_df.Survived.count())

891
891


In [18]:
titanic_df.groupby("Sex").count()
#titanic_df.groupby("Pclass").count()

Unnamed: 0_level_0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
Sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
female,314,314,314,314,261,314,314,314,314,97,312
male,577,577,577,577,453,577,577,577,577,107,577


If you are not familiar with the statements above, don't worry. We will discuss the details throughout the rest of the course. But first, let us review some basic knowledge about Python.

With that being said, let us start from the beginning.

# Python Language Basics, IPython, and Jupyter Notebooks

In [24]:
import numpy as np
np.random.seed(12345)
np.set_printoptions(precision=4, suppress=True)

## The Python Interpreter

```shell
$ python
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5
```

```python
print('Hello world')
```

```shell
$ python hello_world.py
Hello world
```

```shell
$ ipython
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: %run hello_world.py
Hello world

In [2]:
```

## IPython Basics

### Running the IPython Shell

$ 

In [32]:
import numpy as np
data = {i : np.random.randn() for i in range(7)}
data

random_nums = {x : np.random.randn() for x in range(10)}
random_nums

{0: -0.7485315484917335,
 1: 0.5849697378523505,
 2: 0.1526765727999654,
 3: -1.5656572938161821,
 4: -0.5625401880747255,
 5: -0.032664139158833747,
 6: -0.9290062022343519,
 7: -0.4825726456604974,
 8: -0.03626384610141546,
 9: 1.0953900601394428}

>>> from numpy.random import randn
>>> data = {i : randn() for i in range(7)}
>>> print(data)
{0: -1.5948255432744511, 1: 0.10569006472787983, 2: 1.972367135977295,
3: 0.15455217573074576, 4: -0.24058577449429575, 5: -1.2904897053651216,
6: 0.3308507317325902}

### Running the Jupyter Notebook

```shell
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
```

### Tab Completion

```
In [1]: an_apple = 27

In [2]: an_example = 42

In [3]: an
```

```
In [3]: b = [1, 2, 3]

In [4]: b.
```

```
In [1]: import datetime

In [2]: datetime.
```

```
In [7]: datasets/movielens/
```

### Introspection

```
In [8]: b = [1, 2, 3]

In [9]: b?
Type:       list
String Form:[1, 2, 3]
Length:     3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method
```

```python
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
```

```python
In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together

Returns
-------
the_sum : type of arguments
File:      <ipython-input-9-6a548a216e27>
Type:      function
```

```python
In [12]: add_numbers??
Signature: add_numbers(a, b)
Source:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
File:      <ipython-input-9-6a548a216e27>
Type:      function
```

```python
In [13]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload
```

### The %run Command

```python
def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
```

```python
In [14]: %run ipython_script_test.py
```

```python
In [15]: c
Out[15]: 7.5

In [16]: result
Out[16]: 1.4666666666666666
```

```python
>>> %load ipython_script_test.py

    def f(x, y, z):
        return (x + y) / z

    a = 5
    b = 6
    c = 7.5

    result = f(a, b, c)
```

#### Interrupting running code

### Executing Code from the Clipboard

```python
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
```

```python
In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
## -- End pasted text --
```

```python
In [18]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:x = 5
:y = 7
:if x > 5:
:    x += 1
:
:    y = 8
:--
```

### Terminal Keyboard Shortcuts

### About Magic Commands

```python
In [20]: a = np.random.randn(100, 100)

In [20]: %timeit np.dot(a, a)
10000 loops, best of 3: 20.9 µs per loop
```

```python
In [21]: %debug?
Docstring:
::

  %debug [--breakpoint FILE:LINE] [statement [statement ...]]

Activate the interactive debugger.

This magic command support two ways of activating debugger.
One is to activate debugger before executing code.  This way, you
can set a break point, to step through the code from the point.
You can use this mode by giving statements to execute and optionally
a breakpoint.

The other one is to activate debugger in post-mortem mode.  You can
activate this mode simply running %debug without any argument.
If an exception has just occurred, this lets you inspect its stack
frames interactively.  Note that this will always work only on the last
traceback that occurred, so you must call this quickly after an
exception that you wish to inspect has fired, because if another one
occurs, it clobbers the previous one.

If you want IPython to automatically do this on every exception, see
the %pdb magic for more details.

positional arguments:
  statement             Code to run in debugger. You can omit this in cell
                        magic mode.

optional arguments:
  --breakpoint <FILE:LINE>, -b <FILE:LINE>
                        Set break point at LINE in FILE.

```                        

```python
In [22]: %pwd
Out[22]: '/home/wesm/code/pydata-book'

In [23]: foo = %pwd

In [24]: foo
Out[24]: '/home/wesm/code/pydata-book'
```

### Matplotlib Integration

```python
In [26]: %matplotlib
Using matplotlib backend: Qt4Agg
```

```python
In [26]: %matplotlib inline
```

## Python Language Basics

### Language Semantics

#### Indentation, not braces

```python
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
```

```python
a = 5; b = 6; c = 7
```

#### Everything is an object
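
A minimal sketch of what this heading means: numbers, strings, and functions all have a type, and all of them can be assigned to variables or stored in a list like any other value. The function name `double` and the list `operations` are illustrative.

```python
def double(x):
    """Return twice the argument."""
    return x * 2

# Numbers, strings, and functions are all objects with a type
print(type(5))
print(type("foo"))
print(type(double))

# Because a function is an object, it can be stored and passed around
operations = [double, str.upper]
results = [operations[0](21), operations[1]("foo")]
print(results)
```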

#### Comments

```python
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #   continue
    results.append(line.replace('foo', 'bar'))
```

```python
print("Reached this line")  # Simple status report
```

#### Function and object method calls

```
result = f(x, y, z)
g()
```

```
obj.some_method(x, y, z)
```

```python
result = f(a, b, c, d=5, e='foo')
```

#### Variables and argument passing

In [34]:
a = [1, 2, 3]

In [35]:
b = a

In [36]:
a.append(4)
b

[1, 2, 3, 4]

```python
def append_element(some_list, element):
    some_list.append(element)
```

```python
In [27]: data = [1, 2, 3]

In [28]: append_element(data, 4)

In [29]: data
Out[29]: [1, 2, 3, 4]
```

#### Dynamic references, strong types

In [38]:
a = 'foo'
type(a)
a = 5
type(a)

int

In [39]:
'5' + 5

TypeError: must be str, not int

In [40]:
a = 4.5
b = 2
# String formatting, to be visited later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b

a is <class 'float'>, b is <class 'int'>


2.25

In [43]:
a = 5
isinstance(a, int)

True

In [44]:
a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))

True

#### Attributes and methods

```python
In [1]: a = 'foo'

In [2]: a.<Press Tab>
a.capitalize  a.format      a.isupper     a.rindex      a.strip
a.center      a.index       a.join        a.rjust       a.swapcase
a.count       a.isalnum     a.ljust       a.rpartition  a.title
a.decode      a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith
```

In [45]:
a = 'foo'

In [46]:
getattr(a, 'split')

<function str.split>

#### Duck typing

In [47]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

In [51]:
isiterable('a string')
isiterable([1, 2, 3])
isiterable(5)


False

if not isinstance(x, list) and isiterable(x):
    x = list(x)

#### Imports

```python
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
```

import some_module
result = some_module.f(5)
pi = some_module.PI

from some_module import f, g, PI
result = g(5, PI)

import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)

#### Binary operators and comparisons

In [52]:
5 - 7
12 + 21.5
5 <= 2

False

In [58]:
a = [1, 2, 3]
b = a
c = list(a)
a is b
a is not c


True

In [None]:
a == c

In [59]:
a = None
a is None

True

#### Mutable and immutable objects

In [62]:
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list

['foo', 2, (3, 4)]

In [63]:
a_tuple = (3, 5, (4, 5))
a_tuple[1] = 'four'

TypeError: 'tuple' object does not support item assignment

### Scalar Types

#### Numeric types

In [64]:
ival = 17239871
ival ** 6

26254519291092456596965462913230729701102721

In [65]:
fval = 7.243
fval2 = 6.78e-5

In [66]:
3 / 2

1.5

In [67]:
3 // 2

1

#### Strings

a = 'one way of writing a string'
b = "another way"

In [69]:
c = """
This is a longer string that
spans multiple lines
"""
c

'\nThis is a longer string that\nspans multiple lines\n'

In [70]:
c.count('\n')

3

In [72]:
a = 'this is a string'
a[10] = 'f'
b = a.replace('string', 'longer string')
b

TypeError: 'str' object does not support item assignment

In [75]:
a

'this is a string'

In [77]:
a = 5.6
s = str(a)
print(s)


5.6


str

In [78]:
s = 'python'
list(s)
s[:3]

'pyt'

In [79]:
s = '12\\34'
print(s)

12\34


In [80]:
s = r'this\has\no\special\characters'
s

'this\\has\\no\\special\\characters'

In [81]:
a = 'this is the first half '
b = 'and this is the second half'
a + b

'this is the first half and this is the second half'

In [82]:
template = '{0:.2f} {1:s} are worth US${2:d}'

In [83]:
template.format(4.5560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'

#### Bytes and Unicode

In [84]:
val = "español"
val

'español'

In [85]:
val_utf8 = val.encode('utf-8')
val_utf8
type(val_utf8)

bytes

In [86]:
val_utf8.decode('utf-8')

'español'

In [87]:
val.encode('latin1')
val.encode('utf-16')
val.encode('utf-16le')

b'e\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'

In [90]:
bytes_val = b'this is bytes'
bytes_val
decoded = bytes_val.decode('utf8')
decoded  # this is str (Unicode) now

'this is bytes'

#### Booleans

In [91]:
True and True
False or True

True

#### Type casting

In [93]:
s = '3.14159'
fval = float(s)
type(fval)
int(fval)
bool(fval)
bool(1)

True

#### None

In [94]:
a = None
a is None
b = 5
b is not None

True

def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result

In [95]:
type(None)

NoneType

#### Dates and times

In [101]:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute

30

In [102]:
dt.date()
dt.time()

datetime.time(20, 30, 21)

In [103]:
dt.strftime('%m/%d/%Y %H:%M')

'10/29/2011 20:30'

In [104]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)

In [None]:
dt.replace(minute=0, second=0)

In [105]:
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta
type(delta)

datetime.timedelta

In [106]:
dt
dt + delta

datetime.datetime(2011, 11, 15, 22, 30)

### Control Flow

#### if, elif, and else

if x < 0:
    print("It's negative")

if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')

In [None]:
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print('Made it')

In [None]:
4 > 3 > 2 > 1

#### for loops

for value in collection:
    # do something with value

sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value

In [109]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)


for a, b, c in iterator:
    # do something

#### while loops

x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

#### pass

if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')

#### range

In [110]:
range(10)
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [111]:
list(range(0, 20, 2))
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]

# Named total rather than sum to avoid shadowing the built-in sum()
total = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

#### Ternary expressions

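A ternary expression combines an if-else block that produces a value into a single expression, written as `value = true_expr if condition else false_expr`. It has the same effect as an `if condition:` block that assigns `true_expr` to `value`, with an `else:` branch that assigns `false_expr`.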

In [112]:
x = 5
'Non-negative' if x >= 0 else 'Negative'

'Non-negative'

# References

Python for Data Analysis by Wes McKinney. Publisher: O'Reilly Media.