# Unit 1: Foundations of Python

Data science is like painting. Your canvas is your dataset, your statistical techniques are your colors, and your technique and artistry come through in your programming. Data scientists with strong programming skills can implement techniques quickly and are freer to be creative problem solvers and scientists.

## Learning Objectives of Module 1
By the end of this module, you will be able to...
1. Articulate the strengths and weaknesses of using Python vs other tools such as R
2. Explain the differences between the command line, IPython, Jupyter Notebooks, and the Spyder IDE and when to use each.
3. Create variables and use those variables in basic scripts that contain conditional statements and loops

### How to use these modules
While the content in each of these modules covers the material at a high level, there is significant depth provided through associated readings and videos throughout the course. While the topics are designed to be complementary, a number of key topics are addressed in both to provide multiple perspectives. Wherever you see a \*\*\*Reading\*\*\* section, this is a required course component, so please open the link and read the content. The videos may also help if you're struggling with any of the particular topics. 

For this module, the references used are from Charles Severance, [Python for Everybody](https://www.py4e.com/lessons), who provides an excellent overview of these topics.

### Why use Python?
There are numerous strengths of this language, which have helped to propel it to becoming one of the most popular languages for data science.

Strengths of Python:
1. Readability. Many consider Python more naturally readable than other languages.
2. Numeric and scientific programming (with packages like numpy, scipy, matplotlib, and scikit-learn)
3. Object oriented. From the ground up, Python is object oriented, with all of the primary data types being objects. We'll explore the benefits of this in later modules, but they are considerable.
4. Free. Enough said.
5. Portable. Python works on nearly all systems.
6. Easy to use and easier to learn
7. Provides the simplicity and ease of use of an interpreted language along with advanced features such as dynamic typing and automatic memory management

Weaknesses of Python:
1. Performance - Python is slower than compiled languages like C.
2. Managing Python packages can be challenging, and confusion exists between code bases written in Python 2 and Python 3.

But what about R or other languages? "R is a programming language developed by statisticians for statisticians; Python was developed by a computer scientist, and it can be used by programmers to apply statistical techniques." ([Sebastian Raschka](https://sebastianraschka.com/blog/2015/why-python.html)) R does statistics extremely well and is a powerful tool for data science, well-worth knowing. For many applications, either Python or R will get the job done, and both are worth knowing for the aspiring data scientist.

Python, however, consistently pulls ahead in numerous rankings, both within the data science community (particularly the subfield of machine learning) and in the broader programming community. Python has the benefit of being an extremely powerful programming language in its own right, used in many industries and not only for data science. The [Kaggle State of Data Science Survey](https://www.kaggle.com/surveys/2017), put together by the preeminent host of online machine learning competitions, found Python to be the most used programming language among the data science professionals surveyed. Additionally, those same professionals overwhelmingly recommend Python as the language for new data scientists to learn first. The ratings don't stop there, however. The [Institute of Electrical and Electronics Engineers (IEEE) ranked Python](https://spectrum.ieee.org/computing/software/the-2017-top-programming-languages) as the #1 programming language in 2017. In the [TIOBE software index for 2018](https://www.tiobe.com/tiobe-index/) (shown below), Python is the number 4 programming language, just behind Java, C, and C++, the dominant languages in industry for many years. Just being in the same league as those industry standards demonstrates that Python is a highly transferable skill that can help to propel your career in diverse directions.

<img src="img/tiobe.png" width="800">


### How do I learn to program?
True or false: if I read enough textbooks and watch enough videos about programming, I'll become a good programmer. This statement could not be more FALSE. Becoming good at programming is much like becoming good at any spoken language. You will only develop your skills by hands-on experience using the language. This process will be hard at times, and you'll sometimes feel like you're struggling to understand a concept or fix a bug. That's natural and that's often when you learn the most. Here are some tips for success:

1. Always work through all the examples until the end - don't stop and assume that you could figure out the rest.
2. If you get stuck - search for an answer. Every time I go to code something that I haven't done before, I inevitably end up looking on one of a number of websites / forums such as Stack Overflow, StackExchange, Quora, or a host of other sites that are best found through a general Google search of your topic. A key skill to develop is knowing when to trust a source and when to keep looking.
3. When in doubt, draw it. When you're trying to accomplish something algorithmically, think through and draw out the logic that you're trying to implement before typing in code.

### Understanding the Python ecosystem
Tools for using Python:
- Command line
- Shell
- IDLE
- IPython
- Jupyter Notebook
- Spyder
- Any text editor (Sublime, Vim, Emacs, etc.)

Core Python and packages:
- Core Python
- Packages
    - NumPy
    - Matplotlib
    - Pandas

### Using the terminal
Command prompt basics – opening the prompt (terminal or cmder)
Command line operations (cd, ls, mkdir, rm)

### First steps Python as a calculator
Basic operators on the command line - Python as a calculator
Checking the status of a variable
Creating and running a .py file

### First steps on the Python interpreter
*Section adapted from Charles R. Severance, "Python for Everybody" [Chapter 1](https://www.py4e.com/html3/01-intro)*

Open up a terminal, type `python`, and the fun begins:

```
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25)
[MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

The >>> prompt is the Python interpreter's way of asking you, "What do you want me to do next?" Python is ready to have a conversation with you. All you have to know is how to speak the Python language.

Let's say for example that you did not know even the simplest Python language words or sentences. You might want to use the standard line that astronauts use when they land on a faraway planet and try to speak with the inhabitants of the planet:

```
>>> I come in peace, please take me to your leader
  File "<stdin>", line 1
    I come in peace, please take me to your leader
         ^
SyntaxError: invalid syntax
>>>
```

Of course, this isn't valid Python syntax. But don't worry about the error - it doesn't do any harm. Let's try something that works on the Python interpreter (valid Python syntax):

```
>>> print('Hello world!')
Hello world!
```

To leave the Python interpreter, you can type:
```
>>> quit()
```

### Interpreted language vs compiled languages
*Section from Charles R. Severance, "Python for Everybody" [Chapter 1](https://www.py4e.com/html3/01-intro)*

Python is a high-level language intended to be relatively straightforward for humans to read and write and for computers to read and process. Other high-level languages include Java, C++, PHP, Ruby, Basic, Perl, JavaScript, and many more. The actual hardware inside the Central Processing Unit (CPU) does not understand any of these high-level languages.

The CPU understands a language we call machine language. Machine language is very simple and frankly very tiresome to write because it is represented all in zeros and ones:

```
001010001110100100101010000001111
11100110000011101010010101101101
...
```

Machine language seems quite simple on the surface, given that there are only zeros and ones, but its syntax is even more complex and far more intricate than Python. So very few programmers ever write machine language. Instead we build various translators to allow programmers to write in high-level languages like Python or JavaScript and these translators convert the programs to machine language for actual execution by the CPU.

Since machine language is tied to the computer hardware, machine language is not portable across different types of hardware. Programs written in high-level languages can be moved between different computers by using a different interpreter on the new machine or recompiling the code to create a machine language version of the program for the new machine.

These programming language translators fall into two general categories: (1) interpreters and (2) compilers.

An interpreter reads the source code of the program as written by the programmer, parses the source code, and interprets the instructions on the fly. Python is an interpreter and when we are running Python interactively, we can type a line of Python (a sentence) and Python processes it immediately and is ready for us to type another line of Python.

Some of the lines of Python tell Python that you want it to remember some value for later. We need to pick a name for that value to be remembered and we can use that symbolic name to retrieve the value later. We use the term variable to refer to the labels we use to refer to this stored data.

```
    >>> x = 6
    >>> print(x)
    6
    >>> y = x * 7
    >>> print(y)
    42
    >>>
```

In this example, we ask Python to remember the value six and use the label x so we can retrieve the value later. We verify that Python has actually remembered the value using print. Then we ask Python to retrieve x and multiply it by seven and put the newly computed value in y. Then we ask Python to print out the value currently in y.

Even though we are typing these commands into Python one line at a time, Python is treating them as an ordered sequence of statements with later statements able to retrieve data created in earlier statements. We are writing our first simple paragraph with four sentences in a logical and meaningful order.

It is the nature of an interpreter to be able to have an interactive conversation as shown above. A compiler needs to be handed the entire program in a file, and then it runs a process to translate the high-level source code into machine language and then the compiler puts the resulting machine language into a file for later execution.

If you have a Windows system, often these executable machine language programs have a suffix of ".exe" or ".dll" which stand for "executable" and "dynamic link library" respectively. In Linux and Macintosh, there is no suffix that uniquely marks a file as executable.

If you were to open an executable file in a text editor, it would look completely crazy and be unreadable:

```
^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@\xa0\x82
^D^H4^@^@^@\x90^]^@^@^@^@^@^@4^@ ^@^G^@(^@$^@!^@^F^@
^@^@4^@^@^@4\x80^D^H4\x80^D^H\xe0^@^@^@\xe0^@^@^@^E
^@^@^@^D^@^@^@^C^@^@^@^T^A^@^@^T\x81^D^H^T\x81^D^H^S
^@^@^@^S^@^@^@^D^@^@^@^A^@^@^@^A\^D^HQVhT\x83^D^H\xe8
....
```

It is not easy to read or write machine language, so it is nice that we have interpreters and compilers that allow us to write in high-level languages like Python or C.

Now at this point in our discussion of compilers and interpreters, you should be wondering a bit about the Python interpreter itself. What language is it written in? Is it written in a compiled language? When we type "python", what exactly is happening?

The Python interpreter is written in a high-level language called "C". You can look at the actual source code for the Python interpreter by going to www.python.org and working your way to their source code. So Python is a program itself and it is compiled into machine code. When you installed Python on your computer (or the vendor installed it), you copied a machine-code copy of the translated Python program onto your system. In Windows, the executable machine code for Python itself is likely in a file with a name like:

```
C:\Python35\python.exe
```

That is more than you really need to know to be a Python programmer, but sometimes it pays to answer those little nagging questions right at the beginning.

### Writing programs
*Section adapted from Charles R. Severance, "Python for Everybody" [Chapter 1](https://www.py4e.com/html3/01-intro)*

Typing commands into the Python interpreter is a great way to experiment with Python's features, but it is not recommended for solving more complex problems.

When we want to write a program, we use a text editor to write the Python instructions into a file, which is called a script. By convention, Python scripts have names that end with .py.

Say you've created a text file called `hello.py` with the following contents:

```
print('Hello world!')
```

To execute the script, you have to tell the Python interpreter the name of the file. In a Unix terminal (the Linux and macOS terminals use similar commands) or a Windows command window, you would type `python hello.py` as follows:

```
python hello.py
```

We call the Python interpreter and tell it to read its source code from the file "hello.py" instead of prompting us for lines of Python code interactively.

You will notice that there was no need to have quit() at the end of the Python program in the file. When Python is reading your source code from a file, it knows to stop when it reaches the end of the file.

#### What is a program?
The definition of a program at its most basic is a sequence of Python statements that have been crafted to do something. Even our simple hello.py script is a program. It is a one-line program and is not particularly useful, but in the strictest definition, it is a Python program.

It might be easiest to understand what a program is by thinking about a problem that a program might be built to solve, and then looking at a program that would solve that problem.

Let's say you are doing Social Computing research on Facebook posts and you are interested in the most frequently used word in a series of posts. You could print out the stream of Facebook posts and pore over the text looking for the most common word, but that would take a long time and be very mistake-prone. You would be smart to write a Python program to handle the task quickly and accurately so you can spend the weekend doing something fun.

For example, look at the following text about a clown and a car. Look at the text and figure out the most common word and how many times it occurs.

```
the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car
```

Then imagine that you are doing this task looking at millions of lines of text. Frankly it would be quicker for you to learn Python and write a Python program to count the words than it would be to manually scan the words.

The even better news is that I already came up with a simple program to find the most common word in a text file. I wrote it, tested it, and now I am giving it to you to use so you can save some time.

```
name = input('Enter file:')
handle = open(name, 'r')
counts = dict()

for line in handle:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

bigcount = None
bigword = None
for word, count in list(counts.items()):
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)

# Code: http://www.py4e.com/code3/words.py
```

You don't even need to know Python to use this program. You will need to get through Chapter 10 of this book to fully understand the awesome Python techniques that were used to make the program. You are the end user, you simply use the program and marvel at its cleverness and how it saved you so much manual effort. You simply type the code into a file called words.py and run it or you download the source code from http://www.py4e.com/code3/ and run it.

This is a good example of how Python and the Python language are acting as an intermediary between you (the end user) and me (the programmer). Python is a way for us to exchange useful instruction sequences (i.e., programs) in a common language that can be used by anyone who installs Python on their computer. So neither of us are talking to Python, instead we are communicating with each other through Python.
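
To check the clown text by hand against a program without creating a file, here is a minimal sketch that counts the words directly from a string. It is not the author's `words.py`: it swaps in `collections.Counter` from the standard library as a shortcut for the counting loop, and the variable `text` is simply the clown text from above pasted in.

```python
from collections import Counter

# The clown text from above, stored as a string instead of read from a file.
text = """the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car"""

# split() with no arguments breaks the text on any whitespace, including newlines.
counts = Counter(text.split())

# most_common(1) returns a list holding the single (word, count) pair
# with the highest count.
bigword, bigcount = counts.most_common(1)[0]
print(bigword, bigcount)
```

Running it prints `the 7`, which is the answer you should also reach by counting manually.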

## Introduction to data type objects
Data types are the building blocks of our program that form the raw material that we process with our programs. In Python, all data types are objects, and there are a number of built-in object types that are commonly used.

- Numeric data types: **integers** and **floats**
- **Booleans** (logical data type, `True` or `False`)
- **Strings** (sequence of characters)
- Collections – data types that may contain any kind of object (including other collections)
    - **Lists** (sequences that are mutable, or changeable)
    - **Tuples** (sequences that are immutable, or static)
    - **Dictionaries** (mappings from a key to a value)

### Numbers
There are two primary numerical data types in Python that you'll see: integers and floats. For integers, these will be whole numbers (e.g. 1,2,3,60000,-12,0). Floats (or floating point numbers) enable decimal values to be included.

*For reference, the function `type()` below returns the type of object.*

In [81]:
x = 3
type(x)

int

In [82]:
x = 3.4
type(x)

float

When you divide two integers with `/`, the result is always a float, even when the division comes out even (this is true in Python 3.x, but the behavior was different in Python 2.7).

In [83]:
a = 7
b = 2
print(type(a))
print(type(b))
c = a / b
print(c)
print(type(c))

<class 'int'>
<class 'int'>
3.5
<class 'float'>
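
As a small sketch of how integers and floats interact in Python 3 (the variable names here are illustrative only), the following shows that true division produces a float even for an exact result, while floor division of two integers stays an integer:

```python
# True division always produces a float in Python 3, even when the result is whole.
whole = 8 / 2
print(whole, type(whole))       # 4.0 <class 'float'>

# Floor division of two integers keeps the result an integer.
floored = 7 // 2
print(floored, type(floored))   # 3 <class 'int'>

# Mixing an int and a float in arithmetic produces a float.
mixed = 3 + 0.5
print(mixed, type(mixed))       # 3.5 <class 'float'>

# int() drops the fractional part; float() converts an integer to a float.
truncated = int(3.9)
as_float = float(3)
print(truncated, as_float)      # 3 3.0
```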


#### \*\*\*Reading\*\*\*

Please read [Variables, expressions, and statements](https://www.py4e.com/html3/02-variables) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also [available here](https://www.py4e.com/lessons/memory)

### Strings
Strings are sequences of characters. For example 'Data Science' contains 12 characters including one space in a particular order (or sequence).

In [84]:
x = 'this'
type(x)

str
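
Because a string is a sequence, you can measure it and pick out individual characters by position. The sketch below uses the 'Data Science' example from above; the variable names are illustrative only.

```python
phrase = 'Data Science'

length = len(phrase)   # number of characters, counting the space
first = phrase[0]      # indexing starts at zero
last = phrase[-1]      # negative indexes count from the end
word = phrase[0:4]     # slicing: characters at positions 0 through 3

print(length, first, last, word)   # 12 D e Data
```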

#### \*\*\*Reading\*\*\*

Please read [Strings](https://www.py4e.com/html3/06-strings) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also [available here](https://www.py4e.com/lessons/logic)

### Booleans
Booleans are data type objects that can take on two values, `True` or `False`, which are used for logical operations.

In [85]:
x = True
type(x)

bool

### Variable names
There are a few rules around naming variables. You cannot name variables starting with a number, so `1000miles` is not a valid variable name. You can use numbers in variable names, but they just can't begin the name. So `miles1000` is legal syntax. You can use letters, numbers, and underscores, but not other symbols. Therefore, eat@joes is not a valid name.

We'll talk more about style later, but generally, you'll want to make these variable names meaningful. Let's say you were trying to represent the speed of a train. You could use a variable `s`, but there are a lot of things that could mean. A better (and more descriptive) name would be `speed`, and better still would be `speed_of_train`. Now, you want this to be within a reasonable length, so `speed_of_train_from_boston_to_new_york_through_new_haven_and_providence` is not an ideal name because it's far too long. Use common sense and above all, make your code readable so someone else could easily pick it up, understand it, and use it.

There are also a number of reserved words in the Python language that you cannot use as variable names:

| | | | | |
|----------|----------|----------|----------|-------|
|and       |del       |from      |None      |True   |
|as        |elif      |global    |nonlocal  |try    |
|assert    |else      |if        |not       |while  |
|break     |except    |import    |or        |with   |
|class     |False     |in        |pass      |yield  |
|continue  |finally   |is        |raise     |       |
|def       |for       |lambda    |return    |       |
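
If you are unsure whether a name is legal, you can ask Python itself. The sketch below combines the string method `isidentifier()` with the standard library's `keyword` module; the list `candidates` is made up for illustration. Note that `keyword.kwlist` reports the reserved words for the Python version you are running, which may include words added after the table above was written (for example `async` and `await` in Python 3.7 and later).

```python
import keyword

# Illustrative names taken from the discussion above, plus one reserved word.
candidates = ['1000miles', 'miles1000', 'eat@joes', 'speed_of_train', 'class']

results = {}
for name in candidates:
    # isidentifier() checks the letters/digits/underscore rule;
    # iskeyword() checks the reserved-word list of the running Python version.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, results[name])

# The full list of reserved words for this Python version:
print(keyword.kwlist)
```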

### Containers: lists and tuples
Containers (or sequences) can hold multiple items of the basic data types (integers, floats, Booleans, strings). Lists are defined with brackets `[]`, and tuples are defined with parentheses `()`. Either can contain any number of items.

In [86]:
x = [1,2,3]
type(x)

list

In [87]:
x = (1,2,3)
type(x)

tuple

You can also mix and match data types in these containers

In [88]:
x = [1,2,'this',True]
print(x)

[1, 2, 'this', True]


In [89]:
x = (1,2,'this',True)
print(x)

(1, 2, 'this', True)


These containers can even contain other containers!

In [90]:
x = [1,[2,3,4],'this',('this',2),True]
print(x)

[1, [2, 3, 4], 'this', ('this', 2), True]


In [91]:
x = (1,[2,3,4],'this',('this',2),True)
print(x)

(1, [2, 3, 4], 'this', ('this', 2), True)


In practice, lists most commonly hold homogeneous data (e.g. all numbers or all text), while tuples more often hold heterogeneous sequences.

In [92]:
x = [1,2,6,32,7,8,4,3]
y = (10, 'Downing Street', 'London', 'United Kingdom')

Mutable means changeable or dynamic (think the word "mutation"). Immutable means unable to be changed, or static. An example of a mutable data type is a list.

In [93]:
x = [1,2,3,4,5]
x[0] = 99
print(x)

[99, 2, 3, 4, 5]


Tuples, on the other hand, are immutable, and you'll get an error if you try to change one of their elements.

In [94]:
x = (1,2,3,4,5)
x[0] = 99

TypeError: 'tuple' object does not support item assignment
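
Here is a hedged sketch of how you might handle that error and still end up with the "changed" values you wanted. It uses `try`/`except`, which this module has not introduced, purely so the example runs to completion; the variable names are illustrative.

```python
x = (1, 2, 3, 4, 5)

try:
    x[0] = 99   # tuples do not support item assignment
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# The original tuple is untouched. To get different contents,
# build a new tuple instead of editing the old one in place.
y = (99,) + x[1:]
print(x)   # (1, 2, 3, 4, 5)
print(y)   # (99, 2, 3, 4, 5)
```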

#### \*\*\*Reading\*\*\*

Please read [Lists](https://www.py4e.com/html3/08-lists) from Python for Everybody by Charles Severance.

Please read [Tuples](https://www.py4e.com/html3/10-tuples) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also available for [lists](https://www.py4e.com/lessons/lists) and for [tuples](https://www.py4e.com/lessons/tuples)

### Dictionaries
Then there are dictionaries. Think about how a traditional paper dictionary works. You want to search for a definition (we'll call these values), and the definition is connected with a particular word (we'll call these keys). You open up the dictionary, search for the key, and by doing so are able to retrieve the corresponding value. That's how a dictionary works here: it *maps* keys to values.

Let's start with an example with two key-value pairs. Let's assume we are representing the ages of two people, Sheila, age 34, and Robert, age 17. We can represent this with the following dictionary:

In [95]:
age = {'Sheila': 34, 'Robert': 17}
print(age)

{'Sheila': 34, 'Robert': 17}


If you want to know what Sheila's age is, just look it up as you would in a dictionary.

In [96]:
age['Sheila']

34

Now, dictionaries are also mutable, so they can be changed after being created. Let's adjust Robert's age to 18. Let's also add Polly, age 42, to the dictionary.

In [97]:
age['Robert'] = 18
age['Polly']  = 42
print(age)

{'Sheila': 34, 'Robert': 18, 'Polly': 42}
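
A few other everyday dictionary operations, sketched with the same `age` dictionary. The name 'Maria' is a made-up key that is deliberately absent, used to show what happens on a failed lookup.

```python
age = {'Sheila': 34, 'Robert': 18, 'Polly': 42}

# `in` tests whether a key is present.
has_polly = 'Polly' in age
has_maria = 'Maria' in age
print(has_polly, has_maria)   # True False

# get() returns a default value instead of raising KeyError for a missing key.
maria_age = age.get('Maria', 'unknown')
print(maria_age)              # unknown

# items() yields (key, value) pairs.
for name, years in age.items():
    print(name, years)

# del removes a key-value pair.
del age['Robert']
print(age)                    # {'Sheila': 34, 'Polly': 42}
```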


We'll return to each of these data types in more depth later.

#### \*\*\*Reading\*\*\*

Please read [Dictionaries](https://www.py4e.com/html3/09-dictionaries) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also [available here](https://www.py4e.com/lessons/dictionary)

## Operators
- Basic numerical operators: +, -, \*, /, //, \*\*, %
- Assignment operators: =, +=, -=, \*=, /=
- Relational operators: >, <, ==, !=, >=, <=
- Logical operators: and, or, not
- Membership operators: in, not in

### Basic numerical operators
Most of the standard numerical operators will be familiar including +, -, \*, /

In [98]:
5 + 2

7

In [99]:
5 - 2

3

In [100]:
5 * 2

10

In [101]:
5 / 2

2.5

There are a few numerical operators that require a little bit more explanation.

First, exponentiation (\*\*)

In [102]:
5 ** 2

25

In addition to "classic" or "true" division, / , there is also floor division, // , which rounds the result of division down to the nearest integer.

In [103]:
5 // 2

2

The modulus operator, % , returns the remainder from division. For example, if you divide 5 by 2, you can express the result as 2 with a remainder of 1. If you divide 9 by 3 it is cleanly divided with a remainder of zero. This is often used to determine whether or not one value is a multiple of another.

In [104]:
5 % 2

1

In [105]:
9 % 3

0
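
As a sketch of the "is it a multiple?" check described above (the numbers are arbitrary), along with the built-in `divmod()` and a note on how floor division and modulus behave with negative numbers:

```python
n = 42

# A remainder of zero means n is a multiple of the divisor.
is_even = n % 2 == 0
is_multiple_of_5 = n % 5 == 0
print(is_even, is_multiple_of_5)   # True False

# divmod() returns the floor-division quotient and the remainder together.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)         # 3 2

# Floor division rounds down (toward negative infinity), so for negative
# numbers it is not the same as dropping the fractional part.
neg_floor = -5 // 2
neg_mod = -5 % 2
print(neg_floor, neg_mod)          # -3 1
```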

### Assignment operators
Here, we'll discuss the standard operators for assigning a value to a variable: `=`, `+=`, `-=`, `*=`, `/=`

The most commonly used operator is the equal sign. It assigns the value of the expression on the right to the variable on the left. It's important to note that this is different from the relational operator that tests whether two quantities are equal - that is a double equals sign, `==`.

In [106]:
x = 7
print(x)

7


You can also assign multiple variables using tuples, lists, and strings. The number of items in the container on the left must match the number on the right.

In [107]:
a, b = (3, 4)
print(a)
print(b)

3
4


This is equivalent to writing the following:

In [108]:
(a, b) = (3, 4)
print(a)
print(b)

3
4


This is also equivalent to this:

In [109]:
[a, b] = (3, 4)
print(a)
print(b)

3
4


You can also do this assignment with lists:

In [110]:
a, b = [3, 4]
print(a)
print(b)

3
4


In [111]:
a, b = 'th'
print(a)
print(b)

t
h


You can also have more than two elements involved:

In [112]:
a, b, c, d = (3, 4, 9, 'this')
print(a)
print(b)
print(c)
print(d)

3
4
9
this
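
Two common uses of this kind of assignment, sketched below with illustrative names: swapping two variables without a temporary, and collecting "the rest" of a sequence with a starred name. The last part shows the error raised when the counts on the left and right do not match (it uses `try`/`except` only so the example runs to completion).

```python
# Swap two values in one line.
a, b = 3, 4
a, b = b, a
print(a, b)          # 4 3

# A starred name collects any remaining items into a list.
first, *rest = (3, 4, 9, 'this')
print(first, rest)   # 3 [4, 9, 'this']

# Mismatched counts raise a ValueError.
try:
    p, q = (1, 2, 3)
except ValueError as err:
    mismatch_message = str(err)
    print('ValueError:', mismatch_message)
```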


Let's say we wanted to increment the value stored in x but keep it stored in x - this is a common task. We could write the following:

In [113]:
x = 7
x = x + 1
print(x)

8


Or, equivalently, we could use the `+=` operator, which adds to the variable and stores the result back in it

In [114]:
x = 7
x += 1
print(x)

8


We could increment by whatever value we'd like

In [115]:
x = 7
x += 4
print(x)

11


We could similarly decrement the value

In [117]:
x = 7
x = x - 1
print(x)

6


In [118]:
x = 7
x -= 1
print(x)

6


There are similar operators for multiplication and division

In [119]:
x = 7
x *= 2
print(x)

14


In [120]:
x = 7
x /= 2
print(x)

3.5


### Relational operators
These operators are used for comparison and are typically used for numerical data:
- Greater than >
- Less than <
- Equal to ==
- Not equal to !=
- Greater than or equal to >=
- Less than or equal to <=

Each returns a value of `True` or `False` depending on the status of the condition.

In [121]:
2 == 3

False

In [122]:
2 < 3

True

In [123]:
2 > 3

False

In [124]:
2 != 3

True

In [125]:
2 >= 3

False

In [126]:
2 <= 3

True
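
Relational operators are not limited to whole numbers. The sketch below shows chained comparisons, string comparisons, and a common pitfall when testing floats for equality. `math.isclose` from the standard library is one way to handle that pitfall; the variable names are illustrative.

```python
import math

# Comparisons can be chained, just as in mathematical notation.
in_range = 0 <= 5 < 10
print(in_range)   # True

# Strings compare character by character using character codes,
# so all uppercase letters sort before lowercase ones.
alpha = 'apple' < 'banana'
case = 'Zebra' < 'apple'
print(alpha, case)   # True True

# Floats are stored approximately, so exact equality can surprise you.
exact = (0.1 + 0.2 == 0.3)
close = math.isclose(0.1 + 0.2, 0.3)
print(exact, close)   # False True
```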

### Logical operators
There are three primary logical operators: `and`, `or`, and `not`. If both arguments in the expression are `True`, then `and` returns `True`. If one or more of the arguments in the expression are `True`, then `or` returns `True`. `not` returns the opposite of its argument.

In [127]:
t = True
f = False

t and f

False

In [128]:
t and t

True

In [129]:
f and f

False

In [130]:
t or f

True

In [131]:
t or t

True

In [132]:
f or f

False

In [133]:
not t

False

Of course, you can combine these together, nesting them, to make more intricate expressions.

In [134]:
(t and f) or (f and f)

False

It's worthwhile to know that EVERY object has a corresponding Boolean value. The following each evaluate to `False`:
- False
- None
- 0 (and 0.0)
- Empty strings and collections: '', [], (), {}

Most other objects evaluate to `True`. Here are some examples to demonstrate:

In [135]:
x = 7
not x

False

In [136]:
x = 0
not x

True

In [137]:
x = []
not x

True

In [138]:
x = [1,3,4]
not x

False

In [139]:
x = None
not x

True
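
To see the Boolean value of an object directly, you can pass it to the built-in `bool()`. The list of sample values below is only illustrative; note that a non-empty string such as 'False' and a non-empty list such as `[0]` both count as true.

```python
samples = [0, 0.0, '', [], (), {}, None, 7, 'False', [0]]
for value in samples:
    print(repr(value), bool(value))

# `or` returns its first truthy operand, which is handy for supplying defaults.
name = '' or 'anonymous'
print(name)   # anonymous
```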

### Membership operators
Imagine that we have a list and want to determine whether or not a certain entry is contained in that list. The operators `in` and `not in` are specifically for that purpose. They return a Boolean that corresponds to whether or not the condition is true.

In [140]:
mylist = [1,3,6,8,23,256]
3 in mylist

True

In [141]:
4 not in mylist

True

It also works with mixed object types

In [142]:
mynewlist = [1,3,6,8,23,256,'this','that']
'this' in mynewlist

True

And tuples work similarly

In [143]:
mynewtuple = (1,3,6,8,23,256,'this','that')
'this' in mynewtuple

True
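
The same operators also work on strings and dictionaries, with slightly different meanings. In this sketch (names are illustrative), `in` on a string tests for a substring, while `in` on a dictionary tests the keys only.

```python
title = 'Data Science'
has_sci = 'Sci' in title      # substring test
has_lower = 'sci' in title    # the test is case-sensitive
print(has_sci, has_lower)     # True False

age = {'Sheila': 34, 'Robert': 17}
key_found = 'Sheila' in age        # dictionaries test their keys
value_as_key = 34 in age           # 34 is a value, not a key
value_found = 34 in age.values()   # test the values explicitly
print(key_found, value_as_key, value_found)   # True False True
```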

## Flow control: conditional execution and repetition structures
We typically control the flow of our programs with conditional statements (if statements) and loops (for and while).

#### \*\*\*Reading\*\*\*

Please read [Conditional Execution](https://www.py4e.com/html3/03-conditional) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also [available here](https://www.py4e.com/lessons/logic)

### if...elif...else
If statements allow you to take different actions in your program depending on whether or not specific conditions are satisfied. For example, if a stoplight is green, you can drive your car forward, otherwise, you should not.

In [144]:
light = 'green'
action = 'stop'

if light == 'green':
    action = 'drive'

This is the most basic form of the `if` statement, but we can add more to it.

In [145]:
light = 'green'

if light == 'green':
    action = 'drive'
else:
    action = 'stop'
print(action)

drive


Note a couple of things:
1. If the `if` condition does not evaluate to `True`, then the `else` block is executed.
2. The `if` statement must end with a colon, and so must `else`.
3. The indentation at the beginning of the line after the `if` statement and the `else` statement is critical for your code to run, since leading spaces have meaning in Python.

Consider the following and note the error that results because there wasn't an indentation.

In [146]:
light = 'green'

if light == 'green':
action = 'drive'
else:
action = 'stop'
print(action)

IndentationError: expected an indented block (<ipython-input-146-80f5d7a38e09>, line 4)

We can consider more than one condition using the `elif` statement, which is short for "else if". In this case, the `if` expression is evaluated first, and if it is true, the statement below it is run and no other part of the if/elif/else block is executed. If the `if` expression evaluates to `False`, then the program evaluates the first `elif` expression, and so on down the chain. Only if the `if` expression and all the `elif` expressions evaluate to `False` will the `else` statement be executed.

In [147]:
light = 'blue'

if light == 'green':
    action = 'drive'
elif light == 'yellow':
    action = 'stop'
elif light == 'red':
    action = 'stop'
else:
    action = 'proceed with caution - the light is broken!'
print(action)

proceed with caution - the light is broken!


You can also add in other logical expressions

In [148]:
light = 'blue'
action = 'drive'

if not light == 'green':
    action = 'stop'

print(action)

stop
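
Conditions can also use the membership operators from earlier. The sketch below wraps the stoplight logic in a small function, a hypothetical helper named `choose_action`, only so the same logic can be tried on several inputs; treating both yellow and red as 'stop' is an assumption carried over from the example above.

```python
def choose_action(light):
    """Hypothetical helper: pick an action for a given light color."""
    if light == 'green':
        return 'drive'
    elif light in ('yellow', 'red'):   # one test covers two colors
        return 'stop'
    else:
        return 'proceed with caution - the light is broken!'

for color in ['green', 'yellow', 'red', 'blue']:
    print(color, '->', choose_action(color))
```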


## Iteration
Loops are the primary way that we can call the same series of commands over and over to accomplish a task.

### While loops
The while loop is the most basic of the Python looping structures. The while loop continues to execute the block of code as long as the test at the top of the loop is true. 

In [149]:
x = 0
while x != 10:
    x = x + 1
    print(x)
    

1
2
3
4
5
6
7
8
9
10


Loops can occasionally leave you stuck in an infinite loop if there's a logical error in the code or a problem with the input data. To stop execution on the command line, press Control-C. In a Jupyter notebook, press Escape once (to enter command mode), then press I twice.

In [150]:
x = 10
while x > 0:
    print(x)
    x -= 1

10
9
8
7
6
5
4
3
2
1
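
One way to guard against an accidental infinite loop is to cap the number of iterations. This is a minimal sketch rather than a required pattern; the `max_iterations` limit and its value are assumptions chosen for illustration:

```python
x = 0
max_iterations = 100   # hypothetical safety limit chosen for illustration
iterations = 0

# x moves in steps of 3 (3, 6, 9, 12, ...) and skips over 10,
# so the test `x != 10` on its own would never become false
while x != 10 and iterations < max_iterations:
    x = x + 3
    iterations += 1

print(iterations)   # 100
print(x)            # 300
```

In this case, writing the test as `x < 10` would also avoid the problem, since it becomes false as soon as `x` passes 10.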


You may occasionally encounter `break`, `continue`, and `pass` statements: `break` will jump out of the closest enclosing loop; `continue` immediately causes execution to jump to the top of the loop; and `pass` does nothing - it's simply an empty placeholder statement.

The following code demonstrates the `break` statement by jumping execution out of the loop when the entry is the word 'stop'. Notice that 'stop' is never printed, since the `break` statement jumps out of the loop before the `print()` call is reached.

In [151]:
entries = ['not right', '#please do not print this', '#do not print this', 'stop']
index = 0
while True: 
    line = entries[index]
    index += 1
    if line == 'stop':
        break
    print(line)
print('Finished!')

not right
#please do not print this
#do not print this
Finished!


We can add in the `continue` statement to jump back to the top of the loop without finishing execution of the rest of the loop. In this case, the `continue` statement will only be reached if the entry begins with a hash character (`#`), much like a comment. In that case, the rest of the loop body is skipped, so the entry is not printed.

In [152]:
entries = ['not right', '#please do not print this', '#do not print this', 'stop']
index = 0
while True:
    line = entries[index]
    index += 1 
    if line[0] == '#':
        continue
    if line == 'stop':
        break
    print(line)
print('Finished!')

not right
Finished!
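
The `pass` statement can be sketched with the same kind of data. Unlike `continue`, `pass` does not skip anything: it fills a block that must exist syntactically but has nothing to do yet. The `kept` list below is a hypothetical name introduced for this illustration:

```python
entries = ['not right', '#please do not print this', 'stop']
kept = []    # hypothetical list that collects the entries we keep
index = 0

while index < len(entries):
    line = entries[index]
    index += 1
    if line[0] == '#':
        pass                 # placeholder: decide later how to handle these entries
    else:
        kept.append(line)

print(kept)   # ['not right', 'stop']
```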


### For loops
For loops step through items in a sequence or other objects that can be iterated over.

In [153]:
for x in [1, 2, 3, 4, 5]:
    print(x)

1
2
3
4
5


This works equally well with a list of strings:

In [154]:
address = ['Four','score','and','seven','years','ago']
for word in address:
    print(word)

Four
score
and
seven
years
ago


Let's add the numbers from 1 to 5 in a loop

In [155]:
total = 0
for y in [1, 2, 3, 4, 5]:
    total += y
print(total)

15


You can iterate over tuples just as well

In [156]:
total = 0
for y in (1, 2, 3, 4, 5):
    total += y
print(total)

15


You can also iterate over characters in a string

In [157]:
for c in 'words':
    print(c)

w
o
r
d
s


You can use for loops with multiple assignments

In [158]:
for (a, b, c) in [(1, 2, 3), (4, 5, 6)]:
    print(a, b, c)

1 2 3
4 5 6


### List comprehensions
List comprehensions combine for loops with lists to make a very compact and readable way of creating lists. Suppose you want to create a list of the squared values of a set of numbers (we'll see this is super easy with NumPy, but for now it will illustrate this concept). We could start with a `for` loop:

In [97]:
mylist  = [1,2,3,4,5,6,7,8,9,10]
squares = []
for n in mylist:
    squares.append(n**2)
print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


However, we can do this with a list comprehension in one line:

In [98]:
squares = [n**2 for n in mylist]
print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


You can construct a list comprehension with the basic format of

[ {some expression} `for` {the variable used for iteration} `in` {the quantity to iterate over} ]

So instead, we could have multiplied each value by 3

In [99]:
newlist = [n * 3 for n in mylist]
print(newlist)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]


You can also filter the values by placing an `if` clause at the end; the list comprehension will then only include values for which the `if` condition is `True`. For example, if we only wanted the list to include the squares of the odd values in the list:

In [100]:
squares = [n**2 for n in mylist if n % 2 != 0]
print(squares)

[1, 9, 25, 49, 81]


Or we could keep only those values less than 5

In [101]:
squares = [n**2 for n in mylist if n < 5]
print(squares)

[1, 4, 9, 16]


As you can see, this becomes extremely readable - one of the touted benefits of Python by fans of the language. In the above example, it pretty much reads: "Square the value for each value in the list that's less than 5." It's very close to how we would speak.
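
To make the connection to loops explicit, here is a short sketch showing that a filtered list comprehension produces the same list as a `for` loop with an `if` statement inside it. The names `squares_loop` and `squares_comp` are introduced only for this comparison:

```python
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Loop version: test each value, then append its square if the test passes
squares_loop = []
for n in mylist:
    if n < 5:
        squares_loop.append(n**2)

# Comprehension version of the same logic
squares_comp = [n**2 for n in mylist if n < 5]

print(squares_loop)                   # [1, 4, 9, 16]
print(squares_loop == squares_comp)   # True
```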

#### \*\*\*Reading\*\*\*

Please read [Iteration](https://www.py4e.com/html3/05-iterations) from Python for Everybody by Charles Severance.

Supplementary videos that cover this material are also [available here](https://www.py4e.com/lessons/loops)

### Example: Data Cleaning. For loops with `enumerate()`, `range()`, and `zip()`
The handy built-in Python function `enumerate()` is helpful if your program needs to iterate through a list while also using the index of the current value. Let's say our goal is to check whether the weather data we have on wind speed is valid, or if the sensor has failed. Each wind speed reading has been assigned one of three classes, each corresponding to a different range of speeds. If the wind speed is negative, the sensor collected an invalid reading, and we should replace the corresponding value of `wind_class` with the value 'invalid'.

In [17]:
wind_speed = [-2,       3,        -0.7,     24,         -3,        -4,        9         ]
wind_class = ['Class I','Class I','Class I','Class III','Class II','Class II','Class II']

To do this, we could loop through the indices of each list. The `range()` function enables this: when converted to a list, it produces a sequence of values from 0 to the argument minus one, in increments of 1:

In [18]:
list(range(4))

[0, 1, 2, 3]

In [19]:
list(range(7))

[0, 1, 2, 3, 4, 5, 6]

Before we start any of that, let's make a copy of our list so we don't destroy any of the original in the process. Recall, as described previously, that assigning a list to a new name does not copy it: both names refer to the same list, so a change made through one name shows up in the other. The same is true of NumPy arrays, which we'll discuss later in detail.

In [20]:
a    = [1,2,3,4]
b    = a
b[0] = 'flamingo'
print(b)
print(a)

['flamingo', 2, 3, 4]
['flamingo', 2, 3, 4]


So we copy the list first

In [24]:
corrected_class = wind_class.copy()
print(corrected_class)

['Class I', 'Class I', 'Class I', 'Class III', 'Class II', 'Class II', 'Class II']
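
A quick way to check whether two names refer to the same list is the `is` operator. The sketch below compares an alias with three common ways of making a new list; the variable names are introduced only for illustration:

```python
a = [1, 2, 3, 4]
alias = a            # a second name for the same list
copy1 = a.copy()     # three common ways to make a new list
copy2 = list(a)
copy3 = a[:]

alias[0] = 'flamingo'

print(alias is a)            # True
print(copy1 is a)            # False
print(a)                     # ['flamingo', 2, 3, 4]
print(copy1, copy2, copy3)   # [1, 2, 3, 4] [1, 2, 3, 4] [1, 2, 3, 4]
```

Note that these are shallow copies: if the list itself contained other lists, those inner lists would still be shared between the original and the copy.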


So here we could write the loop to clean the data above:

In [25]:
for i in range(len(wind_speed)):
    if wind_speed[i] < 0:
        corrected_class[i] = 'invalid'
print(corrected_class)

['invalid', 'Class I', 'invalid', 'Class III', 'invalid', 'invalid', 'Class II']


`enumerate()` adds the ability to have an index that corresponds to each value in a container and iterate over both. It produces a sequence of tuples that contain the index and the corresponding value:

In [26]:
list(enumerate(wind_class))

[(0, 'Class I'),
 (1, 'Class I'),
 (2, 'Class I'),
 (3, 'Class III'),
 (4, 'Class II'),
 (5, 'Class II'),
 (6, 'Class II')]

Let's take this concept and apply it to our problem:

In [27]:
for (i, speed) in enumerate(wind_speed):
    if speed < 0:
        corrected_class[i] = 'invalid'
print(corrected_class)

['invalid', 'Class I', 'invalid', 'Class III', 'invalid', 'invalid', 'Class II']
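
`enumerate()` also accepts an optional `start` argument for when the count should begin somewhere other than 0. As a sketch, the example below builds a hypothetical `report` list (not part of the original example) that numbers the readings from 1:

```python
wind_speed = [-2, 3, -0.7, 24, -3, -4, 9]

report = []   # hypothetical list of messages, for illustration only
for reading_number, speed in enumerate(wind_speed, start=1):
    if speed < 0:
        report.append('Reading {} is invalid'.format(reading_number))

print(report)
# ['Reading 1 is invalid', 'Reading 3 is invalid', 'Reading 5 is invalid', 'Reading 6 is invalid']
```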


While we've solved the problem at hand, this example also gives us the opportunity to introduce another helpful built-in Python function, `zip()`, which pairs up the elements of two (or more) containers.

In [35]:
# Get the indices
index = list(range(len(wind_class)))
print(index)

# Get the values that correspond to each index
print(wind_class)

# "zip" the two together
zipped = zip(index,wind_class)
print(list(zipped))

[0, 1, 2, 3, 4, 5, 6]
['Class I', 'Class I', 'Class I', 'Class III', 'Class II', 'Class II', 'Class II']
[(0, 'Class I'), (1, 'Class I'), (2, 'Class I'), (3, 'Class III'), (4, 'Class II'), (5, 'Class II'), (6, 'Class II')]


As you can see, it "zips" the two together, making tuples from the elements of the two input lists. Let's reproduce the functionality we just saw with `enumerate()` using `zip()`:

In [33]:
for (i, speed) in zip(index,wind_speed):
    if speed < 0:
        corrected_class[i] = 'invalid'
print(corrected_class)

['invalid', 'Class I', 'invalid', 'Class III', 'invalid', 'invalid', 'Class II']
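
`zip()` can also pair the two data lists directly, so no index is needed at all. The sketch below builds a new list (the name `cleaned` is an assumption made for this example) instead of modifying a copy in place:

```python
wind_speed = [-2, 3, -0.7, 24, -3, -4, 9]
wind_class = ['Class I', 'Class I', 'Class I', 'Class III',
              'Class II', 'Class II', 'Class II']

cleaned = []
for speed, label in zip(wind_speed, wind_class):
    if speed < 0:
        cleaned.append('invalid')   # negative speed: sensor failure
    else:
        cleaned.append(label)       # keep the original class
print(cleaned)
# ['invalid', 'Class I', 'invalid', 'Class III', 'invalid', 'invalid', 'Class II']

# zip() stops as soon as the shortest input runs out
short = list(zip([1, 2, 3], ['a', 'b']))
print(short)   # [(1, 'a'), (2, 'b')]
```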


## Core Python functions
We'll discuss functions in detail in the next module, but there are a number of important built-in functions to be aware of now. A full list of Python's built-in functions can be found on the [official Python website](https://docs.python.org/3.6/library/functions.html#func-list). 

### `print()` and `format()`
Of all of the built in functions in Python, `print()` is likely to be the one you use by far the most. It's helpful as part of the debugging process as well as to provide output from your programs.

In [38]:
a = 'First'
b = 42
c = 'Third'
print('First') # You can print strings
print(42)      # You can print numbers
print(a,b,c)   # Separating arguments with commas 

First
42
First 42 Third


It's often useful to mix text and data from variables. Suppose you had a madlib program where the user provided two words and you wanted to insert them into your madlib. One way we can do this is to break up a string into parts and concatenate them together

In [9]:
adjective = 'shiny'
noun      = 'broccoli'
madlib    = 'You can\'t be serious!\nThat ' + adjective + ' dog ate all that ' + noun + '?'
print(madlib)

You can't be serious!
That shiny dog ate all that broccoli?


You'll notice a backslash before the apostrophe in `can\'t` and another in `\n`. These are known as escape sequences. Escaping the apostrophe prevents the Python interpreter from treating it as the end of the string, while `\n` inserts a newline. Ones you'll commonly encounter:

|Escape Sequence| Produces |
|---------------|----------|
| `\\`            | `\`        |
| `\'`            | `'`        |
| `\"`            | `"`        |
| `\t`            | a tab    |
| `\n`            | newline  |
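
The entries in the table can be tried out directly. The strings below are made up for illustration:

```python
tabbed    = 'Name\tScore'          # \t inserts a tab between the words
two_lines = 'line one\nline two'   # \n starts a new line
path      = 'C:\\data'             # \\ produces a single backslash
quoted    = "It's fine"            # double quotes avoid escaping the apostrophe

print(tabbed)
print(two_lines)
print(path)          # C:\data
print(quoted)        # It's fine
print(len('\n'))     # 1 - an escape sequence counts as a single character
```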

The madlib example above requires typing multiple strings and concatenating them together. This can all be merged into one string with the use of the `format()` method for strings (we'll talk more about methods when we discuss classes later). Let's rewrite the madlib example using this method.

In [11]:
madlib    = 'You can\'t be serious!\nThat {} dog ate all that {}?'.format(adjective,noun)
print(madlib)

You can't be serious!
That shiny dog ate all that broccoli?


Using these format strings and leaving spaces for the variables we want to insert allows us to more easily modify the strings without entirely rewriting that line of code (and adding in concatenation operators and apostrophes all over the place).

But this method can also shine through with numerical values and allow us to format these in custom ways. This is a tool that will also come in handy when making legends, titles, and axes for plots later on.

In [13]:
height = 0.234573534197
weight = 23.789325129594
summary = 'Height = {}, weight = {}'.format(height,weight)
print(summary)

Height = 0.234573534197, weight = 23.789325129594


This contains far too many decimal places, though, so we can provide formatting instructions. You can find the full documentation on how this works in the [Python documentation](https://docs.python.org/3.4/library/string.html#formatstrings). Let's say we wanted the weight to be displayed with no decimal places and the height to have 3 digits after the decimal place. We can easily do this as follows:

In [40]:
summary = 'Height = {0:7.3f}, weight = {1:4.0f}'.format(height,weight)
print(summary)

Height =   0.235, weight =   24


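The `format()` method has its own mini language that is quite powerful, but you really just need the basics to get the most value out of it. The replacement fields (in the curly braces) of the template string define how each value is presented: in `{0:7.3f}`, `0` is the position of the argument, `7` is the minimum field width, `.3` is the number of digits after the decimal point, and `f` requests fixed-point notation. Let's break down that example a bit more: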
<img src="img/format.png">

You can also replace the argument numbers with names (below we use `h` for height and `w` for weight), which you then pass as keyword arguments to the `format()` method

In [41]:
summary = 'Height = {h:7.3f}, weight = {w:4.0f}'.format(h=height,w=weight)
print(summary)

Height =   0.235, weight =   24


If you leave off the position identifiers, `format()` inserts the variables into the string in the order that they appear from left to right

In [42]:
summary = 'Height = {:7.3f}, weight = {:4.0f}'.format(height,weight)
print(summary)

Height =   0.235, weight =   24
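
If you are running Python 3.6 or later, formatted string literals ("f-strings") accept the same format specifications and place the variable names directly inside the braces. This is offered as an alternative sketch, not a replacement for the `format()` examples above:

```python
height = 0.234573534197
weight = 23.789325129594

summary_format  = 'Height = {:7.3f}, weight = {:4.0f}'.format(height, weight)
summary_fstring = f'Height = {height:7.3f}, weight = {weight:4.0f}'

print(summary_fstring)
print(summary_format == summary_fstring)   # True
```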


### `input()`

The `input()` function displays a prompt, waits for the user to type a response, and returns that response as a string:

In [44]:
x = input('Enter a number: ')
print('Your number is ' + x)

Enter a number: 42
Your number is 42
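
Because `input()` always returns a string, the value needs to be converted before it can be used in arithmetic. In the sketch below, the variable `raw` stands in for the value that `input()` would return, so that the example runs without waiting for a user:

```python
raw = '42'   # stands in for: raw = input('Enter a number: ')

print(type(raw))      # <class 'str'>
number = int(raw)     # convert the text to an integer before doing arithmetic
print(number + 1)     # 43
print(raw + '1')      # 421 - with strings, + concatenates instead of adding
```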


### core math functions: `abs()`, `min()`, `max()`, `pow()`, `round()`, `sum()`
Each of these core functions does what its name suggests by...

Taking the absolute value of a number

In [8]:
print(abs(-7))

7


Calculating the minimum or maximum value from a collection of numbers

In [2]:
print(min(4,7,-3,2,-8,44.67))
print(max(4,7,-3,2,-8,44.67))

-8
44.67


This also works with lists and tuples

In [12]:
values_list = [4,7,-3,2,-8,44.67]
print(min(values_list))
print(max(values_list))

-8
44.67


In [13]:
values_tuple = (4,7,-3,2,-8,44.67)
print(min(values_tuple))
print(max(values_tuple))

-8
44.67


You can even use these for text analysis, to find the first and last strings in a sequence alphabetically

In [16]:
values_text = ['bitcoin','alphazero','crazy horse','zelda','d is for doctor']
print(min(values_text))
print(max(values_text))

alphazero
zelda


The `pow(x,y)` function returns $x^y$

In [35]:
print(pow(4,2))
print(pow(64,0.5))

16
8.0


The `round()` function rounds to the nearest whole number by default

In [37]:
print(round(2.34363))
print(round(24.6473))

2
25


You can also specify the level of precision you'd prefer it to round to instead. For example, to round to 2 decimal places: `round(<number to round>, <decimal places to round to>)`

In [38]:
print(round(2.34363,2))
print(round(24.6473,2))

2.34
24.65
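
Two behaviors of `round()` in Python 3 can be surprising. Values exactly halfway between two integers are rounded to the nearest even integer, and some decimal values cannot be stored exactly as floats, so they may round differently than expected. A short sketch:

```python
halves = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(halves)             # [0, 2, 2, 4] - ties go to the nearest even integer

# 2.675 is stored as a float slightly below 2.675, so it rounds down
print(round(2.675, 2))    # 2.67
```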


Of course, you can take the sum over numerical values in containers

In [3]:
values_list = [4,7,-3,2]
values_tuple = (4,7,-3,2)
print(sum(values_list))
print(sum(values_tuple))

10
10


### `len()`
Calculates the length (or number of items in) a container

In [19]:
values_list  = [4,7,-3,2,-8,44.67]
values_tuple = (4,7,-3,2,-8,44.67)
values_text  = ['bitcoin','alphazero','crazy horse','zelda','d is for doctor']
print(len(values_list))
print(len(values_tuple))
print(len(values_text))

6
6
5


### `type()`
Returns the type of object (recall that we used this extensively when introducing the various Python data types)

In [21]:
print(type(4))
print(type(4.4))
print(type('this'))
print(type([2,3,4]))
print(type((2,3,4)))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'list'>
<class 'tuple'>


### `any()`, `all()`
These logical functions are helpful tools to check whether any or all elements in a container are true.

In [22]:
mylist = [True,False,False,False]
print(any(mylist))
print(all(mylist))

True
False


**Practical example**: How do I determine if any or all of the numbers in a list are odd?

First, how do we test if a number is odd or even?

In [28]:
# Test whether the number is odd or even
number = 8
if number % 2 == 0:
    print('even')
else:
    print('odd')

even


We'll start with our list and first create a new list in which we test whether or not each item is odd. We can check whether each member of the list is odd or even using the modulus (`%`) operator: a number is odd when its remainder after division by 2 is not zero.

In [30]:
numbers = [2,4,5,8,10,200]
isodd   = [] # Start with an empty list and append to it
for i in numbers:
    isodd.append(i % 2 != 0)
print(isodd)
print(any(isodd))
print(all(isodd))

[False, False, True, False, False, False]
True
False
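
The same test can be written more compactly by building the list of `True`/`False` values with a list comprehension. This is a sketch of one possible approach, and the names `any_odd` and `all_odd` are introduced only for illustration:

```python
numbers = [2, 4, 5, 8, 10, 200]

any_odd = any([n % 2 != 0 for n in numbers])   # is at least one number odd?
all_odd = all([n % 2 != 0 for n in numbers])   # is every number odd?

print(any_odd)   # True
print(all_odd)   # False
```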


### Type casting: `int()`, `float()`, `bool()`, `str()`, `list()`, `tuple()`


In [31]:
list((1,3,4))

[1, 3, 4]
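
The other conversion functions named in the heading work in a similar way. The values below are made up for illustration:

```python
print(int('42') + 1)          # 43
print(int(3.99))              # 3 - int() drops the fractional part rather than rounding
print(float('2.5') * 2)       # 5.0
print(str(42) + '!')          # 42!
print(bool(0), bool(7))       # False True - zero is False, other numbers are True
print(bool(''), bool('a'))    # False True - an empty string is False
print(tuple([1, 3, 4]))       # (1, 3, 4)
print(list('abc'))            # ['a', 'b', 'c']
```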

### `range()`


In [34]:
list(range(3))

[0, 1, 2]

In [1]:
list(range(4,7))

[4, 5, 6]
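
`range()` also accepts an optional third argument, the step size, which can be negative to count down. A short sketch with illustrative values:

```python
evens     = list(range(0, 10, 2))    # start at 0, stop before 10, step by 2
countdown = list(range(5, 0, -1))    # a negative step counts down
empty     = list(range(3, 3))        # the stop value is never included

print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [5, 4, 3, 2, 1]
print(empty)       # []
```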

## Comments
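Comments are lines in a program that are not executed and are often used to enhance program readability. The `#` character starts a comment; a triple-quoted string is technically a string literal rather than a comment, but a string that stands alone on its own line has no effect, so it is often used as a block comment.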

In [160]:
'''
This is a longer block comment
It can be multiple lines if you want
'''

'''The comment can also be one line'''

# There are also inline comments
x = 7 # They can also go at the end of a line of code

We'll be moving from procedural programming (what we've discussed so far) to object-oriented programming in the next section.