# Week 1: Python Basics


This week's material is an introductory text for working with data in Python. I will introduce enough of the basic concepts to allow us, eventually, to manipulate data and run some Machine Learning algorithms.
I designed this course for beginners, with no previous math, stats, or programming experience required, so you can start learning right away.
You do not need to be proficient in Python to be productive in data analysis. Due to time constraints, there are introductory topics that you may find useful but that we will not cover in depth in this course, such as 'classes' and 'object-oriented programming'. However, as we go I will highlight those topics and suggest study material if you wish to pursue more in-depth knowledge.


To build data science applications, we need to give the computer the proper instructions to learn from data. When we give instructions to a computer, we say that we're programming it.

To program a computer, we need to write the instructions in a special language, which we call a programming language. In this course, we'll learn Python, a programming language that offers great support for data science.

We'll start by instructing the computer to add two numbers together: __20 + 5__. 

In [None]:
20+5

Python is an interpreted language. The Python interpreter reads and executes each statement one at a time. For example, we start with the following statement, which defines a variable __'x'__ and assigns the value 10 to it.

In [None]:
x=10

We can retrieve the value of x by typing its name:

In [None]:
x

We can also retrieve the value of x by calling the 'print' function as such:

In [None]:
print(x)

These are the two syntax rules we need to be aware of when we're naming variables:

- We must use only letters, numbers, or underscores (we can't use apostrophes, hyphens, whitespace characters, etc.).
- Variable names cannot start with a number.

__Syntax__ is the structure of statements in a computer language.
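
As a quick check of the naming rules above, the sketch below uses the built-in string method __isidentifier()__, which reports whether a piece of text has the form of a valid Python name. The candidate names are made up for illustration.

```python
# Each candidate name is written as a string so we can test it without
# actually trying to create a variable with that name.
letters_digits_underscore = "total_2".isidentifier()   # only allowed characters
leading_underscore = "_count".isidentifier()           # an underscore may come first
leading_digit = "2nd_value".isidentifier()             # starts with a number
contains_hyphen = "my-var".isidentifier()              # hyphens are not allowed

print(letters_digits_underscore, leading_underscore)   # True True
print(leading_digit, contains_hyphen)                  # False False
```

Note that reserved words such as __for__ or __if__ pass this check but still cannot be used as variable names.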

The print() function can also be used to display the string 'Hello World!'. A string is a data type, like an integer or a floating-point number, but it represents text rather than numbers. It is a sequence of characters, which can include letters, spaces and digits.

In [None]:
print('Hello World!')

### Python Programming Concepts and Mechanics

#### Everything is an Object

Python consistently follows an ___object model___. Every number, string, data structure, function and so on is a ___Python object___. An _object_ is characterised by its ___type___ (e.g. string, float or function), ___internal data, attributes and methods___.

![OOP](OOP.jpg)

In __Object-Oriented Programming__, the fundamental building blocks are objects.
It differs from __Procedural programming__, where sequential steps are executed.
An object is an entity that stores information.
A __class__ describes an object's type. 
It defines:
 - What data is stored in the object, known as __attributes__.
 - What actions the object can do, known as __methods__.

An attribute is a variable that belongs to an instance of a class.
A method is a function that belongs to an instance of a class.
Attributes and methods are accessed using dot notation. Attributes do not use parentheses, whereas methods do.

Object.method()
variable = Object.attribute

An instance is a specific case of a class. For example, in the code x = 3, the value that x refers to is an instance of the type int.

A class definition is code that defines how a class behaves, including all methods and attributes.

Instance methods take the object instance as their first parameter, which by convention is named self.
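
To make these terms concrete, here is a minimal sketch of a class definition. The class __StepCounter__ and its names are hypothetical and exist only for this illustration; we will not need to write our own classes in this course.

```python
class StepCounter:
    """A tiny illustrative class that counts upwards from a starting value."""

    def __init__(self, start):
        # 'count' is an attribute: data stored on each instance.
        self.count = start

    def increment(self):
        # 'increment' is a method: it receives the instance as 'self'.
        self.count += 1
        return self.count


counter = StepCounter(10)        # counter is an instance of the class StepCounter
current = counter.count          # attribute access: no parentheses
after = counter.increment()      # method call: parentheses
print(type(counter).__name__, current, after)   # StepCounter 10 11
```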


For example, we define a _string_ s as "Hello World!". The type of this object is a string. We can retrieve that information by applying the function __type()__ to __s__, which returns __str__.

In [None]:
s = "Hello World!"
type(s)

An example of a method that applies to __strings__ is ___lower()___, which returns a copy of the string with every letter converted to lower case.

In [None]:
s.lower()


#### Comments

Any text that comes after the hash mark or pound sign __#__ on a line is ignored by Python (unless the # is inside a string). This feature can be used to add comments to your code or to exclude some part of your code without deleting it.

In [None]:
#The author of this code is K. Smith
#a=3; b=5; c=10
print("Reached this line")#Simple status report

#### Functions vs. object method calls

A function is called using parentheses and by passing zero or more arguments (i.e. values or variables):
```
result = f(x, y, z)
g()
```

Almost every object in Python comes with associated functions, known as methods, that have access to the object's internal data. Methods are called using the following syntax:

```
obj.some_method(x, y, z)
```
This is the same syntax we used earlier when we called __s.lower()__.


Functions can take both _positional_ and _keyword_ arguments:
```python
result = f(a, b, c, d=5, e='foo')
```

More on this later.
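
As a small preview, the sketch below defines a hypothetical function __describe__ (not part of Python or of this course's material) with two positional parameters and two keyword parameters that have default values, and then calls it in three ways.

```python
def describe(name, amount, currency="USD", decimals=2):
    """Illustrative helper: build a short label such as 'rent: 1200.5 USD'."""
    return name + ": " + str(round(amount, decimals)) + " " + currency


# Positional arguments only: both keyword parameters keep their defaults.
label_1 = describe("rent", 1200.5)

# One keyword argument overrides one default.
label_2 = describe("rent", 1200.5, currency="EUR")

# Keyword arguments can be given in any order, after the positional ones.
label_3 = describe("rent", 1200.456, decimals=1, currency="JPY")

print(label_1)   # rent: 1200.5 USD
print(label_2)   # rent: 1200.5 EUR
print(label_3)   # rent: 1200.5 JPY
```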

#### Attributes and methods

Objects in Python typically have both attributes (other Python objects stored 'inside' the object) and methods (functions associated with an object that can have access to the object's internal data). Both of them are accessed via the syntax: object.attribute_name

```python
In [1]: a = 'foo'

In [2]: a.<Press Tab>
a.capitalize  a.format      a.isupper     a.rindex      a.strip
a.center      a.index       a.join        a.rjust       a.swapcase
a.count       a.isalnum     a.ljust       a.rpartition  a.title
a.decode      a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith
```
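
The difference between an attribute and a method can be seen on a built-in object. The sketch below uses a complex number, chosen only because it has both kinds of member, and the built-in function __dir()__, which lists an object's attribute and method names in much the same way as the Tab listing above.

```python
z = 3 + 4j                  # a complex number object

real_part = z.real          # attribute: stored data, no parentheses
mirrored = z.conjugate()    # method: a function call, with parentheses

names = dir("foo")          # names of everything attached to a string object
has_lower = "lower" in names

print(real_part)   # 3.0
print(mirrored)    # (3-4j)
print(has_lower)   # True
```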

#### Tab completion

While entering an expression in the shell, pressing the Tab key will search the namespace for any variables (objects, functions, etc.) matching the characters you have typed so far:

In [None]:
a = 'foo'


#### Binary operators and comparison

The standard binary math operations and comparisons behave as expected:

In [None]:
12-8

In [None]:
5<=12

In [None]:
y = x

In [None]:
x is y

In [None]:
x is not y

In [None]:
x==y

In [None]:
a = None

In [None]:
a is None
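
The cells above use both __is__ and __==__. They answer different questions: __==__ compares values, whereas __is__ checks whether two names refer to the very same object. A minimal sketch, using lists (introduced later in this notebook) simply as a convenient container:

```python
list_a = [1, 2, 3]
list_b = list_a            # a second name for the same object
list_c = [1, 2, 3]         # a different object with equal contents

same_object = list_a is list_b        # both names point at one object
equal_values = list_a == list_c       # the contents match
distinct_objects = list_a is list_c   # but they are two separate objects

print(same_object, equal_values, distinct_objects)   # True True False
```

This is why __is None__ is the usual way to test for __None__: there is only one __None__ object.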

In [None]:
a = 10
b = 6

Add a to b

In [None]:
a + b

Subtract b from a

In [None]:
a - b

Multiply a by b

In [None]:
a*b

Divide a by b

In [None]:
a/b

Floor-divide a by b dropping any fractional remainder

In [None]:
a//b

Raise a to the power b

In [None]:
a**b

In [None]:
a%b # % is the modulo operator: the remainder of a divided by b

In [None]:
c = True
d = False

In [None]:
c & d # True if both are True, False otherwise. 

In [None]:
c | d # True if either c or d is True.

In [None]:
a != b # True if a is not equal to b

##### Scalar Type

Along with the standard library of functions, of which we have seen a couple (e.g. type(), print() and the operators), Python has a small set of built-in _types_ for handling numerical data, strings and Booleans. These single-value _types_ are sometimes called ___scalar types___.

For numbers, the main __types__ we use are integers and floating-point numbers, denoted in Python as _int_ and _float_.

In [None]:
iv = 178994 # example of an int

In [None]:
iv**2

In [None]:
fv1 = 8.33
fv2 = 7.35e-5
# examples of float types

In [None]:
fv2

In [None]:
print(type(fv2))
print(type(iv))

#### Strings 

Python is renowned for being flexible and powerful in processing text, known as the string type in programming. You can write string values using either single or double quotes.

In [None]:
a  = 'easy way to store a string'

In [None]:
b = "The type of the variable b is "

In [None]:
print(b, type(b))

For multiline strings with line breaks, you can use triple quotes, either ''' or """

In [None]:
c = """
This is a long string
spanning over few
lines

"""

In [None]:
c

In [None]:
print(c)

In [None]:
c.count('\n')

The backslash character within a string is an _escape character_, meaning that it is used to specify special characters like the newline '\n'. To write a string literal that contains a backslash, you need to type it twice:

In [None]:
stri_1 = '12\\44'

In [None]:
print (stri_1)
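
The sketch below shows why the doubling matters and offers an alternative. A single backslash followed by digits is read as an escape sequence (here '\44' is the octal code for the dollar sign), while a raw string, written with the prefix __r__, leaves backslashes untouched. The variable names are made up for this illustration.

```python
escaped = '12\\44'     # doubled backslash: the text 12\44
raw = r'12\44'         # raw string: backslashes are kept as typed
unescaped = '12\44'    # single backslash: \44 is interpreted as an escape

print(escaped)            # 12\44
print(escaped == raw)     # True
print(len(escaped))       # 5
print(unescaped)          # 12$
```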

You can concatenate strings by adding them using the __+__ operator.

In [None]:
s_1 = "Hi there! "
s_2 = "How are you?"

s_3 = s_1 + s_2

In [None]:
print (s_3)

String templating and formatting is an important and useful topic. I will briefly introduce it here, but we will revisit it on a few different occasions. String objects have a useful method called ___.format()___ that can be used to create string templates. For example:

In [None]:
template = '{0:.2f} {1:s} are equivalent to US${2:d}'


- {0:.2f} formats the first argument as a floating-point number with two decimal places.
- {1:s} formats the second argument as a string.
- {2:d} formats the third and last argument as an integer.

In [None]:
output = template.format(108.56, "Japanese yen", 1)
print(output)
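
To see what each format code does on its own, the sketch below applies them one at a time to sample values chosen purely for illustration. It also shows a width specification, __02d__, for the case where a fixed number of digits is wanted.

```python
two_decimals = '{0:.2f}'.format(3.14159)   # float rounded to two decimal places
as_text = '{0:s}'.format('yen')            # string inserted unchanged
as_integer = '{0:d}'.format(42)            # integer, no padding
two_digits = '{0:02d}'.format(1)           # integer padded with zeros to width 2

print(two_decimals)   # 3.14
print(as_text)        # yen
print(as_integer)     # 42
print(two_digits)     # 01
```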

Sometimes we'll need to create strings with quotation marks inside, like in this example: Churchill's war motto was 'never, never, never give up'.

In situations like these, we need to alternate double quotation marks (" ") with single quotation marks (' '):

In [None]:
str_1 = "Churchill's war motto was: 'never, never, never give up'"
print(str_1)

### Lists

Besides the scalar data types we have seen so far, one very important data type in Python is the ___list___. It is a collection data type: an ordered, changeable container of other objects that allows duplicate members and can hold different types at once. List literals are written within square brackets [ ]. For example:

In [None]:
l_1 = [2,3,4,0.5]

In [None]:
print(l_1)
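
The properties listed above (mixed types, duplicates, changeable) can be seen in a short sketch. The list __mixed__ is made up for this illustration.

```python
mixed = [2, 'two', 2.0, 2]    # different types, and the value 2 appears twice

mixed[1] = 'TWO'              # changeable: replace an element in place
mixed.append(None)            # changeable: add an element at the end

print(mixed)        # [2, 'TWO', 2.0, 2, None]
print(len(mixed))   # 5
```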

Lists allow for _indexing_ and _slicing_. 
#### Indexing



In [None]:
l_1[0] #indexing starts with zero

#### Slicing

In [None]:
l_1[1:3]

In [None]:
l_1[1:]

In [None]:
l_1[:-1]

In [None]:
l_1[1:-2]


In [None]:
len(l_1) # returns the length of the list i.e the number of elements inside.
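
The indexing and slicing rules used above can be summarised in one sketch. A slice __[start:stop]__ includes the element at __start__ and excludes the element at __stop__, and negative indices count from the end. The list __letters__ is made up for this illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]             # indexing starts at zero
last = letters[-1]             # -1 is the last element
middle = letters[1:3]          # positions 1 and 2; position 3 is excluded
all_but_last = letters[:-1]    # everything up to, not including, the last
every_other = letters[::2]     # a third value sets the step

print(first, last)     # a e
print(middle)          # ['b', 'c']
print(all_but_last)    # ['a', 'b', 'c', 'd']
print(every_other)     # ['a', 'c', 'e']
```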

### Type casting 

In computer science, the act of changing the type of a value or variable is usually referred to as type casting. With the few scalar and list types we have seen so far, we can do the following:

In [None]:
s = '3.908'

In [None]:
fval = float(s)

In [None]:
type(fval)

In [None]:
int(fval)

In [None]:
bool(0)

In [None]:
bool(fval)

In [None]:
s_2 = str(4.25)
print(s_2)

In [None]:
list(s_2)
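
Two details of these conversions are easy to miss, so here is a short sketch with made-up variable names: __int()__ drops the fractional part rather than rounding, and __bool()__ of a string depends only on whether the string is empty.

```python
text = '3.908'

as_float = float(text)        # text to floating-point number
truncated = int(as_float)     # int() drops the fractional part
rounded = round(as_float)     # round() goes to the nearest integer

empty_string = bool('')       # an empty string is False
zero_as_text = bool('0')      # any non-empty string is True, even '0'

print(truncated, rounded)             # 3 4
print(empty_string, zero_as_text)     # False True
```

Note that __int('3.908')__ raises a ValueError: the text has to be converted to a float first.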

## Control flow

Python has a few built-in keywords for conditional logic and loops.

##### if, elif and else

The __if__ statement is one of the most well-known control flow statement types. It checks a condition and, if the condition is True, executes the code in the block that follows.

In [None]:
if x<0:
    print ("it's negative")

#### Indentation

Python uses whitespace (tabs, spaces) to structure code instead of braces as in many other languages. In the example above, the block that Python will execute if the condition is true is indented by one level (four spaces here). In the following examples we add a second print statement, first outside the block and then inside it:

In [None]:
if x<0:
    print ("it's negative")
print("it's printing")

In [None]:
if x<0:
    print ("it's negative")
    print("it's printing")

In [None]:
if x<0:
    print("It's negative")
elif x == 0:
    print('equal zero')
elif 0<x<=5:
    print('positive but smaller or equal to 5')
else:
    print('positive and larger than 5')
    

If any condition is __True__, no further elif or else block will be reached. With a compound condition using __and__ or __or__, conditions are evaluated left to right and will short-circuit:

In [None]:
a = 5; b = 7

In [None]:
c = 8; d = 4

In [None]:
if a < b  or c > d:
    print('Yes')

In this example, the comparison c > d never gets evaluated because the first comparison is __True__.
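
Short-circuiting can be made visible with a small sketch. The helper function __check__ is hypothetical: it records a label each time a condition is actually evaluated, so we can see which ones were skipped.

```python
calls = []   # labels of the conditions that were actually evaluated


def check(label, result):
    """Illustrative helper: note that a condition ran, then return its value."""
    calls.append(label)
    return result


# 'or' stops at the first True, so 'second' is never evaluated.
or_outcome = check('first', 5 < 7) or check('second', 8 > 4)

# 'and' stops at the first False, so 'fourth' is never evaluated.
and_outcome = check('third', 5 > 7) and check('fourth', 8 > 4)

print(or_outcome, and_outcome)   # True False
print(calls)                     # ['first', 'third']
```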

#### For loops

For loops are for iterating over a collection (like a list) or an __iterator__. The standard syntax for a __for loop__ is:

```python
for value in collection:
    # do something with the value
    ...
```
Simple example:

In [None]:
for v in l_1:
    print(v)


You can advance a for loop to the next iteration, skipping the remainder of the block, using the __continue__ keyword. Consider this code, which sums up the integers in a list and skips __None__ values.

In [None]:
sequence = [1,2,3,None,4,None]

In [None]:
total = 0

In [None]:
for value in sequence:
    if value is None:
        continue
    total += value # this is equivalent to writing total = total + value

In [None]:
total

Alternatively, a for loop can be exited early when a certain condition is met by using the keyword __break__. The code below sums the elements of the list until it reaches a 5:

In [None]:
sequence = [1,2,0,4,6,5,2,1]

In [None]:
total_until_5 = 0

In [None]:
for value in sequence:
    if value ==5:
        break
    total_until_5 += value
    

In [None]:
total_until_5

#### While loop

A while loop specifies a condition and a block of code that is executed repeatedly until the condition evaluates to __False__ or the loop is explicitly ended with __break__:

In [None]:
x = 256
total = 0
while x >0:
    if total > 500:
        break
    total += x
    x = x//2
total
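
To follow what the loop above does step by step, the sketch below repeats it with different variable names (so the cell runs on its own) and records the running total after each pass in a list called __history__, which is an addition made only for this illustration.

```python
n = 256
running = 0
history = []   # running total after each pass through the loop

while n > 0:
    if running > 500:
        break              # leave the loop as soon as the total exceeds 500
    running += n
    history.append(running)
    n = n // 2             # halve n, dropping any remainder

print(history)   # [256, 384, 448, 480, 496, 504]
print(running)   # 504
print(n)         # 4
```

The loop ends through __break__ rather than through the condition: when the total reaches 504, n is still 4.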

#### Range function

The __range__ function returns a range object, a lazy sequence of evenly spaced integers that are produced only as they are needed:

In [None]:
range(10)

In [None]:
list(range(10))

In [None]:
type(range(10))

In [None]:
list(range(0,20,2))

In [None]:
list(range(10,0,-2))
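
A range object does not store its numbers, yet it supports several of the same operations as a list. A minimal sketch, with made-up variable names:

```python
evens = range(0, 20, 2)      # start 0, stop 20 (excluded), step 2

n_items = len(evens)         # number of values the range will produce
third = evens[2]             # indexing works as for lists
contains_8 = 8 in evens      # membership test
contains_20 = 20 in evens    # the stop value itself is never included

print(n_items, third)            # 10 4
print(contains_8, contains_20)   # True False
print(list(evens))               # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```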

In [None]:
seq=[1,2,3,4]

In [None]:
for i in range (len(seq)):
    val = seq[i]

In [None]:
val

In [None]:
sum_1 = 0
for i in range(100000):
    if i % 3 == 0 or i % 5 ==0:
        sum_1 += i

In [None]:
sum_1
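
One way to cross-check the loop above is to compute the same quantity differently. The sketch below, which uses the built-in function __sum()__ and variable names chosen for this illustration, adds the multiples of 3 and the multiples of 5, then subtracts the multiples of 15, since those were counted twice.

```python
limit = 100000

# The same loop as above, repeated so this cell runs on its own.
loop_total = 0
for k in range(limit):
    if k % 3 == 0 or k % 5 == 0:
        loop_total += k

# Multiples of 3, plus multiples of 5, minus multiples of 15.
formula_total = (sum(range(0, limit, 3))
                 + sum(range(0, limit, 5))
                 - sum(range(0, limit, 15)))

print(loop_total == formula_total)   # True
```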

### Tricks and Magic

#### Introspection
Placing a question mark (?) before or after a variable or a function displays some general information about the object. This is a feature of IPython and Jupyter rather than of the Python language itself:

In [None]:
l_2 = [2,4,5,8]

In [None]:
l_2?

In [None]:
print?

##### The %run command

To illustrate how this command works, we will write a short script in __Spyder__. You can copy and paste the simple code below into an empty file in __Spyder__.

In [None]:
seq = [2,4,6,8]
for i in seq:
    print (i/2)

In [None]:
%pwd # this command will return the current working directory, where you can save the Spyder file

In [None]:
#%run xxx.py

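A magic command is any command prefixed by the percent symbol ___%___. Magic commands belong to IPython, not to the Python language itself. To time code, use the %timeit magic command for a single line, or %%timeit at the top of a cell to time the whole cell.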

In [None]:

%%timeit
seq = [2,4,6,8]
for i in seq:
    i/2 # print() is left out so that the repeated timing runs do not flood the output
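
Outside IPython, similar timing can be done with the __timeit__ module from Python's standard library. The sketch below is one possible way to do it: the function name __halve_all__ and the repetition count of 10000 are arbitrary choices made for this illustration, and the measured time will differ from machine to machine.

```python
import timeit


def halve_all():
    """The loop being timed, wrapped in a function so timeit can call it."""
    seq = [2, 4, 6, 8]
    for i in seq:
        i / 2


# Run the function 10000 times and return the total elapsed time in seconds.
elapsed = timeit.timeit(halve_all, number=10000)
print(elapsed)
```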

## Exercises

__rainfall__ is a string that contains the average rainfall in Singapore for every month (in inches), with the months separated by commas. Write code to compute the number of months that have more than 3 inches of rainfall. Store the result in the variable high_rainy_months.

In [2]:
rainfall = "2.65, 1.46, 2.15, 1.95, 3.35, 3.43, 3.87, 4.23, 4.5, 2.32, 2.76, 1.05"

Write code to count the number of strings in the list __items__ that contain the character w. Assign that number to the variable w_num.

In [3]:
items = ["window", "wild", "table", "row", "town", "machine", "group","Osaka","owing"]