<a href="https://colab.research.google.com/github/IndraniMandal/CSC310-S20/blob/master/02_python_programming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python for Data Science

* Anaconda3 ([www.anaconda.com](https://www.anaconda.com))
    * Python 3.x
    * Includes ALL major Python data science packages
        * scikit-learn
        * Pandas
        * PlotPy
    * Jupyter Notebooks can be run from here

*   Google Colab - a cloud-based environment









## A Whirlwind Tour of Python

If you are not familiar with Python, or you feel you are rusty, then I recommend looking at Jake VanderPlas’ intro to Python:

[A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/)

In [None]:
farm_list=['cow','sheep', 'goat', 'horse', 'pig']
for animal in farm_list:
  print(animal)

cow
sheep
goat
horse
pig


In [None]:
ans=3+1.4
print('the answer is ', ans)

the answer is  4.4


In [None]:
type(ans)

float

## Python - simple commands!

Python can be used as an interactive interpreter, started from the shell.


In [None]:
7/2

3.5

In [None]:
print("hello world!")

hello world!


In [None]:
print("hi")

hi


In [None]:
3 + 10.5

13.5

But we are going to do this in the Jupyter Notebook environment of Google Colab!

## Loading Files

Assume that we have the following program stored in a file called `helloworld.py` in the folder `assets`:
```Python
"""                                                                                                               
helloworld.py
This is the classic program every programmer writes when he or she learns
a new programming language.
"""

def hello():
    "Just print 'hello world!' and that's it"
    print("hello world!") # print inserts a newline char    
```

In [None]:
import helloworld

### Calling Functions in Modules

Functions belong to modules - if you want to execute a function in a module, you have to provide the module name as a qualifier, and the folder the module lives in has to be on Python's module search path.

In [None]:
helloworld.hello()

hello world!
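The import above only works if Python can find `helloworld.py`. Here is a minimal sketch of one way to make a folder importable, by adding it to `sys.path`. The temporary directory is an assumption that stands in for the `assets` folder; in Colab you would use the path to your own folder instead.

```python
import sys
import tempfile
from pathlib import Path

# Assumption: a temporary directory stands in for the `assets` folder.
assets = Path(tempfile.mkdtemp()) / "assets"
assets.mkdir()

# Write a small copy of the module so the sketch is self-contained.
source = '''
def hello():
    "Just print 'hello world!' and that's it"
    print("hello world!")
'''
(assets / "helloworld.py").write_text(source)

# Tell Python where to look for modules, then import as usual.
sys.path.append(str(assets))
import helloworld

helloworld.hello()   # the module name is the qualifier
```

Appending to `sys.path` only affects the current session, so the line has to be rerun whenever the notebook is restarted.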


One of the most helpful features of Python is the `help` function, which can be called on any Python object.

In [None]:
help(helloworld)

Help on module helloworld:

NAME
    helloworld

DESCRIPTION
    helloworld.py
    This is the classic program every programmer writes when he or she learns
    a new programming language.

FUNCTIONS
    hello()
        Just print 'hello world!' and that's it

FILE
    /content/drive/My Drive/CSC310/notes/assets/helloworld.py




**Docstrings shine!!!** - automatically generated documentation of your module


### Docstring vs Comment

* A docstring should document what your code does
  * Important for the user of your code
  * Docstrings are exported by Python into the help system
* A comment should comment on how your code does it
  * Important for your peer programmers modifying/understanding your code
  * Comments stay internal to the code
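The difference can be seen in a small sketch. The function `area` is a made-up example, not part of the course files: its docstring is stored on the function object and shown by `help`, while the comment exists only in the source text.

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    # Plain multiplication; no unit conversion is attempted.
    return width * height

# The docstring is exported: it is available at run time.
print(area.__doc__)

# help() shows the same docstring, but never the comment.
help(area)
```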


## Python - `import *` considered dangerous!

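`from <module> import *` -- Every public name (function, variable, class) in `<module>` is imported into your local scope WITHOUT a module qualifier! Names that start with an underscore are skipped, and a module can restrict what is exported by defining `__all__`.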


In [None]:
from helloworld import *

In [None]:
hello()

hello world!


**Very Dangerous!** - it can lead to silent name clashes with strange effects on your code!


Suppose we have another file `helloagain.py` that also defines a `hello` function:

```Python
"""
helloagain.py

Here we demonstrate that Python silently clobbers clashing names if you are not careful.
"""

def hello():
    "Print out 'hello again!' and that's it"
    print("hello again!")
```

In [None]:
from helloagain import * # Silently overwrote the name hello - the original is now only reachable as helloworld.hello()!!

In [None]:
hello()

hello again!


In [None]:
helloworld.hello()

hello world!


**Never use `from <module> import *`  - you have no control over your namespace!** Either use module-qualified names, or import specific names explicitly and rename them with `as` where they clash.


In [None]:
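import pandas as pd
df = pd.read_csv  # note: this only binds the function itself; call pd.read_csv(<path to a CSV file>) to get a DataFrame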

In [None]:
from helloworld import hello
from helloagain import hello as hi

In [None]:
hello()

hello world!


In [None]:
hi()

hello again!


## Python - basic programming structures!

### The Loop

In [None]:
for i in range(5): # a for loop with range object
    print(i)

0
1
2
3
4


The `range` function:
```
range(stop) -> range object
range(start, stop[, step]) -> range object
  
Returns an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step.  
range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
```
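As a small illustration of the description above, the sketch below uses `range(len(...))` to walk the valid indices of a list, and then uses `start`, `stop`, and `step` together. The list `animals` is just example data.

```python
animals = ['cow', 'sheep', 'goat', 'horse']   # example data

# range(len(animals)) yields 0, 1, 2, 3: exactly the valid indices of the list
for i in range(len(animals)):
    print(i, animals[i])

# start, stop, and step together: every second integer from 2 up to (not including) 10
evens = list(range(2, 10, 2))
print(evens)   # [2, 4, 6, 8]
```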

In [None]:
list(range(5))

[0, 1, 2, 3, 4]

In [None]:
list(range(5,0,-1))

[5, 4, 3, 2, 1]

In [None]:
lst = [1,2,3] # for loop over lists
for e in lst:
    print(e)

1
2
3


In [None]:
lst = ['chicken','turkey','duck']
for e in lst:
    print(e)

chicken
turkey
duck


### The if-then-else statement

In [None]:
x = input("type a value: ")
x = int(x)
if x==2:
    print('x equals 2')
else:
    print('x is something else')


type a value: 2
x equals 2


### The function definition statement

We saw some of that already above. Here is another function definition with parameters.

In [None]:
def inc(x):
    return x+1


In [None]:
inc(3)

4

A slightly more complicated example using a recursive function.

In [None]:
"""
fact.py

An example of a recursive function to
 find the factorial of a number
"""

def factorial(x):
    """
    This is a recursive function to find the factorial of an
     integer x where x >= 0.  The function is not defined
     for x < 0.
    """
    if x == 0:
        return 1
    else:
        return x * factorial(x-1)

In [None]:
factorial(3)

6
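The docstring says the function is not defined for `x < 0`. As written, a negative argument never reaches the base case, so the recursion continues until Python raises a `RecursionError`. A minimal sketch of a guarded variant follows; the name `safe_factorial` and the choice of `ValueError` are assumptions for illustration.

```python
def safe_factorial(x):
    """Return x! for an integer x >= 0; raise ValueError otherwise."""
    if not isinstance(x, int) or x < 0:
        raise ValueError("factorial is only defined for integers x >= 0")
    if x == 0:
        return 1
    return x * safe_factorial(x - 1)

print(safe_factorial(5))   # 120

try:
    safe_factorial(-1)
except ValueError as err:
    print("rejected:", err)
```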

## Python Lists

In Python, lists are a cornerstone of programming.  Consequently, lists have a lot of built-in functionality.

In [None]:
lst = [1,2,3]
lst

[1, 2, 3]

In [None]:
len(lst)

3

In [None]:
lst.append(4)
lst

[1, 2, 3, 4]

In [None]:
lst.reverse()
lst

[4, 3, 2, 1]

In [None]:
lst[1]

3

In [None]:
lst = []
lst

[]

In [None]:
len(lst)

0

Things you can do with lists: <br>
 append(...)<br>
 clear(...)<br>
 copy(...)<br>
 count(...)<br>
 extend(...)<br>
 index(...)<br>
 insert(...)<br>
 pop(...)<br>
 remove(...)<br>
 reverse(...)<br>
 sort(...)<br>
See `help([ ])`
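A short sketch exercising several of these methods on a small example list. The comments show the list contents after each step.

```python
lst = [3, 1, 2]
lst.extend([5, 4])     # [3, 1, 2, 5, 4]     add all items of another list
lst.insert(0, 0)       # [0, 3, 1, 2, 5, 4]  insert the value 0 at index 0
lst.sort()             # [0, 1, 2, 3, 4, 5]  sort in place
last = lst.pop()       # removes and returns the last item (5)
lst.remove(0)          # [1, 2, 3, 4]        remove the first occurrence of the value 0

print(lst, last)       # [1, 2, 3, 4] 5
print(lst.index(3))    # 2  (position of the value 3)
print(lst.count(2))    # 1  (how often the value 2 occurs)

backup = lst.copy()    # shallow copy, independent of lst
lst.clear()            # empty the original list
print(lst, backup)     # [] [1, 2, 3, 4]
```

Note that `sort`, `reverse`, `append`, and the other mutating methods change the list in place and return `None`.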

### List Comprehensions

Comprehensions are a shorthand notation for constructing lists.

In [None]:
S= [x**2 for x in range(10)]
S

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
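To see what the shorthand stands for, the sketch below builds the same list with an explicit loop and compares the two. It also shows the optional `if` clause of a comprehension. The variable names are chosen for illustration.

```python
# Explicit loop: start with an empty list and append each square
squares_loop = []
for x in range(10):
    squares_loop.append(x**2)

# Equivalent list comprehension
squares_comp = [x**2 for x in range(10)]
print(squares_loop == squares_comp)   # True

# A comprehension can also filter: keep only the squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)                   # [0, 4, 16, 36, 64]
```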

Another more complicated example.

In [None]:
words = 'The quick brown fox jumps over the lazy dog'.split()
words

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [None]:
stuff = [[w.upper(), w.lower(), len(w)] for w in words]
stuff

[['THE', 'the', 3],
 ['QUICK', 'quick', 5],
 ['BROWN', 'brown', 5],
 ['FOX', 'fox', 3],
 ['JUMPS', 'jumps', 5],
 ['OVER', 'over', 4],
 ['THE', 'the', 3],
 ['LAZY', 'lazy', 4],
 ['DOG', 'dog', 3]]

Note: strings are objects with member functions (methods)!

Note: we are constructing a list of lists!


## Data Structures

Python has a number of data structures beyond lists that make programming much easier:
* Tuples
* Sets
* Dictionaries


### Tuples

* A tuple consists of a number of values separated by commas
* Though tuples may seem similar to lists, they are often used in different situations and for different purposes.
* Tuples are *immutable*, and usually contain a heterogeneous sequence of elements that are accessed via *unpacking* or *indexing*.
* Lists are *mutable*, and their elements are usually homogeneous and are accessed by *iterating* over the list.
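A brief sketch of the mutable/immutable distinction described above, using made-up example values: assigning to a tuple element raises a `TypeError`, while the same assignment on a list succeeds.

```python
point = (3, 4)          # a tuple
try:
    point[0] = 10       # tuples cannot be changed in place
except TypeError as err:
    print("tuples are immutable:", err)

coords = [3, 4]         # a list with the same contents
coords[0] = 10          # lists can be changed in place
print(coords)           # [10, 4]

x, y = point            # unpacking: the usual way to access tuple elements
print(x, y)             # 3 4
```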


In [None]:
t = (12345, 54321, 'hello!')
t

(12345, 54321, 'hello!')

In [None]:
t[0]

12345

In [None]:
(x, y, z) = t     # tuple unpacking (a simple form of pattern matching)!
x

12345

In [None]:
empty = ()
len(empty)

0

In [None]:
singleton = 'hello',    # <-- note trailing comma
len(singleton)

1

In [None]:
singleton

('hello',)

### Sets

A set is an unordered collection with no duplicate elements.

In [None]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'} # apple and orange have duplicate entries
basket # show that duplicates have been removed

{'apple', 'banana', 'orange', 'pear'}

In [None]:
'orange' in basket                 # fast membership testing

True

In [None]:
'crabgrass' in basket

False

Sets support the standard set operations such as union, intersection and set difference. Sets can also be built using **set comprehensions** which mirror the mathematical version of set comprehensions.

In [None]:
a = set('abracadabra')
a

{'a', 'b', 'c', 'd', 'r'}

In [None]:
b = set('alacazam')
b

{'a', 'c', 'l', 'm', 'z'}

In [None]:
a | b # union

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [None]:
a & b # intersection

{'a', 'c'}

In [None]:
a - b # difference

{'b', 'd', 'r'}

In [None]:
{x for x in set('abracadabra') if x not in set('abc')} # set comprehension

{'d', 'r'}

### Dictionaries

A dictionary is a collection of `key:value` pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve the order in which keys were inserted.


In [None]:
tel = {'jack': 4098, 'sape': 4139}
tel

{'jack': 4098, 'sape': 4139}

In [None]:
tel['jack'] # looking up a value using a key

4098

In [None]:
tel['guido'] = 4127 # adding a new key:value pair
tel

{'guido': 4127, 'jack': 4098, 'sape': 4139}

In [None]:
del tel['sape'] # removing a key:value pair
tel

{'guido': 4127, 'jack': 4098}

In [None]:
list(tel.keys()) # we can just look at the keys in the dictionary

['jack', 'guido']

In [None]:
list(tel.values()) # we can just look at the values in the dictionary

[4098, 4127]
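Two more dictionary idioms round out the operations above: iterating over the `key:value` pairs with `items()`, and looking up a key that may be missing. Indexing with a missing key, as in `tel['sape']`, raises a `KeyError`; the sketch below shows `in` and `get` as safer alternatives.

```python
tel = {'jack': 4098, 'guido': 4127}

# Iterate over key:value pairs
for name, number in tel.items():
    print(name, number)

print('sape' in tel)        # False: membership test on the keys
print(tel.get('sape'))      # None:  get() returns None for a missing key
print(tel.get('sape', 0))   # 0:     or a default value you supply
```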

In [None]:
import pandas as pd

In [None]:
arr= [[1,2,3], [4,5,6], [7,8,9]]
arr

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [None]:
df= pd.DataFrame(data=arr,columns=['a', 'b','c'] )
df

   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9


In [None]:
df.iloc[1,1]

5

In [None]:
df.iloc[0,2]

3

In [None]:
df['b']

0    2
1    5
2    8
Name: b, dtype: int64

In [None]:
df.iloc[ : , 1]

0    2
1    5
2    8
Name: b, dtype: int64

In [None]:
df.iloc[:, 0:2]

   a  b
0  1  2
1  4  5
2  7  8


In [None]:
df.iloc[1: , : ] #[rows, cols]

   a  b  c
1  4  5  6
2  7  8  9
