# Learning Python, NumPy, Pandas and Matplotlib for Data Science, Part 1: Python

The intent of this exercise is to build the background skills needed for learning Data Science and Machine Learning.

It is mainly based on the Stanford [cs231n python tutorial][1] and the [cs228 jupyter notebook][2], with additional material from the sources listed under Reference. I plan to cover:

- Basic Python: Basic data types (Collections: Lists, Dictionaries, Sets, Tuples), Functions, Classes; Functional programming.
- NumPy: Arrays, Array indexing, Datatypes, Array math, Broadcasting.
- Matplotlib: Plotting, Subplots, Images.
- IPython: Creating notebooks, Typical workflows; Seaborn.

This exercise is written with Python 3.7; some syntax may not be compatible with Python 2.7. See [differences][3].

[1]: http://cs231n.github.io/python-numpy-tutorial/
[2]: https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb
[3]: https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/

In [1]:
# ![learning plan map](https://www.evernote.com/l/AS7k_sVgE65KK5n0wBrl_-6cB04fNIYrSzkB/image.png)

## Reference
- [cs228 python tutorial][3]
- [Top 15 Python Libraries for Data Science in 2017][4]
- [Top 5 Python IDEs For Data Science][5]

[3]: https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb
[4]: https://medium.com/activewizards-machine-learning-company/top-15-python-libraries-for-data-science-in-in-2017-ab61b4f9b4a7
[5]: https://www.datacamp.com/community/tutorials/data-science-python-ide

## Why Python?
See short comparisons [Comparison of top data science libraries for Python, R and Scala [Infographic]][1] and [Which Languages Should You Learn For Data Science?][2]

![compare R, Scala, Python][3]

[1]: https://medium.com/activewizards-machine-learning-company/comparison-of-top-data-science-libraries-for-python-r-and-scala-infographic-574069949267
[2]: https://medium.freecodecamp.org/which-languages-should-you-learn-for-data-science-e806ba55a81f
[3]: https://cdn-images-1.medium.com/max/1800/0*gorWChPPTZpr9ULL.png

## Anaconda
Anaconda helps to create different development environments and install required packages.
- [Python Tutorial: Anaconda - Installation and Using Conda][1]
- [Use Anaconda][2]

[1]: https://www.youtube.com/watch?v=YJC6ldI3hWk
[2]: https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c

## Python

Some notes copied from [Moving to Python from other computer languages][1]:
- Significant Whitespace
    Whitespace and indents are important. Python uses whitespace to signify end-of-line and code blocks. This may annoy you at first, but the more you read the code the more you will like it.
- Where you would use `vector<T>`, use lists or tuples, that is `[]` or `()`. Where you would use `map<T1, T2>`, use dictionaries, that is `{}`.
- For iteration:
    - `for item in alist:` iterates over all the items in `alist`, one by one, where `alist` is a sequence, i.e. a list, tuple, or string.
    - To iterate over a sublist, use slices: `for item in alist[1:-1]:` does as above, but omits the first and last items.
- For reference, see the excellent [Python Quick Reference][3] and the [Python Module Index][4].
- See [Language comparison][2] for how Python compares with other languages, and [PEP 8 -- Style Guide for Python Code][5] for coding style.

[1]: https://wiki.python.org/moin/MovingToPythonFromOtherLanguages
[2]: https://wiki.python.org/moin/LanguageComparisons
[3]: http://rgruet.free.fr/PQR24/PQR2.4.html
[4]: https://docs.python.org/3.7/py-modindex.html
[5]: https://www.python.org/dev/peps/pep-0008/
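
The container and iteration notes above can be sketched as follows. This is only an illustration; the names `scores` and `ages` and their values are made up and do not come from the referenced wiki page.

```python
# Hypothetical data, chosen only to illustrate the notes above.
scores = [70, 85, 90, 65]          # a list plays the role of a C++ vector
ages = {'ann': 31, 'bob': 27}      # a dict plays the role of a C++ map

# Iterate over every item of a sequence.
total = 0
for s in scores:
    total += s                     # the indented block is the loop body

# Iterate over a sub-list: the slice [1:-1] omits the first and last items.
middle = []
for s in scores[1:-1]:
    middle.append(s)

print(total)        # 310
print(middle)       # [85, 90]
print(ages['bob'])  # 27
```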

### Reference
- [cs231n python tutorial][1] 
- [cs228 jupyter notebook][2]
- [Python cheat sheet][3], source of cheat sheet below.
- [Python 3 in mindmap][4], [readme and notebook version link][5]

[1]: http://cs231n.github.io/python-numpy-tutorial/
[2]: https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb
[3]: https://www.datacamp.com/community/tutorials/python-data-science-cheat-sheet-basics
[4]: http://coodict.github.io/python3-in-one-pic/
[5]: https://github.com/coodict/python3-in-one-pic/blob/master/README.md

![python cheat sheet][1]

[1]: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Cheat+Sheets/content_pythonfordatascience.png

### Data Type

#### Numbers

In [2]:
x = 5
print(x, type(x))

5 <class 'int'>


In [3]:
print(x + 1)   # x only assigned once
print(x - 1)
print(x * 1)
print(x / 2)
print(x ** 2)

6
4
5
2.5
25
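
Because this notebook targets Python 3, `/` always performs true division. The short sketch below adds the related floor-division and remainder operators, which the cell above does not show.

```python
x = 5
print(x / 2)    # 2.5  true division, always returns a float
print(x // 2)   # 2    floor division
print(x % 2)    # 1    remainder
print(-x // 2)  # -3   floor division rounds toward negative infinity
```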


In [4]:
print(x)   # x was assigned in an earlier cell (run the cells in order)
x += 1     # x is re-assigned
print(x)
x *= 2     # double the current x value

5
6


#### Strings

In [5]:
hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
print(hello, len(hello))

hello 5


In [6]:
hw = hello + ' ' + world + '! '  # String concatenation
print(hw)
print(hw * 2)

hello world! 
hello world! hello world! 


In [7]:
s = "hello"

print(s.capitalize())  # Capitalize a string, only the first letter; -> prints "Hello"
print(s.upper())       # Convert a string to uppercase; -> prints "HELLO"
print(len(s))
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello", padded with 2 (= 7 - 5) extra spaces
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another; 'l' -> '(ell)'
                                # prints "he(ell)(ell)o"
print('  world '.strip())       # Strip leading and trailing whitespace; prints "world"

Hello
HELLO
5
  hello
 hello 
he(ell)(ell)o
world


#### Boolean
Python implements all of the usual operators for Boolean logic, but uses English words (`and`, `or`, `not`) rather than symbols (`&&`, `||`, etc.).
Logical operators are not the same as comparison operators (`==`, `!=`), and the bitwise operators (`&`, `|`, `~`) are not intended for Boolean logic. See [more...][1] and the very clear presentation [here][2].

[1]: https://www.digitalocean.com/community/tutorials/understanding-boolean-logic-in-python-3
[2]: https://www.programiz.com/python-programming/operators
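
To illustrate the distinction, here is a small sketch contrasting comparison, logical, and bitwise operators. The values `6` and `3` are arbitrary.

```python
a, b = 6, 3

# Comparison operators produce booleans.
print(a > b)            # True
print(a == b)           # False

# Logical operators combine booleans (and short-circuit).
print(a > b and b > 0)  # True
print(not a > b)        # False

# Bitwise operators work on the bits of integers, not on truth values.
print(a & b)            # 2   (0b110 & 0b011 == 0b010)
print(a | b)            # 7   (0b110 | 0b011 == 0b111)

# `and` returns one of its operands, which differs from `&` on integers.
print(a and b)          # 3
```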

In [8]:
t, f = True, False           # Define some short-cut
print(type(t))               # <class 'bool'>

<class 'bool'>


In [9]:
# logical operators
print(t and f)               # logic AND; 
print(t or f)                # logic OR; 
print(not f)                 # logic NOT

False
True
True


In [10]:
# comparison operators serve a different purpose than logical operators
print("t != f: ", t != f)
print("t == f: ", t == f)    # Python 3 has no `<>` operator; use `!=` for "not equal"

t != f:  True
t == f:  False


### Collections
See [Python data structure][1] for more details.

[1]: https://docs.python.org/3/tutorial/datastructures.html

In [11]:
# there are several different types of collections/containers in Python
aList  = ['a',  2, 'c', True]   # don't use a built-in name such as `list` as a variable name; later calls to it fail with TypeError: 'list' object is not callable
aTuple = ('a', 'b', 3, 'a')
aDict  = {'a':1, 'b': True, 'c': "name"}
aSet   = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}

#### List

In [12]:
ls_x = [1, 2, 3, 4]
print(ls_x, type(ls_x))

[1, 2, 3, 4] <class 'list'>


In [13]:
ls_x2 = [[4, 5, 6, 7],[2, 3, 4, 5]]
ls_x2 # it returns
      # [[4, 5, 6, 7], 
      #  [2, 3, 4, 5]]

[[4, 5, 6, 7], [2, 3, 4, 5]]

In [14]:
ls_y = ['a', 'b', 'c']
ls_y.append('d')
print(ls_y)

ls_y[0]                # evaluates to 'a'
ls_y.insert(2, 'm')    # insert 'm' at index 2; the list becomes ['a', 'b', 'm', 'c', 'd']
ls_y.pop(2)            # remove the item at index 2 (returns 'm'); the list is ['a', 'b', 'c', 'd'] again
print(ls_y)

['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']


In [15]:
ls_z = ['a', 'b', 1, 'c', 2, 'c']
ls_z.remove('c')        # remove the first element whose value equals 'c'; the list becomes ['a', 'b', 1, 2, 'c']
print(ls_z)
# see more ... https://stackoverflow.com/questions/9405322/python-array-v-list 

['a', 'b', 1, 2, 'c']


##### Indexing and Slicing
[array indexing and slicing discussion][1]
[Table][2]

```text
Python indexes and slices for a six-element list.
Indexes enumerate the elements, slices enumerate the spaces between the elements.

Index from rear:    -6  -5  -4  -3  -2  -1      a=[0,1,2,3,4,5]    a[1:]==[1,2,3,4,5]
Index from front:    0   1   2   3   4   5      len(a)==6          a[:5]==[0,1,2,3,4]
                   +---+---+---+---+---+---+    a[0]==0            a[:-2]==[0,1,2,3]
                   | a | b | c | d | e | f |    a[5]==5            a[1:2]==[1]
                   +---+---+---+---+---+---+    a[-1]==5           a[1:-1]==[1,2,3,4]
Slice from front:  :   1   2   3   4   5   :    a[-2]==4
Slice from rear:   :  -5  -4  -3  -2  -1   :
                                                b=a[:]
                                                b==[0,1,2,3,4,5] (shallow copy of a)
```

```python
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole array

a[start:end:step] # start through not past end, by step
```

[1]: https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation/509295#509295
[2]: http://wiki.python.org/moin/MovingToPythonFromOtherLanguages
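
The slice notation above, including the `step` form, can be tried on the same six-element list. This is a minimal sketch; the expected values are shown as comments.

```python
a = [0, 1, 2, 3, 4, 5]

print(a[1:4])    # [1, 2, 3]            items at index 1, 2, 3
print(a[::2])    # [0, 2, 4]            every second item
print(a[1::2])   # [1, 3, 5]            every second item, starting at index 1
print(a[::-1])   # [5, 4, 3, 2, 1, 0]   reversed copy
print(a[-2:])    # [4, 5]               last two items

b = a[:]         # shallow copy: a new list with the same elements
b[0] = 99
print(a[0], b[0])  # 0 99
```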

In [16]:
r = [1,2,3,4]
r[1:1]

[]

In [17]:
r[1:1] = [9,8]     # insert elements at index 1 (the empty slice 1:1 sits just before index 1)
r

[1, 9, 8, 2, 3, 4]

In [18]:
r[1:1] = ['blah']
r

[1, 'blah', 9, 8, 2, 3, 4]

In [19]:
r[1:2] = []        # remove the element at index 1 (the slice 1:2 covers index 1 only)
r

[1, 9, 8, 2, 3, 4]

##### Subset 

In [20]:
k = ['a', 'b', 'c', 'd']
print(k[2], k[-2])  # index = 2 from front, and index = -2 (second element from back)

c c


In [21]:
k[1:3]              # from (zero based) index = 1 to index = 3 - 1

['b', 'c']

##### Subset lists of lists

In [22]:
# ls_x2
# [[4, 5, 6, 7], 
#  [2, 3, 4, 5]]

ls_x2[1][0]          # 1st dimension index = 1, 2nd dimension index = 0 -> return 2 

2

In [23]:
ls_x2[1][:2]         # 2nd dimension index = from 0 to (2 - 1) -> return [2, 3]

[2, 3]

#### Dictionary
A dictionary stores key-value (k-v) pairs.

In [24]:
# use the same name as before
aDict  = {'a':1, 'b': True, 'c': "name"}
aDict['a']      # return the value of aDict with the key = 'a', -> return 1

1

In [25]:
aDict['d'] = 4  # add an item -> aDict is now {'a': 1, 'b': True, 'c': 'name', 'd': 4}
aDict['d'] = 7  # change the value of a key -> {'a': 1, 'b': True, 'c': 'name', 'd': 7}
del aDict['d']  # remove an item by key -> {'a': 1, 'b': True, 'c': 'name'}
aDict

{'a': 1, 'b': True, 'c': 'name'}
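
A few more common dictionary operations, as a short sketch. The name `a_dict` is used here only so the example does not depend on earlier cells.

```python
a_dict = {'a': 1, 'b': True, 'c': 'name'}

print('a' in a_dict)          # True   membership tests look at keys
print(a_dict.get('z'))        # None   get() avoids a KeyError for a missing key
print(a_dict.get('z', 0))     # 0      optional default value

# items() yields (key, value) pairs.
for key, value in a_dict.items():
    print(key, value)

print(list(a_dict.keys()))    # ['a', 'b', 'c']
```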

#### Set
from https://docs.python.org/3/tutorial/datastructures.html#sets

In [26]:
a = set('abracadabra')  # set() builds a set from an iterable; an empty {} is a dict, not a set
b = set('alacazam')
print(a)
print(b)
print(a | b)    # union: letters in a or b (sets use '|', not '+')
print(a - b)    # letters in a but not in b
# more ...

{'b', 'd', 'a', 'c', 'r'}
{'l', 'm', 'a', 'c', 'z'}
{'b', 'l', 'd', 'a', 'm', 'c', 'r', 'z'}
{'d', 'b', 'r'}
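
Two further set operations, intersection and symmetric difference, sketched with the same two sets. Sets are unordered, so `sorted()` is used here to get a stable printout.

```python
a = set('abracadabra')
b = set('alacazam')

print(sorted(a & b))   # ['a', 'c']                       letters in both a and b
print(sorted(a ^ b))   # ['b', 'd', 'l', 'm', 'r', 'z']   letters in a or b but not both

empty_set = set()      # {} would create an empty dict instead
print(type(empty_set), type({}))
```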


#### Array
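An `array.array` holds elements of a single type, stored compactly like a C array; its element types correspond to C types and are selected with a type code such as `'l'`. Unlike a list, it cannot mix types. We need to import the `array` module first.  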
- [python array doc](https://docs.python.org/3.1/library/array.html)

In [27]:
import array

y = array.array('l', [1, 2, 3, 4])
print(y, type(y))

array('l', [1, 2, 3, 4]) <class 'array.array'>
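
A small sketch of the single-type constraint: a value of the matching type can be appended, while a value of another type is rejected.

```python
import array

y = array.array('l', [1, 2, 3, 4])   # 'l' is the type code for signed long
y.append(5)                          # a value of the same type can be appended
print(y)                             # array('l', [1, 2, 3, 4, 5])

try:
    y.append(1.5)                    # a float does not fit the 'l' type code
except TypeError as err:
    print('TypeError:', err)
```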


### Function
From [w3schools][1]:
A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result.

[1]: https://www.w3schools.com/python/python_functions.asp

In [45]:
def my_function(fname):         # function with parameter
  print(fname + " Refsnes")

my_function("Emil")
my_function("Tobias")
my_function("Linus")

Emil Refsnes
Tobias Refsnes
Linus Refsnes


In [46]:
# def my_function(x):           # name duplication, this is not allowed in scala 
def my_5times_function(x):
  return 5 * x

print(my_5times_function(3))
print(my_5times_function(5))
print(my_5times_function(9))

15
25
45
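
As a further sketch of passing parameters and returning a result, the hypothetical function `scale` below (not from w3schools) uses a default parameter value and keyword arguments.

```python
def scale(x, factor=5):        # `factor` defaults to 5 when not supplied
    return factor * x

print(scale(3))                # 15  uses the default factor
print(scale(3, 2))             # 6   positional argument overrides the default
print(scale(x=4, factor=10))   # 40  arguments can be passed by name
```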


### Class
Python supports Object Oriented Programming.
Examples below are from [w3schools][1]:

[1]: https://www.w3schools.com/python/python_functions.asp

In [47]:
# Class has property
class MyFirstClass:             # keyword `class` is lower case; the class name is capitalized (recommended)
    x = 5
    
# instantiate one object
p1 = MyFirstClass()
print(p1.x)

5


#### `__init__()` function
All classes have a function called `__init__()`, which is always executed when the class is being instantiated.

Use the `__init__()` function to assign values to object properties, or to perform other operations that are necessary when the object is being created:

In [48]:
class Person:  # a Person class with property
  def __init__(self, name, age):
    self.name = name
    self.age = age

# use class
p1 = Person("John", 36)

print(p1.name)  # return John
print(p1.age)   # return 36

John
36


In [49]:
class Person:    # a Person class with property and method/function
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def greeting(self):
        print('My name is ' + self.name)

# use class's method
p2 = Person('Smith', 36)
p2.greeting()

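# Note: The self parameter is a reference to the current instance of the class, and is used to access variables that belong to that instance. It is listed explicitly only in the method definition; Python passes it automatically when the method is called.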

My name is Smith
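
The role of `self` can be seen with two instances of a small hypothetical `Counter` class (made up for this illustration): each method call receives the object it was invoked on, so the two objects keep separate state.

```python
class Counter:                      # hypothetical class, for illustration only
    def __init__(self, start=0):
        self.count = start          # instance attribute, one per object

    def increment(self):
        self.count += 1             # `self` is the object the method was called on
        return self.count

c1 = Counter()
c2 = Counter(10)
c1.increment()
c1.increment()
c2.increment()
print(c1.count, c2.count)           # 2 11
```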


### Functional Programming
- Mainly use immutable data structures to avoid changes of state.
- Functions are first-class citizens: a function can be passed to another function and can be returned from a function.
- Avoid explicit iteration (e.g. `for`) and loops (e.g. `while`); prefer `map`, `filter`, `reduce`, and comprehensions.
- Functions (standalone) and methods (part of an object) are different.
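
A minimal sketch of these points, using hypothetical helper names (`make_multiplier`, `apply_twice`): one function is returned from another, one is passed as an argument, and a new tuple is built instead of modifying existing data.

```python
def make_multiplier(n):
    """Return a new function that multiplies its argument by n."""
    def multiply(x):
        return n * x
    return multiply

def apply_twice(fn, value):
    """Take a function as an argument and apply it two times."""
    return fn(fn(value))

triple = make_multiplier(3)
print(triple(4))                 # 12
print(apply_twice(triple, 2))    # 18

# Build a new tuple instead of changing an existing one.
point = (1, 2)
moved = (point[0] + 1, point[1])
print(point, moved)              # (1, 2) (2, 2)
```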

#### Reference
- [Introduction to Functional Programming in Python][1]
- [lambda, map, and filter in Python][2]
- [zip in Python][3]
- [List comprehension in Python][4]

[1]: https://www.dataquest.io/blog/introduction-functional-programming-python/
[2]: https://medium.com/@happymishra66/lambda-map-and-filter-in-python-4935f248593
[3]: https://medium.com/@happymishra66/zip-in-python-48cb4f70d013
[4]: https://hackernoon.com/list-comprehension-in-python-8895a785550b

#### lambda
From [Understanding Lambda Expressions][1] and [lambda, map, and filter in Python][2]

Lambda expressions (or lambda functions) are essentially blocks of code that can be assigned to variables, passed as arguments, or returned from a function call, in languages that support higher-order functions. In Python, the `lambda` keyword creates small, one-off anonymous function objects.

![lambda function image][3]  

[Image source][1]

[1]: https://medium.com/@luijar/understanding-lambda-expressions-4fb7ed216bc5
[2]: https://medium.com/@happymishra66/lambda-map-and-filter-in-python-4935f248593
[3]: https://www.evernote.com/l/AS7KGSPaAP9I0Z_MC5MZpLcAOBqJGObdUOkB/image.png

In [42]:
# normal function
def square(x):
    return x * x

a = square(5)    # return 25
a

25

In [50]:
# lambda expression
sq = lambda x: x * x        # just assign the expression to a variable, which has one parameter.

b = sq(5)
print(b, type(b), type(sq)) # b is an 'int' and sq is a 'function'
                            # a lambda can take multiple parameters, but its body is a single expression

25 <class 'int'> <class 'function'>
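
Two more uses of `lambda`, as a sketch: one with two parameters, and one passed directly as the `key` argument of the built-in `sorted()`. The list `words` is made-up sample data.

```python
add = lambda x, y: x + y            # two parameters, one expression
print(add(2, 3))                    # 5

# A common use: pass a lambda as the `key` argument of sorted().
words = ['pear', 'fig', 'banana']
print(sorted(words, key=lambda w: len(w)))   # ['fig', 'pear', 'banana']
```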


#### map
`map` applies a function to each element of an iterable (the domain) and yields the corresponding results (the range).

![map diagram][1]

##### Syntax
**map(function_object, iterable1, iterable2,...)**

[1]: https://www.evernote.com/l/AS4aJSXWjudNs783Acj2mSmDLh2HGHXDkZYB/image.png

In [3]:
ls_x = [3, 4, 6, 7, 9]
print(ls_x)
ls_x_sq = map(lambda x : x * x, ls_x)   # returns a lazy map object (an iterator), not a list
print(list(ls_x_sq))                    # wrap in list() to materialize the results -> prints [9, 16, 36, 49, 81]

[3, 4, 6, 7, 9]
[9, 16, 36, 49, 81]


In [6]:
# map can have multiple input
ls_a = [1, 2, 3]
ls_b = [4, 5, 6]
ls_c = map(lambda x, y : x + y, ls_a, ls_b) 
print(list(ls_c))

# or we can define a function
def fn_add2lists(a, b):
    return a + b
ls_d = map(fn_add2lists, ls_a, ls_b)
print(list(ls_d))

[5, 7, 9]
[5, 7, 9]


#### filter
The `filter` function expects two arguments: a `function_object` and an iterable. The `function_object` returns a Boolean value and is called for each element of the iterable; `filter` returns only those elements for which `function_object` returns `True`.

##### Syntax 
**filter(function_object, iterable)**

In [17]:
ls_d = filter(lambda x : x % 2 == 0, ls_b)   # even number in the list 
print(list(ls_d), type(ls_d))

# or define a function for re-use
def fn_isEven(x):
    return x % 2 == 0

ls_e = filter(fn_isEven, ls_b)
print(list(ls_e), type(ls_e))

[4, 6] <class 'filter'>
[4, 6] <class 'filter'>


#### reduce
`reduce` repeatedly applies a two-argument function to the items of a collection, carrying the running result forward until a single value remains. For example, adding 1 through 4 together.
We need to import it from `functools` first; see [Higher-order functions and operations on callable objects][2] for more functions.

[1]: http://book.pythontips.com/en/latest/map_filter.html
[2]: https://docs.python.org/3.7/library/functools.html

In [30]:
from functools import reduce                     # need to import the library
sum1 = reduce((lambda x, y: x + y), [1, 2, 3, 4]) # return 10 (named sum1 to avoid shadowing the built-in sum)
print('reduce [1, 2, 3, 4] with lambda "+" = ', sum1)

# or define a function for re-use
def fn_addAll(x, y):
    return x + y
sum2 = reduce(fn_addAll, [1, 2, 3, 4])           # return 10
print('reduce [1, 2, 3, 4] with user-defined fn_addAll = ',sum2)

# try different function
def fn_multipleAll(x, y):
    return x * y
mul1 = reduce(fn_multipleAll, [1, 2, 3, 4])     # just change the function object, return 24
print('reduce [1, 2, 3, 4] with user-defined fn_multipleAll = ',mul1)

# The operator module provides ready-made functions such as add and mul, so we do not need to define our own.
import operator

sum3 = reduce(operator.add, [1, 2, 3, 4])        # return 10
print('reduce [1, 2, 3, 4] with pre-defined operator "add" = ', sum3)
mul3 = reduce(operator.mul, [1, 2, 3, 4])        # return 24
print('reduce [1, 2, 3, 4] with pre-defined operator "mul" = ', mul3)

reduce [1, 2, 3, 4] with lambda "+" =  10
reduce [1, 2, 3, 4] with user-defined fn_addAll =  10
reduce [1, 2, 3, 4] with user-defined fn_multipleAll =  24
reduce [1, 2, 3, 4] with pre-defined operator "add" =  10
reduce [1, 2, 3, 4] with pre-defined operator "mul" =  24
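
How `reduce` folds a list can be traced step by step. The sketch below also shows the optional third argument of `functools.reduce`, an initial value, which the cell above does not use.

```python
from functools import reduce
import operator

# Trace of reduce(operator.add, [1, 2, 3, 4]): ((1 + 2) + 3) + 4
print(reduce(operator.add, [1, 2, 3, 4]))        # 10

# With an initial value the fold starts from it: (((100 + 1) + 2) + 3) + 4
print(reduce(operator.add, [1, 2, 3, 4], 100))   # 110

# The initial value is also what is returned for an empty sequence.
print(reduce(operator.add, [], 0))               # 0
```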


#### zip
`zip` pairs up the elements of two (or more) iterables position by position. The example is copied from [here][1].

[1]: https://medium.com/@happymishra66/zip-in-python-48cb4f70d013

In [58]:
# zip
ls_l = [1, 2, 3]
ls_m = ['a', 'b', 'c']
zipped_n = zip(ls_l, ls_m)      # no need to use lambda
print(list(zipped_n), type(zipped_n))           # convert to list first

[(1, 'a'), (2, 'b'), (3, 'c')] <class 'zip'>


In [63]:
# unzip 
# unzipped_j, unzipped_k = zip(*zipped_n)   # this raises ValueError: not enough values to unpack (expected 2, got 0)
                                            # see https://python-forum.io/Thread-zip-function-does-not-work-as-expected
                                            # reason: zipped_n is an iterator that was already consumed by list(zipped_n)
                                            # in the previous cell, so it is now empty
        
unzipped_j, unzipped_k = zip(*[(1, 'a'), (2, 'b'), (3, 'c')])   # this works
print('type is ', type(unzipped_j))         # is not a list
print(unzipped_j, type(unzipped_j))         # is tuple
print(list(unzipped_j))                     # convert to list

type is  <class 'tuple'>
(1, 2, 3) <class 'tuple'>
[1, 2, 3]
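
A `zip` object is an iterator, so it can be consumed only once; this is the likely cause of the `ValueError` mentioned in the cell above. A minimal sketch, with the names `pairs` and `pairs_list` chosen for illustration:

```python
pairs = zip([1, 2, 3], ['a', 'b', 'c'])
print(list(pairs))     # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(pairs))     # []  the iterator is now exhausted

# Keeping the list (or calling zip again) avoids the problem.
pairs_list = list(zip([1, 2, 3], ['a', 'b', 'c']))
nums, letters = zip(*pairs_list)
print(nums, letters)   # (1, 2, 3) ('a', 'b', 'c')
```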


#### comprehension
From [Comprehensions][1]: Comprehensions are constructs that allow sequences to be built from other sequences. In Python 3, comprehensions are available for lists, sets, and dictionaries.

![syntax][3]

See also:
- [set builder notation][2]

[1]: http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html
[2]: https://en.wikipedia.org/wiki/Set-builder_notation
[3]: http://python-3-patterns-idioms-test.readthedocs.io/en/latest/_images/listComprehensions.gif

In [77]:
# for list mapping
ls_g = map(lambda x : 2 * x, range(1, 10))   # return a type map
print(list(ls_g), type(ls_g))

# for list comprehension
comp = [2 * x for x in range(1, 10)]         # return a type list
print(comp, type(comp))

[2, 4, 6, 8, 10, 12, 14, 16, 18] <class 'map'>
[2, 4, 6, 8, 10, 12, 14, 16, 18] <class 'list'>


Set comprehension:

Example from http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html
We have a list: ['Bob', 'JOHN', 'alice', 'bob', 'ALICE', 'J', 'Bob'] and would like to turn it into the set {'Bob', 'Alice', 'John'}. 

In [108]:
ls_names = ['Bob', 'JOHN', 'alice', 'bob', 'ALICE', 'J', 'Bob']
print(ls_names, type(ls_names))
set_names = set(ls_names)             # eliminates duplicates, but is case sensitive, so both 'Bob' and 'bob' stay.
print(set_names, type(set_names))     # {'Bob', 'JOHN', 'alice', 'ALICE', 'J', 'bob'} <class 'set'>

# for set mapping ?

# for set comprehension
# [name[0].upper() + name[1:].lower() for name in set_names if len(name) > 1]   # aList -> ['Bob', 'John', 'Alice', 'Alice', 'Bob']   
{name[0].upper() + name[1:].lower() for name in set_names if len(name) > 1 }   # aSet  -> {'Alice', 'Bob', 'John'}

['Bob', 'JOHN', 'alice', 'bob', 'ALICE', 'J', 'Bob'] <class 'list'>
{'Bob', 'JOHN', 'alice', 'ALICE', 'J', 'bob'} <class 'set'>


{'Alice', 'Bob', 'John'}
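
Dictionary comprehensions follow the same pattern. The sketch below uses a made-up list `names` to build a dictionary of name lengths.

```python
names = ['bob', 'alice', 'john']

# Dictionary comprehension: {key_expr: value_expr for item in iterable}
name_lengths = {name.capitalize(): len(name) for name in names}
print(name_lengths)    # {'Bob': 3, 'Alice': 5, 'John': 4}

# An optional `if` clause filters items, as in list and set comprehensions.
long_names = {name: len(name) for name in names if len(name) > 3}
print(long_names)      # {'alice': 5, 'john': 4}
```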