## Data Analysis Using [Python](https://www.python.org)

![Guido van Rossum](gudio_van_rossum.jpg?raw=true)

## Why Python?

* #### Easy to learn
* #### Has efficient high level data structures
* #### Elegant syntax and dynamic typing
* #### Many third party libraries/modules
* #### Active community support

## Installation

* #### We will be using Anaconda distributed by [CONTINUUM](https://www.continuum.io/downloads)
* #### Anaconda is a completely free [Python](https://www.python.org) distribution (including for commercial use and redistribution). It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis
* #### Check the Python version installed on your training desktop/laptop
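
One way to check the installed version from inside Python is sketched below. It relies only on the standard `sys` module; the names `major` and `minor` are chosen here for illustration.

```python
import sys

# sys.version_info holds the interpreter version as a tuple-like object
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python %d.%d" % (major, minor))

# sys.version is the full version string, including build details
print(sys.version)
```

The same information is available from a terminal with `python --version`.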

## Python Interpreter

* The Python interpreter is usually installed as /usr/local/bin/python or /usr/bin/python
* On Windows machines, the Python installation is usually placed in C:\Python27
* On your lab machines, it is located in c:\anaconda2\bin

### REPL - read–eval–print loop

```python
>>> print("Let us start learning python")
Let us start learning python 
>>>
```



## Introduction to [Jupyter](http://jupyter.org) notebooks

* #### Starting jupyter notebook
* #### How to get python help
* #### Walk through basic operations
* #### Magic commands

In [None]:
# My first program
print("I am learning python")

In [None]:
%run run.py

In [None]:
%timeit [a for a in range(0,100000)]

### How to get help?

In [None]:
%quickref

In [None]:
help(sum)

In [None]:
sum?

In [None]:
import utilities
#reload(utilities)

In [None]:
# Python is a dynamically typed language
iam_integer = 100
iam_float = 3.14
iam_str = "Hello"
iam_bool = True
iam_complex = 3+4j

In [None]:
# Everything in Python is an object
# Everything in Python has a type
# type and object are special objects in python

print(type(iam_integer))
print(type(iam_float))
print(type(iam_str))
print(type(iam_bool))
print(type(iam_complex))

In [None]:
# Introduction to objects and namespace
a = 2
a = a +1
b = 2

<img src="namespace_example.png" alt="namespace" height="550" width="550" align="left">

* Every object has a unique identity
* **variable a** in the namespace points to object 2
* **variable a** then moves to point to object 3
* A new name b is created in the namespace and points to object 2
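
The rebinding described above can be observed with `id`, as in this minimal sketch. The names `id_before` and `id_after` are introduced here for illustration, and the exact identity values depend on the interpreter.

```python
a = 2
id_before = id(a)   # identity of the object that a refers to now
a = a + 1           # evaluates to the object 3 and rebinds the name a to it
id_after = id(a)
b = 2               # a new name that refers to the object 2

print(id_before != id_after)  # a now refers to a different object
print(a)
print(b)
```

The object 2 itself is never changed; only the name `a` is moved to another object.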

In [None]:
print(id(a))
print(id(b))
print(id(2))

<img src="namespace.jpg" alt="Python Namespace" height="542" width="542" align="left">

In [None]:
# There are many other names in this namespace which are brought in by Jupyter
print(dir())

In [None]:
print(utilities.my_dir(dir()))

## [Python Library Reference](https://docs.python.org/2/library/index.html)
#### [Built-in Functions](https://docs.python.org/2/library/functions.html) - Loaded when python is started
#### Standard modules - These are installed along with standard python installation, ex: sys, os, etc
#### [External modules](https://pypi.python.org/pypi) - Can be downloaded from the Python Package Index, ex: numpy, pandas, etc
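
The three categories differ mainly in how a name becomes available. The sketch below is illustrative only: it assumes nothing about the lab setup beyond a standard Python installation, and the flag `numpy_available` is a name made up for this example.

```python
# Built-in function: available without any import
print(len("python"))

# Standard module: shipped with Python, but it must be imported first
import os
print(os.path.basename("/usr/bin/python"))

# External module: importable only if it has been installed
try:
    import numpy
    numpy_available = True
except ImportError:
    numpy_available = False
print(numpy_available)
```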

## Numbers

In [None]:
2 + 2

In [None]:
5 * 3

In [None]:
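100/21    # in Python 2, dividing two integers discards the fractional part

In [None]:
100/21.0  # a float operand gives the full floating point result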

In [None]:
import math
radius = 10 # 10 centimeters
area = math.pi * radius**2
print(area)

## Strings

* Strings in python are immutable
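
Immutability means that a string cannot be changed in place. A minimal sketch follows; the variable names are chosen for illustration.

```python
greeting = "hello"
try:
    greeting[0] = "H"          # strings do not support item assignment
    modified_in_place = True
except TypeError:
    modified_in_place = False

# Build a new string instead and bind it to a name
capitalized = "H" + greeting[1:]
print(modified_in_place)
print(capitalized)
```

Every string method that appears to modify a string, such as `upper` or `replace`, returns a new string and leaves the original untouched.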

In [None]:
str_a = 'This string uses single quotes'

str_b = "This string uses double quotes"

str_c = """This is a multi line string
This is second line
This is third line
"""

str_d = "This doesn't contain escape characters"
str_e = 'There are some "SPECIAL" words in this sentence'
str_f = 'It is fine to use escape character\'s some times'
# There are some special characters: \t, \n, etc.

str_g = "Everything in Python is an object\nEvery object in Python has type\nPython is a dynamically typed language"

In [None]:
print(str_g)

### String methods

* Strings can be indexed
* startswith, endswith
* strip, split, replace, partition
* index,count, find
* upper, lower
* join, format
* string slicing
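
A short sketch of `partition`, `replace`, `lower`, `join` and `format` working together. The string is a shortened version of the one used in the cells that follow; names such as `slug` and `summary` are invented for this example.

```python
sentence = "Assets under administration : $5.2 trillion"

# partition splits once at the separator and returns a 3-tuple
label, separator, amount = sentence.partition(" : ")

# replace returns a new string; the original is unchanged
amount_text = amount.replace("$", "USD ")

# join glues a list of strings together with the given separator
words = label.lower().split()
slug = "_".join(words)

# format fills the {} placeholders in order
summary = "{} = {}".format(slug, amount_text)
print(summary)
```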


In [None]:
my_string = "Assets under administration : $5.2 trillion, including managed assets : $2.1 trillion"

In [None]:
len(my_string) # returns length of string

In [None]:
my_string.startswith("Assets") # Returns True or False

In [None]:
my_string.endswith("Fidelity")

In [None]:
"   This is a test String ".strip() # removes the leading and trailing spaces

In [None]:
"This line has return line characters at the end\n\n\n".strip("\n")

In [None]:
print(my_string.split())

In [None]:
my_string.find("$") # returns the index of the first occurrence of the character $

In [None]:
my_string.count("trillion") # returns the number of occurrences of the word/character

In [None]:
my_string.count("Fidelity") # returns 0 if the substring is not found

In [None]:
my_string.upper()

In [None]:
my_string.index("trillion")

In [None]:
# string slicing
print(my_string[0:10])  # returns the characters from index 0 up to 10 (excluding 10)
print(my_string[10:25]) # returns the characters from index 10 up to 25 (excluding 25)
print(my_string[25:])   # from index 25 to the end of the string
print(my_string[:25])   # from the beginning up to 25 (excluding 25)
print(my_string[:])     # complete string

In [None]:
# String concatenation

my_statement = "This" + " " + "is" + " a " + "test statement"
my_statement

In [None]:
print("*"*3 + " Title " + "*"*3)

## Exercises

* Find the length of the string "String in Python is an array of characters"
* How many occurrences of the word "people" are there in the sentence below?

Fidelity's goal is to make financial expertise broadly accessible and effective in helping people live the lives they want. With assets under administration of \$5.2 trillion, including managed assets of \$2.1 trillion as of April 30, 2015, we focus on meeting the unique needs of a diverse set of customers: helping more than 24 million people invest their own life savings, nearly 20,000 businesses manage employee benefit programs, as well as providing nearly 10,000 advisory firms with technology solutions to invest their own clients' money.

* Extract the substring "assets under administration of \$5.2 trillion" from the above sentence using indices
* Remove "." from the above sentence and split the sentence using "," as the delimiter



## Lists

* A list is the most versatile compound data type; it is written as comma-separated values (items) between square brackets
* Lists in Python are mutable
* Items of a list can be any Python object
* [List methods](https://docs.python.org/2/tutorial/datastructures.html#more-on-lists): append, extend, insert, remove, pop, index, count, sort, reverse
* The `in` operator checks for the presence of an element
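
Because lists are mutable, two names that refer to the same list see each other's changes. The sketch below illustrates this together with `remove`; the names `languages`, `alias` and `copy_of_list` are made up for the example.

```python
languages = ["Python", "Java", "C++"]
alias = languages             # both names refer to the same list object
copy_of_list = languages[:]   # slicing creates a new (shallow) copy

languages.remove("Java")      # deletes the first matching item in place

print(alias)                  # the change is visible through the alias
print(copy_of_list)           # the copy still holds the original items
print("Java" in languages)
```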

In [None]:
my_list = ['Python','java',25,32,43.55,'C++']

In [None]:
len(my_list)

In [None]:
my_list[1]

In [None]:
my_list[1] = 'Java'

In [None]:
my_list

In [None]:
new_list = my_list[1:4]

In [None]:
new_list

In [None]:
my_list.append("DotNet")

In [None]:
my_list

In [None]:
my_list.extend(['R','SPSS','MATLAB'])

In [None]:
my_list

In [None]:
# list can contain duplicate items
my_list.append("Python")

In [None]:
print(my_list)

In [None]:
my_list.count("Python")

In [None]:
# sort works in place and modifies the original list
# (sorting a mix of numbers and strings works in Python 2; Python 3 raises TypeError)
my_list.sort()

In [None]:
print(my_list)

In [None]:
# Please do not run this multiple times, pop removes an element each time
last_element = my_list.pop()
last_element

In [None]:
my_list.index("Java")

In [None]:
'Python' in my_list

In [None]:
# this modifies the original list, reverse is in place
my_list.reverse()

In [None]:
print(my_list)

In [None]:
my_list.insert(1,'SPSS')

In [None]:
print(my_list)

In [None]:
# List of Lists
list_of_lists = [['Python','C++','Java'],[2.7,4.2,8.0],['Object',2.5],'Main']
list_of_lists

In [None]:
list_of_lists[0]

In [None]:
# Accessing list
for item in my_list: # iterates from the first element to the last element
    if item == 'Python':
        print(item)

## Tuples

* Tuples are very similar to Lists except that they are not mutable
* A tuple consists of a number of values separated by commas enclosed in round brackets
* Tuples can contain mutable objects like lists
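
A minimal sketch of tuple immutability and unpacking, reusing the fund values that appear in the cells below. The names `fund`, `changed`, `period` and `return_pct` are assumptions made for this example.

```python
fund = ("Equity Fund", "1 Year", 13.5)
try:
    fund[2] = 14.0             # tuples do not support item assignment
    changed = True
except TypeError:
    changed = False

# Unpacking assigns each item of the tuple to its own name
name, period, return_pct = fund
print(changed)
print("%s: %s" % (name, return_pct))
```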

In [None]:
my_tuple = 'Equity', # observe the comma at the end

In [None]:
my_tuple

In [None]:
another_tuple = ('Equity Fund','1 Year',13.5)

In [None]:
another_tuple[1]

In [None]:
sorted(another_tuple) # Please note the type of the output: sorted returns a list

In [None]:
len(another_tuple)

In [None]:
tuple_list = ([1,2,3],['a','b','c'],'Another String')

In [None]:
tuple_list[0].append(4)

In [None]:
tuple_list

## Sets

* A set is an unordered collection with no duplicate elements
* Basic uses include membership testing and eliminating duplicate entries
* Sets support mathematical operations like union, intersection, difference, and symmetric difference.
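
The two basic uses, eliminating duplicates and membership testing, can be sketched as follows. The list `trades` and the value "Derivatives" are made up for illustration.

```python
trades = ["Equity", "Fixed Income", "Equity", "Money Market", "Equity"]
unique_types = set(trades)          # duplicates collapse to a single entry

print(len(trades))
print(len(unique_types))
print("Equity" in unique_types)       # membership test
print("Derivatives" in unique_types)
print(sorted(unique_types))           # sorted gives a predictable order for display
```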

In [None]:
instrument_types = ['Equity','Fixed Income','Equity','Money Market']
instrument_types_set = set(instrument_types)

In [None]:
instrument_types_set

In [None]:
another_set = {'Fixed Deposits','Equity'}

In [None]:
instrument_types_set.union(another_set)

In [None]:
instrument_types_set.intersection(another_set)

In [None]:
instrument_types_set - another_set

In [None]:
instrument_types_set ^ another_set # items in instrument_types_set or another_set but not both

## Dictionaries

* Associative arrays or Hash tables
* An unordered set of key: value pairs
* Keys can be of any immutable type
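
The restriction on keys can be seen by trying a list and a tuple as keys. This is a minimal sketch; `grid` and `list_key_allowed` are names invented for the example.

```python
grid = {}
grid[(0, 1)] = "tuple key works"   # tuples of immutable items can be used as keys

try:
    grid[[0, 1]] = "list key"      # lists are mutable and cannot be used as keys
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)
print(grid[(0, 1)])
```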

In [None]:
my_dict = {'name':'Python','version':2.7,'objects':['List','Tuple','Set']}

In [None]:
my_dict_list = dict([('x',20),('y',40)])

In [None]:
my_dict_list

In [None]:
person = dict(name='Mark',age=25,language='English')

In [None]:
person

In [None]:
my_dict['name']
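
Beyond lookup by key, a few other everyday operations are sketched below using the same `person` data as above. The key "email" and the default "unknown" are assumptions made for this example.

```python
person = dict(name="Mark", age=25, language="English")

person["age"] = 26                       # assigning to an existing key updates its value
email = person.get("email", "unknown")   # get returns a default instead of raising KeyError
has_name = "name" in person              # in tests for keys, not values

for key in sorted(person):               # sorting the keys gives a stable order
    print("{}: {}".format(key, person[key]))
```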

## Exercises

* Explore [**range**](https://docs.python.org/2/tutorial/controlflow.html#the-range-function) function
* Explore [**del**](https://docs.python.org/2/tutorial/datastructures.html#the-del-statement) statement
* Explore [**format**](https://docs.python.org/2/tutorial/inputoutput.html#fancier-output-formatting) formatting