# Python, numpy and pandas basics   

#### Basic data types

##### Numbers

Integers and floats work as you would expect from other languages:


In [None]:
x = 3

In [None]:
print x

In [None]:
print type(x)

In [None]:
print x + 1   # Addition;
print x - 1   # Subtraction;
print x * 2   # Multiplication;
print x ** 2  # Exponentiation;

In [None]:
x += 1  # Increment and assign
print x  

In [None]:
x *= 2  # Multiply and assign
print x 

In [None]:
y = 2.5
print y, type(y) 

In [None]:
print y + 1, y * 2, y ** 2 # Prints "3.5 5.0 6.25"

#### ACTIVITY:
Create a variable called 'pi' and assign the float value 3.1415 to it. Print the value of pi.

##### Booleans
Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):

In [None]:
t, f = True, False
print type(t) 

In [None]:
print t and f # Logical AND;
print t or f  # Logical OR;
print not t   # Logical NOT;
print t != f  # Logical XOR;

In [None]:
print(t & f) # Bitwise AND; on Booleans it gives the same result as `and`
print(t | f) # Bitwise OR; on Booleans it gives the same result as `or`

##### Strings

In [None]:
hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
print hello, len(hello)

In [None]:
hw = hello + ' ' + world  # String concatenation
print hw  

In [None]:
hw12 = '%s %s %d' % (hello, world, 12)  # printf-style string formatting
print hw12  
print type(hw12) 
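The same string can also be built with `str.format`. The sketch below is only an illustration, not part of the original notebook; the name `hw12_alt` is made up, and `print()` is called with a single argument so the cell behaves the same on Python 2.7 and Python 3.

```python
hello = 'hello'
world = 'world'

# printf-style formatting, as in the cell above
hw12 = '%s %s %d' % (hello, world, 12)

# str.format fills the {} placeholders in order
hw12_alt = '{} {} {}'.format(hello, world, 12)

print(hw12_alt)          # hello world 12
print(hw12 == hw12_alt)  # True
```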

String objects have a bunch of useful methods; for example:

In [None]:
s = "hello"
print s.capitalize()  # Capitalize a string
print s.upper()       # Convert a string to uppercase
print s.rjust(7)      # Right-justify a string, padding with spaces on the left
print s.center(7)     # Center a string, padding with spaces
print s.replace('l', '(ell)')  # Replace all instances of one substring with another;
print '  world '.strip()  # Strip leading and trailing whitespace

#### ACTIVITY:
Create a variable 'firstString' and assign the value 'I am' to it. Create another variable 'secondString' and assign the value 'from INSOFE Batch' to it. 

Create an integer variable 'batchNo' and assign the value 32 to it. Concatenate and print such that the result looks like this:
"Hello there: I AM FROM INSOFE BATCH 32!!!"

### Containers

###### Python includes several built-in container types: lists, dictionaries, sets, and tuples.

#### Lists
A list is the Python equivalent of an array, but is resizeable and can contain elements of different types:

In [None]:
xs = [3, 1, 2]   # Create a list
print xs         # Prints "[3, 1, 2]"

###### Accessing values inside the list

In [None]:
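print xs[2]      # Index 2 is the third element; prints "2"
print xs[-1], xs[-2], xs[-3]   # Negative indices count from the end of the list; prints "2 1 3"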

###### Assigning a value to the list based on index

In [None]:
xs[2] = 'foo'    # Lists can contain elements of different types
print xs

###### Add a new element to the end of the list

In [None]:
xs.append('bar')  # Append a single element
print xs  

###### Add a list to the existing list

In [None]:
xs.append(['bar', 'bar_again'])  # The whole list is added as one nested element
print xs  

###### Add multiple values to the list

In [None]:
xs.extend(['new','values'])

In [None]:
xs
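The difference between `append` and `extend` is easiest to see side by side. This is a small sketch on two throwaway lists (`a` and `b` are illustrative names), written with single-argument `print()` so it runs on both Python 2 and 3.

```python
a = [3, 1]
a.append(['x', 'y'])   # the whole list becomes ONE new element
print(a)               # [3, 1, ['x', 'y']]
print(len(a))          # 3

b = [3, 1]
b.extend(['x', 'y'])   # each item is added separately
print(b)               # [3, 1, 'x', 'y']
print(len(b))          # 4
```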

###### Remove and return the last element of the list

In [None]:
x = xs.pop()     # Remove and return the last element
print x

#### ACTIVITY:
Create a list 'myList' containing the elements 'a', [1, 2, 3] and 'abc'.

Add another element 'batch' to it. Extract out [1,2,3] from it. 

##### Slicing
In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:

In [None]:
str_list = ["a","b","c","d","e","f"]
print str_list

In [None]:
from IPython.display import Image
Image(filename='list.png')

In [None]:
print str_list[1]
print str_list[-1]

In [None]:
print str_list[2:4]    # Get a slice from index 2 to 4 (including 2 and excluding 4)
print str_list[2:]     # Get a slice from index 2 to the end
print str_list[:2]     # Get a slice from the start to index 2 (exclusive)
print str_list[:]      # Get a slice of the whole list
print str_list[:-1]   # Slice indices can be negative
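Two consequences of the "start included, stop excluded" rule are worth checking by hand. The sketch below is an illustration only (the name `part` is made up) and uses single-argument `print()` so it runs unchanged on Python 2 and 3.

```python
str_list = ["a", "b", "c", "d", "e", "f"]

# stop - start gives the length of the slice
part = str_list[2:4]
print(part)                 # ['c', 'd']
print(len(part) == 4 - 2)   # True

# Splitting at any index gives two pieces that rebuild the list
print(str_list[:2] + str_list[2:] == str_list)   # True

# A list slice is a new list, so changing it leaves the original alone
part[0] = 'z'
print(str_list[2])          # c
```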

###### Range is a built-in function that creates a list of integers

In [None]:
nums = range(5)    # In Python 2, range returns the list [0, 1, 2, 3, 4]
print nums         

In [None]:
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print nums        

#### ACTIVITY:
Create a list 'mySecondList' with the following elements in it: 'a', 'b', 23, True.

Extract out the last element using reverse indexing. Change the element at index 1 to 'bat'. Print out the modified list

#### `if` Statements

In [None]:
x = int(raw_input("Please enter an integer: "))

In [None]:
#try:except:
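The placeholder above hints at guarding the conversion with `try`/`except`, since `int()` raises `ValueError` for text that is not a number. One possible sketch follows; the helper name `to_int` and its `default` parameter are hypothetical, and the text is passed in directly instead of being read with `raw_input` so the cell can run unattended.

```python
def to_int(text, default=0):
    """Convert text to an int, falling back to default if it is not a number."""
    try:
        return int(text)
    except ValueError:
        # int() could not parse the text, so report it and use the fallback
        print('Not an integer: %r, using %d instead' % (text, default))
        return default

print(to_int('42'))    # 42
print(to_int('abc'))   # reports the problem, then prints 0
```

In the notebook, the call would be `to_int(raw_input("Please enter an integer: "))`.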

###### Conditional statements

In [None]:
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')

#### ACTIVITY:
Read in an integer number using raw_input. Using if-else statement, check if the number is even or odd. 

If even, print out "Number is even". If odd, print out "Number is odd" (HINT: Use % operator). What if number is 0?

#### Loops
You can loop over the elements of a list like this:

In [None]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print animal
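When the position of each element is needed as well, the built-in `enumerate` function yields (index, element) pairs. A minimal sketch, not taken from the original notebook; the `lines` list exists only to make the result easy to inspect.

```python
animals = ['cat', 'dog', 'monkey']
lines = []
for idx, animal in enumerate(animals):
    # idx starts at 0, so add 1 for human-friendly numbering
    line = '#%d: %s' % (idx + 1, animal)
    lines.append(line)
    print(line)
```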

#### ACTIVITY:
Using a for loop and the range() function, print out all the odd numbers between 20 and 30.

#### ACTIVITY:
Count the number of vowels in the string "Hello Batch 32"

#### List comprehension
When programming, frequently we want to transform one type of data into another. 
As a simple example, consider the following code that computes square numbers:

In [None]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print squares

###### You can make this code simpler using a list comprehension:

In [2]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print squares

[0, 1, 4, 9, 16]


######  Time taken for the operation (without list comprehension)

In [3]:
%%timeit -n 1000
squares = []
for x in nums:
    squares.append(x ** 2)

1000 loops, best of 3: 1.65 µs per loop


######  Time taken for the operation (with list comprehension)

In [4]:
%%timeit -n 1000
squares = [x ** 2 for x in nums]

1000 loops, best of 3: 1.05 µs per loop


######  If you want to look at more of these 'magic' functions, refer to http://ipython.readthedocs.io/en/stable/interactive/magics.html

In [5]:
Celsius = [39.2, 36.5, 37.3, 37.8]
Fahrenheit = [ ((float(9)/5)*x + 32) for x in Celsius ]
print Fahrenheit

[102.56, 97.7, 99.14, 100.03999999999999]


In [6]:
colours = [ "red", "green", "yellow", "blue" ]
things = [ "house", "car", "tree" ]
coloured_things = [ (x,y) for x in colours for y in things ]
print coloured_things

[('red', 'house'), ('red', 'car'), ('red', 'tree'), ('green', 'house'), ('green', 'car'), ('green', 'tree'), ('yellow', 'house'), ('yellow', 'car'), ('yellow', 'tree'), ('blue', 'house'), ('blue', 'car'), ('blue', 'tree')]
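A comprehension with two `for` clauses behaves like two nested loops, with the first clause as the outer loop. The sketch below spells that out for the example above; the name `pairs` is illustrative.

```python
colours = ["red", "green", "yellow", "blue"]
things = ["house", "car", "tree"]

pairs = []
for x in colours:          # outer loop: first `for` in the comprehension
    for y in things:       # inner loop: second `for`
        pairs.append((x, y))

coloured_things = [(x, y) for x in colours for y in things]
print(pairs == coloured_things)   # True
print(len(pairs))                 # 12
```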


###### List comprehensions can also contain conditions:

In [7]:
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print even_squares

[0, 4, 16]


#### ACTIVITY: 
Create two variables: 'var1' with values 1-5 and 'var2' with values 10-20 in steps of 2 (i.e., 10,12,14,... etc). Using list comprehensions, print out result of (var1+var2) if (var1+var2) is a multiple of 3.

#### Dictionaries
A dictionary stores (key, value) pairs.

In [None]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data

###### Accessing the keys of a dictionary

In [None]:
d.keys()

###### Accessing the values of a dictionary

In [None]:
d.values()

###### Get value for a particular key

In [None]:
print d['cat']       # Get an entry from a dictionary; prints "cute"

###### Check if your dictionary has a particular key
Note that the `in` operator checks only the keys of a dictionary, not its values.

In [None]:
print 'cat' in d     # Check if a dictionary has a given key; prints "True"

###### Set value for a particular key

In [None]:
d['fish'] = 'wet'    # Set an entry in a dictionary
print d['fish']      

###### What happens when we try to access a key that does not exist?

In [None]:
print d['monkey']

#### The ever so useful 'get' method

In [None]:
d.get('cat')

In [None]:
d.get('monkey','Key does not exist')
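A common use of `get` with a default value is counting, because the first lookup of an unseen key does not raise `KeyError`. A small sketch on made-up data (`words` and `counts` are illustrative names):

```python
words = ['cat', 'dog', 'cat', 'fish', 'cat']
counts = {}
for w in words:
    # get returns 0 the first time a word is seen
    counts[w] = counts.get(w, 0) + 1

print(counts['cat'])             # 3
print(counts.get('monkey', 0))   # 0
```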

###### How to remove elements from a dictionary?

In [None]:
del d['fish']        # Remove an element from a dictionary
print d.get('fish', 'N/A') # "fish" is no longer a key

###### It is easy to iterate over the keys in a dictionary

In [None]:
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print 'A %s has %d legs' % (animal, legs)
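To get each key together with its value without a second lookup, the `items` method can be used. A minimal sketch; the lines are sorted before printing only because dictionary order is not guaranteed in Python 2.

```python
d = {'person': 2, 'cat': 4, 'spider': 8}
lines = []
for animal, legs in d.items():   # each item is a (key, value) pair
    lines.append('A %s has %d legs' % (animal, legs))

for line in sorted(lines):
    print(line)
```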

###### Dictionary comprehensions: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

In [None]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print even_num_to_square

In [None]:
x = range(5)
y = [1,1,0,1,0,1]

In [None]:
#%%timeit -n 1000
dict(zip(x,y))

In [None]:
#%%timeit -n 1000
{x[i]:y[i] for i in range(5)}
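Note that `x` above has five elements while `y` has six. `zip` stops at the end of the shorter sequence, so the extra value is silently dropped and both constructions give the same dictionary. A sketch that checks this (the name `pairs` is illustrative):

```python
x = list(range(5))        # 5 keys
y = [1, 1, 0, 1, 0, 1]    # 6 values

pairs = dict(zip(x, y))
print(len(pairs))         # 5, the sixth value in y is ignored
print(pairs == {x[i]: y[i] for i in range(5)})   # True
```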

#### ACTIVITY:
Create a dictionary 'myDict' with the following (key, value) pairs. ('Name', "XYZ"), ('Batch', 32), ("Location", "Bangalore").
Iterate over the dictionary and print out the following (order doesn't matter):

My Name is XYZ

My Batch is 32

My Location is Bangalore

#### Tuples

A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries, while lists cannot. Here is a trivial example:

In [None]:
tup1 = (1,2)

In [None]:
print type(tup1)

###### Tuples are immutable
This means you cannot alter the values in a tuple. To make a change, you have to create a new tuple.

In [None]:
tup1[1] = 2   # Raises TypeError: tuples do not support item assignment

In [None]:
tup1_copy = (tup1[0],'New Value')

In [None]:
tup1_copy

In [None]:
t = (5, 6)       # Create a tuple
print type(t)

In [None]:
dt = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
print dt

###### Accessing a value in the dictionary using a tuple key

In [None]:
print dt[t]

In [None]:
print dt[(1, 2)]
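For contrast, a list cannot be used as a dictionary key: lists are mutable and therefore unhashable, so Python raises `TypeError`. A small sketch (the variable `outcome` is only there to record what happened):

```python
dt = {(x, x + 1): x for x in range(10)}
print(dt[(3, 4)])   # 3

try:
    dt[[3, 4]] = 'list key'     # a list is unhashable
    outcome = 'accepted'
except TypeError:
    outcome = 'rejected'
print(outcome)      # rejected
```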

#### ACTIVITY:
Create a tuple named 'myTup' with the following elements: 1, 2, 'abc'. Print this tuple. Access the first two elements of the tuple. Can you add another element 'xyz' to myTup? What should we do if we want to get (1, 2, 'abc', 'xyz') using the existing tuple?

#### Functions

Python functions are defined using the `def` keyword. For example:

In [None]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print sign(x)

We will often define functions to take optional keyword arguments, like this:

In [None]:
def hello(name, loud=False):
    if loud:
        print 'HELLO, %s!' % name.upper()
    else:
        print 'Hello, %s!' % name

hello('Bob')
hello('Fred', loud=True)

#### ACTIVITY:
Define a function which would take a number 'n' as input and compute the sum of natural numbers till n.
Example if input number is 5, the function returns the result: 1+2+3+4+5

HINT: You can use the shortcut formula n(n+1)/2.
So for n = 5, result = 5*6/2 = 15

#### ACTIVITY:
Write a function 'F' that takes two numbers and a user choice. If the choice is 1, add the numbers; if it is 2, multiply them. Return the result.

### Lambda Functions

In [None]:
f = lambda x, y : x + y
f(1,1)

In [None]:
# This is equivalent to
def f(x, y):
    return x + y

#### ACTIVITY:
Create a lambda function which would take in 3 inputs i, j and k, and computes the multiplication of i, j and k.

#### The map() Function

map() is a function with two arguments:

r = map(func, seq)

The first argument func is the name of a function and the second a sequence (e.g. a list) seq. map() applies the function func to all the elements of the sequence seq. In Python 2 it returns a new list with the elements changed by func; in Python 3 it returns an iterator instead.

In [None]:
def fahrenheit(T):
    return ((float(9)/5)*T + 32)

def celsius(T):
    return (float(5)/9)*(T-32)

In [None]:
temp = (36.5, 37, 37.5,39)

F = map(fahrenheit, temp)

print F

C = map(celsius, F)

print C
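Wrapping the call in `list()` makes the result a list on both Python 2 and Python 3, and a list comprehension gives the same values. A sketch of that equivalence; the name `F_comp` is illustrative.

```python
def fahrenheit(T):
    return (float(9) / 5) * T + 32

temp = (36.5, 37, 37.5, 39)

# list() is harmless on Python 2 and materialises the iterator on Python 3
F = list(map(fahrenheit, temp))
F_comp = [fahrenheit(T) for T in temp]

print(F == F_comp)   # True
print(len(F))        # 4
```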

By using lambda, we wouldn't have had to define and name the functions fahrenheit() and celsius().

In [None]:
Celsius = [39.2, 36.5, 37.3, 37.8]

Fahrenheit = map(lambda x: (float(9)/5)*x + 32, Celsius)

print Fahrenheit

C = map(lambda x: (float(5)/9)*(x-32), Fahrenheit)

print C

#### ACTIVITY:
Create a lambda function that multiplies any given number by 5. Create a list 'myList' with numbers 0-5. Map the created lambda function to myList. Print the results.


### Filtering

The function filter(function, list) offers an elegant way to keep only those elements of a list for which the function function returns True.
The function filter(f, l) needs a function f as its first argument. f returns a Boolean value, i.e. either True or False. This function is applied to every element of the list l, and an element is included in the result list only if f returns True for it.

In [None]:
fib = [0,1,1,2,3,5,8,13,21,34,55]
result = filter(lambda x: x % 2, fib)
print result
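Here `x % 2` is 1 (truthy) for odd numbers and 0 (falsy) for even ones, so the filter keeps the odd values. The sketch below checks this against an equivalent list comprehension; `list()` is added so the result is a list on Python 3 as well, and the names `odd` and `odd_comp` are illustrative.

```python
fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

odd = list(filter(lambda x: x % 2, fib))
odd_comp = [x for x in fib if x % 2]

print(odd)               # [1, 1, 3, 5, 13, 21, 55]
print(odd == odd_comp)   # True
```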

#### ACTIVITY:
Create a list with the following elements: 'a', 'b', 'c', 'i', 'o'.
Using filter, extract out the vowels in the list

### Modules
* A module is a file containing Python definitions and statements. 
* The file name is the module name with the suffix .py appended. 
* Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`. 

#### Create a file called fibo.py with the following contents:

#### Import fibo module with the following command

#### Import fibo function in the fibo module with the following command

#### import all names that a module defines:
Note: This imports all names except those beginning with an underscore (_).
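The code cells for the steps above are not shown, so here is a self-contained sketch of the whole workflow. The contents of `fibo.py` are an assumption: a single function, hypothetically named `fib`, in the spirit of the modules chapter of the Python tutorial. The file is written to a temporary directory only so that the example can run anywhere.

```python
import os
import sys
import tempfile

# Hypothetical contents for fibo.py (the original file is not shown)
source = '''
def fib(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'fibo.py'), 'w') as handle:
    handle.write(source)
sys.path.insert(0, module_dir)   # make the new file importable

import fibo                      # import the module
print(fibo.__name__)             # fibo
print(fibo.fib(30))              # [0, 1, 1, 2, 3, 5, 8, 13, 21]

from fibo import fib             # import a single name from the module
print(fib(10))                   # [0, 1, 1, 2, 3, 5, 8]
```

With this file, `from fibo import *` would likewise bring `fib` into the current namespace, since its name does not begin with an underscore.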


## Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. 

##### To use Numpy, we first need to import the numpy package:


In [None]:
import numpy as np

##### Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. 

In [None]:
a = np.array([1, 2, 3])  # Create a rank 1 array
print type(a)           

The number of dimensions is the rank of the array. The shape of an array is a tuple of integers giving the size of the array along each dimension.


In [None]:
print a.shape
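Rank and shape can both be read directly from an array: `ndim` gives the number of dimensions, and the length of `shape` always equals it. A short sketch with one rank 1 and one rank 2 array:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])

print(a.ndim)    # 1
print(a.shape)   # (3,)
print(b.ndim)    # 2
print(b.shape)   # (2, 3)
print(len(b.shape) == b.ndim)   # True
```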

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [None]:
print a[0], a[1], a[2]
a[0] = 5                 # Change an element of the array
print a

In [None]:
b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array
print b

In [None]:
print b.shape
print b[(0, 0)], b[0, 1], b[1, 0]

##### Array indexing

Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:


In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print a

In [None]:
b = a[:2, 1:3]
print b
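One point that often surprises newcomers: a numpy slice is a view into the same data, not a copy, so modifying the slice modifies the original array. The sketch below uses separate, illustrative names (`grid`, `window`, `snapshot`) so that the array `a` above is left untouched.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
window = grid[:2, 1:3]      # rows 0-1, columns 1-2

print(grid[0, 1])           # 2
window[0, 0] = 77           # window[0, 0] is the same data as grid[0, 1]
print(grid[0, 1])           # 77

snapshot = grid[:2, 1:3].copy()   # an independent copy
snapshot[0, 0] = -1
print(grid[0, 1])           # 77, the copy does not affect grid
```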

#### ACTIVITY:
Extract the elements 1, 2, 5 and 6 from the array above.


In [None]:
arr = np.array([[1,   2,   3,   5  ], [10,  20,  30,  50 ],[100, 200, 300, 500] ])
print arr

In [None]:
arr1 = arr[1:, 1:3]
print arr1

###### Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [None]:
a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.

print bool_idx

In [None]:
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print a[bool_idx]

# We can do all of the above in a single concise statement:
print a[a > 2]
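Boolean masks can be combined element-wise with `&` and `|`, and a mask can also appear on the left-hand side of an assignment. A minimal sketch; the names `mid` and `capped` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions element-wise; the parentheses are required
mid = a[(a > 2) & (a < 6)]
print(mid.tolist())       # [3, 4, 5]

# A mask can also select the elements to overwrite
capped = a.copy()
capped[capped > 4] = 4
print(capped.tolist())    # [[1, 2], [3, 4], [4, 4]]
```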

###### Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:


In [None]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1.0, 2.0])  # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print x.dtype, y.dtype, z.dtype
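Two related behaviours are sketched below: numpy upcasts to a common type when the inputs are mixed, and `astype` converts an existing array to another datatype. The variable names are illustrative.

```python
import numpy as np

mixed = np.array([1, 2.0])           # an int and a float
print(mixed.dtype)                   # float64

values = np.array([1.7, 2.2, -0.5])
as_int = values.astype(np.int64)     # the fractional part is dropped
print(as_int.dtype)                  # int64
print(as_int.tolist())               # [1, 2, 0]
```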

## Pandas basics

To use pandas, we first need to import it:

In [None]:
import pandas as pd

#### Object Creation
Creating a `Series` by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
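Because `np.nan` is a float, the Series is stored as `float64`, and most pandas reductions skip missing values by default. A short sketch on the same data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)            # float64, because NaN is a float
print(s.isnull().sum())   # 1
print(s.mean())           # mean of the five non-missing values
```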

##### Get current working directory and change directory

In [None]:
import os

print os.getcwd()  # Prints the current working directory

#os.chdir("C:\Users.......") # Change directory

Read the data into a pandas dataframe

In [None]:
df = pd.read_csv('train.csv')

###### View first and last 5 rows of the dataframe

In [None]:
df.head()

In [None]:
df.tail()

###### Display indices

In [None]:
df.index

In [None]:
df.index.values

###### Printing column names

In [None]:
df.columns

###### Print the column names as a numpy array

In [None]:
print df.columns.values

In [None]:
df.values

###### Displaying information for dataframes

In [None]:
df.info()

###### Dataframe summary 

In [None]:
df.describe()

In [None]:
help(pd.DataFrame.describe)

In [None]:
df.describe(include='all')

### Selection ( Also called indexing)

#### Selection by Position

###### .iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing.

In [None]:
df.iloc[3:5,0:2]

#### Selection by Label

##### .loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. 

In [None]:
# Label slicing: both endpoints are included
df.loc[1:2,['PassengerId','Survived']]
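The endpoint rule is the main practical difference between the two indexers: `.iloc` slices exclude the stop position, while `.loc` slices include the stop label. The sketch below shows this on a small made-up frame; `toy` and its values are hypothetical stand-ins, not the contents of `train.csv`.

```python
import pandas as pd

# Hypothetical stand-in for the real data
toy = pd.DataFrame({'PassengerId': [1, 2, 3, 4, 5],
                    'Survived': [0, 1, 1, 1, 0]})

by_position = toy.iloc[1:3, 0:2]   # positions 1 and 2 (3 is excluded)
by_label = toy.loc[1:3, ['PassengerId', 'Survived']]   # labels 1, 2 AND 3

print(len(by_position))   # 2
print(len(by_label))      # 3
```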

#### ACTIVITY:
From the dataframe df, extract out all rows from the columns 'Age' and  'Fare'

Ref: 
* https://docs.python.org/2.7/tutorial
* http://cs231n.github.io/python-numpy-tutorial/
* http://pandas.pydata.org/pandas-docs/stable/10min.html