**Python Session 1**

**Course Plan**

+ We will learn Python from basics and will cover advanced concepts
+ Language constructs, programming in Python, data cleaning and manipulation, data analysis, plotting, web scraping, textual analysis.
+ First session will focus on building blocks of the language
+ Completely hands-on approach

**What is Python**

+ High-level, general-purpose programming language
+ First released in 1991
+ Dynamically typed, interpreted language
+ A popular choice for web programming and scripting
+ Widely used for scientific computing, with rapidly growing scientific computing and data analytics capabilities
+ Has a large scientific community and is used in a variety of settings such as financial research, algorithm development, options and derivatives pricing, financial modeling, trading systems, network analysis, and machine learning.
+ No type declarations are required, which keeps code short

**Why Python?**

+ Simple to understand, easy to learn
+ Object oriented structure makes programming very easy
+ Platform independent 
+ Open source and has support from a huge scientific community that contributes actively to Python development
+ A number of libraries for research tasks and the capabilities are growing by the day
+ De facto choice for textual data analytics

**Important Packages**

+ NumPy: Numerical Python (NumPy) is the foundation package. Other packages and libraries are built on top of NumPy. 
+ pandas: Provides data structures and processing capabilities similar to ones found in R and Excel. Also provides time series capabilities.
+ SciPy: Collection of packages to tackle a number of computing problems in data analytics, statistics, and linear algebra.
+ matplotlib: Plotting library. Produces a wide range of 2D graphs and will serve as our primary graphics library.
+ IPython: Interactive Python shell that allows quick prototyping of code. 
+ Statsmodels: Supports data analysis, statistical model estimation, statistical tests, regressions, and function plotting. 
+ BeautifulSoup: Python library for web scraping. Parses HTML and XML pages so that you can pull data out of them. 
+ Scikits: A number of packages for running simulations, machine learning, data mining, optimization, and time series models. 
+ RPy: Integrates Python with R, allows calling of R commands from Python code

## Installation

**Installing on Windows**

1. [Download the Anaconda installer](https://www.continuum.io/downloads).

2. Optional: Verify data integrity with MD5 or SHA-256.

3. Double click the installer to launch.

  NOTE: If you encounter any issues during installation, temporarily disable your anti-virus software during install, then re-enable it after the installation concludes. If you have installed for all users, uninstall Anaconda and re-install it for your user only and try again.

4. Click Next.

5. Read the licensing terms and click I Agree.

6. Select an install for “Just Me” unless you’re installing for all users (which requires Windows Administrator privileges).

7. Select a destination folder to install Anaconda and click Next.

  NOTE: Install Anaconda to a directory path that does not contain spaces or unicode characters.

  NOTE: Do not install as Administrator unless admin privileges are required.

8. Choose whether to add Anaconda to your PATH environment variable. We recommend not adding Anaconda to the PATH environment variable, since this can interfere with other software. Instead, use Anaconda software by opening Anaconda Navigator or the Anaconda Command Prompt from the Start Menu.

9. Choose whether to register Anaconda as your default Python 3.6. Unless you plan on installing and running multiple versions of Anaconda, or multiple versions of Python, you should accept the default and leave this box checked.

10. Click Install. You can click Show Details if you want to see all the packages Anaconda is installing.

11. Click Next.

12. After a successful installation you will see the “Thanks for installing Anaconda” image:
 <img src="Images/anaconda-install-win.png">
 
13. You can leave the boxes checked “Learn more about Anaconda Cloud” and “Learn more about Anaconda Support” if you wish to read more about this cloud package management service and Anaconda support. Click Finish.

14. After your install is complete, verify it by opening Anaconda Navigator, a program that is included with Anaconda. From your Windows Start menu, select the shortcut Anaconda Navigator. If Navigator opens, you have successfully installed Anaconda.


**Programming in Python**

Two basic modes:

+ Interactive: Quick way of writing and testing code. Interactive in nature and gives results on the fly.
+ Interactive mode is recommended when some idea/result is to be quickly tested.
+ Script: Write complete code just like in any other language. Run all or some part of the code.
+ Code can be stored in scripts and run at a later time. 
+ Python scripts end with a .py extension

**IPython notebook**

IPython is a software package in Python that serves as a research notebook. IPython has a seamless interface using which you can write notes as well as perform data analysis in the same file. The data analysis part allows you to write the code and run it from within the notebook itself. The results also are displayed in the notebook. Among other things, IPython provides:

1. An interactive shell
2. A browser-based notebook. The notebook supports text, code, expressions, graphs and interactive data visualization, embedded images and videos, and links to web pages on the Internet. 
3. Tools for parallel computing.
4. Import external code and run it.
5. Export the notebook in a number of formats.

**Downloading and installing IPython notebook**

The IPython notebook (now distributed as Jupyter Notebook) can be installed if it is not already present. To install it, go to Package Manager and search for IPython. A list of options will come up showing whether IPython is already installed or whether a newer version is available. Click on Install. Once IPython is installed, it can be launched from Enthought Canopy.

**Creating an IPython notebook**

In order to create an IPython notebook, choose File -> New -> IPython notebook. 

Hitting Alt-Enter creates a new cell. A cell is nothing but a placeholder where you can write some text or execute code.

In [None]:
print ("An example of code execution in IPython")
23*543

To get help on an object, append ? to its name. To search for names that match a pattern, use the * wildcard followed by ?. The %quickref command gives you a quick reference to the most commonly used IPython commands.

In [None]:
import collections
collections.namedtuple?

In [None]:
import numpy as np
np.*mean*?

Tab completion is a very handy feature in IPython. It gives you a list of all the functions and attributes that are associated with a package (such as NumPy). For example, if you type np. and then press Tab, a dropdown will come up listing everything available under np. With tab completion you do not have to remember exact function names.

To see the history of commands, use the %history command. This feature also allows you to extract code from an IPython notebook to a separate file. 

In [None]:
%history

**Running external code files in IPython**

To run an external code file in IPython, use the %run command followed by the path of the file. The IPython shell executes the code and displays the results.

In [None]:
%run "D:\fibonacci.py"

**Syntax Formalities**

+ Python is case sensitive
+ Python makes use of whitespace for code structuring and marking logical breaks in the code (In contrast to other programming languages that use braces)
+ End of line marks end of a statement, so does not require a semicolon at end of each statement
+ Whitespace at the beginning of the line is important. This is called indentation.
+ Leading whitespace (spaces and tabs) at the beginning of the logical line is used to determine the indentation level of the logical line.
+ Statements which go together must have the same indentation. Each such set of statements is called a block.

In [None]:
i = 5
a = 7
i = 'text'
 print('Value is ', i) # Error! Notice a single space at the start of the line
print('I repeat, the value is ', i)

**Comments** 

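+ Single-line comments start with # and run to the end of the line.
+ Python has no dedicated multi-line comment syntax. A triple-quoted string (''' ... ''') that is not assigned to anything is commonly used for this purpose.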

In [None]:
# this is a single line comment
'''
print("We are in a comment")
print ("We are still in a comment")
'''
print("We are out of the comment")

In [None]:
#Example of indentation
'''
for x in alist:
    if x < anumber:
        print(x)
    else:
        print(-x)
        
is similar to

for x in alist 
{if x < anumber  {
        print(x)
    }
    else
    {
        print(-x)
    }
}
'''

In [None]:
x = 3   # example value so that the cell runs on its own
if x < 5:
    print(x)
print(5)

In [None]:
x = 21
y = 25
print(x)
if x < 10:
    print(x)
    
    print(y)

**Variables and Data Structures**

+ Built-in data types:
Integer, Floating point, String, Boolean values. Dates and times are provided by the standard datetime module.

+ Additional data structures:
Tuples, Lists, Dictionaries

+ A variable is a name that refers to a value.
+ There is no need to declare the type of a variable; Python infers it from the assigned value.
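
Because a type belongs to the value rather than to the name, the same name can refer to values of different types over time. The following is a minimal sketch using the built-in type() function; the names `value`, `type_before`, and `type_after` are illustrative only.

```python
# A name has no fixed type; the type belongs to the value it refers to.
value = 100
type_before = type(value)    # int

value = "one hundred"        # rebinding the same name to a string
type_after = type(value)     # str

print(type_before.__name__, type_after.__name__)
```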


In [None]:
counter = 100          # An integer assignment
miles   = 1000.0       # A floating point
name    = 'Ajay'        # A string

print(counter)
print(miles)
print(name)

a = 11111
b = 2.0
c = "0"
d = "2"
print(b + a)
print(c  + d)

**Strings**

+ Built-in string class named "str" with many handy features
+ In addition to numerical data processing, Python has very strong string processing capabilities. 
+ Subsets of strings can be taken using the slice operator ( [ ] and [ : ] ) with indexes starting at 0 in the beginning of the string and working their way from -1 at the end.
+ The plus ( + ) sign is the string concatenation operator and the asterisk ( * ) is the repetition operator.
+ Strings in Python are immutable. Unlike other data structures such as lists, you cannot modify individual characters of a string. To change a string, you take subsets of it and form a new string. 
+ A string can be converted to a numerical type and vice versa (wherever applicable). Raw data, although numeric, is often coded in string format. This feature provides a clean way to make sure all of the data is in numeric form.
+ Strings are sequences of characters and can be tokenized.
+ Strings and numbers can also be formatted.
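
The points on immutability and formatting can be sketched as follows. This is only an illustration: the names `word`, `capitalized`, and `label` are made up for the example, and str.format is one of several formatting options.

```python
word = "python"

# Item assignment is not allowed on strings because they are immutable.
try:
    word[0] = "P"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string from slices instead.
capitalized = "P" + word[1:]

# Format a float (two decimals) and an integer into a string.
label = "{:.2f} pesos for {} dollar".format(4.556, 1)

print(assignment_failed, capitalized, label)
```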



In [None]:
text = 'Hello World'   # avoid the name str, which would shadow the built-in str() type
print(text)            # prints complete string
print(text[0])         # prints first character of string
print(text[2:5])       # prints characters from 3rd to 5th
print(text[2:])        # prints string starting from 3rd character
print(text * 2)        # prints string two times
print(text + "TEST")   # prints concatenated string

In [None]:
text = "hello"
print(text[-1])   # last character

a = 'this is a string'
b = a.replace('string','longer string')
print(a)
print(b)

a = '20'
anum = int(a)
print(anum + anum)

a = "hi"
anum = int(a)   # raises ValueError: "hi" is not a valid integer
print(anum)



In [None]:
a = "hi"
print(a)
print(len(a))


In [None]:
a + len(a)

In [None]:
a + str(len(a))

In [None]:
string = "This is python"
strlist = list(string)
print(strlist)

fmt = '%.2f %s is $%d'   # named fmt to avoid shadowing the built-in format()
fmt % (4.5560, 'Argentine Pesos', 1)

In [None]:
print(string.count('s'))
print(string.split(' '))
print(string.upper())
print(string.lower())
print(string.swapcase())

In [None]:
str1 = "    This is a bright, sunny day      "
print(str1.strip())
print(str1.lstrip())
print(str1.rstrip())
print(":".join(str1))
print(len(str1))

In [None]:
school = 'ISB'
print ('S' in school)
'L' in school

In [None]:
x='X-DSPAM-Confidence:   0.9032'
pos = x.find(':')
num=float(x[pos+1:])
print(num, type(num))

In [None]:
#Let us do some exercises
#1
#Given below string, get first 10 characters, and last 10 characters.
#Now join them to form a new string
text = 'Hey how are you doing. We are doing good'


In [None]:
#Consider two strings x and y
x = 'Confusing'
y = 'Strings'
#Swap first three characters of each string and join them by an _


**Lists**

+ Lists, along with dictionaries, are perhaps the most important data types.
+ A list contains items separated by commas and enclosed within square brackets ([]).
+ The items in a list can be of different data types.
+ Lists are similar to arrays in C, except that they can grow or shrink and can hold mixed types.
+ The plus ( + ) sign is the list concatenation operator, and the asterisk ( * ) is the repetition operator.


In [None]:
mylist = [ 'abcd', 786 , 2.23, 'ISB', 70.2 ]   # avoid the name list, which would shadow the built-in
tinylist = [123, 'ISB']

print(mylist)             # Prints complete list
print(mylist[0])          # Prints first element of the list
print(mylist[1:3])        # Prints elements starting from 2nd till 3rd 
print(mylist[2:])         # Prints elements starting from 3rd element
print(tinylist * 2)       # Prints list two times
print(mylist + tinylist)  # Prints concatenated lists
print(len(mylist))

In [None]:
alist = [ 'abcd', 786 , 2.23, 'ISB', 70.2 ]
blist = alist
alist = alist*2
print(alist)

In [None]:
#Check the behavior
print(blist)

In [None]:
blist = alist
print(blist)
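
The cells above turn on the difference between rebinding a name and changing a list in place: `alist = alist*2` makes alist refer to a new list, while blist keeps referring to the original one. The sketch below, with illustrative values and the extra name `clist`, contrasts this with an in-place change and with an explicit copy.

```python
alist = ['abcd', 786]
blist = alist            # both names refer to the same list object

alist.append('ISB')      # in-place change is visible through both names
shared = blist == ['abcd', 786, 'ISB']

clist = alist.copy()     # an independent (shallow) copy
alist.append(70.2)       # changes alist and blist, but not clist

alist = alist * 2        # rebinding: alist now names a new, longer list
print(blist)             # the original object, untouched by the rebinding
print(clist)
```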

In [None]:
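names = ['Ajay', 'Vijay', 'Ramesh']
names.append('Sujay')            # add to the end
names.insert(0, 'NewGuy')        # insert at position 0
names.extend(['Guy1', 'Guy2'])   # append several items
print(names)
print(names.index('Guy1'))
names.remove('Guy1')             # remove by value
names.pop(1)                     # remove by position
print(names)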

In [None]:
#Sorting
numberlist = [1, 5, 23, 1 ,54,2, 54,23, 54,76, 76,34,87]
numberlist.sort()
print(numberlist)
print(len(numberlist))
print(max(numberlist))
print(min(numberlist))

In [None]:
string = ['abcd', 'efg', 'hijk', 'lmn']
print(sorted(string, key=len))

In [None]:
#You can even pass your own function
string = ['abcg', 'eff', 'hijd', 'lmi']

def func1(l):
    return l[-1]

print(sorted(string, key=func1))

**Dictionary**

+ One of the most important built-in data structures.
+ Python's dictionaries are implemented as hash tables.
+ They work like associative arrays and consist of key-value pairs. 
+ A dictionary key can be any hashable (immutable) Python type, but keys are usually numbers or strings. 
+ Values, on the other hand, can be any arbitrary Python object.
+ Dictionaries are enclosed by curly braces ( { } ) and values can be assigned and accessed using square brackets ( [] ).


In [None]:
mydict = {}   # avoid the name dict, which would shadow the built-in type
mydict['one'] = "This is one"
mydict[2]     = "This is two"
tinydict = {'name': 'isb','code':6734, 'dept': 'sales'}

print(mydict['one'])       # Prints value for 'one' key
print(mydict[2])           # Prints value for 2 key
print(tinydict)            # Prints complete dictionary
print(tinydict.keys())     # Prints all the keys
print(tinydict.values())   # Prints all the values
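
Beyond square-bracket access, a few other dictionary operations come up often. The sketch below reuses the tinydict example; the key 'city' and the default 'unknown' are made up for illustration.

```python
tinydict = {'name': 'isb', 'code': 6734, 'dept': 'sales'}

# Membership tests look at keys, not values.
has_code = 'code' in tinydict

# get() returns a default instead of raising KeyError for a missing key.
city = tinydict.get('city', 'unknown')

# items() yields (key, value) pairs in insertion order (Python 3.7+).
pairs = [(key, value) for key, value in tinydict.items()]

print(has_code, city, pairs)
```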


**Data type conversion**

+ Data from one type can be converted into another type using conversion operators.
+ Comes in handy when data is not coded in proper format (number coded as string, date coded as string)
+ int(variable) - converts variable to integer 
+ str(variable) - converts variable to string 
+ float(variable) - converts variable to float (number with decimal) 
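
A short sketch of these conversions follows. The helper `to_float` and the sample values are hypothetical; returning None for values that cannot be parsed is just one possible convention.

```python
raw_values = ['20', '3.5', 'n/a']   # numbers coded as strings, one bad entry

def to_float(text):
    """Convert text to float, returning None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

converted = [to_float(v) for v in raw_values]

as_int = int('20') + 1     # string to integer, then arithmetic
as_str = str(20) + '1'     # integer to string, then concatenation
truncated = int(3.9)       # int() drops the fractional part

print(converted, as_int, as_str, truncated)
```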


**Operators**

+ + (plus) : Adds two objects 
+ - (minus) Gives the subtraction of one number from the other; if the first operand is absent it is assumed to be zero.
+ * (multiply) Gives the multiplication of the two numbers or returns the string repeated that many times.
+ ** (power) Returns x to the power of y
+ / (divide) Divides x by y (true division in Python 3, so 7 / 2 gives 3.5)
+ // (floor division) Returns the floor of the quotient
+ % (modulo) Returns the remainder of the division
+ < (less than) Returns whether x is less than y. All comparison operators return True or False. Note the capitalization of these names.
+ > (greater than) Returns whether x is greater than y
+ <= (less than or equal to) Returns whether x is less than or equal to y
+ >= (greater than or equal to) Returns whether x is greater than or equal to y
+ == (equal to) Compares if the objects are equal
+ != (not equal to) Compares if the objects are not equal
+ not (boolean NOT) If x is True, it returns False. If x is False, it returns True.
+ and (boolean AND) x and y returns False if x is False, else it returns evaluation of y
+ or (boolean OR) If x is True, it returns True, else it returns evaluation of y
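
The arithmetic operators in the list can be tried out as below. This sketch assumes Python 3, where / is true division; the variable names are illustrative.

```python
x, y = 7, 2

quotient = x / y          # true division
floor_q = x // y          # floor of the quotient
remainder = x % y         # remainder of the division
power = x ** y            # x to the power of y
repeated = 'ab' * 3       # * repeats a string

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2

# "or" returns the first operand if it is truthy, else evaluates the second.
fallback = 0 or 'fallback'

print(quotient, floor_q, remainder, power, repeated, negative_floor, fallback)
```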

In [None]:
a = 6
b = 7
c = 42
print(1, a == 6)
print(2, a == 7)
print(3, a == 6 and b == 7)
print(4, a == 7 and b == 7)
print(5, not a == 7 and b == 7)
print(6, a == 7 or b == 7)
print(7, a == 7 or b == 6)
print(8, not (a == 7 and b == 6))
print(9, not a == 7 and b == 6)


**Conditional Statements**

**If-statement**

The if statement is used to check a condition: if the condition is true, we run a block of statements (called the if-block), else we process another block of statements (called the else-block). The else clause is optional.

In [None]:
a = 20
if a >= 22:

   print("if")
elif a >= 21:
   print("elif")
else:
   print("else")


In [None]:
#Testing for an element in list
names = ['Ajay', 'Vijay', 'Ramesh']
if 'Vijay' in names:
    print('Found Vijay')

**While-statement**

The while statement allows you to repeatedly execute a block of statements as long as a condition is true. A while statement is an example of what is called a looping statement. A while statement can have an optional else clause.

In [None]:
count = 0
while (count < 9):
   print('The count is:', count)
   count = count + 1

print("End of while loop!")
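
The optional else clause mentioned above runs once the loop condition becomes False. A minimal sketch, collecting the messages in a list (the name `messages` is illustrative) instead of printing them one by one:

```python
count = 0
messages = []
while count < 3:
    messages.append('The count is: %d' % count)
    count = count + 1
else:
    # Runs when the condition becomes False (it is skipped after a break).
    messages.append('Loop condition became False')

print(messages)
```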


**For-statement**

The for..in statement is another looping statement which iterates over a sequence of objects, i.e. it goes through each item in a sequence. A sequence is just an ordered collection of items.

In [None]:
for i in range(1,5,2):
    print (i)
#else:
#   print "The for loop is over"

In [None]:
#Traversing a list
#What is the output
numbers = [2, 3, 5]
total = 0   # named total to avoid shadowing the built-in sum()
for num in numbers:
    total += num
print(total)

In [None]:
numbers = [2, 3, 5]
getsum = [ i+2 for i in numbers ]
print(getsum)

In [None]:
numbers = [2, 3, 5]
getnum = [ i+2 for i in numbers if i<5]
print(getnum)

**Break statement**

The break statement is used to break out of a loop statement i.e. stop the execution of a looping statement, even if the loop condition has not become False or the sequence of items has not been completely iterated over.

An important note is that if you break out of a for or while loop, any corresponding loop else block is not executed.

In [None]:
for i in range(1,10):
    if i == 5:
        break
    print(i)
print('Done')
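
The note about the loop else block can be illustrated with a small search function. This is only a sketch; the function name `find_first_multiple` and its arguments are hypothetical.

```python
def find_first_multiple(numbers, divisor):
    """Report the first number divisible by divisor, if there is one."""
    for n in numbers:
        if n % divisor == 0:
            result = 'found %d' % n
            break               # leaving via break skips the else block
    else:
        # Runs only when the loop finished without hitting break.
        result = 'none found'
    return result

print(find_first_multiple([3, 5, 8, 10], 4))
print(find_first_multiple([1, 3], 2))
```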

**Functions**

+ Functions are reusable pieces of code.
+ A function is a block of statements that accepts some arguments, performs some task, and returns the output.
+ Defined using the def keyword
+ Similar to functions in R.
+ For example, code for two-way clustering can be implemented once and used again in the same program. 
+ We will look at functions for a number of tasks (Fama-MacBeth regression, two-way clustering, industry code classification) in subsequent sessions. 
+ A function can take arguments.
+ Arguments are specified within parentheses in the function definition, separated by commas.
+ It is also possible to assign default values to parameters, which makes the program flexible and guards against unexpected behavior.
+ One of the most powerful features of functions is that they can accept any number of arguments, so you do not have to fix the number when writing the function. This becomes extremely important when dealing with lists or input data where you do not know the number of observations beforehand.
+ Variables defined inside a function have local scope, i.e. they cannot be used outside the function.


In [None]:
def sayHello():
    print('Hello World!') # block belonging to the function
# End of function #

sayHello()

In [None]:
def printMax(a, b):
   if a > b:
       print(a, 'is maximum')
   elif a == b:
       print(a, 'is equal to', b)
   else:
       print(b, 'is maximum')

printMax(3, 4) 

In [None]:
def say(message, times = 1):
   print(message * times)

say('Hello')
say('World', 5)

In [None]:
def func(a, b=5, c=10):
   print('a is', a, 'and b is', b, 'and c is', c)
func(3, 7)
func(25, c=24)
func(c=50, a=100)

In [None]:
x = 50
def func(x):
   print('x is', x)
   x = 2
   print('Changed local x to', x)
func(x)
print('x is still', x)

In [None]:
def total(initial=5, *numbers, **keywords):
   count = initial
   for number in numbers:
       count += number
   for key in keywords:
       count += keywords[key]
   return count

print(total(10, 1, 2, 3, vegetables=50, fruits=100))

**Modules**

+ Functions can be used in the same program. 
+ If you want to use functions in other programs, make use of modules.
+ Modules can be imported in other programs and functions contained in those modules can be used.
+ The simplest way to create a module is to write a .py file with functions defined in that file.
+ Another way is to import from byte-compiled .pyc files. 
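
To make the idea concrete, the sketch below writes a tiny module file to a temporary folder and imports it. Everything here is an assumption for illustration: the module name `demo_module` and its two functions are hypothetical and are not the contents of the course's mymodule.py. In everyday use you would simply save the .py file next to your program and import it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, written out as a .py file below.
MODULE_SOURCE = '''
def sayhi():
    return "Hi, this is demo_module speaking."

def factorial(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result
'''

# Save the source as demo_module.py in a temporary folder.
module_dir = tempfile.mkdtemp()
Path(module_dir, "demo_module.py").write_text(MODULE_SOURCE)

# Python searches the folders in sys.path when importing.
sys.path.insert(0, module_dir)
demo_module = importlib.import_module("demo_module")

print(demo_module.sayhi())
print(demo_module.factorial(5))
```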

In [None]:
import os
print(os.getcwd())
import math
x = -25
print(math.fabs(x))
print(math.factorial(abs(x)))

In [None]:
from mymodule import *
print(sayhi())


numb = int(input("Enter a non negative number"))   # input() returns a string in Python 3
num_factorial = factorial(numb)
print(num_factorial)


**Writing first program**

+ Source files have a .py extension and can be run from the command prompt using the command python filename.py (optional arguments). 
+ Examine name.py. 
+ The first line imports modules.
+ The next line defines a main() function. We can specify command line arguments, which are available in sys.argv[1], sys.argv[2], and so on.
+ sys.argv[0] is the script name itself and can be ignored.
+ The line if __name__ == '__main__': starts the program. 
+ When a Python file is run directly, the special variable "__name__" is set to "__main__". Therefore, it's common to have the boilerplate if __name__ ==... shown above to call a main() function when the module is run directly, but not when the module is imported by some other module.
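
The structure described in this list can be sketched as below. This is a hypothetical script written for illustration; it is not the course's name.py, and the greeting logic is an assumption.

```python
import sys

def main(argv):
    """Build a greeting from the first command line argument, if any."""
    # argv[0] is the script name; real arguments start at argv[1].
    if len(argv) > 1:
        name = argv[1]
    else:
        name = 'World'
    greeting = 'Hello ' + name
    print(greeting)
    return greeting

# Runs main() only when the file is executed directly, not when imported.
if __name__ == '__main__':
    main(sys.argv)
```

Saved under a file name of your choice, such a script would be run as python filename.py followed by a name.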

In [None]:
#Run name.py

**Exception Handling**

**Exceptions**

+ An exception is an event that interrupts the ordinary sequential processing of a program.
+ For example, what if you are going to read a file and the file does not exist? Or what if you accidentally deleted it when the program was running? Similarly, what if your program had some invalid statements? Such situations are handled using exceptions.

**Handling Exceptions**

+ We can handle exceptions using the try..except statement. 
+ We basically put our usual statements within the try-block and put all our error handlers in the except-block.

In [None]:
def avg(numList):
    """Raises TypeError or ZeroDivisionError exceptions."""
    total = 0
    for v in numList:
        total = total + v
    return float(total) / len(numList)

def avgReport(numList):
    try:
        m = avg(numList)
        print("Average = ", m)
    except TypeError as ex:
        print("TypeError:", ex)
    except ZeroDivisionError as ex:
        print("ZeroDivisionError:", ex)


list1 = [10,20,30,40]
list2 = []
list3 = [10,20,30,'abc']

avgReport(list1)
avgReport(list2)   # avgReport prints its own report and returns None
avgReport(list3)

**Try..Finally**

+ Suppose you are reading a file in your program. How do you ensure that the file object is closed properly whether or not an exception was raised? This can be done using the finally block.
+ The finally block runs when the try statement finishes, whether it ends normally or because of an exception.
+ The finally clause is always executed. 
+ This includes all three possible cases: if the try block finishes with no exceptions; if an exception is raised and handled; and if an exception is raised but not handled. 
+ This last case means that every nested try statement with a finally clause will have that finally clause executed.
+ Use a finally clause to close files, release locks, close database connections, write final log messages, and other kinds of final operations. 
+ In the following example, we use the finally clause to write a final log message.


In [None]:
def avgReport(numList):
    try:
        print("Start avgReport")
        m = avg(numList)
        print("Average = ", m)
    except TypeError as ex:
        print("TypeError: ", ex)
    except ZeroDivisionError as ex:
        print("ZeroDivisionError: ", ex)
    finally:
        print("Finish avgReport")

list1 = [10,20,30,40]
list2 = []
list3 = [10,20,30,'abc']

avgReport(list2)
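
The file-closing case mentioned at the start of this section can be sketched in the same way. The temporary file and the variable names are assumptions made for the example; the with statement shown at the end is the more common idiom and closes the file automatically.

```python
import os
import tempfile

# Create a temporary file to work with.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = open(path, 'w')
try:
    f.write('first line\n')
finally:
    f.close()                  # runs whether or not write() raised
closed_by_finally = f.closed

# The with statement closes the file on exit from the block.
with open(path) as g:
    content = g.read()
closed_by_with = g.closed

os.remove(path)
print(closed_by_finally, closed_by_with, repr(content))
```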

**pandas**

pandas is the primary package for performing data analysis tasks in Python. pandas derives its name from panel data analysis and is the fundamental package that provides relational data structures (think Excel, SQL type) and a host of capabilities to play with those data structures. It is the most widely used package in Python for data analysis tasks, and works very well for cross sectional, time series, and panel data analysis. pandas sits on top of NumPy and can be used with NumPy arrays and the functions in NumPy. pandas suits a researcher’s needs in several ways:

+ Has a tabular data structure that can hold both homogeneous and heterogeneous data.
+ Very good indexing capabilities that make data alignment and merging easy.
+ Good time series functionality. No need to use different data structures for time series and cross sectional data. Allows for both ordered and unordered time-series data.
+ A host of statistical functions developed around NumPy and pandas that make a researcher’s task easy and fast.
+ Programming is a lot simpler and faster.
+ Easily handles data manipulation and cleaning.
+ Easy to expand and shorten data sets. Comprehensive merging, joins, and group by functionality to join multiple data sets.

**Installing pandas** 

In order to check if pandas is installed, go to Package Manager and type pandas. By default, pandas already comes installed with a distribution of Canopy. If the package is not installed, click on Install.

**Importing pandas**

In order to be able to use pandas, first import it using the import statement


In [None]:
import pandas as pd #this will import pandas into your workspace

In [None]:
import numpy as np  #we will be using numpy functions so import numpy

**Data Structures in pandas**

There are two basic data structures in pandas: Series and DataFrame

**Series:** It is similar to a NumPy 1-dimensional array. In addition to the values that are specified by the programmer, pandas attaches a label to each of the values. If the labels are not provided by the programmer, then pandas assigns labels ( 0 for first element, 1 for second element and so on). A benefit of assigning labels to data values is that it becomes easier to perform manipulations on the dataset as the whole dataset becomes more of a dictionary where each value is associated with a label. 


In [None]:
series1 = pd.Series([10,20,30,40])
series1

In [None]:
series1.values

In [None]:
series1.index

If you want to specify custom index values rather than the default ones provided, you can do so using the following command

In [None]:
series2 = pd.Series([10,20,30,40,50], index=['one','two','three','four','five'])
series2

The ways of accessing elements in a Series object are similar to what we have seen in NumPy, and you can perform NumPy operations on Series data arrays.

In [None]:
series2.iloc[2]   # positional access; plain series2[2] on a labelled index is deprecated in recent pandas

In [None]:
series2['three']

In [None]:
series2[['one', 'three', 'five']]

In [None]:
series2.iloc[[0,1,3]]   # positional access to several elements

In [None]:
series2 + 4

In [None]:
series2 ** 3

In [None]:
series2[series2>30]

In [None]:
np.sqrt(series2)

If you have a dictionary, you can create a Series data structure from that dictionary. Suppose you are interested in EPS values for firms, and the values come from different sources and are not clean. In that case you don't have to worry about cleaning and aligning those values: pandas aligns them on the index and marks missing entries as NaN.

In [None]:
years = [90, 91, 92, 93, 94, 95]
f1 = {90:8, 91:9, 92:7, 93:8, 94:9, 95:11}
firm1 = pd.Series(f1,index=years)
firm1

In [None]:
f2 = {90:14,92:9, 93:13, 94:5}
firm2 = pd.Series(f2,index=years)
firm2

In [None]:
f3 = {93:10, 94:12, 95: 13}
firm3 = pd.Series(f3,index=years)
firm3

NaN stands for missing or NA values in pandas. Make use of isnull() function to find out if there are any missing values in the data structure.

In [None]:
pd.isnull(firm3)

A key feature of Series data structures is that you don't have to worry about data alignment. For example, suppose we have run a word count program on two different files and we have the following data structures:

In [None]:
dict1 = {'finance': 10, 'earning': 5, 'debt':8}
dict2 = {'finance' : 8, 'compensation':4, 'earning': 9}
count1 = pd.Series(dict1)
count2 = pd.Series(dict2)
print(count1)
count2

If we want the sum for words common to both files, we don't have to worry about data alignment: pandas matches values by label. Words that appear in only one file get NaN in the result; to include all words, handle the NaN values before computing the sum. NaN stands for a missing value, and aggregation methods such as sum() skip NaN values by default.

In [None]:
count1+count2
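
One way to include words that appear in only one file is the Series.add method with fill_value, which treats a missing count as 0. The sketch below rebuilds the two Series so that it runs on its own; the names `common` and `combined` are illustrative.

```python
import pandas as pd

count1 = pd.Series({'finance': 10, 'earning': 5, 'debt': 8})
count2 = pd.Series({'finance': 8, 'compensation': 4, 'earning': 9})

# Plain addition: labels present in only one Series become NaN.
common = (count1 + count2).dropna()

# add() with fill_value=0 treats a missing count as zero.
combined = count1.add(count2, fill_value=0)

print(common)
print(combined)
```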

**Data Frame**

DataFrame is a tabular data structure in which data is laid out in rows and columns (similar to a CSV file or a SQL table), but it can also be used for higher dimensional data sets. The DataFrame object can contain homogeneous and heterogeneous values, and can be thought of as a logical extension of Series data structures. In contrast to Series, where there is one index, a DataFrame object has one index for columns and one index for rows. This allows flexibility in accessing and manipulating data.

In [None]:
data = pd.DataFrame({'price':[95, 25, 85, 41, 78],
                     'ticker':['AXP', 'CSCO', 'DIS', 'MSFT', 'WMT'],
                     'company':['American Express', 'Cisco', 'Walt Disney','Microsoft', 'Walmart']})
data

If a column is passed with no values, it will simply have NaN values
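
For instance, asking for a column that the input dictionary does not contain produces a column of NaN values. In this sketch the column name 'sector' and the shortened data are made up for illustration.

```python
import pandas as pd

raw = {'price': [95, 25], 'ticker': ['AXP', 'CSCO']}

# 'sector' is requested as a column but has no data in raw.
frame = pd.DataFrame(raw, columns=['ticker', 'price', 'sector'])

print(frame)
```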

In order to access a column, simply mention the column name

In [None]:
data['company']

In [None]:
data.company

In [None]:
data.loc[2]   # .ix was removed in pandas 1.0; .loc selects rows by label

In [None]:
data.loc[data.ticker=='DIS']
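
Rows can be selected by label with .loc or by integer position with .iloc; the older .ix indexer, which mixed both, is not available in pandas 1.0 and later. The sketch below uses a small frame with a made-up string index ('a', 'b', 'c') so that the two styles are easy to tell apart.

```python
import pandas as pd

small = pd.DataFrame({'price': [95, 25, 85],
                      'ticker': ['AXP', 'CSCO', 'DIS']},
                     index=['a', 'b', 'c'])

by_label = small.loc['c']                      # row whose label is 'c'
by_position = small.iloc[2]                    # third row, whatever its label
by_mask = small.loc[small.ticker == 'DIS']     # rows where the condition holds

print(by_label)
print(by_mask)
```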

In order to add additional columns

In [None]:
data['Year'] = 2014
data

In [None]:
data['pricesquared'] = data.price**2
data

In [None]:
del data['pricesquared']
data

In [None]:
data['pricesquared'] = np.nan
data

In [None]:
data['sequence'] = np.arange(1,6)
data

In [None]:
data.values

In [None]:
newdata = data.drop(2)

In [None]:
newdata

In [None]:
years = [90, 91, 92, 93, 94, 95]
f1 = {90:8, 91:9, 92:7, 93:8, 94:9, 95:11}
firm1 = pd.Series(f1,index=years)
firm1
f2 = {90:14,92:9, 93:13, 94:5}
firm2 = pd.Series(f2,index=years)
firm2
f3 = {93:10, 94:12, 95: 13}
firm3 = pd.Series(f3,index=years)
firm3
df1 = pd.DataFrame(columns=['Firm1','Firm2','Firm3'],index=years)
df1
df1.Firm1 = firm1
df1.Firm2 = firm2
df1.Firm3 = firm3
df1

In [None]:
dft = df1.T
dft
del dft[90]
dft


You can pass a number of data structures to DataFrame, such as an ndarray, lists, a dict, a Series, or another DataFrame. You can also reindex to conform the data to a new index. Reindexing is a powerful feature that allows you to access data in a number of different ways, and also to conform data to a new time series or other index.

In [None]:
reindexdf1 = df1.reindex([88,89,90,91,92,93,94,95,96,97,98])
reindexdf1

In [None]:
reindexdf1 = df1.reindex(np.arange(1988,2008))
reindexdf1

In [None]:
years1 = [90, 91, 92, 93, 94, 95]
f4 = {90:8, 91:9, 92:7, 93:8, 94:9, 95:11}
firm4 = pd.Series(f4,index=years)
f5 = {90:14,91:12, 92:9, 93:13, 94:5, 95:8}
firm5 = pd.Series(f5,index=years)
f6 = {90:8, 91: 9, 92:9,93:10, 94:12, 95: 13}
firm6 = pd.Series(f6,index=years)
df2 = pd.DataFrame(columns=['Firm1','Firm2','Firm3'],index=years1)
df2.Firm1 = firm4
df2.Firm2 = firm5
df2.Firm3 = firm6
df2

In [None]:
reindexdf2 = df2.reindex([88,89,90,91,92,93,94,95,96,97,98], fill_value=0)
reindexdf2

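The method argument of reindex fills missing values as the new index is applied: forward fill (ffill) carries the last valid value forward, and backfill (bfill) fills values backwards.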

In [None]:
df2

In [None]:
reindexdf3 = df2.reindex([88,89,90,91,92,93,94,95,96,97,98], method='ffill')
reindexdf3
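
The difference between the two fill methods is easiest to see side by side. The sketch below uses a small made-up Series (`eps`) with gaps in its index; the values are illustrative only.

```python
import pandas as pd

eps = pd.Series([8, 9, 7], index=[90, 92, 94])
new_index = [89, 90, 91, 92, 93, 94, 95]

# ffill: each new label takes the last earlier value (none exists for 89).
forward = eps.reindex(new_index, method='ffill')

# bfill: each new label takes the next later value (none exists for 95).
backward = eps.reindex(new_index, method='bfill')

print(pd.DataFrame({'ffill': forward, 'bfill': backward}))
```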

In [None]:
reindexdf1

In [None]:
reindexdf3

In [None]:
reindexdf1+reindexdf3

In [None]:
reindexdf1.add(reindexdf3, fill_value=0)

**Exercise to Solve**

1). Write code to print first and last element from the given (or in general any) list:

2).  Write code to print a dictionary where the keys are numbers between 1 and 15 (both included) and the values are square of keys.
#Sample Dictionary {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100, 11: 121, 12: 144, 13: 169, 14: 196, 15: 225}

3). In cryptography, a Caesar cipher is a very simple encryption technique in which each letter in the plain text is replaced by a letter some fixed number of positions down the alphabet. For example, with a shift of 3, A would be replaced by D, B would become E, and so on. The method is named after Julius Caesar, who used it to communicate with his generals. ROT-13 ("rotate by 13 places") is a widely used example of a Caesar cipher where the shift is 13. In Python, the key for ROT-13 may be represented by means of the following dictionary:
key = {'a':'n', 'b':'o', 'c':'p', 'd':'q', 'e':'r', 'f':'s', 'g':'t', 'h':'u', 
       'i':'v', 'j':'w', 'k':'x', 'l':'y', 'm':'z', 'n':'a', 'o':'b', 'p':'c', 
       'q':'d', 'r':'e', 's':'f', 't':'g', 'u':'h', 'v':'i', 'w':'j', 'x':'k',
       'y':'l', 'z':'m', 'A':'N', 'B':'O', 'C':'P', 'D':'Q', 'E':'R', 'F':'S', 
       'G':'T', 'H':'U', 'I':'V', 'J':'W', 'K':'X', 'L':'Y', 'M':'Z', 'N':'A', 
       'O':'B', 'P':'C', 'Q':'D', 'R':'E', 'S':'F', 'T':'G', 'U':'H', 'V':'I', 
       'W':'J', 'X':'K', 'Y':'L', 'Z':'M'}
Your task in this exercise is to implement an encoder/decoder of ROT-13. Once you're done, you will be able to read the following secret message:

   Pnrfne pvcure zrgubq vf anzrq nsgre Whyvhf Pnrfne!

Note that since the English alphabet has 26 letters, your ROT-13 program will be able to both encode and decode texts written in English.
