<div align="center">
<img style="display: block; margin: auto;" alt="photo" src="https://cdn.quantconnect.com/web/i/icon.png">

Quantconnect

Introduction to Financial Python
</div>

# 01 Data Types and Data Structures

# Introduction

This tutorial provides a basic introduction to the Python programming language. If you are new to Python, you should run the code snippets while reading this tutorial. If you are an advanced Python user, please feel free to skip this chapter.

# Basic Variable Types
The basic types of variables in Python are: strings, integers, floating point numbers and booleans.

A string in Python is a contiguous sequence of characters enclosed in either single quotes (' ') or double quotes (" ").


In [9]:
from random import random

my_string1 = 'Welcome to'
my_string2 = "QuantConnect"
print(my_string1 + ' ' + my_string2) # Concatenation of two strings

# Edit:

print(
    ( "Juanito" if( random() < 0.5) else "Maria" ) + " " +
    ( "comió" if( random() < 0.5) else "bebió" ) + " en " +
    ( "la fiesta" if( random() < 0.5) else "la reunión" )
) # Concatenate multiple strings chosen by a condition

Welcome to QuantConnect
Juanito bebió en la reunión


An integer is a whole number with no fractional part.

In [10]:
my_int = 10
print(my_int)
print(type(my_int)) # Int class

10
<class 'int'>


The built-in function int() can convert a string into an integer.

In [11]:
my_string = "100"
print(type(my_string))
my_int = int(my_string)
print(type(my_int))

# Edit:
str1 = "100"
str2 = "10"
print( int( str1 + str2 ))
print( int(str1) + int(str2))
# int() does not distribute over +: concatenating the strings first differs from adding the integers

<class 'str'>
<class 'int'>
10010
110


A floating-point number, or float, represents a real number. In Python, a numeric literal needs a decimal point to be defined as a float.

In [None]:
my_string = "100"
my_float = float(my_string) # Convert string to float
print(type(my_float))
flo=3.54
print(type(flo)) # Float class

<class 'float'>
<class 'float'>


As you can see above, if we don't include a decimal value, the variable would be defined as an integer. The built-in function float() can convert a string or an integer into a float.

In [None]:
my_bool = False
print(my_bool)
print(type(my_bool)) # Bool class

False
<class 'bool'>


A boolean, or bool, is a binary variable. Its value can only be True or False. It is useful for logical operations, which are covered in the next chapter.

# Basic Math Operations

The basic math operators in python are demonstrated below:

In [12]:
print("Addition ", 1+1) # Sum
print("Subtraction ", 5-2) # Subtraction
print("Multiplication ", 2*3) # Multiplication
print("Division ", 10/2) # Division
print('Exponent', 2**3) # Exponentiation

# Edit:

print("Integer division", 14//3 ) # Integer result of division
print("Remainder of division", 14 % 3) # Remainder result of division

Addition  2
Subtraction  3
Multiplication  6
Division  5.0
Exponent 8
Integer division 4
Remainder of division 2


In [None]:
print(1/3)
print(1.0/3) # The / operator always returns a float; use // for integer division

0.3333333333333333
0.3333333333333333
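
The conversions and division operators described above can be combined in one small sketch. The variable names are illustrative and not part of the original notebook.

```python
# Illustrative names; not part of the original notebook.
quantity = int("100")     # string -> int
price = float("3.5")      # string -> float
ratio = 7 / 2             # true division always yields a float
whole = 7 // 2            # floor division of two ints yields an int
remainder = 7 % 2         # remainder of the division

print(type(quantity), type(price))
print(ratio, whole, remainder)

# divmod() returns the quotient and the remainder together as a tuple
print(divmod(7, 2))
```

Using `//` and `%` together (or `divmod()`) is handy when a quantity has to be split into whole units plus a leftover.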


# Data Collections

## List
A list is an ordered collection of values. A list is mutable, which means you can change its elements in place without creating a new list. A list is created by putting comma-separated values between square brackets.

In [15]:
my_list = ['Quant', 'Connect'] #, 1,2,3]
print(my_list)

# Edit:

my_list = [True, 1, "True"] # different types
for i in my_list:
  if bool(i): # All three values are truthy, although their types differ
    print(i)

['Quant', 'Connect']
True
1
True


The values in a list are called "elements". We can access list elements by indexing. Python indexing starts at 0, so in a list of length n the first element has index 0 and the last has index n − 1. The length of a list can be obtained with the built-in function len().

In [14]:
my_list = ['Quant', 'Connect', 1,2,3] # List is indexed from 0 to its length - 1
print(len(my_list))
print(my_list[0])
print(my_list[len(my_list) -1])

5
Quant
3


You can also change the elements in the list by accessing an index and assigning a new value.

In [17]:
my_list = ['Quant','Connect',1,2,3]
my_list[2] = 'go'

# Edit:
my_list[3] = my_list[4]
my_list[4] = my_list[0]

print(my_list)

['Quant', 'Connect', 'go', 3, 'Quant']


A list can also be sliced with a colon:

In [20]:
my_list = ['Quant','Connect',1,2,3]
print(my_list[1:3])

# Edit:
# Printing even-indexed values in a list
print( my_list[ 0 : len( my_list ) : 2 ])

['Connect', 1]
['Quant', 1, 3]


The slice starts from the first element indicated, but excludes the last element indicated. Here we select all elements starting from index 1, which refers to the second element:

In [21]:
print(my_list[1:])

# Edit:
# Get the whole list (useful for making a shallow copy that is
# a new list object rather than a second reference to the same one)

print( my_list[:])

['Connect', 1, 2, 3]
['Quant', 'Connect', 1, 2, 3]


And all elements up to but excluding index 3:

In [22]:
print(my_list[:3])

# Edit:
# Reversed slice: from the last element down to, but excluding, index 3
print( my_list[:3:-1])

['Quant', 'Connect', 1]
[3]
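
The slice forms shown so far follow the pattern `start:stop:step`. The following is a minimal sketch of the most common variants, using an illustrative list that does not appear in the original notebook.

```python
letters = ['a', 'b', 'c', 'd', 'e']  # illustrative list

first_two = letters[:2]        # indices 0 and 1 (stop index is excluded)
last_two = letters[-2:]        # negative indices count from the end
every_other = letters[::2]     # step of 2
reversed_copy = letters[::-1]  # step of -1 walks the list backwards

# A full slice creates a new list object (a shallow copy),
# so changing the copy leaves the original untouched
copy_of_letters = letters[:]
copy_of_letters[0] = 'z'

print(first_two, last_two, every_other, reversed_copy)
print(letters[0], copy_of_letters[0])
```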


If you wish to add or remove an element from a list, you can use the append() and remove() methods for lists as follows:

In [28]:
my_list = ['Hello', 'Quant']
my_list.append('Hello') # Always inserts at the end of a list

# Edit:
my_list.append('Of')
my_list.append('Sea')

print(my_list)



['Hello', 'Quant', 'Hello', 'Of', 'Sea']


In [29]:
my_list.remove('Hello') # Removes only the first matching element
# Running this twice removes both "Hello" entries
# Running it a third time raises a ValueError, since "Hello" is no longer
# in my_list
print(my_list)

['Quant', 'Hello', 'Of', 'Sea']


When there are repeated instances of "Hello", the first one is removed.
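
Because remove() raises a ValueError when the value is absent, it is often guarded. A small sketch with an illustrative list follows; pop() is shown as the index-based alternative.

```python
words = ['Hello', 'Quant', 'Hello']  # illustrative list

words.remove('Hello')   # removes only the first 'Hello'
print(words)

# Option 1: check membership before removing
if 'Connect' in words:
    words.remove('Connect')

# Option 2: catch the ValueError raised for a missing value
try:
    words.remove('Connect')
    removed = True
except ValueError:
    removed = False
print(removed)

# pop() removes by index (the last element by default) and returns it
last = words.pop()
print(last, words)
```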

## Tuple
A tuple is a data structure type similar to a list. The difference is that a tuple is immutable, which means you can't change the elements in it once it's defined. We create a tuple by putting comma-separated values between parentheses.

In [34]:
my_tuple = ('Welcome','to','QuantConnect') 
# Useful for returning a sequence that is supposed to be read but not edited

Just like a list, a tuple can be indexed and sliced.

In [35]:
my_tuple = ('Welcome','to','QuantConnect')
# Supports the same read operations as a list
print(my_tuple[1:])

# Edit:
# Even-indexed values
print( my_tuple[ 0: len(my_tuple) : 2])

('to', 'QuantConnect')
('Welcome', 'QuantConnect')


## Set
A set is an **unordered**  collection with **no duplicate** elements. The built-in function **set()** can be used to create sets.

In [36]:
stock_list = ['AAPL','GOOG','IBM','AAPL','IBM','FB','F','GOOG']
# All duplicated values are dropped when the set is created
stock_set = set(stock_list)

# Edit:

stock_set.add( "AAPL" ) # Duplicated values won't be added
# Note that doing this doesn't raise an error

print(stock_set)

{'IBM', 'F', 'GOOG', 'FB', 'AAPL'}


A set is an easy way to remove duplicate elements from a list.
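
As a sketch of that idea, the example below removes duplicates in two ways and shows the basic set operators. The data and names are illustrative; `dict.fromkeys()` is used as one common way to keep the first-seen order, which a set does not guarantee.

```python
tickers = ['AAPL', 'GOOG', 'IBM', 'AAPL', 'IBM']  # illustrative data

# Deduplicate; sorted() gives a predictable order for display
unique_sorted = sorted(set(tickers))

# Deduplicate while keeping the order of first appearance
unique_in_order = list(dict.fromkeys(tickers))

held = {'AAPL', 'IBM'}
watched = {'IBM', 'GOOG'}
print(held & watched)   # intersection: in both sets
print(held | watched)   # union: in either set
print(held - watched)   # difference: in held but not in watched
```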

## Dictionary
A dictionary is one of the most important data structures in Python. Unlike sequences, which are indexed by integers, dictionaries are indexed by keys, which can be any hashable type such as strings, numbers or tuples.

A dictionary is a collection of key : value pairs, with the requirement that the keys are unique. Since Python 3.7, dictionaries preserve insertion order, but values are looked up by key rather than by position. We create a dictionary by placing a comma-separated list of key : value pairs within braces.

In [39]:
# Keys can be of any hashable type, and types can be mixed
my_dic = {'AAPL':'AAPLE', 'FB':'FaceBook', 'GOOG':'Alphabet', 1: "TEST"}

In [37]:
print(my_dic['GOOG'])

# Edit:
print( my_dic[1])

Alphabet
TEST


After defining a dictionary, we can access any value by indicating its key in brackets.

In [41]:
my_dic['GOOG'] = 'Alphabet Company'
print(my_dic['GOOG'])

# Edit
my_dic[1] = "TEST2"
print( my_dic[1])

Alphabet Company
TEST2


We can also change the value associated with a specified key, as the cell above shows. To list all the keys or all the values, use keys() and values():

In [43]:
print(list(my_dic.keys()))

# Edit:
# values() works the same way for the values

print(list(my_dic.values()))

['AAPL', 'FB', 'GOOG', 1]
['AAPLE', 'FaceBook', 'Alphabet Company', 'TEST2']


The dictionary method dict.keys() returns a view of all the keys used in the dictionary. Wrapping it in list() turns it into a list.
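
A few other dictionary operations are commonly used alongside keys() and values(). The sketch below uses an illustrative dictionary; get(), items() and del are standard parts of the dict type.

```python
names = {'AAPL': 'Apple', 'GOOG': 'Alphabet'}  # illustrative dictionary

# keys(), values() and items() return views; list() turns them into lists
print(list(names.keys()))
print(list(names.items()))

# get() returns a default instead of raising KeyError for a missing key
missing = names.get('MSFT', 'unknown')
print(missing)

# Iterate over key/value pairs
for ticker, company in names.items():
    print(ticker, '->', company)

# Assigning to a new key adds an entry; del removes one
names['MSFT'] = 'Microsoft'
del names['GOOG']
print(names)
```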

# Common String Operations
A string is an immutable sequence of characters. It can be sliced by index just like a tuple:

In [45]:
my_str = 'Welcome to QuantConnect'

print(my_str[8:])

# Edit:
# Even-index characters of string

print( my_str[0: len(my_str): 2])

to QuantConnect
Wloet unCnet


There are many methods associated with strings. We can use string.count() to count the occurrences of a substring, string.find() to return the index of its first occurrence, and string.replace() to replace every occurrence with another string.

In [46]:
print('Counting the number of e appears in this sentence'.count('e')) # Count occurrences of "e"
print('The first time e appears in this sentence'.find('e')) # Find the first occurrence of "e" and return its index
print('all the a in this sentence now becomes e'.replace('a','e')) # Replace every "a" with "e"

7
2
ell the e in this sentence now becomes e


One of the most commonly used string methods is string.split(). This method splits the string at the indicated separator and returns a list:

In [48]:
Time = '2020-08-19 11:39:00'
splited_list = Time.split(' ') # Split into date and time and place them in a list
date = splited_list[0] # the first item, before " ", is the date
time = splited_list[1] # the second item, after " ", is the time
print(date, time) 
hour = time.split(':')[0] # split(':') returns [hour, minute, second]; take the hour
print(hour)

2020-08-19 11:39:00
11


We can insert the values of variables into placeholders in a string. This is called string formatting.

In [55]:
my_time = 'Hour: {}, Minute:{}'.format('09','43') # Every {} will be replaced
print(my_time)

# Edit:
# Double the braces ({{ and }}) when literal { and } must appear in the result

my_str = "Val1 : {{{}}}".format("test")

print( my_str )

Hour: 09, Minute:43
Val1 : {test}


Another way to format a string is to use the % symbol.

In [67]:
print('the pi number is %f'%3.14)
print('%s to %s'%('Welcome','Quantconnect'))

# Edit:
# Escape % character when needed
print('Single %%s , %s was replaced' %("test"))

the pi number is 3.140000
Welcome to Quantconnect
Single %s , test was replaced
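
A third option, not covered in the cells above, is the f-string available since Python 3.6. The sketch below reproduces the earlier examples in that style; the variable names are illustrative.

```python
hour, minute = '09', '43'   # illustrative values
pi_estimate = 3.14159

# f-strings evaluate the expressions written inside {}
time_text = f'Hour: {hour}, Minute: {minute}'

# A format specification after ':' controls the output, here two decimals
pi_text = f'pi to two decimals: {pi_estimate:.2f}'

# Doubled braces produce literal { and }, as with str.format()
braces_text = f'Val1 : {{{hour}}}'

print(time_text)
print(pi_text)
print(braces_text)
```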


# Summary

We've seen the basic data types and data structures in Python. It's important to keep practicing to become familiar with these data structures. In the next tutorial, we will cover for and while loops and logical operations in Python.


# 02 Logical Operations and Loops

# Introduction
We discussed the basic data types and data structures in Python in the last tutorial. This chapter covers logical operations and loops in Python, which are very common in programming.

# Logical Operations
Like most programming languages, Python has comparison operators:

In [69]:
print(1 == 0) # Equal to
print(1 == 1)
print(1 != 0) # Not equal to
print(5 >= 5) # Greater than or equal to
print(5 >= 6)

# Edit:

print(5 <= 5) # Less than or equal to
print(5 <= 4) 

False
True
True
True
False
True
False


Each statement above has a boolean value, which must be either True or False, but not both.

We can combine simple statements P and Q to form complex statements using logical operators:

- The statement "P and Q" is true if both P and Q are true, otherwise it is false.
- The statement "P or Q" is false if both P and Q are false, otherwise it is true.
- The statement "not P" is true if P is false, and vice versa.

In [71]:
print(2 > 1 and 3 > 2) # and (written && in C-style languages)
print(2 > 1 and 3 < 2) 
print(2 > 1 or 3 < 2) # or (written || in C-style languages)
print(2 < 1 and 3 < 2)

True
False
True
False


When dealing with a complex logical statement that involves several simpler statements, we can use parentheses to separate and combine them.

In [73]:
print(
    (3 > 2 or 1 < 3)
    and (1 != 3 and 4 > 3)
    and not (3 < 2 or 1 < 3 and (1 != 3 and 4 > 3))
) # False: T and T and not T => T and T and F => F

# "and" has higher precedence than "or", so without the first pair of
# parentheses everything after "3 > 2 or" is grouped together
print(
    3 > 2
    or 1 < 3
    and (1 != 3 and 4 > 3)
    and not (3 < 2 or 1 < 3 and (1 != 3 and 4 > 3))
) # True: T or (T and T and not T) => T or F => T

False
True


Comparing the two statements above, we can see that it's wise to use parentheses when we write a complex logical statement.
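
The precedence rule behind that difference can be isolated in a much smaller sketch. The values below are illustrative and chosen so that grouping changes the result.

```python
a, b, c = True, False, False   # illustrative values

# "and" is evaluated before "or", so these two lines are equivalent
implicit = a or b and c
explicit = a or (b and c)

# Grouping the "or" first changes the result
grouped = (a or b) and c

print(implicit, explicit, grouped)

# "not" binds more tightly than "and" and "or":
# the next expression is parsed as (not a) or a
negated = not a or a
print(negated)
```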

# If Statement
An if statement executes a segment of code only if its condition is true. A standard if statement consists of 3 segments: if, elif and else.

```python
if statement1:
    # if statement1 is true, execute the code here.
    # code.....
    # code.....
elif statement2:
    # if statement1 is false and statement2 is true, execute the code here.
    # code......
    # code......
else:
    # if none of the above statements is true, execute the code here.
    # code......
```

An if statement doesn't necessarily have elif and else parts. If they are omitted, the indented block of code is executed when the condition is true; otherwise the whole if statement is skipped.

In [74]:
i = 0
if i == 0:
    print('i==0 is True')

i==0 is True


As we mentioned above, we can write some complex statements here:

In [76]:
p = 1 > 0 # True
q = 2 > 3 # False
if p and q: # T and F => F
    print('p and q is true')
elif p and not q: # T and T => T
    print('q is false')
elif q and not p: # F and F => F
    print('p is false')
else: # reached only when both p and q are false
    print('Neither p nor q is true')

q is false


# Loop Structure
Loops are an essential part of programming. The "for" and "while" loops run a block of code repeatedly.

## While Loop
A "while" loop runs repeatedly as long as its condition remains true.

In [78]:
i = 0
while i < 5: # Five iterations, from 0 to 4
    print(i)
    i += 1 

0
1
2
3
4


When writing a while loop, we need to ensure that something changes from iteration to iteration so that the loop eventually terminates; otherwise it will run forever. Here we used i += 1 (short for i = i + 1) to make i larger after each iteration. Updating a counter like this is a common way to control a while loop.

## For Loop
A "for" loop iterates over a sequence of values and terminates when the sequence is exhausted.

In [77]:
for i in [1,2,3,4,5]: # For each element in list
    print(i)

1
2
3
4
5


We can also add if statements in a for loop. Here is a real example from our pairs trading algorithm:

In [79]:
stocks = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
selected = ['AAPL','IBM']
new_list = []
for i in stocks:
    if i not in selected:
        # Adds an element to new_list only if it is in stocks but not in selected
        new_list.append(i)
print(new_list)

['GOOG', 'FB', 'F', 'V', 'G', 'GE']


Here we iterated over all the elements in the list 'stocks'. Later in this chapter, we will introduce a more concise way to do this in a single line of code.

## Break and continue
"break" and "continue" are two commonly used statements in loops. If "break" is triggered while a loop is executing, the loop will terminate immediately:

In [80]:
stocks = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
for i in stocks:
    print(i)
    if i == 'FB':
        # Exit the loop as soon as it reaches the 4th element (after printing it)
        break

AAPL
GOOG
IBM
FB


The "continue" command tells the loop to end this iteration and skip to the next iteration:

In [None]:
stocks = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
for i in stocks:
    if i == 'FB':
        # Skip the 4th element in the list (FB)
        continue
    print(i)

AAPL
GOOG
IBM
F
V
G
GE


# List Comprehension
List comprehension is a Pythonic way to create lists. A common application is to make a new list where each element is the result of some operation applied to each member of another sequence. For example, if we want to create a list of squares using a for loop:

In [81]:
squares = []

# Creates a list with the squares of the first 5 positive integers
for i in [1,2,3,4,5]:
    squares.append(i**2)
print(squares)

[1, 4, 9, 16, 25]


Using list comprehension:

In [94]:
list = [1,2,3,4,5] # Note: this name shadows the built-in list(); it is removed later with del list

# For each element in list, compute its square and collect the results in squares
squares = [x**2 for x in list] 
print(squares)

[1, 4, 9, 16, 25]


Recall the example above where we used a for loop to filter stocks. Here we use a list comprehension, this time keeping the stocks that are in selected:

In [84]:
stocks = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
selected = ['AAPL','IBM']
# Creates a new list (by comprehension) with the elements shared in both stocks and selected lists
new_list = [x for x in stocks if x in selected]
print(new_list)

# Edit:
# List of even numbers such that 0 <= n <= 9
new_list = [ x for x in range(10) if x % 2 == 0]
print( new_list )

['AAPL', 'IBM']
[0, 2, 4, 6, 8]


A list comprehension consists of square brackets containing an expression followed by a "for" clause, then zero or more additional "for" or "if" clauses. For example:

In [86]:
# List of (x, y) pairs taken from the two lists, keeping only the pairs where x != y
print([(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]) 

# Same pattern as above, but using strings
print([str(x)+' vs '+str(y) for x in ['AAPL','GOOG','IBM','FB'] for y in ['F','V','G','GE'] if x!=y])

# Edit:

# Returns the elements in the diagonal of a matrix

A = [[1,2,3], [4,5,6], [7,8,9]]
print(
    [ A[x][y] for x in range( len(A) ) for y in range( len(A)) if x==y ]
)

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
['AAPL vs F', 'AAPL vs V', 'AAPL vs G', 'AAPL vs GE', 'GOOG vs F', 'GOOG vs V', 'GOOG vs G', 'GOOG vs GE', 'IBM vs F', 'IBM vs V', 'IBM vs G', 'IBM vs GE', 'FB vs F', 'FB vs V', 'FB vs G', 'FB vs GE']
[1, 5, 9]


List comprehension is an elegant way to organize one or more for loops when creating a list.
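
To make the correspondence concrete, the sketch below writes the pair example once as nested loops and once as a comprehension, and checks that both give the same list. It also shows a simpler way to read a matrix diagonal with a single index; the variable names are illustrative.

```python
# Nested loops: the clauses appear in the same order as in the comprehension
pairs_loop = []
for x in [1, 2, 3]:
    for y in [3, 1, 4]:
        if x != y:
            pairs_loop.append((x, y))

pairs_comprehension = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
print(pairs_loop == pairs_comprehension)

# The diagonal of a square matrix needs only one index
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
diagonal = [matrix[i][i] for i in range(len(matrix))]
print(diagonal)
```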

# Summary
This chapter has introduced logical operations, loops, and list comprehension. In the next chapter, we will introduce functions and object-oriented programming, which will enable us to make our code clean and versatile.


# 03 Functions and Object-Oriented Programming

# Introduction

In the last tutorial we introduced logical operations, loops and list comprehension. We will introduce functions and object-oriented programming in this chapter, which will enable us to build complex algorithms in more flexible ways.

# Functions
A function is a reusable block of code. We can use a function to output a value, or do anything else we want. We can easily define our own function by using the keyword "def".

In [87]:
def product(x,y):
    # Returns the product between two numbers
    return x*y
print(product(2,3))
print(product(5,10))

# Edit:

print( product( product(3,4), product( 5,6)))

6
50
360


The keyword "def" is followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented. The product() function above has "x" and "y" as its parameters. A function doesn't necessarily have parameters:

In [88]:
def say_hi():
    # Prints a predefined message
    print('Welcome to QuantConnect')
say_hi()

Welcome to QuantConnect


# Built-in Function
**range()** is a function that produces an arithmetic sequence of integers. In Python 3 it returns a lazy range object rather than a list; wrap it in list() to see the values. It's often used in for loops. The arguments must be integers. If the "step" argument is omitted, it defaults to 1.

In [90]:
# range() returns a range object, an iterable well suited to "for" loops
print(range(10)) # represents 0, 1, ..., 9
print(range(1,11)) # represents 1, 2, ..., 10
print(range(1,11,2)) # represents 1, 3, 5, 7, 9

range(0, 10)
range(1, 11)
range(1, 11, 2)
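
Since printing a range object shows only its arguments, a short sketch of how to inspect the values may help. The calls below are all standard operations on range objects.

```python
# list() materializes the values of a range
print(list(range(10)))
print(list(range(1, 11, 2)))

# A negative step counts downwards; the stop value is still excluded
countdown = list(range(10, 0, -3))
print(countdown)

# A range supports len(), indexing and membership tests without building a list
r = range(1, 11)
print(len(r), r[0], r[-1], 5 in r)
```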


**len()** is another function used together with range() to create a for loop. This function returns the length of an object. The argument must be a sequence or a collection.

In [91]:
tickers = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
print('The length of tickers is {}'.format(len(tickers)))
for i in range(len(tickers)):
    # Prints all elements in tickers, accessing them by index
    print(tickers[i])

The length of tickers is 8
AAPL
GOOG
IBM
FB
F
V
G
GE


Note: If you want to print the tickers without using indices, simply write "for ticker in tickers: print(ticker)".

**map()** is a function that applies a specific function to every item of a sequence or collection. In Python 3 it returns an iterator, so we wrap it in list() to get a list of the results.

Earlier we assigned [1,2,3,4,5] to the name list, which shadows the built-in list(). We remove that variable with del list so that list refers to the built-in again:

In [95]:
print(list)
del list
list # list now refers to the built-in class again

[1, 2, 3, 4, 5]


list

In [97]:
tickers = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']

# map() iterates through the list and applies a function (which must return a value)
# to each of the original list elements

# In this case, it takes a function (len) and evaluates it for every element
# The result, converted with list(), holds the length of each string in tickers
list(map(len,tickers))

[4, 4, 3, 2, 1, 1, 1, 2]

In [99]:
tickers = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
print(list(map(len,tickers)))

# Edit
# Adds a suffix to each string in tickers

def change( item ):
  return item + "_TEST"

print( list( map( change, tickers )))

[4, 4, 3, 2, 1, 1, 1, 2]
['AAPL_TEST', 'GOOG_TEST', 'IBM_TEST', 'FB_TEST', 'F_TEST', 'V_TEST', 'G_TEST', 'GE_TEST']


The **lambda operator** is a way to create small anonymous functions. They are typically used only at the place where they are created. For example:

In [100]:
# Calculates the square of every number in the range [0,9]
list(map(lambda x: x**2, range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

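map() can be applied to more than one list. The function must then accept one argument per list. In Python 3, map() stops when the shortest list is exhausted, so the lists should normally have the same length.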

In [104]:
# Calculates the element-wise sum of two lists
# The lambda receives one element from each list on every call
print(list(map(lambda x, y: x+y, [1,2,3,4,5],[5,4,3,2,1])))

# Same suffix example as before, using a lambda instead of a named function
tickers = ['AAPL','GOOG','IBM','FB','F','V', 'G', 'GE']
print(list(map( lambda x: x + "_TEST", tickers)))

[6, 6, 6, 6, 6]
['AAPL_TEST', 'GOOG_TEST', 'IBM_TEST', 'FB_TEST', 'F_TEST', 'V_TEST', 'G_TEST', 'GE_TEST']
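
Two details of map() are easy to miss: what happens with lists of different lengths in Python 3, and how it relates to list comprehensions. The following sketch uses illustrative values.

```python
# With lists of different lengths, map() stops at the shortest one
sums = list(map(lambda x, y: x + y, [1, 2, 3, 4, 5], [10, 20, 30]))
print(sums)

# The same computation with zip() and a list comprehension
sums_zip = [x + y for x, y in zip([1, 2, 3, 4, 5], [10, 20, 30])]
print(sums_zip)

# map() with a lambda and a list comprehension give the same result
squares_map = list(map(lambda x: x**2, range(5)))
squares_comp = [x**2 for x in range(5)]
print(squares_map == squares_comp)
```

Which form to use is largely a matter of taste; a comprehension is often easier to read when the function is a short expression.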


**sorted()** takes any iterable, such as a list, tuple or set, and returns a new sorted list.

In [106]:
print(sorted([5,2,3,4,1])) # Sorts a list (sorted() accepts any iterable, such as sets, tuples and dictionary keys)

print(sorted((5,3,1,4,2))) # Tuple

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


We can add a "key" parameter to specify a function to be called on each list element prior to making comparisons. For example:

In [107]:
price_list = [('AAPL',144.09),('GOOG',911.71),('MSFT',69),('FB',150),('WMT',75.32)]
sorted(price_list, key = lambda x: x[1]) # Sorts the (ticker, price) tuples by price (index 1) instead of by ticker

[('MSFT', 69), ('WMT', 75.32), ('AAPL', 144.09), ('FB', 150), ('GOOG', 911.71)]

By default the values are sorted in ascending order. We can change it to descending by adding the optional parameter "reverse".

In [None]:
price_list = [('AAPL',144.09),('GOOG',911.71),('MSFT',69),('FB',150),('WMT',75.32)]
sorted(price_list, key = lambda x: x[1],reverse = True) # Reversed order for the same sample above

[('GOOG', 911.71), ('FB', 150), ('AAPL', 144.09), ('WMT', 75.32), ('MSFT', 69)]

Lists also have a method, list.sort(). It takes the same "key" and "reverse" arguments as sorted(), but it sorts the list in place and returns None instead of a new list.

In [None]:
price_list = [('AAPL',144.09),('GOOG',911.71),('MSFT',69),('FB',150),('WMT',75.32)]
price_list.sort(key = lambda x: x[1]) # Sorts the list in place by price
print(price_list)

[('MSFT', 69), ('WMT', 75.32), ('AAPL', 144.09), ('FB', 150), ('GOOG', 911.71)]
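
The difference between sorted() and list.sort() is easiest to see side by side. This sketch uses an illustrative price list.

```python
prices = [('AAPL', 144.09), ('MSFT', 69), ('FB', 150)]  # illustrative data

# sorted() returns a new list and leaves the original unchanged
by_price = sorted(prices, key=lambda item: item[1])
print(by_price)
print(prices)

# list.sort() reorders the list in place and returns None
result = prices.sort(key=lambda item: item[0])
print(result)
print(prices)
```

A common mistake is to write `prices = prices.sort()`, which replaces the list with None.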


# Object-Oriented Programming
Python is an object-oriented programming language. It's important to understand the concept of "objects" because almost every kind of data from the QuantConnect API is an object.

## Class
A class is a type of data, just like a string, float, or list. When we create an object of that data type, we call it an instance of a class.

In Python, everything is an object - everything is an instance of some class. The data stored inside an object are called attributes, and the functions which are associated with the object are called methods.

For example, as mentioned above, a list is an object of the "list" class, and it has a method list.sort().

We can create our own objects by defining a class. We would do this when it's helpful to group certain functions together. For example, we define a class named "Stock" here:

In [109]:
class stock:
    def __init__(self, ticker, open, close, volume):
        # Object constructor, takes 4 arguments (besides self)
        self.ticker = ticker
        self.open = open
        self.close = close
        self.volume = volume
        self.rate_return = float(close)/open - 1

    # Methods:
    def update(self, open, close):
        self.open = open
        self.close = close
        self.rate_return = float(self.close)/self.open - 1
 
    def print_return(self):
        print(self.rate_return)

The "Stock" class has attributes "ticker", "open", "close", "volume" and "rate_return". Inside the class body, the first method is called __init__, which is a special method. When we create a new instance of the class, the __init__ method is immediately executed with all the parameters that we pass to the "Stock" object. The purpose of this method is to set up a new "Stock" object using data we have provided.

Here we create two Stock objects named "apple" and "google".

In [110]:
# Invoke the constructor to create two objects of the stock class
apple = stock('AAPL', 143.69, 144.09, 20109375)
google = stock('GOOG', 898.7, 911.7, 1561616)

Stock objects also have two other methods: update() and print_return(). We can access the attributes of a Stock object and call its methods:

In [112]:
apple.ticker # Gets a public attribute of the apple object

# Invoke methods on the google object
google.print_return() 
google.update(912.8,913.4)
google.print_return()

# Edit:

apple.update(92,45)
apple.print_return()

0.0006573181419806673
0.0006573181419806673
-0.5108695652173914


By calling the update() method, we updated the open and close prices of a stock. Please note that when we use the attributes or call the methods **inside a class**, we need to specify them as self.attribute or self.method(). Otherwise Python looks for a local or global name instead and raises a NameError if none exists.

We can add an attribute to an object anywhere:

In [114]:
# As in JavaScript, new attributes can be added to an object at any time
apple.ceo = 'Tim Cook'
print( apple.ceo )

# Edit:
# Adds a function as an attribute of apple (it is not bound, so it receives no self argument)
apple.test = lambda x: x + 2

print( apple.test(2))

Tim Cook
4


We can check what names (i.e. attributes and methods) are defined on an object using the dir() function:

In [115]:
dir(apple)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'ceo',
 'close',
 'open',
 'print_return',
 'rate_return',
 'test',
 'ticker',
 'update',
 'volume']

## Inheritance
Inheritance is a way of arranging classes in a hierarchy from the most general to the most specific. A "child" class is a more specific type of a "parent" class because a child class inherits all the attributes and methods of its parent. For example, we define a class named "Child" which inherits from "Stock":

In [117]:
class child(stock):
    # Child inherits from stock
    def __init__(self,name):
        self.name = name

In [125]:
aa = child('aa')
print(aa.name)

# Methods are inherited from stock
aa.update(100,102)
print(aa.open)
print(aa.close)
print(aa.print_return())

# However, test was added to the apple object only, outside the class, so aa does not have it
try:
  print( aa.test )
except Exception:
  print( "Error, could not find test attribute of aa object" )


aa
100
102
0.020000000000000018
None
Error, could not find test attribute of aa object


As seen above, the new class Child has inherited the methods from Stock.
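
Note that in the cells above the child's __init__ replaces the parent's, so attributes such as open and close exist only after update() has been called. A common way to keep the parent's initialization is to call super().__init__(). The sketch below is self-contained and uses simplified, hypothetical classes (Stock with three arguments and a NamedStock subclass) rather than the exact classes from the tutorial.

```python
class Stock:
    # Simplified stand-in for the tutorial's class (no volume attribute)
    def __init__(self, ticker, open, close):
        self.ticker = ticker
        self.open = open
        self.close = close
        self.rate_return = close / open - 1


class NamedStock(Stock):  # hypothetical subclass for illustration
    def __init__(self, name, ticker, open, close):
        # Run the parent's __init__ first so its attributes are set
        super().__init__(ticker, open, close)
        self.name = name


aa = NamedStock('Example Corp', 'EX', 100, 102)
print(aa.name, aa.ticker, aa.rate_return)

# An instance of the child class is also an instance of the parent class
print(isinstance(aa, Stock))
```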

# Summary

In this chapter we have introduced functions and classes. When we write a QuantConnect algorithm, we define it as a class that inherits from QCAlgorithm. This means our algorithm inherits the QC API methods from the QCAlgorithm class.

In the next chapter, we will introduce NumPy and Pandas, which enable us to conduct scientific calculations in Python.


# 04 NumPy and Basic Pandas

# Introduction

Now that we have introduced the fundamentals of Python, it's time to learn about NumPy and Pandas.

# NumPy
NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. It also has strong integration with Pandas, which is another powerful tool for manipulating financial data.

Python packages like NumPy and Pandas contain classes and methods which we can use by importing the package:

In [126]:
import numpy as np

## Basic NumPy Arrays
A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. Here we make an array by passing a list of Apple stock prices:

In [128]:
price_list = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
price_array = np.array(price_list)
print(price_array, type(price_array)) # Custom NumPy array

[143.73 145.83 143.68 144.02 143.5  142.62] <class 'numpy.ndarray'>


Notice that the type of array is "ndarray" which is a multi-dimensional array. If we pass np.array() a list of lists, it will create a 2-dimensional array.

In [129]:
Ar = np.array([[1,3],[2,4]])
print(Ar, type(Ar)) # Custom NumPy matrix

[[1 3]
 [2 4]] <class 'numpy.ndarray'>


We get the dimensions of an ndarray using the .shape attribute:

In [130]:
print(Ar.shape) # Shape of matrix, it has 2 rows and 2 columns

(2, 2)


If we create a 2-dimensional array (i.e. a matrix), each row can be accessed by index:

In [131]:
print(Ar[0])
print(Ar[1]) # Rows are indexed like list elements

[1 3]
[2 4]


If we want to access the matrix by column instead:

In [132]:
print('the first column: ', Ar[:,0])
print('the second column: ', Ar[:,1])

# Edit:
# Select all rows and columns (basic slicing returns a view, not a copy; use Ar.copy() for a copy)
print( Ar[:,:] )

the first column:  [1 2]
the second column:  [3 4]
[[1 3]
 [2 4]]
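
Unlike a list slice, a basic NumPy slice such as Ar[:, :] is a view that shares memory with the original array. The sketch below, which rebuilds the same small matrix, shows the difference between a view and an independent copy made with .copy().

```python
import numpy as np

Ar = np.array([[1, 3], [2, 4]])

view = Ar[:, :]    # basic slicing returns a view of the same data
copy = Ar.copy()   # .copy() allocates an independent array

# Writing through the view changes the original, but not the copy
view[0, 0] = 99
print(Ar[0, 0])
print(copy[0, 0])

# np.shares_memory() reports whether two arrays overlap in memory
print(np.shares_memory(Ar, view), np.shares_memory(Ar, copy))
```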


## Array Functions
NumPy has built-in functions that allow us to perform calculations on arrays. For example, we can apply the natural logarithm to each element of an array:

In [133]:
print(np.log(price_array)) # Calculates log (natural logarithm) of each element in the array

[4.96793654 4.98244156 4.9675886  4.96995218 4.96633504 4.96018375]


Other functions return a single value:

In [134]:
# Calculates the mean of the numerical elements in the array
# It is (element1 + element2 + ... elementn) / n
print(np.mean(price_array)) 

# Calculates the standard deviation of the elements in the array
# By default (ddof=0) it is sqrt( ( (element1 - mean)**2 + (element2 - mean)**2 + ... + (elementn - mean)**2 ) / n )
print(np.std(price_array))

# Calculates the sum of the elements in the array
# It is element1 + element2 + ... + elementn
print(np.sum(price_array))

# Calculates the maximum value of an element in the array
# It is an elementk such that elementk >= elementi for every elementi in the array
print(np.max(price_array))

143.89666666666668
0.9673790478515796
863.38
145.83


The functions above return the mean, standard deviation, total and maximum value of an array.
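One detail worth checking is which denominator the standard deviation uses. As a small sketch on the same prices: `np.std` divides by n by default, and its `ddof` parameter switches to the n - 1 (sample) form. The variable names `population_std` and `sample_std` are ours.

```python
import numpy as np

price_array = np.array([143.73, 145.83, 143.68, 144.02, 143.5, 142.62])

population_std = np.std(price_array)      # ddof=0 (default): divides by n
sample_std = np.std(price_array, ddof=1)  # ddof=1: divides by n - 1

print(population_std)
print(sample_std)
```

The first value matches the output above; the second is the one Pandas reports by default, which is why the two libraries can appear to disagree on the same data.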

# Pandas
Pandas is one of the most powerful tools for dealing with financial data. 

First we need to import Pandas:

In [135]:
import pandas as pd

## Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.).

We create a Series by calling pd.Series(data), where data can be a dictionary, an array or just a scalar value.

In [136]:
price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price) # Creates a Series from a list
s

0    143.73
1    145.83
2    143.68
3    144.02
4    143.50
5    142.62
dtype: float64
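The example above builds a Series from a list. As a short sketch of the other two inputs mentioned, a dictionary and a scalar, we could write the following; the tickers and values are illustrative only.

```python
import pandas as pd

# From a dictionary: the keys become the index labels
from_dict = pd.Series({'AAPL': 143.73, 'GOOG': 898.70, 'IBM': 155.58})

# From a scalar: the value is repeated for every label in the given index
from_scalar = pd.Series(0.0, index=['a', 'b', 'c'])

print(from_dict)
print(from_scalar)
```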

We can customize the indices of a new Series:

In [138]:
# Assign custom indexes for elements in the array
s = pd.Series(price,index = ['a','b','c','d','e','f'])
print(s)

# Edit:
s = pd.Series(price,index = [ "element" + str(x+1) for x in range( len(price) )])
print(s)

a    143.73
b    145.83
c    143.68
d    144.02
e    143.50
f    142.62
dtype: float64
element1    143.73
element2    145.83
element3    143.68
element4    144.02
element5    143.50
element6    142.62
dtype: float64


Or we can change the indices of an existing Series:

In [139]:
s.index = [6,5,4,3,2,1] # Updates indexes (holding the initial values order)
s

6    143.73
5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
dtype: float64

A Series is like a list in that it can be sliced by position:

In [140]:
print(s[1:])
print(s[:-2]) # Same slicing syntax as for lists

5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
dtype: float64
6    143.73
5    145.83
4    143.68
3    144.02
dtype: float64


Series is also like a dictionary whose values can be set or fetched by index label:

In [141]:
print(s[4])
s[4] = 0
print(s)

# Edit:
# Changing the index labels and accessing the Series again by label
s.index = [ "element" + str(x+1) for x in range( len(price) )]
s["element5"] = 20
print(s)

143.68
6    143.73
5    145.83
4      0.00
3    144.02
2    143.50
1    142.62
dtype: float64
element1    143.73
element2    145.83
element3      0.00
element4    144.02
element5     20.00
element6    142.62
dtype: float64


Series can also have a name attribute, which will be used when we make up a Pandas DataFrame using several series.

In [142]:
s = pd.Series(price, name = 'Apple Price List') # Sets the name of the Series object (s)
print(s)
print(s.name) # Gets the "name" attribute set above

0    143.73
1    145.83
2    143.68
3    144.02
4    143.50
5    142.62
Name: Apple Price List, dtype: float64
Apple Price List


We can get the statistical summaries of a Series:

In [144]:
print(s.describe()) # Count, mean, std (sample form, divides by n-1), min, max and quartiles of the Series

count      6.000000
mean     143.896667
std        1.059711
min      142.620000
25%      143.545000
50%      143.705000
75%      143.947500
max      145.830000
Name: Apple Price List, dtype: float64


## Time Index
Pandas has a built-in function specifically for creating date indices: pd.date_range(). We use it to create a new index for our Series:

In [145]:
# Creates a DatetimeIndex with as many entries as s,
# starting on 2017-01-01 and advancing one day at a time
time_index = pd.date_range('2017-01-01',periods = len(s),freq = 'D')
print(time_index)
s.index = time_index # Replaces the index with the dates
print(s)

DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')
2017-01-01    143.73
2017-01-02    145.83
2017-01-03    143.68
2017-01-04    144.02
2017-01-05    143.50
2017-01-06    142.62
Freq: D, Name: Apple Price List, dtype: float64
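Daily calendar dates include weekends, when markets are closed. As a hedged sketch, the `freq` argument also accepts `'B'` for business days. Note that `'B'` only skips Saturdays and Sundays; it knows nothing about exchange holidays.

```python
import pandas as pd

# 2017-01-01 is a Sunday, so the range starts on Monday 2017-01-02
business_days = pd.date_range('2017-01-01', periods=6, freq='B')
print(business_days)

# weekday() is 0 for Monday ... 6 for Sunday
print([d.weekday() for d in business_days])
```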


Series elements are usually accessed using the iloc[] and loc[] indexers. iloc[] selects elements by integer position, and loc[] selects them by index label.

iloc[] is necessary when the index of a series consists of integers. Take our previously defined series as an example:

In [148]:
s.index = [6,5,4,3,2,1]
print(s)
print(s[1]) # Ambiguous: array position or series label? Here it is looked up as the label 1

# Edit:
# Getting the element by position instead of by label
print( s.iloc[1] ) # The element in position 1, whose label is 5

6    143.73
5    145.83
4    143.68
3    144.02
2    143.50
1    142.62
Name: Apple Price List, dtype: float64
142.62
145.83


If we intended to take the second element of the series, s[1] would be a mistake here: because the index consists of integers, s[1] looks up the label 1 and returns the last element. To access the element we want, we use iloc[]:

In [150]:
print(s.iloc[1]) # Accessing by position instead of by index label

145.83
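To make the difference between the two indexers explicit, here is a minimal side-by-side sketch on the same data (the variable names are ours). Note that a loc[] slice includes both end labels, while an iloc[] slice excludes the end position, like a list slice.

```python
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price, index=[6, 5, 4, 3, 2, 1])

by_label = s.loc[1]            # label 1 -> last element
by_position = s.iloc[1]        # position 1 -> second element

label_slice = s.loc[5:3]       # labels 5, 4 and 3 (both ends included)
position_slice = s.iloc[1:3]   # positions 1 and 2 (end excluded)

print(by_label, by_position)
print(label_slice)
print(position_slice)
```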


When working with time series data, we often use time as the index. Pandas provides us with various ways to access the data by time index:

In [152]:
s.index = time_index
print(s['2017-01-03']) # Accessing by date (note that a plain string is accepted)

143.68


We can even access a range of dates:

In [153]:
print(s['2017-01-02':'2017-01-05']) # Accessing elements by a range of dates (both ends included)

2017-01-02    145.83
2017-01-03    143.68
2017-01-04    144.02
2017-01-05    143.50
Freq: D, Name: Apple Price List, dtype: float64


Square-bracket indexing on a Series gives us a very flexible way to select data. We can put a condition inside the square brackets:

In [158]:
print(s[s < np.mean(s)] ) # Getting the elements whose value is less than the overall mean

# Without writing "s" before the condition, the expression only evaluates to a
# boolean Series that says whether each element meets the condition. In this case:
# values greater than the mean but less than 1.64 standard deviations above it
print([(s > np.mean(s)) & (s < np.mean(s) + 1.64*np.std(s))]) 

2017-01-01    143.73
2017-01-03    143.68
2017-01-05    143.50
2017-01-06    142.62
Name: Apple Price List, dtype: float64
[2017-01-01    False
2017-01-02    False
2017-01-03    False
2017-01-04     True
2017-01-05    False
2017-01-06    False
Freq: D, Name: Apple Price List, dtype: bool]


As demonstrated, we can use logical operators like & (and), | (or) and ~ (not) to group multiple conditions.
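As a compact illustration of the three operators on the same prices, consider the sketch below; the thresholds are arbitrary and chosen only for the example. Each comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `<` and `>`.

```python
import pandas as pd

s = pd.Series([143.73, 145.83, 143.68, 144.02, 143.5, 142.62],
              index=pd.date_range('2017-01-01', periods=6, freq='D'))

in_band = s[(s > 143.0) & (s < 144.0)]    # and: both conditions hold
extremes = s[(s < 143.0) | (s > 145.0)]   # or: at least one condition holds
not_above_mean = s[~(s > s.mean())]       # not: inverts the condition

print(in_band)
print(extremes)
print(not_above_mean)
```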

# Summary
Here we have introduced NumPy and Pandas for scientific computing in Python. In the next chapter, we will dive into Pandas to learn resampling and manipulating Pandas DataFrame, which are commonly used in financial data analysis.

<div align="center">
<img style="display: block; margin: auto;" alt="photo" src="https://cdn.quantconnect.com/web/i/icon.png"> <img style="display: block; margin: auto;" alt="photo" src="https://www.marketing-branding.com/wp-content/uploads/2020/07/google-colaboratory-colab-guia-completa.jpg " width="50" height="50">
<img style="display: block; margin: auto;" alt="photo" src="https://upload.wikimedia.org/wikipedia/commons/d/da/Yahoo_Finance_Logo_2019.svg" width="50" height="50">  

Quantconnect -> Google Colab with Yahoo Finance data

Introduction to Financial Python
</div>

# 05 Pandas-Resampling and DataFrame

# Introduction
In the last chapter we had a glimpse of Pandas. In this chapter we will learn about resampling methods and the DataFrame object, which is a powerful tool for financial data analysis.

# Fetching Data
Here we use Yahoo Finance, through the yfinance package, to retrieve data.


In [159]:
!pip install yfinance

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting yfinance
  Downloading yfinance-0.1.74-py2.py3-none-any.whl (27 kB)
Collecting requests>=2.26
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
Installing collected packages: requests, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
Successfully installed requests-2.28.1 yfinance-0.1.74


In [1]:
import yfinance as yf

aapl = yf.Ticker("AAPL")

# get stock info
print(aapl.info)

# get historical market data
aapl_table = aapl.history(start="2016-01-01",  end="2017-12-31")

# Display the table (the notebook renders the last expression of a cell)
aapl_table

{'zip': '95014', 'sector': 'Technology', 'fullTimeEmployees': 154000, 'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. In addition, the company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; AirPods Max, an over-ear wireless headphone; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, and iPod touch. Further, it provides AppleCare support services; cloud services store services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. Additionally, the company offers various services, such as Apple Arcade, a game subscription service; Apple Music, which offers users a curated listening experience with o

                 Open       High        Low      Close     Volume  Dividends  Stock Splits
Date
2016-01-04  23.523352  24.156083  23.383509  24.151497  270597600        0.0             0
2016-01-05  24.243193  24.266117  23.477498  23.546272  223164000        0.0             0
2016-01-06  23.053389  23.468333  22.895207  23.085484  273829600        0.0             0
2016-01-07  22.622398  22.954810  22.106586  22.111170  324377600        0.0             0
2016-01-08  22.592599  22.720978  22.182241  22.228090  283192000        0.0             0
...               ...        ...        ...        ...        ...        ...           ...
2017-12-22  41.594669  41.770878  41.551809  41.673248   65397600        0.0             0
2017-12-26  40.670768  40.830307  40.404072  40.616001  132742000        0.0             0
2017-12-27  40.504094  40.666013  40.411228  40.623154   85992800        0.0             0
2017-12-28  40.718392  40.920795  40.594569  40.737442   65920800        0.0             0


We will create a Series named "aapl" whose values are Apple's daily closing prices, which are of course indexed by dates:

In [4]:
# Selects the Close column of aapl_table and keeps only the rows dated 2017
aapl = aapl_table['Close']['2017']

In [3]:
print(aapl)

Date
2017-01-03    27.219833
2017-01-04    27.189371
2017-01-05    27.327633
2017-01-06    27.632292
2017-01-09    27.885389
                ...    
2017-12-22    41.673248
2017-12-26    40.616001
2017-12-27    40.623154
2017-12-28    40.737442
2017-12-29    40.296928
Name: Close, Length: 251, dtype: float64


Recall that we can fetch a specific data point using series['yyyy-mm-dd']. We can also fetch the data in a specific month using series['yyyy-mm'].

In [5]:
print(aapl['2017-3']) # Keeps the rows dated March 2017

Date
2017-03-01    32.901917
2017-03-02    32.706562
2017-03-03    32.899563
2017-03-06    32.796005
2017-03-07    32.838375
2017-03-08    32.715977
2017-03-09    32.640652
2017-03-10    32.748924
2017-03-13    32.763058
2017-03-14    32.713627
2017-03-15    33.059616
2017-03-16    33.113743
2017-03-17    32.948990
2017-03-20    33.294983
2017-03-21    32.913681
2017-03-22    33.285568
2017-03-23    33.167877
2017-03-24    33.101974
2017-03-27    33.158466
2017-03-28    33.845741
2017-03-29    33.921055
2017-03-30    33.876335
2017-03-31    33.812790
Name: Close, dtype: float64


In [6]:
aapl['2017-2':'2017-4'] # Keeps the rows dated from February to April 2017 (both months included)

Date
2017-02-01    30.172647
2017-02-02    30.121096
2017-02-03    30.249987
2017-02-06    30.533554
2017-02-07    30.824142
                ...    
2017-04-24    33.808079
2017-04-25    34.017555
2017-04-26    33.817493
2017-04-27    33.843388
2017-04-28    33.810432
Name: Close, Length: 61, dtype: float64

.head(N) and .tail(N) are methods for quickly accessing the first or last N elements.

In [7]:
print(aapl.head(5)) # First 5 elements
print(aapl.tail(10)) # Last 10 elements

Date
2017-01-03    27.219833
2017-01-04    27.189371
2017-01-05    27.327633
2017-01-06    27.632292
2017-01-09    27.885389
Name: Close, dtype: float64
Date
2017-12-15    41.425613
2017-12-18    42.009007
2017-12-19    41.561337
2017-12-20    41.516098
2017-12-21    41.673248
2017-12-22    41.673248
2017-12-26    40.616001
2017-12-27    40.623154
2017-12-28    40.737442
2017-12-29    40.296928
Name: Close, dtype: float64


# Resampling
**_series.resample(freq)_** returns a "DatetimeIndexResampler" object, which groups the data in a Series into regular time intervals. The argument "freq" determines the length of each interval.

**_series.resample(freq).mean()_** is a complete statement that groups the data into intervals and then computes the mean of each interval. For example, to aggregate the daily data into monthly data by mean:

In [8]:
by_month = aapl.resample('M').mean() # Groups the data by month and calculates the mean of each group
print(by_month)

Date
2017-01-31    28.021313
2017-02-28    31.430155
2017-03-31    33.096760
2017-04-30    33.630813
2017-05-31    35.924381
2017-06-30    34.938205
2017-07-31    35.048844
2017-08-31    37.686051
2017-09-30    37.395193
2017-10-31    37.444726
2017-11-30    41.004147
2017-12-31    40.930679
Freq: M, Name: Close, dtype: float64
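Resampling is essentially a group-by on time. The sketch below uses hypothetical, evenly spaced prices (not market data) to show that a monthly resample gives the same means as an explicit groupby on the calendar month. We use the `'MS'` alias here, which labels each bin by its month start instead of its month end; as far as we know, recent pandas releases also prefer `'ME'` over `'M'` for month-end bins.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices for the first quarter of 2017
index = pd.date_range('2017-01-01', '2017-03-31', freq='D')
prices = pd.Series(np.linspace(100.0, 110.0, len(index)), index=index)

# Time-based grouping with resample; bins are labelled by month start
by_resample = prices.resample('MS').mean()

# The same grouping written as a groupby on the calendar month
by_groupby = prices.groupby(prices.index.to_period('M')).mean()

print(by_resample)
print(by_groupby)
```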


We can also aggregate the data by week:

In [9]:
by_week = aapl.resample('W').mean() # Groups by week and calculates each group's mean
print(by_week.head())

Date
2017-01-08    27.342282
2017-01-15    27.941166
2017-01-22    28.108609
2017-01-29    28.394868
2017-02-05    29.497252
Freq: W-SUN, Name: Close, dtype: float64


We can also aggregate the data by month with max:

In [11]:
print(aapl.resample('M').max()) # Groups by month and calculates each group's maximum value

# Edit:
print(aapl.resample("Y").mean()) # Groups by year and calculates each group's mean

Date
2017-01-31    28.579067
2017-02-28    32.271130
2017-03-31    33.921055
2017-04-30    34.074047
2017-05-31    36.892399
2017-06-30    36.738781
2017-07-31    36.268475
2017-08-31    38.911682
2017-09-30    38.923546
2017-10-31    40.107498
2017-11-30    41.815823
2017-12-31    42.009007
Freq: M, Name: Close, dtype: float64
Date
2017-12-31    35.601321
Freq: A-DEC, Name: Close, dtype: float64


We can choose almost any frequency by using the format 'nf', where 'n' is an integer and 'f' is M for month, W for week and D for day.

In [15]:
# Each line groups the data into bins of the given length and calculates the mean of each bin:

three_day = aapl.resample('3D').mean() # 3-day bins
two_week = aapl.resample('2W').mean() # 2-week bins
two_month = aapl.resample('2M').mean() # 2-month bins
two_day = aapl.resample("2D").mean() # 2-day bins

print(three_day)
print(two_week)
print(two_month )
print( two_day )

Date
2017-01-03    27.245612
2017-01-06    27.632292
2017-01-09    27.954134
2017-01-12    27.921715
2017-01-15    28.122084
                ...    
2017-12-17    41.785172
2017-12-20    41.620865
2017-12-23          NaN
2017-12-26    40.658866
2017-12-29    40.296928
Freq: 3D, Name: Close, Length: 121, dtype: float64
Date
2017-01-08    27.342282
2017-01-22    28.015585
2017-02-05    28.946060
2017-02-19    31.341193
2017-03-05    32.413925
2017-03-19    32.833897
2017-04-02    33.437847
2017-04-16    33.661109
2017-04-30    33.603547
2017-05-14    35.498714
2017-05-28    36.292811
2017-06-11    36.311531
2017-06-25    34.336638
2017-07-09    34.074197
2017-07-23    35.082285
2017-08-06    36.070652
2017-08-20    37.692904
2017-09-03    38.244487
2017-09-17    38.069386
2017-10-01    36.635584
2017-10-15    36.865020
2017-10-29    37.546923
2017-11-12    40.803593
2017-11-26    40.974770
2017-12-10    40.639341
2017-12-24    41.388940
2018-01-07    40.568381
Freq: 2W-SUN, Name: Close, 
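The NaN in the 3-day output above comes from a bin that contains no trading days. A minimal sketch with hypothetical values on business days reproduces the effect: the 2-day bin that falls entirely on a weekend has a count of 0, so its mean is NaN.

```python
import pandas as pd

# Nine business days: Tue 2017-01-03 ... Fri 2017-01-13 (hypothetical values 1..9)
index = pd.bdate_range('2017-01-03', periods=9)
prices = pd.Series(range(1, 10), index=index, dtype=float)

two_day_mean = prices.resample('2D').mean()
two_day_count = prices.resample('2D').count()

print(two_day_mean)    # the bin starting on Saturday 2017-01-07 is NaN
print(two_day_count)   # ... because it contains 0 observations
```

Such empty bins can later be dropped or filled.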

Besides the mean() method, other methods can also be used with the resampler:



In [16]:
# The names below avoid shadowing the Python built-ins max() and min()
weekly_std = aapl.resample('W').std() # Groups elements by week and calculates their standard deviation
weekly_max = aapl.resample('W').max() # Groups elements by week and calculates their maximum value
weekly_min = aapl.resample('W').min() # Groups elements by week and calculates their minimum value

# Edit:
weekly_count = aapl.resample('W').count() # Groups elements by week and counts the elements in each group
print(weekly_std)
print(weekly_max)
print(weekly_min)
print(weekly_count)

Date
2017-01-08    0.202234
2017-01-15    0.072126
2017-01-22    0.025410
2017-01-29    0.243920
2017-02-05    0.938003
2017-02-12    0.250595
2017-02-19    0.230102
2017-02-26    0.059008
2017-03-05    0.338194
2017-03-12    0.075865
2017-03-19    0.176840
2017-03-26    0.156390
2017-04-02    0.318026
2017-04-09    0.127970
2017-04-16    0.211291
2017-04-23    0.173700
2017-04-30    0.089523
2017-05-07    0.234326
2017-05-14    0.351011
2017-05-21    0.533100
2017-05-28    0.060058
2017-06-04    0.279658
2017-06-11    0.616581
2017-06-18    0.380429
2017-06-25    0.128173
2017-07-02    0.262638
2017-07-09    0.158002
2017-07-16    0.402025
2017-07-23    0.124294
2017-07-30    0.382301
2017-08-06    0.919238
2017-08-13    0.464864
2017-08-20    0.432827
2017-08-27    0.274226
2017-09-03    0.250323
2017-09-10    0.379516
2017-09-17    0.292507
2017-09-24    0.731096
2017-10-01    0.352870
2017-10-08    0.203716
2017-10-15    0.118069
2017-10-22    0.514122
2017-10-29    0.676664
2017-1

Often we want to calculate the monthly returns of a stock based on the price on the last trading day of each month. To fetch those prices, we use the series.resample(freq).agg() method:

In [17]:
# Fetches the last element of each monthly group,
# in this case the closing price on the last trading day
# (.iloc[-1] selects by position)
last_day = aapl.resample('M').agg(lambda x: x.iloc[-1])
print(last_day)

Date
2017-01-31    28.438454
2017-02-28    32.242889
2017-03-31    33.812790
2017-04-30    33.810432
2017-05-31    36.103027
2017-06-30    34.037437
2017-07-31    35.150585
2017-08-31    38.911682
2017-09-30    36.567482
2017-10-31    40.107498
2017-11-30    40.920811
2017-12-31    40.296928
Freq: M, Name: Close, dtype: float64
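For this particular aggregation the resampler also offers a built-in `.last()` method. The sketch below, on hypothetical daily values grouped by week, suggests the two spellings agree when every bin contains data. For an empty bin `.last()` yields NaN, whereas positional indexing inside a lambda would fail on an empty group.

```python
import numpy as np
import pandas as pd

# Four full Monday-to-Sunday weeks of hypothetical daily values 1..28
index = pd.date_range('2017-01-02', periods=28, freq='D')
prices = pd.Series(np.arange(1.0, 29.0), index=index)

last_by_agg = prices.resample('W').agg(lambda x: x.iloc[-1])
last_builtin = prices.resample('W').last()

print(last_by_agg)
print(last_builtin)
```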


Or directly calculate the monthly rates of return using the data for the first day and the last day:

In [20]:
# Fetches the first and last prices of each month,
# then calculates the monthly rate of return: last / first - 1
monthly_return = aapl.resample('M').agg(lambda x: x.iloc[-1]/x.iloc[0] - 1)

print(monthly_return)

# Edit:
# Calculate the mean of the first, middle and last prices of each month
# (for an even number of days, the lower of the two middle positions is used)
montly_mean = aapl.resample('M').agg(lambda x: (
    x.iloc[0] + x.iloc[-1] + x.iloc[(len(x) - 1) // 2] ) / 3
  )

print(montly_mean)

Date
2017-01-31    0.044770
2017-02-28    0.068613
2017-03-31    0.027685
2017-04-30   -0.000348
2017-05-31    0.046463
2017-06-30   -0.059799
2017-07-31    0.036446
2017-08-31    0.097261
2017-09-30   -0.060531
2017-10-31    0.099018
2017-11-30    0.033422
2017-12-31   -0.010640
Freq: M, Name: Close, dtype: float64
Date
2017-01-31    27.926790
2017-02-28    31.398250
2017-03-31    33.276150
2017-04-30    33.671571
2017-05-31    35.800317
2017-06-30    34.780322
2017-07-31    34.803956
2017-08-31    37.520749
2017-09-30    37.808392
2017-10-31    38.178527
2017-11-30    40.259796
2017-12-31    40.678716
Freq: M, Name: Close, dtype: float64


The Series object also provides some convenient methods for quick calculations.

In [23]:
# Summary statistics of the series obtained above
print(monthly_return.mean())
print(monthly_return.std())
print(monthly_return.max())

# Edit:
print(montly_mean.mean())
print(montly_mean.std())
print(montly_mean.max())

0.02686335042097802
0.052258534895370336
0.09901819505353271
35.50862799750434
3.6880593044053307
40.67871602376302


Two other methods frequently used on Series are .diff() and .pct_change(). The former calculates the difference between consecutive elements, and the latter calculates the relative change, expressed as a fraction (0.05 means 5%) rather than a percentage.

In [None]:
print(last_day.diff()) # Difference between each element and the previous one
print(last_day.pct_change()) # That difference divided by the previous element

Date
2017-01-31         NaN
2017-02-28    3.825745
2017-03-31    1.578693
2017-04-30   -0.002365
2017-05-31    2.305443
2017-06-30   -2.077160
2017-07-31    1.119385
2017-08-31    3.782162
2017-09-30   -2.357327
2017-10-31    3.559849
2017-11-30    0.817856
2017-12-31   -0.627361
Freq: M, Name: Close, dtype: float64
Date
2017-01-31         NaN
2017-02-28    0.133778
2017-03-31    0.048690
2017-04-30   -0.000070
2017-05-31    0.067807
2017-06-30   -0.057214
2017-07-31    0.032704
2017-08-31    0.106999
2017-09-30   -0.060244
2017-10-31    0.096808
2017-11-30    0.020278
2017-12-31   -0.015246
Freq: M, Name: Close, dtype: float64
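The relationship between the two methods can be written out directly. As a small sketch with made-up prices, pct_change() equals diff() divided by the previous value, and the first element is NaN in both because it has no predecessor.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9])  # made-up prices

change = prices.diff()                    # P_t - P_(t-1)
relative = prices.pct_change()            # (P_t - P_(t-1)) / P_(t-1)
manual = prices.diff() / prices.shift(1)  # the same quantity, spelled out

print(change)
print(relative)
```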


Notice that we introduced a NaN value while calculating percentage changes, i.e. returns: the first element has no previous value to compare with.

When dealing with NaN values, we usually either remove the data point or fill it with a specific value. Here we fill it with 0:

In [24]:
daily_return = last_day.pct_change() # Month-over-month returns, despite the variable name
print(daily_return.fillna(0)) # Replaces the initial NaN with 0, i.e. treats the first month as having no change

Date
2017-01-31    0.000000
2017-02-28    0.133778
2017-03-31    0.048690
2017-04-30   -0.000070
2017-05-31    0.067807
2017-06-30   -0.057214
2017-07-31    0.032704
2017-08-31    0.107000
2017-09-30   -0.060244
2017-10-31    0.096808
2017-11-30    0.020278
2017-12-31   -0.015246
Freq: M, Name: Close, dtype: float64


Alternatively, we can fill a NaN with the next valid value. This is called 'backward fill', or 'bfill' for short:

In [25]:
daily_return = last_day.pct_change()
print(daily_return.bfill()) # Replaces the initial NaN with the next valid value (same result as fillna(method = 'bfill'))

Date
2017-01-31    0.133778
2017-02-28    0.133778
2017-03-31    0.048690
2017-04-30   -0.000070
2017-05-31    0.067807
2017-06-30   -0.057214
2017-07-31    0.032704
2017-08-31    0.107000
2017-09-30   -0.060244
2017-10-31    0.096808
2017-11-30    0.020278
2017-12-31   -0.015246
Freq: M, Name: Close, dtype: float64


As expected, since there is a 'backward fill' method, there is also a 'forward fill' method, or 'ffill' for short. However, we can't use it here because the NaN is the first value, so there is no earlier value to carry forward.

We can also simply remove NaN values with **_.dropna()_**:

In [26]:
daily_return = last_day.pct_change()
daily_return.dropna() # Returns a new Series without the NaN entries

Date
2017-02-28    0.133778
2017-03-31    0.048690
2017-04-30   -0.000070
2017-05-31    0.067807
2017-06-30   -0.057214
2017-07-31    0.032704
2017-08-31    0.107000
2017-09-30   -0.060244
2017-10-31    0.096808
2017-11-30    0.020278
2017-12-31   -0.015246
Freq: M, Name: Close, dtype: float64
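To see forward fill at work we need gaps that are not at the start. The following sketch uses a made-up return series with NaNs in the middle and at the end; it only illustrates the direction of each fill.

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.01, np.nan, np.nan, 0.03, np.nan])  # made-up values

forward = returns.ffill()   # carries the last valid value forward
backward = returns.bfill()  # pulls the next valid value backward

print(forward)
print(backward)  # the trailing NaN stays: there is no later value to use
```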

# DataFrame
The **DataFrame** is the most commonly used data structure in Pandas. It is essentially a table, just like an Excel spreadsheet.

More precisely, a DataFrame is a collection of Series objects, each of which may contain different data types. A DataFrame can be created from various data types: dictionary, 2-D numpy.ndarray, a Series or another DataFrame.

## Create DataFrames
The most common method of creating a DataFrame is passing a dictionary:

In [28]:
import pandas as pd

# Creates a DataFrame with three columns, indexed by consecutive daily dates
# (the dictionary is called price_dict so that it does not shadow the built-in dict)

price_dict = {'AAPL': [143.5, 144.09, 142.73, 144.18, 143.77],'GOOG':[898.7, 911.71, 906.69, 918.59, 926.99],
        'IBM':[155.58, 153.67, 152.36, 152.94, 153.49]}
data_index = pd.date_range('2017-07-03',periods = 5, freq = 'D')
df = pd.DataFrame(price_dict, index = data_index)
print(df)

              AAPL    GOOG     IBM
2017-07-03  143.50  898.70  155.58
2017-07-04  144.09  911.71  153.67
2017-07-05  142.73  906.69  152.36
2017-07-06  144.18  918.59  152.94
2017-07-07  143.77  926.99  153.49
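The other inputs mentioned above work in a similar way. As a sketch reusing a few of the same illustrative prices, we can build the table from a 2-D NumPy array plus column names, or from named Series, whose names then become the column labels.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2017-07-03', periods=3, freq='D')

# From a 2-D array: one row per date, one column per ticker
values = np.array([[143.50, 898.70],
                   [144.09, 911.71],
                   [142.73, 906.69]])
from_array = pd.DataFrame(values, index=index, columns=['AAPL', 'GOOG'])

# From named Series: each Series becomes one column
aapl_series = pd.Series([143.50, 144.09, 142.73], index=index, name='AAPL')
goog_series = pd.Series([898.70, 911.71, 906.69], index=index, name='GOOG')
from_series = pd.DataFrame({s.name: s for s in (aapl_series, goog_series)})

print(from_array)
print(from_series)
```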


## Manipulating DataFrames
We can fetch values in a DataFrame by columns and index. Each column in a DataFrame is essentially a Pandas Series. We can fetch a column by square brackets: **df['column_name']**

If a column name is a valid Python identifier (for example, it contains no spaces) and does not clash with an existing DataFrame attribute, then we can also use df.column_name to fetch a column:

In [29]:
df = aapl_table
print(df.Close.tail(5)) # Selects the Close column and fetches its last 5 elements
print(df['Volume'].tail(5)) # Selects the Volume column and fetches its last 5 elements

Date
2017-12-22    41.673248
2017-12-26    40.616001
2017-12-27    40.623154
2017-12-28    40.737442
2017-12-29    40.296928
Name: Close, dtype: float64
Date
2017-12-22     65397600
2017-12-26    132742000
2017-12-27     85992800
2017-12-28     65920800
2017-12-29    103999600
Name: Volume, dtype: int64


All the methods we applied to a Series, such as iloc[], loc[] and the resampling methods, can also be applied to a DataFrame:

In [32]:
aapl_2016 = df.loc['2016'] # Selects the rows whose date index falls in 2016 (.loc makes the row selection explicit)
aapl_month = aapl_2016.resample('M').agg(lambda x: x.iloc[-1]) # Keeps the last day's data of each monthly group
print(aapl_month)

# Edit:
aapl_year = aapl_2016.resample("3M").agg(lambda x: x.iloc[-1]) # Keeps the last day's data of each three-month group
print(aapl_year)

                 Open       High        Low      Close     Volume  Dividends  \
Date                                                                           
2016-01-31  21.730612  22.315199  21.629742  22.315199  257666000        0.0   
2016-02-29  22.325658  22.641435  22.277254  22.286474  140865200        0.0   
2016-03-31  25.289805  25.331294  25.096189  25.121544  103553600        0.0   
2016-04-30  21.664137  21.832399  21.323007  21.606514  274126000        0.0   
2016-05-31  23.096979  23.282497  22.916099  23.157272  169228800        0.0   
2016-06-30  21.900393  22.208816  21.867928  22.169394  143345600        0.0   
2016-07-31  24.161388  24.244871  24.043120  24.166025  110934800        0.0   
2016-08-31  24.635014  24.847183  24.630350  24.737600  118649600        0.0   
2016-09-30  26.220463  26.432634  26.066582  26.358025  145516400        0.0   
2016-10-31  26.497912  26.633142  26.392992  26.472265  105677600        0.0   
2016-11-30  26.153539  26.294149  25.841

We may select certain columns of a DataFrame using their names:

In [33]:
aapl_bar = aapl_month[['Open', 'High', 'Low', 'Close']] # Select a group of specific columns from a dataframe
print(aapl_bar)

                 Open       High        Low      Close
Date                                                  
2016-01-31  21.730612  22.315199  21.629742  22.315199
2016-02-29  22.325658  22.641435  22.277254  22.286474
2016-03-31  25.289805  25.331294  25.096189  25.121544
2016-04-30  21.664137  21.832399  21.323007  21.606514
2016-05-31  23.096979  23.282497  22.916099  23.157272
2016-06-30  21.900393  22.208816  21.867928  22.169394
2016-07-31  24.161388  24.244871  24.043120  24.166025
2016-08-31  24.635014  24.847183  24.630350  24.737600
2016-09-30  26.220463  26.432634  26.066582  26.358025
2016-10-31  26.497912  26.633142  26.392992  26.472265
2016-11-30  26.153539  26.294149  25.841853  25.900440
2016-12-31  27.337010  27.465901  27.051101  27.142498


We can even specify both rows and columns using loc[]. The row indices and column names are separated by a comma:

In [34]:
# Selects the rows whose index is between 2016-03 and 2016-06, and only the listed columns
print(aapl_month.loc['2016-03':'2016-06',['Open', 'High', 'Low', 'Close']])

                 Open       High        Low      Close
Date                                                  
2016-03-31  25.289805  25.331294  25.096189  25.121544
2016-04-30  21.664137  21.832399  21.323007  21.606514
2016-05-31  23.096979  23.282497  22.916099  23.157272
2016-06-30  21.900393  22.208816  21.867928  22.169394
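The same selection can be expressed by position with iloc[]. The sketch below uses a small hypothetical table (values rounded for illustration) and shows that both spellings return the same block. As with Series, loc[] slices include the end label while iloc[] slices exclude the end position.

```python
import pandas as pd

# Hypothetical month-end table
df = pd.DataFrame(
    {'Open': [21.7, 22.3, 25.3, 21.7],
     'Close': [22.3, 22.3, 25.1, 21.6],
     'Volume': [100, 200, 300, 400]},
    index=pd.to_datetime(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30']),
)

by_label = df.loc['2016-02':'2016-03', ['Open', 'Close']]  # labels, end included
by_position = df.iloc[1:3, [0, 1]]                         # positions, end excluded

print(by_label)
print(by_position)
```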


The subsetting methods of a DataFrame are quite useful. By writing logical statements in square brackets, we can make customized subsets:

In [36]:
import numpy as np

# Selects the rows (from the column-reduced, month-end table built previously) whose
# Close value is greater than the mean of that column over the whole table
above = aapl_bar[aapl_bar.Close > np.mean(aapl_bar.Close)]
print(above)

                 Open       High        Low      Close
Date                                                  
2016-03-31  25.289805  25.331294  25.096189  25.121544
2016-08-31  24.635014  24.847183  24.630350  24.737600
2016-09-30  26.220463  26.432634  26.066582  26.358025
2016-10-31  26.497912  26.633142  26.392992  26.472265
2016-11-30  26.153539  26.294149  25.841853  25.900440
2016-12-31  27.337010  27.465901  27.051101  27.142498


## Data Validation
As mentioned, all methods that apply to a Series can also be applied to a DataFrame. Here we add a new column to an existing DataFrame:

In [38]:
# Adds a rate_return column computed from the Close column:
# the relative change of each row's Close with respect to the previous row's Close
aapl_bar['rate_return'] = aapl_bar.Close.pct_change()
print(aapl_bar)

                 Open       High        Low      Close  rate_return
Date                                                               
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955


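A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

This message is a Pandas warning, not an error: aapl_bar was created by selecting columns from aapl_month, so Pandas warns that the assignment may be acting on a copy and might not propagate to the original table. Here the new column is added to aapl_bar as intended.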
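One common way to avoid this warning is to take an explicit copy when creating the subset. The sketch below uses a small hypothetical table and our own names (`monthly`, `bars`); it is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical month-end table
monthly = pd.DataFrame({'Open': [21.7, 22.3, 25.3],
                        'Close': [22.3, 22.3, 25.1],
                        'Volume': [100, 200, 300]})

# .copy() makes `bars` an independent DataFrame, so adding a column is unambiguous
bars = monthly[['Open', 'Close']].copy()
bars['rate_return'] = bars['Close'].pct_change()

print(bars)
print(monthly.columns.tolist())  # the original table is unchanged
```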


Here the calculation introduced a NaN value. If the DataFrame were large, we would not be able to spot it by eye. **isnull()** provides a convenient way to check for missing values.

In [39]:
missing = aapl_bar.isnull() # Evaluates every row and every field checking out if there is a null value
print(missing)
print('---------------------------------------------')
print(missing.describe()) # Shows statistics about the given table

             Open   High    Low  Close  rate_return
Date                                               
2016-01-31  False  False  False  False         True
2016-02-29  False  False  False  False        False
2016-03-31  False  False  False  False        False
2016-04-30  False  False  False  False        False
2016-05-31  False  False  False  False        False
2016-06-30  False  False  False  False        False
2016-07-31  False  False  False  False        False
2016-08-31  False  False  False  False        False
2016-09-30  False  False  False  False        False
2016-10-31  False  False  False  False        False
2016-11-30  False  False  False  False        False
2016-12-31  False  False  False  False        False
---------------------------------------------
         Open   High    Low  Close rate_return
count      12     12     12     12          12
unique      1      1      1      1           2
top     False  False  False  False       False
freq       12     12     12     12    

The row labelled "unique" indicates the number of unique values in each column. Since the "rate_return" column has 2 unique values, it has at least one missing value.

We can deduce the number of missing values by comparing "count" with "freq". There are 12 counts and 11 False values, so there is one True value which corresponds to the missing value.

We can also find the rows with missing values easily:

In [40]:
# Keep only the rows whose rate_return value is null
# (the column is already boolean, so it can be used directly as a mask)
print(missing[missing.rate_return])

             Open   High    Low  Close  rate_return
Date                                               
2016-01-31  False  False  False  False         True


Usually when dealing with missing data, we either delete the whole row or fill the gap with some value. As we introduced in the Series chapter, the same methods **dropna()** and **fillna()** can be applied to a DataFrame.

In [41]:
drop = aapl_bar.dropna() # Drops every row that contains at least one null value
print(drop)
print('\n--------------------------------------------------\n')
fill = aapl_bar.fillna(0) # Replaces every null value with 0
print(fill)

                 Open       High        Low      Close  rate_return
Date                                                               
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955

--------------------------------------------------

                 Open       High        Low      Close  rate_re
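Dropping every incomplete row or filling every gap with 0 is not always what we want. As a hedged sketch on a hypothetical `demo` table (not the tutorial's data), the `subset` argument of dropna() restricts the check to chosen columns, and ffill() carries the last known value forward:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column
demo = pd.DataFrame(
    {"Close": [22.3, np.nan, 25.1], "rate_return": [np.nan, -0.0045, 0.1306]},
    index=pd.to_datetime(["2016-01-31", "2016-02-29", "2016-03-31"]),
)

# Drop a row only when 'rate_return' is missing; a NaN in 'Close' is tolerated
dropped = demo.dropna(subset=["rate_return"])

# Forward fill: each gap takes the most recent earlier value in its column.
# A gap in the first row stays NaN because nothing precedes it.
filled = demo.ffill()

print(dropped)
print(filled)
```

Forward filling is often more natural than a constant for price series, while a return of 0 can be a reasonable placeholder for a missing first return. Which one is appropriate depends on the analysis.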

## DataFrame Concat
We have seen how to extract a Series from a DataFrame. Now we need to consider how to merge a Series or a DataFrame into another one.

In Pandas, the function **concat()** allows us to merge multiple Series into a DataFrame:

In [45]:
s1 = pd.Series([143.5, 144.09, 142.73, 144.18, 143.77], name = 'AAPL')
s2 = pd.Series([898.7, 911.71, 906.69, 918.59, 926.99], name = 'GOOG')
# A third Series holding the squares 0, 1, 4, 9, 16
s3 = pd.Series([x**2 for x in range(5)], name = 'TEST')

# Joins the three Series column-wise into a single DataFrame;
# each Series name becomes a column name
data_frame = pd.concat([s1, s2, s3], axis = 1)
print(data_frame)

     AAPL    GOOG  TEST
0  143.50  898.70     0
1  144.09  911.71     1
2  142.73  906.69     4
3  144.18  918.59     9
4  143.77  926.99    16
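The three Series above share the same default index, so their rows line up one to one. When the indexes differ, concat() aligns the values on the index labels and fills the gaps with NaN. A minimal sketch with two hypothetical Series `a` and `b`:

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2], name="A")
b = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3], name="B")

# Values are matched by index label, not by position
aligned = pd.concat([a, b], axis=1)
print(aligned)
```

Label 0 exists only in `a` and label 3 only in `b`, so each of those rows holds one NaN.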


The "axis = 1" parameter joins the objects by columns. The same call can attach a Series to a DataFrame as a new column:

In [47]:
log_price = np.log(aapl_bar.Close) # Natural logarithm of each element in the Close column of aapl_bar
log_price.name = 'log_price'
print(log_price)
print('\n---------------------- separate line--------------------\n')
concat = pd.concat([aapl_bar, log_price], axis = 1) # Returns a new DataFrame with log_price as an extra column
print(concat)

Date
2016-01-31    3.105268
2016-02-29    3.103980
2016-03-31    3.223726
2016-04-30    3.072995
2016-05-31    3.142309
2016-06-30    3.098713
2016-07-31    3.184948
2016-08-31    3.208324
2016-09-30    3.271773
2016-10-31    3.276098
2016-11-30    3.254260
2016-12-31    3.301101
Freq: M, Name: log_price, dtype: float64

---------------------- separate line--------------------

                 Open       High        Low      Close  rate_return  log_price
Date                                                                          
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN   3.105268
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287   3.103980
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210   3.223726
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921   3.072995
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773   3.142309
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.04266

We can also join two DataFrames by rows. Consider these two DataFrames:

In [48]:
# Selects the rows dated from 2016-10 through 2017-04 and the Volume and Stock Splits columns,
# then resamples to monthly frequency, keeping the last record of each month.
# (pandas >= 2.2 prefers the alias 'ME' over 'M' for month-end frequency)
df_volume = aapl_table.loc['2016-10':'2017-04',['Volume', 'Stock Splits']].resample('M').agg(lambda x: x.iloc[-1])
print(df_volume)
print('\n---------------------- separate line--------------------\n')

# Same date range and monthly resampling, this time for the Open, High, Low and Close columns
df_2017 = aapl_table.loc['2016-10':'2017-04',['Open', 'High', 'Low', 'Close']].resample('M').agg(lambda x: x.iloc[-1])
print(df_2017)

               Volume  Stock Splits
Date                               
2016-10-31  105677600             0
2016-11-30  144649200             0
2016-12-31  122345200             0
2017-01-31  196804000             0
2017-02-28   93931600             0
2017-03-31   78646800             0
2017-04-30   83441600             0

---------------------- separate line--------------------

                 Open       High        Low      Close
Date                                                  
2016-10-31  26.497912  26.633142  26.392992  26.472265
2016-11-30  26.153539  26.294149  25.841853  25.900440
2016-12-31  27.337010  27.465901  27.051101  27.142498
2017-01-31  28.391584  28.447828  28.267379  28.438454
2017-02-28  32.264072  32.348804  32.174631  32.242889
2017-03-31  33.826911  33.956364  33.659799  33.812790
2017-04-30  33.913994  33.963423  33.720996  33.810432


Now we merge 'df_volume' with our DataFrame 'aapl_bar' by columns:

In [50]:
# Concatenates by columns, matching rows on their index (the date).
# Dates present in only one DataFrame are kept, with NaN in the other's columns
concat = pd.concat([aapl_bar, df_volume], axis = 1)
print(concat)

                 Open       High        Low      Close  rate_return  \
Date                                                                  
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN   
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287   
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210   
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921   
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773   
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660   
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063   
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652   
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505   
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334   
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601   
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955   
2017-0

By default the DataFrames are joined on the union of their indexes (an 'outer join'), so no information is lost. We can also merge them on the intersection of their indexes, which is called an 'inner join':

In [51]:
# Same operation as above, but keeps only the rows
# whose index appears in both DataFrames
concat = pd.concat([aapl_bar,df_volume],axis = 1, join = 'inner')
print(concat)

                 Open       High        Low      Close  rate_return  \
Date                                                                  
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334   
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601   
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955   

               Volume  Stock Splits  
Date                                 
2016-10-31  105677600             0  
2016-11-30  144649200             0  
2016-12-31  122345200             0  
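To see both join modes side by side, here is a small self-contained sketch. The `prices` and `volume` tables and their values are hypothetical, chosen only so that the two date ranges partly overlap.

```python
import pandas as pd

# Hypothetical tables whose dates overlap on 2016-11-30 and 2016-12-31 only
prices = pd.DataFrame(
    {"Close": [26.5, 25.9, 27.1]},
    index=pd.to_datetime(["2016-10-31", "2016-11-30", "2016-12-31"]),
)
volume = pd.DataFrame(
    {"Volume": [144, 122, 197]},
    index=pd.to_datetime(["2016-11-30", "2016-12-31", "2017-01-31"]),
)

outer = pd.concat([prices, volume], axis=1)                # join='outer' is the default
inner = pd.concat([prices, volume], axis=1, join="inner")  # intersection of the indexes

print(outer)
print(inner)
```

The outer result keeps all four dates and marks the two unmatched cells with NaN, while the inner result keeps only the two shared dates and has no gaps.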


With the 'inner join' method, only the rows whose index appears in both DataFrames are kept. Now let's try to append a DataFrame to another one:

In [53]:
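# Appends the rows of df_2017 to the end of aapl_bar and returns a new DataFrame
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0;
# on newer versions use pd.concat([aapl_bar, df_2017]) instead
append = aapl_bar.append(df_2017)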
print(append)

                 Open       High        Low      Close  rate_return
Date                                                               
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955
2016-10-31  26.497912  26.633142  26.392992  26.

'Append' is essentially a concatenation of two DataFrames along axis = 0. Since DataFrame.append was removed in pandas 2.0, **concat()** is also the form that works on current versions:

In [54]:
# With axis = 0 the rows are stacked, which reproduces the result of the append method
concat = pd.concat([aapl_bar, df_2017], axis = 0)
print(concat)

                 Open       High        Low      Close  rate_return
Date                                                               
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334
2016-11-30  26.153539  26.294149  25.841853  25.900440    -0.021601
2016-12-31  27.337010  27.465901  27.051101  27.142498     0.047955
2016-10-31  26.497912  26.633142  26.392992  26.
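Note that stacking rows does not check for repeated index labels: in the output above, the date 2016-10-31 shows up a second time because it exists in both DataFrames. The following sketch, on hypothetical `first` and `second` tables, shows one possible way to detect and drop such repeats. Keeping the first occurrence is an assumption here, not a rule.

```python
import pandas as pd

# Hypothetical tables that share the date 2016-11-30
first = pd.DataFrame(
    {"Close": [26.5, 25.9]},
    index=pd.to_datetime(["2016-10-31", "2016-11-30"]),
)
second = pd.DataFrame(
    {"Close": [25.9, 27.1]},
    index=pd.to_datetime(["2016-11-30", "2016-12-31"]),
)

stacked = pd.concat([first, second], axis=0)
print(stacked.index.is_unique)   # False when a date is repeated

# index.duplicated() flags every repeat after the first occurrence;
# negating the mask keeps one row per date
deduped = stacked[~stacked.index.duplicated(keep="first")]
print(deduped)
```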

Please note that if the two DataFrames have columns with the same names, these columns are considered to be the same and will be merged. It is therefore very important to have the right column names. Let's see what happens if we change one of the column names:

In [56]:
# Renames the 'Open' column of df_2017 to 'Change'.
# concat now treats 'Change' as a new column and fills the cells that have no data with NaN
df_2017.columns = ['Change', 'High','Low','Close']
concat = pd.concat([aapl_bar, df_2017], axis = 0)
print(concat)

                 Open       High        Low      Close  rate_return     Change
Date                                                                          
2016-01-31  21.730612  22.315199  21.629742  22.315199          NaN        NaN
2016-02-29  22.325658  22.641435  22.277254  22.286474    -0.001287        NaN
2016-03-31  25.289805  25.331294  25.096189  25.121544     0.127210        NaN
2016-04-30  21.664137  21.832399  21.323007  21.606514    -0.139921        NaN
2016-05-31  23.096979  23.282497  22.916099  23.157272     0.071773        NaN
2016-06-30  21.900393  22.208816  21.867928  22.169394    -0.042660        NaN
2016-07-31  24.161388  24.244871  24.043120  24.166025     0.090063        NaN
2016-08-31  24.635014  24.847183  24.630350  24.737600     0.023652        NaN
2016-09-30  26.220463  26.432634  26.066582  26.358025     0.065505        NaN
2016-10-31  26.497912  26.633142  26.392992  26.472265     0.004334        NaN
2016-11-30  26.153539  26.294149  25.841853  25.9004

Since the column name 'Open' has been changed in 'df_2017', the new DataFrame has a new column named 'Change'. The rows that come from 'aapl_bar' have NaN in that column.
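When a mismatch like this is accidental, one option is to bring the column names back in line with rename() before concatenating. The sketch below uses hypothetical one-row tables `base` and `extra` rather than the tutorial's data:

```python
import pandas as pd

# Hypothetical tables: 'extra' calls its opening price 'Change' instead of 'Open'
base = pd.DataFrame(
    {"Open": [26.5], "Close": [26.4]},
    index=pd.to_datetime(["2016-12-31"]),
)
extra = pd.DataFrame(
    {"Change": [28.4], "Close": [28.5]},
    index=pd.to_datetime(["2017-01-31"]),
)

# Mismatched names: 'Change' becomes a third column and NaN fills the gaps
mismatched = pd.concat([base, extra], axis=0)

# rename() returns a new DataFrame with the column relabelled, so the names match again
extra_fixed = extra.rename(columns={"Change": "Open"})
matched = pd.concat([base, extra_fixed], axis=0)

print(mismatched)
print(matched)
```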

# Summary

Here we introduced the most important parts of pandas for our purposes: resampling and DataFrame manipulation. We only covered the methods most commonly used in financial data analysis. There are many other methods, used for example in data mining, which are also beneficial. You can always check the official [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) documentation for help.