# Python For Traders and Investors

Welcome to *Python for Traders and Investors*. This course aims to optimise the learning of programming and quantitative skills specifically for traders/investors without getting too deep into abstract computer science concepts. That said, we will learn some advanced programming techniques as we step through many practical examples. 

This is the first chapter of this course. The aim of the course is to learn the programming and numerical skills necessary for quantitative trading using the Python programming language. It is structured in a way that it walks through common problems in quantitative finance and teaches the required skills to solve them as we go along. You will learn a wide range of programming techniques in conjunction with practical finance examples. We try to stay away from theoretical and abstract concepts and adopt a learning-by-doing approach. There are exercises which help to consolidate what has been taught. 

The course initially starts with basic concepts and gradually becomes more complex. In order to get the most out of it there is no substitute for coming up with your own ideas and projects and applying what you've learned rather than trying to memorise the ideas presented here. 

Some of the concepts in this course may initially seem hard to grasp. With some practice they usually become easier very quickly. Due to time limitations the learning curve will be steep, but these notebooks can always be used as a reference for solving new problems.

As traders/investors, whenever we do anything quantitative we deal with data. Data go into the computer, get processed and something comes back, usually also in the form of data. Normally, data exist as files on a hard drive, but they can also live more ephemerally in your computer's memory. For now, we only deal with files. Let's download some data from a free data source, Yahoo Finance.

Python works with packages, so first we need to import a package that connects us to Yahoo. Packages are pieces of software that someone else has written for us and we can just use them.

To execute a cell, press Shift and Enter at the same time.

## 2.1 Download Data from APIs

In [43]:
import yfinance as yf

If the yfinance package is not installed, this cell fails with an error message. You will see a lot of those as you continue to do any kind of programming. But more about this later.
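One common way to deal with a missing package is to catch the import error and print a hint. The sketch below assumes that the package can be installed with pip under the name `yfinance`; the variable `have_yfinance` and the message text are only illustrative.

```python
# Try to import yfinance; if it is missing, explain how to install it.
try:
    import yfinance as yf
    have_yfinance = True
except ImportError:
    yf = None
    have_yfinance = False
    print("yfinance is not installed. Try running: pip install yfinance")
```

If the message appears, install the package and run the import cell again.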

In [44]:
mydata = yf.download("AAPL")

[*********************100%***********************]  1 of 1 completed


In [3]:
# We called our output mydata and whatever comes out of yf.download() is stored here.
# We can just inspect the data by calling the name

In [45]:
mydata

Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.128348,0.128906,0.128348,0.128348,0.100751,469033600
1980-12-15,0.122210,0.122210,0.121652,0.121652,0.095495,175884800
1980-12-16,0.113281,0.113281,0.112723,0.112723,0.088485,105728000
1980-12-17,0.115513,0.116071,0.115513,0.115513,0.090676,86441600
1980-12-18,0.118862,0.119420,0.118862,0.118862,0.093304,73449600
...,...,...,...,...,...,...
2021-07-09,142.750000,145.649994,142.649994,145.110001,145.110001,99788400
2021-07-12,146.210007,146.320007,144.000000,144.500000,144.500000,76299700
2021-07-13,144.029999,147.460007,143.630005,145.639999,145.639999,100698900
2021-07-14,148.100006,149.570007,147.679993,149.149994,149.149994,127050800


This is a nice table of data and we could already start to do some work with this. But let's first explore some more data handling, as it is really important going forward.

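We used the download() function of the yfinance package and we would like to know what it does. In Jupyter, we can look inside any object by putting ?? in front of its name. Let's try this first on the built-in print function and then on the yfinance package itself: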

In [46]:
import pandas as pd

In [47]:
??print

Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method


In [48]:
??yf

Type:        module
String form: <module 'yfinance' from 'd:\\encore-4team\\step3 데이터 전처리\\project-yfinance\\venv\\lib\\site-packages\\yfinance\\__init__.py'>
File:        d:\encore-4team\step3 데이터 전처리\project-yfinance\venv\lib\site-packages\yfinance\__init__.py
Source:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Yahoo! Finance market data downloader (+fix for Pandas Datareader)
# https://github.com/ranaroussi/yfinance
#
# Copyright 2017-2019 Ran Aroussi
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses

This is some arcane code that we might not understand yet, but don't worry. This is the beauty of packages: we don't need to know what's running under the hood. All we need to know is what a package produces and what we can do with it. 

Now that we have *mydata*, let's do something with it just out of curiosity.

In [49]:
# This tells us what kind of structure mydata is based on. 
type(mydata)

pandas.core.frame.DataFrame

In [50]:
# This gives us the length of mydata.
len(mydata)

10234

In [51]:
# This shows us the names of the columns.
list(mydata)

['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
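To try these inspection commands without downloading anything, the sketch below builds a tiny DataFrame by hand. The three rows are copied from the table above; the name `small` is only for illustration.

```python
import pandas as pd

# Three rows copied from the AAPL table above (Open, High, Low, Close only).
small = pd.DataFrame(
    {
        "Open": [0.128348, 0.122210, 0.113281],
        "High": [0.128906, 0.122210, 0.113281],
        "Low": [0.128348, 0.121652, 0.112723],
        "Close": [0.128348, 0.121652, 0.112723],
    },
    index=pd.to_datetime(["1980-12-12", "1980-12-15", "1980-12-16"]),
)

print(type(small))   # the kind of structure
print(len(small))    # the number of rows
print(list(small))   # the column names
print(small.shape)   # (rows, columns)
```

Note that `len()` counts rows while `list()` returns column names; `shape` reports both dimensions at once.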

Unless we are computer scientists, writing programs is, to a large extent, about finding the right packages and understanding how to use them. 

## 2.2 Reading Data From Files

Right now, our output is stored in the computer's memory. When we shut down this notebook we will lose the data. To keep the data we need to write them to a file, and read them back from that file when we want to use them again. Let's now learn how to use files. 

First, let's produce a data file from our output. The most common data format is the csv file; csv stands for *comma-separated values* and that's exactly what it is. 

Lucky for us, someone has already written a function that writes our pandas output to a csv file. That function is automagically attached to any pandas structure, so we can use it directly.

In [16]:
mydata.to_csv('output.csv')

Most data providers such as Reuters or Bloomberg provide their data in this format, so it is important that we understand it. It is quite simple: the first row gives us the names of the columns and the following rows hold the values for each column. You will notice that every row has the same number of values, even though the rows differ in length. 

Let's say we start another notebook and we want to read a data file stored on our computer. If we know that the file was written by pandas, it is really easy to bring it back:

In [17]:
import pandas as pd

myinput = pd.read_csv('output.csv')

In [18]:
myinput.head()

,Date,Close,Open,High,Low,Volume,Change
0,1980-12-12,0.13,0.13,0.13,0.13,469030000.0,-0.9988
1,1980-12-15,0.12,0.12,0.12,0.12,175880000.0,-0.0769
2,1980-12-16,0.11,0.11,0.11,0.11,105730000.0,-0.0833
3,1980-12-17,0.12,0.12,0.12,0.12,86440000.0,0.0909
4,1980-12-18,0.12,0.12,0.12,0.12,73450000.0,0.0
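By default `pd.read_csv` treats the date column as ordinary text and adds a fresh integer index (the 0, 1, 2, ... on the left). A minimal sketch of a round trip that restores the dates as the index is shown below. It uses a small hand-made table and a hypothetical file name, `roundtrip_example.csv`, written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# A tiny table with a date index, similar in shape to the downloaded data.
frame = pd.DataFrame(
    {"Close": [0.13, 0.12, 0.11],
     "Volume": [469030000.0, 175880000.0, 105730000.0]},
    index=pd.to_datetime(["1980-12-12", "1980-12-15", "1980-12-16"]),
)
frame.index.name = "Date"

# Write to a temporary folder so that no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "roundtrip_example.csv")
frame.to_csv(path)

# index_col and parse_dates turn the Date column back into a date index.
restored = pd.read_csv(path, index_col="Date", parse_dates=True)
print(restored)
```

With the dates restored as the index, the table read from disk has the same layout as the one that was written.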


Unfortunately, not all files are in this nice, easy-to-use format. Many of them have blank lines, missing data or bad symbols because they were, for example, produced by a different operating system. Somehow, if we want the data, we have to deal with this situation. 

Next, you will learn how to read text files of any type, so you can use them for your trading. 

Let's start with the file that we already have, *output.csv*. If we want to read the file in its raw format, Python wants us to open it first.

In [19]:
fid = open('output.csv','r')

In [20]:
fid.readline()

'Date,Close,Open,High,Low,Volume,Change\n'

In [21]:
fid.readline()

'1980-12-12,0.13,0.13,0.13,0.13,469030000.0,-0.9987999999999999\n'

The open() call does nothing other than tell Python that the file is ready to be read. Each call to readline() then returns the next line of the file. 

One of the problems we often encounter with market data is that the files are huge. When we open a file with the pandas csv reader, it loads the whole file into memory at once and that can seriously slow down our computer. Apart from missing data, blank lines and so on, this is another case where we have to read the file in a different way.

Once the file is open, we can keep reading from it one line at a time.

In [22]:
fid.readline()

'1980-12-15,0.12,0.12,0.12,0.12,175880000.0,-0.07690000000000001\n'

Notice the \n at the end. This is the newline character, which marks the end of a line in the file. The quotation marks indicate that the line is a so-called *string*, a data type which we will learn more about below.
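The difference between the stored characters and what appears on screen can be seen with a short experiment. The string `text` below is made up for the illustration.

```python
# A made-up string containing one newline character in the middle.
text = "AAPL\nMSFT"

print(repr(text))   # shows the raw characters, including \n
print(text)         # the newline moves the output to the next line
print(len(text))    # \n counts as a single character
```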

Let's read another line:

In [23]:
fid.readline()

'1980-12-16,0.11,0.11,0.11,0.11,105730000.0,-0.0833\n'

If we want to read a whole lot of lines, we can use a for loop. This is a command that repeats the same task a specified number of times:

In [24]:
for i in range(5):
    print(fid.readline())  

1980-12-17,0.12,0.12,0.12,0.12,86440000.0,0.0909

1980-12-18,0.12,0.12,0.12,0.12,73450000.0,0.0

1980-12-19,0.13,0.13,0.13,0.13,48630000.0,0.0833

1980-12-22,0.13,0.13,0.13,0.13,37360000.0,0.0

1980-12-23,0.14,0.14,0.14,0.14,46950000.0,0.07690000000000001



There are a few things happening here. First, we have a for loop. Strictly speaking, it does not repeat a task a given number of times; it runs through a specified number of items and, for each item, performs the task indented underneath. In our case, we specify our items with range(5). Let's see what that does on its own. We call each item of range(5) **i** and print each **i** to see what's in there.

In [25]:
for i in range(5):
    print(i)

0
1
2
3
4


It's simply the numbers from 0 to 4: range(5) produces five numbers and stops before 5. Counting in Python starts at zero, as it does in most programming languages. If you have previously used Matlab (where counting starts at 1), this might be a bit confusing at first.
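`range` also accepts a start value and a step. A few variations follow, wrapped in `list()` so that all the numbers are displayed at once; the variable names are only for illustration.

```python
first_five = list(range(5))        # start defaults to 0, stop is excluded
one_to_five = list(range(1, 6))    # explicit start and stop
even_numbers = list(range(0, 10, 2))  # start, stop and a step of 2

print(first_five)     # [0, 1, 2, 3, 4]
print(one_to_five)    # [1, 2, 3, 4, 5]
print(even_numbers)   # [0, 2, 4, 6, 8]
```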

You will also notice that we call print(). Jupyter automatically displays only the value of the last expression in a cell, and never values produced inside another construct such as a for loop. Inside the loop we therefore use *print()* to see what's there.

Finally, you can see that the print(i) statement is indented. This is to show Python that it is inside the loop. Other languages use brackets for that purpose.

Let's run another example where we do not indent the print() statement. Note that we always need to put something inside the loop, otherwise we get an error. In our case we use the *pass* statement, which essentially means "run the loop but do nothing".

In [26]:
for i in range(5):
    pass
print(i)

4


What we can see now is that it runs through all the **i**'s, does nothing and when it comes out of the loop it only prints the current one, which is 4.
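Indentation decides what is repeated. In the sketch below (the prices are made up), the addition is indented and runs once per price, while the final `print` is not indented and runs only once, after the loop has finished.

```python
example_prices = [0.13, 0.12, 0.11]   # made-up example values

total = 0.0
for p in example_prices:
    total = total + p                  # inside the loop: runs three times
print(total / len(example_prices))     # outside the loop: prints the average once
```

Indenting the `print` line as well would print a running value on every pass through the loop instead.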

## 2.3 Working with Data Files

In order to understand how to read a file, we need to understand a few basic data types, namely *string*, *int* and *float*. 

- A *string* is simply a collection of arbitrary characters such as 'R$ty61~' or 'AAPL'. It is usually written inside quotation marks and used for descriptive names and reading and writing files.
- An *integer* is any positive or negative whole number (or zero), represented by the data type *int*. It is typically used for counting and indexing. 
- A *float* is a floating point number and is generally used for arithmetic.

Sometimes we have to be careful not to mix data types. For example, this is a float: **12.432** but this is a string: **'909.32'**. We cannot do arithmetic with the latter. Even though it looks like a float, the computer thinks it is a descriptive expression.
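The difference shows up as soon as we try arithmetic. A short sketch using the two example values from above (the names `a` and `b` are only for illustration):

```python
a = 12.432        # a float
b = '909.32'      # a string that merely looks like a number

print(type(a))    # <class 'float'>
print(type(b))    # <class 'str'>

# Converting the string first makes the arithmetic possible.
print(a + float(b))

# Without the conversion Python refuses to add a float and a string.
try:
    a + b
except TypeError as err:
    print("TypeError:", err)
```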

Let's read the next line from our file *output.csv*:

In [27]:
line = fid.readline()
line

'1980-12-24,0.15,0.15,0.15,0.15,48000000.0,0.07139999999999999\n'

We can see that this line has quotation marks, so it's a string. However, we want to do some calculations with the numbers in that line. Next, we will see step-by-step how to do that. 

First, notice the \n at the end of the line. We get rid of that like this:

In [28]:
line.strip()

'1980-12-24,0.15,0.15,0.15,0.15,48000000.0,0.07139999999999999'

Next, we want to separate the numbers from each other, like this:

In [29]:
line.split(',')

['1980-12-24',
 '0.15',
 '0.15',
 '0.15',
 '0.15',
 '48000000.0',
 '0.07139999999999999\n']

We can see that the line is now split at the commas, but the \n is back on the last number. This is because strip() returns a new string and leaves *line* itself unchanged; we only displayed the result and did not assign it to a variable. To keep it we have to do:

In [30]:
new_line = line.strip()
new_line

'1980-12-24,0.15,0.15,0.15,0.15,48000000.0,0.07139999999999999'

In [31]:
new_line2 = new_line.split(',')
new_line2

['1980-12-24',
 '0.15',
 '0.15',
 '0.15',
 '0.15',
 '48000000.0',
 '0.07139999999999999']
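String methods such as `strip()` never change the original string; they return a new one. A quick check with a shortened, made-up line (named `raw` here so that it does not overwrite *line*):

```python
raw = '1980-12-24,0.15\n'    # shortened example line

stripped = raw.strip()       # strip() returns a new string

print(repr(raw))             # the original still ends with \n
print(repr(stripped))        # the new string does not
```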

As it is tedious to create new variables all the time, Python conveniently lets us string commands together:

In [32]:
new_line = line.strip().split(',')
new_line

['1980-12-24',
 '0.15',
 '0.15',
 '0.15',
 '0.15',
 '48000000.0',
 '0.07139999999999999']

You can now see that the newline character \n is gone. However, inside the brackets above, which we call a *list*, we still have a set of strings, and we cannot do any calculations with those. Let's say we want to calculate the daily price range from our Open, High, Low and Close values. Remember, we listed the column names previously from our pandas DataFrame:

In [34]:
list(myinput)

['Date', 'Close', 'Open', 'High', 'Low', 'Volume', 'Change']

From this we can see that for our calculation we want the second, third, fourth and fifth values of new_line, converted to *float* values with which we can do arithmetic. 

So, let's first extract the relevant numbers from the list with a technique called **indexing**. Indexes are numbers that indicate the positions of list elements. For example:

In [35]:
new_line[0]

'1980-12-24'

gives us the first element of the list. Remember that in Python we start counting from 0.

Likewise,

In [36]:
new_line[1]

'0.15'

gives us our *Close* price, the second column in the list above. We can see that this price is still in string format, but it is easy to convert it to a *float* with:

In [37]:
float(new_line[1])

0.15
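Several neighbouring elements can be picked out at once with a *slice*, written `start:stop`, where the element at position `stop` is not included. The sketch below uses a copy of the list from above (named `fields` to keep it separate) and converts the four prices in one go. The second-to-last statement is a list comprehension, a compact form of a for loop that builds a new list.

```python
fields = ['1980-12-24', '0.15', '0.15', '0.15', '0.15',
          '48000000.0', '0.07139999999999999']

price_strings = fields[1:5]      # elements 1, 2, 3 and 4
price_values = [float(s) for s in price_strings]

print(price_strings)
print(price_values)
print(fields[-1])                # a negative index counts from the end
```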

Now we are able to calculate our daily price range:

In [38]:
# The columns are Date, Close, Open, High, Low, ..., so High is index 3 and Low is index 4.
hi = float(new_line[3])
lo = float(new_line[4])
(hi-lo)

0.0
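The steps so far (strip, split, index, convert) can be bundled into a small function. This is only a sketch: it assumes the column order Date, Close, Open, High, Low, Volume, Change shown above, the name `daily_range` is our own, and the example line is made up so that High and Low differ.

```python
def daily_range(raw_line):
    """Return High minus Low for one line of the csv file."""
    parts = raw_line.strip().split(',')
    high = float(parts[3])   # fourth column: High
    low = float(parts[4])    # fifth column: Low
    return high - low

# Made-up example line in which High and Low are different.
example = '2021-07-14,149.15,148.10,149.57,147.68,127050800.0,0.0241\n'
print(daily_range(example))
```

Wrapping the steps in a function means the same calculation can be reused for every line of a file.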

So far we have only looked at one type of loop. But it is possible to loop over practically anything that has more than one element. Here are some examples:

In [39]:
# looping through the list of columns
for i in list(myinput):
    print(i)

Date
Close
Open
High
Low
Volume
Change


In [40]:
# looping through the split line
for i in new_line:
    print(i)

1980-12-24
0.15
0.15
0.15
0.15
48000000.0
0.07139999999999999
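Two lists of equal length can be walked through side by side with the built-in `zip`, which is handy for pairing each column name with its value. The built-in `enumerate` adds the position of each element. A sketch using copies of the two lists printed above (named `columns` and `values` for illustration):

```python
columns = ['Date', 'Close', 'Open', 'High', 'Low', 'Volume', 'Change']
values = ['1980-12-24', '0.15', '0.15', '0.15', '0.15',
          '48000000.0', '0.07139999999999999']

# zip pairs the first name with the first value, and so on.
for name, value in zip(columns, values):
    print(name, '=', value)

# enumerate yields the position together with each element.
for position, name in enumerate(columns):
    print(position, name)
```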


Remember that we have a whole file and we might want to loop through all of it. So far, our loops have only run a fixed number of times. To go through every line in our *output.csv* file we would do the following:

In [42]:
fid = open('output.csv')
for line in fid:
    print(line)

Now we've looped through the entire file and of course, we could apply our little calculation to that. 
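As a sketch of that last step, the loop below computes High minus Low for every data row. It writes its own small example file to a temporary folder (so the real *output.csv* is left untouched) and assumes the column order Date, Close, Open, High, Low, Volume, Change. The two data rows are made up so that the ranges are not all zero, and the names `infile`, `row` and `ranges` are only for illustration.

```python
import os
import tempfile

# Build a small example file; the two data rows are made up.
path = os.path.join(tempfile.mkdtemp(), 'example.csv')
with open(path, 'w') as out:
    out.write('Date,Close,Open,High,Low,Volume,Change\n')
    out.write('2021-07-13,145.64,144.03,147.46,143.63,100698900.0,0.0079\n')
    out.write('2021-07-14,149.15,148.10,149.57,147.68,127050800.0,0.0241\n')

ranges = []
# "with" closes the file automatically once the block is finished.
with open(path) as infile:
    infile.readline()                      # skip the header line
    for row in infile:
        parts = row.strip().split(',')
        if len(parts) < 5:                 # skip blank or incomplete lines
            continue
        ranges.append(float(parts[3]) - float(parts[4]))

print(ranges)
```

Because the file is read one line at a time, the same loop also works for files that are too large to load into memory in one go.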