I will show an example of how to obtain data about a stock for our analysis. In this example we will use an API called Alpha Vantage and a Python library called pandas.

pandas is a library that provides fast and intuitive data structures for data analysis.

Alpha Vantage queries can return either a JSON response or a CSV (Comma-Separated Values) response. Because we will read the data with the pandas library, we will use the CSV format.

The instructions on how to format the queries can be found [here](https://www.alphavantage.co/documentation/).

An example of a query: https://www.alphavantage.co/query?function=TIME_SERIES_MONTHLY_ADJUSTED&symbol=MSFT&apikey=demo&datatype=csv

Following this link will download the example CSV file, though we have provided the file for you.
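The example query above is a base URL followed by a list of parameters. As a minimal sketch, the same URL can be assembled with the standard library. The helper name `build_query_url` is hypothetical and only used for illustration; the parameter names and values are the ones that appear in the example query.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"

def build_query_url(function, symbol, apikey="demo", datatype="csv"):
    """Hypothetical helper: assemble a query URL from its parameters."""
    params = {
        "function": function,  # which time series to request
        "symbol": symbol,      # the stock ticker
        "apikey": apikey,      # "demo" is the key used in the example query
        "datatype": datatype,  # "csv" so that pandas can read the response
    }
    return BASE_URL + "?" + urlencode(params)

url = build_query_url("TIME_SERIES_MONTHLY_ADJUSTED", "MSFT")
print(url)
```

Changing the `symbol` argument gives the query for a different stock, although the demo key may not be accepted for other symbols.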

In [25]:
import pandas
from IPython.display import display # display renders a dataframe as a formatted table in the notebook
stock_data = pandas.read_csv("example.csv") # parses the CSV file into a dataframe
# A CSV file can also be read directly from a URL that serves CSV data
# The index_col parameter sets which column of the data set is used as the index
stock_data = pandas.read_csv('https://www.alphavantage.co/query?function=TIME_SERIES_MONTHLY_ADJUSTED&symbol=MSFT&apikey=demo&datatype=csv',
                            index_col='timestamp')

The dataframe and series objects are the two main data structures in pandas.
#### DataFrame

The dataframe is a tabular data structure that contains rows (entries) and columns (attributes).

**stock_data = pandas.read_csv("example.csv")** creates a dataframe from the example.csv file and assigns it to the stock_data variable.
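To try `read_csv` without a file or a network connection, the sketch below parses CSV text held in memory. The rows are copied from the example output in this notebook (only some columns are kept), and the variable names `csv_text` and `sample` are made up for this illustration.

```python
import io
import pandas

# A few rows copied from the example data, with a reduced set of columns.
csv_text = """timestamp,open,high,low,close
2018-02-14,94.79,96.07,83.83,89.75
2018-01-31,86.125,95.45,85.5,95.01
2017-12-29,83.6,87.4999,80.7,85.54
"""

# io.StringIO lets read_csv treat the string as if it were a file.
# index_col makes the timestamp column the row index instead of a data column.
sample = pandas.read_csv(io.StringIO(csv_text), index_col="timestamp")

print(sample.index.name)     # name of the index column
print(list(sample.columns))  # the remaining data columns
```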

The head() method returns the first k rows of a dataframe. The default is 5, which is also the number of rows we display here.

In [18]:
display(stock_data.head(5)) # displays the first 5 rows in our dataframe
stock_data = stock_data.iloc[::-1] # reverses the row order so the dataframe starts with the earliest date
display(stock_data.head(5))

timestamp,open,high,low,close,adjusted close,volume,dividend amount
2018-02-14,94.79,96.07,83.83,89.75,89.75,424644290,0.0
2018-01-31,86.125,95.45,85.5,95.01,95.01,543377322,0.0
2017-12-29,83.6,87.4999,80.7,85.54,85.54,447828256,0.0
2017-11-30,83.68,85.06,82.24,84.17,84.17,416152260,0.42
2017-10-31,74.71,86.2,73.71,83.18,82.7611,440510118,0.0

timestamp,open,high,low,close,adjusted close,volume,dividend amount
2000-02-29,98.5,110.0,88.12,89.37,29.7464,667243800,0.0
2000-03-31,89.62,115.0,88.94,106.25,35.3648,1014093800,0.0
2000-04-28,94.44,96.5,65.0,69.75,23.216,1129073300,0.0
2000-05-31,72.87,74.0,60.38,62.56,20.8228,672215400,0.0
2000-06-30,64.37,82.19,63.81,80.0,26.6276,733525100,0.0


The line **from IPython.display import display** imports a function that renders a dataframe as a formatted table in the notebook.
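The cell above also reverses the row order with `stock_data.iloc[::-1]`. The sketch below shows that step on a three-row dataframe built by hand from the example data. It also shows `sort_index()` as an alternative, which is an assumption about what you might prefer and not something the notebook itself uses; it gives the same result here because the timestamps are in YYYY-MM-DD form and therefore sort chronologically even as plain strings.

```python
import pandas

# Three rows from the example data, newest first, as the API returns them.
sample = pandas.DataFrame(
    {"close": [89.75, 95.01, 85.54]},
    index=pandas.Index(["2018-02-14", "2018-01-31", "2017-12-29"], name="timestamp"),
)

reversed_sample = sample.iloc[::-1]  # same rows, opposite order
sorted_sample = sample.sort_index()  # rows ordered by their index label

print(reversed_sample.index.tolist())
print(reversed_sample.equals(sorted_sample))
```

Both calls return a new dataframe, which is why the notebook assigns the result back to `stock_data`.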

We will now explore some basic features of a dataframe.

In [19]:
print(stock_data.shape)
print(stock_data.shape[0]) # Prints only the number of rows
print(stock_data.shape[1]) # Prints only the number of columns

(217, 7)
217
7


The .shape attribute of a dataframe is a tuple that holds the number of rows and the number of columns.
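Because `.shape` is an ordinary tuple, it can be unpacked into two variables. A small sketch on a hand-built dataframe (the values are taken from the example data; the names `n_rows` and `n_cols` are made up for this illustration):

```python
import pandas

sample = pandas.DataFrame(
    {"open": [98.5, 89.62, 94.44], "close": [89.37, 106.25, 69.75]}
)

n_rows, n_cols = sample.shape  # unpack (rows, columns)
print(n_rows, n_cols)
print(len(sample))             # len() of a dataframe is its number of rows
```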

In [20]:
print(stock_data.dtypes)

open               float64
high               float64
low                float64
close              float64
adjusted close     float64
volume               int64
dividend amount    float64
dtype: object


The .dtypes attribute of dataframes lists the name and data type of each column.
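The result of `.dtypes` can be indexed by column name, and a column can be converted to another type with `astype`. The sketch below is only an illustration on hand-built data: the notebook does not need this conversion, and the name `converted` is made up here.

```python
import pandas

sample = pandas.DataFrame(
    {"close": [89.37, 106.25], "volume": [667243800, 1014093800]}
)

print(sample.dtypes["volume"])  # data type of a single column

# astype returns a new dataframe; sample itself keeps its original types.
converted = sample.astype({"volume": "float64"})
print(converted.dtypes["volume"])
```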

In [21]:
new_stock_data = stock_data.drop('volume', axis = 1) # Drops the specified column from a dataframe
display(new_stock_data.head())

timestamp,open,high,low,close,adjusted close,dividend amount
2000-02-29,98.5,110.0,88.12,89.37,29.7464,0.0
2000-03-31,89.62,115.0,88.94,106.25,35.3648,0.0
2000-04-28,94.44,96.5,65.0,69.75,23.216,0.0
2000-05-31,72.87,74.0,60.38,62.56,20.8228,0.0
2000-06-30,64.37,82.19,63.81,80.0,26.6276,0.0


Here we have created a new dataframe that is identical to stock_data except that the volume column has been removed. Please note that stock_data has not been modified; drop() returned a modified copy. To remove the column from stock_data itself, assign the result back to the same variable: **stock_data = stock_data.drop('volume', axis = 1)**
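The following sketch checks this behaviour on a small hand-built dataframe: the original keeps all of its columns after `drop`. It also shows the `columns=` keyword, an equivalent spelling that avoids the `axis` argument. The variable names are made up for this illustration.

```python
import pandas

sample = pandas.DataFrame(
    {
        "open": [98.5, 89.62],
        "close": [89.37, 106.25],
        "volume": [667243800, 1014093800],
    }
)

without_volume = sample.drop("volume", axis=1)  # same form as in the cell above
same_result = sample.drop(columns=["volume"])   # equivalent keyword form

print(list(sample.columns))          # the original still has the volume column
print(list(without_volume.columns))  # the copy does not
```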

In [22]:
stock_subset = stock_data[[ 'open', 'close' ]] # Creates a subset dataframe from another dataframe
display(stock_subset.head(5))

timestamp,open,close
2000-02-29,98.5,89.37
2000-03-31,89.62,106.25
2000-04-28,94.44,69.75
2000-05-31,72.87,62.56
2000-06-30,64.37,80.0


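Here we have created a subset of the dataframe. A subset can contain any number of columns from the original dataframe. Take note of how the column names are enclosed in double brackets: the inner brackets form a list of column names, and the outer brackets index the dataframe with that list.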
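The difference between double and single brackets can be checked directly. In this sketch on hand-built data, a list of column names returns a dataframe even when the list has one entry, while a bare column name returns a series. The variable names are made up for this illustration.

```python
import pandas

sample = pandas.DataFrame(
    {"open": [98.5, 89.62, 94.44], "close": [89.37, 106.25, 69.75]}
)

one_column_frame = sample[["close"]]  # list of names -> DataFrame
close_column = sample["close"]        # single name   -> Series

print(type(one_column_frame).__name__, one_column_frame.shape)
print(type(close_column).__name__, close_column.shape)
```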
# to do : continue from here with more functionality of dataframes. Change timestamp to date object

#### Series
A series is a one-dimensional array that can be used to store a single column from a dataframe.

In [23]:
close_series = stock_data['close'] # Assigns the close column from the dataframe to the series variable, close_series
display(close_series.head(10)) # displays the first ten rows in the series

timestamp
2000-02-29     89.37
2000-03-31    106.25
2000-04-28     69.75
2000-05-31     62.56
2000-06-30     80.00
2000-07-31     69.81
2000-08-31     69.81
2000-09-29     60.31
2000-10-31     68.87
2000-11-30     57.38
Name: close, dtype: float64


**stock_data['close']** allows us to access a single column in the dataframe. If you wish to use a different column then simply change the string to the name of the column.
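A series keeps the index of the dataframe it came from, so single values can be looked up by timestamp. The sketch below rebuilds the first five close prices from the output above by hand and shows a label lookup and two summary methods. It is meant as an illustration and not as part of the analysis.

```python
import pandas

# First five close prices from the example output, indexed by timestamp.
close_series = pandas.Series(
    [89.37, 106.25, 69.75, 62.56, 80.00],
    index=pandas.Index(
        ["2000-02-29", "2000-03-31", "2000-04-28", "2000-05-31", "2000-06-30"],
        name="timestamp",
    ),
    name="close",
)

print(close_series.loc["2000-03-31"])  # value stored under one index label
print(close_series.max())              # largest value in the series
print(close_series.idxmax())           # index label of the largest value
```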