# Connecting to Commodity & Company Share Prices
#### November 30, 2017 | Jon Honda | Sprint 2
#### Description
#### Learn about connecting to data sources (APIs, databases). Learn about developing data models to hold data in ways that are helpful to us.

## Skill Backlog User Story
#### As a financial analyst, I need to be able to retrieve and store web-based market price data so that I can run analyses on the collected data.

## Project Proposal
#### I will work with the Quandl API to retrieve daily closing prices for metal commodities and mining stocks. I will build a SQLite database model to store the data. I will build several SQL SELECT commands to get different data reports from the database.

## Key Questions
#### - Things I want to find out
####     1. What data type does the Quandl API return?
####     2. What is a pandas data type? What is a NumPy data type?
####     3. How do you create and work with SQLite databases in Python?
#### - Definitions I want to establish and clarify

## Key Findings
#### - A running list of things that I'm learning and don't want to forget

## Gameplan
#### Here is my overall approach
#### 1. Step 1
#### 2. Step 2
#### 3. Step 3
#### 4. Step 4



In [1]:
## Simple Quandl call:
import quandl
quandl.ApiConfig.api_key= "L8zt1AhNiQi4gufsXy4g"
mydata = quandl.get("FRED/GDP")
print(mydata)

                Value
Date                 
1947-01-01    243.080
1947-04-01    246.267
1947-07-01    250.115
1947-10-01    260.309
1948-01-01    266.173
1948-04-01    272.897
1948-07-01    279.497
1948-10-01    280.656
1949-01-01    275.370
1949-04-01    271.692
1949-07-01    273.262
1949-10-01    270.984
1950-01-01    281.209
1950-04-01    290.735
1950-07-01    308.510
1950-10-01    320.320
1951-01-01    336.372
1951-04-01    344.455
1951-07-01    351.774
1951-10-01    356.579
1952-01-01    360.195
1952-04-01    361.414
1952-07-01    368.084
1952-10-01    381.241
1953-01-01    388.472
1953-04-01    392.259
1953-07-01    391.696
1953-10-01    386.521
1954-01-01    385.924
1954-04-01    386.716
...               ...
2010-04-01  14888.600
2010-07-01  15057.660
2010-10-01  15230.208
2011-01-01  15238.371
2011-04-01  15460.926
2011-07-01  15587.125
2011-10-01  15785.312
2012-01-01  15973.881
2012-04-01  16121.851
2012-07-01  16227.939
2012-10-01  16297.349
2013-01-01  16475.440
2013-04-01

In [2]:
## Working with Quandl data
#### By default, Quandl returns data as a pandas DataFrame.
print(type(mydata))


<class 'pandas.core.frame.DataFrame'>


In [3]:

#### I'll explore pandas.
#### random 5x3 data table:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,3), index=list('abcde'), columns=list('xyz'))
print(df)
type(df)

          x         y         z
a -1.008736 -0.872613  1.045428
b  0.589808 -1.193875 -0.641696
c  1.025645  0.265708  1.058409
d  0.971619 -0.508075  0.170491
e -0.409208  2.297472 -0.927199


pandas.core.frame.DataFrame

In [4]:
#### YAY! I also learned that you can do arithmetic ops on the entire table at once:
print(df/df)

     x    y    z
a  1.0  1.0  1.0
b  1.0  1.0  1.0
c  1.0  1.0  1.0
d  1.0  1.0  1.0
e  1.0  1.0  1.0


In [5]:
#### wow! this has POTENTIAL! so much operation, so little code!
#### what if I want to access a value? say, (b,x)?
#### according to this site, there are several ways: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python


#### by position (row 1, column 0), in a single indexing step:
print(df.iloc[1, 0])
print(type(df.iloc[1, 0]))
#### by label (row 'b', column 'x'):
print(df.loc['b', 'x'])
print(type(df.loc['b', 'x']))

0.589808308107
<class 'numpy.float64'>
0.589808308107
<class 'numpy.float64'>


In [6]:
#### great. let's try getting some data

import quandl
quandl.ApiConfig.api_key= "L8zt1AhNiQi4gufsXy4g"
mydata = quandl.get("FRED/GDP")
print(mydata.iloc[0, 0])

243.08


In [7]:
#### so, this gets the column data, but what about the date?
#### after some google-fu, I discovered that the dates are held in the dataframe's index.
#### question: how to access dataframe index value???
#### Justin helped me figure it out:
#### .index gets the entire index:
print(mydata.index)

#### .index[0] gets a specific index value:
print(mydata.index[0])

DatetimeIndex(['1947-01-01', '1947-04-01', '1947-07-01', '1947-10-01',
               '1948-01-01', '1948-04-01', '1948-07-01', '1948-10-01',
               '1949-01-01', '1949-04-01',
               ...
               '2015-04-01', '2015-07-01', '2015-10-01', '2016-01-01',
               '2016-04-01', '2016-07-01', '2016-10-01', '2017-01-01',
               '2017-04-01', '2017-07-01'],
              dtype='datetime64[ns]', name='Date', length=283, freq=None)
1947-01-01 00:00:00


In [8]:
#### so, it's a datetime value...what are SQLite data types for handling date time?
#### according to: https://www.sqlite.org/datatype3.html

#### there is no datetime data type. dates have to be stored as TEXT, REAL, or INTEGER.
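A minimal sketch of the TEXT option, assuming ISO 8601 strings (`YYYY-MM-DD`) are an acceptable date format for this project. The table name `Prices` and the three sample rows are illustrative only. ISO strings sort in date order, so plain string comparison works for range queries, and SQLite's built-in `strftime` function can pull out parts of the date.

```python
import sqlite3
from datetime import datetime

# Illustrative observations (first GDP values printed earlier in the notebook).
observations = [
    (datetime(1947, 1, 1), 243.080),
    (datetime(1947, 4, 1), 246.267),
    (datetime(1948, 1, 1), 266.173),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Prices(Date TEXT, Value REAL)")

# Store each date as an ISO 8601 string so text order equals date order.
conn.executemany(
    "INSERT INTO Prices(Date, Value) VALUES (?, ?)",
    [(d.strftime("%Y-%m-%d"), v) for d, v in observations],
)

# Range query using ordinary string comparison on the TEXT column.
rows_1947 = conn.execute(
    """SELECT Date, Value FROM Prices
       WHERE Date BETWEEN '1947-01-01' AND '1947-12-31'
       ORDER BY Date"""
).fetchall()

# SQLite's strftime() understands ISO 8601 text and can extract the year.
years = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT strftime('%Y', Date) FROM Prices ORDER BY 1"
    )
]
conn.close()

print(rows_1947)
print(years)
```

A format such as `%m/%d/%Y` would also be stored without complaint, but it would not sort chronologically as text.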

In [9]:
#### BUT....Justin says that there is a more efficient way to insert data using a pandas sqlite library
#### use the write_frame method
#### https://stackoverflow.com/questions/14431646/how-to-write-pandas-dataframe-to-sqlite-with-index

#### ALSO NOTE!!!! we can change the index into a column!!!! the site above uses this method.
#### this uses the .reset_index() method of the pandas DataFrame. in our case:

Date2Col = mydata.reset_index()
Date2Col

        Date    Value
0 1947-01-01  243.080
1 1947-04-01  246.267
2 1947-07-01  250.115
3 1947-10-01  260.309
4 1948-01-01  266.173
5 1948-04-01  272.897
6 1948-07-01  279.497
7 1948-10-01  280.656
8 1949-01-01  275.370
9 1949-04-01  271.692


In [10]:
#### continuing on...:
#### https://stackoverflow.com/questions/14431646/how-to-write-pandas-dataframe-to-sqlite-with-index

import sqlite3
from pandas.io import sql

import os

db_path = os.path.join('_jonhonda_dat', 'mydb2') #### portable path, no backslash escapes
if os.path.exists(db_path):
    os.remove(db_path) #### remove existing db if it exists
db = sqlite3.connect(db_path)  #### make a database
#### build table:
cursor = db.cursor()
cursor.execute ('''
CREATE TABLE StockPrices(id INTEGER PRIMARY KEY, Date TEXT, Value REAL)
''')
db.commit()
db.close()
Date2Col = mydata.reset_index() #### write date index to column
sql.write_frame(Date2Col,name='StockPrices')



AttributeError: module 'pandas.io.sql' has no attribute 'write_frame'

In [None]:
#### ARG!!!!! reading the next lines of comments in stackoverflow indicates write_frame is discontinued....
#### it says to use sqlalchemy: 

#### from sqlalchemy import create_engine

#### disk_engine = create_engine('sqlite:///my_lite_store.db')
#### price.to_sql('stock_price', disk_engine, if_exists='append')

#### BUT I DONT KNOW HOW TO USE THIS, AND AM RUNNING SHORT OF TIME.
#### OKAY - JUST USE AN INSERT INTO LOOP
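For reference, a hedged sketch of the `to_sql` route. It assumes a reasonably recent pandas, where `DataFrame.to_sql` also accepts a plain `sqlite3` connection, so SQLAlchemy is not strictly needed for SQLite. The `prices` frame is a hypothetical stand-in for the Quandl result, built from the first three GDP rows shown earlier.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the Quandl result: a DatetimeIndex named "Date"
# and a single "Value" column.
prices = pd.DataFrame(
    {"Value": [243.080, 246.267, 250.115]},
    index=pd.to_datetime(["1947-01-01", "1947-04-01", "1947-07-01"]),
)
prices.index.name = "Date"

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk

# reset_index() turns the Date index into a regular column;
# index=False stops pandas from also writing the new 0..n-1 index.
prices.reset_index().to_sql("StockPrices", conn, if_exists="replace", index=False)

rows = conn.execute(
    "SELECT Date, Value FROM StockPrices ORDER BY Date"
).fetchall()
row_count = len(rows)
first_date, first_value = rows[0]
conn.close()

print(row_count, first_date, first_value)
```

With `if_exists="replace"` the table is dropped and recreated on every run; `if_exists="append"`, as in the Stack Overflow snippet, would add rows to an existing table instead.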

In [None]:
#### need to cast datetime to text:
#### https://stackoverflow.com/questions/10624937/convert-datetime-object-to-a-string-of-date-only-in-python
import datetime
t = datetime.datetime(2012, 2, 23, 0, 0)
t.strftime('%m/%d/%Y')

In [None]:
a = mydata.index[0]
#### strftime returns a new string, so print the result rather than the original timestamp
print(a.strftime('%m/%d/%Y'))

In [None]:
#### need to cast NumPy float64 to SQLite REAL:
#### https://stackoverflow.com/questions/35106631/compatibility-between-numpy-and-sql-data-types

# SQLite supports a limited set of data types. To use numpy Python data types you can use the register_adapter function that converts the type to an SQLite supported representation.
# import numpy as np
# import sqlite3

# sqlite3.register_adapter(np.float64, float)
# sqlite3.register_adapter(np.float32, float)
# sqlite3.register_adapter(np.int64, int)
# sqlite3.register_adapter(np.int32, int)  
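A small sketch of how those adapters behave in practice, assuming NumPy is installed. `sqlite3.register_adapter` takes a type (not a value) and a conversion function. The table `Samples` and its columns are made up for the example.

```python
import sqlite3

import numpy as np

# Register converters once, keyed by type: whenever a value of this type
# is bound to a "?" placeholder, sqlite3 calls the converter first.
sqlite3.register_adapter(np.int64, int)
sqlite3.register_adapter(np.float64, float)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Samples(n_rows INTEGER, price REAL)")
conn.execute(
    "INSERT INTO Samples(n_rows, price) VALUES (?, ?)",
    (np.int64(283), np.float64(243.08)),
)

# typeof() reports the storage class SQLite actually used for each value.
stored = conn.execute(
    "SELECT n_rows, price, typeof(n_rows), typeof(price) FROM Samples"
).fetchone()
conn.close()

print(stored)
```

The adapters are global to the `sqlite3` module, so registering them once near the top of the notebook is enough.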

In [22]:
import sqlite3
import os

import numpy as np

#### register_adapter takes a type, not a value:
#### store numpy floats as plain Python floats
sqlite3.register_adapter(np.float64, float)

db_path = os.path.join('_jonhonda_dat', 'mydb2') #### portable path, no backslash escapes
if os.path.exists(db_path):
    os.remove(db_path) #### remove existing db if it exists

#### build table:
db = sqlite3.connect(db_path)  #### make a database
cursor = db.cursor()
cursor.execute('''CREATE TABLE StockPrices(id INTEGER PRIMARY KEY, Date TEXT, StockValue FLOAT)''')
db.commit()
db.close()

#### insert data (first row only, as a test)
db = sqlite3.connect(db_path)
cursor = db.cursor()
print(type(mydata.iloc[0, 0]))
cursor.execute('''INSERT INTO StockPrices(Date, StockValue) VALUES(?,?)''',
               (mydata.index[0].strftime('%m %d %Y'), mydata.iloc[0, 0]))
db.commit()
db.close()

#### read the row back
db = sqlite3.connect(db_path)
cursor = db.cursor()
cursor.execute('''SELECT Date, StockValue FROM StockPrices''')
Output = cursor.fetchone()
print(Output[0])
db.close()


<class 'numpy.float64'>
01 01 1947
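The cell above inserts only the first row. A possible next step, sketched under stated assumptions: load every row with `executemany`, then run one SELECT report. The small `mydata` frame below is a stand-in built from the first five GDP values so the sketch runs without a Quandl call, the ISO date format is my choice rather than the `'%m %d %Y'` used above, and the yearly summary query is just one example of a report.

```python
import sqlite3

import pandas as pd

# Stand-in for the Quandl DataFrame (first five FRED/GDP rows from In [1]).
mydata = pd.DataFrame(
    {"Value": [243.080, 246.267, 250.115, 260.309, 266.173]},
    index=pd.to_datetime(
        ["1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01", "1948-01-01"]
    ),
)
mydata.index.name = "Date"

conn = sqlite3.connect(":memory:")  # swap in a file path to persist the data
conn.execute(
    "CREATE TABLE StockPrices(id INTEGER PRIMARY KEY, Date TEXT, StockValue REAL)"
)

# One (date text, float) tuple per row; float() sidesteps numpy type issues.
rows = [
    (date.strftime("%Y-%m-%d"), float(value))
    for date, value in mydata["Value"].items()
]
conn.executemany("INSERT INTO StockPrices(Date, StockValue) VALUES (?, ?)", rows)
conn.commit()

# Example report: number of observations and highest value per year.
report = conn.execute(
    """SELECT strftime('%Y', Date) AS Year, COUNT(*), MAX(StockValue)
       FROM StockPrices
       GROUP BY Year
       ORDER BY Year"""
).fetchall()
conn.close()

print(report)
```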
