<h1 align="center">6.3 Dealing with Web APIs and Databases</h1>

<b>Web-Based APIs</b>

Many websites have public APIs providing data feeds via JSON or some other format. There are a number of ways to access these APIs from Python; one easy-to-use method that I recommend is the requests package.

In [1]:
import requests
import pandas as pd
url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
resp = requests.get(url)
resp

<Response [200]>

In [2]:
data = resp.json()

In [3]:
data[0]['title']

'BUG: Pandas changes dtypes of columns when no float (or other) assignments are done to this column #34573'

In [4]:
issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])

In [5]:
issues.head()

,number,title,labels,state
0,34599,BUG: Pandas changes dtypes of columns when no ...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
1,34598,BUG: Groupby: Int-Datatype always casted to int64,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
2,34597,BUG: Reindex doesn't add NaN when using level=1,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
3,34596,BUG: DataFrame.attrs are lost when writing to ...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
4,34595,BUG: fix Series.where(cond) when cond is empty,"[{'id': 57296398, 'node_id': 'MDU6TGFiZWw1NzI5...",open
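
The labels column above still holds the nested list of dictionaries that the API returned. A minimal sketch of flattening it before building a DataFrame follows. The payload is made up (it is not real GitHub data) and only mimics the shape of the fields used above; the helper name flatten_issue is hypothetical, and the sketch assumes each label dictionary carries a name key.

```python
import json

# Made-up payload with the same shape as the fields used above (not real GitHub data).
raw = """
[
  {"number": 1, "title": "Example issue A", "state": "open",
   "labels": [{"id": 10, "name": "Bug"}, {"id": 11, "name": "Needs Triage"}]},
  {"number": 2, "title": "Example issue B", "state": "open", "labels": []}
]
"""
data = json.loads(raw)  # stands in for resp.json()


def flatten_issue(issue):
    """Keep the four fields of interest and reduce labels to a list of names."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        "labels": [label["name"] for label in issue.get("labels", [])],
        "state": issue["state"],
    }


records = [flatten_issue(issue) for issue in data]
print(records[0]["labels"])
```

A list of flat dictionaries like records can be passed to pd.DataFrame in the same way as data in the cell above.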


<b>Databases</b>

In a business setting, most data may not be stored in text or Excel files. SQL-based relational databases (such as SQL Server, PostgreSQL, and MySQL) are in wide use. The examples here use a SQLite database through Python's built-in sqlite3 driver.

In [6]:
import sqlite3

In [7]:
query1 = """
CREATE TABLE test
(a VARCHAR(20), b VARCHAR(20),
 c REAL, d INTEGER);
"""

In [8]:
con = sqlite3.connect('mydata.sqlite')
con.execute(query1)

<sqlite3.Cursor at 0x1db61546ea0>

In [9]:
con.commit()

In [10]:
data = [('Atlanta', 'Georgia', 1.25, 6),
        ('Tallahassee', 'Florida', 2.6, 3),
        ('Sacramento', 'California', 1.7, 5)]

In [11]:
stmt = "INSERT INTO test VALUES (?, ?, ?, ?)"

In [12]:
con.executemany(stmt, data)

<sqlite3.Cursor at 0x1db61588030>

Most Python SQL drivers (PyODBC, psycopg2, MySQLdb, pymssql, etc.) return a list of tuples when selecting data from a table.

In [13]:
cursor = con.execute('select * from test')
rows = cursor.fetchall()

In [14]:
rows

[('Atlanta', 'Georgia', 1.25, 6),
 ('Tallahassee', 'Florida', 2.6, 3),
 ('Sacramento', 'California', 1.7, 5)]

In [15]:
cursor.description

(('a', None, None, None, None, None, None),
 ('b', None, None, None, None, None, None),
 ('c', None, None, None, None, None, None),
 ('d', None, None, None, None, None, None))

In [16]:
pd.DataFrame(rows, columns=[x[0] for x in cursor.description])

,a,b,c,d
0,Atlanta,Georgia,1.25,6
1,Tallahassee,Florida,2.6,3
2,Sacramento,California,1.7,5
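
The pattern above (column names from cursor.description, rows from fetchall) can be wrapped in a small helper. This is only a sketch and not part of sqlite3 or pandas: the name query_to_records is hypothetical, and it runs against an in-memory database so that it does not touch mydata.sqlite.

```python
import sqlite3


def query_to_records(con, sql, params=()):
    """Run a query; return the column names and the rows as dictionaries."""
    cursor = con.execute(sql, params)
    # The first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return columns, [dict(zip(columns, row)) for row in cursor.fetchall()]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
con.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [("Atlanta", "Georgia", 1.25, 6),
     ("Tallahassee", "Florida", 2.6, 3),
     ("Sacramento", "California", 1.7, 5)],
)
con.commit()

# The ? placeholder keeps the filter value out of the SQL string.
columns, records = query_to_records(con, "SELECT * FROM test WHERE d > ?", (4,))
con.close()
print(columns)
```

Passing records to pd.DataFrame should give the same kind of frame as the list-of-tuples construction, with the column names taken from the dictionary keys.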


The SQLAlchemy project is a popular Python SQL toolkit that abstracts away many of the common differences between SQL databases. pandas has a read_sql function that enables you to read data easily from a general SQLAlchemy connection.

In [17]:
import sqlalchemy as sqla

In [18]:
db = sqla.create_engine('sqlite:///mydata.sqlite')

In [19]:
pd.read_sql('select * from test', db)

,a,b,c,d
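
The empty result above is what I would expect if the rows inserted with executemany were never committed: the SQLAlchemy engine opens its own connection to mydata.sqlite, and a second connection only sees committed data. Here is a minimal sketch of that behavior using only the standard-library sqlite3 module. The scratch file demo.sqlite and the names writer and reader are mine, chosen for illustration, and the sketch assumes sqlite3's default transaction handling.

```python
import os
import sqlite3
import tempfile

rows = [("Atlanta", "Georgia", 1.25, 6),
        ("Tallahassee", "Florida", 2.6, 3),
        ("Sacramento", "California", 1.7, 5)]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.sqlite")  # hypothetical scratch database

    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
    writer.commit()

    # The inserts open a transaction on `writer` that is not yet committed.
    writer.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

    # A second connection plays the role of the SQLAlchemy engine.
    reader = sqlite3.connect(path)
    before = reader.execute("SELECT COUNT(*) FROM test").fetchall()[0][0]

    writer.commit()
    after = reader.execute("SELECT COUNT(*) FROM test").fetchall()[0][0]

    reader.close()
    writer.close()

print(before, after)
```

If this is indeed the cause, calling con.commit() after con.executemany(stmt, data) should make the three rows appear in the read_sql result.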
