# Connecting Multiple Data Sources to SQL DB
#### 11/29/17 | Natalie | Sprint 2
#### Description
This project pulls DevLeague student information from multiple data sources into one SQL database.

## Skill Backlog User Story
As the Director of Technology, I need to identify errors and inconsistencies in the data so that I can develop solutions to address them, and possibly their source.

## Project Proposal
Retrieve all existing student data from multiple data sources and place it in one central database using a Python script and PostgreSQL. Data is currently stored in Hubspot, Asana, and a Google Drive.


## Key Questions
- How to access API data sources using Python and PostgreSQL.

## Key Findings
- How to write a Python script and execute it from the command line. Since I'm only taking data in at this step, I don't need to create a full application; a script should work fine.
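A minimal sketch of what such a command-line script might look like. The `--source` and `--limit` flags, the `main` function, and the file name `pull_data.py` are hypothetical names chosen for illustration; they are not part of the project.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Pull contact data from one source.")
    # Both flags are illustrative assumptions, not options the project defines.
    parser.add_argument("--source", default="hubspot", help="data source to read from")
    parser.add_argument("--limit", type=int, default=20, help="max records to fetch")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real script would call the API here; this sketch only reports its settings.
    print(f"Would fetch up to {args.limit} records from {args.source}")
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly,
    # e.g. `python pull_data.py --limit 5`
    main()
```

Keeping the work inside `main` means the same file can be run from the command line or imported from a notebook cell without side effects.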

## Gameplan
Here is my overall approach:
1. Determine what Python libraries I can use to pull in data from an API.
2. Determine how to execute command-line scripts in Python.
3. Create a Python script to grab data from Hubspot's API.
4. Determine how to create a Postgres database in Python.
5. Create a script to connect to a Postgres database in Python.
6. Understand how to take data held in Python data structures and store it in a SQL database.

## Day 1 Work

Initially, I was going to create a Python application. However, I realized that since I'm only taking data in, I don't need a large framework or application to do so. I can create a script that pulls data from any data source and stores it in a database. My first step is to work with one data source (Hubspot's API) and try to bring in the data and parse it.

Below is the first script I wrote using my own generated Hubspot api key.

In [29]:
import urllib.request, json

# Store api key value into variable
APIKEY_VALUE = "my api key"

# concat query string with api key
APIKEY = "?hapikey=" + APIKEY_VALUE

#user id
usrid = "&userID=0000000"

# hs api end point stored to a variable
HS_API_URL = "http://api.hubapi.com"

def get_total_number_of_contacts():
    # First, we build the correct url
    xurl = "/contacts/v1/lists/all/contacts/all"
    url = HS_API_URL + xurl + APIKEY + usrid
    # Now we use urllib to open the url and read the raw response bytes
    response = urllib.request.urlopen(url).read()
    # print the raw response to see what it looks like
    print(response)
    # parse the JSON bytes into a dict
    statistics = json.loads(response)
    # Finally, return the list of contacts from the dict
    return statistics["contacts"]

print (get_total_number_of_contacts())

b'{"contacts":[],"has-more":false,"vid-offset":0}'
[]
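The script above builds its query string by hand with `?` and `&`. As an alternative, here is a small sketch that lets the standard library's `urlencode` do the joining and escaping. The helper name `build_url` is hypothetical; the base URL, path, and parameter names are copied from the script above.

```python
from urllib.parse import urlencode

# Base URL copied from the script above
HS_API_URL = "http://api.hubapi.com"


def build_url(path, **params):
    """Join the base URL, an endpoint path, and URL-encoded query parameters."""
    query = urlencode(params)
    return f"{HS_API_URL}{path}?{query}" if query else f"{HS_API_URL}{path}"


url = build_url("/contacts/v1/lists/all/contacts/all", hapikey="demo", userID="0000000")
print(url)
```

This avoids having to remember which parameter gets the `?` and which get `&`, and it escapes values that contain spaces or other special characters.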


## Day 2 Work
My API script for Hubspot works, but it returns an empty dataset with zero records. I tried adding a user ID, but I still get zero results. I may need to use the demo API key to prototype a solution.

Once I used the demo key, I was able to pull in data from Hubspot. Below is the script I created, which pulls in the data and pretty-prints it.

In [4]:
from pprint import pprint
import urllib.request, json

# Store api key value into variable
APIKEY_VALUE = "demo"

# concat query string with api key
APIKEY = "?hapikey=" + APIKEY_VALUE

# hs api end point stored to a variable
HS_API_URL = "http://api.hubapi.com"

def get_contacts():
    # builds the correct url
    xurl = "/contacts/v1/lists/all/contacts/all"
    url = HS_API_URL + xurl + APIKEY 
    # Now we use urllib to open the url and read it
    response = urllib.request.urlopen(url).read()
    # parse the JSON response into a dict stored in all_contacts
    all_contacts = json.loads(response)
    # return the contact data
    return all_contacts

# call the function and store the result in a variable
contacts = get_contacts()

# pretty print the full response
pprint(contacts)


{'contacts': [{'addedAt': 1456333855820,
               'canonical-vid': 860626,
               'form-submissions': [],
               'identity-profiles': [{'deleted-changed-timestamp': 0,
                                      'identities': [{'timestamp': 1456333855819,
                                                      'type': 'LEAD_GUID',
                                                      'value': 'c4324a6f-ef03-4250-a2a8-252536f4e443'},
                                                     {'is-primary': True,
                                                      'timestamp': 1502322131721,
                                                      'type': 'EMAIL',
                                                      'value': 'new-email1@hubspot.com'}],
                                      'saved-at-timestamp': 1511397633819,
                                      'vid': 860626}],
               'is-contact': True,
               'merge-audits': [],
               'merged-vids': [
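The raw response on Day 1 included `has-more` and `vid-offset` keys, which suggests the API returns contacts one page at a time. Below is a sketch of how the pages might be walked, assuming `vid-offset` is the value to hand back when asking for the next page. `iter_contacts` and `fetch_page` are hypothetical names; how the offset is sent back to Hubspot (for example, as a query parameter) is left to `fetch_page` and should be checked against Hubspot's documentation.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_contacts(fetch_page: Callable[[Optional[int]], Dict]) -> Iterator[Dict]:
    """Yield contacts page by page until the response says there are no more.

    fetch_page(offset) is a caller-supplied function that returns one parsed
    response dict; offset is None for the first page.
    """
    offset = None
    while True:
        page = fetch_page(offset)
        yield from page.get("contacts", [])
        if not page.get("has-more"):
            break
        offset = page.get("vid-offset")


# Fake pages standing in for real API responses, keyed by the offset requested
fake_pages = {
    None: {"contacts": [{"vid": 1}, {"vid": 2}], "has-more": True, "vid-offset": 2},
    2: {"contacts": [{"vid": 3}], "has-more": False, "vid-offset": 3},
}

all_vids = [c["vid"] for c in iter_contacts(lambda offset: fake_pages[offset])]
print(all_vids)
```

Passing the fetch function in as an argument keeps the paging logic testable without a network call.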

## Day 3 Work

Python doesn't check the shape of nested data ahead of time: a wrong key or index only shows up as a `KeyError` or `IndexError` when the script runs. I spent a lot of time trying to access values in nested lists and dictionaries.
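One way to make nested lookups less fragile is a small helper that walks a path and falls back to a default when a step is missing. This is a sketch, not something the project uses; the name `dig` and the sample `contact` dict are made up for illustration, with the sample shaped like the Hubspot response.

```python
def dig(data, *path, default=None):
    """Follow a path of dict keys and list indexes; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Sample record shaped like one entry of the Hubspot response (values are made up)
contact = {
    "properties": {"firstname": {"value": "John"}},
    "identity-profiles": [{"identities": [{"type": "EMAIL", "value": "a@example.com"}]}],
}

print(dig(contact, "properties", "firstname", "value"))
print(dig(contact, "properties", "lastname", "value", default=""))
print(dig(contact, "identity-profiles", 0, "identities", 0, "value"))
```

The second call returns the empty-string default because the sample record has no `lastname` property, instead of raising an exception.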

Below is the working script I created. It pulls just the values I need out of the JSON object from Hubspot.

In [63]:
from pprint import pprint
import urllib.request, json
import datetime

# Store api key value into variable
APIKEY_VALUE = "demo"

# concat query string with api key
APIKEY = "?hapikey=" + APIKEY_VALUE

# hs api end point stored to a variable
HS_API_URL = "http://api.hubapi.com"

thin_contact_list = []

def get_contacts():
    # First, we build the correct url
    xurl = "/contacts/v1/lists/all/contacts/all"
    url = HS_API_URL + xurl + APIKEY 
    # Now we use urllib to open the url and read it
    response = urllib.request.urlopen(url).read()
    all_contacts = json.loads(response)
    #return the contact data
    return all_contacts

def process_contacts(contact_list):
    new_contact_list = []

    # loop through the contacts in the API response passed in as contact_list
    # (not the global variable) and build a thin dict for each one
    for record in contact_list['contacts']:

        # store the values needed to variables
        first_name = record['properties']['firstname']['value']
        last_name = record['properties']['lastname']['value']

        # the email is one of the identities on the first identity profile
        email = ''
        for identity in record['identity-profiles'][0]['identities']:
            if identity['type'] == 'EMAIL':
                email = identity['value']

        created_on = record['addedAt']
        last_login = record['identity-profiles'][0]['saved-at-timestamp']

        # replace blank fields with placeholder values
        if first_name == '':
            first_name = 'null'

        if last_name == '':
            last_name = 'null'

        if email == '':
            email = 'null@null.com'

        # create the contact dict to go into the db
        contact = {"firstname": first_name,
                   "lastname": last_name,
                   "email": email,
                   "createdon": created_on,
                   "lastlogin": last_login
                  }

        new_contact_list.append(contact)

    return new_contact_list
        
# Start processing logic
if __name__== "__main__":
    #invoke function to get data from api
    contacts = get_contacts()

    #process list of contacts
    thin_contact_list = process_contacts(contacts)
    
    print(thin_contact_list)

[{'firstname': 'kolokithas', 'lastname': 'Record11', 'email': 'new-email1@hubspot.com', 'createdon': 1456333855820, 'lastlogin': 1511397633819}, {'firstname': 'John', 'lastname': 'cruz', 'email': 'juanignaciosl-ded-05-578@test-org.com', 'createdon': 1456333839974, 'lastlogin': 1511413619107}, {'firstname': 'Updated', 'lastname': 'Record', 'email': 'new-email99@hubspot.com', 'createdon': 1456333849586, 'lastlogin': 1512081116948}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-587@test-org.com', 'createdon': 1456333869192, 'lastlogin': 1511401626850}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-588@test-org.com', 'createdon': 1456333873752, 'lastlogin': 1511414279605}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-594@test-org.com', 'createdon': 1456333895045, 'lastlogin': 1511417369152}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-597@test-org.com', 'createdon': 1456333905477,
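The `createdon` and `lastlogin` values in the output look like Unix timestamps in milliseconds (an assumption based on their size, not something the source confirms). If so, they could be converted to timezone-aware datetimes before being stored. The helper name `ms_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ms_to_datetime(ms):
    """Convert integer milliseconds since the Unix epoch to an aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# 'createdon' value of the first contact in the output above
created = ms_to_datetime(1456333855820)
print(created.isoformat())
```

Storing a real timestamp rather than a raw integer makes it possible to filter and sort by date in SQL later.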

In [72]:
import psycopg2
import sys

print(thin_contact_list)

con = None

try:
     
    con = psycopg2.connect(database='hsbd', user='nat') 
    cur = con.cursor()
    cur.execute('SELECT version()')          
    ver = cur.fetchone()
    print(ver, "I can connect")

    for contact in thin_contact_list:
        # pass the values as query parameters so psycopg2 handles quoting and escaping
        cur.execute(
            "INSERT INTO contacts(first_name, last_name, email) VALUES (%s, %s, %s)",
            (contact['firstname'], contact['lastname'], contact['email'])
        )
        print('inserted')
        
    con.commit()

except psycopg2.DatabaseError as e:
    print ('Error %s' % e)    
    sys.exit(1)
    
    
finally:
    
    if con:
        con.close()

[{'firstname': 'kolokithas', 'lastname': 'Record11', 'email': 'new-email1@hubspot.com', 'createdon': 1456333855820, 'lastlogin': 1511397633819}, {'firstname': 'John', 'lastname': 'cruz', 'email': 'juanignaciosl-ded-05-578@test-org.com', 'createdon': 1456333839974, 'lastlogin': 1511413619107}, {'firstname': 'Updated', 'lastname': 'Record', 'email': 'new-email99@hubspot.com', 'createdon': 1456333849586, 'lastlogin': 1512081116948}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-587@test-org.com', 'createdon': 1456333869192, 'lastlogin': 1511401626850}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-588@test-org.com', 'createdon': 1456333873752, 'lastlogin': 1511414279605}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-594@test-org.com', 'createdon': 1456333895045, 'lastlogin': 1511417369152}, {'firstname': 'null', 'lastname': 'null', 'email': 'juanignaciosl-ded-05-597@test-org.com', 'createdon': 1456333905477,
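The storage step can also be sketched as a single batch insert that stores real SQL `NULL`s instead of the `'null'` placeholder strings. To keep the example runnable without a database server, it uses the standard library's `sqlite3` as a stand-in for Postgres; this is a swap for illustration, not what the project uses. With psycopg2 the placeholders would be `%s` rather than `?`. The helper names and the `TEXT` column types are assumptions.

```python
import sqlite3

# Placeholder strings that the processing step writes for blank fields
NULL_SENTINELS = {"null", "null@null.com"}


def to_row(contact):
    """Map a thin contact dict to a row tuple, turning placeholder strings into None."""
    return tuple(
        None if contact[key] in NULL_SENTINELS else contact[key]
        for key in ("firstname", "lastname", "email")
    )


def store_contacts(con, contacts):
    """Create the table if needed and insert all contacts in one batch."""
    with con:  # commits on success, rolls back if an exception is raised
        con.execute(
            "CREATE TABLE IF NOT EXISTS contacts ("
            "first_name TEXT, last_name TEXT, email TEXT)"
        )
        con.executemany(
            "INSERT INTO contacts (first_name, last_name, email) VALUES (?, ?, ?)",
            [to_row(c) for c in contacts],
        )


con = sqlite3.connect(":memory:")
store_contacts(con, [
    {"firstname": "John", "lastname": "cruz", "email": "juanignaciosl-ded-05-578@test-org.com"},
    {"firstname": "null", "lastname": "null", "email": "juanignaciosl-ded-05-587@test-org.com"},
])
print(con.execute("SELECT first_name, last_name, email FROM contacts").fetchall())
```

Storing `None` lets queries such as `WHERE first_name IS NULL` find the incomplete records, which fits the goal of identifying inconsistencies in the data.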

## Peer Feedback on Day 5

After talking it over with a peer, I received the following feedback and decided to make these changes

## Here are some overall notes on the skills I learned
And perhaps some stream of consciousness notes about what I did, and other questions I might have