## Extract available JSON data
Goal: understand what kinds of data the API gives me access to for the CSCW 2018 revisions. Specifically, I want to know whether I can get at people's tenure on the website as well as their average job length. (5/30/2018)

### Conclusion
I don't have access to the date when users joined Upwork; however, I do have access to the "from" date of their earliest job assignment. I also have access to the rate at which workers were paid for each of their assignments, the duration of the assignments, and the number of hours worked for each assignment. With this in mind, here's what I plan to do:

(DONE) 
1. Extract earliest "from" date for a fixed and/or an hourly assignment --> "Upwork_tenure" (add as a control variable in causal model and in regular linear regression - this time using revenue as the outcome variable) 
** Just realized I don't actually have the from date, I just have the latest "to" date (for some)
+ Add capability to take into account from dates

2. Create a measure of job duration for hourly assignments. Measure the duration of an assignment --> "assignment_duration" by taking the difference between its "to" and "from" dates. Compute the average number of hours worked per day for each assignment by dividing its hours worked by its "assignment_duration", then average this across all of a worker's hourly assignments --> "average_daily_hours_worked". May also just want to compute the total number of days worked on hourly jobs and take the average --> "average_days_of_hourly_jobs". Compare descriptive statistics across job categories, and possibly see whether there are any patterns between these measures and the job category's average pay, female representation, etc., or any gender differences in these measures. Also, may want to add this as a control variable in a causal analysis/linear regression, where the outcome variable is estimated hourly revenue. 
** This isn't possible, given that there isn't a "from" date for some random assignments 
** A fix might be to use the job assignments I collected back in February, and see what was the average workload? Or simply address this in the limitations section of the paper 

(IN PROGRESS)

3. BONUS: I want to compare the actual average rate across all of a user's hourly assignments to the listed hourly bill rate. Calculate the average rate for hourly assignments --> "actual_hourly_rate", then see if there is still a gender pay gap in terms of hourly rate. Remember to include "Upwork_tenure" as a control variable. 
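
A minimal sketch of how the measures in steps 2 and 3 could be computed, assuming each hourly assignment is already available as a dict. The keys `from`, `to`, `hours`, and `rate`, the sample values, and the helper names are hypothetical and are not the actual API fields. Assignments with a missing date are skipped for the duration-based measures and counted, since some assignments have no "from" date.

```python
from datetime import date

# Hypothetical records for one worker; keys and values are illustrative only.
assignments = [
    {"from": date(2018, 1, 1), "to": date(2018, 1, 11), "hours": 40.0, "rate": 20.0},
    {"from": date(2018, 2, 1), "to": date(2018, 2, 6), "hours": 10.0, "rate": 25.0},
    {"from": None, "to": date(2018, 3, 1), "hours": 5.0, "rate": 30.0},
]


def mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / float(len(values)) if values else None


def assignment_duration(assignment):
    """Days between "from" and "to"; None if a date is missing or the span is zero."""
    if assignment["from"] is None or assignment["to"] is None:
        return None
    days = (assignment["to"] - assignment["from"]).days
    # Same-day assignments are skipped here to avoid dividing by zero later.
    return days if days > 0 else None


def summarize_worker(rows):
    """Compute the per-worker measures named in steps 2 and 3."""
    durations = []
    daily_hours = []
    for row in rows:
        days = assignment_duration(row)
        if days is None:
            continue
        durations.append(days)
        daily_hours.append(row["hours"] / days)
    return {
        "average_days_of_hourly_jobs": mean(durations),
        "average_daily_hours_worked": mean(daily_hours),
        # Unweighted mean of per-assignment rates (an assumption).
        "actual_hourly_rate": mean([row["rate"] for row in rows]),
        "assignments_missing_dates": len(rows) - len(durations),
    }


summary = summarize_worker(assignments)
print(summary)
```

Here "actual_hourly_rate" is an unweighted mean of the per-assignment rates; weighting each rate by hours worked is another reasonable choice and would change the comparison against the listed bill rate.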

### June 1, 2018
Realized I can't get the "from" date for assignments, so I'm using the earliest skill test date as a proxy for Upwork tenure. I also still want to use the "to" date for assignments as a proxy. This might take me another hour. 
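
A minimal sketch of the two tenure proxies, assuming the skill test dates and assignment "to" dates have already been pulled out of each worker's JSON as lists of strings. The date format, the function names, and the snapshot date (guessed from the `2017_12_12` suffix of the table name) are all assumptions, not confirmed details of the data.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"          # assumed format of the date strings
SNAPSHOT_DATE = date(2017, 12, 12)  # assumed collection date, from the table name


def earliest_date(date_strings, fmt=DATE_FORMAT):
    """Return the earliest parseable date, ignoring empty or missing entries."""
    parsed = [datetime.strptime(s, fmt).date() for s in date_strings if s]
    return min(parsed) if parsed else None


def days_until_snapshot(start, snapshot=SNAPSHOT_DATE):
    """Days from start to the snapshot date, or None if start is unknown."""
    return (snapshot - start).days if start is not None else None


def tenure_proxies(skill_test_dates, assignment_to_dates):
    """Compute both tenure proxies separately so they can be compared."""
    return {
        "tenure_days_from_skill_tests": days_until_snapshot(
            earliest_date(skill_test_dates)),
        "tenure_days_from_assignments": days_until_snapshot(
            earliest_date(assignment_to_dates)),
    }


# Hypothetical worker with two skill tests and one dated assignment.
proxies = tenure_proxies(["2016-03-01", "2015-12-12"], ["2017-01-11", ""])
print(proxies)
```

Keeping the two proxies as separate columns makes it possible to check how closely they agree before choosing one as "Upwork_tenure".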

In [6]:
import httplib2
import oauth2
import urllib3
import types
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from gender_detector import GenderDetector 
import psycopg2, psycopg2.extras
from causalinference import CausalModel
from causalinference.utils import random_data
import httplib
import base64
import json # For Microsoft Face API
import urllib as urllib # For Microsoft Face API
import time 
import csv
import datetime 


class UpworkDataExtractor:
    
    def __init__(self):
        # Connect to the database 
        self.conn = psycopg2.connect("dbname=eureka01")
        self.cur = self.conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
        psycopg2.extensions.register_adapter(dict, psycopg2.extras.Json)
        
        # Get detailed_info from workers in our database
        self.cur.execute("SELECT detailed_info FROM upwork_unitedstates_allskills_2017_12_12 LIMIT 10;")
        
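    def print_detailed_info(self):
        """Pretty-print the detailed_info JSON for each row fetched in __init__."""
        for user in self.cur:
            # Each row is a one-column DictRow, so it serializes as a JSON list.
            print(json.dumps(user, indent=2, sort_keys=False))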

myObject = UpworkDataExtractor()
myObject.print_detailed_info()


[
  {
    "dev_adj_score_recent": "5", 
    "dev_ui_profile_access": "Public", 
    "dev_portrait": "https://odesk-prod-portraits.s3.amazonaws.com/Users:marianaelara:PortraitUrl?AWSAccessKeyId=1XVAX3FNQZAFC9GJCFR2&Expires=2147483647&Signature=of%2BHLUkWfSMAba0BC2EHVizCV7U%3D&1502292249360166", 
    "dev_country": "United States", 
    "dev_first_name": "Mariana", 
    "dev_groups": "", 
    "dev_city": "Hartford", 
    "dev_blurb": "I am a Spanish speaker from Argentina. I studied at Universidad Nacional de Cuyo in Mendoza. In my home country I earned a degree as a Teacher of English as a Second Language. I moved to the United States in 2002 and studied at Saint Michael's College where I earned my MA in Teaching English to Speakers of Other Languages. \nWhile in the United States I became a Spanish teacher and taught middle and high school students. Besides teaching, I translated school documents for parents with limited English proficiency. As an animal lover I also volunteer to trans