**Objectives of this Webinar:**

You will be able to query SHARE for records that are known to be tied to your institution, as well as records that might be tied to it. You will then be able to export them to a CSV file for review.

Let's start in the Search Interface: http://share.osf.io

**Replicating this process in code**

First we will set some values for Jupyter to use.

Why is this useful?

Remember Drew Barrymore in '50 First Dates'?

http://www.imdb.com/title/tt0343660/

**Special Tip:**  Put your cursor in the Cell (i.e., box) below and hit Shift+Enter on your keyboard to run it 

In [None]:
SHARE_API = 'https://staging-share.osf.io/api/search/abstractcreativework/_search'

This is now in memory! However, beware: just like Drew Barrymore's character, if you close this notebook and reopen it, Jupyter will have completely forgotten about it, and you will need to rerun the cell for Jupyter to know about it again.

**Special Tip:** Why all CAPS? It is convention (i.e., common practice) to use all capital letters when defining something that does not change.

Next, let's move on to a simple request.

In [None]:
import furl
import requests

search_url = furl.furl(SHARE_API)
search_url.args['size'] = 3
recent_results = requests.get(search_url.url).json()

recent_results = recent_results['hits']['hits']

print('The request URL is {}'.format(search_url.url))
print('----------')
for result in recent_results:
    print(
        '{} -- from {}'.format(
            result['_source']['title'],
            result['_source']['sources']
        )
    )

Let's add a search string.

**Special Tip:** Sometimes you can learn a lot from a URL. Let's look at the one we get after searching with the search box on SHARE (http://share.osf.io):
    
https://share.osf.io/discover?q=university%20of%20oregon
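
To see what that browser URL is carrying, it can be pulled apart with Python's standard library. This is only a small sketch: it uses `urllib.parse` rather than the `furl` package used in the cells of this notebook, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

discover_url = 'https://share.osf.io/discover?q=university%20of%20oregon'

parts = urlsplit(discover_url)   # split the URL into scheme, host, path, query
params = parse_qs(parts.query)   # decode the query string into a dict of lists

print(parts.path)       # the page being requested
print(params['q'][0])   # the decoded search string
```

The `%20` in the URL is just an encoded space, so the `q` parameter holds the same text that was typed into the search box.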

In [None]:
search_url.args['q'] = 'university of oregon'
recent_results = requests.get(search_url.url).json()

recent_results = recent_results['hits']['hits']

print('The request URL is {}'.format(search_url.url))
print('---------')
for result in recent_results:
    print(
        '{} -- from {}'.format(
            result['_source']['title'],
            result['_source']['sources']
        )
    )

**Special Tip:** To print text and variables together, use the format method as above: put '{}' anywhere you want to insert a value, then pass the values to format in the same order.

**Looking for exact matches? Use our facet filter**
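
As a quick illustration of that tip, here is a minimal example with made-up values (the title and source below are placeholders, not real SHARE records). It also shows named placeholders, which can be easier to read when there are several values.

```python
# Made-up values, for illustration only
title = 'An example title'
source = 'Example Source'

# Positional placeholders are filled in the order the values are passed
message = '{} -- from {}'.format(title, source)
print(message)

# Named placeholders are filled by keyword instead of by position
named_message = '{title} -- from {source}'.format(source=source, title=title)
print(named_message)
```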

Let's go back to SHARE, apply a facet and see what it is doing

http://share.osf.io
    

In [None]:
affiliation_query = {
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "*"
                }
            },
            "filter": [
                {
                    "term": {
                        "institutions.raw": "University of Oregon Libraries"
                    }
                }
            ]
        }
    }
}
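
If you want to try a different institution, one option is to wrap the query in a small function. This is a sketch, not part of the original notebook: `build_affiliation_query` is a hypothetical helper name, and it assumes the institution name is spelled exactly as it appears in the SHARE facet.

```python
def build_affiliation_query(institution):
    """Return a query body that filters on an exact institution name."""
    return {
        "query": {
            "bool": {
                # match everything, then narrow the results with the filter
                "must": {"query_string": {"query": "*"}},
                "filter": [{"term": {"institutions.raw": institution}}],
            }
        }
    }

example_query = build_affiliation_query("University of Oregon Libraries")
print(example_query["query"]["bool"]["filter"])
```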

Let's load a function into memory so we can use it later. Do not worry too much about the specifics yet.

**Special Tip:** Just like variables, you can define functions that do a job and then use them later without rewriting what they do.

In [None]:
import json
import requests

def query_share(url, query):
    # A helper function that will use the requests library,
    # pass along the correct headers,
    # and make the query we want
    headers = {'Content-Type': 'application/json'}
    data = json.dumps(query)
    return requests.post(url, headers=headers, data=data, verify=False).json()

**Special Tip:** Use '#' as the first character in a line if you want to add a comment.

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)

# the number of records matching the filter is reported in the response itself
total_results = affiliation_results['hits']['total']
print('total results found: {}'.format(total_results))

Where are my results? Things can differ between staging and production environments.

Let's create a new variable and change our search url

In [None]:
PROD_SHARE_API = 'https://share.osf.io/api/search/abstractcreativework/_search'

search_url = furl.furl(PROD_SHARE_API)
search_url.args['size'] = 3

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)

# the number of records matching the filter is reported in the response itself
total_results = affiliation_results['hits']['total']
print('total results found: {}'.format(total_results))

In [None]:
#with_tags = query_share(search_url.url, tags_query)
#missing_tags = query_share(search_url.url, missing_tags_query)
affiliation_results = query_share(search_url.url, affiliation_query)
print(affiliation_results)
print('------------')
print('------------')
print(affiliation_results['hits'])
print('------------')
print('------------')
print(affiliation_results['hits']['hits'])
print('------------')
print('------------')
#print(affiliation_results['hits']['hits'].keys())
row1 = affiliation_results['hits']['hits'].pop()
print(row1.keys())
print(affiliation_results['hits']['hits'])
# it is not a flat list, so need to look at hierarchy to see how best to work with it
# at more advanced stage interact with data in a way that keeps the hierarchical structure

#error handling
#if hits total > 0

total_results = requests.get(search_url.url).json()['hits']['total']

#with_tags_percent = (float(with_tags['hits']['total'])/total_results)*100
#missing_tags_percent = (float(missing_tags['hits']['total'])/total_results)*100
affiliation_results_percent = (float(affiliation_results['hits']['total'])/total_results)*100


#print(
#    '{} results out of {}, or {}%, have tags.'.format(
#        with_tags['hits']['total'],
#        total_results,
#        format(with_tags_percent, '.2f')
#    )
#)

#print(
#    '{} results out of {}, or {}%, do NOT have tags.'.format(
#        missing_tags['hits']['total'],
#        total_results,
#        format(missing_tags_percent, '.2f')
#    )
#)

print(
    '{} results out of {}, or {}%, match the institution filter.'.format(
        affiliation_results['hits']['total'],
        total_results,
        format(affiliation_results_percent, '.2f')
    )
)

print('------------')
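
The response is nested rather than flat, so it can help to practice walking the hierarchy on a small stand-in before working with live results. The sketch below uses a made-up dictionary shaped like a search response (the ids, titles, and counts are invented), plus a hypothetical `percent_of` helper that guards against dividing by zero when a search returns nothing.

```python
# A made-up response with the same overall shape as a search result
sample_response = {
    'hits': {
        'total': 2,
        'hits': [
            {'_id': 'a1', '_source': {'title': 'First example'}},
            {'_id': 'b2', '_source': {'title': 'Second example'}},
        ],
    }
}

def percent_of(part, whole):
    # Avoid dividing by zero when the overall total is 0
    if whole == 0:
        return 0.0
    return (part / whole) * 100

# Each level of the hierarchy is reached with another pair of brackets
titles = [hit['_source']['title'] for hit in sample_response['hits']['hits']]
print(titles)

# 8 is an invented overall total, used only to show the calculation
print('{:.2f}%'.format(percent_of(sample_response['hits']['total'], 8)))
```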


In [None]:
print(affiliation_results['hits']['hits'])

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)
print(affiliation_results)

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)
print(affiliation_results['hits'])

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)
print(affiliation_results['hits']['hits'])

In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)
row1 = affiliation_results['hits']['hits'].pop()
print(row1.keys())
print('total results found: {}'.format(affiliation_results['hits']['total']))


In [None]:
affiliation_results = query_share(search_url.url, affiliation_query)
row1 = affiliation_results['hits']['hits'].pop()
print(affiliation_results['hits']['hits'])
print(row1.keys())
print(row1['_source'].keys())
print('total results found: {}'.format(affiliation_results['hits']['total']))

In [None]:
#get the keys
for key in row1.keys():
    print(key)

In [None]:
#get the values
for key in row1.keys():
    print(row1[key])

In [None]:
#put keys and values together
for key in row1.keys():
    print('{}:{}'.format(key,row1[key]))

In [None]:
for key in row1['_source']:
    print('{}:{}\n'.format(key,row1['_source']))

In [None]:
for key in row1['_source']:
    print('{}:{}\n'.format(key,row1['_source'][key]))
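
Another common way to walk through keys and values together is the dictionary's `items()` method, which hands back each key with its value. A minimal sketch with a made-up record (not real SHARE data):

```python
# A made-up record standing in for row1['_source']
sample_source = {'title': 'An example title', 'language': 'en'}

# items() yields (key, value) pairs, so no second lookup is needed
for key, value in sample_source.items():
    print('{}:{}'.format(key, value))

# The same pairs collected into a list of strings
lines = ['{}:{}'.format(key, value) for key, value in sample_source.items()]
```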

In [None]:
#Put id first

In [None]:
import os

#os.getcwd()
#os.chdir('..')
#os.getcwd()
#os.chdir('share_tutorials')
#os.getcwd()

**Copy, Paste, and Adapt...**

Let's try writing to a simple csv file as an example (example pulled from https://docs.python.org/3/library/csv.html)

In [None]:
import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

Let's adapt this now with our data from SHARE...

In [None]:
import csv

affiliation_results = query_share(search_url.url, affiliation_query)
records = affiliation_results['hits']['hits']

#set our filenames
SHARE_MATCHING_INSTITUTION_RECORDS = 'share_matching.csv'

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(SHARE_MATCHING_INSTITUTION_RECORDS, 'w', newline='') as csvfile:
    # instead of pop, use a for loop
    writer = None
    for record in records:
        if writer is None:
            # use the first record's keys as the column headers
            fieldnames = list(record['_source'].keys())
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
        # copy the record's fields into a flat row
        row = {}
        for key in record['_source']:
            row.update({key: record['_source'][key]})
        writer.writerow(row)

**Special Tip:** Setting extrasaction to 'ignore' tells DictWriter to skip any keys in a row that are not in fieldnames, instead of raising an error. This helps when the records do not all share the same keys.
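
To see the difference, here is a small self-contained sketch that writes to an in-memory buffer (`io.StringIO`) instead of a file. The row is made up, and `extra_key` is a deliberately unknown column.

```python
import csv
import io

# A made-up row with one key that is not in the header
row = {'title': 'An example title', 'extra_key': 'not in the header'}

# Default behaviour (extrasaction='raise'): unknown keys cause a ValueError
strict_writer = csv.DictWriter(io.StringIO(), fieldnames=['title'])
try:
    strict_writer.writerow(row)
    strict_failed = False
except ValueError:
    strict_failed = True

# extrasaction='ignore': unknown keys are silently left out of the output
buffer = io.StringIO()
lenient_writer = csv.DictWriter(buffer, fieldnames=['title'], extrasaction='ignore')
lenient_writer.writeheader()
lenient_writer.writerow(row)

print(strict_failed)
print(buffer.getvalue())
```

Note that 'ignore' drops the extra data without warning, so it is worth checking that nothing you need is being left out.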

In [None]:
import csv

affiliation_results = query_share(search_url.url, affiliation_query)
records = affiliation_results['hits']['hits']

#set our filenames
SHARE_MATCHING_INSTITUTION_RECORDS = 'share_matching.csv'

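# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(SHARE_MATCHING_INSTITUTION_RECORDS, 'w', newline='') as csvfile:
    # instead of pop, use a for loop
    writer = None
    for row in records:
        if writer is None:
            # use the first record's keys as the column headers
            fieldnames = list(row['_source'].keys())
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
            writer.writeheader()
        writer.writerow(row['_source'])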
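
One possible alternative to ignoring extra keys is to collect every key that appears in any record before writing, so no column is dropped. The sketch below is an assumption about how that could be done, not the approach used in this notebook: `collect_fieldnames` is a hypothetical helper, the records are made up, and an in-memory buffer stands in for the CSV file.

```python
import csv
import io

# Made-up records shaped like entries in affiliation_results['hits']['hits']
sample_records = [
    {'_source': {'title': 'First example', 'language': 'en'}},
    {'_source': {'title': 'Second example', 'date': '2016-01-01'}},
]

def collect_fieldnames(records):
    # Gather every key seen in any record, keeping first-seen order
    fieldnames = []
    for record in records:
        for key in record['_source']:
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames

fieldnames = collect_fieldnames(sample_records)

output = io.StringIO()  # stands in for open(filename, 'w', newline='')
writer = csv.DictWriter(output, fieldnames=fieldnames)
writer.writeheader()
for record in sample_records:
    # keys missing from a record are written as empty cells
    writer.writerow(record['_source'])

print(output.getvalue())
```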

Finally, let's pull everything together. (Remember Drew Barrymore: this would not work if you started here without running the earlier cells.)