## Install virtual environment kernel

1. In the terminal, activate your virtual environment

```
$ workon my-virtualenv-name  
```

2. Now run the kernel "self-install" script:

```
$ python -m ipykernel install --user --name=my-virtualenv-name # or --name=foods
```

3. You should now be able to see your kernel in the IPython notebook menu:  ```Kernel -> Change kernel```






## Import needed libraries

In [None]:
!!pip freeze # shows you all the packages installed in your local environment

In [1]:
import json
import requests
import time

## Assign variables to Variables 

These will be used later in the code. In the future, we may want to import these variables from a .yaml or .json file. Configurations should be imported or extracted from a parsable, human-friendly config file. When setting up complex systems, it's nice to have configuration files thoughtfully organized.

First let's define some variables that will be helpful.  These values are stored in `food-app-database/instance/config.py`

*NDBNO_TOTAL* is the last recorded number of items in the USDA database.  

*LAST_SR* is the version number of the USDA Database

*API_KEY* is the identifier acquired through USDA api services

*q* is the search term (any string) for the search API.  We'll most likely leave this blank.

*ds* is the datasource.  Must be 'Branded Food Products', 'Standard Reference', or ''.  We'll leave it blank to include both.

*fg* is the Food group ID.  We'll also leave this blank.

*sort* the results by food name (n) or by search relevance (r).  We'll sort by food name (n) for standardization's sake.

*mx* refers to the maximum number of items to return.  This seems to max out at 1500.

*offset* determines the index of the beginning row in the results set to begin.  

*formt* can either be JSON ('json') or XML ('xml).  We'll stick with JSON.

In [2]:
LAST_NDBNO_TOTAL = 200000
LAST_SR = 28
current_ndbo_total = 0
current_sr = 0
API_KEY = '7WqOHQdC2shEfBrx25bIEwxBkvUkYTHMoHYlLWL8' #1000 requests/hour
q = ''
ds = ''
fg = ''
sort = 'n'
mx = 1 # max is 1500 
offset = 0
formt = 'json' 
q = '' 
typ = 'f'
ndbno_id = ''

In [3]:
# special api request to get meta information on database (total number of items, standard reference version)
initial_search_request = 'https://api.nal.usda.gov/ndb/search/?format=json&q=&sort=n&max=1&offset=0&api_key=7WqOHQdC2shEfBrx25bIEwxBkvUkYTHMoHYlLWL8'

## Let's define some useful functions 

In [4]:
def get_db_status(initial_search_request):
    '''
    This function returns the number of items in database, as well as the database version
    
    initial_search_request = 'https://api.nal.usda.gov/ndb/search/?format=json&q=&sort=n&max=1&offset=0&api_key=DEMO_KEY'
    current_total = total number of items in usda database at the time of request
    current_sr = Standard Release version of the data at the time of request
    
    Returns {'current_total': current_total, 'current_sr': current_sr}
    '''
    usda_database_check = requests.get(initial_search_request)
    check_json = usda_database_check.json()
    current_total = check_json['list']['total']
    current_sr = check_json['list']['sr']
    return {'current_total': current_total, 'current_sr': current_sr}

def get_ndbno_list(search_api_request_url):
    '''This returns a list of ndbno's in the usda foods database
    '''
    search_object = requests.get(search_api_request_url) 
    search_json = search_object.json() # convert search_object to JSON

    ndbno_list_dict = search_json['list']['item'] # ndbno_list_dict is a list of dictionaries, where each dictionary is a unique food item
    
    return ndbno_list_dict

def get_ndbno_full_report(report_api_request_url):
    '''Returns JSON Full Report 
    '''
    full_report = requests.get(report_api_request_url)
    full_report_json = full_report.json()
    return full_report_json

def get_search_api_request_url(formt, q, sort, mx, offset, API_KEY):
    """Returns URL for search API
    """
    return 'https://api.nal.usda.gov/ndb/search/?format={}&q={}&sort={}&max={}&offset={}&api_key={}'.format(formt, q, sort, mx, offset, API_KEY)

def get_report_api_request_url(ndbno_id, typ, formt, API_KEY):
    """Returns URL for search API
    """
    return 'https://api.nal.usda.gov/ndb/reports/?ndbno={}&type={}&format={}&api_key={}'.format(ndbno_id, typ, formt, API_KEY)

## Determine the number of items in the USDA database 

We will be using this marker as well as the lastupdated date to initiate a scan for new data.  running the webiste through internetarchives will do the trick.

In [5]:
db_status = get_db_status(initial_search_request)
current_ndbno_total = db_status['current_total']
current_sr = db_status['current_sr']
print("Current Number of Items in database: ", current_ndbno_total)
print("Current Standard Reference Database Version: ", current_sr)

Current Number of Items in database:  214072
Current Standard Reference Database Version:  28


## Grab Metadata for items in the USDA database

In [6]:
# total amount that we can search 
print("The maximum number of items we can pull from a request is: ", mx)

# determine how many number of API request for searches will need to be done
import math
# search_num = math.ceil(current_ndbno_total/mx) # to round up!
search_iterations = 1 # for testing

ndbno_list = []

for index, request_set in enumerate(range(0, search_iterations)):
    # determine offset
    if index > 0:
        offset += mx
    # print("offset value is: ", offset)
    
    search_api_request_url = get_search_api_request_url(formt, q, sort, mx, offset, API_KEY)
    print(search_api_request_url)
    
    temp_ndbno_list = get_ndbno_list(search_api_request_url)
    
    for item in temp_ndbno_list:
        # print(item)
        ndbno_list.append(item)

# print results
for item in ndbno_list:
    print(item)

The maximum number of items we can pull from a request is:  1
https://api.nal.usda.gov/ndb/search/?format=json&q=&sort=n&max=1&offset=0&api_key=7WqOHQdC2shEfBrx25bIEwxBkvUkYTHMoHYlLWL8
{'group': 'Branded Food Products Database', 'name': 'AARDVARK HABENERO HOT SAUCE, UPC: 853393000030', 'offset': 0, 'ds': 'BL', 'ndbno': '45078606'}


## Let's define one of our object models

We need to first define the Flask app and database configurations:

In [23]:
from flask import Flask

app = Flask(__name__);

POSTGRES = {
    'user': 'ifrancium',
    'pw': 'password',
    'db': 'usda',
    'host': 'localhost',
    'port': '5432',
}

app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://%(user)s:\
%(pw)s@%(host)s:%(port)s/%(db)s' % POSTGRES

app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False # SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead 

In [24]:
print(app.config['SQLALCHEMY_DATABASE_URI']) # check database connection

postgresql://ifrancium:password@localhost:5432/usda


We'll be using SQLAlchemy to import our base model:

In [25]:
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy(app) # connect to database

Our first table in the database will be ```food_description```:

In [26]:
class food_desc(db.Model):
    __tablename__ = 'food_description'

    food_id = db.Column(db.Integer, primary_key=True)
    db_id = db.Column(db.String())
    ndbno_id = db.Column(db.String())
    name = db.Column(db.String())
    short_desc = db.Column(db.String())
    food_cat = db.Column(db.String())
    updated = db.Column(db.String())
    upc_code = db. Column(db.String(12))

    def __init__(self, food_id, db_id, ndbno_id, name, short_desc, food_cat, updated, upc_code):
        self.food_id = food_id
        self.db_id = db_id
        self.ndbno_id = ndbno_id
        self.name = name
        self.short_desc = short_desc
        self.food_cat = food_cat
        self.updated = updated
        self.upc_code = upc_code

    def __repr__(self):
        return '<id {}>'.format(self.ndbno_id)

Our second table in the database will be   ```food_units```:

In [27]:
class food_units(db.Model):
    __tablename__ = 'food_units'

    food_id = db.Column(db.Integer, primary_key=True)
    ndbno_id = db.Column(db.String())
    unit_desc = db.Column(db.String())
    grams_per_unit = db.Column(db.Float)

    def __init__(self, food_id, ndbno_id, unit_desc, grams_per_unit):
        self.food_id = food_id
        self.ndbno_id = ndbno_id
        self.unit_desc = unit_desc
        self.grams_per_unit = grams_per_unit

    def __repr__(self):
        return '<id {}>'.format(self.food_id)

Our third table in the database will be   ```nutrients_per_100_grams```:

In [29]:
class nut_per_100_g(db.Model):
    __tablename__ = 'nutients_per_100_grams'

    food_id = db.Column(db.Integer, primary_key=True)
    kcal = db.Column(db.Float)
    protein_g = db.Column(db.Float)
    total_fat_g = db.Column(db.Float)
    total_carb_g = db.Column(db.Float)
    total_diet_fiber_g = db.Column(db.Float)
    calcium_mg = db.Column(db.Float)
    iron_mg = db.Column(db.Float)
    magnesium_mg = db.Column(db.Float)
    phosphorus_mg = db.Column(db.Float)
    potassium_mg = db.Column(db.Float)
    sodium_mg = db.Column(db.Float)
    zinc_mg = db.Column(db.Float)
    copper_mg = db.Column(db.Float)
    manganese_mg = db.Column(db.Float)
    selenium_mcg = db.Column(db.Float)
    vitamin_c_mg = db.Column(db.Float)
    thiamin_mg = db.Column(db.Float)
    riboflavin_mg = db.Column(db.Float)
    niacin_mg = db.Column(db.Float)
    pantothenic_acid_mg = db.Column(db.Float)
    vitamin_b6_mg = db.Column(db.Float)
    total_folate_mcg = db.Column(db.Float)
    vitamin_b12_mcg = db.Column(db.Float)
    vitamin_d_mcg = db.Column(db.Float)
    vitamin_e_mg = db.Column(db.Float)
    vitamin_k_mcg = db.Column(db.Float)
    total_sat_fat_g = db.Column(db.Float)
    total_monounsat_fat_g = db.Column(db.Float)
    total_poly_unsat_fat_g = db.Column(db.Float)
    total_trans_fat_g = db.Column(db.Float)
    cholesterol_mg = db.Column(db.Float)
    total_sugar_g = db.Column(db.Float)
    omega_3_fatty_acids_g = db.Column(db.Float)

    def __init__(self, food_id, kcal, protein_g, total_fat_g, total_carb_g,
                 total_diet_fiber_g, calcium_mg, iron_mg, magnesium_mg,
                 phosphorus_mg, potassium_mg, sodium_mg, zinc_mg, copper_mg,
                 manganese_mg, selenium_mcg, vitamin_c_mg, thiamin_mg,
                 riboflavin_mg,
                 niacin_mg, pantothenic_acid_mg, vitamin_b6_mg,
                 total_folate_mcg,
                 vitamin_b12_mcg, vitamin_d_mcg, vitamin_e_mg, vitamin_k_mcg,
                 total_sat_fat_g, total_monounsat_fat_g,
                 total_poly_unsat_fat_g,
                 total_trans_fat_g, cholesterol_mg, total_sugar_g,
                 omega_3_fatty_acids_g):
        self.food_id = food_id
        self.kcal = kcal
        self.protein_g = protein_g
        self.total_fat_g = total_fat_g
        self.total_carb_g = total_carb_g
        self.total_diet_fiber_g = total_diet_fiber_g
        self.calcium_mg = calcium_mg
        self.iron_mg = iron_mg
        self.magnesium_mg = magnesium_mg
        self.phosphorus_mg = phosphorus_mg
        self.potassium_mg = potassium_mg
        self.sodium_mg = sodium_mg
        self.zinc_mg = zinc_mg
        self.copper_mg = copper_mg
        self.manganese_mg = manganese_mg
        self.selenium_mcg = selenium_mcg
        self.vitamin_c_mg = vitamin_c_mg
        self.thiamin_mg = thiamin_mg
        self.riboflavin_mg = riboflavin_mg
        self.niacin_mg = niacin_mg
        self.pantothenic_acid_mg = pantothenic_acid_mg
        self.vitamin_b6_mg = vitamin_b6_mg
        self.total_folate_mcg = total_folate_mcg
        self.vitamin_b12_mcg = vitamin_b12_mcg
        self.vitamin_d_mcg = vitamin_d_mcg
        self.vitamin_e_mg = vitamin_e_mg
        self.vitamin_k_mcg = vitamin_k_mcg
        self.total_sat_fat_g = total_sat_fat_g
        self.total_monounsat_fat_g = total_monounsat_fat_g
        self.total_poly_unsat_fat_g = total_poly_unsat_fat_g
        self.total_trans_fat_g = total_trans_fat_g
        self.cholesterol_mg = cholesterol_mg
        self.total_sugar_g = total_sugar_g
        self.omega_3_fatty_acids_g = omega_3_fatty_acids_g

    def __repr__(self):
        return '<id {}>'.format(self.food_id)

Let's push this model to the database schema (create table)

In [30]:
db.create_all() # creates tables in database

Let's check that our tables have been properly created.  In the terminal:

```
$ sudo -i -u postgres
```

In the postgres prompt, connect to the usda postgres database:

```
postgres@baloo:~$ psql -U jamiemenhall -d usda -h localhost
```

When connected to the database, display all tables in database:

```
usda=# \dt

              List of Relations
 Schema |      Name        | Type  | Owner
 -------+------------------+-------+-------------
 public | food_description | table | jamiemenhall
(1 row)
```

You can also see the table schema as well:

```
usda=# \d food_description
```

## Now Let's create model instances for each USDA item we collected and add them to our database

In [31]:
food_id = 1

# let's unpack data from JSON 
for food_item in ndbno_list:
    
    # let's get the metadata for food_desc
    db_id = 'usda'
    name = food_item['name'][:-19] 
    upc_code = food_item['name'][-12:]
    ndbno_id = food_item['ndbno']
    food_cat = food_item['group']
    
    # get full report API URL
    report_api_request_url = get_report_api_request_url(ndbno_id, typ, formt, API_KEY)
    print(report_api_request_url)
    
    # get full report JSON
    full_report_json = get_ndbno_full_report(report_api_request_url)
    print(full_report_json)
    
    short_desc = full_report_json['report']['food']['ing']['desc']
    updated = full_report_json['report']['food']['ing']['upd']
    
    # let's define the food_unit data 
    unit_desc = 
    grams_per_unit = 
    
    # let's define the nut_per_100_g data
    nutrients = full_report_json['report']['food']['nutrients'] # this should be a LIST of nutrients
    
    
    # create model istances
    food_id = food_desc(food_id, name, ndbno_id, name, short_desc, food_cat, updated, upc_code)
    
    # add instance to database
    db.session.add(food_id)
    
    # increase food_id counter
    food_id += 1
    
# commit changes to database
db.session.commit()
db.session.close()

https://api.nal.usda.gov/ndb/reports/?ndbno=45078606&type=f&format=json&api_key=7WqOHQdC2shEfBrx25bIEwxBkvUkYTHMoHYlLWL8
{'report': {'footnotes': [], 'food': {'nutrients': [{'value': '0', 'group': 'Proximates', 'name': 'Energy', 'nutrient_id': '208', 'derivation': 'LCCS', 'measures': [{'value': '0', 'eqv': 5.0, 'label': 'tsp', 'eunit': 'g', 'qty': 1.0}], 'unit': 'kcal'}, {'value': '0.00', 'group': 'Proximates', 'name': 'Protein', 'nutrient_id': '203', 'derivation': 'LCCS', 'measures': [{'value': '0.00', 'eqv': 5.0, 'label': 'tsp', 'eunit': 'g', 'qty': 1.0}], 'unit': 'g'}, {'value': '0.00', 'group': 'Proximates', 'name': 'Total lipid (fat)', 'nutrient_id': '204', 'derivation': 'LCCS', 'measures': [{'value': '0.00', 'eqv': 5.0, 'label': 'tsp', 'eunit': 'g', 'qty': 1.0}], 'unit': 'g'}, {'value': '0.00', 'group': 'Proximates', 'name': 'Carbohydrate, by difference', 'nutrient_id': '205', 'derivation': 'LCCS', 'measures': [{'value': '0.00', 'eqv': 5.0, 'label': 'tsp', 'eunit': 'g', 'qty': 1.

NameError: name 'food_id' is not defined

In [None]:
kcal = nutrients['']
    protein_g = db.Column(db.Float)
    total_fat_g = db.Column(db.Float)
    total_carb_g = db.Column(db.Float)
    total_diet_fiber_g = db.Column(db.Float)
    calcium_mg = db.Column(db.Float)
    iron_mg = db.Column(db.Float)
    magnesium_mg = db.Column(db.Float)
    phosphorus_mg = db.Column(db.Float)
    potassium_mg = db.Column(db.Float)
    sodium_mg = db.Column(db.Float)
    zinc_mg = db.Column(db.Float)
    copper_mg = db.Column(db.Float)
    manganese_mg = db.Column(db.Float)
    selenium_mcg = db.Column(db.Float)
    vitamin_c_mg = db.Column(db.Float)
    thiamin_mg = db.Column(db.Float)
    riboflavin_mg = db.Column(db.Float)
    niacin_mg = db.Column(db.Float)
    pantothenic_acid_mg = db.Column(db.Float)
    vitamin_b6_mg = db.Column(db.Float)
    total_folate_mcg = db.Column(db.Float)
    vitamin_b12_mcg = db.Column(db.Float)
    vitamin_d_mcg = db.Column(db.Float)
    vitamin_e_mg = db.Column(db.Float)
    vitamin_k_mcg = db.Column(db.Float)
    total_sat_fat_g = db.Column(db.Float)
    total_monounsat_fat_g = db.Column(db.Float)
    total_poly_unsat_fat_g = db.Column(db.Float)
    total_trans_fat_g = db.Column(db.Float)
    cholesterol_mg = db.Column(db.Float)
    total_sugar_g = db.Column(db.Float)
    omega_3_fatty_acids_g = db.Column(db.Float)

We can double-check the ```session.add()``` while connected to the database with the following:
    
```
usda=# SELECT * FROM food_description;
```