# Load GeoQuery Dataset

### GeoQuery dataset properties:
1. The logical forms are different from the format we want (we could probably rewrite them ourselves)
2. We could parse the logical forms to get the query
3. Some questions are cross-schema (e.g. "what,are,the,populations,of,all,the,major,cities,in,montana","which,capitals,are,in,the,states,that,border,texas")
--- Solution: create two logical predicates, 'border' and 'in' (states border states, cities are in states), and use type 'array' with the new predicates
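
As a rough illustration of the two-predicate idea, the sketch below answers "which capitals are in the states that border texas" by chaining a 'border' lookup with an 'in' lookup. The three toy relations are a hand-written subset, and `capitals_in_states_bordering` is a hypothetical helper name, not something provided by GeoQuery.

```python
# Toy relations, hand-written for illustration (not read from the geobase file).
BORDER = {  # 'border': state -> states that border it
    'texas': ['arkansas', 'louisiana', 'new mexico', 'oklahoma'],
}
CITY_IN = {  # 'in': city -> state that contains it
    'austin': 'texas',
    'baton rouge': 'louisiana',
    'little rock': 'arkansas',
    'oklahoma city': 'oklahoma',
    'santa fe': 'new mexico',
    'tulsa': 'oklahoma',
}
CAPITALS = {'austin', 'baton rouge', 'little rock', 'oklahoma city', 'santa fe'}


def capitals_in_states_bordering(state):
    """Chain the two predicates: border(state) gives states, 'in' filters cities."""
    neighbors = set(BORDER.get(state, []))
    return sorted(c for c in CAPITALS if CITY_IN.get(c) in neighbors)


print(capitals_in_states_bordering('texas'))
# ['baton rouge', 'little rock', 'oklahoma city', 'santa fe']
```

The list-valued 'border' entry is what the 'array' type would hold in a table-like schema.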


### Following is a description of the geobase predicates:

state(name, abbreviation, capital, population, area, state_number, city1, city2, city3, city4)

city(state, state_abbreviation, name, population)

river(name, length, [states through which it flows])

border(state, state_abbreviation, [states that border it])

highlow(state, state_abbreviation, highest_point, highest_elevation, lowest_point, lowest_elevation)

mountain(state, state_abbreviation, name, height)

road(number, [states it passes through])

lake(name, area, [states it is in])
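
A minimal sketch of how one such fact could be parsed into a predicate name and an argument list. It assumes Prolog-style lines with single-quoted strings, bare numbers, and bracketed lists; the sample lines are written by hand to match the descriptions above, and `parse_fact` and `to_number` are hypothetical helper names.

```python
import re

FACT_RE = re.compile(r"^\s*(\w+)\((.*)\)\.?\s*$")
# One argument: a [bracketed list], a 'quoted string', or a bare token (number).
TOKEN_RE = re.compile(r"\[([^\]]*)\]|'([^']*)'|([^,\s\[\]]+)")


def to_number(text):
    """Convert a bare token to int or float; fall back to the raw string."""
    try:
        value = float(text)
    except ValueError:
        return text
    return int(value) if value.is_integer() else value


def parse_fact(line):
    """Return (predicate, [args]) for one fact line, or None if it is not a fact."""
    m = FACT_RE.match(line)
    if m is None:
        return None
    args = []
    for tok in TOKEN_RE.finditer(m.group(2)):
        if tok.lastindex == 1:    # bracketed list of names
            items = tok.group(1).split(',')
            args.append([s.strip().strip("'") for s in items if s.strip()])
        elif tok.lastindex == 2:  # quoted string
            args.append(tok.group(2))
        else:                     # bare token, normally a number
            args.append(to_number(tok.group(3)))
    return m.group(1), args


print(parse_fact("river('mississippi',3778,['minnesota','wisconsin','iowa'])."))
```

Because quoted strings are matched as whole tokens, stray spaces after commas do not leak into the values, which a plain `split(',')` followed by slicing cannot guarantee.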


### We can change to the format we want:

state, abbreviation, capital, population, area, state_number, highest_point, highest_elevation, lowest_point, lowest_elevation, major_cities, rivers, states_border, mountains, lakes

city, state, abbreviation, population

mountain, state, abbreviation, height

river, length, states_through

(road, states_through) (No questions)

lake, area, states_in


### How to do that? Transforming a graph-structured knowledge base into a table-like one is easy.
1. Read in the 'base' file to get all the values
2. Convert the structured graph to tables
3. Analyze the logical forms and rewrite them
4. Augment the data and train
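
Step 2 can be sketched as a merge of per-predicate facts into one row per state. The sketch assumes the facts are already parsed into `(predicate, args)` pairs that follow the predicate descriptions above; the names and numbers in `facts` are fictional placeholders, and `build_state_table` is a hypothetical helper.

```python
def build_state_table(facts):
    """Merge state, highlow, border and river facts into one dict per state."""
    rows = {}
    for name, args in facts:
        if name == 'state':
            rows.setdefault(args[0], {}).update({
                'state': args[0], 'abbreviation': args[1], 'capital': args[2],
                'population': args[3], 'area': args[4], 'state_number': args[5],
                'major_cities': list(args[6:]),
            })
        elif name == 'highlow':
            rows.setdefault(args[0], {}).update({
                'highest_point': args[2], 'highest_elevation': args[3],
                'lowest_point': args[4], 'lowest_elevation': args[5],
            })
        elif name == 'border':
            rows.setdefault(args[0], {})['states_border'] = list(args[2])
        elif name == 'river':
            # A river fact lists its states, so invert it into per-state lists.
            for st in args[2]:
                rows.setdefault(st, {}).setdefault('rivers', []).append(args[0])
    return rows


# Fictional placeholder facts, only to show the shape of the data.
facts = [
    ('state', ['alpha', 'aa', 'a-city', 1000, 50, 1, 'a-city', 'b-town']),
    ('state', ['beta', 'bb', 'c-ville', 2000, 80, 2, 'c-ville']),
    ('border', ['alpha', 'aa', ['beta']]),
    ('highlow', ['alpha', 'aa', 'mount a', 900, 'a-river', 10]),
    ('river', ['a-river', 120, ['alpha', 'beta']]),
]
table = build_state_table(facts)
print(table['alpha']['states_border'], table['beta']['rivers'])
```

The list-valued columns (`major_cities`, `rivers`, `states_border`) correspond to the 'list' fields of the target format.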

In [1]:
import numpy as np
import re

def strIsNum(s):
    '''Return True if the word represents a numerical value.
    ''' 
    if not isinstance(s, basestring):
        return False
    ones = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"}
    tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}
    teens = {"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", \
            "seventeen", "eighteen", "nineteen"}
    levels = {"hundred", "thousand", "million", "billion", "trillion"}
    # number words
    if s in ones or s in tens or s in teens or s in levels:
        return True
    # plain digit strings
    if s.isdigit():
        return True
    # anything else float() accepts, e.g. '3.5' or '1e3'
    try:
        float(s)
        return True
    except ValueError:
        return False


In [41]:
query_text = []
with open('./GeoQuery/geoqueries880','r') as f_query:
    for line in f_query:
        #print line
        extract = re.findall(r'\[([^()]+)\]', line)
        #print extract
        tokens = extract[0].split(',')
        sentence = ' '.join(tokens[:-1])
        print sentence
        query_text.append(sentence)

give me the cities in virginia
what are the high points of states surrounding mississippi
name the rivers in arkansas
name all the rivers in colorado
can you tell me the capital of texas
could you tell me what is the highest point in the state of oregon
count the states which have elevations lower than what alabama has
give me all the states of usa
give me the cities in texas
give me the cities in usa
give me the cities in virginia
give me the cities which are in texas
give me the lakes in california
give me the largest state
give me the longest river that passes through the us
give me the number of rivers in california
give me the states that border utah
how big is alaska
how big is massachusetts
how big is new mexico
how big is north dakota
how big is texas
how big is the city of new york
how high are the highest points of all the states
how high is guadalupe peak
how high is mount mckinley
how high is the highest point in america
how high is the highest point in montana
how high is 

### Now
1. Collect all the possible values
2. Replace each multi-word value with its '_'-joined form, both in the value lists and in the query text
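
One caveat with plain substring replacement is that a value such as 'high point' also matches inside 'high points'. The sketch below is one possible alternative that replaces whole words only and tries longer values first; `join_multiword_values` is a hypothetical helper, not part of the notebook.

```python
import re


def join_multiword_values(sentences, values):
    """Replace each multi-word value by its '_'-joined form, on word boundaries only."""
    # Longest first, so 'north little rock' wins over 'little rock'.
    multi = sorted((v for v in set(values) if ' ' in v), key=len, reverse=True)
    if not multi:
        return list(sentences)
    pattern = re.compile(
        r'(?<!\w)(?:' + '|'.join(re.escape(v) for v in multi) + r')(?!\w)')
    return [pattern.sub(lambda m: m.group(0).replace(' ', '_'), s)
            for s in sentences]


values = ['new mexico', 'high point', 'little rock', 'north little rock', 'texas']
queries = ['how big is new mexico',
           'what are the high points of states surrounding mississippi',
           'where is north little rock']
for line in join_multiword_values(queries, values):
    print(line)
```

Here the first and third queries are rewritten, while 'high points' is left alone because the match would end in the middle of a word.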

In [46]:
analyze = ['state', 'abbreviation', 'capital', 'population', 'area', 'state_number', 'highest_point', \
           'highest_elevation', 'lowest_point', 'lowest_elevation', 'major_cities', 'rivers', \
           'border', 'mountains', 'lakes', 'city', \
           'mountain', 'height', 'river', 'length', 'states_through', 'lake',\
           'states_in','population_density']

fields_dict = {}
for field in analyze:
    fields_dict[field] = {'value_type':'', 'value_range':[], 'query_word':[field.lower()]}

fields_dict['Country'] = {'value_type':'string', 'value_range':['us','united_states','usa'], 'query_word':['us','united_states','usa']}
fields_dict['population_density']['value_type'] = 'int'
fields_dict['major_cities']['value_type'] = 'list'
fields_dict['rivers']['value_type'] = 'list'
fields_dict['border']['value_type'] = 'list'
fields_dict['mountains']['value_type'] = 'list'
fields_dict['lakes']['value_type'] = 'list'
fields_dict['states_through']['value_type'] = 'list'
fields_dict['states_in']['value_type'] = 'list'
print fields_dict


{'highest_elevation': {'query_word': ['highest_elevation'], 'value_type': '', 'value_range': []}, 'lowest_elevation': {'query_word': ['lowest_elevation'], 'value_type': '', 'value_range': []}, 'height': {'query_word': ['height'], 'value_type': '', 'value_range': []}, 'abbreviation': {'query_word': ['abbreviation'], 'value_type': '', 'value_range': []}, 'population_density': {'query_word': ['population_density'], 'value_type': 'int', 'value_range': []}, 'state_number': {'query_word': ['state_number'], 'value_type': '', 'value_range': []}, 'major_cities': {'query_word': ['major_cities'], 'value_type': 'list', 'value_range': []}, 'border': {'query_word': ['border'], 'value_type': 'list', 'value_range': []}, 'states_in': {'query_word': ['states_in'], 'value_type': 'list', 'value_range': []}, 'mountain': {'query_word': ['mountain'], 'value_type': '', 'value_range': []}, 'area': {'query_word': ['area'], 'value_type': '', 'value_range': []}, 'state': {'query_word': ['state'], 'value_type': ''

In [47]:
with open('./GeoQuery/geobase','r') as f_base:
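    idx = 0
    for line in f_base:
        idx += 1
        # The line ranges below are hard-coded offsets into the geobase file;
        # everything before line 23 is skipped.
        if idx < 23:
            continue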
        if idx < 74:
            # read state schema
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['state']['value_range'].append(tokens[0][1:-1])
            fields_dict['state']['value_type'] = 'string'
            fields_dict['abbreviation']['value_range'].append(tokens[1][1:-1])
            fields_dict['abbreviation']['value_type'] = 'string'
            fields_dict['capital']['value_range'].append(tokens[2][1:-1])
            fields_dict['capital']['value_type'] = 'string'
            #fields_dict['population']['value_range'].append(float(tokens[3]))
            fields_dict['population']['value_type'] = 'int'
            #fields_dict['area']['value_range'].append(float(tokens[4]))
            fields_dict['area']['value_type'] = 'int'
            #fields_dict['state_number']['value_range'].append(int(tokens[5]))
            fields_dict['state_number']['value_type'] = 'int'
            continue
        if idx < 460:
            # read city
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['city']['value_range'].append(tokens[2][1:-1])
            fields_dict['city']['value_type'] = 'string'
            continue
        if idx < 506:
            # read river
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['river']['value_range'].append(tokens[0][1:-1])
            fields_dict['river']['value_type'] = 'string'
            # length is tokens[1]; numeric values are not collected into value_range
            fields_dict['length']['value_type'] = 'int'
            continue
        if idx < 557:
            # read border:
            continue
        if idx < 608:
            # read highlow
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['highest_point']['value_range'].append(tokens[2][1:-1])
            fields_dict['highest_point']['value_type'] = 'string'
            # highest_elevation is tokens[3]; numeric values are not collected into value_range
            fields_dict['highest_elevation']['value_type'] = 'int'
            fields_dict['lowest_point']['value_range'].append(tokens[4][1:-1])
            fields_dict['lowest_point']['value_type'] = 'string'
            # lowest_elevation is tokens[5]; numeric values are not collected into value_range
            fields_dict['lowest_elevation']['value_type'] = 'int'
            continue
        if idx < 658:
            # read mountain
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['mountain']['value_range'].append(tokens[2][1:-1])
            fields_dict['mountain']['value_type'] = 'string'
            # height is tokens[3]; numeric values are not collected into value_range
            fields_dict['height']['value_type'] = 'int'
            continue
        if idx < 698:
            # read road
            continue
        if idx < 720:
            # read lake
            m = re.findall(r'\(([^()]+)\)', line)
            tokens = m[0].split(',')
            fields_dict['lake']['value_range'].append(tokens[0][1:-1])
            fields_dict['lake']['value_type'] = 'string'
            

print fields_dict

{'highest_elevation': {'query_word': ['highest_elevation'], 'value_type': 'int', 'value_range': []}, 'lowest_elevation': {'query_word': ['lowest_elevation'], 'value_type': 'int', 'value_range': []}, 'height': {'query_word': ['height'], 'value_type': 'int', 'value_range': []}, 'abbreviation': {'query_word': ['abbreviation'], 'value_type': 'string', 'value_range': ['al', 'ak', 'az', 'ar', 'ca', 'co', 'ct', 'de', "'dc", 'fl', 'ga', 'hi', 'id', 'il', 'in', 'ia', 'ks', 'ky', 'la', 'me', 'md', 'ma', 'mi', 'mn', 'ms', 'mo', 'mt', 'ne', 'nv', 'nh', 'nj', 'nm', 'ny', 'nc', 'nd', 'oh', 'ok', 'or', 'pa', 'ri', 'sc', 'sd', 'tn', 'tx', 'ut', 'vt', 'va', 'wa', 'wv', 'wi', 'wy']}, 'population_density': {'query_word': ['population_density'], 'value_type': 'int', 'value_range': []}, 'state_number': {'query_word': ['state_number'], 'value_type': 'int', 'value_range': []}, 'major_cities': {'query_word': ['major_cities'], 'value_type': 'list', 'value_range': []}, 'border': {'query_word': ['border'], 'valu

In [48]:
newquery_text = [x for x in query_text]
for k,v in fields_dict.items():
#     print k
#     print v['value_range']
    for j in range(len(v['value_range']), 0, -1):
        f_value = v['value_range'][j-1]
        if ' ' in f_value:
            new_value = f_value.replace(' ', '_')
            #print new_value
            for i in range(len(newquery_text)):
                newquery_text[i] = newquery_text[i].replace(f_value, new_value)
            # overwrite in place so the order of value_range is preserved
            v['value_range'][j-1] = new_value
            
for line in newquery_text:
    print line #newquery_text

print fields_dict

give me the cities in virginia
what are the high_points of states surrounding mississippi
name the rivers in arkansas
name all the rivers in colorado
can you tell me the capital of texas
could you tell me what is the highest point in the state of oregon
count the states which have elevations lower than what alabama has
give me all the states of usa
give me the cities in texas
give me the cities in usa
give me the cities in virginia
give me the cities which are in texas
give me the lakes in california
give me the largest state
give me the longest river that passes through the us
give me the number of rivers in california
give me the states that border utah
how big is alaska
how big is massachusetts
how big is new_mexico
how big is north_dakota
how big is texas
how big is the city of new_york
how high are the highest points of all the states
how high is guadalupe_peak
how high is mount_mckinley
how high is the highest point in america
how high is the highest point in montana
how high is 

what states have rivers named colorado
what states have rivers running through them
what states have towns named springfield
what states high_point are higher than that of colorado
what states in the united states have a city of springfield
what states neighbor maine
what states surround kentucky
what texas city has the largest population
whats the largest city
where are mountains
where is austin
where is baton_rouge
where is dallas
where is fort_wayne
where is houston
where is indianapolis
where is massachusetts
where is mount_whitney
where is mount_whitney located
where is new_hampshire
where is new_orleans
where is portland
where is san_diego
where is san_jose
where is scotts_valley
where is springfield
where is the chattahoochee river
where is the highest mountain of the united states
where is the highest point in hawaii
where is the highest point in montana
where is the lowest point in maryland
where is the lowest point in the us
where is the lowest spot in iowa
where is the most 

In [None]:
schemas_dict = {'highest_elevation': {'query_word': ['highest_elevation'], 'value_type': 'int', 'value_range': []}, 
                'lowest_elevation': {'query_word': ['lowest_elevation'], 'value_type': 'int', 'value_range': []}, 
                'height': {'query_word': ['height'], 'value_type': 'int', 'value_range': []}, 
                'abbreviation': {'query_word': ['abbreviation'], 'value_type': 'string', 
                                 'value_range': ['al', 'ak', 'az', 'ar', 'ca', 'co', 'ct', 'de', 'dc', 'fl',\
                                                 'ga', 'hi', 'id', 'il', 'in', 'ia', 'ks', 'ky', 'la', 'me', \
                                                 'md', 'ma', 'mi', 'mn', 'ms', 'mo', 'mt', 'ne', 'nv', 'nh', \
                                                 'nj', 'nm', 'ny', 'nc', 'nd', 'oh', 'ok', 'or', 'pa', 'ri', \
                                                 'sc', 'sd', 'tn', 'tx', 'ut', 'vt', 'va', 'wa', 'wv', 'wi', 'wy']}, 
                'population_density': {'query_word': ['population_density'], 'value_type': 'int', 'value_range': []}, 
                'state_number': {'query_word': ['state_number'], 'value_type': 'int', 'value_range': []}, 
                'major_cities': {'query_word': ['major_cities'], 'value_type': 'list', 'value_range': []}, 
                'border': {'query_word': ['border','bordering','surround','surrounding','neighbor','neighboring'], 
                           'value_type': 'list', 'value_range': []}, 
                'states_in': {'query_word': ['states_in'], 'value_type': 'list', 'value_range': []}, 
                'mountain': {'query_word': ['mountain'], 'value_type': 'string', 
                             'value_range': ['mckinley', 'foraker', 'st._elias', 'bona', 'blackburn', 'kennedy', \
                                             'sanford', 'vancouver', 'south_buttress', 'churchill', 'fairweather', \
                                             'hubbard', 'bear', 'hunter', 'east_buttress', 'alverstone', 'whitney', \
                                             'browne_tower', 'elbert', 'massive', 'harvard', 'rainier', 'williamson', \
                                             'bianca', 'uncompahgre', 'la_plata', 'crestone', 'lincoln', 'grays', \
                                             'antero', 'torreys', 'castle', 'quandary', 'evans', 'longs', 'wilson', \
                                             'white', 'shavano', 'north_palisade', 'belford', 'princeton', 'yale', \
                                             'crestone_needle', 'bross', 'wrangell', 'kit_carson', 'shasta', 'sill', \
                                             'maroon', 'el_diente']}, 
                'area': {'query_word': ['area'], 'value_type': 'int', 'value_range': []}, 
                'state': {'query_word': ['state'], 'value_type': 'string', 
                          'value_range': ['alabama', 'alaska', 'arizona', 'arkansas', 'california', 'colorado', \
                                          'connecticut', 'delaware', 'florida', 'district_of_columbia', 'georgia', \
                                          'hawaii', 'idaho', 'illinois', 'indiana', 'iowa', 'kansas', 'kentucky', \
                                          'louisiana', 'maine', 'maryland', 'massachusetts', 'michigan', 'minnesota', \
                                          'mississippi', 'missouri', 'montana', 'nebraska', 'nevada', 'ohio', \
                                          'new_hampshire', 'new_jersey', 'new_mexico', 'new_york', 'north_carolina', \
                                          'north_dakota', 'oklahoma', 'oregon', 'pennsylvania', 'tennessee', \
                                          'rhode_island', 'south_carolina', 'south_dakota', 'texas', 'utah', 'vermont', \
                                          'virginia', 'washington', 'wisconsin', 'west_virginia', 'wyoming']}, 
                'highest_point': {'query_word': ['highest_point'], 'value_type': 'string', 
                                  'value_range': ['centerville', 'cheaha_mountain', 'mount_mckinley', 'humphreys_peak', \
                                                  'magazine_mountain', 'mount_whitney', 'mount_elbert', 'mount_frissell', \
                                                  'tenleytown', 'gannett_peak', 'walton_county', 'brasstown_bald', \
                                                  'mauna_kea', 'borah_peak', 'charles_mound', 'franklin_township', \
                                                  'ocheyedan_mound', 'mount_sunflower', 'black_mountain', 'driskill_mountain', \
                                                  'mount_katahdin', 'backbone_mountain', 'mount_greylock', 'mount_curwood', \
                                                  'eagle_mountain', 'woodall_mountain', 'taum_sauk_mountain', 'granite_peak', \
                                                  'johnson_township', 'boundary_peak', 'mount_washington', 'high_point', \
                                                  'wheeler_peak', 'mount_marcy', 'mount_mitchell', 'white_butte', \
                                                  'campbell_hill', 'black_mesa', 'mount_hood', 'mount_davis', 'jerimoth_hill', \
                                                  'sassafras_mountain', 'harney_peak', 'clingmans_dome', 'guadalupe_peak', \
                                                  'kings_peak', 'mount_mansfield', 'mount_rogers', 'mount_rainier', \
                                                  'spruce_knob', 'timms_hill']}, 
                'capital': {'query_word': ['capital'], 'value_type': 'string', 
                            'value_range': ['montgomery', 'juneau', 'phoenix', 'sacramento', 'little_rock', 'denver', \
                                            'hartford', 'dover', 'washington', 'tallahassee', 'atlanta', 'honolulu', \
                                            'boise', 'springfield', 'indianapolis', 'topeka', 'des_moines', 'frankfort', \
                                            'augusta', 'baton_rouge', 'annapolis', 'boston', 'lansing', 'jackson', \
                                            'st._paul', 'helena', 'jefferson_city', 'lincoln', 'concord', 'carson_city', \
                                            'trenton', 'albany', 'santa_fe', 'raleigh', 'bismarck', 'columbus', 'salem', \
                                            'oklahoma_city', 'harrisburg', 'providence', 'columbia', 'pierre', 'nashville', \
                                            'austin', 'montpelier', 'salt_lake_city', 'richmond', 'olympia', 'charleston', \
                                            'madison', 'cheyenne']}, 
                'lowest_point': {'query_word': ['lowest_point'], 'value_type': 'string', 
                                 'value_range': ['belle_fourche_river', 'ouachita_river', 'death_valley', 'arkansas_river', \
                                                 'long_island_sound', 'snake_river', 'verdigris_river', 'new_orleans', \
                                                 'lake_erie', 'lake_superior', 'st._francis_river', 'kootenai_river', \
                                                 'southeast_corner', 'colorado_river', 'red_bluff_reservoir', \
                                                 'red_river', 'ohio_river', 'little_river', 'delaware_river', \
                                                 'big_stone_lake', 'mississippi_river', 'gulf_of_mexico', 'beaver_dam_creek', \
                                                 'lake_champlain', 'atlantic_ocean', 'pacific_ocean', 'potomac_river', 'lake_michigan']}, 
                'lake': {'query_word': ['lake'], 'value_type': 'string', 
                         'value_range': ['superior', 'huron', 'michigan', 'erie', 'ontario', 'iliamna', 'great_salt_lake', \
                                         'lake_of_the_woods', 'okeechobee', 'pontchartrain', 'becharof', 'red', 'champlain', \
                                         'st._clair', 'rainy', 'teshekpuk', 'salton_sea', 'naknek', 'winnebago', 'flathead', \
                                         'mille_lacs', 'tahoe']}, 
                'mountains': {'query_word': ['mountains'], 'value_type': 'list', 'value_range': []}, 
                'city': {'query_word': ['city'], 'value_type': 'string', 
                         'value_range': ['birmingham', 'mobile', 'montgomery', 'huntsville', 'tuscaloosa', 'anchorage', \
                                         'phoenix', 'tucson', 'mesa', 'tempe', 'glendale', 'scottsdale', 'oakland', \
                                         'little_rock', 'fort_smith', 'north_little_rock', 'los_angeles', 'san_diego', \
                                         'san_francisco', 'san_jose', 'long_beach', 'sacramento', 'anaheim', 'fresno', \
                                         'riverside', 'santa_ana', 'stockton', 'huntington_beach', 'glendale', 'fremont', \
                                         'torrance', 'pasadena', 'garden_grove', 'san_bernardino', 'oxnard', 'east_los_angeles', \
                                         'modesto', 'sunnyvale', 'bakersfield', 'concord', 'berkeley', 'fullerton', \
                                         'inglewood', 'hayward', 'pomona', 'orange', 'ontario', 'norwalk', 'santa_monica', \
                                         'santa_clara', 'citrus_heights', 'burbank', 'downey', 'chula_vista', 'santa_rosa', \
                                         'compton', 'costa_mesa', 'carson', 'salinas', 'vallejo', 'west_covina', 'oceanside', \
                                         'el_monte', 'daly_city', 'thousand_oaks', 'san_mateo', 'simi_valley', 'richmond', \
                                         'lakewood', 'ventura', 'santa_barbara', 'el_cajon', 'westminster', 'whittier', \
                                         'alhambra', 'south_gate', 'alameda', 'buena_park', 'san_leandro', 'escondido', \
                                         'newport_beach', 'irvine', 'fairfield', 'mountain_view', 'denver', 'redondo_beach', \
                                         'scotts_valley', 'aurora', 'colorado_springs', 'lakewood', 'pueblo', 'arvada', \
                                         'boulder', 'bridgeport', 'fort_collins', 'hartford', 'waterbury', 'new_haven', \
                                         'stamford', 'norwalk', 'danbury', 'new_britain', 'west_hartford', 'greenwich', \
                                         'bristol', 'meriden', 'wilmington', 'washington', 'jacksonville', 'miami', 'tampa', \
                                         'orlando', 'st._petersburg', 'fort_lauderdale', 'hollywood', 'clearwater', 'miami_beach', \
                                         'tallahassee', 'gainesville', 'kendall', 'largo', 'west_palm_beach', 'pensacola', \
                                         'atlanta', 'columbus', 'savannah', 'macon', 'albany', 'honolulu', 'ewa', 'koolaupoko', \
                                         'boise', 'chicago', 'rockford', 'peoria', 'springfield', 'decatur', 'aurora', 'joliet', \
                                         'evanston', 'waukegan', 'elgin', 'arlington_heights', 'cicero', 'skokie', 'oak_lawn', \
                                         'champaign', 'indianapolis', 'gary', 'fort_wayne', 'evansville', 'hammond', 'south_bend', \
                                         'muncie', 'anderson', 'davenport', 'terre_haute', 'des_moines', 'cedar_rapids', 'waterloo', \
                                         'sioux_city', 'dubuque', 'wichita', 'topeka', 'louisville', 'overland_park', 'lexington', \
                                         'shreveport', 'new_orleans', 'baton_rouge', 'metairie', 'lafayette', 'kenner', 'lake_charles', \
                                         'monroe', 'portland', 'baltimore', 'dundalk', 'silver_spring', 'bethesda', 'boston', \
                                         'worcester', 'springfield', 'cambridge', 'new_bedford', 'brockton', 'lowell', 'fall_river', \
                                         'quincy', 'newton', 'lynn', 'somerville', 'framingham', 'lawrence', 'waltham', 'medford', \
                                         'detroit', 'warren', 'grand_rapids', 'flint', 'lansing', 'livonia', 'sterling_heights', \
                                         'ann_arbor', 'dearborn', 'westland', 'kalamazoo', 'taylor', 'saginaw', 'pontiac', 'southfield', \
                                         'st._clair_shores', 'clinton', 'troy', 'royal_oak', 'dearborn_heights', 'waterford', \
                                         'wyoming', 'redford', 'minneapolis', 'farmington_hills', 'duluth', 'st._paul', 'bloomington', \
                                         'rochester', 'jackson', 'springfield', 'st._louis', 'kansas_city', 'kansas_city', \
                                         'independence', 'columbia', 'st._joseph', 'billings', 'omaha', 'great_falls', 'lincoln', \
                                         'reno', 'las_vegas', 'manchester', 'nashua', 'newark', 'paterson', 'jersey_city', \
                                         'elizabeth', 'trenton', 'woodbridge', 'camden', 'clifton', 'east_orange', 'edison', \
                                         'bayonne', 'cherry_hill', 'middletown', 'irvington', 'albuquerque', 'buffalo', 'new_york', \
                                         'rochester', 'yonkers', 'syracuse', 'albany', 'cheektowaga', 'utica', 'schenectady', \
                                         'niagara_falls', 'new_rochelle', 'irondequoit', 'mount_vernon', 'levittown', 'charlotte', \
                                         'greensboro', 'raleigh', 'winston-salem', 'durham', 'fayetteville', 'high_point', 'fargo', \
                                         'cleveland', 'columbus', 'cincinnati', 'toledo', 'akron', 'dayton', 'youngstown', 'canton', \
                                         'parma', 'lorain', 'springfield', 'hamilton', 'lakewood', 'kettering', 'euclid', 'elyria', \
                                         'tulsa', 'oklahoma_city', 'lawton', 'norman', 'portland', 'eugene', 'salem', 'philadelphia', \
                                         'pittsburgh', 'erie', 'allentown', 'scranton', 'reading', 'upper_darby', 'bethlehem', \
                                         'abingdon', 'lower_merion', 'altoona', 'bristol_township', 'penn_hills', 'providence', \
                                         'warwick', 'cranston', 'pawtucket', 'columbia', 'charleston', 'greenville', 'north_charleston', \
                                         'memphis', 'sioux_falls', 'nashville', 'knoxville', 'chattanooga', 'houston', 'dallas', \
                                         'austin', 'san_antonio', 'el_paso', 'fort_worth', 'lubbock', 'corpus_christi', 'arlington', \
                                         'amarillo', 'garland', 'beaumont', 'pasadena', 'irving', 'waco', 'abilene', 'laredo', \
                                         'wichita_falls', 'odessa', 'brownsville', 'richardson', 'san_angelo', 'plano', 'midland', \
                                         'grand_prairie', 'tyler', 'mesquite', 'mcallen', 'longview', 'provo', 'port_arthur', \
                                         'salt_lake_city', 'ogden', 'west_valley', 'norfolk', 'richmond', 'virginia_beach', 'arlington', \
                                         'hampton', 'newport_news', 'chesapeake', 'portsmouth', 'alexandria', 'roanoke', 'lynchburg', \
                                         'seattle', 'spokane', 'tacoma', 'bellevue', 'charleston', 'huntington', 'milwaukee', \
                                         'madison', 'racine', 'green_bay', 'kenosha', 'appleton', 'west_allis', 'casper']}, 
                'Country': {'query_word': ['us', 'united_states', 'usa'], 'value_type': 'string', 
                            'value_range': ['us', 'united_states', 'usa']}, 
                'states_through': {'query_word': ['states_through'], 'value_type': 'list', 'value_range': []}, 
                'lakes': {'query_word': ['lakes'], 'value_type': 'list', 'value_range': []}, 
                'rivers': {'query_word': ['rivers'], 'value_type': 'list', 'value_range': []}, 
                'population': {'query_word': ['population'], 'value_type': 'int', 'value_range': []}, 
                'length': {'query_word': ['length'], 'value_type': 'int', 'value_range': []}, 
                'river': {'query_word': ['river'], 'value_type': 'string', 
                          'value_range': ['mississippi', 'missouri', 'colorado', 'ohio', 'red', 'arkansas', 'canadian', 'connecticut', \
                                          'delaware', 'snake', 'little_missouri', 'chattahoochee', 'cimarron', 'green', 'potomac', \
                                          'north_platte', 'republican', 'tennessee', 'rio_grande', 'san_juan', 'wabash', 'yellowstone', \
                                          'allegheny', 'bighorn', 'cheyenne', 'columbia', 'clark_fork', 'cumberland', 'dakota', 'gila', \
                                          'hudson', 'neosho', 'niobrara', 'ouachita', 'pearl', 'pecos', 'powder', 'roanoke', 'rock', \
                                          'tombigbee', 'smoky_hill', 'south_platte', 'st._francis', 'washita', 'white', 'wateree_catawba']}
               }
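
Several values in this dictionary belong to more than one field (for example, 'washington' appears as a state, a capital, and a city). One possible way to surface such ambiguity is an inverted index from value to fields. The sketch below uses a small subset copied from the dictionary above so that it stays self-contained; `build_value_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_value_index(schema):
    """Map each value to the sorted list of fields whose value_range contains it."""
    index = defaultdict(set)
    for field, spec in schema.items():
        for value in spec['value_range']:
            index[value].add(field)
    return {value: sorted(fields) for value, fields in index.items()}


# Small subset of schemas_dict, for illustration only.
sample_schema = {
    'state': {'value_range': ['texas', 'washington', 'new_york']},
    'capital': {'value_range': ['austin', 'washington', 'columbia']},
    'city': {'value_range': ['austin', 'washington', 'new_york', 'columbia']},
    'river': {'value_range': ['columbia', 'red']},
    'area': {'value_range': []},
}
index = build_value_index(sample_schema)
print(index['washington'])  # ['capital', 'city', 'state']
```

A value that maps to a single field can be typed directly; one that maps to several needs context from the rest of the query.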