Welcome to our demo! Here we present SYSTEM, our method for discovering joins that introduce semantically related features. SYSTEM chooses between a knowledge-graph-based (KG) method and a non-KG method. The demo is structured as follows:

(1) We display a list of input datasets representing interesting use cases, a few of which are discussed in our paper. Users can explore these and pick one for which to find joinable datasets.
(2) We then display the datasets color-coded: the join keys are highlighted and, if the joins are KG joins, the entities that we think represent each row of each table are shown as well.
(3) For non-KG joins, we display the proxy table from the data lake that best represents the input.

In [4]:
import pandas as pd
import seaborn as sns
from fullmethod import display_df,display_results #, display_kg

## Example 1: Joining Bus Company Revenues and Population Densities using a Knowledge Graph

Suppose a data analyst is interested in understanding what factors contribute to the financial success of bus companies. While the analyst has statistics on the bus companies themselves, such as revenue and annual ridership, many external factors could matter as well. To discover these factors, the analyst wants to find joins that introduce semantically related columns to the input bus company dataset. The analyst could then, for example, compute the correlation between an introduced column and the bus company revenue column to judge whether the introduced column is an external factor affecting financial success.

In this example, we show how a user could discover a joinable dataset containing population densities. We walk through our knowledge-graph-based method for join detection.
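
The workflow described above (join on a shared key, then correlate an introduced column with a column of interest) can be sketched in a few lines of pandas. The tables, the column names (`region`, `annual_ridership`, `area_total`), and all values below are made up for illustration; they are not the demo data, and ridership stands in for revenue.

```python
import pandas as pd

# Hypothetical input table: one row per bus company (illustrative values only)
buses = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "region": ["r1", "r2", "r3", "r4"],
    "annual_ridership": [100.0, 200.0, 300.0, 400.0],
})

# Hypothetical joinable table that introduces one new column per region
regions = pd.DataFrame({
    "region": ["r1", "r2", "r3", "r4"],
    "area_total": [10.0, 20.0, 30.0, 40.0],
})

# Join on the shared key, then correlate the introduced column
# with the column the analyst cares about
joined = buses.merge(regions, on="region", how="inner")
corr = joined["annual_ridership"].corr(joined["area_total"])
print(joined)
print(f"Pearson correlation: {corr:.2f}")
```

A strong correlation on real data would only suggest, not establish, that the introduced column is a relevant external factor.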

In [2]:
input_df = pd.read_csv('demo_lake/busridertbl.csv')

In [3]:
input_df.columns

Index(['Unnamed: 0', 'Unnamed: 0.1', 'dbo:BusCompany',
       '<http://dbpedia.org/property/annualRidership>',
       '<http://dbpedia.org/ontology/numberOfLines>', 'dbo:regionServed'],
      dtype='object')
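
The `Unnamed: 0` and `Unnamed: 0.1` columns above are the kind of columns pandas creates when a DataFrame's index is written to CSV and then read back as ordinary data. The following sketch reproduces this on a made-up two-row table (the column `value` and the data are assumptions for illustration) and shows one way to avoid it.

```python
import io
import pandas as pd

# Made-up table, used only to demonstrate the round trip
df = pd.DataFrame({"dbo:regionServed": ["r1", "r2"], "value": [1, 2]})

# Writing with the default index=True stores the index as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)

# Reading the file back without index_col turns that column into 'Unnamed: 0'
buf.seek(0)
round_trip = pd.read_csv(buf)
print(list(round_trip.columns))

# Reading the first column back as the index avoids the extra column
buf.seek(0)
clean = pd.read_csv(buf, index_col=0)
print(list(clean.columns))
```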

In [4]:
# Set colormap equal to seaborn's light green color palette
cm = sns.light_palette("green", as_cmap=True)

In [5]:
# Set CSS properties for th elements in dataframe
th_props = [
  ('font-size', '11px'),
  ('text-align', 'center'),
  ('font-weight', 'bold'),
  ('color', '#6d6d6d'),
  ('background-color', '#f7f7f9')
  ]

# Set CSS properties for td elements in dataframe
td_props = [
  ('font-size', '11px')
  ]

# Set table styles
styles = [
  dict(selector="th", props=th_props),
  dict(selector="td", props=td_props)
  ]

In [6]:
(input_df.style
   .set_properties(**{'background-color' : 'green'}, subset=['dbo:BusCompany'])
   .set_properties(**{'background-color' : 'yellow'}, subset=['dbo:regionServed'])
  #.background_gradient(cmap=cm, subset=['dbo:BusCompany','dbo:regionServed'])
  #.highlight_max(subset=['dbo:BusCompany','dbo:regionServed'])
  #.set_caption('The ground truth is in green, and the join key is yellow.')
  #.format({'dbo:regionServed': "{:.2%}"})
  .set_table_styles(styles))

Unnamed: 0.1,Unnamed: 0,dbo:BusCompany,<http://dbpedia.org/property/annualRidership>,<http://dbpedia.org/ontology/numberOfLines>,dbo:regionServed
0,0,http://dbpedia.org/resource/Shuttle–UM,2956600.0,31,"http://dbpedia.org/resource/Montgomery_County,_Maryland"
1,1,http://dbpedia.org/resource/Razorback_Transit,1989087.0,19,"http://dbpedia.org/resource/Washington_County,_Arkansas"
2,2,http://dbpedia.org/resource/El_Metro_Transit,4300000.0,24,"http://dbpedia.org/resource/Webb_County,_Texas"
3,3,http://dbpedia.org/resource/Pace_(transit),28392400.0,218,"http://dbpedia.org/resource/Will_County,_Illinois"
4,4,http://dbpedia.org/resource/Roaring_Fork_Transportation_Authority,5470000.0,3,"http://dbpedia.org/resource/Eagle_County,_Colorado"
5,5,http://dbpedia.org/resource/Roaring_Fork_Transportation_Authority,5470000.0,15,"http://dbpedia.org/resource/Eagle_County,_Colorado"
6,6,http://dbpedia.org/resource/Roaring_Fork_Transportation_Authority,4.99,3,"http://dbpedia.org/resource/Eagle_County,_Colorado"
7,7,http://dbpedia.org/resource/Roaring_Fork_Transportation_Authority,4.99,15,"http://dbpedia.org/resource/Eagle_County,_Colorado"
8,8,http://dbpedia.org/resource/Erie_Metropolitan_Transit_Authority,5.87,25,"http://dbpedia.org/resource/Erie_County,_Pennsylvania"
9,9,http://dbpedia.org/resource/Erie_Metropolitan_Transit_Authority,2743473.0,25,"http://dbpedia.org/resource/Erie_County,_Pennsylvania"


In [7]:
out_df = pd.read_csv('demo_lake/busriderjoin.csv')

In [8]:
out_df.columns

Index(['Unnamed: 0', 'dbo:regionServed',
       '<http://dbpedia.org/ontology/PopulatedPlace/areaTotal>',
       'dbo:percentageOfAreaWater'],
      dtype='object')

In [9]:
(out_df.style
   #.set_properties(**{'background-color' : 'green'}, subset=['dbo:BusCompany'])
   .set_properties(**{'background-color' : 'yellow'}, subset=['dbo:regionServed'])
  #.background_gradient(cmap=cm, subset=['dbo:BusCompany','dbo:regionServed'])
  #.highlight_max(subset=['dbo:BusCompany','dbo:regionServed'])
  #.set_caption('The ground truth is in green, and the join key is yellow.')
  #.format({'dbo:regionServed': "{:.2%}"})
  .set_table_styles(styles))

Unnamed: 0.1,Unnamed: 0,dbo:regionServed,<http://dbpedia.org/ontology/PopulatedPlace/areaTotal>,dbo:percentageOfAreaWater
0,0,"http://dbpedia.org/resource/Montgomery_County,_Maryland",1313.123972,0.031
1,1,"http://dbpedia.org/resource/Washington_County,_Arkansas",2464.943484,0.6
2,2,"http://dbpedia.org/resource/Webb_County,_Texas",8741.209872,0.4
3,3,"http://dbpedia.org/resource/Will_County,_Illinois",2198.899906,0.015
4,4,"http://dbpedia.org/resource/Eagle_County,_Colorado",4382.259883,0.4
5,5,"http://dbpedia.org/resource/Eagle_County,_Colorado",4382.259883,0.4
6,6,"http://dbpedia.org/resource/Eagle_County,_Colorado",4382.259883,0.4
7,7,"http://dbpedia.org/resource/Eagle_County,_Colorado",4382.259883,0.4
8,8,"http://dbpedia.org/resource/Erie_County,_Pennsylvania",4035.201476,0.49
9,9,"http://dbpedia.org/resource/Erie_County,_Pennsylvania",4035.201476,0.49


## Example 2: Joining ETF and Mutual Fund Historical Data without a KG

In [10]:
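fulldf = pd.read_csv('data/ETF prices.csv')
# Parse the date strings so that rows sort chronologically
fulldf['price_date'] = pd.to_datetime(fulldf['price_date'])
fulldf = fulldf.sort_values(by='price_date')
# Display the data indexed by date (set_index returns a copy; fulldf itself is unchanged)
fulldf.set_index('price_date')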

price_date,fund_symbol,open,high,low,close,adj_close,volume
1993-01-29,SPY,43.97,43.97,43.75,43.94,25.80,1003200
1993-02-01,SPY,43.97,44.25,43.97,44.25,25.98,480500
1993-02-02,SPY,44.22,44.38,44.12,44.34,26.04,201300
1993-02-03,SPY,44.41,44.84,44.38,44.81,26.31,529400
1993-02-04,SPY,44.97,45.09,44.47,45.00,26.42,531500
...,...,...,...,...,...,...,...
2021-11-30,USLVF,65.33,65.33,61.00,61.85,61.85,9200
2021-11-30,GBF,122.54,122.54,122.12,122.26,122.26,5000
2021-11-30,GBGR,23.05,23.05,23.05,23.05,23.05,200
2021-11-30,USMF,39.80,39.80,39.42,39.42,39.42,7300


In [11]:
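# The 50 earliest rows serve as the input table
df1 = fulldf[:50]
# Rows at positions 51-99 serve as the proxy table (the row at position 50 is in neither slice)
proxy_df = fulldf[51:100]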

In [12]:
df1

Unnamed: 0,fund_symbol,price_date,open,high,low,close,adj_close,volume
3262244,SPY,1993-01-29,43.97,43.97,43.75,43.94,25.8,1003200
3262245,SPY,1993-02-01,43.97,44.25,43.97,44.25,25.98,480500
3262246,SPY,1993-02-02,44.22,44.38,44.12,44.34,26.04,201300
3262247,SPY,1993-02-03,44.41,44.84,44.38,44.81,26.31,529400
3262248,SPY,1993-02-04,44.97,45.09,44.47,45.0,26.42,531500
3262249,SPY,1993-02-05,44.97,45.06,44.72,44.97,26.41,492100
3262250,SPY,1993-02-08,44.97,45.12,44.91,44.97,26.41,596100
3262251,SPY,1993-02-09,44.81,44.81,44.56,44.66,26.22,122100
3262252,SPY,1993-02-10,44.66,44.75,44.53,44.72,26.26,379600
3262253,SPY,1993-02-11,44.78,45.12,44.78,44.94,26.39,19500


In [13]:
(df1.style
   #.set_properties(**{'background-color' : 'green'}, subset=['dbo:BusCompany'])
   .set_properties(**{'background-color' : 'yellow'}, subset=['price_date'])
  #.background_gradient(cmap=cm, subset=['dbo:BusCompany','dbo:regionServed'])
  #.highlight_max(subset=['dbo:BusCompany','dbo:regionServed'])
  #.set_caption('The ground truth is in green, and the join key is yellow.')
  #.format({'dbo:regionServed': "{:.2%}"})
  .set_table_styles(styles))

Unnamed: 0,fund_symbol,price_date,open,high,low,close,adj_close,volume
3262244,SPY,1993-01-29 00:00:00,43.97,43.97,43.75,43.94,25.8,1003200
3262245,SPY,1993-02-01 00:00:00,43.97,44.25,43.97,44.25,25.98,480500
3262246,SPY,1993-02-02 00:00:00,44.22,44.38,44.12,44.34,26.04,201300
3262247,SPY,1993-02-03 00:00:00,44.41,44.84,44.38,44.81,26.31,529400
3262248,SPY,1993-02-04 00:00:00,44.97,45.09,44.47,45.0,26.42,531500
3262249,SPY,1993-02-05 00:00:00,44.97,45.06,44.72,44.97,26.41,492100
3262250,SPY,1993-02-08 00:00:00,44.97,45.12,44.91,44.97,26.41,596100
3262251,SPY,1993-02-09 00:00:00,44.81,44.81,44.56,44.66,26.22,122100
3262252,SPY,1993-02-10 00:00:00,44.66,44.75,44.53,44.72,26.26,379600
3262253,SPY,1993-02-11 00:00:00,44.78,45.12,44.78,44.94,26.39,19500


In [14]:
nonkg_out = pd.read_csv('data/MutualFund prices - A-E.csv', nrows=50)

In [15]:
(nonkg_out.style
   #.set_properties(**{'background-color' : 'green'}, subset=['dbo:BusCompany'])
   .set_properties(**{'background-color' : 'yellow'}, subset=['price_date'])
  #.background_gradient(cmap=cm, subset=['dbo:BusCompany','dbo:regionServed'])
  #.highlight_max(subset=['dbo:BusCompany','dbo:regionServed'])
  #.set_caption('The ground truth is in green, and the join key is yellow.')
  #.format({'dbo:regionServed': "{:.2%}"})
  .set_table_styles(styles))

Unnamed: 0,fund_symbol,price_date,nav_per_share
0,AAAAX,2007-07-31,10.02
1,AAAAX,2007-08-01,9.98
2,AAAAX,2007-08-02,10.01
3,AAAAX,2007-08-03,9.9
4,AAAAX,2007-08-06,9.93
5,AAAAX,2007-08-07,9.94
6,AAAAX,2007-08-08,10.02
7,AAAAX,2007-08-09,9.91
8,AAAAX,2007-08-10,9.91
9,AAAAX,2007-08-13,9.92
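
Because both tables carry a `price_date` column (highlighted in yellow), they can be joined on it once that key has been identified. The following is a minimal sketch with made-up rows rather than the demo files; the fund symbols `ETF1` and `MF1` and all values are assumptions. It shows only the final pandas join, not how SYSTEM discovers the join.

```python
import pandas as pd

# Made-up ETF rows (illustrative values, not taken from 'ETF prices.csv')
etf = pd.DataFrame({
    "fund_symbol": ["ETF1", "ETF1", "ETF1"],
    "price_date": ["2007-07-31", "2007-08-01", "2007-08-02"],
    "close": [50.0, 50.5, 49.8],
})

# Made-up mutual fund rows (illustrative values)
mf = pd.DataFrame({
    "fund_symbol": ["MF1", "MF1"],
    "price_date": ["2007-08-01", "2007-08-02"],
    "nav_per_share": [9.98, 10.01],
})

# Parse both key columns so the keys compare as dates rather than strings
etf["price_date"] = pd.to_datetime(etf["price_date"])
mf["price_date"] = pd.to_datetime(mf["price_date"])

# Inner join on the date; the suffixes disambiguate the two fund_symbol columns
joined = etf.merge(mf, on="price_date", how="inner", suffixes=("_etf", "_mf"))
print(joined)
```

Only dates present in both tables survive an inner join, so the date ranges of the two tables must overlap for the result to be non-empty.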


In [16]:
(proxy_df.style
   #.set_properties(**{'background-color' : 'green'}, subset=['dbo:BusCompany'])
   .set_properties(**{'background-color' : 'yellow'}, subset=['price_date'])
  #.background_gradient(cmap=cm, subset=['dbo:BusCompany','dbo:regionServed'])
  #.highlight_max(subset=['dbo:BusCompany','dbo:regionServed'])
  #.set_caption('The ground truth is in green, and the join key is yellow.')
  #.format({'dbo:regionServed': "{:.2%}"})
  .set_table_styles(styles))

Unnamed: 0,fund_symbol,price_date,open,high,low,close,adj_close,volume
3262295,SPY,1993-04-14 00:00:00,45.03,45.06,44.91,44.94,26.51,119600
3262296,SPY,1993-04-15 00:00:00,44.91,45.03,44.75,44.94,26.51,148600
3262297,SPY,1993-04-16 00:00:00,44.97,45.03,44.88,44.94,26.51,47900
3262298,SPY,1993-04-19 00:00:00,44.94,45.06,44.72,44.75,26.4,157000
3262299,SPY,1993-04-20 00:00:00,44.69,44.75,44.25,44.53,26.27,279500
3262300,SPY,1993-04-21 00:00:00,44.62,44.62,44.38,44.5,26.25,67900
3262301,SPY,1993-04-22 00:00:00,44.31,44.69,43.94,43.94,25.92,97700
3262302,SPY,1993-04-23 00:00:00,43.84,43.97,43.69,43.75,25.81,106000
3262303,SPY,1993-04-26 00:00:00,43.78,43.94,43.28,43.41,25.61,62600
3262304,SPY,1993-04-27 00:00:00,43.34,43.88,43.34,43.88,25.88,156800


In [17]:
def display_results(df, col2highlight, styles):
    # Highlight the join-key column in yellow and apply the shared table styles
    return (df.style
            .set_properties(**{'background-color': 'yellow'}, subset=[col2highlight])
            .set_table_styles(styles))

In [18]:
display_df(proxy_df, 'price_date', styles)

Unnamed: 0,fund_symbol,price_date,open,high,low,close,adj_close,volume
3262295,SPY,1993-04-14 00:00:00,45.03,45.06,44.91,44.94,26.51,119600
3262296,SPY,1993-04-15 00:00:00,44.91,45.03,44.75,44.94,26.51,148600
3262297,SPY,1993-04-16 00:00:00,44.97,45.03,44.88,44.94,26.51,47900
3262298,SPY,1993-04-19 00:00:00,44.94,45.06,44.72,44.75,26.4,157000
3262299,SPY,1993-04-20 00:00:00,44.69,44.75,44.25,44.53,26.27,279500
3262300,SPY,1993-04-21 00:00:00,44.62,44.62,44.38,44.5,26.25,67900
3262301,SPY,1993-04-22 00:00:00,44.31,44.69,43.94,43.94,25.92,97700
3262302,SPY,1993-04-23 00:00:00,43.84,43.97,43.69,43.75,25.81,106000
3262303,SPY,1993-04-26 00:00:00,43.78,43.94,43.28,43.41,25.61,62600
3262304,SPY,1993-04-27 00:00:00,43.34,43.88,43.34,43.88,25.88,156800


In [3]:
display_results()

TypeError: display_results() missing 1 required positional argument: 'infile'

In [136]:
pd.set_option('display.max_colwidth', 10)
#pd.set_option("display.chop_threshold", 8)

In [137]:
def prettify_st(st : str):
    if st.startswith('dbo:'):
        new_st = st[4:]
        return new_st
    elif 'http' in st:
        new_st = st.split('/')[-1]
        return new_st
    else:
        #leave it alone
        return st
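
The column names printed further below (`annualRidership>`, `numberOfLines>`) keep a trailing `>` because the full URIs are wrapped in angle brackets. One possible variant, shown here under the hypothetical name `prettify_uri`, strips the brackets first and passes non-string values through unchanged. It is a sketch, not the function that produced the outputs in this notebook.

```python
def prettify_uri(value):
    """Shorten a DBpedia-style name or URI to its last segment (hypothetical variant)."""
    # Leave non-strings (e.g. NaN in an object column) untouched
    if not isinstance(value, str):
        return value
    # Remove the angle brackets that wrap full URIs such as '<http://...>'
    text = value.strip('<>')
    if text.startswith('dbo:'):
        return text[len('dbo:'):]
    if 'http' in text:
        return text.split('/')[-1]
    # Anything else is returned as-is
    return text

print(prettify_uri('<http://dbpedia.org/property/annualRidership>'))
print(prettify_uri('dbo:regionServed'))
```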

In [142]:
def display_kg(indf, in_ent, in_jk, df, ent_col, jk_col, styles, title, rel_score):
    if 'Unnamed: 0.1' in indf.columns:
        indf = indf.drop(columns=['Unnamed: 0.1'])
    if 'Unnamed: 0.1' in df.columns:
        df = df.drop(columns=['Unnamed: 0.1'])
    # Map each original column name to its shortened display name
    pretty_incols = {incol: prettify_st(incol) for incol in indf.columns}
    pretty_outcols = {outcol: prettify_st(outcol) for outcol in df.columns}
    
    pretty_indf = indf.rename(columns=pretty_incols)
    pretty_df = df.rename(columns=pretty_outcols)
    print(pretty_indf.columns)
    print(pretty_df.columns)
    # print(pretty_indf.index.is_unique)
    # print(pretty_df.index.is_unique)
    
    inobj_cols = pretty_indf.select_dtypes(include='object').head()
    oobj_cols = pretty_df.select_dtypes(include='object').head()
    
    infl_cols = pretty_indf.select_dtypes(include='float').head()
    ofl_cols = pretty_df.select_dtypes(include='float').head()
    
    
    for c in inobj_cols:
        pretty_indf[c] = pretty_indf[c].apply(lambda x: prettify_st(x))
    
    for c in oobj_cols:
        pretty_df[c] = pretty_df[c].apply(lambda x: prettify_st(x))
    
    infl_dct = {}
    ofl_dct = {}
    for c in infl_cols:
        infl_dct[c] = '{:.2f}'
    
    for c in ofl_cols:
        ofl_dct[c] = '{:.2f}'
    
    
    
    pd.set_option('display.max_colwidth', 10)
    #space = "\xa0" * 10
    space = ""
    indf_styler = (pretty_indf.style
                   .set_table_attributes("style='display:inline; margin-right:20px;'")
                   .set_properties(**{'background-color' : 'yellow'}, subset=[pretty_incols[in_jk]])
                   .set_properties(**{'background-color' : 'green'}, subset=[pretty_incols[in_ent]])
                   .set_caption('KG Table: ' + title)
                   #.set_table_styles(styles)
                   .format(infl_dct))
    
    
    
    if ent_col == jk_col:
        outdf_styler = (pretty_df.style
                        .set_table_attributes("style='display:inline'")
                        .set_properties(**{'background-color' : 'yellow'}, subset=[pretty_outcols[jk_col]])
                        .set_caption('KG Table: ' + title + ',Relationship Strength: ' + str(rel_score))
                        #.set_table_styles(styles)
                        .format(ofl_dct))
    
    else:
        outdf_styler = (pretty_df.style
                        .set_table_attributes("style='display:inline'")
                        .set_properties(**{'background-color' : 'green'}, subset=[pretty_outcols[ent_col]])
                        .set_properties(**{'background-color' : 'yellow'}, subset=[pretty_outcols[jk_col]])
                        #.set_table_styles(styles)
                        .format(ofl_dct))
    
    final_display_obj = indf_styler._repr_html_() + outdf_styler._repr_html_()
    final_display_obj = final_display_obj.replace('table','table style="display:inline"')
    return final_display_obj

In [143]:
indf = pd.read_csv('demo_lake/busridertbl.csv', nrows=5)
in_jk = 'dbo:regionServed'
in_ent = 'dbo:BusCompany'
outdf = pd.read_csv('demo_lake/busriderjoin.csv', nrows=5)
out_jk = 'dbo:regionServed'
out_ent = 'dbo:regionServed'
rel_score = 0.04950495049504951
title = 'demo_lake/busriderjoin.csv'

# Set colormap equal to seaborn's light green color palette
cm = sns.light_palette("green", as_cmap=True)

# Set CSS properties for th elements in dataframe
th_props = [
  ('font-size', '11px'),
  ('text-align', 'center'),
  ('font-weight', 'bold'),
  ('color', '#6d6d6d'),
  ('background-color', '#f7f7f9')
  ]

# Set CSS properties for td elements in dataframe
td_props = [
  ('font-size', '11px'),
  ('vertical-align', 'top')
  ]

#css = [
#    ('flex-direction', 'row')
#    ]

# Set table styles
styles = [
  dict(selector="th", props=th_props),
  dict(selector="td", props=td_props)
  #dict(selector="css", props=css)
  ]


from IPython.display import display_html
#pd.set_option('display.max_colwidth', 5)
display_html(display_kg(indf, in_ent, in_jk, outdf, out_ent, out_jk, styles, title, rel_score), raw=True)

Index(['Unnamed: 0', 'BusCompany', 'annualRidership>', 'numberOfLines>',
       'regionServed'],
      dtype='object')
Index(['Unnamed: 0', 'regionServed', 'areaTotal>', 'percentageOfAreaWater'], dtype='object')


Unnamed: 0.1,Unnamed: 0,BusCompany,annualRidership>,numberOfLines>,regionServed
0,0,Shuttle–UM,2956600.0,31,"Montgomery_County,_Maryland"
1,1,Razorback_Transit,1989087.0,19,"Washington_County,_Arkansas"
2,2,El_Metro_Transit,4300000.0,24,"Webb_County,_Texas"
3,3,Pace_(transit),28392400.0,218,"Will_County,_Illinois"
4,4,Roaring_Fork_Transportation_Authority,5470000.0,3,"Eagle_County,_Colorado"

Unnamed: 0.1,Unnamed: 0,regionServed,areaTotal>,percentageOfAreaWater
0,0,"Montgomery_County,_Maryland",1313.12,0.03
1,1,"Washington_County,_Arkansas",2464.94,0.6
2,2,"Webb_County,_Texas",8741.21,0.4
3,3,"Will_County,_Illinois",2198.9,0.01
4,4,"Eagle_County,_Colorado",4382.26,0.4


In [34]:
pd.set_option('display.max_colwidth', 10)

In [46]:
indf.dtypes['dbo:BusCompany']

dtype('O')

In [22]:
# List every available option with its description, default, and current value
pd.describe_option()

compute.use_bottleneck : bool
    Use the bottleneck library to accelerate if it is installed,
    the default is True
    Valid values: False,True
    [default: True] [currently: True]
compute.use_numba : bool
    Use the numba engine option for select operations if it is installed,
    the default is False
    Valid values: False,True
    [default: False] [currently: False]
compute.use_numexpr : bool
    Use the numexpr library to accelerate computation if it is installed,
    the default is True
    Valid values: False,True
    [default: True] [currently: True]
display.chop_threshold : float or None
    if set to a float value, all float values smaller then the given threshold
    will be displayed as exactly 0 by repr and friends.
    [default: None] [currently: None]
display.colheader_justify : 'left'/'right'
    Controls the justification of column headers. used by DataFrameFormatter.
    [default: right] [currently: right]
display.column_space No description available.
    [defa

In [116]:
from itertools import chain, cycle
from IPython.display import display_html

# One more attempt at side-by-side display
def display_side_by_side(*args, titles=cycle([''])):
    html_str = ''
    # Pad the titles with empty strings so every DataFrame is rendered,
    # even when fewer titles than DataFrames are given
    for df, title in zip(args, chain(titles, cycle(['']))):
        # Each table sits in its own inline block so the tables line up horizontally
        html_str += '<div style="display:inline-block; vertical-align:top; margin-right:20px">'
        html_str += f'<h2 style="text-align: center;">{title}</h2>'
        html_str += df.to_html()
        html_str += '</div>'
    display_html(html_str, raw=True)

In [117]:
display_side_by_side(indf, outdf, titles=['input', 'output'])

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,dbo:BusCompany,<http://dbpedia.org/property/annualRidership>,<http://dbpedia.org/ontology/numberOfLines>,dbo:regionServed
0,0,0,http://dbpedia.org/resource/Shuttle–UM,2956600.0,31,"http://dbpedia.org/resource/Montgomery_County,_Maryland"
1,1,1,http://dbpedia.org/resource/Razorback_Transit,1989087.0,19,"http://dbpedia.org/resource/Washington_County,_Arkansas"
2,2,2,http://dbpedia.org/resource/El_Metro_Transit,4300000.0,24,"http://dbpedia.org/resource/Webb_County,_Texas"
3,3,3,http://dbpedia.org/resource/Pace_(transit),28392400.0,218,"http://dbpedia.org/resource/Will_County,_Illinois"
4,4,4,http://dbpedia.org/resource/Roaring_Fork_Transportation_Authority,5470000.0,3,"http://dbpedia.org/resource/Eagle_County,_Colorado"

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,dbo:regionServed,<http://dbpedia.org/ontology/PopulatedPlace/areaTotal>,dbo:percentageOfAreaWater
0,0,0,"http://dbpedia.org/resource/Montgomery_County,_Maryland",1313.123972,0.0
1,1,1,"http://dbpedia.org/resource/Washington_County,_Arkansas",2464.943484,0.0
2,2,2,"http://dbpedia.org/resource/Webb_County,_Texas",8741.209872,0.0
3,3,3,"http://dbpedia.org/resource/Will_County,_Illinois",2198.899906,0.0
4,4,4,"http://dbpedia.org/resource/Eagle_County,_Colorado",4382.259883,0.0
