# Project 1 Part 4 - Creating a master parcel database

In this part of the project, we will use Python to read, process, and load all of the parcel data into a database.  Note that this is not our only option: in Project 1 Part 4b, we will look at an alternative, namely reading all of the original, raw files into their own database tables, then using SQL to join/link/aggregate the tables.

## Chunking Files in Pandas – Part 1 (20 Points)

In this part of the project, you will use `pandas` to process the data from the MinneMUDAC 2016 competition, Dive into Water Data.  The data can be found at the [MinneMUDAC site](http://minneanalytics.org/minnemudac/data/).  You should document your work in a Jupyter notebook, which will be used to submit your solution.  **For the rest of the parts of this project, we will limit ourselves to the years 2004-2014.**

1. Remind me why we want to skip 2003.

The 2003 file had few columns matching the other years, so including it made the set of common columns smaller.

2. Import the common columns list and translation dictionaries from the `.py` file you created in the last part of the project.

In [1]:
from project_data_Quam import *

3. Use glob and a list comprehension to get a list of file names for the years 2004-2014.

In [2]:
from glob import glob
bad_files = ['./MinneMUDAC_raw_files/2002_metro_tax_parcels.txt', 
             './MinneMUDAC_raw_files/2003_metro_tax_parcels.txt', 
             './MinneMUDAC_raw_files/2015_metro_tax_parcels.txt',
             './MinneMUDAC_raw_files/2016_metro_tax_parcels.txt']
 

In [3]:
files = [file for file in glob('./MinneMUDAC_raw_files/20**_metro_tax_parcels.txt') if file not in bad_files ]
files[:2]

['./MinneMUDAC_raw_files/2009_metro_tax_parcels.txt',
 './MinneMUDAC_raw_files/2007_metro_tax_parcels.txt']
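
An alternative to listing the excluded files by hand is to parse the year out of each file name and keep only those in range. The following is a minimal sketch, assuming every file name starts with a four-digit year as in the paths above; `files_for_years` is a hypothetical helper, not part of the project code.

```python
import os
import re

def files_for_years(paths, first_year=2004, last_year=2014):
    """Keep paths whose base name starts with a year in [first_year, last_year]."""
    keep = []
    for path in paths:
        match = re.match(r"(\d{4})_metro_tax_parcels\.txt$", os.path.basename(path))
        if match and first_year <= int(match.group(1)) <= last_year:
            keep.append(path)
    # Sorting gives a deterministic, chronological processing order.
    return sorted(keep)

# Example paths only; in the notebook these would come from glob().
example_paths = [
    "./MinneMUDAC_raw_files/2003_metro_tax_parcels.txt",
    "./MinneMUDAC_raw_files/2009_metro_tax_parcels.txt",
    "./MinneMUDAC_raw_files/2004_metro_tax_parcels.txt",
    "./MinneMUDAC_raw_files/2015_metro_tax_parcels.txt",
]
selected = files_for_years(example_paths)
print(selected)
```

Sorting also makes the processing order reproducible, since `glob` returns files in arbitrary order (2009 before 2007 in the output above).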

4. Use the first chunk of the first file to prototype an expression that <br>
    a. Selects the common columns <br>
    b. Fixes any issues with the column names <br>
    c. Changes columns to the correct types (if necessary).  More information about the columns can be found [here](ftp://ftp.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_metrogis/plan_regonal_prcls_open/metadata/metadata.html). It is **imperative** that you keep the lat and long columns as strings. <br>
    d. Uses the translation dictionaries from the last part to add three new columns to the chunk: lake code, lake name, parcel distance to the lake. <br>
    e. Filters to only properties that are within 1600 m (~1 mile) of the closest lake.

In [4]:
dtypes_dict = {'centroid_lat': str,
               'centroid_long': str}

In [5]:
import pandas as pd
from toolz import first
example_iter = pd.read_csv("./MinneMUDAC_raw_files/2005_metro_tax_parcels.txt", sep= "|", chunksize= 500, dtype=dtypes_dict)
example_chunk = first(example_iter)
len(example_chunk.columns)


70
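
With `chunksize` set, `pd.read_csv` returns an iterator of data frames rather than a single frame, which is why `first` is used above to pull out one chunk. The sketch below uses a small in-memory, pipe-delimited sample (made-up values, not the parcel data) to show the chunk sizes, and one likely reason the coordinates should stay strings: once parsed as floats, a trailing zero is lost, so the text no longer matches the original.

```python
import io
import pandas as pd

# Made-up pipe-delimited sample standing in for a parcel file.
sample = (
    "PIN|centroid_lat|centroid_long\n"
    "A1|45.41330|-93.26730\n"
    "A2|45.41354|-93.27010\n"
    "A3|45.41318|-93.27344\n"
)

# chunksize=2 yields frames of at most two rows each.
chunks = list(pd.read_csv(io.StringIO(sample), sep="|", chunksize=2,
                          dtype={"centroid_lat": str, "centroid_long": str}))
chunk_sizes = [len(chunk) for chunk in chunks]

# Same data with default type inference: the coordinates become floats.
as_float = pd.read_csv(io.StringIO(sample), sep="|")

lat_as_str = chunks[0]["centroid_lat"].iloc[0]          # text preserved exactly
lat_as_float = str(as_float["centroid_lat"].iloc[0])    # trailing zero dropped
print(chunk_sizes, lat_as_str, lat_as_float)
```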

In [6]:
string_types = {col: str for col in example_chunk.columns}
string_types

{'ACRES_DEED': str,
 'ACRES_POLY': str,
 'AGPRE_ENRD': str,
 'AGPRE_EXPD': str,
 'AG_PRESERV': str,
 'BASEMENT': str,
 'BLDG_NUM': str,
 'BLOCK': str,
 'CITY': str,
 'CITY_USPS': str,
 'COOLING': str,
 'COUNTY_ID': str,
 'DWELL_TYPE': str,
 'EMV_BLDG': str,
 'EMV_LAND': str,
 'EMV_TOTAL': str,
 'FIN_SQ_FT': str,
 'GARAGE': str,
 'GARAGESQFT': str,
 'GREEN_ACRE': str,
 'HEATING': str,
 'HOMESTEAD': str,
 'HOME_STYLE': str,
 'LANDMARK': str,
 'LOT': str,
 'MULTI_USES': str,
 'NUM_UNITS': str,
 'OPEN_SPACE': str,
 'OWNER_MORE': str,
 'OWNER_NAME': str,
 'OWN_ADD_L1': str,
 'OWN_ADD_L2': str,
 'OWN_ADD_L3': str,
 'PARC_CODE': str,
 'PIN': str,
 'PLAT_NAME': str,
 'PREFIXTYPE': str,
 'PREFIX_DIR': str,
 'SALE_DATE': str,
 'SALE_VALUE': str,
 'SCHOOL_DST': str,
 'SPEC_ASSES': str,
 'STREETNAME': str,
 'STREETTYPE': str,
 'SUFFIX_DIR': str,
 'Shape_Area': str,
 'Shape_Leng': str,
 'TAX_ADD_L1': str,
 'TAX_ADD_L2': str,
 'TAX_ADD_L3': str,
 'TAX_CAPAC': str,
 'TAX_EXEMPT': str,
 'TAX_NAME': st

In [7]:
from project_cols_to_keep_and_drop import *

In [8]:
from dfply import *
from more_dfply import recode, ifelse
df_ll = (example_chunk
         >> select(cols_to_keep)
         >> mutate(lat_long = pd.Series(zip(example_chunk.centroid_lat, example_chunk.centroid_long)))
         >> mutate(lake_name = recode(X.lat_long, lat_long_name_dict),
                   lake_code = recode(X.lat_long, lat_long_code_dict),
                   distance = recode(X.lat_long, lat_long_dist))
         >> filter_by(~X.lake_name.isna())
         >> filter_by(X.distance <= 1600)
        )
df_ll

Unnamed: 0,ACRES_DEED,ACRES_POLY,AGPRE_ENRD,AG_PRESERV,BASEMENT,CITY,COOLING,DWELL_TYPE,EMV_BLDG,EMV_LAND,...,XUSE3_DESC,XUSE4_DESC,YEAR_BUILT,Year,centroid_lat,centroid_long,lat_long,lake_name,lake_code,distance
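
The same steps can be written in plain pandas, which may help when debugging the `dfply` pipeline. This is a minimal sketch with made-up data: the three dictionaries are tiny stand-ins for `lat_long_name_dict`, `lat_long_code_dict`, and `lat_long_dist`, assumed here to be keyed by `(lat, long)` string tuples, and `add_lake_columns` is a hypothetical name.

```python
import pandas as pd

# Tiny stand-ins for the translation dictionaries (assumed keyed by (lat, long) strings).
name_by_ll = {("45.1", "-93.1"): "Lake A", ("45.2", "-93.2"): "Lake B"}
code_by_ll = {("45.1", "-93.1"): "A-01", ("45.2", "-93.2"): "B-02"}
dist_by_ll = {("45.1", "-93.1"): 250.0, ("45.2", "-93.2"): 4000.0}

chunk = pd.DataFrame({
    "PIN": ["p1", "p2", "p3"],
    "centroid_lat": ["45.1", "45.2", "45.3"],
    "centroid_long": ["-93.1", "-93.2", "-93.3"],
    "EXTRA": [1, 2, 3],
})
cols_to_keep = ["PIN", "centroid_lat", "centroid_long"]

def add_lake_columns(chunk, max_dist=1600):
    out = chunk[cols_to_keep].copy()
    # Build the lookup keys row by row; a plain list is assigned by position.
    keys = list(zip(out["centroid_lat"], out["centroid_long"]))
    out["lake_name"] = [name_by_ll.get(k) for k in keys]
    out["lake_code"] = [code_by_ll.get(k) for k in keys]
    out["distance"] = [dist_by_ll.get(k, float("nan")) for k in keys]
    # Unmatched parcels have a NaN distance, which also fails the comparison.
    out = out[out["lake_name"].notna() & (out["distance"] <= max_dist)]
    return out.reset_index(drop=True)

nearby = add_lake_columns(chunk)
print(nearby)
```

Assigning lists sidesteps index alignment. A `pd.Series` built from `zip` gets a fresh 0-based index, which may not line up with chunks after the first, whose index continues from the previous chunk.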


In [9]:
len(lat_long_name_dict)

2300453

5. Now convert your expression from the last problem to a function and test that this function works on the first few chunks of each file.

In [10]:
from functoolz import pipeable
read_parcel = lambda path: pd.read_csv(path, sep= "|", chunksize= 50000, dtype=string_types)


In [28]:
add_name_code_dist = pipeable(lambda chunk:(chunk
                                            >> select(cols_to_keep)
                                            # Reuse the chunk's index so the new column aligns with its rows
                                            # (chunks after the first do not start at index 0).
                                            >> mutate(lat_long = pd.Series(list(zip(chunk.centroid_lat, chunk.centroid_long)), index=chunk.index))
                                            >> mutate(lake_name = recode(X.lat_long, lat_long_name_dict),
                                                      lake_code = recode(X.lat_long, lat_long_code_dict),
                                                      distance = recode(X.lat_long, lat_long_dist))
                                            >> drop(X.lat_long)
                                            >> filter_by(~X.lake_name.isna())
                                            >> filter_by(~X.distance.astype(float).isna())
                                            >> filter_by(X.distance.astype(float) <= 1600)))


In [12]:
ex = read_parcel(files[3])

In [12]:
first_chunk = next(ex)

In [13]:
first_chunk

Unnamed: 0,ACRES_DEED,ACRES_POLY,AGPRE_ENRD,AGPRE_EXPD,AG_PRESERV,BASEMENT,BLDG_NUM,BLOCK,CITY,CITY_USPS,...,XUSE1_DESC,XUSE2_DESC,XUSE3_DESC,XUSE4_DESC,YEAR_BUILT,Year,ZIP,ZIP4,centroid_long,centroid_lat
0,0.0,8.03,,,N,N,,,ST FRANCIS,,...,,,,,1980.0,2005,,,-93.26739,45.41332
1,0.0,0.93,,,N,Y,24457.0,,ST FRANCIS,BETHEL,...,,,,,1974.0,2005,55005,9547.0,-93.2701,45.41354
2,0.0,8.75,,,N,Y,24442.0,,ST FRANCIS,BETHEL,...,,,,,1969.0,2005,55005,9547.0,-93.27344,45.41318
3,0.0,11.17,,,N,N,410.0,,ST FRANCIS,BETHEL,...,,,,,1989.0,2005,55005,9404.0,-93.27684,45.41167
4,0.0,14.46,,,N,Y,480.0,,ST FRANCIS,BETHEL,...,,,,,1995.0,2005,55070,9404.0,-93.27849,45.41169
5,0.0,82.14,,,N,,,,ST FRANCIS,,...,,,,,0.0,2005,,,-93.29885,45.40981
6,0.0,4.69,,,N,Y,532.0,,ST FRANCIS,ST FRANCIS,...,,,,,2004.0,2005,,5070.0,-93.27973,45.41172
7,0.0,4.59,,,N,Y,550.0,,ST FRANCIS,ISANTI,...,,,,,1969.0,2005,55040,4552.0,-93.28033,45.41168
8,0.0,4.28,,,N,,,,ST FRANCIS,,...,,,,,0.0,2005,,,-93.28091,45.4117
9,0.0,38.17,,,N,,,,ST FRANCIS,,...,,,,,0.0,2005,,,-93.28364,45.41162


In [14]:
example_file = add_name_code_dist(next(ex))
example_file


Unnamed: 0,ACRES_DEED,ACRES_POLY,AGPRE_ENRD,AG_PRESERV,BASEMENT,CITY,COOLING,DWELL_TYPE,EMV_BLDG,EMV_LAND,...,XUSE3_DESC,XUSE4_DESC,YEAR_BUILT,Year,centroid_lat,centroid_long,lat_long,lake_name,lake_code,distance
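
To check the function on the first few chunks of every file without reading each file to the end, `itertools.islice` can cut each chunk iterator short. This is a minimal sketch, using in-memory text in place of the parcel files and a trivial stand-in for `add_name_code_dist`; `make_source`, `process`, and `check_first_chunks` are hypothetical names.

```python
import io
from itertools import islice
import pandas as pd

def make_source(year, n_rows):
    """Made-up pipe-delimited text standing in for one yearly parcel file."""
    lines = ["PIN|Year"] + [f"P{i}|{year}" for i in range(n_rows)]
    return io.StringIO("\n".join(lines) + "\n")

def process(chunk):
    # Stand-in for the real per-chunk function; it only selects columns.
    return chunk[["PIN", "Year"]]

def check_first_chunks(sources, n_chunks=2, chunksize=3):
    """Run `process` on the first n_chunks chunks of each source; return row counts."""
    counts = {}
    for label, source in sources.items():
        reader = pd.read_csv(source, sep="|", chunksize=chunksize, dtype=str)
        # islice stops after n_chunks, so the rest of the file is never parsed.
        counts[label] = [len(process(chunk)) for chunk in islice(reader, n_chunks)]
    return counts

sources = {"2004": make_source(2004, 10), "2005": make_source(2005, 4)}
first_counts = check_first_chunks(sources)
print(first_counts)
```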


6. We need to make a unique primary key for each row in the combined parcel file.<br>
    a. There is a column that appears to be a unique parcel id.  Double check that this is a true primary key for each individual file. (To do this, you need to verify that the number of unique values is the same as the number of rows for each of the parcel files.)  **Hint:** For each file, use the accumulator pattern with two accumulators (one number and one data frame). <br>
    b. Explain why this column will not work as a primary key if we want to combine all years in one database. <br>
    c. Suppose we make a new column that consists of `str(year) + '-' + PID`.  Explain why this should make a proper primary key for the combined data. <br>

In [13]:
from project_cols_to_keep_and_drop import *

In [14]:
cols_to_keep

['ACRES_DEED',
 'ACRES_POLY',
 'AGPRE_ENRD',
 'AG_PRESERV',
 'BASEMENT',
 'CITY',
 'COOLING',
 'DWELL_TYPE',
 'EMV_BLDG',
 'EMV_LAND',
 'FIN_SQ_FT',
 'GARAGE',
 'GARAGESQFT',
 'GREEN_ACRE',
 'HOMESTEAD',
 'LANDMARK',
 'OWN_ADD_L1',
 'OWN_ADD_L2',
 'OWN_ADD_L3',
 'PARC_CODE',
 'PIN',
 'SALE_VALUE',
 'SPEC_ASSES',
 'TAX_CAPAC',
 'TAX_EXEMPT',
 'TOTAL_TAX',
 'USE1_DESC',
 'USE2_DESC',
 'USE3_DESC',
 'USE4_DESC',
 'WSHD_DIST',
 'XUSE1_DESC',
 'XUSE2_DESC',
 'XUSE3_DESC',
 'XUSE4_DESC',
 'YEAR_BUILT',
 'Year',
 'centroid_lat',
 'centroid_long']

In [17]:
for file in files:
    # Reset both accumulators for each file so uniqueness is checked per file.
    unique_pins = set()
    rows = 0
    print("\n\n\n Starting to process " + file)
    for i, chunk in enumerate(read_parcel(file)):
        #processed_chunk = add_name_code_dist(chunk)
        rows = rows + len(chunk.PIN)
        unique_pins = unique_pins.union(chunk.PIN)
        print("On chunk" + str(i) + " : amount of unique pins = " + str(len(unique_pins)) + " ,     total amount = " + str(rows))




 Starting to process ./MinneMUDAC_raw_files/2009_metro_tax_parcels.txt
On chunk0 : amount of unique pins = 49728 ,     total amount = 50000
On chunk1 : amount of unique pins = 99299 ,     total amount = 100000
On chunk2 : amount of unique pins = 149165 ,     total amount = 150000
On chunk3 : amount of unique pins = 199160 ,     total amount = 200000
On chunk4 : amount of unique pins = 249160 ,     total amount = 250000
On chunk5 : amount of unique pins = 299160 ,     total amount = 300000
On chunk6 : amount of unique pins = 349159 ,     total amount = 350000
On chunk7 : amount of unique pins = 399159 ,     total amount = 400000
On chunk8 : amount of unique pins = 449159 ,     total amount = 450000
On chunk9 : amount of unique pins = 499159 ,     total amount = 500000
On chunk10 : amount of unique pins = 549159 ,     total amount = 550000


KeyboardInterrupt: 
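
The partial output above already shows fewer unique PINs than rows within a single file (49,728 versus 50,000 after the first chunk), so `PIN` does not look like a true primary key on its own. A compact way to report this per file is sketched below with made-up data; `pin_summary` is a hypothetical helper that keeps the two accumulators local to one file, and it uses a set rather than a data frame for the second accumulator.

```python
import io
import pandas as pd

def pin_summary(chunks, column="PIN"):
    """Return (total_rows, unique_values) for one file's chunk iterator."""
    rows = 0        # numeric accumulator
    seen = set()    # set accumulator of distinct values
    for chunk in chunks:
        rows += len(chunk)
        # update() grows the set in place instead of building a new one per chunk.
        seen.update(chunk[column])
    return rows, len(seen)

# Made-up file in which PIN 'B' appears twice.
text = "PIN|Year\nA|2009\nB|2009\nB|2009\nC|2009\n"
reader = pd.read_csv(io.StringIO(text), sep="|", chunksize=2, dtype=str)
rows, unique = pin_summary(reader)
print(rows, unique, rows == unique)
```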

7. Make a function to add the key suggested in the last problem (`str(year) + '-' + PID`) to a given chunk.

In [None]:
example_chunk.dtypes['PIN']

In [29]:
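import numpy as np  # needed for np.arange below

# Assign a running integer id: rows in this chunk get start, start + 1, ...
add_primary_key = pipeable(lambda start, df: (df
                                              >> mutate(id = np.arange(start, start + len(df))
                                              )))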
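
The function above assigns a running integer `id` instead of the key the question suggests. For comparison, the `str(year) + '-' + PIN` key could be built as sketched below, assuming the chunk has `Year` and `PIN` columns; `add_year_pin_key` and the `year_pin` column name are hypothetical. Since the earlier PIN check found repeated PINs within one file, this key may still contain duplicates on the real data.

```python
import pandas as pd

def add_year_pin_key(chunk):
    """Return a copy of chunk with a 'year_pin' column of the form '2009-123'."""
    key = chunk["Year"].astype(str) + "-" + chunk["PIN"].astype(str)
    return chunk.assign(year_pin=key)

# Made-up rows: the same PIN appears in two different years.
rows_2008 = pd.DataFrame({"Year": ["2008", "2008"], "PIN": ["123", "456"]})
rows_2009 = pd.DataFrame({"Year": ["2009", "2009"], "PIN": ["123", "789"]})
combined = pd.concat([add_year_pin_key(rows_2008), add_year_pin_key(rows_2009)],
                     ignore_index=True)

pin_unique = combined["PIN"].is_unique        # '123' repeats across years
key_unique = combined["year_pin"].is_unique   # the year prefix disambiguates
print(pin_unique, key_unique)
```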

#### Note: If you are clever, you can do part 8 in one double loop, which will save you from having to read the parcel files twice.

8. It is probably worth our time to test that our new key column is truly unique. (If not, we might waste our time loading the data into a database, only to have the process fail hours in.) Test that the new column works by <br>
    a. Iterating over all the files. <br>
    b. Using an accumulator to count the total number of rows across all parcel files. <br>
    c. Using an accumulator to accumulate a set of all unique values of our new key. <br>
    d. Verifying that we have as many total rows as unique keys. <br>
    e. Selecting just this column. <br>
    f. Dumping this column into a temporary database. <br>

In [19]:
unique_ids = set([])
rows = 0
rows_so_far = 0
# for file in files:
#     print("\n\n\n Starting to process " + file)
#     c = 0
#     for i, chunk in enumerate(read_parcel(file)):
#         while c <= 2:
#             c = c + 1
#             processed_chunk = add_primary_key(rows_so_far, chunk)
#             rows = rows + len(processed_chunk.id)
#             unique_ids = unique_ids.union(processed_chunk.id)
#             print("On chunk" + str(i) + " : amount of unique ids = " + str(len(unique_ids)) + " ,     total amount = " + str(rows))
#             rows_so_far = rows_so_far + len(chunk)
        

In [None]:
add_primary_key(0,example_chunk)

In [None]:
example_chunk.AGPRE_ENRD


In [30]:
from project_raw_sql_parsel_types import *

9. If the last step succeeded, you can proceed to make a master parcel data database.  If not, you will need to figure out another primary key, probably an `id` column similar to the example in the lectures.

In [31]:
!rm ./databases/parcel.db

In [32]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///databases/parcel.db', echo=False)

In [33]:
from toolz import first
import pandas as pd
complete_first_chunk = add_primary_key(0,add_name_code_dist(first(read_parcel(files[0]))))

In [34]:
schema = pd.io.sql.get_schema(complete_first_chunk, # dataframe
                              'parcel', # name in SQL db
                              keys='id', # primary key
                              con=engine, # connection
                              dtype=common_parcel_types # SQL types
)
print(schema)
engine.execute(schema)


CREATE TABLE parcel (
	"ACRES_DEED" VARCHAR, 
	"ACRES_POLY" VARCHAR, 
	"AGPRE_ENRD" VARCHAR, 
	"AG_PRESERV" VARCHAR, 
	"BASEMENT" VARCHAR, 
	"CITY" VARCHAR, 
	"COOLING" VARCHAR, 
	"DWELL_TYPE" VARCHAR, 
	"EMV_BLDG" VARCHAR, 
	"EMV_LAND" VARCHAR, 
	"FIN_SQ_FT" VARCHAR, 
	"GARAGE" VARCHAR, 
	"GARAGESQFT" VARCHAR, 
	"GREEN_ACRE" VARCHAR, 
	"HOMESTEAD" VARCHAR, 
	"LANDMARK" VARCHAR, 
	"OWN_ADD_L1" VARCHAR, 
	"OWN_ADD_L2" VARCHAR, 
	"OWN_ADD_L3" VARCHAR, 
	"PARC_CODE" VARCHAR, 
	"PIN" VARCHAR, 
	"SALE_VALUE" VARCHAR, 
	"SPEC_ASSES" VARCHAR, 
	"TAX_CAPAC" VARCHAR, 
	"TAX_EXEMPT" VARCHAR, 
	"TOTAL_TAX" VARCHAR, 
	"USE1_DESC" VARCHAR, 
	"USE2_DESC" VARCHAR, 
	"USE3_DESC" VARCHAR, 
	"USE4_DESC" VARCHAR, 
	"WSHD_DIST" VARCHAR, 
	"XUSE1_DESC" VARCHAR, 
	"XUSE2_DESC" VARCHAR, 
	"XUSE3_DESC" VARCHAR, 
	"XUSE4_DESC" VARCHAR, 
	"YEAR_BUILT" VARCHAR, 
	"Year" VARCHAR, 
	centroid_lat VARCHAR, 
	centroid_long VARCHAR, 
	lake_name TEXT, 
	lake_code TEXT, 
	distance FLOAT, 
	id BIGINT NOT NULL, 
	CONSTRA

<sqlalchemy.engine.result.ResultProxy at 0x1697336d8>
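
The create-then-append pattern can be tried end to end on an in-memory SQLite database before committing to a multi-hour load. This is a minimal sketch with made-up rows, using a standard-library `sqlite3` connection rather than the SQLAlchemy engine above; the table and column names mirror the notebook, but the data are invented.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

first = pd.DataFrame({"id": [0, 1], "PIN": ["123", "456"]})
second = pd.DataFrame({"id": [2, 3], "PIN": ["123", "789"]})

# Build the CREATE TABLE statement from the first chunk, with id as primary key.
schema = pd.io.sql.get_schema(first, "parcel", keys="id", con=conn)
conn.execute(schema)

# Append each chunk; index=False keeps the data frame index out of the table.
for chunk in (first, second):
    chunk.to_sql("parcel", con=conn, index=False, if_exists="append")

n_rows, n_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT id) FROM parcel").fetchone()

# The primary key constraint rejects a repeated id.
try:
    conn.execute("INSERT INTO parcel (id, PIN) VALUES (0, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(n_rows, n_ids, duplicate_rejected)
```

Because the table is created with a primary key, a duplicate `id` fails at insert time, which is the failure mode the uniqueness test is meant to rule out beforehand.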

In [21]:
unique_ids = set([])
rows = 0
rows_so_far = 0
for file in files:
    print("\n\n\n Starting to process " + file)
    for i, chunk in enumerate(read_parcel(file)):
        processed_chunk = add_primary_key(rows_so_far,chunk)
        rows = rows + len(processed_chunk.id)
        unique_ids = unique_ids.union(processed_chunk.id)
        print("On chunk" + str(i) + " : amount of unique ids = " + str(len(unique_ids)) + " ,     total amount = " + str(rows))
        rows_so_far = rows_so_far + len(chunk)
        
        




 Starting to process ./MinneMUDAC_raw_files/2009_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 50000 ,     total amount = 50000
On chunk1 : amount of unique ids = 100000 ,     total amount = 100000
On chunk2 : amount of unique ids = 150000 ,     total amount = 150000
On chunk3 : amount of unique ids = 200000 ,     total amount = 200000
On chunk4 : amount of unique ids = 250000 ,     total amount = 250000
On chunk5 : amount of unique ids = 300000 ,     total amount = 300000


KeyboardInterrupt: 

In [38]:
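# Start the ids after the rows of the first processed chunk.
rows_so_far = len(complete_first_chunk)
for file in files:
    print("\n\n\n Starting to process " + file)
    for i, chunk in enumerate(read_parcel(file)):
        processed_chunk = add_primary_key(rows_so_far,add_name_code_dist(chunk))
        processed_chunk.to_sql('parcel', 
                               con=engine, 
                               dtype=common_parcel_types, 
                               index=False,
                               if_exists='append')
        # The offset advances by the raw chunk length, so ids stay unique
        # but are not contiguous once rows are filtered out.
        rows_so_far = rows_so_far + len(chunk)
        # Note: this prints the running id offset, not a count of rows written.
        print("On chunk" + str(i) + " : amount of unique ids = " + str(rows_so_far))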
        




 Starting to process ./MinneMUDAC_raw_files/2009_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 62458
On chunk1 : amount of unique ids = 112458
On chunk2 : amount of unique ids = 162458
On chunk3 : amount of unique ids = 212458
On chunk4 : amount of unique ids = 262458
On chunk5 : amount of unique ids = 312458
On chunk6 : amount of unique ids = 362458
On chunk7 : amount of unique ids = 412458
On chunk8 : amount of unique ids = 462458
On chunk9 : amount of unique ids = 512458
On chunk10 : amount of unique ids = 562458
On chunk11 : amount of unique ids = 612458
On chunk12 : amount of unique ids = 662458
On chunk13 : amount of unique ids = 712458
On chunk14 : amount of unique ids = 762458
On chunk15 : amount of unique ids = 812458
On chunk16 : amount of unique ids = 862458
On chunk17 : amount of unique ids = 912458
On chunk18 : amount of unique ids = 962458
On chunk19 : amount of unique ids = 1012458
On chunk20 : amount of unique ids = 1062458
On chunk21 : amount of unique id

On chunk23 : amount of unique ids = 1212458
On chunk24 : amount of unique ids = 1262458
On chunk25 : amount of unique ids = 1312458
On chunk26 : amount of unique ids = 1362458
On chunk27 : amount of unique ids = 1412458
On chunk28 : amount of unique ids = 1462458
On chunk29 : amount of unique ids = 1512458
On chunk30 : amount of unique ids = 1562458
On chunk31 : amount of unique ids = 1612458
On chunk32 : amount of unique ids = 1662458
On chunk33 : amount of unique ids = 1712458
On chunk34 : amount of unique ids = 1762458
On chunk35 : amount of unique ids = 1812458
On chunk36 : amount of unique ids = 1862458
On chunk37 : amount of unique ids = 1912458
On chunk38 : amount of unique ids = 1962458
On chunk39 : amount of unique ids = 2012458
On chunk40 : amount of unique ids = 2062458
On chunk41 : amount of unique ids = 2100676



 Starting to process ./MinneMUDAC_raw_files/2007_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 2150676
On chunk1 : amount of unique ids = 2200676
On chunk2 : amount of unique ids = 2250676
On ch

On chunk23 : amount of unique ids = 3300676
On chunk24 : amount of unique ids = 3350676
On chunk25 : amount of unique ids = 3400676
On chunk26 : amount of unique ids = 3450676
On chunk27 : amount of unique ids = 3500676
On chunk28 : amount of unique ids = 3550676
On chunk29 : amount of unique ids = 3600676
On chunk30 : amount of unique ids = 3650676
On chunk31 : amount of unique ids = 3700676
On chunk32 : amount of unique ids = 3750676
On chunk33 : amount of unique ids = 3800676
On chunk34 : amount of unique ids = 3850676
On chunk35 : amount of unique ids = 3900676
On chunk36 : amount of unique ids = 3950676
On chunk37 : amount of unique ids = 4000676
On chunk38 : amount of unique ids = 4050676
On chunk39 : amount of unique ids = 4100676
On chunk40 : amount of unique ids = 4126159



 Starting to process ./MinneMUDAC_raw_files/2011_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 4176159
On chunk1 : amount of unique ids = 4226159
On chunk2 : amount of unique ids = 4276159
On chunk3 : amount of unique ids = 4326159
On chu

On chunk27 : amount of unique ids = 11702250
On chunk28 : amount of unique ids = 11752250
On chunk29 : amount of unique ids = 11802250
On chunk30 : amount of unique ids = 11852250
On chunk31 : amount of unique ids = 11902250
On chunk32 : amount of unique ids = 11952250
On chunk33 : amount of unique ids = 12002250
On chunk34 : amount of unique ids = 12052250
On chunk35 : amount of unique ids = 12102250
On chunk36 : amount of unique ids = 12152250
On chunk37 : amount of unique ids = 12202250
On chunk38 : amount of unique ids = 12252250
On chunk39 : amount of unique ids = 12302250
On chunk40 : amount of unique ids = 12352250
On chunk41 : amount of unique ids = 12402250
On chunk42 : amount of unique ids = 12418649



 Starting to process ./MinneMUDAC_raw_files/2008_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 12468649
On chunk1 : amount of unique ids = 12518649
On chunk2 : amount of unique ids = 12568649
On chunk3 : amount of unique ids = 12618649
On chunk4 : amount of unique ids = 12668649
On chunk5 : amount of unique ids = 12718649
On chunk6 : amount of unique ids = 12768649
On chunk7 : amount of unique ids = 12818649
On chunk8 : amount of unique ids = 12868649
On chunk9 : amount of unique ids = 12918649
On chunk10 : amount of unique ids = 12968649
On chunk11 : amount of unique ids = 13018649
On chunk12 : amount of unique ids = 13068649
On chunk13 : amount of unique ids = 

On chunk24 : amount of unique ids = 13668649
On chunk25 : amount of unique ids = 13718649
On chunk26 : amount of unique ids = 13768649
On chunk27 : amount of unique ids = 13818649
On chunk28 : amount of unique ids = 13868649
On chunk29 : amount of unique ids = 13918649
On chunk30 : amount of unique ids = 13968649
On chunk31 : amount of unique ids = 14018649
On chunk32 : amount of unique ids = 14068649
On chunk33 : amount of unique ids = 14118649
On chunk34 : amount of unique ids = 14168649
On chunk35 : amount of unique ids = 14218649
On chunk36 : amount of unique ids = 14268649
On chunk37 : amount of unique ids = 14318649
On chunk38 : amount of unique ids = 14368649
On chunk39 : amount of unique ids = 14418649
On chunk40 : amount of unique ids = 14468649
On chunk41 : amount of unique ids = 14518649
On chunk42 : amount of unique ids = 14528370



 Starting to process ./MinneMUDAC_raw_files/2010_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 14578370
On chunk1 : amount of unique ids = 14628370
On chunk2 : amount of unique

On chunk23 : amount of unique ids = 15728370
On chunk24 : amount of unique ids = 15778370
On chunk25 : amount of unique ids = 15828370
On chunk26 : amount of unique ids = 15878370
On chunk27 : amount of unique ids = 15928370
On chunk28 : amount of unique ids = 15978370
On chunk29 : amount of unique ids = 16028370
On chunk30 : amount of unique ids = 16078370
On chunk31 : amount of unique ids = 16128370
On chunk32 : amount of unique ids = 16178370
On chunk33 : amount of unique ids = 16228370
On chunk34 : amount of unique ids = 16278370
On chunk35 : amount of unique ids = 16328370
On chunk36 : amount of unique ids = 16378370
On chunk37 : amount of unique ids = 16428370
On chunk38 : amount of unique ids = 16478370
On chunk39 : amount of unique ids = 16528370
On chunk40 : amount of unique ids = 16578370
On chunk41 : amount of unique ids = 16626168



 Starting to process ./MinneMUDAC_raw_files/2006_metro_tax_parcels.txt
On chunk0 : amount of unique ids = 16676168
On chunk1 : amount of unique ids = 16726168
On chunk2 : amount of unique