## Address Data Format Validation
- Checks whether address data conforms to expectations
    - Values match the expected category names
    - State is always MN
    - Zip is always 5 digits
    - No missing Address
    - No missing City

### Input:
- CSV file

### Output:
- Records with formatting issues, saved as a CSV file

In [1]:
import pandas as pd
import arcpy

In [98]:
## Read in the new address data spreadsheet
file_path = r"C:\Projects\GIS Tools\DataValidation\1Data\ScottCoFoodSources_Additions - Locations.csv"  # update for your file path
df = pd.read_csv(file_path)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Name                20 non-null     object
 1   Addresss            20 non-null     object
 2   City                20 non-null     object
 3   State               20 non-null     object
 4   Zip                 20 non-null     int64 
 5   TYP                 20 non-null     object
 6   County              20 non-null     object
 7   Within_ScottCounty  20 non-null     int64 
dtypes: int64(2), object(6)
memory usage: 1.4+ KB
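
The `info()` output shows that `Zip` was parsed as `int64`. That is workable for Minnesota ZIP codes, which begin with 5, but an integer column cannot keep a leading zero or a ZIP+4 value. A minimal sketch, assuming a small in-memory CSV (the rows are hypothetical; only the `Name` and `Zip` column names come from this notebook), of reading `Zip` as text so a five-digit check can run on the values exactly as typed:

```python
import io
import pandas as pd

# Hypothetical sample data; only the column names come from the notebook
sample_csv = io.StringIO(
    "Name,Zip\n"
    "Example A,55379\n"
    "Example B,02134\n"
    "Example C,5537\n"
)

# dtype={"Zip": str} keeps each ZIP exactly as it appears in the file
zip_df = pd.read_csv(sample_csv, dtype={"Zip": str})

# fullmatch requires the whole value to be exactly five digits
is_five_digits = zip_df["Zip"].str.fullmatch(r"\d{5}")
bad_zips = zip_df[~is_five_digits]
print(bad_zips)
```

Only `Example C` is flagged here; `02134` keeps its leading zero because it was never converted to a number.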


In [3]:
### Check for invalid state in the new list of addresses
invalid_state = df[df["State"] != "MN"]

if not invalid_state.empty:
    print(invalid_state)
else:
    print("All State Names Valid")

All State Names Valid


In [4]:
## Search for County Open Data

from IPython.display import IFrame  # IPython's display module embeds rich output (images, video, web pages) in the notebook

# Scott County open data portal
tool_url = "https://open-data-scottcounty.hub.arcgis.com/"

# Display the portal inside the Jupyter Notebook; IFrame can embed local or online pages, documents, reports, and videos
IFrame(tool_url, width="100%", height="600px")

In [5]:
## Download County Address data

from IPython.display import IFrame  # repeated so this cell can run on its own

# Scott County address points dataset
tool_url = "https://open-data-scottcounty.hub.arcgis.com/datasets/ScottCounty::address-points/explore?location=44.786991%2C-93.542122%2C18.86"

# Display the dataset page inside the Jupyter Notebook
IFrame(tool_url, width="100%", height="600px")

In [6]:
## Download City Township data, as a shapefile

from IPython.display import IFrame  # repeated so this cell can run on its own

# Minnesota Geospatial Commons page for city, township, and unorganized territory boundaries
tool_url = "https://gisdata.mn.gov/dataset/bdry-mn-city-township-unorg"

# Display the dataset page inside the Jupyter Notebook
IFrame(tool_url, width="100%", height="600px")

In [None]:
### Pull in some reference datasets for city and county names

### Pull in only the attribute table for the city township shapefile
city_ref_file = r"C:\Projects\GIS Tools\DataValidation\1Data\shp_bdry_mn_city_township_unorg\city_township_unorg.dbf" # update for your file path


In [None]:

# Extract field names
fields = [field.name for field in arcpy.ListFields(city_ref_file)]
print("Available field names: ", fields)

Available field names:  ['FID', 'Shape', 'GNIS_FEATU', 'FEATURE_NA', 'CTU_CLASS', 'COUNTY_GNI', 'COUNTY_COD', 'COUNTY_NAM', 'POPULATION', 'SHAPE_Leng', 'SHAPE_Area']


### **TableToNumPyArray()** 
- Goal: validate the format of the data in our spreadsheet with pandas
- ArcPy has no direct conversion to a pandas DataFrame; its data access module exports tables as NumPy structured arrays
- So the attribute table is first converted to a NumPy array, then to a DataFrame
    - A NumPy array is a compact, efficient way to store and process data in Python
    - NumPy is best suited to raw numerical data; pandas is better for structured data operations
        - Structured data = data with a tabular structure (rows and columns) and explicit labels for indexing and referencing values, just like a spreadsheet


- Syntax: `TableToNumPyArray(in_table, field_names, {where_clause}, {skip_nulls}, {null_value})`
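
To illustrate the array-to-DataFrame step without needing ArcPy, here is a minimal sketch that builds a small structured array by hand and passes it to pandas. The array is a hypothetical stand-in for what `TableToNumPyArray` returns; the field names and the two rows echo the city/township table used in this notebook, and the dtypes are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the structured array returned by TableToNumPyArray
sample = np.array(
    [(0, "Augsburg", "Marshall", 74), (2, "Bluffton", "Otter Tail", 208)],
    dtype=[("FID", "i4"), ("FEATURE_NA", "U50"), ("COUNTY_NAM", "U50"), ("POPULATION", "i4")],
)

# Each named field of the structured array becomes a labeled DataFrame column
sample_df = pd.DataFrame(sample)

print(sample.dtype.names)
print(sample_df)
```

When every field holds a single value per row, pandas can read the structured array directly. Going through `array.tolist()` with explicit `columns`, as this notebook does, is another route that also copes with fields such as `Shape` that hold a coordinate pair.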

#### Pandas and NumPy are typically faster than ArcPy for tabular data processing
- Vectorized operations: NumPy and pandas process whole columns in one batch instead of iterating row by row like SearchCursor, which speeds up computations
- Optimized memory management: NumPy processes data in blocks, reducing overhead compared to ArcPy's Python-based cursor operations
- Efficient filtering: pandas and NumPy can apply filters or transformations across entire fields without looping through records, which is much faster than manually iterating with ArcPy

#### Key points (rules of thumb):
- Fewer than about 10K records: ArcPy performs similarly to pandas
    - If using ArcPy cursors, use where_clause to limit the records Python sees
- More than about 100K records: pandas and NumPy are generally faster
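
The difference between row-by-row and vectorized filtering can be sketched with pandas alone. The example below is illustrative only: the loop stands in for cursor-style iteration, the four rows are taken from the reference table's sample output, and the population threshold is an arbitrary assumption. It shows that both approaches select the same records; it does not measure speed.

```python
import pandas as pd

# Four rows echoing the reference table's sample output
places = pd.DataFrame({
    "FEATURE_NA": ["Augsburg", "Danielson", "Bluffton", "Dalbo"],
    "COUNTY_NAM": ["Marshall", "Meeker", "Otter Tail", "Isanti"],
    "POPULATION": [74, 279, 208, 807],
})

# Row-by-row filtering, similar in spirit to looping over a cursor
looped = [row.FEATURE_NA for row in places.itertuples() if row.POPULATION > 200]

# Vectorized filtering: one boolean mask evaluated over the whole column
vectorized = places.loc[places["POPULATION"] > 200, "FEATURE_NA"].tolist()

print(looped == vectorized)
```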

In [None]:
### Optional callback: records the ID of each row that contains a null while converting to a NumPy array
### Returning True skips that row, so null rows are logged in nullRows instead of being dropped silently
def getnull_records(fid):
    nullRows.append(fid)
    return True

nullRows = list() # will hold the IDs of rows with nulls

### TOOL: TableToNumPyArray (in_table, field_names, {where_clause}, {skip_nulls}, {null_value})
# field_names accepts "*" for all fields, or a list such as ["Field_name1", "Field_name2"]
array = arcpy.da.TableToNumPyArray(city_ref_file, fields, skip_nulls=getnull_records)

print(nullRows)

[]


In [10]:
city_county_df = pd.DataFrame(array.tolist(), columns=fields)

city_county_df.head() 

Unnamed: 0,FID,Shape,GNIS_FEATU,FEATURE_NA,CTU_CLASS,COUNTY_GNI,COUNTY_COD,COUNTY_NAM,POPULATION,SHAPE_Leng,SHAPE_Area
0,0,"[225598.52628416658, 5378551.765985596]",663477,Augsburg,TOWNSHIP,659489,45,Marshall,74,38750.666647,93841190.0
1,1,"[366370.8753846548, 4986857.45185792]",663921,Danielson,TOWNSHIP,659492,47,Meeker,279,38599.51503,93177150.0
2,2,"[328504.432222157, 5148648.358436297]",2394210,Bluffton,CITY,659501,56,Otter Tail,208,12870.280762,7087683.0
3,3,"[464878.93198022206, 5059560.480763708]",663916,Dalbo,TOWNSHIP,659475,30,Isanti,807,38863.361414,93726810.0
4,4,"[206181.76129246986, 5408472.681788991]",664373,Hallock,TOWNSHIP,659480,35,Kittson,85,45478.109443,89676340.0


In [12]:
city_county_df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2743 entries, 0 to 2742
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   FID         2743 non-null   int64  
 1   Shape       2743 non-null   object 
 2   GNIS_FEATU  2743 non-null   int64  
 3   FEATURE_NA  2743 non-null   object 
 4   CTU_CLASS   2743 non-null   object 
 5   COUNTY_GNI  2743 non-null   int64  
 6   COUNTY_COD  2743 non-null   object 
 7   COUNTY_NAM  2743 non-null   object 
 8   POPULATION  2743 non-null   int64  
 9   SHAPE_Leng  2743 non-null   float64
 10  SHAPE_Area  2743 non-null   float64
dtypes: float64(2), int64(4), object(5)
memory usage: 235.9+ KB


In [None]:
### Confirms there are no null records (i.e., row count= non-null count)

In [13]:
# Standardize case for consistency: every word starts with a capital letter
city_county_df['FEATURE_NA'] = city_county_df['FEATURE_NA'].str.title()
city_county_df['COUNTY_NAM'] = city_county_df['COUNTY_NAM'].str.title()

# Extract distinct city-county pairs
valid_city_county = city_county_df[['FEATURE_NA', 'COUNTY_NAM']].dropna().drop_duplicates()

In [14]:
### Reminder of our input list of addresses' fields
df.columns

Index(['Name', 'Addresss', 'City', 'State', 'Zip', 'TYP', 'County',
       'Within_ScottCounty'],
      dtype='object')

In [None]:


# Merge the input dataset with the reference dataset
merged_df = df.merge(valid_city_county, left_on=['City', 'County'], right_on=['FEATURE_NA', 'COUNTY_NAM'], how='left', indicator=True)

merged_df.head(20)

                                                 Name  \
0                               Faith Covenant Church   
1                         St. Stephen Lutheran Church   
2                      Judson Memorial Baptist Church   
3                Mount Calvary Lutheran Church - ELCA   
4                Community Action Center - Northfield   
5                       Northfield Hospital + Clinics   
6                      Scott Carver Dakota CAP Agency   
7                      Chaska Free Food Distribution    
8       Norwood Young America Free Food Distribution    
9                                    Mi Casita Pantry   
10                        Bountiful Basket Food Shelf   
11                          Eagle Ridge Middle School   
12                                     Hosanna Church   
13                            Belle Plaine Food Shelf   
14                         St. John's Lutheran Church   
15                               Bethel's Rock Church   
16                             

In [16]:
merged_df

Unnamed: 0,Name,Addresss,City,State,Zip,TYP,County,Within_ScottCounty,FEATURE_NA,COUNTY_NAM,_merge
0,Faith Covenant Church,12921 Nicollet Ave,Burnsville,MN,55337,Home Delivered Meals,Dakota,1,Burnsville,Dakota,both
1,St. Stephen Lutheran Church,8400 France Ave S,Bloomington,MN,55431,Home Delivered Meals,Hennepin,0,Bloomington,Hennepin,both
2,Judson Memorial Baptist Church,4101 Harriet Ave,Minneapolis,MN,55409,Home Delivered Meals,Hennepin,0,Minneapolis,Hennepin,both
3,Mount Calvary Lutheran Church - ELCA,301 County Road 19,Excelsior,MN,55331,Home Delivered Meals,Hennepin,0,Excelsior,Hennepin,both
4,Community Action Center - Northfield,1651 Jefferson Pkwy Ste Hs-200,Northfield,MN,55057,Home Delivered Meals,Rice,0,Northfield,Rice,both
5,Northfield Hospital + Clinics,2000 North Avenue,Northfield,MN,55057,Home Delivered Meals,Dakota,0,Northfield,Dakota,both
6,Scott Carver Dakota CAP Agency,738 1st Ave E,Shakopee,MN,55379,Food Shelf,Scott,1,Shakopee,Scott,both
7,Chaska Free Food Distribution,2100 Stoughton Ave,Chaska,MN,55318,Free Food Box,Carver,0,Chaska,Carver,both
8,Norwood Young America Free Food Distribution,310 Elm St. W,Norwood Young America,MN,55368,Free Food Box,Carver,0,Norwood Young America,Carver,both
9,Mi Casita Pantry,1053 Jefferson St S,Shakopee,MN,55379,Food Shelf,Scott,1,Shakopee,Scott,both
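
The `_merge` column added by `indicator=True` is what turns the left merge into a validation check: `both` means the city-county pair exists in the reference, while `left_only` means it does not. A minimal sketch on hypothetical rows (the column names match this notebook; the third input row deliberately pairs a city with the wrong county):

```python
import pandas as pd

# Hypothetical input rows; the last one pairs a city with the wrong county
inputs = pd.DataFrame({
    "City": ["Shakopee", "Chaska", "Shakopee"],
    "County": ["Scott", "Carver", "Carver"],
})

# Reference pairs, using the notebook's field names
reference = pd.DataFrame({
    "FEATURE_NA": ["Shakopee", "Chaska"],
    "COUNTY_NAM": ["Scott", "Carver"],
})

checked = inputs.merge(
    reference,
    left_on=["City", "County"],
    right_on=["FEATURE_NA", "COUNTY_NAM"],
    how="left",
    indicator=True,
)

# Rows tagged left_only have no matching city-county pair in the reference
unmatched_pairs = checked.loc[checked["_merge"] == "left_only", ["City", "County"]]
print(unmatched_pairs)
```

Filtering on `_merge == "left_only"` gives the records to review, which is the same pattern this notebook later applies to the address merge.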


In [68]:
### Pull in reference datasets for county addresses
### address reference file for csv
address_ref_file = r"C:\Projects\GIS Tools\DataValidation\1Data\Address_Points_ScottCounty.csv"  # update for your file path

In [69]:
address_df = pd.read_csv(address_ref_file)
print("*Data Types Summary*: \n", address_df.dtypes)

*Data Types Summary*: 
 OBJECTID                             int64
Address Unique Identifier           object
Local Address Unique Identifier     object
Address Number Prefix              float64
Address Number                       int64
                                    ...   
LST_TYPE                           float64
LST_POSDIR                         float64
STATE                               object
x                                  float64
y                                  float64
Length: 74, dtype: object




In [70]:
address_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64506 entries, 0 to 64505
Data columns (total 74 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   OBJECTID                         64506 non-null  int64  
 1   Address Unique Identifier        64506 non-null  object 
 2   Local Address Unique Identifier  64506 non-null  object 
 3   Address Number Prefix            0 non-null      float64
 4   Address Number                   64506 non-null  int64  
 5   Address Number Suffix            41 non-null     object 
 6   Street Name Pre Modifier         0 non-null      float64
 7   Street Name Pre Directional      0 non-null      float64
 8   Street Name Pre Type             1 non-null      object 
 9   Street Name Pre Separator        0 non-null      float64
 10  Street Name                      64506 non-null  object 
 11  Street Name Post Type            63648 non-null  object 
 12  StreetName Post Di

In [71]:
address_df.head()

Unnamed: 0,OBJECTID,Address Unique Identifier,Local Address Unique Identifier,Address Number Prefix,Address Number,Address Number Suffix,Street Name Pre Modifier,Street Name Pre Directional,Street Name Pre Type,Street Name Pre Separator,...,AC_DATE,USPS_VALIDATION_COMMENTS,EDIT_DATE,LST_PREDIR,LST_NAME,LST_TYPE,LST_POSDIR,STATE,x,y
0,1,CEB5E0CD-A973-4E95-AD57-9C8C2AD6AF9B,CEB5E0CD-A973-4E95-AD57-9C8C2AD6AF9B,,2090,,,,,,...,5/1/2025 12:00:00 AM,,5/6/2025 6:18:09 PM,,,,,Minnesota,453389.785216,216853.079825
1,2,CBCE5EAD-6078-49BD-9925-14E70D394CFC,CBCE5EAD-6078-49BD-9925-14E70D394CFC,,2092,,,,,,...,5/1/2025 12:00:00 AM,,5/6/2025 6:18:09 PM,,,,,Minnesota,453413.745974,216830.821154
2,3,8B54A234-AAF9-4C9B-8F07-F106ECE9E4E8,8B54A234-AAF9-4C9B-8F07-F106ECE9E4E8,,2094,,,,,,...,5/1/2025 12:00:00 AM,,5/6/2025 6:18:09 PM,,,,,Minnesota,453413.695448,216970.378457
3,4,CE053903-CE56-4CF5-B589-E4983922D1CB,CE053903-CE56-4CF5-B589-E4983922D1CB,,2096,,,,,,...,5/1/2025 12:00:00 AM,,5/6/2025 6:18:09 PM,,,,,Minnesota,453435.556493,216951.758499
4,5,CC1AB691-67ED-4D32-BA32-17A4BEE3F4C5,CC1AB691-67ED-4D32-BA32-17A4BEE3F4C5,,2098,,,,,,...,5/1/2025 12:00:00 AM,,5/6/2025 6:18:09 PM,,,,,Minnesota,453459.100453,216932.352111


In [72]:
address_df.columns

Index(['OBJECTID', 'Address Unique Identifier',
       'Local Address Unique Identifier', 'Address Number Prefix',
       'Address Number', 'Address Number Suffix', 'Street Name Pre Modifier',
       'Street Name Pre Directional', 'Street Name Pre Type',
       'Street Name Pre Separator', 'Street Name', 'Street Name Post Type',
       'StreetName Post Directional', 'Street Name Post Modifier',
       'Subaddress Type 1', 'Subaddress Identifier 1', 'Subaddress Type 2',
       'Subaddress Identifier 2', 'ZIP Code', 'ZIP Plus 4', 'CTU Name',
       'CTU Code', 'Postal Community Name', 'County Code', 'County Name',
       'State Code', 'Location Description', 'Complete Landmark Name',
       'Residence', 'Mailable Address', 'Parcel Unique Identifier',
       'Placement Location', 'Centerline Geocodable',
       'Unique Without Subaddresses', 'Longitude', 'Latitude',
       'US National Grid Code', '911 GIS Point-of-Contact',
       'Emergency Service Number', 'PSAP Code', 'MSAG Community 

In [73]:
### Check that the address fields in both datasets share the same dtype before merging
### (matching dtypes do not guarantee matching text formats, e.g. casing or abbreviations)
if address_df["FULL_ADDRESS_USPS"].dtype != df["Addresss"].dtype:
    print("Addresses in the two datasets are not of the same datatype. Need to reformat before merge")
else:
    print("OK to merge Address in the input and reference have the same format: ", address_df["FULL_ADDRESS_USPS"].dtype)

OK to merge Address in the input and reference have the same format:  object


In [74]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Name                20 non-null     object
 1   Addresss            20 non-null     object
 2   City                20 non-null     object
 3   State               20 non-null     object
 4   Zip                 20 non-null     int64 
 5   TYP                 20 non-null     object
 6   County              20 non-null     object
 7   Within_ScottCounty  20 non-null     int64 
dtypes: int64(2), object(6)
memory usage: 1.4+ KB


In [75]:
### Reformat the reference addresses casing to match the input
address_df["FULL_ADDRESS_USPS"] = address_df["FULL_ADDRESS_USPS"].str.title()
address_df['CTU Name'] = address_df['CTU Name'].str.title()
address_df['County Name'] = address_df['County Name'].str.title()
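
One caveat with `str.title()` is that it capitalizes any letter that follows a non-letter, so an ordinal such as `1ST` becomes `1St` rather than `1st`. The sketch below assumes the reference stores addresses in upper case (an assumption; the raw `FULL_ADDRESS_USPS` values are not shown in this notebook) and compares title-casing against a hypothetical `normalize_address` helper that case-folds both sides instead.

```python
import pandas as pd

# Hypothetical upper-case reference values; the input values come from this notebook
reference_addresses = pd.Series(["738 1ST AVE E", "1053 JEFFERSON ST S"])
input_addresses = pd.Series(["738 1st Ave E", "1053 Jefferson St S"])

# str.title() capitalizes the letter after a digit: "1ST" becomes "1St"
print(reference_addresses.str.title().tolist())

def normalize_address(values: pd.Series) -> pd.Series:
    """Trim, collapse repeated whitespace, and fold case (hypothetical helper)."""
    return values.str.strip().str.replace(r"\s+", " ", regex=True).str.casefold()

# Compare the two strategies element by element
title_matches = (reference_addresses.str.title() == input_addresses).tolist()
folded_matches = (
    normalize_address(reference_addresses) == normalize_address(input_addresses)
).tolist()
print(title_matches, folded_matches)
```

With title-casing, the `1st Ave` address fails to match even though the text is otherwise identical; folding case on both sides before comparing avoids that. Normalized values are best kept in separate helper columns so the original text still appears in the output.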

In [76]:
### Merge input with address reference 


merged_address_df = df.merge(
    address_df[['FULL_ADDRESS_USPS', 'CTU Name', 'County Name']],
    how="left",
    left_on=["Addresss", "City", "County"],
    right_on=['FULL_ADDRESS_USPS', 'CTU Name', 'County Name'],
    indicator=True,
)

merged_address_df

Unnamed: 0,Name,Addresss,City,State,Zip,TYP,County,Within_ScottCounty,FULL_ADDRESS_USPS,CTU Name,County Name,_merge
0,Faith Covenant Church,12921 Nicollet Ave,Burnsville,MN,55337,Home Delivered Meals,Dakota,1,,,,left_only
1,St. Stephen Lutheran Church,8400 France Ave S,Bloomington,MN,55431,Home Delivered Meals,Hennepin,0,,,,left_only
2,Judson Memorial Baptist Church,4101 Harriet Ave,Minneapolis,MN,55409,Home Delivered Meals,Hennepin,0,,,,left_only
3,Mount Calvary Lutheran Church - ELCA,301 County Road 19,Excelsior,MN,55331,Home Delivered Meals,Hennepin,0,,,,left_only
4,Community Action Center - Northfield,1651 Jefferson Pkwy Ste Hs-200,Northfield,MN,55057,Home Delivered Meals,Rice,0,,,,left_only
5,Northfield Hospital + Clinics,2000 North Avenue,Northfield,MN,55057,Home Delivered Meals,Dakota,0,,,,left_only
6,Scott Carver Dakota CAP Agency,738 1st Ave E,Shakopee,MN,55379,Food Shelf,Scott,1,,,,left_only
7,Chaska Free Food Distribution,2100 Stoughton Ave,Chaska,MN,55318,Free Food Box,Carver,0,,,,left_only
8,Norwood Young America Free Food Distribution,310 Elm St. W,Norwood Young America,MN,55368,Free Food Box,Carver,0,,,,left_only
9,Mi Casita Pantry,1053 Jefferson St S,Shakopee,MN,55379,Food Shelf,Scott,1,1053 Jefferson St S,Shakopee,Scott,both


In [None]:
valid_categories = ['Home Delivered Meals', 'Food Shelf', 'Free Food Box', 'Free Meal'] # Update categories if needed

valid_categories

['Home Delivered Meals', 'Food Shelf', 'Free Food Box', 'Free Meal']

In [83]:
address_df.columns

Index(['OBJECTID', 'Address Unique Identifier',
       'Local Address Unique Identifier', 'Address Number Prefix',
       'Address Number', 'Address Number Suffix', 'Street Name Pre Modifier',
       'Street Name Pre Directional', 'Street Name Pre Type',
       'Street Name Pre Separator', 'Street Name', 'Street Name Post Type',
       'StreetName Post Directional', 'Street Name Post Modifier',
       'Subaddress Type 1', 'Subaddress Identifier 1', 'Subaddress Type 2',
       'Subaddress Identifier 2', 'ZIP Code', 'ZIP Plus 4', 'CTU Name',
       'CTU Code', 'Postal Community Name', 'County Code', 'County Name',
       'State Code', 'Location Description', 'Complete Landmark Name',
       'Residence', 'Mailable Address', 'Parcel Unique Identifier',
       'Placement Location', 'Centerline Geocodable',
       'Unique Without Subaddresses', 'Longitude', 'Latitude',
       'US National Grid Code', '911 GIS Point-of-Contact',
       'Emergency Service Number', 'PSAP Code', 'MSAG Community 

In [84]:
valid_zips = address_df["ZIP Code"].unique().tolist()

In [99]:
df = df.rename(columns={"Addresss":"Address"})

In [102]:
merged_address_df= merged_address_df.rename(columns={"Addresss":"Address"})

In [100]:
df.columns

Index(['Name', 'Address', 'City', 'State', 'Zip', 'TYP', 'County',
       'Within_ScottCounty'],
      dtype='object')

In [133]:
# Filter address to those not matched in the address reference with .loc[row_condition, columns]
nonreference_addresses = merged_address_df.loc[merged_address_df["_merge"] != 'both',"Address"].tolist()
print(f"There are {len(nonreference_addresses)} addresses in the input not found in the reference:\n")
nonreference_addresses

There are 16 addresses in the input not found in the reference:



['12921 Nicollet Ave',
 '8400 France Ave S',
 '4101 Harriet Ave',
 '301 County Road 19',
 '1651 Jefferson Pkwy Ste Hs-200',
 '2000 North Avenue',
 '738 1st Ave E',
 '2100 Stoughton Ave',
 '310 Elm St. W',
 '1600 Bavaria Road',
 '9600 163rd Street W',
 '128 N Meridian Street',
 '300 E 4th Street, Chaska, MN 55318',
 '14201 Cedar Avenue',
 '119 8th Ave W',
 '13901 Fairview Drive']
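
One entry in this list, `'300 E 4th Street, Chaska, MN 55318'`, carries the city, state, and ZIP inside the address field, so it cannot match a street-only reference value. A minimal sketch of flagging such values, assuming the rule that a street-only field contains no comma and does not end in a ZIP code (the rule and the pattern are assumptions, not part of the original checks):

```python
import pandas as pd

# Three address values taken from the unmatched list above
addresses = pd.Series([
    "12921 Nicollet Ave",
    "300 E 4th Street, Chaska, MN 55318",
    "9600 163rd Street W",
])

# Assumed rule: flag values with a comma or a trailing 5-digit ZIP (optionally ZIP+4)
looks_combined = addresses.str.contains(r",|\b\d{5}(?:-\d{4})?$", regex=True)
print(addresses[looks_combined].tolist())
```

The house number `12921` is not flagged because the ZIP part of the pattern is anchored to the end of the string.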

In [None]:
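# Validation checks: each filter keeps only the rows that fail a rule

invalid_type = df[~df["TYP"].isin(valid_categories)]  # TYP not in the allowed categories
nonReference_zip = df[~df["Zip"].isin(valid_zips)]  # Zip absent from the county reference data
invalid_state = df[df["State"] != "MN"]  # State other than MN
invalid_zip = df[~df["Zip"].astype(str).str.match(r"^\d{5}$")]  # Zip not exactly 5 digits

missing_address = df[df["Address"].isnull() | df["City"].isnull()]  # Address or City missing
# Note: this rebinds nonreference_addresses from a list of strings to a DataFrame of the matching rows
nonreference_addresses = df[df["Address"].isin(nonreference_addresses)]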

def show_validation_issues(validation_fieldName, issues):
    """Print the rows that failed a validation check, or an OK message if none did.

    validation_fieldName: string, field name and description of the check
    issues: DataFrame of failing rows produced by one of the validation filters
    Example: show_validation_issues("TYP, check for invalid categories", invalid_type)
    """
    if not issues.empty:  # any rows in the filtered DataFrame are validation issues
        print(f"\n{validation_fieldName} - {len(issues)} issues: ")
        print(issues)
    else:
        print(f"\n{validation_fieldName} - OK. No validation issues found")
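
Because each check produces its own DataFrame, the reason a record was flagged is lost once the frames are combined. One possible extension, sketched below on hypothetical records, tags each set of failing rows with a reason before concatenating. The `label_issues` helper and the `Issue` column are illustrative names, not part of this notebook.

```python
import pandas as pd

def label_issues(issues: pd.DataFrame, reason: str) -> pd.DataFrame:
    """Return a copy of the flagged rows with a hypothetical 'Issue' column."""
    return issues.assign(Issue=reason)

# Hypothetical records using the notebook's column names
records = pd.DataFrame({
    "Name": ["Site A", "Site B", "Site C"],
    "State": ["MN", "WI", "MN"],
    "Zip": [55379, 54016, 5537],
})

# Same filter style as the validation checks above
bad_state = records[records["State"] != "MN"]
bad_zip = records[~records["Zip"].astype(str).str.match(r"^\d{5}$")]

# Stack the labeled frames so every row carries the rule it failed
report = pd.concat(
    [label_issues(bad_state, "State is not MN"), label_issues(bad_zip, "Zip is not 5 digits")],
    ignore_index=True,
)
print(report)
```

A record that fails several checks appears once per failed rule, so the saved file can be sorted or grouped by `Issue` during review.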




In [145]:
### prints the object type created from the validation checks above
print(type(invalid_type))
print(invalid_type.__class__.__name__) ## prints only the name 

<class 'pandas.core.frame.DataFrame'>
DataFrame


In [119]:
show_validation_issues("TYP field (invalid categories)", invalid_type)
show_validation_issues("State field (should be 'MN')", invalid_state)



TYP field (invalid categories) - OK. No validation issues found

State field (should be 'MN') - OK. No validation issues found


In [120]:
show_validation_issues("ZIP codes (should be 5-digit)", invalid_zip)


ZIP codes (should be 5-digit) - OK. No validation issues found


In [121]:
show_validation_issues("Inputs Zip not found in reference data", nonReference_zip)


Inputs Zip not found in reference data - 10 issues: 
                                             Name  \
1                     St. Stephen Lutheran Church   
2                  Judson Memorial Baptist Church   
3            Mount Calvary Lutheran Church - ELCA   
4            Community Action Center - Northfield   
5                   Northfield Hospital + Clinics   
7                  Chaska Free Food Distribution    
8   Norwood Young America Free Food Distribution    
10                    Bountiful Basket Food Shelf   
14                     St. John's Lutheran Church   
15                           Bethel's Rock Church   

                               Address                   City State    Zip  \
1                    8400 France Ave S            Bloomington    MN  55431   
2                     4101 Harriet Ave            Minneapolis    MN  55409   
3                   301 County Road 19              Excelsior    MN  55331   
4       1651 Jefferson Pkwy Ste Hs-200            

In [122]:
show_validation_issues("Missing Address or City fields", missing_address)


Missing Address or City fields - OK. No validation issues found


In [135]:
show_validation_issues("Input Address not found in Address reference list", nonreference_addresses)


Input Address not found in Address reference list - 16 issues: 
                                                 Name  \
0                               Faith Covenant Church   
1                         St. Stephen Lutheran Church   
2                      Judson Memorial Baptist Church   
3                Mount Calvary Lutheran Church - ELCA   
4                Community Action Center - Northfield   
5                       Northfield Hospital + Clinics   
6                      Scott Carver Dakota CAP Agency   
7                      Chaska Free Food Distribution    
8       Norwood Young America Free Food Distribution    
10                        Bountiful Basket Food Shelf   
12                                     Hosanna Church   
13                            Belle Plaine Food Shelf   
14                         St. John's Lutheran Church   
15                               Bethel's Rock Church   
17  St. John's Lutheran Church Shakopee\nLoaves & ...   
19                     

In [136]:
### Combine issues

combined_issues  = pd.concat([nonReference_zip, nonreference_addresses])

combined_issues_unique = combined_issues.drop_duplicates()

print(f"There are {len(combined_issues_unique)} issues with non-reference records for the following: \n")
combined_issues_unique

There are 16 issues with non-reference records for the following: 



Unnamed: 0,Name,Address,City,State,Zip,TYP,County,Within_ScottCounty
1,St. Stephen Lutheran Church,8400 France Ave S,Bloomington,MN,55431,Home Delivered Meals,Hennepin,0
2,Judson Memorial Baptist Church,4101 Harriet Ave,Minneapolis,MN,55409,Home Delivered Meals,Hennepin,0
3,Mount Calvary Lutheran Church - ELCA,301 County Road 19,Excelsior,MN,55331,Home Delivered Meals,Hennepin,0
4,Community Action Center - Northfield,1651 Jefferson Pkwy Ste Hs-200,Northfield,MN,55057,Home Delivered Meals,Rice,0
5,Northfield Hospital + Clinics,2000 North Avenue,Northfield,MN,55057,Home Delivered Meals,Dakota,0
7,Chaska Free Food Distribution,2100 Stoughton Ave,Chaska,MN,55318,Free Food Box,Carver,0
8,Norwood Young America Free Food Distribution,310 Elm St. W,Norwood Young America,MN,55368,Free Food Box,Carver,0
10,Bountiful Basket Food Shelf,1600 Bavaria Road,Chaska,MN,55318,Food Shelf,Carver,0
14,St. John's Lutheran Church,"300 E 4th Street, Chaska, MN 55318",Chaska,MN,55318,Free Meal,Carver,0
15,Bethel's Rock Church,14201 Cedar Avenue,Apple Valley,MN,55124,Free Food Box,Dakota,0


In [126]:
### Save the validation results to a new csv to check the records
combined_issues_unique.to_csv("combined_validation_issues.csv", index=False)