

## NYCHA Property Directory Block and Lot Guide Parser: Cross-Referencing Records with the JustFix Script

**Developed by**: Itzamna Huerta, for the Association for Neighborhood and Housing Development (ANHD)  
**Created**: September 2024  
**Last Updated**: N/A  
**2024 (JUSTFIX)**: [NYCHA Block and Lot CSV](https://github.com/JustFixNYC/nycha-scraper/blob/master/data/2024/Block-and-Lot-Guide-01012024.csv)

---

### Overview:

This script compares the records extracted from the NYCHA Property Directory Block and Lot Guide PDF with the 2024 JustFix dataset. The two sources initially differed noticeably in their number of records. The goal of this script is to identify rows that may have been missed during extraction and to determine whether the extraction script needs adjustments to capture all data accurately.

### Initial Findings:
The initial comparison showed that 1,261 rows from the JustFix dataset were missing from the ANHD dataset, and 319 rows from the ANHD dataset were missing from the JustFix dataset.
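Both directions of the comparison can also be tallied in a single pass. The sketch below uses an outer merge with `indicator=True` on the `ADDRESS` key, which is the key this notebook uses. It counts distinct addresses rather than rows, so its numbers need not equal the row counts above. The helper `compare_keys` and the sample frames are hypothetical.

```python
import pandas as pd


def compare_keys(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.Series:
    """Count distinct key values found only in `left`, only in `right`, or in both."""
    # Deduplicate first so repeated keys do not multiply rows in the merge
    left_keys = left[[key]].drop_duplicates()
    right_keys = right[[key]].drop_duplicates()
    # An outer merge with indicator=True labels every key as left_only, right_only, or both
    merged = left_keys.merge(right_keys, on=key, how="outer", indicator=True)
    return merged["_merge"].value_counts()


# Tiny illustrative frames, not the real datasets
anhd = pd.DataFrame({"ADDRESS": ["1 MAIN ST", "2 MAIN ST", "3 MAIN ST"]})
justfix = pd.DataFrame({"ADDRESS": ["2 MAIN ST", "3 MAIN ST", "4 MAIN ST", "4 MAIN ST"]})

counts = compare_keys(anhd, justfix, "ADDRESS")
print(counts)
```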


```python
# Import libraries
import pandas as pd
```

```python
# Load both datasets (paths are relative to the working directory)
justfix_df = pd.read_csv('./2024/justfix/Block-and-Lot-Guide-01012024-JustFix.csv')
anhd_df = pd.read_csv('./2024/Block-and-Lot-Guide-01012024-ANHD.csv')
```
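The `Unnamed: 0` column in the outputs below suggests that each CSV was saved together with its DataFrame index. Assuming that is the case, passing `index_col=0` to `pd.read_csv` turns that column into the index instead of a data column. The sketch below uses a small in-memory CSV built from one of the rows shown later.

```python
import io

import pandas as pd

# Simulated CSV written with its index (DataFrame.to_csv does this by default)
csv_text = (
    ",BOROUGH,BLOCK,LOT,ADDRESS\n"
    "1323,BROOKLYN,2050,1,BED OF FLEET STREET\n"
)

# Without index_col, the unnamed first column becomes "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# With index_col=0, that column is used as the index
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```

The original row positions are kept in the index, which can still help locate a record in the source file.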

```python
# Compare total record counts (the initial run showed a 943-record difference)
print("Total Records for JustFix: ", justfix_df.shape[0])
print("Total Records for ANHD:    ", anhd_df.shape[0])
```

```
Total Records for JustFix:  4519
Total Records for ANHD:     4519
```
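Equal totals do not guarantee that the two files contain the same records, because a missed row and an extra row cancel out. The sketch below breaks the counts down by a column such as `BOROUGH` so that offsetting differences become visible. The helper `counts_by_column` and the sample frames are hypothetical.

```python
import pandas as pd


def counts_by_column(a: pd.DataFrame, b: pd.DataFrame, column: str,
                     names=("ANHD", "JustFix")) -> pd.DataFrame:
    """Per-value record counts for two frames side by side, plus their difference."""
    table = pd.concat(
        [a[column].value_counts(), b[column].value_counts()],
        axis=1,
        keys=list(names),
    )
    # A value missing from one source gets NaN after the concat; treat it as zero
    table = table.fillna(0).astype(int)
    table["DIFF"] = table[names[0]] - table[names[1]]
    return table


# Illustrative frames with equal totals but different borough mixes
anhd = pd.DataFrame({"BOROUGH": ["BROOKLYN", "BROOKLYN", "BRONX"]})
justfix = pd.DataFrame({"BOROUGH": ["BROOKLYN", "BRONX", "BRONX"]})

table = counts_by_column(anhd, justfix, "BOROUGH")
print(table)
```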


```python
# Compare the datasets on the ADDRESS column
key_column = 'ADDRESS'

# Deduplicate the keys so repeated addresses do not multiply rows in the merge
justfix_keys = justfix_df[[key_column]].drop_duplicates()
anhd_keys = anhd_df[[key_column]].drop_duplicates()

# Rows in anhd_df whose ADDRESS does not appear in justfix_df
missing_in_justfix = anhd_df.merge(justfix_keys, on=key_column, how='left', indicator=True)
missing_in_justfix = missing_in_justfix[missing_in_justfix['_merge'] == 'left_only']

# Rows in justfix_df whose ADDRESS does not appear in anhd_df
missing_in_anhd = justfix_df.merge(anhd_keys, on=key_column, how='left', indicator=True)
missing_in_anhd = missing_in_anhd[missing_in_anhd['_merge'] == 'left_only']
```
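Keying on `ADDRESS` alone assumes that an address identifies a single record. If it does not (for example, if one address spans several lots), matching on a composite key is stricter. The sketch below uses `BOROUGH`, `BLOCK`, `LOT`, and `ADDRESS` as the key. The key choice, the helper `rows_missing_from`, and the sample frames are assumptions for illustration; the LOT 2 row is made up.

```python
import pandas as pd

# Hypothetical composite key; assumes both files use these column names
KEY_COLUMNS = ["BOROUGH", "BLOCK", "LOT", "ADDRESS"]


def rows_missing_from(left: pd.DataFrame, right: pd.DataFrame, keys=KEY_COLUMNS) -> pd.DataFrame:
    """Rows of `left` whose combination of `keys` never appears in `right`."""
    # Compare keys as strings so BLOCK=2050 (int) and "2050" (str) still match
    left_keys = left[keys].astype(str)
    right_keys = right[keys].astype(str).drop_duplicates()
    # With deduplicated right keys, a left merge yields one row per left row, in order
    flags = left_keys.merge(right_keys, on=keys, how="left", indicator=True)["_merge"]
    return left[(flags == "left_only").to_numpy()]


# Illustrative frames: the same lot stored with different dtypes, plus one extra lot
anhd = pd.DataFrame({
    "BOROUGH": ["BROOKLYN", "BROOKLYN"],
    "BLOCK": [2050, 2050],
    "LOT": [1, 2],
    "ADDRESS": ["BED OF FLEET STREET", "BED OF FLEET STREET"],
})
justfix = pd.DataFrame({
    "BOROUGH": ["BROOKLYN"],
    "BLOCK": ["2050"],
    "LOT": ["1"],
    "ADDRESS": ["BED OF FLEET STREET"],
})

print(rows_missing_from(anhd, justfix))  # only the LOT 2 row
print(rows_missing_from(justfix, anhd))  # empty
```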



```python
# Display the full ANHD rows that have no ADDRESS match in JustFix
print("Full rows in ANHD not in JustFix:")
missing_in_justfix
```

```
Full rows in ANHD not in JustFix:
```


| Unnamed: 0 | BOROUGH | BLOCK | LOT | ADDRESS | ZIP CODE | DEVELOPMENT | MANAGED BY | CD# | FACILITY | _merge |
|---|---|---|---|---|---|---|---|---|---|---|
| 1323 | BROOKLYN | 2050 | 1 | BED OF FLEET STREET | 11201 | INGERSOLL | INGERSOLL | 2 | COMMERCIAL SPACE PARKING LOT | left_only |
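Because the comparison keys on `ADDRESS` alone, an unmatched row may still have a counterpart in the other file under a slightly different address string. The sketch below looks up rows in the other dataset that share the same `BOROUGH`, `BLOCK`, and `LOT`. It then prints both addresses with `repr()`, which makes hidden whitespace visible. The helper `counterparts_by_bbl` is hypothetical, and the trailing space in the sample JustFix address is an assumption for illustration.

```python
import pandas as pd


def counterparts_by_bbl(row: pd.Series, other: pd.DataFrame,
                        bbl_cols=("BOROUGH", "BLOCK", "LOT")) -> pd.DataFrame:
    """Rows of `other` that share the borough, block, and lot of `row`."""
    mask = pd.Series(True, index=other.index)
    for col in bbl_cols:
        # Compare as strings so int and str columns still line up
        mask &= other[col].astype(str) == str(row[col])
    return other[mask]


# Illustrative rows; the trailing space in the JustFix address is an assumption
anhd = pd.DataFrame({"BOROUGH": ["BROOKLYN"], "BLOCK": [2050], "LOT": [1],
                     "ADDRESS": ["BED OF FLEET STREET"]})
justfix = pd.DataFrame({"BOROUGH": ["BROOKLYN"], "BLOCK": [2050], "LOT": [1],
                        "ADDRESS": ["BED OF FLEET STREET "]})

row = anhd.iloc[0]
match = counterparts_by_bbl(row, justfix)
for other_address in match["ADDRESS"]:
    # repr() makes trailing spaces and non-printable characters such as \xa0 visible
    print(repr(row["ADDRESS"]), repr(other_address), row["ADDRESS"] == other_address)
```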


```python
# Display the full JustFix rows that have no ADDRESS match in ANHD
print("\nFull rows in JustFix not in ANHD:")
missing_in_anhd
```

```

Full rows in JustFix not in ANHD:
```


| Unnamed: 0 | BOROUGH | BLOCK | LOT | ADDRESS | ZIP CODE | DEVELOPMENT | MANAGED BY | CD# | FACILITY | _merge |
|---|---|---|---|---|---|---|---|---|---|---|
| 1718 | BROOKLYN | 2050 | 1 | BED OF FLEET STREET | 11201 | INGERSOLL | INGERSOLL | 2 | COMMERCIAL SPACE PARKING LOT | left_only |
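The same `BED OF FLEET STREET` row appears in both lists above. Each list keeps only rows whose `ADDRESS` had no exact match in the other file, so the two stored strings must differ in a way the displayed output hides, for example doubled, trailing, or non-breaking spaces. Assuming the difference is whitespace or letter case, the sketch below normalizes addresses before merging. The helper `normalize_address` and the sample strings are hypothetical.

```python
import pandas as pd


def normalize_address(s: pd.Series) -> pd.Series:
    """Collapse whitespace runs to one space, trim the ends, and uppercase."""
    # In Python 3, the regex class \s also matches non-breaking spaces (U+00A0)
    return s.str.replace(r"\s+", " ", regex=True).str.strip().str.upper()


# Illustrative variant of the same address (doubled, non-breaking, and trailing spaces)
anhd = pd.DataFrame({"ADDRESS": ["BED OF FLEET STREET"]})
justfix = pd.DataFrame({"ADDRESS": ["BED OF  FLEET\u00a0STREET "]})

anhd["ADDRESS_NORM"] = normalize_address(anhd["ADDRESS"])
justfix["ADDRESS_NORM"] = normalize_address(justfix["ADDRESS"])

# Re-run the left merge on the normalized key
still_missing = anhd.merge(
    justfix[["ADDRESS_NORM"]].drop_duplicates(), on="ADDRESS_NORM", how="left", indicator=True
)
still_missing = still_missing[still_missing["_merge"] == "left_only"]
print(len(still_missing))  # 0: the two variants now match
```

Keeping the raw `ADDRESS` column next to `ADDRESS_NORM` preserves the original value, which is useful when tracing a mismatch back to the extraction script.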
