# Project Overview
## Purpose

This Python script processes property data from a JSON file to complete two main tasks:

1. Identify Nullable Fields: Detect every field that can hold null values and count how many nulls each contains.
2. Identify 'Lake View' Properties: Use a regular expression to find properties that mention "lake view" in specific fields (PublicRemarks, WaterfrontFeatures[], and View[]).

## Objective
The script processes a JSON file of property data and reports two things: the number of null values in each field, and a per-property flag indicating whether the listing mentions a lake view.



# Code

In [68]:
# Import the required libraries
import pandas as pd
import re

In [70]:
# Load data from the JSON file (lines=True reads one JSON object per line)
df = pd.read_json(r'C:\Users\monusha s\OneDrive\Desktop\listhub_incremental_202409241307.json', lines=True)
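
To illustrate what `lines=True` expects, here is a minimal sketch that reads a tiny in-memory sample instead of the real file. The two records and the `ListingKey` field are hypothetical and used only for illustration; they are not taken from the dataset.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-record sample in JSON Lines form: one JSON object per line.
sample = StringIO(
    '{"ListingKey": "A1", "YearBuilt": 1998, "View": ["Lake"]}\n'
    '{"ListingKey": "B2", "YearBuilt": null, "View": []}\n'
)

# lines=True tells pandas to parse each line as a separate record.
sample_df = pd.read_json(sample, lines=True)

print(sample_df.shape)                        # rows x columns
print(sample_df["YearBuilt"].isnull().sum())  # JSON null becomes a missing value
```

Each line becomes one row, and a JSON `null` is loaded as a missing value that `isnull()` can count.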

In [71]:
# 1. Identify Nullable Fields (Count of null values in each column)
nullable_fields = df.isnull().sum()
print("Nullable Fields and Null Count:")
print(nullable_fields)

Nullable Fields and Null Count:
@odata.id                         0
AccessibilityFeatures             0
AccessibleElevatorInstalled    3855
Appliances                        0
ArchitecturalStyle                0
                               ... 
WaterfrontFeatures                0
WaterfrontYN                      0
WindowFeatures                    0
YearBuilt                       290
YearBuiltEffective             3855
Length: 111, dtype: int64
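
The output above lists every column, including those with zero nulls. A possible refinement, sketched below on a hypothetical three-row frame (`demo` and its values are made up for illustration), keeps only the columns that actually contain nulls and adds the share of rows affected.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the listing data.
demo = pd.DataFrame({
    "ListingKey": ["A1", "B2", "C3"],
    "YearBuilt": [1998, None, 2005],
    "YearBuiltEffective": [None, None, None],
})

null_counts = demo.isnull().sum()

# Keep only the columns that contain at least one null, largest count first.
nullable_only = null_counts[null_counts > 0].sort_values(ascending=False)

# Percentage of rows that are null in each of those columns.
null_pct = (nullable_only / len(demo) * 100).round(1)

summary = pd.DataFrame({"null_count": nullable_only, "null_pct": null_pct})
print(summary)
```

On the real data, the same steps could be applied to `df` in place of `demo`. Note that `isnull()` does not treat an empty list as null, so a list-valued column can report zero nulls even if many of its lists are empty.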


In [72]:
# 2. Identify 'Lake View' Properties:
lake_view_pattern = re.compile(r'lake\s?view', re.IGNORECASE)  # Regex pattern for "lake view"
# Initialize an empty list to store the results
lake_view_results = []
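
As a quick check of what this pattern does and does not match, the sketch below runs it against a few made-up strings (the sample texts are hypothetical, not taken from the dataset).

```python
import re

lake_view_pattern = re.compile(r'lake\s?view', re.IGNORECASE)

# Hypothetical sample strings covering the main cases.
samples = [
    "Stunning Lake View from the deck",  # one space, mixed case
    "LAKEVIEW estates",                  # no space, upper case
    "lake  view lot",                    # two spaces
    "lake-view cottage",                 # hyphen instead of whitespace
    "view of the lake",                  # words in a different order
]

matches = {text: bool(lake_view_pattern.search(text)) for text in samples}
for text, matched in matches.items():
    print(f"{text!r}: {matched}")
```

Because `\s?` allows at most one whitespace character, "lake view" and "lakeview" match, while the two-space, hyphenated, and reordered variants do not. The pattern would also match names such as a street called "Lakeview", so some flagged listings may be false positives.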

In [76]:
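# Loop through each row and check the relevant fields for "lake view".
# Because the checks are chained with if/elif, only one branch runs per row:
# 'WaterfrontFeatures' is examined only when 'PublicRemarks' does not match, and
# 'View' only when, in addition, 'WaterfrontFeatures' is not a list.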
for _, row in df.iterrows():
    found_lake_view = False
    
    # Check 'PublicRemarks' field (string)
    if isinstance(row['PublicRemarks'], str) and lake_view_pattern.search(row['PublicRemarks']):
        found_lake_view = True
    
    # Check 'WaterfrontFeatures' field (list of strings)
    elif isinstance(row['WaterfrontFeatures'], list):
        for item in row['WaterfrontFeatures']:
            if isinstance(item, str) and lake_view_pattern.search(item):
                found_lake_view = True
                break  # Stop once found
    
    # Check 'View' field (list of strings)
    elif isinstance(row['View'], list):
        for item in row['View']:
            if isinstance(item, str) and lake_view_pattern.search(item):
                found_lake_view = True
                break  # Stop once found
    
    # Append the result for the current row
    lake_view_results.append(found_lake_view)
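
The loop above uses an `if`/`elif` chain, so at most one of the three fields is searched per row. If the intent is to flag a property when any of the three fields mentions a lake view, one possible variant checks each field independently. This is a sketch, not the code that produced the outputs recorded in this notebook; `has_lake_view` and the sample record are hypothetical, and the counts it yields on the real data may differ from those shown.

```python
import re

lake_view_pattern = re.compile(r'lake\s?view', re.IGNORECASE)


def has_lake_view(record):
    """Return True if any of the three fields mentions "lake view"."""
    candidates = []

    # PublicRemarks is expected to be a single string.
    remarks = record.get("PublicRemarks")
    if isinstance(remarks, str):
        candidates.append(remarks)

    # WaterfrontFeatures and View are expected to be lists of strings.
    for field in ("WaterfrontFeatures", "View"):
        values = record.get(field)
        if isinstance(values, list):
            candidates.extend(v for v in values if isinstance(v, str))

    # Every collected string is searched, regardless of which field it came from.
    return any(lake_view_pattern.search(text) for text in candidates)


# Hypothetical record: only the View field mentions a lake view.
example = {
    "PublicRemarks": "Charming home near downtown.",
    "WaterfrontFeatures": [],
    "View": ["Lake View", "Trees"],
}
print(has_lake_view(example))
```

A pandas row supports `.get` as well, so `df.apply(has_lake_view, axis=1)` should produce a comparable boolean column without an explicit loop.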

In [78]:
# Add the results as a new column to the DataFrame
df['LakeView'] = lake_view_results

# Print the LakeView flag for every property (True = "lake view" was found)
print("\nProperties with 'Lake View':")
print(df[['LakeView']])


Properties with 'Lake View':
      LakeView
0        False
1        False
2        False
3        False
4        False
...        ...
3850     False
3851     False
3852     False
3853     False
3854     False

[3855 rows x 1 columns]


In [80]:
# Display only the rows where 'LakeView' is True
print(df.loc[df['LakeView'], 'LakeView'])

23      True
46      True
77      True
82      True
98      True
        ... 
3784    True
3813    True
3833    True
3834    True
3847    True
Name: LakeView, Length: 115, dtype: bool


In [82]:
# Display only the rows where 'LakeView' is False
print(df.loc[~df['LakeView'], 'LakeView'])

0       False
1       False
2       False
3       False
4       False
        ...  
3850    False
3851    False
3852    False
3853    False
3854    False
Name: LakeView, Length: 3740, dtype: bool


In [88]:
# Count the number of True values in the 'LakeView' column
lake_view_count = df['LakeView'].sum()

# Display the count
print("Number of properties with lake view:", lake_view_count)

Number of properties with lake view: 115


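# Results:
Nullable Fields:
The null count covers all 111 fields across 3,855 records. In the truncated output shown, AccessibleElevatorInstalled and YearBuiltEffective are null in every record (3,855), YearBuilt is null in 290 records, and fields such as Appliances and WaterfrontFeatures contain no nulls.

Lake View Properties:
115 of the 3,855 properties were flagged as having a lake view; the remaining 3,740 were not.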

# Conclusion:
Using pandas and regular expressions, the script counted the null values in all 111 fields and flagged 115 of the 3,855 properties (about 3%) as mentioning a lake view.