#### Renaming & Combining

In [1]:
import pandas as pd
import numpy as np

# Load the raw NYC rolling sales records
df = pd.read_csv('../data/NYC_Rolling_Sales_Dataset/nyc-rolling-sales.csv')

# Convert the key columns to numeric; values that cannot be parsed become NaN
df['SALE PRICE'] = pd.to_numeric(df['SALE PRICE'], errors='coerce')
df['LAND SQUARE FEET'] = pd.to_numeric(df['LAND SQUARE FEET'], errors='coerce')
df['RESIDENTIAL UNITS'] = pd.to_numeric(df['RESIDENTIAL UNITS'], errors='coerce')

# Keep only rows where all three key columns have a value
df_clean = df.dropna(subset=['SALE PRICE', 'LAND SQUARE FEET', 'RESIDENTIAL UNITS'])

##### Renaming Columns

In [2]:
df_renamed = df_clean.rename(columns={
    'SALE PRICE': 'sale_price',
    'LAND SQUARE FEET': 'land_sqft',
    'RESIDENTIAL UNITS': 'residential_units',
    'NEIGHBORHOOD': 'neighborhood'
})

df_renamed.columns

Index(['Unnamed: 0', 'BOROUGH', 'neighborhood', 'BUILDING CLASS CATEGORY',
       'TAX CLASS AT PRESENT', 'BLOCK', 'LOT', 'EASE-MENT',
       'BUILDING CLASS AT PRESENT', 'ADDRESS', 'APARTMENT NUMBER', 'ZIP CODE',
       'residential_units', 'COMMERCIAL UNITS', 'TOTAL UNITS', 'land_sqft',
       'GROSS SQUARE FEET', 'YEAR BUILT', 'TAX CLASS AT TIME OF SALE',
       'BUILDING CLASS AT TIME OF SALE', 'sale_price', 'SALE DATE'],
      dtype='object')
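
The output above still mixes renamed snake_case columns with the original upper-case ones. As a minimal sketch, assuming you wanted every column in the same style, `rename()` also accepts a function that is applied to each label. The helper `to_snake_case` and the toy values below are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy frame reusing a few of the original column names; the values are made up.
toy = pd.DataFrame({
    "SALE PRICE": [450000.0],
    "LAND SQUARE FEET": [2400.0],
    "EASE-MENT": [None],
})

def to_snake_case(name: str) -> str:
    """Hypothetical helper: lower-case a label and turn spaces/hyphens into underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")

# Passing a function instead of a dict renames every column label in one call.
toy_snake = toy.rename(columns=to_snake_case)
print(list(toy_snake.columns))
```

A dict, as used in the cell above, gives exact control over each new name (for example `land_sqft` rather than `land_square_feet`); a function is more convenient when the only goal is a uniform style.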

##### Concatenation

In [6]:
sample1 = df_renamed.head(5)
sample2 = df_renamed.tail(5)

combined_sample = pd.concat([sample1, sample2])
combined_sample

| | Unnamed: 0 | BOROUGH | neighborhood | BUILDING CLASS CATEGORY | TAX CLASS AT PRESENT | BLOCK | LOT | EASE-MENT | BUILDING CLASS AT PRESENT | ADDRESS | ... | residential_units | COMMERCIAL UNITS | TOTAL UNITS | land_sqft | GROSS SQUARE FEET | YEAR BUILT | TAX CLASS AT TIME OF SALE | BUILDING CLASS AT TIME OF SALE | sale_price | SALE DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 1 | ALPHABET CITY | 07 RENTALS - WALKUP APARTMENTS | 2A | 392 | 6 | | C2 | 153 AVENUE B | ... | 5 | 0 | 5 | 1633.0 | 6440 | 1900 | 2 | C2 | 6625000.0 | 2017-07-19 00:00:00 |
| 3 | 7 | 1 | ALPHABET CITY | 07 RENTALS - WALKUP APARTMENTS | 2B | 402 | 21 | | C4 | 154 EAST 7TH STREET | ... | 10 | 0 | 10 | 2272.0 | 6794 | 1913 | 2 | C4 | 3936272.0 | 2016-09-23 00:00:00 |
| 4 | 8 | 1 | ALPHABET CITY | 07 RENTALS - WALKUP APARTMENTS | 2A | 404 | 55 | | C2 | 301 EAST 10TH STREET | ... | 6 | 0 | 6 | 2369.0 | 4615 | 1900 | 2 | C2 | 8000000.0 | 2016-11-17 00:00:00 |
| 6 | 10 | 1 | ALPHABET CITY | 07 RENTALS - WALKUP APARTMENTS | 2B | 406 | 32 | | C4 | 210 AVENUE B | ... | 8 | 0 | 8 | 1750.0 | 4226 | 1920 | 2 | C4 | 3192840.0 | 2016-09-23 00:00:00 |
| 9 | 13 | 1 | ALPHABET CITY | 08 RENTALS - ELEVATOR APARTMENTS | 2 | 387 | 153 | | D9 | 629 EAST 5TH STREET | ... | 24 | 0 | 24 | 4489.0 | 18523 | 1920 | 2 | D9 | 16232000.0 | 2016-11-07 00:00:00 |
| 84543 | 8409 | 5 | WOODROW | 02 TWO FAMILY DWELLINGS | 1 | 7349 | 34 | | B9 | 37 QUAIL LANE | ... | 2 | 0 | 2 | 2400.0 | 2575 | 1998 | 1 | B9 | 450000.0 | 2016-11-28 00:00:00 |
| 84544 | 8410 | 5 | WOODROW | 02 TWO FAMILY DWELLINGS | 1 | 7349 | 78 | | B9 | 32 PHEASANT LANE | ... | 2 | 0 | 2 | 2498.0 | 2377 | 1998 | 1 | B9 | 550000.0 | 2017-04-21 00:00:00 |
| 84545 | 8411 | 5 | WOODROW | 02 TWO FAMILY DWELLINGS | 1 | 7351 | 60 | | B2 | 49 PITNEY AVENUE | ... | 2 | 0 | 2 | 4000.0 | 1496 | 1925 | 1 | B2 | 460000.0 | 2017-07-05 00:00:00 |
| 84546 | 8412 | 5 | WOODROW | 22 STORE BUILDINGS | 4 | 7100 | 28 | | K6 | 2730 ARTHUR KILL ROAD | ... | 0 | 7 | 7 | 208033.0 | 64117 | 2001 | 4 | K6 | 11693337.0 | 2016-12-21 00:00:00 |
| 84547 | 8413 | 5 | WOODROW | 35 INDOOR PUBLIC AND CULTURAL FACILITIES | 4 | 7105 | 679 | | P9 | 155 CLAY PIT ROAD | ... | 0 | 1 | 1 | 10796.0 | 2400 | 2006 | 4 | P9 | 69300.0 | 2016-10-27 00:00:00 |
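
Note that the combined sample keeps the original row labels (0, 3, 4, ... 84547) rather than renumbering them. The sketch below, on a small made-up frame, shows the default behaviour next to `ignore_index=True`, which may be preferable when the old labels carry no meaning.

```python
import pandas as pd

# Toy frame with gaps in its index, similar to a DataFrame after dropna().
# The prices are made up for illustration.
toy = pd.DataFrame(
    {"sale_price": [100.0, 200.0, 300.0, 400.0]},
    index=[0, 3, 7, 9],
)

top = toy.head(2)
bottom = toy.tail(2)

# Default: each row keeps the label it had in the source frame.
kept = pd.concat([top, bottom])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
reset = pd.concat([top, bottom], ignore_index=True)

print(list(kept.index))
print(list(reset.index))
```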


#### 🐼 Module 6: Renaming & Combining (Pandas)

##### 📌 Objective
To improve dataset clarity by renaming columns and demonstrate how to combine multiple DataFrame segments using Pandas.

---

##### 📂 Dataset
**NYC Rolling Sales Dataset**  
Contains property sale records across New York City, including sale price, land area, residential units, and neighborhood details.

---

##### 🔧 Work Done

##### 1️⃣ Data Preparation
- Loaded the dataset using `read_csv()`
- Converted key columns to numeric format:
  - `SALE PRICE`
  - `LAND SQUARE FEET`
  - `RESIDENTIAL UNITS`
- Removed rows with missing values in critical columns
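
The preparation steps can be sketched on a tiny frame. The placeholder string `" -  "` and all values here are assumptions chosen for illustration; the real file may mark missing values differently.

```python
import pandas as pd

# Made-up rows; " -  " stands in for any text that cannot be parsed as a number.
raw = pd.DataFrame({
    "SALE PRICE": ["450000", " -  ", "550000"],
    "LAND SQUARE FEET": ["2400", "2498", " -  "],
})

key_columns = ["SALE PRICE", "LAND SQUARE FEET"]

# errors="coerce" turns unparseable values into NaN instead of raising.
for col in key_columns:
    raw[col] = pd.to_numeric(raw[col], errors="coerce")

# Drop any row that is missing a value in at least one key column.
clean = raw.dropna(subset=key_columns)

print(len(raw), "rows before,", len(clean), "rows after")
```

Only the first row survives here, because each of the other two has a missing value in one of the key columns.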

##### 2️⃣ Renaming Columns
- Renamed columns for better readability and consistency:
  - `SALE PRICE` → `sale_price`
  - `LAND SQUARE FEET` → `land_sqft`
  - `RESIDENTIAL UNITS` → `residential_units`
  - `NEIGHBORHOOD` → `neighborhood`
- Verified updated column names
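
Checking the result matters because `rename()` by default ignores mapping keys that match no column. A minimal sketch with a deliberately misspelled key (the toy frame and the typo are illustrative assumptions) shows the silent case and the stricter `errors="raise"` option:

```python
import pandas as pd

# Toy frame; the values are made up.
toy = pd.DataFrame({"SALE PRICE": [450000.0], "NEIGHBORHOOD": ["WOODROW"]})

# "NEIGHBOURHOOD" is a deliberate typo to show what happens to unmatched keys.
mapping = {"SALE PRICE": "sale_price", "NEIGHBOURHOOD": "neighborhood"}

# Default (errors="ignore"): the unmatched key is skipped without any warning.
silent = toy.rename(columns=mapping)
print(list(silent.columns))

# errors="raise": an unmatched key raises KeyError instead.
try:
    toy.rename(columns=mapping, errors="raise")
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```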

##### 3️⃣ Combining DataFrames (Concatenation)
- Selected the first five and last five rows of the cleaned, renamed DataFrame with `head(5)` and `tail(5)`
- Combined both subsets row-wise using `pd.concat()`
- Displayed the resulting 10-row sample, which keeps each row's original index label
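
If it is useful to remember which segment each row came from, `pd.concat()` takes a `keys` argument that adds an outer index level. This is a sketch on made-up data; the labels `"head"` and `"tail"` are arbitrary names chosen here.

```python
import pandas as pd

# Toy frame; neighborhoods match the notebook's sample, prices are made up.
toy = pd.DataFrame({
    "neighborhood": ["ALPHABET CITY", "ALPHABET CITY", "WOODROW", "WOODROW"],
    "sale_price": [100.0, 200.0, 300.0, 400.0],
})

# keys= labels each input frame, producing a two-level (MultiIndex) row index.
labeled = pd.concat([toy.head(2), toy.tail(2)], keys=["head", "tail"])

# Selecting on the outer level returns just that segment.
tail_rows = labeled.loc["tail"]
print(labeled)
print(tail_rows)
```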

---

##### 📊 Key Outcomes
- Improved dataset readability through clear and consistent column names
- Demonstrated how multiple DataFrame segments can be combined row-wise
- Produced a cleaned DataFrame in which `sale_price`, `land_sqft`, and `residential_units` are numeric and have no missing values

---

##### 🛠️ Tools Used
- Python  
- Pandas  
- NumPy  
- Jupyter Notebook (VS Code)

---