A high-performance, asynchronous web scraper for extracting real estate property data from mulk.az for analytics and data analysis purposes.
- Asynchronous scraping using `aiohttp` and `asyncio` for maximum performance
- Comprehensive data extraction including property details, prices, locations, and contact information
- Multiple output formats (CSV, JSON) for data analysis
- Built-in analytics with price statistics and market insights
- Rate limiting and retry logic to respect the website
- Configurable concurrency to balance speed and server load
- Pagination handling to scrape multiple pages automatically
- Data validation and cleaning for analytics-ready output
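
The rate limiting and concurrency cap listed above are commonly combined with an `asyncio.Semaphore`; a minimal sketch of that pattern (the `fetch` coroutine here is a dummy stand-in, not the scraper's real request method):

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for a real aiohttp request
    await asyncio.sleep(0.01)
    return f"response for {url}"

async def bounded_fetch(sem: asyncio.Semaphore, url: str, delay: float) -> str:
    # The semaphore caps in-flight requests; the delay spaces them out
    async with sem:
        result = await fetch(url)
        await asyncio.sleep(delay)
        return result

async def crawl(urls, max_concurrent=10, delay=0.5):
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(bounded_fetch(sem, u, delay) for u in urls))

results = asyncio.run(crawl([f"page{i}" for i in range(5)],
                            max_concurrent=2, delay=0.01))
print(len(results))
```

`asyncio.gather` preserves input order, so results line up with the URL list even though requests overlap.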
- Clone or download the scraper files
- Install dependencies:

```bash
pip install -r requirements.txt
```
```python
import asyncio
from async_mulk_scraper import AsyncMulkAzScraper

async def main():
    scraper = AsyncMulkAzScraper(max_concurrent=10, delay=0.5)

    # Scrape properties
    search_url = "https://www.mulk.az/search.php?category=&lease=false&bolge_id=1"
    properties = await scraper.scrape_all_properties(search_url, max_pages=3)

    # Save data
    await scraper.save_to_csv_async("properties.csv")
    await scraper.save_to_json_async("properties.json")

    # Get analytics
    analytics = scraper.get_analytics_summary()
    print(f"Average price: {analytics['price_stats']['avg']:,} AZN")

if __name__ == "__main__":
    asyncio.run(main())
```
```bash
# Run interactive examples
python example_usage.py

# Run the main async scraper
python async_mulk_scraper.py

# Run analytics version (slower but more features)
python analytics_scraper.py
```
| File | Description | Best For |
|---|---|---|
| `async_mulk_scraper.py` | High-performance async scraper | Large-scale data collection |
| `mulk_scraper.py` | Traditional sync scraper | Small-scale scraping, learning |
| `analytics_scraper.py` | Enhanced version with visualizations | Data analysis and reporting |
| `example_usage.py` | Usage examples and tests | Learning and testing |
Each property record includes:

- `listing_id` - Unique property ID
- `title` - Property title
- `url` - Detail page URL
- `listing_date` - When the property was listed
- `scraped_at` - When the data was scraped
- `price` / `price_numeric` - Property price in AZN
- `category` - Property type (apartment, house, etc.)
- `rooms` / `rooms_numeric` - Number of rooms
- `area` / `area_numeric` - Property area in m²
- `floor` / `current_floor` / `total_floors` - Floor information
- `deed_available` - Legal documentation status
- `location_district` - District/region
- `location_neighborhood` - Neighborhood
- `location_metro` - Nearest metro station
- `full_address` - Complete address
- `contact_person` - Contact person name
- `contact_type` - Type of contact (agent, owner, etc.)
- `contact_phone` - Phone number
- `description` - Property description
- `images` - Array of image URLs
- `image_count` - Number of images
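
For type-safe downstream analysis, the record fields above map naturally onto a dataclass; a sketch covering a subset of the fields, with assumed types (the scraper's actual record class may differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PropertyRecord:
    # Core identifiers
    listing_id: str
    title: Optional[str] = None
    url: Optional[str] = None
    # Numeric fields used for analytics
    price_numeric: Optional[int] = None
    rooms_numeric: Optional[int] = None
    area_numeric: Optional[float] = None
    # Location and contact
    location_district: Optional[str] = None
    contact_phone: Optional[str] = None

rec = PropertyRecord(listing_id="348729", price_numeric=205000,
                     location_district="Sabunçu")
print(rec.price_numeric)
```

Making everything but `listing_id` optional mirrors the data-quality advice below: listings often arrive with missing fields.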
The scraper works with any mulk.az search URL. Here are some examples:

```python
# All properties for sale
"https://www.mulk.az/search.php?category=&lease=false&bolge_id=1"

# Apartments only
"https://www.mulk.az/search.php?category=apartment&lease=false&bolge_id=1"

# Price range 100k-300k AZN
"https://www.mulk.az/search.php?pricemin=100000&pricemax=300000&lease=false&bolge_id=1"

# 3-room properties
"https://www.mulk.az/search.php?rooms=3&lease=false&bolge_id=1"

# Specific district (Nasimi)
"https://www.mulk.az/search.php?rayon_id=2&lease=false&bolge_id=1"
```
```python
scraper = AsyncMulkAzScraper(
    max_concurrent=10,  # Number of concurrent requests
    delay=0.5           # Delay between requests (seconds)
)
```
| Setting | Conservative | Balanced | Aggressive |
|---|---|---|---|
| `max_concurrent` | 5 | 10 | 15-20 |
| `delay` | 1.0s | 0.5s | 0.2-0.3s |
| Best for | Slow/unstable connection | General use | Fast connection, bulk scraping |
Perfect for Excel and other data analysis tools:

```csv
listing_id,title,price_numeric,location_district,contact_phone,...
348729,Satış » Köhnə tikili,205000,Sabunçu,(070) 845-73-70,...
```
Structured data for programming:

```json
[
  {
    "listing_id": "348729",
    "title": "Satış » Köhnə tikili",
    "price_numeric": 205000,
    "location_district": "Sabunçu",
    "contact_phone": "(070) 845-73-70",
    "images": ["https://mulk.az/images/555231.jpg", ...]
  }
]
```
```python
analytics = scraper.get_analytics_summary()
print(analytics)
```

Output:

```json
{
  "total_properties": 150,
  "valid_properties": 142,
  "price_stats": {
    "min": 23000,
    "max": 550000,
    "avg": 185000,
    "median": 165000
  },
  "top_districts": {
    "Sabunçu": 45,
    "Abşeron": 32,
    "Nəsimi": 28
  }
}
```
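
For reference, `price_stats` values like these can be reproduced from a list of numeric prices with the stdlib `statistics` module (a sketch with sample data, not the scraper's internal implementation):

```python
import statistics

# Hypothetical sample of price_numeric values
prices = [23000, 165000, 185000, 205000, 550000]

stats = {
    "min": min(prices),
    "max": max(prices),
    "avg": round(statistics.mean(prices)),
    "median": statistics.median(prices),
}
print(stats)
```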
- Price distribution charts
- Properties by district visualization
- Price per square meter analysis
- Market trend analysis
- Comprehensive reporting
Typical performance on a modern machine:

| Scenario | Properties | Time | Rate |
|---|---|---|---|
| Small test (1 page) | ~25 props | 15s | 1.7/sec |
| Medium scrape (3 pages) | ~75 props | 45s | 1.7/sec |
| Large scrape (10 pages) | ~250 props | 150s | 1.7/sec |

Performance depends on network speed, server response time, and concurrency settings.
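
Given the roughly constant ~1.7 properties/sec throughput in the table, a back-of-the-envelope runtime estimate is just pages × properties-per-page ÷ rate (the ~25 properties per page is an assumption taken from the table):

```python
def estimated_seconds(pages: int, props_per_page: int = 25, rate: float = 1.7) -> float:
    # Rough estimate from the observed ~1.7 properties/sec throughput
    return pages * props_per_page / rate

print(round(estimated_seconds(10)))  # ~147s for a 10-page scrape
```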
- Use reasonable delays (0.5s minimum)
- Limit concurrent requests (10 max recommended)
- Don't scrape during peak hours
- Cache results to avoid repeated requests
- Always validate extracted data
- Handle missing fields gracefully
- Clean and normalize data for analysis
- Remove duplicates
- The scraper includes built-in retry logic
- Check logs for failed requests
- Monitor success rates
- Implement fallback strategies
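
Retry logic with exponential backoff, as recommended above, typically looks like this; a minimal sketch, not the scraper's built-in implementation (the `flaky` coroutine only simulates transient failures):

```python
import asyncio

async def fetch_with_retry(fetch, url, retries=3, backoff=0.5):
    # Retry with exponential backoff; re-raise after the final attempt
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(backoff * 2 ** attempt)

calls = {"n": 0}

async def flaky(url):
    # Fails twice, then succeeds - simulates a transient network error
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = asyncio.run(fetch_with_retry(flaky, "page1", backoff=0.01))
print(result)
```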
```python
# Compare prices across districts
properties = await scraper.scrape_all_properties(search_url, max_pages=10)

district_prices = {}
for prop in properties:
    if prop.location_district and prop.price_numeric:
        district_prices.setdefault(prop.location_district, []).append(prop.price_numeric)

for district, prices in district_prices.items():
    avg_price = sum(prices) / len(prices)
    print(f"{district}: {avg_price:,.0f} AZN average")

# Find undervalued properties (price per sqm)
undervalued = []
for prop in properties:
    if prop.area_numeric and prop.price_numeric:
        price_per_sqm = prop.price_numeric / prop.area_numeric
        if price_per_sqm < 2000:  # Below 2000 AZN/sqm
            undervalued.append(prop)

print(f"Found {len(undervalued)} potentially undervalued properties")

# Analyze agent vs. owner listings (guard against missing contact_type)
agent_count = len([p for p in properties if p.contact_type and 'agent' in p.contact_type.lower()])
owner_count = len([p for p in properties if p.contact_type and 'owner' in p.contact_type.lower()])
print(f"Agents: {agent_count}, Owners: {owner_count}")
```
"No properties found"
- Check if the search URL is valid
- Try reducing max_pages to test
- Verify internet connection
"Too many request errors"
- Increase delay between requests
- Reduce max_concurrent setting
- Check if IP is being rate limited
"Invalid data in output"
- Some properties may have incomplete data
- Filter out invalid entries before analysis
- Check the website structure hasn't changed
"Slow performance"
- Increase max_concurrent (up to 15)
- Reduce delay (down to 0.3s)
- Use async version instead of sync
Enable detailed logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
- This scraper is for educational and research purposes
- Respect the website's terms of service
- Don't overload the server with excessive requests
- Use scraped data responsibly and ethically
- Consider contacting the website for official API access
- Respect robots.txt guidelines
Feel free to improve the scraper:
- Add new data fields
- Improve error handling
- Optimize performance
- Add new analytics features
- Fix bugs and edge cases
This project is provided as-is for educational purposes. Use responsibly and in accordance with applicable laws and terms of service.