Mulk.az Real Estate Scraper

A high-performance, asynchronous web scraper that extracts real estate property data from mulk.az for analytics and market research.

Features

  • Asynchronous scraping using aiohttp and asyncio for maximum performance
  • Comprehensive data extraction including property details, prices, locations, and contact information
  • Multiple output formats (CSV, JSON) for data analysis
  • Built-in analytics with price statistics and market insights
  • Rate limiting and retry logic to respect the website
  • Configurable concurrency to balance speed and server load
  • Pagination handling to scrape multiple pages automatically
  • Data validation and cleaning for analytics-ready output

Installation

  1. Clone or download the scraper files
  2. Install dependencies:
pip install -r requirements.txt
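
The repository's own requirements.txt is the authoritative dependency list. As a rough guide, the packages implied by this README would look something like the sketch below; only aiohttp is named explicitly above, the rest are assumptions based on the described features:

# requirements.txt -- illustrative sketch only; use the file shipped with the repository
aiohttp          # async HTTP client used by async_mulk_scraper.py
requests         # likely used by the synchronous mulk_scraper.py (assumption)
beautifulsoup4   # HTML parsing (assumption)
matplotlib       # charts in analytics_scraper.py (assumption)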

Quick Start

Basic Usage (Async Version - Recommended)

import asyncio
from async_mulk_scraper import AsyncMulkAzScraper

async def main():
    scraper = AsyncMulkAzScraper(max_concurrent=10, delay=0.5)

    # Scrape properties
    search_url = "https://www.mulk.az/search.php?category=&lease=false&bolge_id=1"
    properties = await scraper.scrape_all_properties(search_url, max_pages=3)

    # Save data
    await scraper.save_to_csv_async("properties.csv")
    await scraper.save_to_json_async("properties.json")

    # Get analytics
    analytics = scraper.get_analytics_summary()
    print(f"Average price: {analytics['price_stats']['avg']:,} AZN")

if __name__ == "__main__":
    asyncio.run(main())

Run Example Scripts

# Run interactive examples
python example_usage.py

# Run the main async scraper
python async_mulk_scraper.py

# Run analytics version (slower but more features)
python analytics_scraper.py

Available Scripts

File                  | Description                          | Best For
----------------------|--------------------------------------|-------------------------------
async_mulk_scraper.py | High-performance async scraper       | Large-scale data collection
mulk_scraper.py       | Traditional sync scraper             | Small-scale scraping, learning
analytics_scraper.py  | Enhanced version with visualizations | Data analysis and reporting
example_usage.py      | Usage examples and tests             | Learning and testing

Data Fields Extracted

Each property record includes the fields below (a combined sketch of one record follows these lists):

Basic Information

  • listing_id - Unique property ID
  • title - Property title
  • url - Detail page URL
  • listing_date - When property was listed
  • scraped_at - When data was scraped

Property Details

  • price / price_numeric - Property price in AZN
  • category - Property type (apartment, house, etc.)
  • rooms / rooms_numeric - Number of rooms
  • area / area_numeric - Property area in m²
  • floor / current_floor / total_floors - Floor information
  • deed_available - Legal documentation status

Location Information

  • location_district - District/region
  • location_neighborhood - Neighborhood
  • location_metro - Nearest metro station
  • full_address - Complete address

Contact Information

  • contact_person - Contact person name
  • contact_type - Type of contact (agent, owner, etc.)
  • contact_phone - Phone number

Additional Data

  • description - Property description
  • images - Array of image URLs
  • image_count - Number of images
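
The use-case snippets later in this README access these fields as object attributes (prop.price_numeric, prop.location_district, and so on), so a single record can be pictured as a dataclass along the following lines. This is an illustrative sketch of the record's shape, not the scraper's actual class definition:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PropertyRecord:
    # Field names mirror the lists above; types are assumptions
    listing_id: str
    title: str
    url: str
    listing_date: Optional[str] = None
    scraped_at: Optional[str] = None
    price: Optional[str] = None
    price_numeric: Optional[float] = None
    category: Optional[str] = None
    rooms: Optional[str] = None
    rooms_numeric: Optional[int] = None
    area: Optional[str] = None
    area_numeric: Optional[float] = None
    floor: Optional[str] = None
    current_floor: Optional[int] = None
    total_floors: Optional[int] = None
    deed_available: Optional[bool] = None
    location_district: Optional[str] = None
    location_neighborhood: Optional[str] = None
    location_metro: Optional[str] = None
    full_address: Optional[str] = None
    contact_person: Optional[str] = None
    contact_type: Optional[str] = None
    contact_phone: Optional[str] = None
    description: Optional[str] = None
    images: List[str] = field(default_factory=list)
    image_count: int = 0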

Search URL Examples

The scraper works with any mulk.az search URL. Here are some examples:

# All properties for sale
"https://www.mulk.az/search.php?category=&lease=false&bolge_id=1"

# Apartments only
"https://www.mulk.az/search.php?category=apartment&lease=false&bolge_id=1"

# Price range 100k-300k AZN
"https://www.mulk.az/search.php?pricemin=100000&pricemax=300000&lease=false&bolge_id=1"

# 3-room properties
"https://www.mulk.az/search.php?rooms=3&lease=false&bolge_id=1"

# Specific district (Nasimi)
"https://www.mulk.az/search.php?rayon_id=2&lease=false&bolge_id=1"

Configuration

Async Scraper Settings

scraper = AsyncMulkAzScraper(
    max_concurrent=10,  # Number of concurrent requests
    delay=0.5          # Delay between requests (seconds)
)

Performance Tuning

Setting        | Conservative             | Balanced    | Aggressive
---------------|--------------------------|-------------|-------------------------------
max_concurrent | 5                        | 10          | 15-20
delay          | 1.0 s                    | 0.5 s       | 0.2-0.3 s
Best for       | Slow/unstable connection | General use | Fast connection, bulk scraping
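
These profiles map directly onto the constructor arguments; a minimal sketch:

from async_mulk_scraper import AsyncMulkAzScraper

# Profiles from the table above, expressed as constructor arguments
PROFILES = {
    "conservative": {"max_concurrent": 5,  "delay": 1.0},
    "balanced":     {"max_concurrent": 10, "delay": 0.5},
    "aggressive":   {"max_concurrent": 20, "delay": 0.3},
}

scraper = AsyncMulkAzScraper(**PROFILES["balanced"])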

Output Formats

CSV Output

Perfect for Excel and other data analysis tools:

listing_id,title,price_numeric,location_district,contact_phone,...
348729,Satış » Köhnə tikili,205000,Sabunçu,(070) 845-73-70,...
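
For example, the CSV loads straight into pandas for further analysis (pandas is not part of the scraper itself, just a common choice):

import pandas as pd

df = pd.read_csv("properties.csv")
print(df["price_numeric"].describe())                            # price distribution
print(df.groupby("location_district")["price_numeric"].mean())   # average price per district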

JSON Output

Structured data for programmatic use:

[
  {
    "listing_id": "348729",
    "title": "Satış » Köhnə tikili",
    "price_numeric": 205000,
    "location_district": "Sabunçu",
    "contact_phone": "(070) 845-73-70",
    "images": ["https://mulk.az/images/555231.jpg", ...]
  }
]

Analytics Features

Basic Analytics

analytics = scraper.get_analytics_summary()
print(analytics)

Output:

{
  "total_properties": 150,
  "valid_properties": 142,
  "price_stats": {
    "min": 23000,
    "max": 550000,
    "avg": 185000,
    "median": 165000
  },
  "top_districts": {
    "Sabunçu": 45,
    "Abşeron": 32,
    "Nəsimi": 28
  }
}
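
The summary is a plain dictionary, so it can be consumed directly; for example, with scraper set up as in the Quick Start:

analytics = scraper.get_analytics_summary()

print(f"Valid records: {analytics['valid_properties']} of {analytics['total_properties']}")
print(f"Median price: {analytics['price_stats']['median']:,} AZN")
for district, count in analytics["top_districts"].items():
    print(f"  {district}: {count} listings")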

Advanced Analytics (analytics_scraper.py)

  • Price distribution charts (a standalone sketch follows this list)
  • Properties by district visualization
  • Price per square meter analysis
  • Market trend analysis
  • Comprehensive reporting
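
analytics_scraper.py bundles these visualizations. If you only want a quick chart from the basic scraper's output, a minimal standalone sketch looks like this (matplotlib is an assumption here, not a documented dependency of the async scraper):

import matplotlib.pyplot as plt

# `properties` as returned by scrape_all_properties() in the Quick Start example
prices = [p.price_numeric for p in properties if p.price_numeric]

plt.hist(prices, bins=30)
plt.xlabel("Price (AZN)")
plt.ylabel("Number of listings")
plt.title("mulk.az price distribution")
plt.savefig("price_distribution.png")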

Performance Benchmarks

Typical performance on a modern machine:

Scenario                | Properties | Time  | Rate
------------------------|------------|-------|----------
Small test (1 page)     | ~25        | 15 s  | ~1.7/sec
Medium scrape (3 pages) | ~75        | 45 s  | ~1.7/sec
Large scrape (10 pages) | ~250       | 150 s | ~1.7/sec

Performance depends on network speed, server response time, and concurrency settings

Best Practices

Respectful Scraping

  • Use reasonable delays (0.5s minimum)
  • Limit concurrent requests (10 max recommended)
  • Don't scrape during peak hours
  • Cache results to avoid repeated requests

Data Quality

  • Always validate extracted data
  • Handle missing fields gracefully
  • Clean and normalize data for analysis
  • Remove duplicates (see the cleaning sketch after this list)
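
A minimal cleaning pass covering the last two points might look like this; it assumes the record objects returned by scrape_all_properties(), as in the use-case examples below:

def clean_records(properties):
    """Drop records without an ID or price and de-duplicate by listing_id."""
    seen, cleaned = set(), []
    for prop in properties:
        if not prop.listing_id or not prop.price_numeric:
            continue                      # skip incomplete records
        if prop.listing_id in seen:
            continue                      # skip duplicates
        seen.add(prop.listing_id)
        cleaned.append(prop)
    return cleaned

properties = clean_records(properties)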

Error Handling

  • The scraper includes built-in retry logic
  • Check logs for failed requests
  • Monitor success rates
  • Implement fallback strategies (one example follows this list)
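
The scraper's own retry logic covers transient failures per request. As one example of an external fallback strategy, an entire run can be wrapped in a retry-with-backoff helper; the sketch below is illustrative, not part of the scraper's API:

import asyncio

async def with_retries(coro_factory, attempts=3, backoff=2.0):
    """Re-run an async call with exponential backoff between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return await coro_factory()
        except Exception as exc:
            if attempt == attempts:
                raise
            wait = backoff ** attempt
            print(f"Attempt {attempt} failed ({exc}); retrying in {wait:.0f}s")
            await asyncio.sleep(wait)

# Inside an async function such as main() from the Quick Start example:
properties = await with_retries(
    lambda: scraper.scrape_all_properties(search_url, max_pages=3)
)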

Common Use Cases

Market Research

# Compare prices across districts
properties = await scraper.scrape_all_properties(search_url, max_pages=10)
district_prices = {}
for prop in properties:
    if prop.location_district and prop.price_numeric:
        district_prices.setdefault(prop.location_district, []).append(prop.price_numeric)

for district, prices in district_prices.items():
    avg_price = sum(prices) / len(prices)
    print(f"{district}: {avg_price:,.0f} AZN average")

Investment Analysis

# Find undervalued properties (price per sqm)
undervalued = []
for prop in properties:
    if prop.area_numeric and prop.price_numeric:
        price_per_sqm = prop.price_numeric / prop.area_numeric
        if price_per_sqm < 2000:  # Below 2000 AZN/sqm
            undervalued.append(prop)

print(f"Found {len(undervalued)} potentially undervalued properties")

Contact Analysis

# Analyze agent vs owner listings
agent_count = len([p for p in properties if p.contact_type and 'agent' in p.contact_type.lower()])
owner_count = len([p for p in properties if p.contact_type and 'owner' in p.contact_type.lower()])
print(f"Agents: {agent_count}, Owners: {owner_count}")

Troubleshooting

Common Issues

"No properties found"

  • Check if the search URL is valid
  • Try reducing max_pages to test
  • Verify internet connection

"Too many request errors"

  • Increase delay between requests
  • Reduce max_concurrent setting
  • Check if IP is being rate limited

"Invalid data in output"

  • Some properties may have incomplete data
  • Filter out invalid entries before analysis
  • Check the website structure hasn't changed

"Slow performance"

  • Increase max_concurrent (up to 15)
  • Reduce delay (down to 0.3s)
  • Use async version instead of sync

Debug Mode

Enable detailed logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Legal and Ethical Considerations

  • This scraper is for educational and research purposes
  • Respect the website's terms of service
  • Don't overload the server with excessive requests
  • Use scraped data responsibly and ethically
  • Consider contacting the website for official API access
  • Respect robots.txt guidelines

Contributing

Feel free to improve the scraper:

  • Add new data fields
  • Improve error handling
  • Optimize performance
  • Add new analytics features
  • Fix bugs and edge cases

License

This project is provided as-is for educational purposes. Use responsibly and in accordance with applicable laws and terms of service.
