# Check for Valid Domain Name in Python

This notebook demonstrates different methods to validate domain names in Python.

**Article:** [Check for Valid Domain Name in Python](https://datascientyst.com/how-to-validate-domain-name-in-pandas-python/)

**Related:** [Python Pandas Validation](https://datascientyst.com/data-validation/)

## Setup: Install Required Libraries

First, let's install the necessary libraries for domain validation.

In [3]:
# Uncomment to install required libraries
# !pip install validators
# !pip install python-whois

In [4]:
import re
import socket
import pandas as pd
from urllib.parse import urlparse

## Method 1: Using the validators Library (Recommended)

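The validators library provides a simple way to validate domain names. `validators.domain()` returns `True` for a valid name and a falsy `ValidationError` object (rather than `False` or a raised exception) for an invalid one.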

In [6]:
import validators

# Valid domain
result = validators.domain('example.com')
print(result)

# Subdomain
result = validators.domain('sub.example.com')
print(result)

True
True


In [7]:
# Invalid domains
result = validators.domain('example.com/')
print(result)

result = validators.domain('not a domain!')
print(result)

ValidationError(func=domain, args={'value': 'example.com/'})
ValidationError(func=domain, args={'value': 'not a domain!'})


### Pandas

In [9]:
test_domains = [
    'example.com',
    'sub.example.com',
    'example',
    'example..com',
    '-example.com',
    'valid-domain.com'
]

pd.Series(test_domains).apply(validators.domain)

0                                                 True
1                                                 True
2    ValidationError(func=domain, args={'value': 'e...
3    ValidationError(func=domain, args={'reason': '...
4    ValidationError(func=domain, args={'value': '-...
5                                                 True
dtype: object

## Method 2: Using Regular Expressions

Regex patterns can validate domain syntax without external dependencies.

In [11]:
def validate_domain_regex(domain):
    """Validate domain using regex pattern"""
    pattern = r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
    return bool(re.match(pattern, domain))

In [12]:
# Test regex validation
test_domains = [
    'example.com',
    'sub.example.com',
    'example',
    'example..com',
    '-example.com',
    'valid-domain.com'
]

for domain in test_domains:
    print(f"{domain}: {validate_domain_regex(domain)}")

example.com: True
sub.example.com: True
example: False
example..com: False
-example.com: False
valid-domain.com: True


In [13]:
pd.Series(test_domains).apply(validate_domain_regex)

0     True
1     True
2    False
3    False
4    False
5     True
dtype: bool
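
One subtlety with anchored patterns is that `$` also matches just before a trailing newline, so `re.match` with a `^...$` pattern accepts `'example.com\n'`, whereas `re.fullmatch` requires the whole string to match. The sketch below is a hypothetical stricter variant, not part of the original article: the name `validate_domain_strict` and the 253-character cap on the total length are assumptions added for illustration.

```python
import re

# Same label/TLD pattern as validate_domain_regex, compiled once.
# fullmatch makes the ^ and $ anchors unnecessary.
DOMAIN_RE = re.compile(
    r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}'
)

def validate_domain_strict(domain):
    """Hypothetical stricter variant: whole-string match plus a length cap."""
    # Assumption: reject non-strings and names longer than 253 characters
    if not isinstance(domain, str) or len(domain) > 253:
        return False
    return DOMAIN_RE.fullmatch(domain) is not None

print(validate_domain_strict('example.com'))
print(validate_domain_strict('example.com\n'))
```

Because it returns a plain `bool`, this variant can be passed to `Series.apply` in the same way as `validate_domain_regex`.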

## Method 3: DNS Lookup to Check Domain Existence

This method checks whether a name currently resolves to an IPv4 address. It requires network access, and a registered domain without an address record is reported as non-existent.

In [15]:
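def domain_exists(domain):
    """Check if domain exists using DNS lookup"""
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):
        # OSError covers socket.gaierror (the name did not resolve);
        # UnicodeError is raised for names that cannot be IDNA-encoded, e.g. 'example..com'
        print(domain)  # log the name that failed to resolve
        return False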

In [16]:
# Test DNS lookup
print(f"google.com exists: {domain_exists('google.com')}")
print(f"github.com exists: {domain_exists('github.com')}")
print(f"thisisnotarealdomain123456.com exists: {domain_exists('thisisnotarealdomain123456.com')}")

google.com exists: True
github.com exists: True
thisisnotarealdomain123456.com
thisisnotarealdomain123456.com exists: False


In [17]:
### Pandas

In [18]:
df = pd.DataFrame({'domain': test_domains})
df

Unnamed: 0,domain
0,example.com
1,sub.example.com
2,example
3,example..com
4,-example.com
5,valid-domain.com


In [19]:
df['valid'] = df['domain'].apply(domain_exists)
df

sub.example.com
example
example..com
-example.com
valid-domain.com


Unnamed: 0,domain,valid
0,example.com,True
1,sub.example.com,False
2,example,False
3,example..com,False
4,-example.com,False
5,valid-domain.com,False
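
When the same domain appears many times in a column, each row triggers its own DNS lookup. A minimal sketch of one way to reduce that cost is shown below, assuming that caching results for the lifetime of the process is acceptable. The names `domain_resolves` and `domain_resolves_cached`, and the `resolver` parameter, are hypothetical and exist only so the logic can be tried without network access.

```python
import socket
from functools import lru_cache

def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if `resolver` finds at least one address for `domain`.

    `resolver` is a hypothetical injection point (not in the notebook);
    it defaults to socket.getaddrinfo, which handles IPv4 and IPv6.
    """
    try:
        return bool(resolver(domain, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve
        # UnicodeError: the name cannot be IDNA-encoded
        return False

@lru_cache(maxsize=1024)
def domain_resolves_cached(domain):
    """Cached wrapper: repeated values cost one lookup each."""
    return domain_resolves(domain)
```

With pandas this could be applied as `df['domain'].map(domain_resolves_cached)`. Cached answers can go stale, so the cache is best suited to short batch jobs.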


## Method 4: Custom Validation Following RFC 1035

A comprehensive validation function that follows RFC specifications.

In [21]:
def is_valid_hostname(hostname):
    """
    Validate hostname according to RFC 1035
    - Maximum length: 255 characters
    - Labels separated by dots
    - Each label: 1-63 characters
    - Labels can contain letters, digits, hyphens
    - Labels cannot start or end with hyphen
    """
    if len(hostname) > 255:
        return False
    
    # Remove trailing dot if present (endswith is safe for an empty string,
    # where hostname[-1] would raise IndexError)
    if hostname.endswith("."):
        hostname = hostname[:-1]
    
    # Check each label
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

In [22]:
# Test RFC validation
test_cases = [
    'example.com',
    'sub.domain.example.com',
    '-invalid.com',
    'valid-domain.com',
    'example.co.uk'
]

for domain in test_cases:
    print(f"{domain}: {is_valid_hostname(domain)}")

example.com: True
sub.domain.example.com: True
-invalid.com: False
valid-domain.com: True
example.co.uk: True


In [23]:
# Test length limits
print(f"Very long domain (256 chars): {is_valid_hostname('a' * 256)}")
print(f"Label too long (64 chars): {is_valid_hostname('a' * 64 + '.com')}")

Very long domain (256 chars): False
Label too long (64 chars): False
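
A boolean result does not say which rule a hostname broke. The sketch below applies the same rules listed in the docstring above and reports each violation. It is a hypothetical helper (`hostname_problems` is not part of the original article) and uses an explicit ASCII character class instead of `\d` with `re.IGNORECASE`.

```python
import re

# Letters, digits and hyphens only (ASCII)
LABEL_CHARS = re.compile(r"[A-Za-z0-9-]+")

def hostname_problems(hostname):
    """Return a list of rule violations; an empty list means valid.

    Hypothetical helper mirroring the checks in is_valid_hostname.
    """
    problems = []
    if len(hostname) > 255:
        problems.append("longer than 255 characters")
    # A single trailing dot marks a fully qualified name and is ignored
    name = hostname[:-1] if hostname.endswith(".") else hostname
    for label in name.split("."):
        if not label:
            problems.append("empty label")
        elif len(label) > 63:
            problems.append(f"label longer than 63 characters: {label[:10]}...")
        elif label.startswith("-") or label.endswith("-"):
            problems.append(f"label starts or ends with a hyphen: {label}")
        elif not LABEL_CHARS.fullmatch(label):
            problems.append(f"label contains invalid characters: {label}")
    return problems

for name in ["example.com", "-invalid.com", "example..com", "exa_mple.com"]:
    print(name, hostname_problems(name))
```

Returning a list keeps the function easy to use as a predicate: `not hostname_problems(name)` behaves like the boolean check.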


## Method 5: Extracting and Validating Domains from URLs

When working with URLs, extract the domain before validating it.

In [25]:
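def validate_url_domain(url):
    """Extract and validate domain from URL"""
    try:
        parsed = urlparse(url)
        # hostname drops any port or credentials and lowercases the name;
        # fall back to the path for input without a scheme
        domain = parsed.hostname or parsed.path.split('/')[0]
        return validators.domain(domain)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs
        return False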

In [26]:
# Test URL domain extraction
urls = [
    'https://www.example.com/path',
    'http://example.com',
    'www.example.com',
    'ftp://files.example.com',
    'not-a-url'
]

for url in urls:
    result = validate_url_domain(url)
    print(f"{url}: {result if result else 'ValidationError(...)'}")

https://www.example.com/path: True
http://example.com: True
www.example.com: True
ftp://files.example.com: True
not-a-url: ValidationError(...)
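
URLs can also carry a port, credentials, or mixed-case host names, and input without a scheme is parsed by `urlparse` as a path. The sketch below shows one possible way to isolate the host using only the standard library. The helper name `extract_host` and the `'//'` prefix trick for scheme-less input are assumptions made for this example, and the returned host would still need to be passed to a domain validator.

```python
from urllib.parse import urlparse

def extract_host(url):
    """Return the lowercased host part of `url`, or None if there is none.

    Hypothetical helper: input without '://' gets a '//' prefix so that
    urlparse treats the leading part as the network location.
    """
    url = url.strip()
    if '://' not in url:
        url = '//' + url
    try:
        # .hostname strips credentials and port, and lowercases the name
        return urlparse(url).hostname
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. unbalanced IPv6 brackets
        return None

samples = [
    'https://user:pw@www.Example.com:8080/path',
    'www.example.com/page',
    'example.com:8080',
]
for url in samples:
    print(url, '->', extract_host(url))
```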


## Method 6: Batch Domain Validation

Validating multiple domains efficiently with error handling.

In [28]:
def validate_domains_batch(domains):
    """Validate multiple domains and return results"""
    results = {}
    
    for domain in domains:
        try:
            is_valid = validators.domain(domain)
            results[domain] = {
                'valid': bool(is_valid),
                'error': None if is_valid else 'Invalid format'
            }
        except Exception as e:
            results[domain] = {
                'valid': False,
                'error': str(e)
            }
    
    return results

In [29]:
# Test batch validation
domains_to_validate = [
    'google.com',
    'invalid..domain',
    'sub.example.com',
    'not a domain',
    'example.co.uk',
    '-starts-with-hyphen.com',
    'valid-domain.com'
]

results = validate_domains_batch(domains_to_validate)

for domain, result in results.items():
    status = "✓" if result['valid'] else "✗"
    print(f"{status} {domain}: {result}")

✓ google.com: {'valid': True, 'error': None}
✗ invalid..domain: {'valid': False, 'error': 'Invalid format'}
✓ sub.example.com: {'valid': True, 'error': None}
✗ not a domain: {'valid': False, 'error': 'Invalid format'}
✓ example.co.uk: {'valid': True, 'error': None}
✗ -starts-with-hyphen.com: {'valid': False, 'error': 'Invalid format'}
✓ valid-domain.com: {'valid': True, 'error': None}
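
Real datasets often repeat the same domain many times, sometimes with different casing or stray whitespace. A minimal sketch of validating each distinct value only once is shown below. The function `validate_unique` and the stand-in validator `has_dot` are hypothetical; any callable with a truthy or falsy result, such as `validators.domain` or `validate_domain_regex`, could be passed instead.

```python
def validate_unique(domains, validator):
    """Run `validator` once per distinct normalized domain.

    Returns one boolean per input item, in the original order.
    """
    cache = {}
    results = []
    for raw in domains:
        # Domain names are case-insensitive, so normalize before caching
        key = raw.strip().lower()
        if key not in cache:
            cache[key] = bool(validator(key))
        results.append(cache[key])
    return results

# Stand-in validator for the demo only
def has_dot(domain):
    return '.' in domain

print(validate_unique(['Example.com', 'example.com ', 'localhost'], has_dot))
```

The saving matters most when the validator is expensive, for example a DNS lookup.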


## Method 7: Creating a Comprehensive Validation Function

Combining multiple validation methods for robust checking.

In [31]:
def comprehensive_domain_check(domain, check_dns=False):
    """
    Comprehensive domain validation with multiple checks
    
    Args:
        domain: Domain name to validate
        check_dns: Whether to perform DNS lookup
    
    Returns:
        Dictionary with validation results
    """
    results = {
        'domain': domain,
        'syntax_valid': False,
        'rfc_compliant': False,
        'dns_exists': None,
        'errors': []
    }
    
    # Check syntax with validators
    try:
        results['syntax_valid'] = bool(validators.domain(domain))
    except Exception as e:
        results['errors'].append(f"Syntax check failed: {e}")
    
    # Check RFC compliance
    try:
        results['rfc_compliant'] = is_valid_hostname(domain)
    except Exception as e:
        results['errors'].append(f"RFC check failed: {e}")
    
    # Check DNS if requested
    if check_dns:
        try:
            results['dns_exists'] = domain_exists(domain)
        except Exception as e:
            results['errors'].append(f"DNS check failed: {e}")
    
    return results

In [32]:
# Test comprehensive validation
test_domains = ['google.com', 'invalid..domain', 'example-site.com']

for domain in test_domains:
    result = comprehensive_domain_check(domain, check_dns=True)
    print(f"\nDomain: {result['domain']}")
    print(f"  Syntax valid: {result['syntax_valid']}")
    print(f"  RFC compliant: {result['rfc_compliant']}")
    print(f"  DNS exists: {result['dns_exists']}")
    print(f"  Errors: {result['errors']}")


Domain: google.com
  Syntax valid: True
  RFC compliant: True
  DNS exists: True
  Errors: []
invalid..domain

Domain: invalid..domain
  Syntax valid: False
  RFC compliant: False
  DNS exists: False
  Errors: []

Domain: example-site.com
  Syntax valid: True
  RFC compliant: True
  DNS exists: True
  Errors: []


## Comparison: Performance of Different Methods

Let's compare the performance of different validation methods.

In [34]:
import time

def benchmark_validation(domain, iterations=100):
    """Benchmark different validation methods"""

    # time.perf_counter() is a monotonic, high-resolution clock intended
    # for measuring short durations; time.time() can jump with clock changes

    # Regex method
    start = time.perf_counter()
    for _ in range(iterations):
        validate_domain_regex(domain)
    regex_time = time.perf_counter() - start

    # Validators library
    start = time.perf_counter()
    for _ in range(iterations):
        validators.domain(domain)
    validators_time = time.perf_counter() - start

    # Custom RFC function
    start = time.perf_counter()
    for _ in range(iterations):
        is_valid_hostname(domain)
    rfc_time = time.perf_counter() - start
    
    return {
        'regex': regex_time,
        'validators': validators_time,
        'rfc_custom': rfc_time
    }

# Run benchmark
results = benchmark_validation('example.com', iterations=100)
print("\nPerformance Comparison (100 validations each):")
print(f"Regex validation: {results['regex']:.4f} seconds")
print(f"Validators library: {results['validators']:.4f} seconds")
print(f"RFC custom function: {results['rfc_custom']:.4f} seconds")


Performance Comparison (100 validations each):
Regex validation: 0.0001 seconds
Validators library: 0.0007 seconds
RFC custom function: 0.0002 seconds
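
Timings this small are sensitive to noise from a single run of 100 iterations. As a rough alternative, the standard-library `timeit` module can repeat the measurement and keep the best run. In the sketch below, `check` is only a stand-in for whichever validator is being measured, and `best_time_per_call` is a hypothetical helper name.

```python
import re
import timeit

PATTERN = re.compile(
    r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
)

def check(domain):
    """Stand-in for any of the validators benchmarked above."""
    return bool(PATTERN.match(domain))

def best_time_per_call(func, arg, number=10_000, repeat=5):
    """Best-of-`repeat` average seconds per call, measured with timeit."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    # The minimum is the run least disturbed by other activity
    return min(timings) / number

per_call = best_time_per_call(check, 'example.com')
print(f"{per_call * 1e6:.2f} microseconds per call")
```

The lambda wrapper adds a small constant overhead to every measurement, so the numbers are better suited to comparing methods than to reporting absolute cost.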


## Summary Table: Validation Methods Comparison

| Method | Pros | Cons | Best For |
|--------|------|------|----------|
| Validators Library | Easy to use, RFC compliant | External dependency | Production use |
| Regex | No dependencies, fast | May miss edge cases | Quick syntax checks |
| DNS Lookup | Confirms existence | Slow, requires internet | Verifying real domains |
| Custom RFC | Full control, compliant | More complex | Custom requirements |
| WHOIS | Checks registration | Very slow, rate limits | Domain research |

## Practical Example: Email Domain Validator

A real-world application validating email domains.

In [37]:
def validate_email_domain(email):
    """
    Extract and validate domain from email address
    """
    try:
        # Extract domain from email
        if '@' not in email:
            return {'valid': False, 'error': 'Invalid email format'}
        
        # Take the part after the last '@' so input with several '@' is handled
        domain = email.rsplit('@', 1)[1]
        
        # Validate domain
        is_valid = bool(validators.domain(domain))
        
        return {
            'valid': is_valid,
            'domain': domain,
            'error': None if is_valid else 'Invalid domain'
        }
    except Exception as e:
        return {'valid': False, 'error': str(e)}

In [38]:
# Test email domain validation
emails = [
    'user@example.com',
    'admin@company.co.uk',
    'invalid@-domain.com',
    'noemail.com',
    'user@sub.domain.com'
]

for email in emails:
    result = validate_email_domain(email)
    print(f"{email}: {result}")

user@example.com: {'valid': True, 'domain': 'example.com', 'error': None}
admin@company.co.uk: {'valid': True, 'domain': 'company.co.uk', 'error': None}
invalid@-domain.com: {'valid': False, 'domain': '-domain.com', 'error': 'Invalid domain'}
noemail.com: {'valid': False, 'error': 'Invalid email format'}
user@sub.domain.com: {'valid': True, 'domain': 'sub.domain.com', 'error': None}


## Conclusion

This notebook demonstrated various methods for validating domain names in Python:

1. **validators library** - Recommended for most use cases
2. **Regex validation** - Fast and lightweight
3. **DNS lookup** - Confirms domain existence
4. **Custom RFC validation** - Full control and compliance
5. **URL domain extraction** - Working with full URLs
6. **Batch validation** - Efficient multiple domain checking
7. **Comprehensive check** - Combines syntax, RFC, and optional DNS checks

Choose the method that best fits your requirements based on:
- Performance needs
- Validation accuracy required
- Whether you need to verify existence vs. just syntax
- Dependency constraints