# Wykop project
A project for fetching and analysing data from wykop.

## Fetching data
Wykop pages are available under URLs containing a numeric id: https://www.wykop.pl/link/3953283/ This URL redirects to the full one, which also contains the title: https://www.wykop.pl/link/3953283/to-juz-oficjalne-nasa-wraca-na-ksiezyc/

The shortest sequence of requests is obtained by calling the URL over HTTPS and with a trailing "/".
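
As a minimal sketch, the short URL form described above can be built from a numeric id. The helper name `build_link_url` is an assumption made for illustration and is not part of the original notebook:

```python
def build_link_url(page_id):
    """Build the short wykop link URL: https scheme, numeric id, trailing slash."""
    # int() rejects non-numeric ids early with a ValueError
    return 'https://www.wykop.pl/link/%d/' % int(page_id)


print(build_link_url(3953283))
```

Keeping both the `https` scheme and the trailing slash in one place makes it harder to request a variant of the URL by accident.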

### Existing resource

In [None]:
import requests

# allow_redirects=False prevents requests from automatically following redirects
r = requests.get('https://www.wykop.pl/link/3953283/', allow_redirects=False)
print(r.status_code)
print(r.headers['Location'])

301
https://www.wykop.pl/link/3953283/to-juz-oficjalne-nasa-wraca-na-ksiezyc/


### Non existing resource

In [3]:
import requests

# allow_redirects=False prevents requests from automatically following redirects
r = requests.get('https://www.wykop.pl/link/3953282/', allow_redirects=False)
print(r.status_code)

404


### Function fetching wykop pages

In [4]:
def fetch_wykop_page(id):
    """
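    Fetch a wykop link page by id.

    Parameters:
    id: numeric page id

    Returns:
    Dictionary: {
        id: requested page's id
        exists: boolean, whether the page exists
        url: full URL of the page with the requested id (only if the page exists)
        error: error message if an error occurred, otherwise None
        body: text of the page body (only if the page was fetched successfully)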
    }
    """
    
    # First request: the short URL answers with a redirect (301) or 404
    r = requests.get('https://www.wykop.pl/link/%d/' % id, allow_redirects=False)
    if r.status_code == 404:
        return {
            'id': id,
            'exists': False,
            'error': None
        }
    elif r.status_code == 301:
        # Second request: fetch the full URL taken from the Location header
        r2 = requests.get(r.headers['Location'], allow_redirects=False)
        if r2.status_code == 200:
            return {
                'id': id,
                'exists': True,
                'url': r.headers['Location'],
                'error': None,
                'body': r2.text
            }
        else:
            return {
                'id': id,
                'exists': True,
                'url': r.headers['Location'],
                'error': 'error fetching redirected page: %d' % r2.status_code
            }
    else:
        return {
            'id': id,
            'exists': False,
            'error': 'error fetching main page: %d' % r.status_code
        }

In [5]:
# existing page; the result contains the page body, so the output is a little messy
# print(fetch_wykop_page(3953283))

# non-existing page
print(fetch_wykop_page(3953282))


{'id': 3953282, 'exists': False, 'error': None}
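
The result dictionaries returned by `fetch_wykop_page` can be grouped by outcome when scanning several ids. The sketch below is only an illustration: `summarize_pages` and the stub `fake_fetch` are hypothetical names, and the fetch function is passed in as a parameter so the logic can be tried without network access.

```python
def summarize_pages(ids, fetch):
    """Call fetch(id) for each id and group the ids by outcome."""
    summary = {'found': [], 'missing': [], 'failed': []}
    for page_id in ids:
        result = fetch(page_id)
        if result['error'] is not None:
            # any non-None error means the fetch did not complete cleanly
            summary['failed'].append(page_id)
        elif result['exists']:
            summary['found'].append(page_id)
        else:
            summary['missing'].append(page_id)
    return summary


# Hypothetical stub standing in for fetch_wykop_page, so the example runs offline:
# odd ids "exist", even ids do not.
def fake_fetch(page_id):
    return {'id': page_id, 'exists': page_id % 2 == 1, 'error': None}


print(summarize_pages([3953282, 3953283], fake_fetch))
```

With network access, `fetch_wykop_page` could be passed in place of the stub, for example `summarize_pages(range(3953280, 3953290), fetch_wykop_page)`.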
