# Introduction

This notebook explains how to run the program that fetches abuse filter logs (abuselogs) from the Wikipedia API.

In [5]:
import requests
import pandas as pd

In [6]:
# append system path to import spider program
import sys
sys.path.append('../')
from utils.spider_abuselog import retrieve_abuselog_month # fetch data for a specific month in a given year
from utils.spider_abuselog import retrieve_abuselog_year # fetch data for a given year

# Code examples

## Retrieve data for 1 month

To retrieve abuselogs for a specific month, use the `retrieve_abuselog_month` function. It retrieves the logs and writes them to a local CSV file (format: year-month.csv).

It takes 5 parameters:
- request_session:
    - A `requests` session used to send the API requests.
- year (int):
    - The year for which abuselogs should be retrieved.
- month (int):
    - The month for which abuselogs should be retrieved.
- folder_path (str):
    - The folder path where the CSV file will be saved.
    - Example: './data/abusefilter_logs/'
- compress (bool):
    - Whether to compress the CSV file into a gz file.
    - Defaults to False.

**Note:**
- The first abuselog entry dates from March 2009.
- If 0 records are retrieved, no CSV file is created.
- If the function fails, check whether the target file already exists on your local disk. The function stops automatically to avoid overwriting a local file.
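The two notes above (nothing exists before March 2009, and an existing file stops the function) can be checked before a run. The following is a minimal sketch, not part of `utils.spider_abuselog`: the helper name `months_to_fetch` is hypothetical, and the exact file names it looks for (`year-month.csv`, with or without a zero-padded month and with or without a `.gz` suffix) are an assumption based on the format described above.

```python
from pathlib import Path

# Per the note above: the first abuselog entry dates from March 2009.
FIRST_LOG = (2009, 3)


def months_to_fetch(year, folder_path):
    """Return the months of `year` that are not before March 2009
    and for which no output file seems to exist yet (hypothetical helper)."""
    folder = Path(folder_path)
    pending = []
    for month in range(1, 13):
        if (year, month) < FIRST_LOG:
            continue  # nothing was logged before the first abuselog entry
        # Assumed file names: 'year-month.csv', optionally zero-padded,
        # optionally followed by '.gz' when compress=True was used.
        stems = (f"{year}-{month}.csv", f"{year}-{month:02d}.csv")
        exists = any(
            (folder / (stem + suffix)).exists()
            for stem in stems
            for suffix in ("", ".gz")
        )
        if not exists:
            pending.append(month)
    return pending


print(months_to_fetch(2009, "../data/abuselogs/"))
```

If the real file names differ from the assumed patterns, only the `stems` tuple needs to change.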

In [7]:
# Set folder path to store fetched data
folder_path = '../data/abuselogs/'

In [8]:
# Fetch June through December 2010, one compressed CSV file per month
for month in range(6, 13):
    s = requests.Session()
    retrieve_abuselog_month(request_session=s, year=2010, month=month, folder_path=folder_path, compress=True)

0 records retrieved.
Retrieval complete for logs between 2009-01-01T00:00:00Z and 2009-02-01T00:00:00Z.
No record was retrieved or saved.
0 records retrieved.
Retrieval complete for logs between 2009-02-01T00:00:00Z and 2009-03-01T00:00:00Z.
No record was retrieved or saved.
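The "Retrieval complete for logs between ..." lines in the output suggest that each monthly request covers the span from the first day of the month to the first day of the next month, in UTC. A small sketch of how such a window could be computed is shown here; `month_window` is a hypothetical name and this is not necessarily how `retrieve_abuselog_month` builds its timestamps.

```python
from datetime import datetime, timezone
from typing import Tuple


def month_window(year: int, month: int) -> Tuple[str, str]:
    """Return ISO 8601 UTC timestamps for the start of a month
    and the start of the following month (hypothetical helper)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # After December, roll over to January of the next year.
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = datetime(next_year, next_month, 1, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


print(month_window(2009, 1))
# ('2009-01-01T00:00:00Z', '2009-02-01T00:00:00Z')
```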


## Retrieve data for 1 year

To retrieve abuselogs for a specific year, use the `retrieve_abuselog_year` function. It loops over the monthly retrieval function to download and store the logs for the given year. The data is stored in up to 12 local CSV files (format: year-month.csv), one per month.

It takes 3 parameters:
- year (int):
    - The year for which abuselogs should be retrieved.
- folder_path (str):
    - The folder path where the CSV files will be saved.
- compress (bool):
    - Whether to compress each CSV file into a gz file.
    - Defaults to False.

**Note:**
- The first abuselog entry dates from March 2009.
- If 0 records are retrieved for a month, no CSV file is created for it.
- If the function fails, check whether a target file already exists on your local disk. The function stops automatically to avoid overwriting a local file.
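The year-level behaviour described above amounts to calling a monthly fetch once per month. The sketch below illustrates that pattern only; `retrieve_year_sketch` and its `fetch_month` argument are hypothetical names, and the real `retrieve_abuselog_year` may create its session and handle errors differently.

```python
from typing import Callable, List, Tuple


def retrieve_year_sketch(year: int, fetch_month: Callable[[int, int], None]) -> None:
    """Call `fetch_month(year, month)` for each of the 12 months (illustrative only)."""
    for month in range(1, 13):
        fetch_month(year, month)


# Stand-in for the real monthly function, so the sketch runs without network access.
calls: List[Tuple[int, int]] = []
retrieve_year_sketch(2010, lambda y, m: calls.append((y, m)))
print(len(calls), calls[0], calls[-1])

# With the real function, the callable could wrap retrieve_abuselog_month, e.g.:
# s = requests.Session()
# retrieve_year_sketch(2010, lambda y, m: retrieve_abuselog_month(
#     request_session=s, year=y, month=m, folder_path=folder_path, compress=True))
```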

In [11]:
# Set folder path to store fetched data
folder_path = '../data/abuselogs/'

In [12]:
# Fetch the years 2010 through 2015, one compressed CSV file per month
for year in range(2010, 2016):
    retrieve_abuselog_year(year=year, folder_path=folder_path, compress=True)

500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T03:02:04Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T06:03:47Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T09:11:53Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T14:51:48Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T18:49:07Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T21:31:59Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-01T23:57:56Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-02T02:31:23Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-01-02T05:44:40Z', 'continue': '-||'}, 500 records retrieved.
Searc

KeyboardInterrupt: 
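The "Search with continue index" lines in the output correspond to MediaWiki API continuation: each response of up to 500 records carries a `continue` object whose values are sent with the next request. The following is a rough sketch of that loop against the `list=abuselog` query module, not the code in `utils.spider_abuselog`; the function name `iter_abuselog` and the choice of the English Wikipedia endpoint are assumptions.

```python
# Assumption: the English Wikipedia endpoint; other wikis use a different host.
API_URL = "https://en.wikipedia.org/w/api.php"


def iter_abuselog(session, start, end, limit=500):
    """Yield abuselog entries between two ISO timestamps,
    following API continuation (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "abuselog",
        "format": "json",
        "aflstart": start,
        "aflend": end,
        "afldir": "newer",  # oldest entries first, so start < end
        "afllimit": limit,
    }
    while True:
        data = session.get(API_URL, params=params).json()
        yield from data.get("query", {}).get("abuselog", [])
        if "continue" not in data:
            break  # no further pages
        # Merge the continuation values (e.g. a new 'aflstart') into the next request.
        params.update(data["continue"])


# Usage (performs real network requests, so it is left commented out):
# import requests
# entries = list(iter_abuselog(requests.Session(),
#                              "2010-01-01T00:00:00Z", "2010-01-02T00:00:00Z"))
```

Because the function is a generator, interrupting a long run (as happened above) loses only the page currently being fetched.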