# Introduction

This notebook explains how to run the program that fetches abuse filter logs (abuselogs) from the Wikipedia API.

In [5]:
import requests
import pandas as pd

In [6]:
# Add the parent directory to the system path so the spider program can be imported
import sys
sys.path.append('../')
from utils.spider_abuselog import retrieve_abuselog_month # fetch data for a specific month in a given year
from utils.spider_abuselog import retrieve_abuselog_year # fetch data for a given year

# Code examples

## Retrieve data for 1 month

To retrieve abuselogs for a specific month, use the `retrieve_abuselog_month` function. It retrieves the logs and writes them to a local CSV file (format: year-month.csv).

It takes 5 parameters:
- request_session:
    - A `requests` session used to send the API requests.
- year (int):
    - The year for which abuse logs should be retrieved.
- month (int):
    - The month for which abuse logs should be retrieved.
- folder_path (str):
    - The folder path where the CSV file will be saved.
    - Example: './data/abusefilter_logs/'
- compress (bool):
    - Whether to compress the CSV file into a gz file.
    - Defaults to False.

**Note:**
- The first abuselog was in March 2009.
- If 0 records are retrieved, no CSV file is created.
- If the function fails to run, check whether the output file already exists on your local disk. The function stops automatically to avoid overwriting a local file.
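
Since an existing file makes the function stop, it can help to see what is already in the target folder before starting a download. The sketch below is a minimal helper for that; `list_downloaded_logs` is a hypothetical name and not part of `utils.spider_abuselog`, and the `.csv.gz` suffix for compressed output is an assumption.

```python
from pathlib import Path


def list_downloaded_logs(folder_path):
    """Return the sorted names of CSV files already present in folder_path.

    Hypothetical helper, not part of utils.spider_abuselog.
    """
    folder = Path(folder_path)
    if not folder.is_dir():
        return []
    # Match plain and compressed output; the '.csv.gz' suffix is an assumption.
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and (p.name.endswith('.csv') or p.name.endswith('.csv.gz'))
    )


# Show which months appear to be on disk already (empty list if the folder is missing).
print(list_downloaded_logs('../data/abuselogs/'))
```

Any month that shows up in this list would need to be moved or deleted before it can be fetched again.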

In [7]:
# Set folder path to store fetched data
folder_path = '../data/abuselogs/'

In [13]:
# Fetch June through December 2010, one compressed CSV file per month
for month in range(6, 13):
    s = requests.Session()
    retrieve_abuselog_month(request_session=s, year=2010, month=month, folder_path=folder_path, compress=True)

500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T01:33:34Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T03:49:57Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T06:51:55Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T09:39:29Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T12:24:29Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T13:57:18Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T15:12:17Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T16:33:44Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2010-06-01T18:17:03Z', 'continue': '-||'}, 500 records retrieved.
Searc
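
The `Search with continue index` lines in the output above match the MediaWiki API's continuation mechanism: each response that has more results carries a `continue` object whose values are merged into the next request. The sketch below shows that loop for `list=abuselog`. It is not the code in `utils.spider_abuselog`; `iterate_abuselog` and the `fetch` callable are hypothetical names, and the endpoint URL in the comment is an assumption.

```python
def iterate_abuselog(fetch, start, end, limit=500):
    """Yield abuse log entries page by page, following the API's 'continue' block.

    `fetch` is any callable that takes a dict of query parameters and returns
    the decoded JSON response as a dict. Hypothetical helper for illustration.
    """
    params = {
        'action': 'query',
        'format': 'json',
        'list': 'abuselog',
        'aflstart': start,   # with afldir='newer', start is the older timestamp
        'aflend': end,
        'afldir': 'newer',
        'afllimit': limit,
    }
    while True:
        data = fetch(params)
        for entry in data.get('query', {}).get('abuselog', []):
            yield entry
        if 'continue' not in data:
            break  # no more pages
        # Merge the continuation values (e.g. a new 'aflstart') into the next request
        params = {**params, **data['continue']}


# Example wiring with requests (not executed here; the URL is an assumption):
# session = requests.Session()
# fetch = lambda p: session.get('https://en.wikipedia.org/w/api.php', params=p).json()
# for entry in iterate_abuselog(fetch, '2010-06-01T00:00:00Z', '2010-07-01T00:00:00Z'):
#     ...
```

Passing `fetch` in as a callable keeps the paging logic separate from the HTTP layer, so it can be exercised with canned responses.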

## Retrieve data for 1 year

To retrieve abuselogs for a specific year, use the `retrieve_abuselog_year` function. It loops over the month retrieval function to download and store the logs for the given year. The data is stored in 12 local CSV files (format: year-month.csv), one per month.

It takes 3 parameters:
- year (int):
    - The year for which abuse logs should be retrieved.
- folder_path (str):
    - The folder path where the CSV files will be saved.
- compress (bool):
    - Whether to compress the CSV files into gz files.
    - Defaults to False.

**Note:**
- The first abuselog was in March 2009.
- If 0 records are retrieved for a month, no CSV file is created for it.
- If the function fails to run, check whether an output file already exists on your local disk. The function stops automatically to avoid overwriting a local file.
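
The year-level loop described above can be sketched as follows. This is an illustration under assumptions, not the actual body of `retrieve_abuselog_year`: `months_with_logs` and `retrieve_year_sketch` are hypothetical names, `fetch_month` stands in for the month-level function, and skipping the months before March 2009 is a choice made here rather than documented behavior.

```python
FIRST_LOG = (2009, 3)  # first abuse log entry: March 2009 (from the note above)


def months_with_logs(year):
    """Return the months of `year` that can contain abuse logs."""
    first_year, first_month = FIRST_LOG
    if year < first_year:
        return []
    start = first_month if year == first_year else 1
    return list(range(start, 13))


def retrieve_year_sketch(year, folder_path, fetch_month, compress=False):
    """Call `fetch_month` once per month of `year`.

    `fetch_month` is a hypothetical stand-in for the month-level function;
    it is called with year, month, folder_path and compress keyword arguments.
    """
    for month in months_with_logs(year):
        fetch_month(year=year, month=month, folder_path=folder_path, compress=compress)
```

For 2009 this would issue ten calls (March through December); for any later year, twelve.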

In [19]:
# Set folder path to store fetched data
folder_path = '../data/abuselogs/'

In [20]:
# Fetch 2013 through 2017, writing 12 compressed CSV files per year
for year in range(2013, 2018):
    retrieve_abuselog_year(year=year, folder_path=folder_path, compress=True)

499 records retrieved.
Search with continue index: {'aflstart': '2013-01-01T04:47:31Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-01T09:42:01Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-01T15:17:44Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-01T19:06:52Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-01T22:40:06Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-02T01:54:22Z', 'continue': '-||'}, 499 records retrieved.
Search with continue index: {'aflstart': '2013-01-02T04:59:36Z', 'continue': '-||'}, 499 records retrieved.
Search with continue index: {'aflstart': '2013-01-02T09:25:28Z', 'continue': '-||'}, 500 records retrieved.
Search with continue index: {'aflstart': '2013-01-02T12:59:06Z', 'continue': '-||'}, 498 records retrieved.
Searc