## Fetchers Notebook Contents
- [How can I create a `Fetcher`?](#How-can-I-create-a-Fetcher-?)
- [How can I fetch GitHub issues?](#How-can-I-fetch-GitHub-issues?)
- [How does Donkeybot fetch Rucio documentation?](#How-does-Donkeybot-Fetch-Rucio-Documentation?)
- [How does Donkeybot save the fetched data?](#How-does-Donkeybot-save-the-fetched-data?)

**The scripts `fetch_issues.py` and `fetch_rucio_docs.py` do everything explained here.**  
See [scripts](https://github.com/rucio/donkeybot/tree/master/scripts) for the source code, and run a script with the `-h` option for information on the arguments it takes.  
For example:  

`(virt)$ python scripts/fetch_rucio_docs.py -h`

### How can I create a `Fetcher` ?

Use the `FetcherFactory` and pick the fetcher type:
- Issue for a GitHub `IssueFetcher`
- Rucio Documentation for a `RucioDocsFetcher`   

What about the `EmailFetcher`?
- As explained in [How It Works](https://github.com/rucio/donkeybot/blob/master/docs/how_it_works.md), emails are currently fetched by separate scripts run at CERN, not through Donkeybot.

In [None]:
from bot.fetcher.factory import FetcherFactory

Let's create a GitHub `IssueFetcher`.

In [5]:
issues_fetcher = FetcherFactory.get_fetcher("Issue")
issues_fetcher

<bot.fetcher.issues.IssueFetcher at 0x1b75c30b6c8>
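To make the factory idea concrete, here is a minimal sketch of how a string-keyed factory can hand back different fetcher objects. The class names ending in `Sketch` are hypothetical stand-ins, and this is not Donkeybot's actual `FetcherFactory` implementation; only the two type names (`"Issue"` and `"Rucio Documentation"`) come from the text above.

```python
class IssueFetcherSketch:
    """Hypothetical stand-in for a GitHub issue fetcher."""


class RucioDocsFetcherSketch:
    """Hypothetical stand-in for a Rucio documentation fetcher."""


class FetcherFactorySketch:
    """Illustrative factory: maps a type name to a fetcher class."""

    # Type names as used in this notebook; the classes are placeholders.
    _registry = {
        "Issue": IssueFetcherSketch,
        "Rucio Documentation": RucioDocsFetcherSketch,
    }

    @classmethod
    def get_fetcher(cls, data_type):
        # Look up the class for the requested type and instantiate it.
        try:
            return cls._registry[data_type]()
        except KeyError:
            raise ValueError(f"Unknown fetcher type: {data_type!r}") from None


fetcher = FetcherFactorySketch.get_fetcher("Issue")
print(type(fetcher).__name__)
```

Keeping the mapping in one place means callers only need to know the type name, not which module each fetcher lives in.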

### How can I fetch GitHub issues?

You need four things:
- The **repository** whose issues we are fetching
- A **GitHub API token**. To generate a GitHub token visit [Personal Access Tokens](https://github.com/settings/tokens) and follow [Creating a Personal Access Token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token).
- The **maximum number of pages** the fetcher will look through to fetch issues. (default is 201)
- A couple of pandas **DataFrames**: one to hold the issues data and one to hold the issue comments data.

In [None]:
import pandas as pd

In [None]:
repository = 'rucio/rucio' # but you can use any in the format user/repo
token = "<YOUR_TOKEN>"
max_pages = 3

In [None]:
(issues_df, comments_df) = issues_fetcher.fetch(repo=repository, api_token=token, max_pages=max_pages)

The resulting DataFrames will look like this:

In [14]:
issues_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   issue_id    26 non-null     object
 1   title       26 non-null     object
 2   state       26 non-null     object
 3   creator     26 non-null     object
 4   created_at  26 non-null     object
 5   comments    26 non-null     object
 6   body        26 non-null     object
dtypes: object(7)
memory usage: 1.5+ KB


In [15]:
comments_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   issue_id    16 non-null     object
 1   comment_id  16 non-null     object
 2   creator     16 non-null     object
 3   created_at  16 non-null     object
 4   body        16 non-null     object
dtypes: object(5)
memory usage: 768.0+ bytes
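
The role of `max_pages` can be pictured with a small pagination loop. This is a sketch under stated assumptions, not the `IssueFetcher` source: `fetch_all_pages` and `get_page` are hypothetical names, the page source is faked so no network access is needed, and stopping at the first empty page is an assumption about typical paginated APIs.

```python
def fetch_all_pages(get_page, max_pages):
    """Collect items page by page.

    Stops after max_pages pages, or earlier if a page comes back empty.
    get_page is any callable that takes a 1-based page number and
    returns a list of items.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = get_page(page)
        if not batch:
            break  # assumed: an empty page means there is nothing left
        items.extend(batch)
    return items


# Fake page source standing in for an API call (placeholder data).
fake_pages = {1: ["issue-1", "issue-2"], 2: ["issue-3"]}

result = fetch_all_pages(lambda page: fake_pages.get(page, []), max_pages=3)
print(result)  # ['issue-1', 'issue-2', 'issue-3']
```

A small `max_pages` keeps experiments quick; a larger one lets the loop walk further back through a repository's history.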


### How does Donkeybot Fetch Rucio Documentation?

It's the same process we followed with the `IssueFetcher`, only now the factory creates a `RucioDocsFetcher`.

In [None]:
from bot.fetcher.factory import FetcherFactory

In [17]:
docs_fetcher = FetcherFactory.get_fetcher("Rucio Documentation")
docs_fetcher

<bot.fetcher.docs.RucioDocsFetcher at 0x1b75c43bf48>

In [None]:
token = "<YOUR_TOKEN>"

In [None]:
docs_df = docs_fetcher.fetch(api_token=token)

### How does Donkeybot save the fetched data?

To save the data:  
**Step 1.** Open a connection to our Data Storage.  

In [None]:
from bot.database.sqlite import Database

# open the connection
db_name = 'data_storage'
data_storage = Database(f"{db_name}.db")

**Step 2.** Save the fetched issues and comments data.

In [None]:
# save the fetched data
issues_fetcher.save(
    db=data_storage,
    issues_table_name='issues',
    comments_table_name='issue_comments',
)

**Step 2.1.** Alternatively, save the documentation data.

In [None]:
# save the fetched data
docs_fetcher.save(db=data_storage, docs_table_name='docs')

**Step 3.** Finally, close the connection.

In [None]:
# close the connection
data_storage.close_connection()
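
For readers curious what an open, save, close cycle looks like at the SQLite level, here is a rough sketch using Python's standard `sqlite3` module. It is not the implementation of Donkeybot's `Database` class: the table layout is simplified, the rows are made-up placeholders, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# Placeholder rows (issue_id, title, state); not real fetched data.
rows = [("1", "Example title", "open"), ("2", "Another title", "closed")]

# Step 1: open the connection (in memory for this example).
connection = sqlite3.connect(":memory:")
try:
    # Step 2: create the table if needed and save the rows.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS issues "
        "(issue_id TEXT PRIMARY KEY, title TEXT, state TEXT)"
    )
    connection.executemany("INSERT INTO issues VALUES (?, ?, ?)", rows)
    connection.commit()
    saved = connection.execute("SELECT COUNT(*) FROM issues").fetchone()[0]
finally:
    # Step 3: always close the connection, even if saving fails.
    connection.close()

print(saved)  # 2
```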

**Alternative:** If you don't want to use Donkeybot's Data Storage, you can use the `save_with_pickle()` and `load_with_pickle()` methods to achieve the same results.
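
The exact signatures of those two methods are not shown here, so the following only sketches the general pickle round trip with the standard library. The helper names ending in `_sketch`, the file name, and the sample data are all hypothetical.

```python
import os
import pickle
import tempfile


def save_with_pickle_sketch(obj, path):
    """Serialize obj to a file at path."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_with_pickle_sketch(path):
    """Load a previously pickled object from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Placeholder data standing in for fetched results.
data = {"issues": ["issue-1", "issue-2"], "comments": ["comment-1"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "fetched_data.pickle")
    save_with_pickle_sketch(data, file_path)
    restored = load_with_pickle_sketch(file_path)

print(restored == data)  # True
```

Pickle files are convenient for quick experiments, but only load files you created yourself or otherwise trust, since unpickling can execute arbitrary code.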