Add zotero handler implementation #9084

Merged
merged 20 commits into from
Jun 14, 2024
57 changes: 57 additions & 0 deletions mindsdb/integrations/handlers/zotero_handler/README.md
@@ -0,0 +1,57 @@
# Build your own Zotero AI agent

This is the implementation of the Zotero handler for MindsDB.

## Zotero
[Zotero](https://www.zotero.org/) is a free tool for organizing and managing research materials. It lets you store articles, books, and other resources in one place, annotate them, take notes, and create collections for organization. You can also share your collections with others.

## Implementation
This handler uses [pyzotero](https://pyzotero.readthedocs.io/en/latest/), a Python wrapper for the Zotero API, to simplify the integration.

The required arguments to establish a connection are:
- `library_id`: the ID of the library to connect to. As noted in the [pyzotero](https://pyzotero.readthedocs.io/en/latest/) documentation:
  - Your personal library ID is available [here](https://www.zotero.org/settings/keys), in the section "Your userID for use in API calls". You must be logged in for the link to work.
  - For group libraries, the ID can be found by opening the group's page (https://www.zotero.org/groups/groupname) and hovering over the group settings link. The ID is the integer after `/groups/`.
- `library_type`: `"user"` for a personal library or `"group"` for a group library.
- `api_key`: your Zotero API key. As noted in the [pyzotero](https://pyzotero.readthedocs.io/en/latest/) documentation, you can create one by following [this link](https://www.zotero.org/settings/keys/new).

## Usage
To connect to Zotero from MindsDB, use the following syntax:

```sql
CREATE DATABASE mindsdb_zotero
WITH ENGINE = 'zotero',
PARAMETERS = {
    "library_id": "<your-library-id>",
    "library_type": "user",
    "api_key": "<your-api-key>"
};
```
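
The handler code in this pull request also falls back to environment variables (`ZOTERO_LIBRARY_ID`, `ZOTERO_LIBRARY_TYPE`, `ZOTERO_API_KEY`) and then to a `zotero_handler` section of the MindsDB config when a parameter is omitted. The lookup order can be sketched in standalone Python as follows. `resolve_connection_args` is a hypothetical helper name used only for illustration; it is not part of the handler, and the values are made up.

```python
import os

KEYS = ("library_id", "library_type", "api_key")


def resolve_connection_args(args, environ=None, handler_config=None):
    """Hypothetical helper: take each key from args, then env vars, then config."""
    environ = os.environ if environ is None else environ
    handler_config = handler_config or {}
    resolved = {}
    for key in KEYS:
        env_name = f"ZOTERO_{key.upper()}"
        if key in args:
            resolved[key] = args[key]
        elif env_name in environ:
            resolved[key] = environ[env_name]
        elif key in handler_config:
            resolved[key] = handler_config[key]
    return resolved


# Made-up values: the explicit parameter wins over the environment,
# and the environment wins over the config section.
example = resolve_connection_args(
    {"library_id": "12345"},
    environ={"ZOTERO_LIBRARY_ID": "ignored", "ZOTERO_API_KEY": "from-env"},
    handler_config={"library_type": "user", "api_key": "ignored"},
)
print(example)
```

A key that is missing from all three sources is simply left out of the result, so a later lookup of that key would fail.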

## Implemented Features

With the connection established, you can query the tables described below.

Important note: Most things (articles, books, annotations, notes, etc.) are referred to as **items** in the pyzotero API. Each item has an `item_type` (such as "annotation") and an `item_id`. Some items are children of other items: for example, an annotation is a child item of the article it was made on, which is its parent item.
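
To make the parent/child relationship concrete, here is a minimal sketch using plain dictionaries. The item keys and the `children_of` helper are hypothetical, and real Zotero items carry many more fields than the three shown (`key`, `itemType`, `parentItem`).

```python
# Hypothetical, simplified items for illustration only.
items = [
    {"key": "ARTICLE1", "itemType": "journalArticle", "parentItem": None},
    {"key": "ANNO1", "itemType": "annotation", "parentItem": "ARTICLE1"},
    {"key": "ANNO2", "itemType": "annotation", "parentItem": "ARTICLE1"},
    {"key": "NOTE1", "itemType": "note", "parentItem": "ARTICLE1"},
]


def children_of(parent_key, item_type=None):
    """Return keys of items whose parent is parent_key, optionally by type."""
    return [
        item["key"]
        for item in items
        if item["parentItem"] == parent_key
        and (item_type is None or item["itemType"] == item_type)
    ]


annotation_keys = children_of("ARTICLE1", item_type="annotation")
print(annotation_keys)
```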

### Annotations
Annotations are the highlighted passages of research materials, so the most important information is the annotated text itself. The following queries return that text along with the annotation metadata.

To select all annotations of your library:

```sql
SELECT * FROM mindsdb_zotero.annotations
```

To select all data for one annotation by its id:
```sql
SELECT * FROM mindsdb_zotero.annotations WHERE item_id = "<item_id>"
```

To select all annotations of a parent item (like all annotations on a book):
```sql
SELECT * FROM mindsdb_zotero.annotations WHERE parent_item_id = "<parent_item_id>"
```
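
The three queries above differ only in their `WHERE` clause. The `select` method in this pull request maps each clause to a pyzotero method name (`items`, `item`, or `children`). The sketch below mirrors that mapping in standalone Python. `plan_call` is a hypothetical name, the conditions are hand-written `(operator, column, value)` triples rather than parsed SQL, and the sketch stops before any API call is made.

```python
def plan_call(conditions):
    """Map (op, column, value) triples to a (method_name, params) pair."""
    method_name = "items"  # no filter: list every annotation
    params = {}
    for op, column, value in conditions:
        if op in ("or", "and"):
            raise NotImplementedError("OR and AND are not supported")
        if column == "item_id":
            if op != "=":
                raise NotImplementedError('Only "item_id=" is implemented')
            params["item_id"] = value
            method_name = "item"
        elif column == "parent_item_id":
            if op != "=":
                raise NotImplementedError('Only "parent_item_id=" is implemented')
            params["parent_item_id"] = value
            method_name = "children"
    params["itemType"] = "annotation"  # restrict results to annotations
    return method_name, params


print(plan_call([]))
print(plan_call([("=", "item_id", "ABCD1234")]))
print(plan_call([("=", "parent_item_id", "WXYZ5678")]))
```

Any operator other than `=` on these two columns raises `NotImplementedError`, which matches the intent of the error messages in the handler.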


9 changes: 9 additions & 0 deletions mindsdb/integrations/handlers/zotero_handler/__about__.py
@@ -0,0 +1,9 @@
__title__ = 'MindsDB Zotero handler'
__package_name__ = 'mindsdb_zotero_handler'
__version__ = '0.0.1'
__description__ = "MindsDB handler for Zotero"
__author__ = 'Elina Kapetanaki'
__github__ = 'https://github.com/mindsdb/mindsdb'
__pypi__ = 'https://pypi.org/project/mindsdb/'
__license__ = 'MIT'
__copyright__ = 'Copyright 2024 - mindsdb'
21 changes: 21 additions & 0 deletions mindsdb/integrations/handlers/zotero_handler/__init__.py
@@ -0,0 +1,21 @@
from mindsdb.integrations.libs.const import HANDLER_TYPE

from .__about__ import __version__ as version, __description__ as description
try:
    from .zotero_handler import (
        ZoteroHandler as Handler
    )
    import_error = None
except Exception as e:
    Handler = None
    import_error = e

title = 'Zotero'
name = 'zotero'
type = HANDLER_TYPE.DATA
icon_path = 'icon.svg'

__all__ = [
    'Handler', 'version', 'name', 'type', 'title', 'description',
    'import_error', 'icon_path'
]
102 changes: 102 additions & 0 deletions mindsdb/integrations/handlers/zotero_handler/icon.svg
Sorry, we cannot display this file.
@@ -0,0 +1 @@
pyzotero
227 changes: 227 additions & 0 deletions mindsdb/integrations/handlers/zotero_handler/zotero_handler.py
@@ -0,0 +1,227 @@
import os
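# MindsDB SQL AST (provides ast.Select); the stdlib ast module has no Select node
from mindsdb_sql.parser import ast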

from pyzotero import zotero
import pandas as pd

from mindsdb.utilities import log
from mindsdb.utilities.config import Config

from mindsdb.integrations.libs.api_handler import APIHandler, APITable, FuncParser
from mindsdb.integrations.utilities.sql_utils import extract_comparison_conditions

from mindsdb.integrations.libs.response import (
HandlerStatusResponse as StatusResponse,
HandlerResponse as Response,
RESPONSE_TYPE
)

logger = log.getLogger(__name__)

class AnnotationsTable(APITable):
    """Represents a table of annotations in Zotero."""

    def select(self, query: ast.Select) -> Response:
        """Select annotations based on the provided query.

        Parameters
        ----------
        query : ast.Select
            AST (Abstract Syntax Tree) representation of the SQL query.

        Returns
        -------
        Response
            Response object containing the selected annotations as a DataFrame.
        """
        conditions = extract_comparison_conditions(query.where)

        params = {}
        method_name = 'items'  # Default method name
        for op, arg1, arg2 in conditions:

            if op in {'or', 'and'}:
                raise NotImplementedError('OR and AND are not supported')
            if arg1 == 'item_id':
                if op == '=':
                    params['item_id'] = arg2
                    method_name = 'item'
                else:
                    raise NotImplementedError('Only "item_id=" is implemented')
            if arg1 == 'parent_item_id':
                if op == '=':
                    params['parent_item_id'] = arg2
                    method_name = 'children'
                else:
                    raise NotImplementedError('Only "parent_item_id=" is implemented')

        params.update({'itemType': 'annotation'})  # Add item type to params

        annotations_data = self.handler.call_zotero_api(method_name, params)
        annotations = [annotation['data'] for annotation in annotations_data]
        df = pd.DataFrame(annotations)

        # Get the columns of the annotations table
        columns = self.get_columns()

        # Filter the DataFrame by columns
        df = df[columns]

        return Response(RESPONSE_TYPE.TABLE, data_frame=df)

    def get_columns(self):
        """Get the columns of the annotations table.

        Returns
        -------
        list
            List of column names.
        """
        return [
            'annotationColor',
            'annotationComment',
            'annotationPageLabel',
            'annotationText',
            'annotationType',
            'dateAdded',
            'dateModified',
            'key',
            'parentItem',
            'relations',
            'tags',
            'version'
        ]

class ZoteroHandler(APIHandler):
    """Handles communication with the Zotero API."""

    def __init__(self, name=None, **kwargs):
        """Initialize the Zotero handler.

        Parameters
        ----------
        name : str
            Name of the handler instance.

        Other Parameters
        ----------------
        connection_data : dict
            Dictionary containing connection data such as 'library_id', 'library_type', and 'api_key'.
            If not provided, will attempt to fetch from environment variables or configuration file.
        """

        super().__init__(name)

        args = kwargs.get('connection_data', {})
        self.connection_args = {}

        handler_config = Config().get('zotero_handler', {})
        for k in ['library_id', 'library_type', 'api_key']:
            if k in args:
                self.connection_args[k] = args[k]
            elif f'ZOTERO_{k.upper()}' in os.environ:
                self.connection_args[k] = os.environ[f'ZOTERO_{k.upper()}']
            elif k in handler_config:
                self.connection_args[k] = handler_config[k]

        self.is_connected = False
        self.api = None

        annotations_table = AnnotationsTable(self)
        self._register_table('annotations', annotations_table)

    def connect(self) -> zotero.Zotero:
        """Connect to the Zotero API.

        Returns
        -------
        zotero.Zotero
            pyzotero client for the configured library.
        """

        if self.is_connected is True:
            return self.api

        self.api = zotero.Zotero(self.connection_args['library_id'],
                                 self.connection_args['library_type'],
                                 self.connection_args['api_key'])

        self.is_connected = True
        return self.api

    def check_connection(self) -> StatusResponse:
        """Check the connection status to the Zotero API.

        Returns
        -------
        StatusResponse
            Status of the connection.
        """

        response = StatusResponse(False)

        try:
            self.connect()
            response.success = True

        except Exception as e:
            response.error_message = f'Error connecting to Zotero API: {str(e)}. Check credentials.'
            logger.error(response.error_message)

        if response.success is False and self.is_connected is True:
            self.is_connected = False

        return response

    def native_query(self, query_string: str = None) -> Response:
        """Execute a native query against the Zotero API.

        Parameters
        ----------
        query_string : str
            The query string to execute, formatted as required by the Zotero API.

        Returns
        -------
        Response
            Response object containing the result of the query.
        """

        method_name, params = FuncParser().from_string(query_string)

        df = self.call_zotero_api(method_name, params)

        return Response(
            RESPONSE_TYPE.TABLE,
            data_frame=df
        )

    def call_zotero_api(self, method_name: str = None, params: dict = None) -> pd.DataFrame:
        """Call a method in the Zotero API.

        Parameters
        ----------
        method_name : str
            The name of the method to call in the Zotero API.
        params : dict
            Parameters to pass to the method.

        Returns
        -------
        pd.DataFrame
            DataFrame containing the result of the API call.
        """

        if self.is_connected is False:
            self.connect()

        # Treat a missing params dict as "no arguments" so ** unpacking works
        params = params or {}

        method = getattr(self.api, method_name)

        try:
            result = method(**params)
        except Exception as e:
            error = f"Error calling method '{method_name}' with params '{params}': {e}"
            logger.error(error)
            raise
        return pd.DataFrame(result)