# Confluence

[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform for storing and organizing project-related material. As a knowledge base, it is used primarily for content management.

This loader allows you to fetch and process Confluence pages into `Document` objects.

---

## Authentication Methods

The following authentication methods are supported:

- `username/api_key`
- `OAuth2 login`
- `cookies`
- On-premises installations: `token` authentication
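
As a minimal sketch of how one of these methods might be selected, the hypothetical helper below (not part of the loader) returns the keyword arguments for exactly one method and rejects ambiguous combinations. The keyword names `username`, `api_key`, `cookies`, and `token` mirror the list above; `oauth2` as the keyword for OAuth2 login is an assumption.

```python
from typing import Any, Dict, Optional


def confluence_auth_kwargs(
    username: Optional[str] = None,
    api_key: Optional[str] = None,
    oauth2: Optional[Dict[str, str]] = None,  # assumed keyword for OAuth2 login
    cookies: Optional[Dict[str, str]] = None,
    token: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return kwargs for exactly one authentication method."""
    candidates = {
        "username/api_key": {"username": username, "api_key": api_key},
        "oauth2": {"oauth2": oauth2},
        "cookies": {"cookies": cookies},
        "token": {"token": token},
    }
    # A method counts as chosen only when all of its values are provided.
    chosen = {
        name: kwargs
        for name, kwargs in candidates.items()
        if all(value is not None for value in kwargs.values())
    }
    if len(chosen) != 1:
        raise ValueError(
            f"Expected exactly one authentication method, got {sorted(chosen)}"
        )
    (kwargs,) = chosen.values()
    return kwargs


auth = confluence_auth_kwargs(username="me@example.com", api_key="secret")
```

The resulting dictionary could then be unpacked into the loader call, for example `ConfluenceLoader(url=..., space_key=..., **auth)`.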

---

## Page Selection

You can specify which pages to load using:

- **page_ids** (*list*):  
  A list of `page_id` values to load the corresponding pages.

- **space_key** (*string*):  
  The key of a Confluence space; all pages within that space are loaded.

If both `page_ids` and `space_key` are provided, the loader returns the union of both sets of pages.

*Hint:* Both `space_key` and `page_id` can be found in the URL of a Confluence page:  
`https://yoursite.atlassian.net/wiki/spaces/{space_key}/pages/{page_id}`
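
The URL layout in the hint can be parsed programmatically. The sketch below assumes the `/wiki/spaces/{space_key}/pages/{page_id}` pattern shown above and a numeric page ID; `parse_confluence_url` is a hypothetical helper, and the example URL is a placeholder.

```python
import re
from typing import Optional, Tuple

# Matches the ".../wiki/spaces/{space_key}/pages/{page_id}" layout from the hint.
_PAGE_URL = re.compile(r"/wiki/spaces/(?P<space_key>[^/]+)/pages/(?P<page_id>\d+)")


def parse_confluence_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (space_key, page_id) if the URL follows the expected layout."""
    match = _PAGE_URL.search(url)
    if match is None:
        return None
    return match.group("space_key"), match.group("page_id")


parsed = parse_confluence_url(
    "https://example.atlassian.net/wiki/spaces/DDS/pages/123456/Some+Title"
)
print(parsed)  # ('DDS', '123456')
```

The two values map directly onto the `space_key` and `page_ids` parameters described above.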

---

## Attachments

You can include attachments in the loaded `Document` objects by setting the boolean parameter **include_attachments** to `True` (default: `False`). When enabled, all attachments are downloaded, and their text content is extracted and added to the corresponding `Document`.

**Currently supported attachment types:**

- PDF (`.pdf`)
- PNG (`.png`)
- JPEG/JPG (`.jpeg`, `.jpg`)
- SVG (`.svg`)
- Word (`.doc`, `.docx`)
- Excel (`.xls`, `.xlsx`)
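
To check ahead of time which attachments fall into the list above, a simple extension filter is enough. This is an illustration only: `is_supported_attachment` is a hypothetical helper, and the loader itself may decide based on the media type reported by Confluence rather than the file name.

```python
from pathlib import Path

# Extensions taken from the list of supported attachment types above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpeg", ".jpg", ".svg", ".doc", ".docx", ".xls", ".xlsx",
}


def is_supported_attachment(filename: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


names = ["Report.PDF", "diagram.svg", "archive.zip", "notes"]
supported = [name for name in names if is_supported_attachment(name)]
print(supported)  # ['Report.PDF', 'diagram.svg']
```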

---

Before using `ConfluenceLoader`, make sure you have the latest version of the `atlassian-python-api` package installed:

In [None]:
%pip install --upgrade --quiet atlassian-python-api langchain_community
%pip install pytesseract Pillow reportlab svglib

```bash
sudo apt update
sudo apt install libcairo2-dev
```

In [None]:
%pip install rlPyCairo

## Examples

### Username and Password or Username and API Token (Atlassian Cloud only)

This example authenticates with either a username and password or, if you are connecting to an Atlassian Cloud-hosted instance of Confluence, a username and an API token.
You can generate an API token at: https://id.atlassian.com/manage-profile/security/api-tokens.

The `limit` parameter specifies how many documents are retrieved in a single call, not how many are retrieved in total.
By default, the loader returns up to 1000 documents in batches of 50. To control the total number of documents, use the `max_pages` parameter.
Please note that the maximum value for the `limit` parameter in the `atlassian-python-api` package is currently 100.
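
The relationship between `limit` and `max_pages` can be illustrated with a little arithmetic. The sketch below is not the loader's code; `batch_windows` is a hypothetical helper that lists the `(start, size)` request windows needed to cover `max_pages` results at `limit` results per call, assuming the space holds at least that many pages.

```python
from typing import Iterator, Tuple


def batch_windows(max_pages: int = 1000, limit: int = 50) -> Iterator[Tuple[int, int]]:
    """Yield (start, size) pairs covering up to max_pages results, limit per request."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = 0
    while start < max_pages:
        # The last window may be smaller than `limit`.
        size = min(limit, max_pages - start)
        yield start, size
        start += size


default_windows = list(batch_windows())
print(len(default_windows))  # 20 requests of 50 documents each

print(list(batch_windows(max_pages=120, limit=50)))  # [(0, 50), (50, 50), (100, 20)]
```

With the defaults, a full load corresponds to 20 calls; raising `limit` toward its maximum of 100 halves the number of calls without changing the total.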

In [None]:
url = "https://xxxxx.atlassian.net/wiki"
username = "xxxxx@xxxxx.com"
api_key = "xxxxxx"


In [None]:
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url=url,
    username=username,
    api_key=api_key,
    space_key="DDS",
    include_attachments=True,
    limit=50,
)
documents = loader.load()

In [None]:
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url=url,
    username=username,
    api_key=api_key,
    include_attachments=True,
    limit=50,
    # Select pages with a CQL (Confluence Query Language) query
    # instead of passing `space_key`.
    cql="""
        space = "DDS" AND
        type = "page" AND
        (
            text ~ "Métricas de éxito" OR
            text ~ "milestone" OR
            text ~ "timeline" OR
            text ~ "schedule" OR
            text ~ "deliverable" OR
            text ~ "scope" OR
            text ~ "risk" OR
            text ~ "stakeholder"
        )
    """,
)
documents = loader.load()
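
Long CQL strings like the one above are easier to maintain when generated from a list of search terms. The following is a minimal sketch; `build_cql` is a hypothetical helper, and escaping double quotes with a backslash is an assumption about CQL string syntax that should be verified against your Confluence version.

```python
from typing import Iterable


def build_cql(space: str, terms: Iterable[str], content_type: str = "page") -> str:
    """Build a CQL query matching content in `space` that contains any of `terms`."""

    def quote(value: str) -> str:
        # Assumed escaping rule: backslash-escape backslashes and double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    text_clauses = [f"text ~ {quote(term)}" for term in terms]
    if not text_clauses:
        raise ValueError("at least one search term is required")
    return (
        f"space = {quote(space)} AND type = {quote(content_type)} "
        f"AND ({' OR '.join(text_clauses)})"
    )


query = build_cql("DDS", ["milestone", "risk"])
print(query)
# space = "DDS" AND type = "page" AND (text ~ "milestone" OR text ~ "risk")
```

The resulting string can be passed as the `cql` argument in place of the hand-written query.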



In [None]:
documents

In [None]:
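from atlassian import Jira

# Jira is served from the site root (no /wiki suffix).
# `username` and `api_key` are reused from the Confluence example above.
url = "https://xxxxx.atlassian.net"
project = "xxxxx"
jira = Jira(url=url, username=username, password=api_key)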

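In [None]:
def get_all_issues(project, batch_size=100):
    """Fetch every issue in `project`, paging through the results."""
    start = 0
    issues = []
    while True:
        batch = jira.jql(
            f'project = "{project}"',
            start=start,
            limit=batch_size,
            fields=["summary", "description", "status"],
        )["issues"]
        # An empty batch means there are no more results.
        if not batch:
            break
        issues.extend(batch)
        # Advance by the number of issues actually returned; the server
        # may return fewer than `batch_size` per request.
        start += len(batch)
    return issues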


all_issues = get_all_issues(project)
print(f"Total issues fetched: {len(all_issues)}")


In [None]:
all_issues