Add workflow to sync docs to Dify knowledge base #612

Merged
AmishaBisht merged 5 commits into main from dify-workflow
May 5, 2026
Conversation

@AmishaBisht
Collaborator

@AmishaBisht AmishaBisht commented May 5, 2026

Summary by CodeRabbit

  • Chores
    • Added automation to sync documentation updates to the Dify knowledge base on changes and via manual trigger.
  • Documentation
    • Minor trailing adjustment to the Glific Overview document.

@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@AmishaBisht has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 39 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24618ca9-ce93-473b-a68b-ec4a20061236

📥 Commits

Reviewing files that changed from the base of the PR and between 972b71e and e57e5cb.

📒 Files selected for processing (1)
  • .github/workflows/sync_to_dify.yml
📝 Walkthrough

Adds a GitHub Actions workflow and a Python script that sync Markdown files from docs/ into a Dify dataset. The script detects changed or all docs (for manual runs), pages existing dataset documents, and creates or updates documents via Dify file-upload endpoints using environment-provided credentials.
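
The paging step mentioned in the walkthrough can be sketched as follows. This is a minimal illustration rather than the PR's code: `fetch_page` is a hypothetical callable standing in for the HTTP request, and the response shape (a `data` list plus a `has_more` flag) is an assumption about what the list-documents endpoint returns.

```python
from typing import Callable, Dict


def collect_existing_documents(fetch_page: Callable[[int], dict]) -> Dict[str, str]:
    """Page through a dataset's documents and map each name to its id.

    fetch_page(page) is a hypothetical stand-in for the real HTTP call.
    Assumed response shape: {"data": [{"id": ..., "name": ...}], "has_more": bool}.
    """
    docs: Dict[str, str] = {}
    page = 1
    while True:
        body = fetch_page(page)
        for doc in body.get("data", []):
            docs[doc["name"]] = doc["id"]
        if not body.get("has_more"):
            break  # last page reached
        page += 1
    return docs


# Usage with canned pages instead of a live API.
_pages = {
    1: {"data": [{"id": "a1", "name": "docs__intro.md"}], "has_more": True},
    2: {"data": [{"id": "b2", "name": "docs__setup.md"}], "has_more": False},
}
existing = collect_existing_documents(lambda page: _pages[page])
print(existing)
```

Injecting the fetch function keeps the loop testable without network access; in the real script the callable would wrap the GET request to /datasets/{id}/documents.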

Changes

Documentation Sync to Dify

| Layer | File(s) | Summary |
|---|---|---|
| Config / Constants | .github/scripts/sync_to_dify.py | Module-level constants read DIFY_API_KEY, DIFY_DATASET_ID, optional DIFY_BASE_URL, and build HEADERS for Authorization. |
| Change Detection | .github/scripts/sync_to_dify.py | get_changed_md_files() runs git diff --name-only HEAD~1 HEAD to collect changed docs/**/*.md; get_all_md_files() walks docs/ for manual dispatch. |
| Document Naming | .github/scripts/sync_to_dify.py | to_doc_name(file_path) converts file paths to dataset document names by replacing / with __. |
| Dataset Querying | .github/scripts/sync_to_dify.py | get_existing_documents() pages /datasets/{id}/documents and maps doc["name"] to doc["id"]. |
| Create / Update APIs | .github/scripts/sync_to_dify.py | create_document(...) calls /datasets/{id}/document/create-by-file; update_document(...) calls /datasets/{id}/documents/{doc_id}/update-by-file, uploading text/markdown and using resp.raise_for_status(). |
| Orchestration / CLI | .github/scripts/sync_to_dify.py | main() chooses file set based on GITHUB_EVENT_NAME, iterates files, logs per-file failures, and exits non-zero if any fail; __main__ invokes main(). |
| Workflow Orchestration | .github/workflows/sync_to_dify.yml | Workflow triggers on push to main/dify-workflow when docs/** change and on workflow_dispatch; job checks out code, sets up Python 3.11, installs requests, and runs the script with secrets DIFY_API_KEY, DIFY_DATASET_ID, DIFY_BASE_URL. |
| Trivial Doc EOF | docs/01. Glific Overview.md | Minor trailing EOF adjustment with no substantive content change. |
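
The change-detection filter and document-naming rows can be illustrated with a short sketch. It assumes the naming rule is exactly "replace / with __" as the summary states; whether the real to_doc_name() also strips a prefix is not shown in this thread, and `filter_docs_markdown` is a hypothetical helper name.

```python
def to_doc_name(file_path: str) -> str:
    """Flatten a repo-relative path into a document name by replacing "/" with "__"."""
    return file_path.replace("/", "__")


def filter_docs_markdown(paths):
    """Keep only Markdown files under docs/ from a list of changed paths."""
    return [p for p in paths if p.startswith("docs/") and p.endswith(".md")]


changed = [
    "docs/01. Glific Overview.md",
    "docs/guides/setup.md",
    ".github/workflows/sync_to_dify.yml",
]
names = [to_doc_name(p) for p in filter_docs_markdown(changed)]
print(names)
```

Using the full flattened path rather than the bare filename means two files with the same basename in different directories get distinct document names.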

Sequence Diagram

sequenceDiagram
    participant GitHub as GitHub
    participant Workflow as Actions Workflow
    participant Script as sync_to_dify.py
    participant Dify as Dify API

    GitHub->>Workflow: push (docs/** changed) or workflow_dispatch
    Workflow->>Workflow: checkout repo, setup Python
    Workflow->>Script: run sync_to_dify.py (env secrets)
    Script->>Script: detect files (git diff or walk docs/)
    Script->>Dify: GET /datasets/{id}/documents (paginated)
    Dify-->>Script: existing documents list
    Script->>Script: for each file decide create vs update
    Script->>Dify: POST create-by-file / POST update-by-file (file upload)
    Dify-->>Script: success / error responses
    Script->>Workflow: print status, exit code
    Workflow->>GitHub: workflow complete
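
The "decide create vs update" step in the diagram can be sketched as a pure function. This is an illustration under assumptions, not the script's actual code: `plan_sync` is a hypothetical name, and the naming rule (replace "/" with "__") is taken from the summary table.

```python
def plan_sync(file_paths, existing_docs):
    """Split files into (to_create, to_update) based on existing document names.

    existing_docs maps document name -> document id, as in the walkthrough.
    """
    to_create, to_update = [], []
    for path in file_paths:
        name = path.replace("/", "__")  # assumed naming rule
        if name in existing_docs:
            # Known document: update it in place using its id.
            to_update.append((existing_docs[name], path))
        else:
            # Unknown document: it has to be created.
            to_create.append(path)
    return to_create, to_update


existing_docs = {"docs__intro.md": "doc-1"}
to_create, to_update = plan_sync(
    ["docs/intro.md", "docs/guides/setup.md"], existing_docs
)
print(to_create)
print(to_update)
```

Separating the decision from the HTTP calls makes it easy to check which endpoint each file would hit before anything is uploaded.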

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I nibble at docs with nimble paws,
Pushed words hop to Dify without pause.
I stitch each markdown, neat and spry,
Knowledge gardens bloom—hop, hop, high! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the primary change: adding a GitHub Actions workflow that syncs documentation to Dify knowledge base. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Comment thread .github/workflows/sync_to_dify.yml Fixed
@github-actions

github-actions Bot commented May 5, 2026

@github-actions github-actions Bot temporarily deployed to pull request May 5, 2026 12:20 Inactive
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
.github/workflows/sync_to_dify.yml (1)

24-25: ⚡ Quick win

pip install requests is unpinned — non-deterministic across runs

Without a pinned version, a new requests release could introduce breaking changes or be yanked between two workflow runs. Pin the version or use a requirements.txt.

♻️ Proposed fix
-        run: pip install requests
+        run: pip install requests==2.32.3

Or add a .github/scripts/requirements.txt:

requests==2.32.3

and change the step to:

-        run: pip install requests
+        run: pip install -r .github/scripts/requirements.txt
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/sync_to_dify.yml around lines 24 - 25, The workflow step
that runs "pip install requests" is unpinned and should be made deterministic;
update the "Install dependencies" step so it installs a specific requests
version (e.g., use "pip install requests==2.32.3") or change it to install from
a requirements file (e.g., add a .github/scripts/requirements.txt and run "pip
install -r .github/scripts/requirements.txt"); modify the step referencing the
Install dependencies run command accordingly to ensure reproducible installs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/sync_to_dify.py:
- Around line 27-31: Add a module-level timeout constant (e.g., REQUESTS_TIMEOUT
or TIMEOUT_SECONDS) and pass it as the timeout argument to every requests call
in this module: the requests.get call that fetches documents and the two
requests.post calls that create and update documents; update their calls
(requests.get(..., params=..., timeout=REQUESTS_TIMEOUT) and requests.post(...,
files=..., data=..., timeout=REQUESTS_TIMEOUT)) so the HTTP calls cannot hang
indefinitely. Define the constant near the top of the file and reuse it for all
requests.
- Line 7: The BASE_URL assignment uses os.environ.get("DIFY_BASE_URL", ...),
which returns an empty string when the variable is set but blank (for example,
when the secret is empty), causing BASE_URL to become "" and break URLs; change
the assignment to use a truthy fallback such as: BASE_URL =
(os.environ.get("DIFY_BASE_URL") or "https://api.dify.ai/v1").rstrip("/") so an
empty DIFY_BASE_URL falls back to the default, preserving the .rstrip("/") call.
- Around line 14-20: In get_changed_md_files(), subprocess.run is not checking
git's exit status so a failing "git diff" (e.g., missing HEAD~1) is swallowed;
update the call to subprocess.run to either pass check=True or call
result.check_returncode() after running (and optionally catch the resulting
CalledProcessError to log a clear error) so failures are surfaced instead of
returning an empty list; reference the get_changed_md_files function and the
subprocess.run invocation when making this change.
- Around line 81-90: The code uses os.path.basename(file_path) (variable
filename) to look up existing_docs which causes collisions; change the key to
the full relative path (e.g., use file_path or os.path.relpath(file_path, <repo
root>) instead of os.path.basename) and use that same relative-path key when
calling update_document(existing_docs[...], file_path, content) and
create_document(...), ensuring existing_docs is populated/queried by the same
full-path string rather than the bare filename to avoid silent overwrites.

In @.github/workflows/sync_to_dify.yml:
- Around line 10-12: The workflow job named "sync" currently only checks out
code but inherits default token scopes; add an explicit permissions block under
the sync job to limit GITHUB_TOKEN to read-only repo contents by inserting:
permissions: contents: read (as a child of the sync job alongside runs-on).
Update the job definition (job symbol "sync") to include that permissions
mapping so the job only has minimal read access.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 71c633e6-0409-4686-a7c7-757b4b4d5b45

📥 Commits

Reviewing files that changed from the base of the PR and between 972d016 and 64bbddd.

📒 Files selected for processing (2)
  • .github/scripts/sync_to_dify.py
  • .github/workflows/sync_to_dify.yml

Comment thread .github/scripts/sync_to_dify.py
Comment thread .github/scripts/sync_to_dify.py Outdated
Comment thread .github/scripts/sync_to_dify.py
Comment thread .github/scripts/sync_to_dify.py Outdated
Comment thread .github/workflows/sync_to_dify.yml
@github-actions github-actions Bot temporarily deployed to pull request May 5, 2026 12:40 Inactive
@github-actions github-actions Bot temporarily deployed to pull request May 5, 2026 16:44 Inactive
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (4)
.github/scripts/sync_to_dify.py (3)

14-23: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

git diff exit status still not checked — failures produce a silent empty file list

result.returncode is never inspected. On a shallow clone (fetch-depth: 2 with only one commit, or a force-push) git diff HEAD~1 HEAD exits non-zero; the script swallows it and prints the misleading "No markdown files changed."

🐛 Proposed fix
 def get_changed_md_files():
     result = subprocess.run(
         ["git", "diff", "--name-only", "HEAD~1", "HEAD"],
         capture_output=True, text=True
     )
+    result.check_returncode()
     files = result.stdout.strip().splitlines()
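
A fuller version of this fix, which also turns the failure into a readable error, might look like the sketch below. The helper name `run_lines` and the RuntimeError wrapping are illustrative choices, not taken from the PR; the docs/ filter is assumed from the walkthrough.

```python
import subprocess
import sys


def run_lines(cmd):
    """Run a command and return its stdout lines, raising RuntimeError on a non-zero exit."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"{cmd[0]} failed ({exc.returncode}): {exc.stderr.strip()}"
        ) from exc
    return result.stdout.strip().splitlines()


def get_changed_md_files():
    """Changed Markdown files under docs/ between HEAD~1 and HEAD."""
    files = run_lines(["git", "diff", "--name-only", "HEAD~1", "HEAD"])
    return [f for f in files if f.startswith("docs/") and f.endswith(".md")]


# Demonstration with a command that always fails (stands in for a failing git diff).
try:
    run_lines([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as err:
    print(f"surfaced: {err}")
```

With check=True the script stops on a failed diff instead of reporting that no Markdown files changed.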
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/sync_to_dify.py around lines 14 - 23, get_changed_md_files
currently ignores subprocess.run exit status causing silent failures; update it
to check result.returncode (or use subprocess.run(..., check=True) inside
try/except) and surface errors: capture result.stderr and either log or raise an
exception (e.g., RuntimeError) with the stderr so callers aren't misled by an
empty list. Ensure you reference get_changed_md_files and the subprocess.run
result variable when adding the returncode check and error handling.

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

BASE_URL empty-string fallback still not fixed

When DIFY_BASE_URL is set as a GitHub secret but left blank, os.environ.get("DIFY_BASE_URL", ...) returns "" (the key exists), so BASE_URL becomes "" and every API URL is malformed.

🐛 Proposed fix
-BASE_URL = os.environ.get("DIFY_BASE_URL", "https://api.dify.ai/v1").rstrip("/")
+BASE_URL = (os.environ.get("DIFY_BASE_URL") or "https://api.dify.ai/v1").rstrip("/")
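
The difference between the two forms is easy to check in isolation. The sketch below wraps the fallback in a hypothetical `resolve_base_url` helper that accepts any mapping, so it can be exercised without touching the real environment; stripping whitespace is an extra assumption beyond the one-line fix.

```python
import os

DEFAULT_BASE_URL = "https://api.dify.ai/v1"


def resolve_base_url(env=os.environ) -> str:
    """Return the Dify base URL, treating an empty or blank value as unset."""
    raw = (env.get("DIFY_BASE_URL") or "").strip()
    return (raw or DEFAULT_BASE_URL).rstrip("/")


print(resolve_base_url({}))                        # variable missing
print(resolve_base_url({"DIFY_BASE_URL": ""}))     # variable set but blank
print(resolve_base_url({"DIFY_BASE_URL": "https://dify.example.com/v1/"}))
```

The first two calls return the default, while the third returns the custom URL with its trailing slash removed.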
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/sync_to_dify.py at line 7, The BASE_URL assignment uses
os.environ.get("DIFY_BASE_URL", ...) which returns an empty string when the
secret exists but is blank, producing an empty BASE_URL; change the logic to
treat an empty or whitespace-only DIFY_BASE_URL the same as unset by reading
os.environ.get("DIFY_BASE_URL") into a variable, trimming and checking
truthiness, and falling back to the default "https://api.dify.ai/v1" before
calling .rstrip("/"); update the BASE_URL initialization to use that variable so
malformed empty URLs are avoided.

45-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

All three requests calls are still missing a timeout — CI runner can block indefinitely

requests.get (Line 45) and both requests.post calls (Lines 62, 76) have no timeout argument. If the Dify API becomes slow or unreachable, the GitHub Actions job can hang until it hits the 6-hour job execution limit on GitHub-hosted runners.

🐛 Proposed fix — add a module-level constant and pass it everywhere
+REQUEST_TIMEOUT = 30  # seconds
+
 HEADERS = { ... }
         resp = requests.get(
             f"{BASE_URL}/datasets/{DATASET_ID}/documents",
             headers=HEADERS,
-            params={"page": page, "limit": 100}
+            params={"page": page, "limit": 100},
+            timeout=REQUEST_TIMEOUT,
         )
     resp = requests.post(
         f"{BASE_URL}/datasets/{DATASET_ID}/document/create-by-file",
         headers=HEADERS,
         files={"file": (doc_name, content.encode("utf-8"), "text/markdown")},
         data={"data": '...'},
+        timeout=REQUEST_TIMEOUT,
     )
     resp = requests.post(
         f"{BASE_URL}/datasets/{DATASET_ID}/documents/{doc_id}/update-by-file",
         headers=HEADERS,
         files={"file": (doc_name, content.encode("utf-8"), "text/markdown")},
         data={"data": '...'},
+        timeout=REQUEST_TIMEOUT,
     )

Also applies to: 62-68, 76-82
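
An alternative to editing each call site is to wrap the request functions once so a timeout is always applied. The sketch below is an illustration only: `with_default_timeout` is a hypothetical helper, and a stand-in function replaces requests.get so the example runs offline.

```python
import functools

REQUEST_TIMEOUT = 30  # seconds; same value as in the proposed fix


def with_default_timeout(request_fn, timeout=REQUEST_TIMEOUT):
    """Wrap an HTTP function so every call gets a timeout unless one is passed."""

    @functools.wraps(request_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("timeout", timeout)
        return request_fn(*args, **kwargs)

    return wrapper


# Stand-in for requests.get / requests.post so the sketch runs offline.
def fake_request(url, **kwargs):
    return {"url": url, "timeout": kwargs.get("timeout")}


get = with_default_timeout(fake_request)
print(get("https://api.dify.ai/v1/datasets"))
print(get("https://api.dify.ai/v1/datasets", timeout=5))
```

In the real script the wrapped callables would be requests.get and requests.post, both of which accept a timeout keyword argument. An explicit timeout at a call site still takes precedence over the default.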

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/sync_to_dify.py around lines 45 - 49, The requests calls in
sync_to_dify.py (requests.get and the two requests.post usages) lack a timeout;
add a module-level constant (e.g., TIMEOUT = 10) near the top of the file and
pass timeout=TIMEOUT to every requests.get and requests.post invocation (the
call that assigns resp = requests.get(...) and the two requests.post(...) calls)
so the script cannot block indefinitely if the Dify API is slow or unreachable.
.github/workflows/sync_to_dify.yml (1)

12-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing permissions block — GITHUB_TOKEN inherits default (potentially write) scopes

This job only needs to read repository contents; the absence of an explicit permissions block leaves it with whatever the repository's default token permissions are, which may include write access. Flagged by CodeQL.

🛡️ Proposed fix
   sync:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     steps:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/sync_to_dify.yml around lines 12 - 14, The workflow job
"sync" currently lacks an explicit permissions block so GITHUB_TOKEN may inherit
broader scopes; add a permissions block for the job (or workflow) that restricts
GITHUB_TOKEN to only what is needed (e.g., set contents: read) to prevent
implicit write access — update the "sync" job to include a permissions section
limiting the token to read-only repository contents.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/sync_to_dify.py:
- Around line 5-6: Validate the credentials at startup instead of blindly using
os.environ[...]: replace the current top-level assignments for API_KEY and
DATASET_ID with guard logic that reads via os.environ.get, checks they are
non-empty, and verifies the API_KEY has the expected dataset-scoped prefix
(accept e.g. "ds-" or "dataset-") and that DATASET_ID is present; if any check
fails, log a clear error and exit (refer to the API_KEY and DATASET_ID symbols
to locate the code to change).

In @.github/workflows/sync_to_dify.yml:
- Around line 6-7: The workflow currently lists two push branches ("main" and
"dify-workflow") so every push to dify-workflow will trigger a production sync;
remove the "dify-workflow" entry from the push trigger and leave only "main"
(i.e., delete the line or array item containing dify-workflow in the push:
branches section) so only pushes to main run the sync.
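
The startup check described for API_KEY and DATASET_ID can be sketched as below. `ConfigError` and `load_credentials` are hypothetical names. The key-prefix check is left out because the expected prefix is not confirmed anywhere in this thread.

```python
import os


class ConfigError(Exception):
    """Raised when a required setting is missing or blank."""


def load_credentials(env=os.environ):
    """Return (api_key, dataset_id), failing fast with a clear message."""
    missing = [
        name
        for name in ("DIFY_API_KEY", "DIFY_DATASET_ID")
        if not (env.get(name) or "").strip()
    ]
    if missing:
        raise ConfigError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return env["DIFY_API_KEY"].strip(), env["DIFY_DATASET_ID"].strip()


# Usage: a blank dataset id is reported by name instead of failing later in an HTTP call.
try:
    load_credentials({"DIFY_API_KEY": "example-key", "DIFY_DATASET_ID": ""})
except ConfigError as err:
    print(err)
```

A caller such as main() could catch ConfigError, print the message, and exit non-zero so the workflow run fails with an obvious cause.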

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96d65fdf-df34-46d0-85ec-8c6a7389db99

📥 Commits

Reviewing files that changed from the base of the PR and between 64bbddd and 972b71e.

📒 Files selected for processing (3)
  • .github/scripts/sync_to_dify.py
  • .github/workflows/sync_to_dify.yml
  • docs/01. Glific Overview.md
✅ Files skipped from review due to trivial changes (1)
  • docs/01. Glific Overview.md

Comment thread .github/scripts/sync_to_dify.py
Comment thread .github/workflows/sync_to_dify.yml Outdated
- drop dify-workflow from workflow branches filter
- revert trailing newline in docs/01. Glific Overview.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot temporarily deployed to pull request May 5, 2026 17:17 Inactive
@github-actions github-actions Bot temporarily deployed to pull request May 5, 2026 17:22 Inactive
@AmishaBisht AmishaBisht requested a review from shijithkjayan May 5, 2026 17:24
@AmishaBisht AmishaBisht merged commit ba406a1 into main May 5, 2026
7 checks passed
@AmishaBisht AmishaBisht deleted the dify-workflow branch May 5, 2026 17:40
3 participants