feat: port 4 valuable PRs from YOUTUBE-EXTENSION (pre-archive salvage) #113

groupthinking merged 5 commits into `main`
Conversation
Ported from YOUTUBE-EXTENSION PRs #703, #706, #707, #708, which were opened by Copilot/Claude agents right before the repo was archived.

## PR #706 — Fix duplicate skeleton repo generation

- Added `_extract_video_id()` and `_build_title()` helpers to code_generator.py
- deployment_manager.py: replaced `timestamp % 10000` with a UUID4 suffix
- Root-cause fix for the 11 identical `uvai-generated-project-*` repos

## PR #703 — Skill Builder (deployment learning system)

- New: src/youtube_extension/services/skill_builder.py (368 lines)
- Records deployment outcomes and derives lessons via EMA weighting
- Surfaces top-N lessons for injection into future LLM prompts

## PR #708 — Cloud-native services (Vertex AI + Firestore + Cloud Tasks)

- New: src/youtube_extension/services/cloud/ (5 service modules)
- Firestore shared state replacing in-memory dicts
- Cloud Tasks async video-processing queue with retry/backoff
- Vertex AI Agent Builder integration with Google Embedded 2
- New: cloud_api_endpoints.py (`/api/v3/*` routes)
- New: Dockerfile.cloudrun, infrastructure/cloudrun/setup.sh
- New: docs/cloud-native-architecture.md

## PR #707 — Gemini Vision multimodal ingestion

- Updated gemini_service.py with frame extraction + batch analysis
- Updated enhanced_video_processor.py with a visual-context pipeline
- Updated videopack/schema.py with VisualContext + VisualElement models
- New: docs/GEMINI_VISION_INTEGRATION.md

Closes YOUTUBE-EXTENSION PRs: #702, #703, #704, #705, #706, #707, #708, #709, #710, #711

Related: EventRelay #111 (secrets still needed)
**Summary of Changes** (Gemini Code Assist): This pull request ports four valuable PRs from the archived YOUTUBE-EXTENSION repository, focusing on enhancing the project's architecture and capabilities. It introduces cloud-native services, multimodal ingestion with Gemini Vision, and a fix for duplicate repository generation, significantly improving the project's scalability, intelligence, and robustness.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Code Review
This pull request is a significant contribution, porting several valuable features from an archived repository, including a new Skill Builder, cloud-native services for scalable deployment, and multimodal ingestion with Gemini Vision. The code is generally well-structured and includes new documentation and tests.
My review focuses on improving robustness, security, and maintainability. Key suggestions include:
- Fixing a critical bug in the Cloud Tasks integration where a URL mismatch would prevent tasks from being processed.
- Enhancing the security of the Cloud Tasks handler by using OIDC tokens instead of a simple header check.
- Improving the task handler's error handling to prevent unnecessary retries for permanent failures.
- Addressing a race condition in the Vertex AI service by avoiding state mutation in a singleton.
- Other suggestions focus on refactoring duplicated code, improving health checks, and making parsing logic more robust.
```python
# Verify request is from Cloud Tasks
if not x_cloudtasks_taskname:
    logger.warning("Unauthorized task handler access attempt")
    raise HTTPException(
        status_code=403,
        detail="Only Cloud Tasks can call this endpoint"
    )
```
The current security check for the Cloud Tasks handler relies on the x-cloudtasks-taskname header, which can be spoofed if the endpoint is publicly accessible. Since the Cloud Run service is deployed with --allow-unauthenticated, it's critical to use a more robust authentication mechanism. The recommended approach is to use OIDC tokens. You should configure Cloud Tasks to create tasks with an OIDC token and then validate this token in your FastAPI endpoint to ensure that requests are legitimate and originate from the correct service account.
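As a rough illustration of the suggested direction, the sketch below separates the part that can be shown self-contained (parsing the `Authorization` header) from the actual OIDC verification, which requires the `google-auth` library and is only indicated in comments. The `EXPECTED_SA` and `SERVICE_URL` names are assumptions, not values from this codebase.

```python
from typing import Optional

def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Pull the raw JWT out of an 'Authorization: Bearer <token>' header."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token

# Inside the FastAPI endpoint one might then verify the token (needs google-auth):
#   from google.oauth2 import id_token
#   from google.auth.transport import requests as grequests
#   claims = id_token.verify_oauth2_token(token, grequests.Request(), audience=SERVICE_URL)
#   if claims.get("email") != EXPECTED_SA:  # hypothetical service-account check
#       raise HTTPException(status_code=403, detail="Invalid caller")
```

Cloud Tasks must also be configured to attach an OIDC token to each task (via the task's HTTP request settings) for this check to have anything to verify.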
```python
async def generate_structured_output(
    self,
    prompt: str,
    schema: Dict[str, Any],
) -> Dict[str, Any]:
    """
    Generate structured JSON output from prompt.

    Args:
        prompt: User prompt
        schema: JSON schema for output

    Returns:
        Structured data matching schema
    """
    # Update model config with schema
    original_config = self.agent_config
    self.agent_config.response_schema = schema
    self._initialize_model()

    try:
        response = await self.process_text(prompt)
        # Parse JSON response
        result = json.loads(response.text)
        return result
    finally:
        # Restore original config
        self.agent_config = original_config
        self._initialize_model()
```
The generate_structured_output method modifies the shared self.agent_config and re-initializes self.model. Since this service is used as a singleton, this is not thread-safe and can lead to race conditions in a concurrent environment. A better approach is to pass a custom generation_config to the generate_content call, which it supports. You should modify process_text to accept an optional generation_config override and pass it to the model. This avoids mutating the shared service state.
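A minimal sketch of the mutation-free pattern, under the assumption that a per-call config can be threaded down to the model call. The class and field names mirror the reviewed code, but the surrounding service is stubbed for illustration.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class AgentConfig:
    response_schema: Optional[Dict[str, Any]] = None

class VertexAIAgentService:
    def __init__(self, agent_config: Optional[AgentConfig] = None):
        self.agent_config = agent_config or AgentConfig()

    def _effective_config(self, override: Optional[AgentConfig]) -> AgentConfig:
        # Per-call override wins; shared self.agent_config is never modified.
        return override if override is not None else self.agent_config

    def config_for_schema(self, schema: Dict[str, Any]) -> AgentConfig:
        # Derive a one-off config; replace() copies instead of mutating.
        return replace(self.agent_config, response_schema=schema)
```

`generate_structured_output` could then build a config with `config_for_schema(schema)` and pass it to `process_text`, leaving the singleton's state untouched between concurrent requests.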
```dockerfile
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=5)"
```
The current healthcheck command doesn't verify the HTTP status code of the response. requests.get() will succeed even if the endpoint returns an error status like 500. It's better to use raise_for_status() to ensure the healthcheck fails on non-2xx responses.
```dockerfile
CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=5).raise_for_status()"
```
```python
except Exception as e:
    error_msg = f"Task processing failed: {str(e)}"
    logger.error(error_msg)

    # Update state with error
    try:
        firestore_service = await get_firestore_service()
        await firestore_service.update_state(
            payload.video_id,
            status='failed',
            error_message=error_msg
        )
    except Exception as state_error:
        logger.error(f"Failed to update error state: {state_error}")

    raise HTTPException(status_code=500, detail=error_msg)
```
The task handler catches all exceptions and returns a 500 HTTP status code. This causes Cloud Tasks to retry the task, even for permanent failures (e.g., an invalid video URL). This can lead to unnecessary retries and wasted resources. You should differentiate between transient and permanent errors. For permanent errors, log the failure, update the state in Firestore, and return a 2xx status code to Cloud Tasks to acknowledge the task and prevent retries.
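One way to implement this distinction is a small classifier that maps exception types to the status returned to Cloud Tasks. The specific exception types treated as permanent here are illustrative assumptions, not an inventory of this codebase's errors.

```python
class PermanentTaskError(Exception):
    """Raised for failures that will never succeed on retry (e.g. an invalid video URL)."""

def status_for_failure(exc: Exception) -> int:
    """Map an exception to the HTTP status returned to Cloud Tasks."""
    if isinstance(exc, (PermanentTaskError, ValueError)):
        return 200  # acknowledge the task so Cloud Tasks does not retry
    return 500      # transient failure: let Cloud Tasks retry with backoff
```

The handler would still record the failure in Firestore either way; only the status code changes, so transient network errors retry while malformed inputs are dropped after one attempt.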
```python
def _extract_video_id(self, video_url: str) -> str:
    """Extract video ID from YouTube URL"""
    # Simple extraction - can be enhanced
    if 'youtube.com/watch?v=' in video_url:
        return video_url.split('v=')[1].split('&')[0]
    elif 'youtu.be/' in video_url:
        return video_url.split('youtu.be/')[1].split('?')[0]
    else:
        # Assume it's already an ID
        return video_url
```
This _extract_video_id implementation is duplicated and less robust than the one introduced in src/youtube_extension/backend/code_generator.py. To avoid code duplication and ensure consistency, you should move the more robust version from code_generator.py into a shared utility module and use it in both places. The other implementation correctly uses urllib.parse to handle various URL formats more reliably.
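A shared utility along the lines the review suggests might look like the sketch below. The exact behaviour of the version in code_generator.py is not shown in this PR excerpt, so this is an approximation of a `urllib.parse`-based extractor, not a copy of it.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(video_url: str) -> str:
    """Extract a YouTube video ID from watch URLs, short URLs, or bare IDs."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host.endswith("youtube.com"):
        # watch URLs carry the ID in the v= query parameter
        ids = parse_qs(parsed.query).get("v", [])
        if ids:
            return ids[0]
    if host == "youtu.be":
        # short URLs carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0]
    # Fall back to treating the input as a bare video ID
    return video_url
```

Unlike naive string splitting, `urlparse` handles extra query parameters, fragments, and mobile or `www.` host variants uniformly.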
Pull request overview
Ports previously authored (pre-archive) functionality into the current codebase, adding cloud-native processing components and Gemini Vision “multimodal ingestion”, while also improving project/repo naming uniqueness during generation/deployment.
Changes:
- Add Gemini Vision visual-context extraction (schema + processor integration) and related documentation/tests.
- Introduce cloud-native processing modules (Firestore state, Cloud Tasks queue, Vertex AI agent service) plus Cloud Run setup artifacts.
- Fix duplicate repo generation by switching to UUID-based suffixes and improve generated project titles using YouTube video IDs.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_gemini_vision_integration.py | Adds schema + integration tests for visual-context ingestion. |
| tests/test_firestore_state.py | Adds unit tests for the Firestore state service. |
| src/youtube_extension/videopack/schema.py | Extends VideoPack schema with visual context models/fields. |
| src/youtube_extension/services/skill_builder.py | Adds a deployment “learning system” for persisting lessons/weights. |
| src/youtube_extension/services/cloud/vertex_ai_agent.py | Adds a Vertex AI “agent” wrapper (text/video/transcript + embeddings). |
| src/youtube_extension/services/cloud/firestore_state.py | Adds Firestore-backed state management with optional local caching. |
| src/youtube_extension/services/cloud/cloud_video_processor.py | Adds a cloud-oriented orchestrator for queued/sync processing (currently with placeholders). |
| src/youtube_extension/services/cloud/cloud_tasks_queue.py | Adds Cloud Tasks enqueueing and queue operations. |
| src/youtube_extension/services/cloud/__init__.py | Exposes cloud service APIs from the cloud package. |
| src/youtube_extension/services/cloud/README.md | Documents cloud services usage and setup. |
| src/youtube_extension/services/ai/gemini_service.py | Updates Gemini service implementation and adds frame extraction + frame analysis helpers. |
| src/youtube_extension/backend/enhanced_video_processor.py | Integrates Gemini Vision visual-context extraction into the enhanced pipeline and markdown output. |
| src/youtube_extension/backend/deployment_manager.py | Fixes repo-name collisions by using UUID suffixes. |
| src/youtube_extension/backend/code_generator.py | Improves project naming by deriving titles from video IDs when metadata/title missing. |
| src/youtube_extension/backend/cloud_api_endpoints.py | Adds (opt-in) FastAPI endpoints for cloud-native processing and Cloud Tasks callbacks. |
| infrastructure/cloudrun/setup.sh | Adds GCP infra bootstrap script (APIs, SA, Firestore, Tasks, secrets). |
| examples/cloud_services_example.py | Adds example usage for the new cloud services modules. |
| docs/cloud-native-architecture.md | Adds end-to-end cloud-native architecture documentation. |
| docs/GEMINI_VISION_INTEGRATION.md | Adds Gemini Vision integration documentation and usage details. |
| Dockerfile.cloudrun | Adds a Cloud Run-oriented Dockerfile for deploying the backend with cloud extras. |
```python
task = tasks_v2.Task(
    http_request=tasks_v2.HttpRequest(
        http_method=tasks_v2.HttpMethod.POST,
        url=f"{self.service_url}/api/v2/process-video-task",
        headers={
            "Content-Type": "application/json",
        },
        body=video_task.to_json().encode(),
    )
)
```
The Cloud Tasks HTTP target is configured to call /api/v2/process-video-task, but the newly added task handler endpoint is /api/v3/process-video-task. This mismatch will cause all queued tasks to hit a 404. Update the task URL to the actual handler path (or align the handler route to what the queue sends).
```python
self.project_id = project_id or os.getenv('GOOGLE_CLOUD_PROJECT')
self.location = location
self.agent_config = agent_config or AgentConfig()

# Initialize Vertex AI
vertexai.init(project=self.project_id, location=self.location)
```
If GOOGLE_CLOUD_PROJECT is not set and no project_id is passed, self.project_id can be None, and vertexai.init(project=None, ...) will fail with a less clear error later. Validate project_id during initialization and raise a clear ValueError (or default from env) so misconfiguration is caught early.
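A minimal fail-fast helper, assuming the same env-var fallback as the constructor above; the function name is an illustration, not from the codebase.

```python
import os
from typing import Optional

def resolve_project_id(project_id: Optional[str] = None) -> str:
    """Return an explicit or env-provided GCP project ID, failing fast if neither exists."""
    resolved = project_id or os.getenv("GOOGLE_CLOUD_PROJECT")
    if not resolved:
        raise ValueError(
            "No GCP project configured: pass project_id or set GOOGLE_CLOUD_PROJECT"
        )
    return resolved
```

Calling this before `vertexai.init(...)` turns an obscure downstream failure into an immediate, self-explanatory error at construction time.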
```python
async def _fetch_metadata(self, video_url: str) -> Dict[str, Any]:
    """
    Fetch video metadata.

    This should integrate with real YouTube Data API.
    """
    # Placeholder - integrate with real implementation
    logger.info(f"Fetching metadata for: {video_url}")

    return {
        'title': 'Video Title',
        'channel': 'Channel Name',
        'duration': '10:30',
        'views': 1000,
        'description': 'Video description',
    }
```
process_video_sync() currently uses placeholder implementations for metadata and transcript extraction (_fetch_metadata() / _extract_transcript() return hard-coded dummy values). If these endpoints are intended to be usable, this will produce incorrect results and pollute Firestore state. Either integrate the real YouTube metadata/transcript services here or raise NotImplementedError (and disable the routes) until the real implementation is wired in.
```bash
pip install -e .[cloud]
```
The Cloud services README instructs pip install -e .[cloud], but there is no cloud optional dependency group defined in pyproject.toml. Update the installation instructions to match the actual extras (or add a cloud extra) so users can follow this doc successfully.
```diff
 class VideoPackV0(BaseModel):
     version: VPVersion = VPVersion.v0
     id: str = Field(default_factory=lambda: str(_uuid.uuid4()))
     video_id: constr(strip_whitespace=True, min_length=3)
-    source_url: HttpUrl | None = None
+    source_url: Optional[HttpUrl] = None

     transcript: Transcript
-    chapters: list[Chapter] = Field(default_factory=list)
-    keyframes: list[Keyframe] = Field(default_factory=list)
-    concepts: list[str] = Field(default_factory=list)
-    requirements: list[Requirement] = Field(default_factory=list)
-    code_snippets: list[CodeSnippet] = Field(default_factory=list)
-    code_cues: list[CodeCue] = Field(default_factory=list)
-    tasks: list[Task] = Field(default_factory=list)
-    artifacts: list[ArtifactRef] = Field(default_factory=list)
+    keyframes: List[Keyframe] = Field(default_factory=list)
+    concepts: List[str] = Field(default_factory=list)
+    requirements: List[Requirement] = Field(default_factory=list)
+    code_snippets: List[CodeSnippet] = Field(default_factory=list)
+    artifacts: List[ArtifactRef] = Field(default_factory=list)

+    # Stage 1: Multimodal Ingestion - Visual context from Gemini Vision
+    visual_context: Optional[VisualContext] = Field(None, description="Visual analysis from video frames")
```
This change removes the Chapter, CodeCue, and Task models and the corresponding VideoPackV0 fields, but src/youtube_extension/videopack/__init__.py still imports/exports Chapter, CodeCue, and Task. As-is, importing youtube_extension.videopack will raise an ImportError. Either keep these models in schema.py (even if deprecated) or update the package exports and any serialization/validation that expects these fields.
```python
cutoff_date = datetime.now(timezone.utc).timestamp() - (days * 24 * 60 * 60)
collection = self._get_collection()

# Query old states
query = collection.where('created_at', '<', cutoff_date)
docs = await query.get()
```
cleanup_old_states() compares created_at against a numeric UNIX timestamp (cutoff_date), but created_at is stored as an ISO-8601 string (see VideoProcessingState.to_dict() / create_state). This query will never match correctly (and may error depending on Firestore type coercion). Store created_at as a Firestore Timestamp/datetime (preferred) or compare against an ISO string cutoff computed with the same format.
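If the ISO-string route is taken, the key point is that ISO-8601 timestamps in a fixed UTC format sort lexicographically, so a string cutoff compares correctly against string fields. A small sketch (the helper name is illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def iso_cutoff(days: int, now: Optional[datetime] = None) -> str:
    """ISO-8601 UTC timestamp for `days` ago, comparable as a string."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).isoformat()

# e.g. query = collection.where('created_at', '<', iso_cutoff(30))
```

Storing `created_at` as a native datetime/Timestamp remains the cleaner option, since Firestore then handles ordering and range queries without any string-format coupling.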
```python
response = self.client.create_task(
    request=tasks_v2.CreateTaskRequest(
        parent=queue_path,
        task=task,
    )
)
```
These Cloud Tasks operations are implemented as async def but call the synchronous tasks_v2.CloudTasksClient methods directly (e.g., create_task). This will block the event loop under load. Wrap the blocking client calls in asyncio.to_thread(...) (or switch to an async client) so FastAPI can continue serving concurrent requests.
Suggested change:

```diff
-response = self.client.create_task(
-    request=tasks_v2.CreateTaskRequest(
-        parent=queue_path,
-        task=task,
-    )
+response = await asyncio.to_thread(
+    self.client.create_task,
+    tasks_v2.CreateTaskRequest(
+        parent=queue_path,
+        task=task,
+    ),
```
```diff
@@ -317,24 +280,6 @@ class GeminiResult:
     error: Optional[str] = None


-class _GenaiClientModelProxy:
-    """Thin wrapper around google.genai.Client that exposes generate_content()
-    so existing call sites (which expect the old GenerativeModel interface)
-    work with the new Client-based SDK."""
-
-    def __init__(self, client: Any, model_name: str):
-        self._client = client
-        self._model_name = model_name
-
-    def generate_content(self, contents: Any, *, generation_config: Any = None, **kwargs: Any) -> Any:
-        return self._client.models.generate_content(
-            model=self._model_name,
-            contents=contents,
-            config=generation_config,
-            **kwargs,
-        )
-
-
 class GeminiService:
     """
     Service for cloud-based vision-language processing using Google Gemini.
@@ -353,48 +298,46 @@ def __init__(self, config: Optional[GeminiConfig] = None):
         self._model = None
         self._use_vertex = False
         self._is_initialized = False
-        self._model_cache: dict[str, Any] = {}
-        self._backend_cache: dict[str, str] = {}
-        self._vertex_cache: dict[str, bool] = {}
+        self._model_cache: Dict[str, Any] = {}
+        self._backend_cache: Dict[str, str] = {}
+        self._vertex_cache: Dict[str, bool] = {}
         self._backend_kind: str = "gemini"

         # Initialize client on startup if credentials available
         if self.is_available():
-            self._verification_failed = False
-            self._initialize_client()
+            self._initialize_client()
```
GeminiConfig no longer populates api_key / project_id from environment variables, and GeminiService.is_available() only checks self.config.api_key / self.config.project_id. This breaks existing call sites that construct GeminiConfig(...) without explicitly passing api_key (expecting GEMINI_API_KEY) and also breaks callers passing a plain dict config (e.g., GeminiService(self.config)) because GeminiService assumes a GeminiConfig instance. Consider: (1) setting api_key and project_id defaults from os.getenv(...) in GeminiConfig, and (2) allowing GeminiService.__init__ to accept dict by coercing it into GeminiConfig(**config).
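Both suggestions can be sketched together as below. Field names follow the review comment, but the full `GeminiConfig` surface and the real `is_available()` logic are assumptions for illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union

@dataclass
class GeminiConfig:
    # Defaults pulled from the environment so GeminiConfig() keeps working
    api_key: Optional[str] = field(default_factory=lambda: os.getenv("GEMINI_API_KEY"))
    project_id: Optional[str] = field(default_factory=lambda: os.getenv("GOOGLE_CLOUD_PROJECT"))

class GeminiService:
    def __init__(self, config: Union[GeminiConfig, Dict[str, Any], None] = None):
        if isinstance(config, dict):
            config = GeminiConfig(**config)  # coerce plain-dict callers
        self.config = config or GeminiConfig()

    def is_available(self) -> bool:
        return bool(self.config.api_key or self.config.project_id)
```

This keeps both existing call patterns working: `GeminiService(GeminiConfig(...))` and `GeminiService({"api_key": ...})`, with env vars filling any fields the caller omits.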
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@groupthinking I've opened a new pull request, #114, to work on those changes. Once the pull request is ready, I'll request review from you.

@groupthinking I've opened a new pull request, #115, to work on those changes. Once the pull request is ready, I'll request review from you.

@groupthinking I've opened a new pull request, #116, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@groupthinking I've opened a new pull request, #117, to work on those changes. Once the pull request is ready, I'll request review from you.

@groupthinking I've opened a new pull request, #118, to work on those changes. Once the pull request is ready, I'll request review from you.

@groupthinking I've opened a new pull request, #119, to work on those changes. Once the pull request is ready, I'll request review from you.
## What this does

Ports the 4 PRs from YOUTUBE-EXTENSION that had real, valuable code before the repo was archived. All 26 open PRs in YOUTUBE-EXTENSION have been closed.

## Changes

- Replaced the `time() % 10000` repo-name suffix with a UUID4 suffix

## Stats

## Remaining