feat: Discord ETL — messages, reactions, voice & moderation#197

Merged: gvieira18 merged 9 commits into 4.1 from feat/discord-etl-messages on May 1, 2026

Conversation

@danielhe4rt (Contributor) commented Apr 17, 2026

Summary

  • Imports 2.37M Discord messages from a full dump into the Activity module via php artisan discord:import-messages
  • Extracts reactions into polymorphic activity_reactions table (HasReactions trait reusable by any model)
  • Extracts ~9K voice events from Dyno bot embed logs into voice_messages
  • Extracts ~629 moderation events (bans, mutes, warns, kicks) from Dyno + heartdevs.com bot logs into moderation_events
  • All ETL DTOs implement toDatabase(array $extra) pattern — DTOs map their own fields, Actions only provide context
  • SourceBot enum for type-safe bot source identification (DB stays string, no cross-module dependency)

Architecture decisions

  • Polymorphic reactions: Reaction model with MorphTo in Activity module, HasReactions trait (follows Interaction/HasInteractions pattern)
  • toDatabase() on DTOs: Encapsulates field mapping + date parsing + computed fields (emoji_key, state mapping)
  • SourceBot enum: Lives in integration-discord, converted to string via ->value in toDatabase() — Activity module stays decoupled
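
As a rough illustration of the `toDatabase(array $extra)` pattern described above, the sketch below shows a DTO mapping its own fields while the caller supplies only context such as `tenant_id`. The class name, field names, and enum cases are hypothetical stand-ins, not the PR's actual code.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the SourceBot enum; case names and values are assumptions.
enum SourceBot: string
{
    case Dyno = 'dyno';
    case HeartDevs = 'heartdevs';
}

// Hypothetical DTO; the fields are illustrative, not the PR's schema.
final class ModerationEventDTO
{
    public function __construct(
        public readonly SourceBot $sourceBot,
        public readonly string $type,
        public readonly string $occurredAt, // ISO 8601 string as found in the dump
    ) {}

    /**
     * @param  array<string, mixed>  $extra  context supplied by the Action (e.g. tenant_id)
     * @return array<string, mixed>
     */
    public function toDatabase(array $extra = []): array
    {
        return [
            ...$extra,
            // The enum is flattened to a plain string, so the DB layer never sees the enum type.
            'source_bot' => $this->sourceBot->value,
            'type' => $this->type,
            // Date parsing lives in the DTO, not in the Action.
            'occurred_at' => (new DateTimeImmutable($this->occurredAt))
                ->setTimezone(new DateTimeZone('UTC'))
                ->format('Y-m-d H:i:s'),
        ];
    }
}

// The "Action" side only provides context.
$row = (new ModerationEventDTO(SourceBot::Dyno, 'ban', '2024-03-01T12:30:00+00:00'))
    ->toDatabase(['tenant_id' => 1]);
```

Placing `...$extra` first means the DTO's own fields win on a key collision, which is one possible choice; the PR may order them differently.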

New files (18 created, 3 modified)

  • 3 migrations (metadata+tenant_id on messages, moderation_events, activity_reactions)
  • 4 DTOs, 4 Actions, 1 Artisan command, 1 enum (SourceBot), 1 enum (ModerationType)
  • 2 models (Reaction, ModerationEvent), 1 trait (HasReactions)
  • 44 tests (107 assertions)

Test plan

  • php artisan test --compact --filter=ImportDiscordMessage — 44 tests, 107 assertions passing
  • vendor/bin/pint --dirty — code style clean
  • Run php artisan migrate on staging DB
  • Run php artisan discord:import-messages storage/app/private/discord-dump with real dump
  • Verify SELECT count(*) FROM messages — ~2.3M
  • Verify SELECT reactable_type, count(*) FROM activity_reactions GROUP BY 1 — all 'message'
  • Verify SELECT source_bot, type, count(*) FROM moderation_events GROUP BY 1,2 — ~629 events

[Screenshot from 2026-05-01 18-19-48]

danielhe4rt and others added 3 commits April 17, 2026 16:02
…eration events

Imports 2.37M messages from Discord dump into the Activity module with:
- Polymorphic reactions (activity_reactions table with MorphTo)
- toDatabase() pattern on all ETL DTOs for clean Action separation
- SourceBot enum for type-safe bot source identification
- Voice event extraction from Dyno embed logs
- Moderation event extraction from Dyno + heartdevs.com bot logs
…lexibility

- Removed `SourceBot` enum and replaced its usage with `botDiscordId` string attribute in `DiscordModerationEventDTO`.
- Updated associated logic and tests to adapt to this change.
- Modified database schema to replace `source_bot` with a foreign key `source_identity_id`.
- Enhanced idempotency and introduced unique name disambiguation for user handling in moderation events.
- Added new features to support filtering and limits during Discord message imports.
- Introduced multiple migrations for `messages`, `voice_messages`, `moderation_events`, and new tables like `message_mentions`, `message_threads`, `message_attachments`, and `message_embeds`.
- Added new unique indices and columns with JSONB support for enhanced query performance and data structure flexibility.
- Implemented a `DiscordMessageAdapter` to map Discord message types and features to canonical data formats.
- Created comprehensive unit tests to ensure correct behavior of the `DiscordMessageAdapter`.
- Expanded support for hierarchical message relationships and data enrichment in Discord integrations.
danielhe4rt and others added 6 commits May 1, 2026 16:57
Add 'truncate' target to clear laravel.log. Include optional
Makefile.local for user-specific targets that should not be tracked
(e.g., environment-bound import paths or queue helpers).
…l-fast

Replace Laravel Prompts single bar with dual-bar rendering using
Symfony console sections:
- Separate boxes for channels and chunks per channel
- Stats line throttled at 100ms within transaction
- Mini-batch by 100 DTOs (each commits independently)

Add fail-fast schema validation: command checks required columns on
'messages' table at startup and aborts with clear message if any are
missing. Prevents PostgreSQL transaction-aborted cascades caused by
running the command before migrations are applied.
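
A minimal sketch of the fail-fast idea in this commit message, assuming the check is a plain comparison between the columns a table has and the columns the import needs. The function names are hypothetical; in a Laravel command the existing list could come from `Schema::getColumnListing('messages')`.

```php
<?php

declare(strict_types=1);

/**
 * Returns the required columns that are absent from the table.
 *
 * @param  list<string>  $existing  columns currently on the table
 * @param  list<string>  $required  columns the import needs
 * @return list<string>
 */
function missingColumns(array $existing, array $required): array
{
    return array_values(array_diff($required, $existing));
}

/**
 * Throws before any row is written if the schema is behind the code.
 *
 * @param  list<string>  $existing
 * @param  list<string>  $required
 */
function assertSchemaReady(string $table, array $existing, array $required): void
{
    $missing = missingColumns($existing, $required);

    if ($missing !== []) {
        throw new RuntimeException(sprintf(
            "Table '%s' is missing columns: %s. Run migrations first.",
            $table,
            implode(', ', $missing),
        ));
    }
}

// Assumed state: migrations adding metadata and tenant_id have not run yet.
$error = null;

try {
    assertSchemaReady('messages', ['id', 'content'], ['id', 'content', 'metadata', 'tenant_id']);
} catch (RuntimeException $e) {
    $error = $e->getMessage();
}
```

Failing at startup keeps the error outside any open transaction, which is the cascade the commit message aims to avoid.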
…mmand

Apply same improvements from ImportDiscordMessagesCommand:
- Dual-bar rendering via Symfony console sections (chunks + profiles)
- Stats line throttled at 100ms within transaction
- Mini-batch by 100 profiles (each commits independently)
- Schema fail-fast on external_identities and users tables
- Disable query log to keep memory bounded on long runs
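
The mini-batching mentioned in both commits could look roughly like the sketch below. It assumes each batch is passed to a writer callback that commits on its own (in Laravel that callback might wrap `DB::transaction()`), and that a failing batch is counted and skipped. The helper name and the skip-on-failure behavior are assumptions, not taken from the PR.

```php
<?php

declare(strict_types=1);

/**
 * Processes items in fixed-size batches. Each batch is handed to $commit
 * separately, so a failing batch does not undo the batches before it.
 *
 * @param  list<mixed>  $items
 * @param  callable(list<mixed>): void  $commit  hypothetical per-batch writer
 * @return array{committed: int, failed: int}
 */
function importInBatches(array $items, int $batchSize, callable $commit): array
{
    $stats = ['committed' => 0, 'failed' => 0];

    foreach (array_chunk($items, $batchSize) as $batch) {
        try {
            $commit($batch);
            $stats['committed'] += count($batch);
        } catch (Throwable $e) {
            // Only this batch is lost; earlier batches are already committed.
            $stats['failed'] += count($batch);
        }
    }

    return $stats;
}

// 250 items in batches of 100; the second batch simulates a constraint violation.
$stats = importInBatches(range(1, 250), 100, function (array $batch): void {
    if (in_array(150, $batch, true)) {
        throw new RuntimeException('simulated constraint violation');
    }
});
```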
@thalesmengue (Contributor) left a comment

lgtm

@YuriSouzaDev (Contributor)

LGTM

@gvieira18 gvieira18 merged commit b0e0f00 into 4.1 May 1, 2026
1 check passed
@gvieira18 gvieira18 deleted the feat/discord-etl-messages branch May 1, 2026 21:36
gvieira18 added a commit that referenced this pull request May 2, 2026
## Summary

Syncs the **Discord ETL** feature and the **Laravel 12 → 13 upgrade** from
branch `4.1` into `4.x`, discarding the individual AI/MCP/Boost tooling
that came bundled in the original branch.

Because 4.x uses squash merges, there was no granular history for the 4
original commits; the solution was to rebuild via selective checkout plus
manual adjustments.

### Categorization of the 4 commits from 4.1

| Commit | Status | Files | Lines | Author |
|---|---|---|---|---|
| `4b4a136` Laravel 12→13 upgrade | KEEP (partial) | 34 | +768 | Daniel Reis |
| `84e26cf` Laravel Boost / AI tooling | MIXED: only the Discord profile pieces | 44 | +5724 | kaster |
| `1449e80` wip console commands | KEEP | 6 | +1808 | Daniel Reis |
| `b0e0f00` Discord ETL feature | KEEP | 55 | +5014 | Daniel Reis |

## What's in

13 commits, +8433/−882 lines across 94 files:

| Commit | What |
|---|---|
| `fcff50a` Discord ETL core | 9 migrations, MembershipEvent/MessageAttachment/MessageEmbed/MessageMention/MessageThread/ModerationEvent/Reaction models, HasReactions trait, SourceBot enum, 4 Actions, 4 DTOs, DiscordMessageAdapter, ImportDiscordMessagesCommand, 116 ETL tests |
| `11e5a49` Discord profile ETL | ImportDiscordProfileAction + Command, ConnectedAccountDTO, DiscordProfileDTO, IdentityProvider expanded to 24 cases (Spotify/Steam/Xbox/etc), ImportDiscordProfileTest |
| `3c31f3e` wip console commands | 6 exploratory commands: discord:fetch-{members,profile,profiles}, discord:import-members, discord:analyze-profiles, discord:community-report |
| `27d2e6c` Color::Dark → Color::Zinc | Bug in the 4.1 code: Filament 5 has no Color::Dark; it crashed at runtime when EpicGames was rendered |
| `4c5dbc4` ETL upsert + reply resolution | Bug in the 4.1 code: Actions used `create()` instead of `updateOrCreate`, which broke on the unique index `(tenant_id, provider_message_id)` on the second call. resolveReplyTargetId did not fall back to the DB on a cache miss |
| `1471958` PHPStan level 6 cleanup | Resolves 34 errors that came with the transplant (return type narrowing, undefined property docblocks, match.alwaysTrue, mixed!==null narrows, table-row casts in wip commands) |
| `f23ffca` Laravel 12→13 upgrade + .gitignore align | composer bumps (framework 12→13.7, tinker 2→3, backup 9→10), config/cache.php serializable_classes, rector LARAVEL_130 set + skip 3 attribute conversions (Fillable/Table/Appends), .gitignore aligned with sycorax (complete AI Agents section) |
| `95d75b6` Keep laracord/bot-discord | Reverts an unwarranted removal: the `danielhe4rt/laracord-framework` fork and `tinker-zero` already support L13. The original PR #195 body incorrectly stated that it had been removed |
| `762740a` Cleanup rector | Removes redundant LARAVEL_130 rules from `withSets`, drops `LARAVEL_FACTORIES` and other sets that were not running |
| `9eff469` Style: object instantiation | Simplifies `(new X)->method()` → `new X()->method()` using PHP 8.4 syntax |
| `1141f2f` Merge `origin/4.x` | Brings in PR #206 (admin panel module with tenant-aware infrastructure) |
| `fe4f422` Untrack `.ai/mcp/mcp.json` | Laravel Boost MCP config with a hardcoded absolute path; now ignored via .gitignore |
| `a5b227f` sycorax guidelines | Adds `.ai/guidelines/{filament,knowledge-base}.blade.php` (`!.ai/guidelines/` exception in .gitignore) |
| `9a50d5a` Format .gitignore | AI Agents section header fixed |
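
To make the `4c5dbc4` fix concrete, here is a self-contained sketch of the two behaviors it describes: an upsert keyed on `(tenant_id, provider_message_id)`, so a second import run does not hit the unique index, and a reply lookup that falls back to storage on a cache miss. The in-memory `MessageStore` class and the function signature are hypothetical; the actual code uses Eloquent's `updateOrCreate`.

```php
<?php

declare(strict_types=1);

// In-memory stand-in for the messages table, keyed like the unique index
// (tenant_id, provider_message_id). Hypothetical; not the PR's code.
final class MessageStore
{
    /** @var array<string, array{id: int, content: string}> */
    private array $rows = [];

    private int $nextId = 1;

    public function upsert(int $tenantId, string $providerMessageId, string $content): int
    {
        $key = $tenantId . ':' . $providerMessageId;

        if (isset($this->rows[$key])) {
            $this->rows[$key]['content'] = $content; // re-import updates in place
        } else {
            $this->rows[$key] = ['id' => $this->nextId++, 'content' => $content];
        }

        return $this->rows[$key]['id'];
    }

    public function findId(int $tenantId, string $providerMessageId): ?int
    {
        return $this->rows[$tenantId . ':' . $providerMessageId]['id'] ?? null;
    }
}

/**
 * Resolves a reply target: cache first, then the store on a miss.
 *
 * @param  array<array-key, int>  $cache  provider_message_id => internal id
 */
function resolveReplyTargetId(array &$cache, MessageStore $store, int $tenantId, string $providerMessageId): ?int
{
    if (isset($cache[$providerMessageId])) {
        return $cache[$providerMessageId];
    }

    $id = $store->findId($tenantId, $providerMessageId);

    if ($id !== null) {
        $cache[$providerMessageId] = $id; // warm the cache for later replies
    }

    return $id;
}

$store = new MessageStore();
$first = $store->upsert(1, '111', 'hello');
$second = $store->upsert(1, '111', 'hello (edited)'); // same key, no duplicate row

$cache = []; // empty cache, e.g. a fresh run of the command
$target = resolveReplyTargetId($cache, $store, 1, '111');
```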

## What's out

- **Refactor of 25 models** to the PHP attributes
`#[Fillable]`/`#[Table]`/`#[Appends]`: reverted to `protected
$fillable` for consistency with 4.x (user's decision). The rules
`FillablePropertyToFillableAttributeRector`,
`TablePropertyToTableAttributeRector`,
`AppendsPropertyToAppendsAttributeRector` are skipped in rector
- **CSRF middleware rename** (`VerifyCsrfToken` →
`PreventRequestForgery`) in the PanelProviders: irrelevant because 4.x
deleted all of these providers via PR #203/#204 (admin became a module,
guest became a Livewire portal)
- **`.agents/skills/**`** (25 markdown skills from Laravel Boost)
- **`.mcp.json`, `boost.json`, `opencode.json`, `AGENTS.md`,
`CLAUDE.md`** (individual AI/MCP tooling)
- **Http Controllers/Requests removed by PR #203**
(MessagesController, CreateMessageRequest, CreateVoiceMessageRequest):
not re-added
- **`tests/Feature/NewMessageTest.php`,
`tests/Feature/NewVoiceMessageTest.php`**: they depended on the Http
classes above

## Cross-branch verification summary

Byte-for-byte comparison between `sync/from-4.1-discord-etl`, `origin/4.x` and
`origin/4.1`:

| Category | Count | Status |
|---|---|---|
| Files from 4.1 brought over **identical** to the sync | **53** | ✅ clean bulk checkout (50 ETL + .gitignore + Makefile + BaseSeeder) |
| Files from 4.1 with an **intentional diff** in the sync | **12** | ✅ 9 models reverted the L13 attrs + 3 fixes of ours (Color, upsert profile, upsert/reply message) |
| Files from 4.1 **dropped** | **39** (was 69; that included 25 model attrs + 5 PanelProviders + composer/cache/rector/L13, which we have now brought in) | ✅ all in the original plan |
| Files in the sync **outside 4.1** | **2** | ✅ ActivityServiceProvider + IntegrationDiscordServiceProvider at the 4.x path (`src/`) |

## Co-authors

| Person | Source | Email used |
|---|---|---|
| Daniel Reis | author of PR #195 (L13), #197 (ETL), wip commit | `danielhe4rt@gmail.com` |
| kaster | author of the Boost commit (parts of the Discord profile work came with it) | `diogokaster@gmail.com` |
| thalesmengue | approved PR #197 | `102062680+thalesmengue@users.noreply.github.com` |
| 1pride | approved PR #197 | `43507992+1pride@users.noreply.github.com` |

## Test plan

- [x] `vendor/bin/pint --test --format agent`: pass
- [x] `vendor/bin/phpstan analyse --memory-limit=2G`: **0 errors** (level 6)
- [x] `vendor/bin/rector --dry-run`: **0 changes**
- [x] `php artisan migrate`: all 9 migrations applied cleanly
- [x] `php artisan test --compact`: **182 tests, 179 passed, 3 skipped, 0 failed** (608 assertions)
  - 70 ImportDiscordMessage tests ✅
  - 13 ImportDiscordProfile tests ✅
  - 46 DiscordMessageAdapter tests ✅
- [x] Laravel framework 13.7.0 installed and working
- [ ] Smoke test `php artisan discord:import-messages` with a real dump
- [ ] Smoke test `php artisan discord:import-profiles` with JSON chunks
- [ ] Smoke test backup with `spatie/laravel-backup` v10

## Technical notes

### CodeRabbit false positive: `SimpleStrategy.php`

CodeRabbit flagged that `app/Tasks/Cleanup/Strategies/SimpleStrategy.php`
uses classes removed in `spatie/laravel-backup` v10. **False
positive:**

- `Spatie\Backup\BackupDestination\Backup` ✅ exists in v10.2.1
- `Spatie\Backup\BackupDestination\BackupCollection` ✅ exists in v10.2.1
- `CleanupStrategy::deleteOldBackups(BackupCollection)` ✅ signature
unchanged
- Sycorax (template) uses byte-for-byte identical code with backup v10

CodeRabbit confused the change to **events** (which became primitive data
in v10) with the cleanup strategy API (unchanged).

### Technical strategy

Cherry-pick would not work because of the conflict with the
`#[Fillable]`/`#[Table]` refactor in models that exist on both branches.
The solution was `git checkout origin/4.1 -- <paths>` in chunks, followed
by 4 manual merges:

- `Message.php`: recreated with the new ETL fields while keeping `protected
$fillable`
- `Voice.php`: same, plus a comment on the coexistence of two `state`
vocabularies
- `ActivityServiceProvider.php`: added the morphMap while keeping the
`src/` path (4.x did not move it to `src/Providers/`)
- `IntegrationDiscordServiceProvider.php`: registered the new commands at
the 4.x `src/` path

---------

Co-authored-by: Daniel Reis <danielhe4rt@gmail.com>
Co-authored-by: thalesmengue <102062680+thalesmengue@users.noreply.github.com>
Co-authored-by: 1pride <43507992+1pride@users.noreply.github.com>
Co-authored-by: kaster <diogokaster@gmail.com>