@waleedlatif1
Collaborator

Summary

  • use unpdf instead of pdf-parse

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Nov 15, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs | Ready | Preview | Comment | Nov 15, 2025 6:33am |

@greptile-apps
Contributor

greptile-apps bot commented Nov 15, 2025

Greptile Overview

Greptile Summary

Replaced pdf-parse library with unpdf for PDF text extraction, simplifying the API and reducing complexity.

  • Switched from class-based PDFParse instantiation to functional getDocumentProxy and extractText calls
  • Removed info and version metadata fields from return value (no downstream dependencies found)
  • Updated configuration files (next.config.ts, trigger.config.ts) to reference new library
  • Maintained backward compatibility with FileParseResult interface structure
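
The functional flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from `pdf-parser.ts`: the `PdfBackend` interface and the stub are hypothetical, introduced only so the sketch runs without the library installed, and the `source` value is an assumption. In real code the two backend methods would be the `getDocumentProxy` and `extractText` functions imported from `unpdf`.

```typescript
// Result shape assumed from the sequence diagram in this PR:
// {content, metadata: {pageCount, source}}. The info and version fields are gone.
interface FileParseResult {
  content: string
  metadata: { pageCount: number; source: string }
}

// Hypothetical abstraction over the two unpdf calls named in the summary.
interface PdfBackend<Doc> {
  getDocumentProxy(data: Uint8Array): Promise<Doc>
  extractText(
    doc: Doc,
    options: { mergePages: true }
  ): Promise<{ totalPages: number; text: string }>
}

async function parseBuffer<Doc>(
  backend: PdfBackend<Doc>,
  data: Uint8Array
): Promise<FileParseResult> {
  // Load the document, then extract all pages as one merged string.
  const pdf = await backend.getDocumentProxy(data)
  const { totalPages, text } = await backend.extractText(pdf, { mergePages: true })

  // Strip null characters from the extracted text.
  const content = text.replace(/\u0000/g, '')

  // 'unpdf' as the source label is an assumption, not taken from the PR.
  return { content, metadata: { pageCount: totalPages, source: 'unpdf' } }
}

// Stub backend standing in for unpdf (hypothetical, for illustration only).
const stubBackend: PdfBackend<{ bytes: number }> = {
  async getDocumentProxy(data) {
    return { bytes: data.length }
  },
  async extractText() {
    return { totalPages: 2, text: 'Hello\u0000 world' }
  },
}

parseBuffer(stubBackend, new Uint8Array([1, 2, 3])).then((result) => {
  console.log(result.content, result.metadata.pageCount)
})
```

Injecting the backend keeps the null-character cleanup and result mapping testable without a real PDF, which could be one way to cover the gap in automated tests noted in the confidence score.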

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - it's a straightforward dependency swap
  • Score reflects clean library migration with proper configuration updates. Minor concern about removed metadata fields (info, version) - while no current code uses them, this could affect future debugging or analytics needs. Manual testing was performed but automated tests weren't updated to verify the new implementation.
  • Check apps/sim/lib/file-parsers/pdf-parser.ts - verify that removing info and version metadata won't impact any analytics or debugging workflows

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| apps/sim/lib/file-parsers/pdf-parser.ts | 4/5 | Switched from pdf-parse to unpdf library, simplified API calls, removed info and version metadata fields |
| apps/sim/package.json | 5/5 | Removed pdf-parse and @types/pdf-parse dependencies, added unpdf 1.4.0 |
| apps/sim/next.config.ts | 5/5 | Updated serverExternalPackages from pdf-parse to unpdf |
| apps/sim/trigger.config.ts | 5/5 | Updated build configuration to include unpdf instead of pdf-parse in additional packages |
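
As a rough illustration of the `next.config.ts` change, the relevant option might look like the excerpt below. This is a hypothetical fragment, not the actual file: the real config has other options, and it would normally be typed as `NextConfig` and exported as the default export.

```typescript
// Hypothetical excerpt; only the option touched by this PR is shown.
const nextConfig = {
  // Packages listed here are left out of the server bundle and are
  // loaded with native Node.js require at runtime instead.
  serverExternalPackages: ['unpdf'], // previously listed 'pdf-parse'
}
```

The `trigger.config.ts` change is presumably the same kind of one-name swap in its list of additional packages.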

Sequence Diagram

sequenceDiagram
    participant Client
    participant PdfParser
    participant unpdf
    participant Buffer

    Client->>PdfParser: parseFile(filePath)
    PdfParser->>Buffer: readFile(filePath)
    Buffer-->>PdfParser: dataBuffer
    PdfParser->>PdfParser: parseBuffer(dataBuffer)
    PdfParser->>Buffer: new Uint8Array(dataBuffer)
    Buffer-->>PdfParser: uint8Array
    PdfParser->>unpdf: getDocumentProxy(uint8Array)
    unpdf-->>PdfParser: pdf
    PdfParser->>unpdf: extractText(pdf, {mergePages: true})
    unpdf-->>PdfParser: {totalPages, text}
    PdfParser->>PdfParser: cleanContent = text.replace(/\u0000/g, '')
    PdfParser-->>Client: {content, metadata: {pageCount, source}}

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

@waleedlatif1 waleedlatif1 merged commit d076750 into staging Nov 15, 2025
9 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/pdfs branch November 15, 2025 06:39
waleedlatif1 added a commit that referenced this pull request Nov 17, 2025
* test(pr): hackathon (#1999)

* test(pr): github trigger (#2000)

* fix(usage-indicator): conditional rendering, upgrade, and ui/ux (#2001)

* fix: usage-limit indicator and render conditionally on is billing enabled

* fix: upgrade render

* fix(notes): fix notes, tighten spacing, update deprecated zustand function, update use mention data to ignore block position (#2002)

* fix(pdfs): use unpdf instead of pdf-parse (#2004)

* fix(modals): fix z-index for various modals and output selector and variables (#2005)

* fix(condition): treat condition input the same as the code subblock (#2006)

* feat(models): added gpt-5.1 (#2007)

* improvement: runpath edges, blocks, active (#2008)

* feat(i18n): update translations (#2009)

* fix(triggers): check triggermode and consolidate block type (#2011)

* fix(triggers): disabled trigger shouldn't be added to dag (#2012)

* Fix disabled blocks

* Comments

* Fix api/chat trigger not found message

* fix(tags): only show start block upstream if is ancestor (#2013)

* fix(variables): Fix resolution on double < (#2016)

* Fix variable <>

* Ling

* Clean

* feat(billing): add notif for first failed payment, added upgrade email from free, updated providers that supported granular tool control to support them, fixed envvar popover, fixed redirect to wrong workspace after oauth connect (#2015)

* feat(billing): add notif for first failed payment, added upgrade email from free, updated providers that supported granular tool control to support them, fixed envvar popover, fixed redirect to wrong workspace after oauth connect

* fix build

* ack PR comments

* feat(performance): added reactquery hooks for workflow operations, for logs, fixed logs reloading, fix subscription UI (#2017)

* feat(performance): added reactquery hooks for workflow operations, for logs, fixed logs reloading, fix subscription UI

* use useInfiniteQuery for logs fetching

* fix(copilot): run workflow supports input format and fix run id (#2018)

* fix(router): fix error edge in router block + fix source handle problem (#2019)

* Fix router block error port handling

* Remove comment

* Fix edge execution

* improvement: code subblock, action bar, connections (#2024)

* improvement: action bar, connections

* fix: code block draggable resize

* fix(response): fix response block http format (#2027)

* Fix response block

* Lint

* fix(notes): fix notes block spacing, additional logs for billing transfer route (#2029)

---------

Co-authored-by: Vikhyath Mondreti <vikhyathvikku@gmail.com>
Co-authored-by: Emir Karabeg <78010029+emir-karabeg@users.noreply.github.com>
Co-authored-by: Siddharth Ganesan <33737564+Sg312@users.noreply.github.com>