Developer's "crazy ideas" and TODO checklist for the gory details. #296
This particular issue is For Developers Only
Please file a fresh, new issue if you see something you want to request as a feature or report a bug on or simply talk about.
Copy&paste the bit of text that's relevant, if you want.
The notorious "Mother Of All PRs" for Qiqqa. (Folks who've worked with me before, will know the feeling. 😉🤯 )
Why?
Because I don't want to swamp the issue tracker with the stuff I note, think about, or otherwise need to remind myself about at some later point in time, by which my brain will very probably have given up trying to track and manage it all.
a.k.a. "Notes To Self"
Observed Crappiness?
antique --> filed as "several PDFs caused Qiqqa to run indefinitely after closing it" (#305), for it happens almost every day now with my test repo 😩 😭
pdfdraw.exe still locks up on some particular 'evil PDFs' --> work is done in MuPDF repo (mutool multipurp): the multipurp tool is created to extract a metric ton of PDF metadata, including outlines, annotations and attachments, and dump all that to JSON, so we can easily go through this stuff picking up what we want/need at that particular moment.
\LaTeX and \kern TeX macros at least get their leading chars converted to Unicode, and that's plain wrong.
fingerprint:HASH (e.g. fingerprint:20359B18C8D6AC93F836962526FDC306118486C) doesn't work. Would be a handy debug/diag tool, while the help screen says fingerprint is recognized as a field. Well... (Also does not work in global search. Obviously.)
Crazy Ideas To Try?
mmap() is superb, but for this cross-platform stuff we'd better stick to named pipes or localhost TCP loopback. The latter is the most generic. Pipes would be great, but at least named pipes on Windows are visible outside the machine, thus posing a security risk unless I do something smart about it: https://docs.microsoft.com/en-us/windows/win32/ipc/named-pipes (see the bit there about NT AUTHORITY\NETWORK).
Be prepared to kill the bugger or expect the bastard to lock up or crash due to nasty PDF inputs once every couple of documents: feed the entire evil library through it, and then everything else you can grab off the Net.
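The "be prepared to kill the bugger" part can be sketched as a watchdog around the external PDF tool. This is only a rough illustration in Python, not how Qiqqa does it: `run_with_watchdog` is a hypothetical helper name, and the child command in the usage part is a stand-in, not a real PDF tool command line.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and return (status, stdout); status is 'ok', 'crashed' or 'timeout'."""
    try:
        # subprocess.run() kills the child itself when the timeout expires
        # and then re-raises TimeoutExpired, so no zombie is left behind.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode != 0:
        # Non-zero exit code: treat it as a crash on a nasty input.
        return "crashed", proc.stdout
    return "ok", proc.stdout


if __name__ == "__main__":
    # Stand-in child process that hangs, as an 'evil PDF' might make a tool hang.
    hanging_child = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_watchdog(hanging_child, timeout_s=1))
```

Feeding the entire evil library through such a wrapper gives a per-document verdict, so the lock-ups and crashes can be collected instead of stalling the whole run.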
Reminder: the SynFusion libs b0rked out with a HUGE memory leakage just today, and that was only because Qiqqa was doing a bit of annotations extraction via that one today. We got rid of SORAX, but SynFusion is on its way out too. 💢
- [x] --> mutool multipurp is a new tool in the mupdf palette, derived off mutool info and the mupdf gui app. multipurp dumps all available metadata for the PDF in JSON format. This includes attachments, annotations, etc.
see if I should revive my old mongoose clone for this, or grab another light, embeddable C/C++ web server that can do JSON and a whole lot more.
localhost for as long as Qiqqa is alive: started and stopped by Qiqqa, preferably.
process.exit() will immediately terminate any pending promises!
to disk (persistence)
to database (persistence in database records)
Keep in mind that we're considering NOT having everything in memory, i.e. query database on demand! This would benefit from faster data I/O. Nevertheless, for diagnostic purposes, it might be best to stay with a human-readable format such as JSON. Otherwise, see below for binary protocols (FlatBuffers, etc.)
between processes (NOT PERSISTED.)
This concerns processes where we have reasonable/full control of both ends: PDF processor, frontend (+ business logic layer? or do we have that one separate? If we use Chromely or electron, you're transferring data as messages between backend layer (C#/node) and browser/UI frontend anyway, so there is another interface comms layer, however you turn it).
Processes where we DO NOT have full control (or don't want to patch one side to gain control) will generally speak JSON (or XML?): SOLR/ES
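For the "between processes" case over localhost TCP loopback, a human-readable exchange could be one JSON document per line. The sketch below is an assumption about how that might look, not an existing Qiqqa protocol: the function names and the message fields (`cmd`, `doc`, `ok`, `echo`) are made up for illustration.

```python
import json
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read one JSON line, answer with one JSON line."""
    conn, _addr = server_sock.accept()
    with conn, conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        request = json.loads(stream.readline())
        stream.write(json.dumps({"ok": True, "echo": request}) + "\n")
        stream.flush()


def request_over_loopback(port, message):
    """Send one JSON message to 127.0.0.1:port and return the decoded reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(json.dumps(message) + "\n")
            stream.flush()
            return json.loads(stream.readline())


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Bind to the loopback address only, so nothing is reachable from outside
    # the machine; port 0 lets the OS pick a free port.
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
    worker.start()
    try:
        return request_over_loopback(port, {"cmd": "metadata", "doc": "example.pdf"})
    finally:
        worker.join(timeout=5)
        server_sock.close()


if __name__ == "__main__":
    print(demo())
```

Line-delimited JSON keeps the traffic easy to eyeball for diagnostics; swapping in a binary encoding later would only touch the two `json` calls on each side.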
msgpack (nah...)
Just some stuff about FlatBuffers:
electron or NW.js (here's some reasons why I would ride that one rather than electron for something like Qiqqa -- shoot! browser crash & links lost! 😭 Anyway, older: http://my2iu.blogspot.com/2017/06/nwjs-vs-electron.html and google stats: https://trends.google.com/trends/explore?q=nwjs,Electron%20js,Chromely,nw.js) or Chromely?
nodejs, backend layer is C#. Which is fine and might help me moving, as the overall "business glue" logic wouldn't need to be rewritten from scratch. That might save a bundle.
fravia back in the day.)
scandir() tek for the Watch Directories: that's an optimized glob which supports .gitignore, etc. in there!