RegularParser fails with memory exhausted errors #121

Open
ViliusS opened this issue Apr 18, 2024 · 0 comments
I have two cases where the shortcode-core plugin fails while generating the search index for the tntsearch plugin.

sh-5.1$ bin/plugin tntsearch index

Re-indexing

PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php on line 339

Based on the previous issues in #53 and the code in thunderer/Shortcode#71, I have prepared two reproducible test cases: https://1drv.ms/f/s!AgnMn-haWyFrkcN50beuEO4A0m6PQw?e=GjwcEV

Test case 1 - HTML content
Test case 2 - Markdown content

From the Xdebug traces provided, you will see that the preg_match_all() call in the Thunderer Shortcode library uses almost 30MB when parsing test case 1. For test case 2 it is almost 80MB!
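To illustrate why this call can be memory-hungry, here is a rough Python sketch of the same tokenizing idea. It is only an approximation under stated assumptions: the pattern is a hand port of the one quoted in the trace (PCRE's `(?<name>...)` rewritten as `(?P<name>...)`, the outer unnamed group dropped), the sample row is a hypothetical shortened line modeled on the Markdown table, and Python's `re` module is not PCRE. It shows how many separate matches a short line produces; in PHP, `$flags = 258` is `PREG_SET_ORDER | PREG_OFFSET_CAPTURE`, so each of those matches also carries its capture groups together with their offsets.

```python
import re

# Hand port of the tokenizer pattern quoted in the Xdebug trace (an approximation,
# not the library's code). re.DOTALL stands in for the PCRE "s" modifier.
TOKEN_PATTERN = re.compile(
    r'(?P<string>\\.|(?:(?!\[|\]|/|=|"|\s+).)+)'
    r'|(?P<ws>\s+)'
    r'|(?P<marker>/)'
    r'|(?P<delimiter>")'
    r'|(?P<separator>=)'
    r'|(?P<open>\[)'
    r'|(?P<close>\])',
    re.DOTALL,
)


def tokenize(subject):
    """Return one (kind, text, offset) triple per regex match."""
    return [
        (m.lastgroup, m.group(), m.start())
        for m in TOKEN_PATTERN.finditer(subject)
    ]


# Hypothetical sample row, shortened from the Markdown table shown in the trace.
sample = "| [I02_DKH](#i02_dkh-dk-hederis)   | DK hederis   |\n"
tokens = tokenize(sample)

print(len(sample), "characters ->", len(tokens), "tokens")
for kind, text, offset in tokens[:6]:
    print(offset, kind, repr(text))
```

Every character of the input ends up in some token, so the number of matches grows with page size, and a large table page yields a very large result array.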

I'm not sure why the TNT Search command line parses our page as HTML in one case but as Markdown in the other. All of our pages are stored as Markdown files on disk. Maybe it has something to do with the HTML cache.
Anyway, this is what I see in the full Xdebug trace when running "bin/plugin tntsearch index", so the minimal reproducible cases in the ZIP files are prepared accordingly.

Snippet from the full Xdebug session of HTML page parsing:

   34.0595   96308304                                                         -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '<table>\n<thead>\n<tr>\n<th>Užduotis</th>\n<th>Aprašymas</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/darbuotoju_aprasymas_avanso_ismokejimas#1-uzduotis" target="_blank" rel="nofollow noopener noreferrer" class="external-link no-image">1 užduotis</a></td>\n<td>Kalendorius. Naujo kalendoriaus sukūrimas ir pildymas.<br/>Prieššventinės dienos sutrumpinimas</td>\n</tr>\n<tr>\n<td><a href="https://gidas.rivile.lt/rivile_akademija/rivile_gama/x_pamoka/'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
   34.0672  126197600                                                         -> preg_last_error() /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:340
  

Snippet from the full Xdebug session of Markdown page parsing:

   21.4159   83613968                                                           -> preg_match_all($pattern = '~((?<string>\\\\.|(?:(?!\\[|\\]|\\/|\\=|\\"|\\s+).)+)|(?<ws>\\s+)|(?<marker>\\/)|(?<delimiter>\\")|(?<separator>\\=)|(?<open>\\[)|(?<close>\\]))~us', $subject = '| Kodas                                                        | Pavadinimas                                               |\n| ------------------------------------------------------------ | --------------------------------------------------------- |\n| [I01_DKZR](#i01_dkzr-dk-žurnalų-sąrašas)                     | DK žurnalų sąrašas                                        |\n| [I02_DKH](#i02_dkh-dk-hederis)                               | DK hederis                                                |\n| [I'..., $matches = NULL, $flags = 258) /opt/app-root/src/user/plugins/shortcode-core/vendor/thunderer/shortcode/src/Parser/RegularParser.php:339
   21.4462  131903672
TRACE END   [2024-04-18 08:57:42.623960]
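For reference, the memory figures in the two snippets above can be compared against the 134217728-byte limit from the fatal error. This is a small sketch that uses only the numbers quoted in this issue (the second column of each trace line, taken to be memory in bytes); the totals in the full traces and in the ZIP test cases may differ.

```python
MEMORY_LIMIT = 134217728  # bytes, from the PHP fatal error message
MIB = 1024 * 1024

# (memory on the preg_match_all line, memory on the following trace line), in bytes
snippets = {
    "HTML": (96308304, 126197600),
    "Markdown": (83613968, 131903672),
}


def summarize(before, after, limit=MEMORY_LIMIT):
    """Return (growth across the call, headroom left under the limit) in bytes."""
    return after - before, limit - after


for name, (before, after) in snippets.items():
    growth, headroom = summarize(before, after)
    print(f"{name}: grew {growth / MIB:.1f} MiB, {headroom / MIB:.1f} MiB below the limit")
```

In both snippets the process is already within a few MiB of the limit on the line following the preg_match_all() call.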

Sadly, the full Xdebug trace is very large, so it would be difficult to share.

I hope this is enough information to fix the problem.
