Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inefficiencies in auto-linking code #16506

Merged
merged 1 commit into from
Jul 15, 2021

Conversation

ClearlyClaire
Copy link
Contributor

The auto-linking code basically rewrote the whole string escaping non-ascii characters in an inefficient way, and building a full character offset map between the unescaped and escaped texts before sending the contents to TwitterText's extractor.

Instead of doing that, this commit changes the TwitterText regexps to include valid IRI characters in addition to valid URI characters.

Performance yields

Methodology

Performances were evaluated by timing the rendering of a sample toot 100000 times on my development machine. Memory impact was evaluated using MemoryProfiler.

Example on old code
[mastodon@mastodev live]$ bundle exec rails c
Chewy console strategy is `urgent`
Loading production environment (Rails 6.1.4)
irb(main):001:0> s = Status.first
=> 
#<Status:0x0000564fdf4273d8
...
irb(main):002:0> s.text
=> "This is a test toot with a few #hashtags, emojis 🤔  and links, some including non-ascii characters and others not:\n- https://joinmastodon.org/\n- https://github.com/mastodon/mastodon\n- https://nic.みんな/\n#mastodev"
irb(main):003:0> Formatter.instance.format(s)
=> "<p>This is a test toot with a few <a href=\"https://mastodev.sitedethib.com/tags/hashtags\" class=\"mention hashtag\" rel=\"tag\">#<span>hashtags</span></a>, emojis 🤔  and links, some including non-ascii characters and others not:<br />- <a href=\"https://joinmastodon.org/\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"\">joinmastodon.org/</span><span class=\"invisible\"></span></a><br />- <a href=\"https://github.com/mastodon/mastodon\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"\">github.com/mastodon/mastodon</span><span class=\"invisible\"></span></a><br />- <a href=\"https://nic.みんな/\" rel=\"nofollow noopener noreferrer\" target=\"_blank\"><span class=\"invisible\">https://</span><span class=\"\">nic.みんな/</span><span class=\"invisible\"></span></a><br /><a href=\"https://mastodev.sitedethib.com/tags/mastodev\" class=\"mention hashtag\" rel=\"tag\">#<span>mastodev</span></a></p>"
irb(main):004:0> puts Benchmark.measure { for i in 1..100000; Formatter.instance.format(s); end }
114.277594   0.005209 114.282803 (114.291816)
=> nil
irb(main):005:0> require 'memory_profiler'
=> true
irb(main):007:0> MemoryProfiler.report { Formatter.instance.format(s) }.pretty_print
Total allocated: 140218 bytes (1637 objects)
Total retained:  6011 bytes (14 objects)

allocated memory by gem
-----------------------------------
     70949  htmlentities-4.3.4
     21833  live/app
     14025  activesupport-6.1.4
     12703  addressable-2.8.0
     10693  twitter-text-3.1.0
      5984  actionpack-6.1.4
      2831  actionview-6.1.4
      1080  activemodel-6.1.4
       120  other
[…]

Results

Timing

user system total (real)
old code 114.277594s 0.005209s 114.282803s 114.291816s
this PR 103.606800s 0.001257s 103.608057s 103.615692s

Memory allocations

total allocated total retained
old code 140218 bytes (1637 objects) 6011 bytes (14 objects)
this PR 117140 bytes (1208 objects) 5724 bytes (13 objects)

The auto-linking code basically rewrote the whole string escaping non-ascii
characters in an inefficient way, and building a full character offset map
between the unescaped and escaped texts before sending the contents to
TwitterText's extractor.

Instead of doing that, this commit changes the TwitterText regexps to include
valid IRI characters in addition to valid URI characters.
@ClearlyClaire ClearlyClaire marked this pull request as ready for review July 15, 2021 13:33
@Gargron Gargron merged commit 211d5c3 into mastodon:main Jul 15, 2021
Gargron pushed a commit that referenced this pull request Nov 5, 2021
The auto-linking code basically rewrote the whole string escaping non-ascii
characters in an inefficient way, and building a full character offset map
between the unescaped and escaped texts before sending the contents to
TwitterText's extractor.

Instead of doing that, this commit changes the TwitterText regexps to include
valid IRI characters in addition to valid URI characters.
kadoshita pushed a commit to kadoshita/mastodon that referenced this pull request Nov 6, 2021
The auto-linking code basically rewrote the whole string escaping non-ascii
characters in an inefficient way, and building a full character offset map
between the unescaped and escaped texts before sending the contents to
TwitterText's extractor.

Instead of doing that, this commit changes the TwitterText regexps to include
valid IRI characters in addition to valid URI characters.
kadoshita pushed a commit to kadoshita/mastodon that referenced this pull request Nov 7, 2021
The auto-linking code basically rewrote the whole string escaping non-ascii
characters in an inefficient way, and building a full character offset map
between the unescaped and escaped texts before sending the contents to
TwitterText's extractor.

Instead of doing that, this commit changes the TwitterText regexps to include
valid IRI characters in addition to valid URI characters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants