
Use proper GeLU on CPU #441

Open · wants to merge 1 commit into master
Commits on May 23, 2024

  1. Use proper GeLU on CPU

    This change removes the tanh GeLU approximation in favor of the
    exact erf-based formulation. This gives us better accuracy,
    roughly equal performance, and strict standard conformance, since
    we no longer need any compiler-specific tricks.
    
    Here's the last lines of train_gpt2 output before this change:
    
        step 37: train loss 3.739647 (took 598.548076 ms)
        step 38: train loss 4.611735 (took 596.626145 ms)
        step 39: train loss 3.970751 (took 598.439552 ms)
        val loss 4.016658
        generating:
        ---
        Come Running Away,
        Greater conquer
        With the Imperial blood
        the heaviest host of the gods
        into this wondrous world beyond.
        I will not back thee, for how sweet after birth
        Netflix against repounder,
        will not
        flourish against the earlocks of
        Allay
        ---
        step 40: train loss 4.377756 (took 592.704936 ms)
    
    Here's the last lines of train_gpt2 output after this change:
    
        step 37: train loss 3.731596 (took 594.893995 ms)
        step 38: train loss 4.561646 (took 600.064035 ms)
        step 39: train loss 3.933512 (took 599.666173 ms)
        val loss 4.014135
        generating:
        ---
        Whether Hipocrates,
        Bigon Nicinius, or rep'd
        With Thy fair winter-tail your outraged hand,
        The richness of the good smour
        Nine years by turns covered my Member. Thou art
        Nay, I fear be; but
        Lets o' thee know, if it
        ---
        step 40: train loss 4.358461 (took 597.594065 ms)
    
    This change has the disadvantage of diverging from PyTorch. I view
    this as being justified and worthwhile, for numerous reasons, e.g.
    
      "I used the tanh approximation simply because the error function
       erf was slow in tensorflow some years ago. If the exact version
       is fast enough now and does not have numerical issues, I do not
       see a reason to use an inexact version."  ──Quoth Dan Hendrycks
    
    See pytorch/pytorch#39853
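    The two formulations can be sketched as follows. This is a minimal
    illustration, not the actual llm.c code; the function names and the
    comparison loop are made up for this example. The exact GeLU is
    0.5·x·(1 + erf(x/√2)), while the tanh approximation being removed
    is 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))).

    ```c
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* tanh approximation (the version this change removes) */
    float gelu_tanh(float x) {
        float scale = sqrtf(2.0f / (float)M_PI);
        float cube = 0.044715f * x * x * x;
        return 0.5f * x * (1.0f + tanhf(scale * (x + cube)));
    }

    /* exact GeLU via the error function (the version this change adds) */
    float gelu_exact(float x) {
        return 0.5f * x * (1.0f + erff(x / sqrtf(2.0f)));
    }

    int main(void) {
        /* compare the two over a small range of inputs */
        for (float x = -3.0f; x <= 3.0f; x += 1.0f) {
            printf("x=%+.1f  tanh=%.6f  exact=%.6f  diff=%+.1e\n",
                   x, gelu_tanh(x), gelu_exact(x),
                   gelu_exact(x) - gelu_tanh(x));
        }
        return 0;
    }
    ```

    The difference between the two is small for moderate inputs (on the
    order of 1e-3 or less), which is consistent with the nearly identical
    loss curves in the logs above.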
    jart committed May 23, 2024
    Commit 34e2a98