SwiGLU optimized fw/bw #490
Commits on Oct 24, 2022
- 069405e
- 4b317c6
Commits on Oct 25, 2022
- 11bad90
- 8b2f688
- e1609de
- 30ca17c
- eb9c553
- ed2b7c2
e758435 - Update on "SwiGLU optimized fw/bw" (danthe3rd committed Oct 25, 2022)

**USAGE**

```python
import xformers.ops as xops

# NOTE: Important to use `unbind` from xformers for the bw pass!
w1, w2 = xops.unbind(
    w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]),
    dim=0,
)
b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0)
y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3, op=xops.SwiGLUFusedOp)
```

[ghstack-poisoned]
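For context, the op being fused computes a standard SwiGLU feed-forward block: a gate projection passed through SiLU, multiplied elementwise by a value projection, then an output projection. A minimal eager-mode sketch, using NumPy for illustration (the real op works on PyTorch tensors; the `nn.Linear`-style `[out_features, in_features]` weight layout and the shapes below are assumptions, not taken from the PR):

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_reference(x, w1, b1, w2, b2, w3, b3):
    # Gate and value projections, elementwise product, output projection.
    # Weights follow an nn.Linear-style [out_features, in_features] layout.
    gate = x @ w1.T + b1       # [B, hidden]
    value = x @ w2.T + b2      # [B, hidden]
    return (silu(gate) * value) @ w3.T + b3  # [B, out]

# Tiny example with hypothetical shapes: B=4, in=8, hidden=16, out=8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1, b1 = rng.standard_normal((16, 8)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 8)), np.zeros(16)
w3, b3 = rng.standard_normal((8, 16)), np.zeros(8)
y = swiglu_reference(x, w1, b1, w2, b2, w3, b3)
```

The fused kernels in this PR avoid materializing the intermediate `gate` and `value` activations separately in global memory, which is where the savings over eager execution come from.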
3207254 - danthe3rd committed Oct 25, 2022 (message unchanged)
dbf6092 - danthe3rd committed Oct 25, 2022 (message unchanged)
Commits on Oct 26, 2022
acdf239 - danthe3rd committed Oct 26, 2022 (message unchanged)
bbdc00e - Update on "SwiGLU optimized fw/bw" (danthe3rd committed Oct 26, 2022; the usage example now selects `SwiGLUPackedFusedOp`)

**USAGE**

```python
import xformers.ops as xops

# NOTE: Important to use `unbind` from xformers for the bw pass!
w1, w2 = xops.unbind(
    w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]),
    dim=0,
)
b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0)
y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3, op=xops.SwiGLUPackedFusedOp)
```

[ghstack-poisoned]
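The usage above unpacks a single concatenated `w1w2` buffer into two weight views; the inverse (building the packed buffer from two separate projection weights) is just a concatenation along the output dimension. A NumPy sketch of the round trip, with hypothetical shapes:

```python
import numpy as np

hidden, in_features = 16, 8
rng = np.random.default_rng(0)
w1_orig = rng.standard_normal((hidden, in_features))
w2_orig = rng.standard_normal((hidden, in_features))

# Pack: one contiguous [2*hidden, in_features] buffer holding both projections.
w1w2 = np.concatenate([w1_orig, w2_orig], axis=0)

# Unpack: mirrors the usage above -- view as [2, hidden, in] and split on dim 0.
w1, w2 = w1w2.reshape(2, w1w2.shape[0] // 2, w1w2.shape[1])

assert np.array_equal(w1, w1_orig) and np.array_equal(w2, w2_orig)
```

Note that the unpacked halves are views sharing memory with `w1w2` (no copy). That sharing is presumably why the custom `xops.unbind` matters for the backward pass: it lets gradients for both halves accumulate into the one packed buffer instead of two separate tensors.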
5fe54aa - danthe3rd committed Oct 26, 2022 (message unchanged)
44a6fbf - danthe3rd committed Oct 26, 2022 (message unchanged)
d3e3089 - danthe3rd committed Oct 26, 2022 (message unchanged)
Commits on Oct 27, 2022
db5770d - danthe3rd committed Oct 27, 2022 (message unchanged)
Commits on Oct 28, 2022
4c2bfdc - danthe3rd committed Oct 28, 2022 (message unchanged)
d2d0187 - danthe3rd committed Oct 28, 2022 (message unchanged)
e2d97d2 - danthe3rd committed Oct 28, 2022 (message unchanged)
7224112 - danthe3rd committed Oct 28, 2022 (message unchanged)
06c1487 - danthe3rd committed Oct 28, 2022 (message unchanged)
783a2ff - danthe3rd committed Oct 28, 2022 (message unchanged)
69e299f - Update on "SwiGLU optimized fw/bw" (danthe3rd committed Oct 28, 2022; adds a note and benchmark results)

**NOTE** We can improve a bit more once this is fixed: NVIDIA/cutlass#674

**USAGE**

```python
import xformers.ops as xops

# NOTE: Important to use `unbind` from xformers for the bw pass!
w1, w2 = xops.unbind(
    w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]),
    dim=0,
)
b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0)
y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3, op=xops.SwiGLUPackedFusedOp)
```

**PERFORMANCE (A100 only)**

*FW*

```
[---------------------------------- swiglu_fw ----------------------------------]
                                     |  SwiGLUPackedFusedOp[fused.p.cpp]  |  eager  |  SwiGLUFusedOp[fused]
1 threads: ----------------------------------------------------------------------
      f16    B=9456, I=1536, H=4096  |  1377.7  |  1581.4  |  1339.1
      f16.ac B=9456, I=1536, H=4096  |  1449.3  |  1735.3  |  1462.9
      f16    B=4440, I=1536, H=4096  |   600.4  |   735.6  |   593.9
      f16.ac B=4440, I=1536, H=4096  |   709.0  |   843.7  |   717.6
      f16    B=4728, I=1536, H=4096  |   638.9  |   776.2  |   635.3
      f16.ac B=4728, I=1536, H=4096  |   748.9  |   892.2  |   756.7
      f16    B=4728, I=1536, H=1024  |   162.3  |   201.5  |   163.1
      f16.ac B=4728, I=1536, H=1024  |   235.2  |   277.4  |   245.5

Times are in microseconds (us).
```

*BW*

```
[---------------------------------- swiglu_bw ----------------------------------]
                                     |  SwiGLUPackedFusedOp[fused.p.cpp]  |  eager  |  SwiGLUFusedOp[fused]
1 threads: ----------------------------------------------------------------------
      f16    B=9456, I=1536, H=4096  |  2333.1  |  2696.7  |  2336.1
      f16.ac B=9456, I=1536, H=4096  |  2620.8  |  2990.9  |  2840.0
      f16    B=4440, I=1536, H=4096  |  1243.2  |  1413.8  |  1240.3
      f16.ac B=4440, I=1536, H=4096  |  1448.6  |  1629.0  |  1637.3
      f16    B=4728, I=1536, H=4096  |  1298.4  |  1481.5  |  1301.1
      f16.ac B=4728, I=1536, H=4096  |  1511.8  |  1705.3  |  1705.4
      f16    B=4728, I=1536, H=1024  |   463.3  |   493.9  |   463.0
      f16.ac B=4728, I=1536, H=1024  |   582.4  |   614.9  |   672.7

Times are in microseconds (us).
```

[ghstack-poisoned]
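To put the benchmark tables in perspective, the packed fused op's gain over eager is a simple ratio of the reported times. For the first FW row, for example:

```python
# Times (us) from the FW table above: f16, B=9456, I=1536, H=4096
eager_us = 1581.4
packed_us = 1377.7

speedup = eager_us / packed_us
print(f"fused packed op is {speedup:.2f}x faster than eager")  # about 1.15x
```

The same ratio across the tables lands roughly in the 1.05x-1.25x range, with the largest relative gains on the forward pass at the smaller hidden size (H=1024).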
f6e2ceb - danthe3rd committed Oct 28, 2022 (message unchanged)
538d05c - danthe3rd committed Oct 28, 2022 (message unchanged)
0ab305f - danthe3rd committed Oct 28, 2022 (message unchanged)
c67a0ad - danthe3rd committed Oct 28, 2022 (message unchanged)
Commits on Oct 31, 2022
a77aeec - danthe3rd committed Oct 31, 2022 (message unchanged)
4b600bf - Update on "SwiGLU optimized fw/bw" (danthe3rd committed Oct 31, 2022; the usage example no longer passes an explicit `op=`)

**NOTE** We can improve a bit more once this is fixed: NVIDIA/cutlass#674

**USAGE**

```python
import xformers.ops as xops

# NOTE: Important to use `unbind` from xformers for the bw pass!
w1, w2 = xops.unbind(
    w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]),
    dim=0,
)
b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0)
y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3)
```

The **PERFORMANCE (A100 only)** tables are the same as in 69e299f.

[ghstack-poisoned]
Commits on Nov 3, 2022
dd6a285 - danthe3rd committed Nov 3, 2022 (message unchanged)
Commits on Nov 4, 2022
-
Update on "SwiGLU optimized fw/bw"
**NOTE** We can improve a bit more once this is fixed - NVIDIA/cutlass#674 **USAGE** ```python import xformers.ops as xops # NOTE: Important to use `unbind` from xformers for the bw pass! w1, w2 = xops.unbind( w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]), dim=0, ) b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0) y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3) ``` **PERFORMANCE (A100 only)** *FW* ``` [-------------------------------------------------------- swiglu_fw ---------------------------------------------------------] | SwiGLUPackedFusedOp[fused.p.cpp] | eager | SwiGLUFusedOp[fused] 1 threads: ------------------------------------------------------------------------------------------------------------------- f16 B=9456, I=1536, H=4096 | 1377.7 | 1581.4 | 1339.1 f16.ac B=9456, I=1536, H=4096 | 1449.3 | 1735.3 | 1462.9 f16 B=4440, I=1536, H=4096 | 600.4 | 735.6 | 593.9 f16.ac B=4440, I=1536, H=4096 | 709.0 | 843.7 | 717.6 f16 B=4728, I=1536, H=4096 | 638.9 | 776.2 | 635.3 f16.ac B=4728, I=1536, H=4096 | 748.9 | 892.2 | 756.7 f16 B=4728, I=1536, H=1024 | 162.3 | 201.5 | 163.1 f16.ac B=4728, I=1536, H=1024 | 235.2 | 277.4 | 245.5 Times are in microseconds (us). 
``` *BW* ``` [-------------------------------------------------------- swiglu_bw ---------------------------------------------------------] | SwiGLUPackedFusedOp[fused.p.cpp] | eager | SwiGLUFusedOp[fused] 1 threads: ------------------------------------------------------------------------------------------------------------------- f16 B=9456, I=1536, H=4096 | 2333.1 | 2696.7 | 2336.1 f16.ac B=9456, I=1536, H=4096 | 2620.8 | 2990.9 | 2840.0 f16 B=4440, I=1536, H=4096 | 1243.2 | 1413.8 | 1240.3 f16.ac B=4440, I=1536, H=4096 | 1448.6 | 1629.0 | 1637.3 f16 B=4728, I=1536, H=4096 | 1298.4 | 1481.5 | 1301.1 f16.ac B=4728, I=1536, H=4096 | 1511.8 | 1705.3 | 1705.4 f16 B=4728, I=1536, H=1024 | 463.3 | 493.9 | 463.0 f16.ac B=4728, I=1536, H=1024 | 582.4 | 614.9 | 672.7 Times are in microseconds (us). ``` [ghstack-poisoned]
danthe3rd committedNov 4, 2022 Configuration menu - View commit details
-
Copy full SHA for d825314 - Browse repository at this point
Copy the full SHA d825314View commit details -
Update on "SwiGLU optimized fw/bw"

**NOTE** We can improve a bit more once this is fixed - NVIDIA/cutlass#674

**USAGE**
```python
import xformers.ops as xops

# NOTE: Important to use `unbind` from xformers for the bw pass!
w1, w2 = xops.unbind(
    w1w2.view([2, w1w2.shape[0] // 2, w1w2.shape[1]]),
    dim=0,
)
b1, b2 = xops.unbind(b1b2.view([2, b1b2.shape[0] // 2]), dim=0)
y = xops.functional_swiglu(x, w1, b1, w2, b2, w3, b3)
```

**PERFORMANCE (A100 only)**

*FW*
```
[-------------------------------------------------------- swiglu_fw ---------------------------------------------------------]
                                      |  SwiGLUPackedFusedOp[fused.p.cpp]  |  eager  |  SwiGLUFusedOp[fused]
1 threads: -------------------------------------------------------------------------------------------------------------------
      f16    B=9456, I=1536, H=4096   |               1377.7               | 1581.4  |        1339.1
      f16.ac B=9456, I=1536, H=4096   |               1449.3               | 1735.3  |        1462.9
      f16    B=4440, I=1536, H=4096   |                600.4               |  735.6  |         593.9
      f16.ac B=4440, I=1536, H=4096   |                709.0               |  843.7  |         717.6
      f16    B=4728, I=1536, H=4096   |                638.9               |  776.2  |         635.3
      f16.ac B=4728, I=1536, H=4096   |                748.9               |  892.2  |         756.7
      f16    B=4728, I=1536, H=1024   |                162.3               |  201.5  |         163.1
      f16.ac B=4728, I=1536, H=1024   |                235.2               |  277.4  |         245.5

Times are in microseconds (us).
```

*BW*
```
[-------------------------------------------------------- swiglu_bw ---------------------------------------------------------]
                                      |  SwiGLUPackedFusedOp[fused.p.cpp]  |  eager  |  SwiGLUFusedOp[fused]
1 threads: -------------------------------------------------------------------------------------------------------------------
      f16    B=9456, I=1536, H=4096   |               2333.1               | 2696.7  |        2336.1
      f16.ac B=9456, I=1536, H=4096   |               2620.8               | 2990.9  |        2840.0
      f16    B=4440, I=1536, H=4096   |               1243.2               | 1413.8  |        1240.3
      f16.ac B=4440, I=1536, H=4096   |               1448.6               | 1629.0  |        1637.3
      f16    B=4728, I=1536, H=4096   |               1298.4               | 1481.5  |        1301.1
      f16.ac B=4728, I=1536, H=4096   |               1511.8               | 1705.3  |        1705.4
      f16    B=4728, I=1536, H=1024   |                463.3               |  493.9  |         463.0
      f16.ac B=4728, I=1536, H=1024   |                582.4               |  614.9  |         672.7

Times are in microseconds (us).
```
[ghstack-poisoned]
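For context, the unfused computation that the `eager` column above measures, and that the fused kernels replace, is the standard SwiGLU MLP: two parallel projections, a SiLU-gated product, then an output projection. The sketch below is a minimal NumPy reference, not the xformers implementation: the `[out, in]` weight layout (à la `nn.Linear`) and the tiny shapes are assumptions, and the packed `w1w2` split mirrors the `view` + `unbind` trick from the usage snippet.

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_reference(x, w1, b1, w2, b2, w3, b3):
    """Unfused SwiGLU reference. Weights are stored as [out, in],
    matching nn.Linear's convention (an assumption here)."""
    x1 = x @ w1.T + b1      # gate branch
    x2 = x @ w2.T + b2      # value branch
    hidden = silu(x1) * x2  # gated hidden activations
    return hidden @ w3.T + b3

# Tiny stand-ins for the B, I, H sizes in the tables above
rng = np.random.default_rng(0)
B, I, H = 4, 8, 16
x = rng.standard_normal((B, I))
w1, b1 = rng.standard_normal((H, I)), np.zeros(H)
w2, b2 = rng.standard_normal((H, I)), np.zeros(H)
w3, b3 = rng.standard_normal((I, H)), np.zeros(I)

y = swiglu_reference(x, w1, b1, w2, b2, w3, b3)
assert y.shape == (B, I)

# The packed-weight split from the usage snippet, in NumPy terms:
# stacking w1 and w2 along dim 0, then viewing as [2, H, I] and
# splitting, recovers the original weights.
w1w2 = np.concatenate([w1, w2], axis=0)  # shape [2*H, I]
w1s, w2s = w1w2.reshape(2, H, I)
assert np.allclose(w1s, w1) and np.allclose(w2s, w2)
```

Packing `w1`/`w2` (and `b1`/`b2`) this way lets a single GEMM produce both branches, which is where part of the fused op's speedup over eager comes from.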
danthe3rd committed Nov 4, 2022 (commit e2bfbb2)
-
Update on "SwiGLU optimized fw/bw"
danthe3rd committed Nov 4, 2022 (commit 07135b8)
Commits on Nov 7, 2022
-
Update on "SwiGLU optimized fw/bw"
danthe3rd committed Nov 7, 2022 (commit 3490242)
Commits on Nov 10, 2022
-
Update on "SwiGLU optimized fw/bw"
danthe3rd committed Nov 10, 2022 (commit a90fe49)