In the [GPT-3 paper](https://arxiv.org/abs/2005.14165), [OpenAI](https://openai.com/) introduces a massive new NLP machine learning model with a whopping $175 \cdot 10^9$ parameters and, in a single paragraph, hints at the energy required to train it:
    
    "Practical large-scale pre-training requires large amounts of computation, which is energy-intensive: training the GPT-3 175B consumed several thousand petaflop/s-days of compute during pre-training"

With this rough estimate and some public information on (best-case) energy efficiency for large-scale computation, I tried to estimate the actual energy requirement, using Julia with the `Unitful` package as a unit-aware calculator. The calculation and its result are shown below.
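
The estimate reduces to a single division. As a sketch, write $C$ for the training compute and $\eta$ for the energy efficiency (both symbols are shorthand introduced here, not notation from the paper). The energy $E$ and its units then work out as:

$$E = \frac{C}{\eta}, \qquad \frac{[\mathrm{FLOP/s}] \cdot [\mathrm{s}]}{[\mathrm{FLOP/s}] \, / \, [\mathrm{W}]} = [\mathrm{W \cdot s}] = [\mathrm{J}]$$

A petaflop/s-day is a rate multiplied by a time, so it counts operations, and FLOPS per watt is the same thing as operations per joule. Dividing one by the other leaves an energy.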

using Unitful
Unitful.register(@__MODULE__);
using Unitful.DefaultSymbols # default SI units
using Unitful: d, hr # days and hours
# Custom units; the trailing `true` generates all SI prefixes (GFLOPS, PFLOPS, GWh, ...).
@unit FLOPS "FLOPS" FLoatingPointOpPerSecond 1/s true
@unit Wh "Wh" WattHour 1*W*hr true

# From the paper: "training the GPT-3 175B consumed several thousand
# petaflop/s-days of compute during pre-training".
# Low estimate: read "several thousand" as 2000 petaflop/s-days.
PFLOPSdays = 2000PFLOPS*1d

# Best-case performance per watt, from
# https://en.wikipedia.org/wiki/Performance_per_watt
GFLOPSPerWatt = 16.876GFLOPS/W
# Efficiency of the Summit supercomputer, for comparison
SummitEquiv = 14.668GFLOPS/W

# Calculate the total Wh required.
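function totalWh(PFLOPSdays, GFLOPSPerWatt; doprint=false)
    # Express the compute budget in PFLOPS·hours.
    PetaFLOPSHours = uconvert(PFLOPS*hr, PFLOPSdays)
    doprint && println(PetaFLOPSHours)
    # Express the efficiency in PFLOPS per watt so the FLOPS units cancel.
    PFLOPSPerWatt = uconvert(PFLOPS/W, GFLOPSPerWatt)
    doprint && println(PFLOPSPerWatt)
    # (PFLOPS·hr) / (PFLOPS/W) = W·hr, reported in GWh.
    return uconvert(GWh, PetaFLOPSHours / PFLOPSPerWatt)
end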

println("Best case efficiency:")
totWh2k = totalWh(PFLOPSdays,GFLOPSPerWatt,doprint=true)
println(totWh2k)

println("Summit equivalent efficiency:")

totWh2kSummit = totalWh(PFLOPSdays,SummitEquiv,doprint=true)
println(totWh2kSummit)

Best case efficiency:
48000 PFLOPS hr
1.6876e-5 PFLOPS W⁻¹
2.8442758947617923 GWh
Summit equivalent efficiency:
48000 PFLOPS hr
1.4668e-5 PFLOPS W⁻¹
3.27242977911099 GWh
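
The figures above can be cross-checked with plain floating-point arithmetic, without any units package. The following is a minimal sketch: the function and constant names are illustrative and not part of the original notebook, and the inputs are the same 2000 petaflop/s-days and GFLOPS-per-watt values used above.

```julia
# Plain-number cross-check of the Unitful result above.
# Names here are illustrative, not from the original notebook.

const SECONDS_PER_DAY = 86_400.0
const JOULES_PER_GWH = 3.6e12   # 1 GWh = 1e9 W * 3600 s

# Energy in GWh for a compute budget in petaflop/s-days and an
# efficiency in GFLOPS per watt (equivalently, 1e9 FLOP per joule).
function energy_gwh(pflops_days, gflops_per_watt)
    total_flop = pflops_days * 1e15 * SECONDS_PER_DAY   # total operations
    joules = total_flop / (gflops_per_watt * 1e9)       # FLOP / (FLOP/J)
    return joules / JOULES_PER_GWH
end

println(energy_gwh(2000, 16.876))  # best-case efficiency
println(energy_gwh(2000, 14.668))  # Summit-equivalent efficiency
```

Both values should agree with the roughly 2.84 GWh and 3.27 GWh printed above, up to floating-point rounding.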


So even **with best-case energy efficiency, this model needed almost 3 GWh to be trained**. For comparison, that is roughly the energy a typical nuclear power plant produces in 3 hours, assuming about 1 GW of electrical output.
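
The power plant comparison is a single division, sketched below. The 1 GW reactor output is a round figure assumed for illustration, not a number from the paper, and `reactor_hours` is a hypothetical helper name.

```julia
# Assumption (not from the paper): a reactor delivering a constant
# 1 GW of electrical power.
reactor_power_GW = 1.0

# Energy estimates in GWh, copied from the output above.
energy_best_GWh = 2.8442758947617923
energy_summit_GWh = 3.27242977911099

# Hours of full reactor output needed to supply a given amount of energy.
reactor_hours(energy_GWh, power_GW) = energy_GWh / power_GW

println("Best case: ", round(reactor_hours(energy_best_GWh, reactor_power_GW), digits=2), " h")
println("Summit equivalent: ", round(reactor_hours(energy_summit_GWh, reactor_power_GW), digits=2), " h")
```

Under that assumption the two cases come to about 2.8 and 3.3 hours. A different assumed plant output scales the result inversely.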