Custom RMSNorm kernel for Llama 3 8B. I built this as a GPU programming study and applied the custom kernel to the model by hand. This repo was originally published as a Hugging Face Space.
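As a reminder of what the kernel has to compute: RMSNorm, as used in Llama models, scales each hidden vector by the reciprocal of its root mean square and then by a learned per-channel weight, y_i = x_i / sqrt(mean(x^2) + eps) * w_i. Unlike LayerNorm, it does not subtract the mean. The sketch below is a plain-Python reference implementation, not the kernel in this repo, that a hand-written kernel's output could be checked against. The function names `rms_norm` and `matches_reference` and the tolerances are hypothetical. The default `eps = 1e-5` is an assumption based on the usual Llama 3 8B `rms_norm_eps`, so check the checkpoint's config.json.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Reference RMSNorm over a single hidden vector (one token).

    Hypothetical helper; eps = 1e-5 is an assumed Llama 3 default.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension (no mean subtraction, unlike LayerNorm).
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal root mean square; eps keeps the denominator away from zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize each element, then apply the learned per-channel weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


def matches_reference(out: Sequence[float], ref: Sequence[float],
                      atol: float = 1e-3, rtol: float = 1e-3) -> bool:
    """Element-wise |out - ref| <= atol + rtol * |ref| (hypothetical tolerances).

    Half-precision kernels usually need looser tolerances than float32 ones.
    """
    if len(out) != len(ref):
        return False
    return all(abs(o - r) <= atol + rtol * abs(r) for o, r in zip(out, ref))


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, -4.0]
    w = [1.0, 1.0, 1.0, 1.0]
    ref = rms_norm(x, w)
    # In practice, `out` would be the custom kernel's result copied back to the host.
    out = list(ref)
    print(matches_reference(out, ref))
```

With a weight of all ones and `eps = 0`, the output vector has an RMS of exactly 1 up to rounding, which makes a quick sanity check for a new kernel.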