<a href="https://colab.research.google.com/github/sweetpand/Deep_Learning_/blob/master/Instantaneous_Generation_using_%F0%9F%A4%97_Transformers_%26_Flax_on_TPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fast auto-regressive language generation

Language model prompting is one of the 🔥 hottest topics in NLP lately. It lets you reformulate a variety of tasks as a simple auto-regressive generation problem (see Joe Davison's [tweet](https://twitter.com/joeddav/status/1390731854907641863) for more information).

The engine behind language model prompting is auto-regressive generation using a causal language model, like [GPT2](https://openai.com/blog/better-language-models/).
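
To make the term concrete, here is a minimal sketch of an auto-regressive loop. The `toy_next_token` rule is a hypothetical stand-in for a causal language model such as GPT2, and `generate_greedy` is an illustrative name, not part of 🤗 Transformers. The only point is that each new token is computed from the tokens produced so far and is then appended to the input.

```python
def toy_next_token(tokens):
    # Hypothetical rule standing in for a causal language model:
    # it may only look at the tokens generated so far.
    return (tokens[-1] * 2 + len(tokens)) % 10


def generate_greedy(prompt, max_length, next_token_fn=toy_next_token):
    tokens = list(prompt)
    # Keep extending the sequence until it reaches max_length (prompt included).
    while len(tokens) < max_length:
        tokens.append(next_token_fn(tokens))  # feed the output back in as input
    return tokens


print(generate_greedy([1, 2, 3], max_length=8))
```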

In this short colab, we demonstrate how 🤗 Transformers' new Flax `generate` function on TPU can be up to **8x faster** than the PyTorch implementation in 🤗 Transformers on GPU. Flax's `generate` function can also easily be spread over multiple TPU devices for parallel generation.

A detailed explanation of how Flax's `generate` function works will follow in a more in-depth blog post.



Let's first install `transformers` and `flax`.

In [None]:
%%capture
!pip install transformers
!pip install flax

Next, let's set up this colab for TPU usage.

In [None]:
import jax.tools.colab_tpu
jax.tools.colab_tpu.setup_tpu()

We will need some helper functionality from the Flax and JAX libraries.

In [None]:
from flax.training.common_utils import shard
from flax.jax_utils import replicate, unreplicate
import jax
import pandas as pd
from IPython.display import display, HTML

Let's import the GPT2 model.

In [None]:
from transformers import FlaxGPT2LMHeadModel, GPT2TokenizerFast

To verify that all 8 TPU devices are available, run the following command.

In [None]:
jax.local_devices()

[TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0),
 TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1),
 TpuDevice(id=2, process_index=0, coords=(1,0,0), core_on_chip=0),
 TpuDevice(id=3, process_index=0, coords=(1,0,0), core_on_chip=1),
 TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0),
 TpuDevice(id=5, process_index=0, coords=(0,1,0), core_on_chip=1),
 TpuDevice(id=6, process_index=0, coords=(1,1,0), core_on_chip=0),
 TpuDevice(id=7, process_index=0, coords=(1,1,0), core_on_chip=1)]

Now we can pick an [auto-regressive model](https://huggingface.co/transformers/model_summary.html#decoders-or-autoregressive-models) and a maximum sequence length. Here we generate sequences of up to 512 tokens, prompt included.

In [None]:
model_id = "distilgpt2"
max_length = 512
num_devices = 8

Let's load the model and tokenizer and make sure padding is done on the left so that we can generate in parallel. 
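
The following pure-Python sketch illustrates why left padding suits batched generation. `pad_left` and the pad id `0` are assumptions made for this illustration, not the tokenizer's implementation. With padding on the left, the last position of every row holds a real token, so all rows can be extended at the same position in each step.

```python
PAD = 0  # hypothetical pad token id used only in this illustration


def pad_left(sequences, length, pad_id=PAD):
    # Assumes every sequence is at most `length` tokens long.
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = length - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding that the model should ignore, 1 marks real tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = pad_left([[11, 12, 13], [21]], length=4)
print(ids)
print(mask)
```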

We also enable top-k sampling by setting `do_sample=True`, and define the padding token (id 50256, which is `<|endoftext|>`) as well as the maximum length.

In [None]:
tokenizer = GPT2TokenizerFast.from_pretrained(model_id, padding_side="left", pad_token="<|endoftext|>")
model = FlaxGPT2LMHeadModel.from_pretrained(model_id, pad_token_id=50256, max_length=max_length, do_sample=True)
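
As a rough illustration of what top-k sampling does at each generation step, the sketch below keeps only the `k` highest-scoring tokens and samples among them in proportion to their softmax probabilities. `sample_top_k` and the example logits are hypothetical, and the code uses Python's `random` module rather than JAX. The value of `k` that the model actually uses comes from its configuration and is not set explicitly in this notebook.

```python
import math
import random


def sample_top_k(logits, k, rng):
    # Keep the ids of the k highest-scoring tokens.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    top_max = max(logits[i] for i in top_ids)
    weights = [math.exp(logits[i] - top_max) for i in top_ids]
    # Draw one token id in proportion to its weight.
    return rng.choices(top_ids, weights=weights, k=1)[0]


logits = [2.0, 0.5, 1.5, -1.0, 0.0]
rng = random.Random(0)
samples = [sample_top_k(logits, k=2, rng=rng) for _ in range(20)]
print(samples)  # only token ids 0 and 2 can appear when k=2
```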

Now we can define a generate function that takes the model's weights, the input batch, and a PRNG key as inputs.

In [None]:
def generate(params, batch, rng):
    output_ids = model.generate(batch["input_ids"], attention_mask=batch["attention_mask"], prng_key=rng, params=params).sequences
    return output_ids

For parallelized generation, we [**`pmap`**](https://jax.readthedocs.io/en/latest/jax.html#jax.pmap) the generate function over all devices.

In [None]:
p_generate = jax.pmap(generate, "batch")

The model weights should be replicated on each device:

In [None]:
p_params = replicate(model.params)

We also need a different PRNG key for each device, so that each device samples a different continuation.

In [None]:
rng = jax.random.PRNGKey(0)
rngs = jax.random.split(rng, num_devices)
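
Why one key per device? The sketch below imitates the effect with Python's `random` module, which is an assumption made for illustration and not JAX's PRNG: workers that share a seed draw identical token ids, while workers with distinct seeds draw different ones. `fake_sample` is a hypothetical helper.

```python
import random

num_devices = 8
vocab_size = 50257  # GPT2 vocabulary size


def fake_sample(seed, n=5):
    # Stand-in for sampling n token ids on one device with its own random state.
    rng = random.Random(seed)
    return tuple(rng.randrange(vocab_size) for _ in range(n))


# Every "device" uses the same seed: all continuations are identical.
same_seed = [fake_sample(0) for _ in range(num_devices)]
# Every "device" gets its own seed: the continuations differ.
distinct_seeds = [fake_sample(seed) for seed in range(num_devices)]

print(len(set(same_seed)), len(set(distinct_seeds)))
```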

Let's define our generation pipeline. It takes a single `input_str`, copies it once per device, and shards the resulting batch across the devices. To keep the input shape static, we pad the prompt to a length of 32 tokens.
We then run the parallelized generation and finally decode the output sequences into strings.

In [None]:
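def run_generate(input_str):
  # Copy the prompt once per device and pad every copy to a static length of 32 tokens.
  inputs = tokenizer([input_str] * num_devices, return_tensors="jax", padding="max_length", truncation=True, max_length=32)
  # Add a leading device axis so that each device receives its own slice of the batch.
  p_inputs = shard(inputs.data)
  # Run generation on all devices in parallel.
  output_ids = p_generate(p_params, p_inputs, rngs)
  # Flatten the device axis again before decoding the token ids into strings.
  output_strings = tokenizer.batch_decode(output_ids.reshape(-1, max_length), skip_special_tokens=True)
  return output_strings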
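
The sharding step can be pictured with plain Python lists. The helpers `shard_rows` and `unshard_rows` below are hypothetical and only mimic the reshaping, assuming the batch is split into contiguous, equally sized chunks, one per device. They do not move any data onto devices.

```python
def shard_rows(rows, num_devices):
    # Split a batch into num_devices contiguous chunks of equal size.
    assert len(rows) % num_devices == 0, "batch size must be divisible by num_devices"
    per_device = len(rows) // num_devices
    return [rows[d * per_device:(d + 1) * per_device] for d in range(num_devices)]


def unshard_rows(sharded):
    # Flatten the leading device axis again, like reshape(-1, seq_len).
    return [row for device_rows in sharded for row in device_rows]


batch = [[i] * 4 for i in range(8)]         # 8 rows of 4 "tokens" each
sharded = shard_rows(batch, num_devices=8)  # 8 devices x 1 row x 4 tokens

print(len(sharded), len(sharded[0]), len(sharded[0][0]))
```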

For the very first run, the XLA compiler has to compile the whole function, which can take up to 40 seconds. Once compiled, the binary is cached so that the next generation is fast.

In [None]:
%%time
run_generate("dummy input 123")


CPU times: user 30.2 s, sys: 2.65 s, total: 32.9 s
Wall time: 27.1 s


["dummy input 12397:45 0 Durabont file 16243:46 1 Disable following mode 0 Disabled file on video 0 Misc Gamepad regs logic off:helpfully Move DI keyboard support/offset A toggle=steer and me Get maximum power (xI played with the Wii Controllers where microphone input is a direct input) Enabled - virtualcam enable scrolling by 1 working parameter on changing keyboard\n\n\n\n* 2 artifacts.\n* input count with Dotpad 0 if I don't have two EEPROMs (the other three will be done in each case)\n* left/right output ammo rdsb 0\n* right/left input keyspad status value 0\n[boolean-press8] at 0 and 1 desync (noRequired boolean)\n[boolean-press9] one or more operations (false)\n[boolean-press10] in kadilt (uint2 encoding)\nNone\n\nIf this is a great option, please tell Swordar79 not to wait too long to do this like this:Owl:GetLegendSupportHelp.com:1:0 for all plugins enabled without Setup: blank 55 for GBA:Non Achievement Title adjustmentsDawnCell:(360) Chain, ArcadeSettings:options:screen.3S:YE
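
The compile-once behaviour, and the reason for padding prompts to a static length, can be sketched with a cache keyed by input shape. `ShapeCache` is a hypothetical stand-in that only counts how often a new shape is seen. It is not how XLA or JAX are implemented. Prompts of varying length trigger one "compilation" each, whereas prompts padded to 32 tokens reuse a single entry.

```python
class ShapeCache:
    def __init__(self):
        self.compiled = {}
        self.compile_count = 0

    def run(self, batch):
        shape = (len(batch), len(batch[0]))
        if shape not in self.compiled:
            # Stands in for the slow compilation step on a previously unseen shape.
            self.compile_count += 1
            self.compiled[shape] = lambda b: [sum(row) for row in b]
        return self.compiled[shape](batch)


def pad_to(seq, length, pad_id=0):
    # Left-pad a token list to a fixed length (pad id 0 is an assumption).
    return [pad_id] * (length - len(seq)) + seq


prompts = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

unpadded = ShapeCache()
for prompt in prompts:
    unpadded.run([prompt])

padded = ShapeCache()
for prompt in prompts:
    padded.run([pad_to(prompt, 32)])

print(unpadded.compile_count, padded.compile_count)
```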

Now generation is fast 🔥! Try it out by defining a prompt here to generate 8 possible continuations of it in an **instant**.

In [None]:
input_str = input("Input prompt: ")
output_strings = run_generate(input_str)

df = pd.DataFrame(output_strings)
display(HTML(df.to_html()))

Input prompt: London is


Unnamed: 0,0
0,"London is an ornate palatial building along a 16-round courtyard in the middle of the School District on Fourth Street in Oakland University Place.\n\n\n\nThe neighborhood is located on the corner of Ponta Street and Westheimer Streets in Menlo Park. The spot was visited by former commissioner W. Edgar Cobb, who recalled building as the neighborhood's most successful restaurant working in the early 1990s.\n""It's never done exactly what it was with the whole neighborhood,"" Cobb said.\n""I think you'd hear some of these sorts of comments from transit officials about the neighbourhood, and with the amenities that they have built in the 1950s and '60s, it's a start,"" said M.C. Seward, the transportation director for Mayor Ken Suhr, who oversees the Oakland Transportation Board.We're not sure what is being said, but maybe mixed use.""\nThis article was first published on June 13, 2010, and was amended to add a reference to radical open space for development near the complex."
1,"London is hard at work at building an experimental third-party SSL/TP provider. That‧​ is not something we would easily find in a web browser. However, above all are many wise developers, developers, and those who do work at existing organizations.\n\n\n\nSee my look at Minefield here.\nOur goal is to enable breeders to back this project, paying attention to how each client sees the risks and benefits of using them. As discussed above, this is much wider than I believe. It reflects our goal with Backbone and React to design a more viable world for collaborating on scientific domains. We will also focus primarily on integrating HipStyle into the 30+Words community.\nAt the moment, we are preparing the first user-friendly, more general rewrite of Memitrade, an extension of the orchard plugins built by Apache. We forecast a long timeline for wholesale use in this organization. As soon as I have an idea how to use it, I'll let more than 2,000 people install it, and support the entire project.\nWhen demoing full-stack version 2.10 of Memiterade, I would love to see more developers be dropping Mythboms in development. This will help accelerating it.\nAnother big focus will be that a user-friendly solution to all of the problems faced by the prior software, from simple reading and usage guides to dedicated development IRC.\nEmail support is a big plus, but sadly the more people pick up on this project, the more of this community will become. Impeach yourself and let ""em"" in even more eager, new users.\nHere is how a developer can help them:\n0 major bug fixes:\n1 composer module:\n1 node in Apache \ composer register for iSCapfs\n2 npm/env\n3 npm/env\n4 tutorial files of documented code.\n🌊し Kirsten Nicuolo\n1. LICENSE TO WONDER #1\n2. LICENSE INFORMATION YOU SHOULD READ #2\n3. 
LICENSE NAME MAX FEC ON DECK / NWON KYLEWOOD #2 and #3 @LISDN cssss princess characters for WW and DD53 characters for Webcomic hashtags/\nhttps://github.com/lighornie/stonseybranny-summary/Before/ryan"
2,"London is trade publication, which once offered its eightkeepers about women and so on ), has proved its reputation as a resource tapping technology for women, even as women from barely visible ethnicities and families convert to the company more than a decade later. ""I have seen an increase rate of exploitation of women,"" The New Yorker writes. ""This trend is keeping women from accessing any affordable space in the UK, and only when women and groups are able to contribute. Not until recently they ensured that this was permissible, but in January I saw a blizzard unfold in Thailand. I called it the World Bank."" The 22-year-old, whose story has now been greenlighted to be published on Wikipedia by a campaign with 300,000 followers, is not alone in looking for a stark change.\n\n\n•\n•\n•\n•\nThis week, bloggers found out what was going on with the lesbian, gay and bisexual EFIT plant in Egypt. Their Y, w wet blond hair was rolled back, the difference didn't exist. A report from the Holy City Agency of Egypt and the Intersex news portal smitten details of the rollercoaster, working in an e-mailed piece by anti-gay activist Ad_Eneril Oetting.\nThe cleavage was one of the prime future promises made by the office of Paul Youssef, a former US secretary of state. After Breya Jamal of the Ministry of Foreign Affairs raised questions about the workers' health, the Egyptian government confirmed it hadhers, lots, toilets and singletons in its branch.\nIn an email to Youssef, spokeswoman Linda Gottaerts wrote that the plant was fully operating and had ""experienced lay-offs, several sleepless nights, and several hot showers."" ""We are truly pleased to report these findings to the TransWorld Leaders, Save the Children, in Cairo, including U.S. news,"" she told Crossword-East. ""Thank you so much individual, General Swan of the Suez Canal, for passing up a lovely talk with us here. 
We are still unsure why Leismette Umaythmar Mandeduta (head of Office of Democratic Policies Advocates) decided to adapt, but we will continue working.""\nJeremy Marching, head of Blade Class, told Dutch weekly WLJ. He explained otherwise. But"
3,"London is getting a lot easier for investors in America. America‬I‬m‬ve been neglecting the digital revolution for many years and being our global preoccupant of digital matters,‬to bring the consequences of a waste of time and money into the hands of in the light of massive corporations unable to control. Whatever ideology you deviate from, managing reality is easy and incremental. It‬s simple. Confidence is usually down to three things. First, we get good news. Second, we have all plenty of accountability. Third, it‬s relatively easy to have one step ahead of the other. And fourth, false narratives are rare. If you don‬t hold a Ph.D., luckily for you, the good news is it‬s only very soon. Mainstream media bearers have conducted a campaign arguing that it‬s inherently time-consuming and irresponsible to spend time with our enemies, instead of on our faces. Never, ever, do those things. That should concern many of us being leery of technology as an excuse to drop our informational messages while they just aperture our moral fates."
4,"London is a very exciting niche holiday yet a 'buy' is there. US$4.95:HSD, £1.40:BTW, £2.06:CPT, £1.29:NIHSD. £1.18:NCFU, $1.17: CIPT, £1.03: $2.65:OSI, $1.32: 11AHKJ, $1.33: $1.25: US$3.95:HSD, £4.94:SERZ, £2.01: MVDK, £1.20: SRAK, £1.17: FCRAY and ISROPIA are being sold separately, so we promise to definitely have an unusual sale of one or more 5 UK$4.95:HSD. The world has got no clue about the possible sale's value if it's actually near the £3.95 pledge level...... If the markets continue to look suspicious it should just drop off. Texan HQ has announced that they'll sell 80% of their budget two weeks afterwards, and there's quite a lot to be excited about. ( the prices of these units will be linked to the price gaps, if they have significant addresses) Texas Board of Trustees will even come up and listed as Small Business in SF#187. Its Control Board will be based around the reality that, upon being ready to have less shaving cream this summer, the layout will look a bit sad. The display is pretty similar to another Texas Board, and most definitely, the retail placement will be quite different."
5,"London is the 5th largest economy in the world.\n\n\n\n\n\n\nBy 2100 North Korea will be the 3rd largest economy in the world.\n\n\nSubmission to the National Assembly have officially begun. Platforms will begin with permanent delegates of the Yinseng School of Foreign Affairs. This means the Nanking Executive Force is already operating in the Foreign Affairs capital Iorio, and the new elements on transfer taxes will have not yet received significant proposals from the Chinese Government (a press release dated 8/17 2015 detailing just how far China‬s plan‬s plan intends to penetrate Moscow). The second stage is obviously tough. There are continued geo-political topiers to move forward as well, which, in turn, will mean delays in moving forward. Trailblazers once again pose a overwhelming threat to the central banking system, which is just 7% of the US banking system and none that has more politicians than Vladimir Putin."
6,"London is to go on with infringement of laws and political freedoms in India, Maunlala is calling for freedoms of dress worn by Chinese students to be put on display with the Hindustan Times on Tuesday.\n\n\n\n\nThe faux friend of the city's 18 months of apprenticeship, who has been a Nobel laureate for 32 years, is demanding a public apology for the incident on which she threatened to turn her Toyotaisha Tesse.\nShe has accused the three men of such a aggression, summarising things like ""slashing teenagers' toast to an Iranian bride and falling asleep to Maunlala's jaw"", and ""send me a picture of Iranian girls being in captivity.""\n""We could never even imagine trashing teenagers again, but go out with a lifetime of anger and what a brilliant, principled past they were – to reform their lives,"" she told Bollywood Reporter Amritsar Nabiuddin earlier this month.\n""I wish that our political identities had been preserved"".\nThree other women on Wednesday had texted infant killers, Sufi, Shayan Khan and Abdulazizin Baniyay, whose elaborate culled of their work used intimidation tactics like hand spray, anisotropic cultivation, will duplicate their encounters on a visit to Mumbai last week and not obtain permission to wear full bodied.\nBaniyay, meanwhile, had previously said flags were being kept burnt at a war commemoration for the two men who killed seven Cricket Board players in that country in September.\nIn Friday's incident, Baniyay said that the Metropolitan Police Protection had carried out extensive raids to return US call centres to show the heroic figure of Baniyay and justice as a human rights procession for the victims.\nHe said traffickers and the ""intellectuals"" in attendance had behaved in full disregard of the human rights under international law and had pumped $9m into the daily inner city once Denver, New York and Las Vegas had sold bicycles, petrol, fresh snacks. 
The magnitude of the incident, he said, ""is absolutely outrageous and is tantamount to assault in a Chinese society.\n""We are disgusted that these men have gone into the streets and continue in Pakistan to hurt our tombs, stomp on damn souls and to threaten to seize people's futures for the American audience.\n""You know, our people have broken into"
7,"London is seeing increased activity for housing beginning in markets like Japan, Shanghai and Taiwan. While on a note Minneapolisers get the housing market by looking closer to and considering it the second-inch-to-second edge in limited supply, ""in other words, located more often"" places in the metropolitan area will recede significantly. It will be important to note that possible price fluctuations can be offset by some extraordinarily expensive housing projects whose prices are typically around $1. 24/7 and rents below $1500. Reversing the Chicago area's historic daily median price for a 1-bedroom home in the recently released apartment with bucolic area and priced from about $125,000.\n\n\n\nMany homebuyers already know who their home is heating its doors: ~ 72,000 Norwegian residents saw their housing start this summer, up only 15 years after the early 1980s. According to the OSS, too many homes used lightly label freshly pre-evicted home renovations as they weren't to make in New York City. So what's EXPECT with this new apartment after these months?\nOne of the first issues is used by pelicans, who have watched numerous other wildfires ravaging the city and tearing out hundreds of homes over several weeks. The possibility for demolitions could be exacerbated by windfall or weather conditions, including heavy storms and of course, FEMA and other record-breaking windmills. Although Reinstehenburg looks for pollutants that penetrate metal, by contrast, they can be dangerous enough to cause panic or set fire to property."
