In [1]:
!pip install transformers


In [2]:
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
import torch.nn.functional as F  # used as F.softmax in top_k_top_p_filtering

In [3]:
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k >0: keep only top k tokens with highest probability (top-k filtering).
            top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
        Source:
            https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
    if top_k > 0:
        # Remove all tokens with a probability less than the last token of the top-k
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits[indices_to_remove] = filter_value

    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

        # Remove tokens with cumulative probability above the threshold
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift the indices to the right to keep also the first token above the threshold
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = 0

        indices_to_remove = sorted_indices[sorted_indices_to_remove]
        logits[indices_to_remove] = filter_value
    return logits
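
To see which tokens survive each filter, here is a minimal pure-Python sketch of the same two rules on a toy four-token distribution. The helpers `softmax`, `filter_logits`, and `kept_indices` are hypothetical names introduced only for this illustration; they mirror the logic of `top_k_top_p_filtering` above but use plain lists instead of tensors and return a copy rather than modifying the input.

```python
import math

NEG_INF = -float("inf")

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_logits(logits, top_k=0, top_p=0.0):
    """Illustrative top-k / nucleus filter (hypothetical helper, not a library API)."""
    filtered = list(logits)
    if top_k > 0:
        k = min(top_k, len(filtered))
        # Value of the k-th largest logit; anything strictly below it is masked
        threshold = sorted(filtered, reverse=True)[k - 1]
        filtered = [x if x >= threshold else NEG_INF for x in filtered]
    if top_p > 0.0:
        # Token indices ordered from most to least likely
        order = sorted(range(len(filtered)), key=lambda i: filtered[i], reverse=True)
        probs = softmax([filtered[i] for i in order])
        cumulative = 0.0
        for rank, i in enumerate(order):
            # Mask a token only if the tokens ranked before it already exceed top_p,
            # so the first token that crosses the threshold is still kept
            if rank > 0 and cumulative > top_p:
                filtered[i] = NEG_INF
            cumulative += probs[rank]
    return filtered

def kept_indices(filtered):
    """Indices of tokens that were not masked out."""
    return [i for i, x in enumerate(filtered) if x != NEG_INF]

# Toy distribution: token probabilities 0.15, 0.5, 0.05, 0.3
toy_logits = [math.log(p) for p in [0.15, 0.5, 0.05, 0.3]]

print(kept_indices(filter_logits(toy_logits, top_k=2)))    # [1, 3]
print(kept_indices(filter_logits(toy_logits, top_p=0.7)))  # [1, 3]
print(kept_indices(filter_logits(toy_logits, top_p=0.4)))  # [1]
```

With `top_p=0.7` the second most likely token is kept even though adding it pushes the cumulative probability to 0.8: the filter keeps the smallest set of top tokens whose total probability reaches the threshold.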

In [4]:
def sample_token(output):
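    # Logits for the last position only; drop the batch dimension (batch size 1)
    logits = output[..., -1, :].squeeze(0)
    logits = top_k_top_p_filtering(logits, top_k=10)
    # softmax yields probabilities (not log-probabilities); filtered entries get probability 0
    probs = torch.softmax(logits, dim=-1)
    token = torch.multinomial(probs, num_samples=1)[0]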

    return token

## Transformer-XL

In [5]:
tokenizer = AutoTokenizer.from_pretrained('transfo-xl-wt103')
model = AutoModelWithLMHead.from_pretrained('transfo-xl-wt103')

In [20]:
generated = tokenizer.encode("On our way to the beach")
context = torch.tensor([generated])
past = None

In [21]:
for i in range(100):
    # mems caches hidden states from earlier steps, so only the new token is fed each time
    output, past = model(context, mems=past)
    token = sample_token(output)

    generated.append(token.item())
    context = token.view(1, -1)

In [11]:
print(tokenizer.decode(generated))

On our way to the beach, she finds, she finds the men who are in the group to be " in the group ". This has led to the perception that the " group " in the group is " a group of people in the group with whom we share a deep friendship, and which is a common cause to the contrary. " <eos> <eos> = = Background = = <eos> <eos> The origins of the concept of " group " were in early colonial years with the English Civil War. The term was coined by English abolitionist John


## GPT-2

In [22]:
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelWithLMHead.from_pretrained('gpt2-large')

In [41]:
generated = tokenizer.encode("Lynn Xiaoling Mo is a")
context = torch.tensor([generated])
past = None

In [42]:
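for i in range(50):
    # past caches attention keys/values from earlier steps, so only the new token is fed;
    # newer transformers releases name this argument past_key_values
    output, past = model(context, past=past)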
    token = sample_token(output)

    generated.append(token.item())
    context = token.unsqueeze(0)

In [43]:
print(tokenizer.decode(generated))

Lynn Xiaoling Mo is a retired senior research scientist at Microsoft Research in Redmond, Washington. He received the 2013 Turing Award in recognition of his work on artificial brain-inspired systems. His latest work is focused on the development of an artificial neural network that learns from its environment.



## XLM

In [44]:
tokenizer = AutoTokenizer.from_pretrained('xlm-clm-enfr-1024')
model = AutoModelWithLMHead.from_pretrained('xlm-clm-enfr-1024')

In [45]:
generated = [0] # start with just <s>
context = torch.tensor([generated])
lang = 0 # English

In [46]:
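for i in range(100):
    # No cache is used here, so the full sequence generated so far is re-fed at every step
    langs = torch.zeros_like(context).fill_(lang)  # one language id per token
    output, = model(context, langs=langs)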
    token = sample_token(output)

    generated.append(token.item())
    context = torch.tensor([generated])

In [47]:
print(tokenizer.decode(generated))

<s>and its ability to make decisions on the basis of its own. " </s>The government has taken no decisions on that matter, " Mr Hockey said. </s>A lot of the information is very sensitive. </s>The new research and information on the Australian economy, which is what we're going to get from people, and the information that we are going to be looking at, we're going to be able to provide and we 'll take it forward. </s>I'm not trying to make sure we're not


In [57]:
generated = [0] # start with just <s>
context = torch.tensor([generated])
lang = 1 # French

In [58]:
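for i in range(100):
    # No cache is used here, so the full sequence generated so far is re-fed at every step
    langs = torch.zeros_like(context).fill_(lang)  # one language id per token
    output, = model(context, langs=langs)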
    token = sample_token(output)

    generated.append(token.item())
    context = torch.tensor([generated])

In [59]:
print(tokenizer.decode(generated))

<s></s>En revanche, les prix des maisons individuelles n' ont guère augmenté ( - 0,1 % ). </s>En mars dernier, le taux de la taxe foncière, en légère augmentation à la hausse par rapport à février 2008. </s>" Je n' ai jamais eu une augmentation " précise ". </s>" Je me suis toujours dit que ce n' était pas parce que c' était une blague. </s>En effet, j' étais un gars de la rue " </s>Les jeunes sont des gens qui avaient beaucoup d' humour... "
