# 🚀 GPT

In this notebook, we'll walk through the steps required to train your own GPT model on the wine review dataset.

The code is adapted from the excellent [GPT tutorial](https://keras.io/examples/generative/text_generation_with_miniature_gpt/) created by Apoorv Nandan available on the Keras website.

In [1]:
!git clone https://github.com/mxagar/generative_ai_book.git
!mkdir notebooks
!mkdir checkpoint
!mkdir data
!mkdir output
!mkdir models
!mv generative_ai_book/notebooks/utils.py ./notebooks/utils.py
!mv generative_ai_book/notebooks/03_vae/03_vae_faces/vae_utils.py ./notebooks/vae_utils.py
!pip install python-dotenv

Cloning into 'generative_ai_book'...
remote: Enumerating objects: 837, done.
remote: Counting objects: 100% (379/379), done.
remote: Compressing objects: 100% (184/184), done.
remote: Total 837 (delta 239), reused 266 (delta 192), pack-reused 458 (from 1)
Receiving objects: 100% (837/837), 57.97 MiB | 13.01 MiB/s, done.
Resolving deltas: 100% (474/474), done.
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [11]:
from dotenv import load_dotenv
import os

# Load environment variables from the .env file
load_dotenv()

# Set the Kaggle dataset owner and name used by the download cell
# (values can be looked up in ./scripts/download.sh)
# E.g., for the celebrity faces dataset: jessicali9530, celeba-dataset
os.environ['KAGGLE_DATASET_USER'] = 'zynicide'
os.environ['KAGGLE_DATASET'] = 'wine-reviews'

In [3]:
import os

# Download and unzip the dataset (requires the Kaggle CLI and Kaggle API credentials)
os.system("cd data/ && kaggle datasets download -d $KAGGLE_DATASET_USER/$KAGGLE_DATASET")
!unzip -q -o data/$KAGGLE_DATASET.zip -d data/$KAGGLE_DATASET
!rm data/$KAGGLE_DATASET.zip

In [12]:
%load_ext autoreload
%autoreload 2
import numpy as np
import json
import re
import string
from IPython.display import display, HTML

import tensorflow as tf
from tensorflow.keras import layers, models, losses, callbacks

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 0. Parameters <a name="parameters"></a>

In [13]:
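VOCAB_SIZE = 10000  # vocabulary size, including the padding and [UNK] tokens
MAX_LEN = 80  # tokens per training sequence (model input length)
EMBEDDING_DIM = 256  # size of the token and position embeddings
KEY_DIM = 256  # size of each attention head's query/key projection
N_HEADS = 2  # number of attention heads
FEED_FORWARD_DIM = 256  # hidden units in the feed-forward sublayer
VALIDATION_SPLIT = 0.2  # defined but not used in this notebook
SEED = 42  # defined but not used in this notebook
LOAD_MODEL = False  # if True, load the previously saved model
BATCH_SIZE = 32
EPOCHS = 5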

## 1. Load the data <a name="load"></a>

In [14]:
# Load the full dataset
with open("./data/wine-reviews/winemag-data-130k-v2.json") as json_data:
    wine_data = json.load(json_data)

In [15]:
wine_data[10]

{'points': '87',
 'title': 'Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley)',
 'description': 'Soft, supple plum envelopes an oaky structure in this Cabernet, supported by 15% Merlot. Coffee and chocolate complete the picture, finishing strong at the end, resulting in a value-priced wine of attractive flavor and immediate accessibility.',
 'taster_name': 'Virginie Boone',
 'taster_twitter_handle': '@vboone',
 'price': 19,
 'designation': 'Mountain Cuvée',
 'variety': 'Cabernet Sauvignon',
 'region_1': 'Napa Valley',
 'region_2': 'Napa',
 'province': 'California',
 'country': 'US',
 'winery': 'Kirkland Signature'}

In [16]:
# Filter the dataset
filtered_data = [
    "wine review : "
    + x["country"]
    + " : "
    + x["province"]
    + " : "
    + x["variety"]
    + " : "
    + x["description"]
    for x in wine_data
    if x["country"] is not None
    and x["province"] is not None
    and x["variety"] is not None
    and x["description"] is not None
]
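Each record becomes one flat string with a fixed prefix, `wine review : <country> : <province> : <variety> : <description>`, and records with a missing field are dropped. The same template as a small stand-alone function, applied to a hand-made record; `format_review` and `toy_record` are hypothetical names used only for this illustration:

```python
def format_review(record):
    """Return the training string for one review, or None if a field is missing."""
    fields = ("country", "province", "variety", "description")
    # .get() also treats an absent key as missing; the notebook's x[...] would raise KeyError
    if any(record.get(f) is None for f in fields):
        return None
    return "wine review : " + " : ".join(record[f] for f in fields)


# Hypothetical record shaped like the entries in winemag-data-130k-v2.json
toy_record = {
    "country": "US",
    "province": "California",
    "variety": "Cabernet Sauvignon",
    "description": "Soft, supple plum.",
}
print(format_review(toy_record))
# wine review : US : California : Cabernet Sauvignon : Soft, supple plum.
print(format_review({**toy_record, "province": None}))  # None
```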

In [17]:
# Count the wine reviews
n_wines = len(filtered_data)
print(f"{n_wines} wine reviews loaded")

129907 wine reviews loaded


In [18]:
example = filtered_data[25]
print(example)

wine review : US : California : Pinot Noir : Oak and earth intermingle around robust aromas of wet forest floor in this vineyard-designated Pinot that hails from a high-elevation site. Small in production, it offers intense, full-bodied raspberry and blackberry steeped in smoky spice and smooth texture.


## 2. Tokenize the data <a name="tokenize"></a>

In [19]:
# Pad the punctuation, to treat them as separate 'words'
def pad_punctuation(s):
    s = re.sub(f"([{string.punctuation}, '\n'])", r" \1 ", s)
    s = re.sub(" +", " ", s)
    return s


text_data = [pad_punctuation(x) for x in filtered_data]
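To see what the padding does, here is the same function applied to a short string. Every punctuation mark, including the apostrophe and the hyphen, ends up surrounded by single spaces; since `TextVectorization` splits on whitespace by default, each mark then becomes its own token. This is why contractions such as `it's` reappear as `it ' s` in the generated reviews:

```python
import re
import string


def pad_punctuation(s):
    # Same pattern as the notebook cell: surround every punctuation mark
    # (and newlines) with spaces, then collapse runs of spaces into one
    s = re.sub(f"([{string.punctuation}, '\n'])", r" \1 ", s)
    s = re.sub(" +", " ", s)
    return s


print(repr(pad_punctuation("It's full-bodied.")))
# "It ' s full - bodied . "
```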

In [20]:
# Display the same example after padding the punctuation
example_data = text_data[25]
example_data

'wine review : US : California : Pinot Noir : Oak and earth intermingle around robust aromas of wet forest floor in this vineyard - designated Pinot that hails from a high - elevation site . Small in production , it offers intense , full - bodied raspberry and blackberry steeped in smoky spice and smooth texture . '

In [21]:
# Convert to a TensorFlow Dataset
# (shuffle() after batch() shuffles whole batches, with a buffer of 1000 batches)
text_ds = (
    tf.data.Dataset.from_tensor_slices(text_data)
    .batch(BATCH_SIZE)
    .shuffle(1000)
)

In [22]:
# Create a vectorisation layer
vectorize_layer = layers.TextVectorization(
    standardize="lower",
    max_tokens=VOCAB_SIZE,
    output_mode="int",
    output_sequence_length=MAX_LEN + 1,
)

In [23]:
# Adapt the layer to the training set
vectorize_layer.adapt(text_ds)
vocab = vectorize_layer.get_vocabulary()
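`TextVectorization` reserves index 0 for padding (the empty string) and index 1 for out-of-vocabulary words (`[UNK]`), and ranks the remaining tokens by descending frequency, which is why `:`, `,` and `.` receive the smallest ids. The following pure-Python sketch approximates that behaviour on a two-line corpus; `build_vocab` and `toy_vectorize` are hypothetical helpers, and the tie-breaking rule (first occurrence wins) is an assumption that may differ from Keras:

```python
from collections import Counter


def build_vocab(texts, max_tokens):
    # Index 0 = padding (""), index 1 = out-of-vocabulary ("[UNK]"),
    # then the most frequent lowercased whitespace tokens.
    # Ties are broken by first occurrence here (an assumption, not Keras behaviour)
    counts = Counter(tok for text in texts for tok in text.lower().split())
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + words


def toy_vectorize(text, vocab, seq_len):
    # Map tokens to ids (unknown -> 1), truncate to seq_len, then right-pad with 0
    index = {w: i for i, w in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in text.lower().split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))


toy_corpus = ["wine review : US : Pinot Noir", "wine review : France : Pinot Noir"]
toy_vocab = build_vocab(toy_corpus, max_tokens=7)
print(toy_vocab)  # ['', '[UNK]', ':', 'wine', 'review', 'pinot', 'noir']
print(toy_vectorize("wine review : Italy", toy_vocab, seq_len=6))  # [3, 4, 2, 1, 0, 0]
```

Note that `max_tokens` counts the two reserved entries, just as `VOCAB_SIZE` does for the real layer.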

In [24]:
# Display some token:word mappings
for i, word in enumerate(vocab[:10]):
    print(f"{i}: {word}")

0: 
1: [UNK]
2: :
3: ,
4: .
5: and
6: the
7: wine
8: a
9: of


In [25]:
# Display the same example converted to ints
example_tokenised = vectorize_layer(example_data)
print(example_tokenised.numpy())

[   7   10    2   20    2   29    2   43   62    2   55    5  243 4145
  453  634   26    9  497  499  667   17   12  142   14 2214   43   25
 2484   32    8  223   14 2213  948    4  594   17  987    3   15   75
  237    3   64   14   82   97    5   74 2633   17  198   49    5  125
   77    4    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0]


## 3. Create the Training Set <a name="create"></a>

In [26]:
# Create the training set: each review as input, and the same text shifted by one token as target
def prepare_inputs(text):
    text = tf.expand_dims(text, -1)
    tokenized_sentences = vectorize_layer(text)
    x = tokenized_sentences[:, :-1]
    y = tokenized_sentences[:, 1:]
    return x, y


train_ds = text_ds.map(prepare_inputs)
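Because the vectorizer emits `MAX_LEN + 1` tokens per review, dropping the last token gives an input of `MAX_LEN` tokens and dropping the first gives a target of the same length, where each target is the token that follows the corresponding input. A toy illustration on a hypothetical 9-token sequence (the first ids match the tokenized example earlier in the notebook, the rest is padding); `shift_by_one` is an illustrative name:

```python
def shift_by_one(token_ids):
    # Inputs drop the last token, targets drop the first one,
    # so targets[t] is the token that follows inputs[t]
    return token_ids[:-1], token_ids[1:]


# Toy sequence of length MAX_LEN + 1 = 9 (hypothetical, for illustration only)
toy_ids = [7, 10, 2, 20, 2, 29, 0, 0, 0]
toy_x, toy_y = shift_by_one(toy_ids)
print(toy_x)  # [7, 10, 2, 20, 2, 29, 0, 0]
print(toy_y)  # [10, 2, 20, 2, 29, 0, 0, 0]
```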

In [27]:
example_input_output = train_ds.take(1).get_single_element()

In [28]:
# Example Input
example_input_output[0][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([   7,   10,    2,   20,    2,   29,    2,   45,   44,    2, 1580,
        167, 1067,   91, 1778,  774, 2350,    3,   12,  411,   48,  430,
          1,   71,    4,   15,   18,   21,  527,  771,   11,   74,  377,
         25,   18,   21, 7667,  117,   11,   72,    3,  198,   55,    4,
         11,    8,  987,    9,  660,    3, 1686, 1049,    3,   15,   18,
         21,  186,   19,  605,   12, 1558,  137,    7,    4,   35,   66,
          4,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>

In [29]:
# Example Output (shifted by one token)
example_input_output[1][0]

<tf.Tensor: shape=(80,), dtype=int64, numpy=
array([  10,    2,   20,    2,   29,    2,   45,   44,    2, 1580,  167,
       1067,   91, 1778,  774, 2350,    3,   12,  411,   48,  430,    1,
         71,    4,   15,   18,   21,  527,  771,   11,   74,  377,   25,
         18,   21, 7667,  117,   11,   72,    3,  198,   55,    4,   11,
          8,  987,    9,  660,    3, 1686, 1049,    3,   15,   18,   21,
        186,   19,  605,   12, 1558,  137,    7,    4,   35,   66,    4,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0])>

## 5. Create the causal attention mask function <a name="causal"></a>

In [30]:
def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    i = tf.range(n_dest)[:, None]
    j = tf.range(n_src)
    m = i >= j - n_src + n_dest
    mask = tf.cast(m, dtype)
    mask = tf.reshape(mask, [1, n_dest, n_src])
    mult = tf.concat(
        [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)], 0
    )
    return tf.tile(mask, mult)


np.transpose(causal_attention_mask(1, 10, 10, dtype=tf.int32)[0])

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=int32)
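Row `i` of the mask marks which source positions the query at position `i` may attend to (`1` = allowed). The matrix printed above has been passed through `np.transpose`, so it appears upper-triangular; the mask itself is lower-triangular. The `- n_src + n_dest` offset only matters when there are fewer queries than keys, in which case the queries are aligned with the last keys. A pure-Python restatement of the same rule (`toy_causal_mask` is a hypothetical name):

```python
def toy_causal_mask(n_dest, n_src):
    # Same rule as causal_attention_mask: query i may see key j iff i >= j - n_src + n_dest
    return [
        [1 if i >= j - n_src + n_dest else 0 for j in range(n_src)]
        for i in range(n_dest)
    ]


for row in toy_causal_mask(4, 4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]

# Two queries (aligned with the last two positions) over four keys
print(toy_causal_mask(2, 4))  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```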

## 6. Create a Transformer Block layer <a name="transformer"></a>

In [31]:
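class TransformerBlock(layers.Layer):
    def __init__(
        self, num_heads, key_dim, embed_dim, ff_dim, dropout_rate=0.1, **kwargs
    ):
        # Pass base-layer arguments (name, trainable, dtype, ...) through so the
        # layer can be re-created from get_config() when the model is reloaded
        super().__init__(**kwargs)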
        self.num_heads = num_heads
        self.key_dim = key_dim
        self.embed_dim = embed_dim
        self.ff_dim = ff_dim
        self.dropout_rate = dropout_rate
        self.attn = layers.MultiHeadAttention(
            num_heads, key_dim, output_shape=embed_dim
        )
        self.dropout_1 = layers.Dropout(self.dropout_rate)
        self.ln_1 = layers.LayerNormalization(epsilon=1e-6)
        self.ffn_1 = layers.Dense(self.ff_dim, activation="relu")
        self.ffn_2 = layers.Dense(self.embed_dim)
        self.dropout_2 = layers.Dropout(self.dropout_rate)
        self.ln_2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]
        causal_mask = causal_attention_mask(
            batch_size, seq_len, seq_len, tf.bool
        )
        attention_output, attention_scores = self.attn(
            inputs,
            inputs,
            attention_mask=causal_mask,
            return_attention_scores=True,
        )
        attention_output = self.dropout_1(attention_output)
        out1 = self.ln_1(inputs + attention_output)
        ffn_1 = self.ffn_1(out1)
        ffn_2 = self.ffn_2(ffn_1)
        ffn_output = self.dropout_2(ffn_2)
        return (self.ln_2(out1 + ffn_output), attention_scores)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "key_dim": self.key_dim,
                "embed_dim": self.embed_dim,
                "num_heads": self.num_heads,
                "ff_dim": self.ff_dim,
                "dropout_rate": self.dropout_rate,
            }
        )
        return config
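Inside the block, `MultiHeadAttention` receives the boolean causal mask, and masked positions get zero attention weight. The following sketch shows what a single head computes, under simplifying assumptions: no learned query/key/value/output projections, one head instead of `N_HEADS`, and plain Python lists instead of tensors. `causal_self_attention` is a hypothetical helper, not a Keras API:

```python
import math


def softmax(scores):
    # Numerically stable softmax; exp(-inf) contributes exactly 0
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    # Single-head scaled dot-product attention over lists of vectors,
    # where position i may only attend to positions j <= i
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


toy_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_out, toy_weights = causal_self_attention(toy_seq, toy_seq, toy_seq)
for row in toy_weights:
    print([round(w, 3) for w in row])  # first row is [1.0, 0.0, 0.0]
```

In the Keras layer, the attention output is then added back to the block input and normalised (`ln_1`), followed by the feed-forward sublayer with its own residual connection and `ln_2`.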

## 7. Create the Token and Position Embedding <a name="embedder"></a>

In [32]:
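class TokenAndPositionEmbedding(layers.Layer):
    def __init__(self, max_len, vocab_size, embed_dim, **kwargs):
        # Pass base-layer arguments (name, trainable, dtype, ...) through so the
        # layer can be re-created from get_config() when the model is reloaded
        super().__init__(**kwargs)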
        self.max_len = max_len
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.token_emb = layers.Embedding(
            input_dim=vocab_size, output_dim=embed_dim
        )
        self.pos_emb = layers.Embedding(input_dim=max_len, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)
        x = self.token_emb(x)
        return x + positions

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "max_len": self.max_len,
                "vocab_size": self.vocab_size,
                "embed_dim": self.embed_dim,
            }
        )
        return config
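Token and position embeddings are two learned lookup tables whose rows are added elementwise, so the same word gets a different vector at each position. The position table has only `max_len` rows (`MAX_LEN` here), which is one reason generation is capped at `max_tokens=80`. A minimal sketch with hypothetical 2-dimensional tables; `add_position_embeddings` is an illustrative name, and its explicit length check is a choice made for this sketch:

```python
def add_position_embeddings(token_ids, token_table, pos_table):
    # Output row for position pos = token_table[token] + pos_table[pos]
    if len(token_ids) > len(pos_table):
        raise ValueError("sequence is longer than the position table (max_len)")
    return [
        [t + p for t, p in zip(token_table[tok], pos_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


# Hypothetical tables: 3 tokens in the vocabulary, max_len of 4 positions
toy_token_table = [[0, 0], [1, 0], [0, 1]]
toy_pos_table = [[10, 0], [20, 0], [30, 0], [40, 0]]
print(add_position_embeddings([2, 2, 1], toy_token_table, toy_pos_table))
# [[10, 1], [20, 1], [31, 0]]
```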

## 8. Build the Transformer model <a name="transformer_decoder"></a>

In [33]:
inputs = layers.Input(shape=(None,), dtype=tf.int32)
x = TokenAndPositionEmbedding(MAX_LEN, VOCAB_SIZE, EMBEDDING_DIM)(inputs)
x, attention_scores = TransformerBlock(
    N_HEADS, KEY_DIM, EMBEDDING_DIM, FEED_FORWARD_DIM
)(x)
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
gpt = models.Model(inputs=inputs, outputs=[outputs, attention_scores])
gpt.compile("adam", loss=[losses.SparseCategoricalCrossentropy(), None])
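As a sanity check on model size, the parameter count can be worked out by hand. This sketch assumes Keras defaults (biased projections in `MultiHeadAttention`, `value_dim` equal to `key_dim`, and a scale and an offset per feature in each `LayerNormalization`); under those assumptions `gpt.summary()` should report the same total. `gpt_param_count` is a hypothetical helper:

```python
def gpt_param_count(vocab_size, max_len, embed_dim, key_dim, n_heads, ff_dim):
    # Token table + position table
    embeddings = vocab_size * embed_dim + max_len * embed_dim
    # Query, key and value projections: kernel (embed, heads, key_dim) + bias (heads, key_dim)
    qkv = 3 * (embed_dim * n_heads * key_dim + n_heads * key_dim)
    # Attention output projection back to embed_dim
    attn_out = n_heads * key_dim * embed_dim + embed_dim
    # Two Dense layers of the feed-forward sublayer
    ffn = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)
    # gamma and beta for ln_1 and ln_2
    layer_norms = 2 * (2 * embed_dim)
    # Final softmax Dense layer over the vocabulary
    head = embed_dim * vocab_size + vocab_size
    return embeddings + qkv + attn_out + ffn + layer_norms + head


print(gpt_param_count(10000, 80, 256, 256, 2, 256))  # 5809168
```

That is about 5.8 million weights, roughly 88% of which sit in the token embedding and the final `Dense` layer, both of size `VOCAB_SIZE × EMBEDDING_DIM` plus biases.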

In [34]:
gpt.summary()

In [35]:
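if LOAD_MODEL:
    # Alternative (weights only): gpt.load_weights('./models/model.weights.h5')
    # The custom layers must be passed in so Keras can rebuild them from their configs
    gpt = models.load_model(
        "./models/gpt.weights.h5",
        custom_objects={
            "TransformerBlock": TransformerBlock,
            "TokenAndPositionEmbedding": TokenAndPositionEmbedding,
        },
        compile=True,
    )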

## 9. Train the Transformer <a name="train"></a>

In [36]:
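# Create a TextGenerator callback that prints a sample at the end of each epoch
class TextGenerator(callbacks.Callback):
    def __init__(self, index_to_word, top_k=10):
        super().__init__()
        self.index_to_word = index_to_word
        self.word_to_index = {
            word: index for index, word in enumerate(index_to_word)
        }
        self.top_k = top_k  # stored, but not used by sample_from()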

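    def sample_from(self, probs, temperature):
        # Rescale by temperature in float64 to reduce underflow at low temperatures
        probs = np.asarray(probs, dtype=np.float64) ** (1 / temperature)
        probs = probs / np.sum(probs)
        return np.random.choice(len(probs), p=probs), probs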

    def generate(self, start_prompt, max_tokens, temperature):
        start_tokens = [
            self.word_to_index.get(x, 1) for x in start_prompt.split()
        ]
        sample_token = None
        info = []
        while len(start_tokens) < max_tokens and sample_token != 0:
            x = np.array([start_tokens])
            y, att = self.model.predict(x, verbose=0)
            sample_token, probs = self.sample_from(y[0][-1], temperature)
            info.append(
                {
                    "prompt": start_prompt,
                    "word_probs": probs,
                    "atts": att[0, :, -1, :],
                }
            )
            start_tokens.append(sample_token)
            start_prompt = start_prompt + " " + self.index_to_word[sample_token]
        print(f"\ngenerated text:\n{start_prompt}\n")
        return info

    def on_epoch_end(self, epoch, logs=None):
        self.generate("wine review", max_tokens=80, temperature=1.0)
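`sample_from` raises each probability to the power `1 / temperature` and renormalises: temperatures below 1 sharpen the distribution towards the most likely tokens, temperatures above 1 flatten it. The constructor also accepts a `top_k` argument that the sampler never uses; `top_k_filter` sketches one hypothetical way it could be applied. Both helpers are illustrative, pure-Python and not part of the notebook:

```python
import random


def apply_temperature(probs, temperature):
    # Same reshaping as TextGenerator.sample_from: p ** (1 / T), renormalised
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def top_k_filter(probs, k):
    # Hypothetical use of top_k: keep the k largest probabilities,
    # zero out the rest, and renormalise
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


toy_probs = [0.5, 0.3, 0.2]
print([round(p, 3) for p in apply_temperature(toy_probs, 0.5)])  # [0.658, 0.237, 0.105]
print([round(p, 3) for p in apply_temperature(toy_probs, 2.0)])  # flatter than the input
print([round(p, 3) for p in top_k_filter(toy_probs, 2)])  # [0.625, 0.375, 0.0]

# Draw one token id from the sharpened distribution
toy_token = random.choices(range(len(toy_probs)), weights=apply_temperature(toy_probs, 0.5))[0]
```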

In [37]:
# Create a model save checkpoint
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath="./checkpoint/checkpoint.weights.h5",
    save_weights_only=True,
    save_freq="epoch",
    verbose=0,
)

tensorboard_callback = callbacks.TensorBoard(log_dir="./logs")

# Create the text generator callback from the vocabulary
text_generator = TextGenerator(vocab)

In [38]:
gpt.fit(
    train_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback, tensorboard_callback, text_generator],
)

Epoch 1/5
4059/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 2.5953
generated text:
wine review : portugal : vinho verde : portuguese white : this is a soft , fruity wine . it ' s fresh , almost as a crisper side of the [UNK] , high altitude . with blackberry - currant and acidity and the lightest tannins the white - peach elements still make for a tangy citrus edge . drink now . 

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 153s 35ms/step - loss: 2.5951
Epoch 2/5
4060/4060 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - loss: 1.9746
generated text:
wine review : france : bordeaux : bordeaux - style white blend : the sweetened wine is smooth and fruity all the place . it ' s not [UNK] layers of ripe fruit and layers of rich firm tannins , fruity wine . a powerful , wine par excellence , concentrated and balanced with a generous finish . 

4060/4060 ━━━━━━━━━━━━━━━━━━━━ 87s 21

<keras.src.callbacks.history.History at 0x784ebeb748e0>

In [39]:
# Save the final model
gpt.save("./models/gpt.weights.h5")



## 10. Generate text using the Transformer

In [40]:
def print_probs(info, vocab, top_k=5):
    for step in info:
        # Attention from the last position, averaged over heads, shades each prompt word
        att_scores = np.mean(step["atts"], axis=0)
        max_score = np.max(att_scores)
        highlighted_text = []
        for word, att_score in zip(step["prompt"].split(), att_scores):
            highlighted_text.append(
                '<span style="background-color:rgba(135,206,250,'
                + str(att_score / max_score)
                + ');">'
                + word
                + "</span>"
            )
        highlighted_text = " ".join(highlighted_text)
        display(HTML(highlighted_text))

        # Print the top_k most likely next tokens with their probabilities
        word_probs = step["word_probs"]
        p_sorted = np.sort(word_probs)[::-1][:top_k]
        i_sorted = np.argsort(word_probs)[::-1][:top_k]
        for p, idx in zip(p_sorted, i_sorted):
            print(f"{vocab[idx]}:   \t{np.round(100*p,2)}%")
        print("--------\n")

In [41]:
info = text_generator.generate(
    "wine review : us", max_tokens=80, temperature=1.0
)


generated text:
wine review : us : washington : merlot : this wine is all varietal , blended together in that give a few burgundy to smooth out dry creek , forward drinking over fine juicy fruit . it should be balanced , exhibiting generous chocolate and integrated tannins , with a touch of earthy . 
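`TextGenerator.generate` is an autoregressive loop: predict a distribution for the next token from everything generated so far, pick a token, append it, and stop at the padding id `0` or after `max_tokens` tokens. The following sketch keeps that loop but swaps in greedy decoding (always take the most likely token, the low-temperature limit of `sample_from`) and a hypothetical bigram table in place of `gpt.predict`, so the result is deterministic. `greedy_generate`, `TOY_VOCAB` and `TOY_BIGRAM` are illustrative names:

```python
def greedy_generate(next_token_probs, prompt_ids, max_tokens, stop_id=0):
    # Same loop shape as TextGenerator.generate, but always takes the most
    # likely token (greedy decoding) instead of sampling
    ids = list(prompt_ids)
    next_id = None
    while len(ids) < max_tokens and next_id != stop_id:
        probs = next_token_probs(ids)
        next_id = max(range(len(probs)), key=lambda i: probs[i])
        ids.append(next_id)
    return ids


# Hypothetical 6-word vocabulary and bigram "model" standing in for gpt.predict
TOY_VOCAB = ["", "[UNK]", "wine", "review", ":", "riesling"]
TOY_BIGRAM = {
    2: [0.0, 0.0, 0.0, 0.9, 0.1, 0.0],  # wine -> review
    3: [0.0, 0.0, 0.1, 0.0, 0.9, 0.0],  # review -> :
    4: [0.0, 0.1, 0.0, 0.0, 0.0, 0.9],  # : -> riesling
    5: [0.8, 0.0, 0.0, 0.0, 0.2, 0.0],  # riesling -> padding (stop)
}


def toy_next_token_probs(ids):
    # Only the last token matters in a bigram model
    return TOY_BIGRAM[ids[-1]]


toy_generated = greedy_generate(toy_next_token_probs, [2], max_tokens=10)
print(toy_generated)  # [2, 3, 4, 5, 0]
print(" ".join(TOY_VOCAB[i] for i in toy_generated if i != 0))  # wine review : riesling
```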



In [42]:
info = text_generator.generate(
    "wine review : italy", max_tokens=80, temperature=0.5
)


generated text:
wine review : italy : tuscany : sangiovese : aromas of scorched earth , underbrush , scorched earth , leather , leather and a balsamic note lead the nose on this full - bodied red . the palate is still closed but the rather evolved palate offers raisin , roasted coffee bean , espresso , licorice and licorice alongside firm , fine - grained tannins . drink sooner rather than later to capture the remaining fruit . 



In [43]:
info = text_generator.generate(
    "wine review : germany", max_tokens=80, temperature=0.5
)
print_probs(info, vocab)


generated text:
wine review : germany : mosel : riesling : while intensely floral , with hints of honey , this off - dry riesling is chock full of succulent pineapple and mango flavors . it ' s rich and concentrated , but finishes with a bristle of lemon - lime acidity . drink now through 2020 . 



::   	100.0%
grosso:   	0.0%
-:   	0.0%
blend:   	0.0%
,:   	0.0%
--------



mosel:   	82.38%
rheinhessen:   	9.09%
rheingau:   	7.56%
pfalz:   	0.82%
nahe:   	0.09%
--------



::   	99.92%
-:   	0.08%
,:   	0.0%
blanca:   	0.0%
grosso:   	0.0%
--------



riesling:   	99.99%
pinot:   	0.01%
weissburgunder:   	0.0%
gewürztraminer:   	0.0%
chardonnay:   	0.0%
--------



::   	100.0%
-:   	0.0%
grosso:   	0.0%
blanc:   	0.0%
blend:   	0.0%
--------



a:   	39.29%
while:   	12.74%
whiffs:   	7.56%
this:   	5.57%
dusty:   	4.01%
--------



intensely:   	44.38%
a:   	10.09%
demure:   	9.38%
the:   	8.06%
exuberantly:   	3.85%
--------



floral:   	43.99%
mineral:   	14.32%
ripe:   	10.51%
perfumed:   	8.33%
fruity:   	7.18%
--------



,:   	90.8%
and:   	4.43%
on:   	4.23%
in:   	0.41%
with:   	0.08%
--------



this:   	47.35%
with:   	32.77%
it:   	9.78%
wafting:   	5.6%
the:   	1.9%
--------



hints:   	30.07%
notes:   	29.61%
a:   	25.42%
honey:   	2.71%
scents:   	2.6%
--------



of:   	100.0%
at:   	0.0%
,:   	0.0%
on:   	0.0%
and:   	0.0%
--------



honey:   	81.38%
blossom:   	9.83%
honeysuckle:   	3.61%
saffron:   	1.03%
wax:   	0.67%
--------



and:   	66.07%
,:   	33.92%
on:   	0.01%
[UNK]:   	0.0%
-:   	0.0%
--------



this:   	41.98%
it:   	14.85%
honey:   	7.57%
wax:   	5.67%
saffron:   	3.27%
--------



off:   	32.33%
dry:   	24.6%
riesling:   	13.95%
is:   	10.86%
wine:   	3.77%
--------



-:   	99.98%
dry:   	0.02%
kabinett:   	0.0%
riesling:   	0.0%
the:   	0.0%
--------



dry:   	100.0%
sweet:   	0.0%
[UNK]:   	0.0%
ripe:   	0.0%
label:   	0.0%
--------



riesling:   	98.85%
kabinett:   	0.46%
wine:   	0.28%
,:   	0.26%
spätlese:   	0.13%
--------



is:   	90.52%
balances:   	6.46%
has:   	0.95%
should:   	0.46%
builds:   	0.4%
--------



remarkably:   	35.98%
a:   	25.98%
intensely:   	14.65%
surprisingly:   	4.65%
delicately:   	3.1%
--------



full:   	99.88%
-:   	0.12%
medium:   	0.0%
that:   	0.0%
flush:   	0.0%
--------



of:   	99.97%
,:   	0.01%
and:   	0.01%
-:   	0.01%
bodied:   	0.0%
--------



sweet:   	35.79%
ripe:   	23.86%
juicy:   	16.23%
fresh:   	3.66%
yellow:   	1.58%
--------



yellow:   	50.71%
peach:   	24.41%
white:   	7.33%
pineapple:   	3.68%
mango:   	3.29%
--------



and:   	81.88%
,:   	17.96%
flavors:   	0.1%
.:   	0.03%
fruit:   	0.02%
--------



mango:   	36.87%
peach:   	27.86%
tangerine:   	18.16%
grapefruit:   	3.22%
orange:   	2.38%
--------



flavors:   	98.37%
,:   	1.03%
fruit:   	0.21%
and:   	0.15%
.:   	0.09%
--------



.:   	90.41%
,:   	8.71%
that:   	0.53%
and:   	0.1%
accented:   	0.08%
--------



it:   	98.87%
the:   	0.62%
there:   	0.13%
a:   	0.05%
off:   	0.04%
--------



':   	99.98%
finishes:   	0.01%
is:   	0.0%
has:   	0.0%
drinks:   	0.0%
--------



s:   	100.0%
ll:   	0.0%
[UNK]:   	0.0%
sweet:   	0.0%
d:   	0.0%
--------



a:   	58.39%
full:   	3.58%
unabashedly:   	3.21%
an:   	3.14%
rich:   	3.1%
--------



and:   	54.76%
,:   	19.89%
in:   	19.14%
with:   	2.96%
but:   	1.76%
--------



sweet:   	36.63%
concentrated:   	26.44%
full:   	8.02%
ripe:   	6.35%
juicy:   	5.68%
--------



,:   	76.6%
in:   	13.0%
with:   	7.39%
on:   	1.28%
but:   	1.06%
--------



with:   	42.96%
but:   	30.75%
yet:   	25.52%
finishing:   	0.19%
it:   	0.14%
--------



with:   	23.58%
likely:   	22.27%
balanced:   	13.86%
brightened:   	6.25%
elegantly:   	4.32%
--------



with:   	95.9%
long:   	2.75%
elegantly:   	0.4%
briskly:   	0.36%
moderately:   	0.19%
--------



a:   	97.98%
an:   	0.75%
lingering:   	0.35%
hints:   	0.24%
just:   	0.18%
--------



bristle:   	82.1%
murmur:   	4.82%
hint:   	3.44%
touch:   	2.54%
flurry:   	1.84%
--------



of:   	100.0%
-:   	0.0%
,:   	0.0%
that:   	0.0%
and:   	0.0%
--------



lime:   	75.34%
acidity:   	6.54%
lemon:   	5.29%
white:   	2.97%
minerality:   	1.91%
--------



-:   	81.58%
zest:   	13.46%
acidity:   	2.74%
astringency:   	0.9%
pith:   	0.29%
--------



lime:   	99.84%
zest:   	0.08%
tea:   	0.04%
rind:   	0.02%
skin:   	0.01%
--------



acidity:   	99.55%
zest:   	0.26%
astringency:   	0.08%
-:   	0.05%
notes:   	0.02%
--------



.:   	99.49%
that:   	0.41%
and:   	0.09%
,:   	0.01%
on:   	0.0%
--------



:   	56.14%
drink:   	37.47%
it:   	4.21%
drinks:   	1.44%
the:   	0.33%
--------



now:   	99.98%
now–2015:   	0.01%
now–2017:   	0.0%
now–2016:   	0.0%
now–2020:   	0.0%
--------



through:   	94.89%
.:   	4.79%
for:   	0.08%
,:   	0.08%
and:   	0.07%
--------



2021:   	49.78%
2020:   	45.07%
2018:   	2.15%
2019:   	1.34%
2025:   	0.94%
--------



.:   	100.0%
to:   	0.0%
,:   	0.0%
through:   	0.0%
for:   	0.0%
--------



:   	100.0%
imported:   	0.0%
drink:   	0.0%
beyond:   	0.0%
enjoy:   	0.0%
--------

