<a href="https://colab.research.google.com/github/paraery/parsingResume/blob/main/transformersModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


In [2]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

In [9]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertForTokenClassification: ['pre_classifier.weight', 'pre_classifier.bias']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [10]:
NER = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is yugesh and I live in India"

In [3]:
import pandas as pd
df = pd.read_csv('drive/MyDrive/textfortraining3.csv')

In [4]:
lstTxt = df['text'].values.tolist()

#train

In [5]:
pip install -q datasets transformers

In [6]:
import re
def get_tokens_with_entities(raw_text: str):
    # split the text by spaces only if the space does not occur between square brackets
    # we do not want to split "multi-word" entity value yet
    raw_tokens = re.split(r"\s(?![^\[]*\])", raw_text)

    # a regex for matching the annotation according to our notation [entity_value](entity_name)
    entity_value_pattern = r"\[(?P<value>.+?)\]\((?P<entity>.+?)\)"
    entity_value_pattern_compiled = re.compile(entity_value_pattern, flags=re.I|re.M)

    tokens_with_entities = []

    for raw_token in raw_tokens:
        match = entity_value_pattern_compiled.match(raw_token)
        if match:
            raw_entity_name, raw_entity_value = match.group("entity"), match.group("value")

            # we prefix the name of entity differently
            # B- indicates beginning of an entity
            # I- indicates the token is not a new entity itself but rather a part of existing one
            for i, raw_entity_token in enumerate(re.split(r"\s", raw_entity_value)):
                entity_prefix = "B" if i == 0 else "I"
                entity_name = f"{entity_prefix}-{raw_entity_name}"
                tokens_with_entities.append((raw_entity_token, entity_name))
        else:
            tokens_with_entities.append((raw_token, "O"))

    return tokens_with_entities
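
The annotation notation handled above can be illustrated on a single sentence. The following is a minimal standalone sketch, not the notebook's own cell: the sample sentence and the names `sample`, `annotation` and `tagged` are hypothetical, while the two regular expressions are the ones used in `get_tokens_with_entities`.

```python
import re

# Hypothetical sample in the [entity_value](entity_name) notation.
sample = "[Jane Doe](PERSON) lives in Bangkok"

# Same split regex as above: break on whitespace unless a closing "]" follows
# before any "[" does, i.e. unless the space sits inside an annotated value.
raw_tokens = re.split(r"\s(?![^\[]*\])", sample)

# Same annotation regex as above, without the flags (not needed for this sample).
annotation = re.compile(r"\[(?P<value>.+?)\]\((?P<entity>.+?)\)")

tagged = []
for raw_token in raw_tokens:
    match = annotation.match(raw_token)
    if match is None:
        # Plain word outside any annotation.
        tagged.append((raw_token, "O"))
        continue
    # First word of the value gets B-, the remaining words get I-.
    for i, word in enumerate(match.group("value").split()):
        prefix = "B" if i == 0 else "I"
        tagged.append((word, f"{prefix}-{match.group('entity')}"))

print(raw_tokens)
print(tagged)
```

The multi-word value stays in one raw token after the split and is only broken into `B-`/`I-` tagged words afterwards.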

In [7]:
class NERDataMaker:
    def __init__(self, texts):
        self.unique_entities = []
        self.processed_texts = []

        temp_processed_texts = []
        for text in texts:
            tokens_with_entities = get_tokens_with_entities(text)
            for _, ent in tokens_with_entities:
                if ent not in self.unique_entities:
                    self.unique_entities.append(ent)
            temp_processed_texts.append(tokens_with_entities)

        self.unique_entities.sort(key=lambda ent: ent if ent != "O" else "")

        for tokens_with_entities in temp_processed_texts:
            self.processed_texts.append([(t, self.unique_entities.index(ent)) for t, ent in tokens_with_entities])

    @property
    def id2label(self):
        return dict(enumerate(self.unique_entities))

    @property
    def label2id(self):
        return {v:k for k, v in self.id2label.items()}

    def __len__(self):
        return len(self.processed_texts)

    def __getitem__(self, idx):
        def _process_tokens_for_one_text(id, tokens_with_encoded_entities):
            ner_tags = []
            tokens = []
            for t, ent in tokens_with_encoded_entities:
                ner_tags.append(ent)
                tokens.append(t)

            return {
                "id": id,
                "ner_tags": ner_tags,
                "tokens": tokens
            }

        tokens_with_encoded_entities = self.processed_texts[idx]
        if isinstance(idx, int):
            return _process_tokens_for_one_text(idx, tokens_with_encoded_entities)
        else:
            # idx.start is None for slices such as dm[:3], so fall back to 0
            return [_process_tokens_for_one_text(i + (idx.start or 0), tee) for i, tee in enumerate(tokens_with_encoded_entities)]

    def as_hf_dataset(self, tokenizer):
        from datasets import Dataset, Features, Value, ClassLabel, Sequence
        def tokenize_and_align_labels(examples):
            tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

            labels = []
            for i, label in enumerate(examples["ner_tags"]):
                word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
                previous_word_idx = None
                label_ids = []
                for word_idx in word_ids:  # Set the special tokens to -100.
                    if word_idx is None:
                        label_ids.append(-100)
                    elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                        label_ids.append(label[word_idx])
                    else:
                        label_ids.append(-100)
                    previous_word_idx = word_idx
                labels.append(label_ids)

            tokenized_inputs["labels"] = labels
            return tokenized_inputs

        ids, ner_tags, tokens = [], [], []
        for i, pt in enumerate(self.processed_texts):
            ids.append(i)
            pt_tokens,pt_tags = list(zip(*pt))
            ner_tags.append(pt_tags)
            tokens.append(pt_tokens)
        data = {
            "id": ids,
            "ner_tags": ner_tags,
            "tokens": tokens
        }
        features = Features({
            "tokens": Sequence(Value("string")),
            "ner_tags": Sequence(ClassLabel(names=self.unique_entities)),
            "id": Value("int32")
        })
        ds = Dataset.from_dict(data, features)
        tokenized_ds = ds.map(tokenize_and_align_labels, batched=True)
        return tokenized_ds
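
The label alignment inside `tokenize_and_align_labels` can be tried without downloading a tokenizer. This is a minimal sketch under assumptions: `align_labels` is a hypothetical helper that mirrors the inner loop above, and the `word_ids` list is hand-written to imitate what a fast tokenizer might return for a name split into three word pieces. The value `-100` matches the default `ignore_index` of PyTorch's cross-entropy loss, so those positions do not contribute to the loss.

```python
from typing import List, Optional


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Label only the first sub-token of each word; mask everything else with -100."""
    aligned = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens such as [CLS] and [SEP] have no source word.
            aligned.append(-100)
        elif word_idx != previous_word_idx:
            # First sub-token of a new word keeps the word's label.
            aligned.append(word_labels[word_idx])
        else:
            # Later sub-tokens of the same word are masked out.
            aligned.append(-100)
        previous_word_idx = word_idx
    return aligned


# Hand-written stand-in for: [CLS] th ##ira ##wit lives [SEP]
word_ids = [None, 0, 0, 0, 1, None]
word_labels = [1, 0]  # one label id per word
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 1, -100, -100, 0, -100]
```

Because only first sub-tokens are supervised, predictions on continuation pieces such as `##ira` are not directly constrained by the training labels.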

In [8]:
dm = NERDataMaker(lstTxt)
print(f"total examples = {len(dm)}")
print(dm[0:3])

total examples = 95
[{'id': 0, 'ner_tags': [1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [20]:
dm[11]

{'id': 11,
 'ner_tags': [0,
  1,
  2,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  

In [21]:
##first
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
#tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", num_labels=len(dm.unique_entities), id2label=dm.id2label, label2id=dm.label2id,ignore_mismatched_sizes=True)
#model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(dm.unique_entities), id2label=dm.id2label, label2id=dm.label2id)

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertForTokenClassification: ['pre_classifier.weight', 'pre_classifier.bias']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([3, 

In [9]:
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("drive/MyDrive/PersonEntModelTest2")
model = AutoModelForTokenClassification.from_pretrained("drive/MyDrive/PersonEntModelTest2")
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Configuration saved in drive/MyDrive/PersonEntModel3/config.json
Model weights saved in drive/MyDrive/PersonEntModel3/pytorch_model.bin
tokenizer config file saved in drive/MyDrive/PersonEntModel3/tokenizer_config.json
Special tokens file saved in drive/MyDrive/PersonEntModel3/special_tokens_map.json


('drive/MyDrive/PersonEntModel3/tokenizer_config.json',
 'drive/MyDrive/PersonEntModel3/special_tokens_map.json',
 'drive/MyDrive/PersonEntModel3/vocab.txt',
 'drive/MyDrive/PersonEntModel3/added_tokens.json',
 'drive/MyDrive/PersonEntModel3/tokenizer.json')

In [13]:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=40,
    weight_decay=0.02,
)

train_ds = dm.as_hf_dataset(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=train_ds,  # evaluates on the training set (demo only); use a held-out split for real metrics
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
model.save_pretrained("drive/MyDrive/PersonEntModelTest3")
tokenizer.save_pretrained("drive/MyDrive/PersonEntModelTest3")




***** Running training *****
  Num examples = 95
  Num Epochs = 40
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 240
  Number of trainable parameters = 66365187
The following columns in the training set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, tokens, id. If ner_tags, tokens, id are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,No log,0.001927
2,No log,0.000841
3,No log,0.000766
4,No log,0.000841
5,No log,0.000278
6,No log,0.000184
7,No log,0.000122
8,No log,0.000101
9,No log,9.1e-05
10,No log,7.9e-05


***** Running Evaluation *****
  Num examples = 95
  Batch size = 16
The following columns in the evaluation set don't have a corresponding argument in `DistilBertForTokenClassification.forward` and have been ignored: ner_tags, tokens, id. If ner_tags, tokens, id are not expected by `DistilBertForTokenClassification.forward`,  you can safely ignore this message.

('drive/MyDrive/PersonEntModelTest3/tokenizer_config.json',
 'drive/MyDrive/PersonEntModelTest3/special_tokens_map.json',
 'drive/MyDrive/PersonEntModelTest3/vocab.txt',
 'drive/MyDrive/PersonEntModelTest3/added_tokens.json',
 'drive/MyDrive/PersonEntModelTest3/tokenizer.json')

In [14]:
import pandas as pd
dftest = pd.read_csv('textInResume.csv')

In [None]:
import pandas as pd
dfName = pd.read_csv('nameinresume.csv')

In [None]:
lstName = (dfName['firstname'].values+' '+dfName['lastname'].values).tolist()
lstName

NameError: ignored

In [15]:
lsttest = dftest['text'].values.tolist()

In [18]:
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple") # pass device=0 if using gpu
i=0
num=0
for test in lsttest:
  ent = pipe(test)
  txtName = ''
  txt =''
  for val in ent:
    txtName = txtName+" | " + val['word'].replace('##','')
    txt = txt+val['word'].replace('##','')
  print(ent)
  print(txtName,'('+txt+')')

[]
 ()
[{'entity_group': 'PERSON', 'score': 0.95196307, 'word': 'nisaratwisesbantao', 'start': 0, 'end': 18}]
 | nisaratwisesbantao (nisaratwisesbantao)
[{'entity_group': 'PERSON', 'score': 0.99890757, 'word': 'th', 'start': 178, 'end': 180}, {'entity_group': 'PERSON', 'score': 0.999071, 'word': '##ira', 'start': 180, 'end': 183}, {'entity_group': 'PERSON', 'score': 0.9884774, 'word': '##wit jirarungro', 'start': 183, 'end': 197}]
 | th | ira | wit jirarungro (thirawit jirarungro)
[]
 ()
[{'entity_group': 'PERSON', 'score': 0.99921846, 'word': 'than', 'start': 0, 'end': 4}, {'entity_group': 'PERSON', 'score': 0.9822139, 'word': '##chanok watcharakitphokin', 'start': 4, 'end': 28}]
 | than | chanok watcharakitphokin (thanchanok watcharakitphokin)
[{'entity_group': 'PERSON', 'score': 0.89714295, 'word': 'nat', 'start': 612, 'end': 615}, {'entity_group': 'PERSON', 'score': 0.7495198, 'word': '##anop pimonsathi', 'start': 615, 'end': 630}, {'entity_group': 'PERSON', 'score': 0.5261133, 'wo

In [None]:
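  # Fragment meant to run inside the `for test in lsttest:` loop of the previous cell.
  # Compare string contents with ==; id() compares object identity, so two equal
  # strings that were built separately never match and num stays 0.
  for name in lstName:
    if name == txt:
      num = num + 1
      print(num)
      break
  i = i + 1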

In [None]:
acc = (num/87)*100
print(num,acc)
num

0 0.0


0
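
One way to compute the accuracy without relying on object identity or a hard-coded denominator is an exact string match after light normalization. This is a minimal sketch under assumptions: `exact_match_accuracy` is a hypothetical helper, the sample lists are invented, and lowercasing plus whitespace collapsing is an assumed normalization (the model is uncased, so its output is lowercase).

```python
from typing import List


def normalize(name: str) -> str:
    # Lowercase and collapse runs of whitespace so formatting differences do not count as errors.
    return " ".join(name.lower().split())


def exact_match_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Percentage of positions where the predicted name equals the expected name."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(normalize(p) == normalize(e) for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)


# Invented example data: one prediction per resume, in the same order as the reference names.
predicted = ["jane doe", "", "john  smith"]
expected = ["Jane Doe", "Alex Kim", "John Smith"]
accuracy = exact_match_accuracy(predicted, expected)
print(round(accuracy, 2))
```

Dividing by `len(expected)` keeps the percentage correct when the number of resumes changes.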

In [None]:
pipe("""Busaba Supasawat        profile a self-starter and quick learner with two-year experience   in financial industry. versatile  skill  sets including problem solving,  quantitative and analytical skills. seeking for career in financial industry where i can utilize and further  develop my  skills  and knowledge to   bring innovative solutions helping company to achieve the company goal.   contact phone: 0838523456  email: kanittha.se@hotmail.com  address: 118/208 condo vtara 36,  soi saen-sa-buy, rama iv rd.,  phra-khanong, klong-toei, bangkok  10110  work experience structured products development - bualuang securities may 2020  present ? develop and conduct structured product pricing and other relevant areas ? analyze data to efficiently expand client-based market ? provide market outlook and trading strategies of structured note to the sales team ? create new tools used to assist the sales team  ? handle project for improving current working flow  derivatives analyst - citibank n.a. april 2019  april 2020 ? review and improve the existing process to increase efficiency by applying automation process ? process settlement and confirmations via verbal and electronics means for derivative and structures product ? handle exceptions and providing constructive solutions to resolve issues  onsite analyst - central group online september 2018  march 2019 ? develop excel spreadsheet to improve day-to-day job ? perform ad hoc analysis to identify business issues and draw recommendation in marketing performance management  ? develop performance tracking on google tag manager on website education tsinghua university, beijing, china [m.sc.] aug 2016  july 2018  ? management science and engineering (mse) program ? department of industrial engineering chulalongkorn university [b. eng.] may 2012  may 2016 ? department of chemical engineering  ? awarded second-class honors computer skills ? vba in excel (advance), python (intermediate), sql (basic) ? 
excellent working knowledge for tableau, power bi certificate  ic complex 1 language skills  chinese (hsk5)  english (intermediate)   ----------------page (0) break---------------- """)

[{'entity_group': 'PERSON',
  'score': 0.9372089,
  'word': 'bus',
  'start': 0,
  'end': 3},
 {'entity_group': 'PERSON',
  'score': 0.64176697,
  'word': '##aba',
  'start': 3,
  'end': 6},
 {'entity_group': 'PERSON',
  'score': 0.8098486,
  'word': 'su',
  'start': 7,
  'end': 9},
 {'entity_group': 'PERSON',
  'score': 0.8008657,
  'word': '##pas',
  'start': 9,
  'end': 12},
 {'entity_group': 'PERSON',
  'score': 0.7050145,
  'word': '##awa',
  'start': 12,
  'end': 15}]
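
The output above still contains word-piece fragments such as `##aba` as separate entities. One possible post-processing step, sketched here under assumptions, is to merge neighbouring spans of the same `entity_group` using their character offsets and to re-read the text from the original string. `merge_adjacent_entities` and its `max_gap` parameter are hypothetical, not part of the `transformers` API, and the shortened `text` is a stand-in for the full resume string.

```python
from typing import Dict, List


def merge_adjacent_entities(text: str, entities: List[Dict], max_gap: int = 1) -> List[Dict]:
    """Merge same-group spans whose character gap is at most max_gap."""
    merged: List[Dict] = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        last = merged[-1] if merged else None
        if (
            last is not None
            and ent["entity_group"] == last["entity_group"]
            and ent["start"] - last["end"] <= max_gap
        ):
            # Extend the previous span and keep the lowest score as a conservative estimate.
            last["end"] = max(last["end"], ent["end"])
            last["score"] = min(last["score"], ent["score"])
        else:
            merged.append({k: ent[k] for k in ("entity_group", "score", "start", "end")})
    for ent in merged:
        # Recover the surface form from the source text instead of joining word pieces.
        ent["word"] = text[ent["start"]:ent["end"]]
    return merged


text = "Busaba Supasawat profile"  # shortened stand-in for the resume text
entities = [
    {"entity_group": "PERSON", "score": 0.9372089, "word": "bus", "start": 0, "end": 3},
    {"entity_group": "PERSON", "score": 0.64176697, "word": "##aba", "start": 3, "end": 6},
    {"entity_group": "PERSON", "score": 0.8098486, "word": "su", "start": 7, "end": 9},
    {"entity_group": "PERSON", "score": 0.8008657, "word": "##pas", "start": 9, "end": 12},
    {"entity_group": "PERSON", "score": 0.7050145, "word": "##awa", "start": 12, "end": 15},
]
result = merge_adjacent_entities(text, entities)
print(result)
```

With `max_gap=1` the first and last name are joined across the single space. The final character of the surname is still missing because the model did not tag it, which merging cannot repair.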
