In [1]:
! pip3 install datasets transformers

Collecting datasets
  Downloading datasets-1.10.2-py3-none-any.whl (542 kB)

In [2]:
import numpy as np
import random
import torch
seed = 2001
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
# torch.cuda.manual_seed(seed) # if cuda
# torch.cuda.manual_seed_all(seed)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark =False

<torch._C.Generator at 0x7fa33efc7ab0>

In [3]:
import transformers

print(transformers.__version__)

4.9.0


In [4]:
#!pip install tqdm
import datasets
from datasets import Dataset, DatasetDict

In [5]:
import sys
import numpy as np
sys.path.append('/content/drive/MyDrive/nlp')
#np.save('/content/drive/MyDrive/nlp/test', np.array([2,3,4,5]))

If you're opening this notebook locally, make sure your environment has the latest version of those libraries installed.

You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/question-answering).

# Fine-tuning a model on a question-answering task

In this notebook, we will see how to fine-tune one of the [🤗 Transformers](https://github.com/huggingface/transformers) models on a question-answering task, which is the task of extracting the answer to a question from a given context. We will see how to easily load a dataset for this kind of task and use the `Trainer` API to fine-tune a model on it.

![Widget inference representing the QA task](https://github.com/huggingface/notebooks/blob/master/examples/images/question_answering.png?raw=1)

**Note:** This notebook fine-tunes models that answer questions by taking a substring of a context, not by generating new text.

In [6]:
 
squad_v2 = True#False
model_checkpoint = "distilbert-base-uncased"
batch_size = 16

Outline of the solution:
1. Convert the original data to a question-answering format by building questions of the form 'who' + relation + entity B; the answer is entity A.
2. Add questions with no answer to both the training set and the validation set.
   For training: one extra random question for each sentence.
   For validation: all the other questions, so each sentence gets 7 questions in total.
3. At test time, compare the answers given to all the questions about a
   specific sentence. The best-scoring answer should belong to the question
   for the correct relation, since all the other questions have no answer
   and should get a lower score.
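
The first two steps of the outline can be sketched in plain Python as follows. This is a minimal illustration and not the notebook's implementation: `TEMPLATES`, `make_positive` and `make_negatives` are hypothetical names, the question wording is copied from the `questions` list defined later in the notebook, and the example record is the first row of the dataset.

```python
import random

# One question template per relation (wording taken from the notebook's `questions` list).
TEMPLATES = {
    "founded_by": "who founded ",
    "acquired_by": "who acquired ",
    "invested_in": "who invested in ",
    "CEO_of": "who is the CEO of ",
    "subsidiary_of": "who is subsidiary of ",
    "partners_with": "who is partner with ",
    "owned_by": "who owns ",
}

def make_positive(entities, spans, relation):
    """Step 1: the true relation becomes a question whose answer is entity A."""
    return {
        "question": TEMPLATES[relation] + entities[1] + "?",
        "answers": {"text": [entities[0]], "answer_start": [spans[0][0]]},
        "relation": relation,
    }

def make_negatives(entities, relation, k=None, rng=random):
    """Step 2: questions for the other relations get an empty answer.

    k=1 mimics the training setup (one random extra question);
    k=None mimics validation (all six remaining questions).
    """
    others = [r for r in TEMPLATES if r != relation]
    if k is not None:
        others = rng.sample(others, k)
    return [
        {
            "question": TEMPLATES[r] + entities[1] + "?",
            "answers": {"text": [], "answer_start": []},
            "relation": r,
        }
        for r in others
    ]

entities, spans = ["Lilium", "Baillie Gifford"], [[3, 9], [151, 166]]
positive = make_positive(entities, spans, "invested_in")
train_rows = [positive] + make_negatives(entities, "invested_in", k=1)
val_rows = [positive] + make_negatives(entities, "invested_in")

print(positive["question"])            # who invested in Baillie Gifford?
print(len(train_rows), len(val_rows))  # 2 7
```

Keeping the positive row first and the unanswerable rows after it mirrors the row order that appears in the extended DataFrames further down.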

## Loading the dataset

In [7]:
import requests, zipfile, io

import pandas as pd

def download_data():
    url = "https://www.dropbox.com/s/izi2x4sjohpzoot/relation_extraction_dataset.zip?dl=1"
    r = requests.get(url)
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()

download_data()


df = pd.read_pickle("relation_extraction_dataset.pkl")
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,end_idx,entities,entity_spans,match,original_article,sentence,start_idx,string_id
0,1024,"[Lilium, Baillie Gifford]","[[3, 9], [151, 166]]",raising $35,Happy Friday!\n\nWe sincerely hope you and you...,"3) Lilium, a German startup that’s making an a...",1013,invested_in
1,1762,"[Facebook ’s, Giphy]","[[92, 102], [148, 153]]",acquisition,Happy Friday!\n\nWe sincerely hope you and you...,"Meanwhile, the UK’s watchdog on Friday announc...",1751,acquired_by
2,2784,"[Global-e, Vitruvian Partners]","[[27, 35], [94, 112]]",raised $60,Happy Friday!\n\nWe sincerely hope you and you...,Israeli e-commerce startup Global-e has raised...,2774,invested_in
3,680,"[Joris Van Der Gucht, Silverfin]","[[0, 19], [35, 44]]",founder,Hg is a leading investor in tax and accounting...,"Joris Van Der Gucht, co-founder at Silverfin c...",673,founded_by
4,2070,"[Tim Vandecasteele, Silverfin]","[[0, 17], [71, 80]]",founder,Hg is a leading investor in tax and accounting...,"Tim Vandecasteele, co-founder added: ""We want ...",2063,founded_by


In [8]:
id2label = dict()
for idx, label in enumerate(df.string_id.value_counts().index):
  id2label[idx] = label
label2id = {v:k for k,v in id2label.items()}
label2id

{'CEO_of': 3,
 'acquired_by': 1,
 'founded_by': 0,
 'invested_in': 2,
 'owned_by': 6,
 'partners_with': 5,
 'subsidiary_of': 4}

In [9]:
questions=['who founded ','who acquired ','who invested in ','who is the CEO of ','who is subsidiary of ','who is partner with ','who owns ']
q2label=dict()
for idx, q in enumerate(questions):
  q2label[q] = id2label[idx]
q2label

{'who acquired ': 'acquired_by',
 'who founded ': 'founded_by',
 'who invested in ': 'invested_in',
 'who is partner with ': 'partners_with',
 'who is subsidiary of ': 'subsidiary_of',
 'who is the CEO of ': 'CEO_of',
 'who owns ': 'owned_by'}

In [10]:
 
# Build the question for each row: the relation's template followed by entity B.
df['question'] = df['string_id'].transform(lambda x: questions[label2id[x]])
df['question'] = df['question'] + df['entities'].transform(lambda x: x[1] + '?')

def add_answer(entity, span):
    # The answer is entity A; answer_start is its character offset in the sentence.
    return {'text': [entity[0]], 'answer_start': [span[0][0]]}

df['answers'] = df[['entities', 'entity_spans']].apply(lambda x: add_answer(*x), axis=1)
df['id'] = df.index.astype('str')
df.rename(columns={'sentence': 'context'}, inplace=True)

In [11]:
import numpy as np

# Split into train (80%) and validation (20%).
train_idx = np.random.choice(df.shape[0], size=int(0.8 * df.shape[0]), replace=False)
# Validation rows are all row positions that were not drawn for training (sorted).
val_idx = np.setdiff1d(np.arange(df.shape[0]), train_idx)
df_train = df.iloc[train_idx]
df_val = df.iloc[val_idx]
df_train.shape

(9624, 11)

#add questions with no answers both for train and validation

In [12]:

import ast
 

 
df_train_ext=df_train.copy()
df_size=df.shape[0]
df_train_ext['question_id']=df_train_ext['id']
df_train_ext['string_id']=df_train['string_id']
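def add_questionsWno_answers(group, withAnswer=False, size=0):
    """Return the group's original row followed by questions that have no answer.

    withAnswer=True  -> add one randomly chosen unanswerable question (training).
    withAnswer=False -> add all six remaining questions (validation).
    """
    row0 = group.iloc[0]
    row = pd.DataFrame(row0).transpose()
    rel = row0['string_id']

    # Every question template except the one matching the true relation.
    possible_questions = [questions[i] for i in label2id.values() if i != label2id[rel]]

    if withAnswer:
        idx = np.random.choice(len(possible_questions))
        possible_questions = [possible_questions[idx]]

    # The first copy keeps the original question; the other copies are overwritten below.
    df3 = pd.concat([row] * (len(possible_questions) + 1))

    for i, q in enumerate(possible_questions):
        # NOTE: chained assignment; this is what triggers the SettingWithCopyWarning in the output.
        df3.iloc[i+1].loc['question'] = q + df3.iloc[i+1].loc['entities'][1] + '?'
        # Stored as a string here and parsed back with ast.literal_eval after the groupby.
        df3.iloc[i+1].loc['answers'] = str({'text': [], 'answer_start': []})
        df3.iloc[i+1].loc['string_id'] = q2label[q]
        # New id: offset past the original ids, with 8 slots reserved per sentence.
        df3.iloc[i+1].loc['question_id'] = str(i + 1 + size + 8 * int(df3.iloc[i+1].loc['id']))
    return df3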

# Add questions with no answers to the training set:
# for every original question we add one more random question that has no answer.
df_train_ext=df_train_ext.groupby('id', group_keys=False).apply(add_questionsWno_answers, (True), (df_size))
df_train_ext['answers']= df_train_ext['answers'].transform(lambda x:ast.literal_eval(x) if isinstance(x, str) else x)
df_train_ext.shape[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


19248

In [13]:
df_train_ext.head(16)

Unnamed: 0,end_idx,entities,entity_spans,match,original_article,context,start_idx,string_id,question,answers,id,question_id
0,1024,"[Lilium, Baillie Gifford]","[[3, 9], [151, 166]]",raising $35,Happy Friday!\n\nWe sincerely hope you and you...,"3) Lilium, a German startup that’s making an a...",1013,invested_in,who invested in Baillie Gifford?,"{'text': ['Lilium'], 'answer_start': [3]}",0,0
0,1024,"[Lilium, Baillie Gifford]","[[3, 9], [151, 166]]",raising $35,Happy Friday!\n\nWe sincerely hope you and you...,"3) Lilium, a German startup that’s making an a...",1013,founded_by,who founded Baillie Gifford?,"{'text': [], 'answer_start': []}",0,12032
1,1762,"[Facebook ’s, Giphy]","[[92, 102], [148, 153]]",acquisition,Happy Friday!\n\nWe sincerely hope you and you...,"Meanwhile, the UK’s watchdog on Friday announc...",1751,acquired_by,who acquired Giphy?,"{'text': ['Facebook ’s'], 'answer_start': [92]}",1,1
1,1762,"[Facebook ’s, Giphy]","[[92, 102], [148, 153]]",acquisition,Happy Friday!\n\nWe sincerely hope you and you...,"Meanwhile, the UK’s watchdog on Friday announc...",1751,owned_by,who owns Giphy?,"{'text': [], 'answer_start': []}",1,12040
10,76,"[Collibra, CapitalG]","[[52, 60], [90, 98]]",raised $100,Belgium/US-based data governance technology co...,Belgium/US-based data governance technology co...,65,invested_in,who invested in CapitalG?,"{'text': ['Collibra'], 'answer_start': [52]}",10,10
10,76,"[Collibra, CapitalG]","[[52, 60], [90, 98]]",raised $100,Belgium/US-based data governance technology co...,Belgium/US-based data governance technology co...,65,founded_by,who founded CapitalG?,"{'text': [], 'answer_start': []}",10,12112
100,46,"[Randy, CloudBlue]","[[30, 35], [77, 86]]",founded,"Prior to co-founding Xometry, Randy co-founded...","Prior to co-founding Xometry, Randy co-founded...",39,founded_by,who founded CloudBlue?,"{'text': ['Randy'], 'answer_start': [30]}",100,100
100,46,"[Randy, CloudBlue]","[[30, 35], [77, 86]]",founded,"Prior to co-founding Xometry, Randy co-founded...","Prior to co-founding Xometry, Randy co-founded...",39,invested_in,who invested in CloudBlue?,"{'text': [], 'answer_start': []}",100,12832
1000,3256,"[Scopely, Barcelona Dublin DIGIT Game Studio]","[[381, 388], [464, 498]]",founded,Mobile game publisher Scopely has raised 200 m...,Viswanathan former NEA general partner Greycro...,3249,founded_by,who founded Barcelona Dublin DIGIT Game Studio?,"{'text': ['Scopely'], 'answer_start': [381]}",1000,1000
1000,3256,"[Scopely, Barcelona Dublin DIGIT Game Studio]","[[381, 388], [464, 498]]",founded,Mobile game publisher Scopely has raised 200 m...,Viswanathan former NEA general partner Greycro...,3249,subsidiary_of,who is subsidiary of Barcelona Dublin DIGIT Ga...,"{'text': [], 'answer_start': []}",1000,20032


Add questions to the validation dataset so that each sentence is asked all 7 questions. The one question that has an answer then identifies the correct relation.
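
At prediction time this amounts to picking, for each sentence, the question whose answer gets the highest score. Here is a minimal sketch, assuming each prediction is a dictionary with the hypothetical keys `id`, `relation` and `score`. The score format produced by the model is not shown in this part of the notebook, and the numbers below are made up for illustration.

```python
def pick_relations(predictions):
    """Return {sentence id: relation of the highest-scoring question}."""
    best = {}
    for p in predictions:
        current = best.get(p["id"])
        if current is None or p["score"] > current["score"]:
            best[p["id"]] = p
    return {sid: p["relation"] for sid, p in best.items()}

# Made-up scores for two sentences, for illustration only.
predictions = [
    {"id": "10000", "relation": "partners_with", "score": 0.91},
    {"id": "10000", "relation": "founded_by", "score": 0.12},
    {"id": "10003", "relation": "acquired_by", "score": 0.05},
    {"id": "10003", "relation": "founded_by", "score": 0.77},
]
result = pick_relations(predictions)
print(result)  # {'10000': 'partners_with', '10003': 'founded_by'}
```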

In [14]:


df_size=df.shape[0]
df_val_ext=df_val.copy()
df_val_ext['question_id']=df_val_ext['id']
df_val_ext=df_val_ext.groupby('id', group_keys=False).apply(add_questionsWno_answers, (False), (df_size))
df_val_ext['answers']= df_val_ext['answers'].transform(lambda x:ast.literal_eval(x) if isinstance(x, str) else x)
df_val_ext.head(15)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,end_idx,entities,entity_spans,match,original_article,context,start_idx,string_id,question,answers,id,question_id
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,partners_with,who is partner with Binance Chain?,"{'text': ['Chiliz'], 'answer_start': [26]}",10000,10000
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,founded_by,who founded Binance Chain?,"{'text': [], 'answer_start': []}",10000,92032
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,acquired_by,who acquired Binance Chain?,"{'text': [], 'answer_start': []}",10000,92033
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,invested_in,who invested in Binance Chain?,"{'text': [], 'answer_start': []}",10000,92034
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,CEO_of,who is the CEO of Binance Chain?,"{'text': [], 'answer_start': []}",10000,92035
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,subsidiary_of,who is subsidiary of Binance Chain?,"{'text': [], 'answer_start': []}",10000,92036
10000,75,"[Chiliz, Binance Chain]","[[26, 32], [76, 89]]",partnership with,Sports blockchain venture Chiliz has announced...,Sports blockchain venture Chiliz has announced...,59,owned_by,who owns Binance Chain?,"{'text': [], 'answer_start': []}",10000,92037
10003,1335,"[Alexandre Dreyfus, Socios com]","[[41, 58], [78, 88]]",founder,What is blockchain technology for OK at this p...,The core concept is relatively simple As Alexa...,1328,founded_by,who founded Socios com?,"{'text': ['Alexandre Dreyfus'], 'answer_start'...",10003,10003
10003,1335,"[Alexandre Dreyfus, Socios com]","[[41, 58], [78, 88]]",founder,What is blockchain technology for OK at this p...,The core concept is relatively simple As Alexa...,1328,acquired_by,who acquired Socios com?,"{'text': [], 'answer_start': []}",10003,92056
10003,1335,"[Alexandre Dreyfus, Socios com]","[[41, 58], [78, 88]]",founder,What is blockchain technology for OK at this p...,The core concept is relatively simple As Alexa...,1328,invested_in,who invested in Socios com?,"{'text': [], 'answer_start': []}",10003,92057


In [15]:
#df_val_ext[df_val_ext.id=='3']
df_val[:5]#.shape[0]

Unnamed: 0,end_idx,entities,entity_spans,match,original_article,context,start_idx,string_id,question,answers,id,question_id
2,2784,"[Global-e, Vitruvian Partners]","[[27, 35], [94, 112]]",raised $60,Happy Friday!\n\nWe sincerely hope you and you...,Israeli e-commerce startup Global-e has raised...,2774,invested_in,who invested in Vitruvian Partners?,"{'text': ['Global-e'], 'answer_start': [27]}",2,2
39,17937,"[Agosto, Google Cloud]","[[0, 6], [130, 142]]",founded,Above the Trend Line: your industry rumor cent...,"Agosto, founded in 2001, helps businesses enha...",17930,founded_by,who founded Google Cloud?,"{'text': ['Agosto'], 'answer_start': [0]}",39,39
40,18088,"[Agosto, Pythian]","[[4, 10], [37, 44]]",acquisition,Above the Trend Line: your industry rumor cent...,The Agosto acquisition will solidify Pythian's...,18077,acquired_by,who acquired Pythian?,"{'text': ['Agosto'], 'answer_start': [4]}",40,40
42,19372,"[Louis Tetu, Coveo]","[[190, 200], [220, 225]]",& CEO,Above the Trend Line: your industry rumor cent...,"""From a business perspective, COVID-19 is both...",19367,CEO_of,who is the CEO of Coveo?,"{'text': ['Louis Tetu'], 'answer_start': [190]}",42,42
44,21163,"[Felix Van de Maele, Collibra]","[[118, 136], [160, 168]]",CEO of,Above the Trend Line: your industry rumor cent...,"""The impact of the global pandemic highlights ...",21157,CEO_of,who is the CEO of Collibra?,"{'text': ['Felix Van de Maele'], 'answer_start...",44,44


In [16]:

 
from datasets import Dataset, DatasetDict
dataset_rel_train = Dataset.from_pandas(df_train_ext)
dataset_rel_val = Dataset.from_pandas(df_val_ext)
ds_rel_val_org = Dataset.from_pandas(df_val)

In [17]:
 
datasets=DatasetDict({'train':dataset_rel_train, 'val':dataset_rel_val})

In [18]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


We use the [🤗 Datasets](https://github.com/huggingface/datasets) library to hold the data and to get the metric we need for evaluation (to compare our model to the benchmark). In this notebook the splits were already built from pandas DataFrames with `Dataset.from_pandas`; the library's `load_dataset` and `load_metric` functions are imported below.

In [19]:
from datasets import load_dataset, load_metric

The `datasets` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key per split (here `train` and `val`).

In [20]:
datasets

DatasetDict({
    train: Dataset({
        features: ['end_idx', 'entities', 'entity_spans', 'match', 'original_article', 'context', 'start_idx', 'string_id', 'question', 'answers', 'id', 'question_id', '__index_level_0__'],
        num_rows: 19248
    })
    val: Dataset({
        features: ['end_idx', 'entities', 'entity_spans', 'match', 'original_article', 'context', 'start_idx', 'string_id', 'question', 'answers', 'id', 'question_id', '__index_level_0__'],
        num_rows: 16849
    })
})

We can see the training and validation sets both have a column for the context, the question and the answers to those questions.

To access an actual element, you need to select a split first, then give an index:

In [21]:
datasets["train"][0]

{'__index_level_0__': 0,
 'answers': {'answer_start': [3], 'text': ['Lilium']},
 'context': '3) Lilium, a German startup that’s making an all-electric vertical takeoff and landing passenger jet became a ‘unicorn’ after raising $35 million from Baillie Gifford, the largest investor in Tesla after its billionaire owner Elon Musk.',
 'end_idx': '1024',
 'entities': ['Lilium', 'Baillie Gifford'],
 'entity_spans': [[3, 9], [151, 166]],
 'id': '0',
 'match': 'raising $35',
 'original_article': 'Happy Friday!\n\nWe sincerely hope you and yours are keeping healthy and safe. Please take care of yourself and others.\n\nThis week, our research team tracked more than 70 tech funding deals worth over €2 billion (!), as well as 15 M&A transactions, rumours, and related news stories across Europe, including Russia, Israel, and Turkey.\n\nMeanwhile, here’s an overview of the 10 biggest European tech news items for this week (subscribe to our free newsletter to get this roundup in your inbox every Mond

In [22]:
type(datasets["train"])

datasets.arrow_dataset.Dataset

We can see the answers are indicated by their start position in the text (here at character 3) and their full text, which is a substring of the context as we mentioned above.
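
A quick way to check this property is to slice the context at `answer_start` and compare the slice with the answer text. The sketch below uses a shortened copy of the first training example; `answer_matches_context` is a hypothetical helper, not part of 🤗 Datasets.

```python
def answer_matches_context(example):
    """True if every answer text is found in the context at its start offset."""
    context = example["context"]
    answers = example["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )

# Shortened copy of the first training example (context cut for brevity).
example = {
    "context": "3) Lilium, a German startup that’s making an all-electric jet",
    "answers": {"text": ["Lilium"], "answer_start": [3]},
}
print(answer_matches_context(example))  # True
```

Questions with no answer have empty `text` and `answer_start` lists, so the check passes trivially for them.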

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset (automatically decoding the labels in passing).

In [23]:
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(lambda x: [typ.feature.names[i] for i in x])
    print(df.dtypes)
    display(HTML(df.to_html()))

In [24]:
show_random_elements(datasets["train"])

end_idx              object
entities             object
entity_spans         object
match                object
original_article     object
context              object
start_idx            object
string_id            object
question             object
answers              object
id                   object
question_id          object
__index_level_0__     int64
dtype: object


Unnamed: 0,end_idx,entities,entity_spans,match,original_article,context,start_idx,string_id,question,answers,id,question_id,__index_level_0__
0,2438,"[Kristofer Tremaine, Kimura Capital]","[[175, 193], [213, 227]]",Founder,MineHub has added London based asset management company Kimura Capital to its growing consortium of industry participants The news comes just a few weeks after the technology platform to improve efficiency in trading operations and environmental and social governance ESG compliance in mining and metals supply chains confirmed it was ready for its first blockchain customers Kimura a specialist in commodity trade finance with expertise in financing complex logistical operations is a member of the Alternative Investment Management Association Kimura s experience in finance will provide another important source of financial liquidity within the mining and metals ecosystem MineHub said The partnership with MineHub will be instrumental in driving the innovation within the industry The company continued With financial institutions adapting to a changing regulatory environment the provision of alternative finance plays an increasingly important role in facilitating the availability of credit not just for trade finance but also project and institutional finance Improving access to capital is therefore a core part of the MineHub value proposition and a key area of focus is on enabling an integrated mix of institutional and alternative finance actors to provide new financing structures MineHub has been working on this technology platform in collaboration with IBM and leaders across the mining supply chain including ING Group Wheaton Precious Metals Ocean Partners USA Kutcho Copper Capstone Mining and White Case LLP It went live earlier this month with the company saying initial usage and transactions with consortium members was anticipated to commence within the next few weeks Arnoud Star Busmann CEO of MineHub said having Kimura on board as a consortium member is very strategic for MineHub Offering optionality in financing sources is core to our strategy and Kimura is a clear leader in this 
sector both in size and diversity as well as their commitment to innovation technology and sustainability He added Working with Kimura and their peers in conjunction with commercial banks and other financial institutions MineHub will improve the working capital options and costs for miners traders and other users Our solutions will serve both large corporates and SMEs within the market whilst contributing to responsible supply chains by linking risk pricing to ESG profiles of minerals Kristofer Tremaine CEO and Founder of Kimura Capital said digitisation is the future for the market Kimura has developed its business by selecting best in class partners in order to strengthen its overall offering We are delighted to begin a partnership with MineHub whom have outstanding potential and represent Kimura s first cooperation in the digitisation of commodity trade flows MineHub added Digitalisation transparency and automation will help reduce operational and fraud risks thereby lowering the barriers to entry for alternative financing sources Increased operational efficiencies and automation of ESG compliance will enable alternative financing houses to serve more clients without increasing operations and overheads,Our solutions will serve both large corporates and SMEs within the market whilst contributing to responsible supply chains by linking risk pricing to ESG profiles of minerals Kristofer Tremaine CEO and Founder of Kimura Capital,2431,founded_by,who founded Kimura Capital?,"{'answer_start': [175], 'text': ['Kristofer Tremaine']}",9982,9982,9982
1,5554,"[Meridia Private Equity, Vetsum]","[[0, 22], [39, 45]]",invested,"Forescout (Nasdaq: FSCT) and the Advent International private equity firm have reached an amended $1.43 billion merger agreement to settle a lawsuit over Advent's attempt to back out of the deal based on Covid-19 concerns. ""The fundamental strengths that first attracted Advent to Forescout - its differentiated technology, record of innovation, talented employee base and relentless focus on its customers - continue to make this business a compelling platform and critical player in the cybersecurity ecosystem,"" says Bryan Taylor, head of Advent's technology investment team. According to lawyers for Wilson Sonsini Goodrich & Rosati who argued the case for Forescout, the agreement marks the first ""material adverse event"" related to Covid-19 where a temporary restraining order was granted and a settlement was reached. Advent had announced in May that it wanted to back out of the sale as a result of the effect that the pandemic had on Forescout's security device business. In response, Forescout sued Advent in Delaware Chancery Court, arguing that the purchase agreement specifically barred Advent from citing the coronavirus as creating a material adverse change that would excuse the investment fund from finalizing the deal. The trial had been scheduled to start July 20. The transaction is now expected to close in the third quarter. Crosspoint Capital Partners, Morgan Stanley and Ropes & Gray also advised on the deal.\n\nGuiding portfolio companies through the Covid-19 pandemic is the biggest challenge facing private equity firms and service providers today. Firms throughout the middle market are focused now on helping portfolio companies make decisions about reopening their businesses, assessing the impact of the coronavirus on their businesses and forging a strategy on moving forward. 
For example, Boston-based Watermill worked with portfolio company Enbi to reimagine its business to make face shields. ""They are a high-tech company with deep market knowhow that make rollers for printers,"" says Julia Karol, president and COO of Watermill. ""The company realized people who have to wear the face shields for long periods of time were getting deep welts on their foreheads. Enbi was able to make a high-grade compound that used foam across the front of the shields, so they are comfortable. Enbi will continue to make masks and explore other innovations. It's awesome to see the whole manufacturing community using their skills and expertise to help."" For more insights from Watermill, New Heritage, Monroe, Cushman & Wakefield and EisnerAmper, see the story Switching gears: 5 ways private equity firms and service providers are helping portfolio companies adapt in the Covid-19 pandemic.\n\nDEAL NEWS\n\nLast year Saudi Aramco, the world's biggest oil producer, was set to buy a 20 percent stake in Reliance Industries Ltd.'s refining and petrochemicals business, valuing it at $75 billion. But Mukesh Ambani, Asia's richest tycoon, said on Wednesday that a deal hadn't been worked out yet with the delay partly down to the coronavirus.\n\nTechnimark, manufacturer of plastic packaging and components based in North Carolina, has acquired Tool & Plastic Industries Ltd., a supplier of injection molded products for the medical device, pharmaceutical and consumer product sectors headquartered in Longford, Ireland. Terms were not disclosed. Technimark is backed by Pritzker Private Capital.\n\nPayments company Square has acquired Stitch Labs, an operations management platform. Stitch Labs builds tools for businesses for inventory and order management, channel management and fulfillment solutions. 
Stitch Lab's products will continue to operate for existing customers until the spring of 2021.\n\nAdam Cook, former CEO of Glebar Co., has formed Culper Capital Partners, a New Jersey-based private investment company. The new company plans to provide debt and equity capital to middle-market companies while partnering with their management teams to drive growth and enterprise value. Culper Capital will also invest in alternative credit and equity vehicles and real estate.\n\nTriller, a music-video streaming platform owned by the Los Angeles-based Proxima Media investment firm, has acquired Halogen Networks and its patents for live video streaming and monetization. Triller plans to use Halogen's technology to deliver live-streamed, large-scale free and pay-per-view events.\n\nMP Materials, producer of rare earth minerals, will be listed on the New York Stock Exchange following a merger with Fortress Value Acquisition Corp. The combined company will be valued at about $1.5 billion, and current MP Materials shareholders JHL Capital Group and QVT Financial will roll their existing equity holdings into equity of the combined company.\n\nCB Insights has acquired VentureSource data from Dow Jones. According to CB Insights, the VentureSource data assets will significantly expand its private markets coverage and strengthen its position in emerging technology information and private market data.\n\nLos Angeles middle-market PE firm RLH Equity Partners has invested in MCA Connect, one of the largest independent Microsoft Dynamics firms that had not been controlled by private equity or acquired by a large corporation, according to RLH.\n\nBerlin-based RSG Group is acquiring the Gold's Gym fitness chain for $100 million. With the acquisition, RSG Group will have more than 900 fitness locations on six continents. Gold's Gym had filed for Chapter 11 bankruptcy protection from creditors in May.\n\nMeridia Private Equity has invested in Vetsum, a veterinary care provider in Spain. 
Alongside Kipenzi, the company's majority shareholder, Meridia plans to grow Vetsum by accelerating its ""buy and build' consolidation strategy.\n\nBear Down Brands, a portfolio company of Topspin Consumer Partners, has acquired Verilux Inc., a Waitsfield, Vermont-based provider of lighting products that simulate natural light indoors for reading and seasonal light therapy. Huntington Beach, California-based Bear Down is a developer and marketer of home, health and wellness products marketed under the Pure Enrichment, Bentgo, Brusheez and EasyLunchboxes brands. Intrepid Investment Bankers advised on the deal.\n\nATTOM Data Solutions, curator of a national property database, has acquired Home Junction Inc., a real estate data technology company. Monroe Capital supported the deal through an increased credit facility to ATTOM.\n\nTrident Energy, an independent oil and gas company operating in Equatorial Guinea, has acquired the Pampo and Enchova offshore oil basins from the Brazil-run Petrobras.\n\nDEAL TRENDS\n\nA June survey of senior executives and M&A professionals found that the Covid-19 pandemic is creating uncertainty about the strength of the U.S. economy over the next 12 months, with respondents evenly split between positive, neutral or negative outlooks. But looking out 24 months, respondents are overwhelmingly positive about the U.S. economic outlook, reports Dykema and the Association for Corporate Growth -- Detroit, San Antonio/Austin and Columbus Chapters, which conducted the survey. The survey respondents also say that the No. 1 driver of U.S. M&A deal activity is the health of the U.S. economy, and most believe their company or portfolio companies will be involved in an M&A transaction in the next 12 months.\n\nAfter a three-month decline that brought U.S. 
manufacturing activity to its lowest level since the global financial crisis, the industry showed signs of slight recovery in May and June, according to Capstone Headwaters' Industrials & Manufacturing Update. Capstone reported that from speaking to industry players and private equity firms about how Covid-19 has impacted their business model and merger and acquisition appetite, the firm has found that while activity has slowed, select M&A transactions are still underway.\n\nWilliam Blair has added two senior investment bankers: Jamie Hamilton as managing director, financial technology investment banking, and B.T. Remmert as managing director, IT services investment banking, both in Atlanta.\n\nEQUALITY AND INCLUSION\n\nA 2019 report from the Institute for Policy Studies shows that the median wealth for Black families in 2016 was $3,557 -- about 2% of the median wealth owned by white families, which owned nearly $147,000 in the same year. For banks to play a major role in closing the income gap between whites and Blacks, they'll need to diversify their top leadership and middle-management ranks. New hiring and promotional policies could reshape banks' understanding of local communities' needs and expand who gets mortgages or small-business loans and which families build lasting wealth. Including more people of color in bank management would diversify the flow of capital, says Malia Lazu, the chief experience and culture officer at Berkshire Hills Bancorp in Boston. See the full story: How banks aim to close racial wealth gap: More minorities in leadership.\n\nTen private equity firms have pledged to each create and post five board seats to make them available to minority and women candidates, participating in an initiative to increase diversity on company boards of directors. 
Aurora Capital Partners, Clearlake Capital, Genstar Capital, Grain Management, Hellman & Friedman, Hg, Insight Partners, K1 Investment Management, TA Associates and Vista Equity Partners have committed to the board initiative announced by Diligent Corp., provider of company governance software and a portfolio company of Clearlake and Insight. Read our full coverage: Clearlake, Insight, Vista and other private equity firms create 50 new board roles for diverse candidates.\n\nPortfolia Rising America Fund ""invests directly in early and growth-stage companies in the U.S. led by people of color and/or LGBTQ founders, or products and services that cater to these markets,"" says investment partner Lorine Pendleton in a Q&A with Mergers & Acquisitions. ""These are founders, ecosystems, products and services historically overlooked by traditional venture capitalists but positioned for significant growth and profitability."" The firm is led by five women of color. In addition to Pendleton, the firm's leaders are: Noramay Cadena, co-founder and managing partner of MiLA Capital; Daphne Dufresne, a managing partner of GenNx 360 Capital Partners; Juliana Garaizar, an angel investor; and Karen Kerr, executive managing director at GE Ventures. ""We believe that strength lies in differences and seek out entrepreneurs and startups who are using shifting demographics and their own diversity of experience and thought to create innovation that offers outsized opportunities for returns and impact."" The fund had its first close earlier in 2020 and has made two investments to date: The first investment is in MoCaFi, a fintech startup founded by Wole Coaxum, a former JPMorgan Chase commercial banking executive and entrepreneur, who is African American. ""MoCaFi offers a mobile-first banking platform that brings digital banking products to underbanked or unbanked communities (an 88 million U.S. market), allowing them to build credit and financial mobility,"" Pendleton explains. 
The second investment is in a women's tele-medicine network. For more, read the full interview: Led by 5 women of color, Portfolia Rising America Fund backs mobile banking and women's telemedicine startups.\n\n""As stewards of capital we have an outsized role in determining which businesses to support,"" says Mina Pacheco Nazemi of Barings Alternative Investments. ""As asset allocators, we need to hold ourselves accountable. I can do more. Will you join me?"" Dealmakers begin to weigh in, as Gerge Floyd's death sparked two weeks of Black Lives Matter protests against police brutality and racial injustice. Read the story: ""Justice doesn't just happen. It requires action, dedication and accountability,"" says one private equity investor.\n\nCORONAVIRUS IMPACT\n\nLast year Saudi Aramco, the world's biggest oil producer, was set to buy a 20 percent stake in Reliance Industries Ltd.'s refining and petrochemicals business, valuing it at $75 billion. But Mukesh Ambani, Asia's richest tycoon, said on Wednesday that a deal hadn't been worked out yet with the delay partly down to the coronavirus.\n\nUnder normal circumstances, M&A demands a robust set of tools and services to be successful. In today's environment in which the stakes have been raised by the coronavirus crisis, professional help from service providers is more important than ever. Private equity firms and their portfolio companies want to know what actions they can or should take, and what their peers are considering, to make the best decisions possible in response to the Covid-19 pandemic. Through talking with many different affected parties, service providers have streams of data and information that can help investors make informed decisions and minimize negative economic impacts on their investments. Mergers & Acquisitions examines offerings from EHE Health, Norgay Partners, Cepres, Valuation Research Corp. and Axial. 
""The stakes are high today,"" says Greg Mansur, chief client officer at EHE Health, which provides a playbook on getting companies back to work safely. ""We want to be part of the solution for our clients. We want to help them through this and help America get back to work."" Read our full coverage: 5 service providers guide dealmakers through the next phase of the pandemic.\n\nAs transactions previously delayed due to the pandemic begin to pick up, acquirors and investors in the middle market should evaluate the target's performance during the unprecedented disruption presented by the pandemic, and adjust expectations for the immediate and medium term. Supplemental due diligence is not only prudent -- it is likely to be required as a condition to the placement of any representations and warranties insurance. Essential considerations include whether the target has been able to innovate and whether the valuation agreed to in a letter of intent should be revisited. Buyers should also review any termination provisions to determine whether any breakup fee would be payable. See our full coverage: 11 factors for dealmakers to consider before buying a company during the pandemic.\n\nMany companies are unprepared to face the tremendous economic challenges brought on by the pandemic. For buyers, navigating this new world of distressed M&A may be the hardest obstacle to overcome in transactions with insolvent organizations. Read the full article: Coronavirus puts spotlight on distressed M&A.\n\nDigital technologies like artificial intelligence and advanced analytics can help organizations to accelerate their pace and expand their insights quickly -- advantages that are especially crucial in times of rapid change. 
See the full story: How analytics can rebalance M&A in the wake of the coronavirus.\n\nArizent, the parent company of Mergers & Acquisitions, released a new survey May 15 to understand how executives across industries were dealing with the impacts of the Covid-19 crisis after operating in a ""new normal"" environment for two months. As the coronavirus pandemic continues to extend its grip on the globe -- infecting more than 1.41 million Americans (over 4.44 million globally) by the middle of May -- executives must navigate their organizations through uncharted territory, with the possibility that the virus may not disappear any time soon. This is forcing C-suites to make big, lasting decisions with few guideposts to aid them. The April survey found that there was a surprisingly smooth, albeit hurried transition to remote, with most companies, including private equity firms and investment banks, feeling that they performed on par or above their own expectations. However, technology gaps did arise, as some companies found that customers either didn't have the equipment to access their accounts digitally or needed training from staff working remotely. In the middle market, dealmakers report that ""opportunities have thinned somewhat but have not disappeared,"" as one private equity investor put it. ""Investor base still has liquidity to invest."" Said one investment banker focused on real estate: ""Pending deals were either put on hold, cancelled or delayed. Asset prices for listings are being re-evaluated or renegotiated with the sellers and buyers expecting discounts."" For more, see: Exclusive survey: How private equity firms, investment banks and other companies are surviving the pandemic.\n\nWhat do you do when you're a dealmaker under quarantine, and face-to-face meetings are out of the question? For Work from Home (WFH) strategies, Mergers & Acquisitions turns to eight prominent dealmakers from private equity firms, investment banks, lenders and law firms. 
""I miss the excitement of a great conference; wearing my nice clothes, early morning breakfasts, the one-on-ones, drinks with my women 'tribe,' and dinner at a steakhouse, even though I am a vegan,"" says Amy Weisman, managing director, business development, Sterling Investment Partners. In some respects, it is easier to build relationships now, explains Nanette Heide, partner, co-chair, private equity group, Duane Morris. ""Meeting folks over a video conference from their home is immediately humanizing."" M&A pros also point out that human factors play a role. ""Emotional Quotient (EQ) is more important than ever during trying times,"" says Jeremy Holland, managing partner, origination, The Riverside Co. ""It's critical to remember that the dealmaker on other side of the (now figurative) deal table is a person, too. They have good and bad days and presumably know many people in high-risk categories, potentially even themselves. Being extra thoughtful about each interaction is important."" Read our full coverage: Dealmaking under quarantine: 8 private equity and M&A pros share strategies while social distancing.\n\nMORE FEATURED CONTENT\n\nMergers & Acquisitions is recognizing nine dealmakers as the 2020 Rising Stars of Private Equity:\n\nThese outstanding up-and-coming investment professionals have been excelling during a period of profound change in the U.S. and in the world. The publication of this list comes at a pivotal moment in time. The country is beginning to open up after three months of quarantine from the coronavirus, while a second wave picks up steam in the Sun Belt from South Carolina to California and including Texas. Dealmaking under quarantine while working from home has proved challenging, to say the least.\n\nSocial justice issues have taken on fresh urgency. There is heightened awareness of systemic racial injustice and police brutality against Blacks after the deaths of George Floyd and many others. Meanwhile, the U.S. 
Supreme Court ruled recently that, ""An employer who fires an individual merely for being gay or transgender defies the law."" On immigration policy, the Court recently put the brakes on dismantling the Deferred Action for Childhood Arrivals, or DACA. Meanwhile, the President is asking the Court to overturn the Affordable Care Act, also known as Obamacare.\n\nClick here for full coverage of Mergers & Acquisitions' 2020 Rising Stars of Private Equity.\n\nIn the challenging times we face now, it's more important than ever to come together as a community and recognize the people and companies that excel and lead. We invite you to join us in honoring the 2019 winners of Mergers & Acquisitions' M&A Mid-Market Awards. In contrast with the volatile coronavirus-driven conditions unfolding in 2020, the dealmaking environment of 2019 was remarkably stable. Among the PE firms benefitting from the auspicious fundraising climate was Vista Private Equity, which raised a $16 billion fund - the largest technology-focused PE fund ever raised. Mergers & Acquisitions is honoring Vista founder and CEO Robert F. Smith with our 2019 Dealmaker of the Year award. In addition to leading his firm's unprecedented fundraising, Smith excelled in philanthropy. When he spoke at the commencement of Morehouse College, he announced he would pay off all the student loans of the HBCU's 2019 graduates, providing a helping hand in the student debt crisis facing many U.S. families. The financial services sector saw a lot of consolidation in 2019. Piper Jaffray wins our 2019 Deal of the Year for buying Sandler O'Neill to form Piper Sandler, which instantly became a leading investment bank in the financial services sector. And Stifel wins our 2019 Investment Bank of the Year for growing dramatically and making several acquisitions. 
Read our full awards coverage: Meet the winners of Mergers & Acquisitions' M&A Mid-Market Awards.\n\nTo celebrate deals, dealmakers and dealmaking firms, Mergers & Acquisitions produces three special reports every year: the M&A Mid-Market Awards; the Rising Stars of Private Equity; and the Most Influential Women in Mid-Market M&A. For an overview of what we're looking for in each project, including timelines, see Special reports overview: M&A Mid-Market Awards, Rising Stars, Most Influential Women.","Meridia Private Equity has invested in Vetsum, a veterinary care provider in Spain.",5546,invested_in,who invested in Vetsum?,"{'answer_start': [0], 'text': ['Meridia Private Equity']}",10407,10407,10407
2,13095,"[Facebook, Dustin Moskovitz]","[[63, 71], [83, 99]]",founder,"YouTube upgraded to show 60 frames per second recently - and put top-quality video out of reach for half of UK households. CC-licensed photo by Sean MacEntee on Flickr.\n\nDavid McCabe, Cecilia Kang and Daisuke Wakabayashi:\n\n""\n\nThe Justice Department accused Google of illegally protecting its monopoly over search and search advertising in a lawsuit filed on Tuesday, the government's most significant legal challenge to a tech company's market power in a generation.\n\nIn a 57-page complaint, filed in the US District Court in the District of Columbia, the agency accused Google of locking out competition in search by obtaining several exclusive business contracts and agreements. Google's deals with Apple, mobile carriers and other handset makers to place its search engine as the default option for consumers accounted for most of its dominant market share in search, the agency said, a figure that it put at around 80 percent.\n\n""For many years,"" the suit said, ""Google has used anticompetitive tactics to maintain and extend its monopolies in the markets for general search services, search advertising and general search text advertising -- the cornerstones of its empire.""\n\nThe lawsuit signals a new era for the technology sector. It reflects pent-up and bipartisan frustration toward a handful of companies -- Google, Amazon, Apple and Facebook in particular -- that have morphed from small and scrappy companies into global powerhouses with outsize influence over commerce, speech, media and advertising. Conservatives like President Trump and liberals like Senator Elizabeth Warren have called for more restraints over Big Tech.\n\n""\n\nHere's the lawsuit. And here's Google's blogpost in response, titled ""A deeply flawed blogpost that would do nothing to help consumers"".\n\nSo. A little history. 
When Google was being sued by the EC in 2010 over its suppression of shopping comparison sites - beginning with the British company Foundem - I thought the EC was making the right move, and focusing on the correct topic: that Google was manipulating search to favour its own products over what consumers evidently wanted. Effectively, that's annexation: using your power in the market to push others out of an adjacent market.\n\nI thought the EC lawsuit against Google over tying Google services to Android was reasonable, too. It's a slightly different situation - an effective monopsony: Google's the only useful supplier for Android that people want outside China. (Ask Huawei.) The OEMs would all have to defect from Google to have any effect; and defection back would be more profitable.\n\nBut this? This is nonsense. There's no law against being a monopoly in the US. The 1998 lawsuit against Microsoft was about tying the provision of Windows to the use of Internet Explorer - when browsers were a new technology. IBM nearly missed out on the whole Windows95 launch because it resisted.\n\nThe FTC had an excellent chance to act on this right back in 2013 but whiffed it. If the DoJ and the states really want to make their case work, they should revisit that casework. But it wasn't about being a monopoly. It was about suppressing rivals in shopping search.\n\nBrian Chen spent ages trying to find 5G, and when he did it was basically to try out the Speedtest app; there's no real use for 5G. Then there's the rest of the phone:\n\n""\n\nApple also said it had strengthened the display glass, making it four times less likely to break. It's difficult to test that scientifically, but I dropped the iPhone 12 and iPhone 12 Pro several times by accident on hard surfaces. They survived without any scuffs.\n\nAlso new is a charging mechanism that Apple calls MagSafe. It's basically a new standard to support faster charging via magnetic induction. 
The new standard will open doors to other companies to make accessories that magnetically attach to iPhones, such as miniature wallets.\n\nI tested both the MagSafe charger and Apple's MagSafe wallet. But I preferred charging with a normal wire because it was faster, as well as carrying my own wallet, because it can hold more cards.\n\nThere's a major downside to all of the new features: We have to pay a lot for these phones. Apple is also no longer including charging bricks or earphones with the new iPhones since so many people already own power bricks and fancy wireless earbuds. While that will lead to less waste, this shift and the price jump may annoy plenty of people.\n\n""\n\nThere's also Nilay Patel's review at The Verge, which looks in more detail at the camera quality. He and his video producer are enthusiastic.\n\n""\n\nSweden has banned Huawei Technologies and ZTE from gaining access to its fifth-generation wireless network, adding to the increasing number of European governments forcing local telecom companies to shift away from Chinese suppliers.\n\nThe Swedish Post and Telecom Authority said in a statement Tuesday that the ""influence of China's one-party state over the country's private sector brings with it strong incentives for privately owned companies to act in accordance with state goals and the communist party's national strategies.""\n\nThe authority said that the two Chinese technology giants' equipment must be removed from existing infrastructure used for 5G frequencies by January 2025.\n\nThe US has described Huawei as the ""backbone"" of surveillance efforts by the Chinese communist party, and is pressuring European governments to block the technology company from gaining access to 5G networks. The UK has already imposed an outright ban on Huawei's 5G equipment, while German Chancellor Angela Merkel has so far hesitated to follow suit.\n\n""\n\nAnother domino falls. Significant that it's banning ZTE as well. 
Though of course Sweden doesn't have to look far to find a network supplier: Ericsson is home-grown. Or it can give Nokia, in neighbouring Finland, a call.\n\n""\n\nThis week we're going to talk about how YouTube broke HD for me and about 40% the UK population, give or take.\n\nI moved to a newer home earlier this year and like most places in the UK (even in London) it only had broadband internet aka slow ADSL over copper. It's pushing a good 4.2 Mbps, sometimes up to 4.6 Mbps on a good day.\n\nYouTube decided to rollout 60 FPS videos by default to everyone. It came in effect automatically for most videos published over the past couple years.\n\nHave a look at quality setting and see what pops up. If the video is relatively recent the options are usually limited to: 1080p60, 720p60, 480p ...\n\nThere's no setting and there's no opt-out to get back to 30 FPS. Like all software updates lately, the switch happened and there's nothing you can do about it.\n\nProblem: 60 FPS video requires roughly 50% more bandwidth than 30 FPS video.\n\n""\n\nProblem arising from the fact that about 45% of UK households have a connection speed lower than 8Mbps, and you need above that for 1080p60fps; but only half that (which would work with most of the country) for 1080p30fps. It would be good if Google allowed a fallback to 30fps, but it doesn't seem to be doing that.\n\n""\n\nlet's be honest with ourselves for a moment: when did you actually ever enjoy talking to a chat bot? And I'm not talking about the type of bots you talk to when you're bored, but about those that provide a deeper purpose.\n\nIt turns out that the answer is, at least for most of us, almost never.\n\nI love you Intercom, except when I don't. 99% of time I don't want to talk to a silly and obtrusive avatar popping up from some corner of the screen before I even had a chance to check out what's going on. Somehow, I can't help but think others feel the same.\n\nIn fact, we do know that others feel the same. 
Chat heads jumping at us unasked, are the quintessential equivalent of the infamous sales clerk who eagerly talks to us upon entering a store.\n\nTo further add to the challenges: as soon as users go off-script, chat bot's don't just become awkward and unpredictable -- they turn into little sociopaths that might rub users the wrong way.\n\nThe moment you create a chat bot is the moment you allow customers to have a conversation with your brand. Not with yourself, not with your friend, but with an uber entity -- a symbol -- that represents everything you and your team stand for. That's not a step to be taken lightly.\n\n""\n\nPersonally, I never ever ever wanted to talk to a chatbot. You know that it's only an intermediate step towards dealing with a real human, or using a website with an interface you can navigate with your eyes, rather than an Adventure-style guessing game.\n\nBut they seem to be on the way out, so that's something.\n\n""\n\nOrganizations that learn with AI have three essential characteristics:\n\n1. They facilitate systematic and continuous learning between humans and machines. Organizational learning with AI isn't just machines learning autonomously. Or humans teaching machines. Or machines teaching humans. It's all three. Organizations that enable humans and machines to continuously learn from each other with all three methods are five times more likely to realize significant financial benefits than organizations that learn with a single method.\n\n2. They develop multiple ways for humans and machines to interact. Humans and machines can and should interact in different ways depending on the context. Mutual learning with AI stems from these human-machine interactions. Deploying the appropriate interaction mode(s) in the appropriate context is critical. For example, some situations may require an AI system to make a recommendation and humans to decide whether to implement it. 
Some context-rich environments may require humans to generate solutions and AI to evaluate the quality of those solutions. We consider five ways to structure human-machine interactions. Organizations that effectively use all five modes of interaction are six times as likely to realize significant financial benefits compared with organizations effective at a single mode of interaction.\n\n3. They change to learn, and learn to change. Structuring human and machine interactions to learn through multiple methods requires significant, and sometimes uncomfortable, change. Organizations that make extensive changes to many processes are five times more likely to gain significant financial benefits compared with those that make only some changes to a few processes. These organizations don't just change processes to use AI; they change processes in response to what they learn with AI.\n\nOrganizational learning with AI demands, builds on, and leads to significant organizational change.\n\n""\n\nThis sounds to me, without knowing the detail of the organisations, like they're obliged to change to fit with the demands of the AI, rather as organisations had to become more computer-like to better adjust to the broader use of computers. How many people in a day do you hear say ""just got to enter a few details into the computer, be with you in a minute...""\n\n""\n\nHave we reached ""peak subscription streaming"" in the way that some scientists fear that we're approaching ""peak oil,"" the theoretical point at which more oil has been extracted from the earth than remains in it? The streaming situation is far less grim -- peak oil assumes that production will decline, while streaming-subscription numbers will presumably stop growing but not decrease -- but there's one important parallel. 
Just as we're running out of ""easy oil"" and the price of a barrel increases as we move from drilling wells to more expensive extraction methods like fracking, we're also running out of what we might call ""easy subscribers"": young, tech-savvy music fans, many of whom have smartphones with iOS, which makes commerce easy. Finding more will require marketing, whether that means courting more Android users, selling skeptics on the value of music streaming or trying to take subscribers from other companies -- which costs money. It could also put pressure on services to lower prices, at precisely the point when they also have an incentive to raise them in order to show bottom-line growth.\n\nTo get a sense of what's ahead, it's worth looking at two markets that adapted to streaming early, Sweden and Norway, which make some of these concerns look a bit like the boy who cried wolf. Since 2015, when analysts first began predicting that music streaming services were running out of potential subscribers, the music business consultancy MIDiA estimates subscription numbers are up 85% in Sweden and 78% in Norway.\n\nThen again, remember what happened to that boy who cried wolf in the end? It could be that the predator is still on his way -- he just hasn't quite arrived yet.\n\n""\n\nIn possibly related news, Netflix said subscriber growth slowed in the most recent quarter. 
(Via Benedict Evans's newsletter.)\n\n""\n\nA little-known Democratic super PAC backed by some of Silicon Valley's biggest donors is quietly unleashing a torrent of television spending in the final weeks of the presidential campaign in a last-minute attempt to oust President Donald Trump, Recode has learned.\n\nThe barrage of late money -- which includes at least $22m from Facebook co-founder Dustin Moskovitz -- figures among one of the most expensive and aggressive plays yet by tech billionaires, who have spent years studying how to maximize the return they get from each additional dollar they spend on politics. Moskovitz is placing his single biggest public bet yet on the evidence that TV ads that come just before Election Day are the best way to do that.\n\nThe super PAC, called Future Forward, has remained under the radar but is spending more than $100m on television and digital in the final month of the campaign -- more than any other group -- on behalf of Democratic presidential nominee Joe Biden outside of the Biden campaign itself.\n\n""\n\nWhat better way, when you deeply desire your advertising spend to have a meaningful effect, than to demonstrate the targeting power of advertising on the platform that you've helped create by *checks notes* using a completely different one.\n\n""\n\n[US CIA official in charge of blocking Russian counterintel work, Marc] Polymeropoulos was stunned by how unabashedly combative his Russian counterparts were. He had spent his career in a region where people were exceedingly polite, rolling out banquets and plying him with tea, even as he knew they were plotting to kill him. He knew the Russians didn't like him, but ""I would have expected them to be a little more polite,"" Polymeropoulos told me.\n\nNonetheless, he figured that this was little more than bluster. 
He knew he had to be careful in Russia and to be wary of Russian agents trying to entrap him in compromising situations -- for example, the beautiful young women at the rooftop bar of the Moscow Ritz-Carlton who seemed determined to chat up him and his colleague. But Polymeropoulos figured he had no reason to fear for his physical safety. Even after that awful night in the Marriott, Polymeropoulos did not immediately suspect anything malicious. By morning, the worst of the symptoms had passed and he seemed to be doing better, confirming his suspicion that it had just been something he'd eaten. Just a few hours after he'd been incapacitated, he managed to get on a train to St. Petersburg, where he felt well enough to walk for miles, duck into more dive bars, and even glimpse the famous troll factory. He even did some Christmas shopping for his wife and kids. That miserable, terrifying night in his Moscow hotel room receded in his memory.\n\nTwo days before the end of his trip, Polymeropoulos and his colleagues were eating dinner at Pushkin, a posh Moscow restaurant, when he suddenly felt the room begin to spin again, just as it had in the hotel room that night. A wave of nausea hit, and he was suddenly drenched in sweat. He barely made it back to his hotel room, where, having canceled all his meetings, he stayed for the rest of his trip, unable to move. His body was in revolt, and he had no idea why. ""I made it back on the airplane somehow,"" Polymeropoulos said.\n\nIt wasn't until Polymeropoulos got home to the Virginia suburbs that it occurred to him that what had happened in Moscow was possibly the result of something far more sinister that what he'd originally suspected. 
In February, after a few weeks of relative normalcy, he started feeling an intense and painful pressure that started in the back of his head and radiated forward into his face.\n\n""\n\nGiven Russians' proclivities, wonder if this and other cases was very low-grade chemical poisoning, not some sort of radiation.\n\nSarah Ditum on the popular response to New Zealand TV journalist Tova O'Brien demolishing failed political candidate Jami-Lee Ross:\n\n""\n\nPerhaps it doesn't matter very much in the case of Ross. Advance got less than 1% of the vote, so you can hardly think of him as the representative of New Zealand's Covid-denying left-behinds. More worrying is the idea that O'Brien is some kind of role model for journalists -- ""the way it should be done"". What she offers is the stupefaction of a cheap pleasure, which is fine once in a while, but nothing you can live on. Turn this approach on an actually popular populist, rather than a sadsack failure content to soak up the last moments of his dead career, and you'd quickly have a polarised nightmare.\n\nRather than attack people as liars or presume their bad faith, Ripley suggests journalists should look for ways to open conversations: instead of telling people what they think, ask them about why they believe the things they do. Often, the things that people seem to be at odds over are just proxies for underlying issues; and sometimes, those underlying issues are more tractable than you ever expected.\n\nIt's even possible that the questioner could be the one to change their mind about something.\n\n""\n\nI think this is wrong. Take Jonathan Swann's interview with Trump: while Swann didn't cut Trump off for talking nonsense, he absolutely did call him on his nonsense because he knew the indisputable facts. 
Ditto Chris Wallace, who had been wily enough to take the mental aptitude test that Trump was going to boast about, and so could contradict Trump from a position of knowledge.\n\nThe common thread in all three: being prepared with knowledge of what the facts are, and not being prepared to take dissembling crap around it.","The barrage of late money -- which includes at least $22m from Facebook co-founder Dustin Moskovitz -- figures among one of the most expensive and aggressive plays yet by tech billionaires, who have spent years studying how to maximize the return they get from each additional dollar they spend on politics.",13088,founded_by,who founded Dustin Moskovitz?,"{'answer_start': [63], 'text': ['Facebook']}",5894,5894,5894
3,309,"[Platinum Equity, Cision]","[[7, 22], [40, 46]]",acquisition,"Cision, an industry-leading earned media communications management and media advisory platform, today announced it has appointed Abel Clark as Chief Executive Officer, effective immediately. Brandon Crawley, Managing Director at Platinum Equity was acting as Interim CEO.\n\n“Since Platinum Equity’s acquisition of Cision, our focus has been on unlocking the value potential of the business,” said Crawley. “We are excited for Abel to help push this vision forward and confident that his customer-oriented approach and his extensive background in driving successful global growth strategies will be invaluable.”\n\nClark has extensive experience executing high-impact growth strategies and business transformations, with an impressive track record of leading strong and engaged teams to accelerate revenues and profitability.\n\n“I am thrilled to be joining Cision at such an exciting time for the company and our customers,” said Clark. “Cision has a world-class global earned media management platform and we are best positioned to partner with our customers in order to deliver the next-generation of technology, workflow solutions and market insight.”\n\nMost recently, Clark was CEO and Chairman of TruSight, a start-up backed by leading financial services companies, established to transform third party risk management. Under his leadership the business achieved rapid market adoption and customer expansion, positioning the company for long-term success.\n\nPreviously, Clark was the Global Managing Director of Thomson Reuters’ $5.5bn Financial division, serving 40,000 customers in over 100 countries. His focus on strategic growth opportunities, resource reallocation, business simplification and the shift from a product focus to a customer-led platform business resulted in an increased organic growth rate and substantial margin expansion. 
Prior to his role in the Financial division, Clark ran the $1.8bn trading systems business, Marketplaces, during which time he led the turnaround and re-positioning of the global foreign exchange business to achieve market leadership and sustainable growth. Earlier, he was the Chief Strategy Officer for Thomson Reuters Corporation and member of the Executive Committee.","“Since Platinum Equity’s acquisition of Cision, our focus has been on unlocking the value potential of the business,” said Crawley.",298,acquired_by,who acquired Cision?,"{'answer_start': [7], 'text': ['Platinum Equity']}",10990,10990,10990
4,27843,"[HonestJohn.co.uk, Heycar]","[[0, 16], [108, 114]]",bought,"Bonmarché, the value-oriented clothing retailer, went into administration for the second time in a year on December 2, 2020.\n\nThere are 226 stores and more than 1200 employees. It is owned as a separate business by Philip Day, whose EWM is also in crisis (see below).\n\nPhilip Day put this company into administration a few months ago, and reaquired it via a pre-pack. It is thought unlikely that he will do this again a second time.\n\nAge UK, the charity focused on supporting the elderly, closed 133 of its 392 charity outlets in 2020 and made 400 people redundant. During the first lockdown approximately 70 per cent of its staff were on furlough.\n\nDebenhams, the oldest retail chain in the UK, announced on December 1, 2020 that it had no alternative except to go into liquidation.\n\nBooHoo is to acquire the Debenhams' website, brands and goodwill, but close the Debenhams' stores, came on 25 January 2021.\n\nIt marks the end of a well-known retailer, whose problems stemmed from the manner in which the company was managed or exploited in the last 20 years.\n\nIn the past 35 years it has had a variety of owners none of which was fundamentally committed to the future of Debenhams Group or was able to introduce a coherent long-term strategy.\n\nArcadia, the fashion giant owned by Philip Green's wife in Monaco, went into administration on the last day of November 2020. It consists of the former Burton Group, with major subsidiaries Topshop, Dorothy Perkins, Burtons, Miss Selfridge, Wallis and Evans.\n\nThese are all well-known brands. The administrators are allowing the stores and the website to continue to trade while new purchasers for the business or businesses are found. There are around 440 stores and 12,000+ staff.\n\nThe heyday of Philip Green's Arcadia was probably 2004-2007, but it failed to invest sufficiently in shops, IT or modern designs. 
Its dinner has been eated by upstarts like Primark, BooHoo, Zara, Next and even by grocery clothing lines.\n\nFor some years, the company has lacked a clear sense of direction and suffered from low investment and an unwillingness to develop its online sales. It has cut its store numbers by more than half since 2012. Comparatively staid business like John Lewis and Next have heavily invested in their online operations and now produce half their sales online.\n\nASOS, the UK online fashion retailer, has acquired from the shell of Arcadia the brands and websites of Topshop, Topman, Miss Selfridge and the athleisure HIIT brands. The purchase excludes the retail stores owned by Arcadia, but there may be further news of these later.\n\nThe Irish arm of Arcadia comprising TopShop, Dotty P, Burton, Miss Selfridge etc have now closed and all 490 staff are being made redundant. The administrators have sold to online-retailer BooHoo, the online business and original Burton brands, Burtons, Dorothy Perkins and Wallis. Meanwhile, news from Deloitte is that Arcadia owed creditors as follows: HMRC £44.2m, suppliers £163m, landlords £35.5m and giftcard holders £5.6m. As secured creditors, the Green's family loan of £50m takes precedence over the unsecured creditors. The bill for taxpayers will be around £250m+, consisting of the redundancy pay owing to sacked staff and supporting the pension scheme.\n\nEdinburgh Woollen Mill and Ponden Mill, both parts of Edinburgh Woollen Mill Group (EWM Group), went into administration in November.\n\nBut Pureplay Retail Limited, a company backed by 'international investors', has since taken over the EWM Group along with Bonmarché which owed £190m to creditors.\n\nPureplay has taken over 50 Bonmarché stores (1,000 staff) and 246 EWM and Ponden Home sites (1,452 staff). Bonmarché originally had 225 stores when it went into administration. 
Around 85 EWM and 34 Ponden Home stores will be closed and their 485 staff will lose their jobs.\n\nThis is in addition to the 64 closures and 860 staff that lost their jobs when EWM originally went into administration.\n\nIt seems that Philip Day, the founder of the EWM Group, may have lent investors some of the money required to buy out his operations, but retains fixed and floating charges as a secured creditor over the business along with Pureplay Retail Limited. Under the new agreement, Philip Day seems to have retained ownership of the various brands that are franchised to Pureplay Retail.\n\nJ Crew, American 'preppy' clothing retailer, is to close all six of its UK stores making their staff redundant. Its parent company has recently emerged from administration and seems to have decided to liquidate its UK subsidiary.\n\nCeline Group Holdings, the parent company of Debenhams, has called in FRP Advisory to prepare for its own administration.\n\nThis is understood to have been done to prevent any creditor taking action against them in the period when Debs is up for sale and trying to find a new owner.\n\nIt is said that interest is overdue on £200m of loans made to Celine: administration would mean there would be no need to pay it. Any administration of Celine would not affect Debenhams store operation per se.\n\nM&Co, the Scots-based value clothing retailer previously called Mackays, has gone into administrators and been bought by its previous owners as part of a pre-pack to save the business. There are 262 stores and 2,700 employees.\n\nThe covid-19 lockdown cost the firm more than £50m: in its last financial year profits fell by 40 per cent to £3.6m. 
Forty-seven stores are to close (380 redundancies) as part of its recovery plan.\n\nD W Sports, a sportswear and gym retailer owned by Dave Whelan, went into administration in the first days of August.\n\nThe company's outlets - as non-essential retailers - have been closed since lockdown started: its 73 gyms were about to re-open until the change in government policy that postponed the resumption of trading by gymnasia, bowling alleys etc.\n\nThere are 75 DW Sports retail stores: these will all close in four weeks. The Group has a total of 1,700 employees. Twenty-five stores have closed already.\n\nThe Fitness First Group which is also owned by Dave Whelan is not to go into administration: its 43 clubs will remain trading.\n\nFeather & Black, the award-winning bed specialist rescued in 2017 from administration, has been bought by Dreams.\n\nNone of its stores is to reopen after the easing of lockdown. It will become online only, probably with concessions in Dreams.\n\nOutstanding orders will be honoured. The Company was rumoured last February to be up for sale, so these closures are not strictly caused by coronavirus, although being closed for three months would not have helped its chances of survival.\n\nGrosvenor Shopping Centre in Chester went into receivership along with its car park earlier in July 2020. It was originally built in the 1960s and refurbished in the 80s. There are 101 retail units, all on one level. The Shopping centre continues trading.\n\nOliver Sweeney Trading, the retail arm of the prestige shoe company Oliver Sweeney Group, was placed in administration in mid-July.\n\nAll its seven stores are closed as the company sees its retail future as online only. This administration does not affect the wholesaling and online arms of the business.\n\nMuji, the Japanese high-street homewares retailer, has applied for bankruptcy protection in the U.S. It has debts of $64m and the Covid-19 lockdowns in the UK and the U.S. 
have hit it hard.\n\nIt won't be included in our UK figures, but, under U.S. law the corporation will be required to produce an exit plan to revamp the company. This may well have implications for UK stores. The stores continue to trade.\n\nCardinal, the Yorkshire-based firm of shopfitters (outfitting or remodelling store interiors), went into administration in mid-July.\n\nOne hundred and thirty-five staff amongst its 170 employees have already been made redundant. Their business has been hit by the pandemic.\n\nIn addition their customers (i.e. the retailers) were unable to make firm commitments about work they needed in 2020, H2, into 2021.\n\nThe impact of Covid-19 upon retailers has meant that most companies are now unsure about the number, type and location of stores that they are going to need in 2021-2025. The collapse of work for Cardinal is a symptom of the bloodbath on the high street.\n\nSoletrader, a footwear retailer established in 1962, went into a creditors' voluntary liquidation in mid-July 2020. Its assets including stock and brand names Sole and Soletrader were purchased by its owner, the Twinmar Group, and are now invested in a new subsidiary, Twinmar London.\n\nMost of the company's stores opened for trading in July, but eight shops have been closed. Soletrader's website is a separate entity and is unaffected by the liquidation.\n\nPeter Jones (China), a 50-year old crockery and gift business based in Wakefield, went into administration in mid-July. It had not opened after the lockdown eased. There were ten stores and 76 staff. 
The business is expected to be liquidated.\n\nNorville Group, a Gloucestershire-based firm of opticians and optical suppliers to the industry, went into administration early in June after selling its nine Norville Opticians' practices the previous week.\n\nSince then the former Norville laboratories, which were renowned for being able to produce lens to the very highest standard, have been acquired from administration by Inspecs, the new owner of the Norville Group, and continue to trade.\n\nBenson Beds, the beds and bedding business owned by Alteri, was put into pre-pack administration at the same time as Harveys (see below).\n\nAlteri bought the business out immediately and put £25m into the company to invest in its development. There are 242 stores and 1,900 staff. Bensons (at present) is seen as a much better business than Harveys, most UK bedding is made in the UK, it faces less competition from overseas operators and Alteri is likely to focus on improving its operations, while keeping Harveys Furniture stable. The company continues to trade and existing orders will be fulfilled.\n\nHarveys Furniture, the second largest furniture retailer in the UK, was put into administration by its owners, Alteri Investors on the last day of June.\n\nThere are 105 stores, which have been struggling for some years, and 1,575 staff. The company is looking to close 20 stores and make 240 staff redundant. The company continues to trade and existing orders will be satisfied.\n\nT M Lewin, retailer of shirts and ties online and in 65 stores, went into administration on the last day of June after failing to find a buyer.\n\nThe shops have not re-opened following the relaxation of the lockdown. The business had been acquired from Bain private equity only last month (May). The new owners, SCP Private Equity, expect to close all the stores, making the company online only. 
Six hundred employees are likely to lose their jobs.\n\nBertram Books, the Norwich-based book wholesaler, went into administration towards the end of June 2020 with debts now (Aug 2020) known to be £25m.\n\nMost of its 450 workforce has been made redundant. Bertrams was particularly important to smaller publishing companies.\n\nChanges in the book market in the last 20 years including the growth of online sales and dramatic price cutting, highly-promoted 'blockbusters', the growth of Amazon and direct-to-customer applications as well as e-books adversely affected Bertram Books' business model.\n\nBut sub-optimal decision-making by a succession of uncommitted owners have also brought it down.\n\nThe coronavirus pandemic, closing both libraries and bookshops, proved to be the final blow for Bertram Books.\n\nIntu Properties, the major property company that owns and manages some of the largest and best UK retail malls, went into administration on 26 June 2020. Many of its retail clients are not paying their rents and INTU's creditors are not as forebearing.\n\nIt has total debts of £4.5bn, a merger with a European property company came to nothing and it has failed to raise more capital. Its recent negotiations with other parties, where it hoped to arrange a 'standstill agreement' with its lenders, led to no useful outcome, so it went into administration.\n\nMajor sites include Lakeside, Glasgow's Braehead, Manchester's Trafford Centre, Nottingham's Victoria Centre and Norwich's Chapelfield. This administration will be a major blow to the UK retail sector, although, coming after many other impossible-to-believe 'major blows', its significance may be less apparent.\n\nIt may not be possible for the Administrators to run all the shopping centres without outside funding, although so far all sites have been kept open. 
It is still possible that many of their shopping centres will close unless a new potential buyer acquires some or all of them.\n\nGo Outdoors, the outdoor sports, walking, climbing, camping, riding and exercise retailer owned by JD Sports, went into administration towards the end of June.\n\nIt was immediately bought out of administration by J D Sports for £56.5m (pre-pack administration), enabling the company to be reorganised. J D Sports has stated that it wishes needs to re-think the Go Outdoors business but does not expect large-scale redundancies and closures.\n\nThere are 2,400 employees and 67 stores. Since the firm was bought by JD Sports it has lost £291m (to August 2019) and the massive losses caused by the coronavirus lockdown have only worsened the situation. In July, the Administrators estimated that unsecured creditors would receive only 1p in the £1.\n\nLee Longlands, the Birmingham-based upmarket furniture retailer, went into administration towards the end of June to enable the company to restructure and improve cash flow. The company continues to trade and outstanding orders will be met.\n\nThere are six stores, mostly in the Midlands. Lee Longlands was purchased via a management buy-out in 2015. The company started in Broad Street Bham as an antiques business in 1902.\n\nPoundstretcher Properties, a company connected to discount-chain Poundstretcher, is to be placed into administration as part of a CVA programme by 450-store group Poundstretcher to reorganise its store portfolio, cut rents and reduce other costs.\n\nThe Poundstretcher Group has argued that around 250 stores will close if the CVA is not approved by its creditors. Poundstrecher Properties holds the leases on only 23 stores and this will not affect the legal position or ownership of the group as a whole. 
Poundstretcher faces the same issues as the rest of the high street, compounded by the lockdown, now in its 85th day (it is really that long?).\n\nOak Furnitureland, the specialist furniture store that started off on eBay, has gone into administration, and was immediately bought out of administration (pre-pack) by hedge-fund Davidson Kempner Capital Management.\n\nThere are 105 showrooms and 1,491 employees. The business continues still to trade, but the new owner expects to rationalise the business, probably through the closure of some stores and reductions in staff.\n\nLe Pain Quotidien, the French-themed retailer, bread/coffee/restaurant chain, went into pre-pack administration in mid-June. It has been bought out of administration by a new vehicle, BrunchCo21, believed to be linked to its former owner, Cobepa. Ten of its 26 outlets have been closed with the loss of around 200 jobs in stores and the closure of its head office.\n\nThe new owners expect to negotiate T&C with the landlords of the remaining 16 properties, and the results may lead of course to further closures.\n\nMonsoon Accessorize, the womenswear and accessories chain with 181 stores, went into administration early in June. It is a private company owned by its founder, Peter Simon: it started as a market stall.\n\nMonsoon Accessorize was immediately bought out of administration by Peter Simon. Thirty-five stores are to be closed with 545 employees being made redundant.\n\nThe business had 181 stores and 2,534 UK staff before administration. It is understood that Monsoon does not expect that every landlord will agree to the new conditions, but hopes to save around 100 stores and 2,300 jobs. 
The stores are based on careful, edited retailing which only encountered problems in the last decade.\n\nIn 2019 the company survived a previous crisis through a large cash injection from its owner, the closure of 40 stores and a CVA that cut rents on three-quarters of its stores.\n\nMonsoon's international business is unaffected, with 49 stores and 966 staff outside the UK.\n\nQuiz, the Glasgow-based fashion group, put its physical stores division into administration in early June. Ninety-three head-office and warehouse redundancies have already been declared. The business wants to renegotiate rents for its 82 stores and the eventual size of the group will only be known, when this has been done. KPMG has been appointed to review the firm's options, which are likely to include store closures. There are 915 staff in the stores division. Quiz's online business continues unaffected, as are its 300+ concessions.\n\nVictoria's Secret, the UK arm of the U.S.-owned global retailer, went into administration early in June 2020 having made a loss now known (Aug 2020) to be £100m in the last financial year. The UK fashion trade has experienced a torrid three years and the coronavirus lockdown, which prevented 'non-essential' stores trading (though not online), has been the final hammer blow. There are 25 stores and 800 staff. The company is reported as looking for a light-touch administration, allowing them to restructure the business, reduce costs and possibly find a new owner.\n\nAldo, a Canadian-based international chain of stores, went into administration early in May. Five UK stores have been permanently closed, leaving eight surviving while the administrators seek new owners for the UK business. The UK network is up for sale, but many of the stores are franchised and are not 'owned' by Aldo Canada. Aldo shoes, handbags and accessories are still available for purchase in the UK both online and in its 28 UK concessions. 
The Irish arm of Aldo has already gone into administration. The company and its brands (chiefly 'Aldo' and 'Call It Spring') are major international businesses, operating around 3,000 stores globally served by 20,000 staff. Apart from the UK, Aldo businesses are expected to reopen as each government permits in the post-coronavirus world.\n\nDVF Studio, the luxury fashion company owned by Diane von Furstenberg, has gone into administration, citing 'coronavirus', and is closing its Mayfair store. The company has an online business as well as concessions in prestigious department stores, including Selfridges and Harvey Nichols. It announced earlier in 2020 that it was starting a subscription luxury service. The e-commerce business and concessions continue to trade.\n\nAntler, the luggage retailer which runs 18 stores and a concession, went into administration in mid-May. There are 194 employees: 164 of these have been made redundant. The Administrators announced in mid-July that they had successfully sold the brand name, Online business, stock and assets, but the stores remain closed and there was no news of their future.\n\nJohnsons' Shoes, also trading as Bowleys Fine Shoes, went into administration in mid-May. There are 12 stores, all in the South East of England. The 145 furloughed staff will retain their jobs as the administrators seek to reopen the businesses. The group was later acquired by Newjohn Limited, part of Daniel Footwear. Six stores were closed.\n\nDawson's Music, one of the oldest stores selling musical instruments (est. 1898), went into administration early in May. There are six stores in Leeds, Manchester, Chester, Liverpool, Reading and Belfast. It is still open and is hoping to be sold as a going concern. There are 75 staff. The coronavirus lockdown proved to be the last straw for a retail group that was already facing a decline in sales. There is also an Educational Division which supplies schools, colleges and universities. 
In late May, the chain was purchased by Andrew and Karen Oliver, who took over all the stores and retained the staff.\n\nJ Crew, the U.S. fashion retailer with six UK stores, sought Chapter 11 bankruptcy protection at the beginning of May. It has 500 stores in the U.S., trades online, and owns the J Crew Factory and Madewell brands. It intends to continue trading online while it gives control of the business to its lenders who will cancel debts of $1.65bn (£1.3bn). It is unclear how this will affect its UK business.\n\nL K Bennet, the fashion retailer which went into administration in March 2019, is to extend its administration for another twelve months. The company expects to open seven stores on 15 June 2020 (when non-essential stores are allowed to start trading) with the remaining 10 stores to open at a later date.\n\nOasis and Warehouse, two fashion retailers owned by Icelandic-Bank Kaupthing, went into administration in mid-April 2020, having failed to find a buyer for the group. All its 92 stores were closed, 2,300 staff made redundant and the 437 concessions terminated. The 13 stores and 29 concessions in the Irish Republic had already gone in into administration under Irish law: there were 248 staff in Ireland. The Oasis and Warehouse brands and e-commerce operations were bought by Hilco, which sold them in June to BooHoo, the successful e-commerce apparel business. BooHoo raised £200m in May to help it take advantage of 'opportunities', and now also owns brands such as NastyGal, PrettyLittleThing, Karen Millen, MissPap and Coast. Concessions and stores in other countries will continue to trade. Oasis and Warehouse had been suffering recently from the problems common to most UK mid-range fashion businesses. The coronavirus lockdown - closing all its stores - made it impossible to continue operating and ended any chance of a sale.\n\nSpicers, the office-supplies wholesaler, employing 1,200 people started by John Spicer in 1796 ceased trading in April. 
It built up a European presence, but the UK arm and the European operations were separated in 2011, Spicers being bought by Better Capital, the private equity firm controlled by John Moulton. When it went into administration its administrators were not able to sell it and the business was liquidated.\n\nSimply Scuba, an award-winning diving retailer based in Faversham, went into administration in June. Thirty-two jobs are at risk. The Simply Group also runs SimplyHike and SimplySwim. The Simply Scuba website continues to trade.\n\nKath Kidston, the vintage-inspired fashion and accessories chain, appointed administrators early in April 2020. It has now announced that it will close its UK branches, concentrating on Asia, the wholesale business and online sales. The company - like many fashion retailers - has had problems in maintaining sales and profitability. Since 2018 it lost £27m, resulting in its closing stores and cutting head-office staff. There are 200 stores globally. All 60 UK sites are to close, with only 32 of its 941 UK staff being retained. It will now operate in the UK as an online-only retailer. The company's owners, Barings Private Equity Asia, have bought it out of administration on a pre-pack basis, having previously tried to sell it. Finances were so poor towards the end that initially Kath Kidson announced that they would only be paying part of the wages owed to employees: they have now agreed to make payments in full, but a up to a week late. The company suppliers, including HMRC and clothing manufacturers, are owned £90m by the failed company.\n\nAutonomy Clothing, a small fashion chain with three stores, 100 concessions and 44 staff, went into administration towards the end of March 2020. It has been beset by the same problems as the rest of the industry, the lockdown being the last straw. All employees have been made redundant.\n\nLombok, the aspirational furniture and furnishings business, went into administration at the end of March. 
It operates both online and offline and is best known for its teak products made mostly from reclaimed timber. It has experienced two pre-pack administrations before (2009 and 2011). All 43 staff have been made redundant.\n\nBrighthouse, the rent-to-own household goods retailer, appointed administrators at the end of March 2020. There are 240 stores and 2,700 employees. The administration does not affect customers that rent goods, as their obligations will transfer first to the administrators and then to any new owner. The business mainly deals with low-income households and was fined by the financial regulator for mis-selling and 'unfair' interest charged as part of consumer transactions. The compensation it must pay to 250,000 customers is understood to cost £1m per month and its most-recent financial report (February 2020) showed showed corporate losses of £16m.\n\nLaura Ashley, the fashion retailer with 155 stores, went into administration in mid-March 2020. The administrators permanently closed 70 of the company's outlets: 1,669 staff were furloughed and 677 staff continued working in the business with more redundancies announced in mid-June. Only 18 of its remaining stores have re-opened post lockdown, though this may not be ominous. Gordon Bros have been allowed to purchase the Laura Ashley brand and its archives, leaving the future of the stores, logistics and manufacturing in Britain and Ireland unresolved. The Pension Protection Fund is asking for another administrator to be appointed to ensure the protection of Laura Ashley shareholders. Administration comes after a long period of poor results from a retailer that had been a star in the 80s and early 90s. 
The post-2016 deterioration in fashion sales affecting most clothing retailers was certainly a factor, but the failure of the business to match modern consumer requirements meant it was difficult to see the purpose of the company.\n\nKikki.K, an Australian-based retail group selling Swedish-designed stationery, has gone into voluntary administration as a result of the problems of Australian retailing plus the cost of its global expansion (now including Hong Kong, the UK, Singapore and New Zealand). There are up to five stores in the UK, three shops-within-shops in stores like Fortnum & Mason and Selfridges and an online business which, in Europe, seems now to be switched through to Australia. There are 100 stores globally. The Australian stores remain open, but the UK online business is currently uncontactable due to 'unprecedented shipping delays'.\n\nHomebase, the DIY chain, has returned to profit after its experiences first as Bunnings UK and then a large CVA case. It used its CVA to cut rents and close more than 70 stores. It is therefore quitting its CVA eighteen months early. CVAs have had mixed results when used by retailers, but this is one that seems to have turned up trumps for the business.\n\nSoak, a major online bathroom products retailer, went into administration at the end of February. The market is intensely competitive and Soak's revenue fell from £70m (2018) to £43m (2019). Its profit on the 2018 figures was only £2.9m. Price competition between online and bricks-and-mortar retailers has meant that few operators are making much of a profit, hence the decline of Soak and the collapse of other kitchen and bathroom retailers, such as Better Bathrooms. There are 220 employees.\n\nT J Hughes Outlet Division has issued a notice of intended administration for its Outlet Division, prior to renegotiating their rents. 
Lewis's Home Retail Limited, a subsidiary of LHR Holdings (the master company for T J Hughes), owns eight stores, two of which have already been saved via agreed rent reductions. This does not affect the whole Group, but only outlet stores.\n\nHonestJohn.co.uk, the online advice website for car owners, went into administration and has been bought by Heycar, an online retailer of used cars. The staff, IP and assets have been transferred.\n\nAshbury Furniture, a large furniture and soft furnishings salesroom, went into administration in February, caused by constant road engineering on the M20 (making it hard to get to the showroom) and the impact of rent and rates.\n\nEna Shaw, a producer and retailer of soft furnishings based in St Helens, went into administration in February 2020, closing its factory and store. There were 167 employees.\n\nOddbins, the wine and drinks off-licence business of European Food Brokers, went into administration at the beginning of February. There are 56 stores, mostly trading as Oddbins or Wine Cellars: two have now closed. Employees number around 567. Less than one year ago 45 EFB off-licence businesses were sold or closed on the basis that they were no longer viable.\n\nHearing and Mobility, a specialist national chain of hearing and mobility stores, has ceased trading and administrators have been appointed. Hearing and Mobility (HHML) is a Northampton-based company founded in 2002 with 18,000 customers. It established a chain of 27 hearing and mobility stores throughout Britain, later focusing mainly on the Midlands and the South with 15 stores. Starting in 2016, the company closed many of its mobility stores to concentrate on hearing disabilities. The company rarely made a profit and by January 2020 had only four stores. After its stores had 'temporarily' ceased trading they were sold to two other companies trading in this vertical market. 
Amplify Hearing has acquired HHML hearing operations, assets and 76 staff, enabling customers to continue being provided with service.\n\nHawkins Bazaar, a Norwich-based games retailer with a focus on adult merchandise, went into administration in the latter days of January. There are 20 stores and 177 staff. The company went into administration previously in 2011. Weak trading in 2019 and a poor Christmas have led the firm's current problems. The stores will remain open while a buyer is found, but by mid-February were all to close.\n\nHouseology, a Glasgow-based e-commerce furniture business, has gone into administration after a doleful Christmas. Twenty-three staff have been made redundant. Bureau, its office-oriented associate business, continues to trade and is not affected by Houseology's failure. By the end of February Houseology's assets including IP had been acquired by competitor Olivia, part of the Moot Group. Moot Group started in 2018 and is targeting turnover of £20m by end-2020.\n\nBeales, a 22-store department store chain, went into into administration, having failed to find a new owner or additional finance in the latter end of 2019. At first, the company's stores remained open in the hope that a new owner could be found. They have all now closed. The loss-making stores in the Midlands and the South were closed suddenly when no new owner cold be found, and were followed a fortnight later by the remaining stores, which were mostly in East Anglia. Losses rose from -£1.3m in 2018 to -£3.1m in 2019 and poor trading over Christmas made it essential to secure new funding. The company had announced in December 2019 that it was in difficulties and needed refinancing. Beales employed more than 1,200 staff. Colliers International reported in January 2020 that Beales was paying £2.85m in business rates, £1m more than should have been the case.\n\n2021\n\nJessops, the chain of camera dealers now with only 17 stores, appointed administrators at the end of March. 
It had previously gone into administration at the end of 2019, after which it closed more than half its stores. In the Lockdowns the shops have been unable to trade and more and more business is shifting online. There are around 120 staff.\n\nThe Hummingbird Bakery, a London-based American-style bakery, has been bought by pre-pack administration by Acropolis Capital, a family investment company. The Hummingbird bakery and three of its stores are part of the pre-pack, but two other sites are excluded.\n\nPreston St George's Shopping Centre went into the control of administrators on February 2 2021, when its parent company (InfraRed) entered administration. InfraRed acquired the Shopping Centre in 2015 for £73m, supported by a loan from Wells Fargo Bank. Trading continues as normal, although the Preston Centre, in common with every UK shopping mall, has suffered considerably from the closure of non-food stores in Lockdown.\n\nPaperchase, the up-market stationery, student accessories and gift business, has gone through a pre-pack administration, closing 37 stores with the loss of 500 jobs. Before issuing a notice of intent to appoint administrators in early January 2021, the company had 127 stores and around 1,500 staff. The new owners are to be Permira. Around 40 per cent of Paperchase sales occur in November and December each year, but government restrictions have meant that most stores had been closed for up to six of its best shopping weeks.\n\nJaeger brand and stock have been purchased by Marks & Spencer, but not its staff and stores. All 63 Jaeger stores and concessions, its retail staff and 80 per cent of head office staff will be made redundant apart from a few employees in distribution and head office. 
The brands Austin Reed and Jacques Vert, previously operating as part of the Jaeger Group, did not form part of the M&S acquisition.","HonestJohn.co.uk, the online advice website for car owners, went into administration and has been bought by Heycar, an online retailer of used cars.",27837,acquired_by,who acquired Heycar?,"{'answer_start': [0], 'text': ['HonestJohn.co.uk']}",11612,11612,11612
5,4141,"[Raj Ganguly, B Capital Group]","[[90, 101], [138, 153]]",Founder,"Atomwise, the company deciphering human disease via the largest AI-drug discovery portfolio, announced today that it has closed $123 million in an oversubscribed Series B financing led by B Capital Group and Sanabil Investments. The funding round includes returning investors DCVC, BV, Tencent, Y Combinator, Dolby Ventures, AME Cloud Ventures, as well as new backing from two top ten global insurance companies. This brings the total amount of capital raised to date to almost $175 million. The company has appointed Raj Ganguly of B Capital Group as a new board member and Hani Enaya of Sanabil as a board observer.\n\n""Over the past three years, our platform AtomNetÂ® has tackled - and succeeded - in finding small molecule hits for more undruggable targets than any other AI drug discovery platform,"" said Abraham Heifets, CEO and co-founder of Atomwise. ""With support from our new and existing investment partners, we will be able to leverage this to develop our own pipeline of small molecule drug programs, further grow our portfolio of joint-venture investments, and realize our vision to create better medicines that can improve the lives of billions of people.""\n\nWith the new investment, Atomwise will continue to scale its AI technology platform and team. The company plans to expand its work with corporate partners, which currently include major players in the biopharma space such as Eli Lilly and Company, Bayer, Hansoh Pharmaceuticals, and Bridge Biotherapeutics, as well as emerging biotechnology companies like StemoniX and SEngine Precision Medicine. Atomwise has signed more than $5.5 billion in total deal value with corporate partners to date.\n\nThe company will also leverage the financing to build its own internal pipeline tackling historically undruggable and other challenging disease targets. 
Atomwise will continue to grow its portfolio of joint ventures with leading researchers using AtomNetÂ® for drug discovery, like those it has launched with X-37, Atropos Therapeutics, Theia Biosciences and vAIrus, with a goal to commercialize high potential candidates through the drug development process.\n\nAtomwise created the first convolutional neural networks (CNN) for drug discovery, and since its founding in 2012 has continually developed and improved its AI-based drug discovery technology. The company's AI technology has been used by academic researchers at institutes around the world and drug developers - including top-100 pharmaceutical and emerging biotechnology companies, a rapidly growing market estimated to reach $729B in global market value by 2025. Researchers and companies struggle with access to AI-based drug discovery technology, due to overall cost and lack of expertise - something which requires computational scientists, drug discovery experts, software and systems engineers for AI and ML. To date, Atomwise has provided AI technology to over 750 research collaborations addressing over 600 disease targets, and worked with top-pharmaceutical and biotechnology partners, to design new drugs for ""undruggable"" targets with speed and scale.\n\nThrough these academic collaborations, Atomwise has enriched its AtomNetÂ® technology with experimental data and conducted the largest screening of molecules in human history - today at over 16 billion molecules for virtual screening. From the continued use of AtomNetÂ® among research teams, Atomwise has gained a valuable breadth of experimental data, including the largest diversity of drug target sites, homology models, protein classes, and disease areas of any AI platform. The company's technology is covered by 19 issued patents, and research partnerships have generated 17 pending patent applications and several peer-reviewed publications. 
Atomwise has 285 active drug discovery partnerships with researchers at top universities around the world, and recently announced 15 research collaborations with global universities to explore broad-spectrum therapies for COVID-19, targeting 15 unique and novel mechanisms of action.\n\n""New technologies are enabling better and faster R&D for the life science industry,"" said Raj Ganguly, co-Founder and Managing Partner at B Capital Group. ""The advancements Atomwise has made with its computational drug discovery platform have effectively cut months or even years off of the R&D lifecycle. More importantly, however, they are solving biology problems previously believed to be unsolvable by researchers and delivering that capability to everyone from academics to big pharma. We're excited to continue to partner with the Atomwise team on its mission to develop new, more effective therapies.""\n\nA spokesperson for Sanabil Investments added, ""We chose to lead this B round as Atomwise has shown clear leadership in developing better medicines for the world, and has become the number one global leader in applying and scaling its AI platform for drug discovery programs. Even more important is the prominence and strategic reputation of the new and returning investors, and their expertise in AI, drug discovery and pharma. They all believe, like we do, that Atomwise will use the funding to strengthen and develop their strategic advantages where it counts.""\n\nIf you're a drug discovery team interested in learning more about Atomwise, please visit our website or email partners@atomwise.com.\n\nAbout Atomwise\n\nAtomwise Inc. invented the first deep learning AI technology for structure-based small molecule drug discovery. Created in 2012, today, Atomwise performs hundreds of projects per year in partnership with some of the world's largest pharmaceutical and agrochemical companies, as well as more than 200 universities and hospitals in 40 countries. 
AtomNetÂ®, its AI platform built for drug discovery contains more than 16 billion molecules for virtual screening. Atomwise has raised over $174 million from leading venture capital firms to support the development and application of its AI technology. Learn more at atomwise.com or follow @AtomwiseInc.\n\nAbout Sanabil Investments\n\nSanabil is a commercial investment company with a multi-billion paid-up capital that seeks to deliver superior risk-adjusted returns over the long term. Sanabil focuses on global private investments in venture and growth assets from earlier stages through the asset lifecycle. It provides partners with patient and resilient capital, the ability to invest across multiple stages and funding rounds, and access to the GCC market.\n\nAbout B Capital Group\n\nB Capital Group is a global firm specializing in equity investing in venture and growth-stage companies that have achieved traction with customers. Through our extensive global network and exclusive partnership with The Boston Consulting Group, B Capital helps high growth startups navigate business challenges, raise capital and attract talented leadership at key points of their journeys to scale. With offices in San Francisco, New York, Los Angeles and Singapore, B Capital believes innovation can come from anywhere. Our unique multinational presence and deep industry knowledge have enabled us to build a portfolio of startups in Enterprise application software, Infrastructure, Security, AI/ML, Fintech and Insurtech, and HealthcareTech and Bio IT that are transforming large traditional industries across borders and geographies. Portfolio companies include AImotive, Atomwise, Blackbuck, Bounce, Bright.md, CXA, Evidation Health, Icertis, INTURN, Plastiq, Ninja Van, Notable Labs and SilverCloud Health. 
For more information, visit http://www.bcapgroup.com/.","""New technologies are enabling better and faster R&D for the life science industry,"" said Raj Ganguly, co-Founder and Managing Partner at B Capital Group.",4134,founded_by,who founded B Capital Group?,"{'answer_start': [90], 'text': ['Raj Ganguly']}",108,108,108
6,2729,"[Ando, Uber Eats]","[[52, 56], [73, 82]]",acquired by,When truly disruptive technology comes along it not only leads to new types of companies it also forces incumbents to evolve Airbnb is arguably the poster child for the peer to peer P2P home sharing movement but it has also helped drive an ecosystem of me too rivals and complementary companies aimed at property managers On the flip side Airbnb and its ilk have also forced traditional accommodation providers such as hotels to rethink their business models This pattern is also evident in the ride hailing industry where a whole new breed of startup has sprung up to provide everything from in car commerce and advertising to predictive analytics and more At the same time established taxi companies have had to embrace the kind of technology that powers Uber and the rest But the tech underpinning on demand transport platforms is having an impact far beyond the taxi industry Virtual kitchens sometimes referred to as dark restaurants virtual brands ghost kitchens or any combination thereof have been popping up all over due in large part to the rise of on demand transport infrastructure In 2019 the trend reached a fever pitch with numerous investments and fresh takes on the concept Virtual kitchens are essentially strategically placed kitchens that specialize in delivery only no walk in or sit down customers allowed The general idea is that restaurants can expand their footprint into high demand areas with minimal upfront investment and lower overhead given that prime real estate is not required Virtual kitchens can also allow existing restaurants to experiment with bespoke menus and offer new items without impacting their existing brand To reach customers these businesses usually lean on transport infrastructure provided by the likes of Uber Eats Deliveroo Postmates GrubHub DoorDash and Caviar Kitchen in the cloud U K based food delivery giant Deliveroo which counts Amazon as a major investor recently revealed 
that it now claims 2 000 virtual restaurant brands in the U K alone a 150 increase on the previous year Deliveroo has operated delivery only kitchens called editions since 2017 This basically involves setting up near areas that are likely to have a big demand for food delivery with Deliveroo offering an arsenal of data to help identify specific culinary preferences It s all about spotting gaps in the market and you might see a small industrial estate with several cabins working on various types of cuisine Deliveroo was far from the first to embrace this concept New York based Maple had operated a similar model out of Manhattan for a couple of years but that ultimately faltered and Deliveroo bought it out in 2017 Another New York based delivery only kitchen called Ando was acquired by Uber Eats last year Elsewhere in Europe Spain based on demand delivery startup Glovo last week raised 167 million at a valuation of more than 1 billion making it one of just a handful of private Spanish startups to hit unicorn status A large chunk of Glovo s cash injection will go toward delivery only restaurants and grocery stores In fact the startup currently operates seven dark stores in Europe and Latin America and is planning 100 similar locations by 2021 The technology landscape over the past 12 months reveals a similar picture investors are hungry for delivery only restaurants The Uber factor As Uber struggles to cut losses across its business Uber Eats now represents its fastest growing unit with a customer base of well over 90 million users and sales growth of 64 in the past year Uber has also been promoting virtual kitchens although these typically operate out of existing restaurants with a different brand and separate menus designed for delivery However the impact of Uber s foray into the virtual kitchen realm is clear Uber cofounder and former CEO Travis Kalanick has launched a new venture called CloudKitchens which touts itself as a real estate company that provides smart 
kitchens for delivery only restaurants Last month it closed a 400 million funding round at a reported 5 billion valuation Another new startup called Virtual Kitchen Co launched out of stealth last month with 17 million in funding from an illustrious list of backers that includes Andreessen Horowitz a16z and Uber Eats product head Stephen Chau Virtual Kitchen Co was founded by Ken Chong who formerly led Uber s marketplace product team Matt Sawchuk who launched Uber s first peer to peer ride sharing service before moving into a role with Uber Eats and Andro Radonich Virtual Kitchen Co has already helped a few virtual kitchens open in San Francisco with plans to open more than a dozen additional locations in the Bay Area alone in early 2020 We think with a combination of technology data science and rigorous operational abilities it s possible to make running a high volume delivery restaurant an easy complete solution that lets even the smallest food entrepreneurs take advantage of the massive food delivery market said a16z general partner Andrew Chen Virtual Kitchen Co utilizes data to figure out where to best locate their network of kitchens what cuisines are lacking in underserved neighborhoods and even what ingredients should go into which dishes By sharing and collaborating with restaurant partners it will make the ecosystem even better Scramble In September Pasadena California based Kitchen United closed a 40 million series B round of funding co led by Alphabet s GV and New York real estate giant RXR Realty Similar to others in the space Kitchen United offers prospective customers or food entrepreneurs warehouse type facilities that can house up to 20 different restaurants This again is underpinned by a data driven technology platform to help guide menu and location choices Various similar cloud kitchen startups have sprung up around the world and investors have been lining up in droves In Latin America Columbia based Muy this year secured a 15 million cash 
injection while London based Taster which is creating native food brands for delivery companies like Glovo Uber Eats and Deliveroo raised 8 million And in Germany Keatz locked down 13 million in funding Setting up a traditional sit down restaurant is a costly endeavor and fraught with risk By crunching large swathes of data spanning demographics location and food preferences cloud kitchens promise to sidestep both issues in one fell swoop The data lets them know where to set up and what food to sell and fancy facilities with long leases are simply not necessary A report released last year by Investment bank UBS titled Is the Kitchen Dead estimated that the 35 billion food delivery economy could grow tenfold within a decade The rise of virtual kitchens will undoubtedly play a major role in driving down the cost of meals making it far more likely that people will order takeout instead of cooking at home There could be a scenario where by 2030 most meals currently cooked at home are instead ordered online and delivered from either restaurants or central kitchens the report said The ramifications for the food retail food producer and restaurant industries could be material as well as the impact on property markets home appliances and robotics This is the very definition of disruption Technology platforms that were originally conceived to connect drivers with passengers have not only changed the taxi industry they re also transforming freight and trucking and other spinoff sectors And as we have seen with the rapid rise of cloud kitchens this year the restaurant industry is the next frontier,Another New York based delivery only kitchen called Ando was acquired by Uber Eats last year Elsewhere in Europe,2718,owned_by,who owns Uber Eats?,"{'answer_start': [], 'text': []}",9662,89328,9662
7,2561,"[Walmart, Ribbit Capital]","[[0, 7], [39, 53]]",owner,"""For years, millions of customers have put their trust in Walmart to not only save them money when they shop with us but help them manage their financial needs,"" John Furner, the CEO of Walmart U.S. said in a news release. ""And they've made it clear they want more from us in the financial services arena.""\n\nWalmart said Monday that it's creating a fintech start-up with Ribbit Capital, one of the venture capital firms behind Robinhood.\n\nThe big-box retailer did not share the name of the new company or say when its services will be available. It said it will develop unique and affordable financial products for Walmart employees and customers.\n\nShares were up more than 2% on the news in after-hours trading Monday. Walmart's market cap is $416.7 billion.\n\nThe fintech startup will be majority-owned by Walmart and its board will include several company executives, including its Chief Financial Officer Brett Biggs and Walmart U.S. CEO John Furner. It said it will also name independent industry experts to the board and may acquire or partner with other fintech companies.\n\n""For years, millions of customers have put their trust in Walmart to not only save them money when they shop with us but help them manage their financial needs,"" Furner said in a news release. ""And they've made it clear they want more from us in the financial services arena.""\n\nWith more than 4,700 stores across the country, Walmart interacts with millions of customers each year - including some who don't have a relationship with a bank or a financial advisor.\n\nSix percent of adults don't have a checking, savings or money market account, according to the Federal Reserve. About 16% are ""underbanked,"" meaning they have a bank account but also use alternative financial service products, like a money order. 
Those Americans are more likely to turn to short-term solutions, such as a pawn shop or a payday loan, which can lead to additional charges or high interest fees.\n\nWalmart already offers some financial services for customers. For example, it has Walmart MoneyCard, a prepaid debit card that customers can load with money and use for purchases. The card has some features that encourage money management or help people who may have a challenged credit history, such as no overdraft fees, no monthly fee and no minimum balance requirement.\n\nThe retailer also offers alternative payment plans for customers on a tight budget, such as layaway and Affirm, a fintech company that allows customers to buy an online item immediately and pay in installments.\n\nWalmart's co-owner of the new company, Ribbit Capital, has a history of investing in fintech companies. Its portfolio includes Affirm; Robinhood, a fee-free investing start-up; and Credit Karma, a company that offers consumer-friendly tools like free credit score checks.","Walmart's co-owner of the new company, Ribbit Capital, has a history of investing in fintech companies.",2556,owned_by,who owns Ribbit Capital?,"{'answer_start': [0], 'text': ['Walmart']}",2323,2323,2323
8,736,"[Marketo, Adobe]","[[53, 60], [74, 79]]",part of,"Recently, it’s become popular to downplay the differences between B2B and B2C marketing. Some industry analysts and commentators have forcefully argued that all marketing should be viewed as “business-to-human,” “human-to-human,” or something similar.\n\nIt’s certainly accurate to say that virtually all forms of marketing involve the communication of messages to human beings. It’s equally true that business decision makers are also consumers, and that the attributes and preferences they have as consumers don’t evaporate when they’re acting in a professional capacity. But have all the meaningful differences between B2B and B2C marketing really disappeared?\n\nThe findings in two recent research reports – one by Marketo (now part of Adobe), and one by Forrester Consulting (commissioned by Adobe) – suggest that some of the lines between B2B and B2C marketing have become blurred. These two studies used different research approaches, and they emphasize different aspects of B2B and B2C marketing, but both raise issues that merit consideration.\n\nThe Marketo Study For this study, Marketo partnered with Loudhouse, an independent research firm, to survey 910 B2B buyers and interview 305 B2B marketing professionals. All of the study participants were located in the UK, Germany, or France, and the surveyed buyers represented a range of company sizes and job functions, including IT, Finance, HR, and Operations.\n\nThe survey found that, like consumers, B2B buyers are very concerned about privacy. More than 80% of the survey respondents said it is important for their prospective vendors to be serious about protecting their business and personal data and to always conform to best practices for handling their data and sensitive information.\n\nMarketo’s survey also found that many B2B buyers, like many consumers, are placing importance on the social values and practices of the companies they do business with. 
The following table shows the percentage of buyer survey respondents who rated six social practices as important:\n\nThis study also revealed two other emerging similarities between business buyers and consumers. First, 30% of the surveyed buyers said they would disengage from a vendor whose values don’t match their own. And second, it appears that B2B buyers, like consumers, are becoming less loyal. Forty-three percent of the surveyed buyers said they are always looking for a better deal.\n\nThe Forrester Consulting Study The principal objective of the Forrester Consulting study was to explore the similarities between business and consumer purchase journeys. For this research, Forrester surveyed 552 B2B and B2C marketers (manager level and above) representing a wide range of industries and company sizes. Survey respondents were drawn from a total of nine countries. Because of the composition of the survey panel, this research actually captures the perceptions of marketing professionals regarding the convergence of business and consumer buying behaviors.\n\nIn the Forrester survey, marketers identified three attitudes or behaviors that business buyers and consumers share:\n\nMore specifically, Forrester asked marketers how certain elements of their customers’ buying journey have changed over the past two years. The following table shows the percentage of B2B and B2C marketers who reported that these journey attributes had increased significantly or somewhat:\n\nThere is little doubt that the expectations and behaviors of business buyers are being influenced by their experiences as consumers. But this doesn’t mean that all meaningful differences between B2B and B2C marketing have disappeared.\n\nMost B2C marketing still involves the communication of relatively simple messages to a large or very large audience. Most B2B marketing, on the other hand, still involves the communication of more complex messages to a relatively small audience. 
This difference alone requires the use of different marketing strategies, channels, and tactics.\n\nTop image courtesy of George Redgrave via Flickr CC.","The findings in two recent research reports – one by Marketo (now part of Adobe), and one by Forrester Consulting (commissioned by Adobe) – suggest that some of the lines between B2B and B2C marketing have become blurred.",729,founded_by,who founded Adobe?,"{'answer_start': [], 'text': []}",11075,100632,11075
9,7422,"[Smart Industry, Sparta Systems , Inc.]","[[75, 89], [120, 140]]",partnership with,"Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as M&A activity, people movements, funding news, industry partnerships, customer wins, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz. Our intent is to provide you a one-stop source of late-breaking news to help you keep abreast of this fast-paced ecosystem. We're working hard on your behalf with our extensive vendor network to give you all the latest happenings. Heard of something yourself? Tell us! Just e-mail me at: daniel@insidebigdata.com. Be sure to Tweet Above the Trend Line articles using the hashtag: #abovethetrendline.\n\nI've been taking it pretty easy this summer so far while wearing my ""educator"" hat. I'm only teaching a single ""Introduction to Data Science"" class for UCLA Extension and it's been a blast thus far. My small group of 10 newbie data scientists are hungry for knowledge in support of their attempt to break into the field. I'm making sure they're ready with a data science toolbox full of code, a real-life project to complete, publish and promote on GitHub, as well as a glimpse of the entire Data Science Process. Summer of Data! But for now, let's dig into the big data rumor mill ... in new funding news we heard ... Privacera, the cloud data governance and security leader founded by the creators of Apache Ranger™, announced it has raised $13.5 million in Series A funding to support the increasing demand for automated data security, privacy and governance in the cloud. The funding was led by Accel who invested based on the tremendous trajectory of the Company and its strong customer base. 
Equally appealing for the venture firm was the fact that data governance, security, and compliance are becoming table stakes as organizations look to migrate enterprise data to the cloud to drive data analytics. Accel joins early investors Cervin Ventures, Point 72, and Alchemist Accelerator ... TileDB, Inc. has secured a $15M Series A investment round led by Two Bear Capital, joined by Uncorrelated Ventures and all existing investors: Nexus Venture Partners, Intel Capital, and Big Pi Ventures. The funding will help the company expand go-to-market and product development for its ""universal data engine,"" a novel database that goes beyond tables to manage any complex data and beyond SQL to analyze the data with any tool, all serverless and at planet scale. Montana philanthropist and Two Bear Capital Managing Partner Mike Goguen will join TileDB's Board of Directors ... MariaDB® Corporation announced a $25 million funding round supporting the company goals to expand the reach and development of its cloud database, MariaDB SkySQL. Led by SmartFin Capital, with participation from existing investors and new investor GP Bullhound, the round brings the total investment in MariaDB to over $125 million ... Traceable, the end-to-end application security monitoring platform, launched from stealth with $20M in series A funding from Unusual Ventures and BIG Labs. Jyoti Bansal, the founder and former CEO of AppDynamics, heads the company as CEO and co-founder after selling AppDynamics to Cisco for $3.7 billion. Bansal is joined by Sanjay Nagaraj, former VP Engineering at AppDynamics, as CTO and co-founder. Traceable was spun out of BIG Labs, Bansal's startup studio.\n\nIn M&A news we learned ... Syniti, a global data solution provider, announced the acquisition of Virtyx Technologies, Inc., an innovative start-up that provides end-to-end AI-powered automatic monitoring and analytics for data transformation. 
The asset purchase, which encompasses the acquisition of Virtyx technology and the retention of the key Virtyx Engineering talent, will integrate Virtyx's industry-leading cloud-native and AI technology with Syniti's Knowledge Platform -- creating a knowledge-driven software suite and enhancing Syniti's development and Engineering team ... Core Scientific, a leading infrastructure and software solutions provider for Artificial Intelligence (AI) and blockchain led by CEO Kevin Turner, the former COO of Microsoft, announced the acquisition of certain assets and technology of Atrio Inc. The addition of Atrio is timely as Core Scientific is focusing on expanding its efforts to help researchers solve molecular genetic codes of viruses like COVID-19 and aid drug discovery efforts. Most recently, Core Scientific partnered with NVIDIA, the inventor of the GPU, as well as NetApp, the leader in cloud data services, to provide free access to AI and data engineering infrastructure for coronavirus-related research. Similarly, this acquisition will bring together both Core Scientific and Atrio's High Performance Computing (HPC) and AI capabilities to offer researchers and businesses a more seamless experience for their high-end computing needs ... Jobvite, a leading end-to-end talent acquisition suite, announced that it has acquired the artificial intelligence (AI) and data science team at Predictive Partner. Morgan Llewellyn, CEO of Predictive Partner, will serve as Jobvite's Chief Data Scientist and oversee a team leveraging AI through automation, predictive analytics, data science, machine learning, natural language processing, and optical character recognition ... ATTOM Data Solutions, curator of the nation's premier property database, announced it has acquired Home Junction Inc., a real estate data technology company that specializes in building high quality geographic boundary datasets for neighborhoods, school attendance zones, subdivisions and more ... 
IBM (NYSE: IBM) announced it has reached a definitive agreement to acquire Brazilian software provider of robotic process automation (RPA) WDG Soluções Em Sistemas E Automação De Processos LTDA (referred to as ""WDG Automation"" throughout). The acquisition further advances IBM's comprehensive AI-infused automation capabilities, spanning business processes to IT operations. Financial terms were not disclosed ... NetApp (NASDAQ: NTAP), a leader in cloud data services, announced that it has completed its acquisition of Spot, a leader in compute management and cost optimization in the public clouds ... Brillio, a leading digital technology consulting and solutions company, announced the acquisition of Cognetik, a data and insights company with deep expertise in improving digital experiences for its customers. Cognetik enables companies across the globe, including Facebook, Pizza Hut and McDonald's to build and implement analytics solutions that optimize customer experience to increase loyalty, drive revenue and advance business transformation. Terms of the deal were not disclosed.\n\nWe also heard of a number of new partnerships, alignments and collaborations ... Iguazio, the data science platform for real-time machine learning applications, announced a strategic partnership with SFL Scientific, a leading data science consulting firm. The partnership will enable both companies to extend their offerings to enterprises of all industries looking to apply AI to real life applications, regardless of the size or skill set of their internal teams ... Quartic.ai, provider of the award-winning Quartic AI and IoT Platform™ for Smart Industry, announced a partnership with Sparta Systems, Inc., provider of industry-leading Quality Management System (QMS) platforms TrackWise® and TrackWise Digital®, to bring forward next-level AI capabilities for early risk detection during the manufacturing process to reduce product quality impact and enable near-real-time product release ... 
To make it easier to run more targeted and precise marketing campaigns cost-effectively across multiple channels, Narrative, the enterprise data streaming company, announced they are partnering with TransUnion (NYSE:TRU) to enable access to TransUnion's rich consumer attributes and audiences via Narrative's data streaming platform ... Q-CTRL, a startup that applies the principles of control engineering to accelerate the development of quantum technology, announced a global research and technology development partnership with Advanced Navigation, a leader in AI-based navigational hardware ... Hybrid cloud data warehouse company Yellowbrick Data announced that Sonra has joined the company's partner program. Both Yellowbrick and Sonra power huge data needs across a variety of enterprise applications. Together, they are making it faster, easier, and cheaper to convert legacy XML and JSON data into modern formats that can be used for data insights critical to business success. As part of their partnership, the companies are working on technical certifications and performance tuning as well as joint go-to-market opportunities. Sonra also offers data architecture consulting and design services, and can help enterprises deploy Yellowbrick quickly and with expertise ... SpIntellx, Inc., and CellNetix Pathology & Laboratories have announced that they will collaborate to validate the SpIntellx HistoMapr-Breast™ Platform, which taps the power of explainable artificial intelligence (xAI) for healthcare providers to diagnose and treat breast cancer more efficiently and accurately. The platform is the first in the companies' larger efforts to harness the power of xAI-assisted lab processes in the development of diagnostics, prognostics, therapeutic strategies, and drug development for tumor and non-tumor diseases. SpIntellx, Inc. 
is a computational and systems pathology company that harnesses the computational power of unbiased spatial analytics and explainable artificial intelligence (xAI) technologies to offer proprietary software products and services for analyzing pathology tissue sections. CellNetix Pathology & Laboratories is a rapidly growing pathology laboratory company providing comprehensive subspecialty clinical and anatomic pathology services in the Pacific Northwest ... Siren, a leading provider of Investigative Intelligence analytics, today announced a strategic partnership with AIMART, a Brazilian-based consultancy which specializes in providing solutions that tackle complex investigations and detect sophisticated frauds and financial crimes. Under this agreement, Siren joins vendors such as IBM and Q-Credi to form part of a best of breed analytics and data science portfolio for the South American marketplace.\n\nIn people movement news we heard ... Excelero, a disruptor in software-defined block storage for AI/ML/deep learning and GPU computing, announced that longtime enterprise technology executive and strategist Henri Richard has become a board advisor. As AI/ML/deep learning and HPC deployments drive demand for newer software-defined storage solutions using NVMe Flash, Henri will assist Excelero's executive team in uncovering opportunities to accelerate growth ... Collibra, the Data Intelligence company, announced the appointment of Stuart Wilson to chief revenue officer and Aileen Black to senior vice president of public sector, a newly-created role ... Magnitude Software has named Paul Young as the company's new general manager. Paul will be tasked with leading Magnitude's Data Integration Business unit that brings together its data connectivity, integration and management solutions under one umbrella.\n\nAnd finally, in the new customer wins category we learned ... 
Run:AI, a company virtualizing AI infrastructure, announced that it is working with the London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare as a technology provider to help them better manage their AI resources and provide elastic resource allocation, visibility and control.\n\nThe AI Centre, led by King's College London and based in St Thomas' Hospital, uses an enormous trove of de-identified patient data held by the NHS, including medical images and patient clinical pathway data, to train sophisticated AI learning algorithms. These algorithms are used to create new tools for faster diagnosis, personalized therapies, and more effective screening ... Ascend.io, the data engineering company, announced that HNI Corporation, a global leader in workplace furnishings and residential building products, has deployed The Ascend Unified Data Engineering Platform for modern data pipelines, enabling faster data analytics to drive strategic business decisions. Ascend enables HNI to intelligently collect and model data from various APIs and sources to directly fuel operational analytics across the business. Completed in just two months, HNI has been able to experience high operational efficiency by replacing manually-intensive processes with fully automated pipelines, resulting in faster data delivery, improved accuracy, and better business insights ... SADA, a leading global business and technology consultancy, announced a five year, $50 million deal with MadHive to expand the OTT ad solutions company's use of Google Cloud technologies to deliver new products and services. MadHive's end-to-end advertising solution, which leverages cryptography, blockchain and AI to power modern media, was first deployed on Google Cloud Platform (Google Cloud) in 2017 with help from SADA, a Google Cloud Premier Partner. 
The challenge was to deliver MadHive's next-generation platform at scale with low latency while supporting a rapid, iterative development cycle, machine learning requirements, and a short go-to-market timeline ... Hewlett Packard Enterprise (HPE) announced that Mastertel, one of the largest telecommunications companies in Russia, has selected HPE GreenLake to modernize its IT infrastructure, optimize financial flows, and achieve the flexibility necessary to service the growth of new customers.","Quartic.ai, provider of the award-winning Quartic AI and IoT Platform™ for Smart Industry, announced a partnership with Sparta Systems, Inc., provider of industry-leading Quality Management System (QMS) platforms TrackWise",7406,partners_with,"who is partner with Sparta Systems , Inc.?","{'answer_start': [75], 'text': ['Smart Industry']}",3867,3867,3867


## Preprocessing the training data

Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 Transformers `Tokenizer`, which (as the name indicates) tokenizes the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary), puts them in the format the model expects, and generates the other inputs the model requires.

To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [25]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/442 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

The following assertion ensures that our tokenizer is a fast tokenizer (backed by Rust) from the 🤗 Tokenizers library. Fast tokenizers are available for almost all models, and we will need some of their special features for our preprocessing.

In [26]:
import transformers
assert isinstance(tokenizer, transformers.PreTrainedTokenizerFast)

You can check which type of models have a fast tokenizer available and which don't on the [big table of models](https://huggingface.co/transformers/index.html#bigtable).

You can directly call this tokenizer on two sentences (one for the question, one for the context):

In [27]:
tokenizer("What is your name?", "My name is Sylvain.")

{'input_ids': [101, 2054, 2003, 2115, 2171, 1029, 102, 2026, 2171, 2003, 25353, 22144, 2378, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in [this tutorial](https://huggingface.co/transformers/preprocessing.html) if you're interested.

One thing specific to preprocessing for question answering is how to deal with very long documents. In other tasks we usually truncate them when they are longer than the model's maximum sequence length, but here, removing part of the context might result in losing the answer we are looking for. To deal with this, we allow one (long) example in our dataset to give several input features, each shorter than the maximum length of the model (or the one we set as a hyper-parameter). Also, in case the answer lies at the point where we split a long context, we allow some overlap between the features we generate, controlled by the hyper-parameter `doc_stride`:

In [28]:
max_length = 384 # The maximum length of a feature (question and context)
doc_stride = 128 # The authorized overlap between two parts of the context when splitting is needed.

Let's find one long example in our dataset:

In [29]:
for i, example in enumerate(datasets["train"]):
    if len(tokenizer(example["question"], example["context"])["input_ids"]) > max_length:
        break
example = datasets["train"][i]

Token indices sequence length is longer than the specified maximum sequence length for this model (659 > 512). Running this sequence through the model will result in indexing errors


Without any truncation, we get the following length for the input IDs:

In [30]:
len(tokenizer(example["question"], example["context"])["input_ids"])

659

Now, if we just truncate, we will lose information (and possibly the answer to our question):

In [31]:
len(tokenizer(example["question"], example["context"], max_length=max_length, truncation="only_second")["input_ids"])

384

Note that we never want to truncate the question, only the context, hence the `only_second` truncation we picked. Our tokenizer can automatically return a list of features capped at a certain maximum length, with the overlap we talked about above. We just have to pass `return_overflowing_tokens=True` along with the stride:

In [32]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    stride=doc_stride
)

Now we don't have one list of `input_ids`, but several: 

In [33]:
[len(x) for x in tokenized_example["input_ids"]]

[384, 384, 163]
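
The three lengths above can be reproduced by hand. The following is a minimal sketch, not the tokenizer's actual implementation: `window_lengths` is a hypothetical helper, and it assumes that every feature repeats the same non-context tokens (here 8: the 5 question tokens plus `[CLS]` and two `[SEP]`) and that `stride` is the number of context tokens shared by two consecutive features.

```python
def window_lengths(n_context, n_overhead, max_length, stride):
    """Lengths of the features produced by a sliding window over the context.

    n_context  -- number of context tokens (assumed: 659 - 8 = 651 here)
    n_overhead -- question tokens plus special tokens repeated in each feature
    stride     -- number of context tokens shared by consecutive features
    """
    capacity = max_length - n_overhead  # context tokens that fit in one feature
    if capacity <= stride:
        raise ValueError("stride must be smaller than the per-feature context capacity")
    lengths = []
    start = 0
    while True:
        end = min(start + capacity, n_context)
        lengths.append(end - start + n_overhead)
        if end == n_context:
            break
        # The next window starts `stride` tokens before the end of this one.
        start = start + capacity - stride
    return lengths


lengths = window_lengths(n_context=651, n_overhead=8, max_length=384, stride=128)
print(lengths)
```

Under these assumptions the sketch gives the same three lengths as the tokenizer did for this example.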

And if we decode them, we can see the overlap:

In [34]:
for x in tokenized_example["input_ids"][:2]:
    print(tokenizer.decode(x))

[CLS] who acquired amd? [SEP] intel corporation's dominance of the microprocessor market and its aggressive business practices ; the ability of third party manufacturers to manufacture amd's products on a timely basis in sufficient quantities and using competitive technologies ; expected manufacturing yields for amd's products ; amd's ability to introduce products on a timely basis with features and performance levels that provide value to its customers ; global economic uncertainty ; the loss of a significant customer ; amd's ability to generate revenue from its semi - custom soc products ; the impact of the covid - 19 pandemic on amd's business, financial condition and results of operations ; political, legal, economic risks and natural disasters ; the impact of government actions and regulations such as export administration regulations, tariffs and trade protection measures ; potential security vulnerabilities ; potential it outages, data loss, data breaches and cyber - attacks ; u

This gives us some work to properly treat the answers: we need to find which of those features actually contains the answer, and where exactly in that feature. The models we will use require the start and end positions of these answers in the tokens, so we will also need to map parts of the original context to tokens. Thankfully, the tokenizer we're using can help us with that by returning an `offset_mapping`:

In [35]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
    stride=doc_stride
)
print(tokenized_example["offset_mapping"][0][:100])

[(0, 0), (0, 3), (4, 12), (13, 15), (15, 16), (16, 17), (0, 0), (0, 5), (6, 17), (17, 18), (18, 19), (20, 29), (30, 32), (33, 36), (37, 42), (42, 45), (45, 48), (48, 51), (52, 58), (59, 62), (63, 66), (67, 77), (78, 86), (87, 96), (96, 97), (98, 101), (102, 109), (110, 112), (113, 118), (119, 124), (125, 138), (139, 141), (142, 153), (154, 156), (156, 157), (157, 158), (158, 159), (160, 168), (169, 171), (172, 173), (174, 180), (181, 186), (187, 189), (190, 200), (201, 211), (212, 215), (216, 221), (222, 233), (234, 246), (246, 247), (248, 256), (257, 270), (271, 277), (278, 281), (282, 284), (284, 285), (285, 286), (286, 287), (288, 296), (296, 297), (298, 300), (300, 301), (301, 302), (302, 303), (304, 311), (312, 314), (315, 324), (325, 333), (334, 336), (337, 338), (339, 345), (346, 351), (352, 356), (357, 365), (366, 369), (370, 381), (382, 388), (389, 393), (394, 401), (402, 407), (408, 410), (411, 414), (415, 424), (424, 425), (426, 432), (433, 441), (442, 453), (453, 454), (455

This gives, for each index of our input IDs, the corresponding start and end character in the original text that produced the token. The very first token (`[CLS]`) has (0, 0) because it doesn't correspond to any part of the question or the context. The second token corresponds to characters 0 to 3 of the question:

In [36]:
first_token_id = tokenized_example["input_ids"][0][1]
offsets = tokenized_example["offset_mapping"][0][1]
print(tokenizer.convert_ids_to_tokens([first_token_id])[0], example["question"][offsets[0]:offsets[1]])

who who


So we can use this mapping to find the position of the start and end tokens of our answer in a given feature. We just have to distinguish which offsets correspond to the question and which correspond to the context. This is where the `sequence_ids` method of our `tokenized_example` is useful:

In [37]:
sequence_ids = tokenized_example.sequence_ids()
print(sequence_ids)

[None, 0, 0, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

It returns `None` for the special tokens, then 0 or 1 depending on whether the corresponding token comes from the first sentence passed (the question) or the second (the context). With all of this, we can find the first and last token of the answer in one of our input features (or detect that the answer is not in this feature):

In [38]:
answers = example["answers"]
start_char = answers["answer_start"][0]
end_char = start_char + len(answers["text"][0])

# Start token index of the current span in the text.
token_start_index = 0
while sequence_ids[token_start_index] != 1:
    token_start_index += 1

# End token index of the current span in the text.
token_end_index = len(tokenized_example["input_ids"][0]) - 1
while sequence_ids[token_end_index] != 1:
    token_end_index -= 1

# Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
offsets = tokenized_example["offset_mapping"][0]
if (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
    # Move the token_start_index and token_end_index to the two ends of the answer.
    # Note: we could go after the last offset if the answer is the last word (edge case).
    while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
        token_start_index += 1
    start_position = token_start_index - 1
    while offsets[token_end_index][1] >= end_char:
        token_end_index -= 1
    end_position = token_end_index + 1
    print(start_position, end_position)
else:
    print("The answer is not in this feature.")

230 231


And we can double check that it is indeed the theoretical answer:

In [39]:
print(tokenizer.decode(tokenized_example["input_ids"][0][start_position: end_position+1]))
print(answers["text"][0])

amd
AMD
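
The same search can be tried on a tiny hand-made input without loading a tokenizer. This is only an illustrative sketch: `char_span_to_token_span` is a hypothetical helper, and the offsets and sequence ids below are invented to mimic what a fast tokenizer could return for the question "who?" and the context "amd won". Unlike the cell above, the loops are bounded to the context tokens.

```python
def char_span_to_token_span(offsets, sequence_ids, start_char, end_char, context_id=1):
    """Map a character span of the context to (start_token, end_token), or None."""
    context_tokens = [i for i, s in enumerate(sequence_ids) if s == context_id]
    first, last = context_tokens[0], context_tokens[-1]

    # The answer is not fully inside this feature.
    if offsets[first][0] > start_char or offsets[last][1] < end_char:
        return None

    # Walk forward past every token that starts at or before the answer start.
    start = first
    while start <= last and offsets[start][0] <= start_char:
        start += 1
    # Walk backward past every token that ends at or after the answer end.
    end = last
    while end >= first and offsets[end][1] >= end_char:
        end -= 1
    return start - 1, end + 1


# Made-up layout: [CLS] who ? [SEP] amd won [SEP]
toy_offsets = [(0, 0), (0, 3), (3, 4), (0, 0), (0, 3), (4, 7), (0, 0)]
toy_sequence_ids = [None, 0, 0, None, 1, 1, None]

print(char_span_to_token_span(toy_offsets, toy_sequence_ids, 4, 7))    # span of "won"
print(char_span_to_token_span(toy_offsets, toy_sequence_ids, 0, 3))    # span of "amd"
print(char_span_to_token_span(toy_offsets, toy_sequence_ids, 10, 12))  # outside the feature
```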


For this notebook to work with any kind of models, we need to account for the special case where the model expects padding on the left (in which case we switch the order of the question and the context):

In [40]:
pad_on_right = tokenizer.padding_side == "right"
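
As a rough sketch of what this flag changes, the hypothetical helper `pair_arguments` below (not part of 🤗 Transformers) returns the order in which the two texts would be passed to the tokenizer, the truncation strategy that protects the question, and the sequence id that then marks context tokens.

```python
def pair_arguments(question, context, pad_on_right):
    """Return (first_text, second_text, truncation, context_sequence_id)."""
    if pad_on_right:
        # Question first: only the second text (the context) may be truncated.
        return question, context, "only_second", 1
    # Context first: only the first text (the context) may be truncated.
    return context, question, "only_first", 0


print(pair_arguments("Who acquired AMD?", "Some long context ...", pad_on_right=True))
print(pair_arguments("Who acquired AMD?", "Some long context ...", pad_on_right=False))
```

In both cases it is the context that gets truncated and split, never the question.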

Now let's put everything together in one function that we will apply to our training set. In the case of impossible answers (the answer is in another feature given by an example with a long context), we set the CLS index for both the start and end position. We could also simply discard those examples from the training set if the flag `allow_impossible_answers` is `False`. Since the preprocessing is already complex enough as it is, we've kept it simple for this part.

In [41]:
def prepare_train_features(examples):
    # Tokenize our examples with truncation and padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that slightly overlaps the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")
    # The offset mappings will give us a map from token to character position in the original context. This will
    # help us compute the start_positions and end_positions.
    offset_mapping = tokenized_examples.pop("offset_mapping")

    # Let's label those examples!
    tokenized_examples["start_positions"] = []
    tokenized_examples["end_positions"] = []

    for i, offsets in enumerate(offset_mapping):
        # We will label impossible answers with the index of the CLS token.
        input_ids = tokenized_examples["input_ids"][i]
        cls_index = input_ids.index(tokenizer.cls_token_id)

        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)

        # One example can give several spans, this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        answers = examples["answers"][sample_index]
        # If no answers are given, set the cls_index as answer.
        if len(answers["answer_start"]) == 0:
            tokenized_examples["start_positions"].append(cls_index)
            tokenized_examples["end_positions"].append(cls_index)
        else:
            # Start/end character index of the answer in the text.
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])

            # Start token index of the current span in the text.
            token_start_index = 0
            while sequence_ids[token_start_index] != (1 if pad_on_right else 0):
                token_start_index += 1

            # End token index of the current span in the text.
            token_end_index = len(input_ids) - 1
            while sequence_ids[token_end_index] != (1 if pad_on_right else 0):
                token_end_index -= 1

            # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
            if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
                tokenized_examples["start_positions"].append(cls_index)
                tokenized_examples["end_positions"].append(cls_index)
            else:
                # Otherwise move the token_start_index and token_end_index to the two ends of the answer.
                # Note: we could go after the last offset if the answer is the last word (edge case).
                while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
                    token_start_index += 1
                tokenized_examples["start_positions"].append(token_start_index - 1)
                while offsets[token_end_index][1] >= end_char:
                    token_end_index -= 1
                tokenized_examples["end_positions"].append(token_end_index + 1)

    return tokenized_examples

In [42]:
print((datasets['train'][:69]))



This function works with one or several examples. In the case of several examples, the tokenizer will return a list of lists for each key:

In [43]:
features = prepare_train_features(datasets['train'][:5])
features

{'input_ids': [[101, 2040, 11241, 1999, 15358, 8751, 29360, 1029, 102, 1017, 1007, 13451, 5007, 1010, 1037, 2446, 22752, 2008, 1521, 1055, 2437, 2019, 2035, 1011, 3751, 7471, 19744, 1998, 4899, 4628, 6892, 2150, 1037, 1520, 21830, 1521, 2044, 6274, 1002, 3486, 2454, 2013, 15358, 8751, 29360, 1010, 1996, 2922, 14316, 1999, 26060, 2044, 2049, 22301, 3954, 3449, 2239, 14163, 6711, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
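
One consequence of keeping the overflowing tokens is that a batch can come back with more features than it had examples, which is why the function relies on `overflow_to_sample_mapping`. The toy sketch below illustrates the idea on plain lists; `split_with_mapping` is a hypothetical stand-in for the tokenizer's splitting, and characters play the role of tokens.

```python
def split_with_mapping(token_lists, capacity, stride):
    """Split each token list into overlapping chunks and record where each chunk came from."""
    if capacity <= stride:
        raise ValueError("stride must be smaller than capacity")
    chunks, sample_mapping = [], []
    step = capacity - stride
    for sample_index, tokens in enumerate(token_lists):
        start = 0
        while True:
            chunks.append(tokens[start:start + capacity])
            sample_mapping.append(sample_index)  # this chunk belongs to example `sample_index`
            if start + capacity >= len(tokens):
                break
            start += step
    return chunks, sample_mapping


batch = [list("abcdefgh"), list("xyz")]  # one "long" example, one short
chunks, sample_mapping = split_with_mapping(batch, capacity=5, stride=2)
print(len(batch), "examples ->", len(chunks), "features")
print(sample_mapping)
```

Here the first example yields two chunks and the second yields one, so the mapping lets each feature look up the answers of the example it came from.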

To apply this function on all the sentences (or pairs of sentences) in our dataset, we just use the `map` method of the `datasets` object we created earlier. This applies the function to all the elements of all the splits in `datasets`, so every split is preprocessed in one single command. Since our preprocessing changes the number of samples, we need to remove the old columns when applying it.

In [44]:
tokenized_datasets = datasets.map(prepare_train_features, batched=True, remove_columns=datasets["train"].column_names)

  0%|          | 0/20 [00:00<?, ?ba/s]

  0%|          | 0/17 [00:00<?, ?ba/s]

Even better, the results are automatically cached by the 🤗 Datasets library to avoid spending time on this step the next time you run your notebook. The 🤗 Datasets library is normally smart enough to detect when the function you pass to `map` has changed (and thus that the cached data should not be used). For instance, it will properly detect if you change the task in the first cell and rerun the notebook. 🤗 Datasets warns you when it uses cached files; you can pass `load_from_cache_file=False` in the call to `map` to ignore the cached files and force the preprocessing to be applied again.

Note that we passed `batched=True` to encode the texts by batches together. This is to leverage the full benefit of the fast tokenizer we loaded earlier, which will use multi-threading to treat the texts in a batch concurrently.

## Fine-tuning the model

Now that our data is ready for training, we can download the pretrained model and fine-tune it. Since our task is question answering, we use the `AutoModelForQuestionAnswering` class. Like with the tokenizer, the `from_pretrained` method will download and cache the model for us:

In [45]:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)

Downloading:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this mode

In [46]:
#in case we already trained the model and want to continue
#model = AutoModelForQuestionAnswering.from_pretrained("/content/drive/MyDrive/nlp/test-rel-trained")

The warning is telling us we are throwing away some weights (the `vocab_transform`, `vocab_layer_norm` and `vocab_projector` layers) and randomly initializing some others (the `qa_outputs` layer). This is absolutely normal in this case, because we are removing the head used to pretrain the model on a masked language modeling objective and replacing it with a new question answering head for which we don't have pretrained weights. The library therefore warns us that we should fine-tune this model before using it for inference, which is exactly what we are going to do.

To instantiate a `Trainer`, we will need to define three more things. The most important is the [`TrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments), which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [47]:
args = TrainingArguments(
    "test-rel",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    weight_decay=0.01,
)

Here we set the evaluation to be done at the end of each epoch, tweak the learning rate, use the `batch_size` defined at the top of the notebook and customize the number of epochs for training, as well as the weight decay.

Then we will need a data collator that will batch our processed examples together, here the default one will work:

In [48]:
from transformers import default_data_collator

data_collator = default_data_collator

We will evaluate our model and compute metrics in the next section (this is a very long operation, so we will only compute the evaluation loss during training).

Then we just need to pass all of this along with our datasets to the `Trainer`:

In [49]:
trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

We can now finetune our model by just calling the `train` method:

In [50]:
trainer.train()
trainer.save_model("/content/drive/MyDrive/nlp/test-rel-trained")

Since this training is particularly long, the cell above also saves the model in case we need to restart.

## Evaluation

Evaluating our model will require a bit more work, as we will need to map the predictions of our model back to parts of the context. The model itself predicts logits for the start and end positions of our answers: if we take a batch from our validation dataloader, here is the output our model gives us:

In [51]:
import torch

for batch in trainer.get_eval_dataloader():
    break
batch = {k: v.to(trainer.args.device) for k, v in batch.items()}
with torch.no_grad():
    output = trainer.model(**batch)
output.keys()

odict_keys(['loss', 'start_logits', 'end_logits'])

The output of the model is a dict-like object that contains the loss (since we provided labels) and the start and end logits. We won't need the loss for our predictions, so let's have a look at the logits:

In [52]:
output.start_logits.shape, output.end_logits.shape

(torch.Size([16, 384]), torch.Size([16, 384]))

We have one logit for each feature and each token. The most obvious way to predict an answer for each feature is to take the index of the maximum of the start logits as the start position and the index of the maximum of the end logits as the end position.

In [53]:
output.start_logits.argmax(dim=-1), output.end_logits.argmax(dim=-1)

(tensor([15,  0,  0,  0,  0,  0,  0, 15,  0,  0,  0,  0,  0,  0, 29,  0],
        device='cuda:0'),
 tensor([16,  0,  0,  0,  0,  0,  0, 18,  0,  0,  0,  0,  0,  0, 33,  0],
        device='cuda:0'))

This will work great in a lot of cases, but what if this prediction gives us something impossible: the start position could be greater than the end position, or point to a span of text in the question instead of the answer. In that case, we might want to look at the second best prediction to see if it gives a possible answer and select that instead.
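To make the failure mode concrete, here is a small sketch with made-up logits (not taken from the model above) where the two independent argmaxes produce a start position that lies after the end position:

```python
def argmax(values):
    """Return the index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for a 4-token input.
start_logits = [0.1, 0.2, 3.0, 0.5]
end_logits = [0.3, 2.5, 0.4, 0.1]

best_start = argmax(start_logits)
best_end = argmax(end_logits)
# A span is only possible if it does not end before it starts.
is_possible = best_start <= best_end
print(best_start, best_end, is_possible)
```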

However, picking the second best answer is not as easy as picking the best one: is it the second best index in the start logits with the best index in the end logits? Or the best index in the start logits with the second best index in the end logits? And if that second best answer is not possible either, it gets even trickier for the third best answer.


To rank our answers, we will use the score obtained by adding the start and end logits. We won't try to order all the possible answers; instead we limit ourselves to the top candidates, controlled by a hyper-parameter we call `n_best_size`. We'll pick the `n_best_size` best indices in the start and end logits and gather all the answers they predict. After checking if each one is valid, we will sort them by their score and keep the best one. Here is how we would do this on the first feature in the batch:

In [54]:
n_best_size = 20

In [55]:
import numpy as np

start_logits = output.start_logits[0].cpu().numpy()
end_logits = output.end_logits[0].cpu().numpy()
# Gather the indices of the best start/end logits:
start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()
valid_answers = []
for start_index in start_indexes:
    for end_index in end_indexes:
        if start_index <= end_index: # We need to refine that test to check the answer is inside the context
            valid_answers.append(
                {
                    "score": start_logits[start_index] + end_logits[end_index],
                    "text": "" # We need to find a way to get back the original substring corresponding to the answer in the context
                }
            )

And then we can sort the `valid_answers` according to their `score` and only keep the best one. The only point left is how to check a given span is inside the context (and not the question) and how to get back the text inside. To do this, we need to add two things to our validation features:
- the ID of the example that generated the feature (since each example can generate several features, as seen before);
- the offset mapping that will give us a map from token indices to character positions in the context.
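As a rough illustration of the offset mapping idea, the sketch below uses a toy whitespace tokenizer (an assumption made for readability; the real tokenizer works on subwords and adds special tokens). Question tokens are mapped to `None`, and context tokens keep their `(start_char, end_char)` pair, so a token span can be turned back into text:

```python
def whitespace_offsets(text):
    """Return (start_char, end_char) for each whitespace-separated token."""
    offsets, position = [], 0
    for token in text.split():
        start = text.index(token, position)
        end = start + len(token)
        offsets.append((start, end))
        position = end
    return offsets


question = "Who partners with Binance?"
context = "Chiliz partners with Binance Chain"

# Question tokens get None, context tokens keep their character offsets.
offset_mapping = [None] * len(question.split()) + whitespace_offsets(context)

# Token positions (in the combined sequence) of a candidate answer span.
start_index, end_index = 7, 8
start_char = offset_mapping[start_index][0]
end_char = offset_mapping[end_index][1]
answer = context[start_char:end_char]
print(answer)
```

A span that starts or ends on a `None` entry points into the question and can be rejected without looking at the text.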

That's why we will re-process the validation set with the following function, slightly different from `prepare_train_features`:

In [56]:
def prepare_validation_features(examples):
    # Tokenize our examples with truncation and maybe padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that overlaps a bit with the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")

    # We keep the example_id that gave us this feature and we will store the offset mappings.
    tokenized_examples["example_id"] = []

    for i in range(len(tokenized_examples["input_ids"])):
        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)
        context_index = 1 if pad_on_right else 0

        # One example can give several spans, this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        tokenized_examples["example_id"].append(examples['question_id'][sample_index]) #####__index_level_0__

        # Set to None the offset_mapping that are not part of the context so it's easy to determine if a token
        # position is part of the context or not.
        tokenized_examples["offset_mapping"][i] = [
            (o if sequence_ids[k] == context_index else None)
            for k, o in enumerate(tokenized_examples["offset_mapping"][i])
        ]

    return tokenized_examples
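The overflow mechanism used by `prepare_validation_features` can be sketched without the tokenizer. The helper below is hypothetical (`split_with_stride` is not a library function) and only looks at context tokens, whereas the real `max_length` also counts the question and the special tokens. It shows how one long context becomes several overlapping features and how a sample mapping remembers where each feature came from:

```python
def split_with_stride(tokens, max_length, stride):
    """Split tokens into windows of at most max_length tokens.

    Consecutive windows share `stride` tokens.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    windows, start = [], 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break
        start += max_length - stride
    return windows


# Two made-up examples: a long context (10 tokens) and a short one (3 tokens).
contexts = [list(range(10)), list(range(3))]

features, sample_mapping = [], []
for sample_index, tokens in enumerate(contexts):
    for window in split_with_stride(tokens, max_length=4, stride=2):
        features.append(window)
        sample_mapping.append(sample_index)

print(features)
print(sample_mapping)
```

Here the first example yields four features and the second only one, so `sample_mapping` plays the same role as `overflow_to_sample_mapping` in the function above.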

And like before, we can apply that function to our validation set easily:

In [57]:
validation_features = datasets["val"].map(
    prepare_validation_features,
    batched=True,
    remove_columns=datasets["val"].column_names
)

Now we can grab the predictions for all features by using the `Trainer.predict` method:

In [58]:
raw_predictions = trainer.predict(validation_features)

The following columns in the test set  don't have a corresponding argument in `DistilBertForQuestionAnswering.forward` and have been ignored: example_id, offset_mapping.
***** Running Prediction *****
  Num examples = 16856
  Batch size = 16


The `Trainer` *hides* the columns that are not used by the model (here `example_id` and `offset_mapping` which we will need for our post-processing), so we set them back:

In [59]:
validation_features.set_format(type=validation_features.format["type"], columns=list(validation_features.features.keys()))

We can now refine the test we had before: since we set `None` in the offset mappings when it corresponds to a part of the question, it's easy to check if an answer is fully inside the context. We also eliminate very long answers from our considerations (with a hyper-parameter we can tune):

In [60]:
max_answer_length = 30
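The two checks can be read as a single predicate. The helper below is a hypothetical refactoring for illustration (`is_valid_span` does not appear in the notebook), exercised on a made-up offset mapping:

```python
def is_valid_span(start_index, end_index, offset_mapping, max_answer_length):
    """Return True if the token span lies inside the context and is not too long."""
    # Indices beyond the feature are out of scope.
    if start_index >= len(offset_mapping) or end_index >= len(offset_mapping):
        return False
    # None marks tokens that are not part of the context (question, special tokens).
    if offset_mapping[start_index] is None or offset_mapping[end_index] is None:
        return False
    length = end_index - start_index + 1
    return 0 < length <= max_answer_length


# Made-up mapping: two non-context tokens followed by three context tokens.
toy_offsets = [None, None, (0, 6), (7, 10), (11, 20)]

print(is_valid_span(2, 3, toy_offsets, max_answer_length=30))  # inside the context
print(is_valid_span(0, 2, toy_offsets, max_answer_length=30))  # starts in the question
print(is_valid_span(3, 2, toy_offsets, max_answer_length=30))  # ends before it starts
print(is_valid_span(2, 4, toy_offsets, max_answer_length=2))   # too long
```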

In [61]:
start_logits = output.start_logits[0].cpu().numpy()
end_logits = output.end_logits[0].cpu().numpy()
offset_mapping = validation_features[0]["offset_mapping"]
# The first feature comes from the first example. For the more general case, we will need to match the example_id to
# an example index.
context = datasets["val"][0]["context"]
print(context)
# Gather the indices of the best start/end logits:
start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()
valid_answers = []
for start_index in start_indexes:
    for end_index in end_indexes:
        # Don't consider out-of-scope answers, either because the indices are out of bounds or correspond
        # to part of the input_ids that are not in the context.
        if (
            start_index >= len(offset_mapping)
            or end_index >= len(offset_mapping)
            or offset_mapping[start_index] is None
            or offset_mapping[end_index] is None
        ):
            continue
        # Don't consider answers with a length that is either < 0 or > max_answer_length.
        if end_index < start_index or end_index - start_index + 1 > max_answer_length:
            continue
        if start_index <= end_index:  # Always true after the checks above; kept from the earlier version of this loop
            start_char = offset_mapping[start_index][0]
            end_char = offset_mapping[end_index][1]
            valid_answers.append(
                {
                    "score": start_logits[start_index] + end_logits[end_index],
                    "text": context[start_char: end_char]
                }
            )

valid_answers = sorted(valid_answers, key=lambda x: x["score"], reverse=True)[:n_best_size]
valid_answers

Sports blockchain venture Chiliz has announced a strategic partnership with Binance Chain the mainnet of major cryptocurrency exchange


[{'score': 16.619637, 'text': 'Chiliz'},
 {'score': 10.727211, 'text': 'Chiliz has'},
 {'score': 8.035167, 'text': 'Chili'},
 {'score': 7.923586, 'text': 'Chiliz has announced'},
 {'score': 6.628046, 'text': 'Chiliz has announced a'},
 {'score': 6.403385, 'text': 'z'},
 {'score': 5.088732, 'text': 'Sports blockchain venture Chiliz'},
 {'score': 4.298773, 'text': 'blockchain venture Chiliz'},
 {'score': 3.1134443, 'text': 'Chiliz has announced a strategic partnership'},
 {'score': 2.943778, 'text': 'Chiliz has announced a strategic'},
 {'score': 2.8883576, 'text': 'venture Chiliz'},
 {'score': 2.600844, 'text': 'chain venture Chiliz'},
 {'score': 2.1659794, 'text': 'n venture Chiliz'},
 {'score': 1.6354704,
  'text': 'Chiliz has announced a strategic partnership with Binance Chain the mainnet'},
 {'score': 1.621232,
  'text': 'Chiliz has announced a strategic partnership with Binance Chain the mainnet of major cryptocurren'},
 {'score': 1.5599504,
  'text': 'Chiliz has announced a strat

We can compare to the actual ground-truth answer:

In [62]:
datasets["val"][0]["answers"]

{'answer_start': [26], 'text': ['Chiliz']}

Our model picked the right answer as the most likely one!

As we mentioned in the code above, this was easy on the first feature because we knew it comes from the first example. For the other features, we will need a map between examples and their corresponding features. Also, since one example can give several features, we will need to gather together all the answers in all the features generated by a given example, then pick the best one. The following code builds a map from each example index to the indices of its corresponding features:

In [63]:
import collections

examples = datasets["val"]
features = validation_features

example_id_to_index = {k: i for i, k in enumerate(examples["question_id"])}
features_per_example = collections.defaultdict(list)
for i, feature in enumerate(features):
    features_per_example[example_id_to_index[feature["example_id"]]].append(i)
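The same construction on made-up ids (the names `toy_example_ids` and `toy_feature_example_ids` are illustrative only) shows what the resulting map looks like when one example was split into two features:

```python
import collections

toy_example_ids = ["q1", "q2", "q3"]
# "q1" had a long context and produced two features.
toy_feature_example_ids = ["q1", "q1", "q2", "q3"]

toy_id_to_index = {k: i for i, k in enumerate(toy_example_ids)}
toy_features_per_example = collections.defaultdict(list)
for i, example_id in enumerate(toy_feature_example_ids):
    toy_features_per_example[toy_id_to_index[example_id]].append(i)

print(dict(toy_features_per_example))
```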

We're almost ready for our post-processing function. The last bit to deal with is the impossible answer (when `squad_v2 = True`). The code above only keeps answers that are inside the context, so we also need to grab the score for the impossible answer (which has start and end indices corresponding to the index of the CLS token). When one example gives several features, we should predict the impossible answer only when all the features give a high score to the impossible answer (since one feature could predict the impossible answer just because the answer isn't in the part of the context it has access to). That is why the score of the impossible answer for one example is the *minimum* of the scores for the impossible answer in each feature generated by the example.
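A small sketch with made-up scores may help here. The helper names are hypothetical, and the final comparison assumes the null answer is chosen only when its aggregated score beats the best span:

```python
def aggregate_null_score(feature_null_scores):
    """Null score of an example: the minimum over all of its features."""
    return min(feature_null_scores)


def pick_answer(best_span, feature_null_scores):
    """Return the span text unless the aggregated null score is at least as high."""
    null_score = aggregate_null_score(feature_null_scores)
    return best_span["text"] if best_span["score"] > null_score else ""


best_span = {"text": "Chiliz", "score": 6.0}

# One window does not contain the answer (null score 9.0), the other does (1.5).
print(repr(pick_answer(best_span, [9.0, 1.5])))
# Every window is confident that there is no answer.
print(repr(pick_answer(best_span, [9.0, 8.0])))
```

Taking the maximum instead would let a single window that simply does not see the answer veto the whole example.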

We then predict the impossible answer when that score is greater than the score of the best non-impossible answer.

All combined together, this gives us the post-processing function below. This is the original version, which processes every question separately:

In [64]:
from tqdm.auto import tqdm

def postprocess_qa_predictionsORG(examples, features, raw_predictions, n_best_size = 20, max_answer_length = 30):
    all_start_logits, all_end_logits = raw_predictions
    # Build a map example to its corresponding features.
    example_id_to_index = {k: i for i, k in enumerate(examples["question_id"])}
    features_per_example = collections.defaultdict(list)
    for i, feature in enumerate(features):
        features_per_example[example_id_to_index[feature["example_id"]]].append(i)

    # The dictionaries we have to fill.
    predictions = collections.OrderedDict()

    # Logging.
    print(f"Post-processing {len(examples)} example predictions split into {len(features)} features.")

    # Let's loop over all the examples!
    for example_index, example in enumerate(tqdm(examples)):
        # Those are the indices of the features associated to the current example.
        feature_indices = features_per_example[example_index]
         
        min_null_score = None # Only used if squad_v2 is True.
        valid_answers = []
        
        context = example["context"]
        # Looping through all the features associated to the current example.
        
        for feature_index in feature_indices:
            # We grab the predictions of the model for this feature.
            start_logits = all_start_logits[feature_index]
            end_logits = all_end_logits[feature_index]
            # This is what will allow us to map some of the positions in our logits to spans of text in the original
            # context.
            offset_mapping = features[feature_index]["offset_mapping"]

            # Update minimum null prediction.
            cls_index = features[feature_index]["input_ids"].index(tokenizer.cls_token_id)
            feature_null_score = start_logits[cls_index] + end_logits[cls_index]
            if min_null_score is None or feature_null_score < min_null_score:
                # Keep the *minimum* null score over all the features of this example.
                min_null_score = feature_null_score

            # Go through all possibilities for the `n_best_size` greater start and end logits.
            start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
            end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # Don't consider out-of-scope answers, either because the indices are out of bounds or correspond
                    # to part of the input_ids that are not in the context.
                    if (
                        start_index >= len(offset_mapping)
                        or end_index >= len(offset_mapping)
                        or offset_mapping[start_index] is None
                        or offset_mapping[end_index] is None
                    ):
                        continue
                    # Don't consider answers with a length that is either < 0 or > max_answer_length.
                    if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                        continue

                    start_char = offset_mapping[start_index][0]
                    end_char = offset_mapping[end_index][1]
                    valid_answers.append(
                        {
                            "score": start_logits[start_index] + end_logits[end_index],
                            "text": context[start_char: end_char]
                        }
                    )
        
        if len(valid_answers) > 0:
            best_answer = sorted(valid_answers, key=lambda x: x["score"], reverse=True)[0]
        else:
            # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid
            # failure.
            best_answer = {"text": "", "score": 0.0}
        
        # Let's pick our final answer: the best one or the null answer (only for squad_v2)
        
        if not squad_v2:
            predictions[example["question_id"]] = best_answer["text"]
        else:
             
            answer = best_answer["text"] if best_answer["score"] > min_null_score else ""
            predictions[example["question_id"]] = answer

    return predictions

In [65]:
final_predictions_org = postprocess_qa_predictionsORG(datasets["val"], validation_features, raw_predictions.predictions)
len(final_predictions_org)

Post-processing 16849 example predictions split into 16856 features.


16849

In this version we post-process together all the questions related to a specific sentence (`id`): we aggregate all the candidate answers for these questions and sort them by score. Hopefully the best answer will correspond to the correct relation, since all the other questions have no answer and should get a lower score.

In [66]:
from tqdm.auto import tqdm
REL_NUM = 7  # number of questions (one per relation type) asked about each sentence
def postprocess_qa_predictions(examples, features, raw_predictions, n_best_size = 20, max_answer_length = 30, rel_num=7):
    print(rel_num)
    all_start_logits, all_end_logits = raw_predictions
    # Build a map example to its corresponding features.
    example_id_to_index = {k: i for i, k in enumerate(examples["question_id"])}
    features_per_example = collections.defaultdict(list)
    for i, feature in enumerate(features):
        features_per_example[example_id_to_index[feature["example_id"]]].append(i)

    # The dictionaries we have to fill.
    predictions = collections.OrderedDict()

    # Logging.
    print(f"Post-processing {len(examples)} example predictions split into {len(features)} features.")

    # Let's loop over all the examples!
    for example_index, example in enumerate(tqdm(examples)):
        # Those are the indices of the features associated to the current example.
        feature_indices = features_per_example[example_index]
        #print(example, feature_indices)
        min_null_score = None # Only used if squad_v2 is True.
        if example_index % rel_num == 0:
            # Start collecting candidates for a new sentence (a group of `rel_num` questions).
            valid_answers = []
        
        context = example["context"]
        # Looping through all the features associated to the current example.
       # print('fff', len(feature_indices))
        for feature_index in feature_indices:
            # We grab the predictions of the model for this feature.
            start_logits = all_start_logits[feature_index]
            end_logits = all_end_logits[feature_index]
            # This is what will allow us to map some of the positions in our logits to spans of text in the original
            # context.
            offset_mapping = features[feature_index]["offset_mapping"]

            # Update minimum null prediction.
            cls_index = features[feature_index]["input_ids"].index(tokenizer.cls_token_id)
            feature_null_score = start_logits[cls_index] + end_logits[cls_index]
            if min_null_score is None or feature_null_score < min_null_score:
                # Keep the *minimum* null score over all the features of this example.
                min_null_score = feature_null_score

            # Go through all possibilities for the `n_best_size` greater start and end logits.
            start_indexes = np.argsort(start_logits)[-1 : -n_best_size - 1 : -1].tolist()
            end_indexes = np.argsort(end_logits)[-1 : -n_best_size - 1 : -1].tolist()
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # Don't consider out-of-scope answers, either because the indices are out of bounds or correspond
                    # to part of the input_ids that are not in the context.
                    if (
                        start_index >= len(offset_mapping)
                        or end_index >= len(offset_mapping)
                        or offset_mapping[start_index] is None
                        or offset_mapping[end_index] is None
                    ):
                        continue
                    # Don't consider answers with a length that is either < 0 or > max_answer_length.
                    if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                        continue

                    start_char = offset_mapping[start_index][0]
                    end_char = offset_mapping[end_index][1]
                    valid_answers.append(
                        {
                            "score": start_logits[start_index] + end_logits[end_index],
                            "text": example['string_id']##context[start_char: end_char]
                        }
                    )
        if (example_index%rel_num ==(rel_num-1)):
            if (len(valid_answers) ) > 0:
                best_answer = sorted(valid_answers, key=lambda x: x["score"], reverse=True)[0]
            else:
                # In the very rare edge case we have not a single non-null prediction, we create a fake prediction to avoid
                # failure.
                best_answer = {"text": "", "score": 0.0}
            
            # Let's pick our final answer: the best one or the null answer (only for squad_v2)
            
            if not squad_v2:
                predictions[example["id"]] = best_answer["text"]
            else:
                
                answer = best_answer["text"] ###if best_answer["score"] > min_null_score else ""
                predictions[example["id"]] = answer

    return predictions
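Stripped of the span handling, the grouping idea in `postprocess_qa_predictions` can be sketched as follows. This is a simplification under stated assumptions: each question is reduced to a single made-up score (the function above pools all the n-best candidate spans instead), and the `rel_num` questions of a sentence are assumed to be stored consecutively. `best_relation_per_sentence` is a hypothetical helper.

```python
def best_relation_per_sentence(scored_questions, rel_num):
    """Pick, for each block of `rel_num` consecutive rows, the relation with the top score.

    scored_questions: list of (sentence_id, relation, score) tuples.
    """
    predictions = {}
    for start in range(0, len(scored_questions), rel_num):
        group = scored_questions[start:start + rel_num]
        sentence_id, relation, _ = max(group, key=lambda row: row[2])
        predictions[sentence_id] = relation
    return predictions


# Made-up scores for two sentences and three relation questions each.
scored = [
    ("2", "CEO_of", 1.2), ("2", "invested_in", 9.5), ("2", "founded_by", 0.3),
    ("39", "CEO_of", 2.0), ("39", "invested_in", 1.1), ("39", "founded_by", 8.7),
]
toy_predictions = best_relation_per_sentence(scored, rel_num=3)
print(toy_predictions)
```

If the dataset were shuffled, the modulo-based grouping would mix questions from different sentences, so the ordering assumption matters.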

In [67]:

final_predictions = postprocess_qa_predictions(datasets["val"], validation_features, raw_predictions.predictions)

7
Post-processing 16849 example predictions split into 16856 features.


In [68]:
len(final_predictions)

2407

Then we can load the metric from the datasets library.

In [69]:
metric = load_metric("squad_v2" if squad_v2 else "squad")

Then we can call compute on it. We just need to format predictions and labels a bit as it expects a list of dictionaries and not one big dictionary. In the case of squad_v2, we also have to set a `no_answer_probability` argument (which we set to 0.0 here as we have already set the answer to empty if we picked it).

In [70]:
# how good is the model in the regular question answering task
if squad_v2:
    formatted_predictions = [{"id": k, "prediction_text": v, "no_answer_probability": 0.0} for k, v in final_predictions_org.items()]
else:
    formatted_predictions = [{"id": k, "prediction_text": v} for k, v in final_predictions_org.items()]
references = [{"id": ex["question_id"], "answers": ex["answers"]} for ex in datasets["val"]]
metric.compute(predictions=formatted_predictions, references=references)

{'HasAns_exact': 88.20108018280017,
 'HasAns_f1': 91.13335286787724,
 'HasAns_total': 2407,
 'NoAns_exact': 98.57360476388313,
 'NoAns_f1': 98.57360476388313,
 'NoAns_total': 14442,
 'best_exact': 97.09181553801413,
 'best_exact_thresh': 0.0,
 'best_f1': 97.51071163588203,
 'best_f1_thresh': 0.0,
 'exact': 97.09181553801413,
 'f1': 97.51071163588216,
 'total': 16849}
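To give a feel for what the `exact` and `f1` numbers measure, here is a simplified sketch of exact match and token-level F1 for a single prediction. It is not the metric's implementation: this version only lowercases and splits on whitespace, while the official SQuAD metric also strips punctuation and articles and takes the best score over all reference answers.

```python
import collections


def normalize(text):
    """Simplified normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))


def token_f1(prediction, truth):
    pred_tokens, true_tokens = normalize(prediction), normalize(truth)
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match (the no-answer case), otherwise a miss.
        return float(pred_tokens == true_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Chiliz", "Chiliz"), token_f1("Chiliz", "Chiliz"))
print(exact_match("Chiliz has", "Chiliz"), token_f1("Chiliz has", "Chiliz"))
print(exact_match("", ""), token_f1("", ""))
```

A prediction that contains the answer plus an extra word gets no exact-match credit but still a partial F1, which is why `HasAns_f1` is higher than `HasAns_exact` above.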

In [73]:
print(formatted_predictions[:9])
references[:9]

[{'id': '10000', 'prediction_text': 'partners_with', 'no_answer_probability': 0.0}, {'id': '10003', 'prediction_text': 'founded_by', 'no_answer_probability': 0.0}, {'id': '10008', 'prediction_text': 'CEO_of', 'no_answer_probability': 0.0}, {'id': '1002', 'prediction_text': 'acquired_by', 'no_answer_probability': 0.0}, {'id': '10023', 'prediction_text': 'founded_by', 'no_answer_probability': 0.0}, {'id': '10030', 'prediction_text': 'CEO_of', 'no_answer_probability': 0.0}, {'id': '10042', 'prediction_text': 'subsidiary_of', 'no_answer_probability': 0.0}, {'id': '10044', 'prediction_text': 'invested_in', 'no_answer_probability': 0.0}, {'id': '10046', 'prediction_text': 'CEO_of', 'no_answer_probability': 0.0}]


[{'answers': 'invested_in', 'id': '2'},
 {'answers': 'founded_by', 'id': '39'},
 {'answers': 'acquired_by', 'id': '40'},
 {'answers': 'CEO_of', 'id': '42'},
 {'answers': 'CEO_of', 'id': '44'},
 {'answers': 'acquired_by', 'id': '51'},
 {'answers': 'partners_with', 'id': '53'},
 {'answers': 'acquired_by', 'id': '60'},
 {'answers': 'partners_with', 'id': '61'}]

We now test it on our original task of relation detection:

In [71]:
if squad_v2:
    formatted_predictions = [{"id": k, "prediction_text": v, "no_answer_probability": 0.0} for k, v in final_predictions.items()]
else:
    formatted_predictions = [{"id": k, "prediction_text": v} for k, v in final_predictions.items()]
references = [{"id": ex["id"], "answers": ex["string_id"]} for ex in ds_rel_val_org]
y_true=[ref['answers'] for ref in references]
y_true[:5]
#metric.compute(predictions=formatted_predictions, references=references)

['invested_in', 'founded_by', 'acquired_by', 'CEO_of', 'CEO_of']

In [72]:
#len(final_predictions)
my_predictions=dict(sorted(final_predictions.items(), key=lambda x:int(x[0])))
#list(my_predictions.items())[:5]
y_pred=list(my_predictions.values())
y_pred[:9]

['invested_in',
 'founded_by',
 'acquired_by',
 'CEO_of',
 'founded_by',
 'acquired_by',
 'partners_with',
 'acquired_by',
 'partners_with']
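The `int(...)` in the sort key matters because the ids are strings. A short sketch with a made-up dictionary shows how a plain string sort would misalign predictions with references that are ordered numerically:

```python
# Made-up predictions keyed by string ids.
toy_final_predictions = {"10000": "partners_with", "2": "invested_in", "39": "founded_by"}

by_string = [k for k, _ in sorted(toy_final_predictions.items())]
by_number = [k for k, _ in sorted(toy_final_predictions.items(), key=lambda item: int(item[0]))]

print(by_string)
print(by_number)
```

Only the numeric order matches a reference list sorted by integer id, which is what the comparison of `y_true` and `y_pred` relies on.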

In [83]:
from sklearn.metrics import f1_score, classification_report
#f1_score(y_true,y_pred, average=None)#'macro')
print(classification_report(y_true, y_pred))

               precision    recall  f1-score   support

       CEO_of       0.96      0.90      0.93       341
  acquired_by       0.99      0.98      0.99       494
   founded_by       0.96      0.99      0.97       751
  invested_in       0.98      0.98      0.98       459
     owned_by       1.00      0.99      0.99        97
partners_with       0.97      0.99      0.98       115
subsidiary_of       0.98      0.97      0.98       150

     accuracy                           0.97      2407
    macro avg       0.98      0.97      0.98      2407
 weighted avg       0.97      0.97      0.97      2407
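For readers who want to check a row of the report by hand, the per-class numbers follow the usual definitions of precision, recall and F1. The sketch below recomputes them on a made-up four-item example (the helper `per_class_scores` is hypothetical and not part of scikit-learn):

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: one CEO_of sentence is mistaken for founded_by.
toy_true = ["CEO_of", "CEO_of", "founded_by", "founded_by"]
toy_pred = ["CEO_of", "founded_by", "founded_by", "founded_by"]

ceo_scores = per_class_scores(toy_true, toy_pred, "CEO_of")
founded_scores = per_class_scores(toy_true, toy_pred, "founded_by")
print(ceo_scores)
print(founded_scores)
```

The same pattern appears in the report above: `CEO_of` has high precision but lower recall, which is what we would see if some `CEO_of` sentences were predicted as another relation.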

