<a href="https://colab.research.google.com/github/sarcasticvibes/TweetSentimentExtraction_via_BERT/blob/master/TwitterSentimentExtraction_via_BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twitter Sentiment Extraction

---



>   This task involves *extracting the part of a given tweet* that conveys its sentiment label (**positive**, **negative** or **neutral**).

>  I decided to use **BERT** for this task because the *given sentiment* can be treated as a *question* and the *given tweet* as the *context*. This makes the task equivalent to **Extractive Question Answering**, where the answer is a span copied out of the context.

>  **BERT** takes a pair of sentences as input. For *QnA* tasks the first sentence is the question and the second is the context.

>  A **[CLS]** token is added at the start of every input sent to the model. The first sentence (the *sentiment* in our case) is followed by a **[SEP]** token, then comes the second sentence (the *tweet* in our case), which is closed by another **[SEP]** token.

>  **[PAD]** tokens are appended at the end so that all inputs have the same length.

>  I'm using `tokenizers.BertWordPieceTokenizer` from **Hugging Face** to encode the tweets, and then insert the sentiment token (followed by a **[SEP]**) in front of the tweet tokens.

>  The **BERT** model comes from the `transformers` library, also by **Hugging Face**.
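
The input layout described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code path: it assumes the tweet has already been tokenized into WordPiece ids *without* special tokens, and the helper name `build_bert_pair_input` is hypothetical. The ids 101 (**[CLS]**), 102 (**[SEP]**), 0 (**[PAD]**) and the three sentiment ids are the values hard-coded in the `TweetDataset` cell below.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0
# Single-token ids used as the "question" (the sentiment word)
SENTIMENT_IDS = {"positive": 3893, "negative": 4997, "neutral": 8699}


def build_bert_pair_input(sentiment, tweet_ids, max_len):
    """Return (input_ids, attention_mask, token_type_ids), each of length max_len."""
    # [CLS] sentiment [SEP] tweet tokens [SEP]
    input_ids = [CLS_ID, SENTIMENT_IDS[sentiment], SEP_ID] + list(tweet_ids) + [SEP_ID]
    # Segment A (0) covers [CLS] sentiment [SEP]; segment B (1) covers the tweet and its [SEP]
    token_type_ids = [0] * 3 + [1] * (len(tweet_ids) + 1)
    attention_mask = [1] * len(input_ids)

    if len(input_ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    padding = max_len - len(input_ids)
    # Padding positions get id 0, mask 0 and token type 0
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding
    token_type_ids += [0] * padding
    return input_ids, attention_mask, token_type_ids


# Hypothetical tweet token ids, for illustration only
ids, mask, types = build_bert_pair_input("positive", [1045, 2066, 2009], max_len=10)
print(ids)    # [101, 3893, 102, 1045, 2066, 2009, 102, 0, 0, 0]
print(mask)   # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(types)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

The notebook reaches the same layout by encoding the tweet with special tokens and then keeping `tweet_ids[1:]`, which drops the tweet's own **[CLS]** but keeps its closing **[SEP]**.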



---

## Importing the Dataset and installing the required dependencies
---

In [1]:
from google.colab import files
uploaded = files.upload()  # assigning the result keeps the key file's contents out of the cell output

Saving kaggle.json to kaggle.json

In [0]:
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

In [3]:
! kaggle competitions download -c 'tweet-sentiment-extraction'
! mkdir training_set

Downloading sample_submission.csv to /content
100% 41.4k/41.4k [00:00<00:00, 57.5MB/s]
Downloading train.csv.zip to /content
100% 1.23M/1.23M [00:00<00:00, 84.3MB/s]
Downloading test.csv to /content
100% 307k/307k [00:00<00:00, 92.9MB/s]


In [4]:
! unzip train.csv.zip -d training_set

Archive:  train.csv.zip
  inflating: training_set/train.csv  


In [5]:
!pip install transformers==2.11.0  # version from the log below; 4.x returns ModelOutput objects instead of tuples

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/48/35/ad2c5b1b8f99feaaf9d7cdadaeef261f098c6e1a6a2935d4d07662a6b780/transformers-2.11.0-py3-none-any.whl (674kB)

In [6]:
!wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt

--2020-06-10 21:03:27--  https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.8.253
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.8.253|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 231508 (226K) [text/plain]
Saving to: ‘bert-base-uncased-vocab.txt’


2020-06-10 21:03:28 (804 KB/s) - ‘bert-base-uncased-vocab.txt’ saved [231508/231508]



## Importing the required libraries
---

In [0]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from tqdm.autonotebook import tqdm

import tokenizers
import transformers
from transformers import AdamW, get_linear_schedule_with_warmup

import utils  # helper module defined in the "utils.py" section at the end of this notebook

In [30]:
torch.cuda.get_device_name()

'Tesla P100-PCIE-16GB'

## Dataset for the DataLoader
---
  - The **BERT** model expects its input in a specific format: the token ids produced by the tokenizer; an attention mask (*1* at every position that holds an **actual input token** and *0* at every **padding** position); and token type ids (*0* for the first sentence and for padding, *1* for the second sentence).
  - For each tweet we generate these, together with the token offsets and the start/end target positions, and return them as the values of a dictionary.
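
The label construction inside `__getitem__` works at two levels: it first marks every non-space character of the selected text inside the tweet, then marks every token whose character span (its offset pair) contains a marked character; the first and last marked tokens become the start and end targets. A self-contained sketch of that idea follows. The offsets are written by hand here instead of coming from `BertWordPieceTokenizer`, and the function name `token_span_from_selection` is hypothetical. It uses `str.find`, which returns the same first full match as the notebook's character loop.

```python
def token_span_from_selection(tweet, selected_text, offsets):
    """Map a selected substring of `tweet` to (start_token, end_token) indices.

    `offsets` holds one (char_start, char_end) pair per tweet token, with
    char_end exclusive. Returns None if no token can be matched.
    """
    start = tweet.find(selected_text)
    if not selected_text or start == -1:
        return None
    end = start + len(selected_text)  # exclusive

    # 1 for every non-space character that belongs to the selection
    char_targets = [0] * len(tweet)
    for j in range(start, end):
        if tweet[j] != " ":
            char_targets[j] = 1

    # Tokens whose character span contains at least one selected character
    target_idx = [i for i, (a, b) in enumerate(offsets) if sum(char_targets[a:b]) > 0]
    if not target_idx:
        return None
    return target_idx[0], target_idx[-1]


tweet = "i like it!!"
# Hand-written offsets for the tokens "i", "like", "it", "!", "!"
offsets = [(0, 1), (2, 6), (7, 9), (9, 10), (10, 11)]
print(token_span_from_selection(tweet, "like it", offsets))  # (1, 2)
```

In the notebook these token indices are then shifted by 3 to account for the `[CLS] sentiment [SEP]` prefix.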

In [0]:
class TweetDataset:
    def __init__(self, tweet, selected_text, sentiment):
        self.tweet = tweet
        self.selected_text = selected_text
        self.sentiment = sentiment
        self.tokenizer = TOKENIZER
        self.max_len = MAX_LEN
    
    def __len__(self):
        return len(self.tweet)
    
    def __getitem__(self, item):
        tweet = str(self.tweet[item])
        tweet = " ".join(tweet.split())
        selected_text = str(self.selected_text[item])
        selected_text = " ".join(selected_text.split())
        sentiment = self.sentiment[item]


        len_sel_txt = len(selected_text)
        start_indx = -1
        end_indx  = -1

        for indx in (i for i, e in enumerate(tweet) if e == selected_text[0]):
          if tweet[indx:indx + len_sel_txt] == selected_text:
            start_indx = indx
            end_indx = indx + len_sel_txt - 1
            break

        char_targets = [0] * len(tweet)
        if start_indx != -1 and end_indx != -1:
          for j in range(start_indx, end_indx+1):
            if tweet[j] != ' ':
              char_targets[j] = 1

        tokenized_tweets = self.tokenizer.encode(tweet)

        tweet_ids = tokenized_tweets.ids
        tweet_offsets = tokenized_tweets.offsets[1: -1]
        tweet_tokens = tokenized_tweets.tokens

        target_idx = []
        for j, (offset1, offset2) in enumerate(tweet_offsets):
          if sum(char_targets[offset1: offset2]) > 0:
            target_idx.append(j)

        targets_start = target_idx[0]
        targets_end = target_idx[-1]

        sentiment_id = {'positive': 3893,
                        'negative': 4997,
                        'neutral': 8699}
    
        input_ids = [101] + [sentiment_id[sentiment]] + [102] + tweet_ids[1:]
        token_type_ids = [0, 0, 0] + [1] * (len(tweet_ids) - 1)
        mask = [1] * len(token_type_ids)
        tweet_offsets = [(0, 0)] * 3 + tweet_offsets + [(0, 0)]
        targets_start += 3
        targets_end += 3

        # Pad the full sequence ([CLS] sentiment [SEP] tweet [SEP]) up to max_len
        padding_length = self.max_len - len(input_ids)
        
        if padding_length > 0:
          input_ids = input_ids + ([0] * padding_length)
          mask = mask + ([0] * padding_length)
          token_type_ids = token_type_ids + ([0] * padding_length)
          tweet_offsets = tweet_offsets + ([(0, 0)] * padding_length)
        return {'ids': torch.tensor(input_ids, dtype=torch.long),
                'mask': torch.tensor(mask, dtype=torch.long),
                'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
                'targets_start': torch.tensor(targets_start, dtype=torch.long),
                'targets_end': torch.tensor(targets_end, dtype=torch.long),
                'orig_tweet': tweet,
                'orig_selected': selected_text,
                'sentiment': sentiment,
                'offsets': torch.tensor(tweet_offsets, dtype=torch.long)}

## Model
---
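 - We use the base BERT model (`bert-base-uncased`) and add a linear layer on top of the concatenated hidden states of its last two layers; the layer produces two outputs per token.
 - The first output is the score for that token being the start of the selected text, and the second is the score for it being the end.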
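
The head can be illustrated with NumPy on random arrays standing in for BERT's hidden states. This is a shape-level sketch only: it assumes 768-dimensional hidden states and 12 encoder layers as in `bert-base-uncased`, omits dropout, and uses random placeholder weights in place of the trained `nn.Linear(768 * 2, 2)`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, n_layers = 2, 8, 768, 13  # 13 = embedding output + 12 encoder layers

# Stand-in for the tuple of hidden states returned with output_hidden_states=True
hidden_states = [rng.standard_normal((batch, seq_len, hidden)) for _ in range(n_layers)]

# Concatenate the last two layers along the feature axis: (batch, seq_len, 1536)
features = np.concatenate([hidden_states[-1], hidden_states[-2]], axis=-1)

# Placeholder weights for a linear layer mapping 1536 features to 2 scores per token
weight = rng.normal(scale=0.02, size=(2 * hidden, 2))
bias = np.zeros(2)
logits = features @ weight + bias          # (batch, seq_len, 2)

start_logits = logits[..., 0]              # (batch, seq_len): one start score per token
end_logits = logits[..., 1]                # (batch, seq_len): one end score per token

print(start_logits.shape, end_logits.shape)  # (2, 8) (2, 8)
# Predicted span: highest-scoring start and end position for each example
print(start_logits.argmax(axis=1), end_logits.argmax(axis=1))
```

Because every token gets its own start and end score, the loss can treat span prediction as two classification problems over sequence positions.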

In [0]:
class TweetExtractionModel(transformers.BertPreTrainedModel):
    def __init__(self, conf):
        super(TweetExtractionModel, self).__init__(conf)
        self.bert = transformers.BertModel.from_pretrained('bert-base-uncased', config=conf)
        self.drop_out = nn.Dropout(0.1)
        self.output_layer = nn.Linear(768 * 2, 2)
        torch.nn.init.normal_(self.output_layer.weight, std=0.02)
    
    def forward(self, ids, mask, token_type_ids):
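        # With output_hidden_states=True, transformers 2.x returns
        # (sequence_output, pooled_output, hidden_states), where hidden_states holds
        # the embedding output followed by the output of each encoder layer
        _, _, out = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)

        # Concatenate the last two encoder layers: (batch, seq_len, 768 * 2)
        out = torch.cat((out[-1], out[-2]), dim=-1)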
        out = self.drop_out(out)
        logits = self.output_layer(out)

        start_logits, end_logits = logits.split(1, dim=-1)

        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        return start_logits, end_logits

## Training Functions
---

In [0]:
def train_fn(data_loader, model, optimizer, device, scheduler=None):
    model.train()
    losses = utils.AverageMeter()
    jaccards = utils.AverageMeter()

    tk0 = tqdm(data_loader, total=len(data_loader))
    
    for _, d in enumerate(tk0):

        ids = d["ids"]
        token_type_ids = d["token_type_ids"]
        mask = d["mask"]
        targets_start = d["targets_start"]
        targets_end = d["targets_end"]
        sentiment = d["sentiment"]
        orig_selected = d["orig_selected"]
        orig_tweet = d["orig_tweet"]
        offsets = d["offsets"].numpy()

        ids = ids.to(device, dtype=torch.long)
        token_type_ids = token_type_ids.to(device, dtype=torch.long)
        mask = mask.to(device, dtype=torch.long)
        targets_start = targets_start.to(device, dtype=torch.long)
        targets_end = targets_end.to(device, dtype=torch.long)

        model.zero_grad()
        outputs_start, outputs_end = model(ids=ids, mask=mask, token_type_ids=token_type_ids,)
        loss = loss_fn(outputs_start, outputs_end, targets_start, targets_end)
        loss.backward()
        optimizer.step()
        if scheduler is not None:
            scheduler.step()

        outputs_start = torch.softmax(outputs_start, dim=1).cpu().detach().numpy()
        outputs_end = torch.softmax(outputs_end, dim=1).cpu().detach().numpy()
        jaccard_scores = []
        for px, tweet in enumerate(orig_tweet):
            selected_tweet = orig_selected[px]
            tweet_sentiment = sentiment[px]
            jaccard_score, _ = calculate_jaccard_score(
                original_tweet=tweet,
                target_string=selected_tweet,
                sentiment_val=tweet_sentiment,
                idx_start=np.argmax(outputs_start[px, :]),
                idx_end=np.argmax(outputs_end[px, :]),
                offsets=offsets[px]
            )
            jaccard_scores.append(jaccard_score)

        jaccards.update(np.mean(jaccard_scores), ids.size(0))
        losses.update(loss.item(), ids.size(0))
        tk0.set_postfix(loss=losses.avg, jaccard=jaccards.avg)

In [0]:
def train():
    dfx = pd.read_csv('/content/training_set/train.csv')

    df_train, df_valid = train_test_split(dfx, test_size=.1)
    
    train_dataset = TweetDataset(
        tweet=df_train.text.values,
        sentiment=df_train.sentiment.values,
        selected_text=df_train.selected_text.values
    )

    train_data_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=TRAIN_BATCH_SIZE,
        num_workers=4
    )

    valid_dataset = TweetDataset(
        tweet=df_valid.text.values,
        sentiment=df_valid.sentiment.values,
        selected_text=df_valid.selected_text.values
    )

    valid_data_loader = torch.utils.data.DataLoader(
        valid_dataset,
        batch_size=VALID_BATCH_SIZE,
        num_workers=2
    )

    device = torch.device("cuda")
    model_config = transformers.BertConfig.from_pretrained('bert-base-uncased')
    model_config.output_hidden_states = True
    model = TweetExtractionModel(conf=model_config)
    model.to(device)

    num_train_steps = int(len(df_train) / TRAIN_BATCH_SIZE * EPOCHS)
    param_optimizer = list(model.named_parameters())
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.001},
        {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
    ]
    optimizer = AdamW(optimizer_parameters, lr=3e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, 
        num_warmup_steps=0, 
        num_training_steps=num_train_steps
    )

    es = utils.EarlyStopping(patience=2, mode="max")
    print(f"Training is Starting:")
    
    for epoch in range(EPOCHS):
        train_fn(train_data_loader, model, optimizer, device, scheduler=scheduler)
        jaccard = eval_fn(valid_data_loader, model, device)
        print(f"Jaccard Score = {jaccard}")
        es(jaccard, model, model_path=f"model.bin")
        if es.early_stop:
            print("Early stopping")
            break

## Loss Function
---

In [0]:
def loss_fn(start_logits, end_logits, start_positions, end_positions):
    loss_function = nn.CrossEntropyLoss()
    start_loss = loss_function(start_logits, start_positions)
    end_loss = loss_function(end_logits, end_positions)
    total_loss = (start_loss + end_loss)
    return total_loss

## Evaluation Functions
---

In [0]:
def calculate_jaccard_score(original_tweet, target_string, sentiment_val, idx_start, idx_end, offsets):
    
    if idx_end < idx_start:
        idx_end = idx_start
    
    filtered_output  = ""
    for ix in range(idx_start, idx_end + 1):
        filtered_output += original_tweet[offsets[ix][0]: offsets[ix][1]]
        if (ix+1) < len(offsets) and offsets[ix][1] < offsets[ix+1][0]:
            filtered_output += " "

    if sentiment_val == "neutral" or len(original_tweet.split()) < 2:
        filtered_output = original_tweet

    jac = utils.jaccard(target_string.strip(), filtered_output.strip())
    return jac, filtered_output
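
The decoding step in `calculate_jaccard_score` can be isolated as follows: take the argmax start and end positions, clamp the end so it is not before the start, and stitch the covered characters back together using the offsets, inserting a space wherever consecutive tokens are separated in the original tweet. Neutral tweets, and tweets with fewer than two words, fall back to the whole text. This standalone sketch uses a hypothetical name (`decode_span`) and hand-written offsets that include the `(0, 0)` entries for the three prefix tokens and the final **[SEP]**.

```python
import numpy as np


def decode_span(tweet, sentiment, start_probs, end_probs, offsets):
    """Turn per-token start/end scores back into a substring of `tweet`."""
    idx_start = int(np.argmax(start_probs))
    idx_end = int(np.argmax(end_probs))
    if idx_end < idx_start:          # an invalid span collapses to the start token
        idx_end = idx_start

    pieces = ""
    for ix in range(idx_start, idx_end + 1):
        pieces += tweet[offsets[ix][0]:offsets[ix][1]]
        # Re-insert a space if the next token starts after a gap in the original text
        if ix + 1 < len(offsets) and offsets[ix][1] < offsets[ix + 1][0]:
            pieces += " "

    if sentiment == "neutral" or len(tweet.split()) < 2:
        return tweet
    return pieces.strip()


tweet = "i like it!!"
# [CLS], sentiment, [SEP], then "i", "like", "it", "!", "!", then the final [SEP]
offsets = [(0, 0), (0, 0), (0, 0), (0, 1), (2, 6), (7, 9), (9, 10), (10, 11), (0, 0)]
start = np.array([0, 0, 0, 0.1, 0.7, 0.1, 0.05, 0.05, 0])
end = np.array([0, 0, 0, 0, 0.1, 0.2, 0.1, 0.6, 0])
print(decode_span(tweet, "positive", start, end, offsets))  # like it!!
print(decode_span(tweet, "neutral", start, end, offsets))   # i like it!!
```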

In [0]:
def eval_fn(data_loader, model, device):
    model.eval()
    losses = utils.AverageMeter()
    jaccards = utils.AverageMeter()
    
    with torch.no_grad():
        tk0 = tqdm(data_loader, total=len(data_loader))
        for _, d in enumerate(tk0):
            ids = d["ids"]
            token_type_ids = d["token_type_ids"]
            mask = d["mask"]
            sentiment = d["sentiment"]
            orig_selected = d["orig_selected"]
            orig_tweet = d["orig_tweet"]
            targets_start = d["targets_start"]
            targets_end = d["targets_end"]
            offsets = d["offsets"].numpy()

            ids = ids.to(device, dtype=torch.long)
            token_type_ids = token_type_ids.to(device, dtype=torch.long)
            mask = mask.to(device, dtype=torch.long)
            targets_start = targets_start.to(device, dtype=torch.long)
            targets_end = targets_end.to(device, dtype=torch.long)

            outputs_start, outputs_end = model(
                ids=ids,
                mask=mask,
                token_type_ids=token_type_ids
            )
            loss = loss_fn(outputs_start, outputs_end, targets_start, targets_end)
            outputs_start = torch.softmax(outputs_start, dim=1).cpu().detach().numpy()
            outputs_end = torch.softmax(outputs_end, dim=1).cpu().detach().numpy()
            jaccard_scores = []
            for px, tweet in enumerate(orig_tweet):
                selected_tweet = orig_selected[px]
                tweet_sentiment = sentiment[px]
                jaccard_score, _ = calculate_jaccard_score(
                    original_tweet=tweet,
                    target_string=selected_tweet,
                    sentiment_val=tweet_sentiment,
                    idx_start=np.argmax(outputs_start[px, :]),
                    idx_end=np.argmax(outputs_end[px, :]),
                    offsets=offsets[px]
                )
                jaccard_scores.append(jaccard_score)

            jaccards.update(np.mean(jaccard_scores), ids.size(0))
            losses.update(loss.item(), ids.size(0))
            tk0.set_postfix(loss=losses.avg, jaccard=jaccards.avg)
    
    print(f"Jaccard = {jaccards.avg}")
    return jaccards.avg

## Configuration for training
---

In [0]:
MAX_LEN = 128
TRAIN_BATCH_SIZE = 32
VALID_BATCH_SIZE = 16
EPOCHS = 5
TOKENIZER = tokenizers.BertWordPieceTokenizer('bert-base-uncased-vocab.txt', clean_text=True, lowercase=True)

## Training
---

In [13]:
train()

Training is Starting:


Jaccard = 0.711174996060997
Jaccard Score = 0.711174996060997
Validation score improved (-inf --> 0.711174996060997). Saving model!


Jaccard = 0.7154709548545622
Jaccard Score = 0.7154709548545622
Validation score improved (0.711174996060997 --> 0.7154709548545622). Saving model!


Jaccard = 0.7074666144341393
Jaccard Score = 0.7074666144341393
EarlyStopping counter: 1 out of 2


Jaccard = 0.6978409495675766
Jaccard Score = 0.6978409495675766
EarlyStopping counter: 2 out of 2
Early stopping
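
The stop decision above can be reproduced from the logged validation scores alone. The sketch below re-implements only the counting logic of `utils.EarlyStopping` for `mode="max"` (no checkpoint saving), with the same `patience=2` and the class's default `delta=0.001`; the function name `early_stop_epoch` is hypothetical.

```python
def early_stop_epoch(scores, patience=2, delta=0.001):
    """Return (best_score, stop_epoch) for mode="max"; stop_epoch is 1-based or None."""
    best, counter = None, 0
    for epoch, score in enumerate(scores, start=1):
        if best is None:
            best = score                      # the first epoch always becomes the best
        elif score < best + delta:
            counter += 1                      # not better by at least `delta`
            if counter >= patience:
                return best, epoch
        else:
            best, counter = score, 0          # improvement: reset the counter
    return best, None


# Validation Jaccard scores from the training log above
logged = [0.711174996060997, 0.7154709548545622, 0.7074666144341393, 0.6978409495675766]
print(early_stop_epoch(logged))  # (0.7154709548545622, 4)
```

So the checkpoint in `model.bin` is the epoch-2 model, and training stopped after epoch 4 of the configured 5.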


## Evaluation on test set
---

In [0]:
df_test = pd.read_csv("test.csv")
df_test.loc[:, "selected_text"] = df_test.text.values

In [0]:
device = torch.device("cuda")
model_config = transformers.BertConfig.from_pretrained('bert-base-uncased')
model_config.output_hidden_states = True

In [18]:
model = TweetExtractionModel(conf=model_config)
model.to(device)
model.load_state_dict(torch.load("model.bin"))
model.eval()

TweetExtractionModel(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affi

In [22]:
final_output = []

test_dataset = TweetDataset(
        tweet=df_test.text.values,
        sentiment=df_test.sentiment.values,
        selected_text=df_test.selected_text.values
)

data_loader = torch.utils.data.DataLoader(
    test_dataset,
    shuffle=False,
    batch_size=VALID_BATCH_SIZE,
    num_workers=1
)

with torch.no_grad():
    tk0 = tqdm(data_loader, total=len(data_loader))
    for bi, d in enumerate(tk0):
        ids = d["ids"]
        token_type_ids = d["token_type_ids"]
        mask = d["mask"]
        sentiment = d["sentiment"]
        orig_selected = d["orig_selected"]
        orig_tweet = d["orig_tweet"]
        targets_start = d["targets_start"]
        targets_end = d["targets_end"]
        offsets = d["offsets"].numpy()

        ids = ids.to(device, dtype=torch.long)
        token_type_ids = token_type_ids.to(device, dtype=torch.long)
        mask = mask.to(device, dtype=torch.long)
        targets_start = targets_start.to(device, dtype=torch.long)
        targets_end = targets_end.to(device, dtype=torch.long)

        outputs_start, outputs_end = model(ids=ids, mask=mask, token_type_ids=token_type_ids)

        outputs_start = torch.softmax(outputs_start, dim=1).cpu().detach().numpy()
        outputs_end = torch.softmax(outputs_end, dim=1).cpu().detach().numpy()
        for px, tweet in enumerate(orig_tweet):
            selected_tweet = orig_selected[px]
            tweet_sentiment = sentiment[px]
            _, output_sentence = calculate_jaccard_score(original_tweet=tweet,
                                                         target_string=selected_tweet,
                                                         sentiment_val=tweet_sentiment,
                                                         idx_start=np.argmax(outputs_start[px, :]),
                                                         idx_end=np.argmax(outputs_end[px, :]),
                                                         offsets=offsets[px])
            
            final_output.append(output_sentence)


In [0]:
df_test.loc[:, "selected_text"] = final_output

## Result
---

In [25]:
df_test

|      | textID     | text | sentiment | selected_text |
|------|------------|------|-----------|---------------|
| 0    | f87dea47db | Last session of the day http://twitpic.com/67ezh | neutral | Last session of the day http://twitpic.com/67ezh |
| 1    | 96d74cb729 | Shanghai is also really exciting (precisely -... | positive | Good |
| 2    | eee518ae67 | Recession hit Veronique Branquinho, she has to... | negative | shame! |
| 3    | 01082688c6 | happy bday! | positive | happy bday! |
| 4    | 33987a8ee5 | http://twitpic.com/4w75p - I like it!! | positive | I like it!! |
| ...  | ...        | ... | ... | ... |
| 3529 | e5f0e6ef4b | its at 3 am, im very tired but i can`t sleep ... | negative | im very tired |
| 3530 | 416863ce47 | All alone in this old house again. Thanks for... | positive | Thanks |
| 3531 | 6332da480c | I know what you mean. My little dog is sinkin... | negative | depression.. |
| 3532 | df1baec676 | _sutra what is your next youtube video gonna b... | positive | I love |


In [0]:
# Assumes Google Drive is already mounted at /content/drive (google.colab.drive.mount)
df_test.to_csv('/content/drive/My Drive/TwitterSentimentExtraction_via_BERT_Result.csv')

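# utils.py

This helper module is imported as `utils` near the top of the notebook, so it has to be saved as `utils.py` in the working directory before that import cell runs.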

In [0]:
%%writefile utils.py
import numpy as np
import torch


class AverageMeter:
    """
    Computes and stores the average and current value
    """
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


class EarlyStopping:
    def __init__(self, patience=7, mode="max", delta=0.001):
        self.patience = patience
        self.counter = 0
        self.mode = mode
        self.best_score = None
        self.early_stop = False
        self.delta = delta
        if self.mode == "min":
            self.val_score = np.inf
        else:
            self.val_score = -np.inf

    def __call__(self, epoch_score, model, model_path):

        if self.mode == "min":
            score = -1.0 * epoch_score
        else:
            score = np.copy(epoch_score)

        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(epoch_score, model, model_path)
        elif score < self.best_score + self.delta:
            self.counter += 1
            print('EarlyStopping counter: {} out of {}'.format(self.counter, self.patience))
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(epoch_score, model, model_path)
            self.counter = 0

    def save_checkpoint(self, epoch_score, model, model_path):
        if np.isfinite(epoch_score):  # skip saving on inf or NaN scores
            print('Validation score improved ({} --> {}). Saving model!'.format(self.val_score, epoch_score))
            torch.save(model.state_dict(), model_path)
        self.val_score = epoch_score


def jaccard(str1, str2): 
    a = set(str1.lower().split()) 
    b = set(str2.lower().split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))
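
`jaccard` is a word-level Jaccard similarity: both strings are lower-cased and split on whitespace, and the score is the size of the intersection of the two word sets divided by the size of their union. A quick worked check, copying the function so the snippet runs on its own:

```python
def jaccard(str1, str2):
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    c = a.intersection(b)
    # |intersection| / |union|, with |union| = |a| + |b| - |intersection|
    return float(len(c)) / (len(a) + len(b) - len(c))


print(jaccard("I like it!!", "like it!!"))    # 0.6666666666666666  (2 shared words out of 3)
print(jaccard("I like it!!", "I like it"))    # 0.5  ("it!!" and "it" count as different words)
print(jaccard("happy bday!", "Happy bday!"))  # 1.0  (the comparison is case-insensitive)
```

Punctuation stays attached to words, so predicting "it" instead of "it!!" costs a full word. Two empty strings make the denominator zero and raise `ZeroDivisionError`.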

# Result CSV link
---

https://drive.google.com/file/d/1o9lTVQxH51ItvsDwA1tKYHMV9dDcnWI-/view?usp=sharing