<a href="https://colab.research.google.com/github/msyed92/twitter_clone/blob/main/twitter_edit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# About the Twitter Edit Button

## Why an edit button?

The goal of this project is to let users fix spelling and grammatical errors, as well as missing words, in their tweets. The idea is that an edit should be "allowed" as long as the original and edited tweets have the same meaning. The model should determine whether the two tweets are too dissimilar; if they are, the edit won't be published.
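
The decision rule described above can be sketched as a simple threshold on cosine similarity. This is only an illustration: the `edit_allowed` helper, the toy vectors, and the `SIMILARITY_THRESHOLD` value of 0.8 are assumptions, not part of the project, and a real cutoff would need to be tuned on actual tweet pairs.

```python
import math
from typing import Sequence

# Hypothetical cutoff; the project does not specify one, so tune it on real data.
SIMILARITY_THRESHOLD = 0.8


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def edit_allowed(original_vec: Sequence[float],
                 edited_vec: Sequence[float],
                 threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Allow the edit only if the two embeddings are similar enough."""
    return cosine(original_vec, edited_vec) >= threshold


# Toy 2-D vectors standing in for sentence embeddings
print(edit_allowed([1.0, 0.0], [1.0, 0.1]))  # nearly parallel vectors
print(edit_allowed([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

In practice the vectors would be the sentence embeddings of the original and edited tweets rather than hand-written lists.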

## Challenges

If a user is editing their tweet for spelling mistakes, a model may score the edited version and the original as having completely different meanings, even though the user's intent is unchanged.

For example, if a user tweeted this:

"I'm socked to here this"

and meant to tweet this:

"I'm shocked to hear this"

Their cosine similarity score is 0.328 with RoBERTa and 0.586 with BERT.

One way to approach this challenge is to look through current Twitter data, determine what the most common spelling errors are, and add those words as aliases.
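
A minimal sketch of the alias idea, assuming a hand-written table of misspellings. The `ALIASES` entries and the `apply_aliases` helper are hypothetical; a real table would be built from the Twitter data mentioned above.

```python
import re

# Hypothetical alias table: misspelling -> intended word.
ALIASES = {
    "socked": "shocked",
    "gramatical": "grammatical",
}


def apply_aliases(text: str, aliases: dict = ALIASES) -> str:
    """Replace known misspellings word by word, leaving other text untouched."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = aliases.get(word.lower())
        if replacement is None:
            return word
        # Preserve a leading capital letter from the original word
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


print(apply_aliases("I'm socked to here this"))
# prints: I'm shocked to here this
```

Note that "here" is left alone: it is a valid word, so a plain lookup table cannot tell it was meant to be "hear". Errors of that kind need context, which is why the table is only a partial fix.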

Another option is to take a sample of tweets, determine what edits *could* be made, and explore different ways to embed the words so that they are computed as similar.
Alternatively, ask people to edit a few (perhaps 5) of their own tweets to get a better, less biased sample of data.

# Using Out-of-the-Box Models

## Install libraries and packages

In [None]:
%pip install transformers
%pip install sentence-transformers

In [None]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

## RoBERTa

In [None]:
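# Note: 'all-mpnet-base-v2' is an MPNet-based checkpoint; 'all-distilroberta-v1' is the DistilRoBERTa-based one.
model = SentenceTransformer('all-mpnet-base-v2')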

## BERT

In [None]:
tweet1 = "I am not shocked to hear that"
tweet2 = "I am shocked to hear that!"
tweet3 = "I'm surprised to hear that"

In [None]:
#model = SentenceTransformer('all-distilroberta-v1')
model = SentenceTransformer('all-mpnet-base-v2')
sentences = [tweet1, tweet2, tweet3]
sentence_embeddings = model.encode(sentences)
sentence_embeddings

In [None]:
# Pairwise cosine similarities between all three tweet embeddings
similarity = cosine_similarity(sentence_embeddings)

print("Tweet 1 and 2 Similarity Score:", similarity[0][1])
print("Tweet 1 and 3 Similarity Score:", similarity[0][2])
print("Tweet 2 and 3 Similarity Score:", similarity[1][2])

# Replacing Misspelled Words

In [1]:
%pip install python-twitter
%pip install python-decouple


In [2]:
import sys
import operator
import requests
import json
import twitter
from decouple import config

In [3]:
twitter_consumer_key = config('API_KEY')
twitter_consumer_secret = config('API_SECRET_KEY')
twitter_access_token = config('ACCESS_TOKEN')
twitter_access_secret = config('ACCESS_TOKEN_SECRET')

In [6]:
twitter_api = twitter.Api(consumer_key=twitter_consumer_key,
                        consumer_secret=twitter_consumer_secret,
                        access_token_key=twitter_access_token,
                        access_token_secret=twitter_access_secret)

In [7]:
statuses = twitter_api.GetUserTimeline(screen_name='phenomanaal', count=200, include_rts=False)

In [56]:
# The standard search endpoint returns at most 100 tweets per request
results = twitter_api.GetSearch(
    raw_query="q=twitter&result_type=recent&count=100")

'RT @saraschaefer1: doing your own research just got a whole lot easier! https://t.co/vEBta1K08p'
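
The search results can include retweets like the one above, which are not useful as examples of a user's own writing. A rough sketch of filtering them out follows, assuming each result exposes its text as a `.text` attribute (as python-twitter `Status` objects do). The `original_tweet_texts` helper is hypothetical, and the `SimpleNamespace` objects are stand-ins so the example runs without API credentials.

```python
from types import SimpleNamespace
from typing import Iterable, List


def original_tweet_texts(statuses: Iterable) -> List[str]:
    """Return the text of each status, skipping retweets."""
    texts = []
    for status in statuses:
        text = status.text
        # Retweets in the search results start with the "RT @user:" prefix
        if text.startswith("RT @"):
            continue
        texts.append(text)
    return texts


# Stand-in objects in place of real API results
demo = original_tweet_texts([
    SimpleNamespace(text="RT @someone: a retweeted message"),
    SimpleNamespace(text="I'm shocked to hear this"),
])
print(demo)
```

With real data, the same function could be called on `results` or `statuses` from the cells above.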