# Randomly select n records for manual labelling

Randomly select n rows from a source CSV and move them to a destination CSV.
- source CSV: file containing the scraped tweets
- destination CSV: file used for manually labelling tweets

Note: the selected rows are deleted from the source CSV, so repeated runs never select the same tweet twice.
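
The move-without-duplicates idea can be sketched as a single function. This is a minimal sketch on in-memory dataframes, assuming the source index is unique; the helper name `take_random_rows` and the toy data are illustrative and not part of the notebook.

```python
import pandas as pd


def take_random_rows(df_src, n, seed=0):
    """Return (selected, remaining): n random rows and the source without them."""
    selected = df_src.sample(n=n, random_state=seed)
    # Dropping by index label removes exactly the sampled rows
    # (this relies on the index labels being unique).
    remaining = df_src.drop(selected.index)
    # Renumber both frames from 0.
    return selected.reset_index(drop=True), remaining.reset_index(drop=True)


# Toy stand-in for the scraped tweets.
toy = pd.DataFrame({"tweet_id": range(10), "text": [f"tweet {i}" for i in range(10)]})
picked, rest = take_random_rows(toy, n=3)
print(len(picked), len(rest))
```

Because the selected rows leave the source, calling the function again on `rest` can only return tweets that have not been picked yet.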

In [1]:
import pandas as pd

In [2]:
df_src = pd.read_csv("Raw datasets/tweets_raw_for_randselect.csv")

df_src.head()

Unnamed: 0,url,datetime,text,tweet_id,username,retweet_count,like_count
0,https://twitter.com/nickytwoeyes/status/163928...,2023-03-24 23:28:42+08:00,@Cryptowizardd77 @crypto_rand @elonmusk @Hobbe...,1639288136076328960,nickytwoeyes,0,0
1,https://twitter.com/Dr_Bed_Dr/status/163928813...,2023-03-24 23:28:42+08:00,@MarkusWoat @elonmusk Hurrah,1639288133232590852,Dr_Bed_Dr,0,0
2,https://twitter.com/lill63416788/status/163928...,2023-03-24 23:28:41+08:00,@cb_doge @elonmusk wow that's so amazing 2look...,1639288131819302913,lill63416788,0,0
3,https://twitter.com/starflower1959/status/1639...,2023-03-24 23:28:41+08:00,@elonmusk @BillyM2k Hmmm …how about Australia?,1639288128962715648,starflower1959,0,0
4,https://twitter.com/DBrubaker13/status/1639288...,2023-03-24 23:28:40+08:00,@jayinneveh @williamlegate @elonmusk The only ...,1639288127079546880,DBrubaker13,0,0


In [3]:
# shuffle the rows of the source dataframe (random_state=0 makes the order reproducible)
df_src = df_src.sample(frac=1, random_state=0).reset_index(drop=True)  # drop=True avoids adding an "index" column

df_src.head()

Unnamed: 0,url,datetime,text,tweet_id,username,retweet_count,like_count
0,https://twitter.com/its_sommy/status/163925892...,2023-03-24 21:32:39+08:00,@fleetingSecs @elonmusk 😭😭😭 it will work in a ...,1639258928184786945,its_sommy,0,0
1,https://twitter.com/John14049/status/163928519...,2023-03-24 23:17:01+08:00,@TeslaSynopsis @elonmusk @BillyM2k I think Elo...,1639285195013787648,John14049,0,0
2,https://twitter.com/killakrazed96/status/16392...,2023-03-24 22:56:26+08:00,@elonmusk We reap what we sow and we deserve e...,1639280013559078917,killakrazed96,0,0
3,https://twitter.com/aceengineeruk/status/16392...,2023-03-24 22:14:04+08:00,@yourfavbot2 @minderazu @Pak1Tomas @jacksonhin...,1639269353240535043,aceengineeruk,0,0
4,https://twitter.com/EndlessTravell2/status/163...,2023-03-24 22:57:13+08:00,@elonmusk @Centaur_UK @POTUS Almost like it wa...,1639280210284539905,EndlessTravell2,0,1
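
To illustrate what the shuffle cell relies on, here is a small sketch on a toy dataframe (the data is made up for illustration). It checks that a fixed `random_state` gives the same order on every run, that no rows are lost, and what `drop=True` changes.

```python
import pandas as pd

toy = pd.DataFrame({"text": list("abcdef")})

shuffled_a = toy.sample(frac=1, random_state=0).reset_index(drop=True)
shuffled_b = toy.sample(frac=1, random_state=0).reset_index(drop=True)

# Same seed, same order.
same_order = shuffled_a.equals(shuffled_b)
# Shuffling only reorders rows; none are lost or duplicated.
same_rows = sorted(shuffled_a["text"]) == list("abcdef")
# Without drop=True, the old index is kept as an extra "index" column.
with_old_index = toy.sample(frac=1, random_state=0).reset_index()

print(same_order, same_rows, list(with_old_index.columns))
```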


In [4]:
# select the last n rows of the shuffled source dataframe into a temporary dataframe
df_temp = df_src.tail(600)  # n = 600 for now

# copy the selected rows into the destination dataframe
df_dst = pd.DataFrame(columns=df_src.columns)
df_dst = pd.concat([df_dst, df_temp], ignore_index=True)  # ignore_index=True renumbers the rows from 0

df_dst.head()

Unnamed: 0,url,datetime,text,tweet_id,username,retweet_count,like_count
0,https://twitter.com/DavidMillett/status/163926...,2023-03-24 22:16:08+08:00,Greate article on over population in the New S...,1639269874680709121,DavidMillett,0,0
1,https://twitter.com/Sevenof95/status/163927476...,2023-03-24 22:35:35+08:00,"@Teslaconomics @elonmusk Ignorance is bliss,\n...",1639274766207950854,Sevenof95,0,0
2,https://twitter.com/Markcalladine1/status/1639...,2023-03-24 21:40:13+08:00,@elonmusk The baby is upset he didn't own it f...,1639260835561209859,Markcalladine1,0,0
3,https://twitter.com/unnikrishna/status/1639259...,2023-03-24 21:34:08+08:00,@deepakravindran Blue for all. Socialism! but ...,1639259302148939778,unnikrishna,0,4
4,https://twitter.com/DaleJohnson772/status/1639...,2023-03-24 22:34:27+08:00,@elonmusk Damn I wasn't born with an Emerald m...,1639274481624416256,DaleJohnson772,0,0


In [5]:
df_dst.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   url            600 non-null    object
 1   datetime       600 non-null    object
 2   text           600 non-null    object
 3   tweet_id       600 non-null    object
 4   username       600 non-null    object
 5   retweet_count  600 non-null    object
 6   like_count     600 non-null    object
dtypes: object(7)
memory usage: 32.9+ KB
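
Note that every column of `df_dst` is reported as `object`, whereas the source dataframe keeps `int64` for `tweet_id`, `retweet_count` and `like_count` (see its `info()` output in the next cell). A likely cause is the concatenation onto an empty dataframe, whose columns carry no numeric dtype. One possible alternative that keeps the original dtypes is sketched below on a toy dataframe; it is an assumption about what the cell intends, not the notebook's own code.

```python
import pandas as pd

# Toy stand-in for the shuffled source dataframe.
toy = pd.DataFrame({"tweet_id": [11, 12, 13, 14], "text": ["a", "b", "c", "d"]})
df_temp = toy.tail(2)

# Copy the selected rows and renumber them from 0; dtypes are carried over.
df_dst = df_temp.reset_index(drop=True)

print(df_dst.dtypes)
```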


In [6]:
# delete selected rows from source dataframe
df_src = df_src.drop(df_temp.index)

df_src.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9400 entries, 0 to 9399
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   url            9400 non-null   object
 1   datetime       9400 non-null   object
 2   text           9400 non-null   object
 3   tweet_id       9400 non-null   int64 
 4   username       9400 non-null   object
 5   retweet_count  9400 non-null   int64 
 6   like_count     9400 non-null   int64 
dtypes: int64(3), object(4)
memory usage: 514.2+ KB
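
Before exporting, it can be worth confirming that the two dataframes really share no tweets. The sketch below assumes `tweet_id` uniquely identifies a tweet; the helper name `shared_ids` and the toy frames are illustrative.

```python
import pandas as pd


def shared_ids(df_a, df_b, key="tweet_id"):
    """Return the set of key values that appear in both dataframes."""
    # Compare as strings so an int64 column and an object column still match.
    return set(df_a[key].astype(str)) & set(df_b[key].astype(str))


remaining = pd.DataFrame({"tweet_id": [1, 2, 3]})
selected = pd.DataFrame({"tweet_id": ["4", "5"]})  # object dtype, like df_dst

print(shared_ids(remaining, selected))
```

An empty set means the selection and the remaining pool are disjoint.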


In [7]:
# export dataframes to csv
df_dst.to_csv("Labeled Data/data_labelling_tweets.csv", index=False)

df_src.to_csv("Raw datasets/tweets_raw_for_randselect.csv", index=False)
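
Since the source CSV is overwritten in place, a quick round-trip check can confirm that what was written reads back intact, including tweets that contain line breaks. This is a minimal sketch using a temporary file and toy data rather than the notebook's real paths.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({"tweet_id": [1, 2], "text": ["one line", "two\nlines"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.csv"
    toy.to_csv(path, index=False)  # index=False keeps the row index out of the file
    reloaded = pd.read_csv(path)

# The multi-line tweet is quoted on write and still counts as one row on read.
print(len(reloaded) == len(toy))
print(reloaded["text"].tolist() == toy["text"].tolist())
```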

In [8]:
from textblob import TextBlob

In [9]:
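def check_sentiment(text):
    """Print a tweet followed by its TextBlob subjectivity and polarity."""
    print(text)
    res = TextBlob(text)

    # subjectivity lies in [0, 1]; polarity lies in [-1, 1]
    print('subjectivity: ', res.subjectivity)
    print('polarity: ', res.polarity)
    print('\n')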

In [10]:
df_dst["text"].apply(check_sentiment)

Greate article on over population in the New Scientist

https://t.co/Qjf9NqBkUZ

@elonmusk
subjectivity:  0.45454545454545453
polarity:  0.13636363636363635


@Teslaconomics @elonmusk Ignorance is bliss,
or
Living a lie would be more accurate
subjectivity:  0.5666666666666667
polarity:  0.45000000000000007


@elonmusk The baby is upset he didn't own it first
subjectivity:  0.6666666666666666
polarity:  0.425


@deepakravindran Blue for all. Socialism! but pay for it 👆🏿😊 @elonmusk
subjectivity:  0.1
polarity:  0.0


@elonmusk Damn I wasn't born with an Emerald mine guess I will go with a checkmark As I have since 2009.
subjectivity:  0.0
polarity:  0.0


@Michaelc2481 I get this all the time, I thought @elonmusk had sorted it out!
subjectivity:  0.0
polarity:  0.0


https://t.co/vdDsK4g94l - Sources: Elon Musk proposed taking control of OpenAI in 2018 but Sam Altman and other founders rejected the offer, leading Musk to walk away from the company (Reed Albergotti/Semafor) #tech #mobile


0      None
1      None
2      None
3      None
4      None
       ... 
595    None
596    None
597    None
598    None
599    None
Name: text, Length: 600, dtype: object
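
The trailing series of `None` values appears because `check_sentiment` prints its results and returns nothing. If the scores are wanted alongside the tweets, one option is to collect them into columns instead. The sketch below is an assumption about a possible next step: `add_sentiment_columns` and `toy_scorer` are hypothetical names, and the toy scorer stands in for TextBlob so the example runs without it.

```python
import pandas as pd


def add_sentiment_columns(df, scorer):
    """Append polarity and subjectivity columns computed by scorer(text)."""
    rows = [scorer(text) for text in df["text"]]
    scores = pd.DataFrame(rows, columns=["polarity", "subjectivity"], index=df.index)
    return df.join(scores)


def toy_scorer(text):
    # Hypothetical stand-in so the sketch runs without TextBlob installed.
    # With TextBlob, `lambda t: tuple(TextBlob(t).sentiment)` should give
    # the same (polarity, subjectivity) pair.
    return (1.0 if "good" in text else 0.0, 0.5)


toy = pd.DataFrame({"text": ["good news", "plain text"]})
scored = add_sentiment_columns(toy, toy_scorer)
print(scored)
```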