# Cleaning Zonal Tweets

This notebook cleans the **Covishield** and **Covaxin** tweets retrieved from five zones of India:
* North Zone
* East Zone
* West Zone
* South Zone
* North East Zone


In [26]:
#Importing libraries
import pandas as pd
import nltk #Natural Language Processing
import re

## Covishield Tweets

In [28]:
# Reading all the tweets retrieved from different zones of India

covishield_north = pd.read_csv('covishield_north_tweets.csv')
covishield_east = pd.read_csv('covishield_east_tweets.csv')
covishield_west = pd.read_csv('covishield_west_tweets.csv')
covishield_south = pd.read_csv('covishield_south_tweets.csv')
covishield_northeast = pd.read_csv('covishield_northeast_tweets.csv')

In [29]:
# Head of the dataframe
covishield_north.head()

Unnamed: 0,user_location,tweet_created_at,tweet_text
0,"Ghaziabad, India",18-04-2021 13:07,@sandesh_ic @iAsura_ &amp; In my family 2 docs...
1,New Delhi,18-04-2021 13:07,#AndhraPradesh on Sunday received two lakh dos...
2,"New Delhi, India",18-04-2021 12:53,"@ShekharGupta 1/2\nYes, more brand vaccine pro..."
3,New Delhi,18-04-2021 12:39,@cloudnikki @ArvinderSoin @freedoomer 2nd dose...
4,"Lucknow, India",18-04-2021 12:37,@DrKKAggarwal My father has tested positive af...
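
The `Unnamed: 0` column in the output above typically appears when a DataFrame was saved together with its index and later read back without declaring it. The sketch below assumes that is what happened with these files. It uses a made-up two-row in-memory CSV (the rows are hypothetical, not taken from the dataset) to show how `index_col=0` avoids the extra column.

```python
import io
import pandas as pd

# Hypothetical two-row CSV that mimics a file written with its index column.
sample_csv = io.StringIO(
    ",user_location,tweet_created_at,tweet_text\n"
    "0,New Delhi,18-04-2021 13:07,first sample tweet\n"
    "1,Lucknow,18-04-2021 12:37,second sample tweet\n"
)

# index_col=0 tells pandas to use the first column as the index
# instead of creating an extra 'Unnamed: 0' column.
sample_df = pd.read_csv(sample_csv, index_col=0)
print(list(sample_df.columns))
```

If the saved index carries no meaning, dropping the column after reading would work just as well.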


In [30]:
# Retrieving the Text Column from the Tweets

covishield_north_tweets = covishield_north['tweet_text']
covishield_east_tweets = covishield_east['tweet_text']
covishield_west_tweets = covishield_west['tweet_text']
covishield_south_tweets = covishield_south['tweet_text']
covishield_northeast_tweets = covishield_northeast['tweet_text']

In [31]:
def clean_tweet(tweets):
    '''
    Cleans tweet text by removing commas, URLs and @username mentions, then
    replacing every character that is not a letter (a-z, A-Z) or a digit with a space.
    Returns the clean tweets as a list.
    '''
    tweet_clean = []
    for tweet in tweets:
        # Removing ',' without leaving a space (so 3,700 becomes 3700)
        result = tweet.replace(',', '')
        # Replacing URLs (anything starting with http) with empty string
        result = re.sub(r"http\S+", "", result)
        # Replacing @username with empty string
        result = re.sub(r"@\S+", "", result)
        # Replacing everything except letters and digits with a space
        result = re.sub(r'[^a-zA-Z\d]', ' ', result)

        tweet_clean.append(result)
    return tweet_clean
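
A function like this leaves fragments such as `amp` and `lt` in the text, because HTML entities like `&amp;` and `&lt;` lose their punctuation but keep their letters. One possible variant, offered as a sketch rather than as this notebook's method, decodes entities with `html.unescape` first and collapses repeated spaces at the end. The name `clean_tweet_text` and the sample tweet are hypothetical.

```python
import html
import re

def clean_tweet_text(text):
    """Clean a single tweet string (illustrative variant, not the notebook's function)."""
    text = html.unescape(text)                 # '&amp;' -> '&', '&lt;' -> '<'
    text = text.replace(",", "")               # '3,700' -> '3700'
    text = re.sub(r"http\S+", "", text)        # drop URLs
    text = re.sub(r"@\S+", "", text)           # drop @username mentions
    text = re.sub(r"[^a-zA-Z\d]", " ", text)   # keep only letters and digits
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

print(clean_tweet_text("@user &amp; 3,700 doses https://t.co/abc #Covishield"))
# prints: 3700 doses Covishield
```

Whether to strip the extra whitespace depends on the downstream step; most tokenizers ignore it, so this is a readability choice more than a correctness one.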

In [32]:
# Getting clean tweets after applying the above function

covishield_north_clean_tweets = clean_tweet(covishield_north_tweets)
covishield_east_clean_tweets = clean_tweet(covishield_east_tweets)
covishield_west_clean_tweets = clean_tweet(covishield_west_tweets)
covishield_south_clean_tweets = clean_tweet(covishield_south_tweets)
covishield_northeast_clean_tweets = clean_tweet(covishield_northeast_tweets)

In [33]:
# Covishield North Clean Tweets List 

covishield_north_clean_tweets

['   amp  In my family 2 docs have got Covaxin  amp  2 hav got Covishield  Neither hav been infected again up until now while working all d time in hospitals ',
 ' AndhraPradesh on Sunday received two lakh doses of  Covid19 vaccine  Covishield from Serum Institute of India  Pune   Photo  IANS  File  ',
 ' 1 2 Yes more brand vaccine production is very much required and increase in production of covishieldcovaxin    Also get the CRP tested before vaccination it should be  lt  1 as per doctors video ',
 '   2nd dose he got a month ago so 1st dose another 4 weeks before that  Only covishield was in favor at that time ',
 ' My father has tested positive after 1st dose covishield His CRP is normal but platelets are low 120  and D Dimer is very very high  10000  Also HRCT is 2 25  He is on low molecular weight hipparin and dexamethasone    Is there anything to panic    ',
 ' My mom tested  ve even after getting two shots of Covishield ',
 'Both Covishield and Covaxin have been deemed safe for

In [34]:
# Adding Tweet Clean Column in Dataframe 

covishield_north['tweet_clean'] = covishield_north_clean_tweets
covishield_east['tweet_clean'] = covishield_east_clean_tweets
covishield_west['tweet_clean'] = covishield_west_clean_tweets
covishield_south['tweet_clean'] = covishield_south_clean_tweets
covishield_northeast['tweet_clean'] = covishield_northeast_clean_tweets
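
The same column could also be built directly with pandas string methods, without an intermediate list. The sketch below mirrors the four substitutions (commas, URLs, mentions, non-alphanumeric characters) on a small hypothetical frame; `sample` and its two rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame standing in for one of the zone DataFrames.
sample = pd.DataFrame(
    {"tweet_text": ["@user 3,700 doses https://t.co/abc", "Got #Covaxin today!"]}
)

sample["tweet_clean"] = (
    sample["tweet_text"]
    .str.replace(",", "", regex=False)             # remove commas
    .str.replace(r"http\S+", "", regex=True)       # remove URLs
    .str.replace(r"@\S+", "", regex=True)          # remove @username mentions
    .str.replace(r"[^a-zA-Z\d]", " ", regex=True)  # other characters -> space
)
print(sample["tweet_clean"].tolist())
# prints: [' 3700 doses ', 'Got  Covaxin today ']
```

One practical difference is that the `.str` accessor passes missing values through as `NaN` instead of raising an error on them.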

In [35]:
# Head of dataframe after adding the above column
covishield_north.head()

Unnamed: 0,user_location,tweet_created_at,tweet_text,tweet_clean
0,"Ghaziabad, India",18-04-2021 13:07,@sandesh_ic @iAsura_ &amp; In my family 2 docs...,amp In my family 2 docs have got Covaxin ...
1,New Delhi,18-04-2021 13:07,#AndhraPradesh on Sunday received two lakh dos...,AndhraPradesh on Sunday received two lakh dos...
2,"New Delhi, India",18-04-2021 12:53,"@ShekharGupta 1/2\nYes, more brand vaccine pro...",1 2 Yes more brand vaccine production is very...
3,New Delhi,18-04-2021 12:39,@cloudnikki @ArvinderSoin @freedoomer 2nd dose...,2nd dose he got a month ago so 1st dose ano...
4,"Lucknow, India",18-04-2021 12:37,@DrKKAggarwal My father has tested positive af...,My father has tested positive after 1st dose ...


In [37]:
# Saving the clean tweets of each zone to its own CSV file

covishield_north.to_csv('covishield_north_clean_tweets.csv',index=False)
covishield_east.to_csv('covishield_east_clean_tweets.csv',index=False)
covishield_west.to_csv('covishield_west_clean_tweets.csv',index=False)
covishield_south.to_csv('covishield_south_clean_tweets.csv',index=False)
covishield_northeast.to_csv('covishield_northeast_clean_tweets.csv',index=False)

## Covaxin Tweets

In [61]:
# Reading all the tweets retrieved from different zones of India

covaxin_north = pd.read_csv('covaxin_north_tweets.csv')
covaxin_east = pd.read_csv('covaxin_east_tweets.csv')
covaxin_west = pd.read_csv('covaxin_west_tweets.csv')
covaxin_south = pd.read_csv('covaxin_south_tweets.csv')
covaxin_northeast = pd.read_csv('covaxin_northeast_tweets.csv')

In [62]:
# Head of Dataframe
covaxin_north.head()

Unnamed: 0,user_location,tweet_created_at,tweet_text
0,Delhi,18-04-2021 13:21,Mamata didi is covishield and covaxin combined...
1,"Ghaziabad, India",18-04-2021 13:07,@sandesh_ic @iAsura_ &amp; In my family 2 docs...
2,"New Delhi, India",18-04-2021 12:53,"@ShekharGupta 1/2\nYes, more brand vaccine pro..."
3,New Delhi,18-04-2021 12:43,@cloudnikki @ArvinderSoin @freedoomer in delhi...
4,Noida,18-04-2021 12:12,Both Covishield and Covaxin have been deemed s...


In [63]:
# Retrieving the Text Column from the Tweets

covaxin_north_tweets = covaxin_north['tweet_text']
covaxin_east_tweets = covaxin_east['tweet_text']
covaxin_west_tweets = covaxin_west['tweet_text']
covaxin_south_tweets = covaxin_south['tweet_text']
covaxin_northeast_tweets = covaxin_northeast['tweet_text']

In [65]:
# Getting clean tweets by applying clean_tweet (defined in the Covishield section)

covaxin_north_clean_tweets = clean_tweet(covaxin_north_tweets)
covaxin_east_clean_tweets = clean_tweet(covaxin_east_tweets)
covaxin_west_clean_tweets = clean_tweet(covaxin_west_tweets)
covaxin_south_clean_tweets = clean_tweet(covaxin_south_tweets)
covaxin_northeast_clean_tweets = clean_tweet(covaxin_northeast_tweets)

In [66]:
# Covaxin North Clean Tweets List 

covaxin_north_clean_tweets

['Mamata didi is covishield and covaxin combined in to one ',
 '   amp  In my family 2 docs have got Covaxin  amp  2 hav got Covishield  Neither hav been infected again up until now while working all d time in hospitals ',
 ' 1 2 Yes more brand vaccine production is very much required and increase in production of covishieldcovaxin    Also get the CRP tested before vaccination it should be  lt  1 as per doctors video ',
 '   in delhi we dont have covaxin ',
 'Both Covishield and Covaxin have been deemed safe for usage in India  What you get depends on the availability at the centre  Follow   Sanjeevani   A Shot Of Life a CSR initiative by  for more awareness   Vaccine  LagayaKya  ',
 '  Covaxin production can t increase at SII s level even after several production lines',
 'This report about Covaxin is very serious  It is so serious that it may have triggered  CoronaSecondWave in India  Pressurize the company  CDSCO and Indian government to answer   is your take on this   ',
 ' Data of

In [67]:
# Adding Tweet Clean Column in Dataframe 

covaxin_north['tweet_clean'] = covaxin_north_clean_tweets
covaxin_east['tweet_clean'] = covaxin_east_clean_tweets
covaxin_west['tweet_clean'] = covaxin_west_clean_tweets
covaxin_south['tweet_clean'] = covaxin_south_clean_tweets
covaxin_northeast['tweet_clean'] = covaxin_northeast_clean_tweets

In [68]:
# Head of dataframe after adding the above column
covaxin_north.head()

Unnamed: 0,user_location,tweet_created_at,tweet_text,tweet_clean
0,Delhi,18-04-2021 13:21,Mamata didi is covishield and covaxin combined...,Mamata didi is covishield and covaxin combined...
1,"Ghaziabad, India",18-04-2021 13:07,@sandesh_ic @iAsura_ &amp; In my family 2 docs...,amp In my family 2 docs have got Covaxin ...
2,"New Delhi, India",18-04-2021 12:53,"@ShekharGupta 1/2\nYes, more brand vaccine pro...",1 2 Yes more brand vaccine production is very...
3,New Delhi,18-04-2021 12:43,@cloudnikki @ArvinderSoin @freedoomer in delhi...,in delhi we dont have covaxin
4,Noida,18-04-2021 12:12,Both Covishield and Covaxin have been deemed s...,Both Covishield and Covaxin have been deemed s...


In [70]:
# Saving the clean tweets of each zone to its own CSV file

covaxin_north.to_csv('covaxin_north_clean_tweets.csv',index=False)
covaxin_east.to_csv('covaxin_east_clean_tweets.csv',index=False)
covaxin_west.to_csv('covaxin_west_clean_tweets.csv',index=False)
covaxin_south.to_csv('covaxin_south_clean_tweets.csv',index=False)
covaxin_northeast.to_csv('covaxin_northeast_clean_tweets.csv',index=False)
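
The read, clean and save steps are identical for both vaccines and all five zones, so they could be folded into one loop. The following is a minimal sketch under that assumption. It relies on the file naming pattern used in this notebook (`<vaccine>_<zone>_tweets.csv`), while the helper names `clean_text` and `clean_zone_file` are hypothetical. Files that are not present are skipped.

```python
from pathlib import Path
import re

import pandas as pd

ZONES = ["north", "east", "west", "south", "northeast"]
VACCINES = ["covishield", "covaxin"]

def clean_text(text):
    """Apply the notebook's four substitutions to one tweet string."""
    text = text.replace(",", "")
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"@\S+", "", text)
    return re.sub(r"[^a-zA-Z\d]", " ", text)

def clean_zone_file(vaccine, zone, directory="."):
    """Read one raw CSV, add 'tweet_clean', and write the clean CSV next to it."""
    source = Path(directory) / f"{vaccine}_{zone}_tweets.csv"
    if not source.exists():
        return None  # skip files that are not present
    frame = pd.read_csv(source)
    frame["tweet_clean"] = [clean_text(tweet) for tweet in frame["tweet_text"]]
    target = source.with_name(f"{vaccine}_{zone}_clean_tweets.csv")
    frame.to_csv(target, index=False)
    return target

for vaccine in VACCINES:
    for zone in ZONES:
        clean_zone_file(vaccine, zone)
```

Adding a zone or a vaccine then only requires extending the corresponding list.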