# Scweet

### **Table of Contents:**

 1. [Installation](#head-1)  
 2. [Scraping tweets by words or hashtag](#head-2)    
 3. [Get the main information of a given list of users](#head-3)
 4. [Get followers and following of a given list of users](#head-4)

Scweet is a browser-scripting tool that uses Selenium (driving a headless browser) to scrape data. In its official GitHub repository, the creator wrote:

>In the last days, Twitter banned almost every twitter scrapers. This repository represent an alternative legal tool (depending on how many seconds we wait between each scrolling) to scrap tweets between two given dates (start_date and max_date), for a given language and list of words or account name, and saves a csv file containing scraped data :

>[UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]

I found it a good way to work around the limits of the official Twitter API and to collect a large number of tweets in a short period of time.

## 1. Installation<a class="anchor" id="head-1"></a>

The program's dependencies, as listed in its requirements.txt:

- selenium
- pandas
- python-dotenv
- chromedriver-autoinstaller
- urllib3

In [None]:
# first install all the dependencies
!pip install -r requirements.txt
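
After installing, a quick check can confirm that every listed dependency is importable. This is a minimal sketch, not part of Scweet: the helper name `missing_packages` is made up for illustration, and the mapping reflects the fact that two of the packages are imported under a different name than the one used with pip.

```python
from importlib.util import find_spec

# pip name -> import name. Two packages are imported under a different name:
# python-dotenv as "dotenv", chromedriver-autoinstaller as "chromedriver_autoinstaller".
REQUIRED = {
    "selenium": "selenium",
    "pandas": "pandas",
    "python-dotenv": "dotenv",
    "chromedriver-autoinstaller": "chromedriver_autoinstaller",
    "urllib3": "urllib3",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
print("Missing packages:", missing or "none")
```

If the list is not empty, rerunning the install cell above is the first thing to try.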

## 2. Scraping tweets by words or hashtag<a class="anchor" id="head-2"></a>

In [None]:
from Scweet.scweet import scrap
from Scweet.user import get_user_information, get_users_following, get_users_followers

In [2]:
# Scrape top tweets containing the words 'covid' or 'covid19', with proximity search on and replies filtered out.
# The smaller the interval, the slower the process
# (choose an interval that evenly divides the period between start_date and max_date).

data = scrap(words=['covid','covid19'], # keywords
             lang='en', # Tweet language, e.g. "en" for English, "fr" for French.
             start_date="2020-04-01",  # start date of the scrape
             max_date="2020-04-15",  # end date of the scrape
             to_account = None, # Tweets sent in reply to this account
             from_account = None, # Tweets from this account (example: @Tesla).
             interval=1, # Number of days between the start and end date of each search query, e.g. 5.
             headless=True, # Run the webdriver headless or not: True or False
             display_type="Top", # Display type of the Twitter page: Latest or Top
             save_images=False, # download the images embedded in the tweets
             resume=False, # Resume the last scraping. specify the csv file path.
             filter_replies=True, # exclude replies
             proximity=True) # close search

Scraping on headless mode.
looking for tweets between 2020-04-01 and 2020-04-02 ...
 path : https://twitter.com/search?q=(covid%20OR%20covid19)%20until%3A2020-04-02%20since%3A2020-04-01%20%20-filter%3Areplies&src=typed_query&lf=on
Tweet made at: 2020-04-01T02:18:55.000Z is found.
Tweet made at: 2020-04-01T04:19:03.000Z is found.
Tweet made at: 2020-04-01T14:14:14.000Z is found.
Tweet made at: 2020-04-01T18:00:17.000Z is found.
scroll  1
Tweet made at: 2020-04-01T20:58:14.000Z is found.
Tweet made at: 2020-04-01T01:16:17.000Z is found.
Tweet made at: 2020-04-01T18:10:02.000Z is found.
Tweet made at: 2020-04-01T23:45:47.000Z is found.
Tweet made at: 2020-04-01T12:44:11.000Z is found.
Tweet made at: 2020-04-01T15:57:53.000Z is found.
Tweet made at: 2020-04-01T22:07:38.000Z is found.
scroll  2
Tweet made at: 2020-04-01T16:32:14.000Z is found.
Tweet made at: 2020-04-01T21:52:38.000Z is found.
Tweet made at: 2020-04-01T12:31:23.000Z is found.
Tweet made at: 2020-04-01T01:37:48.000Z is found.

Tweet made at: 2020-04-04T03:04:40.000Z is found.
Tweet made at: 2020-04-04T08:07:44.000Z is found.
Tweet made at: 2020-04-04T11:01:09.000Z is found.
Tweet made at: 2020-04-04T05:19:56.000Z is found.
Tweet made at: 2020-04-04T20:06:09.000Z is found.
Tweet made at: 2020-04-04T00:32:05.000Z is found.
Tweet made at: 2020-04-04T21:20:11.000Z is found.
Tweet made at: 2020-04-04T01:07:21.000Z is found.
scroll  2
Tweet made at: 2020-04-04T05:38:18.000Z is found.
Tweet made at: 2020-04-04T18:44:02.000Z is found.
Tweet made at: 2020-04-04T14:18:18.000Z is found.
Tweet made at: 2020-04-04T00:57:42.000Z is found.
Tweet made at: 2020-04-04T18:54:07.000Z is found.
Tweet made at: 2020-04-04T12:46:39.000Z is found.
scroll  3
Tweet made at: 2020-04-04T16:52:33.000Z is found.
Tweet made at: 2020-04-04T01:02:13.000Z is found.
Tweet made at: 2020-04-04T07:50:36.000Z is found.
Tweet made at: 2020-04-04T19:22:14.000Z is found.
Tweet made at: 2020-04-04T13:29:53.000Z is found.
Tweet made at: 2020-04-04T18:5

Tweet made at: 2020-04-07T12:20:50.000Z is found.
Tweet made at: 2020-04-07T18:33:12.000Z is found.
Tweet made at: 2020-04-07T22:39:44.000Z is found.
Tweet made at: 2020-04-07T17:57:12.000Z is found.
Tweet made at: 2020-04-07T15:38:40.000Z is found.
Tweet made at: 2020-04-07T21:10:03.000Z is found.
Tweet made at: 2020-04-07T22:21:14.000Z is found.
Tweet made at: 2020-04-07T20:50:44.000Z is found.
Tweet made at: 2020-04-07T23:18:03.000Z is found.
scroll  4
Tweet made at: 2020-04-07T17:23:59.000Z is found.
Tweet made at: 2020-04-07T13:48:26.000Z is found.
Tweet made at: 2020-04-07T13:54:50.000Z is found.
Tweet made at: 2020-04-07T23:26:58.000Z is found.
Tweet made at: 2020-04-07T16:37:34.000Z is found.
Tweet made at: 2020-04-07T19:51:54.000Z is found.
Tweet made at: 2020-04-07T17:03:38.000Z is found.
Tweet made at: 2020-04-07T17:36:17.000Z is found.
Tweet made at: 2020-04-07T04:01:04.000Z is found.
scroll  5
Tweet made at: 2020-04-07T15:59:05.000Z is found.
Tweet made at: 2020-04-07T17:0

Tweet made at: 2020-04-10T19:55:39.000Z is found.
Tweet made at: 2020-04-10T15:13:06.000Z is found.
Tweet made at: 2020-04-10T20:13:15.000Z is found.
Tweet made at: 2020-04-10T03:22:37.000Z is found.
Tweet made at: 2020-04-10T20:59:12.000Z is found.
Tweet made at: 2020-04-10T00:55:52.000Z is found.
Tweet made at: 2020-04-10T17:27:14.000Z is found.
Tweet made at: 2020-04-10T21:19:39.000Z is found.
scroll  5
Tweet made at: 2020-04-10T18:03:47.000Z is found.
Tweet made at: 2020-04-10T18:16:09.000Z is found.
Tweet made at: 2020-04-10T17:44:56.000Z is found.
Tweet made at: 2020-04-10T21:30:50.000Z is found.
Tweet made at: 2020-04-10T19:52:06.000Z is found.
Tweet made at: 2020-04-10T13:12:59.000Z is found.
Tweet made at: 2020-04-10T23:10:57.000Z is found.
scroll  6
scroll  7
scroll  8
looking for tweets between 2020-04-11 and 2020-04-12 ...
 path : https://twitter.com/search?q=(covid%20OR%20covid19)%20until%3A2020-04-12%20since%3A2020-04-11%20%20-filter%3Areplies&src=typed_query&lf=on
Tweet 

Tweet made at: 2020-04-13T15:01:17.000Z is found.
Tweet made at: 2020-04-13T21:03:35.000Z is found.
Tweet made at: 2020-04-13T02:46:19.000Z is found.
Tweet made at: 2020-04-13T22:34:25.000Z is found.
Tweet made at: 2020-04-13T23:14:20.000Z is found.
Tweet made at: 2020-04-13T16:53:11.000Z is found.
Tweet made at: 2020-04-13T19:28:25.000Z is found.
scroll  6
Tweet made at: 2020-04-13T22:34:25.000Z is found.
scroll  7
scroll  8
looking for tweets between 2020-04-14 and 2020-04-15 ...
 path : https://twitter.com/search?q=(covid%20OR%20covid19)%20until%3A2020-04-15%20since%3A2020-04-14%20%20-filter%3Areplies&src=typed_query&lf=on
Tweet made at: 2020-04-14T02:39:34.000Z is found.
Tweet made at: 2020-04-14T12:55:26.000Z is found.
Tweet made at: 2020-04-14T23:54:41.000Z is found.
Tweet made at: 2020-04-14T17:50:48.000Z is found.
scroll  1
Tweet made at: 2020-04-14T18:06:27.000Z is found.
Tweet made at: 2020-04-14T15:19:00.000Z is found.
Tweet made at: 2020-04-14T22:25:11.000Z is found.
Tweet 

In [5]:
data

Unnamed: 0,UserScreenName,UserName,Timestamp,Text,Embedded_text,Emojis,Comments,Likes,Retweets,Image link,Tweet URL
0,UofTMDPhD,@uoftmdphd,2020-04-01T02:18:55.000Z,New peer-reviewed #covid19 biology and diagnos...,Quote Tweet\nThe Chan Lab\n@TheChanLab\n · Mar...,🦠 🔬 🚨 🚨 🩺 🧬 🏥,,3,16,[https://pbs.twimg.com/profile_images/78409901...,https://twitter.com/uoftmdphd/status/124517376...
1,sukhbir kaur,@sukhbxrkaur,2020-04-01T04:19:03.000Z,Sunnybrook Hospital.\n#COVID19 #Coronavirusont...,,,4,39,18,[https://pbs.twimg.com/media/EUfacJbWAAIVi9A?f...,https://twitter.com/sukhbxrkaur/status/1245204...
2,Joanna Lavoie,@JoannaLavoie,2020-04-01T14:14:14.000Z,Spotted this morning in #RiverdaleTO #eastTO ~...,,,,2,9,[https://pbs.twimg.com/media/EUhipXOX0AcZuZU?f...,https://twitter.com/JoannaLavoie/status/124535...
3,Sonia Ojha,@QueenSonia90,2020-04-01T18:00:17.000Z,The new norm for productive meetings! Wish we ...,,,1,2,7,[https://pbs.twimg.com/media/EUiWXE4XQAATWyJ?f...,https://twitter.com/QueenSonia90/status/124541...
4,KapG,@420investing,2020-04-01T20:58:14.000Z,Premier doesn't want to cause 'panic' by relea...,Premier doesn't want to cause 'panic' by relea...,😒,1,4,4,[https://pbs.twimg.com/card_img/13859964573088...,https://twitter.com/420investing/status/124545...
...,...,...,...,...,...,...,...,...,...,...,...
630,Chronic The Head Fog,@samanthafraser,2020-04-14T23:12:34.000Z,"I don’t understand all of the, mostly conserva...",,,1,,6,[],https://twitter.com/samanthafraser/status/1250...
631,Pj Kwong,@skatingpj,2020-04-14T22:05:23.000Z,"Yup, am feeling #festive! I'm going to be talk...",,,,6,17,[],https://twitter.com/skatingpj/status/125018339...
632,𝚂𝚎á𝚗 𝙾’𝚂𝚑𝚎𝚊,@ConsumerSOS,2020-04-14T20:03:46.000Z,The SOS work and editing vehicle has been spot...,Quote Tweet\nJack Murdoch MD\n@Jack_Murdoch\n ...,,1,,14,[https://pbs.twimg.com/profile_images/17108172...,https://twitter.com/ConsumerSOS/status/1250152...
633,CP24 Breakfast,@CP24Breakfast,2020-04-14T16:56:04.000Z,Virtual pet care during COVID-19 and how we co...,,,,,3,[https://pbs.twimg.com/media/EVlEYEEWoAgd39R?f...,https://twitter.com/CP24Breakfast/status/12501...
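
The log above shows one search URL per `interval`-day window. The sketch below reproduces that pattern with the standard library only: it splits a date range into windows and rebuilds a query string shaped like the logged `path` lines. The function names `date_windows` and `search_url` are hypothetical, and this is a reconstruction from the log output, not Scweet's actual implementation. In particular, clamping the final window to the end date is a choice made for this sketch.

```python
from datetime import date, timedelta
from urllib.parse import quote


def date_windows(start, end, interval):
    """Yield (since, until) pairs covering start..end in steps of `interval` days."""
    step = timedelta(days=interval)
    since = start
    while since < end:
        # Assumption of this sketch: clamp the last window to the end date.
        until = min(since + step, end)
        yield since, until
        since = until


def search_url(words, since, until, filter_replies=True):
    """Build a search URL shaped like the `path` lines in the log above."""
    query = "(" + " OR ".join(words) + ")"
    query += f" until:{until} since:{since} "
    if filter_replies:
        query += " -filter:replies"
    # Keep parentheses unescaped, as in the logged URLs.
    return ("https://twitter.com/search?q=" + quote(query, safe="()")
            + "&src=typed_query&lf=on")


windows = list(date_windows(date(2020, 4, 1), date(2020, 4, 15), interval=1))
print(len(windows), "windows")
print(search_url(["covid", "covid19"], *windows[0]))
```

With `interval=1` the two-week range yields 14 one-day windows, which is why a smaller interval means more queries and a slower run.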


In [8]:
# Scrape top tweets with the hashtag #covid19, with proximity search on and replies filtered out.
# The smaller the interval, the slower the process
# (choose an interval that evenly divides the period between start_date and max_date).
data = scrap(hashtag="covid19", 
             start_date="2020-04-01", 
             max_date="2020-04-02", 
             from_account = None,
             interval=1, 
             headless=True, 
             display_type="Top", 
             save_images=False, 
             resume=False, 
             filter_replies=True, 
             proximity=True,
             limit=10)

Scraping on headless mode.
looking for tweets between 2020-04-01 and 2020-04-02 ...
 path : https://twitter.com/search?q=(%23covid19)%20until%3A2020-04-02%20since%3A2020-04-01%20%20-filter%3Areplies&src=typed_query&lf=on
Tweet made at: 2020-04-01T04:19:03.000Z is found.
Tweet made at: 2020-04-01T14:14:14.000Z is found.
Tweet made at: 2020-04-01T02:18:55.000Z is found.
Tweet made at: 2020-04-01T18:00:17.000Z is found.
scroll  1
Tweet made at: 2020-04-01T18:10:02.000Z is found.
Tweet made at: 2020-04-01T12:44:11.000Z is found.
Tweet made at: 2020-04-01T20:18:52.000Z is found.
Tweet made at: 2020-04-01T15:57:53.000Z is found.
Tweet made at: 2020-04-01T23:45:47.000Z is found.
Tweet made at: 2020-04-01T21:09:28.000Z is found.


In [9]:
data.head()

Unnamed: 0,UserScreenName,UserName,Timestamp,Text,Embedded_text,Emojis,Comments,Likes,Retweets,Image link,Tweet URL
0,sukhbir kaur,@sukhbxrkaur,2020-04-01T04:19:03.000Z,Sunnybrook Hospital.\n#COVID19 #Coronavirusont...,,,4.0,39,18,[https://pbs.twimg.com/media/EUfacJbWAAIVi9A?f...,https://twitter.com/sukhbxrkaur/status/1245204...
1,Joanna Lavoie,@JoannaLavoie,2020-04-01T14:14:14.000Z,Spotted this morning in #RiverdaleTO #eastTO ~...,,,,2,9,[https://pbs.twimg.com/media/EUhipXOX0AcZuZU?f...,https://twitter.com/JoannaLavoie/status/124535...
2,UofTMDPhD,@uoftmdphd,2020-04-01T02:18:55.000Z,New peer-reviewed #covid19 biology and diagnos...,Quote Tweet\nThe Chan Lab\n@TheChanLab\n · Mar...,🦠 🔬 🚨 🚨 🩺 🧬 🏥,,3,16,[https://pbs.twimg.com/profile_images/78409901...,https://twitter.com/uoftmdphd/status/124517376...
3,Sonia Ojha,@QueenSonia90,2020-04-01T18:00:17.000Z,The new norm for productive meetings! Wish we ...,,,1.0,2,7,[https://pbs.twimg.com/media/EUiWXE4XQAATWyJ?f...,https://twitter.com/QueenSonia90/status/124541...
4,Ms. J. Allen,@JAllenite,2020-04-01T18:10:02.000Z,Sending lots of \n to our @DerryWest & \n@Peel...,"Quote Tweet\nMs.Bowen\n@MsBowenDWV\n · Apr 1, ...",❤ 🙏 💜 🐺,,2,17,[https://pbs.twimg.com/profile_images/13270813...,https://twitter.com/JAllenite/status/124541312...
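
The frame shown above has empty cells in `Comments` (rendered as `4.0` and blanks), and `Timestamp` is an ISO string. A possible clean-up step is sketched below on a small hand-built frame that copies four rows from the output. It assumes empty counts mean zero, which the source does not state, and any value that fails to parse as a number also becomes 0 here.

```python
import pandas as pd

# Four rows copied from the output above (only the columns needed here).
sample = pd.DataFrame({
    "UserName": ["@sukhbxrkaur", "@JoannaLavoie", "@uoftmdphd", "@QueenSonia90"],
    "Timestamp": ["2020-04-01T04:19:03.000Z", "2020-04-01T14:14:14.000Z",
                  "2020-04-01T02:18:55.000Z", "2020-04-01T18:00:17.000Z"],
    "Comments": [4, None, None, 1],
    "Likes": [39, 2, 3, 2],
    "Retweets": [18, 9, 16, 7],
})

# Parse the ISO timestamps into timezone-aware datetimes.
sample["Timestamp"] = pd.to_datetime(sample["Timestamp"], utc=True)

# Assumption: a missing count means zero engagement.
for col in ["Comments", "Likes", "Retweets"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce").fillna(0).astype(int)

print(sample.dtypes)
print(sample.sort_values("Likes", ascending=False).head(2))
```

The same lines should apply to the `data` frame returned by `scrap`, provided its columns match the ones shown.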


## 3. Get the main information of a given list of users <a class="anchor" id="head-3"></a>

In [10]:
# These users are among the accounts I follow.
users = ['nagouzil', '@yassineaitjeddi', 'TahaAlamIdrissi', 
         '@Nabila_Gl', 'geceeekusuu', '@pabu232', '@av_ahmet', '@x_born_to_die_x']
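
The list above mixes handles with and without a leading `@`, and the crawl log later in this notebook prints `Crawling @@yassineaitjeddi`, which suggests a prefix is added to whatever is passed in. Whether Scweet normalizes handles itself is not stated here, so the sketch below strips the prefix up front. `normalize_handle` is a hypothetical helper, not a Scweet function.

```python
def normalize_handle(handle):
    """Strip surrounding whitespace and any leading '@' from a Twitter handle."""
    return handle.strip().lstrip("@")


raw_users = ['nagouzil', '@yassineaitjeddi', 'TahaAlamIdrissi',
             '@Nabila_Gl', 'geceeekusuu', '@pabu232', '@av_ahmet', '@x_born_to_die_x']

clean_users = [normalize_handle(u) for u in raw_users]
print(clean_users)
```

Passing a consistently formatted list also makes the returned dictionaries easier to index, since every key then follows the same convention.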

In [11]:
# For each user, this function returns a list containing:
# ["nb of following","nb of followers", "join date", "birthdate", "location", "website", "description"]
users_info = get_user_information(users, headless=True)

Scraping on headless mode.
--------------- nagouzil information : ---------------
Following :  81
Followers :  27
Location :  
Join date :  Joined September 2012
Birth date :  
Description :  Make a difference
Website :  
--------------- @yassineaitjeddi information : ---------------
Following :  122
Followers :  5
Location :  
Join date :  Joined October 2016
Birth date :  
Description :  
Website :  
--------------- TahaAlamIdrissi information : ---------------
Following :  381
Followers :  156
Location :  France
Join date :  Joined August 2012
Birth date :  
Description :  #SoftwareEngineeringStudentAtTSE
#DEVOPS_ENTHUSIAST
#FULL_STACK_WEB_DEV
Website :  
--------------- @Nabila_Gl information : ---------------
Following :  376
Followers :  425
Location :  Quelque part                 
Join date :  Joined October 2016
Birth date :  Born October 23
Description :  « La seule personne de confiance est celle qui craint الله » 
 {Omar Ibn al-khattab}
Website :  
--------------- geceeekus

In [12]:
import pandas as pd

users_df = pd.DataFrame(users_info, index = ["nb of following","nb of followers", "join date", 
                                             "birthdate", "location", "website", "description"]).T
users_df

Unnamed: 0,nb of following,nb of followers,join date,birthdate,location,website,description
nagouzil,81,27,Joined September 2012,,,,Make a difference
@yassineaitjeddi,122,5,Joined October 2016,,,,
TahaAlamIdrissi,381,156,Joined August 2012,,France,,#SoftwareEngineeringStudentAtTSE\n#DEVOPS_ENTH...
@Nabila_Gl,376,425,Joined October 2016,Born October 23,Quelque part,,« La seule personne de confiance est celle qui...
geceeekusuu,295,256,Joined April 2012,"Born October 23, 1997","İzmir, Türkiye",https://t.co/zIQJyCBmxX?amp=1,Ben olmak çok zor. Bi de unutmayın ki hala gen...
@pabu232,4879,4857,Joined December 2011,,ROYAL,,
@av_ahmet,609,446,Joined July 2011,,AVUKAT&ADANA,,Avukatlar tarih boyu köle kullanmadılar ama hi...
@x_born_to_die_x,688,1492,Joined July 2010,,Justin'in yanında,http://t.co/Ruil1rP8oW?amp=1,"I'm #belieber ,#lovatic . \n @jusdemkes\n ye..."
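
The `join date` column holds strings such as `Joined September 2012`, which are awkward to sort or filter. A minimal sketch for turning them into dates follows, using only the standard library. `parse_join_date` is a hypothetical helper, and it assumes English month names, as in the output above.

```python
from datetime import datetime


def parse_join_date(text):
    """Convert 'Joined <Month> <Year>' to a date on the first of that month.

    Returns None when the text does not follow that pattern.
    """
    prefix = "Joined "
    if not text.startswith(prefix):
        return None
    try:
        # %B expects a full English month name under the default locale.
        return datetime.strptime(text[len(prefix):], "%B %Y").date()
    except ValueError:
        return None


for raw in ["Joined September 2012", "Joined October 2016", ""]:
    print(repr(raw), "->", parse_join_date(raw))
```

The helper could be applied to the frame with `users_df["join date"].map(parse_join_date)`.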


## 4. Get followers and following of a given list of users <a class="anchor" id="head-4"></a>

Enter your username and password in the .env file. DON'T USE YOUR MAIN ACCOUNT, IT'S LIKELY TO GET BANNED!  
Increase the `wait` argument to lower the risk of your account being banned, and to give pages time to load fully when the connection is slow. I used 1 and had no problems.

In [4]:
following = get_users_following(users=users, verbose=0, headless = True, wait=1)

Scraping on headless mode.
Crawling @nagouzil following
Found 9 following
Found 38 following
Found 60 following
Found 80 following
Found 80 following
Crawling @@yassineaitjeddi following
Found 10 following
Found 40 following
Found 60 following
Found 79 following
Found 99 following
Found 120 following
Found 120 following
Crawling @TahaAlamIdrissi following
Found 8 following
Found 37 following
Found 56 following
Found 77 following
Found 98 following
Found 119 following
Found 139 following
Found 159 following
Found 180 following
Found 200 following
Found 219 following
Found 240 following
Found 258 following
Found 279 following
Found 299 following
Found 318 following
Found 338 following
Found 359 following
Found 378 following
Found 381 following
Found 381 following
Crawling @lolitapoupat following
Found 10 following
Found 39 following
Found 39 following
Crawling @@Jade_happiness following
Found 9 following
Found 38 following
Found 58 following
Found 78 following
Found 99 following
Found 12

In [5]:
print(following['nagouzil'])

['@tweetsauce', '@jockowillink', '@brfootball', '@433', '@Ibra_official', '@HSHQ', '@Snowden', '@johnkrasinski', '@fireship_dev', '@zinebmouchrik', '@Dannmace', '@Kurz_Gesagt', '@reactjs', '@dribbble', '@UpLabs', '@sketch', '@materialdesign', '@GoogleDesign', '@ayoubagouzil', '@garyvee', '@368', '@brielarson', '@Tesla', '@MehdiElIdriss23', '@Spotify', '@DavidDobrik', '@ddlovato', '@Dave2D', '@verge', '@StephenCurry30', '@KingJames', '@olivia_holt', '@saradietschy', '@petermckinnon', '@colesprouse', '@jakerawr', '@Casey', '@MandyPandyLeigh', '@Phil_Coutinho', '@KevinHart4real', '@BillGates', '@elonmusk', '@oneplus', '@dbrand', '@MKBHD', '@UnboxTherapy', '@TheRock', '@JDMorgan', '@AADaddario', '@jimmyfallon', '@ConanOBrien', '@alexandrabreck1', '@RealHughJackman', '@MelissaBenoist', '@HARDWELL', '@TahaAlamIdrissi', '@Eminem', '@SnoopDogg', '@GiGiHadid', '@Ucefmab', '@SalmaRach', '@ultra', '@MartinGarrix', '@YouTube', '@BadoAbdo', '@iWillSmith', '@ivanrakitic', '@Adele', '@SergiRoberto10'
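
As the output above shows, `following` maps each user to a list of `@`-prefixed handles. For analysis it is often handier to flatten that into (follower, followed) pairs. The sketch below does so on a small sample: the `nagouzil` entries are taken from the output above, while `example_user` and its list are hypothetical and only there to show overlap counting.

```python
from collections import Counter


def to_edges(following):
    """Flatten {user: [handles]} into a list of (user, followed_handle) pairs."""
    return [(user, handle.lstrip("@"))
            for user, handles in following.items()
            for handle in handles]


following_sample = {
    # Handles copied from the printed list above.
    "nagouzil": ["@tweetsauce", "@jockowillink", "@Tesla"],
    # Hypothetical second user, for illustration only.
    "example_user": ["@Tesla", "@elonmusk"],
}

edges = to_edges(following_sample)
# Count how many of the crawled users follow each account.
followed_counts = Counter(followed for _, followed in edges)

print(edges)
print(followed_counts.most_common(1))
```

The same function should work on the `followers` dictionary, with the pair order read the other way round.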

In [5]:
followers = get_users_followers(users=users, verbose=0, headless = True, wait=1)

Scraping on headless mode.
Crawling @NAgouzil followers
Found 9 followers
Found 27 followers
Found 27 followers
Crawling @Yassineaitjeddi followers
Found 5 followers
Found 5 followers
Crawling @TahaAlamIdrissi followers
Found 9 followers
Found 40 followers
Found 60 followers
Found 80 followers
Found 100 followers
Found 119 followers
Found 140 followers
Found 157 followers
Found 157 followers
Crawling @LolitaPoupat followers
Found 10 followers
Found 40 followers
Found 60 followers
Found 80 followers
Found 100 followers
Found 120 followers
Found 134 followers
Found 134 followers
Crawling @Jade_happiness followers
Found 0 followers
Found 37 followers
Found 58 followers
Found 78 followers
Found 98 followers
Found 117 followers
Found 136 followers
Found 158 followers
Found 177 followers
Found 198 followers
Found 217 followers
Found 237 followers
Found 245 followers
Found 245 followers
Crawling @nabila_gl followers
Found 10 followers
Found 38 followers
Found 60 followers
Found 80 followers
F

In [9]:
print(followers['LolitaPoupat'])

['@BaolBase', '@PrtJessie', '@amzil_naoufal', '@AlexisAymardd', '@Missalgangstar', '@yvane7777', '@LauraIs18', '@royes_lucas', '@GaetanGettgett', '@MehdyNedjar', '@TahiaYounes', '@Nassim95783893', '@BvSportetsante', '@zazazoz74115203', '@KevinJacquet6', '@edouardcrdt', '@Bekkamo_nono', '@DamienCharb', '@RHautiere', '@BaptisteRcr', '@sirina_rebbati', '@influenceuses', '@HadfiYamina', '@AlexPortoss', '@melanie_gdn_', '@BeratKoparan', '@Hasni_vola', '@lhdnn3', '@Hutchiwa1', '@BoukharouaI', '@zz_adil', '@AdrienTheOne', '@Axel_18100', '@Sneek59600743', '@GuilheiraAmelia', '@jaouenTzn', '@mizrahi_10', '@Charles03273937', '@KazmaKazmitch01', '@_beneston', '@Luckies_Smoke', '@meilleurtweet4', '@tiagoantunes205', '@bouyguestelecom', '@Atlatix', '@LeBoyJow', '@God_of_Couscous', '@Gokuku01755', '@Petition_RER_B', '@kmeljbr', '@AdrienPutseys', '@spy_vic', '@SLorenzo_FR', '@Petits_seins', '@diamondStyleez', '@alves_damien', '@thepunisher641', '@parisportif2017', '@aliciaportoss', '@livantsi', '@Ali