# Ukraine

The footage on social media that is [being reported](https://www.nytimes.com/live/2022/02/25/world/russia-ukraine-war/ukraine-says-russian-troops-entered-the-outskirts-of-kyiv) from the Russian invasion of Ukraine is astonishing. How can you use the Twitter API to tap into some of it?

To get started you need to install and configure twarc with [Twitter Developer](https://developer.twitter.com/en) keys:

In [None]:
! pip install twarc
! twarc2 configure

Now let's [build a query](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query) to collect some tweets:

* tweets mentioning *Ukraine*
* no retweets (cuts down on the number of tweets that need to be retrieved)
* tweets containing videos
* tweets since midnight yesterday
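
The first three criteria map onto search operators, while the time window is passed separately through `--start-time` and `--end-time`. As a minimal sketch of how the query string could be assembled (the `build_query` helper is hypothetical and not part of twarc):

```python
def build_query(terms, has=(), exclude=()):
    """Assemble a Twitter search query string (hypothetical helper, not part of twarc)."""
    # Several alternative terms are OR-ed together inside parentheses.
    keyword = terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    parts = [keyword]
    # has:videos and similar operators keep only tweets with that attachment.
    parts += [f"has:{kind}" for kind in has]
    # A leading minus negates an operator, e.g. -is:retweet drops retweets.
    parts += [f"-{operator}" for operator in exclude]
    return " ".join(parts)


query = build_query(["ukraine"], has=["videos"], exclude=["is:retweet"])
print(query)
```

Keeping the pieces separate makes it easier to swap the keyword later without retyping the operators.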

In [None]:
! twarc2 search --start-time '2022-02-24T00:00:00' --end-time '2022-02-25T21:25:00' 'ukraine has:videos -is:retweet' data/ukraine.jsonl 

Now let's install the twarc-csv plugin to convert the tweets to CSV, which makes them easier to process:

In [None]:
! pip install twarc-csv

Now we can convert them:

In [7]:
! twarc2 csv data/ukraine.jsonl data/ukraine.csv

100%|████████████████| Processed 357M/357M of input file [00:53<00:00, 7.03MB/s]

ℹ️
Parsed 126182 tweets objects from 1266 lines in the input file.
Wrote 126182 rows and output 74 columns in the CSV.
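
The line count is much smaller than the tweet count because twarc2 writes one API response per line, and each response carries a page of tweets in its `data` list. A rough sketch of that count, using a small made-up sample in place of the real file:

```python
import io
import json


def count_tweets(lines):
    """Return (response_count, tweet_count) for an iterable of JSONL lines."""
    responses = 0
    tweets = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        responses += 1
        # Each response holds one page of results under the "data" key.
        tweets += len(response.get("data", []))
    return responses, tweets


# Made-up stand-in for data/ukraine.jsonl: two responses, three tweets.
sample = io.StringIO(
    '{"data": [{"id": "1"}, {"id": "2"}]}\n'
    '{"data": [{"id": "3"}]}\n'
)
print(count_tweets(sample))
```

With the real file, the same function could be called on an open file handle, for example `count_tweets(open('data/ukraine.jsonl'))`.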



Now the data can be loaded into a Pandas DataFrame:

In [8]:
import pandas

df = pandas.read_csv('data/ukraine.csv')
df.head()



Unnamed: 0,id,conversation_id,referenced_tweets.replied_to.id,referenced_tweets.retweeted.id,referenced_tweets.quoted.id,author_id,in_reply_to_user_id,retweeted_user_id,quoted_user_id,created_at,...,geo.geo.bbox,geo.geo.type,geo.id,geo.name,geo.place_id,geo.place_type,__twarc.retrieved_at,__twarc.url,__twarc.version,Unnamed: 73
0,1497321751277047808,1497205616917594114,1.497318e+18,,,262857550,262857600.0,,,2022-02-25T21:24:59.000Z,...,,,,,,,2022-02-25T21:27:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
1,1497321739159752710,1497127491806322706,1.497127e+18,,,703773648,7.204778e+17,,,2022-02-25T21:24:56.000Z,...,,,,,,,2022-02-25T21:27:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
2,1497321737406590978,1497321737406590978,,,,1375688273381711873,,,,2022-02-25T21:24:55.000Z,...,,,,,,,2022-02-25T21:27:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
3,1497321735905062917,1497321735905062917,,,,273648929,,,,2022-02-25T21:24:55.000Z,...,,,,,,,2022-02-25T21:27:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
4,1497321732377657351,1497321732377657351,,,,2700731676,,,,2022-02-25T21:24:54.000Z,...,,,,,,,2022-02-25T21:27:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
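
Notice that `referenced_tweets.replied_to.id` is displayed as a float (`1.497318e+18`). This is most likely because the column contains missing values, which makes pandas fall back to floating point. A 64-bit float cannot represent every integer of this size, so IDs stored that way may no longer match the original. The sketch below illustrates the rounding with one of the tweet IDs from the table; passing `dtype=str` for such columns to `pandas.read_csv` is one possible way around it.

```python
tweet_id = 1497321739159752710  # an ID taken from the table above

# Convert to a 64-bit float and back, which is effectively what happens
# when an ID column is stored as floating point.
round_tripped = int(float(tweet_id))

print(tweet_id)
print(round_tripped)
print(tweet_id == round_tripped)
```

The comparison prints `False`: the last digits of the ID change on the round trip, which would break any URL built from it.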


There are a lot of columns in this DataFrame:

In [9]:
df.columns

Index(['id', 'conversation_id', 'referenced_tweets.replied_to.id',
       'referenced_tweets.retweeted.id', 'referenced_tweets.quoted.id',
       'author_id', 'in_reply_to_user_id', 'retweeted_user_id',
       'quoted_user_id', 'created_at', 'text', 'lang', 'source',
       'public_metrics.like_count', 'public_metrics.quote_count',
       'public_metrics.reply_count', 'public_metrics.retweet_count',
       'reply_settings', 'possibly_sensitive', 'withheld.scope',
       'withheld.copyright', 'withheld.country_codes', 'entities.annotations',
       'entities.cashtags', 'entities.hashtags', 'entities.mentions',
       'entities.urls', 'context_annotations', 'attachments.media',
       'attachments.media_keys', 'attachments.poll.duration_minutes',
       'attachments.poll.end_datetime', 'attachments.poll.id',
       'attachments.poll.options', 'attachments.poll.voting_status',
       'attachments.poll_ids', 'author.id', 'author.created_at',
       'author.username', 'author.name', 'author

Let's sort the DataFrame in descending order by the number of times each tweet was retweeted:

In [10]:
df = df.sort_values('public_metrics.retweet_count', ascending=False)

Unnamed: 0,id,conversation_id,referenced_tweets.replied_to.id,referenced_tweets.retweeted.id,referenced_tweets.quoted.id,author_id,in_reply_to_user_id,retweeted_user_id,quoted_user_id,created_at,...,geo.geo.bbox,geo.geo.type,geo.id,geo.name,geo.place_id,geo.place_type,__twarc.retrieved_at,__twarc.url,__twarc.version,Unnamed: 73
88554,1496834888456019974,1496834888456019974,,,,910515151761170432,,,,2022-02-24T13:10:21.000Z,...,,,,,,,2022-02-25T21:49:13+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
63045,1496941845812760577,1496941845812760577,,,,1110122179,,,,2022-02-24T20:15:22.000Z,...,,,,,,,2022-02-25T21:45:21+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
92895,1496818903644659713,1496818903644659713,,,,986089880462622720,,,,2022-02-24T12:06:50.000Z,...,,,,,,,2022-02-25T21:58:12+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
74273,1496893311835123716,1496893311835123716,,,,464067709,,,,2022-02-24T17:02:31.000Z,...,,,,,,,2022-02-25T21:47:03+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
85751,1496845105428193286,1496845105428193286,,,,1430026082984595462,,,,2022-02-24T13:50:57.000Z,...,,,,,,,2022-02-25T21:48:48+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85645,1496845664956805124,1496845664956805124,,,,858645970225160192,8.593639e+17,,,2022-02-24T13:53:11.000Z,...,,,,,,,2022-02-25T21:48:47+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
40448,1497112466102566914,1497112466102566914,,,,3253409029,,,,2022-02-25T07:33:21.000Z,...,,,,,,,2022-02-25T21:33:32+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
85647,1496845641561120772,1496845641561120772,,,,1492229779935932420,,,,2022-02-24T13:53:05.000Z,...,,,,,,,2022-02-25T21:48:47+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
85648,1496845640122478592,1496845640122478592,,,,712380435,,,,2022-02-24T13:53:05.000Z,...,,,,,,,2022-02-25T21:48:47+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,


Let's see what these tweets are:

In [20]:
for i, tweet in df.head(50).iterrows():
    print(f"https://twitter.com/{tweet['author.username']}/status/{tweet['id']}")

https://twitter.com/happydeever/status/1496834888456019974
https://twitter.com/PMoelleken/status/1496941845812760577
https://twitter.com/happyinjail/status/1496818903644659713
https://twitter.com/aletweetsnews/status/1496893311835123716
https://twitter.com/UkraineNews0/status/1496845105428193286
https://twitter.com/_CastellanosEve/status/1496859770732822532
https://twitter.com/WUTangKids/status/1496950737762684928
https://twitter.com/sunbaepswx_/status/1497004859354398721
https://twitter.com/ABC/status/1496700890589368323
https://twitter.com/ajplus/status/1496673988034142213
https://twitter.com/ChristopherJM/status/1497258980040589313
https://twitter.com/AlexKokcharov/status/1496868326383190017
https://twitter.com/mjluxmoore/status/1496905901894258692
https://twitter.com/RahulGandhi/status/1496842781628768256
https://twitter.com/Newnews_eu/status/1496830803736731649
https://twitter.com/BNONews/status/1496738880749727755
https://twitter.com/BhavishaPatel/status/1496882754935398408
https
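
A tweet's permalink needs only the author's username and the tweet ID. A minimal sketch of the same formatting as a reusable function (the `tweet_url` name is just for illustration), which passes the ID through `int` so that IDs read as strings are handled too:

```python
def tweet_url(username, tweet_id):
    """Build a permalink from a username and a tweet ID (illustrative helper)."""
    # int() accepts IDs that arrive as strings such as "1496834888456019974".
    return f"https://twitter.com/{username}/status/{int(tweet_id)}"


print(tweet_url("happydeever", "1496834888456019974"))
```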

At first blush these don't appear to be footage. Let's limit the results to tweets that are in Ukrainian and sort those.

In [30]:
uk_tweets = df[df['lang'] == 'uk']
uk_tweets

Unnamed: 0,id,conversation_id,referenced_tweets.replied_to.id,referenced_tweets.retweeted.id,referenced_tweets.quoted.id,author_id,in_reply_to_user_id,retweeted_user_id,quoted_user_id,created_at,...,geo.geo.bbox,geo.geo.type,geo.id,geo.name,geo.place_id,geo.place_type,__twarc.retrieved_at,__twarc.url,__twarc.version,Unnamed: 73
79854,1496868727526236164,1496868727526236164,,,,2236175028,,,,2022-02-24T15:24:49.000Z,...,,,,,,,2022-02-25T21:47:54+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
41745,1497104727435362320,1497104727435362320,,,,388157003,,,,2022-02-25T07:02:36.000Z,...,,,,,,,2022-02-25T21:33:44+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
98542,1496795626553720835,1496795626553720835,,,,2236175028,,,,2022-02-24T10:34:21.000Z,...,,,,,,,2022-02-25T21:58:59+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
21600,1497215512304095243,1497215512304095243,,,,2236175028,,,,2022-02-25T14:22:49.000Z,...,,,,,,,2022-02-25T21:30:49+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
63373,1496939809830576131,1496939809830576131,,,,2236175028,,,,2022-02-24T20:07:17.000Z,...,,,,,,,2022-02-25T21:45:24+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87750,1496837534659530752,1496837534659530752,,,,1480010775901188096,,,,2022-02-24T13:20:52.000Z,...,,,,,,,2022-02-25T21:49:05+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
87740,1496837558365564935,1496837558365564935,,,,1666246836,,,,2022-02-24T13:20:58.000Z,...,,,,,,,2022-02-25T21:49:05+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
41311,1497107281674223623,1497107281674223623,,,,1447658913571160064,,,,2022-02-25T07:12:45.000Z,...,,,,,,,2022-02-25T21:33:41+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
40709,1497111032187306017,1497111032187306017,,,,2885742369,,,,2022-02-25T07:27:39.000Z,...,,,,,,,2022-02-25T21:33:35+00:00,https://api.twitter.com/2/tweets/search/recent...,2.8.3,
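
Before filtering on a language code it can help to see how the tweets are distributed across `lang` values. A small sketch using only the standard library, with a made-up three-row sample standing in for `data/ukraine.csv`:

```python
import csv
import io
from collections import Counter


def count_languages(csv_file):
    """Count rows per value of the 'lang' column in a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["lang"] for row in reader)


# Made-up sample; the real file has many more columns.
sample = io.StringIO("id,lang\n1,uk\n2,en\n3,uk\n")
counts = count_languages(sample)
print(counts.most_common())
```

With the DataFrame already loaded, `df['lang'].value_counts()` should give the same kind of summary.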


In [32]:
uk_tweets = uk_tweets.sort_values('public_metrics.retweet_count', ascending=False)

for i, tweet in uk_tweets.iterrows():
    print(f"https://twitter.com/{tweet['author.username']}/status/{tweet['id']}")

https://twitter.com/by_Ukraine/status/1496868727526236164
https://twitter.com/Kefirchik88/status/1497104727435362320
https://twitter.com/by_Ukraine/status/1496795626553720835
https://twitter.com/by_Ukraine/status/1497215512304095243
https://twitter.com/by_Ukraine/status/1496939809830576131
https://twitter.com/crabik_sayori/status/1496856840071753733
https://twitter.com/linda3malina/status/1496689399291068416
https://twitter.com/by_Ukraine/status/1496864495041449986
https://twitter.com/Vitaliy70341751/status/1496924652417044480
https://twitter.com/by_Ukraine/status/1497182134473166851
https://twitter.com/StopRussia2014/status/1496986158265511936
https://twitter.com/by_Ukraine/status/1496922004309659649
https://twitter.com/StopRussia2014/status/1496885878551199748
https://twitter.com/by_Ukraine/status/1496831692757422085
https://twitter.com/ekhokavkaza/status/1496749450768826370
https://twitter.com/by_Ukraine/status/1496840598069944320
https://twitter.com/bodsadman/status/149713046741759

This seemed to work better. We could search again, this time limiting the query to tweets that mention Russia or Russian in Ukrainian:

* російський 
* Росія

And let's widen the time frame by one more day.
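
The time window is expressed as timestamps for `--start-time` and `--end-time`. As a sketch, assuming the start should fall on midnight a given number of days before the end date (the `search_window` helper is hypothetical), the two values could be computed like this:

```python
from datetime import datetime, timedelta


def search_window(end, days_back):
    """Return (start, end) strings for --start-time and --end-time.

    The start is midnight, days_back days before the end date.
    """
    start = datetime(end.year, end.month, end.day) - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)


start, end = search_window(datetime(2022, 2, 25, 21, 25), days_back=2)
print(start, end)
```

Bear in mind that the recent search endpoint only reaches back about a week, so the window cannot be widened indefinitely.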


In [34]:
! twarc2 search --start-time '2022-02-23T00:00:00' --end-time '2022-02-25T21:25:00' '(російський OR Росія) has:videos -is:retweet' data/ukrainian.jsonl 
! twarc2 csv data/ukrainian.jsonl data/ukrainian.csv

100%|█████████████████| Processed 2 days/2 days [00:02<00:00, 317 tweets total ]
100%|████████████████| Processed 906k/906k of input file [00:00<00:00, 6.51MB/s]

ℹ️
Parsed 317 tweets objects from 4 lines in the input file.
Wrote 317 rows and output 74 columns in the CSV.



In [36]:
df2 = pandas.read_csv('data/ukrainian.csv')

# Sort the tweets from the new search by retweet count and print their URLs
df2 = df2.sort_values('public_metrics.retweet_count', ascending=False)
for i, tweet in df2.iterrows():
    print(f"https://twitter.com/{tweet['author.username']}/status/{tweet['id']}")

We can use the twarc-videos plugin to download the videos attached to these tweets.

In [None]:
!pip install twarc-videos

In [40]:
! twarc2 videos --help

Usage: twarc2 videos [OPTIONS] [INFILE]

  Download videos referenced in tweets and their metadata.

Options:
  --max-downloads INTEGER  max downloads per URL
  --max-filesize INTEGER   max filesize to download (bytes)
  --ignore-livestreams     ignore livestreams
  --download-dir TEXT      directory to download to
  --block TEXT             hostname(s) to block (repeatable)
  --timeout INTEGER        seconds to wait for a video download to finish
  --quiet                  silence terminal output
  --help                   Show this message and exit.
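
When driving the download from Python, the options above can be assembled into an argument list. A minimal sketch, using only flags listed in the help text (the `videos_command` helper and the 50 MB limit are illustrative assumptions); it builds and prints the command without running it:

```python
import shlex


def videos_command(infile, download_dir, max_filesize=None, ignore_livestreams=False):
    """Build the argument list for `twarc2 videos` (hypothetical helper)."""
    argv = ["twarc2", "videos", infile, "--download-dir", download_dir]
    if max_filesize is not None:
        # --max-filesize takes a size in bytes, per the help text.
        argv += ["--max-filesize", str(max_filesize)]
    if ignore_livestreams:
        argv.append("--ignore-livestreams")
    return argv


argv = videos_command(
    "data/ukrainian.jsonl",
    "ukrainian-videos",
    max_filesize=50_000_000,  # arbitrary example limit
    ignore_livestreams=True,
)
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv)` to start the download.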


In [None]:
! twarc2 videos data/ukrainian.jsonl --download-dir ~/ukrainian-videos

downloaded https://twitter.com/russianwarnews/status/1497321396896251904/video/1 as /Users/edsummers/ukrainian-videos/twitter/1497321396896251904/_RussiaUkraineWar_RussiaUkraineConflict_Zelenskiy_Putin_I_H_UkraineWar.mp4
downloaded https://twitter.com/kittenyunie/status/1497321379313635330/video/1 as /Users/edsummers/ukrainian-videos/twitter/1497321379313635330/anastasia_-_..mp4
downloaded https://twitter.com/GinkAri/status/1497319037721792517/video/1 as /Users/edsummers/ukrainian-videos/twitter/1497319037721792517/Summer_in_Boryspil_-_._..mp4
downloaded https://twitter.com/deadlyvalley/status/1497316243581378564/video/1 as /Users/edsummers/ukrainian-videos/twitter/1497316243581378564/_.mp4
downloaded https://twitter.com/basrawi_mousa/status/1497315346419699718/video/1 as /Users/edsummers/ukrainian-videos/twitter/1497315346419699718/Mousa_Basrawi_-_@IAPonomarenko.mp4
downloaded https://twitte
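
Judging from the paths in this output, each video lands in `<download-dir>/twitter/<tweet id>/`. Assuming that layout holds, a short sketch for indexing the downloaded files by tweet ID (the `videos_by_tweet` helper is hypothetical, and a throwaway directory stands in for `~/ukrainian-videos`):

```python
from pathlib import Path
import tempfile


def videos_by_tweet(download_dir):
    """Map tweet ID -> list of .mp4 paths, assuming <dir>/twitter/<tweet id>/<file>.mp4."""
    index = {}
    for path in sorted(Path(download_dir).glob("twitter/*/*.mp4")):
        # The parent directory name is the tweet ID.
        index.setdefault(path.parent.name, []).append(path)
    return index


# Build a throwaway directory that mimics the layout, then index it.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "twitter" / "1497316243581378564" / "_.mp4"
    video.parent.mkdir(parents=True)
    video.touch()
    index = videos_by_tweet(tmp)
    print({tweet_id: [p.name for p in paths] for tweet_id, paths in index.items()})
```

An index like this makes it possible to join the video files back to rows of the CSV through the `id` column.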