# Set Up and Run Twint

Let's scrape some Twitter data.

## Download Twint

First we need to install twint. We'll basically follow the instructions on the twint GitHub wiki:

https://github.com/twintproject/twint/wiki/Setup

Note: The instructions say you can install twint directly with pip, but that didn't work for me on my Linux machine. Downloading and installing from source worked fine, though.

The "!" at the beginning of the code cells here means that we're actually running a Unix shell command from this Python notebook.


In [None]:
## Get the code for twint from github - if you've already done this 
## step, re-running it will give you an error.  That's ok, just move
## on to the next step! 
!git clone --depth=1 https://github.com/twintproject/twint.git
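
If the error on re-running the clone gets annoying, one option is to do the check in Python instead. The sketch below is only an illustration: `clone_if_missing` is a hypothetical helper (not part of twint), and it assumes `git` is available on your path. It skips the clone when the target directory already exists.

```python
import subprocess
from pathlib import Path


def clone_if_missing(repo_url, target_dir, runner=subprocess.run):
    """Clone repo_url into target_dir unless that directory already exists.

    Returns True if a clone was attempted, False if it was skipped.
    `runner` is injectable so the function can be tried without network access.
    """
    target = Path(target_dir)
    if target.exists():
        # Re-running the cell is now harmless: there is nothing to do.
        return False
    # Same shallow clone as the shell command above.
    runner(["git", "clone", "--depth=1", repo_url, str(target)], check=True)
    return True
```

In the notebook you would call it as `clone_if_missing("https://github.com/twintproject/twint.git", "twint")`.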

In [None]:
## Grab all the required packages and install twint

!(cd twint && pip install . -r requirements.txt)

## Run some queries

Now that we've installed twint we can try to scrape some tweets using its command-line interface.

The following command gets tweets from the `@edinunilel` account using the `-u` option. You'll see that it just dumps out a bunch of information for each tweet.

In [None]:
!twint -u edinunilel

### Use some options
Now let's try using a search term and some other options: 
* `-s vaccination` says look for tweets that contain the term 'vaccination'
* `-o vaccination.csv` says write out the results to the file vaccination.csv
* `--csv` says write the output in csv format (csv = comma-separated values, but by default here columns are actually separated by tabs!)
* `--limit 200` says only get the last 200 tweets

In [None]:

!twint -s vaccination -o vaccination.csv --csv --limit 200
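
If you'd rather build the same query from Python variables, here is a minimal sketch. `build_twint_args` is a hypothetical helper, not something twint provides; it only assembles the flags listed above into an argument list and prints the resulting command.

```python
import shlex


def build_twint_args(search, output, limit):
    """Return the twint argument list for a CSV search query."""
    return [
        "twint",
        "-s", search,           # search term
        "-o", output,           # output file
        "--csv",                # write csv-style output
        "--limit", str(limit),  # cap on the number of tweets
    ]


args = build_twint_args("vaccination", "vaccination.csv", 200)
print(shlex.join(args))
# prints: twint -s vaccination -o vaccination.csv --csv --limit 200
```

To actually run the query you could pass the list to `subprocess.run(args, check=True)`, which avoids any shell quoting problems if your search term contains spaces.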

### Have a look at the output

Now if you go back to the Jupyter dashboard (click on the 'Jupyter' icon at the top of this page), you should see vaccination.csv in your list of files.

Alternatively, we can take a peek at the file using the `head` command.

Notice that even though we asked for a comma-separated values (csv) file, the columns are actually separated by tabs (tsv).

In [None]:
## Have a look at the top 10 lines of the vaccination.csv file we just created with the query above
!head -n 10 vaccination.csv
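
If you want to check the separator without eyeballing the output, a rough heuristic is to count tabs and commas in the header line. The sketch below uses only the standard library; `guess_separator` is a hypothetical helper, and the sample text and its column names are made up for illustration (they are not twint's real header).

```python
import csv
import io


def guess_separator(header_line):
    """Guess tab vs comma by counting each in the header line (a rough heuristic)."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","


# Made-up sample standing in for the first lines of vaccination.csv.
sample = "id\tusername\ttweet\n1\talice\tHello, world\n"

sep = guess_separator(sample.splitlines()[0])
rows = list(csv.DictReader(io.StringIO(sample), delimiter=sep))
print(repr(sep), rows[0]["tweet"])
# prints: '\t' Hello, world
```

Checking only the header line matters here because tweet text often contains commas, which would confuse a count over the whole file.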

### Try out some other options

You can modify your queries with other command-line options, listed here:

https://github.com/twintproject/twint/wiki/Basic-usage

## Do it in Python instead!

You can of course inspect your data using Python. This class isn't about programming, so you can stop here if you don't have experience with this!

If you have done some Python, you can have a look at your data using, for example, `pandas`.

You can also `import twint` as a package within your own Python code (see the GitHub docs).


In [None]:

import pandas as pd

## Read in the *tab separated* csv file you just made 
tweets = pd.read_csv('vaccination.csv', sep="\t")

## Have a look at the first 20 rows
tweets.head(20)

In [None]:
## Make the cloned twint source directory importable from this notebook
import sys
sys.path.append("./twint")

In [None]:
!pip install nest_asyncio



In [None]:
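## Jupyter already runs an asyncio event loop and twint starts its own,
## so patch asyncio to allow nested loops
import nest_asyncio
nest_asyncio.apply()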

In [None]:
import twint

## Set up a query for one account's tweets (the equivalent of `-u`) and run it
c = twint.Config()
c.Username = "edinunilel"
twint.run.Search(c)

In [None]:
c
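
The command-line search from earlier can probably be reproduced the same way. The sketch below is an assumption-heavy illustration: the attribute names `Search`, `Output`, `Store_csv` and `Limit` are taken to match the twint wiki's configuration page, so check them against the docs before relying on them. `configure_search` is a hypothetical helper, and a `SimpleNamespace` stands in for `twint.Config()` so the block runs even without twint installed.

```python
from types import SimpleNamespace


def configure_search(config, term, output, limit):
    """Set search options on a twint-style config object and return it."""
    config.Search = term     # assumed equivalent of -s
    config.Output = output   # assumed equivalent of -o
    config.Store_csv = True  # assumed equivalent of --csv
    config.Limit = limit     # assumed equivalent of --limit
    return config


# Stand-in object; in the notebook you would pass twint.Config() instead.
demo = configure_search(SimpleNamespace(), "vaccination", "vaccination.csv", 200)
print(demo.Search, demo.Limit)
# prints: vaccination 200
```

With twint imported, the call would look like `twint.run.Search(configure_search(twint.Config(), "vaccination", "vaccination.csv", 200))`.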