# Setup and Run Twint

Let's scrape some Twitter data!

### First things first: Jupyter Notebooks

Jupyter Notebooks allow you to write nicely formatted text and run code (in this case Python), all within a browser.
You should be able to use this Notebook to scrape Twitter data just by running each of these cells (the shaded boxes on this page) in turn.

**To run a cell**, click on it and then click the 'Run' button in the menu at the top of the page. For _Markdown_ cells (like this one), this will just format the text. For _Code_ cells, this will actually run the code in the cell.

**A handy shortcut:** press `shift-enter` to run the current cell and move to the next one. 

You can make a new cell using the _Insert_ menu (this will be a code cell by default). You can clear everything and start again by going to _Kernel > Restart and Clear Output_.

If you're not familiar with Jupyter Notebooks and want to find out more, you can find some tutorials here:

* [A very quick overview from the Jupyter docs](https://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb)
* [An interactive tutorial from the Binder project that's made up of Jupyter notebooks](https://gke.mybinder.org/v2/gh/ipython/ipython-in-depth/master?filepath=binder/Index.ipynb)
* [Jupyter docs](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)
* [More info on EDINA Noteable](https://noteable.edina.ac.uk/documentation/)


For now, let's just run the rest of this notebook and scrape some tweets! 


## Download Twint

First we need to install twint, a Python-based Twitter scraper. We'll basically follow the instructions on the twint GitHub wiki:

https://github.com/twintproject/twint/wiki/Setup

Note: The instructions say you can use `pip` directly to install this, but that didn't work for me on my Linux machine. Downloading and installing from source worked fine though.

If you're confused by the `!` at the beginning of the line in the next few code cells: it just means that we're running a Unix shell command from this Python notebook (so it's not Python itself, but part of the programming environment that these notebooks make available to you).
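If you already know some Python, the effect of `!` can be roughly approximated in plain Python with the standard-library `subprocess` module. This is only a sketch of the idea, not how Jupyter implements it, and the `echo` command is just a harmless stand-in that assumes a Unix-like system:

```python
import subprocess

# Run a shell command from Python and capture what it prints.
# `echo` is only an illustrative command; any shell command could go here.
result = subprocess.run(
    ["echo", "hello from the shell"],
    capture_output=True,  # collect the command's output instead of printing it directly
    text=True,            # give us strings rather than bytes
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())
```

In a notebook, `!echo "hello from the shell"` is the much shorter way to do the same kind of thing.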

The next cell is a _code_ cell. When you run it, it will get the source code for twint from GitHub for you.

In [None]:
## Get the code for twint from github - if you've already done this 
## step, re-running it will give you an error.  That's ok, just move
## on to the next step! 
!git clone --depth=1 https://github.com/twintproject/twint.git

In [None]:
## Grab all the required packages and install twint
## This will print out a lot of text

## Using && means pip only runs if the cd into the twint directory succeeded
!(cd twint && pip install . -r requirements.txt)
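If you'd like to check whether the install worked before moving on, one possible approach is to ask Python whether it can find a package called `twint`. This is a minimal sketch using the standard library; the helper name `is_installed` is made up for this example, and you may need to restart the kernel before a freshly installed package is found:

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# Prints True or False depending on whether the install succeeded
print("twint installed:", is_installed("twint"))
```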

## Run some queries

Now that we've installed twint we can try to scrape some tweets using its command line interface.

The following command gets tweets from the `@edinunilel` account using the `-u` option. You'll see it just dumps out a bunch of information for each tweet, including the tweet text itself.

In [None]:
!twint -u edinunilel

### Try some other query options
Now let's try using a search term and some other options: 
* `-s vaccination` says look for tweets that contain the term 'vaccination'
* `-o vaccination.csv` says write out the results to the file vaccination.csv
* `--csv` says write the output in CSV format (CSV = comma-separated values, but by default here the columns are actually separated by tabs!)
* `--limit 200` says only get the most recent 200 tweets

In [None]:

!twint -s vaccination -o vaccination.csv --csv --limit 200
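If you want to try several search terms, it can help to see how the options above fit together into one command. The sketch below only builds the command text, it doesn't run anything; `build_twint_command` is a hypothetical helper written for this example, and it uses only the four options described above. It also shows that a multi-word search term needs quoting on the command line:

```python
import shlex

def build_twint_command(search, output_file, limit):
    """Assemble a twint search command as a single shell-safe string."""
    args = ["twint", "-s", search, "-o", output_file, "--csv", "--limit", str(limit)]
    # shlex.join adds quotes around any argument that contains spaces
    return shlex.join(args)

print(build_twint_command("vaccination", "vaccination.csv", 200))
print(build_twint_command("flu vaccine", "flu.csv", 20))
```

You could paste a printed command into a code cell with a `!` in front of it to run it.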

### Have a look at the output

Now if you go back to the Jupyter dashboard (click on the 'Jupyter' icon at the top of this page), you should see vaccination.csv in your list of files.

Alternatively, we can take a peek at the file using the `head` command below. You can see that the column with the tweet text is called 'tweet'.

*Note:* Even though we asked for a comma-separated values (CSV) file, the columns are actually separated by tabs (TSV).

In [None]:
## Have a look at the top 10 lines of the vaccination.csv file we just created with the query above
!head -n 10 vaccination.csv
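If you're not sure whether a file like this is separated by tabs or commas, a rough check is to count both characters in the header line. This is a simple heuristic sketch, not something twint provides; the function names are made up, and the example header is invented rather than real twint output:

```python
def guess_delimiter(header_line):
    """Guess tab vs comma by counting which appears more often in the header."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","

def file_delimiter(path):
    """Apply the same guess to the first line of a file."""
    with open(path, encoding="utf-8") as f:
        return guess_delimiter(f.readline())

# Invented header lines, just to show the idea
print(repr(guess_delimiter("id\tdate\ttweet")))
print(repr(guess_delimiter("id,date,tweet")))
```

Once the query above has produced the file, `file_delimiter("vaccination.csv")` would apply the same check to it.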

### Even more options

You can modify your queries with other command-line options, listed here:

https://github.com/twintproject/twint/wiki/Basic-usage

In [None]:
## try writing your own query here! 



## Do it in Python instead!

This class isn't about programming, so **you can stop here** if you don't have experience with Python!

If you already know some Python, you can inspect your data and run queries using Python instead of the command line.

For example, you can inspect and process your data using the `pandas` package.



In [None]:

import pandas as pd

## Read in the *tab separated* csv file you just made 
tweets = pd.read_csv('vaccination.csv', sep="\t")

## Have a look at the first 20 rows
tweets.head(20)
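If pandas isn't available, the same tab-separated file can also be read with Python's built-in `csv` module. The sketch below is an alternative to the pandas cell, not a replacement for it. It uses a few invented rows in place of vaccination.csv so that it runs on its own; only the 'tweet' column name comes from the real output, and `read_tweet_texts` is a hypothetical helper:

```python
import csv
import io

# Invented sample rows standing in for vaccination.csv (tab-separated)
sample = (
    "id\tusername\ttweet\n"
    "1\talice\tGot my vaccination today\n"
    "2\tbob\tVaccination clinic opens Monday\n"
)

def read_tweet_texts(file_obj):
    """Return the 'tweet' column of a tab-separated file as a list of strings."""
    reader = csv.DictReader(file_obj, delimiter="\t")
    return [row["tweet"] for row in reader]

texts = read_tweet_texts(io.StringIO(sample))
print(texts)
```

To use it on the real file, you would pass `open('vaccination.csv', newline='', encoding='utf-8')` instead of the `io.StringIO` object.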

You can also `import twint` as a package within your own Python code.
See: https://github.com/twintproject/twint/wiki/Scraping-functions 


In [None]:
## In order to use twint from Python in Jupyter notebooks you also need nest_asyncio
!pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()

In [None]:
## You might need to add our twint install to your Python path, or just restart the notebook
import sys
sys.path.append("./twint")


In [None]:
## Run a query inside python
## For more examples: https://github.com/twintproject/twint/wiki/Scraping-functions 

## Get the tweets for the LEL twitter account from within python code
import twint
c = twint.Config()
c.Username = "edinunilel"
c.Pandas = True

twint.run.Search(c)
Tweets_df = twint.storage.panda.Tweets_df



In [None]:
## print the first few rows of the output
print(Tweets_df.head())

## Write just the tweet column to a file
Tweets_df['tweet'].to_csv('edinunilel-twint-python.txt', index=False)


In [None]:
## Take a look at the top of the file
!head edinunilel-twint-python.txt