# Using minet in a Jupyter notebook

`minet` is first and foremost a python CLI tool and library, so it can naturally be used in a Jupyter notebook, which makes it a very good tool to experiment with. For instance, it can easily be used by students in digital humanities, data science or web mining classes.

## Installing minet in a Jupyter notebook

On a machine that supports python, `minet` can be installed with `pip` in your favorite python environment before starting your notebook:

```bash
pip install minet
```

But sometimes the notebooks won't run on your machine, for instance if you use notebooks hosted by a university for its students, or when using [Google Colab](https://colab.research.google.com/). We wholeheartedly recommend the latter as a way to let anyone experiment with python notebooks without the drag of installing a working python environment on a wide variety of machines and OSes.

But don't worry, there is an easy solution: one often overlooked feature of Jupyter notebooks is their ability to run shell commands when a line of a code cell is prefixed with `!`, as in the example below:

In [1]:
!pip install minet



Now that `minet` seems to be installed, let's run it to make sure and to check which version we will be working with:

In [2]:
!minet --version

minet 0.47.0
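
The `!` prefix is a notebook convenience. If you need the output of a command in a python variable, or want the same cell to work outside Jupyter, a rough equivalent can be sketched with the standard `subprocess` module. The helper name `run_command` is made up for illustration and is not part of `minet`.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return what it printed on stdout, stripped."""
    # check=True raises an error if the command exits with a failure code
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Harmless demonstration: ask the current python interpreter to print a word.
print(run_command([sys.executable, "-c", "print('hello')"]))

# With minet installed, the version check above would look like:
# print(run_command(["minet", "--version"]))
```

Passing the command as a list of arguments avoids any shell quoting issue, which is handy when a search query contains spaces or quotes.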


## Using minet as a command line tool from a notebook

Let's use minet to scrape the tweets from [@medialab_ScPo](https://twitter.com/medialab_ScPo)!

In [3]:
!minet twitter scrape tweets "from:medialab_ScPo" > tweets.csv

Searching for "from:medialab_ScPo"                                  
Collecting tweets: 437 tweets [00:05, 74.60 tweets/s, queries=1, tokens=1]


This should work quickly and when the command ends, you should have a `tweets.csv` file in your working directory containing all the relevant tweets.

If you are working on Google Colab, this file can be accessed by clicking the "folder" icon on your left, which reveals your virtual filesystem. Be sure to download the file before exiting the notebook, because Google Colab will delete any file you produced when the notebook shuts down, to free resources.

If at some point you are lost or don't remember how to use `minet` as a command line tool, be sure to either use the `-h/--help` flag or check the relevant [docs](https://github.com/medialab/minet/blob/master/docs/cli.md).

In [4]:
!minet twitter scrape --help

usage: minet twitter scrape [-h] [--include-refs] [-l LIMIT] [-o OUTPUT]
                            [--query-template QUERY_TEMPLATE] [-s SELECT]
                            {tweets} query [file]

Minet Twitter Scrape Command

Scrape Twitter's public facing search API to collect tweets etc.

positional arguments:
  {tweets}                         What to scrape. Currently only `tweets` is possible.
  query                            Search query or name of the column containing queries to run in given CSV file.
  file                             Optional CSV file containing the queries to be run.

optional arguments:
  -h, --help                       show this help message and exit
  --include-refs                   Whether to emit referenced tweets (quoted, retweeted & replied) in the CSV output. Note that it consumes a memory proportional to the total number of unique tweets retrieved.
  -l LIMIT, --limit LIMIT          Maximum number of tweets to collect per query

So now you should be able to compose your own commands to scrape tweets you might need. Twitter's search is quite complete and also supports a variety of search operators which you can discover by using the advanced search widget [here](https://twitter.com/search-advanced?f=live).

Never forget to use the `--limit` flag when testing a query or when searching for generic things. Otherwise your command may never end, and your hard drive will probably choke at some point if you attempt to retrieve billions of tweets :)

## Processing the results from a notebook

As you are in a python environment, you can of course use the capabilities of the language to explore the resulting CSV files further.

Here is an example printing our `tweets.csv` file's headers and displaying the lab's most retweeted tweet:

In [5]:
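import csv

# newline='' is what the csv module recommends when opening files
with open('./tweets.csv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    print('Headers:', reader.fieldnames)
    print()

    # Counts are read as strings, so cast to int before comparing
    best_tweet = max(reader, key=lambda x: int(x['retweet_count']))
    print('Best tweet with %s retweets:' % best_tweet['retweet_count'], best_tweet['text'])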

Headers: ['query', 'id', 'timestamp_utc', 'local_time', 'user_screen_name', 'text', 'possibly_sensitive', 'retweet_count', 'like_count', 'reply_count', 'lang', 'to_username', 'to_userid', 'to_tweetid', 'source_name', 'source_url', 'user_location', 'lat', 'lng', 'user_id', 'user_name', 'user_verified', 'user_description', 'user_url', 'user_image', 'user_tweets', 'user_followers', 'user_friends', 'user_likes', 'user_lists', 'user_created_at', 'user_timestamp_utc', 'collected_via', 'match_query', 'retweeted_id', 'retweeted_user', 'retweeted_user_id', 'retweeted_timestamp_utc', 'quoted_id', 'quoted_user', 'quoted_user_id', 'quoted_timestamp_utc', 'collection_time', 'url', 'place_country_code', 'place_name', 'place_type', 'place_coordinates', 'links', 'media_urls', 'media_files', 'media_types', 'mentioned_names', 'mentioned_ids', 'hashtags', 'intervention_type', 'intervention_text', 'intervention_url']

Best tweet with 63 retweets: Ça sort aujourd'hui en libraire : l'ouvrage "Culture #Numér
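
If you want more than the single best tweet, the same idea extends to a "top N" with `heapq.nlargest`. This is only a sketch: the rows in `SAMPLE_CSV` are made up so that the cell runs without a real `tweets.csv`, and the helper name `top_retweeted` is hypothetical. Only the `text` and `retweet_count` column names come from the headers listed above.

```python
import csv
import heapq
import io

# Made-up rows standing in for tweets.csv; only two of the real columns are kept.
SAMPLE_CSV = """text,retweet_count
first example tweet,3
second example tweet,12
third example tweet,7
"""


def top_retweeted(file_obj, n=2):
    """Return the n rows with the highest retweet_count, best first."""
    reader = csv.DictReader(file_obj)
    # Counts are strings in the CSV, so cast them before comparing
    return heapq.nlargest(n, reader, key=lambda row: int(row["retweet_count"]))


for row in top_retweeted(io.StringIO(SAMPLE_CSV)):
    print(row["retweet_count"], row["text"])
```

With a real file, you would pass the object returned by `open('./tweets.csv')` instead of the `io.StringIO` wrapper.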

Here is another example using the `pandas` library:

In [6]:
import pandas as pd

df = pd.read_csv('./tweets.csv')
df.head()

| | query | id | timestamp_utc | local_time | user_screen_name | text | possibly_sensitive | retweet_count | like_count | reply_count | ... | links | media_urls | media_files | media_types | mentioned_names | mentioned_ids | hashtags | intervention_type | intervention_text | intervention_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | from:medialab_ScPo | 1371864480540397573 | 1615909354 | 2021-03-16T16:42:34 | medialab_ScPo | Mardi 23/03, @brunopatino est l'invité du sémi... | 0.0 | 7 | 13 | 0 | ... | https://medialab.sciencespo.fr/actu/lien-entre... | | | | brunopatino | 11227042.0 | | | | |
| 1 | from:medialab_ScPo | 1371408961740673027 | 1615800750 | 2021-03-15T10:32:30 | medialab_ScPo | Digital Growth Strategies : is there a new way... | 0.0 | 7 | 11 | 0 | ... | https://medialab.sciencespo.fr/en/news/digital... | | | | | | | | | |
| 2 | from:medialab_ScPo | 1367834292966068226 | 1614948483 | 2021-03-05T13:48:03 | medialab_ScPo | La Silicon Valley : un écosystème d’innovation... | 0.0 | 0 | 1 | 1 | ... | https://sciencespo.zoom.us/meeting/register/tJ... | | | | | | | | | |
| 3 | from:medialab_ScPo | 1367750981585354754 | 1614928620 | 2021-03-05T08:17:00 | medialab_ScPo | Aujourd'hui, 14h30 -> TransNum s'intéresse à l... | 0.0 | 4 | 9 | 0 | ... | https://medialab.sciencespo.fr/actu/silicon-va... | | | | | | | | | |
| 4 | from:medialab_ScPo | 1366433462723436550 | 1614614499 | 2021-03-01T17:01:39 | medialab_ScPo | Qu'est-ce qui fait de la Silicon Valley un mod... | 0.0 | 10 | 13 | 0 | ... | https://medialab.sciencespo.fr/actu/silicon-va... | | | | | | | | | |


## Using minet as a python library

Having been developed in python, `minet` can also be used as a python library if you ever need to. This can be useful if you need to integrate `minet` into your own python workflow or just need to customize things. Just know that the command line tool often handles a lot of things for you that you will, as a result, need to handle yourself when working from python.

The relevant documentation can be found [here](https://github.com/medialab/minet/blob/master/docs/lib.md).

Here is an example of Twitter scraping directly from python:

In [7]:
from minet.twitter import TwitterAPIScraper

In [8]:
scraper = TwitterAPIScraper()

for tweet in scraper.search('from:medialab_ScPo'):
    print(tweet['text'])
    break

Mardi 23/03, @brunopatino est l'invité du séminaire du médialab. A travers ses derniers ouvrages, il abordera les liens entre journalisme, information et environnement techno économique.
Ouvert à tous, sur inscription : https://medialab.sciencespo.fr/actu/lien-entre-journalisme-information-et-environnement-techno-economique-le-cas-de-la-revolution-numerique/
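
The `break` in the loop above stops after the first tweet. To keep a fixed number of items from an iterator, `itertools.islice` is one option. The sketch below uses a made-up `fake_search` generator so that it runs offline. Using it with the real scraper assumes that `scraper.search` yields tweets lazily, which the loop above suggests but which you should check against the library docs.

```python
from itertools import islice


def fake_search(query):
    """Hypothetical stand-in for a scraper: yields tweet-like dicts forever."""
    i = 0
    while True:
        i += 1
        yield {"text": "tweet %d matching %s" % (i, query)}


# islice stops pulling from the generator after 3 items, so the
# endless stream above is never exhausted.
first_tweets = list(islice(fake_search("from:medialab_ScPo"), 3))

for tweet in first_tweets:
    print(tweet["text"])
```

This plays the same role as the `--limit` flag of the command line tool: the cap is set up front, so a generic query cannot run away.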


## Parting words

Now you should have a clear idea of how to use `minet` from a Jupyter notebook, for your personal use or for students in a class, for instance.

However, note that Jupyter notebooks, as wonderful as they are, are not a good way to industrialize data collection processes and are not meant to be long-running (e.g. when collecting tweets, you might want to let your command run for days). For this kind of endeavour, traditional tools such as a Unix shell and some sysadmin skills can't be beaten.

Finally, some `minet` commands might require that you provide credentials to access some APIs (such as YouTube's). Be sure not to publish those, for instance if you work on Google Colab, so that people can't steal them from you on the web.

If you need to feed those commands with credentials differently, check out the `minet` docs about config files and environment variables [here](https://github.com/medialab/minet/blob/master/docs/cli.md#minetrc). Note that both options can also be easy footguns on Google Colab, because leaving a config file in the open or having a cell that sets an environment variable are not good solutions :)
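
One way to keep a secret out of the notebook itself is to ask for it when the cell runs, using the standard `getpass` module. The sketch below is only an illustration: `load_api_key` is a hypothetical helper, and `MY_API_KEY` is a made-up variable name, not one that `minet` reads. How you then hand the key to a given `minet` command depends on that command, so refer to the docs linked above.

```python
import getpass
import os


def load_api_key(env_var="MY_API_KEY", prompt=getpass.getpass):
    """Return a secret without ever writing it in a notebook cell.

    `MY_API_KEY` is a hypothetical variable name, not one minet reads.
    """
    # Prefer a variable defined outside the notebook (e.g. in your shell).
    key = os.environ.get(env_var)
    if key:
        return key
    # Otherwise ask at run time: the input is not echoed and is not
    # saved in the notebook's cells or outputs.
    return prompt("API key: ")


# In a notebook you would call it like this (left commented out here
# because it waits for keyboard input):
# api_key = load_api_key()
```

The `prompt` parameter exists only so the function can be tried without a keyboard, for instance by passing a function that returns a dummy value.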