Map out a network of sites by following links and keep looking when finding trigger words.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore
BLOGPOST.md
ContentSpider.py
LinkSpider.py
Pipfile
Pipfile.lock
README.md
common.py
config.py
scrape.py

README.md

TriggerScraper

TriggerScraper is a scraper to map out clusters of sites containing some trigger words. Built on top of grab.

Install

  1. $ pip install pipenv (if needed)
  2. $ brew install redis (if needed)
  3. $ pipenv install

Run

  1. Run Redis locally: $ redis-server in a terminal window
  2. $ pipenv run python scrape.py

Settings

  • Modify common.py with our triggerwords. By default it's some Swedish words strongly correlated with the alt-right strongly focused on immigration.

Example

After some time, the scraper will have saved something like this to current_findings.csv:

    |  domain                                  |  ratio               |  triggered  |  n_links
----|------------------------------------------|----------------------|-------------|---------
6   |  http://avpixlat.info                    |  7.774193548387097   |  210        |  31
25  |  http://petterssonsblogg.se              |  4.835680751173709   |  817        |  213
10  |  http://gruvmor.wordpress.com            |  3.8                 |  28         |  10
31  |  http://thoralfalfsson.webblogg.se       |  3.6484375           |  339        |  128
33  |  http://tobbesmedieblogg.blogspot.se     |  2.583333333333333   |  19         |  12
9   |  http://galnegunnarsblogg.wordpress.com  |  2.388888888888889   |  250        |  180
27  |  http://samnytt.se                       |  2.193548387096774   |  74         |  62
13  |  http://imittsverige.blogspot.se         |  1.98                |  49         |  50
7   |  http://everykindapeople.blogspot.se     |  1.9                 |  9          |  10
30  |  http://thoralf.bloggplatsen.se          |  1.7986577181208054  |  119        |  149
32  |  http://tobbesmedieblogg.blogspot.com    |  1.75                |  9          |  12
38  |  http://www.abcnyheter.se                |  1.7457627118644068  |  44         |  59
37  |  http://varjager.wordpress.com           |  1.7267441860465116  |  125        |  172
15  |  http://integrationsbloggen.blogspot.se  |  1.694736842105263   |  66         |  95
2   |  http://aktualia.wordpress.com           |  1.6551724137931034  |  19         |  29
42  |  http://www.dagenssamhalle.se            |  1.5                 |  5          |  10
18  |  http://jihadimalmo.blogspot.se          |  1.3711340206185567  |  36         |  97
36  |  http://twitter.com                      |  1.3414634146341464  |  14         |  41
0   |  http://affes.wordpress.com              |  1.3333333333333333  |  33         |  99
22  |  http://morklaggning.wordpress.com       |  1.2421052631578948  |  23         |  95
8   |  http://friatider.se                     |  1.1009174311926606  |  11         |  109
47  |  http://www.magnussandelin.se            |  1.0714285714285714  |  3          |  42
35  |  http://tullberg.org                     |  1.0535714285714286  |  3          |  56
11  |  http://gudmundson.blogspot.se           |  1.0454545454545454  |  3          |  66