TriggerScraper

TriggerScraper is a scraper that maps out clusters of sites containing a given set of trigger words. It is built on top of Grab.

Install

  1. $ pip install pipenv (if needed)
  2. $ brew install redis (if needed)
  3. $ pipenv install

Run

  1. Start Redis locally by running $ redis-server in a terminal window
  2. $ pipenv run python scrape.py

Settings

  • Modify common.py with your trigger words. By default it contains Swedish words strongly correlated with the alt-right, with a strong focus on immigration.
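
How the words in common.py are matched is not shown here. As a minimal sketch, assuming the trigger words are kept in a plain Python list and matched case-insensitively as whole words, counting hits in a page's text could look like this. The names TRIGGER_WORDS and count_triggers and the placeholder words are hypothetical, not taken from the repository.

```python
import re

# Hypothetical placeholder list; the real words live in common.py.
TRIGGER_WORDS = ["exampleword", "another"]


def count_triggers(text, words=TRIGGER_WORDS):
    """Count case-insensitive whole-word occurrences of any trigger word."""
    if not words:
        return 0
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))
```

Whole-word matching avoids counting a trigger word that only appears inside a longer word; a plain substring match would be a simpler but noisier alternative.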

Example

After some time, the scraper will have saved something like this to current_findings.csv:

    |  domain                                  |  ratio               |  triggered  |  n_links
----|------------------------------------------|----------------------|-------------|---------
6   |  http://avpixlat.info                    |  7.774193548387097   |  210        |  31
25  |  http://petterssonsblogg.se              |  4.835680751173709   |  817        |  213
10  |  http://gruvmor.wordpress.com            |  3.8                 |  28         |  10
31  |  http://thoralfalfsson.webblogg.se       |  3.6484375           |  339        |  128
33  |  http://tobbesmedieblogg.blogspot.se     |  2.583333333333333   |  19         |  12
9   |  http://galnegunnarsblogg.wordpress.com  |  2.388888888888889   |  250        |  180
27  |  http://samnytt.se                       |  2.193548387096774   |  74         |  62
13  |  http://imittsverige.blogspot.se         |  1.98                |  49         |  50
7   |  http://everykindapeople.blogspot.se     |  1.9                 |  9          |  10
30  |  http://thoralf.bloggplatsen.se          |  1.7986577181208054  |  119        |  149
32  |  http://tobbesmedieblogg.blogspot.com    |  1.75                |  9          |  12
38  |  http://www.abcnyheter.se                |  1.7457627118644068  |  44         |  59
37  |  http://varjager.wordpress.com           |  1.7267441860465116  |  125        |  172
15  |  http://integrationsbloggen.blogspot.se  |  1.694736842105263   |  66         |  95
2   |  http://aktualia.wordpress.com           |  1.6551724137931034  |  19         |  29
42  |  http://www.dagenssamhalle.se            |  1.5                 |  5          |  10
18  |  http://jihadimalmo.blogspot.se          |  1.3711340206185567  |  36         |  97
36  |  http://twitter.com                      |  1.3414634146341464  |  14         |  41
0   |  http://affes.wordpress.com              |  1.3333333333333333  |  33         |  99
22  |  http://morklaggning.wordpress.com       |  1.2421052631578948  |  23         |  95
8   |  http://friatider.se                     |  1.1009174311926606  |  11         |  109
47  |  http://www.magnussandelin.se            |  1.0714285714285714  |  3          |  42
35  |  http://tullberg.org                     |  1.0535714285714286  |  3          |  56
11  |  http://gudmundson.blogspot.se           |  1.0454545454545454  |  3          |  66
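
The ratio column in the sample above is consistent with 1 + triggered / n_links for every row shown. This is inferred from the sample values only, not from the scraper's source, so treat the helper below (a hypothetical name) as a sanity check rather than the project's actual formula.

```python
def ratio(triggered, n_links):
    """Ratio as it appears in the sample rows: 1 + triggered / n_links."""
    return 1 + triggered / n_links


# A few (domain, triggered, n_links, ratio) rows copied from the table above.
rows = [
    ("http://avpixlat.info", 210, 31, 7.774193548387097),
    ("http://gruvmor.wordpress.com", 28, 10, 3.8),
    ("http://imittsverige.blogspot.se", 49, 50, 1.98),
]

for domain, triggered, n_links, expected in rows:
    assert abs(ratio(triggered, n_links) - expected) < 1e-9, domain
```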

About

Maps out a network of sites by following links, and keeps crawling onward from pages where trigger words are found.
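
As a rough illustration of that idea (not the project's implementation, which uses Grab and Redis), the sketch below runs a breadth-first crawl over a small in-memory link graph and only follows links from pages that contain a trigger word. The PAGES data, the crawl function, and the substring match are all assumptions made for the example.

```python
from collections import deque

# Hypothetical in-memory "web": url -> (page text, outgoing links).
PAGES = {
    "a.example": ("contains trigger", ["b.example", "c.example"]),
    "b.example": ("nothing of interest", ["d.example"]),
    "c.example": ("trigger again", ["d.example"]),
    "d.example": ("plain page", []),
}


def crawl(start, trigger_words, pages=PAGES):
    """Visit pages breadth-first, expanding links only from triggered pages."""
    seen = {start}
    queue = deque([start])
    triggered = []
    while queue:
        url = queue.popleft()
        text, links = pages.get(url, ("", []))
        if not any(word in text.lower() for word in trigger_words):
            continue  # no trigger word: do not follow this page's links
        triggered.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return triggered
```

Stopping at pages without trigger words keeps the crawl inside the cluster of interest instead of wandering across the wider web.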
