Merge branch 'main' of github-top_FIBers:osome-iu/top-FIBers into main
Truthy committed Jun 2, 2023
2 parents 3f0b04a + b9fd6cc commit 86c5eca
Showing 6 changed files with 21 additions and 21 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -2,11 +2,11 @@
Code to find and rank the top superspreaders of misinformation on Twitter using the FIB-index.

### Links
- [Official documentation](https://www.matthewdeverna.com/top-FIBers/)
- [Official documentation](https://osome-iu.github.io/top-FIBers/)
- [Dashboard](https://osome.iu.edu/tools/topfibers/)
- [FIB Index working paper](https://arxiv.org/abs/2207.09524)

### Creators
Top FIBers is a project of the [Observatory on Social Media](https://osome.iu.edu/) (OSoMe, pronounced "awesome") at Indiana University. The following individuals have contributed to this project: [Matthew R. DeVerna](https://www.matthewdeverna.com/), [Pasan Kamburugamuwa](https://iuni.iu.edu/about/people/person/pasan), [Nick Liu](https://iuni.iu.edu/about/people/person/nick_liu), [Kaicheng Yang](https://www.kaichengyang.me/), [Ben Serrette](https://iuni.iu.edu/about/people/person/ben-serrette), and [Filippo Menczer](https://cnets.indiana.edu/fil/).

The best way to contact the team is by using the contact information found at the [OSoMe website](https://osome.iu.edu/about/contact).
The best way to contact the team is by using the contact information found at the [OSoMe website](https://osome.iu.edu/about/contact).
8 changes: 4 additions & 4 deletions docs/README.md
@@ -3,17 +3,17 @@ The Top FIBers dashboard tracks and reports on the top ten superspreaders of low

This site makes up the official documentation of everything that is needed to know about the project.

- Source code: https://github.com/mr-devs/top-fibers
- Documentation code: https://github.com/mr-devs/top-fibers/tree/main/docs
- Source code: https://github.com/osome-iu/top-FIBers
- Documentation code: https://github.com/osome-iu/top-FIBers/tree/main/docs
- Website: https://osome.iu.edu/tools/topfibers/
- Frontend repo : https://github.iu.edu/truthy-team/TopFIBers-dashboard

💥‼️ **If you are new here, start with this page: [Updating this documentation](./documentation.md)** ‼️💥

### Contents
- [Code](./code/code.md)
- [Understanding the project workflow](./code/overview.md)
- [System architecture](./code/architecture.md)
- [Code details](./code/details.md)
- [Data](./data.md)
- [FIB index](./fib_index.md)
- [Setting up the project](./setup/setup.md)
@@ -27,4 +27,4 @@ This site makes up the official documentation of everything that is needed to kn
- Database & website: Pasan Kamburugamuwa (pkamburu@iu.edu)

> This site is managed with [GitHub Pages](https://pages.github.com/).
> Configuration and design details are mostly specified with the [_config.yml](https://github.com/mr-devs/top-FIBers/blob/3cc7d9946abab4990c18ff66b425f874cbd11ce1/docs/_config.yml) file.
> Configuration and design details are mostly specified with the [_config.yml](https://github.com/mr-devs/top-FIBers/blob/3cc7d9946abab4990c18ff66b425f874cbd11ce1/docs/_config.yml) file.
8 changes: 4 additions & 4 deletions docs/code/architecture.md
@@ -1,6 +1,6 @@
---
title: "System architecture"
last_modified: "2023-05-07"
last_modified: "2023-05-23"
---
> Last modified: {{ page.last_modified | date: "%Y-%m-%d"}}
@@ -10,12 +10,12 @@ All data and analysis are kept on the `lenny` machine in this repository:
/home/data/apps/topfibers/
```

Note that this projects [repository](https://github.com/mr-devs/top-fibers) is cloned via `git` while _inside of the `topfibers/` directory_ and the name of the repository is specified as `repo`.
Note that this projects [repository](https://github.com/osome-iu/top-FIBers) is cloned via `git` while _inside of the `topfibers/` directory_ and the name of the repository is specified as `repo`.

E.g. via:
```
git clone git@github.com:mr-devs/top-FIBers.git repo
git clone git@github.com:osome-iu/top-FIBers repo
```

### Database and website
The database and the website code are kept on `lisa`.
The database and the website code are kept on `lisa`. Once the database is updated the website (https://osome.iu.edu/tools/topfibers) automatically updates with the latest data. Find the front-end code [here](https://github.iu.edu/truthy-team/TopFIBers-dashboard).
4 changes: 2 additions & 2 deletions docs/code/code.md
@@ -9,6 +9,6 @@ last_modified: "2023-05-07"
- [System architecture](./architecture.md)

### Code details
For details on how specific scripts work, please see the [repository](https://github.com/mr-devs/top-FIBers/tree/main) to review the code itself.
For details on how specific scripts work, please see the [repository](https://github.com/osome-iu/top-FIBers) to review the code itself.
We have taken an extra effort to heavily comment code and thoroughly document the repository so that it is as clear as possible.
If you believe anything needs clarification, please [open an issue](https://github.com/mr-devs/top-FIBers/issues) or submit a pull request.
If you believe anything needs clarification, please [open an issue](https://github.com/mr-devs/top-FIBers/issues) or submit a pull request.
14 changes: 7 additions & 7 deletions docs/code/overview.md
@@ -1,6 +1,6 @@
---
title: "Understanding the project workflow"
last_modified: "2023-05-07"
last_modified: "2023-05-24"
---
> Last modified: {{ page.last_modified | date: "%Y-%m-%d"}}
@@ -16,8 +16,8 @@ Please see the [Data](../data.md) page for all details.

### 2. Preparing the data for processing
After data has been retrieved and moved to the `/home/data/apps/topfibers/repo/data/raw` directory, the pipeline works by creating subdirectories full of symbolic links (kept here: `/home/data/apps/topfibers/repo/data/symbolic_links`) that point to the raw files for each platform.

This is done with the [`scripts/data_prep/create_data_file_symlinks.py`](https://github.com/mr-devs/top-FIBers/blob/4a597ed2d38a597323b8e58857aa279f55b93144/scripts/data_prep/create_data_file_symlinks.py) script.
https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_prep/create_data_file_symlinks.py
This is done with the [`scripts/data_prep/create_data_file_symlinks.py`](https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_prep/create_data_file_symlinks.py) script.

The structure of the `symbolic_links` directory is as follows:
```
@@ -37,7 +37,7 @@ Inside of each `YYYY_MM` subdirectory are symbolic links to the data used to cal
├── 2021-11-01__tweets_w_links.jsonl.gzip -> /home/data/apps/topfibers/repo/data/raw/twitter/2021-11-01__tweets_w_links.jsonl.gzip
└── 2021-12-01__tweets_w_links.jsonl.gzip -> /home/data/apps/topfibers/repo/data/raw/twitter/2021-12-01__tweets_w_links.jsonl.gzip
```
This approach allows us to use the [`scripts/data_prep/create_data_file_symlinks.py`](https://github.com/mr-devs/top-FIBers/blob/4a597ed2d38a597323b8e58857aa279f55b93144/scripts/data_prep/create_data_file_symlinks.py) script to generate unique reports for different time periods (i.e., more or less months than the standard three).
This approach allows us to use the [`scripts/data_prep/create_data_file_symlinks.py`](https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_prep/create_data_file_symlinks.py) script to generate unique reports for different time periods (i.e., more or less months than the standard three).
These directories of symbolic links are then provided as input to generate the FIB-index output files.
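
The symlink step described above can be sketched roughly as follows. This is a minimal illustration and not the code in `create_data_file_symlinks.py`: the function name `link_month_files`, its parameters, and the rule of matching raw files by a `YYYY-MM` filename prefix are all assumptions made for this example, based only on the file names shown in the listing above.

```python
from pathlib import Path


def link_month_files(raw_dir: Path, links_dir: Path, months: list[str]) -> list[Path]:
    """Create symlinks in `links_dir` that point to raw files for `months`.

    Hypothetical helper: `months` holds "YYYY-MM" strings, and a raw file is
    selected when its name starts with one of them (an assumed convention).
    """
    links_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for raw_file in sorted(raw_dir.iterdir()):
        if not raw_file.is_file():
            continue
        if not any(raw_file.name.startswith(month) for month in months):
            continue
        link = links_dir / raw_file.name
        # Replace a stale link so the function can be rerun safely.
        if link.is_symlink():
            link.unlink()
        link.symlink_to(raw_file.resolve())
        created.append(link)
    return created
```

Under these assumptions, a call such as `link_month_files(raw_dir, links_dir, ["2021-11", "2021-12"])` would populate one `YYYY_MM` subdirectory with links to the two monthly files shown above. Passing a longer or shorter list of months is what would allow reports over non-standard periods.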

### 3. Generating FIB-index output file
@@ -49,8 +49,8 @@ With the data gathered in the first process, we generate two output files each m

> Notes:
> 1. Both files are generated by `calc_{platform}_fib_indices.py`
> - [facebook script](https://github.com/mr-devs/top-FIBers/blob/d94389ec79409eac1154acb1d778eb7c03a751fa/scripts/data_processing/calc_crowdtangle_fib_indices.py)
> - [twitter script](https://github.com/mr-devs/top-FIBers/blob/d94389ec79409eac1154acb1d778eb7c03a751fa/scripts/data_processing/calc_twitter_fib_indices.py)
> - [facebook script](https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_processing/calc_crowdtangle_fib_indices.py)
> - [twitter script](https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_processing/calc_twitter_fib_indices.py)
> 2. `YYYY_mm_dd` represents the date the file is generated
> 3. `platform` will be either `crowdtangle` (for facebook data) or `twitter`
@@ -63,4 +63,4 @@ After all of the above has been completed, code kept in the `data-loader/` direc
Specifically, the `run_data_loader.sh` script is executed by the monthly bash script which runs the `data-loader/server.py` script.

### 6. The front end
Once the database is updated the website (https://osome.iu.edu/tools/topfibers) automatically updates with the latest data.
Once the database is updated the website (https://osome.iu.edu/tools/topfibers) automatically updates with the latest data. Find the front-end code [here](https://github.iu.edu/truthy-team/TopFIBers-dashboard).
4 changes: 2 additions & 2 deletions docs/data.md
@@ -1,6 +1,6 @@
---
title: "Data"
last_modified: "2023-05-07"
last_modified: "2023-05-23"
---
> Last modified: {{ page.last_modified | date: "%Y-%m-%d"}}
@@ -33,7 +33,7 @@ This project utilizes two data sources:

- We utilize the [`posts/search` endpoint](https://github.com/CrowdTangle/API/wiki/Search) with elevated access (up to 10k posts per request).
- This page also contains information on the format of posts returned by CT
- The script that downloads data is: [`scripts/data_collection/crowdtangle_dl_fb_links.py`](https://github.com/mr-devs/top-FIBers/blob/2d076ea29ba5df11b848c0c033a3662fdfd0cfe6/scripts/data_collection/crowdtangle_dl_fb_links.py)
- The script that downloads data is: [`scripts/data_collection/crowdtangle_dl_fb_links.py`](https://github.com/osome-iu/top-FIBers/blob/main/scripts/data_collection/crowdtangle_dl_fb_links.py)
- The API key can be found on the `lenny` machine saved here:
```
/u/truthy/.top_fib_CT_setup