asf_online_data_exploration

Summary

This repo contains scripts for exploring online data from sources such as Twitter, online news outlets or online forums, with the main aim of generating ideas for ASF projects involving opinions/thoughts shared by different types of users online.

Data Collection

Our data collection pipeline contains functions to collect data from Twitter's API v2 recent search endpoint and from The Guardian Open Platform content endpoint.

Both require developer credentials, which are set as environment variables (see Setup).
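As a rough illustration of what a request to each endpoint looks like, the sketch below only builds the request URLs and headers and sends nothing. The helper names, the `heat pumps` query and the parameter defaults are assumptions made for this example; they are not taken from `recent_search_twitter.py` or `the_guardian.py`.

```python
from urllib.parse import urlencode

# Public endpoint URLs for the two services.
TWITTER_RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"
GUARDIAN_CONTENT_URL = "https://content.guardianapis.com/search"


def build_twitter_request(query, bearer_token, max_results=10):
    """Return (url, headers) for a recent search request (hypothetical helper)."""
    params = {"query": query, "max_results": max_results}
    # Twitter API v2 authenticates app-only requests with a bearer token header.
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return f"{TWITTER_RECENT_SEARCH_URL}?{urlencode(params)}", headers


def build_guardian_request(query, api_key, page_size=10):
    """Return the URL for a content search request (hypothetical helper)."""
    # The Guardian Open Platform takes the key as an "api-key" query parameter.
    params = {"q": query, "page-size": page_size, "api-key": api_key}
    return f"{GUARDIAN_CONTENT_URL}?{urlencode(params)}"


# Placeholder query and credentials, for illustration only.
twitter_url, twitter_headers = build_twitter_request("heat pumps", "abc")
guardian_url = build_guardian_request("heat pumps", "test")
```

The returned URL and headers could then be passed to any HTTP client; keeping request construction separate from sending makes the parameters easy to inspect and test without network access.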

Processing

Coming soon...

Prototype

Coming soon...

Setup

- Meet the data science cookiecutter requirements, in brief:
  - Install: direnv and conda
- Run make install to configure the development environment:
  - Set up the conda environment
  - Configure pre-commit
- Run direnv allow
- Activate the conda environment:
  - conda activate asf_online_data_exploration
- Set your credentials as environment variables:
  - export BEARER_TOKEN="ADD_YOUR_BEARER_TOKEN_HERE", replacing ADD_YOUR_BEARER_TOKEN_HERE with your bearer token.
  - export GUARDIAN_API_KEY="ADD_YOUR_API_KEY_HERE", replacing ADD_YOUR_API_KEY_HERE with your API key. Alternatively, set export GUARDIAN_API_KEY="test"
- Run conda install -c conda-forge vega-cli vega-lite-cli. If that doesn't work, follow the instructions here
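Before running any data collection, it can help to confirm that both credentials are visible to Python. The following is a minimal sketch, assuming the two variable names used above; `load_credentials` is a hypothetical helper written for this example and is not part of the repo.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARIABLES = ("BEARER_TOKEN", "GUARDIAN_API_KEY")


def load_credentials(env=None):
    """Return the credentials as a dict, or raise if any are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Calling `load_credentials()` with no arguments reads from the current shell environment, so it fails with a message naming the missing variables if the `export` steps were skipped.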

Folder structure

asf_online_data_exploration/
├─ analysis/
├─ config/
│  ├─ base.yaml - file paths info
│  ├─ data_collection_parameters.py - parameters for data collection
├─ getters/
├─ notebooks/
├─ pipeline/
│  ├─ data_collection/
│  │  ├─ recent_search_twitter.py - functions to collect data from Twitter's recent search endpoint
│  │  ├─ the_guardian.py - functions to collect data from The Guardian Open Platform content endpoint
│  │  ├─ tests/
│  │  │  ├─ testing_recent_search_twitter.py - functions to test data collection pipeline for Twitter's recent search endpoint
│  │  │  ├─ testing_the_guardian.py - functions to test data collection pipeline for The Guardian Open Platform content endpoint
├─ utils/
│  ├─ data_collection_utils.py - utility functions for retrieving and uploading data
inputs/
outputs/

Contributor guidelines

Technical and working style guidelines

Project based on Nesta's data science project template (Read the docs here).
