MINT-Data-Sync

Scripts used by MINT or other systems to download new datasets as they become available and register them in the MINT Data Catalog.

Instructions

1. Clone this repo

git clone https://github.com/mintproject/MINT-Data-Sync.git

2. Go into the directory

cd MINT-Data-Sync

3. Build Docker image

docker build -t mint-data-sync .

4. Run it

docker run -e "earthdata_username=REPLACE_ME" -e "earthdata_password=REPLACE_ME" -e "mint_data_username=REPLACE_ME" -e "mint_data_password=REPLACE_ME" -it --rm mint-data-sync:latest

Currently, we sync GLDAS data, which requires an Earthdata login; hence the earthdata_username and earthdata_password variables above.
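Inside the container, a script can pick these credentials up from its environment. The following is a minimal sketch, assuming the variable names match those passed with -e in the run command; the helper load_credentials is hypothetical and is not taken from this repo, whose scripts may read their configuration differently.

```python
import os

# Variable names as passed with `-e` in the `docker run` command.
REQUIRED_VARS = (
    "earthdata_username",
    "earthdata_password",
    "mint_data_username",
    "mint_data_password",
)


def load_credentials(env=None):
    """Return the required credentials as a dict, or raise if any are missing.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the list of missing names makes a misconfigured container easier to diagnose than a failed login later in the run.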

By default, the container starts a cron process that triggers the sync.py script every day at 01:00. To change the schedule, edit the cronjobs file and rebuild the Docker image.

Adding new data sources

To add a new data source, you need to write a scraper that checks the source for data availability. Assuming the scraper is implemented, the general data sync process goes as follows:

1. Check the data source for the latest available data (e.g., by temporal coverage)
2. Check the MINT Data Catalog for the latest available data
3. If there is a mismatch, generate a list of missing resources based on steps 1 and 2
4. [Optional] Download the missing resources
5. [Optional] Upload them to MINT data storage
6. Generate the appropriate resource metadata
7. Register the missing resources in the MINT Data Catalog
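The process above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source publishes one resource per day, the optional download and upload steps are skipped, and the source and catalog objects, their methods (latest_date, url_for, register), and the metadata fields are all hypothetical rather than the actual scraper or MINT Data Catalog API.

```python
from datetime import timedelta


def missing_dates(latest_at_source, latest_in_catalog):
    """List the days available at the source but not yet in the catalog.

    Assumes daily resources; returns an empty list when the catalog is current.
    """
    days = (latest_at_source - latest_in_catalog).days
    return [latest_in_catalog + timedelta(days=i) for i in range(1, days + 1)]


def build_metadata(day, url):
    """Build a metadata record; field names are hypothetical, not the catalog schema."""
    return {
        "name": f"resource_{day.isoformat()}",
        "data_url": url,
        "start_time": day.isoformat(),
        "end_time": day.isoformat(),
    }


def sync(source, catalog):
    """Run one sync pass and return the records that were registered.

    `source` stands in for a scraper and `catalog` for a catalog client;
    both are hypothetical objects supplied by the caller.
    """
    # Compare the latest data at the source with the latest in the catalog.
    to_add = missing_dates(source.latest_date(), catalog.latest_date())
    # Generate metadata for every missing resource.
    records = [build_metadata(day, source.url_for(day)) for day in to_add]
    # Register only when there is something new.
    if records:
        catalog.register(records)
    return records
```

Keeping the date comparison in its own function makes it easy to test a new scraper without touching the catalog.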
