Commit

Merge pull request neherlab#68 from neherlab/tsvDumping
Revised Tsv and parsing to JSON
rneher committed Mar 29, 2020
2 parents e8f254a + 47e162e commit 8ef490a
Showing 519 changed files with 10,217 additions and 7,871 deletions.
46 changes: 28 additions & 18 deletions README.md
@@ -159,30 +159,32 @@ To run the parsers, call
python3 covid19_scenarios_data/generate_data.py --fetch
```

This will update the tables and generate new jsons in the `assets` folder.
This will update the tables in the directory `case-counts`.
For each parser there is a separate directory which contains individual case counts for each location covered by the parser.

To only run specific parsers, run

```shell
python3 covid19_scenarios_data/generate_data.py --fetch --parsers netherlands switzerland
```

To copy the output jsons to a specific place (e.g. to deploy to an app), run
To generate JSONs for the app, specify the path of the target location. This can be done either together with updating the `tsv` files or separately, depending on whether the command is run with `--fetch` or not.

```shell
python3 covid19_scenarios_data/generate_data.py --fetch \
python3 covid19_scenarios_data/generate_data.py \
--output-cases path/case-counts.json \
--output-population path/population.json
```

To generate the integrated scenario json, run

```shell
python3 covid19_scenarios_data/generate_data.py --fetch \
python3 covid19_scenarios_data/generate_data.py \
--output-cases path/case-counts.json \
--output-scenarios path/scenarios.json
```


## Contents

### Country codes
@@ -208,27 +210,27 @@ We are actively looking for people to supply data to be used for our modeling!

## Contributing and curating data:

### Adding case count data for a new region:
### Adding parser or case count data for a new region:

The steps to follow are:

##### Identify a source for case count data that is updated frequently (at least daily) as the outbreak evolves.

- Write a script that downloads and converts raw data into TSV format
- Write a script that downloads and converts raw data into a dict of lists of lists, e.g. `{'<country>': [['2020-03-20', 1, 0, ...], ['2020-03-21', 2, 0, ...]]}`
- Columns: [time, cases, deaths, hospitalized, ICU, recovered]
- **Important:** all columns must be cumulative data.
- The time column **must** be a string formatted as `YYYY-MM-DD`
- Try to keep the same order of columns for hygiene, although it should not ultimately matter
- If data is missing, please leave the entry empty
- Use the store_data() function in utils to store the data into .tsv and .json files automatically
- If data is missing, please leave the entry empty (e.g., `['2020-03-20', 1, None, None, ...]`)
- Use the `store_data()` function in `utils` to store the data as `.tsv` files automatically
- Ensure that the data provided to store_data() is well formatted
- The keys in the datastructure provided to utils should be
- For countries: U.N. country names (see country_codes.csv), or
- For states within countries: <TLC>-<state>, where <TLC> is the three letter code for the country (see country_codes.csv), and <state> is the state name
- The second parameter is the string identifying your parser (see sources.json entry below)
- Place the script into the parsers directory
- The name should correspond to the region name desired in the scenario.
- There **must** be a function parse() defined that calls store_data() from utils
- Ensure that the path provided to store_data() is well formatted
- The structure of the directory is Region/Sub-Region/Country/
- Region and Sub-Region are designated as per the U.N.
- U.N. designations are found within country_codes.csv
- Please use only the U.N. designated name for the country, region, and sub-region.
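
The data structure described above can be sketched as follows. This is a minimal illustration, not the repository's code: `state_key` and `to_rows` are hypothetical helpers, the key `CHE-Zürich` is only an illustrative guess at a `<TLC>-<state>` key (check `country_codes.csv` for the real codes and names), and the counts are the placeholder values from the example above. The call to `store_data()` is left as a comment because its exact signature should be checked in `utils`.

```python
from datetime import date

# Column order from the list above; every count is cumulative.
COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]


def state_key(tlc, state):
    """Build a '<TLC>-<state>' key (hypothetical helper)."""
    return f"{tlc}-{state}"


def to_rows(records):
    """Convert dicts with a datetime.date under 'time' into ordered rows.

    Rows are sorted by date, the date becomes a YYYY-MM-DD string, and
    any column missing from a record becomes None.
    """
    rows = []
    for rec in sorted(records, key=lambda r: r["time"]):
        day = rec["time"].isoformat()  # date.isoformat() yields YYYY-MM-DD
        rows.append([day] + [rec.get(col) for col in COLUMNS[1:]])
    return rows


# Placeholder records, deliberately out of order to show the sorting.
raw = [
    {"time": date(2020, 3, 21), "cases": 2, "deaths": 0},
    {"time": date(2020, 3, 20), "cases": 1, "deaths": 0},
]
data = {state_key("CHE", "Zürich"): to_rows(raw)}

# In a real parser, parse() would now hand `data` and the parser name
# to store_data() from utils.
```

Keeping the conversion in a small function like `to_rows` makes it easy to reuse for every location a parser covers.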

##### Update the _sources.json_ file to contain all relevant metadata.

@@ -237,6 +239,17 @@ The steps to follow are:
- dataProvenance = The organization behind the data collection
- license = The license governing the usage of data

##### Test your parser and create a Pull Request

- Create the appropriate directory in case-counts/
- Test your parser from the directory above (outside your covid19_scenarios_data folder) using

```shell
python3 covid19_scenarios_data/generate_data.py --fetch --parsers <yourparsername>
```

- Check the resulting output in case-counts/<yourparsername>/, and add the files to your Pull Request together with the parser and sources.json
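
Before opening the Pull Request, it can help to check the parsed rows against the formatting rules listed earlier (dates as `YYYY-MM-DD`, cumulative columns). The sketch below is one possible check, not part of the repository: `check_rows` is a hypothetical helper that assumes the rows of one location are in chronological order and follow the list-of-lists layout described above.

```python
from datetime import datetime


def check_rows(rows):
    """Return a list of problems found in one location's rows (empty if none)."""
    problems = []
    last_seen = {}  # column index -> last non-missing value
    for row in rows:
        day = row[0]
        try:
            parsed = datetime.strptime(day, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"bad date: {day!r}")
            continue
        # strptime also accepts unpadded dates such as 2020-3-5, so compare
        # against the strictly formatted string.
        if parsed.strftime("%Y-%m-%d") != day:
            problems.append(f"bad date: {day!r}")
            continue
        for i, value in enumerate(row[1:], start=1):
            if value is None:  # missing entries are allowed
                continue
            if i in last_seen and value < last_seen[i]:
                problems.append(
                    f"{day}: column {i} decreases from {last_seen[i]} to {value}"
                )
            last_seen[i] = value
    return problems
```

A reported decrease is a prompt to look at the source data rather than proof of a parser bug, since providers occasionally revise cumulative counts downward.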

##### Add populations data for the additional regions/states.

Case count data is most useful when tied to data on the population it refers to. To ensure new case counts are correctly included in the population presets, add a line to the `populationData.tsv` for each new region (see [Adding/editing population data for a country and/or region](#addingediting-population-data-for-a-country-andor-region) below).
@@ -246,12 +259,9 @@ Case count data is most useful when tied to data on the population it refers to.
We note that this option is not preferred relative to a script that automatically updates as outlined above.
However, if there are no accessible data sources, one can enter the data manually. To do so:

##### Commit a manually entered file into the correct directory
##### Commit a manually entered file into the "manuals" directory

- The structure of the directory is Region/Sub-Region/Country/
- Region and Sub-Region are designated as per the U.N.
- U.N. designations are found within country_codes.csv
- Please use only the U.N. designated name for the country, region, and sub-region.
- Please use only the U.N. designated name for the country; the file name should be `<country>.tsv`.
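
A manually entered file can also be produced from a short script. The sketch below shows one way to serialize rows as tab-separated text, with missing values left empty. The header row and exact layout are assumptions based on the column list given earlier in this README; compare against an existing `.tsv` file in the repository before committing. `rows_to_tsv` is a hypothetical helper.

```python
import csv
import io

# Assumed header, taken from the column list earlier in this README.
COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text; None becomes an empty field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


# Placeholder row; the result would be saved as <country>.tsv.
text = rows_to_tsv([["2020-03-20", 1, None, None, None, None]])
```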

### Adding/editing population data for a country and/or region:

1 change: 0 additions & 1 deletion assets/case_counts.json

This file was deleted.

1 change: 0 additions & 1 deletion assets/population.json

This file was deleted.

11 changes: 0 additions & 11 deletions case-counts/Asia/Eastern Asia/China/Anhui.tsv

This file was deleted.

17 changes: 0 additions & 17 deletions case-counts/Asia/Eastern Asia/China/Chongqing.tsv

This file was deleted.

16 changes: 0 additions & 16 deletions case-counts/Asia/Eastern Asia/China/Fujian.tsv

This file was deleted.

12 changes: 0 additions & 12 deletions case-counts/Asia/Eastern Asia/China/Guangdong.tsv

This file was deleted.

11 changes: 0 additions & 11 deletions case-counts/Asia/Eastern Asia/China/Guangxi.tsv

This file was deleted.

7 changes: 0 additions & 7 deletions case-counts/Asia/Eastern Asia/China/Guizhou.tsv

This file was deleted.

18 changes: 0 additions & 18 deletions case-counts/Asia/Eastern Asia/China/Hainan.tsv

This file was deleted.

9 changes: 0 additions & 9 deletions case-counts/Asia/Eastern Asia/China/Hebei.tsv

This file was deleted.

13 changes: 0 additions & 13 deletions case-counts/Asia/Eastern Asia/China/Heilongjiang.tsv

This file was deleted.

23 changes: 0 additions & 23 deletions case-counts/Asia/Eastern Asia/China/Henan.tsv

This file was deleted.

12 changes: 0 additions & 12 deletions case-counts/Asia/Eastern Asia/China/Hubei.tsv

This file was deleted.

7 changes: 0 additions & 7 deletions case-counts/Asia/Eastern Asia/China/Hunan.tsv

This file was deleted.

13 changes: 0 additions & 13 deletions case-counts/Asia/Eastern Asia/China/Jiangsu.tsv

This file was deleted.

8 changes: 0 additions & 8 deletions case-counts/Asia/Eastern Asia/China/Jiangxi.tsv

This file was deleted.

20 changes: 0 additions & 20 deletions case-counts/Asia/Eastern Asia/China/Jilin.tsv

This file was deleted.

12 changes: 0 additions & 12 deletions case-counts/Asia/Eastern Asia/China/Liaoning.tsv

This file was deleted.

15 changes: 0 additions & 15 deletions case-counts/Asia/Eastern Asia/China/Neimenggu.tsv

This file was deleted.

15 changes: 0 additions & 15 deletions case-counts/Asia/Eastern Asia/China/Ningxia.tsv

This file was deleted.

8 changes: 0 additions & 8 deletions case-counts/Asia/Eastern Asia/China/Qinghai.tsv

This file was deleted.

13 changes: 0 additions & 13 deletions case-counts/Asia/Eastern Asia/China/Shaanxi.tsv

This file was deleted.
