Make input.sh compatible with all OSes (re-write input.sh using python) #26

Closed
ilyasst opened this issue Jun 19, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@ilyasst
Collaborator

ilyasst commented Jun 19, 2020

Is your feature request related to a problem? Please describe.
Currently, input.sh works on Ubuntu (it might work on macOS if SVN is available, but I did not test it); however, it definitely cannot be used on Windows.

Describe the solution you'd like
input.sh could be rewritten in Python, which would make it possible to run it on any OS as long as the Python environment is properly set up.
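To illustrate the idea, here is a minimal sketch of an OS-independent download helper that uses only the Python standard library. It is not the actual input.py from the pull request; the function name `download` and the default `input` directory are assumptions made for this example.

```python
import urllib.request
from pathlib import Path


def download(url, directory="input", filename=None):
    """Save the resource at `url` into `directory` and return the local path.

    Hypothetical helper for illustration only (not part of covsirphy).
    """
    target_dir = Path(directory)
    # pathlib handles path separators on Ubuntu, macOS and Windows alike
    target_dir.mkdir(parents=True, exist_ok=True)
    # Default to the last component of the URL as the file name
    name = filename or url.rstrip("/").rsplit("/", 1)[-1]
    target = target_dir / name
    urllib.request.urlretrieve(url, str(target))
    return target
```

Because no shell tools such as svn are called, the same script should run wherever a Python interpreter is available.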

@lisphilar
Owner

Dear @ilyasst ,
Thank you very much for your proposal and pull request!
input.py is very useful, and the script was successfully merged into the master branch!

@lisphilar
Owner

lisphilar commented Jun 20, 2020

Dear @ilyasst ,
As the next step, I plan to create a Python class CovsirPhy.DataLoader. This will download the datasets automatically and show the citations of the datasets.

For users who are not Kagglers,

  1. The number of cases (Global): directly download JHU data
  2. The number of cases in Japan: will be discussed in Add example dataset to this repository #17
  3. Total population: will be discussed in Automatic downloading of dataset: total population #29
  4. OxCGRT: GitHub repository, as in the previous versions

(Kaggle users can download them manually with input.py.)

I will create a pull request for "1. The number of cases" later.

@lisphilar
Owner

lisphilar commented Jun 24, 2020

Dear @ilyasst ,
covsirphy.cleaning.data_loader.DataLoader was created for automatic downloading of the JHU/Japan/OxCGRT data. (The data loader for the population dataset is pending. #29)

Please kindly confirm it with the default branch (version 2.2.5).
Example code is as follows.

import covsirphy as cs
# Set the directory to save the datasets
data_loader = cs.DataLoader("input")
# JHU dataset
jhu_data = data_loader.jhu()
print(jhu_data.citation)
jhu_data.cleaned()
# The number of cases in Japan
japan_data = data_loader.japan()
print(japan_data.citation)
jhu_data.replace(japan_data)
ncov_df = jhu_data.cleaned()
# OxCGRT dataset
oxcgrt_data = data_loader.oxcgrt()
print(oxcgrt_data.citation)
oxcgrt_df = oxcgrt_data.cleaned()
jpn_oxcgrt_df = oxcgrt_data.subset(iso3="JPN")

input.py was also updated.

@ilyasst
Collaborator Author

ilyasst commented Jun 24, 2020

I pulled the code from master and followed the "For developers" guide. I had no problems with the installation, and I was able to download the JHU and OxCGRT datasets with the method you provided above.

I was also able to use input.py to download all the datasets, but only when the kaggle.json file was stored in ~/.kaggle. It is not possible to simply put the kaggle.json file in the same folder as input.py because of a modification in f837386. The OS environment variable "KAGGLE_CONFIG_DIR" must be set before the KaggleApi library is loaded; otherwise it fails to detect the kaggle.json file.

I will submit a PR for this shortly.
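The ordering problem described above can be sketched roughly as follows. This is not the code from the pull request: the helper names `configure_kaggle_dir` and `load_kaggle_api` are hypothetical, and the sketch assumes the `kaggle` package is installed when `load_kaggle_api` is called.

```python
import os
from pathlib import Path


def configure_kaggle_dir(directory="."):
    """Point KAGGLE_CONFIG_DIR at `directory` if it contains kaggle.json.

    Hypothetical helper. Returns the resolved directory, or None when no
    kaggle.json is found there (the default ~/.kaggle lookup then applies).
    """
    config_dir = Path(directory).resolve()
    if not (config_dir / "kaggle.json").exists():
        return None
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)
    return config_dir


def load_kaggle_api(directory="."):
    """Set the environment variable first, then import and authenticate."""
    configure_kaggle_dir(directory)
    # The import is deliberately placed after the variable is set,
    # because the kaggle package looks for kaggle.json when it is loaded.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api
```

Keeping the import inside the function is unusual style, but it makes the required order explicit: the variable is always set before the library is loaded.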

@lisphilar
Owner

Dear @ilyasst ,
Thank you for your pull request. I merged it.

However, I don't recommend keeping kaggle.json in your working directory for security reasons: it may accidentally leak your API keys.
I plan to stop using the Kaggle API because we can replace the Kaggle datasets (secondary data) with datasets provided by primary sources. Kagglers can import Kaggle datasets into their Kaggle notebooks with the GUI and load them with the local_file argument.

import covsirphy as cs
data_loader = cs.DataLoader(directory=None)
# Load datasets already imported into the Kaggle notebook
jhu_data = data_loader.jhu(local_file="kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv")
japan_data = data_loader.japan(local_file="kaggle/input/covid19-dataset-in-japan/covid_jpn_total.csv")

Currently, the OxCGRT dataset on Kaggle is provided as an Excel file, so we need to convert it to a CSV file.
https://www.kaggle.com/paultimothymooney/oxford-covid19-government-response-tracker

The differences between the primary and Kaggle datasets will be adjusted using the covsirphy.cleaning sub-module.
What do you think about this?

I will add the DataLoader.population method and update README.md within several days.

@lisphilar lisphilar mentioned this issue Jun 27, 2020
@lisphilar
Owner

Dear @ilyasst ,
Please confirm that the data loader for the population dataset is included in the default branch.

import covsirphy as cs
# Set the directory to save the datasets
data_loader = cs.DataLoader("input")
# Population in each country
population_data = data_loader.population()

README.md was also updated.
Thank you.

@lisphilar
Owner

Because this change was applied, I will close this issue. Thank you.

2 participants