[Good First Issue]🌄Implement a better demo data module #34

Closed
MooooCat opened this issue Oct 26, 2023 · 1 comment
Labels
difficulty-easy good first issue Good for newcomers help wanted Extra attention is needed

Comments

@MooooCat
Contributor

🚅Search before asking

I have searched for issues similar to this one.

🚅Description

Currently, demo data can only be obtained through the function sdgx.utils.io.csv_utils.get_demo_single_table, and only one dataset (Adult) is supported. This issue asks for a more systematic demo data management module.

🏕Solution

We recommend moving this module out of the script sdgx/utils/io/csv_utils.py and implementing it as a separate script in the sdgx/utils/io/ directory.

We recommend creating a file named demo_data.py and implementing the functions or class in that file.

🍰 Example

We provide a class example for your reference:

# A DemoData example
import pandas as pd


class DemoData:
    def __init__(self, dataset_name: str) -> None:
        # The dataset name should be validated here.
        pass

    def get_data(self, offline_path=None) -> pd.DataFrame:
        # If offline_path is not None,
        # read the data from that path.
        pass

    def download_data(self) -> None:
        # Fetch the dataset from the Internet.
        pass
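
To make the skeleton above more concrete, here is a minimal sketch of how the name check and the offline_path branch might look. The registry `_DEMO_DATASETS`, the `cache_dir` parameter, and the file name `adult.csv` are assumptions made for illustration only; they are not part of the current sdgx code.

```python
from pathlib import Path
from typing import Optional

import pandas as pd

# Hypothetical registry (assumption): dataset name -> CSV file name.
_DEMO_DATASETS = {"adult": "adult.csv"}


class DemoData:
    def __init__(self, dataset_name: str, cache_dir: str = "dataset") -> None:
        # Reject unknown dataset names early, with a helpful message.
        if dataset_name not in _DEMO_DATASETS:
            known = ", ".join(sorted(_DEMO_DATASETS))
            raise ValueError(
                f"Unknown demo dataset {dataset_name!r}; available: {known}"
            )
        self.dataset_name = dataset_name
        # cache_dir is a hypothetical parameter; the default mirrors the
        # dataset/ directory mentioned in this issue.
        self.cache_path = Path(cache_dir) / _DEMO_DATASETS[dataset_name]

    def get_data(self, offline_path: Optional[str] = None) -> pd.DataFrame:
        # Prefer the user-supplied file; otherwise fall back to the cache.
        path = Path(offline_path) if offline_path is not None else self.cache_path
        if not path.exists():
            raise FileNotFoundError(f"Demo data file not found: {path}")
        return pd.read_csv(path)
```

With this shape, `DemoData("adult").get_data(offline_path="my_adult.csv")` reads the local file, and a misspelled dataset name fails in the constructor instead of at read time.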

⚙️ Detail

Some operations that enhance user experience are also worthwhile, such as:

  • Once many datasets are supported, it is unreasonable to store every dataset in the dataset/ directory. We expect to support downloading the target dataset from the Internet, which helps reduce the size of the git repository.
  • Because network speeds in mainland China can be slow, you can ask the development team to upload some of the larger datasets and provide download links for them. These links will be faster than the datasets' original links.
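
The two points above could be combined into a download helper that tries a list of sources in order, for example a faster mirror first and the original link second. This is only a sketch using the Python standard library; the function name `download_with_fallback` and its parameters are hypothetical, and the real URLs would have to be supplied by the development team.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Sequence


def download_with_fallback(
    urls: Sequence[str], dest: Path, timeout: float = 30.0
) -> Path:
    """Try each URL in order and save the first successful download to dest."""
    if dest.exists():
        # Already cached locally; skip the network entirely.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    last_error = None
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                with open(tmp, "wb") as out:
                    shutil.copyfileobj(resp, out)
            # Rename only after a complete download, so a partial file
            # is never mistaken for a valid cached dataset.
            tmp.replace(dest)
            return dest
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
    if tmp.exists():
        tmp.unlink()
    raise RuntimeError(f"All download sources failed for {dest.name}") from last_error
```

Writing to a temporary `.part` file first is a design choice: an interrupted download leaves no file at `dest`, so the next call retries instead of loading truncated data.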
@MooooCat MooooCat added good first issue Good for newcomers help wanted Extra attention is needed difficulty-easy labels Oct 26, 2023
@Wh1isper
Collaborator

We will provide demo load functions/classes as the project develops.

@Wh1isper Wh1isper closed this as not planned Dec 20, 2023