GitHub - kitamoto-lab/pyphoon2: Dataset loader for the Digital Typhoon Project

pyphoon2

About The Project

The Digital Typhoon Project aims to be an example of applying meteoinformatics to large-scale real-world issues. Its two primary challenges are (1) to build large-scale scientific databases for the typhoon image collection, which are the foundation of meteoinformatics, and (2) to establish algorithms and database models for discovering information and knowledge useful for typhoon analysis and prediction.

The Dataloader contained within this project addresses the first challenge. It is built for the Kitamoto Lab typhoon dataset and its structure, and is designed to be an easily accessible PyTorch-based interface to the dataset. Through it, the user can (1) access typhoon images by index, typhoon ID, or season, (2) load all data into memory if desired, and (3) randomly split the dataset into buckets for model training, by image, season, or typhoon ID, while preventing leakage between the buckets.

See readthedocs documentation here.

Getting Started

Prerequisites

This project uses:

python3
torch
torchvision
numpy
pandas
h5py
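
Before installing, it can be handy to check which of these packages are already available in the current environment. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration and is not part of pyphoon2.

```python
import importlib.util

# Import names of the prerequisites listed above.
REQUIRED = ("torch", "torchvision", "numpy", "pandas", "h5py")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

Any package reported as missing can be installed with `pip3` before installing pyphoon2.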

Installation

Clone and enter the repo

```
git clone https://github.com/kitamoto-lab/pyphoon2
cd pyphoon2
```

Install the package
```
pip3 install .
```
To uninstall, run
```
pip3 uninstall pyphoon2
```

Usage

Below is a brief example of how to initialize and access data using the DataLoader:

Import the Dataset class

```
from pyphoon2.DigitalTyphoonDataset import DigitalTyphoonDataset
```

You can also import the submodules DigitalTyphoonSequence, DigitalTyphoonImage, and DigitalTyphoonUtils in the same way if desired:

```
from pyphoon2.DigitalTyphoonSequence import DigitalTyphoonSequence
from pyphoon2.DigitalTyphoonImage import DigitalTyphoonImage
from pyphoon2.DigitalTyphoonUtils import *
```

Instantiate the loader

```
# See the documentation for a description of the optional parameters.
dataset_obj = DigitalTyphoonDataset("/path/to/image/directory/",
                                    "/path/to/metadata/directory/",
                                    "/path/to/metadata.json",
                                    ('grade', 'lat', 'long'),  # labels to return when indexing
                                    split_dataset_by='sequence',
                                    load_data_into_memory=False,
                                    ignore_list=[],
                                    verbose=True)
```

The dataset object is now instantiated and you can use the data in the desired fashion. Some examples include:

Get the length of the dataset
```
length = len(dataset_obj)
```

Get the item at the i'th index

```
# labels correspond to the labels passed in on instantiation or set via dataset.set_labels()
image_array, labels = dataset_obj[i]
image_obj = dataset_obj.get_image_at_idx(i)
image_obj.image()  # Get the image pixels in a numpy array
image_obj.year()   # Get the year the image was taken
image_obj.grade()  # Get the grade of the typhoon at the time of the image
# For a full list of accessible image metadata, see the documentation
```

Split the dataset into train, test, and validation sets

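```
from torch.utils.data import DataLoader

train, test, val = dataset_obj.random_split([0.7, 0.15, 0.15], split_by='sequence')

# Wrap the training split in a standard PyTorch DataLoader
trainloader = DataLoader(train, batch_size=16, shuffle=True)
```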
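
To illustrate why splitting by sequence prevents leakage, here is a minimal, library-independent sketch of a group-wise split. It is not pyphoon2's implementation; the function `split_by_group` and the toy sequence IDs are assumptions made up for this example. The idea is to shuffle whole groups (for instance typhoon sequences) and assign each group to exactly one bucket, so images from the same typhoon never end up in both the training and test sets.

```python
import random
from collections import defaultdict


def split_by_group(group_ids, fractions, seed=0):
    """Split sample indices into buckets so that no group spans two buckets.

    group_ids[i] is the group (e.g. a typhoon sequence ID) of sample i.
    The fractions are applied to the number of groups, not of samples.
    """
    groups = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        groups[gid].append(idx)

    ids = sorted(groups)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility

    buckets, start, cumulative = [], 0, 0.0
    for k, frac in enumerate(fractions):
        cumulative += frac
        # The last bucket takes the remainder so every group is assigned.
        is_last = k == len(fractions) - 1
        end = len(ids) if is_last else round(cumulative * len(ids))
        buckets.append([i for gid in ids[start:end] for i in groups[gid]])
        start = end
    return buckets


# Hypothetical toy data: 10 sequences with 5 images each.
group_ids = [f"seq{n:02d}" for n in range(10) for _ in range(5)]
train, test, val = split_by_group(group_ids, [0.7, 0.15, 0.15])
```

Because whole groups are assigned at once, the resulting bucket sizes only approximate the requested fractions when groups differ in length.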
