Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adds inference_utils.py #2213

Merged
merged 8 commits into from
Jul 13, 2022
Merged

[WIP] Adds inference_utils.py #2213

merged 8 commits into from
Jul 13, 2022

Conversation

geoffreyangus
Copy link
Collaborator

No description provided.

@brightsparc
Copy link
Contributor

So if I look at inference_utils.py entry point, it looks like at a minimum I would need the following files (does this sound right @geoffreyangus ?)

ludwig
│   ├── constants.py
│   ├── utils
│   │   ├── audio_utils.py
│   │   ├── date_utils.py
│   │   ├── fs_utils.py (imported from audio_utils.py)
│   │   ├── image_utils.py
│   │   ├── inference_utils.py (entry point)
│   │   ├── string_utils.py (imported from torch_utils.py)
│   │   ├── torch_utils.py
│   │   ├── misc_utils.py
│   │   ├── nlp_utils.py (imported from tokenizers.py)
│   │   ├── types.py
│   │   ├── tokenizers.py (imported from torch_utils.py)

Which has the following list of deps:

import torch
import pandas
import torchaudio # not yet supported in snowflake
import torchvision
import fsspec
import h5py
import requests # not supported in snowflake
from filelock import FileLock
from fsspec.core import split_protocol

The latter of these are imported by fs_utils.py used by audio_utils.py for which get_bytes_obj_from_path has supported for reading from http.

Given the conda distribution doesn't have requests package, consider replacing with urllib3, eg the following function:

def get_bytes_obj_from_http_path(path: str) -> bytes:
    data = requests.get(path, stream=True)
    if data.status_code == 404:
        upgraded = upgrade_http(path)
        if upgraded:
            logging.info(f"reading url {path} failed. upgrading to https and retrying")
            return get_bytes_obj_from_http_path(upgraded)
        else:
            raise requests.exceptions.HTTPError(f"reading url {path} failed and cannot be upgraded to https")
    return data.raw.read()

Could have internals replaced with something like :

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/bytes/8')
r.data

Or if you wanted to stream to file like this:

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)
        if not data:
            break
        out.write(data)

r.release_conn()

Copy link
Contributor

@brightsparc brightsparc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the latest python libraries in snowflake are

  • certifi==2022.6.15 and
  • urllib3==1.26.9

@github-actions
Copy link

github-actions bot commented Jul 13, 2022

Unit Test Results

       6 files  +    2         6 suites  +2   2h 28m 7s ⏱️ + 1h 8m 38s
2 912 tests  -   18  2 866 ✔️  -   18    46 💤 ±  0  0 ±0 
8 736 runs  +288  8 594 ✔️ +242  142 💤 +46  0 ±0 

Results for commit 6730a91. ± Comparison against base commit 740148a.

♻️ This comment has been updated with latest results.

@geoffreyangus geoffreyangus marked this pull request as ready for review July 13, 2022 22:30
@geoffreyangus geoffreyangus merged commit 376c494 into master Jul 13, 2022
@geoffreyangus geoffreyangus deleted the add-inference-utils branch July 13, 2022 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants