Unilist

📑 Load any newline-separated file as a generator.

Unilist currently reads files from the local filesystem, HTTP(S) endpoints, and S3 URIs, in plaintext, CSV, and JSONL formats. You can also set up your own virtual URLs.
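To illustrate what "newline-separated file as a generator" means for the three formats above, here is a minimal standalone sketch using only the standard library. It is not Unilist's implementation: the function name `iter_records`, the suffix-based format detection, and the record shapes (strings for plaintext, dicts for CSV and JSONL) are all assumptions made for this example.

```python
import csv
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_records(path: str) -> Iterator[Any]:
    """Lazily yield one record per line (hypothetical helper, not Unilist's API).

    The format is guessed from the file suffix; a trailing .gz is
    decompressed transparently.
    """
    suffixes = Path(path).suffixes
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        suffixes = suffixes[:-1]
    kind = suffixes[-1] if suffixes else ""
    opener = gzip.open if compressed else open

    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        if kind == ".csv":
            # Assumption: the first row is a header, each row becomes a dict.
            yield from csv.DictReader(handle)
        elif kind == ".jsonl":
            for line in handle:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
        else:
            # Plaintext: one string per line, without the line ending.
            for line in handle:
                yield line.rstrip("\r\n")
```

Because the function is a generator, nothing is read until you iterate, so `next(iter_records(path))` touches only the start of the file, while `list(iter_records(path))` reads all of it.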

Install

Install and update using pip:

pip install unilist

Usage

from unilist import Unilist

lines = list(Unilist('./file.txt'))
print(lines)

csv = list(Unilist('https://example.com/file.csv'))
print(csv)

# requires Unilist.setup({ ... })
# or /usr/local/bin/aws
records = list(Unilist('s3://example/file.jsonl.gz'))
print(records)

S3 setup

boto3

If you don't mind an extra dependency (boto3), install with:

pip install unilist[boto3]

Example setup

Unilist.setup({
    's3': {
        'aws_access_key': '___your_access_key___',
        'aws_secret_access_key': '___your_secret_key___',
    },
})

awscli

Alternatively, you can provide the path to the aws binary:

Unilist.setup({
  's3': {
    'aws_bin': '/usr/local/bin/aws'
  }
})
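The README does not say how Unilist invokes the aws binary. As a rough sketch of how a CLI-backed reader can work, the example below builds an `aws s3 cp <uri> -` command (the trailing `-` tells the AWS CLI to write the object to stdout) and streams a command's output line by line. The helper names `s3_cp_command` and `stream_lines` are hypothetical and not part of Unilist.

```python
import subprocess
from typing import Iterator, List


def s3_cp_command(aws_bin: str, uri: str) -> List[str]:
    """Build `aws s3 cp <uri> -`, which streams the object to stdout."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    return [aws_bin, "s3", "cp", uri, "-"]


def stream_lines(command: List[str]) -> Iterator[str]:
    """Run a command and lazily yield its stdout, one line at a time."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\r\n")
    # Leaving the `with` block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

With the AWS CLI installed and credentials configured, `stream_lines(s3_cp_command('/usr/local/bin/aws', 's3://example/file.jsonl'))` would yield the object's lines. This sketch does no decompression, so a `.gz` object would come through as raw compressed bytes.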

Integration

pandas

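import pandas as pd
from unilist import Unilist

# 'vfs' is a virtual URL scheme; it has to be registered under 'virtual' in Unilist.setup (see Configuration)
df = pd.DataFrame(Unilist('vfs://path/to/file.jsonl'))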

Configuration

Unilist.setup({
    's3': {
        'cache_dir': '/tmp',
        'aws_access_key': '___your_access_key___',
        'aws_secret_access_key': '___your_secret_key___',
    },
    'virtual': {
        'vfs': '/custom/root/path',
        'c4': './local/c4',
    },
    'http': {
        'headers': {
            'accept': 'text/plain',
        },
        'encoding': 'utf-8',
    },
    'jsonl': {},
})
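The `virtual` section maps a URL scheme to a root path. The README does not spell out the exact resolution rules, so the sketch below shows one plausible reading: the part after `scheme://` is joined onto the configured root. The function `resolve_virtual` and this joining behaviour are assumptions for illustration, not Unilist's documented API.

```python
import posixpath
from typing import Dict

# Same scheme-to-root mapping as the 'virtual' section above.
VIRTUAL_ROOTS: Dict[str, str] = {
    "vfs": "/custom/root/path",
    "c4": "./local/c4",
}


def resolve_virtual(url: str, roots: Dict[str, str] = VIRTUAL_ROOTS) -> str:
    """Map scheme://rest to <root>/rest (assumed behaviour).

    URLs whose scheme is not a registered virtual scheme are returned as-is.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in roots:
        return url
    return posixpath.join(roots[scheme], rest.lstrip("/"))
```

Under this assumption, `vfs://path/to/file.jsonl` would resolve to `/custom/root/path/path/to/file.jsonl`, and an `s3://` or `https://` URL would pass through untouched.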

Development

Install from source

git clone git@github.com:petlack/unilist.git
cd unilist
pip install -e .
# or, with the optional boto3 extra
pip install -e '.[boto3]'

Run tests

pipenv run pytest

Meta

CONTRIBUTING

LICENSE (MIT)
