sourced

I don't like writing this in my Python scripts every time:

import json, os, requests
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos_path = 'photos.json'

# Fetch the file if it doesn't exist yet.
if not os.path.isfile(photos_path):
    res = requests.get(photos_url)
    with open(photos_path, 'wb') as f:
        f.write(res.content)

# Then, use the local copy.
with open(photos_path) as f:
    photos = json.load(f)

print(photos[0]['title'])

So this little library lets you write:

import sourced
photos_url = 'https://jsonplaceholder.typicode.com/photos'

photos = sourced.json('photos.json', url=photos_url)
print(photos[0]['title'])

It also offers some other handy behavior.

Documentation

Basic usage

Pass url, and optionally headers, to fetch and decode files from the web. Use encoding to set the encoding for text and JSON. It defaults to utf-8.

t = sourced.text('merci.txt', url=text_url, headers={'Accept-Language': 'fr'})
s = sourced.text('sozai.txt', url=sjis_url, encoding='shift_jis')
j = sourced.json('okay.json', url=json_url, headers={'Authorization': token})
b = sourced.binary('data.bin', url=binary_url)
assert isinstance(s, str)
assert isinstance(j, dict)
assert isinstance(b, bytes)

Alternatively, use create=file_creating_function to build the data locally instead of fetching it. The function should return a deserialized result (so, for JSON, a dict or list rather than a string).

def default_json(): return {'meow': 123}
j = sourced.json('my.json', create=default_json)
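
The same keyword presumably works for the other helpers too; a minimal sketch (the filename and default value here are made up):

def default_notes(): return 'nothing here yet'
notes = sourced.text('notes.txt', create=default_notes)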

Cache invalidation

Pass max_age='2 weeks' to invalidate cached files after 2 weeks. pytimeparse is used to parse the provided time delta.

stuff = sourced.json('stuff.json', url=my_url, max_age='5 days')
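
pytimeparse also accepts compact forms like '12h' or '1h 30m', so something like this should work as well (the filename and URL are placeholders):

news = sourced.json('news.json', url=news_url, max_age='12h')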

JSON paths

When you only need part of the data at url, you can provide a JSON path to select the parts you want. The JSON path library used is jsonpath-rw.

Use find=path to get an array of matches:

sourced.json('titles.json', url=url, find='[:10].title')

Use pick=path to keep just the first match:

sourced.json('titles.json', url=url, pick='[?(@.id == 99)]')
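
To make the difference concrete, here is a rough sketch of the shapes implied by the descriptions above, assuming a typical list-of-objects payload like the jsonplaceholder photos:

titles = sourced.json('titles.json', url=url, find='[:10].title')
assert isinstance(titles, list)   # find: a list of matches (here, up to 10 title strings)

photo = sourced.json('photo99.json', url=url, pick='[?(@.id == 99)]')
assert isinstance(photo, dict)    # pick: only the first match (here, one photo object)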

Pagination

Predefined pages

If url is a list, each URL is fetched and the results are concatenated with +. This is useful for text, and for JSON endpoints that return arrays.

sourced.json('combi.json', url=[url1, url2])
sourced.text('combi.txt', url=[url3, url4])
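
Since the results are joined with +, each JSON page should decode to a list and the combined result is one flat list; text pages are simply appended end to end. A small illustration (the URLs are placeholders):

parts = sourced.json('combi.json', url=[url1, url2])
# if url1 returns [1, 2] and url2 returns [3], parts == [1, 2, 3]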

URL-numbered pages

If url is https://blah.com/foo/%p1, then requests are made for https://blah.com/foo/1, https://blah.com/foo/2, … until a request returns empty text or an empty JSON array.

Use %p0 to start numbering pages from 0 instead.

sourced.json('tags.json', url='https://testbooru.donmai.us/tags.json?limit=500&page=%p1')
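
For an API whose pages start at 0, you would use %p0 instead; a hypothetical example:

sourced.json('items.json', url='https://example.com/items?page=%p0')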

JSON-pathed pages

If you're fetching JSON and next_page is a JSON path, it is evaluated against each page's result to get the URL of the next page, until it evaluates to null.

kanji = sourced.json('kanji.json',
    url='https://api.wanikani.com/v2/subjects?types=kanji',
    headers={'Wanikani-Revision': '20170710',
             'Authorization': 'Bearer %s' % wk_token},
    find='data[*].data.characters', next_page='pages.next_url')

Custom next-page function

If next_page is a function, it is called with each page's decoded result (e.g. the JSON object) to get the next page's URL.

f = lambda json: base_url + '&start_from=' + json['next_page_start']
sourced.json('a.json', url=base_url, next_page=f)
