I don't like writing this in my Python scripts every time:
import json, os, requests
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos_path = 'photos.json'
# Fetch the file if it doesn't exist yet.
if not os.path.isfile(photos_path):
    res = requests.get(photos_url)
    with open(photos_path, 'wb') as f:
        f.write(res.content)
# Then, use the local copy.
with open(photos_path) as f:
    photos = json.load(f)
print(photos[0]['title'])
So this little library lets you write:
import sourced
photos_url = 'https://jsonplaceholder.typicode.com/photos'
photos = sourced.json('photos.json', url=photos_url)
print(photos[0]['title'])
And offers some other handy behavior.
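Here is roughly what the one-liner saves you from writing: a minimal sketch of fetch-once-then-cache using only the standard library. The name `cached_json` is hypothetical and this is not `sourced`'s actual implementation. It assumes the server returns UTF-8 JSON and it skips headers, other encodings, and error handling.

```python
import json
import os
import urllib.request


def cached_json(path, url):
    """Hypothetical stand-in for sourced.json(path, url=url):
    download url to path only if path doesn't exist yet, then decode the local copy."""
    if not os.path.isfile(path):
        # First run: fetch the raw bytes and save them unchanged.
        with urllib.request.urlopen(url) as res:
            body = res.read()
        with open(path, 'wb') as f:
            f.write(body)
    # Every run: read and decode the local copy.
    with open(path, encoding='utf-8') as f:
        return json.load(f)
```

Calling `cached_json('photos.json', 'https://jsonplaceholder.typicode.com/photos')` twice downloads the data only on the first call.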
Pass `url`, and optionally `headers`, to fetch and decode files from the web.
Use `encoding` to set the encoding for text and JSON. It defaults to `utf-8`.
t = sourced.text('merci.txt', url=text_url, headers={'Accept-Language': 'fr'})
s = sourced.text('sozai.txt', url=sjis_url, encoding='shift_jis')
j = sourced.json('okay.json', url=json_url, headers={'Authorization': token})
b = sourced.binary('data.bin', url=binary_url)
assert isinstance(s, str)
assert isinstance(j, dict)
assert isinstance(b, bytes)
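The `encoding` option matters because the same bytes decode to different text (or fail to decode) depending on the codec. Below is a minimal sketch of the decoding step, using a hypothetical `decode_body` helper (not part of `sourced`) that turns downloaded bytes into what `binary`, `text`, and `json` return:

```python
import json


def decode_body(raw: bytes, kind: str, encoding: str = 'utf-8'):
    """Hypothetical helper: decode downloaded bytes the way each loader returns them."""
    if kind == 'binary':
        return raw                      # bytes, untouched
    text = raw.decode(encoding)         # bytes -> str using the chosen codec
    if kind == 'text':
        return text
    if kind == 'json':
        return json.loads(text)         # str -> dict or list
    raise ValueError(f'unknown kind: {kind!r}')


sjis_bytes = '素材'.encode('shift_jis')
print(decode_body(sjis_bytes, 'text', encoding='shift_jis'))  # 素材
```

Decoding those same Shift JIS bytes with the default `utf-8` raises `UnicodeDecodeError`, which is why non-UTF-8 sources need an explicit `encoding`.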
Alternatively, pass `create=file_creating_function` to build the data locally instead of fetching it. This function should return the deserialized result (so a dict rather than a string, for JSON).
def default_json():
    return {'meow': 123}
j = sourced.json('my.json', create=default_json)
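One way to picture the `create` fallback: if the cache file is missing, call the function once, save its result as JSON, and read the file on later calls. This is a minimal sketch assuming the returned object is written with `json.dump`; the helper name `cached_or_created` is hypothetical.

```python
import json
import os


def cached_or_created(path, create):
    """Hypothetical: return the JSON stored at path, building and saving it with create() if missing."""
    if not os.path.isfile(path):
        data = create()                 # a dict or list, not a JSON string
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(data, f)          # serialize once for later runs
        return data
    with open(path, encoding='utf-8') as f:
        return json.load(f)
```

Returning deserialized data keeps the function simple: serializing to disk is the cache's job, not the caller's.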
Pass `max_age='2 weeks'` to invalidate cached files after 2 weeks. The time delta is parsed with pytimeparse.
stuff = sourced.json('stuff.json', url=my_url, max_age='5 days')
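A cache-age check like this can be sketched by comparing the file's modification time with the current time. The sketch assumes `max_age` has already been converted to seconds (which is what a parser such as pytimeparse is for; 5 days is 432,000 seconds). The `is_stale` name is hypothetical.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Hypothetical: True if path is missing or was last modified more than max_age_seconds ago."""
    if not os.path.isfile(path):
        return True
    if now is None:
        now = time.time()
    age = now - os.path.getmtime(path)  # seconds since last write
    return age > max_age_seconds


FIVE_DAYS = 5 * 24 * 60 * 60  # 432000 seconds
```

A stale file would then be refetched (or recreated) exactly as if it had never existed.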
When you only need part of the data at `url`, you can provide a JSON path to select the parts you want. The JSON path library used is jsonpath-rw.
Use `find=path` to get an array of matches:
sourced.json('titles.json', url=url, find='[:10].title')
Use `pick=path` to keep just the first match:
sourced.json('titles.json', url=url, pick='[?(@.id == 99)]')
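For a list of objects like the photos endpoint returns, the two paths above select roughly what this plain Python does. It only illustrates the difference between "all matches" and "first match" on made-up sample data; it is not how `sourced` or jsonpath-rw evaluate paths.

```python
# Made-up sample shaped like https://jsonplaceholder.typicode.com/photos
data = [{'id': i, 'title': f'photo {i}'} for i in range(1, 101)]

# find='[:10].title' -> every match, as a list
titles = [item['title'] for item in data[:10]]

# pick='[?(@.id == 99)]' -> only the first match
first = next(item for item in data if item['id'] == 99)

print(titles[:2])   # ['photo 1', 'photo 2']
print(first)        # {'id': 99, 'title': 'photo 99'}
```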
If `url` is a list, the results are concatenated with `+`. This is useful for text, or for JSON endpoints that return arrays.
sourced.json('combi.json', url=[url1, url2])
sourced.text('combi.txt', url=[url3, url4])
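Because `+` concatenates both lists and strings, combining per-URL results can be sketched with `functools.reduce`. The `combine` name and the sample values are hypothetical.

```python
import functools
import operator


def combine(results):
    """Hypothetical: join per-URL results with +, in URL order."""
    return functools.reduce(operator.add, results)


print(combine([[1, 2], [3]]))            # [1, 2, 3]
print(combine(['line a\n', 'line b\n']))  # two lines of text
```

JSON objects (dicts) don't support `+`, so this only makes sense for endpoints that return arrays or plain text.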
If `url` is `https://blah.com/foo/%p1`, then requests are made for `https://blah.com/foo/1`, `https://blah.com/foo/2`, … until the result is empty text or an empty JSON array.
Use `%p0` to start numbering pages from 0 instead.
sourced.json('tags.json', url='https://testbooru.donmai.us/tags.json?limit=500&page=%p1')
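The `%p1` paging rule can be sketched as a loop that substitutes page numbers until a page comes back empty. Here `fetch` is a hypothetical callable standing in for the HTTP request plus decoding, so the loop can be shown without network access. `fetch_all_pages` is likewise a hypothetical name, not `sourced`'s API.

```python
def fetch_all_pages(url_template, fetch):
    """Hypothetical: expand %p1 (start at 1) or %p0 (start at 0) until a page is empty."""
    if '%p1' in url_template:
        marker, page = '%p1', 1
    elif '%p0' in url_template:
        marker, page = '%p0', 0
    else:
        return fetch(url_template)      # no paging marker: a single request
    combined = None
    while True:
        result = fetch(url_template.replace(marker, str(page)))
        if not result:                  # '' or [] ends the loop
            return result if combined is None else combined
        combined = result if combined is None else combined + result
        page += 1
```

A real implementation would probably also want an upper limit on pages, in case an endpoint never returns an empty page.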
If using JSON and `next_page` is a JSON path, it is used to retrieve the URL of the next page, until that URL is `null`.
kanji = sourced.json('kanji.json',
                     url='https://api.wanikani.com/v2/subjects?types=kanji',
                     headers={'Wanikani-Revision': '20170710',
                              'Authorization': 'Bearer %s' % wk_token},
                     find='data[*].data.characters', next_page='pages.next_url')
If `next_page` is a function, it is called with the decoded result (e.g. the JSON object) to get the next page URL.
def next_url(result):
    return base_url + '&start_from=' + result['next_page_start']
sourced.json('a.json', url=base_url, next_page=next_url)
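Following next-page links can be sketched as a loop that keeps requesting until the next URL is null (`None` in Python). In this sketch, `follow_pages`, `fetch`, and `dotted` are hypothetical: `fetch` stands in for the request plus JSON decoding, and `dotted` is a simplified stand-in for a JSON path like `pages.next_url` (plain key lookups only). The sketch does not show how `sourced` applies `find` across pages.

```python
def follow_pages(url, fetch, get_next):
    """Hypothetical: fetch url, then keep fetching get_next(result) until it returns None."""
    results = []
    while url is not None:
        page = fetch(url)
        results.append(page)
        url = get_next(page)            # None (JSON null) stops the loop
    return results


def dotted(path):
    """Hypothetical stand-in for a simple JSON path such as 'pages.next_url'."""
    def get(obj):
        for key in path.split('.'):
            obj = obj[key]
        return obj
    return get
```

Either form of the option reduces to the same thing here: a function from one decoded page to the next URL.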