A fluid Python library and command line utility for processing and converting data formats like JSON and CSV.
Have you ever sighed when writing code like this?
import csv
import json
with open("names.json") as f:
    data = json.loads(f.read())
data = [row["name"] for row in data if "John" in row["name"]]
with open("names.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["name"])
    [writer.writerow([row]) for row in data]
Now you can write it like this:
from dataknead import Knead
Knead("names.json").filter(lambda r: "John" in r["name"]).write("names.csv")
Or what about simply converting JSON to CSV? With dataknead you get the knead command line utility, which makes things easy:
knead names.json names.csv
dataknead has inbuilt loaders for CSV, Excel, JSON and XML, and you can easily write your own.
Install dataknead from PyPI:
pip install dataknead
Then import:
from dataknead import Knead
Note that dataknead is Python 3-only.
Let's say you have a small CSV file with cities called cities.csv.
city,country,population
Amsterdam,nl,850000
Rotterdam,nl,635000
Venice,it,265000
And you want to load this CSV file and transform it to a JSON file:
from dataknead import Knead
Knead("cities.csv").write("cities.json")
You'll now have a JSON file called cities.json that looks like this:
[
    {
        "city" : "Amsterdam",
        "country" : "nl",
        "population" : 850000
    },
    ...
]
Maybe you just want the city names, written to a CSV file called city-names.csv:
from dataknead import Knead
Knead("cities.csv").map("city").write("city-names.csv")
That will give you this list:
Amsterdam
Rotterdam
Venice
Now you want to extract only the cities that are located in Italy, and write them to a new CSV file called cities-italy.csv:
from dataknead import Knead
Knead("cities.csv").filter(lambda r: r["country"] == "it").write("cities-italy.csv")
This gives you this:
city,country,population
Venice,it,265000
Nice, huh?
Check out the advanced example.
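For comparison, the three tutorial steps above (convert, pick one column, filter) correspond roughly to the standard-library code below. This is only a sketch of the equivalent logic, not how dataknead is implemented. The helper names read_cities, city_names and italian_cities are made up for illustration, and the CSV text is inlined so the snippet runs on its own. Plain csv.DictReader leaves every value as a string, so the population is converted to a number by hand here.

```python
import csv
import io
import json

# Same sample data as cities.csv in the tutorial, inlined for the sketch.
CITIES_CSV = """city,country,population
Amsterdam,nl,850000
Rotterdam,nl,635000
Venice,it,265000
"""


def read_cities(text):
    """Parse CSV text into a list of dicts, converting population to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["population"] = int(row["population"])
    return rows


def city_names(rows):
    """Plain-Python counterpart of map("city"): keep one value per row."""
    return [row["city"] for row in rows]


def italian_cities(rows):
    """Plain-Python counterpart of the filter on country == "it"."""
    return [row for row in rows if row["country"] == "it"]


cities = read_cities(CITIES_CSV)
print(json.dumps(cities, indent=4))  # step 1: the data as JSON
print(city_names(cities))            # step 2: only the city names
print(italian_cities(cities))        # step 3: only the Italian cities
```

Each helper is a few lines on its own; the boilerplate comes from opening, parsing and writing files around them, which is the part the fluent chain hides.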
dataknead is intended for easy conversion between common data formats and basic manipulation. It's not meant as a replacement for more complex libraries like pandas or numpy.
- Keep the API minimal and fluent
- Don't reinvent the wheel: reuse as many modules and conventions as possible. The XML loader uses the excellent xmltodict module. The query method is a very thin wrapper around jq.
If inp is a string, a filepath is implied and the extension is used to get the correct loader.
Knead("cities.csv")
To override this behaviour (for a file that doesn't have the correct extension), use the read_as argument.
Knead("cities", read_as="csv")
If inp is not a string, data is implied.
Knead([1,2,3])
To force a string to be used as data instead of a file path, set is_data to True.
Knead("http://www.github.com", is_data = True)
To force parsing of a string to data (e.g., from a JSON HTTP request), set parse_as to the correct format.
Knead('{"error" : 404}', parse_as="json")
Some loaders come with extra arguments. For example, the csv loader has an option to force using a header if it isn't detected automatically:
Knead("cities.csv", has_header = True)
Add a new loader to the Knead instance. Read the section on extending dataknead to learn how to write your own loader.
Knead.add_loader(YamlLoader)
Runs all data through a function.
print(Knead(["a", "b", "c"]).apply(lambda x: "".join(x))) # 'abc'
Returns the parsed data.
data = Knead("cities.csv").data()
To raise an exception when the data is not an instance of the expected type, pass that type to check_instance.
data = Knead("cities.csv").data(check_instance = dict)
Run a function over the data and only keep the elements for which that function returns True.
Knead("cities.csv").filter(lambda city: city["country"] == "it").write("cities-italy.csv")
# Or do this
def is_italian(city):
    return city["country"] == "it"
Knead("cities.csv").filter(is_italian).write("cities-italy.csv")
Returns the keys of the data.
Run a function over all elements in the data.
Knead("cities.csv").map(lambda city: city["city"].upper()).write("cities-uppercased.json")
To return one key in every item, you can pass a string as a shortcut:
Knead("cities.csv").map("city").write("city-names.csv")
# Is the same as
Knead("cities.csv").map(lambda c: c["city"]).write("city-names.csv")
To return multiple keys with values, you can use a tuple:
Knead("cities.csv").map(("city", "country")).write("city-country-names.csv")
# Is the same as
Knead("cities.csv").map(lambda c:{ "city" : c["city"], "country" : c["country"] }).write("city-country-names.csv")
# Or
def mapcity(city):
    return {
        "city" : city["city"],
        "country" : city["country"]
    }
Knead("cities.csv").map(mapcity).write("city-country-names.csv")
Returns values of the data.
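The three forms of map shown above (function, string, tuple) can be pictured with the plain-Python sketch below. It only illustrates the documented behaviour and is not dataknead's actual code; the function name map_rows is hypothetical.

```python
def map_rows(rows, iteratee):
    """Mimic the documented map shortcuts on a list of dicts (illustrative only)."""
    if isinstance(iteratee, str):
        # String shortcut: return the value of one key for every row.
        return [row[iteratee] for row in rows]
    if isinstance(iteratee, tuple):
        # Tuple shortcut: return a dict holding only the listed keys.
        return [{key: row[key] for key in iteratee} for row in rows]
    # Anything else is treated as a function and called on every row.
    return [iteratee(row) for row in rows]


cities = [
    {"city": "Amsterdam", "country": "nl", "population": 850000},
    {"city": "Venice", "country": "it", "population": 265000},
]

print(map_rows(cities, "city"))
print(map_rows(cities, ("city", "country")))
print(map_rows(cities, lambda c: c["city"].upper()))
```

The string and tuple forms save writing a lambda for the two most common cases, selecting one column or a subset of columns.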
Writes the data to a file. Type is implied by file extension.
Knead("cities.csv").write("cities.json")
To force the type to something else, pass the format to write_as.
Knead("cities.csv").map("city").write("cities.txt", write_as="csv")
Some of the loaders have extra options you can pass to write:
Knead("cities.csv").write("cities.json", indent = 4)
Knead("cities.csv").map("city").write("ciites.csv", fieldnames=["city"])
You can write your own loaders to read and write formats other than the default ones (csv, json and txt). For an example, take a look at the YAML example.
Performance drawbacks should be negligible. See this small performance test.
- Note that dataknead is Python 3-only.
Written by Hay Kranen.
Licensed under the MIT license.
- Breaking change: removed the query method: dataknead's focus is on conversion. Using apply you can easily use a tool like jq to query.
- Adding tuple shortcut to map (#2)
- Adding support for txt files (#4)
- Adding support for loader constructor argument passing, and adding a has_header option to CsvLoader (#5)
Initial release