data request (if possible) #1
Comments
So https://rud.is/books/drill-sergeant-rstats/reading-a-streaming-json-ndjson-data-file-with-drill-r.html is a boilerplate recipe that's a bit more involved but may help, and I can add more recipes for other examples if needed.
@hrbrmstr Sure thing! A bunch of example JSON files here: https://gitlab.carlboettiger.info/cboettig/supertreebase/tree/master/json These are JSON-LD representations of phylogenetic trees originally published in XML formats in the public scientific repository http://treebase.org, all CC0 / public domain.
#ty!
ZOMGOSH THOSE ARE PERFECT!
I'm afraid I'm mostly using CSV and TSV files, so I'm very grateful to see chapter 4!
@benmarwick If you have some specific ones that are shareable, I can make topic-specific recipes as well.
Thanks, most recently I've been working with these https://dumps.wikimedia.org/other/pagecounts-ez/merged/2012/2012-12/, and wondering if Drill might make it easier to work with them. As they are, those files are a bit impractical for an example. How about I get a small excerpt from one of those and share it here?
@benmarwick take a look at https://rud.is/books/drill-sergeant-rstats/working-with-custom-delimited-format-files.html and lemme know if that's tracking towards "helpful". Dealing with that last column will require a bit of Java work (to define a UDF, a user-defined function), but I was going to cover that anyway and this is a nice example for it. And it's not as scary as it sounds (if it does, indeed, sound scary :-). Most Drill UDFs are really simple Java functions based on a template that's easy to modify.
https://rud.is/books/drill-sergeant-rstats/writing-simple-drill-custom-functions-udfs-for-field-transformations.html now has the Drill UDF necessary to make the last column more usable.
@cboettig What are some "typical" operations one wld be performing on said phylogenetic tree data? I was able to tease out the "tree" but this is one area I've not handled enough SO questions on to be familiar with the data enough to whip up examples (yes, I may answer SO questions both to help folks and to try to get a handle on other disciplines at the same time :-)
Great question. Common tasks might be:
@cboettig / @benmarwick y'all wldn't have some sample JSON I can use, would you? I'm technically not allowed to put "alot" of work-work JSON data out in the wild since it can enable attackers (it saves them the $ of doing recon scans, at least).
It'd also help me direct (what I think will be chapter 7/recipe 6) more specifically for your needs.
no worries if not. I'll either convert some data, ask for work-forgiveness (er, I mean, 'permission') or go on a JSON data hunt or use some CVE data that isn't confidential but may not be the best example of JSON to help non-cyber folks.