# Using `jq` on JSON Twitter data
This notebook uses the `jq` tool to parse and explore JSON data. I have several JSON datasets to work with. This one was gathered with `twarc` (a Twitter harvester/archiver) by searching for the hashtag #resigncameron on April 8, 2016. It is a text file of about 600MB.


In [22]:
DATA="twarc-data/resign.json"

View the first tweet's full JSON data, pretty-printed:

In [23]:
!head -1 $DATA | jq '.'

{
  "metadata": {
    "result_type": "recent",
    "iso_language_code": "en"
  },
  "place": null,
  "in_reply_to_status_id_str": null,
  "created_at": "Fri Apr 08 22:11:41 +0000 2016",
  "lang": "en",
  "possibly_sensitive": false,
  "in_reply_to_user_id_str": null,
  "geo": null,
  "user": {
    "is_translator": false,
    "default_profile": false,
    "protected": false,
    "time_zone": "London",

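As a cross-check in the notebook's own kernel, the same step can be sketched in Python. This assumes the file holds one complete JSON object per line, which is why `head -1` yields a whole tweet. The `sample_line` string is a hand-abridged stand-in for the first line of the file, not the real record.

```python
import json

# Hypothetical stand-in for the first line of the data file,
# abridged to a few of the fields shown in the output above.
sample_line = (
    '{"metadata": {"result_type": "recent", "iso_language_code": "en"}, '
    '"place": null, "lang": "en", "user": {"time_zone": "London"}}'
)

# Each line is a complete JSON document, so it parses on its own.
tweet = json.loads(sample_line)

# Rough equivalent of jq '.': re-serialize with 2-space indentation.
pretty = json.dumps(tweet, indent=2)
print(pretty)
```

JSON `null` becomes Python `None` on load and is written back as `null` on output, so the round trip matches what `jq` prints.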
View just the top-level values, without their keys:

In [24]:
!head -1 $DATA | jq '.[]'

{
  "result_type": "recent",
  "iso_language_code": "en"
}
null
null
"Fri Apr 08 22:11:41 +0000 2016"
"en"
false
null
null
{
  "is_translator": false,
  "default_profile": false,
  "protected": false,
  "time_zone": "London",
  "contributors_enabled": false,
  "created_at": "Thu Jun 21 14:52:43 +0000 2012",
  "url": null,
  "notifications": false,
  "name": "Only Me",
  "listed_count": 1,
  "profile_background_color"

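A minimal Python sketch of what `.[]` does to an object may help here: it emits each top-level value and drops the key. The `tweet` dictionary below is a hypothetical, abridged record built from a few fields in the output above.

```python
import json

# Hypothetical abridged tweet with a handful of top-level fields.
tweet = json.loads(
    '{"place": null, "created_at": "Fri Apr 08 22:11:41 +0000 2016", '
    '"lang": "en", "possibly_sensitive": false}'
)

# Like jq '.[]' on an object: keep the values, discard the keys.
# dict.values() yields them in the order the keys appear.
values = list(tweet.values())

for value in values:
    # json.dumps prints each value the way jq would (null, false, "...").
    print(json.dumps(value))
```

Nested objects such as `metadata` and `user` come out whole, which is why the output above still contains labeled fields inside them.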
# Basic Filtering

Filter to just the creation time and text of the first 3 tweets:

In [25]:
!head -3 $DATA | jq '[.created_at, .text]'

[
  "Fri Apr 08 22:11:41 +0000 2016",
  "RT @epilepsytech: To the panel called The @Twitter Trust &amp; @safety Council: Please stop removing #resigncameron from #TwitterTrends https:/…"
]
[
  "Fri Apr 08 22:11:41 +0000 2016",
  "RT @SarahHenney: The plain awful truth of #ToryBritain laid bare. We need these idiots out. #EnoughisEnough #ResignCameron #ToriesOut https…"
]
[
  "Fri Apr 08 22:11:40 +0000 2016",
  "RT @MatSeadonYoung: I'll just leave this here... #DavidCameron #resigncameron https://t.co/ld7fVNNn8b"
]

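The filter `[.created_at, .text]` builds one two-element array per input line. A rough Python equivalent, assuming line-delimited input, is sketched below. The two `lines` entries are hypothetical abridged tweets, and the helper name `created_and_text` is made up for illustration.

```python
import json

# Hypothetical line-delimited input: two abridged tweets, one per line.
# The second one deliberately lacks a "text" field.
lines = [
    '{"created_at": "Fri Apr 08 22:11:41 +0000 2016", "text": "first", "lang": "en"}',
    '{"created_at": "Fri Apr 08 22:11:40 +0000 2016", "lang": "en"}',
]

def created_and_text(line):
    """Mimic jq '[.created_at, .text]' for one line of input."""
    tweet = json.loads(line)
    # jq yields null for a missing key; dict.get() returns None likewise.
    return [tweet.get("created_at"), tweet.get("text")]

rows = [created_and_text(line) for line in lines]

for row in rows:
    print(json.dumps(row))
```

As in `jq`, a missing field does not raise an error. It just shows up as `null` in that position.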

Check the user time zones of the first 10 tweets:

_Note: Tweets can carry geolocation, but a great many people turn it off, so it would be difficult to draw conclusions from it. Time zones are less precise, but they still help distinguish continents._

In [31]:
!head -10 $DATA | jq '[.user.time_zone]'

[
  "London"
]
[
  "London"
]
[
  "Casablanca"
]
[
  "Central Time (US & Canada)"
]
[
  "London"
]
[
  "Europe/London"
]
[
  null
]
[
  "Amsterdam"
]
[
  null
]
[
  "Dublin"
]

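To go from a list of time zones to a rough distribution, the values can be tallied. The Python sketch below is one possible way to do that, not part of the original analysis. It re-wraps the ten values printed above as minimal tweets, and the helper name `time_zone` is made up for illustration.

```python
import json
from collections import Counter

# The ten time_zone values printed above, re-wrapped as minimal tweets.
zones = [
    "London", "London", "Casablanca", "Central Time (US & Canada)", "London",
    "Europe/London", None, "Amsterdam", None, "Dublin",
]
lines = [json.dumps({"user": {"time_zone": zone}}) for zone in zones]

def time_zone(line):
    """Mimic jq '.user.time_zone' for one line of input."""
    tweet = json.loads(line)
    # A missing or null "user" gives None, as jq would give null.
    return (tweet.get("user") or {}).get("time_zone")

counts = Counter(time_zone(line) for line in lines)

# Most frequent time zones first.
for zone, n in counts.most_common():
    print(n, zone)
```

In this small sample, "London" and "Europe/London" are counted separately and two users have no time zone set, so a real tally would probably need some normalization before drawing conclusions.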