# Get some planets and create a DataFrame

Using the excellent [Star Wars API](http://swapi.co/api) (SWAPI), we fetch the first page of planets.
We'll use Julia's `HTTP` package to make the request.

In [2]:
using HTTP
res = HTTP.request("GET", "http://swapi.co/api/planets")

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Thu, 24 May 2018 19:09:40 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d5742f34b44ab3f44f630d70cc8e513dd1527188980; expires=Fri, 24-May-19 19:09:40 GMT; path=/; domain=.swapi.co; HttpOnly; Secure
Etag: "394d177de8df8accc41f9427b749ef4a"
Vary: Accept, Cookie
Allow: GET, HEAD, OPTIONS
X-Frame-Options: SAMEORIGIN
Via: 1.1 vegur
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 42020ed5ca296403-FRA

{"count":61,"next":"https://swapi.co/api/planets/?page=2","previous":null,"results":[{"name":"Alderaan","rotation_period":"24","orbital_period":"364","diameter":"12500","climate":"temperate","gravity":"1 standard","terrain":"grasslands, mountains","surface_water":"40","population":"2000000000","residents":["https://swapi.co/api/people/5/","https://swapi.co/api/people/68/","https://swapi.co/api/pe

In [3]:
planets_raw = res.body

5039-element Array{UInt8,1}:
 0x7b
 0x22
 0x63
 0x6f
 0x75
 0x6e
 0x74
 0x22
 0x3a
 0x36
 0x31
 0x2c
 0x22
    ⋮
 0x6e
 0x65
 0x74
 0x73
 0x2f
 0x31
 0x31
 0x2f
 0x22
 0x7d
 0x5d
 0x7d
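The byte vector is simply the UTF-8 encoding of the JSON text. This small sketch decodes the first and last twelve bytes copied from the output above, and shows that the body starts with the `count` field and ends with the last planet's URL. It assumes Julia 1.x, where `String(::Vector{UInt8})` may take ownership of the vector and empty it, which is why it decodes a copy.

```julia
# First and last twelve bytes of `planets_raw`, copied from the output above
head_bytes = UInt8[0x7b, 0x22, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x22, 0x3a, 0x36, 0x31, 0x2c]
tail_bytes = UInt8[0x6e, 0x65, 0x74, 0x73, 0x2f, 0x31, 0x31, 0x2f, 0x22, 0x7d, 0x5d, 0x7d]

# String(::Vector{UInt8}) decodes UTF-8; on Julia >= 0.7 it may take ownership of
# (and empty) its argument, so we pass copies to keep the original bytes intact
head_text = String(copy(head_bytes))
tail_text = String(copy(tail_bytes))

println(head_text)  # {"count":61,
println(tail_text)  # nets/11/"}]}
```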

`res.body` comes back as a raw byte vector (`Array{UInt8,1}`), so we convert it to a `String` before parsing it as JSON:

In [5]:
using JSON
planets_json = JSON.parse(String(planets_raw))

Dict{String,Any} with 4 entries:
  "next"     => "https://swapi.co/api/planets/?page=2"
  "previous" => nothing
  "count"    => 61
  "results"  => Any[Dict{String,Any}(Pair{String,Any}("edited", "2014-12-20T20:…

_Now_ we get our planets, as an array of dictionaries, from the `"results"` entry:

In [8]:
planets = planets_json["results"]

10-element Array{Any,1}:
 Dict{String,Any}(Pair{String,Any}("edited", "2014-12-20T20:58:18.420000Z"),Pair{String,Any}("films", Any["https://swapi.co/api/films/6/", "https://swapi.co/api/films/1/"]),Pair{String,Any}("diameter", "12500"),Pair{String,Any}("rotation_period", "24"),Pair{String,Any}("name", "Alderaan"),Pair{String,Any}("created", "2014-12-10T11:35:48.479000Z"),Pair{String,Any}("surface_water", "40"),Pair{String,Any}("climate", "temperate"),Pair{String,Any}("url", "https://swapi.co/api/planets/2/"),Pair{String,Any}("terrain", "grasslands, mountains")…)                                                                                  
 Dict{String,Any}(Pair{String,Any}("edited", "2014-12-20T20:58:18.421000Z"),Pair{String,Any}("films", Any["https://swapi.co/api/films/1/"]),Pair{String,Any}("diameter", "10200"),Pair{String,Any}("rotation_period", "24"),Pair{String,Any}("name", "Yavin IV"),Pair{String,Any}("created", "2014-12-10T11:37:19.144000Z"),Pair{String,Any}("surface_water",
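Each element is a `Dict{String,Any}` keyed by field name, so pulling one field out of every planet is a plain comprehension. In this sketch the two trimmed-down records are hypothetical stand-ins for the real elements of `planets`, which carry more fields. It also shows that numeric fields such as `diameter` arrive as strings and need an explicit `parse`.

```julia
# Hypothetical, trimmed-down records shaped like the elements of `planets` above
planets_sample = Any[
    Dict{String,Any}("name" => "Alderaan", "diameter" => "12500"),
    Dict{String,Any}("name" => "Yavin IV", "diameter" => "10200"),
]

# Pull one field out of every record with a comprehension
planet_names = [p["name"] for p in planets_sample]

# Numeric fields are stored as text, so they must be parsed explicitly
diameters = [parse(Int, p["diameter"]) for p in planets_sample]

println(planet_names)  # ["Alderaan", "Yavin IV"]
println(diameters)     # [12500, 10200]
```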

Let's convert this into a DataFrame.
We'll use the quick fix from [a Stack Overflow answer](https://stackoverflow.com/questions/46143997/reading-json-array-into-julia-dataframe-like-type) to convert the array of dictionaries into a DataFrame:

In [9]:
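# Build a DataFrame from an array of dictionaries whose keys may differ
function jsontodf(a)
    # every key that appears in at least one record becomes a column
    ka = union([keys(r) for r in a]...)
    # one column per key; records lacking that key get NA
    df = DataFrame(; Dict(Symbol(k) => get.(a, k, NA) for k in ka)...)
    return df
end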

jsontodf (generic function with 1 method)
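The core of `jsontodf` can be seen without DataFrames. This Base-only sketch assumes Julia 1.x, where `missing` replaces `NA`. It takes the union of all keys as the column names, then fills each column with `get`, defaulting to `missing` when a record lacks that key. The two records are hypothetical.

```julia
# Two hypothetical records that do not share all their keys
records = [
    Dict("name" => "Alderaan", "diameter" => "12500"),
    Dict("name" => "Hoth", "climate" => "frozen"),
]

# Column names: every key that appears in at least one record
colnames = union((keys(r) for r in records)...)

# One vector per column; a record without that key contributes `missing`
columns = Dict(Symbol(k) => [get(r, k, missing) for r in records] for k in colnames)

for (name, col) in columns
    println(name, " => ", col)
end
```

With a current DataFrames.jl, `DataFrame(columns)` should turn such a dictionary into a table, because the constructor accepts an `AbstractDict` of column vectors.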

In [10]:
using DataFrames
planets_df = jsontodf(planets)

| Row | residents | surface_water | diameter | gravity | rotation_period | population | terrain | url | name | created | climate | orbital_period | films | edited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Any["https://swapi.co/api/people/5/", "https://swapi.co/api/people/68/", "https://swapi.co/api/people/81/"] | 40 | 12500 | 1 standard | 24 | 2000000000 | grasslands, mountains | https://swapi.co/api/planets/2/ | Alderaan | 2014-12-10T11:35:48.479000Z | temperate | 364 | Any["https://swapi.co/api/films/6/", "https://swapi.co/api/films/1/"] | 2014-12-20T20:58:18.420000Z |
| 2 | Any[] | 8 | 10200 | 1 standard | 24 | 1000 | jungle, rainforests | https://swapi.co/api/planets/3/ | Yavin IV | 2014-12-10T11:37:19.144000Z | temperate, tropical | 4818 | Any["https://swapi.co/api/films/1/"] | 2014-12-20T20:58:18.421000Z |
| 3 | Any[] | 100 | 7200 | 1.1 standard | 23 | unknown | tundra, ice caves, mountain ranges | https://swapi.co/api/planets/4/ | Hoth | 2014-12-10T11:39:13.934000Z | frozen | 549 | Any["https://swapi.co/api/films/2/"] | 2014-12-20T20:58:18.423000Z |
| 4 | Any[] | 8 | 8900 |  | 23 | unknown | swamp, jungles | https://swapi.co/api/planets/5/ | Dagobah | 2014-12-10T11:42:22.590000Z | murky | 341 | Any["https://swapi.co/api/films/2/", "https://swapi.co/api/films/6/", "https://swapi.co/api/films/3/"] | 2014-12-20T20:58:18.425000Z |
| 5 | Any["https://swapi.co/api/people/26/"] | 0 | 118000 | 1.5 (surface), 1 standard (Cloud City) | 12 | 6000000 | gas giant | https://swapi.co/api/planets/6/ | Bespin | 2014-12-10T11:43:55.240000Z | temperate | 5110 | Any["https://swapi.co/api/films/2/"] | 2014-12-20T20:58:18.427000Z |
| 6 | Any["https://swapi.co/api/people/30/"] | 8 | 4900 | 0.85 standard | 18 | 30000000 | forests, mountains, lakes | https://swapi.co/api/planets/7/ | Endor | 2014-12-10T11:50:29.349000Z | temperate | 402 | Any["https://swapi.co/api/films/3/"] | 2014-12-20T20:58:18.429000Z |
| 7 | Any["https://swapi.co/api/people/3/", "https://swapi.co/api/people/21/", "https://swapi.co/api/people/36/", "https://swapi.co/api/people/37/", "https://swapi.co/api/people/38/", "https://swapi.co/api/people/39/", "https://swapi.co/api/people/42/", "https://swapi.co/api/people/60/", "https://swapi.co/api/people/61/", "https://swapi.co/api/people/66/", "https://swapi.co/api/people/35/"] | 12 | 12120 | 1 standard | 26 | 4500000000 | grassy hills, swamps, forests, mountains | https://swapi.co/api/planets/8/ | Naboo | 2014-12-10T11:52:31.066000Z | temperate | 312 | Any["https://swapi.co/api/films/5/", "https://swapi.co/api/films/4/", "https://swapi.co/api/films/6/", "https://swapi.co/api/films/3/"] | 2014-12-20T20:58:18.430000Z |
| 8 | Any["https://swapi.co/api/people/34/", "https://swapi.co/api/people/55/", "https://swapi.co/api/people/74/"] | unknown | 12240 | 1 standard | 24 | 1000000000000 | cityscape, mountains | https://swapi.co/api/planets/9/ | Coruscant | 2014-12-10T11:54:13.921000Z | temperate | 368 | Any["https://swapi.co/api/films/5/", "https://swapi.co/api/films/4/", "https://swapi.co/api/films/6/", "https://swapi.co/api/films/3/"] | 2014-12-20T20:58:18.432000Z |
| 9 | Any["https://swapi.co/api/people/22/", "https://swapi.co/api/people/72/", "https://swapi.co/api/people/73/"] | 100 | 19720 | 1 standard | 27 | 1000000000 | ocean | https://swapi.co/api/planets/10/ | Kamino | 2014-12-10T12:45:06.577000Z | temperate | 463 | Any["https://swapi.co/api/films/5/"] | 2014-12-20T20:58:18.434000Z |
| 10 | Any["https://swapi.co/api/people/63/"] | 5 | 11370 | 0.9 standard | 30 | 100000000000 | rock, desert, mountain, barren | https://swapi.co/api/planets/11/ | Geonosis | 2014-12-10T12:47:22.350000Z | temperate, arid | 256 | Any["https://swapi.co/api/films/5/"] | 2014-12-20T20:58:18.437000Z |


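The API is paginated: each response carries one page of results (10 planets on the first page) and, under `"next"`, the URL of the following page, which is `nothing` (JSON `null`) on the last one. To collect all 61 planets, we keep requesting `"next"` until it is `nothing`:

In [26]:
# start from the first page and follow the "next" links until there are none
nexturl = "http://swapi.co/api/planets/"
planets = []

while nexturl !== nothing
    res = HTTP.request("GET", nexturl)
    jsondata = JSON.parse(String(res.body))
    # add this page's planets to those collected so far
    append!(planets, jsondata["results"])
    # `nothing` on the last page, which ends the loop
    nexturl = jsondata["next"]
end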
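The loop follows a common pagination pattern. This sketch isolates it from the network so it can run offline. `fetch_all` is a hypothetical helper, and the `pages` dictionary is a hypothetical stand-in for the API that maps each "URL" to an already-parsed page. Two short pages are enough to show the control flow.

```julia
# Hypothetical stand-in for the API: each "URL" maps to an already-parsed page
pages = Dict(
    "page1" => Dict{String,Any}("results" => Any["Alderaan", "Yavin IV"], "next" => "page2"),
    "page2" => Dict{String,Any}("results" => Any["Hoth"], "next" => nothing),
)

# Keep requesting `next` until the API reports `nothing` (JSON null).
# `fetch_page` is any function mapping a URL to a parsed page.
function fetch_all(fetch_page, first_url)
    items = Any[]
    url = first_url
    while url !== nothing
        page = fetch_page(url)
        append!(items, page["results"])
        url = page["next"]
    end
    return items
end

all_planets = fetch_all(url -> pages[url], "page1")
println(length(all_planets))  # 3
```

Against the real API, `fetch_page` would wrap the request and parse from the cell above. With 61 planets at 10 per page, that comes to `cld(61, 10) = 7` requests.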

In [48]:
dfplanets = jsontodf(planets)

61 rows × 14 columns; the first 10 rows are identical to the single-page table above.


In [38]:
describe(dfplanets)

residents
Summary Stats:
Length:         61
Type:           Array{Any,1}
Number Unique:  50
Number Missing: 0
% Missing:      0.000000

surface_water
Summary Stats:
Length:         61
Type:           String
Number Unique:  16
Number Missing: 0
% Missing:      0.000000

diameter
Summary Stats:
Length:         61
Type:           String
Number Unique:  40
Number Missing: 0
% Missing:      0.000000

gravity
Summary Stats:
Length:         61
Type:           String
Number Unique:  14
Number Missing: 0
% Missing:      0.000000

rotation_period
Summary Stats:
Length:         61
Type:           String
Number Unique:  20
Number Missing: 0
% Missing:      0.000000

population
Summary Stats:
Length:         61
Type:           String
Number Unique:  40
Number Missing: 0
% Missing:      0.000000

terrain
Summary Stats:
Length:         61
Type:           String
Number Unique:  54
Number Missing: 0
% Missing:      0.000000

url
Summary Stats:
Length:         61
Type:           String
Number Unique:  6

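Apart from the list-valued `residents` column, the columns shown are all `String`s, so the summary from `describe` is not very useful. Numeric fields such as `diameter` and `population` are stored as text, and absent values are spelled `"unknown"` instead of being real missing values, which is why every column shown reports `Number Missing: 0`.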
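To see what `describe` misses, we can count the `"unknown"` placeholders directly. This Base-only sketch uses the `population` and `surface_water` values from the first ten planets shown earlier. `count_unknown` is a hypothetical helper. Against the full table it could presumably be applied to each column in turn, e.g. `count_unknown(dfplanets[:population])`.

```julia
# Text values copied from the first page of results shown earlier
population = ["2000000000", "1000", "unknown", "unknown", "6000000",
              "30000000", "4500000000", "1000000000000", "1000000000", "100000000000"]
surface_water = ["40", "8", "100", "8", "0", "8", "12", "unknown", "100", "5"]

# `describe` only counts real missing values, so placeholder strings go unnoticed;
# counting them explicitly reveals how much data is actually absent
count_unknown(col) = count(x -> x == "unknown", col)

println("population:    ", count_unknown(population), " unknown")     # 2 unknown
println("surface_water: ", count_unknown(surface_water), " unknown")  # 1 unknown
```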


In [49]:
# replace "unknown" with a "0" placeholder so the column can be parsed as integers;
# the zeros are turned into NA in a later cell
dfplanets[dfplanets[:diameter] .== "unknown", :diameter] = "0"


"0"

In [50]:
dfplanets[:diameter]

61-element DataArrays.DataArray{String,1}:
 "12500" 
 "10200" 
 "7200"  
 "8900"  
 "118000"
 "4900"  
 "12120" 
 "12240" 
 "19720" 
 "11370" 
 "12900" 
 "4200"  
 "12765" 
 ⋮       
 "0"     
 "0"     
 "0"     
 "0"     
 "0"     
 "0"     
 "13800" 
 "0"     
 "13850" 
 "0"     
 "10465" 
 "0"     

In [51]:
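# parse the text into integers, then mark the placeholder zeros as missing (NA);
# note that any diameter genuinely reported as "0" is also turned into NA
dfplanets[:diameter] = parse.(Int, dfplanets[:diameter])
dfplanets[dfplanets[:diameter] .== 0, :diameter] = NA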

NA
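The `"0"` placeholder cannot tell a genuine zero from an unknown value. That matters for a column such as `surface_water`, where Bespin reports `"0"` on the first page. The sketch below compares the placeholder approach with a `tryparse`-based alternative on the first-page `surface_water` values. It assumes Julia 1.x, where `missing` replaces `NA` and `tryparse` returns `nothing` for text that is not a number. `to_int_or_missing` is a hypothetical helper.

```julia
surface_water = ["40", "8", "100", "8", "0", "8", "12", "unknown", "100", "5"]

# Placeholder approach (as in the cells above): "unknown" -> "0", parse, then 0 -> missing.
# Bespin's genuine 0 (5th entry) becomes missing too.
placeholder = [parse(Int, s == "unknown" ? "0" : s) for s in surface_water]
via_placeholder = [x == 0 ? missing : x for x in placeholder]

# tryparse yields `nothing` for non-numeric text; `something` swaps that for `missing`,
# so only the entries that were really unknown become missing
to_int_or_missing(s) = something(tryparse(Int, s), missing)
via_tryparse = to_int_or_missing.(surface_water)

println(count(ismissing, via_placeholder))  # 2
println(count(ismissing, via_tryparse))     # 1
```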

In [52]:
describe(dfplanets[:diameter])

Summary Stats:
Mean:           14344.394737
Minimum:        4200.000000
1st Quartile:   10096.000000
Median:         12135.000000
3rd Quartile:   13497.500000
Maximum:        118000.000000
Length:         38
Type:           Int64
Number Missing: 23
% Missing:      37.704918
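`describe` computes its statistics over the 38 non-missing diameters and reports the other 23 separately (38 + 23 = 61, and 23/61 ≈ 37.7%). The sketch below reproduces that idea with the `Statistics` standard library, assuming Julia 1.x `missing`. Its input is the ten diameters from the first page plus two hypothetical `missing` entries, so its numbers differ from the output above.

```julia
using Statistics

# Diameters from the first page of results, plus two hypothetical `missing` entries
# standing in for planets whose diameter was "unknown"
diameters = Union{Missing,Int}[12500, 10200, 7200, 8900, 118000,
                               4900, 12120, 12240, 19720, 11370, missing, missing]

# Summary statistics are computed over the non-missing values only
present = collect(skipmissing(diameters))
n_missing = count(ismissing, diameters)
pct_missing = 100 * n_missing / length(diameters)

println("Mean:      ", mean(present))                   # 21715.0
println("Minimum:   ", minimum(present))                # 4900
println("Quartiles: ", quantile(present, [0.25, 0.75])) # [9225.0, 12435.0]
println("Median:    ", median(present))                 # 11745.0
println("Maximum:   ", maximum(present))                # 118000
println("Missing:   ", n_missing, " (", round(pct_missing, digits=2), "%)")
```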
