# data.world    
![sparkle-shadow.png](https://data.world/api/nrippner/dataset/images/file/raw/sparkle-shadow.png)

![python.jpg](https://data.world/api/nrippner/dataset/images/file/raw/python.jpg)

# Getting Started with the data.world Python SDK

### * Seamless integration with Python and R
### * Effortlessly load data       
### * SQL queries to pandas DataFrames 
### * data.world and Python side by side -- a better way to store, retrieve, and explore your data.    
---------------

### Check to see if datadotworld is installed

As of 5/12/2017, the latest version is 1.1.0.    

In [45]:
! pip list | grep datadotworld

datadotworld (1.1.0)
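
The same check can be done from Python instead of the shell, which helps when `grep` is not available. This is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`); `installed_version` is a hypothetical helper name, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if datadotworld is installed, otherwise None.
print(installed_version("datadotworld"))
```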


### Install or upgrade latest version of datadotworld package

If upgrading, restart the kernel after running    
**pip install --upgrade datadotworld**

In [46]:
#! pip install datadotworld
#! pip install --upgrade datadotworld

In [47]:
import datadotworld as ddw        # data.world SDK
print(ddw.__version__)

1.1.0


### Configure datadotworld using your personal data.world API token    
This will add your API token to a local file in your computer's home directory. You only need to do this once.   
Your authentication token can be obtained on data.world under [Settings > Advanced](https://data.world/settings/advanced)    
     
1. Copy your API token from [Settings > Advanced](https://data.world/settings/advanced)    
2. Open a terminal and enter   
   **dw configure**   
3. Paste your API token and hit enter
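
To confirm that a token is in place before running the rest of the notebook, a quick check along these lines may help. It is a sketch under two assumptions that are not stated above: that `dw configure` writes to `~/.dw/config`, and that the SDK also honors a `DW_AUTH_TOKEN` environment variable. `find_token_source` is a hypothetical helper, not part of the SDK.

```python
import os


def find_token_source(env=None, home=None):
    """Report where an API token appears to be configured, or None."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    # Assumption: the SDK reads this environment variable if it is set.
    if env.get("DW_AUTH_TOKEN"):
        return "environment variable DW_AUTH_TOKEN"
    # Assumption: `dw configure` stores the token in ~/.dw/config.
    config_path = os.path.join(home, ".dw", "config")
    if os.path.isfile(config_path):
        return config_path
    return None


print(find_token_source())
```

If this prints `None`, repeating the `dw configure` step above is the first thing to try.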


In [52]:
# import python packages
from __future__ import print_function # Python 2 to 3 compatibility (keep as the first import)
import pandas as pd               # data wrangling/analysis library
import numpy as np                # math and data manipulation library
import matplotlib.pyplot as plt   # data visualization library
%matplotlib inline  

# Use the SDK to import data    
## (!) There are a few different ways to load data with the data.world APIs/SDKs.  Below, we examine 3 different approaches to importing data into Python.

# 1. Load entire dataset    
Note: a data.world dataset is more than just a data file -- it may contain multiple data files, as well as documentation, scripts, notebooks, or images.    

In [17]:
refugee_dataset = ddw.load_dataset('nrippner/refugee-host-nations')

The *datadotworld load_dataset* function retrieves the dataset's metadata and lazily loads its contents.

In [18]:
import sys
sys.getsizeof(refugee_dataset)

56

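Only 56 bytes... Keep in mind that `sys.getsizeof` measures only the object itself, not the objects it references, so this is a rough indication rather than an exact measure of what has been loaded.   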
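
To illustrate what lazy loading means here, the sketch below defers an expensive load until first access and caches the result. `LazyTable` and `load_rows` are hypothetical stand-ins written for this example; they are not the SDK's classes and say nothing about how the SDK is implemented internally.

```python
import sys


class LazyTable(object):
    """Hypothetical stand-in for a lazily loaded resource (not the SDK's class)."""

    def __init__(self, loader):
        self._loader = loader   # callable that produces the rows
        self._rows = None       # nothing loaded yet

    @property
    def rows(self):
        # Load on first access, then reuse the cached result.
        if self._rows is None:
            self._rows = self._loader()
        return self._rows


load_calls = []


def load_rows():
    """Pretend to read a large file; records each call so we can count them."""
    load_calls.append(1)
    return list(range(100000))


table = LazyTable(load_rows)
calls_before_access = len(load_calls)   # loader has not run yet
n_rows = len(table.rows)                # first access triggers the load
_ = table.rows                          # second access uses the cache
wrapper_size = sys.getsizeof(table)     # shallow size of the wrapper only
rows_size = sys.getsizeof(table.rows)   # size of the list object it references

print(calls_before_access, len(load_calls), n_rows)
print(wrapper_size < rows_size)
```

The wrapper's reported size stays small even after the rows are loaded, because `sys.getsizeof` does not follow references.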
    
List the metadata fields of the dataset:

In [19]:
for i in refugee_dataset.describe():
    print(i)
contents = refugee_dataset.describe()
print(contents['homepage'])

name
title
description
homepage
resources
keywords
https://data.world/nrippner/refugee-host-nations


List names of data files:    

In [20]:
dataframes = refugee_dataset.dataframes
for df in dataframes:
    print(df)

refugees2011-15
refugees_all_years
refugees_per_capita
unhcr_2015
unhcr_all
worldbank_data_dict
worldbank_indicators
original/refugees2011-15.csv
original/refugees_all_years.csv
original/refugees_per_capita.csv
original/unhcr_2015.csv
original/unhcr_all.csv
original/worldbank_data_dict.csv
original/worldbank_indicators.csv


All files in dataset:

In [9]:
resources = refugee_dataset.describe()['resources']
print('name:')
for r in resources:
    print(r['name'])
print('\ntype of file:')
for r in resources:
    print(r['format'])

name:
refugees2011-15
refugees_all_years
refugees_per_capita
unhcr_2015
unhcr_all
worldbank_data_dict
worldbank_indicators
original/Refugees.ipynb
original/refs.py
original/refugees2011-15.csv
original/refugees_all_years.csv
original/refugees_per_capita.csv
original/unhcr_2015.csv
original/unhcr_all.csv
original/worldbank_data_dict.csv
original/worldbank_indicators.csv

type of file:
csv
csv
csv
csv
csv
csv
csv
ipynb
py
csv
csv
csv
csv
csv
csv
csv
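
With this many files, a summary by format can be easier to read than two long lists. The sketch below uses a shortened, hand-written sample that mirrors the `name` and `format` keys shown above, so it runs without a data.world connection; with the real dataset, `resources` would come from `refugee_dataset.describe()['resources']`.

```python
from collections import Counter

# Shortened sample mirroring the structure of describe()['resources'];
# only the 'name' and 'format' keys are reproduced here.
resources = [
    {"name": "refugees2011-15", "format": "csv"},
    {"name": "original/Refugees.ipynb", "format": "ipynb"},
    {"name": "original/refs.py", "format": "py"},
    {"name": "original/refugees2011-15.csv", "format": "csv"},
]

# Count files per format and pick out everything that is not tabular data.
format_counts = Counter(r["format"] for r in resources)
non_tabular = [r["name"] for r in resources if r["format"] != "csv"]

print(format_counts)
print(non_tabular)
```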


Load a dataframe:

In [11]:
df11_15 = dataframes['refugees2011-15']

print(df11_15.shape)
df11_15.head()

(29560, 6)


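|   | Year | Country | Origin | Refugees | AsylumSeekers | TotalPop |
|---|------|---------|--------|----------|---------------|----------|
| 0 | 2011 | Aruba | Colombia |  | 3.0 | 3.0 |
| 1 | 2011 | Aruba | Cuba |  | 1.0 | 1.0 |
| 2 | 2011 | Afghanistan | Afghanistan |  |  | 1474167.0 |
| 3 | 2011 | Afghanistan | Iran | 34.0 | 25.0 | 59.0 |
| 4 | 2011 | Afghanistan | Iraq | 3.0 | 1.0 | 4.0 |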


# 2. Query a dataset   
    
Go to https://data.world/nrippner/refugee-host-nations    
    
Open 'unhcr_all.csv' in full screen explore mode ![Screen Shot 2017-05-08 at 6.22.40 PM.png](https://data.world/api/nrippner/dataset/images/file/raw/Screen%20Shot%202017-05-08%20at%206.22.40%20PM.png) 

Write an SQL query to extract all records from the year 2010. A couple of things to keep in mind when writing your SQL query: open the query editor and use the schema tool to help format the identifiers in your query, and test your query out in the editor. (Note: a new release is coming very soon which will greatly simplify the syntax for the identifiers used in queries.)         
![Screen Shot 2017-05-08 at 6.36.21 PM.png](https://data.world/api/nrippner/dataset/images/file/raw/Screen%20Shot%202017-05-08%20at%206.36.21%20PM.png)

In [43]:
query2010 = ddw.query('nrippner/refugee-host-nations',
                     '''SELECT * FROM `unhcr_all.csv/unhcr_all`
                        WHERE `unhcr_all.csv/unhcr_all`.Year = 2010''')
unhcr2010 = query2010.dataframe
print(unhcr2010.shape)
unhcr2010.head()

(5902, 11)


|   | Year | Country / territory of asylum/residence | Origin | Refugees (incl. refugee-like situations) | Asylum-seekers (pending cases) | Returned refugees | Internally displaced persons (IDPs) | Returned IDPs | Stateless persons | Others of concern | Total Population |
|---|------|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | Aruba | Colombia |  | 1.0 |  |  |  |  |  | 1 |
| 1 | 2010 | Afghanistan | Afghanistan |  |  |  | 351907.0 | 3366.0 |  | 838250.0 | 1193523 |
| 2 | 2010 | Afghanistan | Iran (Islamic Rep. of) | 30.0 | 21.0 |  |  |  |  |  | 51 |
| 3 | 2010 | Afghanistan | Iraq | 6.0 | 0.0 |  |  |  |  |  | 6 |
| 4 | 2010 | Afghanistan | Pakistan | 6398.0 | 9.0 |  |  |  |  |  | 6407 |
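
If you want the same query for several years, building the SQL string in a small function avoids retyping the backtick-quoted identifier. This is only a sketch: `build_year_query` is a hypothetical helper, and it assumes the table identifier format used in the query above.

```python
def build_year_query(table, year):
    """Build a SELECT for one year; `table` is backtick-quoted as in the query above."""
    year = int(year)  # reject non-numeric input rather than splicing it into SQL
    quoted = "`{}`".format(table)
    return "SELECT * FROM {t} WHERE {t}.Year = {y}".format(t=quoted, y=year)


sql = build_year_query("unhcr_all.csv/unhcr_all", 2010)
print(sql)

# The string can then be passed to the SDK as in the cell above, e.g.:
# ddw.query('nrippner/refugee-host-nations', sql)
```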


You can also export your query directly from the query editor. Simply click to copy it, then paste it into a cell in your notebook:    
![Screen Shot 2017-05-08 at 6.47.48 PM.png](https://data.world/api/nrippner/dataset/images/file/raw/Screen%20Shot%202017-05-08%20at%206.47.48%20PM.png)  


In [44]:
df = pd.read_csv('https://query.data.world/s/9zyo00t5auv1ifob9nmusnprs')
df.head()

|   | Year | Country / territory of asylum/residence | Origin | Refugees (incl. refugee-like situations) | Asylum-seekers (pending cases) | Returned refugees | Internally displaced persons (IDPs) | Returned IDPs | Stateless persons | Others of concern | Total Population |
|---|------|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | Aruba | Colombia |  | 1.0 |  |  |  |  |  | 1 |
| 1 | 2010 | Afghanistan | Afghanistan |  |  |  | 351907.0 | 3366.0 |  | 838250.0 | 1193523 |
| 2 | 2010 | Afghanistan | Iran (Islamic Rep. of) | 30.0 | 21.0 |  |  |  |  |  | 51 |
| 3 | 2010 | Afghanistan | Iraq | 6.0 | 0.0 |  |  |  |  |  | 6 |
| 4 | 2010 | Afghanistan | Pakistan | 6398.0 | 9.0 |  |  |  |  |  | 6407 |


# 3. Copy and paste pandas code    

![Screen Shot 2017-05-09 at 10.18.25 AM.png](https://data.world/api/nrippner/dataset/images/file/raw/Screen%20Shot%202017-05-09%20at%2010.18.25%20AM.png)

In [50]:
df = pd.read_csv('https://query.data.world/s/2ptnfvs9v14cyg70e4hl4q8ug')

In [51]:
print(df.shape)
df.head()

(29560, 6)


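|   | Year | Country | Origin | Refugees | AsylumSeekers | TotalPop |
|---|------|---------|--------|----------|---------------|----------|
| 0 | 2011 | Aruba | Colombia |  | 3.0 | 3.0 |
| 1 | 2011 | Aruba | Cuba |  | 1.0 | 1.0 |
| 2 | 2011 | Afghanistan | Afghanistan |  |  | 1474167.0 |
| 3 | 2011 | Afghanistan | Iran | 34.0 | 25.0 | 59.0 |
| 4 | 2011 | Afghanistan | Iraq | 3.0 | 1.0 | 4.0 |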
