# Elements of Data Science: A First Course 

# COMS W4995 007 2018 3


## Week 2 :  Data Processing and Delivery: ETL and API

Reference
 - PDSH Chapters 2 and 3
 
Reading
 - Visualization with Matplotlib
     - General Matplotlib Tips
     - Simple Line Plots
     - Simple Scatter Plots
     - Multiple Subplots
     - Text and Annotation
     - Visualization with Seaborn

<img src="images/ds_heirarchy_of_needs2.png"/>

## ETL

<img src="images/etl_diagram.png?2"/>

## Extract

**excel**: `*.xls`

**csv**: one row per record, delimited, possible header

    lastname,purchase_date,stars,price,favorite_flower
    PERKINS,2017-04-08,5,19.599885954165785,iris
    ROBINSON,2017-01-01,5,37.983903616820925,lilac
     

**json**: nested key-value pairs, like a Python dictionary
<pre>
{
  "colors": [
    {
      "color": "black",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [0,0,0,1],
        "hex": "#000"
      }
    },
    {
      "color": "white",
      "category": "value",
      "code": {
        "rgba": [255,255,255,1],
        "hex": "#FFF"
      }
    },
</pre>
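A JSON document maps naturally onto nested Python dictionaries and lists. As a minimal sketch using the standard-library `json` module, the snippet below parses a shortened, well-formed version of the color example (the string `raw` is illustrative, not the original file):

```python
import json

# Shortened, well-formed version of the color example (illustrative data)
raw = '''
{
  "colors": [
    {"color": "black", "category": "hue", "code": {"hex": "#000"}},
    {"color": "white", "category": "value", "code": {"hex": "#FFF"}}
  ]
}
'''

data = json.loads(raw)            # parse the JSON text into dicts and lists
first = data["colors"][0]         # lists are indexed by position
print(first["color"])             # dict values are looked up by key
print(first["code"]["hex"])       # nested objects become nested dicts

# Collect one field from every record
names = [c["color"] for c in data["colors"]]
print(names)
```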

## Read in using pandas

```python
pd.read_excel() 
pd.read_csv() 
pd.read_json() 
pd.read_html()
pd.read_sql()
```
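As a small sketch of how these readers behave, `pd.read_csv` can be tried without a file on disk by wrapping text in `io.StringIO`. The two records are the ones from the csv example above; the variable names `csv_text` and `sample` are illustrative.

```python
import io
import pandas as pd

# The two sample records from the csv example, held in a string
csv_text = """lastname,purchase_date,stars,price,favorite_flower
PERKINS,2017-04-08,5,19.599885954165785,iris
ROBINSON,2017-01-01,5,37.983903616820925,lilac
"""

# io.StringIO wraps the string in a file-like object that read_csv accepts
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["purchase_date"])

print(sample.shape)      # (rows, columns)
print(sample.dtypes)     # one inferred type per column
```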

#### Result is a dataframe, but first, arrays and matrices!

### Data Types

-  Continuous
    -  real, numeric, `float`
    -  Ex: 1.0, 32.34, $\pi$

-  Discrete
    - count, `int`
    - Ex: 1, 201, 0, -5

-  Categorical
    -  factor
    -  Ex: red/green/blue, flower_type

-  Ordinal
    -  ordered factor
    -  Ex: 5 star rating, small/medium/large

-  Binary
    -  boolean, indicator, `bool`
    -  Ex: 0/1, True/False, good/bad, positive/negative, Heads/Tails
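
These types map loosely onto pandas dtypes. The sketch below uses a made-up frame (`toy` and its column names are assumptions for illustration) to show one possible encoding of each type, including an ordered categorical for ordinal data.

```python
import pandas as pd

toy = pd.DataFrame({
    "price": [19.6, 38.0, 22.2],                 # continuous
    "stars": [5, 5, 1],                          # discrete
    "flower": ["iris", "lilac", "iris"],         # categorical
    "size": ["small", "large", "medium"],        # ordinal
    "returned": [False, True, False],            # binary
})

# Unordered categorical (factor)
toy["flower"] = toy["flower"].astype("category")

# Ordered categorical: the category order is stated explicitly
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
toy["size"] = toy["size"].astype(size_type)

print(toy.dtypes)
print(toy["size"].max())          # comparisons respect the stated order
```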

## Dataset Structure

### Rows

aka: case, example, instance, observation, sample

### Columns

aka: features, predictors, independent variables

### Label

aka: outcome, target, dependent variable

## Lists and Arrays

- in built-in Python

In [1]:
L = [5,6,7,8]
print(type(L))

<class 'list'>


- in numpy

In [2]:
import numpy as np
A = np.array([5,6,7,8])
print(type(A))

<class 'numpy.ndarray'>


### Indexing into arrays

In [3]:
print(L[0], A[0])

5 5


In [4]:
print(L[-1], A[-1])

8 8


In [5]:
print(L[:2], A[:2])

[5, 6] [5 6]


### Why numpy array instead of list: Reason 1, Indexing

Goal: select the first and last elements

In [6]:
try:
    
    L[[0,-1]]

except TypeError as err:
    print('{}: {}'.format(type(err),err))

<class 'TypeError'>: list indices must be integers or slices, not list


In [7]:
A[[0,-1]]

array([5, 8])

Index using a boolean mask

In [8]:
A < 7

array([ True,  True, False, False])

In [9]:
A[A < 7]

array([5, 6])
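
Masks can also be combined and used for assignment. A minimal sketch, reusing the same four-element array (the names `middle`, `ends`, `not_small` and `capped` are illustrative):

```python
import numpy as np

A = np.array([5, 6, 7, 8])

# Combine conditions element-wise with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than comparisons.
middle = A[(A > 5) & (A < 8)]
ends = A[(A == 5) | (A == 8)]
not_small = A[~(A < 7)]

print(middle, ends, not_small)

# A mask can also be used for assignment: cap values at 7 in a copy
capped = A.copy()
capped[capped > 7] = 7
print(capped)
```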

### Why numpy array instead of list: Reason 2, Ufuncs

In [10]:
# aside: defining a function

def square(x):
    return x**2

In [114]:
L = list(range(100000))
print(type(L))

<class 'list'>


In [12]:
# Aside: list comprehensions
squares = [square(x) for x in L]

In [13]:
%timeit [square(x) for x in L]

34.4 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [14]:
A = np.arange(100000)
print(type(A))

<class 'numpy.ndarray'>


In [15]:
%timeit [square(x) for x in A]

33 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [16]:
# using a ufunc: ** is applied element-wise in compiled code
%timeit A**2

82.3 µs ± 559 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [17]:
try:
    
    L**2

except TypeError as err:
    print('{}: {}'.format(type(err), err))

<class 'TypeError'>: unsupported operand type(s) for ** or pow(): 'list' and 'int'
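
To make the link between operators and ufuncs concrete, here is a short sketch on small arrays (the values are arbitrary). It checks that the operator form, the named ufunc and the list comprehension all give the same result.

```python
import numpy as np

A = np.arange(5)
B = np.array([10, 20, 30, 40, 50])

# Arithmetic operators on arrays call ufuncs, which work element by element
print(A ** 2)              # same result as np.power(A, 2)
print(A + B)               # same result as np.add(A, B)
print(np.sqrt(B))          # ufuncs can also be called by name

# The ufunc result matches the list comprehension, without a Python-level loop
squares_list = [x ** 2 for x in A.tolist()]
squares_ufunc = (A ** 2).tolist()
print(squares_list == squares_ufunc)
```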


## Matrices

### Indexing into matrices

In [18]:
L = [[5,6],[7,8]]
print(L)

[[5, 6], [7, 8]]


In [19]:
A = np.array([[5,6],[7,8]])
print(A)

[[5 6]
 [7 8]]


Get the element in the first row, first column

In [20]:
L[0][0]

5

In [21]:
A[0,0]

5

Get values in the 2nd column

In [22]:
[x[1] for x in L]

[6, 8]

In [23]:
A[:,1]

array([6, 8])

In [24]:
# Bonus: getting a matrix's shape
A.shape

(2, 2)

## Pandas Series and DataFrames

### Series

In [25]:
import pandas as pd

In [26]:
S = pd.Series(np.random.randint(100,size=1000))

In [105]:
S.head()

0    64
1    82
2    83
3    22
4    89
dtype: int64

In [28]:
S.index

RangeIndex(start=0, stop=1000, step=1)

In [108]:
S = pd.Series(np.random.randint(100,size=5),index=['A','B','C','D','E'])
S

A    65
B    94
C    70
D    40
E    74
dtype: int64
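
A labeled index changes how values are looked up and combined. The following sketch fixes the values to the ones displayed above so the results are reproducible; the second Series `T` is an illustrative assumption.

```python
import pandas as pd

# Fixed values (the ones displayed above) so the results are reproducible
S = pd.Series([65, 94, 70, 40, 74], index=["A", "B", "C", "D", "E"])

print(S["B"])                 # look up one value by its label
print(S[["A", "E"]])          # a list of labels returns a smaller Series
print(S[S > 70])              # boolean masks work as they do for arrays

# Arithmetic aligns on labels, not on position
T = pd.Series([1, 1], index=["E", "A"])
total = S + T                 # labels missing from T produce NaN
print(total)
```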

## DataFrame

In [29]:
np.random.seed(seed=123)

In [30]:
df = pd.DataFrame(np.random.randint(50,100,size=(10,10)))

In [31]:
df.head()

,0,1,2,3,4,5,6,7,8,9
0,95,52,78,84,88,67,69,92,72,83
1,82,99,97,59,82,96,82,97,75,69
2,64,86,82,66,54,99,53,52,70,89
3,52,70,97,98,57,91,85,78,88,83
4,71,80,77,84,83,62,90,53,92,55


### Rename columns

In [32]:
columns = {x:'col' + str(x) for x in range(10)}

In [33]:
columns

{0: 'col0',
 1: 'col1',
 2: 'col2',
 3: 'col3',
 4: 'col4',
 5: 'col5',
 6: 'col6',
 7: 'col7',
 8: 'col8',
 9: 'col9'}

In [34]:
df.rename(columns, axis=1, inplace=True)

In [35]:
df.head()

,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
0,95,52,78,84,88,67,69,92,72,83
1,82,99,97,59,82,96,82,97,75,69
2,64,86,82,66,54,99,53,52,70,89
3,52,70,97,98,57,91,85,78,88,83
4,71,80,77,84,83,62,90,53,92,55


### Rename rows

In [36]:
rows = {x:'row'+str(x) for x in range(10)}

In [37]:
df.rename(rows,axis=0,inplace=True)

In [38]:
df.head()

,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
row0,95,52,78,84,88,67,69,92,72,83
row1,82,99,97,59,82,96,82,97,75,69
row2,64,86,82,66,54,99,53,52,70,89
row3,52,70,97,98,57,91,85,78,88,83
row4,71,80,77,84,83,62,90,53,92,55


### Indexing into a dataframe

  - by labels
  - by integer position

### Index by labels using .loc[]

In [39]:
df.loc['row0','col0']

95

In [40]:
df.loc['row5',:]

col0    50
col1    61
col2    84
col3    60
col4    72
col5    63
col6    68
col7    86
col8    65
col9    93
Name: row5, dtype: int64

In [41]:
df.loc[['row5','row6'],:]

,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
row5,50,61,84,60,72,63,68,86,65,93
row6,77,94,80,56,95,76,66,56,64,89


### Index by location with .iloc[]

In [42]:
df.iloc[0,0]

95

In [43]:
df.iloc[0,:]

col0    95
col1    52
col2    78
col3    84
col4    88
col5    67
col6    69
col7    92
col8    72
col9    83
Name: row0, dtype: int64

In [44]:
df.iloc[:2,:]

,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
row0,95,52,78,84,88,67,69,92,72,83
row1,82,99,97,59,82,96,82,97,75,69


### Using a boolean mask

In [45]:
df.col0 > 80

row0     True
row1     True
row2    False
row3    False
row4    False
row5    False
row6    False
row7    False
row8    False
row9    False
Name: col0, dtype: bool

In [46]:
df.loc[df.col0 >= 80]

,col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
row0,95,52,78,84,88,67,69,92,72,83
row1,82,99,97,59,82,96,82,97,75,69


In [47]:
# want col0, col3 and col8 of rows where col0 > 50 and col3 < 80

df.loc[(df.col0 > 50) & (df.col3 < 80),['col0','col3','col8']]

,col0,col3,col8
row1,82,59,75
row2,64,66,70
row6,77,56,64


### DataFrame axes

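- axis=0 runs down the rows: an operation along axis=0 collapses the rows and returns one result per column (think down)

- axis=1 runs across the columns: an operation along axis=1 collapses the columns and returns one result per row (think right)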
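
The axis convention is easiest to see on a tiny frame. A minimal sketch, with the illustrative name `small` and made-up values:

```python
import pandas as pd

small = pd.DataFrame({"col0": [1, 2, 3], "col1": [10, 20, 30]},
                     index=["row0", "row1", "row2"])

# axis=0: collapse the rows, leaving one value per column
print(small.sum(axis=0))

# axis=1: collapse the columns, leaving one value per row
print(small.sum(axis=1))

# The same convention applies to drop: axis=1 removes a column
print(small.drop("col1", axis=1))
```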

### Reading a csv into a DataFrame

csv: comma-separated values

In [48]:
df = pd.read_csv('../data/week1_flowershop_data.csv',
                 header=0,
                 parse_dates=['purchase_date'],
                 delimiter=',')

In [49]:
df.head()

,lastname,purchase_date,stars,price,favorite_flower
0,PERKINS,2017-04-08,5,19.599886,iris
1,ROBINSON,2017-01-01,5,37.983904,lilac
2,WILLIAMSON,2017-03-20,4,19.339138,carnation
3,ROBINSON,2017-04-12,5,18.140616,lilac
4,RHODES,2017-03-24,1,22.179522,carnation


### Representing missing data: NaN

In [50]:
np.nan

nan

In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
lastname           1000 non-null object
purchase_date      1000 non-null datetime64[ns]
stars              1000 non-null int64
price              978 non-null float64
favorite_flower    1000 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(2)
memory usage: 39.1+ KB


In [52]:
df[df.price.isnull()].head()

,lastname,purchase_date,stars,price,favorite_flower
20,CLARK,2017-01-05,3,,gardenia
41,PETERS,2017-02-01,4,,orchid
54,GREEN,2017-02-13,5,,daffodil
63,BARNETT,2017-08-27,4,,gardenia
145,CARROLL,2017-07-29,3,,tulip


### From dataframe back to matrix

In [53]:

df[df.price.isnull()].head().values

array([['CLARK', Timestamp('2017-01-05 00:00:00'), 3, nan, 'gardenia'],
       ['PETERS', Timestamp('2017-02-01 00:00:00'), 4, nan, 'orchid'],
       ['GREEN', Timestamp('2017-02-13 00:00:00'), 5, nan, 'daffodil'],
       ['BARNETT', Timestamp('2017-08-27 00:00:00'), 4, nan, 'gardenia'],
       ['CARROLL', Timestamp('2017-07-29 00:00:00'), 3, nan, 'tulip']],
      dtype=object)

## Transform

### Dealing with Missing Data

#### Method 1: Drop rows

In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
lastname           1000 non-null object
purchase_date      1000 non-null datetime64[ns]
stars              1000 non-null int64
price              978 non-null float64
favorite_flower    1000 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(2)
memory usage: 39.1+ KB


In [55]:
df.dropna().info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 978 entries, 0 to 999
Data columns (total 5 columns):
lastname           978 non-null object
purchase_date      978 non-null datetime64[ns]
stars              978 non-null int64
price              978 non-null float64
favorite_flower    978 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(2)
memory usage: 45.8+ KB


`df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`
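
The parameters in this signature control which rows count as missing. The sketch below uses a small made-up frame (`toy`, with column names borrowed from the flower shop data) to compare a few settings.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the column names mirror the flower shop data
toy = pd.DataFrame({
    "stars": [5, np.nan, 3, np.nan],
    "price": [19.6, 22.2, np.nan, np.nan],
})

any_missing = toy.dropna()                     # how='any': drop rows with any NaN
all_missing = toy.dropna(how="all")            # drop rows only if every value is NaN
price_only = toy.dropna(subset=["price"])      # only look at the price column
at_least_one = toy.dropna(thresh=1)            # keep rows with >= 1 non-missing value

print(len(toy), len(any_missing), len(all_missing), len(price_only), len(at_least_one))
```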

#### Method 2: Fill forward (ffill) or backward (bfill)

In [56]:
df.iloc[18:22]

,lastname,purchase_date,stars,price,favorite_flower
18,ROBINSON,2017-06-14,2,10.547645,lilac
19,RUIZ,2017-07-29,4,20.451789,iris
20,CLARK,2017-01-05,3,,gardenia
21,HARPER,2017-08-24,2,10.525912,tulip


In [57]:
df.iloc[18:22].ffill()

,lastname,purchase_date,stars,price,favorite_flower
18,ROBINSON,2017-06-14,2,10.547645,lilac
19,RUIZ,2017-07-29,4,20.451789,iris
20,CLARK,2017-01-05,3,20.451789,gardenia
21,HARPER,2017-08-24,2,10.525912,tulip


#### Method 3: Impute the value

In [58]:
df.price.fillna(df.price.mean()).iloc[18:22]

18    10.547645
19    20.451789
20    23.403241
21    10.525912
Name: price, dtype: float64
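
The mean is only one possible fill value. As a hedged sketch on made-up data (`toy` is an assumption, not the flower shop file), the overall median or a per-group mean can be substituted in the same way:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "favorite_flower": ["iris", "iris", "tulip", "tulip", "tulip"],
    "price": [20.0, np.nan, 10.0, 12.0, np.nan],
})

# Overall median: less sensitive to outliers than the mean
overall = toy.price.fillna(toy.price.median())

# Group-wise mean: transform returns a Series aligned with the original rows
group_mean = toy.groupby("favorite_flower").price.transform("mean")
by_group = toy.price.fillna(group_mean)

print(overall.tolist())
print(by_group.tolist())
```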

### Concatenate, Append and Join

In [59]:
A = pd.DataFrame(np.random.randint(0,10,size=(5,3)), columns=['col0','col1','col2'])
B = pd.DataFrame(np.random.rand(3,2), columns=['col0','col1'])

In [60]:
A

,col0,col1,col2
0,6,9,7
1,6,3,9
2,6,6,6
3,1,3,4
4,3,1,0


In [61]:
B

,col0,col1
0,0.309884,0.507204
1,0.280793,0.763837
2,0.108542,0.511655


### Append

In [62]:
# Note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
A.append(B, sort=False)

,col0,col1,col2
0,6.0,9.0,7.0
1,6.0,3.0,9.0
2,6.0,6.0,6.0
3,1.0,3.0,4.0
4,3.0,1.0,0.0
0,0.309884,0.507204,
1,0.280793,0.763837,
2,0.108542,0.511655,


### Concatenate

In [63]:
pd.concat([A,B], sort=False)

,col0,col1,col2
0,6.0,9.0,7.0
1,6.0,3.0,9.0
2,6.0,6.0,6.0
3,1.0,3.0,4.0
4,3.0,1.0,0.0
0,0.309884,0.507204,
1,0.280793,0.763837,
2,0.108542,0.511655,


In [64]:
pd.concat([A,B],axis=1)

,col0,col1,col2,col0,col1
0,6,9,7,0.309884,0.507204
1,6,3,9,0.280793,0.763837
2,6,6,6,0.108542,0.511655
3,1,3,4,,
4,3,1,0,,


### Join

In [65]:
A.join(B,how='outer',lsuffix='_L',rsuffix='_B')

,col0_L,col1_L,col2,col0_B,col1_B
0,6,9,7,0.309884,0.507204
1,6,3,9,0.280793,0.763837
2,6,6,6,0.108542,0.511655
3,1,3,4,,
4,3,1,0,,


In [66]:
A.join(B,how='inner',lsuffix='_L',rsuffix='_B')

,col0_L,col1_L,col2,col0_B,col1_B
0,6,9,7,0.309884,0.507204
1,6,3,9,0.280793,0.763837
2,6,6,6,0.108542,0.511655


### Manipulating indices

In [67]:
A.sort_values(by='col0')

,col0,col1,col2
3,1,3,4
4,3,1,0
0,6,9,7
1,6,3,9
2,6,6,6


In [68]:
A.sort_values(by='col0').reset_index()

,index,col0,col1,col2
0,3,1,3,4
1,4,3,1,0
2,0,6,9,7
3,1,6,3,9
4,2,6,6,6


In [69]:
A.set_index('col2')

col2,col0,col1
7,6,9
9,6,3
6,6,6
4,1,3
0,3,1


In [70]:
idx = pd.Index(['A','B','C','D','E'])
A.set_index(idx)

,col0,col1,col2
A,6,9,7
B,6,3,9
C,6,6,6
D,1,3,4
E,3,1,0


### Grouping, Aggregating and Summarizing

In [71]:
df.groupby('lastname').price.mean().head()

lastname
ADAMS        38.617753
ALEXANDER    30.106511
ALLEN        26.657993
ALVAREZ      20.676235
ANDERSON     14.653257
Name: price, dtype: float64

In [72]:
df.groupby('favorite_flower').stars.median().head()

favorite_flower
carnation    4.0
daffodil     4.0
daisy        3.5
gardenia     4.0
gerbera      4.0
Name: stars, dtype: float64
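
A group can be summarized with several statistics at once through `agg`. A minimal sketch on made-up data (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "favorite_flower": ["iris", "iris", "lilac", "lilac", "lilac"],
    "stars": [5, 3, 4, 4, 1],
    "price": [20.0, 30.0, 10.0, 12.0, 14.0],
})

# Several statistics for one column
price_stats = toy.groupby("favorite_flower").price.agg(["count", "mean", "max"])
print(price_stats)

# A different statistic per column
summary = toy.groupby("favorite_flower").agg({"stars": "median", "price": "mean"})
print(summary)
```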

### Dropping Duplicates

In [73]:
df.loc[:,['lastname','favorite_flower']].drop_duplicates().info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 253 entries, 0 to 984
Data columns (total 2 columns):
lastname           253 non-null object
favorite_flower    253 non-null object
dtypes: object(2)
memory usage: 5.9+ KB


### Creating Dummies for Missing or Categorical Variables

#### Dummies for missing values

In [74]:
df.price.isnull().head()

0    False
1    False
2    False
3    False
4    False
Name: price, dtype: bool

In [75]:
df['price_isnull'] = df.price.isnull()

In [76]:
df.iloc[18:22]

,lastname,purchase_date,stars,price,favorite_flower,price_isnull
18,ROBINSON,2017-06-14,2,10.547645,lilac,False
19,RUIZ,2017-07-29,4,20.451789,iris,False
20,CLARK,2017-01-05,3,,gardenia,True
21,HARPER,2017-08-24,2,10.525912,tulip,False


In [77]:
df.price_isnull = df.price_isnull.astype(int)

In [78]:
df.iloc[18:22]

,lastname,purchase_date,stars,price,favorite_flower,price_isnull
18,ROBINSON,2017-06-14,2,10.547645,lilac,0
19,RUIZ,2017-07-29,4,20.451789,iris,0
20,CLARK,2017-01-05,3,,gardenia,1
21,HARPER,2017-08-24,2,10.525912,tulip,0


#### Dummies for categorical features

In [79]:
pd.get_dummies(df.favorite_flower).head()

,carnation,daffodil,daisy,gardenia,gerbera,iris,jasmine,lilac,orchid,rose,sunflower,tulip
0,0,0,0,0,0,1,0,0,0,0,0,0
1,0,0,0,0,0,0,0,1,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,1,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,0,0


In [80]:
A = np.random.randint(1,4,5)
print(A)

[2 2 1 3 2]


In [81]:
dfA = pd.DataFrame([1,3,2,1])
dfA

,0
0,1
1,3
2,2
3,1


In [82]:
pd.get_dummies(dfA[0])

,1,2,3
0,1,0,0
1,0,0,1
2,0,1,0
3,1,0,0


In [83]:
pd.get_dummies(dfA[0], drop_first=True)

,2,3
0,0,0
1,0,1
2,1,0
3,0,0


### Standardize

$z = \frac{x - \mu}{\sigma}$ 

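or, since the population parameters are rarely known, using the sample mean and sample standard deviation

$z = \frac{x - \bar{x}}{s}$

`z = (x - np.mean(x)) / np.std(x)`

Note that `np.std` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` for the sample standard deviation $s$.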

In [84]:
A = np.random.normal(loc=100, scale=50, size=(5))
B = np.random.normal(loc=-5, scale=.1, size=(5))
print(A)
print(B)

[ 60.95824315 177.09179302  70.340075    33.49001535 204.05866417]
[-4.81961117 -5.03799838 -4.93128985 -5.06545081 -4.86984227]


In [85]:
print(np.mean(A), np.mean(B))
print(np.std(A), np.std(B))

109.18775813834819 -4.944838499519225
68.08351287981361 0.09456846935518677


In [86]:
from scipy.stats import zscore
print(zscore(A))
print(zscore(B))

[-0.70838758  0.99736386 -0.5705887  -1.11183662  1.39344905]
[ 1.32419743 -0.98510512  0.14326813 -1.27539672  0.79303627]


In [87]:
print(np.mean(zscore(A)), np.mean(zscore(B)))
print(np.std(zscore(A)), np.std(zscore(B)))

-2.6645352591003756e-16 3.774758283725532e-15
1.0 0.9999999999999999
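
The standardized values depend slightly on which standard deviation is used: `scipy.stats.zscore` and `np.std` default to `ddof=0`, while the pandas `std` method defaults to `ddof=1`. A short sketch with arbitrary numbers shows both conventions:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population version: np.std defaults to ddof=0 (divide by n)
z_pop = (x - x.mean()) / x.std()

# Sample version: ddof=1 divides by n - 1, matching the s in the formula
z_samp = (x - x.mean()) / x.std(ddof=1)

print(x.mean(), x.std(), x.std(ddof=1))
print(z_pop.std(), z_samp.std(ddof=1))   # each is 1 under its own convention
```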


In [88]:
df['price_z'] = (df.price - df.price.mean()).div(df.price.std())

### Normalize

Rescale so that all values lie between 0 and 1 (min-max scaling): $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$

In [89]:
tmp = (df.price - df.price.min())

In [90]:
df['price_n'] = tmp / tmp.max()

In [91]:
df[['price','price_z','price_n']].describe()

,price,price_z,price_n
count,978.0,978.0,978.0
mean,23.403241,-1.943458e-16,0.446128
std,11.209242,1.0,0.316871
min,7.621566,-1.407916,0.0
25%,18.190466,-0.4650426,0.29877
50%,20.117401,-0.2931367,0.353242
75%,38.694134,1.364133,0.878383
max,42.996317,1.74794,1.0


### Dealing with date and time
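
As a minimal sketch (not from the original slides), datetime columns expose their components through the `.dt` accessor. The dates are the first three purchase dates shown earlier; the variable name `dates` is illustrative.

```python
import pandas as pd

# Dates taken from the first rows of the flower shop data shown earlier
dates = pd.Series(pd.to_datetime(["2017-04-08", "2017-01-01", "2017-03-20"]))

print(dates.dt.month.tolist())           # month number
print(dates.dt.day_name().tolist())      # weekday name
print((dates.max() - dates.min()).days)  # differences are Timedelta objects
```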

### Dealing with strings
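
As a minimal sketch (not from the original slides), string columns offer vectorized methods through the `.str` accessor. The names are taken from the flower shop rows shown earlier; the variable `names` is illustrative.

```python
import pandas as pd

names = pd.Series(["PERKINS", "ROBINSON", "WILLIAMSON"])

print(names.str.title().tolist())              # change case
print(names.str.len().tolist())                # length of each string
print(names.str.contains("SON").tolist())      # substring test -> boolean mask
print(names[names.str.endswith("SON")].tolist())
```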

### Transform Review

- Deal with missing data

- Concatenate, append and join

- Setting indices

- Grouping, Aggregating and Summarizing

- Dropping duplicates

- Creating dummy values
    - for missing values
    - for categorical variables

- Standardize and Normalize

## Load

- We've got our DataFrame, but we should save it

### Save to csv or pickle

In [92]:
df.to_csv('data_cleaned.csv')

In [93]:
df.to_pickle('data_cleaned.pkl')
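
A csv file stores only text, so some information has to be restored when the file is read back. The sketch below writes a small made-up frame (`toy`) to a temporary directory and reads it again; the file name `toy.csv` is illustrative.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"lastname": ["PERKINS", "ROBINSON"],
                    "purchase_date": pd.to_datetime(["2017-04-08", "2017-01-01"]),
                    "stars": [5, 5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")

    # index=False leaves out the row index, which would otherwise come
    # back as an extra "Unnamed: 0" column when the file is read again
    toy.to_csv(path, index=False)

    # csv stores text only, so dates have to be parsed again on the way in
    restored = pd.read_csv(path, parse_dates=["purchase_date"])

print(restored.dtypes)
print(list(restored.columns))
```

A pickle file, by contrast, keeps the dtypes and the index as they were, at the cost of being readable only from Python.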

### Save to db? Later

## API

### What is an API?

Application Programming Interface

For us: a tool for taking in requests and returning data

Use for both <b>getting data</b> and <b>delivering data</b>.

### JSON

### Examples

In [94]:
import requests
url = 'http://en.wikipedia.org/w/api.php?action=query&prop=info&format=json&titles='
title = 'Data Science'
title = title.replace(' ','%20')
print(title)
resp = requests.get(url+title)

Data%20Science


In [95]:
resp.json()

{'batchcomplete': '',
 'query': {'pages': {'49495124': {'pageid': 49495124,
    'ns': 0,
    'title': 'Data Science',
    'contentmodel': 'wikitext',
    'pagelanguage': 'en',
    'pagelanguagehtmlcode': 'en',
    'pagelanguagedir': 'ltr',
    'touched': '2018-09-17T17:33:58Z',
    'lastrevid': 706007296,
    'length': 26,
    'redirect': '',
    'new': ''}}}}
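
The query string can also be built from a dictionary rather than by hand, and the nested response can be walked like any other dict. The sketch below uses only the standard library; the `https` base URL and the trimmed `result` dict are assumptions for illustration (`requests.get` also accepts a `params` dictionary and encodes it itself).

```python
from urllib.parse import urlencode, quote

base = "https://en.wikipedia.org/w/api.php"
params = {"action": "query", "prop": "info", "format": "json",
          "titles": "Data Science"}

# quote_via=quote encodes the space as %20 (the default would use +)
url = base + "?" + urlencode(params, quote_via=quote)
print(url)

# Trimmed copy of the response shown above, as a plain dict
result = {"batchcomplete": "",
          "query": {"pages": {"49495124": {"pageid": 49495124,
                                           "title": "Data Science",
                                           "length": 26}}}}

# Page ids are not known in advance, so loop over the pages dict
for page_id, page in result["query"]["pages"].items():
    print(page_id, page["title"], page["length"])
```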

In [96]:
with open('/home/bgibson/Downloads/twitter_consumer_key.txt') as f:
    consumer_key = f.read().strip()
with open('/home/bgibson/Downloads/twitter_consumer_secret.txt') as f:
    consumer_secret = f.read().strip()
with open('/home/bgibson/Downloads/twitter_access_token.txt') as f:
    access_token = f.read().strip()
with open('/home/bgibson/Downloads/twitter_access_token_secret.txt') as f:
    access_token_secret = f.read().strip()

import tweepy
    
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

In [97]:
# Note: tweepy 4.0 renamed this method to api.search_tweets
public_tweets = api.search('columbia')
for tweet in public_tweets:
    #print(type(tweet))
    print('-------')
    print(tweet.text)

-------
RT @ApuchuProds: [#Bioshock] In the Lamb of Columbia we trust. ✝ 🐑

Primeras fotos con @MeruruAya , preciosa de Elizabeth Comstock (primer…
-------
RT @emorwee: Meanwhile in British Columbia, absolutely bonkers footage of a fire tornado stealing a firefighter's water hose (source: https…
-------
RT @AlexCalamiaWx: FULL STORY: Old fire hoses are being given a new life in Columbia County to secure schools. Here's how it works to keep…
-------
RT @_ValTown_: 50 Cent got shot the day before he had to meet with Destiny’s Child to shoot his ‘Thug Love’ video off of the Power of The D…
-------
RT @nathanpboston: Watch closely to see if Columbia Gas tries to charge MA Ratepayers $1.9M x 48 Miles of new pipe, through the DPU-approve…
-------
RT @emorwee: Meanwhile in British Columbia, absolutely bonkers footage of a fire tornado stealing a firefighter's water hose (source: https…
-------
FULL STORY: Old fire hoses are being given a new life in Columbia County to secure schools. Here's h

In [98]:
tweet._json

{'created_at': 'Mon Sep 17 22:33:51 +0000 2018',
 'id': 1041817536680091655,
 'id_str': '1041817536680091655',
 'text': "I'm at RE/MAX Little Oak Realty Fleetwood Branch in Surrey, British Columbia https://t.co/9YmQ2kBHZE",
 'truncated': False,
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [],
  'urls': [{'url': 'https://t.co/9YmQ2kBHZE',
    'expanded_url': 'https://www.swarmapp.com/c/9LiDmZrajl3',
    'display_url': 'swarmapp.com/c/9LiDmZrajl3',
    'indices': [77, 100]}]},
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'source': '<a href="http://foursquare.com" rel="nofollow">Foursquare</a>',
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 26366004,
  'id_str': '26366004',
  'name': 'Bhavesh Chauhan',
  'screen_name': 'BhaveshChauhan',
  'location': 'Disturbing Vancouver & Calgary',
  'description': 'Filmmaker.